Skewed workloads and Relieving Hot Spots
Imagine you have a library with books categorized by their first letter (A-Z). This is like partitioning data based on a key (like the first letter of a book title).
Problem: One letter (say, "X") becomes super popular (a celebrity author!). Everyone wants to read "X" books, causing a "hot spot" (overcrowding) in the "X" section.
Hashing doesn't fix it: Even if you assign different "buckets" based on a hash of the title, all "X" books still end up in the "X" bucket.
Solution 1: Split the key: Add a random number to each "X" book title (e.g., "X123", "X456"). This spreads them across different buckets, reducing the crowd in "X".
Drawback: Now you need to check all buckets with "X" to find all the books (more work for reading).
Solution 2 (future): Imagine the library magically adjusts shelves based on popularity, automatically spreading out the "X" books.
Takeaway: Choose the solution that best fits your needs. Splitting keys helps with hot spots but adds complexity. Future systems might handle this automatically.
Partitioning and Secondary Indexes
A secondary index usually doesn't identify a record uniquely but rather is a way of searching for occurences of a particular value; find all action by user 123, find all articles containing the word hogwash, find all cars whose color is red. In these statements you can see we use a secondary index for our query.
This is essential for any database design. The issue with secondary indexes are they don't map neatly to partitions. This happens due to non uniqueness, for example find all action by user means find all actions| filter by user, so the find all will go search on all partition. This is a simple example for this problem.
There are two main approaches to partitioning a database with secondary indexes : document-based partitioning and term based partitioning.
No comments:
Post a Comment