
100 ways to fail with DynamoDB indexing
Working with DynamoDB can feel like opening Pandora’s box of indexing pitfalls. Sure, it’s a high-performance key-value database — and you don’t get much more than that out of the box. The moment you need to go beyond simple key-value lookups, custom indexes become mandatory. At monday.com, we use DynamoDB heavily (including in our next-gen mondayDB), and we’ve wrestled with these challenges firsthand. If you think indexing relational databases is hard, DynamoDB has way more pitfalls hiding under the hood.
This is not your typical “what’s the difference between local and global secondary index” type of article. Also, we won’t cover how data partitioning affects performance. We assume you know that already. What we want to cover are surprises, unexpected behaviors, and unobvious limitations that we faced when indexing our dataset. Maybe not one hundred of them, but a few of the most important ones.
If you're considering either of these index types, watch out for the pitfalls and leaky abstractions described below. Unfortunately, our data access patterns required an additional index to support less common queries, so we'd like to take you on a journey through the discoveries we made along the way.
LSI limits the available partition size
Let’s begin with Local Secondary Indexes (LSI). They allow searching or sorting by an attribute other than the table’s sort key. To support this, the index entries are stored right next to the records they index, as a copy of all projected attributes. Keeping data and index together takes advantage of data locality on a single storage node.
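To make this concrete, here is a minimal sketch of what defining an LSI can look like with boto3 (Python). The table, attribute, and index names are made up for illustration:

import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical table: partition key "BoardId", sort key "ItemId".
# The LSI "ByCreatedAt" keeps the same partition key but lets us query
# and sort by "CreatedAt" instead of "ItemId".
dynamodb.create_table(
    TableName="Items",
    AttributeDefinitions=[
        {"AttributeName": "BoardId", "AttributeType": "S"},
        {"AttributeName": "ItemId", "AttributeType": "S"},
        {"AttributeName": "CreatedAt", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "BoardId", "KeyType": "HASH"},
        {"AttributeName": "ItemId", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "ByCreatedAt",
            "KeySchema": [
                {"AttributeName": "BoardId", "KeyType": "HASH"},     # must be the table's partition key
                {"AttributeName": "CreatedAt", "KeyType": "RANGE"},  # the alternate sort key
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)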
However, each partition in DynamoDB has a hard size limit of 10 GiB of data. What happens to that limit when an LSI shares the same partition? In the worst-case scenario, it shrinks by half, leaving you only 5 GiB of useful space. The hit is smaller if you don’t project all attributes into your index, but that creates an issue of its own. More on that later. Also, each new LSI shrinks your effective partition size even further.
Moreover, the maximum size of a single item is 400 KiB. This is quite a lot, and you should probably never hit that limit. It’s a better idea to store such large pieces of data externally, e.g., on S3. But this capacity is also used by the corresponding LSI entries. So, again, with each new LSI, your maximum item size also shrinks. It’s like your integrated graphics card is eating up main RAM rather than having its own.
LSI eats your throughput (and dollars)
LSI limits your physical storage, but that’s not the worst part. When you update data indexed by an LSI, the index must be updated synchronously as well. Thanks to this behavior, LSI is strongly consistent and atomic, but that obviously has a performance cost. More importantly, it consumes your Write Capacity Units (WCU). So you literally pay more for each write, depending on how much data is projected into the index.
And it gets worse! When defining an LSI, you must decide which attributes are “projected” into the index. In simple terms, you choose which attributes are copied verbatim from the main table into the index. So, it’s denormalization that happens behind the scenes.
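For reference, these are the three projection modes you can choose from when creating the index. The snippet below is the Projection portion of the LSI definition from the earlier sketch; attribute names are hypothetical:

# KEYS_ONLY - only the table keys and the index keys are copied
# INCLUDE   - keys plus an explicit allow-list of non-key attributes
# ALL       - every attribute is copied (largest index, highest write cost)
projection = {
    "ProjectionType": "INCLUDE",
    "NonKeyAttributes": ["Status", "AssigneeId"],  # copied verbatim into the index
}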
N+1 problem strikes back
The decision about which attributes to project has a tremendous impact. If you query through an LSI and ask for even one attribute that’s not part of the projection, DynamoDB will make one extra fetch from the base table for each returned result. This is the classic N+1 problem: you don’t just make one query against the LSI; you also trigger N additional reads, one for each item returned from the index.
As usual, it’s not only performance that suffers; the Read Capacity Units (RCU) are impacted too. Long story short, you might end up paying orders of magnitude more without any performance improvement. In our case, we ended up paying 10x more because of one missing attribute.
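Here’s roughly what such a query looks like with boto3, continuing the hypothetical table from the sketch above. Asking for a non-projected attribute is what triggers the extra fetches:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Items")

# "Status" is not projected into the "ByCreatedAt" LSI, so DynamoDB has to
# fetch every matching item from the base table as well (the N+1 pattern).
response = table.query(
    IndexName="ByCreatedAt",
    KeyConditionExpression=Key("BoardId").eq("board-123") & Key("CreatedAt").gt(1700000000),
    ProjectionExpression="ItemId, CreatedAt, #s",
    ExpressionAttributeNames={"#s": "Status"},   # Status is a reserved word, hence the alias
    ReturnConsumedCapacity="INDEXES",            # report RCU usage per table and index
)
print(response["ConsumedCapacity"])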
The snippet below shows how many capacity units were used before the fix:
{
    "Count": 488,
    "ScannedCount": 488,
    "ConsumedCapacity": {
        "TableName": "[redacted]",
        "CapacityUnits": 497.0
    }
}
After adding the relevant attributes to the LSI projection, consumed capacity dropped to roughly 40 units (vs. 497).
Oh, and remember, you have to make all these decisions in advance. Adding LSIs (up to 5 per table) is only possible when the table is created, and they can’t be changed or removed afterwards. If you change your mind, you have to go through a rather painful data migration on a live database.
How partitions get split
Let’s assume you designed your LSI properly and you are fine with the performance so far. Now, let’s talk about data partitioning just a little bit.
As you probably already know, compound primary keys in DynamoDB consist of a partition key and a sort key. In general, items with the same partition key end up in the same partition. Items with different partition keys might also end up in the same partition if their hashed key values fall into the same range.
LSI prevents partition key splits
Now, let’s bust a few misconceptions. First of all, the partition key does not permanently determine the partition an item is stored in. If, at some point, one partition becomes “hot”, DynamoDB might have to reshuffle your data. This is more common when some partition keys are far more popular than others. In such cases, DynamoDB might decide to split the partition in two. This is completely transparent to you, but it allows a better spread of load and data among nodes. Under high load with hot partitions, you might observe a sudden improvement in performance after a few minutes, once the partition has been split.
Split for heat and beyond
But DynamoDB is even cleverer. All items sharing the same partition key are called an item collection. If your partition key is disastrously bad (i.e., extremely popular), you might end up with a single partition fully occupied by just one or a few item collections. In the worst-case scenario, imagine all your items having the same partition key (if your data model is really poor). At this point, DynamoDB will split the partition even further, placing roughly half of your item collection on each resulting partition.
So no matter how badly you model your compound primary key, the database will eventually give you the best performance it can. In practice, it splits the item collection by sort key, so lower keys are stored on one partition (node), whereas higher keys are stored on the other.
That is, of course, only if you don’t use an LSI. An LSI prevents DynamoDB from splitting a single partition key (item collection) across multiple partitions, so all bets are off.
GSI does not affect the main table (?)
Now that we have LSI covered, let’s jump into Global Secondary Indexes (GSI), which are very different from LSI. They are stored in a separate set of partitions from the data they index. As a matter of fact, a GSI is pretty much a copy of your original table with different keys, kept as a separate table. From our perspective, the biggest difference is the consistency guarantees.
LSI is updated synchronously during write; GSI, on the other hand, is updated asynchronously through a background process. This means that GSI is only eventually consistent, whereas LSI is strongly consistent.
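Unlike an LSI, a GSI can be added to an existing table, and it gets its own capacity settings. A minimal sketch with boto3, reusing the hypothetical table and made-up attribute names from above:

import boto3

dynamodb = boto3.client("dynamodb")

# Add a GSI keyed on "AssigneeId" (a hypothetical attribute) to the existing table.
dynamodb.update_table(
    TableName="Items",
    AttributeDefinitions=[
        {"AttributeName": "AssigneeId", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "ByAssignee",
                "KeySchema": [
                    {"AttributeName": "AssigneeId", "KeyType": "HASH"},  # a completely new partition key
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
                # The GSI's throughput is provisioned separately from the main table.
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 50,
                    "WriteCapacityUnits": 50,
                },
            }
        }
    ],
)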
Except when it does: replication lag can cause throttling
On the surface, you might think that a GSI has no impact on writes because it’s updated asynchronously. That’s technically true, unless updating the GSI lags behind for some reason. One such reason might be insufficient write capacity on the GSI itself. Yes, it’s configured separately from the original table. But that’s not the point. If the GSI lags behind, it will throttle writes to… the main table. Which means that, for reasons you may not fully control (or even be aware of), you get client-facing errors.
DynamoDB will try to keep up with the main table, but it’s not guaranteed. Updates are queued and processed by a component named Log Propagator. The lag between the main table and GSI update is not documented, and there are no metrics to track it.
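There is no metric for the propagation lag itself, but one proxy we can watch is write throttling reported against the GSI. A sketch using CloudWatch, with the hypothetical table and index names from the earlier sketches:

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
# WriteThrottleEvents reported with the GlobalSecondaryIndexName dimension
# indicates that the GSI itself, not the main table, is the bottleneck.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="WriteThrottleEvents",
    Dimensions=[
        {"Name": "TableName", "Value": "Items"},
        {"Name": "GlobalSecondaryIndexName", "Value": "ByAssignee"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Sum"],
)
print(stats["Datapoints"])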
GSI replication is unordered
The term “global” in GSI only means you can index the entire dataset. You might think it also guarantees the order of updates to the GSI (with a lag, but still). Sadly, that is not necessarily the case while the index is being updated. Changes to the main table might be propagated to the GSI out of order, or even non-monotonically.
Just to be clear, this means if you change the indexed attribute from A to B, subsequent reads from GSI might return (in that order): A, B, A, B (you briefly see the old value after seeing the new value). The only guarantee is that the data will be eventually consistent.
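If stale results from the GSI are unacceptable for a particular read path, one possible mitigation (at the cost of extra RCU, much like the N+1 pattern above) is to treat GSI results as candidates and confirm each one with a strongly consistent read from the base table. A sketch, again using the hypothetical names from earlier:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Items")

# The GSI query may return stale or out-of-order rows...
candidates = table.query(
    IndexName="ByAssignee",
    KeyConditionExpression=Key("AssigneeId").eq("user-42"),
)["Items"]

confirmed = []
for candidate in candidates:
    # ...so re-read each item from the base table with strong consistency
    # (ConsistentRead is supported on the base table and LSIs, not on GSIs).
    fresh = table.get_item(
        Key={"BoardId": candidate["BoardId"], "ItemId": candidate["ItemId"]},
        ConsistentRead=True,
    ).get("Item")
    # Re-check the predicate: the index row may reflect an older version of the item.
    if fresh and fresh.get("AssigneeId") == "user-42":
        confirmed.append(fresh)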
Do I need an index?
The bottom line of DynamoDB indexing: think very hard about your data model in advance. Specifically, define your compound primary key in such a way that indexes aren’t necessary. Fetching by partition key and sorting by sort key is as fast as DynamoDB gets.
If you really need alternative query patterns, take into account all the limitations of LSI or GSI. Some drawbacks might be such a deal breaker that you should consider denormalization or alternative storage.
If you really believe you need LSI, at least think twice about the attributes you want to project. In the case of GSI, your application must be able to tolerate eventual consistency and occasional lag.