
Reading from DynamoDB Streams without Lambda
CDC in DynamoDB
At some point at monday.com, we wanted to store our data somewhere reliably and quickly. Moreover, we wanted a near-real-time stream of data changes (CDC) from this storage to decrease read load. DynamoDB looked like a good choice: it is fast enough and features the DynamoDB Streams service, which supports listening to CDC events.
According to the AWS doc, there are several options for leveraging DynamoDB Streams:
- Using AWS Lambda directly with DynamoDB Streams or Kinesis. All the complexity of shard management is hidden, and we would only have to manage Lambda functions.
- KCL 1.x, which stands for Kinesis Client Library. This still uses the low-level DynamoDB Streams API, but through an already existing open-source Java library.
- The most interesting and hardcore one is building our own solution on top of DynamoDB streams’ low-level API and handling all the complexity by ourselves.
While Lambdas seem to be the most logical choice, there are a few reasons why we passed on them (but later changed our minds):
- Lambdas poll each partition only 4 times per second, which is not frequent enough to be near real-time.
- Lambdas are not on the golden path in the company, so introducing them would have required a lot of additional work on our side.
Another option is to use the low-level DynamoDB Streams API. We used the Java library (KCL 1.x) at the beginning as a POC, because it let us build a first version quickly. Later, we implemented our own logic for listening to changes from the DynamoDB table in Go, as Java is not on the golden path in the company either.
Building our own DynamoDB Streams listener
The basic implementation logic was based on the Java libraries: https://github.com/awslabs/amazon-kinesis-client and https://github.com/awslabs/dynamodb-streams-kinesis-adapter. However, before diving into the details of the algorithm used, let’s understand what we need to do.
Understanding DynamoDB Streams: Records and Shards
The DynamoDB Streams service provides a low-level API to receive updates about changes in a DynamoDB table. Low level means that additional development is required to make it work properly. Let's dive deeper into what the API actually provides us with.
DynamoDB Streams have two main entities:
- Record – a single data modification in the DynamoDB table. It has a sequence number that reflects the order in which it was published to the stream
- Shard – a group of DynamoDB Stream records.
As mentioned above, DynamoDB Streams records are spread across shards. Each shard contains updates for a subset of items in the DynamoDB table. Therefore, records might be imbalanced across shards if the items in one shard are updated more frequently than those in another; in that case, the former shard might have many more records to fetch.
Shards have a hierarchical structure: each shard might have one parent and one or two children. This makes sense because shards don't live forever; they live for up to 24 hours.
For proper ordering, you must read records from the parent shard first, and only then start reading its children. At some point, each shard is closed, meaning it will no longer receive new records, but it will still be available for reading. To continue reading data, you must move on to its child shard (or shards, if there are several). Child shards will continue to receive records for updates of the same items in the table.
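To make the ordering rule concrete, here is a minimal sketch in Go. The `Shard` struct and `readyShards` function are simplified stand-ins, not our production code; the real shard metadata comes from the API described below:

```go
package bridge

// Shard is a simplified stand-in for the shard metadata returned by DynamoDB Streams.
type Shard struct {
	ID       string
	ParentID string // empty if the parent has expired or never existed
}

// readyShards returns the shards that may be read now: a shard is ready only
// when its parent is unknown (already expired) or has been fully processed.
func readyShards(shards []Shard, done map[string]bool) []Shard {
	known := make(map[string]bool, len(shards))
	for _, s := range shards {
		known[s.ID] = true
	}
	var ready []Shard
	for _, s := range shards {
		if done[s.ID] {
			continue
		}
		if s.ParentID == "" || !known[s.ParentID] || done[s.ParentID] {
			ready = append(ready, s)
		}
	}
	return ready
}
```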
This picture illustrates how shard management happens in DynamoDB Streams:

Here is the API that AWS provides us with to achieve our goal: ListStreams, DescribeStream, GetShardIterator, and GetRecords. It's worth mentioning that, in effect, you end up implementing simple polling by calling GetRecords at a certain interval.
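For illustration, here is a minimal sketch of what polling a single shard looks like with the AWS SDK for Go v2 (not our production code; the stream ARN and shard ID are placeholders, and error handling is kept trivial):

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodbstreams"
	"github.com/aws/aws-sdk-go-v2/service/dynamodbstreams/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodbstreams.NewFromConfig(cfg)

	// Placeholders: in Bridge these come from the lease table.
	streamArn := "arn:aws:dynamodb:...:table/my-table/stream/..."
	shardID := "shardId-00000000000000000000-00000000"

	// Get an iterator pointing at the oldest untrimmed record in the shard.
	iterOut, err := client.GetShardIterator(ctx, &dynamodbstreams.GetShardIteratorInput{
		StreamArn:         aws.String(streamArn),
		ShardId:           aws.String(shardID),
		ShardIteratorType: types.ShardIteratorTypeTrimHorizon,
	})
	if err != nil {
		log.Fatal(err)
	}

	iterator := iterOut.ShardIterator
	for iterator != nil {
		out, err := client.GetRecords(ctx, &dynamodbstreams.GetRecordsInput{
			ShardIterator: iterator,
			Limit:         aws.Int32(1000),
		})
		if err != nil {
			log.Fatal(err)
		}
		for _, rec := range out.Records {
			log.Printf("event=%s keys=%v", rec.EventName, rec.Dynamodb.Keys)
		}
		// A nil NextShardIterator means the shard is closed; move on to its children.
		iterator = out.NextShardIterator
		time.Sleep(250 * time.Millisecond) // the polling period discussed below
	}
}
```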
Now, let’s move on to the algorithm itself. We’ve introduced three entities: Leader, Leaser, and Processor, and have called our service Bridge. Let’s review each of them:
Leader
To distribute shards across nodes, we need to record them somewhere, so that each node knows which shards it can process. The DescribeStream API method helps us achieve this. To avoid race conditions between nodes, it's better to choose a single node that will be responsible for shard fetching. We called this node the Leader.
The Leader reads all shards from DynamoDB Streams and stores them; it also checks the previously saved shards and removes any expired ones. To limit the number of leaders, we used a locking mechanism: a DynamoDB table together with the dynamolock Go package.
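A simplified sketch of the leader's shard-discovery loop is below. The `shardStore` interface is a hypothetical abstraction over the lease table, and the leader lock acquired via dynamolock is omitted:

```go
package bridge

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/dynamodbstreams"
	"github.com/aws/aws-sdk-go-v2/service/dynamodbstreams/types"
)

// shardStore is a hypothetical abstraction over the lease table.
type shardStore interface {
	SaveShard(ctx context.Context, shard types.Shard) error
}

// discoverShards lists all shards of the stream via DescribeStream pagination
// and upserts them into the lease table, so that Leasers can pick them up.
func discoverShards(ctx context.Context, client *dynamodbstreams.Client, streamArn string, store shardStore) error {
	var exclusiveStart *string
	for {
		out, err := client.DescribeStream(ctx, &dynamodbstreams.DescribeStreamInput{
			StreamArn:             aws.String(streamArn),
			ExclusiveStartShardId: exclusiveStart,
		})
		if err != nil {
			return err
		}
		for _, shard := range out.StreamDescription.Shards {
			// Save the shard (and its parent ID, needed for ordering).
			if err := store.SaveShard(ctx, shard); err != nil {
				return err
			}
		}
		exclusiveStart = out.StreamDescription.LastEvaluatedShardId
		if exclusiveStart == nil {
			return nil // all shards listed
		}
	}
}
```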
Leaser
The Leaser is responsible for handling leases, as in the lease pattern. Leases are similar to locks, but they have expiration times. Each shard has its own lease in storage, and a lease is held by an owner: a Bridge node. The simplified Leaser algorithm is as follows (see the sketch after the list):
- Calculate the optimal number of shards per node
- Finish if the node already has the optimal number of shards or more
- Take expired leases or shards with no lease (i.e., no owner)
- Finish if the optimal number is reached
- Start stealing shards from those nodes that have more than the optimal number of shards
This algorithm guarantees that all leases will be taken sooner or later, even if some node is dead.
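Here is a sketch of that loop in Go. The `Lease` struct and `leaseStore` interface are hypothetical simplifications; the real implementation has to use conditional writes so that two nodes can never win the same lease, and refreshes lease state between steps:

```go
package bridge

import (
	"context"
	"time"
)

// Lease is a simplified lease record for one shard.
type Lease struct {
	ShardID   string
	Owner     string // empty if nobody owns it
	ExpiresAt time.Time
}

// leaseStore is a hypothetical abstraction over the lease table.
type leaseStore interface {
	ListLeases(ctx context.Context) ([]Lease, error)
	// TakeLease must be a conditional write so only one node wins a given lease.
	TakeLease(ctx context.Context, shardID, owner string) error
}

func acquireLeases(ctx context.Context, store leaseStore, self string, nodeCount int) error {
	leases, err := store.ListLeases(ctx)
	if err != nil {
		return err
	}

	// 1. Calculate the optimal number of shards per node.
	optimal := (len(leases) + nodeCount - 1) / nodeCount

	owned := 0
	perOwner := map[string]int{}
	for _, l := range leases {
		perOwner[l.Owner]++
		if l.Owner == self {
			owned++
		}
	}
	// 2. Finish if this node already holds the optimal number of leases or more.
	if owned >= optimal {
		return nil
	}

	// 3. Take expired leases or leases with no owner.
	now := time.Now()
	for _, l := range leases {
		if owned >= optimal {
			return nil // 4. optimal number reached
		}
		if l.Owner == "" || l.ExpiresAt.Before(now) {
			if err := store.TakeLease(ctx, l.ShardID, self); err == nil {
				owned++
			}
		}
	}

	// 5. Steal from nodes holding more than the optimal number of leases.
	for _, l := range leases {
		if owned >= optimal {
			break
		}
		if l.Owner != "" && l.Owner != self && perOwner[l.Owner] > optimal {
			if err := store.TakeLease(ctx, l.ShardID, self); err == nil {
				perOwner[l.Owner]--
				owned++
			}
		}
	}
	return nil
}
```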
Processor
The Processor is the user logic. It receives a change from the stream and performs an action with it, for example, sending it to Kafka. After the action is completed, the record is checkpointed in the table, which saves the processor from rereading it in the future.
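Conceptually, the contract looks roughly like this. It is a sketch with a hypothetical `checkpointer` interface; the Kafka producer would be just one possible `Processor` implementation:

```go
package bridge

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/service/dynamodbstreams/types"
)

// Processor is the user-supplied logic that handles a single stream record.
type Processor interface {
	Process(ctx context.Context, record types.Record) error
}

// checkpointer persists the last processed sequence number per shard,
// so records are not reread after a restart (hypothetical interface).
type checkpointer interface {
	Checkpoint(ctx context.Context, shardID, sequenceNumber string) error
}

// handleRecords runs the processor over a batch and checkpoints each record
// only after it has been handled successfully.
func handleRecords(ctx context.Context, shardID string, records []types.Record, p Processor, cp checkpointer) error {
	for _, rec := range records {
		if err := p.Process(ctx, rec); err != nil {
			return err // do not checkpoint; the record will be retried
		}
		if err := cp.Checkpoint(ctx, shardID, *rec.Dynamodb.SequenceNumber); err != nil {
			return err
		}
	}
	return nil
}
```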

Benchmarking Bridge: Performance and Cost Analysis
Our implementation has already been tested in production. Bridge was introduced on tables with up to a few thousand partitions. To understand whether it works correctly, we also introduced a few core metrics:
- Round-trip time – the time from the moment a change appears in the table until the moment it is read by Bridge.
- Cost – how much it costs to run Bridge, and how the cost grows with scale.
Round-trip time
Once we completed our first implementation, we set it to send a request to the Streams API every 20 ms, which makes our solution near-real-time. However, the cost of this approach was too high, so we later increased the interval to 250 ms. Quite similar to Lambda, isn't it? We will come back to this in the Next Steps section. The expected round-trip time should therefore be about 250-300 ms in the worst-case scenario and even less on average. To measure this, we set a timestamp column at the moment of writing and use it to calculate the round-trip time (keep in mind that clock drift between servers adds some noise to this measurement).
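As a sketch of the measurement itself, assuming a hypothetical `updated_at` attribute that holds the write time in Unix milliseconds (the attribute name and encoding are illustrative, not our exact schema):

```go
package bridge

import (
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go-v2/service/dynamodbstreams/types"
)

// roundTripTime computes how long a change took to reach Bridge, based on a
// hypothetical "updated_at" attribute (Unix milliseconds) set when the item is written.
func roundTripTime(rec types.Record) (time.Duration, bool) {
	attr, ok := rec.Dynamodb.NewImage["updated_at"]
	if !ok {
		return 0, false
	}
	n, ok := attr.(*types.AttributeValueMemberN)
	if !ok {
		return 0, false
	}
	ms, err := strconv.ParseInt(n.Value, 10, 64)
	if err != nil {
		return 0, false
	}
	return time.Since(time.UnixMilli(ms)), true
}
```

Here are the round-trip time results: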

As you can see, our p50 and p95 values are as expected, but we see some spikes at p99, which can reach several seconds.
Cost
The Bridge cost consists of several things:
- Computation
- Lease and leader table costs. These are service tables that we need in order to work with the DynamoDB Streams API.
- DynamoDB Streams API
While the computation cost depends on where you run your application, the other two items depend entirely on the algorithm you use.
We were surprised that the DynamoDB Streams API is the biggest contributor to our costs. We make four requests per second for each partition, which means 4,000 requests per second per table, or between $2,000 and $3,000, depending on the region, which is a substantial amount. Remember when we had one request every 20 ms? Also, scaling the number of partitions increases the cost linearly.
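As a rough sanity check, assuming the publicly listed price of about $0.02 per 100,000 GetRecords calls (actual pricing varies by region and over time), the numbers line up if the figure above is read as a monthly cost:

$$
4{,}000\ \tfrac{\text{req}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} \times 30\ \text{days} \approx 1.04 \times 10^{10}\ \tfrac{\text{requests}}{\text{month}},
\qquad
\frac{1.04 \times 10^{10}}{100{,}000} \times \$0.02 \approx \$2{,}100\ \text{per month}
$$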
Next Steps
There are several directions in which we would like to invest our time regarding DynamoDB streams.
Performance improvement
As shown above, there are still spikes in the p99 of the round-trip time. Some of them are caused by dead pods, others by the lack of a graceful shutdown, and others might be related to a third-party service that blocks updates from being read. The goal is to have a flat p99 that is close to the polling period.
Additionally, there is room for optimization in the usage of the lease/leader tables. While it may not affect performance, it could decrease the cost of these tables.
Data loss
We have indirect metrics regarding the level of data integrity, but they are not yet production-ready. We plan to build a solid metric that will show us how much data (we hope none) is being lost.
Cost
One of the biggest concerns about the current implementation is cost. While it is fast, scaling the number of partitions becomes very expensive. The Streams read API doesn't seem to offer any way to reduce costs without impacting round-trip time.
One way to bypass this API is to use Lambda functions 😀. I know, we passed on them at the beginning, but now they seem like a good compromise between price and performance. Lambda polls a partition up to 4 times a second, and we pay only for execution time. So, no additional cost for the Streams API, and no extra tables to support. We've already started discovery, and it looks promising from a cost perspective.
While building a service on top of the low-level DynamoDB Streams API is fun and cost-effective for small tables, it can become an expensive proposition for huge tables. You should have good reasons to invest time in it rather than just using the built-in options, such as the Lambda integration.


