Today, we are launching read connectors for processing data in DynamoDB Tables and Streams. At a glance, here is what we are launching:
DynamoDB Table Read Connector: Scans the table and sends the records to the compute engine / write connectors.
Interface (Java, Python, JavaScript) and Example Implementations (Java, Python, JavaScript)
DynamoDB Streams Read Connector: Similar to the Kinesis Read Connector, this read connector reads from DynamoDB Streams and processes the records according to the customer's dataset configuration.
Interface (Java, Python, JavaScript) and Example Implementations (Java, Python, JavaScript)
Details for these connectors are as follows:
DynamoDB Table Read Connector
The DynamoDB Table Read Connector scans DynamoDB tables. You can use the #LetsData connector to scan a DynamoDB table for scenarios such as populating caches, running Gen AI pipelines, or running aggregation jobs.
Here are some notable implementation highlights:
Configurable Scanning Speed: Customers can configure the scanning speed by specifying the number of reader tasks (parallel scan segments), which, in conjunction with the configured concurrency, can aggressively scan the table or be set up as a low-priority scan (a minimal parallel scan sketch follows this list).
Item Filtering: We've enabled support for DynamoDB's filter expressions to filter the items that are read.
Scan Completion Conditions: We've enabled a couple of modes for scan completion:
SingleTableScan: The dataset completes once a single scan of the table is completed.
Continuous: The dataset continuously scans the table until it errors, is stopped, or is deleted.
Simple Data Interfaces: We've defined a simple interface that customers can implement; it is available in each of our supported languages (Java, Python, JavaScript).
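To make the parallel scan and filtering mechanics concrete, here is a minimal sketch of the underlying DynamoDB capability the connector builds on - plain boto3 parallel scan segments with a server-side filter expression. This is not the #LetsData internals; the table name, attribute names, and filter values are illustrative:

```python
import boto3

dynamodb = boto3.client("dynamodb")

def scan_segment(table_name, segment, total_segments):
    """Scan one parallel-scan segment, applying a server-side filter expression."""
    paginator = dynamodb.get_paginator("scan")
    pages = paginator.paginate(
        TableName=table_name,
        Segment=segment,               # this reader task's segment
        TotalSegments=total_segments,  # total number of parallel reader tasks
        FilterExpression="#lang = :lang",
        ExpressionAttributeNames={"#lang": "language"},
        ExpressionAttributeValues={":lang": {"S": "english"}},
    )
    for page in pages:
        for item in page["Items"]:
            yield item

# e.g., 4 reader tasks would each run scan_segment("articles", i, 4) for i in 0..3
```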
Getting started with the DynamoDB Table Read Connector is simple.
Implement the interface - here is what a simple Python implementation might look like:
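This is a hedged sketch only: the class name, method name, and arguments below are illustrative assumptions, not the published #LetsData signatures - consult the interface links above for the exact definitions.

```python
# Hypothetical sketch - names and signatures are illustrative assumptions.

class ArticleTableItemReader:
    """Parses each scanned DynamoDB item into a #LetsData document."""

    def parseDynamoDBItem(self, tableName, segmentNumber, keys, item):
        # 'item' is the scanned DynamoDB item; attribute names are illustrative
        doc_id = item["articleId"]["S"]
        text = item["articleText"]["S"]
        # return a parsed document for the compute engine / write connectors
        return {
            "status": "success",
            "documentId": doc_id,
            "document": {"id": doc_id, "text": text},
        }
```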
Define the read connector (tableName and artifact) and the manifest for the DynamoDB table - in this example, we are selecting English language items from the DynamoDB table by specifying the readerFilterExpression.
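As an illustration, the configuration might look roughly like the following - tableName, artifact, and readerFilterExpression are the connector's configuration concepts from above, while the surrounding field names and values are assumptions sketched for this example:

```json
{
    "readConnector": {
        "connectorDestination": "DynamoDBTable",
        "tableName": "articles",
        "artifactImplementationLanguage": "python",
        "numberOfReaderTasks": 4,
        "readerFilterExpression": "#lang = :lang",
        "expressionAttributeNames": { "#lang": "language" },
        "expressionAttributeValues": { ":lang": { "S": "english" } },
        "scanCompletion": "SingleTableScan"
    }
}
```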
Scanning a table reliably at scale is an interesting challenge, and what we've built is pretty compelling in our humble opinion. We have some forward-looking ideas on how to further improve the DynamoDB Table read connector - we'd love to hear about users' experiences so that we can iterate and improve.
DynamoDB Streams Read Connector
DynamoDB Streams are fantastic! When they were launched, I remember being amazed by the new development scenarios that streams enabled - database architectures instantly transformed from querying models to event-driven architectures. With the #LetsData DynamoDB Streams read connector, you can process these events at scale. Here are some notable callouts:
Database Changelog: DynamoDB Streams are essentially a changelog where every insert/update/delete is available as an ordered stream of records.
Kinesis / DynamoDB Streams: The #LetsData implementation for the DynamoDB Streams Read Connector is very similar to the Kinesis Read Connector, with the notable difference being the availability of before modification and after modification item images in the stream record. This leads to a slightly more verbose #LetsData interface.
Designing for Ephemeral Shards: One interesting challenge in developing for Kinesis / DynamoDB streams is that #LetsData defines a single task for each Kinesis / DynamoDB shard. However, shards are ephemeral - a shard can split into child shards, or adjacent shards can be merged into a single shard. This means that existing tasks complete and new tasks need to be created for the new shards. We've added code that, upon each task's completion, detects whether there are shards that haven't been assigned to tasks and creates new tasks if needed.
In all our use cases thus far, we had defined the complete work specification at dataset creation time; ephemeral shards have made us monitor for changes in the work specification and add / remove tasks as needed.
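Here is a minimal sketch of that shard discovery step using plain boto3 (the real task scheduler is internal to #LetsData; the function and its bookkeeping are illustrative):

```python
import boto3

streams = boto3.client("dynamodbstreams")

def find_unassigned_shards(stream_arn, assigned_shard_ids):
    """Return ids of shards in the stream that no task has been created for yet."""
    unassigned = []
    kwargs = {"StreamArn": stream_arn}
    while True:
        description = streams.describe_stream(**kwargs)["StreamDescription"]
        for shard in description["Shards"]:
            # after a split/merge, child shards show up here with new ids
            if shard["ShardId"] not in assigned_shard_ids:
                unassigned.append(shard["ShardId"])
        last_shard_id = description.get("LastEvaluatedShardId")
        if not last_shard_id:
            return unassigned
        kwargs["ExclusiveStartShardId"] = last_shard_id
```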
LetsData Interfaces: The #LetsData interfaces are defined in each of our supported languages as follows:
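As with the table reader above, here is a hedged Python sketch of what a streams record reader could look like - the class name, method name, and arguments are illustrative assumptions rather than the published interface:

```python
# Hypothetical sketch - not the published #LetsData interface signature.

class ArticleStreamsRecordReader:
    """Processes DynamoDB stream records with before/after item images."""

    def parseRecord(self, streamArn, shardId, eventName, sequenceNumber,
                    keys, oldImage, newImage):
        # eventName is INSERT / MODIFY / REMOVE; oldImage is the
        # before-modification image, newImage the after-modification image
        if eventName == "MODIFY" and oldImage.get("status") != newImage.get("status"):
            # example: only emit a document when the item's status changed
            return {"status": "success", "document": newImage}
        return {"status": "skip"}
```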
Configuration: The read connector configuration is simple - the read connector requires the streamArn, and the manifest defines 1.) the configuration for the start point (Earliest / Latest) and 2.) completion conditions such as StopWhenNoData or Continuous. Here are some example configs:
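The sketch below is illustrative - streamArn, Earliest / Latest, StopWhenNoData, and Continuous are the connector's configuration values from above, while the exact field names may differ:

```json
{
    "readConnector": {
        "connectorDestination": "DynamoDBStreams",
        "streamArn": "arn:aws:dynamodb:us-east-1:111122223333:table/articles/stream/2024-01-01T00:00:00.000",
        "readerStartPoint": "Earliest",
        "completionCondition": "StopWhenNoData"
    }
}
```

Swapping the start point to Latest and the completion condition to Continuous would tail the stream indefinitely instead of stopping when it is drained.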
Performance: We’ve seen quite nice throughput and latency numbers which are comparable to similar read connectors.
Conclusion
With the Scan and Streams read connectors, any data in a DynamoDB table can be processed by #LetsData using two different models - data pull via table scans and data push via event streams. Combined with our existing DynamoDB Write Connector and the solid performance numbers we are seeing, you can reliably use #LetsData and DynamoDB as central components in your data architecture.
Let us know, we’d love to work with you and onboard you to #LetsData!