Wednesday, June 7, 2023

Fast DynamoDB Pagination using Python

DynamoDB is AWS's fully managed NoSQL database, built to deliver consistent performance at scale and supporting both key-value and document data models. I won't delve into the basics because, if you have arrived here, I'm sure I don't need to explain them to you :). Anyway, if you need a quick introduction, please check out the following links:

Pagination Architecture is not a trivial affair

If you have ever implemented a pagination component, you already know that it is not easy, especially if you want it to be clean and performant.

Furthermore, DynamoDB adds its own set of challenges because of the way it works. A single Query call returns at most 1 MB of data, so a large result set is split into pages, and you need to check whether there are more results left to read after that first call. You will also likely need to return a fixed number of results per page, which adds a few interesting edge cases to the mix.

If you use Scan instead of Query, things get worse: Scan reads through the whole table, consuming the provisioned RCUs (read capacity units) very quickly. I produced a first quick version using Scan; it works, but it is not a good fit for pagination, especially when you have many records, and it is expensive too.
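To put a rough number on "expensive", here is a back-of-the-envelope sketch based on DynamoDB's read capacity unit: one RCU covers a strongly consistent read of up to 4 KB, an eventually consistent read costs half, and a request's total item size is rounded up to the next 4 KB (Scan counts every item it evaluates, Query the items it reads). The 1 GiB table and the 40 KB of user data below are made-up figures for illustration, and because the rounding happens per request, real totals can come out slightly higher.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one RCU covers a 4 KB strongly consistent read


def estimate_rcus(bytes_read: int, strongly_consistent: bool = False) -> float:
    """Rough RCUs for reading `bytes_read` bytes of items.

    The total size is rounded up to the next 4 KB; eventually consistent
    reads cost half as much as strongly consistent ones.
    """
    units = math.ceil(bytes_read / READ_UNIT_BYTES)
    return units if strongly_consistent else units / 2


# Hypothetical figures: scanning a 1 GiB table vs. querying one user's 40 KB
print(estimate_rcus(1024 ** 3))  # Scan: 131072.0
print(estimate_rcus(40 * 1024))  # Query: 5.0
```

The gap grows with the table: Scan pays for every item in it, while Query only pays for the partition you ask for.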

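Not all is gloom and doom, though. The Query response contains an element, LastEvaluatedKey, which holds the primary key of the last item the query processed. We can use this element to build a cursor that we pass back and forth, in the response and in the next request, to build our pagination component. When there are no more items to read, LastEvaluatedKey is absent from the response, which tells us we have reached the end of the result set. The reverse does not hold, though: DynamoDB may return a LastEvaluatedKey even when the next page turns out to be empty.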
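To make the LastEvaluatedKey/ExclusiveStartKey hand-off concrete, here is a minimal sketch of a generator that walks every page of one user's items. It assumes `table` is a boto3 DynamoDB `Table` resource (anything with a compatible `query(**kwargs)` method works) and that the partition key attribute is `id`, as in the snippets in this post; the function name `iter_user_items` is hypothetical.

```python
from typing import Any, Dict, Iterator


def iter_user_items(table: Any, user_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every item for `user_id`, following LastEvaluatedKey page by page.

    Assumes `table` is a boto3 Table resource and `id` is the partition key.
    """
    query_kwargs: Dict[str, Any] = {
        # "#pk" aliases the key name in case it clashes with a reserved word
        "KeyConditionExpression": "#pk = :uid",
        "ExpressionAttributeNames": {"#pk": "id"},
        "ExpressionAttributeValues": {":uid": user_id},
    }
    while True:
        response = table.query(**query_kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # no LastEvaluatedKey: this was the last page
            return
        # the next request starts right after the last key DynamoDB read
        query_kwargs["ExclusiveStartKey"] = last_key
```

This version reads everything, which suits exports or batch jobs; a client-facing page also needs a size limit and a cursor, which is what the rest of the post deals with.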

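LastEvaluatedKey is a map (a Python dict in boto3) containing the primary key attributes of the last item read. We shouldn't return it as is, because that would expose our data model to the world. A common approach is to serialize it and Base64-encode it into an opaque cursor string. Keep in mind that Base64 is an encoding, not encryption: it hides the structure from casual view, but anyone can decode it. You can use the Python module base64: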

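import base64
import json

# LastEvaluatedKey is a dict, so serialize it to JSON first
# (numeric key attributes arrive as Decimal and need a custom JSON encoder)
cursor_json = json.dumps(lastEvaluatedKey)
# URL-safe Base64, so the cursor can travel in a query string
base64_bytes = base64.urlsafe_b64encode(cursor_json.encode("utf-8"))
cursor = base64_bytes.decode("ascii")  # bytes -> str for the response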
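The later snippets call `encode_base64` and `decode_base64` without showing them. The following is one possible sketch of that pair, not the author's implementation: it serializes the key to JSON, tags boto3's `Decimal` numbers so they survive the round trip exactly, and uses URL-safe Base64. The `{"__decimal__": ...}` tagging scheme is an assumption made for this example.

```python
import base64
import json
from decimal import Decimal
from typing import Any, Dict, Optional


def _default(value: Any) -> Any:
    # boto3's resource API returns DynamoDB numbers as Decimal, which json
    # cannot serialize natively; tag them so they round-trip exactly
    if isinstance(value, Decimal):
        return {"__decimal__": str(value)}
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def _object_hook(obj: Dict[str, Any]) -> Any:
    # turn tagged numbers back into Decimal
    if set(obj) == {"__decimal__"}:
        return Decimal(obj["__decimal__"])
    return obj


def encode_base64(key: Optional[Dict[str, Any]]) -> Optional[str]:
    """Turn a LastEvaluatedKey dict into an opaque, URL-safe cursor string."""
    if not key:
        return None  # no key means there is no next page
    raw = json.dumps(key, default=_default, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_base64(cursor: Optional[str]) -> Optional[Dict[str, Any]]:
    """Turn a cursor string back into an ExclusiveStartKey dict."""
    if not cursor:
        return None  # first page: no cursor in the request
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw, object_hook=_object_hook)
```

For example, a hypothetical key such as `{"id": "user-123", "createdAt": Decimal("1686096000")}` encodes to a plain string and decodes back to an equal dict. Since anyone can decode the cursor, validate it server-side before passing it to DynamoDB.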

First, we retrieve the cursor from the request, if it exists, and decode it back into the LastEvaluatedKey returned with the previous page. We then pass it to the query as ExclusiveStartKey, so DynamoDB resumes right after the last item we returned; on the first page there is no cursor, so we query without it. In this example, I'm retrieving a user's data set using the user's id in the key condition.

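from boto3.dynamodb.conditions import Key

# get the cursor from the request (there is none on the first page)
exclusiveStartKey = decode_base64(cursor) if cursor else None

query_kwargs = {
    'KeyConditionExpression': Key('id').eq(userId),
    'Limit': limit,  # page size we want to return
}
if exclusiveStartKey is not None:
    # resume right after the last item of the previous page
    query_kwargs['ExclusiveStartKey'] = exclusiveStartKey

response = table.query(**query_kwargs)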


Next, we check whether there are remaining records (remember the 1 MB limit): we keep querying while LastEvaluatedKey is present in the response and we have not yet reached our imposed limit. Finally, we keep track of the most recent LastEvaluatedKey so that we can encode it and return it as the cursor in the response.

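items = response['Items']
lastEvaluatedKey = response.get('LastEvaluatedKey')

# keep querying while there is more data and we still need items
while lastEvaluatedKey is not None and len(items) < limit:
    response = table.query(
        KeyConditionExpression=Key('id').eq(userId),
        ExclusiveStartKey=lastEvaluatedKey,
        Limit=limit - len(items),  # ask only for what is still missing
    )
    items.extend(response['Items'])
    lastEvaluatedKey = response.get('LastEvaluatedKey')
# ...

# encode the last key read; no cursor means there are no more pages
cursor = encode_base64(lastEvaluatedKey) if lastEvaluatedKey is not None else None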
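The loop above can also be packaged as a reusable function. The sketch below uses a hypothetical name, `fetch_page`, and the same assumptions as before: a boto3 `Table` resource, partition key `id`, and no FilterExpression, because `Limit` caps the number of items DynamoDB evaluates, not the number it returns after filtering. Asking only for the items still missing means a page never overshoots the requested size, so the returned key is always a valid starting point for the next page.

```python
from typing import Any, Dict, List, Optional, Tuple


def fetch_page(
    table: Any,
    user_id: str,
    limit: int,
    start_key: Optional[Dict[str, Any]] = None,
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Return up to `limit` items for `user_id` plus the key to resume from.

    Assumes a boto3 Table resource and a query without FilterExpression,
    so Limit (items evaluated) equals the number of items returned.
    """
    items: List[Dict[str, Any]] = []
    last_key = start_key
    while len(items) < limit:
        query_kwargs: Dict[str, Any] = {
            "KeyConditionExpression": "#pk = :uid",
            "ExpressionAttributeNames": {"#pk": "id"},
            "ExpressionAttributeValues": {":uid": user_id},
            # request only the missing items, so the page never overshoots
            "Limit": limit - len(items),
        }
        if last_key is not None:
            query_kwargs["ExclusiveStartKey"] = last_key
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break  # reached the end of the user's items
    return items, last_key
```

The second value is what you would pass to `encode_base64` to build the response cursor; when it is `None`, the client has reached the last page.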

I hope this helps you build your pagination component 🙂

Adolfo Estevez
https://mnube.org
Cloud & Digital Evangelist | AWS x 13 Certified | GCP x 6 | Serverless | Machine Learning | Analytics
