Wednesday, September 22, 2021

Fast DynamoDB Pagination using Python

DynamoDB is AWS's managed NoSQL database, designed for fast, consistent performance at scale, and it supports both key-value and document data models. I’m not going to delve into the basics because, if you have arrived here, you probably know them already :). If you need a quick introduction anyway, the official AWS DynamoDB documentation is a good place to start.

Pagination Architecture is not a trivial affair

If you have ever implemented a pagination component, you already know that it is not easy to do, especially in a clean and performant way.

Furthermore, DynamoDB adds its own set of challenges because of the way it works. When you execute a Query, the result set is split into pages of up to 1 MB of data each, so after the first call you need to find out whether there are remaining results to fetch. You will also likely want to return a fixed number of results per page, which adds a few nice edge cases to the mix.

If you use Scan instead of Query, things get worse: it reads the whole table, exhausting the provisioned RCUs (read capacity units) very quickly. I produced a first quick version using Scan; it works, but it is neither optimal nor cheap for pagination, especially when you have a huge number of records.

Not all is doom and gloom, though. The Query response contains an element, LastEvaluatedKey, that points to the last processed record. We can use this element to build a cursor that we pass back and forth, in the response and in the request, to build our pagination component. When there are no records left, this element is absent from the response, which means we have reached the end of the result set.

LastEvaluatedKey is a Map type object that contains the primary key (PK) of the table. We shouldn’t pass it as is, since we would be exposing our data model to the world. A common and better way to do this is to pass it Base64-encoded. You can use the Python module base64:

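import base64

# Assumes cursor is the LastEvaluatedKey already serialized as a string
cursor_ascii = cursor.encode("ascii")
base64_bytes = base64.b64encode(cursor_ascii)
# Convert the bytes into a string (or whatever we need) for the response
cursor_token = base64_bytes.decode("ascii")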
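
The snippets in this post call two helpers, encode_base64 and decode_base64, without showing them. The following is only a minimal sketch of what they could look like, not necessarily the original implementation. It assumes the key attributes are JSON-serializable (plain strings, for instance); numeric keys returned by boto3 as Decimal would need extra handling.

```python
import base64
import json


def encode_base64(last_evaluated_key):
    """Turn a LastEvaluatedKey dict into an opaque, URL-safe cursor string."""
    if last_evaluated_key is None:
        return None
    # Serialize the key map to JSON, then Base64-encode the bytes
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_base64(cursor):
    """Turn a cursor string back into the dict expected by ExclusiveStartKey."""
    if not cursor:
        # No cursor in the request means we are on the first page
        return None
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw.decode("utf-8"))
```

The URL-safe alphabet is used here so the cursor can travel in a query string without further escaping. Keep in mind that Base64 only hides the key from casual inspection; it is an encoding, not encryption.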

The first thing we have to do is to retrieve the cursor from the request, if it exists, and decode it. The decoded cursor is the LastEvaluatedKey from the previous page, so we assign it to the ExclusiveStartKey field of the first query. In this example, I’m retrieving a user’s data using the id as the key condition.

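from boto3.dynamodb.conditions import Key

# Get the cursor from the request (there is none on the first page)
exclusiveStartKey = decode_base64(cursor) if cursor else None

query_args = {'KeyConditionExpression': Key('id').eq(userId)}
if exclusiveStartKey is not None:
    # Resume right after the last record of the previous page
    query_args['ExclusiveStartKey'] = exclusiveStartKey

response = table.query(**query_args)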


Now we check whether there are remaining records (remember the 1 MB limit): we keep querying while LastEvaluatedKey is present in the result object and we have not reached our imposed limit. Finally, we have to keep track of the last LastEvaluatedKey so we can encode it and pass it back in the response.

lastEvaluatedKey = response.get('LastEvaluatedKey')
while lastEvaluatedKey is not None:
    response = table.query(
        KeyConditionExpression=Key('id').eq(userId),
        ExclusiveStartKey=lastEvaluatedKey
    )
    # Absent (None) once the last page has been read
    lastEvaluatedKey = response.get('LastEvaluatedKey')
    ............

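# Encode the last key as the cursor for the response (None when no pages remain)
cursor = encode_base64(lastEvaluatedKey) if lastEvaluatedKey else None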
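
To tie the steps together, here is a hedged sketch of a helper that returns a fixed number of results per page. The names fetch_page and FakeTable are hypothetical: FakeTable is an in-memory stand-in so the example runs without AWS, and it returns at most two records per call to mimic the 1 MB cut-off. With a real boto3 table you would pass Key('id').eq(userId) as key_condition. Limit is a real Query parameter that caps the number of items evaluated per call.

```python
def fetch_page(table, key_condition, page_size, start_key=None):
    """Collect up to page_size items, following LastEvaluatedKey across calls.

    Returns (items, last_key); last_key is None when no pages remain.
    """
    items = []
    while True:
        query_args = {
            'KeyConditionExpression': key_condition,
            # Only ask for what is still missing, so the returned key
            # always matches the last item we hand back
            'Limit': page_size - len(items),
        }
        if start_key is not None:
            query_args['ExclusiveStartKey'] = start_key
        response = table.query(**query_args)
        items.extend(response.get('Items', []))
        start_key = response.get('LastEvaluatedKey')
        if start_key is None or len(items) >= page_size:
            return items, start_key


class FakeTable:
    """In-memory stand-in for a boto3 Table (demonstration only)."""

    def __init__(self, items, max_per_call=2):
        self.items = items                # each item doubles as its own key
        self.max_per_call = max_per_call  # mimics the 1 MB page cut-off

    def query(self, KeyConditionExpression=None, Limit=None,
              ExclusiveStartKey=None):
        start = 0
        if ExclusiveStartKey is not None:
            start = self.items.index(ExclusiveStartKey) + 1
        count = self.max_per_call if Limit is None else min(Limit, self.max_per_call)
        chunk = self.items[start:start + count]
        response = {'Items': chunk}
        if start + count < len(self.items):
            response['LastEvaluatedKey'] = chunk[-1]
        return response


table = FakeTable([{'id': 'u1', 'n': n} for n in range(5)])
first_page, key = fetch_page(table, None, page_size=3)
second_page, end_key = fetch_page(table, None, page_size=3, start_key=key)
```

Passing Limit avoids truncating a response on the client side, which would leave the cursor pointing past records the caller never received. Note that the real DynamoDB can return a LastEvaluatedKey even when a page ends exactly on the last record, so a client may occasionally receive one final empty page.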

I hope this helps to build your own pagination component 🙂

Adolfo Estevez
https://mnube.org
Cloud & Digital Evangelist | AWS x 13 Certified | GCP x 6 | Serverless | Machine Learning | Analytics
