Friday, December 8, 2023

Summarization & Analysis of Tweets using AWS & OpenAI

I’m sharing with you one of the architectures, production-ready, that I’ve been working on lately. I was keen to learn how the GPT algorithms would fare summarizing Tweets, not just individually – for instance, with two or three words – but with groups of Tweets that share a tag and common themes. Also, to analyze and generate a response for those Tweets or summaries.

Let’s see an example.

We want to generate a summary for architects of the latest Tweets from the AWS Architecture account:

“By migrating to the cloud, we could develop our capabilities faster & focus our efforts on building solutions for our customers.” Jacek Presz, Director of IT Management Bureau, Bank Pocztowy SA Here’s the unique #AWSConfig solution 👉 #CloudCompliance”

“Building highly available, resilient apps is a *MUST* these days. Hear from Joseph Wagner, Jr., Domain Architect & Cloud Specialist about how using the #AWS #WellArchitected Tool helped Vanguard improve resilience of their apps 💪 #ResilienceOnAWS

“Aurora leads the way in autonomous trucking with the benefits of self-driving safely 🚘 Discover how #AWS contributes to this success story. It’s our latest look at the drivers of transformation in this season of #AllThingsAutomotive Watch how @IndyAChallenge is paving the way for the next generation of autonomous race cars with the help of #AWS & open innovation. ✨ Watch now: #AllThingsAutomotive”

We may get something like this:

“Here’s the summary of the tweets related to AWS Architecture:

  • Jacek Presz, Director of IT Management Bureau at Bank Pocztowy SA, highlights the benefits of migrating to the cloud for faster development and customer-focused solutions. They used the unique AWS Config solution for cloud compliance.
  • Joseph Wagner, Jr., Domain Architect & Cloud Specialist at Vanguard, shares how the AWS Well-Architected Tool helped improve the resilience of their highly available and resilient apps.
  • AWS contributes to the success story of Aurora, a leader in autonomous trucking with self-driving safety, showcasing the drivers of transformation in this season of AllThingsAutomotive.
  • Indy Autonomous Challenge uses AWS and open innovation to pave the way for the next generation of autonomous race cars in AllThingsAutomotive.”

Now we generate a response for the summary as a Tweet:

“Exciting to see how #AWS architecture is driving cutting-edge technologies like autonomous trucking and race cars while also improving app resilience and cloud compliance for businesses. It’s all about leveraging the latest tools like AWS Config and Well-Architected Framework!” #CloudInnovation #AllThingsAutomotive #ResilienceOnAWS


The architecture I’m sharing with you comprises well-known AWS components and patterns. For instance, the ingestion pipeline of the architecture could be done in different ways; in this case, I’m using the well know pattern Kinesis Data Streams –> Kinesis Data Analytics –> Kinesis Data Firehose to ingest and transform the Tweets in almost real-time. In the final version, I’ve used another different pipeline, but this is fine and better known.

Let’s review the architecture flow and the components quickly, and in the next instalments, I will go deeper if needed. Ideally, it would be best if you tried to implement this by yourself and learn from the experience. For instance, the first thing you could do is extract an MVP from the architecture and start building on top of it. Be advised this architecture could be $$$, so proceed with caution.

1- We first need to connect to Twitter to collect raw tweets; for that, we can use a Python script that connects to the service to retrieve the data and ingests it into Kinesis. This part is pretty straightforward; you will need a Twitter developer account and credentials to access it. Deploying the script as a Docker container in ECS Fargate is a very convenient way to do it and is a serverless service with all its benefits. The credentials are stored in the AWS Secrets Manager Service.

2 – As explained previously, for ingesting and transforming the data in almost real-time, we use the pattern Kinesis Data Streams -> Kinesis Data Analytics -> Kinesis Data Firehose -> AWS Open Search. This is a very convenient way to do it, and Data Analytics can help transform the raw Tweets, using SQL, to the format you need to create the Open Search index.

3 – Finally, to obtain the summaries, we have exposed a REST API with API Gateway, Cognito and Lambda. Again, you can implement this in many ways, but this is serverless and easy to implement. The Lambda function is the core of the API, querying the OpenSearch, fetching and grouping the tweets we want to summarise, and finally invoking the OPENAI API; for this, you need credentials that you can obtain on their website.

Adolfo Estevez
A Estevez
Cloud & Digital Evangelist | AWS x 13 Certified | GCP x 6 | Serverless | Machine Learning | Analytics |

Related Articles

Leave a Reply

Latest Articles