I recertified for the AWS Data Analytics certification in April 2023. It was my second time, if you count the now-retired AWS Big Data certification. Afterwards I got an email from AWS asking for feedback, plus a survey about the competencies for the AWS Data Engineer role. Fair warning: a new version of the exam may be coming. No big secret here; they review the exams to keep them current and challenging.
You can read a bit about my earlier experience with this certification in the post above. I got it days after it was released, and it was a challenging and fun experience. This time the exam pattern was very similar, but with more questions about services such as OpenSearch, Lake Formation and MSK, to name a few. Remember that this is only a rough guide based on my own question set, so your experience could differ. In any case, base your preparation on your personal experience and needs; as I always say, take the opportunity to create your own course and journey.
About the value of the certification, I can say that it works for me, helping me to keep up to date and proving my skills in the field up to a point. The certification is not a substitute for experience, so remember that you need to have some to back it up, plus experience in other complementary domains.
I will focus this post on the new features or services with more presence in the test; as for the rest, you can find more information in the older post or around the net.
Let’s have a look at a question from the sample set:
This is a representative question that includes different services and is relatively verbose. It references many essential services, including Kafka.
In many questions, you must choose the best option according to some functional requirement, in this case the overall latency. The difference between two answers can be one word, literally one service. That can make for a difficult choice, so read the context given in the question carefully; it is what helps you decide.
So, let’s review some new features that I find interesting and can be relevant for the test too:
Lake Formation & Glue
Lake Formation and Glue are two pivotal services you need to know well, especially Glue. They can now work together: a Glue crawler can use the AWS Lake Formation permissions defined on the data lake to crawl the data, instead of needing separate S3 permissions set up for the crawler. Additionally, this feature allows cross-account access and can be used with Athena, creating a model for centralized permission governance.
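As a rough illustration of the crawler side of this, here is a minimal sketch that builds the request for the Glue `CreateCrawler` API with the `LakeFormationConfiguration` block enabled. The helper `crawler_request`, the crawler name, database, role ARN, account IDs and bucket path are all hypothetical placeholders. The crawler still takes a role; the idea is that data access is governed by Lake Formation grants rather than by S3 policies on that role.

```python
def crawler_request(name, role_arn, database, s3_path, catalog_account_id=None):
    """Build keyword arguments for the Glue CreateCrawler API (hypothetical helper)."""
    lf_config = {"UseLakeFormationCredentials": True}
    if catalog_account_id is not None:
        # Only needed when the data lake lives in another account.
        lf_config["AccountId"] = catalog_account_id
    return {
        "Name": name,
        "Role": role_arn,  # the crawler still runs under a role
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "LakeFormationConfiguration": lf_config,
    }


# All values below are placeholders, not real resources.
request = crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleCrawlerRole",
    database="sales_db",
    s3_path="s3://example-data-lake/sales/",
)
print(request["LakeFormationConfiguration"])

# With boto3 installed and credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("glue").create_crawler(**request)
```

For a cross-account crawl, passing `catalog_account_id` adds the `AccountId` field to the same block.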
More on the subject, this time with Redshift:
AWS MSK & Kafka
This is one service you need to prepare for if you don’t have previous experience with it. The growing number of questions surprised me in a way, but it makes sense. You will find many projects using Kafka, and MSK is an excellent way to replace a self-managed cluster with a managed solution. The questions weren’t that easy, so prepare well. Also, I’d recommend that you get a roundup of Kafka:
The book’s second edition is relatively recent, and it’s an excellent introduction. Topics and partitions are well explained 🙂
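If topics and partitions are new to you, the core idea fits in a few lines: records with the same key land in the same partition, and ordering is only guaranteed within a partition. The sketch below is a toy model, not Kafka client code. It uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2 for keyed records), and `partition_for` and `route` are names made up for this example.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key) % num_partitions


def route(records, num_partitions):
    """Append each (key, value) record to the partition chosen by its key."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        p = partition_for(key.encode("utf-8"), num_partitions)
        partitions[p].append((key, value))
    return partitions


records = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-2", "logout"),
    ("user-1", "logout"),
]

partitions = route(records, num_partitions=3)
for p, recs in partitions.items():
    print(p, recs)
```

Whatever partition a key hashes to, all of its events stay together and in order, which is why the choice of key matters so much for throughput and ordering questions.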
A classic: which is the better choice for your workload, provisioned or serverless?
MSK interacts very well with other AWS services:
You can ingest hundreds of megabytes of data per second into Amazon Redshift materialized views and query it within seconds, without provisioning additional infrastructure such as Firehose. That makes use cases like live leaderboards, clickstream analysis, and application monitoring much easier to implement.
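To make this more concrete, here is a small sketch that builds the two SQL statements typically involved in Redshift streaming ingestion from MSK: an external schema that points at the cluster, and an auto-refreshing materialized view over a topic. The helper `streaming_ingestion_sql`, the schema, view and topic names and both ARNs are hypothetical placeholders, and the sketch assumes IAM authentication and JSON messages; check the statements against the Redshift documentation before using them.

```python
def streaming_ingestion_sql(schema, view, topic, iam_role_arn, cluster_arn):
    """Return the two statements as strings (hypothetical helper, nothing is executed)."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM MSK\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"AUTHENTICATION iam\n"
        f"CLUSTER_ARN '{cluster_arn}';"
    )
    create_view = (
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS\n"
        f"SELECT kafka_partition, kafka_offset, kafka_timestamp,\n"
        f"       JSON_PARSE(kafka_value) AS payload\n"  # assumes JSON messages
        f"FROM {schema}.\"{topic}\";"
    )
    return [create_schema, create_view]


# Placeholder names and ARNs, not real resources.
statements = streaming_ingestion_sql(
    schema="msk_schema",
    view="clickstream_mv",
    topic="clickstream",
    iam_role_arn="arn:aws:iam::111122223333:role/ExampleRedshiftRole",
    cluster_arn="arn:aws:kafka:eu-west-1:111122223333:cluster/example-cluster/EXAMPLE-UUID",
)
for statement in statements:
    print(statement, end="\n\n")
```

The point to remember for the test is the shape of the solution: the topic is read directly through the materialized view, with no delivery stream in between.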
AppFlow has new integrations, such as marketing connectors (Facebook Ads…), customer service (MailChimp, Sendgrid…), and business operations (Stripe…).
But primarily, I’d like to highlight the integration with the AWS Glue Data Catalog, which allows the SaaS data to be registered in it. With this integration, there is no need to create crawlers to populate the catalog; it can be done with a few clicks.
Ah, AWS OpenSearch, an old friend of mine :) I have provisioned a few so far, and I’m happy to see that it appears more often in the test.
This new serverless flavour seems a good step forward for the service, following in the footsteps of Redshift, for instance.
Apart from the usual suspects (documentation, blog, FAQ, practice test, and readiness course), I can recommend the official certification guide, which is very relevant and packed with additional resources to prepare: