On April 13, the new AWS Data Analytics Specialty certification officially launched, following the beta phase in December 2019 / January 2020. The beta coincided with the AWS Database Specialty beta, which forced me to choose between them. In the end, I decided to take the Database Specialty, as I had recently taken the AWS Big Data exam.
The “Beta exam” experience is very different from the “standard” one: 85 questions and 4 hours long, that is, 20 more questions and one more hour. It is an intense experience. I recommend taking a 5-minute break (the test centres allow them), since after the third hour it is challenging to stay focused.
The certification is the new version of AWS Big Data Specialty, an exam that will be withdrawn in June 2020. I will not go into much depth on the differences; suffice it to say that the Machine Learning domain has been removed, and the remaining domains have been expanded and updated in depth. But beware: Machine Learning and IoT still appear, integrated into the other domains, so it is necessary to know them at least at an architectural level.
Prerequisites and recommendations
I will not repeat the information already available on the AWS website; instead, I will give my recommendations and observations, as I consider the Learning Path that AWS suggests to be somewhat light for the current exam level.
- AWS experience at the architectural level. The exam focuses primarily on advanced architecture solutions (the 5 pillars of the Well-Architected Framework) and, to a lesser extent, on development, mainly in services such as Kinesis and Glue. I recommend holding the AWS Solutions Architect Professional certification, or the AWS Solutions Architect Associate plus the AWS Security Specialty.
- Advanced AWS security experience. Security is a full domain of the exam, but it also shows up across the other domains in many questions. If you hold the AWS Solutions Architect Professional, your general security knowledge may be sufficient, though not the security specifics of each service. Otherwise, the AWS Security Specialty is a good option, or equivalent knowledge of certain services, which I will indicate later.
- Analytics knowledge. If you lack it, I’d recommend studying books such as “Data Analytics with Hadoop” (O’Reilly, 2016) or taking the courses indicated in the AWS Learning Path. Likewise, work through labs or pet projects to gain some practical experience.
- Knowledge of the Hadoop ecosystem. This ties in with the previous point. High-level, architectural knowledge of the ecosystem is a must: Hive, Presto, Pig, …
- Knowledge of Machine Learning and IoT in the AWS ecosystem. SageMaker and the core IoT services, at the architectural level.
The questions follow the style of other certifications such as AWS Solutions Architect Professional, Security Specialty or Database Specialty. Most of them are “scenario-based”, long and complex. You are not going to find many simple questions: only about 5% to 10% of the questions were “easy”, and even those appeared in “scenario” format.
Let’s have a look at an example taken from the AWS sample questions:
I’d classify this question as “intermediate” difficulty. If you have taken the Solutions Architect Professional or a speciality such as Security or Big Data, you will know what I am talking about. The questions are considerably harder and go deeper than in the previous version of the exam.
I’d recommend going straight for the new speciality, as the old one contains questions about deprecated services and outdated information.
Services to know in-depth
AWS Kinesis – in its three modalities: Data Streams, Firehose and Analytics. Architecture, sizing, configuration, integration with other services, security, troubleshooting, metrics, optimization and development. Questions of various levels, some very complex and of great depth.
AWS Glue – in depth, for both ETL and data discovery. It is an integral part of the exam. Questions of different levels; I did not find them to be the most difficult.
AWS Redshift – architecture, design, sizing, integration, security, ETL, backups … a large number of questions, some of them very complex.
AWS EMR / Spark – architecture, sizing, configuration, performance, integration with other services, security, integration with the Hadoop ecosystem. Very important, but not as essential as the previous three services. These are complex questions that require advanced, cross-cutting knowledge of all domains and of the Hadoop ecosystem: Hive, HBase, Presto, Sqoop, Pig …
Security – KMS encryption, AWS CloudHSM, Federation, Active Directory, IAM, Policies, Roles … both in general and for each service. These questions cut across the other domains and are very difficult.
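To give an idea of the sizing reasoning that the Kinesis questions call for, here is a minimal sketch that estimates the number of shards for a Kinesis Data Streams stream in provisioned mode. It assumes the per-shard limits AWS documents for standard consumers (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); the function name `estimate_shards` and the example workload are hypothetical, and the limits should be checked against the current service quotas.

```python
import math

# Per-shard limits for Kinesis Data Streams (provisioned mode).
# Assumptions taken from the AWS documentation; verify against current quotas.
WRITE_MB_PER_SHARD = 1.0        # MB/s ingested per shard
WRITE_RECORDS_PER_SHARD = 1000  # records/s ingested per shard
READ_MB_PER_SHARD = 2.0         # MB/s read per shard, shared by standard consumers


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Hypothetical helper: smallest shard count that satisfies all three limits."""
    # Assumes 1 MB = 1024 KB for the conversion.
    ingest_mb_per_sec = records_per_sec * avg_record_kb / 1024
    by_write_bytes = ingest_mb_per_sec / WRITE_MB_PER_SHARD
    by_write_records = records_per_sec / WRITE_RECORDS_PER_SHARD
    # Standard consumers share the read throughput of each shard.
    by_read_bytes = ingest_mb_per_sec * consumers / READ_MB_PER_SHARD
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read_bytes)))


# Example workload: 5,000 records/s of 2 KB each, read by 3 standard consumers.
print(estimate_shards(5000, 2, consumers=3))
```

In this example the read side is the bottleneck, not the write side. With enhanced fan-out each consumer gets its own read throughput per shard, so the read term would no longer scale with the number of consumers; that trade-off is typical of what the scenario questions probe.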
Essential services to consider
- AWS S3 – core service base (storage, security, rules) and new features like AWS S3 Select. It appears consistently across all certifications, so I’d assume it’s known in-depth, except for the new features.
- AWS Athena – architecture, configuration, integration, performance, use cases. It appears consistently and as an alternative to other services.
- AWS Managed Kafka (Amazon MSK) – alternative to Kinesis, architecture, configuration, sizing, performance, integration, and use cases.
- AWS QuickSight – subscription types, service features, different visualization options, and use cases. Alternative to other services.
- AWS Elasticsearch and Kibana (ELK) – architecture, configuration, sizing, performance, integration, use cases. Alternative to other services.
- AWS Lambda – architecture, integration, use cases.
- AWS Step Functions – architecture, integration, use cases.
- AWS DMS – architecture, integration, use cases.
- AWS Data Pipeline – architecture, integration, use cases.
- AWS Networking – basic network architectures and knowledge: VPC, security groups, Direct Connect, VPN, Regions, Zones … network configuration of each service.
- AWS DynamoDB, ElastiCache – architecture, integration, use case knowledge. These services, which appeared very prominently in the previous version of the exam, carry much less weight in the current one.
- AWS CloudWatch, Events, Logs – architecture, configuration, integration, use case knowledge.
- AWS RDS and Aurora – architecture, configuration, integration, use case knowledge.
- EC2, Autoscaling – knowledge of architecture, integration, and use cases.
- SQS, SNS – knowledge of architecture, integration, and use cases.
- AWS CloudFormation – knowledge of architecture, use cases, and DevOps.
- SageMaker and AWS IoT Core – knowledge of architecture, integration, and use cases.
Resources
- AWS Certification Website.
- Example questions.
- Readiness Course – a must, packed with information and resources – including a 20-question test.
- AWS Whitepapers – Big Data Analytics Options on AWS.
- AWS FAQs for every service – especially for Kinesis, Glue, Redshift, and EMR.
- AWS Big Data Blog
- Practice Exam – a must, quite challenging and representative of the actual exam.
Is it worth it?
Let’s see 🙂
AWS Data Analytics Specialty is a complex, challenging and expensive (300 euros) certification that requires a significant investment of time, even if you already have experience in analytics and AWS. It is therefore not a decision to be taken lightly.
In my case, it was well worth doing: I have recently been working on several projects of that kind (fast data, IoT) on AWS, and it was the only certification I still needed to complete the full set of thirteen, if Big Data is included.
Certifications are an excellent way to validate knowledge externally, collect updated information, validate good practices, and consolidate knowledge with real (or almost) practical cases.
For those who are interested in the analytics field, or who have professional experience in it, and want to make the leap to the cloud, my recommendation is first to obtain an AWS Architect-type certification (preferably the Professional) and optionally the Security speciality or equivalent knowledge, at least in the services I have mentioned above.
For those who already have AWS certifications but no professional experience in this specific field, it may be an excellent way to start, but it will not be an easy or short path. I recommend doing labs or pet projects to get the experience needed to pass the exam.
So is it worth it? Absolutely, but not as a first certification. It is mainly aimed at people with advanced knowledge of AWS architecture who want to delve deeper into the analytics-cloud field.
Good luck to you all!