GCP Professional Data Engineer Guide – September 2020


I recently recalled my first experience with GCP. It was in London, shortly before the 2012 Olympics, on an online gaming project originally designed for AWS that was migrated to App Engine, the PaaS platform that would later evolve into today's GCP.

My initial impression was good, although the platform imposed several development limitations, which were later eased with the release of App Engine Flexible.

Coinciding with TensorFlow's launch as an open-source framework in 2015, I was lucky enough to attend a workshop on neural networks, given by one of the AI scientists from Google Seattle, where I had my second experience with the platform. I was struck by the simplicity of configuration and deployment, the NoOps concept, and a Machine Learning / AI offering that had no competition at the time.

Do Androids Dream of Electric Sheep? Philip K. Dick would have “hallucinated” at the electric dreams of neural networks powered by TensorFlow.


The exam follows the usual GCP structure: 50 questions in 2 hours, mostly scenario-based, mixing very difficult questions with simpler ones of medium-low difficulty.

In general, to choose the correct answer you have to apply both technical and business criteria. This requires deep knowledge of the services from a technical point of view, plus the skill and experience to apply business criteria in context, depending on the question, type of environment, sector, application, etc.

Image #1, Data Lake, the ubiquitous architecture – Image owned by GCP

Prerequisites and recommendations

At this level of certification, questions generally do not cover a single topic. That is, a question from the Analytics domain may require more or less advanced knowledge of Computing, Security, Networking or DevOps to answer it correctly. I'd recommend holding the GCP Associate Cloud Engineer certification or having equivalent knowledge.

  • GCP experience at the architectural level – The exam focuses partly on solution architecture: design and deployment of data pipelines, selection of technologies to solve business problems and, to a lesser extent, development. I'd recommend studying as many reference architectures as possible, such as those I show in this guide.
  • GCP experience at the development level – Although no explicit programming questions appeared in my question set or in the mock test, the exam requires technical knowledge of services and APIs: SQL, Python, REST, algorithms, MapReduce, Spark, Apache Beam (Dataflow)…
  • GCP experience at the Security level – A domain that appears across all certifications. I'd recommend knowledge at the Associate Engineer level.
  • GCP experience at the Networking level – Another domain that appears across all certifications. I'd recommend knowledge at the Associate Engineer level.
  • Knowledge of Data Analytics – It goes without saying, but solid domain knowledge is essential. If you lack it, I'd recommend studying books like “Data Analytics with Hadoop” or taking courses like the Specialized Program: Data Engineering, Big Data and ML on Google Cloud on Coursera. Likewise, practising with labs or pet projects is essential to gain hands-on experience.
  • Knowledge of the Hadoop/Spark ecosystem – Connected with the previous point. High-level knowledge of the ecosystem is necessary: MapReduce, Spark, Hive, HDFS, Pig…
  • Knowledge of Machine Learning and IoT – Advanced knowledge of Data Science and Machine Learning is essential, in addition to specific knowledge of GCP products. Some questions deal exclusively with this domain, at the level of certifications like AWS Machine Learning or higher. IoT appears on the exam less prominently, but you need to know its reference architecture and services.
  • DevOps experience – Concepts such as CI/CD and infrastructure or configuration as code are very important today, and this is reflected in the exam, although they carry little specific weight.
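To make the MapReduce model listed above concrete, here is a minimal in-memory sketch in plain Python, assuming a classic word-count job. It only illustrates the three phases (map, shuffle, reduce); the function names are hypothetical and this is not the API of Hadoop, Spark or Apache Beam.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map (hypothetical helper): emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done across workers in a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in a single process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In a Dataflow pipeline the same idea would roughly map to a `beam.FlatMap` step emitting pairs followed by `beam.CombinePerKey(sum)` in the Beam Python SDK; here the in-memory dictionary simply stands in for the distributed shuffle.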
