
Verizon’s Media Data Warehouse migration to GCP


You probably remember what a big player Yahoo was in the big tech scene from the late ’90s through the 2000s, and I surely still remember how their CEO rejected Microsoft’s $44.6 billion buyout offer in 2008 – ouch!

Now Yahoo is part of Verizon Media, which has just finished a massive migration of Hadoop and Enterprise Data Warehouse (EDW) workloads to Google Cloud’s BigQuery and Looker, a big part of their MAW – Media Analytics Warehouse.

Looker – image from google.cloud.com

I don’t need to vouch for the power and flexibility of BigQuery; it’s well known: real-time or batch analytics, warehousing, or even AI workloads, without having to move the data out for processing and using just SQL.

I’ve been using it lately in that capacity – BigQuery ML – and it’s really easy, even from Jupyter Notebooks:

%load_ext google.cloud.bigquery

%%bigquery
SELECT
  source_year AS year,
  COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
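The same query can also be issued outside a notebook with the BigQuery Python client. A minimal sketch – `build_natality_query` and `run_query` are illustrative names of mine, and actually running the query assumes the google-cloud-bigquery package and credentials are configured:

```python
# Sketch: the births-per-year query from the notebook, callable from plain Python.
# Only the query builder runs without cloud access; run_query() would need
# google-cloud-bigquery installed and valid credentials.

def build_natality_query(limit=15):
    """Build the births-per-year query, parameterized by LIMIT."""
    return (
        "SELECT source_year AS year, COUNT(is_male) AS birth_count "
        "FROM `bigquery-public-data.samples.natality` "
        "GROUP BY year ORDER BY year DESC "
        f"LIMIT {int(limit)}"
    )

def run_query(limit=15):
    # Not executed here: requires credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS).
    from google.cloud import bigquery
    client = bigquery.Client()
    return list(client.query(build_natality_query(limit)).result())
```

The int() cast on the limit is a small guard against injecting arbitrary SQL through the parameter.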

Read more about Verizon’s migration in the following article:

https://cloud.google.com/blog/products/data-analytics/benchmarking-cloud-data-warehouse-bigquery-to-scale-fast

re:Invent 2020 playlist released on YouTube


A playlist containing 35 videos from the last re:Invent has been released by AWS on their YouTube channel.

Some favorites – just because 🙂

Machine Learning Keynote

Strong consistency for Amazon S3

NFL on using AWS to transform player safety

GCP Professional Cloud Architect BETA


The Professional Cloud Architect beta certification is now open for registration until some point in March 2021. The format of the exam is the same, but two new case studies have been added, two have been updated, and one has been removed.

Good luck!

Google Certified Associate Cloud Engineer All In One Guide Review


A new book for preparing for Google’s Associate Cloud Engineer certification exam has been released. It belongs to McGraw Hill’s All-In-One series, which is one of my favourites; the books are usually excellent.

This one is no exception; I think it covers all the topics in the official exam guide. There is even a section that maps the objectives in the official guide to the chapters in the book, which is really useful. At the time of writing this post – December 2020 – both are almost identical:

  • Managing users in Cloud Identity (manually and automated) has replaced Linking users to G Suite Identities.
  • Deploying an application that receives Google Cloud events (e.g., Cloud Pub/Sub events, Cloud Storage object change notification events) has replaced Deploying a Cloud Function.

Ensure you check out the official guide for changes because Google updates it occasionally.

Regarding the changes, Cloud Identity is a big topic in the Security certification – and an exciting one, I have to say – so don’t miss it.

Objectives

The book covers the complete official guide in eleven chapters. All the topics are up-to-date, including Anthos and Cloud Run. But remember, this is just a guide to prepare for the exam. That means that, to be successful, you have to expand each section depending on your knowledge and experience of the subject.

For instance, the chapter on Kubernetes Engine is quite good and covers many topics. But you are not going to learn Kubernetes just by reading the chapter, so I’d recommend reading the docs and books, getting real experience, or taking a course; if you are a beginner, you will otherwise be confused by the subject. And chances are you will find advanced questions on it – the exam is heavy on Kubernetes, as a matter of fact.

Or have a look at the chapter on App Engine. It’s a well-covered topic, and probably enough to answer many of the questions you may find on the exam. But you need to go deeper, create apps, and get real experience if you don’t have it.

Review and practice questions

Every chapter has a handful of review questions – from ten to fifteen – in a test format similar to the ones you would find in the exam, but on the easier side.

Don’t forget to check the sample questions from Google. Again, most are a bit easier than the ones you may find on the exam, but they are an excellent guide to check your knowledge and gauge your readiness to take it.

Representative sample question, property of cloud.google.com

The book also provides, for free, online content comprising one hundred practice questions.

Image was taken by the author of the post, property of Total Seminars Training Hub

Once you have registered on the Total Seminars Training Hub site, you can access the Custom Test screen, where you can customize the testing experience: duration, number of questions, exam objectives and assistance.

It works well enough: simple but functional. The assistance and the explanations given for the topics are short but sufficient.

The questions are similar in topics and format to those you’d encounter on the exam, so they’re good practice. I’d say the actual questions are lengthier and a bit more complicated.

Conclusion

This is a good guide that can help you with the preparation for the exam. But you need to expand every topic, depending on your experience and knowledge.

My advice is to use the guide as a starting point – there is another guide from Google, but it’s a bit outdated – and use the documentation, labs, videos and real-life experience not only to pass the exam but to round out and update your knowledge, so that you can validate your professional experience.

After all, this is not a college exam but a professional one!

The Learning Journey, Part II: The Dopamine Effect


Video killed the radio star – The Buggles, 1980.

Do you remember books? Yeah, those objects you used to carry in your bag and that have been pushed aside by the video course frenzy – and the Internet. And I get it; video courses can be a fast and cheap way to gather information, and some of them are really good.

It seems that video killed the book too.

But there is more than meets the eye, so I can’t stress enough the value of books as a source of learning; in fact, I have been sharing online many of the books I use daily.

Let’s go deeper and find out what’s going on 🙂

The Book Way

Learning through books takes a much bigger effort than watching a video, just as reading a book takes more effort than watching its movie adaptation. But in exchange, you get a richer, non-linear, interactive experience, powered by your mind in a big way, that keeps you much more focused on one task. The complex ideas you can handle while reading a book are, in many ways, astonishing.

When you are watching a video – a very passive activity – the chances of being distracted and multitasking grow exponentially: opening other tabs, reading emails, watching other videos, the notifications … the list has no end. After all, we are getting used to multitasking; that’s the Internet way.

Image taken by the author

That’s not a bad thing in itself, but it comes with a hefty price: loss of focus because of the constant distraction, which obviously leads to a loss of productivity in any endeavour you are pursuing.

That could become a serious problem because, over time, our brain gets used to it as the normal way to function, rewarding multitasking and penalizing single-task work. We’d become naturally distracted.

Ask yourself: when was the last time you watched a movie from start to end without checking email or notifications? Do you get bored reading a book? Would you rather go back to watching videos or checking social media?

Well, that’s the dopamine talking.

Shots of dopamine

Dopamine is a neurotransmitter that motivates us to do things through instant gratification because it affects our reward and pleasure centres. When the brain anticipates that we will do something that gives us pleasure, it releases a certain amount, which depends on the task in question. Eating chocolate, watching a video, or playing a video game releases a huge amount of it in exchange for a small amount of energy. In the case of video games, for instance, the brain can generate a constant supply of dopamine, as chances are we will keep finding new and exciting patterns in the game; novelty generates a lot of dopamine.

Dopamine is good, and we need it for survival. Still, the problem comes with its artificial release through low-value activities, such as dozens of hours spent on the Internet – nothing wrong with that, if done in moderation. As you’ll have guessed by now, dopamine generates addiction that can lead to loss of focus and concentration, and loss of time and productivity.

Dopamine: https://commons.wikimedia.org/wiki/User_talk:Jynto

That’s the reason you are not reading that many books anymore. Reading still generates dopamine, but just a small amount over time, naturally and healthily. It also requires a lot of energy and attention, leading you to deeper thinking. Your brain improves as a result.

It’s similar to eating an apple versus eating a doughnut: a quick shot of refined sugar or a release of natural sugar over time. Guess which one liberates more dopamine.

It’s easy to get addicted to checking social media, playing video games or streaming, but not that easy to become a book addict. That can be a problem that grows over time as our brain adapts to this new way of learning through dopamine. Your brain demands more and more, and you give it through automatic behaviour – like constantly checking social media, looking for new and exciting interactions.

I’m not surprised to see so many people with such a low threshold of frustration these days. They want all the answers right away. No delay. No frustration. No struggle.

They want their dopamine shot.

The Learning Journey: dopamine fast

If you feel your brain is hungry for dopamine, you can’t concentrate as you used to, or you haven’t read a book for ages, you should consider doing a dopamine fast. One way to start is spending one day a week without distractions: no social media, no mobile phone, no movies, no video games … you get the idea. Instead, read a book, go for a walk, exercise, write, paint, etc. – just analogue stuff. You will feel better over time; like any addiction, it takes time to get rid of it.

My suggestion is to use video courses – videos in general – as a complement to your main learning, which should be a mix of books, documentation, posts, exercises, tests and practical projects. Watch only a few videos per session – short ones are preferred – take written notes, and put them into practice.

Public Domain Picture

Transform a passive activity into an active one, and keep the healthy dopamine flowing 🙂

The Learning Journey, Part I

Temet Nosce – “Know Thyself”

Temet Nosce.

Visitors would read that maxim – in its Greek form, Gnothi Seauton – in awe when entering Apollo’s temple at Delphi to visit the Oracle. Neo would face the same phrase, in Latin, over the door when visiting the Oracle in The Matrix – the 1999 Warner Brothers movie. Completely different epochs and visitors, but they all had something in common: they were looking for answers.

Funny thing: that quote is all you were going to get from the so-called “Oracle”.

Let’s face it, going to someone else to find answers about oneself can be pretty deceiving. They can only give you a reflected image of yourself and, very likely, a distorted one – most of the time based upon our social masks. Now it’s truer than ever because, in the age of Instagram, we are just showing our good and happy side in an endless stream of funny pictures – before COVID-19, anyway – and mindfulness quotes.

MGM, Public Domain Image

It’s US who should have all the answers, not some Oracle in some temple or a look-alike New York kitchen 😉

Is the Oracle a scam?

Ask Dorothy and her friends in “The Wizard of Oz”, the 1939 Victor Fleming movie.

After a long journey travelling through the magical Land of Oz and meeting the Wizard – the Oracle – they learned he had no answers for them because he was just “the man behind the curtain”, making the magical world go round. He was just a tool, a device, to create the discovery journey – or to create the Matrix, if you will. He is a mirror character to the Architect in The Matrix, even though the latter has many more gnostic overtones than the Wizard.

The Oracle is just an archetype that is there to reflect the self and help with introspection, not to give out the actual answers. The process of knowing oneself is, obviously, personal and unique.

The Cloud Journey

That’s a concept I read about all the time on LinkedIn, and probably some of you are familiar with it as well. I’m often asked about my “journey” to the Cloud, how to get a job in the industry or, similarly, my preparation for one specific certification. That’s a new phenomenon brought by social media, something we could call collaborative learning or mentoring.

It’s accurate to describe the learning process as “a journey”, though, because learning should be an adventure: exploring unknown territory that can eventually take you out of your comfort zone. It allows us to discover things about the self – Temet Nosce, the journey of self-discovery.

Maybe after all that time working in Development, you are now discovering that you enjoy working on Machine Learning or Analytics. And that’s something one rarely learns in the day job – such opportunities rarely arise there. In fact, you need to make those opportunities happen, and a good way to do this is to embark on the “learning journey”.

Taking the first steps into the unknown

In the “Cloud Journey”, many people are taking their first steps through professional certifications. That’s a new phenomenon as well. Certifications are a good way to validate, update or enhance professional experience, but I’ve never seen newcomers use them as the first steps into an industry at this scale.

Another way, related to the former, is taking one of the hundreds of pre-packaged video courses available on the different online learning platforms. They have been around for a long time now, but it’s only in the last few years that they have become so ubiquitous as a cheap way of self-learning and, in many cases, as shortcuts to certifications. That defeats, in part, the journey of self-learning or discovery, as you are taking a fast highway instead of a secondary road. Certainly, driving on a highway is faster and safer, but you will learn much more driving on a secondary road. It’s tiresome, sure, but your driving skills will skyrocket.

Ideally, those courses should not be used as your main source of truth and learning, but as a complement to your own learning path.

Lambda, EFS, and the Serverless Framework


If you’ve been developing serverless applications for a while, I’m pretty sure you have faced a few challenges, apart from the old cold start issue – which has been solved to a great extent by the Provisioned Concurrency feature.

For instance, you may need to load large rules files consumed by a Lambda function that implements a rules engine, or keep data files produced dynamically by the function between invocations. Lambda provides some local space – 512 MB in /tmp – but it’s small and ephemeral, so it is not helpful in those kinds of scenarios.

Other solutions come to mind – storing in databases or object storage: RDS, DynamoDB, S3 … – but they come with a high price in development, performance and cost. What would happen if we had peaks of several hundred – or thousands – of requests per second, loading big files at startup and writing files to a data store concurrently?

Well, at the very least, we could take a significant performance hit, depending on the size of the files: the latency of retrieving the files at startup, plus the cold start of the Lambdas – enter Provisioned Concurrency – plus the latency of storing the intermediate files in the datastores – storing and retrieving from S3 is not the same as from DynamoDB.

So no alternative? Well, we are in luck, as AWS released EFS support for Lambda in June!

Image property of AWS

Amazon EFS is widely known, so I’m not going to delve into the service, other than to mention that Amazon Elastic File System provides an NFS file system that scales on demand, with high throughput and low latency. It’s instrumental when shared storage and parallel access from several services are needed.
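Once mounted, the file system behaves like any local path from a handler’s point of view. Here is a hypothetical sketch of the rules-engine use case mentioned earlier – loading a large rules file from the mount once per container and reusing it across warm invocations. RULES_PATH, the env var and the file layout are illustrative, not from the original post:

```python
import json
import os

# Illustrative path: in a deployed function this would be the EFS local mount
# path (e.g. /mnt/efs/rules.json); the env var keeps the sketch self-contained.
RULES_PATH = os.environ.get("RULES_PATH", "/mnt/efs/rules.json")

_rules_cache = None  # module scope: survives between warm invocations


def load_rules(path=None):
    """Load the rules file once per container and cache it."""
    global _rules_cache
    if _rules_cache is None:
        with open(path or RULES_PATH) as f:
            _rules_cache = json.load(f)
    return _rules_cache


def handler(event, context):
    rules = load_rules()
    # ... evaluate the event against the rules ...
    return {"statusCode": 200,
            "body": json.dumps({"rules_loaded": len(rules)})}
```

The module-level cache is the key point: only the first (cold) invocation pays the read latency; warm invocations reuse the parsed rules.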

Configuration & Considerations

“With power comes responsibility” or, in our case, with powerful features come some configuration constraints. EFS runs in different subnets within a VPC, which means that our Lambda functions have to run within a VPC. That comes with a price – IP address consumption, a possible performance hit, and loss of direct access to AWS global services – so a NAT Gateway or private endpoints (PrivateLink / Gateway endpoints) might be needed, depending on the use case.

That constraint was vastly improved last year when Hyperplane ENIs for Lambda were released, meaning that just a few ENIs – and therefore a few IPs – are enough to handle a big number of Lambda invocations, decoupling function scaling from ENI provisioning.

Configuration – Serverless Framework

The configuration of a Lambda function running within a VPC can be pretty simple – if it only needs to access VPC resources – as shown in the image below, under the vpc key:

Serverless framework YAML – Image MNube.org

A security group is needed for the Lambda function, along with the IDs of the subnet(s) where the ENI(s) will be placed, and permissions to create, delete, and describe network interfaces.

VPC Lambda – Image MNube.org

The Lambda function is now running within our VPC, with an ENI placed in each selected subnet, but to access the EFS instance, a few permissions need to be granted:

Role permissions EFS, Lambda – Image MNube.org

Now the EFS file system can be created within the VPC. To do that, the console, CloudFormation, Serverless, the AWS CLI, the AWS SDKs, etc. can be used.

EFS instance – Image MNube.org

After creating the instance, an access point needs to be created to allow applications access. This is a new resource type: “AWS::EFS::AccessPoint”. It can be created from the console or through a CloudFormation file – we will need to supply the EFS ID: ${self.provider}.

Serverless framework YAML – Image MNube.org

Finally, we link the file system to the Lambda function, providing the ARN of the EFS, the ARN of the access point, and the local mount path – as shown in the image below:

Image MNube.org
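The same linkage can also be expressed programmatically. A hypothetical boto3 sketch – the function name and ARN are placeholders – using the FileSystemConfigs parameter of update_function_configuration; note that Lambda requires the local mount path to start with /mnt/:

```python
# Sketch of attaching an EFS access point to a Lambda function with boto3,
# mirroring the serverless.yml configuration. The attach_efs call is shown
# for illustration and needs AWS credentials to actually run.

def build_file_system_config(access_point_arn, local_mount_path="/mnt/efs"):
    """Build the FileSystemConfigs entry that Lambda expects."""
    if not local_mount_path.startswith("/mnt/"):
        raise ValueError("Lambda local mount paths must start with /mnt/")
    return {"Arn": access_point_arn, "LocalMountPath": local_mount_path}


def attach_efs(function_name, access_point_arn):
    # Not executed here: requires boto3 and AWS credentials.
    import boto3
    client = boto3.client("lambda")
    return client.update_function_configuration(
        FunctionName=function_name,
        FileSystemConfigs=[build_file_system_config(access_point_arn)],
    )
```

The /mnt/ check encodes a real Lambda constraint, so a bad path fails fast instead of at deploy time.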

The EFS instance is ready to be accessed by the Lambda function 🙂

Solution

I have used the Serverless Framework to produce the solution – but AWS SAM with Cloud9, the official alternative, could have been used instead.

Architecture – MNube.org

Let’s create – or transfer – a rules file that can be accessed from the Lambda function 🙂

Different services can be used to transfer the files, like AWS DataSync, an EC2 instance, or even creating the files from code. Files transferred from EC2 are accessible from the Lambda functions, so we’ll use this method.

After the EC2 instance has been created – a t2.micro is enough – in one of the subnets of the VPC with access to the EFS ENIs, a directory will be needed: /efs. That directory isn’t linked to the EFS instance yet, so we’ll need to mount it.

One way to do it is by using the EFS tools:

sudo yum install -y amazon-efs-utils

An access point was created previously that we can use to mount the directory. It’s easy to get the required command line from the web console: just go to Amazon EFS > Access Points > id and press the Attach button:

EFS Mount – Image MNube.org

After mounting the directory – in green – the files can be transferred to the /efs directory:

Mounting and creating files – Image MNube.org

At this point, the Lambda function should be able to access the directory. I have coded a minimal Lambda function that lists the files contained in the directory:

Lambda function – Image MNube.org
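A sketch of what such a minimal listing function could look like – the mount path is read from an env var here (my addition, to keep the example self-contained); in the deployed function it would be the configured local mount path, e.g. /mnt/efs:

```python
import json
import os


def handler(event, context):
    """List the files on the EFS mount and return them as an API response."""
    # EFS_PATH is illustrative; in the deployed function this would be the
    # local mount path configured on the function (e.g. /mnt/efs).
    path = os.environ.get("EFS_PATH", "/mnt/efs")
    files = sorted(os.listdir(path))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"files": files}),
    }
```

Wired behind an API Gateway endpoint, a GET request would return a JSON body such as {"files": ["rules.txt", "test.txt"]}, matching the CloudWatch trace described below.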

The solution is now ready to be deployed. Remember that I have only shown parts of the serverless.yml, equivalent to the CloudFormation file you might use to provision the infrastructure – I will leave that to you as an exercise.

serverless deploy --stage dev --region eu-west-1
Serverless Stack – Image MNube.org

The framework outputs a URL, as I created an API Gateway endpoint that invokes the Lambda function:

Cloudwatch Logs – Image from MNUBE.org

I have captured the request trace from the CloudWatch Logs, where we can see the files in /efs – test.txt and rules.txt – and the low latency of the request.

Other Use Cases

  • Loading extensive libraries that Lambda layers can’t handle.
  • Files that are updated regularly.
  • Files that need locks for concurrent access.
  • Access to big files – zip / unzip.
  • Using different computing architectures – EC2, ECS – to process the same files.
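The locking use case deserves a small illustration: since EFS is a shared NFS file system, concurrent writers (Lambda, EC2, ECS) can coordinate with advisory locks. A hypothetical sketch – path and payload are illustrative:

```python
import fcntl


def append_with_lock(path, line):
    """Append a line to a shared file under an exclusive advisory lock,
    so that concurrent writers don't interleave their appends."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The lock is advisory: it only protects writers that also take it, which is usually enough when you control all the producers.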

GCP Professional Data Engineer Guide – September 2020


I recently recalled my first experience with GCP, in London shortly before the 2012 Olympics: an online gaming project, initially planned for AWS, that was migrated to App Engine – the PaaS platform that would evolve into the current GCP.

My initial impression was good, although the platform imposed several development limitations, which would be reduced later with the release of App Engine Flexible.

Coinciding with TensorFlow’s launch as an open-source framework in 2015, I was lucky enough to attend a workshop on neural networks – given by one of the AI scientists from Google Seattle – where I had my second experience with the platform. I was struck by the simplicity of configuration and deployment, the NoOps concept, and a Machine Learning / AI offering without competition at the time.

Do Androids Dream of Electric Sheep? Philip K. Dick would have “hallucinated” with the electric dreams of neural networks – powered by TensorFlow.

Exam

The exam structure is the usual one in GCP exams: 2 hours and 50 questions, with a format directed towards scenario-type questions, mixing questions of great difficulty with simpler ones of medium-low difficulty.

In general, to choose the correct answer, you must apply both technical and business criteria. Therefore, you need deep knowledge of the services from a technological point of view and the skill/experience to apply business criteria contextually, depending on the question: type of environment, sector, application, etc.

Image #1, Data Lake, the ubiquitous architecture – Image owned by GCP

Pre-requisites and recommendations

At this level of certification, the questions do not refer, in general, to a single topic. That is, a question from the Analytics domain may require more or less advanced knowledge of Computing, Security, Networking or DevOps to solve it successfully. I’d recommend having the GCP Associate Cloud Engineer certification or having equivalent knowledge.

  • GCP experience at the architectural level – In part, the exam focuses on the architecture solution, design and deployment of data pipelines, selection of technologies to solve business problems, and, to a lesser extent, development. I’d recommend studying as many reference architectures as possible, such as those I show in this guide.
  • GCP experience at the development level – Although no direct programming questions appeared in my question set or the mock test, the exam requires technical knowledge of services and APIS: SQL, Python, REST, algorithms, Map-Reduce, Spark, Apache Beam (Dataflow) …
  • GCP experience at the Security level – A domain that appears transversally in all certifications – I’d recommend knowledge at the Associate Engineer level.
  • GCP experience at the Networking level – Another domain that appears transversely – I’d recommend knowledge at the level of Associate Engineer.
  • Knowledge of Data Analytics – It’s a no-brainer, but some domain knowledge is essential. Otherwise, I’d recommend studying books like “Data Analytics with Hadoop” or taking courses like Specialized Program: Data Engineering, Big Data and ML on Google Cloud in Coursera. Likewise, practising with laboratories or pet projects is essential to obtain some practical experience.
  • Knowledge of the Hadoop / Spark ecosystem – Connected with the previous point. High-level ecosystem knowledge is necessary: MapReduce, Spark, Hive, HDFS, Pig …
  • Knowledge of Machine Learning and IoT – Advanced knowledge in Data Science and Machine Learning is essential, apart from specific knowledge of GCP products. There are questions exclusively about this domain – at the level of certifications like AWS Machine Learning or higher. IoT appears on the exam in a lighter form, but knowing the architecture and services of reference is essential.
  • DevOps experience – Concepts such as CI / CD, infrastructure or configuration as code are important today, reflected in the exam. However, they do not have great specific weight.
We can group the relevant services according to the stages (and substages) of the data cycle: Management, Storage, Transformation and Analysis.

  • Ingestion Batch / Data Lake: Cloud Storage.
  • Ingestion Streaming: Kafka, Pub/Sub, Computing Services, Cloud IoT Core.
  • Migrations: Transfer Appliance, Transfer Service, Interconnect, gsutil.
  • Transformations: Dataflow, Dataproc, Cloud Dataprep, Hadoop, Apache Beam.
  • Computing: Kubernetes Engine, Compute Instances, Cloud Functions, App Engine.
  • Storage: Cloud SQL, Cloud Spanner, Datastore / Firebase, BigQuery, BigTable, HBase, MongoDB, Cassandra.
  • Cache: Cloud Memorystore, Redis.
  • Analysis / Data Operations: BigQuery, Cloud Datalab, Data Studio, DataPrep, Cloud Composer, Apache Airflow.
  • Machine Learning: AI Platform, BigQuery ML, Cloud AutoML, TensorFlow, Cloud Text-to-Speech API, Cloud Speech-to-Text, Cloud Vision API, Cloud Video AI, Translations, Recommendations API, Cloud Inference API, Natural Language, Dialogflow, Spark MLlib.
  • IoT: Cloud IoT Core, Cloud IoT Edge.
  • Security & Encryption: IAM, Roles, Encryption, KMS, Data Prevention API, Compliance …
  • Operations: Kubeflow, AI Platform, Cloud Deployment Manager …
  • Monitoring: Cloud Stackdriver Logging, Stackdriver Monitoring.
  • Optimization: Cost control, Autoscaling, Preemptive instances …

Standard questions

Representative questions of the level of difficulty of the exam.

Image property of GCP

A practical migration scenario question that includes cloud services, the Hadoop ecosystem, and concepts from the Analytics domain.

Services to study in detail

Image #2 – property of GCP

  • Cloud Storage – Core service that appears consistently in all certifications and is central to Data Lake systems. I’d recommend its study in detail at an architectural level – see Image 1 -, configurations according to the data temperature and as an integration/storage element between the different services.
  • BigQuery – Core service in the GCP Analytics domain as a BI and storage element. Extremely important in the exam, so it has to be studied in detail: architecture, configuration, backups, export/import, streaming, batch, security, partitioning, sharding, projects, datasets, views, integration with other services, cost, queries and SQL optimization (legacy and standard) at the table level, keys …
  • Pub/Sub – Core service as an element of ingestion and integration. Its in-depth study is highly recommended: use cases, architecture, configuration, API, security and integration with other services (e.g. Dataflow, Cloud Storage) – the cloud-native counterpart of Kafka.
  • Dataflow – Core service in the GCP Analytics domain as a processing and transformation element. Its implementation is based on Apache Beam, which must be known at a high level, along with pipeline design. Use cases, architecture, configuration, API and integration with other services.
  • Dataproc – Core service in the GCP Analytics domain as a processing and transformation element. It is a service based on Hadoop and is therefore the indicated service for migrating Hadoop workloads to the cloud. In this case, knowledge of Dataproc is required, as well as of the native services: Spark, HDFS, HBase, Pig … use cases, architecture, configuration, import/export, reliability, optimization, cost, API and integration with other services.
  • Cloud SQL, Cloud Spanner – Cloud-native relational databases. Use cases, architecture, configuration, security, performance, reliability, cost and optimization: clusters, transactionality, disaster recovery, backups, export/import, SQL performance and optimization, tables, queries, keys and debugging. Integration with other services.
  • Cloud Bigtable – Low latency NoSQL managed database, suitable for time series, IoT… ideal for replacing an HBase installation on-premise. Use cases, architecture, configuration, security, performance, reliability and optimization: clusters, CAP, backups, export/import, partitioning, performance, and optimization of tables, queries, and keys. Integration with other services.
  • Machine Learning – One of the certification’s strengths is the domain of “Operationalizing machine learning models”. It is much denser and more complex than it may seem at first, since it includes the operability and knowledge of the relevant GCP services as well as knowledge of Data Science fundamentals: algorithm selection, optimization, metrics … The questions’ difficulty level is variable but comparable to specific certifications, such as AWS Certified Machine Learning – Specialty. Most essential services: BigQuery ML, Cloud Vision API, Cloud Video Intelligence, Cloud AutoML, TensorFlow, Dialogflow, GPUs, TPUs …
  • Security – Security is a transversal concern across all domains and appears consistently in all certifications. In this case, it appears as an independent technical topic, crosscutting concern or a business requirement: KMS, IAM, Policies, Roles, Encryption, Data Prevention API …
Image #3, IoT Reference Architecture – owned by GCP

Essential services to consider

  • Networking – A cross-domain topic that can appear in the form of separate technical issues, cross-cutting concerns, or business requirements: VPC, Direct Interconnect, Multi-Region / Zone, Hybrid connectivity, Firewall rules, Load Balancing, Network Security, Container Networking, API Access (private/public) …
  • Hadoop – The exam covers ecosystems and third-party services like Hadoop, Spark, HDFS, Hive, Pig … use cases, architecture, functionality, integration and migration to GCP.
  • Apache Kafka – Alternative service to Pub / Sub, so it is advisable to study it at a high level: use cases, operational characteristics, configuration, migration and integration with GCP – plugins, connectors.
  • IoT – It can appear in various questions at the architectural level: use cases, reference architecture and integration with other services. IoT core, Edge Computing.
  • Datastore / Firebase – Document database. Use cases, configuration, performance, entity model, keys and index optimization, transactions, backups, export/import and integration with other services. It doesn’t carry as much weight as the other data repositories.
  • Cloud Memory Store / Redis – Structured data cache repository. Use cases, architecture, configuration, performance, reliability and optimization: clusters, backups, export/import and integration with other services.
  • Cloud Dataprep – Use cases, console and general operation, supported formats, and Dataflow integration.
  • Cloud Stackdriver – Use cases, monitoring and logging, both at the system and application level: Cloud Stackdriver Logging, Cloud Stackdriver Monitoring, Stackdriver Agent and plugins.

Other services

  • MongoDB, Cassandra – NoSQL databases that can appear in different scenarios. Use cases, architecture and integration with other services.
  • Cloud Composer – Use cases, general operation and web console, the configuration of diagram types, supported formats, import/export, integration with other services, and connectors.
  • Cloud Data Studio – Use cases, configuration, networking, security, general operation and environment, and integration with other services.
  • Cloud Data Lab – Use cases, general operation and web console, types of diagrams, supported formats, import/export and integration with other services.
  • Kubernetes Engine – Use cases, architecture, clustering and integration with other services.
  • Kubeflow – Use cases, architecture, environment configuration, Kubernetes.
  • Apache Airflow – Use cases, architecture and general operation.
  • Cloud Functions – Use cases, architecture, configuration and integration with other services – such as Cloud Storage and Pub / Sub, in Push / Pull mode.
  • Compute Engine – Use cases, architecture, configuration, high availability, reliability and integration with other services.
  • App Engine – Use cases, architecture and integration with other services.

Bibliography & essential resources

Google provides many resources for preparing for this certification in the form of courses, an official guidebook, documentation and mock exams. These resources are highly recommended and, in some cases, I would say essential.

The Data Engineering Specialized Program contains the Certification Preparation Course, which includes an extra practice exam and lots of additional tips, materials and labs – using the external Qwiklabs tool.

As I have previously indicated, I find the Google courses on Coursera excellent. They combine short videos, reading material, labs and test questions, creating a very dynamic experience. In any case, they should only be considered a starting point; you still need to go deeper into each domain – according to your experience – using, for instance, the excellent GCP documentation.

But you should not limit yourself to online courses. I can't hide the fact that I love books in general and IT books in particular. In fact, I have a vast collection of books dating back to the 80s, which at some point I will donate to a local Cervantina bookstore.

Books provide a deeper experience than videos, which can become monotonous when too long and are a much more passive medium – like watching TV. Ideally, you combine audiovisual and written media, creating your own learning path.

Laboratories

Image #4 – Data Lake based upon Cloud Storage – owned by GCP

Part of the job as a Data Engineer consists of creating, integrating, deploying and maintaining data pipelines, both in batch and streaming mode.

The Data Engineering Quest contains several labs that introduce different data transformation, IoT, and Machine Learning pipelines, so I find them excellent exercises – and not just for certification.
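The transformation logic those pipelines exercise can be sketched in plain Python. The functions below are illustrative stand-ins for Dataflow/Beam transforms, not any GCP API – every name is hypothetical:

```python
# Minimal batch-pipeline sketch: parse -> filter -> aggregate.
# Plain Python stands in for Dataflow/Beam transforms; all names are illustrative.
from collections import defaultdict

def parse(record: str) -> dict:
    """Split a CSV line 'sensor_id,reading' into a dict."""
    sensor_id, reading = record.split(",")
    return {"sensor": sensor_id, "value": float(reading)}

def run_pipeline(lines):
    """Group readings by sensor and average them, dropping bad values."""
    totals = defaultdict(list)
    for rec in map(parse, lines):
        if rec["value"] >= 0:          # filter stage: discard negative readings
            totals[rec["sensor"]].append(rec["value"])
    return {s: sum(v) / len(v) for s, v in totals.items()}

print(run_pipeline(["a,1.0", "a,3.0", "b,-5.0", "b,2.0"]))
```

The same parse / filter / aggregate shape is what the labs implement at scale with Dataflow, which is why practising it locally first pays off.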

Is it worth it?

The certification is advanced-level and, in general, should not be the first cloud certification you obtain. It covers a large amount of material across many domains, so tackling it without a certain level of prior knowledge can be quite complex.

Let’s compare it with the mirror certification on the AWS platform. The GCP exam covers almost twice as much material, mainly due to the inclusion of Machine Learning / Data Science questions – which AWS removed from its equivalent exam and moved into a dedicated certification. It is therefore like taking two certifications in one.

Is it worth it? Of course, but not as a first certification – depending on your prior experience.

Certifications are an excellent way to validate knowledge externally and collect updated information, validate good practices, and consolidate knowledge with real practical cases (or almost).

Good luck to you all!





AWS Certified Developer Reloaded

0

I’m going to share my recent experience with the June 2020 re-certification of the AWS Certified Developer, one of my favourites, without a doubt. The experience was very different from the previous one since, if my memory serves me well, I didn’t find a single repeated question.

The exam structure is the usual one for the associate level: 2 hours and 65 questions, with a format that has evolved even further towards scenario-type questions. I don’t recall any direct questions, and certainly no extremely easy ones. That said, it seems to me a much more balanced exam than the previous version, where some services carried much more weight than others – API Gateway, I’m looking at you.

Virtually all the important Core / Serverless services are represented in the exam:

  • S3
  • In-Memory Databases: ElastiCache, Memcached, Redis
  • Databases: RDS, DynamoDB …
  • Security: KMS, Policies…
  • CI / CD, IaC: Elastic Beanstalk, CodePipeline, CloudFormation …
  • Serverless: Lambda functions, API Gateway, Cognito …
  • Microservices: SQS, SNS, Kinesis, Containers, Step Functions …
  • Monitoring: CloudWatch, CloudWatch Logs, CloudTrail, X-Ray …
  • Optimization: Cost control, Auto Scaling, Spot Fleets …

The Developer Certification is the Serverless certification par excellence. However, some services, such as Step Functions or Fargate Containers, are poorly represented – just one or two questions and great difficulty.

Serverless is a great option for IoT Systems

Prerequisites and recommendations

I will not repeat the information that is already available on the AWS website; instead, I will give my recommendations and personal observations.

Professionals with experience in Serverless development – especially in AWS – Microservices, or experience with React-type applications, will be the most comfortable when preparing and facing this certification.

  • AWS Experience. The certification is suitable for professionals with little or no AWS experience, although I’d recommend getting the AWS Certified Cloud Practitioner first.
  • Dev Experience. It’s essential to have a certain level of development experience, since many questions are eminently practical and draw on real development work. Knowledge of programming languages like Python, JavaScript or Java is highly desirable. The exam poses programming problems indirectly, through concepts, debugging and optimization. Lacking this knowledge or experience gives many professionals the impression that this certification is very difficult when, in my opinion, it is not.
  • Architecture experience. The exam is largely focused on the development of Cloud applications, especially Serverless and Microservices. However, some questions may require knowledge of Cloud / Serverless / Container architecture patterns.
  • DevOps Experience. Concepts such as CI / CD and infrastructure or configuration as code are of great importance today, which is reflected in the exam. Obviously, the questions focus mostly on AWS products, but knowledge of other tools like Docker, Jenkins, Spinnaker and Git, and of general principles, can go a long way. Let’s not forget that this certification, together with SysOps, is part of the recommended path to the AWS DevOps Pro certification; obtaining the latter automatically re-certifies the two associate-level ones.

“There’s a difference between knowing the path and walking the path” – Morpheus. The Matrix, 1999


Image from aws.amazon.com

AWS Technical Essentials: introductory course, low level. Live remote or in person.

Developing on AWS: course focused on developing AWS applications using the SDK. It is intermediate level, and the agenda seems quite relevant to the certification. Live remote or in person. Not free.

Advanced Developing on AWS: interesting course, but focused on AWS architecture: migrations, re-architecting, microservices … Live remote or in person. Not free.

Exam Readiness Developer: essential. Free and digital.

AWS Certified Cloud Practitioner: Official certification, especially aimed at professionals with little knowledge of the Cloud in general and AWS.

Exam

As I have previously commented, the exam format is similar to most certifications, associate or not: scenario-based, and in this case of medium to medium-high difficulty. You will not find direct or excessively simple questions. As it is an associate-level exam, each question focuses on a single topic; if the question is about DynamoDB, it will not mix in cross-cutting concerns such as security.

Let’s examine a question taken from the certification sample questionnaire:

A very representative question of the exam’s medium-high difficulty. This is a development-oriented certification, so you will find questions about development, APIs, configuration, optimization and debugging. In this case, we are presented with a realistic example of configuring and designing indexes for a DynamoDB table.

DynamoDB is an integral part of the AWS Serverless offering and its flagship database – with Aurora Serverless’s permission. It is a low-latency NoSQL database, ideal for IoT, events, time series etc. Its purely Serverless nature lets you use it without provisioning or managing servers, or placing them inside a VPC. This is a great advantage when accessing it directly from Lambda functions, since they do not need to live within a VPC, with the added overhead of resource management and possible performance problems – “enter Hyperplane”.

DynamoDB hardly appears in the new AWS Databases certification, so I´d recommend studying it in depth for this one, given the number of questions that may appear.
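Since the sample question revolves around key and index design, the access pattern a partition key plus sort key enables can be sketched with an in-memory dict standing in for the table. All names are hypothetical and this is not the boto3 API – just the shape of the idea:

```python
# In-memory model of a DynamoDB-style table keyed by (partition key, sort key).
# Illustrative only: this mimics the access pattern, not the boto3 API.
table = {}  # partition_key -> list of (sort_key, item), kept sorted

def put_item(pk: str, sk: str, item: dict) -> None:
    table.setdefault(pk, []).append((sk, item))
    table[pk].sort(key=lambda pair: pair[0])  # ordered by sort key, as DynamoDB keeps items

def query(pk: str, sk_prefix: str = ""):
    """A query targets ONE partition and can prefix/range-scan the sort key."""
    return [item for sk, item in table.get(pk, []) if sk.startswith(sk_prefix)]

put_item("device#42", "2020-06-01T10:00", {"temp": 21})
put_item("device#42", "2020-06-02T09:30", {"temp": 23})
put_item("device#7",  "2020-06-01T11:00", {"temp": 19})

print(query("device#42", "2020-06"))  # both readings for device 42
```

A query touches a single partition; anything that must search across partitions needs a secondary index or a full scan – which is exactly the trade-off this kind of exam question probes.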

Services to study in detail

The following services are of great importance – not just to pass the certification – so I highly recommend an in-depth study.

Image from aws.amazon.com
  • AWS S3 – Core service. It appears consistently across all certifications: use cases, security, encryption, API, development and debugging.
  • Security – It appears consistently in all certifications: KMS encryption, Certificate Manager, AWS CloudHSM, Federation, Active Directory, IAM, Policies, Roles etc.
  • AWS Lambda – Use cases, creation, configuration-sizing, deployment, optimization, debugging and monitoring (X-RAY).
  • AWS DynamoDB – Use cases, table creation, configuration, optimization, indexes, API, DAX, DynamoDB Streams.
  • AWS API Gateway – Use cases, configuration, API, deployment, security and integration with S3, Cognito and Lambda. Optimization and debugging.
  • AWS ElastiCache – Use cases, configuration-sizing, API, deployment, security, optimization and debugging. It weighs heavily on the exam – at least in my question set.
  • AWS Cognito – Use cases, configuration and integration with other Serverless and Federation services. Concepts like SAML, OAuth, Active Directory etc. are important for the exam.
  • AWS Cloudformation – Use cases, configuration, creation of scripts, knowledge of the nomenclature / CLI commands.
  • AWS SQS – Use cases, architecture, configuration, API, security, optimization and debugging. Questions of different difficulty levels may appear.
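The cache-aside pattern behind many ElastiCache questions can be sketched in a few lines. Here an in-memory dict stands in for a Redis client and a stub function for the database – every name is illustrative, not a real AWS call:

```python
import time

# Cache-aside sketch: check the cache first, fall back to the database,
# then populate the cache with a TTL. A dict stands in for a Redis client.
cache = {}          # key -> (value, expiry_timestamp)
TTL_SECONDS = 60

def slow_db_lookup(user_id: str) -> dict:
    # Hypothetical stand-in for an RDS/DynamoDB read.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id: str) -> dict:
    entry = cache.get(user_id)
    if entry and entry[1] > time.time():      # cache hit, not expired
        return entry[0]
    value = slow_db_lookup(user_id)           # cache miss: hit the database
    cache[user_id] = (value, time.time() + TTL_SECONDS)
    return value

print(get_user("42"))   # miss: loads from the "database"
print(get_user("42"))   # hit: served from the cache
```

With a real ElastiCache cluster the dict becomes a Redis `GET`/`SETEX`, but the hit/miss/TTL reasoning the exam asks about is the same.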

Very important services to consider

  • AWS SNS – Knowledge of use cases at the architecture level, configuration, endpoints, integration with other Serverless services.
  • AWS CLI – Average knowledge of the different commands and nomenclature. Not many appeared in my set of questions, but in any case it is very positive to be comfortable at the console level.
  • AWS Kinesis – Some questions in this version of the exam are more complex than in the previous one. Use cases, configuration, sizing, KPL, KCL, API, debugging and monitoring.
  • AWS CloudWatch, Events, Log – It appears consistently across all certifications. Knowledge of architecture, configuration, metrics, alarms, integration, use cases.
  • AWS X-Ray – Use cases, configuration, instrumentation and installation in different environments.
  • AWS CodePipeline, CodeBuild, CodeDeploy, CodeCommit, CodeStar – High-level operation, architecture, integration and use cases. I´d recommend an in-depth study of CodePipeline and CodeBuild.
  • AWS ELB / Certificates – Use cases, ELB types, integration, debugging, monitoring, security – certificate installation.
  • AWS EC2, Autoscaling – Use cases, integration with ELB.
  • AWS Elastic Beanstalk – Architecture, use cases, configuration, debugging and deployment types – very important for the exam: All at Once, Rolling etc.
  • AWS RDS – One of the star services of AWS and of the Databases certification. Here its appearance is limited: use cases, configuration, integration – caches – debugging and monitoring.

Other Services

  • AWS Networking – architecture and basic network knowledge: VPC, security groups, Regions, Zones, VPN … They appear in a general and limited way compared to the rest of the certifications. This is one reason this certification is ideal for beginners; network architecture on AWS can be a very complex and arid topic.
  • AWS Step Functions – A service widely used in the enterprise but which appears only occasionally in certifications. I recommend studying its architecture, use cases and nomenclature – the questions are not easy.
  • AWS SAM – Use cases, configuration and deployment. SAM CLI Commands.
  • AWS ECS / Fargate – Its presence in the certifications is quite disappointing – especially compared to Google Cloud´s certifications, where Kubernetes (GKE) has the main role, logical since it is Google’s native technology. I´d recommend studying architecture, use cases – microservices – configuration, integration and monitoring (X-Ray).
  • AWS Cloudfront – General operation and use cases. Integration with S3.
  • AWS Glue – General operation and use cases.
  • AWS EMR – General operation and use cases.
  • AWS DataPipeline – General operation and use cases.
  • AWS Cloudtrail – General operation and use cases.
  • AWS GuardDuty – General operation and use cases.
  • AWS SecretsManager – General operation and use cases.
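Since Step Functions questions lean on Amazon States Language nomenclature, it helps to keep a minimal state machine in mind. Here is one sketched as a Python dict – the Lambda ARNs are placeholders, not real resources:

```python
import json

# Minimal Amazon States Language definition expressed as a Python dict.
# The Lambda ARNs below are placeholders, not real resources.
state_machine = {
    "Comment": "Validate an order, then charge it",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate",
            "Next": "ChargeCard",
        },
        "ChargeCard": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge",
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))  # the JSON you would paste into the console
```

Recognizing fields like `StartAt`, `Type`, `Next` and `End` is most of what the occasional Step Functions question requires.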

Essential Resources

  • AWS Certification Website.
  • Sample questions
  • Readiness course – recommended, with additional practice questions.
  • AWS Whitepapers – “Storage Services Overview“, “Hosting Static Websites on AWS“, “In Memory Processing in the Cloud with Amazon ElastiCache“, “Serverless Architectures with AWS Lambda“, “Microservices“.
  • FAQS – especially for Lambda, API Gateway, DynamoDB, Cognito, SQS and ElastiCache.
  • AWS Compute Blog
  • Practice Exam – highly recommended, level of difficulty representative of the exam.

Laboratories

I want to propose an incremental practical exercise of my own that can be useful when preparing for the exam.

Serverless Web App

Image from aws.amazon.com
  • Create a static website and host it on S3. Use the AWS CLI and APIs to create a bucket and copy the contents.
  • Create a repository with CodeCommit and upload the files from the Web to it.
  • Integrate S3 and Cloudfront – creating a Web distribution.
  • Create a Serverless backend with API Gateway, Lambda and DynamoDB, or Aurora Serverless, using Cloudformation and the AWS SAM model.
  • Code the Lambda functions with one of the supported runtimes – Python, JavaScript, Java … – and use boto3 to insert into and read from DynamoDB. Each Lambda will correspond to an API Gateway method, accessible from the Web.
  • Integrate X-Ray to trace Lambdas.
  • Create the Stack from the console.
  • Upload the generated YAML files to CodeCommit.
  • Optional: create a pipeline using CodePipeline and CodeCommit.
  • Optional: integrate Cognito with API Gateway to authenticate, manage, and restrict API usage.
  • Optional: replace DynamoDB with RDS and integrate Elasticache.
  • Optional: add an SQS queue, which will be fed from a Lambda. Create another Lambda that consumes the queue periodically.
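To make the Serverless backend step concrete, here is a hedged sketch of the kind of Lambda function the exercise calls for. The injectable `table` argument stands in for a boto3 DynamoDB Table resource so the logic can be exercised locally without credentials – all names are illustrative:

```python
import json
import uuid

def lambda_handler(event, context, table=None):
    """Handle an API Gateway POST: parse the body and write an item.

    `table` mimics a boto3 DynamoDB Table resource (a put_item method);
    in a real deployment you would create it with
    boto3.resource("dynamodb").Table("orders").
    """
    body = json.loads(event["body"])
    item = {"id": str(uuid.uuid4()), "product": body["product"]}
    table.put_item(Item=item)
    return {"statusCode": 201, "body": json.dumps({"id": item["id"]})}

# Local smoke test with a fake table instead of DynamoDB.
class FakeTable:
    def __init__(self):
        self.items = []
    def put_item(self, Item):
        self.items.append(Item)

fake = FakeTable()
resp = lambda_handler({"body": '{"product": "book"}'}, None, table=fake)
print(resp["statusCode"], fake.items[0]["product"])  # 201 book
```

Injecting the table also makes the optional RDS / ElastiCache swap in the last steps a one-line change rather than a rewrite.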

Is it worth it?

Certifications are a good way not only to validate knowledge externally, but also to collect updated information, validate good practices and consolidate knowledge with real (or almost real) practical cases.

Obtaining the AWS Certified Developer seems to be a “no brainer” in most cases, as I explained previously in another post, and in this one.

Good luck to everyone!

AWS Certified Solutions Architect – beyond the hype

1

The AWS Certified Solutions Architect Associate certification is one of the most sought-after in the cloud universe in general, and in the AWS ecosystem in particular. Dozens of courses, posts and articles can be found online, promising great salaries and professional opportunities just for passing an exam …

I have seen this interest both in my direct professional environment, where I have been delivering a series of guided training sessions, and in the online / networking world, where I receive a large number of messages, mostly from young professionals with relatively little experience in general and/or in cloud environments. It so happens that many of them have either already obtained the certification, or want to know whether obtaining it will open the doors of the cloud world for them.

The question answers itself simply by cross-referencing the messages of these two groups. The reality is that there is a large number of certified professionals worldwide and a limited number of unfilled positions of this kind. There is a clear imbalance between supply and demand, one that did not occur at other historical moments when new technologies appeared – Java, iOS / Android etc. – when demand was covered by professionals with little experience.

Several questions therefore arise:

  • To what extent is the hype true? After all, it is a professional validation exam.
  • Is passing an architecture certification exam really the ideal starting point for a possible career in the cloud or technology sector?
  • Why are the existing vacancies not being filled?

Becoming a Solutions Architect

Solutions Architecture is a complex discipline that encompasses holistic knowledge of several domains and areas of expertise – not just technical ones: Architecture, Security, Infrastructure, Performance, Reliability, Integration, Design, Compliance, Stakeholder / Team Management, Cost Control, Patterns etc.

When we enter the field of Cloud Architecture, the subject becomes even more complex. Practically no cloud system lives in isolation; it lives alongside traditional on-premise / legacy systems, which it complements, extends or partially replaces. We therefore enter the world of hybrid computing, where complex infrastructure and security issues – for starters – turn projects into intricate balancing acts, and where combined cloud + digital or traditional knowledge is imperative.

Finding Solutions Architects with less than 10 to 15 years of experience who are experts in different domains and technology sectors is difficult. This role is reached gradually, through an evolutionary maturing process that takes years. Most Cloud Architects come either from Digital Architecture or from the Systems Administration world – oriented towards DevOps Architecture – that is, from progressive retraining.

This matches my own experience.

Since 2010 I have participated in different projects incorporating elements of Cloud Architecture, and in this progressive, proactive way I have fully entered the sector.

Keep in mind that these are not web-application-type projects that fulfill a very specific business-domain purpose and reside in the client’s infrastructure. They are projects that form an integral part of that infrastructure, from backups on S3 to complete core business platforms which, in turn, may contain several of those web applications. In any case, they require at least a study of integration, security, infrastructure, costs etc., even for a simple case of Lift & Shift or backups on S3.

Here, then, is the cause of the gap between unfilled positions and certified professionals. If in the past professionals with limited experience were hired for development and architecture – especially of web applications – this no longer happens today, due to several factors. Among them, the maturing of an industry with growing technological complexity, and systems that form the “business core” of many companies, which are being digitized at full speed.

We can draw a simple analogy. If we need a cardiovascular surgeon, we surely prefer one who has already operated before, on different cases and patients, even if with traditional techniques. It is a matter “of trust”.

Beyond the hype

Certifications are a great way to have your knowledge of a domain validated by a third party, providing added confidence in the professional job market, as well as a means of updating and expanding knowledge.

Image owned by aws.com

They are not training programs, although, as we see in the diagram above, AWS and other providers offer “Learning Paths” to acquire the competencies needed to pass the different certifications.

Are the Learning Paths enough?

It depends on the case, above all on prior experience and the goals pursued. They can be a valid starting point, or simply a way to pass some fairly complex exams.

AWS Certified Developer, the alternative

In my opinion, the certification that should be the starting point for most professionals with little experience who want to enter the cloud world is AWS Certified Developer.

The reasons are varied.

It is easier to enter the cloud world through implementation projects or DevOps infrastructure than through Solutions Architecture. It is a much more natural path, allowing progressive and more direct knowledge of the ecosystem and of its integration with on-premise systems and their different domains – through more bounded, hands-on tasks.

The certification focuses largely on the best-known AWS Serverless services: Lambda functions, API Gateway, S3, DynamoDB etc. – an area of great demand and manageable complexity at the associate level, which can be a good way to get into the cloud world, with immediate and very satisfying practical results.

The difficulty of this certification is medium compared with the other certifications at the same associate level. In fact, I have just re-certified, and I will publish a post about it shortly.

For professionals who want a high-level view of AWS and its products, the Solutions Architect certification can be a suitable complement to the very basic AWS Cloud Practitioner.
