Introduction
I recently passed the Google Cloud Professional Data Engineer certification exam. It took me about five months to prepare, and I would like to share my thoughts on why I decided to take it on and how I prepared for it.
At the moment, Google Cloud (GCP) is a distant third after Amazon (AWS) and Microsoft (Azure). Google Cloud has also lost money every year since its inception. However, there are several reasons why someone with no prior experience in cloud or open source software technologies might consider taking a Google Cloud certification.
New software projects are often launched on Google Cloud (GCP)
I could see from the job postings in the San Francisco Bay Area, where I live and work, that many startups, professional consulting companies, and even larger established enterprises are looking for GCP expertise when launching or migrating their applications to the cloud.
Google's Implementation of Open Source Software Technologies
Having been deeply engaged with the Microsoft Business Intelligence stack for a while, I feel that I have missed all the buzz about open source technologies such as Apache Beam (batch and streaming data pipelines), Apache Airflow (workflow orchestration and integration of data pipelines with other services), and Apache Kafka (streaming data). Google Cloud appeared later, so in my view it has the latest, and sometimes more efficient, implementations of open source technologies compared with AWS or Azure.
Deep discount pricing on services
It’s hard to compare pricing between Google Cloud and the other cloud providers because each one offers various discounts based on a customer’s particular situation. As a database engineer, my strong interest is in databases with batch and streaming processing. Those services, especially a data warehouse implementation, are expensive to use on all three cloud providers. GCP’s discounts make them seem more affordable than on the other providers.
Plenty of preparation material
The GCP exams (e.g. Google Cloud Professional Data Engineer) have been around since 2018. There is a lot of preparation material, but it keeps evolving, since the exam was updated in 2019. It’s important to find and use the latest versions. Here are a few resources:
- Exam preparation book – Official Google Cloud Certified Professional Data Engineer Study Guide
- Online Coursera preparation classes.
- https://www.qwiklabs.com/ – GCP labs make the experience quite enjoyable!
My study routine
I had no prior cloud experience before embarking on this learning journey, so my first step was to sign up for the Coursera Data Engineering with Google Cloud Specialization class, for a monthly fee of USD 50.
The class is divided into six modules, starting with cloud fundamentals. Each module includes Google Cloud labs, knowledge tests, and instructional videos led by Google engineers. Importantly, the entire curriculum was developed together with Google. I studied two hours a day and finished all six modules within two months. The Coursera class gave me about 40% of the knowledge needed to pass the actual exam.
While working through the Coursera classes, I purchased the Official Google Cloud Certified Professional Data Engineer Study Guide book for USD 50. The book is overwhelming at first, but the author has a knack for breaking complicated material into a set of simple statements. The book comes with its own set of sample exam questions. I read and studied this book continuously. An additional benefit of having the book was that it served as a cross-reference between the Coursera class material and the exam itself. The book gave me about 30% of the knowledge needed to pass the actual exam.
One of my best 'finds' was Qwiklabs (qwiklabs.com). The company is owned by Google itself and gives you actual hands-on experience of working on Google Cloud. There are several hundred labs with different levels of difficulty and durations from 30 minutes to 1 hour 30 minutes. For a monthly fee of USD 55, I tried to finish one lab a day for two months. The labs provided the remaining 30% of the knowledge needed to pass the actual exam.
Also worth mentioning is the wide availability of sample exam questions on various sites and forums (search the web for `gcp exam dumps`). The one big problem I discovered was that the answers were incorrect in most cases, so treat them with caution.
Overall you will learn:
- Differences between SQL and NoSQL databases: logical and physical architecture, and when to use each
- Alternative SQL databases (in addition to SQL Server) and when to use them: Cloud SQL, Cloud Spanner, BigQuery
- Data warehouse implementation (BigQuery) with features comparable to the offerings on Microsoft Azure and Amazon AWS
- Streaming vs batch processing of data
- Data pipeline implementation as an alternative to Microsoft SQL Server Integration Services (SSIS) packages
- Basics of Machine Learning (ML) and Artificial Intelligence (AI)
Conclusion
I hope that this short article gives you enough motivation to consider the Professional Data Engineer certification, as well as a ready-to-use, up-to-date list of exam preparation materials.