Google Cloud Fundamentals: Big Data and Machine Learning

Duration:

1 Day

Audience:

Employees of federal, state, and local governments, and of businesses working with the government.

This course is perfect for:

  • Data analysts getting started with Google Cloud Platform
  • Data scientists getting started with Google Cloud Platform
  • Business analysts getting started with Google Cloud Platform
  • Individuals responsible for designing pipelines and architectures for data processing, creating and maintaining machine learning and statistical models, querying datasets, visualizing query results and creating reports
  • Executives and IT decision makers evaluating Google Cloud Platform for use by data scientists

Course Overview:

This one-day instructor-led course introduces participants to the big data capabilities of Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you will get an overview of Google Cloud Platform and a detailed view of its data processing and machine learning capabilities. This course showcases the ease, flexibility, and power of big data solutions on Google Cloud Platform.

What You’ll Learn:

  • Identify the purpose and value of the key Big Data and Machine Learning products in Google Cloud Platform
  • Use Cloud SQL and Cloud Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud Platform
  • Employ BigQuery and Cloud Datalab to carry out interactive data analysis
  • Train and use a neural network with TensorFlow
  • Employ ML APIs
  • Choose between different data processing products on Google Cloud Platform

Course Outline:

1. Introducing Google Cloud Platform

  • Google Cloud Platform Fundamentals Overview
  • Google Cloud Platform Data Products and Technology
  • Usage scenarios

2. Compute and Storage Fundamentals

  • CPUs on demand (Compute Engine)
  • A global filesystem (Cloud Storage)
  • Cloud Shell

3. Data Analytics on the Cloud

  • Stepping-stones to the cloud
  • Cloud SQL: your SQL database on the cloud
  • Lab: Importing data into Cloud SQL and running queries
  • Spark on Dataproc

4. Scaling Data Analysis

  • Fast random access
  • Datalab
  • BigQuery
  • Machine Learning with TensorFlow
  • Fully built models for common needs

5. Data Processing Architectures

  • Message-oriented architectures with Pub/Sub
  • Creating pipelines with Dataflow
  • Reference architecture for real-time and batch data processing

6. Summary

  • Why GCP?
  • Where to go from here
  • Additional Resources

Labs

Lab 1: Sign up for Google Cloud Platform

Lab 2: Set up an Ingest-Transform-Publish data processing pipeline

Lab 3: Machine Learning Recommendations with SparkML

Lab 4: Build a machine learning dataset

Lab 5: Train and use a neural network

Lab 6: Employ ML APIs