Duration

3 Days

Audience

Employees of federal, state, and local governments, and of businesses working with the government.

Course Overview

Machine Learning Foundation is a hands-on introduction to the mathematics and algorithms used in data science, building the foundation and the intuition needed to solve complex machine learning problems. The course provides a strong start in several core areas, with the intent that deeper learning will follow. This “skills-centric” course is roughly 50% hands-on lab and 50% lecture, with extensive practical exercises designed to reinforce the fundamental skills, concepts, and best practices taught throughout. Students will learn about and explore popular machine learning algorithms, their applicability and limitations, and the practical application of these methods in a machine learning environment.

Although this course is highly technical in nature, it is a foundation-level machine learning class aimed at intermediate-level team members who are relatively new to AI and machine learning. As written, the course is not intended for advanced participants.

Learning Objectives

This course reviews the key foundational mathematics and introduces students to the core algorithms of data science.

Working in a hands-on learning environment, students will explore:

  • Popular machine learning algorithms, their applicability and limitations
  • Practical application of these methods in a machine learning environment
  • Practical use cases and limitations of algorithms
  • Core machine learning mathematics and statistics
  • Supervised Learning vs. Unsupervised Learning
  • Classification Algorithms including Support Vector Machines, Discriminant Analysis, Naïve Bayes, and Nearest Neighbor
  • Regression Algorithms including Linear and Logistic Regression, Generalized Linear Modeling, Support Vector Regression, Decision Trees, k-Nearest Neighbors (KNN)
  • Clustering Algorithms including k-Means, Fuzzy clustering, Gaussian Mixture
  • Neural Networks, including Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM), plus Hidden Markov Models (HMM)
  • Dimensionality Reduction, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA)
  • How to choose an algorithm for a given problem
  • How to choose parameters and activation functions
  • Ensemble methods

Course Outline

1.      Getting Started

  • Installing a Python Data Science Environment
  • Using and understanding IPython (Jupyter) Notebooks
  • Python basics – Part 1
  • Understanding Python code
  • Importing modules
  • Python basics – Part 2
  • Running Python scripts
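
To make the module concrete, here is a minimal sketch of the kind of script used in the labs, assuming NumPy and pandas are installed in the lab environment (the file name and printed output are illustrative only):

    # hello_datasci.py - confirm the Python data science environment works;
    # run it from a terminal with: python hello_datasci.py
    import sys

    import numpy as np   # numerical arrays
    import pandas as pd  # tabular data

    def main():
        print(f"Python {sys.version.split()[0]}")
        print(f"NumPy {np.__version__}, pandas {pd.__version__}")

        # build a tiny DataFrame to confirm the stack is functional
        df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})
        print(df)

    if __name__ == "__main__":
        main()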

2.      Statistics and Probability Refresher, and Python Practice

  • Types of data
  • Mean, median, and mode
  • Using mean, median, and mode in Python
  • Standard deviation and variance
  • Probability density function and probability mass function
  • Types of data distributions
  • Percentiles and moments
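
As an illustration of these descriptive statistics in Python, a minimal sketch using synthetic income and age data (assumes NumPy and SciPy 1.9 or newer; the numbers are made up):

    import numpy as np
    from scipy import stats

    # simulated incomes: mostly normal, plus one extreme outlier that skews the mean
    incomes = np.random.normal(50000, 15000, 10000)
    incomes = np.append(incomes, [5_000_000])

    print("mean:    ", np.mean(incomes))    # pulled upward by the outlier
    print("median:  ", np.median(incomes))  # robust to the outlier
    print("std dev: ", np.std(incomes))
    print("variance:", np.var(incomes))

    # mode makes more sense on discrete data, e.g. simulated ages
    ages = np.random.randint(18, 90, 500)
    print("mode:    ", stats.mode(ages, keepdims=True).mode[0])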

3.      Matplotlib and Advanced Probability Concepts

  • A crash course in Matplotlib
  • Covariance and correlation
  • Conditional probability
  • Bayes’ theorem
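
A minimal sketch of the covariance and correlation topics, using NumPy and Matplotlib on synthetic page-speed and purchase data (the variables and relationship are illustrative only):

    import numpy as np
    import matplotlib.pyplot as plt

    # two related variables: page load time vs. purchase amount (synthetic)
    rng = np.random.default_rng(0)
    page_speed = rng.normal(3.0, 1.0, 1000)
    purchase_amount = 100 - (page_speed + rng.normal(0, 0.5, 1000)) * 10

    print("covariance matrix:\n", np.cov(page_speed, purchase_amount))
    print("correlation matrix:\n", np.corrcoef(page_speed, purchase_amount))

    # a scatter plot makes the negative relationship visible
    plt.scatter(page_speed, purchase_amount, s=5)
    plt.xlabel("page load time (s)")
    plt.ylabel("purchase amount ($)")
    plt.title("Negative correlation between page speed and spend")
    plt.show()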

4.      Algorithm Overview

  • Data Prep
  • Linear Algorithms
  • Simple Linear Algorithms
  • Multivariate Linear Regression
  • Logistic Regression
  • Perceptrons
  • Non-Linear Algorithms
  • Classification and Regression Trees (CART)
  • Naive Bayes
  • k-Nearest Neighbors
  • Ensembles
  • Bootstrap Aggregation
  • Random Forest
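
The sketch below compares several of the algorithms listed above on a built-in scikit-learn dataset; it illustrates the workflow rather than making any claim about which algorithm is best:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "Logistic Regression": LogisticRegression(max_iter=5000),
        "Classification Tree (CART)": DecisionTreeClassifier(),
        "Naive Bayes": GaussianNB(),
        "k-Nearest Neighbors": KNeighborsClassifier(),
        "Bagging (bootstrap aggregation)": BaggingClassifier(),
        "Random Forest": RandomForestClassifier(),
    }

    # 5-fold cross-validated accuracy for each algorithm
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:32s} {scores.mean():.3f}")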

5.      Predictive Models

  • Linear regression
  • Polynomial regression
  • Multivariate regression and predicting car prices
  • Multi-level models
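
A small illustration of polynomial regression on synthetic mileage-versus-price data, assuming NumPy (the coefficients and prices are made up):

    import numpy as np

    # synthetic data: price falls off non-linearly with vehicle mileage
    rng = np.random.default_rng(42)
    mileage = rng.uniform(0, 200, 100)                  # thousands of miles
    price = 30 - 0.2 * mileage + 0.0005 * mileage**2 + rng.normal(0, 1, 100)

    # fit a 2nd-degree polynomial and predict the price at 100k miles
    coeffs = np.polyfit(mileage, price, deg=2)
    model = np.poly1d(coeffs)
    print("fitted coefficients:", coeffs)
    print("predicted price at 100k miles:", model(100))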

6.      Applied Machine Learning with Python

  • Machine learning and train/test
  • Using train/test to prevent overfitting of a polynomial regression
  • Bayesian methods – Concepts
  • Implementing a spam classifier with Naïve Bayes
  • K-Means clustering
  • Clustering people based on income and age
  • Measuring entropy
  • Decision trees – Concepts
  • Decision trees – Predicting hiring decisions using Python
  • Ensemble learning
  • Support vector machine overview
  • Using SVM to classify people with scikit-learn
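
To illustrate the train/test topic above, a minimal sketch that compares training and test R² for a low-degree and a high-degree polynomial fit on synthetic data (assumes NumPy and scikit-learn):

    import numpy as np
    from sklearn.metrics import r2_score

    # synthetic non-linear relationship with noise
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 200)
    y = np.sin(3 * x) + rng.normal(0, 0.2, 200)

    # hold out 20% of the points as a test set
    split = int(0.8 * len(x))
    x_train, x_test = x[:split], x[split:]
    y_train, y_test = y[:split], y[split:]

    # compare a reasonable fit against an overly flexible one; a large gap
    # between training and test scores is a sign of overfitting
    for degree in (3, 15):
        model = np.poly1d(np.polyfit(x_train, y_train, degree))
        print(f"degree {degree:2d}  train r2 = {r2_score(y_train, model(x_train)):.3f}"
              f"  test r2 = {r2_score(y_test, model(x_test)):.3f}")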

7.      Recommender Systems

  • What are recommender systems?
  • Item-based collaborative filtering
  • How item-based collaborative filtering works
  • Finding movie similarities
  • Improving the results of movie similarities
  • Making movie recommendations to people
  • Improving the recommendation results
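
A minimal sketch of item-based collaborative filtering using pandas on a tiny hand-made ratings table (the course labs use a larger movie-ratings dataset; the movie names and ratings here are illustrative only):

    import pandas as pd

    # a tiny user x movie ratings matrix (None = not rated)
    ratings = pd.DataFrame(
        {
            "Star Wars":  [5, 4, None, 5, 1],
            "Empire":     [5, 5, None, 4, 2],
            "Gone Girl":  [1, None, 4, 2, 5],
            "Casablanca": [None, 2, 5, None, 4],
        },
        index=["alice", "bob", "carol", "dave", "eve"],
    )

    # item-based similarity: correlate every movie's rating column with Star Wars;
    # Star Wars itself tops the list at 1.0, and with this little data the other
    # correlations are noisy - larger datasets give far more stable similarities
    similar = ratings.corrwith(ratings["Star Wars"]).dropna().sort_values(ascending=False)
    print(similar)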

8.      More Applied Machine Learning Techniques

  • k-Nearest Neighbors – Concepts
  • Using KNN to predict a rating for a movie
  • Dimensionality reduction and principal component analysis
  • A PCA example with the Iris dataset
  • Data warehousing overview
  • Reinforcement learning
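
As an illustration of PCA on the Iris dataset, a minimal sketch assuming scikit-learn:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    iris = load_iris()

    # project the 4-dimensional Iris measurements down to 2 principal components
    pca = PCA(n_components=2)
    projected = pca.fit_transform(iris.data)

    print("shape after PCA:", projected.shape)            # (150, 2)
    print("explained variance ratio:", pca.explained_variance_ratio_)
    print("total variance preserved:", pca.explained_variance_ratio_.sum())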

9.      Dealing with Data in the Real World

  • Bias/variance trade-off
  • K-fold cross-validation to avoid overfitting
  • Data cleaning and normalization
  • Cleaning web log data
  • Normalizing numerical data
  • Detecting outliers
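
A minimal sketch of k-fold cross-validation and simple outlier detection, assuming scikit-learn and NumPy (the data and thresholds are illustrative only):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    iris = load_iris()

    # 10-fold cross-validation gives a more robust accuracy estimate than a
    # single train/test split and helps reveal overfitting
    clf = SVC(kernel="linear", C=1)
    scores = cross_val_score(clf, iris.data, iris.target, cv=10)
    print("fold accuracies:", np.round(scores, 3))
    print("mean accuracy:  ", scores.mean())

    # simple outlier detection: flag values more than 3 standard deviations from the mean
    incomes = np.append(np.random.normal(50000, 15000, 1000), [10_000_000])
    z = np.abs((incomes - incomes.mean()) / incomes.std())
    print("outliers:", incomes[z > 3])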

10.   Apache Spark Basics | Machine Learning on Big Data

  • Installing Spark
  • Spark introduction
  • Spark and Resilient Distributed Datasets (RDD)
  • Introducing MLlib
  • Decision Trees in Spark with MLlib
  • K-Means Clustering in Spark
  • TF-IDF
  • Searching Wikipedia with Spark MLlib
  • Using the Spark 2.0 DataFrame API for MLlib
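
A minimal sketch of K-Means clustering with the DataFrame-based spark.ml API, assuming a local PySpark installation (the tiny in-memory dataset is illustrative only; the labs work with larger files):

    from pyspark.sql import SparkSession
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("KMeansExample").getOrCreate()

    # a tiny in-memory age/income dataset
    data = spark.createDataFrame(
        [(25, 40000.0), (30, 45000.0), (55, 90000.0), (60, 95000.0)],
        ["age", "income"],
    )

    # spark.ml estimators expect a single vector column of features
    assembler = VectorAssembler(inputCols=["age", "income"], outputCol="features")
    features = assembler.transform(data)

    # cluster into two groups and show each row's assigned cluster
    model = KMeans(k=2, seed=1).fit(features)
    model.transform(features).select("age", "income", "prediction").show()

    spark.stop()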

11.   Testing and Experimental Design

  • A/B testing concepts
  • T-test and p-value
  • Measuring t-statistics and p-values using Python
  • Determining how long to run an experiment for
  • A/B test gotchas
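
To illustrate the t-test and p-value topics, a minimal sketch using SciPy on simulated A/B test data (the group means and sample sizes are made up):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # simulated order values for a control group (A) and a treatment group (B)
    group_a = rng.normal(25.0, 5.0, 10000)
    group_b = rng.normal(25.5, 5.0, 10000)   # a small real lift

    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"t-statistic: {t_stat:.3f}")
    print(f"p-value:     {p_value:.5f}")   # a small p-value suggests the lift is not chance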

12.   GUIs and REST

  • Build a UI for your Models
  • Build a REST API for your Models
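
A minimal sketch of serving a trained model over REST, assuming Flask and scikit-learn (the course may use a different framework; the endpoint name and payload shape are illustrative only):

    from flask import Flask, jsonify, request
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    # train a small model once at startup
    iris = load_iris()
    model = RandomForestClassifier().fit(iris.data, iris.target)

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # expects JSON like {"features": [5.1, 3.5, 1.4, 0.2]}
        features = request.get_json()["features"]
        prediction = int(model.predict([features])[0])
        return jsonify({"class": iris.target_names[prediction]})

    if __name__ == "__main__":
        app.run(port=5000)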

13.   What the Future Holds

 
