Duration

3 Days

Audience

Employees of federal, state, and local governments, and of businesses working with the government.

Course Overview

Machine Learning Foundation is a hands-on introduction to the mathematics and algorithms used in data science, building the foundation and the intuition needed to solve complex machine learning problems. The course provides a strong start in several core areas, with the intent that deeper learning will follow. This “skills-centric” course is roughly 50% hands-on lab and 50% lecture, with extensive practical exercises designed to reinforce the fundamental skills, concepts, and best practices taught throughout. Students will learn about and explore popular machine learning algorithms, their applicability and limitations, and the practical application of these methods in a machine learning environment.

Although this course is highly technical in nature, it is a foundation-level machine learning class aimed at intermediate-level team members who are relatively new to AI and machine learning. As written, the course is not intended for advanced participants.

Learning Objectives

This course reviews the key foundational mathematics and introduces students to the core algorithms of data science.

Working in a hands-on learning environment, students will explore:

  • Popular machine learning algorithms, their applicability and limitations
  • Practical application of these methods in a machine learning environment
  • Practical use cases and limitations of algorithms
  • Core machine learning mathematics and statistics
  • Supervised Learning vs. Unsupervised Learning
  • Classification Algorithms including Support Vector Machines, Discriminant Analysis, Naïve Bayes, and Nearest Neighbor
  • Regression Algorithms including Linear and Logistic Regression, Generalized Linear Modeling, Support Vector Regression, Decision Trees, k-Nearest Neighbors (KNN)
  • Clustering Algorithms including k-Means, Fuzzy clustering, Gaussian Mixture
  • Neural Networks, including Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM), plus Hidden Markov Models (HMM)
  • Dimensionality Reduction, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA)
  • How to choose an algorithm for a given problem
  • How to choose parameters and activation functions
  • Ensemble methods

Course Outline

1.      Getting Started

  • Installing a Python Data Science Environment
  • Using and understanding IPython (Jupyter) Notebooks
  • Python basics – Part 1
  • Understanding Python code
  • Importing modules
  • Python basics – Part 2
  • Running Python scripts
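
To make the module concrete, here is a minimal sketch of the kind of script used in the labs, assuming NumPy and pandas are installed in the lab environment (the file name and printed output are illustrative only):

    # hello_datasci.py - confirm the Python data science environment works;
    # run it from a terminal with: python hello_datasci.py
    import sys

    import numpy as np   # numerical arrays
    import pandas as pd  # tabular data

    def main():
        print(f"Python {sys.version.split()[0]}")
        print(f"NumPy {np.__version__}, pandas {pd.__version__}")

        # build a tiny DataFrame to confirm the stack is functional
        df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})
        print(df)

    if __name__ == "__main__":
        main()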

2.      Statistics and Probability Refresher, and Python Practice

  • Types of data
  • Mean, median, and mode
  • Using mean, median, and mode in Python
  • Standard deviation and variance
  • Probability density function and probability mass function
  • Types of data distributions
  • Percentiles and moments
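
As an illustration of these descriptive statistics in Python, a minimal sketch using synthetic income and age data (assumes NumPy and SciPy 1.9 or newer; the numbers are made up):

    import numpy as np
    from scipy import stats

    # simulated incomes: mostly normal, plus one extreme outlier that skews the mean
    incomes = np.random.normal(50000, 15000, 10000)
    incomes = np.append(incomes, [5_000_000])

    print("mean:    ", np.mean(incomes))    # pulled upward by the outlier
    print("median:  ", np.median(incomes))  # robust to the outlier
    print("std dev: ", np.std(incomes))
    print("variance:", np.var(incomes))

    # mode makes more sense on discrete data, e.g. simulated ages
    ages = np.random.randint(18, 90, 500)
    print("mode:    ", stats.mode(ages, keepdims=True).mode[0])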

3.      Matplotlib and Advanced Probability Concepts

  • A crash course in Matplotlib
  • Covariance and correlation
  • Conditional probability
  • Bayes’ theorem
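
A minimal sketch of the covariance and correlation topics, using NumPy and Matplotlib on synthetic page-speed and purchase data (the variables and relationship are illustrative only):

    import numpy as np
    import matplotlib.pyplot as plt

    # two related variables: page load time vs. purchase amount (synthetic)
    rng = np.random.default_rng(0)
    page_speed = rng.normal(3.0, 1.0, 1000)
    purchase_amount = 100 - (page_speed + rng.normal(0, 0.5, 1000)) * 10

    print("covariance matrix:\n", np.cov(page_speed, purchase_amount))
    print("correlation matrix:\n", np.corrcoef(page_speed, purchase_amount))

    # a scatter plot makes the negative relationship visible
    plt.scatter(page_speed, purchase_amount, s=5)
    plt.xlabel("page load time (s)")
    plt.ylabel("purchase amount ($)")
    plt.title("Negative correlation between page speed and spend")
    plt.show()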

4.      Algorithm Overview

  • Data Prep
  • Linear Algorithms
  • Simple Linear Algorithms
  • Multivariate Linear Regression
  • Logistic Regression
  • Perceptrons
  • Non-Linear Algorithms
  • Classification and Regression Trees (CART)
  • Naive Bayes
  • k-Nearest Neighbors
  • Ensembles
  • Bootstrap Aggregation
  • Random Forest
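
The sketch below compares several of the algorithms listed above on a built-in scikit-learn dataset; it illustrates the workflow rather than making any claim about which algorithm is best:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

    X, y = load_breast_cancer(return_X_y=True)

    models = {
        "Logistic Regression": LogisticRegression(max_iter=5000),
        "Classification Tree (CART)": DecisionTreeClassifier(),
        "Naive Bayes": GaussianNB(),
        "k-Nearest Neighbors": KNeighborsClassifier(),
        "Bagging (bootstrap aggregation)": BaggingClassifier(),
        "Random Forest": RandomForestClassifier(),
    }

    # 5-fold cross-validated accuracy for each algorithm
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name:32s} {scores.mean():.3f}")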

5.      Predictive Models

  • Linear regression
  • Polynomial regression
  • Multivariate regression and predicting car prices
  • Multi-level models
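
A small illustration of polynomial regression on synthetic mileage-versus-price data, assuming NumPy (the coefficients and prices are made up):

    import numpy as np

    # synthetic data: price falls off non-linearly with vehicle mileage
    rng = np.random.default_rng(42)
    mileage = rng.uniform(0, 200, 100)                  # thousands of miles
    price = 30 - 0.2 * mileage + 0.0005 * mileage**2 + rng.normal(0, 1, 100)

    # fit a 2nd-degree polynomial and predict the price at 100k miles
    coeffs = np.polyfit(mileage, price, deg=2)
    model = np.poly1d(coeffs)
    print("fitted coefficients:", coeffs)
    print("predicted price at 100k miles:", model(100))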

6.      Applied Machine Learning with Python

  • Machine learning and train/test
  • Using train/test to prevent overfitting of a polynomial regression
  • Bayesian methods – Concepts
  • Implementing a spam classifier with Naïve Bayes
  • K-Means clustering
  • Clustering people based on income and age
  • Measuring entropy
  • Decision trees – Concepts
  • Decision trees – Predicting hiring decisions using Python
  • Ensemble learning
  • Support vector machine overview
  • Using SVM to classify people with scikit-learn
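
To illustrate the train/test topic above, a minimal sketch that compares training and test R² for a low-degree and a high-degree polynomial fit on synthetic data (assumes NumPy and scikit-learn):

    import numpy as np
    from sklearn.metrics import r2_score

    # synthetic non-linear relationship with noise
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 200)
    y = np.sin(3 * x) + rng.normal(0, 0.2, 200)

    # hold out 20% of the points as a test set
    split = int(0.8 * len(x))
    x_train, x_test = x[:split], x[split:]
    y_train, y_test = y[:split], y[split:]

    # compare a reasonable fit against an overly flexible one; a large gap
    # between training and test scores is a sign of overfitting
    for degree in (3, 15):
        model = np.poly1d(np.polyfit(x_train, y_train, degree))
        print(f"degree {degree:2d}  train r2 = {r2_score(y_train, model(x_train)):.3f}"
              f"  test r2 = {r2_score(y_test, model(x_test)):.3f}")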

7.      Recommender Systems

  • What are recommender systems?
  • Item-based collaborative filtering
  • How item-based collaborative filtering works
  • Finding movie similarities
  • Improving the results of movie similarities
  • Making movie recommendations to people
  • Improving the recommendation results
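
A minimal sketch of item-based collaborative filtering using pandas on a tiny hand-made ratings table (the course labs use a larger movie-ratings dataset; the movie names and ratings here are illustrative only):

    import pandas as pd

    # a tiny user x movie ratings matrix (None = not rated)
    ratings = pd.DataFrame(
        {
            "Star Wars":  [5, 4, None, 5, 1],
            "Empire":     [5, 5, None, 4, 2],
            "Gone Girl":  [1, None, 4, 2, 5],
            "Casablanca": [None, 2, 5, None, 4],
        },
        index=["alice", "bob", "carol", "dave", "eve"],
    )

    # item-based similarity: correlate every movie's rating column with Star Wars;
    # Star Wars itself tops the list at 1.0, and with this little data the other
    # correlations are noisy - larger datasets give far more stable similarities
    similar = ratings.corrwith(ratings["Star Wars"]).dropna().sort_values(ascending=False)
    print(similar)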

8.      More Applied Machine Learning Techniques

  • k-Nearest Neighbors – Concepts
  • Using KNN to predict a rating for a movie
  • Dimensionality reduction and principal component analysis
  • A PCA example with the Iris dataset
  • Data warehousing overview
  • Reinforcement learning
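
As an illustration of PCA on the Iris dataset, a minimal sketch assuming scikit-learn:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    iris = load_iris()

    # project the 4-dimensional Iris measurements down to 2 principal components
    pca = PCA(n_components=2)
    projected = pca.fit_transform(iris.data)

    print("shape after PCA:", projected.shape)            # (150, 2)
    print("explained variance ratio:", pca.explained_variance_ratio_)
    print("total variance preserved:", pca.explained_variance_ratio_.sum())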

9.      Dealing with Data in the Real World

  • Bias/variance trade-off
  • K-fold cross-validation to avoid overfitting
  • Data cleaning and normalization
  • Cleaning web log data
  • Normalizing numerical data
  • Detecting outliers
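
A minimal sketch of k-fold cross-validation and simple outlier detection, assuming scikit-learn and NumPy (the data and thresholds are illustrative only):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    iris = load_iris()

    # 10-fold cross-validation gives a more robust accuracy estimate than a
    # single train/test split and helps reveal overfitting
    clf = SVC(kernel="linear", C=1)
    scores = cross_val_score(clf, iris.data, iris.target, cv=10)
    print("fold accuracies:", np.round(scores, 3))
    print("mean accuracy:  ", scores.mean())

    # simple outlier detection: flag values more than 3 standard deviations from the mean
    incomes = np.append(np.random.normal(50000, 15000, 1000), [10_000_000])
    z = np.abs((incomes - incomes.mean()) / incomes.std())
    print("outliers:", incomes[z > 3])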

10.   Apache Spark Basics | Machine Learning on Big Data

  • Installing Spark
  • Spark introduction
  • Spark and Resilient Distributed Datasets (RDD)
  • Introducing MLlib
  • Decision Trees in Spark with MLlib
  • K-Means Clustering in Spark
  • TF-IDF
  • Searching Wikipedia with Spark MLlib
  • Using the Spark 2.0 DataFrame API for MLlib
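
A minimal sketch of K-Means clustering with the DataFrame-based spark.ml API, assuming a local PySpark installation (the tiny in-memory dataset is illustrative only; the labs work with larger files):

    from pyspark.sql import SparkSession
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("KMeansExample").getOrCreate()

    # a tiny in-memory age/income dataset
    data = spark.createDataFrame(
        [(25, 40000.0), (30, 45000.0), (55, 90000.0), (60, 95000.0)],
        ["age", "income"],
    )

    # spark.ml estimators expect a single vector column of features
    assembler = VectorAssembler(inputCols=["age", "income"], outputCol="features")
    features = assembler.transform(data)

    # cluster into two groups and show each row's assigned cluster
    model = KMeans(k=2, seed=1).fit(features)
    model.transform(features).select("age", "income", "prediction").show()

    spark.stop()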

11.   Testing and Experimental Design

  • A/B testing concepts
  • T-test and p-value
  • Measuring t-statistics and p-values using Python
  • Determining how long to run an experiment for
  • A/B test gotchas
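
To illustrate the t-test and p-value topics, a minimal sketch using SciPy on simulated A/B test data (the group means and sample sizes are made up):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # simulated order values for a control group (A) and a treatment group (B)
    group_a = rng.normal(25.0, 5.0, 10000)
    group_b = rng.normal(25.5, 5.0, 10000)   # a small real lift

    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"t-statistic: {t_stat:.3f}")
    print(f"p-value:     {p_value:.5f}")   # a small p-value suggests the lift is not chance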

12.   GUIs and REST

  • Build a UI for your Models
  • Build a REST API for your Models
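
A minimal sketch of serving a trained model over REST, assuming Flask and scikit-learn (the course may use a different framework; the endpoint name and payload shape are illustrative only):

    from flask import Flask, jsonify, request
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    # train a small model once at startup
    iris = load_iris()
    model = RandomForestClassifier().fit(iris.data, iris.target)

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # expects JSON like {"features": [5.1, 3.5, 1.4, 0.2]}
        features = request.get_json()["features"]
        prediction = int(model.predict([features])[0])
        return jsonify({"class": iris.target_names[prediction]})

    if __name__ == "__main__":
        app.run(port=5000)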

13.   What the Future Holds

 
