Duration:

5 Days

Audience:

Employees of federal, state, and local governments, and of businesses that work with the government. The course is designed specifically for Data Scientists, Software Engineers, and Data Engineers.

Focus:

Using Python in a Data Science context, including Machine Learning

Skill-level:

Intermediate

Hands-On Format:

This hands-on class has an approximately 50/50 lab-to-lecture ratio, combining engaging lectures, demos, group activities, and discussions with comprehensive, machine-based programming labs and project work.

Course Description:

Intermediate Python in Data Science covers the essentials of using Python as a tool for data scientists to perform exploratory data analysis, complex visualizations, and large-scale distributed processing on “Big Data”. In this course we cover essential mathematical and statistical libraries such as NumPy, pandas, SciPy, and scikit-learn; frameworks such as TensorFlow and Spark; and visualization and imaging tools such as matplotlib, Seaborn, and PIL.

Prerequisites:

Attending students must have basic Python development skills.

Outline:

Session: Python for Data Science

Lesson: Python Review

·       Python Language

·       Essential Syntax

·       Lists, Sets, Dictionaries, and Comprehensions

·       Functions

·       Classes, Modules, and imports

·       Exceptions
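The Python Review topics (comprehensions, functions, and exceptions) can be illustrated with a short sketch like the one below. It is illustrative only, not taken from the course labs; the function names `word_stats` and `safe_ratio` are hypothetical.

```python
def word_stats(words):
    """Return (unique lowercase words, word-to-length dict, sorted lengths > 4)."""
    unique = {w.lower() for w in words}                    # set comprehension
    lengths = {w: len(w) for w in unique}                  # dict comprehension
    long_lengths = [n for n in lengths.values() if n > 4]  # list comprehension with a filter
    return unique, lengths, sorted(long_lengths)


def safe_ratio(a, b):
    """Divide a by b, returning None instead of raising on division by zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


if __name__ == "__main__":
    unique, lengths, long_lengths = word_stats(["Spark", "pandas", "NumPy", "spark"])
    print(sorted(unique))    # ['numpy', 'pandas', 'spark']
    print(long_lengths)      # [5, 5, 6]
    print(safe_ratio(1, 0))  # None
```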

Lesson: IPython

·       IPython basics

·       Terminal and GUI shells

·       Creating and using notebooks

·       Saving and loading notebooks

·       Ad hoc data visualization

·       Web Notebooks (Jupyter)

Lesson: numpy

·       numpy basics

·       Creating arrays

·       Indexing and slicing

·       Working with large numeric data sets

·       Transforming data

·       Advanced tricks
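A minimal NumPy sketch of array creation, slicing, boolean indexing, and broadcasting, assuming NumPy 1.17 or later (for `np.random.default_rng`). The array contents are made up for illustration.

```python
import numpy as np

# Creating arrays: a 3x4 grid holding the integers 0..11
grid = np.arange(12).reshape(3, 4)

# Indexing and slicing: one full row, and every other column of all rows
row1 = grid[1]             # [4, 5, 6, 7]
even_cols = grid[:, ::2]   # columns 0 and 2, shape (3, 2)

# Boolean ("fancy") indexing: every element greater than 6
big = grid[grid > 6]       # [7, 8, 9, 10, 11]

# Transforming data with broadcasting: subtract each column's mean,
# so every column of the result has mean zero
centred = grid - grid.mean(axis=0)

# A large array handled with vectorized operations instead of a Python loop
samples = np.random.default_rng(seed=0).normal(size=1_000_000)
sample_mean = samples.mean()

print(row1, even_cols.shape, big)
print(centred.mean(axis=0))
```

Vectorized expressions such as `grid - grid.mean(axis=0)` run in compiled code, which is why they scale to arrays with millions of elements.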

Lesson: scipy

·       What can scipy do?

·       Most useful functions

·       Curve fitting

·       Modeling

·       Data visualization

·       Statistics
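As an example of the curve fitting and statistics topics, the sketch below fits an exponential-decay model to synthetic noisy data with `scipy.optimize.curve_fit`, then runs a one-sample t-test on the residuals. The model, the "true" parameters (2.5 and 1.3), and the noise level are assumptions chosen for illustration.

```python
import numpy as np
from scipy import optimize, stats


def model(x, a, b):
    """Exponential decay model: y = a * exp(-b * x)."""
    return a * np.exp(-b * x)


# Synthetic data generated from known parameters plus Gaussian noise
rng = np.random.default_rng(seed=42)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3) + rng.normal(scale=0.05, size=x.size)

# curve_fit returns the best-fit parameters and their estimated covariance
params, cov = optimize.curve_fit(model, x, y, p0=(1.0, 1.0))
a_fit, b_fit = params
param_errors = np.sqrt(np.diag(cov))  # one-standard-deviation uncertainties
print(f"a = {a_fit:.2f} +/- {param_errors[0]:.2f}, b = {b_fit:.2f} +/- {param_errors[1]:.2f}")

# Statistics: test whether the residuals are centred near zero
residuals = y - model(x, *params)
t_stat, p_value = stats.ttest_1samp(residuals, popmean=0.0)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
```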

Lesson: A tour of scipy subpackages

·       Clustering

·       Physical and mathematical Constants

·       FFTs

·       Integral and differential solvers

·       Interpolation and smoothing

·       Input and Output

·       Linear Algebra

·       Image Processing

·       Orthogonal Distance Regression

·       Root-finding

·       Signal Processing

·       Sparse Matrices

·       Spatial data and algorithms

·       Statistical distributions and functions

·       C/C++ Integration
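A few of the subpackages listed above (constants, root-finding, interpolation, and linear algebra) can be exercised in a handful of lines. This is a sketch assuming a reasonably recent SciPy; the example function, sample points, and linear system are arbitrary choices for illustration.

```python
import numpy as np
from scipy import constants, interpolate, linalg, optimize

# scipy.constants: physical and mathematical constants
speed_of_light = constants.c  # metres per second
print(speed_of_light, constants.pi)

# scipy.optimize: bracketing root-finding for x**2 - 2 = 0 on [0, 2]
root = optimize.brentq(lambda x: x**2 - 2.0, 0.0, 2.0)  # approximately sqrt(2)

# scipy.interpolate: a cubic spline through 10 samples of sin(x)
xs = np.linspace(0, np.pi, 10)
spline = interpolate.CubicSpline(xs, np.sin(xs))
peak = float(spline(np.pi / 2))  # should be very close to sin(pi/2) = 1

# scipy.linalg: solve the linear system A @ v = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
v = linalg.solve(A, b)  # [2.0, 3.0]

print(root, peak, v)
```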

Lesson: pandas

·       pandas overview

·       Dataframes

·       Reading and writing data

·       Data alignment and reshaping

·       Fancy indexing and slicing

·       Merging and joining data sets
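The pandas topics can be sketched end to end: reading CSV data, merging two tables, reshaping, indexing, and writing the result back out. The agency names, years, and budget figures below are hypothetical sample data, and the CSV is read from an in-memory string so the example needs no file.

```python
import io

import pandas as pd

# Reading data: parse CSV text (a file path works the same way)
csv_text = "agency,year,budget\nA,2022,10\nA,2023,12\nB,2022,7\nB,2023,9\n"
budgets = pd.read_csv(io.StringIO(csv_text))

regions = pd.DataFrame({"agency": ["A", "B"], "region": ["East", "West"]})

# Merging: attach each agency's region (a SQL-style inner join on "agency")
merged = budgets.merge(regions, on="agency", how="inner")

# Reshaping: one row per agency, one column per year
wide = merged.pivot(index="agency", columns="year", values="budget")

# Label-based and boolean indexing
a_2023 = wide.loc["A", 2023]
large = merged[merged["budget"] > 8]

# Writing data: with no path, to_csv returns the CSV text
csv_out = merged.to_csv(index=False)
print(wide)
print(large)
```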

Lesson: matplotlib

·       Creating a basic plot

·       Commonly used plots

·       Ad hoc data visualization

·       Advanced usage

·       Exporting images
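A minimal matplotlib sketch covering a basic line plot, a histogram, and exporting the figure as a PNG. It selects the non-interactive "Agg" backend so it runs without a display; in a Jupyter notebook that line is normally unnecessary. The plotted data are arbitrary.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files/buffers only
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
values = np.random.default_rng(seed=0).normal(size=500)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# A basic line plot with labels and a legend
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

# A commonly used plot: a histogram of normally distributed samples
ax_hist.hist(values, bins=20)
ax_hist.set_title("Normal samples")

fig.tight_layout()

# Exporting: write PNG bytes to an in-memory buffer (a filename works too)
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
png_bytes = buffer.getvalue()
plt.close(fig)
```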

Lesson: The Python Imaging Library (PIL/Pillow)

·       PIL overview

·       Core image library

·       Image processing

·       Displaying images
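The sketch below shows basic image creation and processing with Pillow, the maintained fork of PIL, which is still imported as `PIL`. It assumes Pillow is installed; the image size and colours are arbitrary. For displaying an image, `Image.show()` opens the platform's default viewer, so it is left out here.

```python
from PIL import Image, ImageFilter, ImageOps

# Core image library: a 64x64 white RGB image with a solid red square pasted in
img = Image.new("RGB", (64, 64), color="white")
img.paste((255, 0, 0), (16, 16, 48, 48))  # fill the box (left, upper, right, lower)

# Image processing: convert to grayscale, blur, then resize
gray = ImageOps.grayscale(img)
blurred = gray.filter(ImageFilter.GaussianBlur(radius=2))
thumb = blurred.resize((32, 32))

print(img.getpixel((32, 32)))  # (255, 0, 0)
print(gray.mode, thumb.size)   # L (32, 32)
```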

Lesson: seaborn

·       Seaborn overview

·       Bivariate and univariate plots

·       Visualizing Linear Regressions

·       Visualizing Data Matrices

·       Working with Time Series data

Lesson: scikit-learn Machine Learning Essentials

·       SciKit overview

·       scikit-learn overview

·       Algorithms Overview

·       Classification, Regression, Clustering, and Dimensionality Reduction

·       scikit-learn Demo
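A small scikit-learn sketch touching classification, dimensionality reduction, and clustering, in the spirit of the demo listed above but not taken from it. It uses the Iris dataset that ships with scikit-learn, so nothing is downloaded; the choice of logistic regression, PCA, and k-means is an illustrative assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Classification: scale features, then fit a logistic regression, as one pipeline
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # fraction of test samples classified correctly

# Dimensionality reduction: project the 4 features onto 2 principal components
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the samples into 3 clusters without using the labels
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(f"test accuracy: {accuracy:.2f}")
```

Every estimator shares the same `fit` / `predict` / `transform` interface, which is what lets `make_pipeline` chain the scaler and the classifier.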

Lesson: TensorFlow Overview

·       TensorFlow overview

·       Keras

·       Getting Started with TensorFlow

Session: Python on Spark

Lesson: PySpark Overview

·       Python and Spark

·       scikit-learn vs. Spark MLlib

·       Python at Scale

·       PySpark Demo

Lesson: RDDs and DataFrames

·       DataFrames and Resilient Distributed Datasets (RDDs)

·       Partitions

·       Adding variables to a DataFrame

·       DataFrame Types

·       DataFrame Operations

·       Dependent vs. Independent variables

·       Map/Reduce with DataFrames

Lesson: Spark SQL

·       Spark SQL Overview

·       Data stores: HDFS, Cassandra, HBase, Hive, and S3

·       Table Definitions

·       Queries

Lesson: Spark MLlib

·       MLlib overview

·       MLlib Algorithms Overview

·       Classification Algorithms

·       Regression Algorithms

·       Decision Trees and forests

·       Recommendation with ALS

·       Clustering Algorithms

·       Machine Learning Pipelines

·       Linear Algebra (SVD, PCA)

·       Statistics in MLlib

Lesson: Spark Streaming

·       Streaming overview

·       Integrating Spark SQL, MLlib, and Streaming