This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on the statistics behind these methods. The course will start with a discussion of how machine learning is different than descriptive statistics, and introduce the scikit learn toolkit.
The issue of dimensionality of data will be discussed, and the task of clustering data, as well as evaluating those clusters, will be tackled. Supervised approaches for creating predictive models will be described, and learners will be able to apply the scikit learn predictive modelling methods while understanding process issues related to data generalizability (e.g. cross validation, overfitting). The course will end with a look at more advanced techniques, such as building ensembles, and practical limitations of predictive models. By the end of this course, students will be able to identify the difference between a supervised (classification) and unsupervised (clustering) technique, identify which technique they need to apply for a particular dataset and need, engineer features to meet that need, and write python code to carry out an analysis.
This course should be taken after Introduction to Data Science in Python and Applied Plotting, Charting & Data Representation in Python and before Applied Text Mining in Python and Applied Social Analysis in Python.
Who is this class for: This course is part of “Applied Data Science with Python“ and is intended for learners who have basic python or programming background, and want to apply statistics, machine learning, information visualization, social network analysis, and text analysis techniques to gain new insight into data. Only minimal statistics background is expected, and the first course contains a refresh of these basic concepts. There are no geographic restrictions. Learners with a formal training in Computer Science but without formal training in data science will still find the skills they acquire in these courses valuable in their studies and careers.
Course 3 of 5 in the Applied Data Science with Python Specialization.
Module 1: Fundamentals of Machine Learning – Intro to SciKit Learn
This module introduces basic machine learning concepts, tasks, and workflow using an example classification problem based on the K-nearest neighbors method, and implemented using the scikit-learn library.
Graded: Module 1 Quiz
Graded: Assignment 1 Submission
Module 2: Supervised Machine Learning – Part 1
This module delves into a wider variety of supervised learning methods for both classification and regression, learning about the connection between model complexity and generalization performance, the importance of proper feature scaling, and how to control model complexity by applying techniques like regularization to avoid overfitting. In addition to k-nearest neighbors, this week covers linear regression (least-squares, ridge, lasso, and polynomial regression), logistic regression, support vector machines, the use of cross-validation for model evaluation, and decision trees.
Graded: Module 2 Quiz
Graded: Assignment 2 Submission
Module 3: Evaluation
This module covers evaluation and model selection methods that you can use to help understand and optimize the performance of your machine learning models.
Graded: Module 3 Quiz
Graded: Assignment 3 Submission
Module 4: Supervised Machine Learning – Part 2
This module covers more advanced supervised learning methods that include ensembles of trees (random forests, gradient boosted trees), and neural networks (with an optional summary on deep learning). You will also learn about the critical problem of data leakage in machine learning and how to detect and avoid it.
Graded: Module 4 Quiz
Graded: Assignment 4 Submission
ENROLL IN COURSE