Machine Learning Models

Modeling an epidemic | 00:08:00
The machine learning recipe | 00:06:00
The components of a machine learning model | 00:02:00
Why model? | 00:03:00
On assumptions, and can we get rid of them? | 00:09:00
The case of AlphaZero | 00:11:00
Overfitting/underfitting/bias/variance | 00:11:00
Why use machine learning? | 00:05:00

Linear regression

The InsureMe challenge | 00:06:00
Supervised learning | 00:05:00
Linear assumption | 00:03:00
Linear regression template | 00:07:00
Non-linear vs proportional vs linear | 00:05:00
Linear regression template revisited | 00:04:00
Loss function | 00:03:00
Training algorithm | 00:08:00
Code time | 00:15:00
R squared | 00:06:00
Why use a linear model? | 00:04:00

Scaling and Pipelines

Introduction to scaling | 00:06:00
Min-max scaling | 00:03:00
Code time (min-max scaling) | 00:09:00
The problem with min-max scaling | 00:03:00
What’s your IQ? | 00:11:00
Standard scaling | 00:04:00
Code time (standard scaling) | 00:02:00
Model before and after scaling | 00:05:00
Inference time | 00:07:00
Pipelines | 00:03:00
Code time (pipelines) | 00:05:00

Regularization

Spurious correlations | 00:04:00
L2 regularization | 00:10:00
Code time (L2 regularization) | 00:05:00
L2 results | 00:02:00
L1 regularization | 00:06:00
Code time (L1 regularization) | 00:04:00
L1 results | 00:02:00
Why does L1 encourage zeros? | 00:09:00
L1 vs L2: Which one is best? | 00:01:00

Validation

Introduction to validation | 00:02:00
Why not evaluate model on training data | 00:06:00
The validation set | 00:05:00
Code time (validation set) | 00:08:00
Error curves | 00:08:00
Model selection | 00:06:00
The problem with model selection | 00:06:00
Tainted validation set | 00:05:00
Monkeys with typewriters | 00:03:00
My own validation epic fail | 00:07:00
The test set | 00:06:00
What if the model doesn’t pass the test? | 00:05:00
How not to be fooled by randomness | 00:02:00
Cross-validation | 00:04:00
Code time (cross-validation) | 00:07:00
Cross-validation results summary | 00:02:00
AutoML | 00:05:00
Is AutoML a good idea? | 00:05:00
Red flags: Don’t do this! | 00:07:00
Red flags summary and what to do instead | 00:05:00
Your job as a data scientist | 00:03:00

Common Mistakes

Intro and recap | 00:02:00
Mistake #1: Data leakage | 00:05:00
The golden rule | 00:04:00
Helpful trick (feature importance) | 00:02:00
Real example of data leakage (part 1) | 00:05:00
Real example of data leakage (part 2) | 00:05:00
Another (funny) example of data leakage | 00:02:00
Mistake #2: Random split of dependent data | 00:05:00
Another example (insurance data) | 00:05:00
Mistake #3: Look-Ahead Bias | 00:06:00
Example solutions to Look-Ahead Bias | 00:02:00
Consequences of Look-Ahead Bias | 00:02:00
How to split data to avoid Look-Ahead Bias | 00:03:00
Cross-validation with temporally related data | 00:03:00
Mistake #4: Building a model for one thing, using it for something else | 00:04:00
Sketchy rationale | 00:06:00
Why this matters for your career and job search | 00:04:00

Classification - Part 1: Logistic Model

Classifying images of handwritten digits | 00:07:00
Why the usual regression doesn’t work | 00:04:00
Machine learning recipe recap | 00:02:00
Logistic model template (binary) | 00:13:00
Decision function and boundary (binary) | 00:05:00
Logistic model template (multiclass) | 00:14:00
Decision function and boundary (multiclass) | 00:01:00
Summary: binary vs multiclass | 00:01:00
Code time! | 00:20:00
Why the logistic model is often called logistic regression | 00:05:00
One vs Rest, One vs One | 00:05:00

Classification - Part 2: Maximum Likelihood Estimation

Where we’re at | 00:02:00
Brier score and why it doesn’t work | 00:06:00
The likelihood function | 00:11:00
Optimization task and numerical stability | 00:03:00
Let’s improve the loss function | 00:09:00
Loss value examples | 00:05:00
Adding regularization | 00:02:00
Binary cross-entropy loss | 00:03:00

Classification - Part 3: Gradient Descent

Recap | 00:03:00
No closed-form solution | 00:02:00
Naive algorithm | 00:04:00
Fog analogy | 00:05:00
Gradient descent overview | 00:03:00
The gradient | 00:06:00
Numerical calculation | 00:02:00
Parameter update | 00:04:00
Convergence | 00:03:00
Analytical solution | 00:03:00
[Optional] Interpreting analytical solution | 00:05:00
Gradient descent conditions | 00:03:00
Beyond vanilla gradient descent | 00:03:00
Code time | 00:07:00
Reading the documentation | 00:11:00

Classification metrics and class imbalance

Binary classification and class imbalance | 00:06:00
Assessing performance | 00:04:00
Accuracy | 00:07:00
Accuracy with different class importance | 00:04:00
Precision and Recall | 00:07:00
Sensitivity and Specificity | 00:03:00
F-measure and other combined metrics | 00:05:00
ROC curve | 00:07:00
Area under the ROC curve | 00:06:00
Custom metric (important stuff!) | 00:06:00
Other custom metrics | 00:03:00
Bad data science process | 00:04:00
Data rebalancing (avoid doing this!) | 00:06:00
Stratified split | 00:03:00

Neural Networks

The inverted MNIST dataset | 00:04:00
The problem with linear models | 00:05:00
Neurons | 00:03:00
Multi-layer perceptron (MLP) for binary classification | 00:05:00
MLP for regression | 00:02:00
MLP for multi-class classification | 00:01:00
Hidden layers | 00:01:00
Activation functions | 00:03:00
Decision boundary | 00:01:00
Loss function | 00:03:00
Intro to neural network training | 00:03:00
Parameter initialization | 00:03:00
Saturation | 00:05:00
Non-convexity | 00:04:00
Stochastic gradient descent (SGD) | 00:05:00
More on SGD | 00:07:00
Code time! | 00:13:00
Backpropagation | 00:11:00
The problem with MLPs | 00:04:00
Deep learning | 00:09:00

Tree-Based Models

Decision trees | 00:04:00
Building decision trees | 00:09:00
Stopping tree growth | 00:03:00
Pros and cons of decision trees | 00:08:00
Decision trees for classification | 00:07:00
Decision boundary | 00:01:00
Bagging | 00:04:00
Random forests | 00:06:00
Gradient-boosted trees for regression | 00:07:00
Gradient-boosted trees for classification [optional] | 00:04:00
How to use gradient-boosted trees | 00:03:00

k-NN and SVM

Nearest neighbor classification | 00:03:00
K nearest neighbors | 00:03:00
Disadvantages of k-NN | 00:04:00
Recommendation systems (collaborative filtering) | 00:03:00
Introduction to Support Vector Machines (SVMs) | 00:05:00
Maximum margin | 00:02:00
Soft margin | 00:02:00
SVM vs Logistic Model (support vectors) | 00:03:00
Alternative SVM formulation | 00:06:00
Dot product | 00:02:00
Non-linearly separable data | 00:03:00
Kernel trick (polynomial) | 00:10:00
RBF kernel | 00:02:00
SVM remarks | 00:06:00

Unsupervised Learning

Intro to unsupervised learning | 00:01:00
Clustering | 00:03:00
K-means clustering | 00:10:00
K-means application example | 00:03:00
Elbow method | 00:02:00
Clustering remarks | 00:07:00
Intro to dimensionality reduction | 00:05:00
PCA (principal component analysis) | 00:08:00
PCA remarks | 00:03:00
Code time (PCA) | 00:13:00

Feature Engineering

Missing data | 00:02:00
Imputation | 00:04:00
Imputer within pipeline | 00:04:00
One-Hot encoding | 00:05:00
Ordinal encoding | 00:03:00
How to combine pipelines | 00:04:00
Code sample | 00:08:00
Feature Engineering | 00:07:00
Features for Natural Language Processing (NLP) | 00:11:00
Anatomy of a Data Science Project | 00:01:00