•
In the previous section you were introduced to the basics of machine learning, including simple linear regression and regularized linear regression models.
•
Linear methods are the most widely used models in finance, and in this section you will investigate more types of linear models.
•
After that we will look into non-linear regression models such as k-nearest neighbors (KNN) regression.
◦
We will not yet look at neural networks here; they are left for another lecture.
◦
We will not investigate tree and rule-based models here; these are addressed in the classification section.
•
Many models can be used for both classification and regression tasks, and moving from one to the other sometimes requires just one simple mathematical switch.
•
The purpose of this section is to give you a sense of how many different forms of regression models are available to us.
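To make the KNN idea and the "one switch" remark above concrete, here is a minimal sketch in plain Python. It is not the course's reference implementation: the function names, the toy `(age, bmi)` data, and the choice of `k=3` are all assumptions made for illustration. The same neighbor search feeds both tasks; regression averages the neighbors' labels, while classification takes a majority vote.

```python
import math
from collections import Counter


def nearest_labels(train_X, train_y, query, k):
    """Return the labels of the k training points closest to `query`."""
    # Rank training points by Euclidean distance to the query point
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return [label for _, label in ranked[:k]]


def knn_regress(train_X, train_y, query, k=3):
    # Regression: average the numeric labels of the k nearest neighbors
    labels = nearest_labels(train_X, train_y, query, k)
    return sum(labels) / len(labels)


def knn_classify(train_X, train_y, query, k=3):
    # Classification: the only change is a majority vote instead of a mean
    labels = nearest_labels(train_X, train_y, query, k)
    return Counter(labels).most_common(1)[0][0]


# Toy data, made up for illustration: (age, bmi) per client.
# In practice the features would usually be scaled before computing distances.
train_X = [(25, 22.0), (30, 24.0), (45, 31.0), (50, 33.0), (60, 35.0)]
charges = [2000.0, 2500.0, 8000.0, 9000.0, 12000.0]   # regression target
risk = ["low", "low", "high", "high", "high"]         # classification target

query = (48, 32.0)
predicted_charge = knn_regress(train_X, charges, query, k=3)
predicted_risk = knn_classify(train_X, risk, query, k=3)
print(predicted_charge, predicted_risk)
```

In a real notebook you would more likely reach for a library implementation; the hand-written version is only meant to show how little separates the two tasks.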
Linear Model Innovations | Purpose | Difficulty |
Generalized Linear Model | Model to predict targets that follow other distributions (counts, categories, survival times, skewed values) rather than only continuous, normally distributed values. | Medium |
Augmented Linear Model | Model with a preprocessing step to generate or transform features and labels to improve the fit of the model. | Easy |
Penalized Augmentation Model | An automated method to augment the data as part of the modelling step while penalizing excessive feature creation. | Hard |
Sampled Linear Models | A method to adjust the importance of different samples (instances) based on some criterion, e.g., old data should be less important than recent data. | Medium |
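As one possible reading of the "Sampled Linear Models" row, the sketch below fits a one-feature linear regression by weighted least squares, with weights that decay exponentially with the age of each observation. The function name, the made-up data, and the `half_life` parameter are assumptions for illustration; other weighting schemes would fit the table's description equally well.

```python
def weighted_linear_fit(x, y, weights):
    """Fit y = intercept + slope * x by weighted least squares (one feature)."""
    total = sum(weights)
    # Weighted means of the feature and the label
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total
    # Weighted covariance and variance give the closed-form slope
    cov = sum(w * (xi - x_mean) * (yi - y_mean)
              for w, xi, yi in zip(weights, x, y))
    var = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up series: the relationship steepens in the most recent observations
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 1, 2, 3, 6, 9, 12, 15]
ages = [7, 6, 5, 4, 3, 2, 1, 0]          # 0 = most recent observation

half_life = 2.0                           # assumed: weight halves every 2 periods
decay_weights = [0.5 ** (age / half_life) for age in ages]

equal_slope, _ = weighted_linear_fit(x, y, [1.0] * len(x))
recent_slope, _ = weighted_linear_fit(x, y, decay_weights)
print(equal_slope, recent_slope)
```

With equal weights the fit reduces to ordinary least squares; with the decaying weights the slope moves toward the steeper recent trend, which is the effect the table describes.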
Gallery
Insurance Model (Jupyter Notebook)
•
Due Date: Mar 21, 2023 11:59 PM
•
Submit as IPYNB, Markdown, or PDF (code + visualizations)
Response to Student Questions.
Note: I don’t expect you to do everything perfectly; this is your first coding assignment, and I won’t mark it too strictly.
You don’t have to write a lot of code, just enough to tell a story (perhaps 40 to 80 lines, though you can go above or below; it is up to you). Remember to look at some examples from the course and on Kaggle.
Assignment
Develop and compare six different machine learning regression models to predict clients’ insurance charges:
1.
You can use any programming language, but the final output should contain the code plus the visualizations.
2.
As always, you will be graded on making interesting and creative observations.
3.
You can take inspiration from code that is already available online, such as that on Kaggle.
K. Insurance Model
Linear