While linear regression is widely used to gain insight into the relationship between a target variable and the factors associated with it, there are conditions under which linear regression is not suitable. For example, when the target is count data, or a categorical variable representing classes of interest, a generalized linear model (GLM) should be considered instead. GLMs can also be used to study the interrelationships among categorical variables with many levels, simplifying the analysis and identifying significant relationships that may exist among them.
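To make the count-data case concrete, the sketch below fits the same toy counts with ordinary least squares and with a Poisson GLM (log link) estimated by iteratively reweighted least squares (IRLS). It uses only the Python standard library. The function names `fit_ols` and `fit_poisson_irls` and the data are hypothetical, chosen for illustration and not taken from the module materials. The point is qualitative: the straight-line fit can produce a negative fitted mean count, whereas the log link keeps every fitted mean positive.

```python
import math


def fit_ols(x, y):
    """Ordinary least squares fit of y = b0 + b1*x (closed form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_poisson_irls(x, y, max_iter=25, tol=1e-10):
    """Poisson GLM with log link, log(E[y]) = b0 + b1*x, fitted by IRLS."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        # Accumulate the weighted normal equations (X^T W X) beta = X^T W z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)        # inverse of the log link
            w = mu                    # (d mu/d eta)^2 / Var(Y) = mu^2 / mu
            z = eta + (yi - mu) / mu  # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares system for the new coefficients.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1


# Hypothetical toy counts, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 1, 2, 4, 7, 12]

ols_b0, ols_b1 = fit_ols(x, y)
pois_b0, pois_b1 = fit_poisson_irls(x, y)

print(f"OLS fitted mean at x=0:     {ols_b0:.3f}")            # negative: impossible for a count
print(f"Poisson fitted mean at x=0: {math.exp(pois_b0):.3f}")  # always positive
```

In practice a statistics library would normally be used instead of a hand-written IRLS loop; the loop is shown only to expose where the link function (through `mu = exp(eta)`) and the Poisson variance (through the weight `w = mu`) enter the fit.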

**Prerequisites:** MFDS, SFDS, EDA

**Objectives/Content:**

Upon completing this module, participants are expected to achieve the following learning outcomes:

- Participants can identify the problems and common pitfalls that arise from using linear regression in situations where a generalized linear model is required.
- Participants can propose transformations that linearize a non-linear problem and thereby simplify the model.
- Participants can determine which model is appropriate for a given dataset and set of circumstances.
- Participants can relate the generalized linear model to the linear model.
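For orientation on the last two outcomes, the standard GLM formulation (as presented by McCullagh and Nelder) can be summarized as follows; the notation is one common convention, not necessarily the one used in the lessons.

$$
\begin{aligned}
&\text{Random component:} && Y_i \text{ from an exponential-family distribution with mean } \mu_i = \mathbb{E}[Y_i] \\
&\text{Linear predictor:} && \eta_i = \mathbf{x}_i^\top \boldsymbol{\beta} \\
&\text{Link function:} && g(\mu_i) = \eta_i
\end{aligned}
$$

Familiar models are special cases:

- Linear model: normal $Y_i$ with constant variance and identity link, $g(\mu) = \mu$.
- Logistic regression: Bernoulli $Y_i$ with logit link, $g(\mu) = \log\{\mu / (1 - \mu)\}$.
- Probit model: Bernoulli $Y_i$ with $g(\mu) = \Phi^{-1}(\mu)$, where $\Phi$ is the standard normal CDF.
- Poisson log-linear model: Poisson $Y_i$ with log link, $g(\mu) = \log \mu$.

A transformation of the response and a link function are not interchangeable: regressing $\log Y_i$ on $\mathbf{x}_i$ models $\mathbb{E}[\log Y_i]$, whereas a log link models $\log \mathbb{E}[Y_i]$. By Jensen's inequality $\mathbb{E}[\log Y_i] \le \log \mathbb{E}[Y_i]$, and the transformed response is undefined for zero counts.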

**Evaluations/Assignments:**

- For each model covered, trainees will be given a dataset to process and analyze, and must submit a short report on which the evaluation will be based. The evaluation covers the model-building process and the insights derived from the results.
- At the end of the fundamental lessons in this module, trainees will be given a dataset and the metadata (story) behind it. Trainees then form teams and apply GLMs to gain insights from the data. The evaluation is based on the report and the presentation of the findings. The case study can be taken from a real dataset from the trainee's division/department or from any other source, such as *Kaggle*.
- Online quizzes on the eLearning platform.

**References:**

- G.A.F. Seber and A.J. Lee, *Linear Regression Analysis* (2nd Ed.), 2003, John Wiley & Sons.
- P. McCullagh and J.A. Nelder, *Generalized Linear Models* (2nd Ed.), 1989, Chapman & Hall.

| Topic ID | Topic Title | Lessons |
|---|---|---|
| GLM1 | Introduction to GLM | General form of the generalized linear model; link functions and how they are used in model building; examples of data suitable for analysis with GLMs; assumptions and treatments |
| GLM4 | Log-linear model | Model building, evaluation, and interpretation |
| GLM3 | Data Partitioning & Model Validation | Best practices for validating classification models (overfitting vs. underfitting, bias-variance trade-off); cross-validation and nested cross-validation; leave-p-out (LPOCV) and leave-one-out (LOOCV)/jackknife; holdout; subsampling; bootstrap; cross-validation for time-series models |
| GLM2 | Logistic regression | Binary and multi-class (including ordinal) logistic regression; model building and evaluation; odds, odds ratio, risk, and probability |
| … | | |
| GLM6 | Probit, tobit, and normit models | Examples of conditions when these models are required; model building, evaluation, and interpretation; binary and multi-class classification problems |
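Several of the lessons listed (link functions in GLM1, odds and risk in GLM2, the probit model in GLM6) rest on a few small conversions. The standard-library Python sketch below shows the logit and probit links with their inverses, and how odds, odds ratios, and risk ratios follow from probabilities. The helper names and the coefficient values `b0`, `b1` are hypothetical, chosen only for illustration; this is not the module's reference implementation. `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution


def logit(p):
    """Logit link: the log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit (logistic function): maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """Probit link: the standard normal quantile of p."""
    return _STD_NORMAL.inv_cdf(p)


def inv_probit(eta):
    """Inverse probit: the standard normal CDF of the linear predictor."""
    return _STD_NORMAL.cdf(eta)


def odds(p):
    """Odds corresponding to a probability (risk) p."""
    return p / (1.0 - p)


def odds_ratio(p1, p0):
    """Ratio of the odds at risk p1 to the odds at risk p0."""
    return odds(p1) / odds(p0)


# Hypothetical logistic-regression coefficients, not estimated from real data.
b0, b1 = -1.0, 0.5

p_at_2 = inv_logit(b0 + b1 * 2)  # probability (risk) when x = 2
p_at_3 = inv_logit(b0 + b1 * 3)  # probability (risk) when x = 3

print(odds_ratio(p_at_3, p_at_2), math.exp(b1))  # equal up to floating-point rounding
print(p_at_3 / p_at_2)                           # risk ratio: not the same as the odds ratio
print(inv_logit(0.5), inv_probit(0.5))           # same linear predictor, different links
```

Under the logistic model, `exp(b1)` is the odds ratio for a one-unit increase in `x` whatever the starting value of `x`, whereas the risk ratio changes with the starting point; this is one reason the two quantities are taught separately.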