
Supervised Learning – Classification Models (SLCM)

Classification can be performed on structured or unstructured data. It is a technique for assigning data to one of a given number of classes. The main goal of a classification problem is to identify the class (category) to which a new data point belongs.

In machine learning, classification is one of the most fundamental and exciting, yet challenging, problems. The implications of a competent classification model are enormous: such models are used for text classification in natural language processing, image recognition, data prediction, reinforcement learning, and many other applications. This module also offers state-of-the-art discussions of classification models, such as XGBoost for structured data and deep learning for unstructured data.

Prerequisites: SFDS, MFDS, GLM, EDA.

Objectives/Content:

Having successfully completed this module, trainees are expected to be able to:

  1. Understand and apply the concepts and methods underlying the analysis of classification problems, and the context for interpreting the results.
  2. Find the best model for a given problem and its optimal parameters.
  3. Understand the theoretical bases of different classification methods.

References:

  1. Aggarwal, C. C. (2015). Data Mining: The Textbook. Springer.
  2. Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A. (1997). Discovering Data Mining: From Concept to Implementation. IBM.
  3. Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery. AI Magazine, 17, 37-54.
  4. Berry, M. J. A., & Linoff, G. S. (2004). Data Mining Techniques. Wiley Publishing, Inc., Indianapolis. xxiii + 615 pp.
  5. Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of Data Mining. MIT Press, Cambridge, Massachusetts. xxvii + 467 pp.
  6. Hornick, M. F., Marcade, E., & Vankayala, S. (2007). Java Data Mining: Strategy, Standard, and Practice. Morgan Kaufmann, San Francisco. xxi + 519 pp.
  7. Tang, Z., & MacLennan, J. (2005). Data Mining with SQL Server 2005. Wiley Publishing, Inc., Indianapolis. xvii + 435 pp.
  8. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  9. Yang, X. S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.
  10. Simovici, D. (2018). Mathematical Analysis for Machine Learning and Data Mining. World Scientific Publishing Co., Inc.
  11. Zheng, A. (2015). Evaluating Machine Learning Models: A Beginner’s Guide to Key Concepts and Pitfalls. O’Reilly Media.
  12. Mitchell, T. M. (1997). Machine Learning. McGraw Hill, Burr Ridge, IL.
  13. Brownlee, J. (2016). A Gentle Introduction to XGBoost for Applied Machine Learning. Machine Learning Mastery.
  14. Ketkar, N. (2017). Deep Learning with Python. Apress. https://doi.org/10.1007/978-1-4842-2766-4
Topics and Lessons:

SLCM1: Introduction to Classification Methods
– Introduction to classification problems
– Inductive bias and consistent learning
– Evaluation metrics (ROC-AUC, lift, precision, recall, error types, F-beta scores, NMI, Rand index, micro- and macro-averaged metrics, etc.)
– Best practices for data labelling
– Dealing with new categories and changes in data distribution

SLCM2: Naive Bayes Classifier
– Model introduction
– Assumptions
– Parameter estimation
– Visualizations
– Interpretations
– Modifications
– Case studies

SLCM3: k-Nearest Neighbour
– Model introduction
– Assumptions
– Parameter estimation
– Visualizations
– Interpretations
– Modifications
– Case studies

SLCM4: Decision Tree
– Model introduction
– Assumptions
– Parameter estimation
– Visualizations
– Interpretations
– Modifications
– Case studies

SLCM5: Support Vector Machines
– Model introduction
– Assumptions
– Parameter estimation
– Visualizations
– Interpretations
– Modifications
– Case studies

SLCM6: Neural Network Models
– Model introduction
– Assumptions
– Parameter estimation
– Visualizations
– Interpretations
– Modifications
– Case studies

SLCM7: SLCM Group Discussion
– Recap discussion of SLCM1 to SLCM6

SLCM8: State-of-the-Art Classification Models
– XGBoost (structured data)
– Deep learning (unstructured data)
– Capstone project (report + presentation)

SLCM9: Advanced Classification Problems (Case Studies)
– High-dimensional data classification problems
– Ensemble/hybrid classification problems (bagging, boosting, blending, etc.)
– Imbalanced learning
– Rare-case classification
– Fine-grained classification problems
– Multilabel classification
– Multimodal classification