Mathematical Foundation for Data Science (MFDS)

Data science requires a sound understanding of some basic mathematics. This module teaches the basic math that learners will need to succeed in later modules. It is designed for learners who have basic math skills but may not have taken algebra or pre-calculus. MFDS introduces the core math on which data science is built, without extra complexity, presenting unfamiliar ideas and math symbols one at a time. Learners who complete this module will master the vocabulary, notation, concepts, and algebra rules that all data scientists must know before moving on to more advanced modules. This includes the fundamental concepts of graph theory that are useful for social network analysis and for path analysis in general.

Prerequisites: None

Objectives/Contents:

  1. The course will introduce students to the fundamental mathematical concepts required for a program in data science.
  2. Basics of Data Science: Introduction; Typology of problems; Importance of linear algebra and optimization from a data science perspective; Structured thinking for solving data science problems.
  3. Linear Algebra: Matrices and their properties (determinants, traces, rank, nullity, etc.); Eigenvalues and eigenvectors; Matrix factorizations; Inner products; Distance measures; Projections; Notion of hyperplanes; half-planes.
  4. Optimization: Unconstrained optimization; Necessary and sufficient conditions for optima; Gradient descent methods; Constrained optimization, KKT conditions; Introduction to non-gradient techniques; Introduction to least squares optimization; Optimization view of machine learning.
  5. Introduction to some basic Data Science Methods: Linear regression as an exemplar function approximation problem; Linear classification problems.
  6. Upon completion of the module, trainees should be able to answer most of the “why” and “what-if” questions in machine learning formulations. In other words, they should gain a good philosophical understanding of machine learning concepts.
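
As a small illustration of items 4 and 5, linear regression can be posed as a least squares problem: choose the slope and intercept that minimize the sum of squared residuals. The sketch below uses the closed-form solution for a single input variable. The function name `fit_line` and the sample data are assumptions made for illustration only and are not part of the module material.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Setting the gradient of sum((y - a*x - b)**2) to zero gives these formulas.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```

Because the sample points lie exactly on a line, the fit recovers that line. With noisy data the same formulas return the line of best fit in the least squares sense.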

References:

  1. Anton, H., & Rorres, C. (2013). Elementary Linear Algebra: Applications Version. John Wiley & Sons.
  2. Thomas, G. (2018). Mathematics for Machine Learning. University of California, Berkeley.
  3. Michaels, J. G., & Rosen, K. H. (Eds.). (1991). Applications of Discrete Mathematics (Vol. 267). New York: McGraw-Hill.
  4. Solomon, J. (2015). Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics. AK Peters/CRC Press.
  5. Yang, X. S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.
  6. Simovici, D. (2018). Mathematical Analysis for Machine Learning and Data Mining. World Scientific Publishing Co., Inc.

Topic ID | Topic Title | Lessons

MFDS1 | Basic Calculus
– Set theory and logic
– Functions (transcendental functions such as logarithm and exponential, polynomial functions, and other functions commonly used in data transformation)
– Metrics (topology) and similarity
– Basic geometry and theorems, trigonometric identities
– Limits, sequences, and series
– Derivatives (including the Jacobian and Hessian)
– Function optimization (Lagrange)
– Basic definite integrals

MFDS2 | Linear Algebra
– Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant
– Inner and outer products, matrix multiplication rule and various algorithms, matrix inverse
– Special matrices: square matrix, identity matrix, triangular matrix, sparse and dense matrices, unit vectors, symmetric matrices
– Matrix factorization concept/LU decomposition, Gaussian/Gauss-Jordan elimination, solving the linear system of equations Ax = b
– Vector space, basis, span, orthogonality, orthonormality, linear least squares
– Eigenvalues, eigenvectors, diagonalization, singular value decomposition
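
To make the "Gaussian elimination, solving Ax = b" lesson concrete, here is a minimal sketch of Gaussian elimination with partial pivoting followed by back substitution. The function name `solve_linear_system`, the singularity threshold, and the 2×2 example are assumptions chosen for illustration. A production setting would normally rely on a numerical library instead.

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [A | b] on copies so the inputs are untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot slot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # illustrative threshold
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


# Hypothetical system: 2x + y = 3 and x + 3y = 5.
A = [[2, 1], [1, 3]]
b = [3, 5]
x = solve_linear_system(A, b)
```

Substituting the returned values back into both equations is a quick way to check the result by hand.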

MFDS3 | MFDS Grouped Discussions
– Recap discussions of MFDS1 & 2

MFDS3 | Combinatorics & Graph Theory
– Sets, subsets, power sets
– Counting functions, combinatorics, countability
– Basic proof techniques: induction, proof by contradiction
– Basics of inductive, deductive, and propositional logic
– Basic data structures: stacks, queues, graphs, arrays, hash tables, trees
– Graph properties: connected components, degree, maximum flow/minimum cut concepts, graph coloring
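
Several of the lessons above meet in one short routine: finding connected components with breadth-first search uses a queue, a hash table of adjacency lists, and sets. The following is a sketch under the assumption of an undirected graph in which every vertex appears as a key. The name `connected_components` and the sample graph are hypothetical.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex collects one component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            for neighbor in adjacency[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical undirected graph stored as a hash table of adjacency lists.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
    "f": [],
}
components = connected_components(graph)
# The degree of a vertex is the number of edges incident to it.
degrees = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
```

In a social network setting, each component would correspond to a group of people connected to one another through some chain of links.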

MFDS4 | Operations Research 1 (continuous variables)
– General overview of Operations Research, modelling, and optimization
– Unconstrained optimization:
  – One-dimensional search
  – Gradient methods
  – (Quasi-)Newton methods
  – Conjugate direction methods
  – Global search algorithms: simulated annealing, particle swarm, genetic algorithm
– Linear programming:
  – Simplex method
  – Duality
  – Non-simplex methods (brief)
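
As one example of the gradient methods listed under unconstrained optimization, the sketch below runs gradient descent with a fixed step size on a simple quadratic. The objective, the step size, the tolerance, and the names `gradient_descent` and `grad_f` are all assumptions picked so that the iteration converges. They are not prescribed by the module.

```python
def gradient_descent(grad, start, step=0.1, tol=1e-10, max_iter=10_000):
    """Minimize a differentiable function given its gradient, using a fixed step size."""
    point = list(start)
    for _ in range(max_iter):
        g = grad(point)
        # Stop once the gradient is numerically zero (first-order optimality).
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move against the gradient, the direction of steepest descent.
        point = [p - step * gi for p, gi in zip(point, g)]
    return point


# Hypothetical objective f(x, y) = (x - 1)^2 + 2 (y + 2)^2, minimized at (1, -2).
def grad_f(p):
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimum = gradient_descent(grad_f, start=[0.0, 0.0])
```

A fixed step works here because the objective is a well-conditioned quadratic. On harder problems the step is usually chosen by a one-dimensional search along the descent direction.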

MFDS5 | Operations Research 2 (integer and mixed programming)
– Integer linear programming
– Non-linear constrained optimization
– KKT conditions
– Semidefinite programming
– Penalty and barrier methods
– Multiobjective optimization

MFDS6 | Numerical Methods
– Error analysis (floating point arithmetic)
– Ill-conditioning (condition number)
– Basic computational statistics (refinement calculations on basic statistics)
– Finite differences
– Basic numerical integration
– Numerical linear algebra (eigenvalue problems, systems of linear equations, matrix decompositions)
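
Two of these lessons can be illustrated together: finite differences approximate a derivative with an error that depends on the step size, and floating point arithmetic adds rounding error on top. This is a minimal sketch in which the step size `h = 1e-5`, the test function `sin`, and the helper names are assumptions for illustration.

```python
import math


def forward_difference(f, x, h=1e-5):
    """First-order approximation of f'(x); truncation error is O(h)."""
    return (f(x + h) - f(x)) / h


def central_difference(f, x, h=1e-5):
    """Second-order approximation of f'(x); truncation error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


exact = math.cos(1.0)  # exact derivative of sin at x = 1
forward_error = abs(forward_difference(math.sin, 1.0) - exact)
central_error = abs(central_difference(math.sin, 1.0) - exact)

# Floating point arithmetic: 0.1 and 0.2 have no exact binary representation,
# so their sum differs from 0.3 by a small rounding error.
rounding_gap = abs((0.1 + 0.2) - 0.3)
# Comparing with a tolerance avoids being misled by that rounding error.
is_close = math.isclose(0.1 + 0.2, 0.3)
```

For the same step size, the central difference is noticeably more accurate than the forward difference. Shrinking the step too far eventually makes both worse, because rounding error in the subtraction starts to dominate.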

MFDS7 | Advanced Mathematics for Data Science
*** Recommended only for trainees who have already passed all CMs and EMs for DS with flying colors.
– Proposing a new visualization, based on a good knowledge of geometry, for a specific DA/DS need.
– Proposing a new metric/loss function for a complex data problem.
– Proposing a new model (not just an ensemble/hybrid/hierarchical one).
