Foundations of Algorithms, Data Structures, and Programming (ADSP)

Data scientists should be able to implement and understand algorithms for data collection and analysis. They should understand the time and space considerations of algorithms. They should follow good design principles when developing software and understand the importance of those principles for testability and maintainability.

With regard to algorithms, a data scientist should recognize that the choice of algorithm affects the time and space required to solve a problem. A data scientist should also be familiar with a range of algorithmic techniques in order to select the appropriate one in a given situation. In terms of programming, a data scientist should be able to develop and implement algorithms, as well as integrate existing software and/or tools. Finally, a data scientist should know a variety of data structures, be able to use them, and understand the implications of choosing one over another.
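
As a small illustration of how the choice of algorithm changes the time required, the sketch below compares a linear scan with a binary search over the same sorted list. The function names and the sample data are hypothetical and chosen only for this example.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Scan every element in order: O(n) time, O(1) extra space."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range at each step: O(log n) time.

    Assumes the input is already sorted in ascending order.
    """
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


# Hypothetical sample data: sorted even numbers below one million.
data = list(range(0, 1_000_000, 2))

# Both functions return the same position; only the work done differs.
print(linear_search(data, 999_998))  # 499999
print(binary_search(data, 999_998))  # 499999
print(binary_search(data, 7))        # -1 (odd numbers are not in the list)
```

The linear scan inspects about half a million elements in the worst case, while the binary search needs roughly twenty comparisons, at the price of requiring sorted input.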

Prerequisites: None, except that ADSP4 and ADSP5 require some lessons from DFDS and MFDS

Objectives:

  1. Analyze the differences between iterative and recursive algorithms
  2. Implement an efficient search algorithm to find a target with certain characteristics
  3. Provide the big-O time and space complexity of a given procedure
  4. Evaluate the best-, average-, and worst-case behavior of an algorithm
  5. Apply an appropriate algorithmic approach to a given problem
  6. Determine which technique is more appropriate for a given scenario
  7. Compare various data structures for a given problem, such as array, list, set, map, stack, queue, hash table, tree, and graph
  8. Compare the trade-offs of different representations of a matrix and common operations such as addition, subtraction, and multiplication
  9. Recognize the data structures returned by script-based subroutines
  10. Evaluate how efficient a data structure is for insert, remove, and access operations
  11. Design an algorithm in a programming language to solve a simple problem
  12. Use decomposition techniques to modularize a program
  13. Create code in a programming language that includes primitive data types, references, variables, expressions, assignments, I/O, control structures, and functions
  14. Create a simple program that uses recursion
  15. Illustrate the use of databases and apply SQL and NoSQL
  16. Write a regular expression to match a pattern
  17. Use standard libraries for a given programming language
  18. Design and implement programs that use a database
  19. Use techniques for searching patterns in data
  20. Implement good documentation practices in programming
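
To make objectives 1 and 14 concrete, here is a minimal sketch that computes Fibonacci numbers both recursively and iteratively. The Fibonacci example and the function names are assumptions chosen for illustration; the objectives themselves do not prescribe a particular problem.

```python
def fib_recursive(n):
    """Naive recursion: exponential time, call-stack depth proportional to n."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Loop with two running values: O(n) time, O(1) extra space."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


# Both versions agree on the first ten values.
print([fib_recursive(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print([fib_iterative(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The recursive version mirrors the mathematical definition but recomputes the same subproblems many times, whereas the iterative version does a single pass. Comparing the two is one way to practice the time and space analysis named in objectives 3 and 4.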

References:

  1. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms. MIT Press.
  2. Zelle, J. M. (2004). Python programming: An introduction to computer science. Franklin, Beedle & Associates, Inc.
  3. Peter, CV (2019). Algorithms Notes for Professionals. GoalKicker.
  4. Barendregt, H. P. (1984). The lambda calculus (Vol. 3). Amsterdam: North-Holland.
  5. Yang, X. S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.

Topic ID: ADSP1
Topic Title: Introduction to Algorithms, Python & Basic Data Structures 1
Lessons:
– Algorithm representation and basic logic
– Basic and special Python syntax (I/O, for, if, while, try, etc.)
– Python basic data structures (list, set, dictionary, etc.)
– Python strings
– Basic Python modules (numpy, sys, collections, math, os, itertools)
– Functions and lambdas
– Args and kwargs
– Personal library to improve data analysis

Topic ID: ADSP2
Topic Title: Introduction to Algorithms, Python & Basic Data Structures 2
Lessons:
– PEP 8 / Pylint: writing proper Python code
– Proper Python documentation
– Hash functions
– Comprehensions
– Function decorators
– Random number generators
– Tests: doctest, unittest, pytest
– Basic GitHub usage

Topic ID: ADSP3
Topic Title: Introduction to Algorithms, Python & Basic Data Structures 3
Lessons:
– Import and export of plain data: JSON, CSV, XLS, sparse data, compressed files
– Parsing JSON from a (relational) database
– Import and export from databases (SQL and NoSQL)
– More data structures: matrix, array, DataFrame/Series, JSON, datetime, etc.
– DataFrame fundamentals (some basic preprocessing, statistics, and visualizations)
– Basic JIT
– Basic parallel programming (embarrassingly parallel)

Topic ID: ADSP4
Topic Title: Advanced Jupyter Notebook
Lessons:
– Jupyter Notebook vs. terminal
– Magic functions
– Notebook interactivity
– Markdown
– LaTeX
– Add-ons: outline, dashboard, presentation mode, etc.
– Limitations of some modules: Pandas, scikit, etc.

Topic ID: ADSP5
Topic Title: Asymptotic Notations & Applications
Lessons:
– Big O and Theta notation applied to real data analyst/scientist cases
– Recursive functions and their solutions
– Analyzing memory usage and performance in data structure and algorithm design
– Analyzing program scalability
– Choosing an optimal algorithm
– Finding bottlenecks in a long process and suggesting improvements

Topic ID: ADSP6
Topic Title: Advanced Algorithms and Data Structures
Lessons:
– Advanced data structures: class, memmap, etc.
– Tree and graph data structures
– Greedy and dynamic programming
– Longest Common Subsequence and Longest Increasing Subsequence
– Dynamic Time Warping

Topic ID: ADSP7
Topic Title: Advanced Python
Lessons:
– Encoding and decoding of strings
– Basic string manipulation, Beautiful Soup, and regex
– OOP: classes, inheritance, metaclasses, abstract classes
– Advanced JIT (Numba) and compiled Python (Cython)

Topic ID: ADSP8
Topic Title: Python Concurrency & Parallel Programming
Lessons:
– Concurrency in more depth
– Asyncio
– Multithreading
– Global Interpreter Lock
– Parallel MapReduce in Python

Topic ID: ADSP9
Topic Title: Introduction to PySpark and Big Data Processing
Lessons:
– Functional programming (lambda functions)
– PySpark API and data structures
– Running PySpark programs
– Combining PySpark with other tools

Topic ID: ADSP10
Topic Title: Designing and Implementing Data Science Models
Lessons:
* Requires completion of the fundamental topics in CMs.
– Case study on a data challenge that cannot be solved using a conventional approach or an existing module
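
As an illustration of two ADSP2 lessons listed above (comprehensions and function decorators), together with the ADSP5 lesson on finding bottlenecks, the following is a minimal sketch of a timing decorator. The names `timed`, `last_elapsed`, and `squares_of_evens` are hypothetical and not part of any course material.

```python
import functools
import time


def timed(func):
    """Decorator that records the wall-clock duration of the latest call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        # Store the duration on the wrapper so callers can inspect it later.
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None  # no call has been timed yet
    return wrapper


@timed
def squares_of_evens(limit):
    """Return the squares of the even numbers below `limit`."""
    return [n * n for n in range(limit) if n % 2 == 0]


print(squares_of_evens(10))  # [0, 4, 16, 36, 64]
print(f"last call took {squares_of_evens.last_elapsed:.6f} s")
```

Wrapping several candidate functions with the same decorator is one simple way to see which step of a longer process dominates the running time. For finer-grained measurements, a profiler such as the standard-library `cProfile` module is usually a better fit.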