Data scientists should be able to implement and understand algorithms for data collection and analysis. They should understand the time and space considerations of algorithms. They should follow good design principles when developing software, understanding the importance of those principles for testability and maintainability.
With regard to algorithms, a data scientist should recognize that the choice of algorithm affects the time and space required to solve a problem. A data scientist should also be familiar with a range of algorithmic techniques in order to select the appropriate one in a given situation. In terms of programming, a data scientist should be able to develop and implement algorithms, as well as integrate existing software and/or tools. Finally, a data scientist should know a variety of data structures, be able to use them, and understand the implications of choosing one over another.
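As a small illustration of how the choice of data structure affects cost, the sketch below counts how many query values appear in a collection, once with a Python list (linear scan per lookup) and once with a set (hash lookup, constant time on average, at the price of extra memory). The function names, the data sizes, and the query range are hypothetical and chosen only for the example; actual timings depend on the machine.

```python
import timeit


def count_hits_list(targets, haystack_list):
    """Count targets found by scanning a list: O(n) per lookup."""
    return sum(1 for t in targets if t in haystack_list)


def count_hits_set(targets, haystack_set):
    """Count targets found by hashing into a set: O(1) average per lookup."""
    return sum(1 for t in targets if t in haystack_set)


# Hypothetical data: 20,000 integer IDs and 200 queries, half of which miss.
ids_list = list(range(20_000))
ids_set = set(ids_list)  # the hash table uses more memory than the list
queries = list(range(19_900, 20_100))

if __name__ == "__main__":
    t_list = timeit.timeit(lambda: count_hits_list(queries, ids_list), number=3)
    t_set = timeit.timeit(lambda: count_hits_set(queries, ids_set), number=3)
    print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```

Both functions return the same count; only the time and memory profile differs, which is the trade-off to weigh when picking one structure over another.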
Prerequisites: None, except that ADSP4 and ADSP5 require some lessons from DFDS and MFDS.
- Analyze the differences between iterative and recursive algorithms
- Implement an efficient search algorithm to find a target with certain characteristics
- Provide the big-O time and space complexity for a given procedure
- Evaluate best, average, and worst-case behaviors of an algorithm.
- Apply an appropriate algorithmic approach to a given problem.
- Contrast techniques to determine which is more appropriate for a given scenario.
- Compare various data structures for a given problem, such as array, list, set, map, stack, queue, hash table, tree, and graph
- Compare the trade-offs of different representations of a matrix and common operations such as addition, subtraction, and multiplication
- Recognize the data structures returned by script-based subroutines
- Evaluate how efficient a data structure is for insert, remove, and access operations
- Design an algorithm in a programming language to solve a simple problem
- Use the techniques of decomposition to modularize a program.
- Create code in a programming language that includes primitive data types, references, variables, expressions, assignments, I/O, control structures, and functions
- Create a simple program that uses recursion.
- Illustrate the use of databases and apply SQL and NoSQL
- Write a regular expression to match a pattern
- Use standard libraries for a given programming language
- Design and implement programs that use a database
- Use techniques for searching patterns in data
- Implement good documentation practices in programming
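Several of the outcomes above (iterative versus recursive algorithms, efficient search, and big-O time and space) can be illustrated with one example. The following is a minimal sketch, assuming a sorted Python list as input; the function names are hypothetical. Both versions run in O(log n) time, but the recursive one also uses O(log n) call-stack space, while the iterative one needs only O(1) extra space.

```python
def binary_search_iterative(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


def binary_search_recursive(sorted_items, target, lo=0, hi=None):
    """Same search, expressed recursively on the index range [lo, hi]."""
    if hi is None:
        hi = len(sorted_items) - 1
    if lo > hi:            # empty range: target is not present
        return -1
    mid = (lo + hi) // 2
    if sorted_items[mid] == target:
        return mid
    if sorted_items[mid] < target:
        return binary_search_recursive(sorted_items, target, mid + 1, hi)
    return binary_search_recursive(sorted_items, target, lo, mid - 1)


values = [2, 3, 5, 7, 11, 13, 17]
print(binary_search_iterative(values, 11))  # 4
print(binary_search_recursive(values, 4))   # -1
```
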
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms. MIT Press.
- Zelle, J. M. (2004). Python programming: an introduction to computer science. Franklin, Beedle & Associates, Inc.
- Peter, CV, (2019). Algorithms Notes for Professionals, GoalKicker
- Barendregt, H. P. (1984). The lambda calculus (Vol. 3). Amsterdam: North-Holland.
- Yang, X. S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.
|Topic ID||Topic Title||Lessons|
|ADSP1||Introduction to Algorithm, Python & Basic Data Structures 1||– Algorithm representation and basic logic
– Basic and special Python syntax (I/O, for, if, while, try, etc.)
– Python basic data structures (list, set, dictionary, etc.)
– Python strings
– Basic Python modules (numpy, sys, collections, math, os, itertools)
– Functions and lambdas
– Args and kwargs
– Personal library to improve data analysis|
|ADSP2||Introduction to Algorithm, Python & Basic Data Structures 2||– PEP 8/Pylint: writing proper Python code
– Proper Python documentation
– Hash functions
– Function decorators
– Random number generators
– Tests: doctest, unittest, pytest
– Basic GitHub usage|
|ADSP3||Introduction to Algorithm, Python & Basic Data Structures 3||– Import and export of plain data: JSON, CSV, XLS, sparse data, compressed files
– Parsing JSON from a (relational) database
– Import and export from databases (SQL and NoSQL)
– More data structures: matrix, array, DataFrame/Series, JSON, datetime, etc.
– DataFrame fundamentals (some basic preprocessing, statistics, and visualizations)
– Basic JIT
– Basic parallel programming (embarrassingly parallel)|
|ADSP4||Advanced Jupyter Notebook||– Jupyter Notebook vs. terminal
– Magic functions
– Notebook interactivity
– Add-ons: outline, dashboard, presentation mode, etc.
– Limitations of some modules: pandas, scikit, etc.|
|ADSP5||Asymptotic Notations & Applications||– Big O and Theta notation applied to real data analyst/scientist cases
– Recursive functions and their solutions
– Analyzing memory usage and performance in data structure and algorithm design
– Analyzing program scalability
– Choosing an optimal algorithm
– Finding the bottleneck in a long process and suggesting improvements|
|ADSP6||Advanced Algorithms and Data Structures||– Advanced data structures: class, memmap, etc.
– Tree and graph data structures
– Greedy and dynamic programming
– Longest Common Subsequence and Longest Increasing Subsequence
– Dynamic Time Warping|
|ADSP7||Advanced Python||– Encoding and decoding of strings
– Basic string manipulation, BeautifulSoup, and regex
– OOP: classes, inheritance, metaclasses, abstract classes
– Advanced JIT (Numba) and compiled Python (Cython)|
|ADSP8||Python Concurrency & Parallel Programming||– Concurrency in more depth
– Multithreading
– Global Interpreter Lock
– Parallel MapReduce in Python|
|ADSP9||Introduction to PySpark and Big Data Processing||– Functional programming (lambda functions)
– PySpark API and data structures
– Running PySpark programs
– Combining PySpark with other tools|
|ADSP10||Designing and implementing data science models||* Requires completion of the fundamental topics in CMs.
– Case study on a data challenge that cannot be solved using a conventional approach or an existing module|