Sampling Techniques and Experimental Design (STED)

Despite advances in big data computing and cloud technology, analyzing large volumes of data remains time-consuming and costly. In most cases, samples drawn from a database can produce results similar to those obtained by analyzing the full data set. Moreover, when experimenting with data to find an optimal model, working on the whole data set is often impractical. This module also covers experimental design techniques such as A/B testing, along with alternative methods such as multi-armed bandit algorithms.

Prerequisites: GLM, EDA

Objectives/Content:

  1. Trainees understand not only the proper sampling techniques for different use cases, but also the inferential adjustments and procedures that need to be applied.
  2. Trainees are able to determine the optimal sample size for a desired confidence level in everyday analysis cases.
  3. Describe how to design experiments, carry them out, and analyze the data they yield.
  4. Understand the process of designing an experiment including factorial and fractional factorial designs.
  5. Examine how a factorial design allows cost reduction, increases efficiency of experimentation, and reveals the essential nature of a process; and discuss its advantages to those who conduct the experiments as well as those to whom the results are reported.
  6. Investigate the logic of hypothesis testing, including analysis of variance and the detailed analysis of experimental data.
  7. Formulate understanding of the subject using real examples, including experimentation in the social and economic sciences.
  8. Gain an understanding of how the analysis of experimental design data is carried out using the most common software packages (such as Python).
  9. Be able to apply what trainees have learned immediately upon return to their division.
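
As an illustration of objective 2, the sketch below estimates the sample size needed to measure a proportion within a given margin of error. It assumes simple random sampling and uses Cochran's formula with an optional finite population correction; the function name and its parameters are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5, population=None):
    """Approximate sample size for estimating a proportion.

    Assumes simple random sampling. p=0.5 is the most conservative
    guess for the unknown proportion (it maximizes p * (1 - p)).
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for a large (effectively infinite) population.
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # Optional finite population correction.
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size_for_proportion(0.05))                     # 385
print(sample_size_for_proportion(0.05, population=10_000))  # 370
```

Halving the margin of error roughly quadruples the required sample, which is why the tolerated error is usually fixed before any data is drawn.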

References:

  1. Selvamuthu, D., & Das, D. (2018). Introduction to statistical methods, design of experiments and statistical quality control. Springer.
  2. Cochran, W. G. (2007). Sampling techniques. John Wiley & Sons.
  3. Jiang, J. (2010). Large sample techniques for statistics. Springer Science & Business Media.
  4. Montgomery, D. C. (2019). Design and analysis of experiments (10th ed.). John Wiley & Sons.
  5. Garrett, J. J. (2010). The elements of user experience: user-centered design for the web and beyond. Pearson Education.
  6. Allen, T. T. (2006). Introduction to engineering statistics and six sigma: statistical quality control and design of experiments and systems. Springer Science & Business Media.
  7. Som, R. K. (1995). Practical sampling techniques. CRC Press.

STED1: Sampling Techniques for Big Data Analysis
– Cases where sampling should be used in big data analysis
– Bias on sampling techniques, weights, and inference
– Simple random sampling and filtering
– Probabilistic sampling techniques
– Diversity-based Sampling Methods
– Bayesian Inference – Monte Carlo
– Active learning sampling techniques:
    – Query by Committee
    – Uncertainty Sampling
    – Density Sampling
– Rejection and importance sampling
– Sampling on streaming data
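
For the streaming topic above, one common approach is reservoir sampling, which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. The sketch below shows the classic "Algorithm R" variant; it is one possible technique for this topic, not necessarily the one taught in the module, and the function name is hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from an iterable.

    Uses O(k) memory and a single pass, so the stream never has to
    fit in memory. If the stream has fewer than k items, all of them
    are returned.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(42))
print(sample)
```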

STED2: Experimental Design
– Introduction to experimental design
– Questions experimental design can answer
– Classical experimental strategies / Why is experimental design useful?
– Principles of experimental design
– Problem formulation; which design type to use
– Screening designs: types, purposes and principles
– Optimization designs: types, purposes and principles
– Analysis of screening designs: main effects, interactions, ANOVA
– Analysis of optimization designs: main effects, interactions, square effects, response surface ANOVA
– Response surface modeling and interpretation
– Factorial and multilevel (block) design
– Random and mixed effects models
– A/B Testing revisited
– Multi-Armed Bandit technique
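
To make the contrast with A/B testing concrete, the sketch below simulates an epsilon-greedy multi-armed bandit, one of the simplest bandit strategies. Unlike a fixed-split A/B test, it shifts traffic toward the better-performing variant while the experiment is still running. The conversion rates, the number of rounds, and the function name are illustrative assumptions, not values from the course material.

```python
import random


def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over Bernoulli arms.

    true_rates holds the (normally unknown) conversion rate of each
    arm; it is used here only to simulate rewards.
    Returns (pulls per arm, estimated rate per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls, estimates


pulls, estimates = epsilon_greedy([0.05, 0.10, 0.30])
print("pulls per arm:", pulls)
print("estimated rates:", [round(e, 3) for e in estimates])
```

The price of this adaptivity is that the arms receive unequal traffic, so the classical fixed-sample significance tests used for A/B tests do not carry over unchanged.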

STED3: Sampling-Based Machine Learning Models
– Minibatch k-Means
– Re-sampling technique
– Introduction to Active Learning machine learning techniques
– Stationary and streaming relevant sampling for text data analytics using term vectors
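
The outline does not say which re-sampling technique is meant. Assuming it includes the bootstrap, the sketch below computes a percentile bootstrap confidence interval for a statistic by repeatedly resampling the data with replacement. The function name, its defaults, and the example values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on resamples drawn with replacement.
    stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - confidence
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Made-up measurements, for illustration only.
values = [12.1, 9.8, 11.4, 10.2, 13.5, 9.9, 10.7, 12.8, 11.1, 10.4]
print(bootstrap_ci(values))
```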