Big data is mostly unstructured, and among the various formats of unstructured data, text holds the most potentially valuable insights. Analyzing text poses unique challenges that go beyond conventional machine learning models. Text mining therefore relies on language-specific rules and grammar (Natural Language Processing, NLP), text-specific preprocessing such as n-grams, stop-word filtering, and spell checking or correction, and specialized language models, including recent embedding models and deep learning.
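The preprocessing steps named above (tokenization, spell correction, stop-word filtering, n-grams) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the course's reference implementation. The regex tokenizer, the small `STOP_WORDS` set, and the toy `VOCABULARY` used for spell correction (through `difflib.get_close_matches`) are hypothetical choices made for brevity.

```python
import re
from difflib import get_close_matches

# Illustrative stop-word list (an assumption; real pipelines usually rely on a
# curated list, e.g. the stopwords corpus distributed with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "for"}

# Hypothetical toy vocabulary used only to demonstrate spell correction.
VOCABULARY = ["text", "mining", "is", "a", "powerful", "tool", "for", "data", "analysis"]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def correct_spelling(tokens: list[str], vocabulary: list[str]) -> list[str]:
    """Replace each out-of-vocabulary token with its closest vocabulary match, if any."""
    corrected = []
    for tok in tokens:
        if tok in vocabulary:
            corrected.append(tok)
        else:
            matches = get_close_matches(tok, vocabulary, n=1, cutoff=0.8)
            corrected.append(matches[0] if matches else tok)
    return corrected


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous sequence of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text: str) -> list[str]:
    """Tokenize, spell-correct, then filter stop words."""
    tokens = tokenize(text)
    tokens = correct_spelling(tokens, VOCABULARY)
    return remove_stop_words(tokens)


if __name__ == "__main__":
    tokens = preprocess("Text minning is a powerfull tool for data analysis.")
    print(tokens)  # ['text', 'mining', 'powerful', 'tool', 'data', 'analysis']
    print(ngrams(tokens, 2))
```

Spell correction runs before stop-word filtering so that a misspelled stop word is normalized first and then removed. In practice, a library such as NLTK would supply the tokenizer and stop-word list, and spell correction would draw on a far larger vocabulary.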
Prerequisites: SCM, IMUL, ADM
- Introduce comprehensive preprocessing techniques for text data, including tokenization, normalization, filtering, and autocorrection.
- Explore basic text analytics as a way to perform simple exploratory analysis of text data.
- Introduce text mining methods for extracting deep insights from text data or for building text analytics engines for production use.
- Introduce Natural Language Processing.
- Farzindar, A., & Inkpen, D. (2017). Natural language processing for social media. Synthesis Lectures on Human Language Technologies, 10(2), 1-195.
- Kao, A., & Poteet, S. R. (Eds.). (2007). Natural language processing and text mining. Springer Science & Business Media.
- Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Packt Publishing Ltd.
| Topic ID | Topic Title | Lessons |
|---|---|---|
| LPTM1 | Introduction to Natural Language Processing | Tokenization; stemming and lemmatization; stop-word filtering; named entity recognition; tree parsing and part-of-speech (PoS) tagging |
| LPTM2 | Introduction to Text Mining 1 | Vector space model (TF-IDF, BM25, etc.); (soft) clustering: topic modelling; document classification; sentiment analysis; document summarization and recommendation |
| LPTM3 | Text Mining via Deep Learning | Introduction to deep learning; word embedding (Word2Vec and FastText); multiclass and multilabel problems (soft classification) |
| LPTM4 | Social Media Analytics | Crawling, streaming, and scraping; building the network (graph) and visualization; centrality analysis; graph partitioning; community detection; combining social network analysis with text analytics |