Most big data is unstructured, and among the many unstructured formats, text arguably holds the most potential for valuable insights. Text analysis poses challenges beyond those handled by conventional machine learning models. Text mining draws on language-specific rules and grammar (natural language processing, NLP); text-specific preprocessing such as n-grams, stop-word filtering, and spell-checking or correction; and specialized language models, including recent embedding models and deep learning.
Prerequisites : SCM, IMUL, ADM
Objectives/Content :
- Introduce comprehensive preprocessing techniques for text data, including tokenization, normalization, filtering, and autocorrection.
- Explore basic text analytics as a way to perform simple exploratory analysis of text data.
- Introduce text mining methods for producing deep insights from text data or for building text analytics engines for production use.
- Introduce natural language processing.
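As a rough illustration of the preprocessing steps named in the objectives, the sketch below chains tokenization, stop-word filtering, and n-gram extraction using only the Python standard library. The function names and the tiny `STOP_WORDS` set are hypothetical and chosen for illustration; course work would more likely rely on a library such as NLTK.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

def ngrams(tokens, n=2):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("The analysis of text data is a challenge."))
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

The regular expression keeps only unaccented Latin letters, which is a simplifying assumption; real text usually needs a tokenizer that handles punctuation, numbers, and other scripts.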
References :
- Farzindar, A., & Inkpen, D. (2017). Natural language processing for social media. Synthesis Lectures on Human Language Technologies, 10(2), 1-195.
- Kao, A., & Poteet, S. R. (Eds.). (2007). Natural language processing and text mining. Springer Science & Business Media.
- Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Packt Publishing Ltd.
- Bird, S., Klein, E., & Loper, E. Natural Language Processing with Python. http://www.nltk.org/book/
Topic ID | Topic Title | Lessons |
LPTM1 | Introduction to Natural Language Processing | – Tokenization – Stemming and lemmatization – Spell-checking – Stop-word filtering – WordNet – Named Entity Recognition – Tree parser, POS tagger |
LPTM2 | Introduction to Text Mining 1 | – Vector Space Model (TF-IDF, BM25, etc.) – (Soft) Clustering – Topic Modelling – Document Classification – Sentiment Analysis – Document Summarization & Recommendation |
…
LPTM3 | Text Mining via Deep Learning | – Introduction to Deep Learning – Word Embedding (Word2Vec & FastText) – LSTM – CNN – Multiclass & multilabel problems (soft classification) – BERT |
LPTM4 | Social Media Analytics | – Crawling, Streaming, Scraping – Building the network (graph) + Visualizations – Centrality Analysis – Graph Partitioning – Community Detection – Combining Social Network Analysis with Text Analytics |
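
To give a feel for the Vector Space Model lesson listed under LPTM2, here is a minimal sketch of TF-IDF weighting in plain Python. It assumes one common variant (relative term frequency multiplied by the natural log of the number of documents over the document frequency); libraries such as scikit-learn use slightly different smoothing, so the numbers will not match theirs. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    docs: list of token lists. Returns one {term: weight} dict per document,
    using tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [["text", "mining", "text"], ["data", "mining"]]
weights = tf_idf(docs)
print(weights)
```

A term that occurs in every document, such as "mining" here, receives weight zero under this variant, which is the intended effect: it does not help tell the documents apart.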