Text Mining @ TAU
Course page for teaching materials, 25/26 (Semester B)
This semester, Text Mining focuses on practical NLP workflows, from preprocessing and classical representations through embeddings, Transformers, and LLM-based applications.
Course Outline
Text Mining (Semester B) introduces text analysis and NLP from classical methods to modern Transformers and large language models. You will learn how to turn raw text into usable data, clean and preprocess corpora, and build representations such as Bag-of-Words, TF-IDF, n-grams, and embeddings. The course covers supervised and unsupervised analysis (including clustering and topic modeling), model evaluation and error analysis, and moves on to attention, Transformers, BERT/SBERT, prompting, and retrieval-augmented generation (RAG) for practical NLP systems.
| Lecture | Topic | Focus |
|---|---|---|
| 1 | Text mining & NLP overview | Core tasks, data sources, problem framing, and how text becomes analyzable data. |
| 2 | Cleaning and preprocessing pipelines | Tokenization, normalization, stopwords, stemming/lemmatization, dataset splits, and common pitfalls. |
| 3 | Classical vectorization | Bag-of-Words, TF-IDF, n-grams, sparsity, feature engineering, and baselines for prediction. |
| 4 | Unsupervised text discovery | Similarity, clustering, and topic modeling; interpreting topics and validating unsupervised results. |
| 5 | Embeddings and representation learning | Word embeddings and sentence embeddings; semantic similarity and downstream uses. |
| 6 | From sequences to Attention | Why sequential models struggled, the intuition behind attention, and what it enables. |
| 7 | Transformers in practice: BERT and SBERT | Encoder-style pretraining, fine-tuning vs embeddings, and applications like search and clustering. |
| 8 | LLMs and modern NLP systems | How LLMs work at a high level, prompting patterns, limitations/risks, and RAG-based applications. |
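To give a feel for the classical representations covered in lectures 2 and 3, here is a minimal sketch of tokenization followed by TF-IDF weighting, using only the Python standard library. It is an illustration and not taken from the course slides: the function names `tokenize` and `tfidf` and the three toy documents are hypothetical, and the weighting shown (raw term frequency times `log(N / df)`) is just one common variant. Libraries such as scikit-learn apply additional smoothing and normalization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Hypothetical toy corpus, for illustration only.
docs = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "Cats and dogs",
]
vectors = tfidf(docs)

# Show the highest-weighted terms of the first document.
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in only one document (such as "cat" and "mat") receive the largest idf, while terms shared across documents (such as "sat") are down-weighted. Note that "cat" and "cats" are treated as unrelated tokens here, which is the kind of gap that stemming or lemmatization is meant to close.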
Course Materials
Week 0
- Text Mining 25/6 - S0: Course Intro - Week 0 lecture slides
Week 1
- Text Mining 24/5 - S1: Text Processing - Week 1 lecture slides
Week 2
- Text Mining 24/5 - S2: Word2Vec - Week 2 lecture slides
Week 3
- Text Mining 24/5 - S3: Unsupervised Text Mining & Visualization - Week 3 lecture slides
Week 4
- Text Mining 24/5 - S4: Supervised Text Mining - Week 4 lecture slides
Week 5
- Text Mining 24/5 - S5: Attention to SBERT - Week 5 lecture slides
Week 6
- Text Mining 24/5 - S6: LLMs - Week 6 lecture slides