Text Mining @ TAU


Course page for teaching materials, 25/26 (Semester B)

Tel Aviv University · Instructor · 25/26

This semester of Text Mining focuses on practical NLP workflows, from preprocessing and classical representations through embeddings, Transformers, and LLM-based applications.

Course Outline

Course Syllabus

Text Mining (Semester B) introduces text analysis and NLP, from classical methods to modern Transformers and large language models. You will learn how to turn raw text into usable data, clean and preprocess corpora, and build representations such as Bag-of-Words, TF-IDF, n-grams, and embeddings. The course covers supervised and unsupervised analysis (including clustering and topic modeling) as well as model evaluation and error analysis, then moves on to attention, Transformers, BERT/SBERT, prompting, and retrieval-augmented generation (RAG) for practical NLP systems.
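
As a taste of the classical pipeline covered in the first lectures, here is a minimal sketch of a TF-IDF baseline. It assumes scikit-learn is installed; the toy texts and labels are purely illustrative, not course data.

```python
# A minimal TF-IDF + logistic regression baseline (sketch).
# Assumes scikit-learn; the toy texts and labels below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "great movie, loved it",
    "terrible plot and acting",
    "an instant classic",
    "a waste of two hours",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TfidfVectorizer turns raw text into sparse, weighted n-gram features;
# logistic regression is a simple, strong baseline on top of them.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["what a great classic"]))  # expect the positive class
```

Baselines like this are worth building before reaching for Transformers: they are fast, interpretable, and set the bar that heavier models must beat.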

Lecture Topics and Focus

1. Text mining & NLP overview: Core tasks, data sources, problem framing, and how text becomes analyzable data.
2. Cleaning and preprocessing pipelines: Tokenization, normalization, stopwords, stemming/lemmatization, dataset splits, and common pitfalls.
3. Classical vectorization: Bag-of-Words, TF-IDF, n-grams, sparsity, feature engineering, and baselines for prediction.
4. Unsupervised text discovery: Similarity, clustering, and topic modeling; interpreting topics and validating unsupervised results.
5. Embeddings and representation learning: Word embeddings and sentence embeddings; semantic similarity and downstream uses (see the sketch after this list).
6. From sequences to attention: Why sequential models struggled, the intuition behind attention, and what it enables.
7. Transformers in practice (BERT and SBERT): Encoder-style pretraining, fine-tuning vs. embeddings, and applications like search and clustering.
8. LLMs and modern NLP systems: How LLMs work at a high level, prompting patterns, limitations/risks, and RAG-based applications.
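
The later lectures move from sparse vectors to learned representations. The sketch below previews sentence embeddings and semantic similarity (lectures 5 and 7); it assumes the sentence-transformers package, and the model name and example sentences are illustrative choices, not course requirements.

```python
# Sentence-embedding similarity (sketch). Assumes sentence-transformers;
# the model name and sentences are illustrative, not course material.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small SBERT-style encoder

sentences = [
    "How do I tokenize a corpus?",
    "What is the best way to split text into tokens?",
    "The weather in Tel Aviv is warm today.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between all pairs: the two tokenization questions
# should score far higher with each other than with the weather sentence.
print(util.cos_sim(embeddings, embeddings))
```

The same embed-then-compare pattern underlies semantic search, clustering, and the retrieval step of RAG systems covered at the end of the course.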

Course Materials

Week 0

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6