Text Mining @ TAU


Course page for teaching materials, 25/26 (Semester B)

Tel Aviv University · Instructor · 25/26

This semester of Text Mining focuses on practical NLP workflows, from preprocessing and classical representations through embeddings, Transformers, and LLM-based applications.

Course Outline

Course Syllabus

Text Mining (Semester B) introduces text analysis and NLP, from classical methods to modern Transformers and large language models. You will learn how to turn raw text into usable data, clean and preprocess corpora, and build representations such as Bag-of-Words, TF-IDF, n-grams, and embeddings. The course covers supervised and unsupervised analysis (including clustering and topic modeling) as well as model evaluation and error analysis, then moves on to attention, Transformers, BERT/SBERT, prompting, and retrieval-augmented generation (RAG) for practical NLP systems.
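
As a taste of the classical pipeline covered in the first lectures, here is a minimal sketch of a TF-IDF baseline. It assumes scikit-learn is installed; the toy texts and labels are purely illustrative, not course data.

```python
# A minimal TF-IDF + logistic regression baseline (sketch).
# Assumes scikit-learn; the toy texts and labels below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "great movie, loved it",
    "terrible plot and acting",
    "an instant classic",
    "a waste of two hours",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TfidfVectorizer turns raw text into sparse, weighted n-gram features;
# logistic regression is a simple, strong baseline on top of them.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["what a great classic"]))  # expect the positive class
```

Baselines like this are worth building before reaching for Transformers: they are fast, interpretable, and set the bar that heavier models must beat.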

Lecture Topics and Focus

1. Text mining & NLP overview: Core tasks, data sources, problem framing, and how text becomes analyzable data.
2. Cleaning and preprocessing pipelines: Tokenization, normalization, stopwords, stemming/lemmatization, dataset splits, and common pitfalls.
3. Classical vectorization: Bag-of-Words, TF-IDF, n-grams, sparsity, feature engineering, and baselines for prediction.
4. Unsupervised text discovery: Similarity, clustering, and topic modeling; interpreting topics and validating unsupervised results.
5. Embeddings and representation learning: Word embeddings and sentence embeddings; semantic similarity and downstream uses (see the sketch after this list).
6. From sequences to attention: Why sequential models struggled, the intuition behind attention, and what it enables.
7. Transformers in practice (BERT and SBERT): Encoder-style pretraining, fine-tuning vs. embeddings, and applications like search and clustering.
8. LLMs and modern NLP systems: How LLMs work at a high level, prompting patterns, limitations/risks, and RAG-based applications.
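
The later lectures move from sparse vectors to learned representations. The sketch below previews sentence embeddings and semantic similarity (lectures 5 and 7); it assumes the sentence-transformers package, and the model name and example sentences are illustrative choices, not course requirements.

```python
# Sentence-embedding similarity (sketch). Assumes sentence-transformers;
# the model name and sentences are illustrative, not course material.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small SBERT-style encoder

sentences = [
    "How do I tokenize a corpus?",
    "What is the best way to split text into tokens?",
    "The weather in Tel Aviv is warm today.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between all pairs: the two tokenization questions
# should score far higher with each other than with the weather sentence.
print(util.cos_sim(embeddings, embeddings))
```

The same embed-then-compare pattern underlies semantic search, clustering, and the retrieval step of RAG systems covered at the end of the course.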

Course Materials

Week 0

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6