Text Mining @ TAU
Course home across academic iterations
I teach Text Mining at Tel Aviv University, covering classical NLP, representation learning, transformers, and modern LLM-based workflows for working with text as data.
Course Iterations
The course starts from practical text preprocessing and classical bag-of-words baselines, then moves through embedding methods, sequence models, transformers, and contemporary LLM systems used for retrieval, summarization, extraction, and classification.
This page collects shared materials from multiple iterations of the course together with links to the relevant semester pages.
Course Outline
- The course begins with text preprocessing, tokenization, normalization, bag-of-words, TF-IDF, and practical pipelines in Python.
- Next come classical NLP baselines: keyword extraction, clustering, topic modeling, and embedding-based unsupervised analysis.
- Later sessions cover Word2Vec and the path from recurrent sequence models to transformer architectures.
- Recent iterations focus on BERT, sentence embeddings, prompt-based LLM use, and the practical tradeoffs of applied retrieval-augmented systems.
- Coursework typically includes several exercises plus a final project or project presentation.
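To make the first outline item concrete, here is a minimal sketch of the kind of TF-IDF baseline pipeline the early sessions build. The toy corpus, labels, and classifier choice are illustrative assumptions, not course materials; it uses scikit-learn's `TfidfVectorizer` and `Pipeline`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus and labels (assumptions for illustration): 1 = spam, 0 = not spam
docs = [
    "cheap meds available online today",
    "team meeting moved to three",
    "win a free prize now click here",
    "quarterly report attached for review",
]
labels = [1, 0, 1, 0]

# Bag-of-words baseline: lowercase, drop English stop words, weight by TF-IDF,
# then fit a linear classifier on the sparse document-term matrix.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression()),
])
pipe.fit(docs, labels)

# Predict on unseen text; on this toy corpus the spammy tokens dominate.
pred = pipe.predict(["free meds online"])
```

Chaining the vectorizer and classifier in one `Pipeline` keeps the vocabulary fit only on training data, which is the usual guard against leakage when cross-validating.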
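The unsupervised part of the outline (clustering and embedding-based analysis) can be sketched in a few lines as well. This is a minimal illustration using TF-IDF vectors with k-means; the six sentences and the choice of two clusters are assumptions for the example, and real course work would typically swap in dense sentence embeddings.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Two toy themes (assumed for illustration): football and cooking
docs = [
    "the team won the match tonight",
    "the striker scored a goal in the match",
    "the coach praised the team after the match",
    "simmer the tomato sauce with garlic",
    "stir the sauce and add fresh basil",
    "season the sauce before serving",
]

# Represent each document as a sparse TF-IDF vector
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Cluster the vectors; fixed random_state keeps the run reproducible
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```

With sparse bag-of-words vectors, documents cluster only when they share surface vocabulary; replacing the vectorizer with sentence embeddings relaxes that limitation, which is one of the tradeoffs the later sessions examine.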