This is an old version of the document!

Text Analytics A.Y. 2018/19


Day Hour Room
Monday 11-13 X1, Polo Fibonacci
Tuesday 9-11 X1, Polo Fibonacci


Forum on Piazza


The course covers text analytics systems and applications that respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can meet information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods, and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

  1. Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
  2. Mathematical background: Probability, Statistics and Algebra
  3. Linguistic essentials: words, lemmas, morphology, PoS, syntax
  4. Basic text processing: regular expressions, tokenisation
  5. Data gathering: Twitter API, scraping
  6. Basic modelling: collocations, language models
  7. Introduction to Machine Learning: theory and practical tips
  8. Libraries and tools: NLTK, Keras
  9. Applications:
    • Classification/Clustering
    • Sentiment Analysis/Opinion Mining
    • Information Extraction/Relation Extraction
    • Entity Linking
    • Spam Detection: mail spam & phishing, blog spam, review spam
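
As a rough illustration of items 4 and 6 in the list above (tokenisation with regular expressions and a simple language model), here is a minimal sketch that uses only the Python standard library. The function names `tokenize` and `bigram_model` and the example sentence are assumptions made for illustration; they are not taken from the course notebooks, and libraries such as NLTK offer more complete tokenisers.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # \w+ matches runs of word characters; [^\w\s] matches single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def bigram_model(tokens):
    """Return a function giving the maximum-likelihood estimate of P(word | prev)."""
    # Count only tokens that have a successor, so the estimates for each
    # observed context sum to 1.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        return bigrams[(prev, word)] / contexts[prev]

    return prob


# Hypothetical example sentence, for illustration only.
tokens = tokenize("The cat sat on the mat. The cat slept.")
prob = bigram_model(tokens)
print(tokens)
print(prob("the", "cat"))
```

This estimate assigns zero probability to any bigram not seen in the data; a practical language model would normally add smoothing.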

Jupyter Notebook Server

A server has been set up for running Jupyter Notebooks. To log into the server, you must get credentials for a Google Suite account: go to this page and register with your University credentials to activate your free account.


List of Suggested Projects - Students can also propose their own project ideas.

Send proposals and preferences to and

Once you submit your work, the date of the discussion will be set by appointment.

Lecture Notes

Date Lecture Notes
17/9/2018 Introduction Text Analytics
18/9/2018 Introduction to Probability Probability
24/9/2018 Language Modeling Language Modeling
25/9/2018 Introduction to Python See notebooks “Introduction to Python” in folder “Text Analytics” on
1/10/2018 Introduction to Python See notebooks “Introduction to Python 2” and “RegEx” in folder “Text Analytics” on
2/10/2018 Introduction to NLTK See notebooks “Introduction to NLTK” in folder “Text Analytics” on
8/10/2018 Preprocessing and tokenization Tokenization
9/10/2018 Word Similarity Tokenization Homework 1 (deadline 15/10)
15/10/2018 Correction of Homework 1, Text Classification Text Classification
16/10/2018 Classifiers Classifiers
22/10/2018 Hidden Markov Models HMM
23/10/2018 POS Tagging HMM
29/10/2018 Homework 2
5/11/2018 Named Entity Tagging NER
6/11/2018 Universal Dependencies
12/11/2018 Dependency Parsing
13/11/2018 Neural Language Models: PCA, Word2Vec LM See notebooks “LanguageModels.ipynb” on
20/11/2018 NLM: FastText, Doc2Vec LM See notebooks “docEmbeddings.ipynb” on
26/11/2018 NLM: Text Generation, ELMo, BERT LM See notebooks “TextGeneration.ipynb” on
27/11/2018 Sentiment Analysis SA
3/12/2018 Lexical Resources LR See notebooks “pmi-lex.ipynb” and “pmi-lex-IMDB.ipynb” on
4/12/2018 Sentiment Classification SC See notebooks “VADER.ipynb”, “sklearn.ipynb”, “lstmNet.ipynb” and “cnnNet.ipynb” on
10/12/2018 Quantification Quantification
11/12/2018 Spam, Scam, Phishing, Fake reviews, Clickbaits, Fake News Spam


  1. D. Jurafsky, J.H. Martin, Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
  2. B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.
  3. S. Bird, E. Klein, E. Loper, Natural Language Processing with Python. O'Reilly Media, 2009.

Previous editions

mds/txa/start.1551450538.txt.gz · Last modified: 01/03/2019 at 14:28 (2 years ago) by Andrea Esuli