Text Analytics A.Y. 2017/18

Teachers

Schedule
Day | Time | Room
Monday | 11-13 | N1, Polo Fibonacci
Tuesday | 9-11 | N1, Polo Fibonacci

Forum

Forum on Piazza

Homeworks

Homework 2: deadline 6/11/2017.

Projects

List of Suggested Projects - Students can also propose their own project ideas.

Send proposals and preferences to attardi@di.unipi.it and andrea.esuli@isti.cnr.it

Post your questions about the project and exams on Piazza.

Once you submit your work, the date of the discussion will be set by appointment.

Objectives

The course targets text analytics systems and applications that respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can solve information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods, and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

  1. Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
  2. Mathematical background: Probability, Statistics and Algebra
  3. Linguistic essentials: words, lemmas, morphology, PoS, syntax
  4. Basic text processing: regular expressions, tokenisation
  5. Data gathering: Twitter API, scraping
  6. Basic modelling: collocations, language models
  7. Introduction to Machine Learning: theory and practical tips
  8. Libraries and tools: NLTK, Keras
  9. Applications:
    • Classification/Clustering
    • Sentiment Analysis/Opinion Mining
    • Information Extraction/Relation Extraction
    • Entity Linking
    • Spam Detection: mail spam & phishing, blog spam, review spam
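
As a rough illustration of items 4 and 6 in the list above (not taken from the course materials), the following standard-library Python sketch tokenises a short text with a regular expression and counts adjacent word pairs as a crude stand-in for collocation candidates. The names `tokenize` and `bigram_counts`, the token pattern, and the sample sentence are all assumptions made for this example. NLTK, listed among the course libraries, provides fuller tokenisers.

```python
import re
from collections import Counter

# Hypothetical token pattern: runs of ASCII letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")

def tokenize(text):
    """Lowercase the text and split it into word tokens with a regex."""
    return TOKEN_RE.findall(text.lower())

def bigram_counts(tokens):
    """Count adjacent token pairs (sentence boundaries are ignored)."""
    return Counter(zip(tokens, tokens[1:]))

# Made-up sample text, used only to exercise the two functions.
sample = "Text analytics turns text into knowledge. Text analytics needs data."
tokens = tokenize(sample)
counts = bigram_counts(tokens)

print(tokens[:4])             # first four tokens
print(counts.most_common(1))  # most frequent adjacent pair
```

Raw frequency is only a starting point: frequent pairs are often dominated by function words, so collocation extraction usually ranks pairs with an association measure rather than plain counts.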

Jupyter Notebook Server

A server has been set up for running Jupyter Notebooks. To log into the server, you need credentials for a Google Suite account: go to this page and register with your University credentials to activate your free account.

Lecture Notes

Date | Lecture | Notes
18/9/2017 | Introduction | Introduction
19/9/2017 | L'età della parola | L'età della parola
25/9/2017 | Introduction to Probability | Probability
26/9/2017 | Language Modeling | Language Modeling
2/10/2017 | Introduction to Python 1/2 | Python 1/2 (notebook)
3/10/2017 | Introduction to Python 2/2 | Python 2/2 (notebook, homework 1)
9/10/2017 | Introduction to NLTK | Introduction to NLTK
10/10/2017 | Basic Text Processing | Tokenization
16/10/2017 | Word Similarity | Word Similarity
17/10/2017 | Text Classification | Text Classification
23/10/2017 | Hidden Markov Models | HMM
24/10/2017 | Named Entity Recognition | NER.pptx
6/11/2017 | Classifiers | Classifiers
7/11/2017 | Deep Learning for NLP | Deep Learning
13/11/2017 | Neural Language Models | NLM (notebooks)
14/11/2017 | Correction of Homework 2, Keras | Deep Learning Libraries
21/11/2017 | Introduction to Sentiment Analysis | Sentiment Analysis
27/11/2017 | Lexical resources for sentiment analysis | Lexical resources
28/11/2017 | Sentiment classification | Sentiment classification (notebooks)
4/12/2017 | Data collection and experiments | Data collection and experiments
5/12/2017 | Spam, Scam, Phishing, Fake reviews, Clickbaits, Fake News | Spam & co
12/12/2017 | Quantification (Fabrizio Sebastiani) | Quantification

Textbooks

  1. D. Jurafsky, J.H. Martin, Speech and Language Processing. 2nd edition, Prentice-Hall, 2008.
  2. B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.

mds/txa/start.txt · Last modified: 11/01/2018 at 16:02 (3 months ago) by Andrea Esuli