====== Privacy and anonymity in data publishing and (mobility) data mining ======

====== Instructors ======

  * **Fosca Giannotti, Dino Pedreschi, Franco Turini**
  * **KDD LAB** - Knowledge Discovery Laboratory - Dipartimento di Informatica dell'Università di Pisa and ISTI-CNR
  * [[fosca.giannotti@isti.cnr.it]], [[pedre@di.unipi.it]], [[turini@di.unipi.it]]
  * Assistant: **Anna Monreale**, Dottorato in Informatica, Univ. of Pisa [[annam@di.unipi.it]]

====== Summary ======

The automated collection of digital data about all kinds of human activity creates unprecedented opportunities to discover useful knowledge with data analysis and mining techniques, as well as severe risks of privacy violation, since such data may reveal highly sensitive personal information. As a notable example, mobile communication and ubiquitous computing technologies pervade our society, and wireless networks sense the movement of people and vehicles, generating large volumes of mobility data. This scenario carries both great opportunities and great risks: on one side, mining this data can produce useful knowledge, supporting sustainable mobility and intelligent transportation systems; on the other side, individual privacy is at risk, as mobility data may reveal personal habits and preferences. A new multidisciplinary research area is emerging at this crossroads of mobility, data mining, and privacy.

This course addresses the problem of privacy and anonymity of personal data, with a focus on location and mobility data, on the basis of the experience gathered in two projects coordinated by the instructors: the large-scale European research initiative GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery, [[http://www.geopkdd.eu]]) and the PRIN project ANONIMO (An alliance of IT and law for the protection of the privacy and anonymity of personal data).
The course will introduce privacy and anonymity threats, together with the privacy-preserving and anonymity-preserving techniques developed for the general case of relational data to support secure data publishing and data mining. It will give an account of the relevant international legal frameworks for privacy and data protection, as well as of the neighboring field of statistical disclosure control. The related theme of discrimination discovery in historical decision records will also be discussed, from both a legal and a technical perspective. In the final part, the course will focus on privacy and anonymity in location-aware and movement-aware data, presenting an original account of the newly emerging and active research area of privacy and anonymity in mobility data analysis: models of privacy and anonymity threats for mobility data publishing and mobility data mining, and the related privacy-preserving techniques for secure mobility data publishing and secure mobility data mining.

====== Reference textbooks ======

Fosca Giannotti and Dino Pedreschi (Eds.) [[http://www.springer.com/computer/database+management+&+information+retrieval/book/978-3-540-75176-2|Mobility, Data Mining and Privacy]]. Springer, 2008. (intro chapter downloadable)

====== Calendar and Lecture Slides ======

  * Mon 15/06/2009: intro, foundations of data mining (see the [[BISS09|Wiki of Data Mining@BISS 09]], Bertinoro International Spring School 2009)
  * Mon 22/06/2009: legal frameworks for privacy and data protection ({{:tdm:privlegal.pdf|download slides}})
  * Tue 23/06/2009: taxonomy of privacy-preserving data publishing and mining ({{:tdm:privacyphdpisa.surveyppdm.2008.06.23.pdf|download slides}})
  * Thu 02/07/2009: discrimination discovery (download slides)
  * Thu 16/07/2009: data anonymity, data randomization, statistical disclosure control and knowledge hiding ({{:tdm:lecture16.07.2009.pdf|download slides}})
  * Fri 17/07/2009: mobility, data mining and privacy, privacy in location-based services, secure multiparty computation ({{:tdm:lecture17.07.2009.pdf|download slides}})