| Next revision | Previous revision |
| mds:smd:2020 [13/02/2021 at 14:21 (5 years ago)] – created by Salvatore Ruggieri | mds:smd:2020 [04/12/2021 at 17:56 (4 years ago)] (current version) – deleted by Salvatore Ruggieri |
|---|
| ====== Statistical Methods for Data Science A.Y. 2019/20 ====== | |
| |
| =====Instructor===== | |
| |
| * **Salvatore Ruggieri** | |
| * Università di Pisa | |
| * [[http://pages.di.unipi.it/ruggieri/]] | |
| * [[salvatore.ruggieri@unipi.it]] | |
| * **Office hours** | |
| * Tuesday h 14:00 - 17:00, Department of Computer Science, room 321/DO. | |
| * **Office hours are held only via Skype. Skype contact: salvatore.ruggieri** | |
| |
| |
| |
| =====Classes===== | |
| |
| **Dates are preliminary.** | |
| |
| ^ Day of Week ^ Hour ^ Room ^ | |
| | Tuesday | 16:00 - 18:00 | <del>Fib-L1</del> Distance Learning | | |
| | Wednesday| 9:00 - 11:00 | <del>Fib-A1</del> Distance Learning | | |
| |
| |
| =====Pre-requisites===== | |
| |
| Students should be comfortable with most of the calculus topics covered in: | |
| |
| * **[P]** J. Ward, J. Abdey. **Mathematics and Statistics**. University of London, 2013. __Chapters 1-8 of Part 1__. | |
| |
| Extra lessons to refresh these notions may be scheduled in the first part of the course. | |
| |
| |
| =====Mandatory Teaching Material===== | |
| |
| The following are //mandatory textbooks//: | |
| |
| * **[T]** F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. **A Modern Introduction to Probability and Statistics**. Springer, 2005. | |
| * **[R]** P. Dalgaard. **Introductory Statistics with R**. 2nd edition, Springer, 2008. | |
| |
| =====Software===== | |
| |
| * [[https://cran.r-project.org/|R]] | |
| * [[https://www.rstudio.com/|R Studio]] | |
| |
| =====Preliminary program and calendar===== | |
| |
| * [[https://esami.unipi.it/esami2/programma.php?c=44017&aa=2019|Preliminary program]]. | |
| * [[https://didattica.di.unipi.it/en/master-programme-in-data-science-and-business-informatics/academic-calendar-2019-2020/|Calendar of lessons]]. | |
| |
| |
| =====Student project===== | |
| |
| * The project can be done in groups of at most 3 students. | |
| * The project must be delivered (report + code) by end of July. | |
| * The oral discussion must be done by the September session, and it will cover both the project and all topics of the course. | |
| * The project replaces the written exam, but **students have to [[https://esami.unipi.it/esami2/|register for the written exam dates]] in order to fill in the student questionnaire**. | |
| * Groups that are ready to discuss should send the project to the teacher, together with their available time slots for the oral discussion. | |
| * {{ :mds:smd:smd.project.2020.pdf | Project presentation slides}} and [[http://apa.di.unipi.it/smd/video/2020_project.flv|project info audio-video (.flv)]] and [[http://apa.di.unipi.it/smd/video/2020_project_data.flv|project data audio-video (.flv)]]. | |
| * [[https://drive.google.com/drive/folders/1HytbG8cQbQtgVTrdUXqyKQIv1Rz-EKU1?usp=sharing|Google Drive project directory]] (accessible only to authorized students) | |
| =====Written exam===== | |
| |
| __//There are no mid-terms//.__ The exam consists of a written part and an oral part. The written part consists of open questions and exercises on the topics of the course. Each question is assigned a number of points, for a total of 30. Students are admitted to the oral part if they score at least 18 points. Sample written exams: **{{ :mds:smd:smdsample.pdf | sample1}}**, **{{ :mds:smd:smdsample2.pdf | sample2}}**. The oral part consists of a critical discussion of the written part, plus open questions and problem solving on the topics of the course.\\ | |
| **Online exams:** during the COVID-19 restrictions, both the written part and the oral part will be held online. For the written part, students will connect to [[ https://www.unipi.it/index.php/docenti2/item/17671-corsi-online | Google Meet]] (room code: 500PP) and must turn on both microphone and webcam. Each sheet must include name, surname, and student ID, and must be signed. A picture of the sheets must be sent to [[ruggieri@di.unipi.it]]. | |
| |
| |
| Registration for exams is mandatory (**check the registration deadline!**): [[https://esami.unipi.it/esami2/|register here]]\\ | |
| |
| ^ Date ^ Hour ^ Room ^ Notes ^ | |
| | 19/01/2021 | 16:00 - 18:00 | Online exam | | | |
| | 09/02/2021 | 16:00 - 18:00 | Online exam | | | |
| =====Class calendar===== | |
| |
| **Distance-learning lessons**: see instructions for [[ https://www.unipi.it/index.php/docenti2/item/17671-corsi-online | Google Meet]] and use the room code: 500PP. | |
| |
| ^ ^ Date ^ Room ^ Topic ^ Learning material ^ | |
| |1| 25.02 16:00-18:00 | L1 | Introduction. Probability and independence. | **[T]** Chpts. 1-3 | | |
| |2| 26.02 9:00-11:00 | A1 | R basics. | **[R]** Chpts. 1,2.1,2.2 {{ :mds:smd:r_intro.pdf | slides}} {{ :mds:smd:2019smdr1.r | script1.R}} | | |
| |3| 03.03 16:00-18:00 | L1 | Discrete random variables. | **[T]** Chpt. 4 **[R]** Chpt. 3 {{ :mds:smd:2019smdr2.r | script2.R}} | | |
| |4| 04.03 9:00-11:00 | A1 | Continuous random variables. Simulation. | **[T]** Chpts. 5, 6.1-6.2 **[R]** Chpt. 3 {{ :mds:smd:2019smdr3.r | script3.R}} | | |
| |5| 10.03 16:00-18:00 | Distance-learning | Recalls: derivatives and integrals. [[http://apa.di.unipi.it/smd/video/rec01_20200310.flv|rec01 audio-video (.flv)]] | **[P]** Chpt. 1-8 {{ :mds:smd:2018smdrmath.r | scriptMath.R}}| | |
| |6| 11.03 9:00-11:00 | Distance-learning| Expectation and variance. R data access. [[http://apa.di.unipi.it/smd/video/rec02_20200311.flv|rec02 audio-video (.flv)]] | **[T]** Chpt. 7 **[R]** Chpt. 2.4 {{ :mds:smd:2019smdr4.r | script4.R}} | | |
| |7| 17.03 16:00-18:00 | Distance-learning | R programming. Project presentation. [[http://apa.di.unipi.it/smd/video/rec03_20200317.flv|rec03 audio-video (.flv)]] and [[http://apa.di.unipi.it/smd/video/2020_project.flv|project info audio-video (.flv)]] | **[R]** Chpt. 2.3 {{ :mds:smd:r_intro_exercise.r | exercise.R}} {{ :mds:smd:2019smdr5.zip | script5.zip}} | | |
| |8| 18.03 9:00-11:00 | Distance-learning | Project presentation. Power laws and Zipf laws. [[http://apa.di.unipi.it/smd/video/rec04_20200318.flv|rec04 audio-video (.flv)]] | [[https://arxiv.org/pdf/cond-mat/0412004.pdf | Newman's paper]] Sect I, II, III(A,B,E,F) {{ :mds:smd:2019smdr6.r | script6.R}} | | |
| |9| 24.03 16:00-18:00 | Distance-learning | Computations with random variables. Joint distributions. [[http://apa.di.unipi.it/smd/video/rec05_20200324.flv|rec05 audio-video (.flv)]] | **[T]** Chpts. 8-9 {{ :mds:smd:2019smdr7.zip | script7.zip}} | | |
| |10| 25.03 9:00-11:00 | Distance-learning | Covariance. Sum of random variables. [[http://apa.di.unipi.it/smd/video/rec06_20200325.flv|rec06 audio-video (.flv)]] | **[T]** Chpts. 10-11 {{ :mds:smd:2019smdr8.r | script8.R}} | | |
| |11| 31.03 16:00-18:00 | Distance-learning | Law of large numbers. The central limit theorem. [[http://apa.di.unipi.it/smd/video/rec07_20200331.flv|rec07 audio-video (.flv)]] | **[T]** Chpts. 13-14 {{ :mds:smd:2019smdr9.r | script9.R}} | | |
| |12| 1.04 9:00-11:00 | Distance-learning | Graphical summaries. [[http://apa.di.unipi.it/smd/video/rec08_20200401.flv|rec08 audio-video (.flv)]] | **[T]** Chpt. 15 {{ :mds:smd:2019smdr10.r | script10.R}} | | |
| |13| 7.04 16:00-18:00 | Distance-learning | Numerical summaries. Data preprocessing in R. Q&A on the project. [[http://apa.di.unipi.it/smd/video/rec09_20200407.flv|rec09 audio-video (.flv)]], [[http://apa.di.unipi.it/smd/video/2020_project_data.flv|project data audio-video (.flv)]] | **[T]** Chpt. 16, **[R]** Chpts. 4,10 {{ :mds:smd:2019smdr11.r | script11.R}}, {{ :mds:smd:dataprep.r | dataprep.R}} | | |
| |14| 8.04 9:00-11:00 | Distance-learning | Unbiased estimators. Efficiency and MSE. [[http://apa.di.unipi.it/smd/video/rec10_20200408.flv|rec10 audio-video (.flv)]] | **[T]** Chpts. 17.1-17.3, 19, 20 {{ :mds:smd:2019smdr12.r | script12.R}} | | |
| |<del>XX</del>| <del>15.04 9:00-11:00</del> | | No lesson on this date. Students work on the project on their own. | | | |
| |15| 21.04 16:00-18:00 | Distance-learning | Maximum likelihood. Fisher information.[[http://apa.di.unipi.it/smd/video/rec11_20200421.flv|rec11 audio-video (.flv)]] | **[T]** Chpt. 21 {{ :mds:smd:notes1.pdf |}} {{ :mds:smd:2019smdr13.r | script13.R}} | | |
| |16| 22.04 9:00-11:00 | Distance-learning | Simple linear and polynomial regression. Least squares. [[http://apa.di.unipi.it/smd/video/rec12_20200422.flv|rec12 audio-video (.flv)]] | **[T]** Chpts. 17.4,22 **[R]** Chpts. 6,12.1 {{ :mds:smd:2019smdr14.r | script14.R}} | | |
| |17| 28.04 16:00-18:00 | Distance-learning | Multiple, non-linear, and logistic regression. [[http://apa.di.unipi.it/smd/video/rec13_20200428.flv|rec13 audio-video (.flv)]] | **[R]** Chpt. 13,16.1-16.2 {{ :mds:smd:notes2.pdf |}} {{ :mds:smd:2019smdr15.r | script15.R}} | | |
| |18| 29.04 9:00-11:00 | Distance-learning | Confidence intervals: Gaussian, T-student, large sample method. [[http://apa.di.unipi.it/smd/video/rec14_20200429.flv|rec14 audio-video (.flv)]] | **[T]** Chpts. 23.1,23.2,23.4, 24.3,24.4 {{ :mds:smd:2019smdr16.r | script16.R}} | | |
| |19| 05.05 16:00-18:00 | Distance-learning | Confidence intervals in linear regression. Empirical bootstrap. Application to confidence intervals. [[http://apa.di.unipi.it/smd/video/rec15_20200505.flv|rec15 audio-video (.flv)]] | **[T]** Chpts. 18.1,18.2,23.3 {{ :mds:smd:notes2.pdf |}} {{ :mds:smd:2019smdr17.r | script17.R}} | | |
| |20| 06.05 9:00-11:00 | Distance-learning | Parametric bootstrap. Hypotheses testing. [[http://apa.di.unipi.it/smd/video/rec16_20200506.flv|rec16 audio-video (.flv)]] | **[T]** Chpts. 18.3,25 {{ :mds:smd:2019smdr18.r | script18.R}} | | |
| |21| 12.05 16:00-18:00 | Distance-learning | One-sample t-test and application to linear regression. [[http://apa.di.unipi.it/smd/video/rec17_20200512.flv|rec17 audio-video (.flv)]] | **[T]** Chpts. 26-27, **[R]** Chpts. 5.1,5.2 {{ :mds:smd:notes2.pdf |}} {{ :mds:smd:2019smdr19.r | script19.R}} | | |
| |22| 13.05 9:00-11:00 | Distance-learning | Goodness of fit: chi-square, K-S. Fitting power laws. [[http://apa.di.unipi.it/smd/video/rec18_20200513.flv|rec18 audio-video (.flv)]] | {{ :mds:smd:ks.pdf | K-S}} {{ :mds:smd:2019smdr20.r | script20.R}} | | |
| |<del>XX</del>| <del>19.05 16:00-18:00</del> | | No lesson on this date. Students work on the project on their own. | | | |
| |23| 20.05 9:00-11:00 | Distance-learning| Hypotheses testing: F-test, comparing two samples. [[http://apa.di.unipi.it/smd/video/rec19_20200520.flv|rec19 audio-video (.flv)]] | **[T]** Chpts. 28, **[R]** Chpts. 5.3-5.7 {{ :mds:smd:2019smdr21.r | script21.R}} | | |
| |<del>XX</del>| <del>26.05 16:00-18:00</del> | | No lesson on this date. Students work on the project on their own. | | | |
| |24| 27.05 9:00-11:00 | Distance-learning | Project tutoring. [[http://apa.di.unipi.it/smd/video/rec20_20200527.flv|rec20 audio-video (.flv)]] | | | |
| |
| |
| =====Previous years===== | |
| |
| * [[mds:smd:2019|Statistical Methods for Data Science A.Y. 2018/19]] | |
| * [[mds:smd:2018|Statistical Methods for Data Science A.Y. 2017/18]] | |
| * [[mds:smd:2017|Statistical Methods for Data Science A.Y. 2016/17]] | |
| |
| |