===== SPM 2017 recovery edition lessons ===== ==== Sep. 19 ==== 2h: Introduction to the course, structure of the lessons, hardware evolution and the urgency of parallel/distributed computing dictated by parallel hw (multicore CPUs, GPUs, clusters). ---- Assignment: look at top500.org and green500.org and try to figure out evolution of parallel machines through the "statistics" graphs in the lists. ==== Sep. 26 ==== 2h: Examples of parallel execution of different kind of real life tasks: * translating a book * counting the euros in pockets of people in a room * assembling an object out of pieces p1, ..., pn to be assembled in order How the tasks may be executed by a single person and by number of cooperating persons. Different way of executing tasks in parallel. Effect of "slow" persons recruited to execute a task. Figuring out general principles out of this: patterns, mechanisms, overheads, ... ---- {{ :magistraleinformaticanetworking:spm:spm26seta.mp4 |H1}} {{ :magistraleinformaticanetworking:spm:spm26seb.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm26set.pdf |Blackboard}} ==== Oct. 3 ==== 2h: Finding concurrency aspects: * concurrent activities * decomposition strategies * computation grain * concurrent activity graph * maximum parallelism degree and minimum time 2h: Mechanisms to implement parallel computations on multicores * C++ threads: creation, join, usage of shared memory with locks and condition variables, async tasks * [[http://en.cppreference.com/w/cpp/thread/thread|std::thread]] * [[http://en.cppreference.com/w/cpp/thread/async|std::async]] * [[https://www.classes.cs.uchicago.edu/archive/2013/spring/12300-1/labs/lab6/|mini tutorial]] * Cilk: cilk_spawn and cilk_sync, sample code * [[https://www.cilkplus.org/|Home page]] * [[https://www.cilkplus.org/tutorial-cilk-plus-keywords|cilk_spwan and cilk_sync]] * [[http://faculty.knox.edu/dbunde/teaching/cilk/|mini tutorial]] {{ :magistraleinformaticanetworking:spm:spm3otta.mp4 |H1}}{{ :magistraleinformaticanetworking:spm:spm3ottb.mp4 |H2}}{{ :magistraleinformaticanetworking:spm:spm3ottd.mp4 |H3}} (apologize I miss the last period of the lesson of the afternoon, due to my mistake) {{ :magistraleinformaticanetworking:spm:spm3ott17a.pdf |BlackBoard1}}{{ :magistraleinformaticanetworking:spm:spm3ott17b.pdf |BlackBoard2}} **Assigment**: write a program using Cilk and C++ threads "translating" a book in parallel, where: * the input book is a //char[]// * the translation consist in capitalizing all letters (e.g. "Ciao, come va?" -> "CIAO, COME VA?") ==== Oct 10 ==== 2h: Time and parallelism related measures * how to measure time (std::chrono, time (Unix) command * Speedup, scalability and efficiency * commands to filter benchmark output (script, grep, awk, sort, at) {{ :magistraleinformaticanetworking:spm:spm10aott.mp4 |H1}} {{ :magistraleinformaticanetworking:spm:spm10bott.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm17ott10.pdf |Blackboard}} **Assignment**: write a program using C++ threads counting word occurrences in a text. ==== Oct 17 (morning) ==== 2h: Access to lab machines. More on mechanisms in C++ for concurrency. * see also {{http://www.olsensoft.com/content/misc/C++11Threads.pdf|this link}} or {{http://home.deib.polimi.it/fornacia/lib/exe/fetch.php?media=teaching:aos:2015:aos201516_multithreading_cpp.pdf|this link}} {{ :magistraleinformaticanetworking:spm:spm17aott.mp4 |H1}} {{ :magistraleinformaticanetworking:spm:spm17bott.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm1710matt.pdf |Blackboard}} ==== Oct 17 (afternoon) ==== 2h: overheads and counter measures {{ :magistraleinformaticanetworking:spm:spm17cott.mp4 |H3-4}} {{ :magistraleinformaticanetworking:spm:spm1710pom.pdf |Blackboard}} **Assignment**: look for overheads in translate book and word count assignments. ==== Oct 24 ==== 2h: more on overheads (NUMA memory, false sharing). Introduction to vectorization (with sample code). Tools for checking sequential/vector code performance. * [[https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf|Vectorization guide Intel]] * [[http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/Getting_Started.pdf|Vtune Getting Started]] * [[https://software.intel.com/en-us/get-started-with-vtune-linux-os|Vtune start page]] {{ :magistraleinformaticanetworking:spm:spm24otta.mp4 |H1 e 2}} {{ :magistraleinformaticanetworking:spm:spm24otta.pdf |Blackboard}} **Assignment (1)**: single Jacobi iteration (as in the blackboard PDF). Create randomly generated matrix A and vectors B and X and run a single iteration of the Jacobi method verifying that the code is properly vectorized and measuring the increase in speed due to vectorization (optimization of the sequential code in general) **Assignment (2)**: consider a large matrix (4-8K rows/columns at least) and measure the improvement w.r.t. non optimized sequential computation of a single Jacobi iteration * when only vectorization is used * when the problem is parallelized using C++11 threads (only) * when you use threads (coarse grain parallelisation) and vectorization (fine grain parallelisation) ==== Nov 7 ==== 2h: Introduction to parallel patterns/skeletons * using [[http://www.openmp.org/|OpenMP]] to implement parallel for (example of simple data parallel pattern) ([[http://www.openmp.org/wp-content/uploads/OpenMP-4.0-C.pdf|Summary of 4.0]], Old [[http://www.openmp.org/wp-content/uploads/omp-hands-on-SC08.pdf|Tutorial]] slides). OpenMP is available in g++ and in icc by specifying the flag **-fopenmp**. {{ :magistraleinformaticanetworking:spm:spm7nova.mp4 |H1}} * using [[https://github.com/arcosuc3m/grppi|GrPPI]] as high level programming model for skeletons (needs c++14) {{ :magistraleinformaticanetworking:spm:spm07novb.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm7nov.pdf |Blackboard}} **Assignment** * read the Chapter on skeletons and the one on design patterns in the Course notes book (pdf on the web site, Chap. 3 and 6) * try to re-write the initial assignments using OpenMP parallel for * if you succeed with openMP try to use GrPPI for the same assignments ==== 14 Nov ==== 2h: Sample code * [[traduttorecpp|code]] with different implementation of a "map" ... Parallel design patterns and algorithmic skeletons * concepts, similarities and differences * Usage of performance models and rewriting rules (this is the reading key for the chapter 10 (sections 10.1 to 10.3) {{ :magistraleinformaticanetworking:spm:spm14nova.mp4 |H1}} {{ :magistraleinformaticanetworking:spm:spm14novb.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm17_14nov.pdf |Blackboard}} **Assignment**: * implement a divide and conquer computation using c++ threads. The computation should look for a word in a text. Recursively divide the text up to the point you either get something of the a given length (parameter), and in this case you look for the string sequentially. Return an integer value counting the occurrences of the string. (//sub assignment: consider programming the solution in such a way the code for the divide and conquer strategy may be used to program another divide and conquer application//) ==== Nov 21 ==== 2h * data parallel patterns (map, reduce, google map reduce, stencil, stencil-reduce) * implementation of pattern frameworks (supporting composition) * template based * data flow based {{ :magistraleinformaticanetworking:spm:spm21nova.mp4 |H1}} {{ :magistraleinformaticanetworking:spm:spm21novb.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm21no17.pdf |PDF}} **Assignments** * read Chapter 4 of the notes * implement the word count with your own google mapreduce in C++ (//use threads to compute in parallel the map phase and again threads to compute in parallel the reduce phase//) ==== Nov 28 ==== 2h * FastFlow programming framework: Introduction, core patterns. * documentation may be found at [[http://calvados.di.unipi.it/fastflow|calvados.di.unipi.it]] * download code (via SVN!!) at [[https://sourceforge.net/projects/mc-fastflow/|SourceForge]] {{ :magistraleinformaticanetworking:spm:spm28nova.mp4 |H1}} {{ :magistraleinformaticanetworking:spm:spm28novb.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm28-17.pdf |PDF}} **Assignments** * try to write a FF pipeline (with core patterns) that implements the Sieve of Eratostene: one stage emits the stream of integers and stage i filters out all the numbers multiple of i. The last stage prints the results. Try generating 20 integers and using k<20 filter stages. === Afternoon lesson === by David Del Rio Astorga: 1h * Introduction to GrPPI (slides sent as PDF to the course mailing list). ==== December 12 ==== Morning: 2h * methodology to develop parallel code with structured programming * final project presentation (see the [[http://didawiki.di.unipi.it/doku.php/magistraleinformaticanetworking/spm/esamirec17#program|exams]] page) {{ :magistraleinformaticanetworking:spm:spm12dic17a1.mp4 |Video1}}{{ :magistraleinformaticanetworking:spm:spm12dic17a2.mp4 |Video2}} {{ :magistraleinformaticanetworking:spm:spm12dic17a1.pdf |H1}}{{ :magistraleinformaticanetworking:spm:spm12dic17a2.pdf |H2}} Afternoon: 2h (By Massimo Torquati) * high level usage of core FastFlow patterns * {{ :magistraleinformaticanetworking:spm:fastflow_overview_spm12dic.pdf |Slides}} * {{ :magistraleinformaticanetworking:spm:farmsquare.tgz |Additional code files}} ==== End of lessons ====