===== SPM 2017 recovery edition lessons =====

==== Sep. 19 ====
2h: Introduction to the course, structure of the lessons, hardware evolution and the urgency of parallel/distributed computing dictated by parallel hw (multicore CPUs, GPUs, clusters). 
----
Assignment: look at top500.org and green500.org and try to figure out evolution of parallel machines through the "statistics" graphs in the lists. 

==== Sep. 26 ====
2h: Examples of parallel execution of different kind of real life tasks:
  * translating a book
  * counting the euros in pockets of people in a room
  * assembling an object out of pieces p1, ..., pn to be assembled in order 
How the tasks may be executed by a single person and by number of cooperating persons.
Different way of executing tasks in parallel. 
Effect of "slow" persons recruited to execute a task. 
Figuring out general principles out of this: patterns, mechanisms, overheads, ... 

----
{{ :magistraleinformaticanetworking:spm:spm26seta.mp4 |H1}}  {{ :magistraleinformaticanetworking:spm:spm26seb.mp4 |H2}} {{ :magistraleinformaticanetworking:spm:spm26set.pdf |Blackboard}}

==== Oct. 3 ==== 
2h: Finding concurrency aspects:
  * concurrent activities
  * decomposition strategies
  * computation grain
  * concurrent activity graph
  * maximum parallelism degree and minimum time 

2h: Mechanisms to implement parallel computations on multicores
  * C++ threads: creation, join, usage of shared memory with locks and condition variables, async tasks
    * [[http://en.cppreference.com/w/cpp/thread/thread|std::thread]]
    * [[http://en.cppreference.com/w/cpp/thread/async|std::async]]
    * [[https://www.classes.cs.uchicago.edu/archive/2013/spring/12300-1/labs/lab6/|mini tutorial]]
  * Cilk: cilk_spawn and cilk_sync, sample code
    * [[https://www.cilkplus.org/|Home page]]
    * [[https://www.cilkplus.org/tutorial-cilk-plus-keywords|cilk_spwan and cilk_sync]]
    * [[http://faculty.knox.edu/dbunde/teaching/cilk/|mini tutorial]]

{{ :magistraleinformaticanetworking:spm:spm3otta.mp4 |H1}}{{ :magistraleinformaticanetworking:spm:spm3ottb.mp4 |H2}}{{ :magistraleinformaticanetworking:spm:spm3ottd.mp4 |H3}}
(apologize I miss the last period of the lesson of the afternoon, due to my mistake)

{{ :magistraleinformaticanetworking:spm:spm3ott17a.pdf |BlackBoard1}}{{ :magistraleinformaticanetworking:spm:spm3ott17b.pdf |BlackBoard2}}

**Assigment**: write a program using Cilk and C++ threads "translating"  a book in parallel, where:
  * the input book is a //char[]// 
  * the translation consist in capitalizing all letters (e.g. "Ciao, come va?" -> "CIAO, COME VA?") 

==== Oct 10 ====

2h: Time and parallelism related measures
  * how to measure time (std::chrono, time (Unix) command 
  * Speedup, scalability and efficiency
  * commands to filter benchmark output (script, grep, awk, sort, at)

{{ :magistraleinformaticanetworking:spm:spm10aott.mp4 |H1}}
{{ :magistraleinformaticanetworking:spm:spm10bott.mp4 |H2}}

{{ :magistraleinformaticanetworking:spm:spm17ott10.pdf |Blackboard}}

**Assignment**: write a program using C++ threads counting word occurrences in a text.

==== Oct 17 (morning) ====

2h: Access to lab machines. More on mechanisms in C++ for concurrency. 
  * see also {{http://www.olsensoft.com/content/misc/C++11Threads.pdf|this link}} or {{http://home.deib.polimi.it/fornacia/lib/exe/fetch.php?media=teaching:aos:2015:aos201516_multithreading_cpp.pdf|this link}}

{{ :magistraleinformaticanetworking:spm:spm17aott.mp4 |H1}}
{{ :magistraleinformaticanetworking:spm:spm17bott.mp4 |H2}}

{{ :magistraleinformaticanetworking:spm:spm1710matt.pdf |Blackboard}}

==== Oct 17 (afternoon) ====
2h: overheads and counter measures

{{ :magistraleinformaticanetworking:spm:spm17cott.mp4 |H3-4}}

{{ :magistraleinformaticanetworking:spm:spm1710pom.pdf |Blackboard}}

**Assignment**: look for overheads in translate book and word count assignments.

==== Oct 24 ==== 

2h: more on overheads (NUMA memory, false sharing). 
Introduction to vectorization (with sample code). Tools for checking sequential/vector code performance. 

  * [[https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf|Vectorization guide Intel]]
  * [[http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/Getting_Started.pdf|Vtune Getting Started]]
  * [[https://software.intel.com/en-us/get-started-with-vtune-linux-os|Vtune start page]]

{{ :magistraleinformaticanetworking:spm:spm24otta.mp4 |H1 e 2}}

{{ :magistraleinformaticanetworking:spm:spm24otta.pdf |Blackboard}}

**Assignment (1)**: single Jacobi iteration (as in the blackboard PDF). Create randomly generated matrix A and vectors B and X and run a single iteration of the Jacobi method verifying that the code is properly vectorized and measuring the increase in speed due to vectorization (optimization of the sequential code in general)

**Assignment (2)**: consider a large matrix (4-8K rows/columns at least) and measure the improvement w.r.t. non optimized sequential computation of a single Jacobi iteration 
  * when only vectorization is used
  * when the problem is parallelized using C++11 threads (only) 
  * when you use threads (coarse grain parallelisation) and vectorization (fine grain parallelisation)


==== Nov 7 ====

2h: Introduction to parallel patterns/skeletons

  * using [[http://www.openmp.org/|OpenMP]] to implement parallel for (example of simple data parallel pattern) ([[http://www.openmp.org/wp-content/uploads/OpenMP-4.0-C.pdf|Summary of 4.0]], Old [[http://www.openmp.org/wp-content/uploads/omp-hands-on-SC08.pdf|Tutorial]] slides). OpenMP is available in g++ and in icc by specifying the flag **-fopenmp**. 


{{ :magistraleinformaticanetworking:spm:spm7nova.mp4 |H1}}


  * using  [[https://github.com/arcosuc3m/grppi|GrPPI]] as high level programming model for skeletons (needs c++14)

{{ :magistraleinformaticanetworking:spm:spm07novb.mp4 |H2}}

{{ :magistraleinformaticanetworking:spm:spm7nov.pdf |Blackboard}}

**Assignment**
  * read the Chapter on skeletons and the one on design patterns in the Course notes book (pdf on the web site, Chap. 3 and 6)
  * try to re-write the initial assignments using OpenMP parallel for 
  * if you succeed with openMP try to use GrPPI for the same assignments

==== 14 Nov ====
2h: 

Sample code
  * [[traduttorecpp|code]] with different implementation of a "map" ... 

Parallel design patterns and algorithmic skeletons
  * concepts, similarities and differences
  * Usage of performance models and rewriting rules (this is the reading key for the chapter 10 (sections 10.1 to 10.3) 

{{ :magistraleinformaticanetworking:spm:spm14nova.mp4 |H1}}
{{ :magistraleinformaticanetworking:spm:spm14novb.mp4 |H2}}

{{ :magistraleinformaticanetworking:spm:spm17_14nov.pdf |Blackboard}}

**Assignment**: 
  * implement a divide and conquer computation using c++ threads. The computation should look for a word in a text. Recursively divide the text up to the point you either get something of the a given length (parameter), and in this case you look for the string sequentially. Return an integer value counting the occurrences of the string.  (//sub assignment: consider programming the solution in such a  way the code for the divide and conquer strategy may be used to program another divide and conquer application//)


==== Nov 21 ====
2h

  * data parallel patterns (map, reduce, google map reduce, stencil, stencil-reduce)
  * implementation of pattern frameworks (supporting composition) 
    * template based 
    * data flow based

{{ :magistraleinformaticanetworking:spm:spm21nova.mp4 |H1}}
{{ :magistraleinformaticanetworking:spm:spm21novb.mp4 |H2}}

{{ :magistraleinformaticanetworking:spm:spm21no17.pdf |PDF}}

**Assignments**
  * read Chapter 4 of the notes
  * implement the word count with your own google mapreduce in C++ (//use threads to compute in parallel the map phase and again threads to compute in parallel the reduce phase//)

==== Nov 28 ====
2h

  * FastFlow programming framework: Introduction, core patterns. 
  * documentation may be found at [[http://calvados.di.unipi.it/fastflow|calvados.di.unipi.it]]
  * download code (via SVN!!) at [[https://sourceforge.net/projects/mc-fastflow/|SourceForge]]

{{ :magistraleinformaticanetworking:spm:spm28nova.mp4 |H1}}
{{ :magistraleinformaticanetworking:spm:spm28novb.mp4 |H2}}
{{ :magistraleinformaticanetworking:spm:spm28-17.pdf |PDF}}

**Assignments**
  * try to write a FF pipeline (with core patterns) that implements the Sieve of Eratostene: one stage emits the stream of integers and stage i filters out all the numbers multiple of i. The last stage prints the results. Try generating 20 integers and using k<20 filter stages.

=== Afternoon lesson === 
by David Del Rio Astorga: 1h

  * Introduction to GrPPI (slides sent as PDF to the course mailing list). 

==== December 12 ====

Morning:  2h

  * methodology to develop parallel code with structured programming
  * final project presentation (see the [[http://didawiki.di.unipi.it/doku.php/magistraleinformaticanetworking/spm/esamirec17#program|exams]] page)

{{ :magistraleinformaticanetworking:spm:spm12dic17a1.mp4 |Video1}}{{ :magistraleinformaticanetworking:spm:spm12dic17a2.mp4 |Video2}}

{{ :magistraleinformaticanetworking:spm:spm12dic17a1.pdf |H1}}{{ :magistraleinformaticanetworking:spm:spm12dic17a2.pdf |H2}}

Afternoon: 2h (By Massimo Torquati)
  * high level usage of core FastFlow patterns
  * {{ :magistraleinformaticanetworking:spm:fastflow_overview_spm12dic.pdf |Slides}}
  * {{ :magistraleinformaticanetworking:spm:farmsquare.tgz |Additional code files}}

==== End of lessons ====