DidaWiki

Indice

SPM 2017 recovery edition lessons
- Sep. 19
- Sep. 26
- Oct. 3
- Oct 10
- Oct 17 (morning)
- Oct 17 (afternoon)
- Oct 24
- Nov 7
- 14 Nov
- Nov 21
- Nov 28
  - Afternoon lesson
- December 12
- End of lessons

SPM 2017 recovery edition lessons

Sep. 19

2h: Introduction to the course, structure of the lessons, hardware evolution and the urgency of parallel/distributed computing dictated by parallel hw (multicore CPUs, GPUs, clusters).

Assignment: look at top500.org and green500.org and try to figure out evolution of parallel machines through the “statistics” graphs in the lists.

Sep. 26

2h: Examples of parallel execution of different kind of real life tasks:

translating a book
counting the euros in pockets of people in a room
assembling an object out of pieces p1, …, pn to be assembled in order

How the tasks may be executed by a single person and by number of cooperating persons. Different way of executing tasks in parallel. Effect of “slow” persons recruited to execute a task. Figuring out general principles out of this: patterns, mechanisms, overheads, …

Blackboard

Oct. 3

2h: Finding concurrency aspects:

concurrent activities
decomposition strategies
computation grain
concurrent activity graph
maximum parallelism degree and minimum time

2h: Mechanisms to implement parallel computations on multicores

C++ threads: creation, join, usage of shared memory with locks and condition variables, async tasks
Cilk: cilk_spawn and cilk_sync, sample code

(apologize I miss the last period of the lesson of the afternoon, due to my mistake)

BlackBoard1 BlackBoard2

Assigment: write a program using Cilk and C++ threads “translating” a book in parallel, where:

the input book is a char[]
the translation consist in capitalizing all letters (e.g. “Ciao, come va?” → “CIAO, COME VA?”)

Oct 10

2h: Time and parallelism related measures

how to measure time (std::chrono, time (Unix) command
Speedup, scalability and efficiency
commands to filter benchmark output (script, grep, awk, sort, at)

Blackboard

Assignment: write a program using C++ threads counting word occurrences in a text.

Oct 17 (morning)

2h: Access to lab machines. More on mechanisms in C++ for concurrency.

see also this link or this link

Blackboard

Oct 17 (afternoon)

2h: overheads and counter measures

Blackboard

Assignment: look for overheads in translate book and word count assignments.

Oct 24

2h: more on overheads (NUMA memory, false sharing). Introduction to vectorization (with sample code). Tools for checking sequential/vector code performance.

Blackboard

Assignment (1): single Jacobi iteration (as in the blackboard PDF). Create randomly generated matrix A and vectors B and X and run a single iteration of the Jacobi method verifying that the code is properly vectorized and measuring the increase in speed due to vectorization (optimization of the sequential code in general)

Assignment (2): consider a large matrix (4-8K rows/columns at least) and measure the improvement w.r.t. non optimized sequential computation of a single Jacobi iteration

when only vectorization is used
when the problem is parallelized using C++11 threads (only)
when you use threads (coarse grain parallelisation) and vectorization (fine grain parallelisation)

Nov 7

2h: Introduction to parallel patterns/skeletons

using OpenMP to implement parallel for (example of simple data parallel pattern) (Summary of 4.0, Old Tutorial slides). OpenMP is available in g++ and in icc by specifying the flag -fopenmp.

using GrPPI as high level programming model for skeletons (needs c++14)

Blackboard

Assignment

read the Chapter on skeletons and the one on design patterns in the Course notes book (pdf on the web site, Chap. 3 and 6)
try to re-write the initial assignments using OpenMP parallel for
if you succeed with openMP try to use GrPPI for the same assignments

14 Nov

2h:

Sample code

code with different implementation of a “map” …

Parallel design patterns and algorithmic skeletons

concepts, similarities and differences
Usage of performance models and rewriting rules (this is the reading key for the chapter 10 (sections 10.1 to 10.3)

Blackboard

Assignment:

implement a divide and conquer computation using c++ threads. The computation should look for a word in a text. Recursively divide the text up to the point you either get something of the a given length (parameter), and in this case you look for the string sequentially. Return an integer value counting the occurrences of the string. (sub assignment: consider programming the solution in such a way the code for the divide and conquer strategy may be used to program another divide and conquer application)

Nov 21

2h

data parallel patterns (map, reduce, google map reduce, stencil, stencil-reduce)
implementation of pattern frameworks (supporting composition)
- template based
- data flow based

PDF

Assignments

read Chapter 4 of the notes
implement the word count with your own google mapreduce in C++ (use threads to compute in parallel the map phase and again threads to compute in parallel the reduce phase)

Nov 28

2h

FastFlow programming framework: Introduction, core patterns.
documentation may be found at calvados.di.unipi.it
download code (via SVN!!) at SourceForge

PDF

Assignments

try to write a FF pipeline (with core patterns) that implements the Sieve of Eratostene: one stage emits the stream of integers and stage i filters out all the numbers multiple of i. The last stage prints the results. Try generating 20 integers and using k<20 filter stages.

Afternoon lesson

by David Del Rio Astorga: 1h

Introduction to GrPPI (slides sent as PDF to the course mailing list).

December 12

Morning: 2h

methodology to develop parallel code with structured programming
final project presentation (see the exams page)

H1 H2

Afternoon: 2h (By Massimo Torquati)

high level usage of core FastFlow patterns
Slides
Additional code files

Indice

SPM 2017 recovery edition lessons

Sep. 19

Sep. 26

Oct. 3

Oct 10

Oct 17 (morning)

Oct 17 (afternoon)

Oct 24

Nov 7

14 Nov

Nov 21

Nov 28

Afternoon lesson

December 12

End of lessons