DidaWiki

Journal of Lessons, SPD year 2014-2015

Journal

24/02/2015 Course introduction : parallel programming frameworks and high-level approach to parallel programming over different platforms: MPI, TBB and OpenCL as main examples; course organization and prerequisites; reference books and studying material. MPI (Message Passing Interface) standard : brief history and aim of the standard, single program / multiple data execution model, compilation and linkage model; issues in supporting multiple programming languages and uses (application, utility library and programming language support) with a static compilation and linkage approach. Portability in parallel programming: functional and non-functional aspects, performance tuning and performance debugging. MPI as a parallel framework that supports a structured approach to parallel programming. Basic concepts of MPI.
25/02/2015 MPI basic concepts : communicators (definition, purpose, difference between inter and intra-communicators, process ranks); point to point communication (concepts of envelope, local/global completion, blocking/non-blocking primitive, send modes); collective communications (definition, communication scope, global serialization, freedom of implementation in the standard); MPI datatypes (basic meaning and use, primitive / derived datatypes, relationship with sequential language types).
03/03/2015 MPI : point to point communication semantics (buffer behaviour, receive, wildcards, status objects), MPI datatypes (purpose as explicitly defined meta-data provided to the MPI implementation, semantics, typemap and type signature, matching rules for communication, possible data conversions in heterogeneous distributed architectures, role in MPI-performed packing and unpacking, examples); basic and derived datatypes (multiple language bindings, code-instantiated metadata, examples)
04/03/2015 MPI : core primitives for datatype creation (MPI_Type_* : contiguous, vector, hvector, indexed, hindexed, struct, commit, free) and examples; point to point communication modes (MPI_Rsend usage); non-blocking communication (Wait and Test group of primitives, semantics, MPI_Request object handles to active requests); communicators and groups (communicator design aim and programming abstraction, local and global information, attributes and virtual topologies, groups as local objects, primitives for locally creating and managing groups)
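As an illustration of the typemap produced by a vector-style derived datatype, the following is a minimal sketch in plain C++ (no MPI calls). It assumes the usual layout of `count` blocks of `blocklength` contiguous elements whose starts are `stride` elements apart; the helper name `vector_displacements` and its `extent` parameter are hypothetical and only serve this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an MPI primitive: lists the byte displacements of
// the elements selected by a vector-style layout. `extent` is the size in
// bytes of one element of the base type.
std::vector<std::ptrdiff_t> vector_displacements(int count, int blocklength,
                                                 int stride,
                                                 std::ptrdiff_t extent) {
    assert(count >= 0 && blocklength >= 0 && extent > 0);
    std::vector<std::ptrdiff_t> disp;
    for (int b = 0; b < count; ++b) {            // one iteration per block
        for (int e = 0; e < blocklength; ++e) {  // elements inside the block
            disp.push_back((static_cast<std::ptrdiff_t>(b) * stride + e) * extent);
        }
    }
    return disp;
}
```

For instance, one column of a 4x4 row-major matrix of 4-byte integers corresponds to `vector_displacements(4, 1, 4, 4)`, which yields the displacements 0, 16, 32 and 48.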
~~10/03/2015~~ Lesson is postponed to 12/03/2015 (see NEWS on the course page).
11/03/2015 MPI : canceling and testing cancellation of non-blocking primitives (issues and pitfalls, interaction with MPI implementation); MPI_Finalize; intracommunicators (basic primitives concerning size, rank, comparison; communicator creation as a collective operation, MPI_Comm_create basic and general case, MPI_Comm_split; rationale of use of communicator creation and duplication as a tool to create isolated communication spaces for subprograms/skeletons and for parallel application libraries; impact on performance)
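The way MPI_Comm_split assigns ranks can be sketched sequentially: processes with the same color end up in the same new communicator, ordered by key, with ties broken by the rank in the old communicator. The code below is plain C++ and only models that rule; the function name `split_ranks` is hypothetical, and the MPI_UNDEFINED color case is deliberately not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model of the rank assignment done by a communicator split.
// color[r] and key[r] are the arguments passed by old rank r; the result
// holds, for each old rank, its rank inside the new communicator.
std::vector<int> split_ranks(const std::vector<int>& color,
                             const std::vector<int>& key) {
    assert(color.size() == key.size());
    std::map<int, std::vector<int>> groups;  // color -> old ranks, ascending
    for (std::size_t r = 0; r < color.size(); ++r) {
        groups[color[r]].push_back(static_cast<int>(r));
    }
    std::vector<int> new_rank(color.size());
    for (auto& entry : groups) {
        std::vector<int>& members = entry.second;
        // Stable sort by key keeps old-rank order among equal keys.
        std::stable_sort(members.begin(), members.end(),
                         [&key](int a, int b) { return key[a] < key[b]; });
        for (std::size_t i = 0; i < members.size(); ++i) {
            new_rank[members[i]] = static_cast<int>(i);
        }
    }
    return new_rank;
}
```

With colors {0, 1, 0, 1} and all keys equal, old ranks 0 and 2 become ranks 0 and 1 of one communicator, and old ranks 1 and 3 become ranks 0 and 1 of the other.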
12/03/2015 MPI Lab. Writing structured MPI programs, point to point communications (ping pong, token ring and variants) and how to write reusable code by exploiting communicators, first examples with derived datatypes (indexed, vector and their combinations).
17/03/2015 MPI Lab. Design and implementation of a simple farm skeleton in MPI. Reusability and separation of concerns in MPI; exploiting communicators for skeleton and inside skeleton implementation (examples with farm extensions: worker initialization, worker status collection at the end of the stream, work stealing); different communication primitives (Synch/Buffered and Blocking/non blocking) wrt farm load distribution strategies: Round Robin, Job request, implicit job request with double buffering.
18/03/2015 MPI collective communications (definition and semantics, execution environment, basic features, agreement of key parameters among the processes, constraints on Datatypes and typemaps for collective op.s, overall serialization vs synchronization, potential deadlocks); taxonomy of MPI collectives (blocking/non-blocking, synchronization/communication/communication+computation, asymmetry of the communication pattern, variable size versions, all- versions); MPI_IN_PLACE and collective op.s; basic blocking collective operations (barrier, broadcast, gather/gatherv, scatter/scatterv, allgather, alltoall/alltoallv).
24/03/2015 MPI Compute and communication collectives, MPI_Reduce, semantics; MPI Operators (arithmetic, logic, bitwise, MINLOC and MAXLOC) and their interaction with Datatypes; defining MPI custom operators via MPI_Op_create.
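The MINLOC operator works on (value, index) pairs: it keeps the smaller value and, on equal values, the smaller index. The following plain C++ sketch models that combination rule and a reduction over per-process contributions; the names `ValueIndex`, `minloc` and `reduce_minloc` are hypothetical and no MPI call is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// (value, index) pair, mirroring the pair datatypes used with MINLOC.
using ValueIndex = std::pair<double, int>;

// Combination rule: smaller value wins; on a tie, the smaller index wins.
ValueIndex minloc(const ValueIndex& a, const ValueIndex& b) {
    if (a.first < b.first) return a;
    if (b.first < a.first) return b;
    return (a.second <= b.second) ? a : b;
}

// Sequential model of a reduction: one contribution per process.
ValueIndex reduce_minloc(const std::vector<ValueIndex>& contributions) {
    assert(!contributions.empty());
    ValueIndex acc = contributions.front();
    for (std::size_t i = 1; i < contributions.size(); ++i) {
        acc = minloc(acc, contributions[i]);
    }
    return acc;
}
```

The rule is associative and commutative, which is what leaves an implementation free to combine the contributions in any order.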
25/03/2015 MPI Lab. Adding generic support for farm workers' state and reinitialization from the emitter (streams and substreams), semantics and design choices. MPI General Reduce and Scan operators: Allreduce, Reduce_scatter_block, Reduce_scatter, Scan, Exscan.
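The difference between Scan and Exscan can be sketched sequentially, with a sum as the operator: in an inclusive scan, rank i receives the reduction of the contributions of ranks 0..i, while in an exclusive scan it receives the reduction of ranks 0..i-1, so rank 0 has no defined result. The code below is plain C++; `scan_sum`, `exscan_sum` and `ExscanResult` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum: out[i] = v[0] + ... + v[i].
std::vector<int> scan_sum(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    int acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
        out[i] = acc;
    }
    return out;
}

// Result of an exclusive scan on one rank; undefined on rank 0.
struct ExscanResult {
    bool defined;
    int value;
};

// Exclusive prefix sum: out[i] = v[0] + ... + v[i-1] for i > 0.
std::vector<ExscanResult> exscan_sum(const std::vector<int>& v) {
    std::vector<ExscanResult> out(v.size(), ExscanResult{false, 0});
    int acc = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i > 0) out[i] = ExscanResult{true, acc};
        acc += v[i];
    }
    return out;
}
```

For the contributions 1, 2, 3, 4 the inclusive scan gives 1, 3, 6, 10, and the exclusive scan gives 1, 3, 6 on ranks 1 to 3.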
31/03/2015 Threading Building Blocks (TBB) Threading Building Blocks C++ template library overview: purpose, abstraction mechanisms and implementation layers (templates, runtime, supported abstractions); tasks vs threads and parallel pattern composability; parallel_for, ranges and partitioners; task partitioning and scheduling, grain size and affinity; quick survey of the use of lambda expressions, containers and mutexes.
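The interplay of ranges and grain size can be sketched without TBB: a range is divisible while its size exceeds the grain size, and a split cuts it at the midpoint. The plain C++ code below applies this recursively until no chunk is divisible, which approximates what a simple partitioner would produce. The names `Range`, `split_range` and `leaf_chunks` are hypothetical, and real TBB scheduling (automatic partitioners, work stealing) may stop splitting earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open interval [first, second), standing in for a 1D blocked range.
using Range = std::pair<std::size_t, std::size_t>;

// Recursively halve the range until its size is at most `grainsize`.
void split_range(Range r, std::size_t grainsize, std::vector<Range>& leaves) {
    assert(grainsize > 0 && r.first <= r.second);
    if (r.second - r.first <= grainsize) {  // not divisible any more
        leaves.push_back(r);
        return;
    }
    const std::size_t mid = r.first + (r.second - r.first) / 2;
    split_range(Range{r.first, mid}, grainsize, leaves);
    split_range(Range{mid, r.second}, grainsize, leaves);
}

// Convenience wrapper returning the leaf chunks in ascending order.
std::vector<Range> leaf_chunks(std::size_t begin, std::size_t end,
                               std::size_t grainsize) {
    std::vector<Range> leaves;
    split_range(Range{begin, end}, grainsize, leaves);
    return leaves;
}
```

With the range [0, 10) and a grain size of 3, the leaves are [0, 2), [2, 5), [5, 7) and [7, 10): every chunk has between grainsize/2 and grainsize iterations.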
01/04/2015 MPI IO Rationale of MPI-IO; basic concepts (file, file handle, etype, filetype, view); setting a file view; file size, offset and file pointer (shared, individual); file open, MPI_Info and most important tags; primitives for basic file management (close, delete, set and get size); examples.
15/04/2015
23/04/2015 TBB laboratory Simple Mandelbrot set example, use of parallel for and blocked range, speedup on 8 cores.
28/04/2015 KDD examples Short introduction to Knowledge Discovery in Databases and Data Mining; examples; parallelism exploitation in data mining algorithms; K-means algorithm.
29/04/2015 TBB laboratory Implementation of K-means with TBB, from example code to a running program; farm (parallel for) and reduce based parallelization of the inner loop. Tuning the TBB program for speedup.
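The reduce-based parallelization of the K-means inner loop can be sketched sequentially: each chunk of points produces per-cluster partial sums and counts, the partial results are joined, and the new centroids are computed from the joined totals. The code below is plain C++ on 1-D points, with two chunks processed one after the other; it is not the course example code, and all names (`Partial`, `accumulate_chunk`, `join`, `kmeans_step`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-cluster partial sums and counts: the value being reduced.
struct Partial {
    std::vector<double> sum;
    std::vector<std::size_t> count;
};

// Index of the centroid closest to point p.
std::size_t nearest(double p, const std::vector<double>& centroids) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < centroids.size(); ++k) {
        if (std::fabs(p - centroids[k]) < std::fabs(p - centroids[best])) best = k;
    }
    return best;
}

// Work done on one chunk [begin, end): what a parallel body would execute.
Partial accumulate_chunk(const std::vector<double>& points, std::size_t begin,
                         std::size_t end, const std::vector<double>& centroids) {
    Partial p{std::vector<double>(centroids.size(), 0.0),
              std::vector<std::size_t>(centroids.size(), 0)};
    for (std::size_t i = begin; i < end; ++i) {
        const std::size_t k = nearest(points[i], centroids);
        p.sum[k] += points[i];
        p.count[k] += 1;
    }
    return p;
}

// Reduction step: merge two partial results.
Partial join(Partial a, const Partial& b) {
    for (std::size_t k = 0; k < a.sum.size(); ++k) {
        a.sum[k] += b.sum[k];
        a.count[k] += b.count[k];
    }
    return a;
}

// One K-means iteration over two chunks; empty clusters keep their centroid.
std::vector<double> kmeans_step(const std::vector<double>& points,
                                const std::vector<double>& centroids) {
    assert(!centroids.empty());
    const std::size_t mid = points.size() / 2;
    const Partial total = join(accumulate_chunk(points, 0, mid, centroids),
                               accumulate_chunk(points, mid, points.size(), centroids));
    std::vector<double> next(centroids);
    for (std::size_t k = 0; k < next.size(); ++k) {
        if (total.count[k] > 0) next[k] = total.sum[k] / total.count[k];
    }
    return next;
}
```

Since `join` is associative, the chunks could be accumulated by independent tasks and merged in any grouping; only the final division depends on the complete totals.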
05/05/2015 TBB TBB basic C++ concepts and algorithms (i.e. parallel skeletons). Binary splittables, range concept and blocked ranges, proportional split; parallel_for_each, parallel for; passing arguments to parallel algorithms (lambda functions vs body classes), optional arguments; parallel for 1D simplified syntax; partitioners; reduce (differences between “functional” and “imperative” forms); deterministic reduce; pipeline class and filter class (i.e. stages), strongly typed parallel_pipeline and make_filter template.
06/05/2015 Project discussion Overall structure and scope of the course final project.
12/05/2015 TBB parallel_do; containers: similarity and differences with STL containers, multithreaded/sequential performance tradeoffs wrt software lockout, space and time overheads, relaxed/restricted semantics and feature drops, thread view consistency; container_range, extending containers to ranges, concurrent map and set templates: concurrent_hash, unordered, unordered_multi map; concurrent and unordered set; concurrent queue, bounded_queue, priority queue; concurrent vector; thread local storage; C++11 style atomics.
13/05/2015 OpenCL introduction Development history of modern GPUs, graphic pipeline, HW/FW implementations, load imbalance related to the distribution of graphic primitives executed, more “general purpose” and programmable core design; generic constraints and optimizations of the GPU approach; modern GPU architecture, memory optimization and constraints, memory spaces. GPGPU, and transition to explicitly general purpose programming languages for GPU. OpenCL intro and examples: framework goal, design concepts and programming abstractions (Devices/host interaction, kernels, queues). Memory spaces and Host/devices communication.
19/05/2015 OpenCL design concepts and programming abstractions: Devices/host interaction, kernel compilation, program objects, memory objects and kernel arguments, execution, kernel instances and workgroups, workgroup synchronization; vector types and vector operations; example with matrix multiplication.
~~20/05/2015~~ lesson postponed to 22/05
22/05/2015 TBB mutexes and locks in TBB as a lower-level synchronization mechanism for building concurrent data structures; scoped locking approach, rationale, lock implementation and overheads, tradeoffs among features (scalability, fairness, lock size as well as reentrant and yield/block behaviour); plain mutex and recursive mutex, spin and queueing mutex; basics of transactional memory support in the CPU; speculative spin locks in TBB; read-write locks (of spin, speculative spin and queue kinds) and access upgrade/downgrade; null mutex; low-level task management and scheduling.
25/05/2015 Large-graph computations in distributed memory: Spark and GraphX Large graph computation frameworks: distributed memory and vertex-centric approach; bulk synchronous versus map&reduce; introduction to Apache Spark, functional semantics and distributed shared memory approach; Resilient Distributed Datasets basic concepts: partition, immutability, lineage; RDD transformations and actions, lazy functional evaluation and programming model, interplay of dependencies and partitioning on evaluation order, job scheduling, fault tolerance, performance, memory management and checkpointing; GraphX data model for vertices/edges and Gather/Apply/Scatter approach to coding graph algorithms; advantages and limitations. Discussion about the final course project.
26/05/2015

Slides, Notes and References to papers

Date	Slides	Notes	References / Other information
24/02	Course introduction
24/02, 25/02	MPI part I
03/03, 11/03	MPI part II -- updated on 11/03 with corrections, more examples, struct datatype
04/03, 11/03	MPI part III -- updated on 11/03 with canceling of requests; MPI part IV
18/03	MPI part V -- updated on 24/03
12/03, 17/03	MPI lab slides
24/03, 25/03	MPI part VI
31/03	TBB part I
01/04	MPI part VII (MPI-IO); 5 slides with examples are missing
28/04		Notes on parallel DM	Dhillon and Modha TR on K-means
29/04			K-means example code in C
05/05, 12/05	TBB Lesson 2; TBB Lesson 3
13/05, 19/05	GPU computing Lesson 1		Intro to OpenCL by Tim Mattson
22/05
25/05	Intro to Spark and GraphX
26/05			B&B in Muesli (note that here the B&B tasks are never evaluated concurrently); Annals of Operations Research journal, issue on Branch and Bound (quite a theoretical approach); survey (see chapter 2); see also Introduction to Parallel Computing (2nd ed.) by Vipin Kumar et al., Chapter 11; a simple intro to B&B