====== Complementi di piattaforme abilitanti distribuite ======

{{:magistraleinformaticanetworking:cpa:grid-small.jpg?300 |}}

**Teacher**: //[[http://hpc.isti.cnr.it/~khast|Nicola Tonellotto]]//

**Question time**: Please contact the teacher.

^Tuesday | 11:30-13:30 |Room 10B |
^Wednesday | 16:00-18:00 |Room 10B |
^Friday | 9:30-11:30 |Room 10B |

Teaching rooms: Room 10B, S.Anna/CNIT building in the CNR Research Area, ground floor.\\
For this first year, the course is co-organized with [[magistraleinformaticanetworking:spd:| Strumenti di programmazione per sistemi paralleli e distribuiti]], taught by Dr. Massimo Coppola.

===== Syllabus =====

24/02: Grid Computing (I) {{:magistraleinformaticanetworking:cpa:grid.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:grid_notes.pdf|Student Notes}}\\
  * Large-scale problems in research and production environments
  * How to approach these problems
  * Preliminary definitions: resources, protocols, services, APIs and SDKs
  * A simple example: web services and their protocols

25/02: Grid Computing (II) {{:magistraleinformaticanetworking:cpa:grid.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:grid_notes.pdf|Student Notes}}\\
  * Virtual Organizations
  * The Grid vision and its requirements
  * The Grid architecture: fabric, connectivity, resource, collective and application layers
  * Using the Grid: scenarios and examples
  * Open Grid Services Architecture and its capabilities
  * The eight fallacies of Grid computing

09/03: Grid Computing (III) {{:magistraleinformaticanetworking:cpa:globus.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:globus_notes.pdf|Student Notes}}\\
  * The Globus Project
  * Public Key Infrastructure (concepts)
  * Grid Security Infrastructure
  * Certificates
  * Single Sign-On and delegation

10/03: Grid Computing (IV) {{:magistraleinformaticanetworking:cpa:globus.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:globus_notes.pdf|Student Notes}}\\
  * Grid Information Services
  * Lightweight Directory Access Protocol (concepts)
  * Monitoring and Discovery Service
  * Information Providers (IP), GRIS and GIIS
  * Grid information models: MDS-2 and GLUE schemata

19/03: Grid Computing (V) {{:magistraleinformaticanetworking:cpa:globus.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:globus_notes.pdf|Student Notes}}\\
  * HPC resource management
  * Grid resource management
  * Gatekeeper and Job Manager
  * Data management
  * GASS and GridFTP
  * Replica Catalog and Replica Management Services

23/03: MapReduce: the programming model {{:magistraleinformaticanetworking:cpa:mapreduce.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:mapreduce_hdfs_notes.pdf|Student Notes}}\\
  * Problem characterization
  * Map and fold in LISP
  * Programming model: mappers and reducers
  * Programming model: partitioners and combiners
  * Example and data flow

24/03: Distributed File Systems: GFS and HDFS {{:magistraleinformaticanetworking:cpa:dfs.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:mapreduce_hdfs_notes.pdf|Student Notes}}\\
  * Problem characterization
  * Blocks, NameNodes and DataNodes
  * Master/server architecture
  * Master server (NameNode) and chunk servers (DataNodes): protocols and responsibilities
  * Anatomy of a read
  * Anatomy of a write
  * Benchmarks

26/03: Lab {{:magistraleinformaticanetworking:cpa:note_1.pdf|Notes}} {{:magistraleinformaticanetworking:cpa:data.zip|Data}}\\
  * Hadoop installation and setup
  * Standalone and pseudo-distributed mode configuration
  * Grep application

13/04: Lab {{:magistraleinformaticanetworking:cpa:note_2.pdf|Notes}} {{:magistraleinformaticanetworking:cpa:snippets.zip|Data}}\\
  * Word Count application (old Hadoop API)
  * Word Count application (new Hadoop API)
  * API usage
  * Using a large number of files

14/04: Lab {{:magistraleinformaticanetworking:cpa:note_3.pdf|Problem}} {{:magistraleinformaticanetworking:cpa:note_3.1.pdf|Solution (1/4)}}\\
  * Computing tf-idf with MapReduce
  * Word frequency in document

04/05: Lab {{:magistraleinformaticanetworking:cpa:note_3.2.pdf|Solution (2/4)}}\\
  * Computing tf-idf with MapReduce
  * Word count in document

05/05: Lab {{:magistraleinformaticanetworking:cpa:lab3.3.pdf|Solution (3/4)}} {{:magistraleinformaticanetworking:cpa:lab3.4.pdf|Solution (4/4)}}\\
  * Computing tf-idf with MapReduce
  * Word frequency in collection
  * Computing tf-idf

07/05: Autonomic Computing {{:magistraleinformaticanetworking:cpa:autonomic.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:autonomic_computing_notes.pdf|Student Notes}}\\
  * Self-management
  * Self-* properties
  * Feedback control of computing systems

11/05: Scheduling {{:magistraleinformaticanetworking:cpa:scheduling.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:scheduling_notes.pdf|Student Notes}}\\
  * Single-processor scheduling: SJF, FCFS, RR, MLQ
  * Real-time scheduling: RM, EDF
  * Cluster scheduling: FCFS, backfilling

12/05: Scheduling {{:magistraleinformaticanetworking:cpa:scheduling.pdf|Slides}} {{:magistraleinformaticanetworking:cpa:scheduling2.pdf|Student Notes}}\\
  * Grid resource management
  * Bag-of-tasks heuristics: Min-Min, Max-Min, Sufferage
  * Workflow heuristics: list, multilevel and clustering scheduling; HEFT
  * Economic scheduling

===== Bibliography =====

  - [[http://www.globus.org/alliance/publications/papers/anatomy.pdf|The Anatomy of the Grid: Enabling Scalable Virtual Organizations]]
  - [[http://www.globus.org/alliance/publications/papers/ogsa.pdf|The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration]]
  - [[http://labs.google.com/papers/mapreduce-osdi04.pdf|MapReduce: Simplified Data Processing on Large Clusters]]
  - [[http://labs.google.com/papers/gfs-sosp2003.pdf|The Google File System]]
  - [[http://www.research.ibm.com/autonomic/research/papers/AC_Vision_Computer_Jan_2003.pdf|The Vision of Autonomic Computing]]
  - [[http://www.buyya.com/papers/MHS-Springer-Jia2008.pdf|Workflow Scheduling Algorithms for Grid Computing]]

===== External Links =====

  - [[http://www.globus.org/|The Globus Toolkit Homepage]]
  - [[http://hadoop.apache.org/|The Hadoop Homepage]]
  - [[http://developer.yahoo.com/hadoop/tutorial/index.html|The Yahoo! Hadoop Tutorial Homepage]]
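The MapReduce lecture of 23/03 covers mappers, reducers, partitioners and combiners. What follows is a minimal in-memory sketch of that data flow, using the Word Count example from the 13/04 lab. It is written in Python and is not Hadoop code. ''run_job'', ''group'', ''partition'', ''wc_map'' and ''wc_reduce'' are hypothetical names chosen for illustration. The CRC32-based ''partition'' is an assumption that stands in for Hadoop's default ''HashPartitioner''.

```python
import zlib
from collections import defaultdict


def group(pairs):
    """Shuffle/sort step: collect the values of each key, keys in sorted order."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return sorted(buckets.items())


def partition(key, num_reducers):
    """Pick the reducer for a key (hypothetical stand-in for HashPartitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(records, mapper, reducer, combiner=None, num_reducers=2):
    """Simulate one MapReduce job over (key, value) input records."""
    map_output = []
    # Simplification: each input record is treated as its own map task;
    # Hadoop actually groups records into input splits.
    for key, value in records:
        pairs = list(mapper(key, value))
        if combiner is not None:
            # Local pre-aggregation of this map task's output before the shuffle.
            pairs = [out for k, vs in group(pairs) for out in combiner(k, vs)]
        map_output.extend(pairs)
    # Partitioning: route every intermediate pair to exactly one reducer.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[partition(key, num_reducers)].append((key, value))
    # Reduce phase: each reducer sees its keys in sorted order with grouped values.
    results = []
    for part in partitions:
        for key, values in group(part):
            results.extend(reducer(key, values))
    return sorted(results)


def wc_map(line_no, line):
    """Mapper: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word, counts):
    """Reducer, also reused as combiner: sum the partial counts."""
    yield word, sum(counts)


lines = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_job(enumerate(lines), wc_map, wc_reduce, combiner=wc_reduce))
# → [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Reusing ''wc_reduce'' as the combiner is safe here only because summing counts is associative and commutative. A reducer that computes an average, for instance, could not be reused as a combiner without changing its output format.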
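The tf-idf labs (14/04 to 05/05) split the computation into four steps: word frequency in document, word count in document, word frequency in collection, and the final score. The sketch below chains those steps as in-memory MapReduce jobs. It assumes the common definition tf-idf = (n / N) × log(D / m), where n is the number of occurrences of a word in a document, N is the number of words in that document, D is the number of documents and m is the number of documents containing the word. The formula and job layout in the lab solutions may differ, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job: map, shuffle (group values by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    output = []
    for k in sorted(groups):  # reducers receive keys in sorted order
        output.extend(reducer(k, groups[k]))
    return output


# Job 1: word frequency in document -> ((word, doc), n)
def map_word_freq(doc, text):
    for word in text.lower().split():
        yield (word, doc), 1


def reduce_word_freq(word_doc, counts):
    yield word_doc, sum(counts)


# Job 2: word count in document -> ((word, doc), (n, N))
def map_doc_count(word_doc, n):
    word, doc = word_doc
    yield doc, (word, n)


def reduce_doc_count(doc, word_counts):
    total = sum(n for _, n in word_counts)
    for word, n in word_counts:
        yield (word, doc), (n, total)


# Job 3: word frequency in collection -> ((word, doc), (n, N, m))
def map_collection(word_doc, n_total):
    word, doc = word_doc
    n, total = n_total
    yield word, (doc, n, total)


def reduce_collection(word, postings):
    m = len(postings)  # one posting per document containing the word
    for doc, n, total in postings:
        yield (word, doc), (n, total, m)


def tfidf(docs):
    """docs maps doc_id -> text; returns a dict (word, doc_id) -> tf-idf score."""
    step1 = run_job(docs.items(), map_word_freq, reduce_word_freq)
    step2 = run_job(step1, map_doc_count, reduce_doc_count)
    step3 = run_job(step2, map_collection, reduce_collection)
    d = len(docs)
    # Step 4 needs no shuffle: each record already carries n, N and m.
    return {key: (n / total) * math.log(d / m) for key, (n, total, m) in step3}


scores = tfidf({"d1": "a b a", "d2": "a c"})
print(sorted(scores.items()))
```

In this two-document example, ''a'' occurs in every document, so its score is 0 in both. ''b'' and ''c'' score (1/3) × ln 2 and (1/2) × ln 2, respectively.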
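The bag-of-tasks heuristics from the 12/05 lecture (Min-Min, Max-Min, Sufferage) can be sketched as a greedy loop over an ETC (expected time to compute) matrix. This sketch assumes static, known ETC values and machines that run one task at a time. The Sufferage branch is a simplification: in each iteration, it picks the task with the largest gap between its best and second-best completion times, rather than following the original round-based formulation. ''schedule'' and the heuristic name strings are hypothetical.

```python
def schedule(etc, heuristic):
    """Greedy bag-of-tasks mapping.

    etc[t][m] is the expected execution time of task t on machine m.
    Returns (assignment task -> machine, makespan).
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        candidates = []
        for t in unscheduled:
            # Completion time of task t on every machine, best first.
            times = sorted((ready[m] + etc[t][m], m) for m in range(n_machines))
            best_time, best_m = times[0]
            second_time = times[1][0] if n_machines > 1 else best_time
            candidates.append((t, best_m, best_time, second_time - best_time))
        # Ties are broken by the lowest task id to keep results deterministic.
        if heuristic == "min-min":
            chosen = min(candidates, key=lambda c: (c[2], c[0]))
        elif heuristic == "max-min":
            chosen = max(candidates, key=lambda c: (c[2], -c[0]))
        elif heuristic == "sufferage":
            chosen = max(candidates, key=lambda c: (c[3], -c[0]))
        else:
            raise ValueError(f"unknown heuristic: {heuristic}")
        t, m, finish, _ = chosen
        assignment[t] = m
        ready[m] = finish
        unscheduled.remove(t)
    return assignment, max(ready)


# Three short tasks and one long task on two machines.
etc = [[2, 3], [2, 3], [2, 3], [6, 7]]
for h in ("min-min", "max-min", "sufferage"):
    print(h, schedule(etc, h))
```

In this example, Min-Min schedules the short tasks first, so the long task runs last and stretches the makespan to 10. Max-Min places the long task first and reaches a makespan of 8. This is the usual argument for Max-Min when a few long tasks dominate.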