Syllabus

Introduction and motivation

A brief introduction to the concept of distributed systems. The goal is to explain the importance of a course about theory of distributed systems; in this sense, we will show that the design and implementation of effective distributed systems is not easy; issues related to ``imperfect'' computation and communication makes it substantially more difficult than designing centralized algorithms.

Lectures: 21/9
Slides: [pdf]
Additional notes: [pdf]

Reading material:

The "Two-generals" problem.
Introduction [Tv07, Chapter 1],
Architectures [Tv07, Chapter 2],
Examples of distributed systems [CDK05, Section 1.2],
"Inherent distribution" vs "Distribution as an Artifact" [GR04, Chapter 1]
Challanges of distributed computing [CDK05, Section 1.4]

Basic abstractions

This lecture presents the basic abstractions we use to model a distributed system composed of computational entities (processes) communicating by exchanging messages

Lectures: 23/9, 28/9
Slides: [pdf]

Reading material and topics:

Abstracting computation [CDK05, Section 2.2],[GR04, Chapter 2].
Abstracting communication [CDK05, Section 2.3],[GR04, Chapter 2].
Abstracting time [CDK05, 2.4],[GR04, Chapter 2].
Distributed models [CDK05, 2.4],[GR04, Chapter 2].
Specifications: liveness and safety [AS87]
A Modular Approach to Fault-Tolerant Broadcasts and Related Problems [HT93, Section 2]
Logical time and logical clocks [BM93]

Time and Global States

This lecture introduces the concept of logical time, showing how it can be used to order distributed events and debug distributed applications.

Lectures: 28/9, 30/9
Slides: [pdf]
Reading material and topics: [BM93]

Global states
Distributed debugging

Broadcast

This lecture explains how to obtain reliable communication ``many-to-many'', from multiple nodes to multiple nodes. The relation between this problem and more complex forms of coordination is explained.

Lectures: 5/10, 7/10
Slides: [pdf]
Reading material and topics: [HT93]

Reliable Broadcast, Uniform Reliable Broadcast
Ordering Properties (FIFO, Causal)
Atomic Broadcast

Epidemic Protocols

This set of lectures will show how biology-inspired techniques (epidemics) can be used to obtain protocols characterized by efficiency and robustness, with the only drawback that guarantees are not deterministic any more, but only probablistic.

Lectures: 7/10
Slides: [pdf]
Reading material and topics: [DGH⁺87]

Anti-entropy protocols
Gossip protocols

Consensus

This set of lectures is devoted to one of the main problems in distributed computing: how to reach an agreement among a collection of distributed processes. We will first show that Consensus cannot be reached in an asynchronous system [FLP85], and we will later shows how consensus can be solved ``in practice'' in real systems [CT96].

Lectures: 19/10, 21/10
Reading material and topics:

Impossibility of Consensus [FLP85]
Slides: Lecture at the blackboard [pdf]
Consensus: Beyond Impossibility [CT96,MR99,AT98]
Slides: [pdf]

Additional reading material:

[Tus96] proves that "◊W" is the weakest failure detector to solve Consensus
[Gue00] introduces the concept of ``indulgent algorithms'', i.e. algorithms that tolerate unreliable failure detectors.
[AT98] shows how the Consensus problem can be solved through an hybrid approach (FD + randomization)
[BO83] is the original paper of Ben-Or about randomized Consensus.
[Asp03] is a survey about randomized Consensus protocols.

Aggregation

This lecture introduces the concept of aggregation, which is the collective name of a set of functions that provide statistical information about a system.

Lectures: 26/10
Slides: [pdf]
Reading material and topics:

Gossip-based aggregation [JMB05]
Astrolabe [vRBV03]

Distributed transactions

A transaction is the execution of a sequence of actions that must be either entirely completed or aborted, independently of other transactions. A distributed transaction involves several servers and/or databases, and needs special protocols to be implemented.

Lectures: 28/10
Slides: [pdf]
Reading material and topics: [BT93]

Recap about transactional model
Atomic commitment

Consistency and Replication

This lecture introduces the concept of consistency policies for replicated systems.

Lectures: 2/11, 4/11
Slides: [pdf]
Reading material and topics: [Tv07, Chapter 7], [Gho06, Chapter 16]

Consistency models
Passive vs active replication
Primary-backup [BMST93]
State machine [Sch93]

Additional reading material:

T. Bressoud and F. Schneider. Hypervisor-Based Fault Tolerance. ACM TOCS, 14(1):80-107, 1996. [pdf]
B. Cully et al. Remus: High Availability via Asynchronous Virtual Machine Replication. In proceedings of NSDI'08, 2008. [pdf]

Group Communication

This lecture discusses a middleware tool called group communication, where multiple groups of processes or objects cooperate through simple primitives to provide replicated services to clients.

Lectures: 4/11
Slides: [pdf]
Reading material:

Group Communication [CKV01]

Peer Sampling

This lecture introduces the peer sampling problem, which is the large-scale equivalent of group membership in traditional systems.

Lectures: 9/11
Slides: [pdf]
Topics and Reading material:

Newscast [JGKv04]
Cyclon [VGv05]
Secure Peer Sampling [JHv07]

Lab - Peersim

This lecture introduces the Peersim simulator, which will be used for the laboratory projects.

Lectures: 11/11 Slides: [pdf]
Additional material: Peersim web site

P2P Systems

This lecture discusses the differences between structured and unstructured P2P systems and briefly introduces some well-known protocols.

Lectures: 18/11, 23/11)
Slides: [pdf], [pdf], [pdf]
Reading material and topics:

Chord [SMK⁺01]
Kademlia [MM02]
Bittorrent [Coh03]

Paxos

This short lecture discusses Paxos, a very practical Consensus protocol which is actually used in several real projects - including the Google file system.

Lectures: 30/11
Slides: [pdf]
Reading material: [Lam01]
Additional reading material

A paper about the Chubby lock service using Paxos [].
A nice paper about the fine details of implementing Paxos in a real system [CGR07].

Checkpointing

Lectures: 9/12
Slides: [pdf]
Reading material: [GPJ10]
Additional reading material: [BvS08]

Practical Byzantine Fault Tolerance

We give a look to a paper that has recently re-started the research in the area of byzantine fault tolerance.

Lectures: 2/12
Slides: [pdf]
Reading material: [CL99]
Additional reading material:

A paper introducing Zyzzyva, a speculative byzantine protocol that further optimized PBFT in the common case [KCW⁺07].
A paper introducing Aardvark, a novel approach to byzantine protocol trying to make them highly available even under attack.

CDK05: George Coulouris, Jean Dollimore, and Tim Kindberg.
Distributed systems (4th ed.): concepts and design.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.
[Bibtex].
GR04: Rachid Guerraoui and Louis Rodrigues.
Reliable Distributed Programming.
Springer-Verlag, Berlin, Germany, 2004.
[Bibtex].
Tv07: Andrew Tanenbaum and Maarten van Steen.
Distributed Systems: Principles and Paradigms (2nd ed.).
Prentice Hall, 2007.
[Bibtex].
AS87: Bowen Alpern and Fred B. Schneider.
Recognizing safety and liveness.
Distributed Computing, 2(3):117-126, 1987.
[PDF], [Bibtex].
HT93: Vassos Hadzilacos and Sam Toueg.
A modular approach to fault-tolerant broadcasts and related problems.
In Sape Mullender, editor, Distributed Systems (2nd edition). Addison-Wesley, 1993.
Chapter 5.
[PDF], [Bibtex].
BM93: Ozalp Babaoglu and Keith Marzullo.
Consistent global states of distributed systems: Fundamental concepts and mechanisms.
In Sape Mullender, editor, Distributed Systems (2nd edition). Addison-Wesley, 1993.
Chapter 4.
[PDF], [Bibtex].
DGH⁺87: Alan J. Demers, Daniel H. Greene, Carl Hauser, Wes Irish, John Larson, Scott Shenker, Howard E. Sturgis, Daniel C. Swinehart, and Douglas B. Terry.
Epidemic algorithms for replicated database maintenance.
In In Proceedings of the 6th annual ACM Symposium on Principles of Distributed Computing Systems (PODC'87), pages 1-12, 1987.
[PDF], [Bibtex].
FLP85: Michael J. Fischer, Nancy A. Lynch, and M. S. Paterson.
Impossibility of distributed consensus with one faulty processor.
Journal of the ACM, 32(2):374-382, April 1985.
[PDF], [Bibtex].
CT96: Tushar Deepak Chandra and Sam Toueg.
Unreliable failure detectors for reliable distributed systems.
Journal of the ACM, 43(2):225-267, 1996.
[PDF], [Bibtex].
MR99: Achour Mostéfaoui and Michel Raynal.
Solving consensus using chandra-toueg's unreliable failure detectors: A general quorum-based approach.
In In Proceedings of the 13th International Symposium on Distributed Computing (DISC'00), pages 49-63, Bratislava, Slavak Republic, 1999.
[PDF], [Bibtex].
AT98: Marcos Kawazoe Aguilera and Sam Toueg.
Correctness proof of ben-or’s randomized consensus algorithm.
Technical Report TR98-1682, Cornell University, 1998.
[PDF], [Bibtex].
Tus96: Vassos Hadzilacos Sam Toueg Tushar Deepak Chandra.
The weakest failure detector for solving consensus.
Journal of the ACM, 43(4):685-722, 1996.
[PDF], [Bibtex].
Gue00: Rachid Guerraoui.
Indulgent algorithms.
In In Proceedings of the 19th annual ACM Symposium on Principles of Distributed Computing Systems (PODC'00), pages 49-63, Portland, OR, 2000.
[PDF], [Bibtex].
AT98: Marcos Kawazoe Aguilera and Sam Toueg.
Failure detection and randomization: A hybrid approach to solve consensus.
SIAM Journal of Computing, 1998.
[PDF], [Bibtex].
BO83: Michael Ben-Or.
Another advantage of free choice: Completely asynchronous agreement protocols (extended abstract).
In In Proceedings of the 2nd annual ACM Symposium on Principles of Distributed Computing Systems (PODC'83), pages 27-30, 1983.
[PDF], [Bibtex].
Asp03: James Aspnes.
Randomized protocols for asynchronous consensus.
Distributed Computing, 16(2-3):165-175, September 2003.
[PDF], [Bibtex].
JMB05: Márk Jelasity, Alberto Montresor, and Ozalp Babaoglu.
Gossip-based aggregation in large dynamic networks.
ACM Trans. Comput. Syst., 23(1):219-252, August 2005.
[PDF], [Bibtex].
vRBV03: Robbert van Renesse, Kenneth Birman, and Werner Vogels.
Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining.
ACM Trans. Comput. Syst., 21(2):164-206, May 2003.
[PDF], [Bibtex].
BT93: Ozalp Babaoglu and Sam Toueg.
Understanding non-blocking atomic commitment.
In Sape Mullender, editor, Distributed Systems (2nd edition). Addison-Wesley, 1993.
Chapter 6.
[PDF], [Bibtex].
Gho06: Sukumar Ghosh.
Distributed Systems: An Algorithmic Approach.
Chapman & Hall/CRC, 2006.
[Bibtex].
BMST93: Navin Budhiraja, Keith Marzullo, Fred Schneider, and Sam Toueg.
The primary-backup approach.
In Sape Mullender, editor, Distributed Systems (2nd edition). Addison-Wesley, 1993.
Chapter 8.
[PDF], [Bibtex].
Sch93: Fred Schneider.
Replication management using the state machine approach.
In Sape Mullender, editor, Distributed Systems (2nd edition). Addison-Wesley, 1993.
Chapter 7.
[PDF], [Bibtex].
CKV01: Gregory Chockler, Idit Keidar, and Roman Vitenberg.
Group communication specifications: a comprehensive study.
ACM Comput. Surv., 33(4):427-469, 2001.
[PDF], [Bibtex].
JGKv04: Márk Jelasity, Rachid Guerraoui, Anne-Marie Kermarrec, and Maarten van Steen.
The peer sampling service: Experimental evaluation of unstructured gossip-based implementations.
In Hans-Arno Jacobsen, editor, Proceedings of Middleware 2004, ACM/IFIP/USENIX International Middleware Conference, volume 3231 of Lecture Notes in Computer Science, pages 79-98, Toronto, Canada, 2004. Springer.
[PDF], [Bibtex].
VGv05: Spyros Voulgaris, Daniela Gavidia, and Maarten van Steen.
CYCLON: Inexpensive membership management for unstructured p2p overlays.
J. Network Syst. Manage., 13(2), 2005.
[PDF], [Bibtex].
JHv07: Gian Paolo Jesi, David Hales, and Maarten van Steen.
Identifying malicious peers before it's too late: A decentralized secure peer sampling service.
In Proceedings of the First International Conference on Self-Adaptive and Self-Organizing Systems (SASO'07), pages 237-246, Boston, MA, 2007. IEEE Computer Society.
[PDF], [Bibtex].
SMK⁺01: Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan.
Chord: A scalable peer-to-peer lookup service for internet applications.
In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), pages 149-160, San Diego, CA, 2001. ACM, ACM Press.
[PDF], [Bibtex].
MM02: P. Maymounkov and D. Mazieres.
Kademlia: A peer-to-peer information system based on the XOR metric.
In Proceedings of the 1st International Workshop on Peer-to-Peer Systems (IPTPS’02), pages 258-263. Springer, 2002.
[PDF], [Bibtex].
Coh03: B. Cohen.
Incentives Build Robustness in BitTorrent.
In Proc. of the Workshop on Economics of Peer-to-Peer Systems, 2003.
[PDF], [Bibtex].
Lam01: L. Lamport.
Paxos made simple.
ACM SIGACT News, 32(4):18-25, 2001.
[PDF], [Bibtex].
CGR07: Tushar Deepak Chandra, Robert Griesemer, and Joshua Redstone.
Paxos made live: an engineering perspective.
In Indranil Gupta and Roger Wattenhofer, editors, Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing (PODC'07), pages 398-407, Portland, Oregon, USA, August 2007. ACM.
[PDF], [Bibtex].
GPJ10: Maarten van Steen Gian Paolo Jesi, Alberto Montresor.
Secure peer sampling.
Elsevier Computer Networks – Special Issue on Collaborative Peer-to-Peer Systems, 2010.
accepted.
[PDF], [Bibtex].
BvS08: Arno Bakker and Maarten van Steen.
Puppetcast: A secure peer sampling protocol.
In Proc. of the European Conference onComputer Network Defense, pages 3-10. IEEE Computer Society, 2008.
[PDF], [Bibtex].
CL99: Miguel Castro and Barbara Liskov.
Practical byzantine fault tolerance.
In Proc. of the 3rd Symposium on Operating Systems Design and Implementation (OSDI'99), New Orleans, USA, February 1999. USENIX Association.
[PDF], [Bibtex].
KCW⁺07: R. Kotla, A. Clement, E. Wong, L. Alvisi, and M. Dahlin.
Zyzzyva: Speculative byzantine fault tolerance.
In Proc. of the ACM Symposium on Operating Systems Principles (SOSP'07), Stevenson, WA, October 2007. ACM.
[PDF], [Bibtex].
EAWJ02: E. N. Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and David B. Johnson.
A survey of rollback-recovery protocols in message-passing systems.
ACM Computing Surveys, 34(3):375-408, 2002.
[PDF], [Bibtex].
DBGM02: Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina.
Streaming live media over peers.
Technical Report 2002-21, Stanford InfoLab, 2002.
[PDF], [Bibtex].
ZLLY05: Xinyan Zhang, Jiangchuan Liu, Bo Li, and Y.-S.P. Yum.
Coolstreaming/donet: a data-driven overlay network for peer-to-peer live media streaming.
volume 3, pages 2102-2111 vol. 3, March 2005.
[Bibtex].
hCRSZ02: Yang hua Chu, S.G. Rao, S. Seshan, and Hui Zhang.
A case for end system multicast.
Selected Areas in Communications, IEEE Journal on, 20(8):1456-1471, October 2002.
[Bibtex].
SF06: Thomas Silverston and Olivier Fourmaux.
P2p iptv measurement: A comparison study.
CoRR, abs/cs/0610133, 2006.
[PDF], [Bibtex].

Alberto Montresor 2009-12-12

Syllabus

Introduction and motivation

Basic abstractions

Time and Global States

Broadcast

Epidemic Protocols

Consensus

Aggregation

Distributed transactions

Consistency and Replication

Group Communication

Peer Sampling

Lab - Peersim

P2P Systems

Paxos

Checkpointing

Practical Byzantine Fault Tolerance

Checkpointing

P2P Video Streaming

Bibliography