What?

This is an introductory course in Distributed Systems. Distributed systems is the study of how to build a computer system where the state of the program is divided over more than one machine (or "node").

This course is in active development. At the moment, it consists of a series of short videos. The intention is to create a complete set of video lectures and then add more content (such as more projects). Sadly, progress is slow because my other commitments keep getting in the way...

Why?

Because I love teaching and I know a lot about distributed systems. So why not? Also, I want to learn more about the art of teaching online. Designing and building a short course seemed like a reasonable way of learning this.

How should I use this?

Watch the videos and enjoy. You will learn more effectively if you are actively designing, building, or maintaining a distributed system while you study -- so start making something! Examples of what you could work on: build a multiuser chat system, build a data analysis pipeline using Hadoop, or attempt to understand Paxos and build your own implementation (note that Paxos is known for being hard to understand...). If you are already taking a college-level class on distributed systems, watch these videos before or after your lectures to review the material.

Check out the class project chat servers, and try them out. If folks start using them, they may become a great way to get questions answered. (Or, they will become spam honeypots. We'll see.)

If you are an instructor and want to use these videos as a part of your class -- feel free to link to this site and send your students here to watch. Please do not make your own copies of the videos or slides, or change them; I like knowing how many people are using and enjoying the videos, and being able to fix and improve them at will. If you want to do something that involves copying this content, send me an email -- I'm happy to listen to your ideas.

Topics

This course covers the following topics:

  • Introduction
  • How systems fail
  • How to express your goals: SLIs, SLOs, and SLAs [video, slides]
  • Class Project: building a multiuser chat server [video, slides]
  • How to get agreement -- consensus
  • How Counterstrike Works (a.k.a. Time in Distributed Systems) [video, slides]
  • Blockchain Consensus
    • Introduction to Blockchain Consensus [video]
    • What is a blockchain? [video, slides]
    • Bitcoin blockchain consensus [video, slides]
    • Should you use Bitcoin blockchain consensus? [video, slides]
  • Distributed System Design Example (Unique ID) [video, no slides -- I've been playing with After Effects]
  • The CAP Theorem [video, no slides -- I've been playing with After Effects]
  • Consistency Models in Distributed Systems [video, slides]

Potential future topics include:
  • Distributed storage systems
  • How to combine unreliable components to make a more reliable system
  • How nodes communicate -- RPCs
  • How nodes find each other -- naming
  • How to persist data -- distributed storage
  • How to secure your system
  • How to operate your distributed system -- the art of SRE

Want to watch them all? As I create videos, I'm adding them to this playlist.

Class Project

It's hard to learn any systems topic without building something. For this class I've created a bare-bones multiuser chat server which you can use as a foundation to build a more interesting distributed system yourself. The source code can be found on GitHub here.

You can also try it out (and use it to ask questions of your fellow classmates!). In a misguided attempt to avoid web crawlers and spam, I'm not going to link to the demo servers here; instead, you can figure it out yourself: distributedchat dot appspot dot com; and www dot distributedsystemscourse dot com slash dschat. (Note: this version is currently down, as I updated to Debian Bullseye, which no longer supports Python 2 with uwsgi. But this application requires webapp2, which has apparently not been ported to Python 3. Sigh.)

Learning More

The most common question I get is "where can I learn more?" Some resources you can explore include:

Questions/Feedback

This class is very much a work in progress (can't you tell?). I welcome any and all questions or constructive feedback, as I want to make it better! Either leave comments on the videos, or email me at chris@distributedsystemscourse.com.

About Me

Hi! I'm Chris Colohan. I went to grad school and got a PhD at Carnegie Mellon, then I spent 10 years working at Google building distributed systems (and managing teams which build distributed systems). Systems which I've contributed to include SUIF, MapReduce, TCMalloc, Percolator, Caffeine, Borg, Omega, and Piper. You can find random other information about me here.