17 Feb 2017, 07:08

CAD, not CAP

Not being Partition Tolerant just means that you’re not Distributed, so CAP can be read as CAD - this might help with reasoning about it.

There’s this thing running around called the CAP theorem by Eric Brewer. It’s meant to show you that you have to make a tradeoff when designing a system - like “you can be good, fast, or cheap; pick at most two.” You pick a spot somewhere in the CAP triangle.

CAP Triangle

To paraphrase:

  • “C” for Consistency: you get the most recent write or an error,
  • “A” for Availability: you always get an answer,
  • “P” for Partition tolerance: you can still talk to the system (outside-in) even if there are internal communication issues. E.g. a Partition happens when a node in the US loses its connection to a node in Europe because a DDoS attack takes out a provider’s internet access.

The theory is that you can only do 2 of these*. The AP system sacrifices consistency, meaning that you can get different answers during/after a partition. The CP system sacrifices availability, meaning that during a partition, some part of the system is unable to serve data. The AC system sacrifices… having a partition?
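To make the AP/CP difference concrete, here is a minimal toy sketch in Python. Everything in it is made up for illustration: the `Replica` class, the `"AP"`/`"CP"` mode strings, the fixed primary, and the `demo` helper are all assumptions, not how any real database does it (real CP systems typically use quorums across three or more nodes rather than a hard-coded primary). It just shows two replicas, a partition between them, and what each choice gives up.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale answer."""


class Replica:
    def __init__(self, name, mode, is_primary=False):
        self.name = name
        self.mode = mode              # "AP" or "CP" (hypothetical labels)
        self.is_primary = is_primary  # toy assumption: a fixed primary in CP mode
        self.value = None
        self.peer = None
        self.partitioned = False      # True while the link to the peer is down

    def _must_refuse(self):
        # CP: the non-primary side gives up availability during a partition.
        return self.mode == "CP" and self.partitioned and not self.is_primary

    def write(self, value):
        if self._must_refuse():
            raise Unavailable(f"{self.name}: cannot reach primary")
        self.value = value
        if not self.partitioned:
            self.peer.value = value   # replicate only while the link is up

    def read(self):
        if self._must_refuse():
            raise Unavailable(f"{self.name}: cannot reach primary")
        return self.value


def demo(mode):
    us = Replica("us", mode, is_primary=True)
    eu = Replica("eu", mode)
    us.peer, eu.peer = eu, us

    us.write("v1")                             # replicated to both sides
    us.partitioned = eu.partitioned = True     # the US-Europe link goes down
    us.write("v2")                             # only the US side sees this

    try:
        eu_answer = eu.read()
    except Unavailable:
        eu_answer = "error"
    return us.read(), eu_answer


print(demo("AP"))  # ('v2', 'v1'): both sides answer, but the answers differ
print(demo("CP"))  # ('v2', 'error'): most recent write or an error
```

In AP mode the European replica keeps answering with a stale value; in CP mode it returns an error instead. Neither mode gets to pretend the partition didn't happen.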

This is where the language doesn’t feel like it fits. The only way to choose an AC system is to not permit a partition. “Tolerance” here isn’t so much being able to handle a partition when it arises as allowing one to happen at all. The only current way to rule partitions out (short of advances in quantum entanglement) is to have a monolith and not a distributed system. But the CAP theorem only applies in the context of a distributed system.

Maybe the AC is a degenerate case of “when everything is functioning fine”, or it’s meant to handle the degenerate “distributed system of one.” I don’t know what Brewer’s original thought on this is, but it seems a bit off to handle this by calling it Partition Tolerance. It really feels like it shouldn’t be a part of it, and that you’re left with deciding where you want to be on the AC line of the triangle - which really just turns this into a line:

AC Line

Every distributed system has to figure out how it’s going to handle the inevitable Partition events that happen in it. It’s a fundamental property of distributed systems.

So, next time you hear “Partition tolerant”, process that as “distributed” and see if that makes it easier to handle.

* Writer’s Note: The recently released Google Spanner claims, to some extent, to be able to do all three. I haven’t looked at it yet, so maybe I’m wrong in my thinking here.