… because everything in life is a choice
In the last days I’m involved in some architectural decisions about how our data is / will be managed in a distributed architectural model. During a meeting I quoted the CAP theorem and some people didn’t know about it. So I decided to write here and share an overview of my understanding.
“as applications become more web-based we should stop worrying about data consistency, because if we want high availability in these new distributed applications, then guaranteed consistency of data is something we cannot have, thus giving anyone with three servers and a keen eye for customer experience permission to start an internet scale business.“
Basically, the theorem is about trade-off decisions which need to be made in highly scalable / distributed systems and that it is impossible for a web service to provide the following three requirements at the same time: Consistency, Availability and Partition tolerance.
“Do or do not, there is no try” – Yoda.
Looking MIT paper they use the word “atomic” instead of consistent because, roughly speaking, consistent is the C in ACID as applied to the properties of database transactions and means that data will never be persisted if something goes wrong.
“The probability that the system is operating at a specified time t.” – Blanchard
The client should be able to retrieve the stored data in a distributed system no matter what happens under the hood. If the client makes a request then he must be able to get a response from the system whether a node (or N nodes) in the cluster are down.
“No set of failures less than total network failure is allowed to cause the system to respond incorrectly” – Gilbert & Lynch
Means that the data is network partitioned (aka cluster) and if the network stops delivering messages between two sets of partitions the system will continue to work correctly. Data must be available even when there is a “partition” breaks between nodes. Michael Stonebraker says: “If there is a network failure that splits the processing nodes into two groups that cannot talk to each other, then the goal would be to allow processing to continue in both subgroups”.
CAP in practice
Ok, show me the code! CAP theorem is at the base of developing distributed system, that means I need an extra effort to reproduce something worth. My idea here is not describe such a complex system at the code level. So, I’m gonna show a brilliant data store comparison (I found on web) and how they handle CAP’s trade off trying to elucidate the concept.
Based on CAP model, let’s take some market examples:
Consistent + Available (CA) = problems with partitions and the typically solution is replication. Example: traditional RDBMSs like Postgres, MySQL, Oracle, etc;
Consistent + Partition-Tolerant (CP) = problems with availability considering the cost to keeping data consistent across nodes. Examples: MongoDB, BigTable, Terrastore, Hypertable, HBase, Redis, MemcacheDB.
Eric Brewer recently wrote a topic of his theorem titled CAP Twelve Years Later: How the “Rules” Have Changed. The theorem still strong considering the approach and amplitude of distributed systems these days (cloud, database, data centers, …). The tradeoff between latency, consistency and availability are deep-seated.
Lastly, there are many discussions about how true is CAP (maybe next post), anyway I do believe it is a powerful tool in order to understand the challenges of distributed systems and the designed behavior of specific systems.