Please, please, please read the fine print and ensure you understand the design tradeoffs as well as your application's requirements before blindly using this.
The moment I heard multi-master I thought Paxos, Raft or maybe virtual synchrony. Hmm, nothing in the documentation. Maybe a new consensus protocol was written from scratch then? That should be interesting!
No, none of that either - this implementation completely disregards consistency and makes write conflicts the developer's problem.
From http://bdr-project.org/docs/stable/weak-coupled-multimaster....
* Applications using BDR are free to write to any node so long as they are careful to prevent or cope with conflicts
* There is no complex election of a new master if a node goes down or network problems arise. There is no wait for failover. Each node is always a master and always directly writeable.
* Applications can be partition-tolerant: the application can keep working even if it loses communication with some or all other nodes, then re-sync automatically when connectivity is restored. Loss of a critical VPN tunnel or WAN won't bring the entire store or satellite office to a halt.
Basically:
* Transactions are a lie
* Consistent reads are a lie
* Datasets will diverge during network partitioning
* Convergence is not guaranteed without a mechanism for resolving write conflicts
I am sure there are use-cases where the risk of this design is acceptable (or necessary), but ensure you have a plan for dealing with data inconsistencies!
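For what it's worth, a "plan" here can be as simple as (a) keeping key spaces disjoint per node so concurrent inserts never collide, and (b) choosing an explicit, if lossy, policy such as last-update-wins for concurrent updates to the same row. A rough Python sketch of that idea (the node id, bit layout and field names are invented for illustration; this is not BDR's API):

```python
# Minimal sketch of two conflict-mitigation tactics (illustrative only):
# 1. per-node disjoint ids so inserts on different nodes never collide,
# 2. last-update-wins reconciliation for rows updated on both sides of
#    a partition (note: the losing write is silently discarded).
import time

NODE_ID = 2       # assumption: each node is configured with a unique small integer
NODE_BITS = 8     # room for up to 256 nodes

_counter = 0

def next_node_local_id() -> int:
    """Interleave a per-node counter with the node id, so ids generated
    on different nodes are always distinct."""
    global _counter
    _counter += 1
    return (_counter << NODE_BITS) | NODE_ID

def new_row(payload: str) -> dict:
    # Rows carry a timestamp so last-update-wins can be applied later.
    return {"id": next_node_local_id(), "payload": payload, "updated_at": time.time()}

def last_update_wins(local_row: dict, remote_row: dict) -> dict:
    """Keep whichever version was written last."""
    return local_row if local_row["updated_at"] >= remote_row["updated_at"] else remote_row

if __name__ == "__main__":
    a = new_row("written locally")
    b = {"id": a["id"], "payload": "written on another node", "updated_at": a["updated_at"] + 1}
    print(last_update_wins(a, b))  # the later write wins; the local one is lost
```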
"Everything after "Basically" is misleading and inaccurate.
The conclusions here are not logical: the existence of some restrictions does not imply "Transactions are a lie", since that suggests ALL transaction semantics are suspended, which is very far from the case.
There are good reasons for BDR's design and its restrictions, which offer roughly 100x the performance possible with eager consensus. It is real-world pragmatism, with production tools to ensure no inconsistencies exist in the database. The BDR tools allow you to trap and handle run-time issues, so this is not a theoretical debate or a mysterious problem, just a practical task for application authors to check that their apps work.
"Consistent reads are a lie". Reads from multiple nodes at the same time are not guaranteed to be consistent - but this is multi-master - why would you read two nodes when all the data is on one node? The whole point is to put the data you need near the users who need it, so this is designed to avoid multiple node reads.
I could go on, and will do so in a longer post elsewhere, but the main purpose of my retort is to show that the conclusions drawn here are not valid. Let's see how the Hacker News method of consensus decides what is correct in this case.
Postgres-BDR works very well for its intended business use case, which is a geographically distributed multi-master database. Master-slave replication is not comparable because there is only one master. You should use as few masters as your business case requires, so it is possible you don't need BDR at all. The underlying technology is roughly the same between master-slave and master-master, so that is not a big distinction and not the key point.
Postgres-BDR is designed to solve a common problem: access to a database from users all around the world. If you have a single master and multiple standbys, all write transactions must be routed globally to the same location, adding hundreds of milliseconds of latency and in many cases making applications unusable.
Postgres-BDR offers the opportunity to make those writes to multiple copies of the database in different locations, significantly reducing the latency for users.
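To put rough, purely illustrative numbers on that difference:

```python
# Back-of-the-envelope comparison (all figures illustrative): a write
# transaction needing a few round trips, with the master either local
# or on another continent.
ROUND_TRIPS_PER_TXN = 4     # e.g. BEGIN + two statements + COMMIT
RTT_LOCAL_MS = 1.0          # same region
RTT_REMOTE_MS = 150.0       # cross-continent WAN

print(f"remote single master: ~{ROUND_TRIPS_PER_TXN * RTT_REMOTE_MS:.0f} ms per transaction")
print(f"local BDR node:       ~{ROUND_TRIPS_PER_TXN * RTT_LOCAL_MS:.0f} ms per transaction")
```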
What BDR doesn't solve is what you do if all your users want to update the same data at the same time. But that type of application is just not scalable. If you run such an application on any database that gives you perfect consistency and lots of other shiny buzzwords, all you get is a really, really slow application, just like you had in the first place. But it will tick all the buzzword boxes.
All distributed systems require you to understand the business case and the trade-offs required to make it work. This is not fancy technology to use because you think it's cool; it's for use where it is useful and important to do so. BDR gives you the tools to know whether your application will work in a distributed manner and to keep it working in production without fear.
I couldn't agree more. Be _very_ careful with this illusion. A multi-master replication implementation like this makes sense for very few use cases, and even then the development team had better know what the hell they are doing, or a lot of unwanted/unaccounted-for things will show up in production.
I've been prototyping with VoltDB because it has a pretty interesting model that should be able to achieve near-linear scaling of write operations for tables that are partitioned. After reading the docs on VoltDB [1] it became clear to me that they put the design constraints up front, and if you can work through those constraints [2] you can achieve some wicked scale. But it's a bit more complex than your typical single-host database.
The work that VoltDB makes you deal with up front is a lot like the work that would have to be done for a multi-master setup in PostgreSQL to function correctly. I like how VoltDB puts those problems up front, but I'm having trouble seeing VoltDB as a general-purpose solution. I can't see the old PG database I work on right now fitting into VoltDB, but maybe parts of it would fit OK.
I look forward to the tooling in PG getting better and better. It's a great community and I do like the work that 2ndQuadrant is doing, and how they approach the community with it. I do think the BDR work is important to understand, so that when you're in a situation that calls for it you can take advantage of it (same with VoltDB).
If you have any questions about whether VoltDB could fit or not, let us know. It's surely not as general purpose as PG, but yes, it's faster for many use cases and its clustering is fully consistent.
Thanks! I certainly will. It looks pretty cool. I'm certainly learning it and finding the technical details in the documentation to be quite interesting.
> I am sure there are use-cases where the risk of this design is acceptable (or necessary), but ensure you have a plan for dealing with data inconsistencies!
I'd argue most non-financial applications would find these risks acceptable. This form of multi-master is what most people writing web-based applications are actually looking for. It simplifies failover, at the costs you mentioned, but those aren't major issues, especially if they're known upfront.
> * Datasets will diverge during network partitioning
> * Convergence is not guaranteed without a mechanism for resolving write conflicts
So yes, there are definitely workloads where this type of replication isn't appropriate; however, acting like there aren't any is blatantly ignoring many types of workloads.
> acting like there aren't any is blatantly ignoring many types of workloads.
I agree in principle, and if you have the knowledge and means to understand/mitigate the risk, this warning isn't aimed at you ;)
It is aimed at the ignorant, not the negligent, e.g. the engineer I was 15 years ago - who didn't know what he didn't know and would've chosen BDR simply for its master-master promise, without understanding what he was really getting himself into.
I nearly ruined a business by choosing MySQL + NDB a few years ago.
If people are using Postgres in a worldwide distributed way, then it's safe to say that they likely have a deep understanding of the issues behind it.
Replication is a problem that n00bs aren't going to touch on their startup site. They're only going to use it when they have hundreds of thousands/millions of users already.
It is amazing to see the number of projects based on Postgres mentioning that they are trying to achieve multi-master with their own restrictions:
* Postgres-R is not multi-master, was discontinued in 2010, and lost most of its relevance after WAL shipping was added in core with hot standby. http://www.postgres-r.org/
* Postgres-XC, which claims to do multi-master, was the first PostgreSQL-based project I heard of doing sharding (a term not used at the beginning of the project, and which I first heard in 2010). The project began in 2009 and was discontinued in 2013. There are limitations like no support for triggers and savepoints. XC was designed to be good for OLTP workloads, but is poor for long transactions and data-warehouse-type workloads. The design of the project was done jointly by NTT and EnterpriseDB. https://sourceforge.net/projects/postgres-xc/
* Postgres-XL, which forked from Postgres-XC and enhanced the data-warehouse case with improvements for data analytics by introducing a communication protocol between Datanodes. The project is still maintained by some folks at 2ndQuadrant. http://www.postgres-xl.org/
* Postgres-X2, which has been an attempt to unify the Postgres-XC and Postgres-XL efforts under the same banner. I don't know where this is headed, but things look rather stalled. https://github.com/postgres-x2/postgres-x2
* Postgres-XZ (look for PGXZ!), which is something I heard about recently, based on Postgres-XC and developed by some folks in China. Apparently it has its own limitations, and it is used in production where the constraints induced by scaling out are considered acceptable.
So it's cool to see many efforts, BDR being one, all of them trying to address the scaling-out problem in their own way. Each application should carefully study what to use and whether the limitations and constraints are acceptable or not.
> The moment I heard multi-master I thought Paxos, Raft or maybe virtual synchrony. Hmm, nothing in the documentation. Maybe a new consensus protocol was written from scratch then?
What's your use case here? Do you really want multi-master/bi-directional replication, or do you want distributed transactions? Why would an app want to connect to both masters? This isn't sharding.
If it isn't sharding then what is it? If every node has to process every write anyway (so there's no performance advantage over single-master) then why would one ever use this rather than traditional master-slave?
I guess, but I don't see the advantage over master -> slave failover? You lose unreplicated writes either way.
> Or mostly-read regional DBs separated by a slow WAN, ditto.
If it's just reads then regional slaves work well. If your writes don't need to be transactional then a RDBMS seems unlikely to be a good fit in general.
Sure. Either global transactionality for writes is important (in which case this style is useless; you need a master to accept the writes, or else some kind of distributed consensus protocol, which this doesn't have, and a single global master + distributed read slaves is a good model), or transactionality is only important per data center (in which case I'd rather have explicit sharding than invisible differences in behaviour between queries that look the same), or you don't need transactional behaviour at all (in which case an RDBMS is probably a poor fit).
I can imagine a reasonable use case.
If every node only accepts writes to a particular set of tables (or partitions), and serves as a read slave with respect to the rest, the database may remain consistent and more available than a single-master setup. Imagine one node per geographical region, with each node having a (read-only) picture of the whole.
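A sketch of that write-ownership rule at the application layer (region names and the data layout are invented; per-node CHECK constraints could enforce the same thing inside the database):

```python
# Minimal sketch of per-region write ownership: every node may read
# everything, but only writes rows belonging to its own region, so
# concurrent writes on different nodes never touch the same rows.
# All names are illustrative.

LOCAL_REGION = "eu"

class NotOwnedHere(Exception):
    pass

def guard_write(row_region: str) -> None:
    """Refuse writes to rows owned by another region's node."""
    if row_region != LOCAL_REGION:
        raise NotOwnedHere(
            f"rows for region {row_region!r} must be written on that region's node"
        )

def update_order(orders: dict, order_id: int, status: str) -> None:
    row = orders[order_id]
    guard_write(row["region"])       # writes: only our own partition
    row["status"] = status           # reads of other regions stay allowed

# usage
orders = {1: {"region": "eu", "status": "new"}, 2: {"region": "us", "status": "new"}}
update_order(orders, 1, "shipped")   # fine: locally owned
# update_order(orders, 2, "shipped") # would raise NotOwnedHere
```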
Do you know of any relational products which offer high-throughput, low-latency, high-availability transaction processing using perfectly synchronous multi-master replication?
You can't guarantee low latency and consistency without setting geographical boundaries on the distribution of the servers, unless you have very generous definitions of "low latency", as the speed of light (and more realistically: the speed you can transmit a signal via the internet, which is substantially lower) between the servers will place lower bounds on latency.
This is because you can't guarantee consistency without coordination between the servers (in the multi-master case you won't know what order to make operations visible; in the master-slave case you won't know whether a transaction has even been successfully applied by a slave), and that coordination at a bare minimum involves one round trip (sending a transaction and waiting for confirmation that it has been applied, assuming no conflicts that require additional effort to reconcile).
You can pipeline some of this to partially hide the delays, but you can't avoid the signalling, and for many applications it has a disastrous effect on throughput when certain tables/rows are highly contended.
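Roughly quantifying that floor (distance and signal speed are approximations):

```python
# Rough lower bound on synchronous-commit latency imposed purely by
# physics and one mandatory confirmation round trip. Figures are
# approximations for illustration.

DISTANCE_KM = 8000            # e.g. roughly Europe <-> US west coast
SIGNAL_SPEED_KM_PER_MS = 200  # ~2/3 of c in fibre, ignoring routing detours

one_way_ms = DISTANCE_KM / SIGNAL_SPEED_KM_PER_MS   # ~40 ms
round_trip_ms = 2 * one_way_ms                      # ~80 ms

print(f"best-case confirmation round trip: ~{round_trip_ms:.0f} ms")
# So no synchronous scheme spanning those servers can commit a contended
# transaction in under ~80 ms, however clever the protocol is.
```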
Yes, I meant a strictly one-cluster solution for machines within the same datacenter and preferably the same rack.
Synchronous replication across such a cluster can give much more read performance with write consistency and durability guarantees even when hardware failures occur. I don't know if any potential write performance increase would be worth the increased complexity, compared to a standard single-master setup.
Thanks for posting this! How does the logical replication reason about preconditions? Are row versions consistent across replicas? Are the preconditions somehow encoded as value checks?