Please, please, please read the fine print and ensure you understand the design tradeoffs as well as your application's requirements before blindly using this.
The moment I heard multi-master I thought Paxos, Raft or maybe virtual synchrony. Hmm, nothing in the documentation. Maybe a new consensus protocol was written from scratch then? That should be interesting!
No, none of that either - this implementation completely disregards consistency and makes write conflicts the developer's problem.
From http://bdr-project.org/docs/stable/weak-coupled-multimaster....
* Applications using BDR are free to write to any node so long as they are careful to prevent or cope with conflicts
* There is no complex election of a new master if a node goes down or network problems arise. There is no wait for failover. Each node is always a master and always directly writeable.
* Applications can be partition-tolerant: the application can keep working even if it loses communication with some or all other nodes, then re-sync automatically when connectivity is restored. Loss of a critical VPN tunnel or WAN won't bring the entire store or satellite office to a halt.
Basically:
* Transactions are a lie
* Consistent reads are a lie
* Datasets will diverge during network partitioning
* Convergence is not guaranteed without a mechanism for resolving write conflicts
I am sure there are use-cases where the risk of this design is acceptable (or necessary), but ensure you have a plan for dealing with data inconsistencies!
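For what it's worth, a "plan" here can be as simple as (a) keeping key spaces disjoint per node so concurrent inserts never collide, and (b) choosing an explicit, if lossy, policy such as last-update-wins for concurrent updates to the same row. A rough Python sketch of that idea (the node id, bit layout and field names are invented for illustration; this is not BDR's API):

```python
# Minimal sketch of two conflict-mitigation tactics (illustrative only):
# 1. per-node disjoint ids so inserts on different nodes never collide,
# 2. last-update-wins reconciliation for rows updated on both sides of
#    a partition (note: the losing write is silently discarded).
import time

NODE_ID = 2       # assumption: each node is configured with a unique small integer
NODE_BITS = 8     # room for up to 256 nodes

_counter = 0

def next_node_local_id() -> int:
    """Interleave a per-node counter with the node id, so ids generated
    on different nodes are always distinct."""
    global _counter
    _counter += 1
    return (_counter << NODE_BITS) | NODE_ID

def new_row(payload: str) -> dict:
    # Rows carry a timestamp so last-update-wins can be applied later.
    return {"id": next_node_local_id(), "payload": payload, "updated_at": time.time()}

def last_update_wins(local_row: dict, remote_row: dict) -> dict:
    """Keep whichever version was written last."""
    return local_row if local_row["updated_at"] >= remote_row["updated_at"] else remote_row

if __name__ == "__main__":
    a = new_row("written locally")
    b = {"id": a["id"], "payload": "written on another node", "updated_at": a["updated_at"] + 1}
    print(last_update_wins(a, b))  # the later write wins; the local one is lost
```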
"Everything after "Basically" is misleading and inaccurate.
The conclusions here are not logical: the existence of some restrictions does not imply "Transactions are a lie", since that suggests ALL transaction semantics are suspended, which is very far from the case.
There are good reasons for BDR's design and its restrictions, which offer roughly 100x the performance possible with eager consensus. It is real-world pragmatism, with production tools to ensure no inconsistencies exist in the database. The BDR tools allow you to trap and handle run-time issues, so this is not a theoretical debate or a mysterious problem, just a practical task for application authors to check that their apps work.
"Consistent reads are a lie". Reads from multiple nodes at the same time are not guaranteed to be consistent - but this is multi-master - why would you read two nodes when all the data is on one node? The whole point is to put the data you need near the users who need it, so this is designed to avoid multiple node reads.
I could go on, and will do so in a longer post elsewhere, but the main purpose of my retort is to show that the conclusions drawn here are not valid. Let's see how the Hacker News method of consensus decides what is correct in this case.
Postgres-BDR works very well for its intended business use case, which is a geographically distributed multi-master database. Master-slave replication is not comparable because there is only one master. You should use as few masters as your business case requires, so it is possible you don't need BDR at all. The underlying technology is roughly the same between master-slave and master-master, so that is not a big distinction and not the key point.
Postgres-BDR is designed to solve a common problem: access to a database from users all around the world. If you have a single master and multiple standbys, all write transactions must be routed globally to the same location, adding hundreds of milliseconds of latency and in many cases making applications unusable.
Postgres-BDR offers the opportunity to make those writes to multiple copies of the database in different locations, significantly reducing the latency for users.
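To put rough, purely illustrative numbers on that difference:

```python
# Back-of-the-envelope comparison (all figures illustrative): a write
# transaction needing a few round trips, with the master either local
# or on another continent.
ROUND_TRIPS_PER_TXN = 4     # e.g. BEGIN + two statements + COMMIT
RTT_LOCAL_MS = 1.0          # same region
RTT_REMOTE_MS = 150.0       # cross-continent WAN

print(f"remote single master: ~{ROUND_TRIPS_PER_TXN * RTT_REMOTE_MS:.0f} ms per transaction")
print(f"local BDR node:       ~{ROUND_TRIPS_PER_TXN * RTT_LOCAL_MS:.0f} ms per transaction")
```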
What BDR doesn't solve is what you do if all your users want to update the same data at the same time. But that type of application is just not scalable. If you run such an application on any database that gives you perfect consistency and lots of other shiny buzzwords, all you get is a really, really slow application, just like you had in the first place. But it will tick all the buzzword boxes.
All distributed systems require you to understand the business case and the trade-offs required to make it work. This is not fancy technology to use because you think it's cool; it's for use where it is useful and important to do so. BDR gives you the tools to know whether your application will work in a distributed manner and to keep it working in production without fear.
I couldn't agree more. Be _very_ careful with this illusion. A multi-master replication implementation like this makes sense for very few use cases, and even then the development team had better know what the hell they are doing, or a lot of unwanted/unaccounted-for things will show up in production.
I've been prototyping with VoltDB because it has a pretty interesting model that should be able to achieve near-linear scaling of write operations for tables that are partitioned. After reading the docs on VoltDB [1] it became clear to me that they put the design constraints up front, and if you can work through those constraints [2] you can achieve some wicked scale. But it's a bit more complex than your typical single-host database.
The work that VoltDB makes you deal with up front is a lot like the work that would have to be done for a multi-master setup in PostgreSQL to function correctly. I like how VoltDB puts those problems up front, but I'm having trouble seeing VoltDB as a general-purpose solution. I can't see the old PG database I work on right now fitting into VoltDB, but maybe parts of it would fit OK.
I look forward to the tooling in PG getting better and better. It's a great community and I do like the work that 2ndQuadrant is doing, and how they approach the community with it. I do think the BDR work is important to understand, so that when you're in a situation that calls for it you can take advantage of it (same with VoltDB).
If you have any questions about whether VoltDB could fit or not, let us know. It's surely not as general purpose as PG, but yes, it's faster for many use cases and its clustering is fully consistent.
Thanks! I certainly will. It looks pretty cool. I'm certainly learning it and finding the technical details in the documentation to be quite interesting.
> I am sure there are use-cases where the risk of this design is acceptable (or necessary), but ensure you have a plan for dealing with data inconsistencies!
I'd argue most non-financial applications would find these risks acceptable. This form of multi-master is what most people writing web-based applications are actually looking for. It simplifies failover, at the costs you mentioned, but those aren't major issues, especially if they're known upfront.
> * Datasets will diverge during network partitioning
> * Convergence is not guaranteed without a mechanism for resolving write conflicts
So yes, there are definitely workloads where this type of replication isn't appropriate; however, acting like there aren't any is blatantly ignoring many types of workloads.
> acting like there aren't any is blatantly ignoring many types of workloads.
I agree in principle, and if you have the knowledge and means to understand/mitigate the risk, this warning isn't aimed at you ;)
It is aimed at the ignorant, not the negligent, e.g. the engineer I was 15 years ago - who didn't know what he didn't know and would've chosen BDR simply for its master-master promise, without understanding what he was really getting himself into.
I nearly ruined a business by choosing MySQL + NDB a few years ago.
If people are using Postgres in a worldwide distributed way, then it's safe to say that they likely have a deep understanding of the issues behind it.
Replication is a problem that n00bs aren't going to touch on their startup site. They're only going to use it when they have hundreds of thousands/millions of users already.
It is amazing to see the number of projects based on Postgres mentioning that they are trying to achieve multi-master with their own restrictions:
* Postgres-R is not multi-master, was discontinued in 2010, and lost most of its relevance after WAL shipping was added in core with hot standby. http://www.postgres-r.org/
* Postgres-XC, which claims to do multi-master, was the first PostgreSQL-based project I heard of doing sharding (a term not used at the beginning of the project, and which I first heard in 2010). The project began in 2009 and was discontinued in 2013. There are limitations like no support for triggers and savepoints. XC was designed to be good for OLTP workloads, but is poor for long transactions and data-warehouse-type workloads. The design of the project was done jointly by NTT and EnterpriseDB. https://sourceforge.net/projects/postgres-xc/
* Postgres-XL, which forked from Postgres-XC and enhanced the data-warehouse case with improvements for data analytics by introducing a communication protocol between Datanodes. The project is still maintained by some folks at 2ndQuadrant. http://www.postgres-xl.org/
* Postgres-X2, which has been an attempt to unify the Postgres-XC and Postgres-XL efforts under the same banner. I don't know where this is headed, but things look rather stalled. https://github.com/postgres-x2/postgres-x2
* Postgres-XZ (look for PGXZ!), which is something I heard about recently, based on Postgres-XC and developed by some folks in China. Apparently it has its own limitations, and it is used in production where the constraints induced by scaling out are considered acceptable.
So it's cool to see many efforts, BDR being one, all of them trying to address the scaling-out problem in their own way. Each application should carefully study what to use and whether the limitations and constraints are acceptable or not.
> The moment I heard multi-master I thought Paxos, Raft or maybe virtual synchrony. Hmm, nothing in the documentation. Maybe a new consensus protocol was written from scratch then?
What's your use case here? Do you really want multi-master/bi-directional replication, or do you want distributed transactions? Why would an app want to connect to both masters? This isn't sharding.
If it isn't sharding then what is it? If every node has to process every write anyway (so there's no performance advantage over single-master) then why would one ever use this rather than traditional master-slave?
I guess, but I don't see the advantage over master -> slave failover? You lose unreplicated writes either way.
> Or mostly-read regional DBs separated by a slow WAN, ditto.
If it's just reads then regional slaves work well. If your writes don't need to be transactional then a RDBMS seems unlikely to be a good fit in general.
Sure. Either global transactionality for writes is important (in which case this style is useless; you need a master to accept the writes, or else some kind of distributed consensus protocol, which this doesn't have, and a single global master + distributed read slaves is a good model), or transactionality is only important per data center (in which case I'd rather have explicit sharding than invisible differences in behaviour between queries that look the same), or you don't need transactional behaviour at all (in which case an RDBMS is probably a poor fit).
I can imagine a reasonable use case.
If every node only accepts writes to a particular set of tables (or partitions), and serves as a read slave with respect to the rest, the database may remain consistent and more available than a single-master setup. Imagine one node per geographical region, with each node having a (read-only) picture of the whole.
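A sketch of that write-ownership rule at the application layer (region names and the data layout are invented; per-node CHECK constraints could enforce the same thing inside the database):

```python
# Minimal sketch of per-region write ownership: every node may read
# everything, but only writes rows belonging to its own region, so
# concurrent writes on different nodes never touch the same rows.
# All names are illustrative.

LOCAL_REGION = "eu"

class NotOwnedHere(Exception):
    pass

def guard_write(row_region: str) -> None:
    """Refuse writes to rows owned by another region's node."""
    if row_region != LOCAL_REGION:
        raise NotOwnedHere(
            f"rows for region {row_region!r} must be written on that region's node"
        )

def update_order(orders: dict, order_id: int, status: str) -> None:
    row = orders[order_id]
    guard_write(row["region"])       # writes: only our own partition
    row["status"] = status           # reads of other regions stay allowed

# usage
orders = {1: {"region": "eu", "status": "new"}, 2: {"region": "us", "status": "new"}}
update_order(orders, 1, "shipped")   # fine: locally owned
# update_order(orders, 2, "shipped") # would raise NotOwnedHere
```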
Do you know of any relational products which offer high-throughput, low-latency, high-availability transaction processing using perfectly synchronous multi-master replication?
You can't guarantee low latency and consistency without setting geographical boundaries on the distribution of the servers, unless you have very generous definitions of "low latency", as the speed of light (and more realistically: the speed you can transmit a signal via the internet, which is substantially lower) between the servers will place lower bounds on latency.
This is because you can't guarantee consistency without coordination between the servers (in the multi-master case you won't know what order to make operations visible; in the master-slave case you won't know whether a transaction has even been successfully applied by a slave), and that coordination at a bare minimum involves one round trip (sending a transaction and waiting for confirmation that it has been applied, assuming no conflicts that require additional effort to reconcile).
You can pipeline some of this to partially hide the delays, but you can't avoid the signalling, and for many applications it has a disastrous effect on throughput when certain tables/rows are highly contended.
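Roughly quantifying that floor (distance and signal speed are approximations):

```python
# Rough lower bound on synchronous-commit latency imposed purely by
# physics and one mandatory confirmation round trip. Figures are
# approximations for illustration.

DISTANCE_KM = 8000            # e.g. roughly Europe <-> US west coast
SIGNAL_SPEED_KM_PER_MS = 200  # ~2/3 of c in fibre, ignoring routing detours

one_way_ms = DISTANCE_KM / SIGNAL_SPEED_KM_PER_MS   # ~40 ms
round_trip_ms = 2 * one_way_ms                      # ~80 ms

print(f"best-case confirmation round trip: ~{round_trip_ms:.0f} ms")
# So no synchronous scheme spanning those servers can commit a contended
# transaction in under ~80 ms, however clever the protocol is.
```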
Yes, I meant a strictly one-cluster solution for machines within the same datacenter and preferably the same rack.
Synchronous replication across such a cluster can give much more read performance with write consistency and durability guarantees even when hardware failures occur. I don't know if any potential write performance increase would be worth the increased complexity, compared to a standard single-master setup.
Thanks for posting this! How does the logical replication reason about preconditions? Are row versions consistent across replicas? Are the preconditions somehow encoded as value checks?