At PGConf 2014, I saw a talk by some operations engineers who work for ARIN.
They presented their architecture for high availability, which uses CMAN, Corosync, and Pacemaker to handle message queueing, quorum, and Postgres communication, respectively (as I understand it, Corosync functions similarly to ZooKeeper).
Their architecture was interesting for a couple of reasons: it uses a three-node system (master, synchronous slave, and asynchronous slave), and it uses the network switch to cut off bad nodes. Nodes are tasked with monitoring each other, and when one node can’t communicate with another, it sends a message asking the others to check as well. A different node is responsible for confirming that the suspect node is down, in which case it notifies the switch that it needs to “fence” the bad node (i.e. cut off all its network traffic).
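To make the voting step concrete, here’s a minimal sketch of the peer-check idea in Python. This is not ARIN’s actual implementation (Corosync and Pacemaker handle this for real); the node names, health-check port, and fence_via_switch() helper are all hypothetical.

```python
# Toy sketch of peer-check voting before fencing. Everything here
# (hostnames, port, fence helper) is a hypothetical stand-in for what
# Corosync/Pacemaker actually do.
import socket

PEERS = ["pg-master", "pg-sync", "pg-async"]  # hypothetical node names
HEALTH_PORT = 5433                            # hypothetical health-check port

def is_reachable(host: str, timeout: float = 2.0) -> bool:
    """Return True if we can open a TCP connection to the node's health port."""
    try:
        with socket.create_connection((host, HEALTH_PORT), timeout=timeout):
            return True
    except OSError:
        return False

def confirm_failure(suspect: str, my_name: str) -> bool:
    """Agree the suspect is down only if every other peer also can't see it.
    In a real cluster each peer would probe independently and vote; here we
    just probe from this host as a placeholder for that message exchange."""
    checkers = [p for p in PEERS if p not in (suspect, my_name)]
    votes_down = sum(1 for _ in checkers if not is_reachable(suspect))
    return votes_down == len(checkers)

def fence_via_switch(suspect: str) -> None:
    """Hypothetical: ask the managed switch to disable the suspect's port."""
    print(f"FENCE: cutting all network traffic to {suspect}")

if __name__ == "__main__":
    suspect = "pg-master"
    if not is_reachable(suspect) and confirm_failure(suspect, my_name="pg-sync"):
        fence_via_switch(suspect)
```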
In a failure scenario, the synchronous slave can immediately become master, and the asynchronous slave must catch up and become the new synchronous slave. Once the bad node is fixed, it re-enters the pool as an asynchronous slave, which allows it to catch up slowly. It’s worth noting that if there is no synchronous node after a failure, the asynchronous node will be forced to catch up before the system can start running again, which means a pause for end users.
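Here’s a small sketch of that role rotation, assuming the one-master/one-sync/one-async setup described above. The Cluster class and node names are hypothetical illustrations, not Pacemaker’s actual API.

```python
# Hypothetical illustration of the promotion sequence: sync slave is
# promoted instantly, async becomes the new sync, and a repaired node
# rejoins as async so it can catch up slowly.

class Cluster:
    def __init__(self, master: str, sync: str, async_: str):
        self.master, self.sync, self.async_ = master, sync, async_

    def fail_master(self) -> None:
        """Master dies: the sync slave can take over immediately because it
        has already confirmed every commit; the async slave must catch up
        before it can serve as the new synchronous slave."""
        failed = self.master
        self.master = self.sync   # instant promotion, no lost commits
        self.sync = self.async_   # must catch up before acting as sync
        self.async_ = None        # slot stays empty until the node is repaired
        print(f"{failed} fenced; {self.master} is master, {self.sync} catching up")

    def rejoin(self, node: str) -> None:
        """A repaired node re-enters as the async slave, so its catch-up
        doesn't block commits on the master."""
        self.async_ = node
        print(f"{node} rejoined as asynchronous slave")

cluster = Cluster(master="node-a", sync="node-b", async_="node-c")
cluster.fail_master()    # node-b promoted, node-c catching up
cluster.rejoin("node-a") # repaired node returns as async
```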
Other ways to implement this (without a switch) would include iptables (a sketch of that approach is below). Interestingly, the ARIN team came up with a list of ~45 failure scenarios, which they apparently tested in about three weeks (clearly this can be a lot of effort). Also interesting: a member of the audience mentioned an alternate design that uses dark fiber to connect SCSI drives in what sounded like a RAID, for replication, but I don’t think most of my readers are going to be doing that.
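For the iptables route, a hedged sketch of what software fencing might look like: since there’s no switch doing the cut-off centrally, each surviving node drops all traffic to and from the bad node itself. The IP address is hypothetical, and this would need to run as root.

```python
# Sketch of switch-less fencing with iptables: insert DROP rules at the
# top of INPUT and OUTPUT for all packets to/from the fenced node. Run on
# every surviving node. The IP below is a hypothetical example.
import subprocess

def fence_with_iptables(bad_node_ip: str) -> None:
    """Drop all inbound (-s) and outbound (-d) traffic for the bad node."""
    for chain, direction in (("INPUT", "-s"), ("OUTPUT", "-d")):
        subprocess.run(
            ["iptables", "-I", chain, "1", direction, bad_node_ip, "-j", "DROP"],
            check=True,
        )

fence_with_iptables("10.0.0.5")  # hypothetical IP of the failed node
```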