This is part of a multi-part series on IBM Sterling Global Mailbox.
What is ZooKeeper?
Apache ZooKeeper is a highly reliable, open source, distributed co-ordination service. ZooKeeper provides methods for co-ordinating activities across distributed systems with many nodes in many data centers.
ZooKeeper stores key-value pairs in a hierarchy. These pairs serve as building blocks for higher-level co-ordination services, which are called “recipes”. The recipe implementations are not part of ZooKeeper itself.
For high availability, multiple ZooKeeper nodes are deployed together, and data is replicated across them. A group of ZooKeeper nodes is called an ensemble.
ZooKeeper requires a quorum, that is, a majority of the nodes in the ensemble, to be available for operations to succeed.
What is Curator?
Apache Curator is a Java client library for ZooKeeper. It provides a set of Java APIs making it easier to connect to and work with ZooKeeper. Apache Curator also implements many of the recipes suggested in the ZooKeeper documentation.
Which ZooKeeper recipes are used by Global Mailbox?
Since Global Mailbox is a distributed application, many activities must be co-ordinated across nodes and data centers. ZooKeeper provides this co-ordination for the Global Mailbox deployment.
Global Mailbox uses the following recipes to co-ordinate activities across nodes and data centers:
Locks
A lock prevents concurrent reads and/or writes to an object. If an object is locked by one thread, other threads within the same JVM and other JVMs will be prevented from accessing the object. They must wait until the lock is released. Once released, the next thread will be given access to the object.
Global Mailbox uses locks for many objects in the system to prevent consistency errors. For example, when a message is extracted, the extraction counter is locked to ensure that it’s properly updated.
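The locking pattern described above can be sketched with a plain in-JVM lock. This is only an analogy under stated assumptions: the class `ExtractionCounter` and its methods are hypothetical names, and `ReentrantLock` protects threads inside a single JVM only, whereas the ZooKeeper lock recipe extends the same acquire, update, release pattern across JVMs and data centers.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class: an in-JVM stand-in for a lock-protected extraction counter.
// It is not Global Mailbox code and does not talk to ZooKeeper.
final class ExtractionCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int extractions = 0;

    // Increments the counter while holding the lock and returns the new value.
    int recordExtraction() {
        lock.lock();               // other threads wait here until the lock is released
        try {
            extractions++;
            return extractions;
        } finally {
            lock.unlock();         // always release, even if the update throws
        }
    }

    // Reads the counter under the same lock so readers never see a partial update.
    int current() {
        lock.lock();
        try {
            return extractions;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExtractionCounter counter = new ExtractionCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.recordExtraction();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // With the lock in place no increment is lost: 4 threads x 1000 updates.
        System.out.println(counter.current()); // prints 4000
    }
}
```

Without the lock, two threads could read the same counter value and both write back the same result, losing an update. That is the kind of consistency error the lock is there to prevent.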
Barriers
A barrier is used to block actions from happening until some condition is met.
Global Mailbox uses a barrier during synchronous replication. With synchronous replication, file uploads don’t complete until the payload is replicated to a majority of data centers. Since this involves multiple servers and data centers, a barrier is used to co-ordinate the activities.
When a file is uploaded, the protocol adapter stores the payload in the receiving data center, then waits on a barrier for replication to complete. Each of the other data centers replicates the payload to its own local storage. When the payload has been replicated to a majority of data centers, the barrier is released, and the protocol adapter in the receiving data center completes the final steps of the upload and responds to the client.
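The upload flow above can be illustrated with an in-JVM barrier. This is a minimal sketch under stated assumptions: `ReplicationBarrier` and its methods are hypothetical names, a `CountDownLatch` stands in for the ZooKeeper barrier recipe, and the copy in the receiving data center is assumed to count toward the majority.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical class: models "wait until a majority of data centers hold the payload".
// It is an in-JVM analogy, not the Global Mailbox implementation.
final class ReplicationBarrier {
    private final CountDownLatch remaining;

    ReplicationBarrier(int totalDataCenters) {
        if (totalDataCenters < 1) {
            throw new IllegalArgumentException("need at least one data center");
        }
        int majority = totalDataCenters / 2 + 1;  // integer division rounds down
        // Assumption: the receiving data center already holds one copy,
        // so only (majority - 1) remote copies are still needed.
        this.remaining = new CountDownLatch(majority - 1);
    }

    // Called once for each remote data center that has stored the payload.
    void copyStored() {
        remaining.countDown();
    }

    // True once enough copies exist for the upload to complete.
    boolean isReleased() {
        return remaining.getCount() == 0;
    }

    // Blocks the caller (the protocol adapter in this analogy) until the
    // barrier is released or the timeout expires.
    boolean awaitMajority(long timeout, TimeUnit unit) throws InterruptedException {
        return remaining.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ReplicationBarrier barrier = new ReplicationBarrier(3);
        Thread remote = new Thread(barrier::copyStored);  // one remote data center replicates
        remote.start();
        boolean released = barrier.awaitMajority(5, TimeUnit.SECONDS);
        remote.join();
        System.out.println("upload may complete: " + released);
    }
}
```

The timeout on `awaitMajority` is a design choice of this sketch so that a caller never waits forever; the source does not say how Global Mailbox handles a barrier that is never released.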
What happens when ZooKeeper nodes go down?
ZooKeeper requires a quorum of nodes to be up for ZooKeeper operations to be successful. Quorum is calculated as:
quorum = (total number of ZooKeeper nodes across all data centers / 2) + 1, rounded down to a whole number
In the recommended Global Mailbox deployment of 2 data centers, you must have 3 ZooKeeper nodes in one data center and 2 in the other. There should always be an odd number of ZooKeeper nodes in total.
With 5 nodes in total, this configuration results in a quorum of 3 (5/2 + 1 = 3.5, rounded down to 3). This means that you can lose up to 2 ZooKeeper nodes and still be functional.
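The quorum arithmetic can be checked with a few lines of Java. This is a small sketch; `QuorumCalculator` and its method names are hypothetical and are not part of ZooKeeper or Global Mailbox.

```java
// Hypothetical helper that applies the quorum formula from this article.
final class QuorumCalculator {

    // Quorum = total nodes / 2 + 1, rounded down (Java integer division rounds down).
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have at least one node");
        }
        return ensembleSize / 2 + 1;
    }

    // Number of nodes that can be lost while a quorum is still available.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int nodes = 3; nodes <= 6; nodes++) {
            System.out.println(nodes + " nodes: quorum " + quorumSize(nodes)
                    + ", tolerates " + tolerableFailures(nodes) + " failure(s)");
        }
    }
}
```

Running this shows why an odd total is preferred: 5 nodes and 6 nodes both tolerate only 2 failures, so the sixth node adds no extra fault tolerance.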
Global Mailbox has a process that monitors the ZooKeeper ensemble so that you can continue operating when fewer than a quorum of nodes are reachable. This is used in situations where the 2 data centers cannot communicate with each other (split brain) or where a data center is down for maintenance. This process is called the ZooKeeper Watchdog and is described below.
What do I do if a ZooKeeper node is corrupted and won’t start?
The key-value pairs stored in ZooKeeper are relatively short-lived for Global Mailbox. They represent locks or barriers which typically last only a few seconds. If a node crashes and cannot restart, it’s easy to recover since no data recovery is needed. See the Global Mailbox documentation for the steps to recover a corrupted ZooKeeper node.
What is the ZooKeeper Watchdog?
The ZooKeeper watchdog is specific to Global Mailbox and not part of standard ZooKeeper installations. The watchdog monitors the ZooKeeper ensemble. When it detects that quorum is lost, it temporarily removes unreachable nodes from the ensemble so that quorum can still be achieved. Once the nodes come back, they are re-added to the ensemble.
The watchdog must always be running. It is required in cases of network partition to keep the system functioning.
Consider the recommended deployment, where DC1 has 3 ZooKeeper nodes and DC2 has 2 ZooKeeper nodes, so quorum requires 3 nodes. If the 2 data centers cannot talk to each other, DC1 can still achieve quorum, but DC2 cannot because it has only 2 nodes. In this case, the watchdog in DC2 detects that quorum has been lost, removes the unreachable nodes from the configuration, and restarts the ZooKeeper nodes. After this, quorum can be achieved within the data center and Global Mailbox continues to function. However, locks are then local to the data center and not global across all data centers.
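The partition scenario above comes down to simple arithmetic, sketched here in Java. `PartitionQuorumCheck` and its methods are hypothetical names; the sketch models only the quorum math, not how the watchdog actually detects failures or rewrites the ZooKeeper configuration.

```java
// Hypothetical model of the DC1 (3 nodes) / DC2 (2 nodes) split described in the text.
final class PartitionQuorumCheck {

    // Same formula as in the article: total / 2 + 1, rounded down.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // True if the reachable nodes are enough to form a quorum of the configured ensemble.
    static boolean hasQuorum(int reachableNodes, int configuredEnsembleSize) {
        return reachableNodes >= quorumSize(configuredEnsembleSize);
    }

    public static void main(String[] args) {
        int dc1Nodes = 3;
        int dc2Nodes = 2;
        int total = dc1Nodes + dc2Nodes;

        // During the partition each data center can reach only its own nodes.
        System.out.println("DC1 has quorum: " + hasQuorum(dc1Nodes, total)); // true
        System.out.println("DC2 has quorum: " + hasQuorum(dc2Nodes, total)); // false

        // After the unreachable nodes are removed, DC2's configured ensemble
        // is just its own 2 nodes, so its quorum requirement drops to 2.
        System.out.println("DC2 after reconfiguration: " + hasQuorum(dc2Nodes, dc2Nodes)); // true
    }
}
```

After reconfiguration both sides have a working ensemble, but they are two separate ensembles. This is why locks are local to each data center until the partition heals.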
How do I download and install ZooKeeper?
ZooKeeper is included with the media for B2Bi/SFG. When you download the product, you’ll see two Installation Manager repositories: b2birepo and gmrepo. The gmrepo repository contains Cassandra, ZooKeeper and other add-ons for B2Bi/SFG. Use IBM Installation Manager to install ZooKeeper from gmrepo.
Plan your installation before you begin. The ZooKeeper installer will ask you several questions, so have the following information available at installation time:
- Hostname of each ZooKeeper node
- Port numbers to use for the various ZooKeeper protocols
Be sure to follow the firewall considerations to open ports needed for ZooKeeper within and across data centers.
How do I uninstall ZooKeeper?
Use IBM Installation Manager to uninstall ZooKeeper. Installation Manager removes all product binaries from disk and removes ZooKeeper from the list of installed applications. User-generated files, such as the ZooKeeper files that contain the key-value pair data, remain on the file system. If you wish to reinstall, first clear out the user-generated data from the install location.
Can I install a different version of ZooKeeper?
No, you must use the version of ZooKeeper that’s included with the product.
Who provides support for ZooKeeper when used with Global Mailbox?
IBM supports ZooKeeper when used with Global Mailbox. If you have a problem with ZooKeeper in your Global Mailbox deployment, contact IBM support.
How do I get fixes for ZooKeeper?
IBM provides fixes for ZooKeeper via Fix Central.