Top 14 Apache ZooKeeper Interview Questions You Must Prepare 19.Mar.2024

Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate among themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for building distributed applications.

The common services provided by ZooKeeper are as follows:

Naming service: Identifies the nodes in a cluster by name. It is similar to DNS, but for nodes.

Configuration management: Provides the latest, up-to-date configuration information of the system to a joining node.

Cluster management: Tracks nodes joining or leaving the cluster and reports node status in real time.

Leader election: Elects a node as leader for coordination purposes.

Locking and synchronization service: Locks the data while it is being modified. This mechanism helps with automatic failure recovery when connecting to other distributed applications such as Apache HBase.

Highly reliable data registry: Keeps data available even when one or a few nodes are down.

Creating children is similar to creating new znodes. The only difference is that the path of the child znode will have the parent path as well.

create /parent/path/subnode/path /data

Reliability: Failure of a single system or a few systems does not cause the whole system to fail.

Scalability: Performance can be increased as needed by adding more machines, with a minor change to the application's configuration and no downtime.

Transparency: Hides the complexity of the system and presents it as a single entity or application.

The central part of the ZooKeeper API is the ZooKeeper class. It provides options to connect to the ZooKeeper ensemble in its constructor and has the following methods:

connect - connect to the ZooKeeper ensemble

ZooKeeper(String connectionString, int sessionTimeout, Watcher watcher)

create - create a znode

create(String path, byte[] data, List<ACL> acl, CreateMode createMode)

exists - check whether a znode exists and get its metadata

exists(String path, boolean watch)

getData - get data from a particular znode

getData(String path, Watcher watcher, Stat stat)

setData - set data in a particular znode

setData(String path, byte[] data, int version)

getChildren - get all sub-nodes available in a particular znode

getChildren(String path, Watcher watcher)

delete - delete a particular znode (the call fails if the znode still has children)

delete(String path, int version)

close - close a connection
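The version argument accepted by setData and delete works as an optimistic concurrency check: the call is applied only if the supplied version matches the znode's current data version, and -1 skips the check. The following is a minimal in-memory sketch of that rule, not the real client. The class VersionedZnodeSketch and its String-based methods are hypothetical, and it throws IllegalStateException where the real client would throw KeeperException.BadVersionException.

```java
import java.nio.charset.StandardCharsets;

/** In-memory stand-in for a single znode. This is NOT the real ZooKeeper client. */
final class VersionedZnodeSketch {
    private byte[] data;
    private int version = 0; // a newly created znode starts at data version 0

    VersionedZnodeSketch(String initial) {
        this.data = initial.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Same idea as setData(path, data, version), without the path.
     * Returns the new version; the real call returns a Stat object instead.
     */
    synchronized int setData(String newData, int expectedVersion) {
        // -1 skips the check, matching the real API's "any version" convention.
        if (expectedVersion != -1 && expectedVersion != version) {
            // The real client throws KeeperException.BadVersionException here.
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion + " but znode is at " + version);
        }
        data = newData.getBytes(StandardCharsets.UTF_8);
        version++;
        return version;
    }

    synchronized String getData() {
        return new String(data, StandardCharsets.UTF_8);
    }

    synchronized int getVersion() {
        return version;
    }

    public static void main(String[] args) {
        VersionedZnodeSketch znode = new VersionedZnodeSketch("v0");
        znode.setData("from client A", 0);      // version matches, update applied
        try {
            znode.setData("from client B", 0);  // stale version, update rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(znode.getData() + " @ version " + znode.getVersion());
    }
}
```

Passing the version read earlier lets two clients detect that they raced on the same znode instead of silently overwriting each other.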

Race condition: Two or more machines try to perform a particular task that should be done by only a single machine at any given time. For example, a shared resource should be modified by only one machine at a time.

Deadlock: Two or more operations wait indefinitely for each other to complete.

Inconsistency: A partial failure leaves the data in an inconsistent state.

ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper solves this issue with its simple architecture and API, allowing developers to focus on core application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at “Yahoo!” for accessing their applications in an easy and robust manner. Later, Apache ZooKeeper became a standard for organized service used by Hadoop, HBase, and other distributed frameworks.

ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper ensemble for development purposes. It is useful for debugging and experimenting with different options. To perform ZooKeeper CLI operations, first start your ZooKeeper server (“bin/zkServer.sh start”) and then the ZooKeeper client (“bin/zkCli.sh”).

Once the client starts, you can perform the following operations:

  • Create znodes
  • Get data
  • Watch znode for changes
  • Set data
  • Create children of a znode
  • List children of a znode
  • Check status
  • Remove / Delete a znode

Create a znode with the given path. The flag argument specifies whether the created znode will be ephemeral, persistent, or sequential. By default, all znodes are persistent.

  • Ephemeral znodes (flag: -e) will be automatically deleted when a session expires or when the client disconnects.
  • Sequential znodes (flag: -s) guarantee that the znode path will be unique.
  • The ZooKeeper ensemble appends a sequence number, zero-padded to 10 digits, to the znode path. For example, the znode path /myapp will be converted to /myapp0000000001, and the next sequence number will be /myapp0000000002.

If no flags are specified, the znode is created as persistent.

create /path /data

To create a sequential znode, add the -s flag as shown below.

create -s /path /data

To create an ephemeral znode, add the -e flag as shown below.

create -e /path /data
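The 10-digit padding used for sequential znodes can be illustrated with a short sketch. This is only a local model of the naming rule, not ZooKeeper code: the class SequentialNameSketch is hypothetical, and the starting value of 1 simply follows the /myapp example above (in a real ensemble the counter is maintained by the server for each parent znode).

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Local model of how a sequential znode name is formed. Not the real server logic. */
final class SequentialNameSketch {
    private final AtomicInteger counter;

    SequentialNameSketch(int firstSequence) {
        this.counter = new AtomicInteger(firstSequence);
    }

    /** Appends the next sequence number, zero-padded to 10 digits, to the requested path. */
    String next(String requestedPath) {
        // getAndIncrement is atomic, so concurrent callers never receive the same number.
        int sequence = counter.getAndIncrement();
        return String.format(Locale.ROOT, "%s%010d", requestedPath, sequence);
    }

    public static void main(String[] args) {
        SequentialNameSketch names = new SequentialNameSketch(1);
        System.out.println(names.next("/myapp"));
        System.out.println(names.next("/myapp"));
    }
}
```

Because the padding is fixed-width, sorting the resulting names as plain strings gives the same order as sorting by sequence number.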

Here are the benefits of using ZooKeeper:

  • Simple distributed coordination process
  • Synchronization: Mutual exclusion and cooperation between server processes. This helps Apache HBase with configuration management.
  • Ordered Messages
  • Serialization: Encodes the data according to specific rules, which ensures your application runs consistently. This approach can be used in MapReduce to coordinate queues for executing running threads.
  • Reliability
  • Atomicity: A data transfer either succeeds or fails completely; no transaction is partial.

Removes a specified znode and, recursively, all its children. The command succeeds only if the znode exists. In ZooKeeper 3.5 and later, rmr is deprecated in favor of deleteall.

rmr /path

Below are some instances where Apache ZooKeeper is being utilized:

  • Apache Storm, a real-time stateless processing/computing framework, manages its state in the ZooKeeper service
  • Apache Kafka uses it for choosing the leader node for topic partitions
  • Apache YARN relies on it for automatic failover of the resource manager (master node)
  • Yahoo! utilizes it as the coordination and failure recovery service for Yahoo! Message Broker, which is a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is also used by the Fetching Service for the Yahoo! crawler, where it manages failure recovery.

Znodes are categorized as persistent, sequential, and ephemeral.

Persistent znode - A persistent znode stays alive even after the client that created it has disconnected. By default, all znodes are persistent unless otherwise specified.

Ephemeral znode - Ephemeral znodes are active only as long as the client is alive. When a client gets disconnected from the ZooKeeper ensemble, its ephemeral znodes are deleted automatically. For this reason, ephemeral znodes are not allowed to have children. If an ephemeral znode is deleted, then the next suitable node will fill its position. Ephemeral znodes play an important role in leader election.

Sequential znode - Sequential znodes can be either persistent or ephemeral. When a new znode is created as a sequential znode, ZooKeeper sets the path of the znode by attaching a 10-digit sequence number to the original name. For example, if a znode with path /myapp is created as a sequential znode, ZooKeeper will change the path to /myapp0000000001 and set the next sequence number to 0000000002. If two sequential znodes are created concurrently, ZooKeeper never uses the same number for both. Sequential znodes play an important role in locking and synchronization.
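The way ephemeral and sequential znodes combine for leader election can be sketched with a small in-memory model. It assumes the common convention that the candidate holding the lowest sequence number is the leader; the class LeaderElectionSketch, its method names, and the client names are hypothetical, and nothing here talks to a real ensemble.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory model of election via ephemeral sequential znodes. Not the real client. */
final class LeaderElectionSketch {
    // sequence number -> client name; TreeMap keeps entries sorted by sequence number
    private final TreeMap<Integer, String> candidates = new TreeMap<>();
    private int nextSequence = 1;

    /** Models a client creating an ephemeral sequential znode; returns its sequence number. */
    int join(String clientName) {
        int sequence = nextSequence++;
        candidates.put(sequence, clientName);
        return sequence;
    }

    /** Models the session ending: the client's ephemeral znode is deleted automatically. */
    void sessionClosed(int sequence) {
        candidates.remove(sequence);
    }

    /** Assumed convention: the lowest surviving sequence number is the leader. */
    String leader() {
        Map.Entry<Integer, String> lowest = candidates.firstEntry();
        return lowest == null ? null : lowest.getValue();
    }

    public static void main(String[] args) {
        LeaderElectionSketch election = new LeaderElectionSketch();
        int a = election.join("node-a");
        election.join("node-b");
        System.out.println("leader: " + election.leader());
        election.sessionClosed(a); // node-a disconnects, its znode vanishes
        System.out.println("leader after failover: " + election.leader());
    }
}
```

Sequence numbers are never reused in this model, so a client that rejoins after a failure queues up behind the existing candidates.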

Once a ZooKeeper ensemble starts, it waits for clients to connect. Each client connects to one of the nodes in the ZooKeeper ensemble, which may be a leader or a follower node. Once a client is connected, the node assigns a session ID to that client and sends it an acknowledgement. If the client does not get an acknowledgement, it simply tries to connect to another node in the ZooKeeper ensemble. Once connected to a node, the client sends heartbeats to the node at regular intervals to make sure that the connection is not lost.

If a client wants to read a particular znode, it sends a read request with the znode path to the node, and the node returns the requested znode from its own database. For this reason, reads are fast in a ZooKeeper ensemble.

If a client wants to store data in the ZooKeeper ensemble, it sends the znode path and the data to the server. The connected server forwards the request to the leader, and the leader then reissues the write request to all the followers. The write request succeeds, and a successful return code is sent to the client, only if a majority of the nodes respond successfully. Otherwise, the write request fails. This strict majority of nodes is called a quorum.
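The strict-majority rule for writes comes down to a little arithmetic, sketched below. The class QuorumSketch and its method names are hypothetical and serve only to illustrate the calculation.

```java
/** Arithmetic behind the strict-majority (quorum) rule for writes. */
final class QuorumSketch {
    /** Smallest number of nodes that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble needs at least one node");
        }
        return ensembleSize / 2 + 1; // integer division, so 5 -> 3 and 4 -> 3
    }

    /** A write succeeds only when at least a quorum of nodes acknowledged it. */
    static boolean writeSucceeds(int ensembleSize, int acknowledgements) {
        return acknowledgements >= quorumSize(ensembleSize);
    }

    /** Number of nodes that can be down while a quorum is still reachable. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " nodes: quorum " + quorumSize(n)
                + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

A 4-node ensemble tolerates the same single failure as a 3-node one, which is why ensembles are usually given an odd number of nodes.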

An application that interacts with the ZooKeeper ensemble is referred to as a ZooKeeper client, or simply a client. The znode is the core component of the ZooKeeper ensemble, and the ZooKeeper API provides a small set of methods to manipulate all the details of a znode. A client should follow the steps given below to have a clear and clean interaction with the ZooKeeper ensemble.

  • Connect to the ZooKeeper ensemble. The ZooKeeper ensemble assigns a session ID to the client.
  • Send heartbeats to the server periodically. Otherwise, the ZooKeeper ensemble expires the session ID and the client needs to reconnect.
  • Get / Set the znodes as long as a session ID is active.
  • Disconnect from the ZooKeeper ensemble once all the tasks are completed. If the client is inactive for a prolonged time, then the ZooKeeper ensemble will automatically disconnect the client.
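The session lifecycle in the steps above (connect, heartbeat, expire or disconnect) can be modelled with a small sketch. It is a simplification under stated assumptions: the class SessionSketch is hypothetical, time is passed in explicitly as milliseconds so the behaviour is deterministic, and a real ensemble tracks sessions on the server side with its own timeout handling.

```java
/** Simplified model of a client session kept alive by heartbeats. Not the real client. */
final class SessionSketch {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;
    private boolean closed = false;

    /** Models the connect step: the session starts at the given time. */
    SessionSketch(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** A heartbeat extends the session, but cannot revive one that already expired. */
    void heartbeat(long nowMillis) {
        if (isActive(nowMillis)) {
            lastHeartbeatMillis = nowMillis;
        }
    }

    /** Active means not closed and the last heartbeat is within the timeout. */
    boolean isActive(long nowMillis) {
        return !closed && nowMillis - lastHeartbeatMillis <= timeoutMillis;
    }

    /** Models the explicit disconnect step once all tasks are completed. */
    void close() {
        closed = true;
    }

    public static void main(String[] args) {
        SessionSketch session = new SessionSketch(3000, 0);
        session.heartbeat(2000);
        System.out.println("active at 4000 ms: " + session.isActive(4000));
        System.out.println("active at 6000 ms: " + session.isActive(6000));
    }
}
```

In this model an expired session stays expired, mirroring the point that the client must reconnect to obtain a new session.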