Top 26 Apache Tajo Interview Questions and Answers for 27.Jul.2024

Q1. Mention The Salient Features Of Apache Tajo?

Some salient features of Tajo are:

Superior scalability and optimized performance
Low latency
User-defined functions
Row/columnar storage processing framework.
Compatibility with HiveQL and Hive MetaStore
Simple data flow and easy maintenance.

Q2. How Tables Are Managed In Apache Tajo?

The logical view of a data source is defined as a table. A table has various properties such as its logical schema, partitions, and URL. A Tajo table can be a directory in HDFS, a single file, an HBase table, or an RDBMS table.

The types of tables supported by Apache Tajo are:

External table: An external table needs the location property when the table is created. For instance, if the data already exists as Text/JSON files or as an HBase table, it can be registered as a Tajo external table. The following query is an example of external table creation.

create external table sample(col1 int, col2 text, col3 int)
location 'hdfs://path/to/table';

Internal table: An internal table is also called a managed table. It is created in a pre-defined physical location called the Tablespace.

create table table1(col1 int,col2 text);

By default, Tajo uses the location set by "tajo.warehouse.directory" in "conf/tajo-site.xml". Tablespace configuration is used to assign a new location for the table.

Q3. What Are The Storage Systems Supported By Tajo?

Tajo supports the following storage systems:

HDFS
JDBC
Amazon S3
Apache HBase
Elasticsearch

Q4. What Is Apache Tajo?

Apache Tajo is a relational and distributed data processing framework. It is designed for low latency and scalable ad-hoc query analysis.

Tajo supports standard SQL and various data formats. Most of the Tajo queries can be executed without any modification.
Tajo provides fault tolerance through a restart mechanism for failed tasks, and it has an extensible query rewrite engine.
Tajo performs the necessary ETL (Extract, Transform, and Load) operations to summarize large datasets stored on HDFS. It is an alternative choice to Hive/Pig.

Q5. How To Add Column In Apache Tajo?

To add a new column to the "students" table, use the following syntax -

ALTER TABLE <table_name> ADD COLUMN <column_name> <data_type>

alter table students add column grade text;

Q6. What Is Having Clause In Apache Tajo?

The HAVING clause enables you to specify conditions that filter which group results appear in the final results. The WHERE clause places conditions on the selected columns, whereas the HAVING clause places conditions on the groups created by the GROUP BY clause.

SELECT column1, column2 FROM table1 GROUP BY column HAVING [ conditions ]

select age from mytable group by age having sum(mark) > 200;
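The difference between WHERE and HAVING can be tried out locally. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a Tajo cluster, and the rows in mytable are made-up sample data. The second query is the same HAVING query shown above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Tajo (assumption:
# only the standard GROUP BY / HAVING semantics are being illustrated).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, age INTEGER, mark INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "Adam", 12, 90),
        (2, "Amit", 12, 95),
        (3, "Bob", 12, 85),
        (4, "David", 13, 80),
        (5, "Esha", 13, 75),
    ],
)

# WHERE filters individual rows before any grouping happens.
rows_over_80 = conn.execute(
    "SELECT name FROM mytable WHERE mark > 80 ORDER BY name"
).fetchall()

# HAVING filters whole groups after the aggregate has been computed.
ages_over_200 = conn.execute(
    "SELECT age FROM mytable GROUP BY age HAVING sum(mark) > 200"
).fetchall()

print(rows_over_80)   # names of rows whose own mark exceeds 80
print(ages_over_200)  # ages whose summed marks exceed 200
```

With this sample data, only the age-12 group has a total mark above 200, so it is the only group that survives the HAVING filter.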

Q7. Explain About Tablespace?

A tablespace defines a location in the storage system. Tablespaces are supported only for internal tables and are accessed by name. Each tablespace can use a different storage type. If no tablespace is specified, Tajo uses the default tablespace in the root directory. Tajo's internal table records can be accessed from another table only. It can be configured with tablespace.

CREATE TABLE [IF NOT EXISTS] <table_name> [(column_list)] [TABLESPACE tablespace_name]
[using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>]

Q8. What Are Apache Tajo Sql Functions?

Some of the SQL functions supported by Apache Tajo are categorized into:

Math Functions
String Functions
DateTime Functions
JSON Functions

Q9. Mention Some Basic Tajo Shell Commands?

Start server

$ bin/start-tajo.sh

Start Shell

$ bin/tsql

List Database

default> \l

List out Built-in Functions

default> \df

Describe Function: \df <function name> returns the complete description of the given function.

default> \df sqrt

Quit Terminal

default> \q

Cluster Info

default> \admin -cluster

Show master

default> \admin -showmasters

Q10. Explain About Tajo Worker Configuration?

Worker Heap Memory Size: The environment variable TAJO_WORKER_HEAPSIZE in conf/tajo-env.sh allows the Tajo Worker to use the specified heap memory size. To adjust the heap memory size, set the TAJO_WORKER_HEAPSIZE variable in conf/tajo-env.sh with a proper size as follows:

TAJO_WORKER_HEAPSIZE=8000

The default size is 1000 (1GB).

Temporary Data Directory: TajoWorker stores temporary data on local file system due to out-of-core algorithms. It is possible to specify one or more temporary data directories where temporary data will be stored.

Maximum number of parallel running tasks for each worker: Each worker can execute multiple tasks at a time. Tajo allows users to specify the maximum number of parallel running tasks for each worker.

Q11. Explain The Tajo Architecture?

Client: Client submits the SQL statements to the Tajo Master to get the result.

Master: Master is the main daemon. It is responsible for query planning and is the coordinator for workers.

Catalog server: Maintains the table and index descriptions. It is embedded in the Master daemon. The catalog server uses Apache Derby as the storage layer and connects via JDBC client.

Worker: Master node assigns task to worker nodes. TajoWorker processes data. As the number of TajoWorkers increases, the processing capacity also increases linearly.

Query Master: Tajo master assigns query to the Query Master. The Query Master is responsible for controlling a distributed execution plan. It launches the TaskRunner and schedules tasks to TaskRunner. The main role of the Query Master is to monitor the running tasks and report them to the Master node.

Node Managers: Manages the resource of the worker node. It decides on allocating requests to the node.

TaskRunner: Acts as a local query execution engine. It is used to run and monitor query process. The TaskRunner processes one task at a time.

A task has the following three main attributes:

Logical plan - the execution block that created the task.

Fragment - an input path, an offset range, and a schema.

Fetch URIs

Query Executor: It is used to execute a query.

Storage service: Connects the underlying data storage to Tajo.

Q12. Explain Tajo Configuration Files?

Tajo’s configuration is based on Hadoop’s configuration system.

Tajo uses two config files:

catalog-site.xml - configuration for the catalog server.

tajo-site.xml - configuration for the other Tajo modules. Tajo has a variety of internal configs. If you don't set a config explicitly, its default value is used. Tajo is designed to need only a few configs in usual cases, so you may not need to change the configuration at all.

By default, there is no tajo-site.xml in the ${TAJO}/conf directory. If you want to set some configs, first copy $TAJO_HOME/conf/tajo-site.xml.template to tajo-site.xml. Then, add the configs to your tajo-site.xml.
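Because Tajo's configuration follows Hadoop's configuration system, a site file is a list of name/value properties inside a configuration element. The following is a minimal sketch of that layout, generated with Python's standard library. The helper name build_site_config and the HDFS path are hypothetical placeholders; only the property name tajo.warehouse.directory comes from this document.

```python
import xml.etree.ElementTree as ET


def build_site_config(properties):
    """Return a Hadoop-style <configuration> document as a string.

    `properties` maps config names to values. This helper is a
    hypothetical illustration, not part of Tajo.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# The path below is a placeholder, not a recommended value.
site_xml = build_site_config(
    {"tajo.warehouse.directory": "hdfs://namenode:9000/tajo/warehouse"}
)
print(site_xml)
```

In practice the file is usually edited by hand after copying the template; the sketch only shows the structure each entry takes.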

Q13. How To Drop Database In Apache Tajo?

The syntax used to drop a database is -

DROP DATABASE [IF EXISTS] <database_name>

Ex: test> \c default

default> drop database test;

Q14. Explain Different Queries Performed By Apache Tajo?

Predicates: A predicate is an expression that evaluates to TRUE, FALSE, or UNKNOWN. Predicates are used in the search conditions of WHERE and HAVING clauses and in other constructs that require a Boolean value.

Explain: EXPLAIN is used to obtain the query execution plan of a statement, showing its logical and global plans.

Join: SQL joins are used to combine rows from two or more tables.

The following are the different types of SQL Joins:

Inner join
{ LEFT | RIGHT | FULL } OUTER JOIN
Cross join
Self join
Natural join
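To make the join types concrete, here is a small sketch that runs an inner join, a left outer join, and a cross join. It uses Python's built-in sqlite3 module as a stand-in for Tajo, and the customers and orders tables are hypothetical sample data, so it illustrates standard SQL join semantics rather than Tajo-specific behavior.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Tajo (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
    """
)

# Inner join: only customers that have at least one matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "INNER JOIN orders o ON c.id = o.customer_id "
    "ORDER BY c.name, o.order_id"
).fetchall()

# Left outer join: every customer; order_id is NULL when there is no match.
left_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "LEFT OUTER JOIN orders o ON c.id = o.customer_id "
    "ORDER BY c.name, o.order_id"
).fetchall()

# Cross join: every pairing of a customer row with an order row.
cross_count = conn.execute(
    "SELECT count(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

print(inner_rows)
print(left_rows)
print(cross_count)
```

The customer without orders disappears from the inner join but is kept, with a NULL order_id, in the left outer join.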

Q15. What Are The Benefits Of Apache Tajo?

Apache Tajo offers the following benefits:

Easy to use
Simplified architecture
Cost-based query optimization
Vectorized query execution plan
Fast delivery
Simple I/O mechanism with support for various types of storage
Fault tolerance

Q16. How To Create Index Statement In Apache Tajo?

The CREATE INDEX statement is used to create indexes on tables. An index is used for fast retrieval of data. The current version supports indexes only for plain TEXT formats stored on HDFS.

CREATE INDEX [ name ] ON table_name ( { column_name | ( expression ) } )

create index student_index on mytable(id);

Q17. How To Set Property In Apache Tajo?

The SET PROPERTY clause of ALTER TABLE is used to change a table's properties.

ALTER TABLE students SET PROPERTY 'compression.type' = 'RECORD',
'compression.codec' = 'org.apache.hadoop.io.compress.SnappyCodec';

Q18. What Are The Window Functions Provided By Apache Tajo?

Window functions are functions that execute on a set of rows and return a single value for each row. A window function in a query defines its window using the OVER() clause.

The OVER() clause has the following capabilities:

Defines window partitions to form groups of rows. (PARTITION BY clause)
Orders rows within a partition. (ORDER BY clause)

Some of the window functions are:

rank()
row_number()
lead(value[, offset integer[, default any]])
lag(value[, offset integer[, default any]])
first_value(value)
last_value(value)
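The effect of PARTITION BY and ORDER BY inside OVER() is easier to see on concrete rows. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Tajo (the bundled SQLite must be version 3.25 or newer for window functions), and mytable holds made-up sample data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Tajo (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER, mark INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Adam", 12, 90),
        ("Amit", 12, 95),
        ("Bob", 12, 85),
        ("David", 13, 80),
        ("Esha", 13, 75),
    ],
)

# rank() numbers rows within each age partition, highest mark first.
# lag(mark, 1) returns the mark of the previous row in the same
# partition, or NULL for the first row of a partition.
rows = conn.execute(
    "SELECT name, age, mark, "
    "rank() OVER (PARTITION BY age ORDER BY mark DESC) AS mark_rank, "
    "lag(mark, 1) OVER (PARTITION BY age ORDER BY mark DESC) AS prev_mark "
    "FROM mytable ORDER BY age, mark_rank"
).fetchall()

for row in rows:
    print(row)
```

Unlike GROUP BY, the query still returns one output row per input row; the window only determines which neighboring rows each function can see.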

Q19. What Are The Different Data Formats Supported By Apache Tajo?

Text
JSON
Parquet
RCFile
SequenceFile
ORC

Q20. Explain About Postgresql Storage Handler?

Tajo supports a PostgreSQL storage handler, which enables user queries to access database objects in PostgreSQL. It is a built-in storage handler in Tajo, so it is easy to configure.

{
  "spaces": {
    "postgre": {
      "uri": "jdbc:postgresql://hostname:port/database1",
      "configs": {
        "mapped_database": "sampledb",
        "connection_properties": {
          "user": "tajo",
          "password": "pwd"
        }
      }
    }
  }
}

Here, "database1" refers to the PostgreSQL database, which is mapped to the database "sampledb" in Tajo.

Q21. What Is Distinct Clause In Apache Tajo?

A table column may contain duplicate values. The DISTINCT keyword can be used to return only distinct (different) values.

SELECT DISTINCT column1, column2 FROM table_name;

select distinct age from mytable;
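As a quick check of what DISTINCT does to duplicates, the following sketch runs the same kind of query against made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Tajo, so it only illustrates the standard SQL behavior.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Tajo (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Adam", 12), ("Amit", 12), ("Bob", 12), ("David", 13), ("Esha", 13)],
)

# Without DISTINCT every row contributes its age, duplicates included.
all_ages = [r[0] for r in conn.execute("SELECT age FROM mytable ORDER BY age")]

# With DISTINCT each age value appears once.
distinct_ages = [
    r[0] for r in conn.execute("SELECT DISTINCT age FROM mytable ORDER BY age")
]

print(all_ages)
print(distinct_ages)
```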

Q22. What Are The Data Formats Supported By Apache Tajo?

Apache Tajo supports the following data formats:

JSON
Text file(CSV)
Parquet
Sequence File
AVRO
Protocol Buffer
Apache Orc

Q23. How To Create Database Statement In Apache Tajo?

The statement used to create a database in Tajo is CREATE DATABASE, and its syntax is:

CREATE DATABASE [IF NOT EXISTS] <database_name>

Ex: default> create database if not exists test;

Q24. Explain About Catalog Configuration?

If you want to customize the catalog service, copy $TAJO_HOME/conf/catalog-site.xml.template to catalog-site.xml. Then, add the following configs to catalog-site.xml. Note that the default configs are enough to launch Tajo cluster in most cases.

tajo.catalog.master.addr - If you want to launch a Tajo cluster in distributed mode, you must specify this address. For more detailed information, see Default Ports.

tajo.catalog.store.class - If you want to change the persistent storage of the catalog server, specify the class name. Its default value is tajo.catalog.store.DerbyStore. In the current version, Tajo provides three persistent storage classes as follows:

tajo.catalog.store.DerbyStore - this storage class uses Apache Derby.

tajo.catalog.store.MySQLStore - this storage class uses MySQL.

tajo.catalog.store.MemStore - this is the in-memory storage. It is only used in unit tests to shorten the duration of unit tests.

Q25. How To Insert Records In Apache Tajo?

To insert records into the 'test' table, type the following query.

db_sample> insert overwrite into test select * from mytable;

Q26. How Can We Launch A Tajo Cluster?

To launch the Tajo master, execute start-tajo.sh.

$ $TAJO_HOME/sbin/start-tajo.sh

After that, you can use tsql to access the command line interface of Tajo. To learn how to use tsql, read the Tajo Interactive Shell document.

$ $TAJO_HOME/bin/tsql
