Top 45 Apache Hive Interview Questions You Must Prepare 14.Jun.2024

This can be done with following query

SHOW PARTITIONS table_name PARTITION(partitioned_column=’partition_value’)

If we set the property hive.exec.mode.local.auto to true then hive will avoid mapreduce to fetch query results.

The default record delimiter is − n

And the filed delimiters are − 01,02,03

By using the ENABLE OFFLINE clause with ALTER TABLE atatement.

Hive is a tool in Hadoop ecosystem which provides an interface to organize and query data in a databse like fashion and write SQL like queries. It is suitable for accessing and analyzing data in Hadoop using SQL syntax.

The values in a column are hashed into a number of buckets which is defined by user. It is a way to avoid too many partitions or nested partitions while ensuring optimizes query output.

It controls how the map output is reduced among the reducers. It is useful in case of streaming data.

It sets the mapreduce jobs to strict mode.By which the queries on partitioned tables can not run without a WHERE clause. This prevents very large job running for long time.

Yes. The TIMESTAMP data types stores date in java.sql.timestamp format

Yes, using the ! mark just before the command.

For example !pwd at hive prompt will list the current directory.

Depending on the nature of data the user has, the inbuilt SerDe may not satisfy the format of the data. SO users need to write their own java code to satisfy their data format requirements.

No. As this kind of Join can not be implemented in mapreduce

In a join query the smallest table to be taken in the first position and largest table should be taken in the last position.

It creates partition on table employees with partition values coming from the columns in the select clause. It is called Dynamic partition insert.

It is query hint to stream a table into memory before running the query. It is a query optimization Technique.

No. The name of a view must be unique when compared to all other tables and views present in the same database.

The LIKE operator behaves the same way as the regular SQL operators used in select queries.

Example − street_name like ‘%Chi’

But the RLIKE operator uses more advance regular expressions which are available in java

Example − street_name RLIKE ‘.*(Chi|Oho).*’ which will select any word which has either chi or oho in it.

It is a file containing list of commands needs to run when the hive CLI starts. For example setting the strict mode to be true etc.

SHOW INDEX ON table_name

This will list all the indexes created on any of the columns in the table table_name.

There is no way you can delete the DBPROPERTY.

No. It only reduces the number of files which becomes easier for namenode to manage.

  • Local mode
  • Distributed mode
  • Pseudodistributed mode

There are two types. Managed table and external table. In managed table both the data an schema in under control of hive but in external table only the schema is under control of Hive.

Yes. A partition can be archived. Advantage is it decreases the number of files stored in namenode and the archived file can be queried using hive. The disadvantage is it will cause less efficient query and does not offer any space savings.

Indexes occupies space and there is a processing cost in arranging the values of the column on which index is cerated.

A table generating function is a function which takes a single column as argument and expands it to multiple column or rows. Example exploe()

It is a UDF which is created using a java program to server some specific need not covered under the existing functions in Hive. It can detect the type of input argument programmatically and provide appropriate response.

Alter Table table_name RENAME TO new_name

It is a relational database storing the metadata of hive tables, partitions, Hive databases etc

The hive variable is variable created in the Hive environment that can be referenced by Hive scripts. It is used to pass some values to the hive queries when the query starts executing.

The data stays in the old location. It has to be moved manually.

Using the source command.

Example: Hive> source /path/to/file/file_with_query.hql

There are three collection data types in Hive.

  • ARRAY
  • MAP
  • STRUCT

No. A view can not be the target of a INSERT or LOAD statement.

  • The TBLPROPERTIES clause is used to add the creator name while creating a table.
  • The TBLPROPERTIES is added like:  TBLPROPERTIES(‘creator’= ‘Joan’)

The new incoming files are just added to the target directory and the existing files are simply overwritten. Other files whose name does not match any of the incoming files will continue to exist.

If you add the OVERWRITE clause then all the existing data in the directory will be deleted before new data is written.

Using REPLACE column option

ALTER TABLE table_name REPLACE COLUMNS ……

The schema is validated with the data when reading the data and not enforced when writing data.

No Hive does not provide insert and update at row level. So it is not suitable for OLTP system.