Top 50 Teradata Interview Questions You Must Prepare 28.Mar.2024

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

Following query describes each column in the Teradata RDBMS

  SELECT * FROM DBC.TVFields;

Following query describes columns contained in indexes in the Teradata RDBMS

  SELECT * FROM DBC.Indexes;

Yes, we can Load a table using multiple datafiles in Fastload. Before giving "end loading" statement user can define file path and use insert sql for multiple source files and give "end loading" statement at the end.

MultiLoad is a very powerful utility; it has following limitations:

  • MultiLoad Utility doesn’t support SELECT statement.
  •  Concatenation of multiple input data files is not allowed.
  •  MultiLoad doesn’t support Arithmatic Functions i.e. ABS, LOG etc. in Mload Script.
  •  MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.

     

  • Preliminary Phase – Basic Setup
  •  DML Phase – Get DML steps down on AMPs
  •  Acquisition Phase – Send the input data to the AMPs and sort it
  •  Application Phase – Apply the input data to the appropriate Target Tables
  •  End Phase – Basic Cleanup

     

  •  Get business requirements.
  •  Create High Level Conceptual Data Model.
  •  Create Logical Data Model.
  •  Select target DBMS where data-modeling tool creates the physical schema.
  •  Create standard abbreviation document according to business standard.

Node is a database running in a server. We used 318 nodes and each node has 2 to 4 AMPS.

@Partition by CASE
CREATE TABLE ORDER_Table
(
ORD_number integer NOT NULL,
customer_number integer NOT NULL,
Generated by Foxit PDF Creator © Foxit Software
http://www.foxitsoftware.com For evaluation only.
order_date date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY case1 (
order_total < 10000 ,
order_total < 20000 ,
order_total < 30000,
NO CASE OR UNKNOWN ) ;
@Partition by Range - example using date range
CREATE TABLE ORDER_Table
(
ORD_number integer NOT NULL,
customer_number integer NOT NULL,
order_date date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY range1 (
Order_date BETWEEN date '2010-01-01' AND date '2010-12-01'
EACH interval '1' month ,
NO RANGE OR UNKNOWN);

There are 5 different indices present in Teradata
@Primary Index.
a.Unique primary index.
b. non Unique primary index.
@Secondary Index.
a. Unique Secondary index.
b. non Unique Secondary index.
@Partitioned Primary Index.
a. Case partition (ex. age, salary...).
b. range partition ( ex. date).
@Join index.
a. Single table join index.
b. multiple table join index.
c. Sparse Join index ( constraint applied on join index in where clause).
@Hash index.

Explain gives the execution strategy means what are the different steps that the query will go through.
HIGH CONFIDENCE: Statistics are collected.
LOW CONFIDENCE: Statistics are not collected. But the where condition is having the condition on indexed column. Then estimations can be based on sampling.
NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column.

  Select databasename , tablename , columnposition ,columnname from dbc.indices   where indextype ='Q'  order by 1 ,2,3 ;

  • Teradata performance and workload.
  • Client platform type, performance and workload.
  • Channel performance for channel attached systems.
  • Network topology and performance for network attached systems.
  • Volume of data to be processed by the application.

@It allows us to write SQL statements along with BTEQ commands. We can use BTEQ for importing, exporting and reporting purposes.
@The commands start with a (.) dot and can be terminated by using (;), it is not mandatory to use (;).
@BTEQ will assume any thing written with out a dot as a SQL statement and requires a (;) to terminate it.

Following are the different functions which a DBA can perform:
@User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
@Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
@Access of Database Objects – Granting and Revoking Access Rights on different database objects.
@Security Control – Handling logon and logoff rules for Users.
@System Maintenance – Specification of system defaults, restart etc.
@System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
@Resource Monitoring – Database Query Log(DBQL) and Access Logging.
@Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.

Clique is a group of disk arrays physically cabled to a group of nodes.

The BTEQ .SHOW CONTROL command displays BTEQ settings.

A data warehouse is a subject oriented, integrated, time variant, non-volatile collection of data in support of management's decision-making process.

V2R6 included the feature of replica in it. in which copy of data base are available on another system which means V2R6 provide the additional data protection as comparison to V2R5 while if data from one system has been vanishes.

Checkpoints are entries posted to a restart log table at regular intervals during the data transfer operation. If processing stops while a job is running, you can restart the job at the most recent checkpoint.

  1. If the teradata is going to be only a data base means It won’t change the System development life cycle (SDLC).
  2. If you are going to use the teradata utilities then it will change the Architecture or SDLC.
  3. If your schema is going to be in 3NF then there won’t be huge in change.

     

Collects demographic data for one or more columns of a table, hash index, or join index, computes a statistical profile of the collected data, and stores the synopsis in the data dictionary. The Optimizer uses the synopsis data when it generates its table access and join plans.

Tpump provides an alternative to MultiLoad for low volume batch maintenance of large databases under control of a Teradata system. Tpump updates information in real time, acquiring every bit of a data from the client system with low processor utilization. It does this through a continuous feed of data into the data warehouse, rather than the traditional batch updates. Continuous updates results in more accurate, timely data. Tpump uses row hash locks than table level locks. This allows you to run queries while Tpump is running.

Star schema is De-normalized and snowflake schema is normalized.

Yes, you can have a column for Primary key and have another column which will have no duplicates or null. e.g. A Salary Table will have employee ID as primary key. The table also contains TAX-ID which can not be null or duplicate.

@collecting statistics.
@Explain Statements.
@Avoid Product Joins when possible.
@select appropriate primary index to avoid skewness in storage.
@Avoid Redistribution when possible.
@Use sub-selects instead of big "IN" lists.
@Use derived tables.
@Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed).
@Use Compression on large tables.

An operational data store (ODS) is primarily a "dump" of relevant information from a very small number of systems (often just one) usually with little or no transformation. The benefits are an ad hoc query database, which does not affect the operation of systems required to run the business. ODS’s usually deal with data "raw" and "current" and can answer a limited set of queries as a result.

Analyzing of large volumes of relatively simple data to extract important trends and new, higher level information. For example, a data-mining program might analyze millions of product orders to determine trends among top-spending customers, such as their likelihood to purchase again, or their likelihood to switch to a different vendor.

In a slowly changing dimension the attribute for a record varies over time. There are three ways to solve this problem.
Type 1 – Replace an old record with a new record. No historical data available.
Type 2 – Keep the old record and insert a new record. Historical data available but resources intensive.
Type 3 – In the existing record, maintain extra columns for the new values.

No, because they become a single amp operation and my company didn’t encourage that.

Yes we can collect statistics on table level. The syntax is COLLECT STAT ON TAB_A;

TIMESTAMP (0) is CHAR (19) and TIMESTAMP (6) is CHAR (26)
Everything is same except that TIMESTAMP (6) has microseconds too.

There are 12 codd's rules applied to the teradata database.

The SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions. Sleep command can be used with all load utilities not only fastload. This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If tenacity was set to 2 hours and sleep 10 minutes, Sleep command will try to logon for every 10 minutes up to 2 hours duration.

FALLBACK requests that a second copy of each row inserted into a table be stored on another AMP in the same cluster. This is done when AMP goes down or disk fails.

I use explain statement to check if the query is doing what I wanted it to do.

The three-tier differs from the two-tier architecture by strictly enforcing a logical separation of the graphical user interface, business logic, and data. The three-tier is widely used for data warehousing today. Organizations that require greater performance and scalability, the three-tier architecture may be more appropriate. In this architecture, data extracted from legacy systems is cleansed, transformed, and stored in high –speed database servers, which are used as the target database for front-end data access.

Dimensional Data Modeling comprises of one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization etc. Dimension tables store records related to that particular dimension and no facts (measures) are stored in these tables.

Primary index is the mechanism for assigning a data row to an AMP and a location on the AMP’s disks. Indexes also used to access rows from a table without having to search the entire table. Secondary indexes enhance set selection by specifying access paths less frequently used than the primary index path. Secondary indexes are also used to facilitate aggregate operations.

If a secondary index covers a query, then the Optimizer determines that it would be less costly to accesses its rows directly rather than using it to access the base table rows it points to. Sometimes multiple secondary indexes with low individual selectivity can be overlapped and bit mapped to provide enhanced.

It is Entity relationship diagram. Describes the relationship among the entities in the database model.

A Logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

Dimension table is one that describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables. In a relational data modeling, for normalization purposes, country lookup, state lookup, county lookup, and city lookups are not merged as a single table. In a dimensional data modeling (star schema), these tables would be merged as a single table called LOCATION DIMENSION for performance and slicing data requirements. This location dimension helps to compare the sales in one region with another region. We may see good sales profit in one region and loss in another region. If it is a loss, the reasons for that may be a new competitor in that area, or failure of our marketing strategy etc.

Yes we can collect statistics on multiple columns.

We can Load SET, MULTISET tables using Mload, But here when loading into MULTISET table using MLOAD duplicate rows will not be rejected, we have to take care of them before loading.But in case of Fload when we are loading into MULTISET duplicate rows are automatically rejected, FLOAD will not load duplicate rows weather table is SET or MULTISET.

Group by those fields and select id, count(*) from table group by id having count (*) > 1.

Subject area means fundamental entities that make up the major components of the business, e.g. customer, product, employee.