When extracting from a table, some dimension key must indicate whether a record needs to be extracted. Usually this is a time-dimension column (e.g., date >= 1st of the current month) or a transaction flag (e.g., Order Invoiced Status). The most foolproof approach is to add an archive flag to each record that gets reset whenever the record changes.
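A minimal sketch of both extraction filters, using Python's built-in sqlite3 module. The `orders` table, its columns, and the archive-flag convention (0 = new or changed since the last extract, 1 = already extracted) are hypothetical assumptions for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical source table: each order has a date and an archive flag.
# archived = 0 means the row is new or changed since the last extract.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, "
    "status TEXT, archived INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO orders (id, order_date, status, archived) VALUES (?, ?, ?, ?)",
    [
        (1, "2024-04-28", "INVOICED", 1),
        (2, "2024-05-02", "OPEN", 0),
        (3, "2024-05-10", "INVOICED", 0),
    ],
)

def extract_by_date(conn, today):
    """Time-dimension filter: rows dated on or after the 1st of the current month."""
    first_of_month = today.replace(day=1).isoformat()
    rows = conn.execute(
        "SELECT id FROM orders WHERE order_date >= ? ORDER BY id", (first_of_month,)
    )
    return [r[0] for r in rows]

def extract_by_archive_flag(conn):
    """Archive-flag filter: pick unflagged rows, then mark them as extracted."""
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM orders WHERE archived = 0 ORDER BY id")]
    conn.executemany("UPDATE orders SET archived = 1 WHERE id = ?", [(i,) for i in ids])
    return ids

def touch(conn, order_id, new_status):
    """Hypothetical change: any update also resets the archive flag."""
    conn.execute("UPDATE orders SET status = ?, archived = 0 WHERE id = ?",
                 (new_status, order_id))

print(extract_by_date(conn, date(2024, 5, 15)))  # [2, 3]
print(extract_by_archive_flag(conn))             # [2, 3]
touch(conn, 1, "CANCELLED")
print(extract_by_archive_flag(conn))             # [1]
```

The archive flag catches changes to old rows (like order 1) that a date filter alone would miss, which is why it is the more foolproof option.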
Non-additive facts are facts that cannot be summed across any of the dimensions in the fact table; adding these columns up does not produce meaningful results.
PowerCenter - ability to organize repositories into a data mart domain and share metadata across repositories.
PowerMart - only local repository can be created.
The Joiner transformation joins data from two sources, much like a join in SQL; to join more than two sources, chain several Joiner transformations.
The Lookup transformation looks up data in a table, view, or file, for example to check whether a source row already exists in the target table (similar to a correlated subquery in SQL).
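The two SQL analogies can be illustrated with Python's sqlite3 module. This shows only the SQL counterparts (a join and a correlated subquery), not Informatica's Joiner or Lookup transformations themselves; the tables and columns are hypothetical.

```python
import sqlite3

# Hypothetical source and target tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customers (cust_id INTEGER, name TEXT);
    CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER, amount REAL);
    CREATE TABLE tgt_customers (cust_id INTEGER, name TEXT);
    INSERT INTO src_customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO src_orders VALUES (10, 1, 50.0), (11, 2, 20.0), (12, 1, 5.0);
    INSERT INTO tgt_customers VALUES (1, 'Ann');
""")

# Joiner analogy: combine rows from two sources on a matching key.
joined = conn.execute("""
    SELECT o.order_id, c.name, o.amount
    FROM src_orders o JOIN src_customers c ON o.cust_id = c.cust_id
    ORDER BY o.order_id
""").fetchall()

# Lookup analogy: for each source row, check whether it already exists
# in the target, using a correlated subquery.
status = conn.execute("""
    SELECT s.cust_id,
           CASE WHEN EXISTS (SELECT 1 FROM tgt_customers t
                             WHERE t.cust_id = s.cust_id)
                THEN 'update' ELSE 'insert' END
    FROM src_customers s
    ORDER BY s.cust_id
""").fetchall()

print(joined)  # [(10, 'Ann', 50.0), (11, 'Bob', 20.0), (12, 'Ann', 5.0)]
print(status)  # [(1, 'update'), (2, 'insert'), (3, 'insert')]
```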
These are some differences between manual and ETL development.
ETL
Manual
Specify the full path of the shell script in the post-session command properties of the session/workflow.
Parameter file defines the value for parameter and variable used in a workflow, worklet or session.
Deleting data from a data warehouse is known as data purging. Usually junk data, such as rows with null values or blank spaces, is cleaned up; data purging is the process of removing these junk values.
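A minimal sketch of purging junk rows with sqlite3, assuming (hypothetically) that a row counts as junk when its `customer_name` is NULL, empty, or only spaces. Real purge rules come from the business requirements.

```python
import sqlite3

# Hypothetical warehouse table containing some junk rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_customers (id INTEGER, customer_name TEXT)")
conn.executemany(
    "INSERT INTO dw_customers VALUES (?, ?)",
    [(1, "Ann"), (2, None), (3, "   "), (4, ""), (5, "Bob")],
)

# Delete rows whose name is NULL, empty, or made up only of spaces
# (TRIM removes spaces by default, so '   ' becomes '').
cur = conn.execute(
    "DELETE FROM dw_customers WHERE customer_name IS NULL OR TRIM(customer_name) = ''"
)
conn.commit()

remaining = conn.execute("SELECT id FROM dw_customers ORDER BY id").fetchall()
print(cur.rowcount)  # 3
print(remaining)     # [(1,), (5,)]
```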
Full Load: completely erasing the contents of one or more tables and reloading with fresh data.
Incremental Load: applying ongoing changes to one or more tables based on a predefined schedule.
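A sketch of the two strategies with sqlite3. The `dim_product` table is hypothetical, and the incremental load uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24 or later) as one possible way to apply changes; scheduling and change detection are left out.

```python
import sqlite3

# Hypothetical target table keyed by SKU.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_product (sku TEXT PRIMARY KEY, price REAL)")

def full_load(conn, rows):
    """Erase the table completely, then reload it with fresh data."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM dim_product")
        conn.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)

def incremental_load(conn, changes):
    """Apply only new or changed rows; rows not in `changes` are kept."""
    with conn:
        conn.executemany(
            "INSERT INTO dim_product (sku, price) VALUES (?, ?) "
            "ON CONFLICT(sku) DO UPDATE SET price = excluded.price",
            changes,
        )

full_load(conn, [("A", 1.0), ("B", 2.0)])
incremental_load(conn, [("B", 2.5), ("C", 3.0)])
print(conn.execute("SELECT * FROM dim_product ORDER BY sku").fetchall())
# [('A', 1.0), ('B', 2.5), ('C', 3.0)]
```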
Yes, you can use the Advanced External Procedure transformation. You can write it in C++ on UNIX, and in C++, VB, or VC++ on Windows servers.
At the time of writing, the latest version was 7.2.
Here are some popular versions of Informatica.
By using an Aggregator transformation with the FIRST and LAST functions, we can get the first and last record.
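The Python sketch below only mimics the result of grouping rows and keeping the first and last one per group; it is not Informatica code. The data and the group-by key are hypothetical, and "first"/"last" are assumed to mean arrival order.

```python
def first_and_last(rows, key):
    """Return {group: (first_row, last_row)} in arrival order, like an
    aggregator grouped by `key` producing FIRST/LAST-style output."""
    result = {}  # dicts keep insertion order, so groups stay in arrival order
    for row in rows:
        k = row[key]
        if k not in result:
            result[k] = (row, row)           # first row seen is also last so far
        else:
            result[k] = (result[k][0], row)  # keep first, replace last
    return result

# Hypothetical input rows.
rows = [
    {"dept": "HR", "emp": "Ann"},
    {"dept": "IT", "emp": "Bob"},
    {"dept": "HR", "emp": "Cy"},
    {"dept": "HR", "emp": "Dee"},
]
for dept, (first, last) in first_and_last(rows, "dept").items():
    print(dept, first["emp"], last["emp"])
# HR Ann Dee
# IT Bob Bob
```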
Data cubes are commonly used for easy interpretation of data. A cube represents business measures along several dimensions, and each dimension of the cube represents an attribute of the database, e.g., profit per day, month, or year.
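A tiny in-memory cube, sketched in plain Python: every combination of the hypothetical dimensions `year`, `month`, and `region` (with "ALL" standing for a rolled-up dimension) maps to the summed `profit` measure. Real OLAP engines store cubes very differently; this only shows the idea.

```python
from itertools import product

# Hypothetical fact rows.
facts = [
    {"year": 2023, "month": 1, "region": "East", "profit": 100},
    {"year": 2023, "month": 1, "region": "West", "profit": 40},
    {"year": 2023, "month": 2, "region": "East", "profit": 60},
    {"year": 2024, "month": 1, "region": "West", "profit": 80},
]
dims = ("year", "month", "region")

def build_cube(facts, dims):
    """Sum the measure for every cell, including 'ALL' roll-ups of each dimension."""
    cube = {}
    for f in facts:
        # Each fact contributes to 2**len(dims) cells: its own value or 'ALL' per dimension.
        for cell in product(*[(f[d], "ALL") for d in dims]):
            cube[cell] = cube.get(cell, 0) + f["profit"]
    return cube

cube = build_cube(facts, dims)
print(cube[(2023, "ALL", "ALL")])    # profit for 2023: 200
print(cube[(2023, 1, "ALL")])        # profit for Jan 2023: 140
print(cube[("ALL", "ALL", "West")])  # profit for West, all periods: 120
```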
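Slowly changing dimensions (SCDs) are dimensions whose data changes slowly and occasionally over time.
E.g., a city or employee dimension, where an employee's city may change now and then.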
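One common way to handle a slowly changing attribute is SCD Type 2: instead of overwriting the old value, close the current row and insert a new versioned row. The sketch applies this to an employee's city in plain Python; the field names (`valid_from`, `valid_to`, `current`) are hypothetical, and Type 2 is only one of several SCD types.

```python
from datetime import date

def apply_scd2(dim_rows, emp_id, new_city, change_date):
    """Close the employee's current row and add a new current row if the city changed."""
    for row in dim_rows:
        if row["emp_id"] == emp_id and row["current"]:
            if row["city"] == new_city:
                return dim_rows            # nothing changed, keep history as is
            row["valid_to"] = change_date  # expire the old version
            row["current"] = False
            break
    dim_rows.append({"emp_id": emp_id, "city": new_city,
                     "valid_from": change_date, "valid_to": None, "current": True})
    return dim_rows

# Hypothetical employee dimension with one current row.
employees = [{"emp_id": 7, "city": "Pune", "valid_from": date(2020, 1, 1),
              "valid_to": None, "current": True}]
apply_scd2(employees, 7, "Delhi", date(2023, 6, 1))
for row in employees:
    print(row["city"], row["valid_from"], row["valid_to"], row["current"])
# Pune 2020-01-01 2023-06-01 False
# Delhi 2023-06-01 None True
```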
Snapshots are read-only copies of a master table located on a remote node, periodically refreshed to reflect changes made to the master table. Snapshots are mirrors or replicas of tables.
Views are built using the columns of one or more tables. A view with a single base table can generally be updated (INSERT, UPDATE, DELETE), whereas a view based on columns from more than one table generally cannot; some databases, such as Oracle, allow DML on join views only under restrictions.
Materialized view
A pre-computed table comprising aggregated or joined data from fact and possibly dimension tables. Also known as a summary or aggregate table.
A data warehouse can be thought of as a three-tier system in which a middle system provides usable data in a secure way to end users. On either side of this middle system are the end users and the back-end data stores.
A set of similar cubes built by Transformer (the Cognos cube-building tool) is known as a cube group. Cube groups are generally used to create smaller cubes, each based on the data at one level of a dimension.
The main difference between Informatica 7 and 8 is that 8 has a service-oriented architecture (SOA), whereas 7 does not. In Informatica 8, SOA is handled through services and grids configured on the server.
The best approach is to use the Debugger, which lets you monitor each step of a mapping and see how data is loaded, based on the breakpoint conditions you set.
An active data warehouse represents a single state of the business. It considers the analytic perspectives of customers and suppliers, and it helps deliver up-to-date data through reports.
Additive: A measure that can participate in arithmetic calculations across all dimensions.
Ex: Sales profit
Semi-additive: A measure that can participate in arithmetic calculations across some dimensions but not others.
Ex: Account balance (can be summed across accounts, but not across time)
Non-additive: A measure that cannot participate in arithmetic calculations across any dimension.
Ex: Temperature
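A small worked example in plain Python with hypothetical data: profit can be summed along every dimension, a balance can be summed across accounts but not across months (along time, take the latest balance instead), and a temperature should be averaged rather than summed.

```python
# Hypothetical monthly snapshot rows, in chronological order:
# (month, account, sales_profit, balance, temperature)
rows = [
    ("Jan", "A", 10, 100, 20.0),
    ("Jan", "B", 5, 50, 22.0),
    ("Feb", "A", 12, 120, 18.0),
    ("Feb", "B", 3, 40, 24.0),
]

# Additive: profit can be summed across months and accounts alike.
total_profit = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts within one month...
feb_total_balance = sum(r[3] for r in rows if r[0] == "Feb")
# ...but adding account A's January and February balances is meaningless,
# so along time take the latest (closing) balance instead.
a_closing_balance = [r[3] for r in rows if r[1] == "A"][-1]

# Non-additive: temperature is averaged (or min/max), never summed.
avg_temperature = sum(r[4] for r in rows) / len(rows)

print(total_profit, feb_total_balance, a_closing_balance, avg_temperature)
# 30 160 120 21.0
```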
You cannot lookup from a source qualifier directly. However, you can override the SQL in the source qualifier to join with the lookup table to perform the lookup.
Data staging is actually a collection of processes used to prepare source system data for loading a data warehouse. Staging includes the following steps:
Popular Tools:
An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data cleansing along the way.
Ex: Informatica, DataStage, etc.
OLAP is meant for reporting. In OLAP, data is available in a multidimensional model, so you can write simple queries to extract data from the database.
Ex: Business Objects, Cognos, etc.
While the selection of a database and a hardware platform is a must, the selection of an ETL tool is highly recommended, but it's not a must. When you evaluate ETL tools, it pays to look for the following characteristics:
Informatica metadata is data about data, which is stored in the Informatica repositories.
Data mining can be used in a variety of fields and industries, such as marketing of products and services, AI, and government intelligence.
The US FBI uses data mining to screen for security threats and to identify illegal and incriminating information distributed over the internet.
ETL Tool:
It is used to extract (E) data from multiple source systems (like RDBMS, flat files, mainframes, SAP, XML, etc.), transform (T) it based on business requirements, and load (L) it into target locations (like tables, files, etc.).
Need of ETL Tool:
An ETL tool is typically required when data is scattered across different systems (like RDBMS, flat files, mainframes, SAP, XML, etc.).
If you need only one return value, you can use an unconnected lookup; an unconnected lookup cannot have more than one return port. If you need more than one return value, use a connected lookup.
The ANALYZE statement allows you to validate and compute statistics for an index, table, or cluster. These statistics are used by the cost-based optimizer when it calculates the most efficient plan for retrieval. In addition to its role in statement optimization, ANALYZE also helps in validating object structures and in managing space in your system. You can choose from the following operations: COMPUTE, ESTIMATE, and DELETE. Early versions of Oracle7 produced unpredictable results when the ESTIMATE operation was used, so it is best to compute your statistics.
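Example (counts analyzed and non-analyzed tables per owner; NUM_ROWS is NULL for a table whose statistics have never been gathered):
select OWNER,
       -- NUM_ROWS is populated only after statistics are gathered;
       -- testing for NULL avoids misclassifying a table that really has 9999 rows
       sum(case when NUM_ROWS is not null then 1 else 0 end) analyzed,
       sum(case when NUM_ROWS is null then 1 else 0 end) not_analyzed,
       count(TABLE_NAME) total
from dba_tables
where OWNER not in ('SYS', 'SYSTEM')
group by OWNER;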
Key areas of activity in which favorable results are necessary for a company to obtain its goal.
There are four basic types of CSFs which are:
Connected lookup:
A connected lookup receives input from the pipeline and sends output to the pipeline. It can return any number of values and does not use a return port.
Unconnected lookup:
An unconnected lookup can return only one column, which it designates as its return port.
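As a rough Python analogy (not Informatica code): a connected lookup sits in the data flow and can add several columns to each row, while an unconnected lookup behaves like a function called from an expression (in Informatica, via `:LKP`) that hands back a single return value. The `customers` data and the function names are hypothetical.

```python
# Hypothetical lookup source: cust_id -> columns
customers = {1: {"name": "Ann", "city": "Pune"}, 2: {"name": "Bob", "city": "Delhi"}}

def connected_lookup(rows):
    """In the pipeline: every row passes through and can pick up many columns."""
    for row in rows:
        match = customers.get(row["cust_id"], {})
        yield {**row, "name": match.get("name"), "city": match.get("city")}

def unconnected_lookup_city(cust_id):
    """Called on demand from an expression; returns exactly one value."""
    match = customers.get(cust_id)
    return match["city"] if match else None

orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 3}]
enriched = list(connected_lookup(orders))
print(enriched)
print(unconnected_lookup_city(1))  # Pune
```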
ETL stands for extraction, transformation, and loading.
ETL tools provide developers with an interface for designing source-to-target mappings, transformations, and job control parameters.
Extraction :
Take data from an external source and move it to the warehouse pre-processor database.
Transformation:
The transform-data task allows point-to-point generating, modifying, and transforming of data.
Loading:
Load data task adds records to a database table in a warehouse.
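The three steps can be sketched end to end with only the Python standard library: extract from a CSV source, transform the rows, and load them into a SQLite table. The CSV content, column names, and transformation rules (trimming and capitalizing names, converting amounts to integer cents, skipping rows without an amount) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data.
SOURCE_CSV = """id,name,amount
1, ann ,10.50
2,Bob,
3,cy,7.25
"""

def extract(text):
    """Extract: read raw records from the source (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: clean names, convert amounts, drop rows missing an amount."""
    out = []
    for r in records:
        if not r["amount"].strip():
            continue
        out.append((int(r["id"]), r["name"].strip().title(),
                    round(float(r["amount"]) * 100)))
    return out

def load(conn, rows):
    """Load: insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(id INTEGER PRIMARY KEY, name TEXT, amount_cents INTEGER)")
    with conn:
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM fact_sales ORDER BY id").fetchall())
# [(1, 'Ann', 1050), (3, 'Cy', 725)]
```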
A bus schema identifies the common dimensions across business processes, i.e., the conformed dimensions. It consists of conformed dimensions and standardized definitions of facts.
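A bus matrix can be sketched as a mapping from business processes (fact tables) to the dimensions they use; dimensions shared by more than one process are the candidates for conformed dimensions. The process and dimension names below are hypothetical.

```python
from collections import Counter

# Hypothetical bus matrix: business process -> dimensions it uses.
bus_matrix = {
    "sales":     {"date", "product", "customer", "store"},
    "inventory": {"date", "product", "store"},
    "shipments": {"date", "product", "customer", "shipper"},
}

def conformed_candidates(matrix):
    """Dimensions used by two or more business processes, most shared first."""
    counts = Counter(d for dims in matrix.values() for d in dims)
    shared = [d for d, n in counts.items() if n > 1]
    return sorted(shared, key=lambda d: (-counts[d], d))

print(conformed_candidates(bus_matrix))
# ['date', 'product', 'customer', 'store']
```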
ETL is the extraction, transformation, and loading process: you extract data from the source, apply the business rules to it, and then load it into the target. The steps are:
A virtual data warehouse provides a collective view of the completed data. It can be considered a logical data model of the metadata it contains.
Conformed facts are facts that have the same name and definition in separate fact tables, so they can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts and have a static structure. Any dimension table that is used by multiple fact tables can be a conformed dimension.
Granularity is the level of detail at which the fact table describes the data. For example, for time analysis the granularity may be daily, monthly, or yearly.
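A short sketch of changing grain in plain Python: hypothetical daily sales facts are rolled up to a monthly grain. Rolling up is always possible, but monthly facts cannot be broken back down into days, which is one reason designers often keep the finest grain that is practical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at daily grain: (sale_date, amount)
daily = [
    (date(2024, 1, 5), 100),
    (date(2024, 1, 20), 50),
    (date(2024, 2, 3), 70),
]

def to_monthly(rows):
    """Roll daily facts up to (year, month) grain by summing the amounts."""
    monthly = defaultdict(int)
    for d, amount in rows:
        monthly[(d.year, d.month)] += amount
    return dict(monthly)

print(to_monthly(daily))  # {(2024, 1): 150, (2024, 2): 70}
```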