Adapters are additional Java-based programs that can be installed on the Job Server to provide connectivity to other systems such as Salesforce.com or a Java messaging queue (JMS). There is also a Software Development Kit (SDK) that allows customers to create adapters for custom applications.
A data flow is used to extract, transform, and load data from a source to a target system. All transformation, loading, and formatting occurs in a data flow.
Data Services Management Console
Discrete, multiline, and hybrid.
A parameter is an expression that passes a piece of information to a work flow, data flow, or custom function when it is called in a job. A variable is a symbolic placeholder for values.
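As a loose analogy in Python (not Data Services script syntax, and all names below are invented for illustration): the parameter carries a value into the flow at call time, while the variable is a placeholder assigned and updated inside the flow.

```python
def load_dataflow(p_load_date):
    """Analogy for a flow: p_load_date plays the role of a parameter
    passed in at call time ($P_LoadDate-style)."""
    v_row_count = 0              # variable: a placeholder set inside the flow
    for _ in range(3):           # pretend we load three rows
        v_row_count += 1
    return f"{p_load_date}: {v_row_count} rows"

print(load_dataflow("2024-01-31"))  # 2024-01-31: 3 rows
```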
You can use a Microsoft Excel workbook as a data source by using file formats in Data Services. The Excel workbook should be available on a Windows or UNIX file system.
Data Services also allows you to create a database datastore using Memory as the database type. Memory datastores are designed to enhance the processing performance of data flows executing in real-time jobs.
DS Management Console → Job Execution History
Data Services includes the following standard components:
Most of the objects stored in a repository can be reused. When a reusable object is defined and saved in the local repository, you can reuse it by creating calls to the definition. Each reusable object has only one definition, and all calls to that object refer to that definition. If the definition of an object is changed in one place, you change the object definition everywhere that object appears.
An object library is used to contain object definitions; when an object is dragged and dropped from the library, a new reference to the existing object is created.
Single Use Objects:
Objects that are defined specifically for a single job or data flow are called single-use objects, for example a specific transformation used in one data load.
SCDs (Slowly Changing Dimensions) are dimensions whose data changes over time.
No, File format is not a datastore type.
Name match standards illustrate the multiple ways a name can be represented. They are used in the match process to greatly increase match results.
Array fetch size indicates the number of rows retrieved in a single request to a source database. The default value is 1000. Higher numbers reduce requests, lowering network traffic, and may improve performance. The maximum value is 5000.
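The effect of the fetch size is simple arithmetic; the sketch below (plain Python, not a Data Services API) shows how raising it shrinks the number of round trips for a fixed row count.

```python
import math

def round_trips(total_rows: int, fetch_size: int) -> int:
    """Number of requests needed to pull total_rows at fetch_size rows per request."""
    return math.ceil(total_rows / fetch_size)

# Pulling 1 million rows: the default fetch size needs 1000 round trips,
# while the maximum fetch size cuts that to 200.
print(round_trips(1_000_000, 1000))  # 1000
print(round_trips(1_000_000, 5000))  # 200
```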
Some database vendors provide only a one-way communication path from one database to another; these paths are known as database links. In SQL Server, a linked server allows a one-way communication path from one database to another.
Consider a local database server named "Product" that stores a database link to access information on a remote database server called "Customer". Users connected to the remote server Customer cannot use that same link to access data on the server Product; users connected to "Customer" need a separate link in their server's data dictionary to access data on the Product database server.
This communication path between two databases is called a database link, and datastores created between these linked database relationships are known as linked datastores.
It is possible to connect a datastore to another datastore and to import an external database link as an option of the datastore.
It is a developer tool used to create objects consisting of data mappings, transformations, and logic. It is GUI based and works as the designer for Data Services.
You can create various objects using Data Services Designer like Project, Jobs, Work Flow, Data Flow, mapping, transformations, etc.
In Data Services, you can create a template table to move to the target system that has the same structure and data types as the source table.
One Input: The embedded data flow is added at the end of a data flow.
One Output: The embedded data flow is added at the beginning of a data flow.
No Input or Output: An existing data flow is replicated.
Incorrect syntax, Job Server not running, port numbers for Designer and Job Server not matching.
Real-time jobs “extract” data from the body of the real time message received and from any secondary sources used in the job.
Yes, a staging area is required during an ETL load.
There are various reasons why a staging area is required:
Source systems are often available only for a specific period of time for data extraction, and this window may be shorter than the total data-load time, so a staging area allows you to extract the data from the source system and keep it before the time slot ends.
A staging area is required when you want to bring data from multiple data sources together, i.e., to join two or more systems. For example, you cannot run a single SQL query joining two tables from two physically different databases.
Data extraction time slots for different systems vary according to time zones and operational hours.
Data extracted from source systems can be used in multiple data warehouse systems, operational data stores, etc.
During ETL you can perform complex transformations, which require an extra area to store the intermediate data.
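To make the cross-source point concrete, here is a minimal Python sketch (all system, table, and column names invented) of landing rows from two separate source systems in one staging structure and joining them locally, which a single SQL query across two physically different databases could not do.

```python
# Rows landed in a staging area from two hypothetical source systems.
crm_rows = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Beta"}]
erp_rows = [{"cust_id": 1, "balance": 250.0}, {"cust_id": 2, "balance": 90.5}]

staging = {row["cust_id"]: dict(row) for row in crm_rows}  # land system A
for row in erp_rows:                                       # merge system B on the key
    staging.setdefault(row["cust_id"], {}).update(row)

joined = sorted(staging.values(), key=lambda r: r["cust_id"])
print(joined[0])  # {'cust_id': 1, 'name': 'Acme', 'balance': 250.0}
```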
SAP BO Data Services is an ETL tool used for data integration, data quality, data profiling, and data processing. It allows you to integrate and transform trusted data and load it into a data warehouse system for analytical reporting.
BO Data Services consists of a UI development interface, a metadata repository, data connectivity to source and target systems, and a management console for scheduling jobs.
You can also add conditionals to a work flow. This allows you to implement If/Then/Else logic in your work flows.
This is the most common transformation used in Data Services; you can use it to perform the following functions:
If you upgrade the version of SAP Data Services, you also need to upgrade the version of the repository.
The following points should be considered when migrating a central repository to an upgraded version:
Take a backup of all tables and objects in the central repository.
To maintain versions of objects in Data Services, maintain a central repository for each version. Create a new central repository with the new version of the Data Services software and copy all objects to it.
It is always recommended that when you install a new version of Data Services, you upgrade your central repository to the new version of objects.
Also upgrade your local repository to the same version, as different versions of the central and local repositories may not work together.
Before migrating the central repository, check in all the objects. Because you do not upgrade the central and local repositories simultaneously, all objects need to be checked in first: once the central repository is upgraded to the new version, you will not be able to check in objects from a local repository running the older version of Data Services.
Use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform. The transform allows you to split a data set into smaller sets based on logical branches.
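The routing behavior can be sketched in a few lines of Python (this mimics the idea of the transform, not its actual interface): each row goes to the first branch whose condition it satisfies, or to a default branch.

```python
def case_split(rows, branches, default="default"):
    """Route each row to exactly one output set, chosen by the first
    branch condition it satisfies; unmatched rows go to the default set."""
    out = {label: [] for label, _ in branches}
    out[default] = []
    for row in rows:
        for label, cond in branches:
            if cond(row):
                out[label].append(row)
                break
        else:
            out[default].append(row)
    return out

# Invented sample data: split orders by region.
orders = [{"region": "EMEA", "amt": 10}, {"region": "APAC", "amt": 20},
          {"region": "EMEA", "amt": 5}]
split = case_split(orders, [("emea", lambda r: r["region"] == "EMEA"),
                            ("apac", lambda r: r["region"] == "APAC")])
print(len(split["emea"]), len(split["apac"]))  # 2 1
```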
A value that is constant in one environment but may change when a job is migrated to another environment.
An Embedded Dataflow is a dataflow that is called from inside another dataflow.
You can create a datastore using Memory as the database type. Memory datastores are used to improve the performance of data flows in real-time jobs, since they store data in memory to facilitate quick access and do not require going back to the original data source.
A memory datastore is used to store memory table schemas in the repository. Memory tables get their data from tables in a relational database or from hierarchical data files such as XML messages and IDocs.
Memory tables remain alive only while the job executes, and data in memory tables cannot be shared between different real-time jobs.
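A rough analogy in Python (not the Data Services API; names are invented): the memory table behaves like an in-process cache that is loaded once per job run, serves repeated lookups without returning to the source, and disappears when the job ends.

```python
def run_job(source_rows, keys_to_resolve):
    """Simulate one job run: load a 'memory table' once, resolve keys
    against it, and let it vanish when the function returns."""
    memory_table = {row["id"]: row["name"] for row in source_rows}  # loaded once
    return [memory_table.get(k, "UNKNOWN") for k in keys_to_resolve]

names = run_job([{"id": 1, "name": "bolt"}, {"id": 2, "name": "nut"}], [2, 1, 3])
print(names)  # ['nut', 'bolt', 'UNKNOWN']
```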
The Merge transform.
Transforms are used to manipulate data sets, taking them as inputs and creating one or multiple outputs. There are various transforms that can be used in Data Services.
Job, Workflow, Dataflow.
A script is a single-use object that is used to call functions and assign values in a workflow.
A repository is used to store the metadata of objects used in BO Data Services. Each repository should be registered in the Central Management Console (CMC) and is linked with one or more job servers, which are responsible for executing the jobs that you create.
There are three types of Repositories:
Local Repository: It is used to store the metadata of all objects created in Data Services Designer, such as projects, jobs, data flows, work flows, etc.
Central Repository: It is used to control version management of objects and is used for multi-user development. The central repository stores all versions of an application object, so it allows you to move back to previous versions.
Profiler Repository: It is used to manage all the metadata related to profiler tasks performed in SAP BODS Designer. The CMS repository stores the metadata of all tasks performed in the CMC on the BI platform. The Information Steward repository stores all the metadata of profiling tasks and objects created in Information Steward.
A file format is a set of properties describing the structure of a flat file (ASCII). File formats describe the metadata structure.
File format objects can describe files in:
Delimited format: Characters such as commas or tabs separate each field.
Fixed width format: The column width is specified by the user.
SAP ERP and R/3 format.
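The difference between the first two flat-file layouts can be shown with a small Python sketch; the field widths (4, 10, 6) and sample values are invented for the example.

```python
def parse_delimited(line, sep=","):
    """Split a delimited record on the separator character."""
    return line.split(sep)

def parse_fixed(line, widths):
    """Slice a fixed-width record into fields of the given column widths."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields

# The same three fields in both layouts.
delimited = "1001,HAMMER,19.99"
fixed = "1001" + "HAMMER".ljust(10) + "19.99".rjust(6)

print(parse_delimited(delimited))      # ['1001', 'HAMMER', '19.99']
print(parse_fixed(fixed, [4, 10, 6]))  # ['1001', 'HAMMER', '19.99']
```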
An embedded data flow is a data flow that is called from another data flow in the design. An embedded data flow can contain multiple sources and targets, but only one input or one output passes data to the main data flow.
Directories provide information on addresses from postal authorities. Dictionary files are used to identify, parse, and standardize data such as names, titles, and firm data.
The Data Services repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules.
There are 3 types of repositories.
BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location.
All lookup functions return one row for each row in the source. They differ in how they choose which of several matching rows to return.
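The "one row per source row" behavior can be illustrated with a Python sketch (this is not the `lookup_ext` signature; the data and the tie-breaking rule are invented). Here, among several matching lookup rows, the one with the highest `effective_date` wins, mimicking a MAX-style return policy.

```python
def lookup_one(source_rows, lookup_table, key, ret, order_by):
    """For each source row, return exactly one looked-up value: the ret
    column of the matching lookup row with the highest order_by value,
    or None when there is no match."""
    result = []
    for row in source_rows:
        matches = [l for l in lookup_table if l[key] == row[key]]
        if matches:
            best = max(matches, key=lambda l: l[order_by])
            result.append(best[ret])
        else:
            result.append(None)
    return result

# Invented lookup table: SKU "A" has two candidate rows.
prices = [{"sku": "A", "price": 5, "effective_date": "2023-01-01"},
          {"sku": "A", "price": 7, "effective_date": "2024-01-01"},
          {"sku": "B", "price": 3, "effective_date": "2023-06-01"}]
src = [{"sku": "A"}, {"sku": "B"}, {"sku": "C"}]
print(lookup_one(src, prices, "sku", "price", "effective_date"))  # [7, 3, None]
```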