No. SAP Data Hub goes beyond classical batch ETL or real-time streaming. It modernizes these functions and focuses on the integration of new technologies, operating in distributed landscapes (e.g. Hadoop clusters or public cloud storage). The main paradigm is to bring the logic to where the data resides and to leverage the cluster's compute power. Hence, it adds processing and integration on top.
SAP Data Hub is already generally available, as of September 1, 2017.
SAP Data Hub is a data sharing, pipelining, and orchestration solution that helps companies accelerate and expand the flow of data across their modern, diverse data landscapes.
SAP Data Hub provides visibility and access to a broad range of data systems and assets; allows the easy and fast creation of powerful, organization-spanning data pipelines; and optimizes data pipeline execution speed with a “push-down” distributed processing approach at each step.
SAP Data Hub meets the governance and security needs of the enterprise, ensuring that appropriate policy measures are in place to meet regulatory and corporate requirements.
SAP Leonardo is a digital innovation system that enables customers to rapidly innovate and then rapidly scale that innovation to redefine their business for the digital world. SAP's Big Data solutions (SAP Data Hub, SAP Vora, and SAP Cloud Platform Big Data Services) are relevant to the Leonardo offering because they are key to scale and innovation. As such, they are offered in the Leonardo Big Data packages.
SAP Data Hub resonates with the core themes of Leonardo, because:
SAP Data Hub helps drive the value of analytics by optimizing the data pipeline with speed and security, enabling organizations to act on the right information in the moment. SAP is the only vendor in the market that can offer an end-to-end software portfolio across Data, Analytics, and Business Applications. SAP Analytics Cloud, a cloud-based solution for all analytics (built on SAP Cloud Platform), will take advantage of the data orchestration capabilities of SAP Data Hub, allowing organizations to enhance analytical use cases through the ability to control, manage, and optimize their data environments.
SAP Vora capabilities are included in SAP Data Hub. However, SAP Data Hub and SAP Vora are designed to address different use cases, based on customers' specific needs.
SAP Data Hub simplifies the orchestration of complex data processes while providing governance across modern and diverse landscapes including big data stores, enterprise data stores, enterprise applications and cloud solutions.
SAP Vora is an enterprise-ready, easy-to-use, in-memory distributed computing engine that helps organizations uncover actionable insights from Big Data, typically stored in Hadoop and NoSQL solutions. It is positioned both for data scientists and as part of a multi-tier data strategy with Hadoop.
SAP Data Hub will leverage existing customer investments: it executes SAP HANA SDI/SDQ flowgraphs that run on SAP HANA boxes, and it leverages SAP Data Services jobs that run on existing Data Services job servers. It will not replace the existing use cases of these products.
SAP Data Hub is designed as a central place to orchestrate, monitor, and model integration flows, where SAP Data Services jobs, SAP HANA SDI and SDQ tasks, and Big Data flows can be brought together. These SAP EIM products will continue to be developed and offered separately from SAP Data Hub.
SAP Data Hub has some built-in profiling capabilities, but it can be complemented with SAP ADP as a self-service data preparation tool. For this use case, SAP ADP offers business users the capabilities to search and access their data sources, visually manipulate the data to make it ready for reporting, and publish it. It will interact closely with SAP Data Hub to bring this self-service to Big Data scenarios. In later releases, SAP ADP will leverage the metadata repository of SAP Data Hub.
There is more data, and more ways to store and use it, than ever before. While this data holds business opportunity, corporate data landscapes are growing increasingly complex. It is getting harder and costlier for organizations not only to understand the data they have, but also to work across all the different systems that need to use it and to apply end-to-end governance in order to capture the maximum value.
Key Pain Points:
For the initial release, SAP Data Hub will be offered as an on-premise application that can connect to and process data in cloud environments (e.g. Data Lakes in AWS). Its architecture is cloud-ready, and PaaS and SaaS versions will follow in future releases.
No. SAP Data Hub does not offer its own data storage. It is a platform to orchestrate and manage data between existing data stores, but it is not a data warehouse, data mart, or Data Lake on its own.
SAP Data Hub gets its name from the fact that it offers centralized governance and pipelining capabilities – a unified view and data management of the complex data landscape.
Part of the power of the solution resides in its ability to leave the data where it is. The data does not have to be mass centralized with SAP Data Hub. This provides advantages in terms of ease of management and speed of data pipeline execution. Customers leverage their existing data stores and existing processing capabilities.
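The idea of leaving the data where it is and pushing processing down to the store can be illustrated with a small, generic sketch. This is not SAP Data Hub code and does not use any SAP API: an in-memory SQLite database stands in for a remote data store, and the table `sales` and the functions `make_store`, `pull_then_aggregate`, and `push_down_aggregate` are hypothetical names chosen for illustration. It compares pulling every row to the client with letting the store compute the aggregate and return only the result.

```python
import sqlite3


def make_store():
    """Create an in-memory SQLite database standing in for a remote data store (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    rows = [("EMEA", 10), ("EMEA", 20), ("APJ", 5), ("AMER", 7), ("AMER", 3)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def pull_then_aggregate(conn):
    """Centralizing approach: move every row to the client, then aggregate there."""
    fetched = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in fetched:
        totals[region] = totals.get(region, 0) + amount
    return totals, len(fetched)


def push_down_aggregate(conn):
    """Push-down approach: the store computes the aggregate; only results move."""
    fetched = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(fetched), len(fetched)


store = make_store()
pulled_totals, pulled_rows = pull_then_aggregate(store)
pushed_totals, pushed_rows = push_down_aggregate(store)

# Both approaches agree on the totals; they differ in how many rows were moved.
print(pulled_totals == pushed_totals, pulled_rows, pushed_rows)
```

In this toy data set the pull approach moves five rows and the push-down approach moves three, one per region. With realistic volumes the gap grows with the size of the source table, which is the motivation for executing logic where the data resides.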