SAP Datasphere and Databricks: A Game-Changing Partnership
In March 2023, SAP announced SAP Datasphere and, with it, a new open data ecosystem built together with partners. One of those partners is Databricks, which delivers an open and unified Data Intelligence Platform for data analytics and AI use cases.
In this two-part blog series, I want to discuss the key benefits of this partnership, along with how to get data from SAP Datasphere to Databricks and vice versa.
‘Why’ Databricks?
SAP Datasphere and Databricks are leading-edge data platforms, each with its own distinctive features and functionality. SAP Datasphere focuses on managing business data effectively and efficiently, while Databricks shines with its best-in-class lakehouse architecture. Together, they help tackle several long-standing data challenges, such as:
High Data Latency
Data Loss
Rebuilding Business Context
Data Duplication
Increased Cost & Maintenance due to Additional Software / Hardware Licenses
‘What’ are Some Use Cases for Databricks Coupled with SAP Datasphere?
The partnership is still new, so many integration scenarios between SAP Datasphere and Databricks have yet to be fully tried and tested. However, let's review a few common use cases we are already seeing:
Harmonize SAP and Databricks Lakehouse Data with Zero Replication: Businesses can now bring their SAP and non-SAP data together in one central location (SAP Datasphere) without running fragile or time-consuming ETL processes and without replicating the data across systems. This also spares them from having to reinterpret SAP data that has been stripped of its business semantics.
Machine Learning, AI and Advanced Analytics: Businesses can now combine data from SAP and non-SAP sources (websites, IoT, streaming) in the Databricks lakehouse. Libraries such as FedML can then be used in Databricks to build, train and deploy machine learning models for predictive analytics, with insights ranging from customer satisfaction to supply chain optimization. These insights can then be fed back to SAP Datasphere, where they can be consolidated with the rest of the business data.
SAP Analytics Cloud: SAP Analytics Cloud (SAC) needs no introduction; businesses can use this front-end reporting tool to build stories on the consolidated insights through its live connection to SAP Datasphere (for more information on the integration between SAP Datasphere and SAC, check out my colleague's blog series Uniting SAP Datasphere with SAP Analytics Cloud for Data Harmony - Revolutionize Your Data Strategy).
‘How’ Can I Get Data from SAP Datasphere to Databricks?
Through some clever innovations, SAP and Databricks have made the integration process as straightforward as possible. To enable this connection, the SAP Data Provisioning Agent (DP Agent) must be set up and the CamelJDBC adapter configured. Once this is in place, FedML or JDBC can be used to make the connection from within Databricks.
This quick guide assumes that you have already created a view in SAP Datasphere which is “Exposed for Consumption”.
Step 1 (SAP Datasphere): To connect to the SAP HANA database underlying SAP Datasphere from Databricks, add the IP address or range of your Databricks cluster to the IP allowlist of your SAP Datasphere tenant. To do this, navigate to System >> IP Allowlist >> Trusted IPs >> Add.
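When deciding what to enter, keep in mind that the allowlist is matched against the address the Databricks traffic arrives from, which is the cluster's public egress IP rather than its private address. The minimal sketch below (Python standard library only) checks whether a given IP would be covered by a set of allowlist entries, assuming each entry is either a single IPv4 address or a CIDR range. The function name and the sample addresses are hypothetical placeholders, not real cluster IPs.

```python
import ipaddress
from typing import Iterable


def is_allowlisted(ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any allowlist entry.

    Assumption: entries are single IPv4 addresses ("203.0.113.7") or
    CIDR ranges ("203.0.113.0/24").
    """
    address = ipaddress.ip_address(ip)
    for entry in allowlist:
        # A bare address becomes a /32 network; strict=False tolerates
        # host bits being set in a range such as "203.0.113.7/24".
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False
```

For example, `is_allowlisted("203.0.113.42", ["198.51.100.0/24", "203.0.113.0/24"])` returns True, so a cluster egressing from that address would be covered by the second entry.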
Step 2 (SAP Datasphere): Set up a database user that Databricks can use to access the SAP HANA database. Database users are created at the space level, so to create a new user, navigate to Space Management >> Database Access >> Database Users >> Create.
Step 3 (SAP Datasphere): Complete the form with the details the connection needs (such as the user name suffix and whether read access is enabled) and select “Create”.
Step 4 (SAP Datasphere): Once the new user has been created, select the ‘information’ icon at the end of the new user's record. Note down the Database Username, Space Schema, Host Name, Port and Password for later use in Databricks.
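Rather than pasting these five values into a notebook in plain text, one option is to keep them in a Databricks secret scope and load them at runtime. The sketch below assumes a scope named `datasphere` with keys `host`, `port`, `user`, `schema` and `password`; these names, and the `DatasphereConnectionInfo` class, are hypothetical and chosen for illustration. In a Databricks notebook you would pass `dbutils.secrets.get` as the getter. Any callable that accepts `scope` and `key` keyword arguments works, which also makes the function easy to test outside Databricks.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class DatasphereConnectionInfo:
    """The values shown in the database user's information dialog (Step 4)."""
    host: str
    port: int
    user: str    # Datasphere database users are named <SPACE_ID>#<suffix>
    schema: str  # the space schema
    password: str = field(repr=False)  # keep the password out of printed output


def load_connection_info(get_secret: Callable[..., str],
                         scope: str = "datasphere") -> DatasphereConnectionInfo:
    """Read each connection value from a secret scope (hypothetical key names)."""
    return DatasphereConnectionInfo(
        host=get_secret(scope=scope, key="host"),
        port=int(get_secret(scope=scope, key="port")),
        user=get_secret(scope=scope, key="user"),
        schema=get_secret(scope=scope, key="schema"),
        password=get_secret(scope=scope, key="password"),
    )
```

In a notebook this would be called as `info = load_connection_info(dbutils.secrets.get)`. Because the password field is excluded from the generated `repr`, printing `info` does not reveal it.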
Step 5 (Databricks): Now let's move over to Databricks and configure the cluster for the connection. The only configuration needed on the Databricks side is installing the SAP HANA JDBC driver (ngdbc.jar), which can be downloaded from SAP Development Tools.
Step 6 (Databricks): Once the SAP HANA JDBC driver has been downloaded, install it on the Databricks cluster. In this example, the library was installed using DBFS as the source: drag and drop the downloaded .jar file into the designated area, wait for it to upload and select ‘Install’.
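If you prefer to script this step rather than use the UI, the Databricks Libraries API (`POST /api/2.0/libraries/install`) can install a JAR that has already been uploaded to DBFS. The following is a sketch using only the Python standard library; the workspace URL, cluster ID, personal access token and DBFS path (for example `dbfs:/FileStore/jars/ngdbc.jar`) are hypothetical placeholders you would replace with your own values.

```python
import json
import urllib.request


def build_install_payload(cluster_id: str, jar_path: str) -> dict:
    """Request body for the Libraries API install call with a single JAR library."""
    return {"cluster_id": cluster_id, "libraries": [{"jar": jar_path}]}


def install_jar(workspace_url: str, token: str, cluster_id: str, jar_path: str) -> None:
    """POST the install request; Databricks then installs the library asynchronously."""
    body = json.dumps(build_install_payload(cluster_id, jar_path)).encode("utf-8")
    request = urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/libraries/install",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        response.read()
```

Because installation is asynchronous, you can follow its progress on the cluster's Libraries tab before moving on to the first query.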
Step 7 (Databricks): Run a first query to confirm that the connection is configured correctly and that the data housed in SAP Datasphere is visible within Databricks (use the details collected in Step 4 as the connection properties). If a DataFrame is returned, congratulations: you have integrated SAP Datasphere with Databricks.
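A first query can be run from a Python notebook with Spark's built-in JDBC data source. The sketch below assumes the SAP HANA JDBC driver from Step 6 is installed on the cluster, and uses the driver class `com.sap.db.jdbc.Driver` with a `jdbc:sap://<host>:<port>/?encrypt=true&validateCertificate=true` URL; the function names are hypothetical. The schema and view names are double-quoted because SAP HANA treats quoted identifiers as case-sensitive, which matters for mixed-case view names.

```python
from typing import Any, Dict

# Fully qualified driver class shipped in the SAP HANA JDBC driver (ngdbc.jar).
HANA_DRIVER = "com.sap.db.jdbc.Driver"


def quote_identifier(name: str) -> str:
    """Double-quote a SAP HANA identifier, escaping embedded quotes by doubling them."""
    return '"' + name.replace('"', '""') + '"'


def datasphere_jdbc_options(host: str, port: int, user: str, password: str,
                            schema: str, view: str) -> Dict[str, str]:
    """Build Spark JDBC reader options for a view exposed for consumption."""
    return {
        # SAP HANA Cloud only accepts encrypted connections; also validate the certificate.
        "url": f"jdbc:sap://{host}:{port}/?encrypt=true&validateCertificate=true",
        "driver": HANA_DRIVER,
        "user": user,
        "password": password,
        # Quoted so that mixed-case technical names are matched exactly.
        "dbtable": f"{quote_identifier(schema)}.{quote_identifier(view)}",
    }


def read_datasphere_view(spark: Any, host: str, port: int, user: str,
                         password: str, schema: str, view: str) -> Any:
    """Load the view into a Spark DataFrame using Spark's JDBC data source."""
    options = datasphere_jdbc_options(host, port, user, password, schema, view)
    return spark.read.format("jdbc").options(**options).load()
```

In a notebook, calling `read_datasphere_view(spark, ...)` with the Host Name, Port, Database Username, Password, Space Schema and the view's technical name should return a DataFrame that you can inspect with `display(df.limit(10))`.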
SeaPark currently has a fully operational demo showing how this connection works, both from SAP Datasphere to Databricks and vice versa. The demo also shows how machine learning can be applied in Databricks to gain further insight into customer data (customer churn). For a sneak peek or help setting up your connection, contact william.hadnett@seaparkconsultancy.com.