Databricks announced the launch of a new open source project called Delta Sharing, an open protocol for securely sharing data across organizations in real time, completely independent of the platform on which the data resides.
Delta Sharing is part of the open source Delta Lake project and is supported by Databricks and a broad set of data providers, including Nasdaq, ICE, S&P, Precisely, FactSet, Foursquare, and SafeGraph, as well as software vendors such as Amazon Web Services (AWS), Google Cloud, and Tableau. It is the fifth major open source project launched by Databricks, following Apache Spark, Delta Lake, MLflow, and Koalas, and it is being donated to the Linux Foundation.
Data sharing has become critical to the digital economy as enterprises seek to exchange data easily and securely with their customers, partners, and suppliers, such as a retailer sharing timely inventory data with each of the brands it carries. However, data sharing solutions have historically been tied to a single vendor or commercial product, which tethers data access to proprietary systems and limits collaboration between organizations that use different platforms.
“The top challenge for data providers today is making their data easily and broadly consumable. Managing dozens of different data delivery solutions to reach all user platforms is untenable. An open, interoperable standard for real-time data sharing will dramatically improve the experience for both data providers and data users,” said Matei Zaharia, Chief Technologist and Co-Founder of Databricks. “Delta Sharing will standardize how data is securely exchanged between enterprises regardless of which storage or computing platform they use, and we are thrilled to make this innovation open source.”
Because Delta Sharing eliminates vendor lock-in, it enables a broader and more diverse set of use cases than was previously possible.
For example, an academic institution and a hospital system partnering on vaccine research would have a standard, easy way to securely share research data and collaborate on their findings, without being constrained by proprietary data formats or differing applications and tools, and without complex setup such as installing the same data warehouse software in both organizations. Similarly, an airplane engine manufacturer would have a standard way to access engine performance data from all the airlines it serves, even if each airline uses a different set of systems to store and manage that data.
Delta Sharing extends the applicability of the lakehouse architecture that organizations are rapidly adopting today, as it enables an open, simple, collaborative approach to data and AI within, and now between, organizations.
A new open standard for securely sharing data across organizations
Underpinned by Delta Lake 1.0 and benefiting from a vendor-neutral governance model supported by the Linux Foundation, Delta Sharing establishes a common standard for sharing all data types through an open protocol that can be used from SQL, visual analytics tools, and programming languages such as Python and R.
Delta Sharing also allows organizations to share existing large-scale datasets in the Apache Parquet and Delta Lake formats in real time without copying them, and the protocol can be implemented easily in existing software that already supports Parquet.
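To make the recipient side of the protocol more concrete, here is a minimal sketch, assuming the JSON "profile" format and REST paths described in the open Delta Sharing protocol specification: a profile holds a server `endpoint` and a `bearerToken`, a table is addressed as `<profile-path>#<share>.<schema>.<table>`, and a query is an authenticated `POST` to `/shares/{share}/schemas/{schema}/tables/{table}/query`. The helper names, the endpoint, the token, and the `sales.retail.inventory` table are all hypothetical placeholders, and the sketch only builds the request rather than sending it.

```python
import json
from urllib.parse import quote

# Hypothetical recipient profile. Field names follow the JSON profile format
# used by Delta Sharing clients; the endpoint and token are placeholders.
PROFILE_JSON = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>"
}
"""


def load_profile(text):
    """Return (endpoint, bearer_token) from a profile's JSON text."""
    profile = json.loads(text)
    # Drop any trailing slash so REST paths can be joined with a single "/".
    return profile["endpoint"].rstrip("/"), profile["bearerToken"]


def parse_table_url(table_url):
    """Split '<profile-path>#<share>.<schema>.<table>' into its four parts."""
    profile_path, sep, coords = table_url.rpartition("#")
    if not sep:
        raise ValueError("table URL must contain '#'")
    share, schema, table = coords.split(".")  # ValueError unless exactly 3 parts
    return profile_path, share, schema, table


def build_query_request(endpoint, token, share, schema, table):
    """Describe the HTTP request that asks the server to query one table.

    Nothing is sent over the network; the optional JSON request body
    (which can carry query hints) is omitted from this sketch.
    """
    s, sc, t = (quote(part, safe="") for part in (share, schema, table))
    url = f"{endpoint}/shares/{s}/schemas/{sc}/tables/{t}/query"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return "POST", url, headers


if __name__ == "__main__":
    endpoint, token = load_profile(PROFILE_JSON)
    _, share, schema, table = parse_table_url("config.share#sales.retail.inventory")
    method, url, headers = build_query_request(endpoint, token, share, schema, table)
    print(method, url)
```

In practice, the open source Python connector wraps these steps, for example `delta_sharing.load_as_pandas("config.share#sales.retail.inventory")`. The server answers a query with short-lived URLs to the underlying Parquet files, which is how recipients read shared data directly instead of receiving a copy.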
The introduction of Delta Sharing marks the latest advancement in Databricks’ pursuit of an open, democratized data and AI ecosystem. Recognizing that innovation flourishes through collaboration, not isolation, Delta Sharing builds on Databricks’ longtime commitment to the open source community and adds to a storied catalog of open source projects, including the widely adopted Delta Lake, Apache Spark, MLflow, and Koalas, which data teams around the world download more than 15 million times per month.
Vendor-neutral flexibility to consume, analyze and visualize shared data with tools of choice
Delta Sharing provides built-in security controls and easy-to-manage permissions that help ensure privacy and compliance requirements are met as data assets are shared across organizations.
Delta Sharing also allows organizations to confidently share data with suppliers and partners while giving each of those data teams the flexibility to query, visualize, and enrich the shared data with their tools of choice, including Azure Purview, Google BigQuery, AtScale, Collibra, Dremio, Immuta, Looker, Privacera, Qlik, Power BI, and Tableau.
“The ability to easily access, analyze and share data is crucial for fostering innovation and building truly data-driven organizations,” said Francois Ajenstat, Chief Product Officer at Tableau. “Establishing a new, open standard for data sharing aligns with Tableau’s mission of democratizing data and empowering anyone to make faster, smarter decisions. We look forward to supporting the future of Delta Sharing and helping our customers tap into the flexibility of an open, collaborative data ecosystem.”
Delta Sharing, an open protocol for sharing data securely across organizations, is supported by the Delta Lake open source project, Databricks, and commercial partners. Several of those partners commented on the launch:
“We support Delta Sharing and its vision of an open protocol that will simplify secure data sharing and collaboration across organizations. Delta Sharing will enhance the way we work with our partners, reduce operational costs and enable more users to access a comprehensive range of Nasdaq’s data suite to discover insights and develop financial strategies,” said Bill Dague, Head of Alternative Data, Nasdaq.
“Our investment in Azure Data Share reflects the vision we share with Databricks — that data sharing should be open. We see Delta Sharing as aligning well with that vision. We are pleased to be moving forward with Databricks in our shared goals of supporting an open data ecosystem,” said Mike Flasko, Partner Director, Program Management at Microsoft.
“Google Cloud and Databricks share a common vision of making data accessible, actionable, and open in order to help businesses make informed decisions in today’s rapidly-changing environment,” said Sudhir Hasbe, Director, Product Management at Google Cloud. “We’re delighted to deliver Databricks on Google Cloud, and to support data accessibility and portability through solutions such as BigQuery to ensure that organizations can share data safely, and discover new and unique insights.”