Research on novel data management architectures that overcome the drawbacks of conventional, monolithic data platforms.
Federated information systems connect multiple information sources and allow users to access data without ingesting it into a central system. In practice, however, most systems are either data warehouse technologies, which are used to query large amounts of “historical” data, or data lake technologies, which store data in its raw form (e.g., files). These approaches have serious drawbacks: they lead to centralized, monolithic data management platforms with neither clear domain boundaries nor clear ownership of domain data. Such systems are difficult to manage for large enterprises with multiple data sources and different consumers.
Decentralizing monolithic data platforms such as data lakes or data warehouses requires rethinking how data management and data ownership are organized. A promising concept in this regard is the data mesh. It constitutes a new strategy for organizing data infrastructures, namely as a mesh of distributed “data products”. Instead of relying on a centralized data lake infrastructure, a data mesh distributes the ownership of data. No single centralized team is responsible for the data; control over the data is distributed to different locations and to the business domains in which the data originates.
The challenge tackled in this project is to distribute an existing data mesh platform over multiple locations. More precisely, we intend to create a solution for distributing datasets (including their metadata, access policies, and governance functionalities) across multiple locations and organizations.
Goals and Methods
The main idea behind the data mesh concept is to apply product thinking to a data management architecture in order to overcome the drawbacks of conventional, monolithic data platforms. Together with our project partner Nexyo, we intend to develop a solution for a fully distributed data mesh platform. Such a platform allows federated governance and policy management (e.g., access control) of individual data products. The aims of our project are:
- Collecting stakeholder requirements with regard to distributed data platforms and answering questions such as how data domains need to be organized and how data governance is allocated to individual data owners.
- Developing an architecture model that details how the domains and the corresponding data products are distributed across multiple infrastructures and organizations.
- Conducting research on the building blocks of the architecture and tackling problems such as how to reach consensus on a single truth (e.g., the state of the data, updates to the metadata) across the distributed platform and how to reach agreement on governance decisions (e.g., access control policies).
- Developing a prototype that can be integrated into Nexyo's Data Hub solution.
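To make the third aim more concrete, the following minimal sketch shows one way several locations could agree on an update to a data product's access policy. It is an illustration under stated assumptions, not the project's design: it uses simple majority voting rather than a full consensus protocol, and all names (`DataProduct`, `Location`, `propose_policy_update`) are hypothetical and unrelated to Nexyo's Data Hub.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data product owned by one business domain (hypothetical model)."""
    name: str
    owner_domain: str
    metadata_version: int = 0
    allowed_domains: set[str] = field(default_factory=set)


@dataclass
class Location:
    """One site of the distributed platform, holding its own replica of the product."""
    name: str
    product: DataProduct

    def vote(self, proposed_version: int) -> bool:
        # Accept only an update that directly follows the locally known version.
        return proposed_version == self.product.metadata_version + 1


def propose_policy_update(locations: list[Location], new_allowed_domains: set[str]) -> bool:
    """Apply the new access policy everywhere only if a strict majority accepts it."""
    proposed_version = max(loc.product.metadata_version for loc in locations) + 1
    votes = sum(loc.vote(proposed_version) for loc in locations)
    if votes * 2 <= len(locations):
        return False  # no majority: every replica stays unchanged
    for loc in locations:
        loc.product.allowed_domains = set(new_allowed_domains)
        loc.product.metadata_version = proposed_version
    return True
```

In this sketch, a location whose replica lags behind votes against the proposal, so an update succeeds only when most locations already share the same metadata version. A real platform would additionally have to handle network failures, concurrent proposals, and catching up stale replicas, which are among the research questions listed above.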
The publications of the project raise awareness of research on the conceptualization and implementation of data mesh platforms and underscore its significance in areas dealing with data management, information systems, and data science. The company partner Nexyo will transform the project results into a marketable product. The whole project team supports the implementation of the project outcomes as well as the steps needed to refine and further develop the product. Moreover, as Nexyo has a large network of data-driven companies, the results obtained in this project can be disseminated internationally and contribute to establishing data mesh architectures as a new standard at the European level.