Draft:Polypheny

Polypheny is a multi-model database management system (DBMS) designed to manage heterogeneous data and mixed workloads. It is based on the concept of a PolyDBMS (polymorphic database management system), which manages diverse data through a unified platform. To process complex queries across large, heterogeneous datasets, the system uses various third-party database systems, each optimized for specific workloads, as execution engines. It accommodates structured, semi-structured, and unstructured data.

The system manages various types of data while preserving their semantics. Polypheny natively supports the relational, document, and labeled property graph data models. Polypheny also includes a browser-based user interface for managing and monitoring the system, managing data and schemas, and querying data. This interface also supports computational notebooks, providing a single interface for data integration and analysis tasks.

Polypheny supports several query languages, including SQL, openCypher, the Contextual Query Language, and the MongoDB Query Language. A distinctive feature of Polypheny is its ability to execute cross-model queries, which combine data within a single query, using one query language and interface, irrespective of the data's structure and model. This functionality simplifies data interaction and supports a unified approach to managing and querying diverse data sets within the system.

History
Polypheny started as a research project at the University of Basel in 2017. Initial funding for the project was provided by the Swiss National Science Foundation (SNF). The project set out to develop a database management system that could handle a wide range of data types, query languages, and workloads, and thereby address the varied and changing needs of data-driven applications.

A significant milestone for Polypheny was its selection as a mentoring organization for the Google Summer of Code (GSoC) in 2021, followed by further selections in 2022 and 2024. This involvement gave the project visibility, allowed it to engage with the global developer community, and facilitated collaborative development and knowledge exchange. The GSoC participation brought in fresh perspectives and contributed to the enhancement and refinement of Polypheny.

The project reached a milestone with its first public release in February 2022, transitioning from a research project to a publicly available database management system. This release was characterized by the introduction of several key features and functionalities, including support for multiple data models and query languages, and the ability to perform cross-model queries. The public release made the solutions developed during the research phase available to a wider audience and various applications.

The establishment of Polypheny GmbH followed in December 2022, adopting an open-core model to drive the project forward. This model is intended to ensure the continued development and support of the open-source (community) edition of Polypheny and offers additional functionalities through enterprise editions. The formation of Polypheny GmbH indicated a move towards commercializing the Polypheny DBMS, while also aiming to ensure its ongoing development and adaptation to the changing needs and challenges of data management and analytics across various applications and sectors.

Platform
Polypheny is developed using Java, enabling it to run on systems for which an appropriate Java Runtime Environment (JRE) is available. The available releases include a Java Virtual Machine (JVM) and other dependencies to facilitate installation and execution across various operating systems. Releases are available for Windows, macOS, and Linux. For Linux, support is limited to distributions using either RPM or DEB packages.

Schema Model
Polypheny's schema model accommodates multiple data models, including the relational, document, and Labeled Property Graph (LPG) models, with each model holding equal standing within the system. Unlike some multi-model database systems that convert data into a single (extended) data model, Polypheny maintains the intrinsic semantics of each data model by supporting them in their original forms. This approach is intended to give users the characteristic benefits and capabilities of each model. No model is prioritized or considered primary; they coexist, providing an environment for managing diverse data types and structures.

In Polypheny, namespaces organize data and define logical schemas. Each namespace is a logical container that is identified by a unique name and bound to a single data model, which keeps the data models separate within the logical schema. A schema is defined within a namespace, and the namespace's data model determines the rules and semantics of that schema. For instance, a namespace associated with the relational model adheres to that model's schema rules, while a namespace associated with the document model follows its schema-less structure.
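
The relationship between namespaces and data models can be illustrated with a minimal Python sketch. It is not Polypheny's implementation or API: the `DataModel`, `Namespace`, and `Catalog` names, and the rule that a relational entity needs a column list, are assumptions made only for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataModel(Enum):
    RELATIONAL = "relational"
    DOCUMENT = "document"
    GRAPH = "graph"


@dataclass
class Namespace:
    """Hypothetical logical container bound to exactly one data model."""
    name: str
    model: DataModel
    entities: dict = field(default_factory=dict)

    def add_entity(self, entity_name, columns=None):
        # Assumption of this sketch: relational entities need a fixed
        # column list, while other models accept schema-less entities.
        if self.model is DataModel.RELATIONAL and not columns:
            raise ValueError("a relational table needs a column definition")
        self.entities[entity_name] = list(columns) if columns else None


class Catalog:
    """Hypothetical catalog that enforces unique namespace names."""

    def __init__(self):
        self._namespaces = {}

    def create_namespace(self, name, model):
        if name in self._namespaces:
            raise ValueError(f"namespace {name!r} already exists")
        self._namespaces[name] = Namespace(name, model)
        return self._namespaces[name]


catalog = Catalog()
sales = catalog.create_namespace("sales", DataModel.RELATIONAL)
logs = catalog.create_namespace("logs", DataModel.DOCUMENT)
sales.add_entity("orders", columns=["id", "customer", "total"])
logs.add_entity("events")  # the document namespace needs no schema
```

In this sketch the namespace, not the individual entity, decides which schema rules apply, which mirrors the description above.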

Polypheny supports cross-model queries and automated mapping between different data models. This capability allows data, organized and structured according to different data models, to be combined in a single query using one query language and interface, without the need for manual data transformation or migration.
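
As a rough illustration of what a mapping between data models involves, the following Python sketch projects schema-less documents onto a fixed column list so that they can be joined with relational rows. The function names, the sample data, and the choice to map missing fields to `None` are assumptions of this sketch, not a description of how Polypheny performs the mapping.

```python
def documents_as_rows(documents, columns):
    """Project schema-less documents onto a fixed column list.

    Fields missing from a document become None (an assumption of this sketch).
    """
    return [tuple(doc.get(col) for col in columns) for doc in documents]


def rows_as_documents(rows, columns):
    """Inverse direction: turn relational rows into documents."""
    return [dict(zip(columns, row)) for row in rows]


# Relational data: (id, name)
customers = [(1, "Ada"), (2, "Grace")]

# Document data with varying fields
reviews = [
    {"customer_id": 1, "stars": 5},
    {"customer_id": 2, "stars": 4, "text": "good"},
    {"customer_id": 3},
]

# Map the documents to rows, then join them with the relational table.
review_rows = documents_as_rows(reviews, ["customer_id", "stars"])
names = dict(customers)
joined = [(names[cid], stars) for cid, stars in review_rows if cid in names]
```

A real system would perform such a mapping automatically as part of query processing; the sketch only makes the data transformation itself visible.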

Storage Model
Polypheny's storage model is designed to manage and store a variety of data, with a focus on performance and scalability. The model distinguishes between data stores and data sources, each having a specific role in data management, storage, and access within the system.

Data Stores
Data stores in Polypheny persistently store data and also execute queries, so they take part in data retrieval rather than serving only as storage. Polypheny can use several well-known, highly optimized, domain-specific database systems as data stores. This inherently distributed architecture also allows horizontal scaling, which can be advantageous when processing large amounts of data. The data stores can handle different data models and are optimized for efficient data storage and retrieval. Polypheny also allows data to be replicated and partitioned across multiple data stores.
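
The general idea of partitioning and replicating data across several stores can be sketched in a few lines of Python. The hash-based placement below is a generic textbook scheme chosen for illustration; the source does not state which partitioning or replication strategies Polypheny uses, and the store names are hypothetical.

```python
import zlib


def place(key, stores, replicas=2):
    """Return the stores that hold the row with the given key.

    Hash partitioning selects a primary store; the remaining copies go to
    the stores that follow it in the list, wrapping around at the end.
    """
    if not 1 <= replicas <= len(stores):
        raise ValueError("replicas must be between 1 and the number of stores")
    # crc32 is deterministic across runs, unlike Python's built-in hash().
    primary = zlib.crc32(str(key).encode("utf-8")) % len(stores)
    return [stores[(primary + i) % len(stores)] for i in range(replicas)]


stores = ["store_a", "store_b", "store_c"]  # hypothetical store names
placement = {key: place(key, stores) for key in ("order-1", "order-2", "order-3")}
```

Each key is assigned to two distinct stores, so a query for that key could be answered by either copy.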

Data Sources
Data sources act as connectors to external systems and do not store data persistently within Polypheny. They serve as bridges to other sources of information (e.g., other database systems, blockchains, files), allowing Polypheny to access and integrate external data without physically storing it. Data sources allow Polypheny to execute queries on external systems, retrieve data, and integrate it with the data in its own data stores. This means that Polypheny can access and utilize data from external databases and static files, providing a unified interface for accessing and managing data across multiple platforms and systems.
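
The difference between a data source and a data store can be illustrated with a small Python sketch of a connector that reads external CSV data on demand and keeps no copy of it. The `CsvSource` class and its `scan` method are hypothetical names used for this illustration and do not correspond to Polypheny's adapter interface.

```python
import csv
import io


class CsvSource:
    """Hypothetical connector: reads external CSV data on every scan."""

    def __init__(self, open_stream):
        # open_stream is a callable returning a fresh text stream, so the
        # connector never keeps its own copy of the data.
        self._open_stream = open_stream

    def scan(self):
        with self._open_stream() as handle:
            yield from csv.DictReader(handle)


# Stand-in for an external file that changes outside the system.
external = {"text": "id,city\n1,Basel\n"}
source = CsvSource(lambda: io.StringIO(external["text"]))

first = list(source.scan())
external["text"] += "2,Bern\n"  # the external data changes
second = list(source.scan())    # the next scan sees the change
```

Because nothing is cached, the second scan reflects the updated external data, which is the behaviour expected from a connector that does not store data persistently.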

Internal Engine
Polypheny incorporates an internal execution engine, involved in query optimization and enabling the execution of query plans across multiple underlying data stores and data sources. Additionally, this internal engine can compensate for missing query functions on a data store or data source. Although this integrated engine might not achieve the same level of efficiency and optimizations as certain highly optimized database systems, it is not confined to a specific data model and natively supports all three data models available in Polypheny.
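
Compensating for a missing query function can be sketched as follows: if a store can evaluate a filter itself, the filter is pushed down; otherwise all rows are fetched and the filter is applied by the engine. The `Store` class, the `"filter"` capability label, and the `filtered_scan` function are assumptions of this sketch rather than Polypheny's internal interfaces.

```python
class Store:
    """Hypothetical store that advertises the operations it supports."""

    def __init__(self, name, rows, supported):
        self.name = name
        self.rows = list(rows)
        self.supported = set(supported)

    def scan(self, predicate=None):
        if predicate is None:
            return list(self.rows)
        if "filter" not in self.supported:
            raise NotImplementedError(f"{self.name} cannot filter")
        return [row for row in self.rows if predicate(row)]


def filtered_scan(store, predicate):
    """Push the filter down when possible, otherwise compensate locally."""
    if "filter" in store.supported:
        return store.scan(predicate), "pushed down"
    # The store lacks the function: fetch everything and filter in the engine.
    return [row for row in store.scan() if predicate(row)], "compensated"


rows = [{"id": 1, "total": 40}, {"id": 2, "total": 120}]
full_store = Store("full", rows, supported={"filter"})
basic_store = Store("basic", rows, supported=set())


def is_large(row):
    return row["total"] > 100


pushed, pushed_mode = filtered_scan(full_store, is_large)
compensated, compensated_mode = filtered_scan(basic_store, is_large)
```

Both calls return the same rows; only the place where the filter is evaluated differs.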

The internal execution engine plays a key role in executing cross-model queries and managing automated mapping between different data models. When a query is executed, the engine determines a method to retrieve and combine data from various data stores and sources, considering the different data models and query functions involved. It utilizes various optimization techniques, such as cost-based optimization and rule-based optimization, to determine an execution plan for each query, aiming to perform data retrieval in a resource-efficient manner.
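
A simplified view of cost-based optimization is shown in the Python sketch below, which estimates the cost of answering a filtered query from each store that holds the data and picks the cheapest one. The cost formula, the `TRANSFER_COST` constant, and the store characteristics are invented for illustration; the source does not describe Polypheny's actual cost model.

```python
from dataclasses import dataclass

TRANSFER_COST = 2.0  # assumed cost of moving one row to the engine


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of one store that holds the queried data."""
    store: str
    rows: int
    cost_per_row: float
    supports_filter: bool


def estimate_cost(placement, selectivity):
    """Scan cost plus the cost of transferring the rows the store returns."""
    scan = placement.rows * placement.cost_per_row
    # A store that cannot filter must ship every row to the engine.
    returned = placement.rows * (selectivity if placement.supports_filter else 1.0)
    return scan + returned * TRANSFER_COST


def choose_store(placements, selectivity):
    """Pick the placement with the lowest estimated cost."""
    return min(placements, key=lambda p: estimate_cost(p, selectivity))


placements = [
    Placement("store_a", rows=1000, cost_per_row=1.0, supports_filter=True),
    Placement("store_b", rows=1000, cost_per_row=0.5, supports_filter=False),
]

selective = choose_store(placements, selectivity=0.1)    # few rows match
unselective = choose_store(placements, selectivity=1.0)  # all rows match
```

With these assumed numbers, the store that can filter wins for a selective query, while the store with the cheaper scan wins when every row matches, which shows why the best plan depends on the query and not only on the stores.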

The internal execution engine also handles the mapping between data models, so that users can query data held in different data models and storage systems using a single query language while the queries are still executed accurately and efficiently.

When a query involves data from different stores or sources, the internal execution engine coordinates the retrieval and integration of this data, intending to present a unified set of results to the user. This functionality can be advantageous in environments with diverse data models and systems, as it provides a unified interface for query execution and data retrieval.