Data Asset Management in XMANAI
The management of assets in XMANAI should meet a number of critical requirements. One of them is the explainability of data, since Explainable AI is the main objective of the project. Data engineers and data analysts working in XMANAI should be able to understand the data, its structure and its semantics. This requirement matters not only for working with the data in XMANAI, but also for sharing high-quality data with third parties. Another important requirement is that users of the XMANAI platform should be able to query data, so the data management components should provide an API for that purpose. The main challenge, though, is the heterogeneity of data. The XMANAI platform is restricted neither to a limited set of data analytics scenarios nor to particular data collected and handled by the platform: the data can vary in structure, semantics, volume and dynamics. Apart from data, XMANAI should be able to manage other types of assets as well.
The asset management definition takes into account a number of key decisions that define the design of the data-related components of the platform:
· Types of assets in XMANAI. The XMANAI asset management has to support different types of assets, including data files, data collected from external APIs, data pushed into XMANAI through its API, data analytics scripts, metadata and more. For management purposes, we distinguish three types of assets: files (of any type), structured data, and metadata, the latter two conforming to the XMANAI data model. The ability to collect and manage files of any type provides the flexibility required to address the heterogeneity of assets. The conformance of data and metadata to the XMANAI data model enables the provision of an advanced query interface.
· Datasets and the role of metadata. Sharing data and providing transparency about data transactions, data structure and semantics demand the availability of high-quality metadata in XMANAI. The information stored in the metadata will be used to convey knowledge about the data to XMANAI users. A set of data together with its metadata is called a dataset.
· Data structures in XMANAI. The heterogeneity of data and the need for an advanced query interface require a special approach to the handling of data structures in XMANAI. The main idea of our approach is to define the most suitable data structure for storing and managing data in each particular case. This internal XMANAI representation for storing the data should be defined on the basis of the data structure and semantics provided by the users who register the data in the XMANAI platform. The registration includes a description of the data, which is created by mapping data fields to elements of the XMANAI data model. In particular, for the first versions of the XMANAI platform the consortium decided to focus on tabular structures as the internal representation of data. The decision is motivated by the assumption that the data emerging in the XMANAI demonstrators has a tabular structure. Another important reason is the simplicity of working with tabular data. The process of creating a data structure for the internal representation of data in XMANAI includes the following steps:
o The user describes the structure and semantics of one or more concrete data files or of a data stream using the elements of the XMANAI data model. This information, provided by the user, is saved as part of the metadata for the data in the XMANAI metadata registry.
o On the basis of this information about the structure and semantics of data, the XMANAI platform creates a table with a suitable structure in a relational (or potentially other type of) database and adds the respective data to it.
o The information about the created table is added to the corresponding metadata in the metadata registry. This information can be used to query the table in the database in order to retrieve, update or delete data, or to perform more complex queries involving more than one table.
This relatively simple data handling approach in XMANAI can be extended by adding other types of databases and allowing users to select the most suitable internal representation.
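The three steps above can be illustrated with a minimal sketch. It is not the XMANAI implementation: it assumes SQLite as the relational database, keeps the metadata registry as a plain table in the same database, and uses hypothetical names throughout (register_dataset, query_dataset, metadata_registry, the SQL_TYPES mapping and the "concept" labels standing in for elements of the XMANAI data model).

```python
import json
import sqlite3

# Hypothetical mapping from data-model value types to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "float": "REAL", "string": "TEXT"}


def quote_ident(name):
    """Quote an SQL identifier so user-supplied field names are safe to embed."""
    return '"' + name.replace('"', '""') + '"'


def register_dataset(conn, dataset_id, field_mapping, rows):
    """Register a dataset following the three steps described in the text.

    field_mapping is the user-provided description: one dict per data field
    with the keys "field", "concept" (data-model element) and "type".
    """
    # Step 1: save the user-provided description in the metadata registry.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata_registry ("
        "dataset_id TEXT PRIMARY KEY, fields TEXT, table_name TEXT)"
    )
    conn.execute(
        "INSERT INTO metadata_registry (dataset_id, fields) VALUES (?, ?)",
        (dataset_id, json.dumps(field_mapping)),
    )

    # Step 2: create a table whose structure follows the description
    # and load the data into it.
    table = f"data_{dataset_id}"
    columns = ", ".join(
        f"{quote_ident(f['field'])} {SQL_TYPES[f['type']]}" for f in field_mapping
    )
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in field_mapping)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})",
        [tuple(row[f["field"]] for f in field_mapping) for row in rows],
    )

    # Step 3: record the created table in the corresponding metadata entry.
    conn.execute(
        "UPDATE metadata_registry SET table_name = ? WHERE dataset_id = ?",
        (table, dataset_id),
    )
    conn.commit()
    return table


def query_dataset(conn, dataset_id):
    """Look up the table via the metadata registry and return all its rows."""
    (table,) = conn.execute(
        "SELECT table_name FROM metadata_registry WHERE dataset_id = ?",
        (dataset_id,),
    ).fetchone()
    return conn.execute(f"SELECT * FROM {quote_ident(table)}").fetchall()
```

In this sketch the metadata entry is the single point of lookup: a caller only needs the dataset identifier, and the registry resolves it to the table that holds the data. Supporting other database types, as suggested above, would mean recording the storage backend in the metadata alongside the table name.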