JHML - Joint Human Machine Learning for Exploratory Data Analysis

Combining Machine Learning with Methods of Visualization for Exploratory Data Analysis.

Background

Exploring, analysing and understanding data is essential for arriving at well-informed decisions. Exploratory data analysis in particular has become more prevalent, as the volume and availability of data have grown rapidly and continue to grow. Data exploration, though, can be challenging. Aside from the sheer amount of data, its often unstructured nature makes it difficult to analyse with classic data exploration techniques. Although machine learning (ML) methods are quite effective in overcoming such difficulties, they can only do so when large-scale labelled data is available. In practice, however, this is often not the case, which makes it hard for such (pre-trained or training-intensive) models to complete data exploration tasks successfully. The problem is compounded when the target labels and classes are not known a priori, which is the specific problem setting of this project.

Project Content

We consider data exploration and target definition as intertwined problems. In other words, the users' growing understanding of the data gradually feeds into the AI model and vice versa (i.e., knowledge acquisition is a mutual process driven by both the user and the model). This gradual process is also reflected in the visualizations used, which represent the acquired knowledge. We envision a joint human-machine data exploration (JDE) approach for exploratory data analysis (inspired by the data-frame theory) that makes use of both the strengths of ML and human visual perception and analytical skills. Users and domain experts engage in a dialogue with an intelligent agent that gradually learns from their insights, uncertainties, and disagreements throughout this interaction. This helps to translate vague notions about the data into concrete expectations. Additionally, the agent alerts the users to unexpected patterns and information, thus revealing new, potentially interesting aspects. The process yields new insights and consolidates knowledge that has already been acquired.

Goals

The main goal of the project is to enable a more efficient and deeper knowledge gain through joint human-machine data exploration (JDE). Because our target audience consists of people with little or no experience in machine learning, JDE has to be intuitive, efficient, versatile, and capable of dealing with the viewpoints, perspectives, and ideas of multiple people working together. We also strive for a general solution that is easily accessible, can be employed in various scenarios and settings, and offers machine learning support with high standards of accuracy. To accomplish this, the following questions need to be answered:

Is it possible to learn the users’ understanding of data from graphical externalizations (e.g., concept maps, mind maps or spatially organized physical pieces of information)?
Can we guide framing, questioning, and reframing by the user through personalized concept spaces, which capture the subjective perspectives of the users?
Can we trigger effective questioning and reframing by visualizing how the data fits the users’ understanding and thereby support making sense of the data?
Compared with conventional techniques for visual data exploration, can JDE support users in creating more expressive and elaborate representations of the structure of the data?

Methods

We integrate interactive ML and interactive visualization to learn about data and from data in a joint fashion. To this end, we propose a data-agnostic joint human-machine data exploration (JDE) framework that supports users in the exploratory analysis and the discovery of meaningful structures in the data. In contrast to existing approaches, we investigate data exploration from a new perspective that focuses on the discovery and definition of complex structural information in the data, rather than primarily on the model (as in ML) or on the data itself (as in visualization). Additionally, we apply a new design process that includes iterative design and evaluation steps. A formative study is at the heart of this process, with the goal of gathering benchmark data and developing guidelines for interactive data exploration. Finally, we conduct case studies with experts from fields outside of computer science.