Visual Analysis of Large and Heterogeneous Scientific Workflows for Analytical Provenance. - A new visual forensic tool visualizes data analysis workflows and helps to increase the reproducibility of studies.
The complex analysis of large amounts of data
Over the last few years, many fields of science have been confronted with enormous and rapidly growing amounts of data. Large-scale experiments are carried out as scientific workflows on powerful server infrastructures, combining data-transformation and analytic operations. Often, groups of researchers from different organizations collaborate on administering these workflows and constantly revise and change them. As a result, it becomes increasingly difficult to comprehend what changes and variations have taken place and how they might have influenced the results. Reproducibility of results is essential in this context, since sustainable scientific progress is only possible if researchers can trust previously published results and use them as a foundation for advancing their field. To increase the trustworthiness and scientific value of such studies in the future, developing novel approaches that implement traceability and reproducibility is therefore of utmost importance.
How to understand and visualize scientific workflows
The key to traceability and reproducibility lies in collecting information about the processed data, the applied analysis steps, and their parameters over time. We refer to this bundle of information as a provenance graph. With existing provenance approaches, it is currently extremely difficult to find out which changes to the input data, the workflow itself, or its parametrization actually caused variations in the output of large-scale projects. The primary aim of the project is to develop an innovative visual forensic solution for scientific workflow provenance graphs, combining the following features:
- scalable workflow visualization methods
- change metrics for heterogeneous data
- advanced visual comparison techniques.
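To make the idea of a provenance graph more concrete, the following is a minimal Python sketch, not the project's actual implementation. It assumes a workflow version can be modelled as a mapping from step names to steps, each with its inputs and parameters. All names (`AnalysisStep`, `make_step`, `diff_versions`, `affected_downstream`) and the example workflow are hypothetical and chosen for illustration only. The sketch compares two versions and lists the steps whose results may differ because of a change.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AnalysisStep:
    """One node of a hypothetical provenance graph: a step, its inputs, its parameters."""
    name: str
    inputs: Tuple[str, ...] = ()                   # upstream step or dataset names
    parameters: Tuple[Tuple[str, str], ...] = ()   # sorted (key, value) pairs


def make_step(name, inputs=(), **parameters):
    """Build a step with parameters normalized so that equal settings compare equal."""
    params = tuple(sorted((k, str(v)) for k, v in parameters.items()))
    return AnalysisStep(name, tuple(inputs), params)


def diff_versions(old: Dict[str, AnalysisStep], new: Dict[str, AnalysisStep]):
    """A very simple change metric: which steps were added, removed, or modified."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


def affected_downstream(version: Dict[str, AnalysisStep], changed: List[str]):
    """Return the changed steps plus every step that depends on them, directly or transitively."""
    affected = set(changed)
    grew = True
    while grew:  # repeat until no further dependent step is found
        grew = False
        for step in version.values():
            if step.name not in affected and affected.intersection(step.inputs):
                affected.add(step.name)
                grew = True
    return sorted(affected)


# Two versions of an illustrative workflow (assumed example data).
v1 = {s.name: s for s in [
    make_step("load", ["raw.csv"]),
    make_step("normalize", ["load"], method="zscore"),
    make_step("cluster", ["normalize"], k=5),
    make_step("report", ["cluster"]),
]}
v2 = {s.name: s for s in [
    make_step("load", ["raw.csv"]),
    make_step("qc", ["load"]),                          # new step
    make_step("normalize", ["load"], method="minmax"),  # parameter changed
    make_step("cluster", ["normalize"], k=5),
    make_step("report", ["cluster"]),
]}

diff = diff_versions(v1, v2)
impact = affected_downstream(v2, diff["changed"] + diff["added"])
print(diff)
print(impact)
```

In this sketch, changing the parameter of `normalize` marks `cluster` and `report` as potentially affected, which is the kind of question a visual comparison of provenance graphs is meant to answer at a much larger scale.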
Traceability and reproducibility
The primary goal of the proposed research project is to implement provenance at all levels, allowing analysts to gain a deeper understanding of the workflow, the changes applied to it, and how those changes influence the results. Because traceability and reproducibility matter wherever large-scale data analysis is performed, the project is expected to benefit a variety of fields and disciplines.
Copyright: JKU Linz/Stefan Luger
Press coverage
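Dem Workflow auf der Spur (On the trail of the workflow)
02/01/2016, Medium: Austria Innovativ
Vorbereiten auf die digitale Welt (Preparing for the digital world)
11/16/2016, Medium: Der Standard
Wissenschaft und Forschung in Niederösterreich (Science and research in Lower Austria)
11/01/2016, Medium: UNIVERSUM Magazin
Big Data – und welche Chancen Daten bieten (Big data and the opportunities that data offer)
06/10/2016, Medium: Die Presse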
- Harvard University, School of Engineering & Applied Sciences, Visual Computing Group (Pfister Lab)
- Harvard Medical School, Center for Biomedical Informatics, Computational Genomics Group (Park Lab)
- University of Rostock, Institute of Computer Science, Visual Computing and Computer Graphics