MVPF - Visual Monitoring of Crop Production Systems: A Multimodal Approach

Improving imaging techniques that are used to monitor crop production.


“One Health” is a concept that takes a holistic view of health and emphasizes the interdependency between soil, plant/animal and human wellbeing. More precisely, sustainable soil management is thought to produce plants with greater natural resistance to biotic and abiotic stress, as well as more high-quality food for healthy human nutrition. To put this concept into practice, it is necessary to measure soil biomass with chemical methods, analyse the hydration status of leaves with leaf water potential sensors, or estimate fruit maturity from hyperspectral information. Imaging techniques are also used, but unlike chemical measurements or attachable sensors, they allow for non-destructive assessment of traits and provide spatial information as well as high-throughput data. Despite their great value, imaging devices also have limitations, and this is where the present project comes in.

Project Content

Imaging techniques are used in agriculture to monitor different factors influencing plant growth. However, they have drawbacks and only yield indirect information. For example, judging a plant's water content from the colour of its leaves, as recorded with an RGB camera, is currently less precise than measuring it with dedicated sensors. In this dissertation project, I aim to address such shortcomings by combining information from multiple cameras for the monitoring of crop production systems and by developing novel multimodal machine learning and sensor fusion methods that can process the combined camera data. The main idea is that using multiple camera technologies (RGB, thermal, hyperspectral, depth) in one setup can yield complementary information that increases the accuracy and robustness of monitoring processes. To assess and illustrate the usefulness and adaptability of the developed methodologies, they are evaluated in three separate use cases that span the full agricultural chain (soil, plant, crop).

Research questions

The overarching goal of this project is to lay the groundwork for novel image processing techniques used to monitor agricultural production. The following research questions need to be answered to accomplish this goal:

•    Does using multiple complementary camera types and applying novel multimodal learning techniques increase the predictive accuracy for factors relevant to crop production? What level of predictive accuracy can be reached?
•    How generalizable are the developed methods? Can they be applied to different use cases (three use cases are tested in this project)?
•    How important is each modality (individual imaging device) for each use case? Which combination of modalities is best suited for a particular use case? Which modalities do not provide useful information? 
•    How robust are the developed techniques? Do they still provide useful results when specimen morphology varies, when environmental conditions change (e.g., illumination), or when modalities are missing?
•    Can multimodal learning techniques be leveraged to identify characteristic patterns of certain traits (pattern mining)? Are explainable AI (XAI) approaches suitable for discovering such patterns, and can XAI methods be adapted to the multimodal learning techniques developed?


Different camera types have different depths of field, focal lengths, lens distortions and image resolutions, and they are mounted at different 3D positions and angles in the data capturing setup. As a result, the images recorded by the individual cameras look vastly different, and it is difficult to identify matching points between camera views. To establish correspondences between the camera images, I am developing feature point descriptors and stereo matching algorithms, as well as dedicated matching approaches for such heterogeneous input images. Moreover, to monitor plants, it is important to enable trait predictions on a leaf-by-leaf basis. This is challenging because leaves overlap and some regions are shadowed. To address this, I will improve upon current methods and extend them to multispectral and multi-camera inputs. I am also developing multimodal learning architectures that integrate different fusion paradigms and allow multi-camera inputs with different channel dimensions to be fused in order to predict the desired target variables. To make the decisions of deep learning networks more transparent and to identify characteristic patterns of certain traits in the input data, I plan to leverage eXplainable Artificial Intelligence (XAI) methods.

All methods developed are tested in three use cases. The first use case focuses on assessing soil health and fertility, the second on detecting nutrient deficiencies in plants (i.e., whether enough nutrients are available), and the third on the early detection of decay in fruits and vegetables.
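The idea of fusing camera inputs with different channel dimensions can be illustrated with a minimal sketch. It assumes the images have already been registered to a common pixel grid (the correspondence step described above), uses hypothetical channel counts (for example, 64 hyperspectral bands) and random weights in place of learned ones, and shows only one simple intermediate-fusion scheme: each modality is mapped to the same number of channels by a 1x1 projection, the results are concatenated, and a missing camera is replaced by zeros so that the fused tensor always has the same shape. The names `MODALITY_CHANNELS`, `project` and `fuse` are hypothetical; this is an illustration, not the project's architecture.

```python
import numpy as np

# Hypothetical channel counts per camera; 64 hyperspectral bands is an assumption.
MODALITY_CHANNELS = {"rgb": 3, "thermal": 1, "hyperspectral": 64, "depth": 1}


def project(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """1x1 projection: map a (C_in, H, W) array to (C_out, H, W) with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", weights, features)


def fuse(inputs: dict, weights: dict, embed_dim: int) -> np.ndarray:
    """Intermediate fusion by channel concatenation.

    Assumes all available inputs are already registered to the same (H, W) grid
    and that at least one modality is present.
    """
    height, width = next(iter(inputs.values())).shape[1:]
    parts = []
    for name in MODALITY_CHANNELS:  # fixed order keeps channel positions stable
        if name in inputs:
            parts.append(project(inputs[name], weights[name]))
        else:
            # Missing camera: zero placeholder with the same projected shape.
            parts.append(np.zeros((embed_dim, height, width)))
    return np.concatenate(parts, axis=0)


# Usage with random data standing in for registered images and learned weights.
rng = np.random.default_rng(0)
EMBED_DIM = 8
weights = {m: rng.standard_normal((EMBED_DIM, c)) for m, c in MODALITY_CHANNELS.items()}
inputs = {
    m: rng.standard_normal((c, 16, 16))
    for m, c in MODALITY_CHANNELS.items()
    if m != "thermal"  # simulate a missing thermal camera
}
fused = fuse(inputs, weights, EMBED_DIM)
print(fused.shape)  # (32, 16, 16)
```

Zero-filling is only one way to cope with a missing camera; alternatives include learned placeholder embeddings or late fusion, where each modality produces its own prediction and the predictions are combined afterwards.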


This dissertation project breaks new ground in the domains of computer vision, machine learning and agriculture. It advances image processing techniques that are used to assess, and ultimately improve, the health status of plants. The project draws inspiration from the “One Health” concept and adopts a holistic view of crop and food production. This means that it does not focus on a single aspect in isolation but also considers other significant factors and the interplay between them. On a larger scale, the project aims to contribute to a more secure global food supply.

You want to know more? Feel free to ask!

Junior Researcher, Institute of Creative\Media/Technologies
Department of Media and Digital Technologies
Location: Building A - Campus-Platz 1
Project manager
GFF (FTI Dissertationen 2021)
03/01/2023 – 02/28/2026
Involved Institutes, Groups and Centers
Institute of Creative\Media/Technologies
Research Group Media Computing