Schedule

Place: Conference room, BU, Campus LyonTech La Doua

THESIS

Thesis defence of Gaspard Dussert

Development and application of machine learning techniques for automatic species classification in camera trap images

The defence will be held in French, before a jury composed of:

François Munoz, Professor, University of Lyon 1, Examiner

Alice Caplier, Professor, INP Phelma, Reviewer

Marie-Pierre Etienne, Associate Professor, ENSAI, Reviewer

Julien Mairal, Research Director, Inria, Examiner

Stéphane Dray, Research Director, CNRS, LBBE, Thesis Director

Vincent Miele, Research Engineer, CNRS, LECA, Thesis Co-director

Simon Chamaillé-Jammes, Research Director, CNRS, CEFE, Thesis Supervisor

Thesis summary:

Large-scale ecosystem monitoring has become a major challenge in the context of the biodiversity crisis, as it is essential to fill the critical knowledge gaps that hinder the design of effective management and conservation strategies. To this end, modern ecological monitoring relies on a variety of autonomous sensors that collect data continuously and in a standardized manner. This thesis focuses on camera traps, which have become essential tools for wildlife studies. However, these devices generate massive volumes of images, and manual processing is a major bottleneck for both research and conservation. Artificial intelligence offers a promising solution by automating the analysis of such data. The objectives of this thesis are to develop and implement new deep learning methods to improve species and behaviour classification, to enhance the interpretability of model predictions, and to make these advances accessible to the ecological community through open-source tools. The first chapter presents the DeepFaune initiative, a collaborative project that aims to build the first large-scale dataset dedicated to European fauna and to develop efficient detection and classification models that can easily be run on personal computers through dedicated software. The second chapter addresses the calibration of confidence scores and shows how temporal aggregation and post-processing techniques can improve the reliability of predictions, thereby facilitating their integration into downstream ecological models. The third chapter introduces a new module based on the self-attention mechanism that jointly exploits spatial and temporal information within image sequences, improving classification performance even in multi-species scenarios. Finally, the last chapter explores the potential of vision-language models for zero-shot animal behaviour prediction, i.e., predicting behaviour without fine-tuning, on a task for which these models were not explicitly trained.
The results show that the predictions of these vision-language models are reliable enough to estimate ecological indicators such as activity patterns. The methods developed throughout this work have been made available directly through the DeepFaune software, which is now widely adopted across Europe, as well as through publicly available libraries and models. The species classification model has also been incorporated into other popular tools such as AddaxAI and Agouti, facilitating the processing of millions of camera trap images and helping to automate ecological monitoring. This thesis also opens new perspectives by promoting the use of vision-language models to predict ecological attributes that are rarely annotated, and by encouraging the development of vision-only models that leverage sequence information to improve animal detection. Together, these developments can strengthen the versatility and robustness of AI tools, ultimately enhancing their capacity to meet the growing demands of ecological studies.
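The summary does not specify how temporal aggregation and post-processing are implemented. The following is a minimal Python sketch for illustration only. It makes three assumptions. First, per-image softmax probabilities are averaged over a sequence, which is one simple form of temporal aggregation. Second, temperature scaling, a common post-hoc calibration technique, is applied to the logits. Third, expected calibration error (ECE) measures how closely confidence matches accuracy. The function names, the temperature value and the example logits are hypothetical and are not taken from the thesis or from the DeepFaune software.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn raw logits into probabilities; a temperature > 1 softens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_sequence(per_image_logits: Sequence[Sequence[float]],
                       temperature: float = 1.0) -> List[float]:
    """Average the temperature-scaled class probabilities of one sequence."""
    probs = [softmax(logits, temperature) for logits in per_image_logits]
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also accepts a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical logits for a 3-image sequence over 3 classes
# (e.g. roe deer, fox, wild boar); the third image is ambiguous.
sequence_logits = [[2.0, 0.5, 0.1], [1.8, 0.9, 0.2], [0.4, 1.2, 0.1]]
p = aggregate_sequence(sequence_logits, temperature=1.5)
pred = max(range(len(p)), key=p.__getitem__)
print(pred, [round(x, 3) for x in p])
```

In practice, the temperature would be fitted on a held-out validation set, for example by minimizing negative log-likelihood. A value above 1 softens overconfident scores. Averaging over the sequence lets the clearer images outweigh an ambiguous one.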