How to select reference sites for long-term agricultural experiments: A data-driven approach
Authors: Nishita Thakur, Marco Donat, Sonoko Bellingrath-Kimura, Wiebke Niether, Andreas Gattinger, Deise Aline Knob, Eva-Maria L. Minarsch, Philipp Weckenbrock, Franz Schulz, Lutz Breuer, Suzanne Jacobs, John Clifton-Brown, Karolina Golicz
DAKIS | 12.2025
Measuring the effects of land use changes, such as agroforestry, photovoltaics or afforestation, on environmental variables poses an ongoing challenge for scientists. These difficulties stem from the long time it takes for changes to manifest and from the problem of separating pre-existing site conditions from the effects of land use change. This study addresses the key issue of reference site selection, which is often oversimplified by relying on spatially proximate but potentially heterogeneous neighbouring fields. Advances in high-resolution spatial and temporal environmental data offer opportunities to make land use change studies more robust, because they allow statistically comparable reference sites to be identified across the landscape instead of relying on fixed, proximal references. The aim of this study was to develop a generalizable and easy-to-implement approach for identifying potential reference sites by matching site conditions. Using 12 variables across three key categories (agronomic, topographic and edaphic, i.e. soil), we evaluated six fields (42.7 ha in total) to identify the locations most comparable to a set of ad-hoc experimental plots in a case-study agroforestry field (3.5 ha). High-resolution maps (3 × 3 m) were generated from state-of-the-art satellite imagery and data from a proximal multi-sensor platform. Geographically weighted principal component analysis (GWPCA) combined with K-means clustering was used to stratify the fields and identify the areas most similar to the experimental plots, demonstrating the methodology. Within individual fields, we identified specific areas that closely matched the conditions of the investigated plots, optimizing the selection of reference sites for future sampling.
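A minimal, dependency-free Python sketch of the GWPCA and K-means matching idea summarized above follows. It is not the authors' published code, and several details are assumptions made for illustration: a Gaussian distance kernel with a fixed bandwidth; clustering on the globally standardized variables projected onto each cell's local principal components, with eigenvector signs aligned to the global components; deterministic farthest-point initialization for K-means; and ranking candidate reference cells by Euclidean distance in score space to the plots' mean. All function names, the 10 × 4 grid, the three variables and the four "plot" cells are hypothetical.

```python
import math
from collections import Counter

# Minimal GWPCA + K-means sketch for reference-site matching; NOT the authors' code.
# ASSUMPTIONS: Gaussian kernel, fixed bandwidth, features = standardized data on local loadings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def standardize(rows):
    """Z-score each variable (column) across all cells."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    sds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / n) or 1.0 for k in range(p)]
    return [[(r[k] - means[k]) / sds[k] for k in range(p)] for r in rows]

def weighted_cov(z, w):
    """Weighted covariance matrix of the rows of z."""
    p, total = len(z[0]), sum(w)
    mu = [sum(wi * r[k] for wi, r in zip(w, z)) / total for k in range(p)]
    return [[sum(wi * (r[a] - mu[a]) * (r[b] - mu[b]) for wi, r in zip(w, z)) / total
             for b in range(p)] for a in range(p)]

def top_eigvecs(cov, n_comp, iters=200):
    """Leading eigenvectors of a symmetric matrix via power iteration with deflation."""
    p, c = len(cov), [row[:] for row in cov]
    vecs = []
    for _ in range(n_comp):
        v = [1.0 + 0.1 * k for k in range(p)]  # fixed, non-uniform start vector
        v = [x / math.sqrt(dot(v, v)) for x in v]
        for _ in range(iters):
            u = [dot(row, v) for row in c]
            norm = math.sqrt(dot(u, u))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in u]
        lam = dot(v, [dot(row, v) for row in c])  # Rayleigh quotient = eigenvalue
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        vecs.append(v)
    return vecs

def gwpca_scores(coords, z, bandwidth, n_comp=2):
    """Project each cell onto the principal components of its own kernel-weighted neighbourhood."""
    global_vecs = top_eigvecs(weighted_cov(z, [1.0] * len(z)), n_comp)
    scores = []
    for (xi, yi), zi in zip(coords, z):
        w = [math.exp(-0.5 * ((xi - xj) ** 2 + (yi - yj) ** 2) / bandwidth ** 2)
             for xj, yj in coords]
        row = []
        for v, g in zip(top_eigvecs(weighted_cov(z, w), n_comp), global_vecs):
            if dot(v, g) < 0:  # eigenvector sign is arbitrary: align with the global PC
                v = [-x for x in v]
            row.append(dot(zi, v))
        scores.append(row)
    return scores

def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [points[0][:]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda pt: min(dist2(pt, c) for c in centroids))[:])
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(pt, centroids[j])) for pt in points]
        new = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:  # assignments stable, so centroids no longer move
            break
        centroids = new
    return labels

def reference_candidates(scores, labels, plot_idx):
    """Cells in the plots' majority cluster, ranked by score-space distance to the plot mean."""
    target = Counter(labels[i] for i in plot_idx).most_common(1)[0][0]
    center = [sum(col) / len(plot_idx) for col in zip(*(scores[i] for i in plot_idx))]
    plots = set(plot_idx)
    cand = [i for i, lab in enumerate(labels) if lab == target and i not in plots]
    return sorted(cand, key=lambda i: dist2(scores[i], center))

# Hypothetical 10 x 4 grid with 3 variables; the left and right halves differ clearly.
coords = [(x, y) for x in range(10) for y in range(4)]
rows = []
for x, y in coords:
    base = [0.0, 0.0, 0.0] if x < 5 else [3.0, 3.0, -3.0]
    noise = [0.02 * ((x * a + y * b) % 5 - 2) for a, b in ((7, 3), (3, 5), (1, 2))]
    rows.append([bv + e for bv, e in zip(base, noise)])
plot_idx = [5, 6, 9, 10]  # hypothetical plot cells (1,1), (1,2), (2,1), (2,2)
scores = gwpca_scores(coords, standardize(rows), bandwidth=5.0)
labels = kmeans(scores, k=2)
candidates = reference_candidates(scores, labels, plot_idx)
print([coords[i] for i in candidates[:5]])
```

Plain Python keeps the sketch self-contained. For real 3 × 3 m rasters covering tens of hectares, a vectorized implementation (e.g. with NumPy), an established GWPCA implementation and a data-driven bandwidth choice would be more appropriate.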
Results showed that the fields nearest to the case-study field exhibited the highest similarity, as expected from Tobler's first law of geography, and that the soil data generated by the proximal multi-sensor platform had limited impact on the selection. This approach, supported by open-access, well-documented Python code, is designed to be flexible and easily adapted to various research needs.