
Mapping Soil Characteristics from Lab Analyses

This use case builds on the integration of soil analysis data using Generative AI. Once the data has been extracted from PDF lab reports, this service geolocates each sample by cadastral parcel and displays the results on an interactive map, generating spatial interpolations of the main soil parameters across the territory and over time. It has been designed following the best practices outlined in the use case development guide.

Objectives

The main objective is to transform the soil analyses that cooperatives and technical offices accumulate over the years, which are heterogeneous PDF documents that are difficult to exploit, into an actionable geographic information layer. From these reports, the system generates a map that allows users to:

  • Visualize the spatial distribution of key agronomic parameters (pH, organic matter, phosphorus, etc.) across all parcels.
  • Analyze their temporal evolution, comparing years with each other.
  • Identify areas within or outside the optimal agronomic range for each parameter.

All this while preserving farmers' privacy: the map never displays individual samples or the exact location of parcels.

How it works

The algorithm runs in three sequential phases:

1. PDF Extraction (LLM-based)

For each report, both the rendered images of the pages and a layout-aware markdown transcription (extracted with docling) are sent to a vision-language model. The model:

  • Classifies the document as soil, water, foliar, or other. Only soil reports are retained; water and foliar analyses are automatically discarded.
  • Extracts cadastral identifiers (Polygon / Parcel / Municipality / INE code) and soil parameters regardless of the report format.
  • Applies strict rules for units and methods: nitrogen is retained only if reported as nitrate in mg/kg; phosphorus only if extracted by the Olsen method (or expressed in mg/kg); potassium only if expressed in mg/kg. Values obtained by acid extraction methods, foliar values in ppm, and values expressed as % d.m. (dry matter) are discarded.
info

Because the model reads the layout directly from the page images, format support depends on the content of a report rather than on rigid templates. The process has been validated with Eurofins XK bulletins (reference scheme AR-…-XK-…) and Eurofins BUTLLETÍ reports (BUTLLETÍ D'ANÀLISIS, in Catalan). It can process other formats provided they contain the cadastral fields and at least one of the supported parameters.
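The retention rules of the extraction phase can be illustrated with a small filter. This is only a sketch under stated assumptions: the `Measurement` class, its field names, and the labels `"nitrate"`, `"olsen"`, and `"acid"` are hypothetical and do not come from the service itself. The phosphorus rule is read literally (Olsen method, or mg/kg), and parameters with no stated rule (pH, organic matter, electrical conductivity) are assumed to pass through.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Hypothetical container for one value extracted from a report."""
    parameter: str                 # e.g. "nitrogen", "phosphorus", "potassium", "pH"
    value: float
    unit: str                      # e.g. "mg/kg", "ppm", "% d.m."
    form: Optional[str] = None     # e.g. "nitrate" for nitrogen
    method: Optional[str] = None   # e.g. "Olsen" for phosphorus


def keep_measurement(doc_type: str, m: Measurement) -> bool:
    """Return True if the value should be retained under the rules in the text."""
    # Only soil reports are retained; water, foliar and other are discarded.
    if doc_type != "soil":
        return False

    unit = m.unit.strip().lower()
    method = (m.method or "").strip().lower()
    form = (m.form or "").strip().lower()

    # Values obtained by acid extraction are discarded (label is an assumption).
    if "acid" in method:
        return False
    # Values expressed as a percentage of dry matter are discarded.
    if unit.startswith("%") and "d.m" in unit:
        return False

    if m.parameter == "nitrogen":
        return form == "nitrate" and unit == "mg/kg"
    if m.parameter == "phosphorus":
        return method == "olsen" or unit == "mg/kg"
    if m.parameter == "potassium":
        return unit == "mg/kg"

    # Assumption: no unit/method rule is stated for the remaining parameters.
    return True
```

For example, `keep_measurement("soil", Measurement("potassium", 210.0, "% d.m."))` is rejected, while the same value in mg/kg is kept.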

2. Geolocation

Each sample is resolved to GPS coordinates by querying the Consulta_CPMRC API of the Spanish Cadastre with the cadastral reference (province + municipality + polygon + parcel). For records without an explicit INE code, the municipality name is resolved against the Cadastre's municipality register for the corresponding province.
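As an illustration of how these fields might be combined before calling the Cadastre, the sketch below builds a 14-character rural cadastral reference and matches a municipality name against a register. Several things here are assumptions rather than details from the service: the reference layout (2-digit province, 3-digit municipality, 1-character sector, 3-digit polygon, 5-digit parcel), the default sector `"A"`, the function names, and the shape of the `register` dictionary. The municipality code is assumed to already be the one the Cadastre expects, and the HTTP request itself is left out.

```python
import unicodedata
from typing import Dict, Optional


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace for name matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def resolve_municipality(name: str, register: Dict[str, str]) -> Optional[str]:
    """Look up a municipality code by name in a (hypothetical) province register."""
    lookup = {normalize_name(k): code for k, code in register.items()}
    return lookup.get(normalize_name(name))


def rural_reference(province: int, municipality: int, polygon: int,
                    parcel: int, sector: str = "A") -> str:
    """Compose the first 14 characters of a rural cadastral reference.

    Assumed layout: PP MMM S PPP PPPPP (province, municipality, sector,
    polygon, parcel), each numeric field zero-padded.
    """
    if not (0 < province <= 99 and 0 < municipality <= 999):
        raise ValueError("province or municipality code out of range")
    if not (0 < polygon <= 999 and 0 < parcel <= 99999):
        raise ValueError("polygon or parcel number out of range")
    if len(sector) != 1:
        raise ValueError("sector must be a single character")
    return f"{province:02d}{municipality:03d}{sector}{polygon:03d}{parcel:05d}"
```

Normalizing names before the lookup makes the match tolerant of accents, case, and stray spaces, which are common differences between lab reports and official registers.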

caution

Reports without a Polygon/Parcel cadastral reference retain their extracted parameters in the data, but do not appear on the map, as their geographical position cannot be determined.

3. Map Generation

IDW (Inverse Distance Weighting) interpolation rasters are calculated for each soil parameter, both for the complete aggregate and for each year present in the dataset. The rasters are rendered on an interactive Folium map with:

  • Parameter selector — pH, organic matter, electrical conductivity, N-NO₃, and phosphorus.
  • Year selector — complete aggregate (default) or any individual year; yearly rasters share the aggregate's color scale to ensure an honest visual comparison between years.
  • Low sample density warning — when a year has fewer than ~15 samples, the legend displays a note to avoid overinterpreting sparse interpolations.
  • WRB soil type WMS layer — ISRIC SoilGrids layer with adjustable opacity.
  • Statistics panel — pH distribution, histogram of samples per year, and mean ± standard deviation for each parameter.
  • Optimal range indicators — a highlighted zone in the legend gradient marks the agronomically optimal range for each parameter.
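The interpolation and two of the display rules above can be sketched with NumPy. In IDW, each grid cell receives a weighted average of the sample values, with weights proportional to 1 / distance^p. This is a minimal sketch, not the service's implementation: the function names, the power `p = 2`, the exact threshold of 15 samples (the text only says "~15"), and the use of planar coordinates (for example, projected metres rather than degrees) are all assumptions.

```python
import numpy as np


def idw_grid(xy, values, grid_x, grid_y, power=2.0, eps=1e-12):
    """Inverse Distance Weighting on a regular grid.

    xy      : (n, 2) sample coordinates in planar units (assumption)
    values  : (n,) measured values at those coordinates
    grid_x  : (w,) x coordinates of the raster columns
    grid_y  : (h,) y coordinates of the raster rows
    returns : (h, w) array of interpolated values
    """
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Pairwise distances between every grid cell and every sample.
    dist = np.linalg.norm(cells[:, None, :] - xy[None, :, :], axis=2)
    # Clamp tiny distances so a cell on top of a sample does not divide by zero.
    weights = 1.0 / np.maximum(dist, eps) ** power
    estimate = (weights @ values) / weights.sum(axis=1)
    return estimate.reshape(gx.shape)


def shared_color_limits(aggregate_raster):
    """Colour limits taken from the aggregate raster and reused for every year."""
    return float(np.nanmin(aggregate_raster)), float(np.nanmax(aggregate_raster))


def is_sparse(n_samples, threshold=15):
    """True when a year has too few samples to trust the interpolation."""
    return n_samples < threshold


# Usage with two made-up samples on a one-row grid.
raster = idw_grid([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0], [0.0, 1.0, 2.0], [0.0])
vmin, vmax = shared_color_limits(raster)
```

Taking `vmin` and `vmax` from the aggregate raster and passing them to every yearly layer is one way to keep the colour scale comparable between years.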

You can interact with the following sample map: change the parameter and year, adjust the opacity of the soil type layer, and consult the statistics panel.

Fig. 1. Interactive map generated by the algorithm, with IDW interpolation of each soil parameter across the set of parcels. Demonstration data. (source: AgrospAI)

Extracted Soil Parameters

Parameter                | Key       | Unit
pH                       | pH        | -
Organic matter           | MO        | %
Electrical conductivity  | CE        | dS/m
Nitrate nitrogen         | N_Nitrico | mg/kg
Potassium                | Potasio   | mg/kg

Privacy by Design

To protect the privacy of the farmers who contribute data, the map does not display individual markers. The IDW interpolation is the only visualization, so exact parcel locations are never exposed.