Centre of Swine Studies of Catalonia (CEP)
This success story in the swine sector explores the data exchange opportunities offered by data spaces, in particular how barriers to data exchange can be lowered by ensuring that data originators keep control of their data. To that end, the data space is implemented on the Gaia-X Pontus-X ecosystem, whose Web3 and blockchain-based infrastructure provides Data Sovereignty by Design. Furthermore, because Pontus-X builds on Gaia-X, Gaia-X building blocks are used to facilitate the federation of the data space, especially the Gaia-X Digital Clearing House (GXDCH) [1]. Specifically:
- All participants are identified using self-descriptions validated by the GXDCH Compliance service [2], which is based on the trust framework provided by the GXDCH Registry [3]. In addition, the GXDCH Notary service [4] is used to generate a Verifiable Claim with the legal registration number for all participants. The compliant self-descriptions for the participants are:
  - Universitat de Lleida (UdL): https://compliance.portal.agrospai.udl.cat/.well-known/UdL.vp.json
  - Centre of Swine Studies of Catalonia (CEP): https://compliance.portal.agrospai.udl.cat/.well-known/CEP.vp.json
- The datasets and services available through the data space also have self-descriptions, which are likewise validated by GXDCH Compliance, for example:
  - Exploratory Data Analysis Service: https://portal.agrospai.udl.cat/asset/did:op:34d5f73d77550843201ee1a43ad9d404d3e557ed6a70772e9afde7a27d863b8f
- The self-descriptions of datasets and services, after compliance has been checked, are announced via the GXDCH Credentials Event Service (CES) [5] so that they can be included in federated catalogs. For example: https://ces-development.lab.gaia-x.eu/credentials-events/0b041d31-e306-4a54-96d1-a4f9ef818877
The use case is run by the Universitat de Lleida (UdL) in cooperation with the Centre of Swine Studies of Catalonia (CEP), an experimental pig farm managed by a consortium formed by the Diputació de Lleida, the Regional Council of La Noguera, the City Council of Torrelameu and the Universitat de Lleida. The CEP acts mainly as a data originator, willing to share through the data space the data generated by the experiments carried out on the pig farm. For the CEP, it is crucial to keep control of the data it provides, especially when that data is generated by third parties, such as manufacturers of automatic feeding machines testing their products at the CEP.
In addition, the monetization mechanisms provided by Pontus-X are being tested to evaluate different incentive mechanisms that could make the use case sustainable beyond the current proof-of-concept phase.
Currently, two services are offered, both based on data provided by the CEP. In the first, an animal welfare assessment algorithm takes video surveillance images of one of the CEP's pens and performs automatic image segmentation and tracking to identify the pigs and follow their movements. It can also monitor the pigs' visits to defined areas of interest, such as the automatic feeding machine or the drinking bowl. This allows the automatic generation of metrics that can be used for animal welfare assessment.
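As a rough illustration of how visit metrics could be derived from tracking output, the following sketch counts how often a tracked pig enters a rectangular area of interest and how long it stays there. The `Area` class, the `count_visits` function, the pixel coordinates and the sample track are all hypothetical; the actual welfare assessment algorithm and its output format are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical axis-aligned area of interest, in pixel coordinates."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def count_visits(track, area, fps=1.0):
    """Return (number of visits, seconds spent inside) for one tracked pig.

    `track` is a list of (x, y) centroids, one per consecutive frame.
    A visit starts whenever the pig enters the area from outside.
    """
    visits = 0
    frames_inside = 0
    was_inside = False
    for x, y in track:
        inside = area.contains(x, y)
        if inside:
            frames_inside += 1
            if not was_inside:
                visits += 1
        was_inside = inside
    return visits, frames_inside / fps


# Made-up example: the pig enters the feeder area twice.
feeder = Area("feeder", 0, 0, 10, 10)
track = [(20, 20), (5, 5), (6, 5), (15, 5), (9, 9), (30, 30)]
visits, seconds = count_visits(track, feeder, fps=1.0)
```

Only the two resulting numbers, not the frames or the per-frame positions, would need to be reported for a metric of this kind.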
Data sovereignty is guaranteed by design through a Data Room implemented using "Compute-to-Data", as shown in Fig. 1. The algorithm visits the image sequence within the data room, where the images are analyzed, and only the calculated metrics leave the room. Consequently, no image leaks from the farm: the images are simply copied to the data room and destroyed after the calculation, without ever leaving it. A sample dataset, which has the consent of the CEP to be "visited" by the welfare assessment algorithm, is available online [6].
Fig. 1. Sovereign data exchange of CEP data through a Data Room based on "Compute-to-Data" (source: AgrospAI)
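The data room lifecycle described above (copy in, compute, release only the metrics, destroy the copies) can be sketched in a few lines. This is a simplified stand-in that assumes a local temporary directory plays the role of the room; it is not how Pontus-X implements Compute-to-Data, and `run_in_data_room`, `frame_count_metric` and the numeric-only output policy are hypothetical names and rules chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_data_room(source_files, algorithm):
    """Copy inputs into a temporary room, run the algorithm there and
    return only its result. The room is deleted before returning."""
    room = Path(tempfile.mkdtemp(prefix="data_room_"))
    try:
        local_copies = []
        for src in source_files:
            dst = room / Path(src).name
            shutil.copy(src, dst)
            local_copies.append(dst)
        result = algorithm(local_copies)
        # Hypothetical output policy: only plain numeric metrics may leave.
        if not isinstance(result, dict) or not all(
            isinstance(value, (int, float)) for value in result.values()
        ):
            raise ValueError("only numeric metrics may leave the data room")
        return result
    finally:
        # The copies are destroyed whether or not the algorithm succeeded.
        shutil.rmtree(room, ignore_errors=True)


def frame_count_metric(paths):
    """Stand-in for the welfare algorithm: reports aggregate numbers only."""
    return {"frames": len(paths), "bytes": sum(p.stat().st_size for p in paths)}


# Usage with two dummy "frames" created on the fly.
farm = Path(tempfile.mkdtemp(prefix="farm_"))
for i in range(2):
    (farm / f"frame_{i}.bin").write_bytes(b"\x00" * 4)
metrics = run_in_data_room(sorted(farm.iterdir()), frame_count_metric)
shutil.rmtree(farm)
```

The `finally` clause is the point of the sketch: the visiting algorithm never receives a path that outlives the call, so the caller ends up holding the metrics and nothing else.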
The second service supports a "Pay-per-Use" approach to sharing data through a data space. Instead of requiring publishers to integrate their data against existing schemas up front, which causes significant initial overhead, the pay-per-use paradigm favors an incremental approach. In this way, entry barriers are reduced and data exchange is facilitated.
The data are published in tabular form, for example the data generated by the CEP automatic feeding machines. Semantic integration is provided by an algorithm that implements RML, an extension of the W3C R2RML standard that, in addition to mapping relational databases to RDF semantic data, also supports mappings from CSV, TSV, XML, and JSON data sources to RDF.
Furthermore, data sovereignty is guaranteed by design through the Data Room. The mapped data does not leave the room: it is processed and then stored in a Knowledge Graph that remains within the room. In this way, it stays under the control of the data originator, the Centre of Swine Studies of Catalonia (CEP).
Later, the CEP can decide to allow trusted algorithms to visit the Data Room and query the Knowledge Graph to extract the semantically integrated data relevant to their calculations. In this case too, data sovereignty is guaranteed, since only the results of the computation, such as aggregations or trained AI models, can leave the room, not the original data or subsets thereof.
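To make the "only aggregations leave the room" idea concrete, here is a minimal sketch of an algorithm visiting an in-room graph and returning a single derived figure per animal. The graph is modelled as a plain list of (subject, predicate, object) tuples with hypothetical predicate names (`animal`, `date`, `weight_g`); the two weights are the sample values shown further below in this article, and a real visiting algorithm would query an RDF Knowledge Graph instead.

```python
from collections import defaultdict
from datetime import date

# Hypothetical in-room graph as (subject, predicate, object) triples.
GRAPH = [
    ("m1", "animal", "982091062894196"),
    ("m1", "date", date(2021, 3, 16)),
    ("m1", "weight_g", 16300),
    ("m2", "animal", "982091062894196"),
    ("m2", "date", date(2021, 3, 17)),
    ("m2", "weight_g", 16500),
]


def average_daily_gain(graph):
    """Return grams gained per day for each animal; no raw rows are returned."""
    # Regroup the triples into one record per measurement.
    measurements = defaultdict(dict)
    for subject, predicate, obj in graph:
        measurements[subject][predicate] = obj

    # Collect (date, weight) points per animal.
    by_animal = defaultdict(list)
    for record in measurements.values():
        by_animal[record["animal"]].append((record["date"], record["weight_g"]))

    gains = {}
    for animal, points in by_animal.items():
        points.sort()
        (first_day, first_weight), (last_day, last_weight) = points[0], points[-1]
        days = (last_day - first_day).days
        if days > 0:  # skip animals with a single day of data
            gains[animal] = (last_weight - first_weight) / days
    return gains


result = average_daily_gain(GRAPH)
```

The individual weighings stay inside the function; what is returned is one derived number per animal.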
Currently, the CEP can directly share its existing tabular data on daily pig weight and its automatic feeding machine data. This data remains under its control, as the CEP decides which algorithms may visit it in the Data Room. One example is an exploratory data analysis algorithm that builds summaries of the data [7].
In addition, the RML Mapper algorithm [8] provides mappings for the CEP daily pig weight and automatic feeding machine data formats, which make it possible to follow the incremental, tiered approach proposed by the pay-per-use paradigm. It maps this type of CSV data to W3C RDF semantic data based on well-established vocabularies and ontologies that facilitate data integration, even across use cases and activity sectors.
The mapping implemented by this algorithm generates RDF data based on the Smart Applications REFerence ontology (SAREF), a vocabulary standardized by the European Telecommunications Standards Institute (ETSI) that facilitates data integration in the smart application domain.
For example, for the daily pig weight data:
Animal ID | Date | Weight (g) |
---|---|---|
982091062894196 | 2021-03-16 | 16300 |
And the automatic feeding machine data:
Pen | Animal ID | Date | Time | Duration (s) | Feeding (g) | Weight (g) |
---|---|---|---|---|---|---|
4 | 982091062894196 | 2021-03-17 | 10:46 | 50 | 14 | 16500 |
The RML mapping generates the RDF semantic data graph that integrates both data sources, as shown in Fig. 2.
Fig. 2. Semantic integration of CEP daily pig weight and automatic feeding machine Data (source: AgrospAI)
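The kind of output such a mapping produces can be approximated with a short script. This is a hand-written Python stand-in for what an RML mapping declares, not RML itself and not the RML Mapper algorithm. The `https://example.org/cep/` namespace and the URI patterns are hypothetical, and the choice of SAREF terms (`Measurement`, `isMeasurementOf`, `hasValue`, `hasTimestamp`) is an assumption, since the terms actually used by the mapping are not listed here.

```python
import csv
import io

SAREF = "https://saref.etsi.org/core/"
EX = "https://example.org/cep/"  # hypothetical namespace for CEP resources
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The daily pig weight sample row from the table above, as CSV.
WEIGHTS_CSV = """Animal ID,Date,Weight (g)
982091062894196,2021-03-16,16300
"""


def weight_rows_to_ntriples(csv_text):
    """Turn daily-weight CSV rows into N-Triples lines (one measurement per row)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        animal_id = row["Animal ID"]
        day = row["Date"]
        weight = row["Weight (g)"]
        animal = f"<{EX}animal/{animal_id}>"
        measurement = f"<{EX}weight/{animal_id}/{day}>"
        lines.append(f"{measurement} <{RDF_TYPE}> <{SAREF}Measurement> .")
        lines.append(f"{measurement} <{SAREF}isMeasurementOf> {animal} .")
        lines.append(f'{measurement} <{SAREF}hasValue> "{weight}"^^<{XSD}float> .')
        # The CSV only carries a date, so xsd:date is used here as a
        # simplification; a stricter mapping may need a full date-time.
        lines.append(f'{measurement} <{SAREF}hasTimestamp> "{day}"^^<{XSD}date> .')
    return lines


triples = weight_rows_to_ntriples(WEIGHTS_CSV)
print("\n".join(triples))
```

Because the animal URI is built from the Animal ID column, rows from the feeding machine file mapped with the same pattern would attach to the same animal node, which is what lets the two sources join into one graph.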
Footnotes
- https://registrationnumber.notary.lab.gaia-x.eu/development/docs
- https://dataspace.angliru.udl.cat/asset/did:op:31d6d1ea0fc540e1ea6e5268ebfd53e8129992cd6971dfbbbd0b88b08ca6f939
- https://dataspace.angliru.udl.cat/asset/did:op:34d5f73d77550843201ee1a43ad9d404d3e557ed6a70772e9afde7a27d863b8f
- https://dataspace.angliru.udl.cat/asset/did:op:d20f956e79709fb2469fffe2bd85cf2fec95a21d2497998bb530043c6bbec901