Food & Agriculture Pilot
Pilot for Food security, sustainable agriculture and forestry, marine and maritime and inland water research, and the Bioeconomy
Discovering and linking information is a problem in every major area of agricultural research and in agriculture in general. This is especially true in viticulture, where different research methodologies produce large amounts of heterogeneous data from diverse sources. Scientists need to find all this information, analyse it and correlate it in order to provide integrated solutions to the emerging problems of the European and global vineyard. These problems arise largely from the impact of climate change, which makes the use of appropriate grapevine varieties very important. Other factors to bear in mind include the intensity of diseases, the intensification of cultivation, and the proper implementation of precision viticulture systems that affect the quality of viticultural products and their role in human health.
The overall goal of the SC2 Pilot is to demonstrate the ability of Big Data technologies to complement existing community-driven systems (e.g. VITIS for the Viticulture Research Community) with efficient large-scale back-end processing workflows.
The pilot deployment is organised in three cycles, each with different targeted objectives:
The goal of Pilot Cycle 1 is to showcase a large-scale processing workflow that automatically annotates scientific publications relevant to Viticulture. The focus of this first demonstrator cycle is on the Big Data aspects of such a workflow (i.e. storage, messaging and failure management) and not on the specifics of the NLP modules/tools used in the demonstrator.
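The messaging and failure-management aspects mentioned above can be illustrated with a minimal sketch. It is not the pilot's implementation: an in-memory queue stands in for the message broker, and the names `annotate`, `run_workflow` and `MAX_ATTEMPTS`, as well as the retry budget of three, are assumptions made for illustration only.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the pilot


def annotate(publication):
    """Stand-in for an NLP annotation step; fails when no text was extracted."""
    text = publication["text"]
    if not text:
        raise ValueError("no text extracted")
    # Toy annotation: flag publications that mention grapevines.
    return {"id": publication["id"], "mentions_vitis": "vitis" in text.lower()}


def run_workflow(publications):
    """Process queued messages, retrying failures a bounded number of times."""
    queue = deque((pub, 1) for pub in publications)  # (message, attempt number)
    annotated, dead_letters = [], []
    while queue:
        pub, attempt = queue.popleft()
        try:
            annotated.append(annotate(pub))
        except ValueError:
            if attempt < MAX_ATTEMPTS:
                queue.append((pub, attempt + 1))  # re-queue for another attempt
            else:
                dead_letters.append(pub["id"])  # park it for later inspection
    return annotated, dead_letters


annotated, dead_letters = run_workflow([
    {"id": "pub-1", "text": "Vitis vinifera response to drought"},
    {"id": "pub-2", "text": ""},
])
```

In this sketch a message that keeps failing ends up in a dead-letter list instead of blocking the rest of the workflow, which is one common way to keep a large batch moving while failures are inspected separately.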
Pilot Cycle 2 (SC2 Pilot Maturity / Functionality Expansion)
The goal of this Pilot Cycle is to showcase the ability of scalable processing workflows to handle a variety of data types (beyond bibliographic data) relevant to Viticulture.
Pilot Cycle 3 (Lowering SC2 Community Boundaries)
The goal of this Pilot Cycle is to provide an engaging, intuitive graphical web interface that addresses key data-oriented questions relevant to the Viticulture Research Community and, if possible, intuitive interfaces that let end-users share and link the data they generate in the field.
Primary Content/Data Involved
In SC2 Pilot Cycle 1, content mainly refers to open scientific publications relevant to Viticulture, available at FAO/AGRIS and NCBI/PubMed in PDF format (about 26K and 7K publications respectively). In Cycle 2, the content pool has been extended to include:
- Weather data, accessible through public APIs (e.g. OpenWeatherMap, Weather Underground, AccuWeather)
- User-generated data, e.g. geotagged photos of leaves, young shoots and grape clusters, ampelographic data and SSR-marker data
Additional data sources include:
- Sensor data on temperature, humidity and luminosity, retrieved from sensors installed in selected experimental vineyards,
- ESA Copernicus Sentinel-2 data, for selected experimental vineyards.
These data are included to complement the existing SC2 Pilot Demonstrator Knowledge Base, so that it can support complex real-life research questions based on correlating environmental conditions with real observations of crop production and quality.
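As a rough illustration of what correlating environmental conditions with field observations can mean in practice, the sketch below joins two small data sets on a shared key and computes a Pearson correlation coefficient. All values, keys and names (`humidity`, `disease_score`, `plot-A`) are made up for the example and are not pilot data; the pilot may well use a different measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up records keyed by (vineyard, day); not pilot data.
humidity = {("plot-A", "day-1"): 55.0, ("plot-A", "day-2"): 70.0,
            ("plot-A", "day-3"): 85.0}
disease_score = {("plot-A", "day-1"): 1.0, ("plot-A", "day-2"): 2.0,
                 ("plot-A", "day-3"): 3.0}

# Join the two sources on their shared keys before correlating.
keys = sorted(humidity.keys() & disease_score.keys())
r = pearson([humidity[k] for k in keys], [disease_score[k] for k in keys])
```

The join step is the important part: sensor, weather and user-generated records only become comparable once they are aligned on a common location and time key.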
BDE SC2 Pilot Architecture / BDI Components Used
The SC2 Pilot showcases the ability of scalable processing workflows to handle a variety of data types relevant to Viticulture, such as weather and user-generated data. The extracted information (metadata and digital objects) extends the Knowledge Base of the VITIS application: metadata are stored as triples in GraphDB, and digital objects (files) are stored in HDFS. The Pilot makes use of the following BDI components:
- Apache Flume – Data ingestion
- Apache Kafka – Messaging
- Apache Spark – Distributed analysis, transformation
- Apache HDFS – Raw data storage
- SWC PoolParty – Concept/Vocabulary Store
- GraphDB – Triple Store
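The split between metadata (triples) and digital objects (files) described above can be sketched as follows. This is an illustration under stated assumptions, not the pilot's code: the namespace `http://example.org/vitis/`, the directory `/vitis/objects` and the function names are hypothetical, and the choice of Dublin Core terms as predicates is an assumption. Loading the triples into the triple store and writing the file to HDFS are left out.

```python
BASE = "http://example.org/vitis/"   # hypothetical namespace
DCT = "http://purl.org/dc/terms/"    # Dublin Core terms, assumed as predicates
HDFS_ROOT = "/vitis/objects"         # hypothetical HDFS directory


def literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def route(item):
    """Split one extracted item into triples (metadata) and a file target path."""
    subject = f"<{BASE}{item['id']}>"
    triples = [f"{subject} <{DCT}{key}> {literal(val)} ."
               for key, val in sorted(item["metadata"].items())]
    hdfs_path = f"{HDFS_ROOT}/{item['id']}/{item['filename']}"
    return triples, hdfs_path


triples, hdfs_path = route({
    "id": "photo-42",
    "filename": "leaf.jpg",
    "metadata": {"title": "Leaf photo", "creator": "field user"},
})
```

Each line in `triples` is an N-Triples statement that a triple store could ingest, while `hdfs_path` is where the corresponding file would be written, so the two stores stay linked through the shared item identifier.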
The detailed pilot architecture is presented below: