Climate Pilot


Pilot for Climate Action, Environment, Resource Efficiency and Raw Materials

Building on its foundations for promoting multidisciplinary scientific research, NCSR-Demokritos has delivered functionality to the big data ecosystem, as implemented by the BigDataEurope Platform and institutional infrastructures carrying out climate and weather research.

The first SC5 pilot aimed to demonstrate the potential of big data technologies to support scientific workflows and to facilitate the computation and handling of data related to dynamic down-scaling of climate and/or weather. In addition, it attempted to decouple the BDE platform from the institutional infrastructure used for down-scaling and for climate or weather modelling. This design decision was taken so that the BDE platform can be updated with minimal disruption to the scientists' everyday workflows.

Focusing on the NetCDF format, which is widely used for the storage and exchange of climate and weather modelling data, the first SC5 pilot built components to demonstrate the following:

  1. Ingestion and exporting of climate and weather data found in NetCDF files.
  2. Recording of data lineage, to allow the reproducibility of computational experiments.
  3. Arranging and executing down-scaling operations on a relevant NCSR-D institutional infrastructure, using the Weather Research and Forecasting (WRF) model, which is widely used to down-scale climatic data from global to local spatial scales.
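
As an illustration of the ingestion step in item 1, the following minimal sketch flattens a gridded variable into one row per grid cell, the kind of tabular shape a Hive table could hold. It is not the pilot's actual code: the function name flatten_grid, the (time, lat, lon) dimension order, the row layout and the sample values are all assumptions made for this example, and plain Python lists stand in for data that would normally be read from a NetCDF file.

```python
from itertools import product


def flatten_grid(times, lats, lons, values):
    """Yield one (time, lat, lon, value) row per grid cell.

    `values` is indexed as values[t][y][x], matching the (time, lat, lon)
    dimension order assumed in this sketch.
    """
    for (ti, t), (yi, lat), (xi, lon) in product(
        enumerate(times), enumerate(lats), enumerate(lons)
    ):
        yield (t, lat, lon, values[ti][yi][xi])


# Hypothetical temperature field: 1 time step, 2 latitudes, 2 longitudes.
times = ["2016-01-12T00:00Z"]
lats = [37.5, 38.0]
lons = [23.5, 24.0]
t2m = [[[281.2, 281.9], [280.4, 280.7]]]

rows = list(flatten_grid(times, lats, lons, t2m))
for row in rows:
    print(row)
```

Each row carries its own coordinates, so it can be filtered or aggregated without reference to the original array layout, at the cost of repeating the coordinate values.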

BDE Components Used

The SC5 team at NCSR-Demokritos has extensive experience in climate and weather research, as well as in data science and big data tools. Further, it has previously been instrumental in developing the Semagrow query federator for Linked Data, peripheral components of which have been used for the implementation of this pilot.

The 1st SC5 pilot makes use of the following BDE components:

  • Semagrow; for ingesting NetCDF data into Hive tables.
  • Apache HDFS and Hive; for storing climate and weather data in raw format. This provides flexibility for querying and analysing climate projections.
  • Cassandra; for storing relevant metadata originally found in the NetCDF files, and for storing data lineage that describes the operations performed on the data, to allow reproducibility.
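
To make the idea of a data lineage record more concrete, here is a minimal sketch of what such a record might contain. It is a hypothetical illustration and not the pilot's schema: the LineageRecord fields, the use of SHA-256 digests to identify inputs and outputs, and the sample parameters (including resolution_km) are assumptions, and the record is serialised to JSON rather than written to Cassandra.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record of one operation applied to some data."""
    operation: str
    input_digests: tuple
    parameters: dict
    output_digest: str


def digest(data: bytes) -> str:
    """Identify a piece of data by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


def record_operation(operation, inputs, parameters, output):
    """Build a lineage record linking inputs and parameters to an output."""
    return LineageRecord(
        operation=operation,
        input_digests=tuple(digest(i) for i in inputs),
        parameters=dict(parameters),
        output_digest=digest(output),
    )


def to_json(record):
    """Serialise a record with sorted keys so equal records serialise equally."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder byte strings stand in for real input and output files.
raw = b"placeholder bytes standing in for a NetCDF input file"
downscaled = b"placeholder bytes standing in for down-scaled output"
params = {"model": "WRF", "resolution_km": 5}  # assumed example values

rec = record_operation("downscale", [raw], params, downscaled)
print(to_json(rec))
```

Because the record is derived only from the data and the parameters, repeating the same operation on the same inputs yields an identical record, which is the property that makes a computational experiment checkable after the fact.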

Webinars and Workshops

Pertaining to the 1st SC5 pilot, the SC5 team has organised and carried out two webinars and two workshops:

  • Webinar 1; 12 January 2016. The aim of this webinar was to introduce the activities and recent developments on the use of big data in climate action within BDE and present the first pilot use case.
  • Workshop 1; 15 June 2015, Brussels. The purpose of this workshop was to identify current as well as future big data challenges in the general climate domain.
  • Webinar 2; 12 July 2016. The aim of this webinar was to present a working version of the pilot and to evaluate it during a hands-on exercise with the participants.
  • Workshop 2; 11 October 2016, Brussels. During this workshop, we presented the outcome and evaluation of the 1st SC5 pilot and discussed plans for the 2nd SC5 pilot.

Source Code and Installation

Installation of the pilot depends on the WRF model being operational on a locally accessible infrastructure. The source code can be found and downloaded from GitHub.