The present BDE Pilot aims to facilitate dynamical downscaling from global climate data to regional / local scales, with the support of tools aggregated on the Big Data Europe (BDE) platform. The “Pilot Partner”, i.e., the institution that will develop and run the Use Case, will be NCSR “Demokritos”.
Data acquisition, handling and management in climate research are often performed in an ad-hoc manner. Typically, observational and analysis data from external services or portals are transferred to local file systems. Researchers then pre-process the data using third-party or custom-made scripts. The pre-processed data (i.e. the resulting set of files) are then fed into the regional or local model of choice, along with a set of suitable parameters. The regional or local models are executed on local computers, in many cases manually. The resulting files are used for visualisation and further analysis either on local machines or, data size and task permitting, on the researchers’ desktops or laptops. In many cases, there is no infrastructural support for managing or archiving the data produced at the various stages. Data products are kept only for as long as they are immediately useful and are then removed to make room for new datasets. Similarly, there is often no infrastructural support for storing the lineage of data products and experimental results.
As a result, managing and archiving the data produced at the various stages can be difficult. It can also be difficult to track progress and intermediate data products, to reuse results, and to collaborate across disciplines.
The goals of the Use Case are to:
• Provide an intuitive interface between researchers and specific climate data portals and providers.
• Search and download climate model and, optionally, observational data, according to user requirements, such as geographic coverage and / or computational experiments (scenarios).
• Set up and orchestrate the execution of the dynamical downscaling process on institutional computational resources, while gathering and managing data products.
• Establish a workflow for useful metadata mappings and data lineage.
There is no generally agreed workflow to perform the dynamical downscaling process.
This Pilot will use publicly available climate data, specifically NetCDF data sets available on the ESGF federated service. For the implementation of this pilot we intend to use WRF (the Weather Research and Forecasting model) as the computational module that will perform the dynamical downscaling. It has therefore been decided that the internal metadata will be expressed in a WRF-compatible format. Appropriate mappings will be put in place so that the conversion takes place automatically upon ingestion and is transparent to the user.
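The idea of mapping metadata automatically upon ingestion can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the text does not specify the pilot's mapping tables, so the `CMIP_TO_WRF` dictionary, the `VariableMetadata` type and the `map_on_ingestion` function are hypothetical names, and the name pairs (CMIP-style short names to WPS/WRF-style field names) are examples rather than the pilot's actual mappings.

```python
from dataclasses import dataclass

# Hypothetical mapping table (illustrative only): CMIP-style variable
# short names -> WPS/WRF-style field names. The pilot's real mappings
# are not given in the text.
CMIP_TO_WRF = {
    "ta": "TT",     # air temperature
    "ua": "UU",     # eastward wind
    "va": "VV",     # northward wind
    "hur": "RH",    # relative humidity
    "zg": "GHT",    # geopotential height
    "ps": "PSFC",   # surface pressure
    "psl": "PMSL",  # sea-level pressure
}


@dataclass(frozen=True)
class VariableMetadata:
    """Minimal stand-in for the metadata of one NetCDF variable."""
    name: str
    units: str


def map_on_ingestion(variables):
    """Rename known variables; collect the names that have no mapping.

    Returns a pair (mapped, unmapped) so that unknown variables are
    reported to the caller instead of being dropped silently.
    """
    mapped, unmapped = [], []
    for var in variables:
        target = CMIP_TO_WRF.get(var.name)
        if target is None:
            unmapped.append(var.name)
        else:
            mapped.append(VariableMetadata(name=target, units=var.units))
    return mapped, unmapped
```

Returning the unmapped names separately keeps the conversion transparent for the common case while still letting an operator see which variables would need a new mapping entry.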
The functional roles involved in this first Pilot Use Case are:
1. Research site/centre (stakeholder): A research site involved in atmospheric modelling and climate change research could make use of the functionality of this pilot either as an external service/product or by deploying an internal instance of the BDE platform. By adopting the BDE services related to this pilot use case, a research site would increase the potential of its internal processes while requiring minimal disruption to, and expansion of, its infrastructure and internal policies.
2. Researcher (primary user): By using this pilot, climate researchers would be able to search for data effectively, potentially across different providers. Further, they would be able to perform processing tasks on the BDE platform or on their departmental resources; this pilot will initially focus on the second case. By performing the dynamical downscaling process through the BDE platform they will gain data provenance information, which will increase the efficiency of experimental runs (by avoiding processing the same data twice) as well as the value of their experiments (as they will be archivable and retrievable for future reference, publication and scientific replication). The pilot use case will also make it easier for primary users to perform multiple downscaling computations and inter-compare their results.
Regarding the Pilot Architecture, the BDE components involved are the following:
• Sextant Demonstrator: A UI component which is able to effectively query data initially encoded in the NetCDF format based on variables as well as on additional metadata. A Strabon-based demonstrator in concert with the underlying SemaGrow stack will optimise querying against local climate data, fetching data from external sources as needed (web services adapter, data caching and resolution and processing orchestration).
• The lineage manager will associate actions, such as querying and pre-processing data, to the data products obtained. This will serve as metadata to the data products and will allow climate experts to track actions taken on specific pieces of data. Lineage information will be stored in Apache Cassandra, or in alternative column-based stores.
• Apache Hive will act as the interface between the actual climate data files and the data querying and lineage stack. Raw data files will ultimately be stored on Hadoop HDFS. Hadoop MapReduce will be used implicitly by Apache Hive.
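The role of the lineage manager described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pilot's implementation: `InMemoryLineageStore` and `lineage_key` are hypothetical names, a Python dictionary stands in for the Apache Cassandra store, and inputs are treated as an unordered set of file identifiers.

```python
import hashlib
import json


def lineage_key(action, inputs, params):
    """Derive a deterministic key from an action, its inputs and parameters.

    Assumption: input order does not matter, so inputs are sorted first.
    """
    payload = json.dumps(
        {"action": action, "inputs": sorted(inputs), "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryLineageStore:
    """Hypothetical stand-in for a lineage table in a column-based store."""

    def __init__(self):
        self._records = {}

    def record(self, action, inputs, params, product):
        """Associate an action with the data product it produced."""
        key = lineage_key(action, inputs, params)
        self._records[key] = {
            "action": action,
            "inputs": sorted(inputs),
            "params": params,
            "product": product,
        }
        return key

    def find_product(self, action, inputs, params):
        """Return the existing product for this exact action, or None."""
        rec = self._records.get(lineage_key(action, inputs, params))
        return rec["product"] if rec else None

    def history(self, product):
        """List the recorded actions that produced a given data product."""
        return [r for r in self._records.values() if r["product"] == product]
```

Before launching a pre-processing step, a caller could check `find_product(...)` and skip the work when a matching record exists, which is one way to avoid processing the same data twice; `history(...)` lets an expert trace the actions behind a specific data product.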
The development of this pilot therefore aims at improving the productivity of researchers by making it easier for them to manage external data sets originating from different sources, to ingest and transform them into the data formats of choice, and to oversee the execution of model runs over the data, making use of existing infrastructure and procedures.
Providing such facilities to climate researchers will enable efficiency and reusability in downscaling experiments, which in turn will open up opportunities for well-integrated pilots across communities within the BDE platform. Downscaling computations produce more detailed (in space and time) values of climate-related physical quantities, which are subsequently used to assess climate change impacts on regional and local scales. Climate change impact assessment studies on sectors such as energy, food and agriculture are potential future pilot use cases across societal challenges in the BDE platform.