The pilot has been implemented by the EU Satellite Centre (Domain Expert), the University of Athens (Technical Leader) and NCSR “Demokritos”. The following architecture has been designed to meet the pilot requirements.
In total, there are ten components, which can be grouped into three workflows:
- The Change Detection workflow involves the three components at the bottom of the diagram. The Image Aggregator receives the coordinates of an Area of Interest (AoI), along with the date(s) of interest, from the user or from the Event Detector. It then downloads the corresponding images through the Sentinels Scientific Data Hub and stores them in HDFS. Currently, the system handles Sentinel-1 Ground Range Detected (GRD) images. In the next step, the Change Detector module retrieves the images and compares them in order to identify human-made changes. To speed up this time-consuming procedure, its functionality is distributed over multiple nodes using Spark. This can be accomplished in two ways: the image-centric approach assigns each new set of images to a specific node, while the tile-centric approach distributes sets of tiles from the same image collection to all available nodes. In both cases, the same process is applied: every image is individually calibrated, then all images are co-registered (so that they overlap on the same set of coordinates) and finally they are juxtaposed to spot human-made changes.
- The Event Detection workflow involves the three components at the top of the diagram. The News Crawler monitors the Reuters RSS feed as well as specific Twitter accounts, keywords and the public stream. Data and metadata extracted from the collected news items are stored in Cassandra, which offers high scalability. Periodically, the Event Detector retrieves the latest news items and clusters them into events, which are stored back in Cassandra. Special care is taken to extract the geolocation of every event with the help of a GeoNames lookup service that is stored in Strabon. The corresponding coordinates are forwarded to the Change Detector, triggering an off-line process that prepares the results so that they can be presented to the user on demand. Finally, the Event Detector builds a summary for every event to facilitate its retrieval.
- The Core workflow involves the four components in the middle of the diagram. GeoTriples receives the outcomes of the other two workflows, which comprise two kinds of information: (i) a set of coordinates that correspond to the geolocation of an event or of an area with human-made changes; (ii) additional information, which provides either a short event description (title and time) or the type of human-made changes. GeoTriples converts this information into RDF triples and stores them in Strabon. Sextant is the user interface that enables a user to perform all necessary tasks, i.e., to trigger the Change Detection workflow for a specific area of interest, to inspect the areas with human-made changes, to retrieve the description of the latest events that pertain to a specific area or keyword and to see the content of the news items forming an event. Sextant receives this information through SemaGrow, which provides federated access to Strabon and Cassandra.
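The difference between the two distribution strategies of the Change Detector can be illustrated with a minimal sketch. It is not the pilot's Spark code: the function names, the string identifiers for image sets and the round-robin assignment are all assumptions made for illustration, and in practice Spark's own scheduler decides where each partition runs.

```python
from typing import Dict, List, Tuple

# Hypothetical identifiers: an "image set" is the stack of images for one AoI.
ImageSet = str
Tile = Tuple[ImageSet, int]  # (image set, tile index)


def image_centric(image_sets: List[ImageSet], nodes: int) -> Dict[int, List[ImageSet]]:
    """Assign every whole image set to exactly one node (round-robin, assumed)."""
    plan: Dict[int, List[ImageSet]] = {n: [] for n in range(nodes)}
    for i, image_set in enumerate(image_sets):
        plan[i % nodes].append(image_set)
    return plan


def tile_centric(image_set: ImageSet, tiles: int, nodes: int) -> Dict[int, List[Tile]]:
    """Spread the tiles of a single image set over all nodes (round-robin, assumed)."""
    plan: Dict[int, List[Tile]] = {n: [] for n in range(nodes)}
    for t in range(tiles):
        plan[t % nodes].append((image_set, t))
    return plan


if __name__ == "__main__":
    # Three image sets on two nodes: each set stays on one node.
    print(image_centric(["aoi_a", "aoi_b", "aoi_c"], nodes=2))
    # One image set cut into four tiles: every node works on the same set.
    print(tile_centric("aoi_a", tiles=4, nodes=2))
```

In the image-centric plan a single request for one AoI occupies only one node, whereas the tile-centric plan lets all nodes contribute to the same request, which is why the latter targets the latency of an individual change detection run.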
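As a rough illustration of what the RDF output for a detected event might look like, the sketch below serialises an event title and a point geolocation as N-Triples using the GeoSPARQL vocabulary. This is an assumption-laden example, not GeoTriples itself: GeoTriples works from mapping definitions, and the `http://example.org/pilot/` namespace, the function name and the choice of `dcterms:title` are hypothetical.

```python
from typing import List

GEO = "http://www.opengis.net/ont/geosparql#"  # GeoSPARQL vocabulary
DCT = "http://purl.org/dc/terms/"              # Dublin Core terms
EX = "http://example.org/pilot/"               # hypothetical namespace


def event_to_ntriples(event_id: str, title: str, lon: float, lat: float) -> List[str]:
    """Serialise one event (title + point location) as three N-Triples lines."""
    subject = f"<{EX}event/{event_id}>"
    geometry = f"<{EX}event/{event_id}/geometry>"
    # Escape backslashes and double quotes so the literal stays valid N-Triples.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'{subject} <{DCT}title> "{escaped}" .',
        f"{subject} <{GEO}hasGeometry> {geometry} .",
        # WKT literals without an explicit CRS use longitude-latitude order.
        f'{geometry} <{GEO}asWKT> "POINT({lon} {lat})"^^<{GEO}wktLiteral> .',
    ]


if __name__ == "__main__":
    for line in event_to_ntriples("42", "Flood near Athens", 23.7, 37.9):
        print(line)
```

Triples of this shape could then be loaded into a geospatial RDF store and queried by area, which is the kind of lookup the user interface relies on when it retrieves events for a specific area of interest.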
Currently, the largest part of the pilot architecture has been completed. The necessary modifications to the user interface of Sextant have been carried out and the Event Detection workflow has been integrated with the Change Detection one. Both workflows have also been integrated with the Core one. Two pending issues must be resolved before the individual workflows become operational: the implementation of the tile-centric approach for Change Detection and the adaptation of the Event Detector's parallel algorithms to the Apache Spark framework. Both issues are expected to be addressed by the end of April 2016. The pilot is planned to be set up on the BigDataEurope infrastructure at the beginning of May, while its performance and scalability will be tested from mid-May until the end of May. Future work will be dedicated to the ingestion of other remote sensing data (e.g., Sentinel-1 SLC images and Sentinel-2 data) and other social sensing sources (e.g., additional news feeds).