The term spatial (or geospatial) data describes data or information identified by a geographic location on Earth. Spatial data are therefore described by coordinates together with the information they carry; this characteristic allows them to be mapped, visualized and analysed with applications such as Geographic Information Systems (GIS).
Spatial data are generally divided into two categories: raster and vector. A raster consists of a matrix of cells (or pixels) organized into rows and columns (a grid), where each cell contains a value representing information. One of the most common examples of a raster is a satellite image, where each pixel is described by a number and a specific location on the surface. Raster data are therefore used to represent continuous information, e.g. land cover, temperature and height. Vector data are used to represent discrete features through points, lines or polygons. Features such as rivers, lakes, boundaries and point information are generally represented by vector data.
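The difference between the two categories can be illustrated with a minimal Python sketch. The class names `Raster` and `VectorFeature`, their fields and all values are hypothetical and chosen only for illustration; real GIS libraries use richer models (for example, coordinate reference systems and separate cell widths and heights).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (x, y), e.g. longitude and latitude


@dataclass
class Raster:
    """A grid of cell values anchored at a geographic location (hypothetical model)."""
    values: List[List[float]]  # values[row][col]; row 0 is the top (northern) row
    origin: Coordinate         # coordinates of the top-left corner of the grid
    cell_size: float           # width and height of one cell, in coordinate units

    def cell_center(self, row: int, col: int) -> Coordinate:
        """Return the coordinates of the center of the cell at (row, col)."""
        x0, y0 = self.origin
        x = x0 + (col + 0.5) * self.cell_size
        y = y0 - (row + 0.5) * self.cell_size  # y decreases as rows go down
        return (x, y)


@dataclass
class VectorFeature:
    """A discrete feature: a point, line or polygon plus its attributes."""
    kind: str                      # "point", "line" or "polygon"
    coordinates: List[Coordinate]  # one pair for a point, several otherwise
    attributes: Dict[str, str]


# A 2 x 3 raster of temperatures (continuous information); values are made up.
temperature = Raster(
    values=[[14.0, 14.5, 15.0],
            [13.5, 14.0, 14.5]],
    origin=(10.0, 50.0),
    cell_size=0.5,
)

# Vector features (discrete information); names and positions are made up.
station = VectorFeature("point", [(10.25, 49.75)], {"name": "station A"})
river = VectorFeature(
    "line", [(10.0, 49.0), (10.5, 49.5), (11.5, 50.0)], {"name": "river B"}
)

print(temperature.cell_center(0, 0))  # (10.25, 49.75)
print(temperature.values[1][2])       # 14.5
```

In this sketch every raster cell has a value, and its location is derived from the grid position, whereas each vector feature stores its own coordinates explicitly.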
With the Big Data revolution, massive amounts of geospatial data are being collected at a rate that increases every day, so a new term was coined to describe the union of Big Data and spatial data: Spatial Big Data (SBD).
Datasets for security applications comply with the SBD definition. In more detail, the rapidly increasing volume, velocity, variety, veracity and value of data coming from spatial sources raise new issues, such as the management and exploitation of extremely large and complex datasets. Regarding volume, the Sentinel satellites, the main European Earth Observation satellites, will deliver images on the order of terabytes each day (Sentinel 1 and Sentinel 2 alone will deliver 2.6 TB of images per day).
Data for security are not restricted to satellite images. Any data that can be associated with a geographic position can be used: aerial imagery (e.g. from Remotely Piloted Vehicles), intelligence sources, GPS data, media, public data, web-based communities, user-generated content, video sharing sites, wikis, blogs and other publicly available sources.
Building on these data, applications such as evacuation route planning, monitoring of critical infrastructures, and surveillance and tracking for border security and maritime control are now possible. New challenges in infrastructure development, analytics capabilities and insight processes must therefore be tackled in order to acquire, store, manage, query, analyse and disseminate this bulk of information.
Recent advances in technology aim to address these challenges. The emphasis is placed on developing generic, scalable and fault-tolerant systems that support distributed processing. The state of the art in this direction is the lambda architecture, which is robust against hardware failures and human mistakes while being able to serve a wide range of workloads and use cases. In essence, it supports two procedures: batch processing, which parallelizes off-line applications that handle large volumes of immutable data stored on disk, and speed processing, which extracts the most essential information from a stream of data in real time (i.e. with low latency). Many frameworks that can be used to implement this architecture, such as Apache Spark, are publicly available on the Web. The goal of BigDataEurope is to take the best of the available solutions and combine them in an easy-to-use, versatile and robust infrastructure.
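The two procedures of the lambda architecture can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not the BigDataEurope implementation: the event schema, the function and class names, the merging `query` step and all values are hypothetical. A production system would run these steps on a distributed framework rather than in a single process.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event schema: (region identifier, number of observations).
Event = Tuple[str, int]


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch processing: recompute totals from scratch over the immutable data."""
    view: Counter = Counter()
    for region, count in master_dataset:
        view[region] += count
    return view


class SpeedLayer:
    """Speed processing: update a small real-time view one event at a time."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def process(self, event: Event) -> None:
        region, count = event
        self.view[region] += count


def query(region: str, batch: Counter, realtime: Counter) -> int:
    """Answer a query by merging the batch view with the real-time view."""
    return batch[region] + realtime[region]


# Events already stored on disk (made-up values).
stored_events: List[Event] = [("north", 3), ("south", 5), ("north", 2)]
batch = batch_view(stored_events)

# Events that arrived after the last batch run (made-up values).
speed = SpeedLayer()
for event in [("north", 1), ("east", 4)]:
    speed.process(event)

print(query("north", batch, speed.view))  # 6
print(query("east", batch, speed.view))   # 4
```

The batch view is always recomputed from the complete stored dataset, so a bug in the processing logic can be corrected by fixing the code and rerunning the batch. The speed layer only has to cover the events that arrived since the last batch run.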