Transport Pilot


Pilot for Smart, green and integrated transport

Congestion is a major problem in Europe, especially in urban areas. Current advances in information and communication technologies and big data analytics can be exploited to improve the collection of mobility-related data and to use real-time data for accurate ‘info-mobility’ services and advanced transport planning. This should lead to better decisions by travellers and to improved traffic management at city level by the respective traffic management authorities. Using mobility data from multiple sources presents significant challenges, mainly because the datasets differ in both content and spatio-temporal characteristics. A further challenge is that the data must be collected and processed in real time. New methods and tools from the big data analytics sector are needed to significantly improve the current state of the art in transport data exploitation.

CERTH and Thessaloniki


Thessaloniki has grown into a real-world transport test bed in Europe thanks to the continuous effort of the Hellenic Institute of Transport of the Centre for Research and Technology Hellas (CERTH-HIT). The centre has implemented various national and European projects that have provided state-of-the-art equipment and transport services to the city, so that CERTH-HIT is now able to provide advanced real-time traveller information services to citizens. Innovative data collection methods include a static sensor network tracking over 40 IDs via Bluetooth; dynamic sensors providing real-time floating car data from a fleet of over 1200 vehicles; cooperative technologies; and data obtained from social media.

Bluetooth sensors are installed throughout Thessaloniki

The streets of Thessaloniki are silent witnesses to the first large-scale deployment at European level of cooperative services for both passenger (Compass4D) and freight (CO-GISTICS) transport, both coordinated by ERTICO-ITS Europe, while its port will be one of the first in Europe to host autonomous vehicles (LOGIMATIC).

The pilot makes use of real-time Twitter data

These assets supported the selection of Thessaloniki as the transport pilot of the Big Data Europe project, in which big data techniques will significantly improve the existing services by increasing the processing capabilities of CERTH-HIT. The new services will mostly concern better forecasting, based on mobility and traffic pattern recognition using data from multiple sources. The aim is for citizens to benefit from more precise, faster and more detailed information through improved travel time estimation, traffic flow estimates, road hazard detection and traffic model simulations.

Webinars and Presentations

The hangout held on 27 June 2016 covers the pilot in detail. You can watch the complete recording below, or find the individual slide decks and brief details on the recap page.

All the presentations from the 2nd Transport workshop, held in late September 2016, are also available. The presentations Introducing Thessaloniki (PDF) by Josep Maria Salanova (CERTH) and Technical background of the Transport pilot (PDF) by Luigi Selmi (Fraunhofer IAIS) are particularly informative.

Challenges and Approach

The pilot tackles some interesting challenges around using sensor data, spatial databases, GPS data, messages from social networks, webcams and, probably in the near future, data from the vehicles themselves. Data sources in the transport domain are characterised by a time dimension and a spatial dimension, and most of the data arrives as streams of records whose value decreases quickly with time: there is little interest in knowing that a street is congested or closed for maintenance when people are already stuck on it. The architecture is based on the concept of microservices, where an application or service is implemented using different loosely coupled components for data ingestion, communication, processing and storage.
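
To make the idea of loosely coupled components concrete, here is a minimal in-process sketch in Python. It is not the pilot's code: the standard library's `queue.Queue` stands in for a message broker, and the function names (`ingest`, `process`, `store`, `run_pipeline`), the `MAX_AGE_S` threshold and the record fields are all hypothetical. It also shows one simple way of acting on the fact that records lose value quickly: anything older than a freshness threshold is dropped before storage.

```python
import queue
import threading

MAX_AGE_S = 300  # hypothetical freshness threshold: drop records older than 5 minutes
STOP = object()  # sentinel telling a downstream stage that the stream has ended


def ingest(records, out_q):
    """Ingestion component: push raw records onto the outgoing queue."""
    for record in records:
        out_q.put(record)
    out_q.put(STOP)


def process(in_q, out_q, now):
    """Processing component: keep only records still fresh at time `now`."""
    while True:
        record = in_q.get()
        if record is STOP:
            out_q.put(STOP)
            break
        if now - record["timestamp"] <= MAX_AGE_S:
            out_q.put(record)


def store(in_q, sink):
    """Storage component: append records to a list standing in for a database."""
    while True:
        record = in_q.get()
        if record is STOP:
            break
        sink.append(record)


def run_pipeline(records, now):
    """Wire the components together; they share nothing except the queues."""
    raw_q, clean_q, sink = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=ingest, args=(records, raw_q)),
        threading.Thread(target=process, args=(raw_q, clean_q, now)),
        threading.Thread(target=store, args=(clean_q, sink)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return sink


if __name__ == "__main__":
    sample = [
        {"segment": "A", "timestamp": 1000, "speed_kmh": 32.0},
        {"segment": "B", "timestamp": 400, "speed_kmh": 12.5},  # stale at now=1100
    ]
    print(run_pipeline(sample, now=1100))
```

Because each stage only reads from and writes to a queue, any one of them could be replaced or scaled independently, which is the property the microservice design is after.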

Apache Kafka is used as a message broker and allows the components to communicate asynchronously. Apache Flink is used to process both stream data, such as the taxi data provided by our partner CERTH, and batch data. Elasticsearch, a document database based on the open source search engine Apache Lucene, is used to store the records after processing. All these components, and others specific to our pilot, are provided by the BDE project as Docker images that can be deployed on a single host where a Docker engine has been installed, or in a distributed environment whose nodes are members of a Docker swarm.
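
The kind of processing applied to such a stream can be illustrated without any of these systems installed. The sketch below, in plain Python, groups floating car records into fixed (tumbling) time windows per road segment and computes the mean speed, returning one dictionary per window in a shape that could be stored as a document. It does not use the Kafka, Flink or Elasticsearch APIs, and the field names (`segment`, `timestamp`, `speed_kmh`), the 60-second window and the function name are assumptions made for illustration, not the pilot's actual schema or job.

```python
from collections import defaultdict

WINDOW_S = 60  # hypothetical window length in seconds


def mean_speed_per_window(records, window_s=WINDOW_S):
    """Aggregate records into tumbling windows keyed by (segment, window start)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of speeds, record count]
    for record in records:
        # Align the timestamp to the start of the window it falls into.
        window_start = record["timestamp"] - record["timestamp"] % window_s
        bucket = sums[(record["segment"], window_start)]
        bucket[0] += record["speed_kmh"]
        bucket[1] += 1
    # Emit one document per (segment, window), sorted for a stable output order.
    return [
        {
            "segment": segment,
            "window_start": window_start,
            "mean_speed_kmh": total / count,
            "count": count,
        }
        for (segment, window_start), (total, count) in sorted(sums.items())
    ]


if __name__ == "__main__":
    sample = [
        {"segment": "A", "timestamp": 0, "speed_kmh": 30.0},
        {"segment": "A", "timestamp": 59, "speed_kmh": 50.0},
        {"segment": "A", "timestamp": 60, "speed_kmh": 20.0},
        {"segment": "B", "timestamp": 10, "speed_kmh": 40.0},
    ]
    for doc in mean_speed_per_window(sample):
        print(doc)
```

This version works on a finite list for simplicity; a stream processor would instead emit each window's result incrementally as the window closes.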


Instructions for installing and running the pilot, along with the necessary Docker images, are provided in the BDE GitHub repository.

Thessaloniki signpost image: some rights reserved by Sotiris Marinopoulos, CC-BY-NC-ND licence.