FAQ
This page includes answers to the questions the BDE partners are asked most frequently.
Big Data is famously characterised by the 4 Vs:
Volume: the platform is designed to handle arbitrarily large amounts of data.
Velocity: the platform is designed to handle real time data, such as climate, energy and transport sensor data. More complex computational tasks can be handled through batch processing, that is, one chunk at a time, with results returned after processing has been completed.
Variety: the platform makes use of Linked Data technologies to ‘semantify the data,’ that is, to add meaning to the data in whatever format it is in, allowing data from different sources, from different domains and with different licensing conditions to be integrated with relative ease.
Veracity: the provenance of all data handled by the platform is tracked.
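The difference between real-time and batch handling described under Velocity can be illustrated with a minimal Python sketch. It is not part of the BDE platform; the function names (chunks, batch_means, running_mean) and the sample readings are hypothetical and chosen only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(readings: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` readings."""
    iterator = iter(readings)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def batch_means(readings: Iterable[float], size: int) -> List[float]:
    """Batch style: process one chunk at a time and return the
    results only after every chunk has been processed."""
    return [sum(chunk) / len(chunk) for chunk in chunks(readings, size)]


def running_mean(readings: Iterable[float]) -> Iterator[float]:
    """Real-time style: emit an updated mean as each reading arrives."""
    total = 0.0
    for count, value in enumerate(readings, start=1):
        total += value
        yield total / count


# Made-up sensor readings, for illustration only.
print(batch_means([1, 2, 3, 4, 5], size=2))   # [1.5, 3.5, 5.0]
print(list(running_mean([2, 4, 6])))          # [2.0, 3.0, 4.0]
```

In the batch version the caller sees nothing until the whole input has been consumed, whereas the running version produces a usable value after every reading.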
Big Data Europe is building a powerful, flexible, customisable platform that can handle data ingestion, integration and processing at scale.
Small, medium and large-sized entities from any sector of industry, research or the public sector that have much to gain from making sense of large volumes of data (whether static or dynamic, and from various sources) to realise new and innovative use cases, not just within their own domain but also across different sectors.
The platform will need to be installed and configured by someone with a good level of technical skill. The base platform can be installed using the Chef cookbook. To perform computations on the platform, you will need a pipeline developed by people with a knowledge of Big Data technologies. However, once installed, the platform will be usable by non-specialists. It is designed to be used by researchers and other specialists in the meaning of the data, not in the data-handling technologies.
Certainly. You can use the platform for big data computations as well as small data analysis. A concrete example could combine metadata stored in a triple store with large datasets stored in HDFS.
The BDE platform neither limits nor helps with these issues. You could deploy your own private platform or share the platform (and its content) with others.
The consortium comprises 14 partners from across Europe that cover a wide range of disciplines from pharma to sociology. BDE is building and working with communities around each of the societal challenges identified by the European Commission to serve data-intensive organisations and institutions.
There are currently no plans to offer the project’s own instance of the platform for external use. The expectation is that organisations wishing to use the platform will install their own instance.
The platform integrates with all software components that can be provided in a Docker container. The integration between the software components depends on their specific implementations.
OpenPHACTS has operated successfully in its current architecture since it was founded. However, the current solution doesn’t scale. Switching to the BDE Platform will make it easier to add more data and to manage the increasingly large volume and variety.
Normally, Big Data in agriculture is associated with information collected by sensors, satellites or drones combined with genomic information or climate data, which can all help farmers to optimise their farm’s operations. These types of data are the most challenging in terms of Volume and Velocity.
In addition, existing communities of data managers in this area have identified challenges and opportunities around the heterogeneity of the data that need to be combined and integrated, both to foster new research and innovation and to provide meaningful information for decision making. In this case, typical data types would include those above as well as phenotypical data, soil observations, experiments, statistics, food traceability data, etc., and the challenge would be more around Variety.
Images, especially plant and soil images and real-time streaming from experimental fields, are another area of application: image recognition and manipulation over huge collections or live streaming pose challenges in terms of both Volume and Velocity.
Many researchers in the area of food and agriculture still use traditional technologies for managing and combining data (from Excel to RDBMS) and are only recently moving to grid solutions and specific tools for sensor data and spatial/raster data, such as ENVI, R and GDAL.
The current advances in data collection techniques (like sensors and drones) and the availability of huge collections of scientific and experimental data that need to be integrated with other data to become meaningful and useful make the manipulation and integration of such data very challenging with technologies that are not specifically designed for big data.
BDE will provide a (deployable) generic stack of technologies for big data plus a specific instance with ready-to-use demonstrators for the food and agriculture community.
Examples of applications of such technologies include storage and processing of large volumes of data, computation over real-time data, scalable dataset processing for a number of standard formats, image pattern recognition and manipulation, and access to data tables from within publications.
Overall, BDE will offer the kind of tool kit necessary to do research across a wide range of datasets more easily than is currently possible.
As well as electricity, wind turbines generate a lot of data. Managers need an infrastructure to ingest and process the data from large turbine arrays in real time using parallel processing. It is often the case that there is plenty of computing power available but that data is lost or not used for want of the tools for processing it. A local instance of BDE should go a long way to solving this problem.
The possibility of scheduling one’s own trip, whether using private or public transport; the prospect of using big data in scheduling transport infrastructure work such as road maintenance; and the potential of using the data generated by vehicles to enhance the vehicles themselves and road planning.
There are numerous sources of traffic data in European member states. The ITS Directive requires each member state to provide a national access point for traffic data. However, there is no centralised and harmonised way to deliver and access this information; this is left to the discretion of the member states.
The beauty of big data is that it is self-regulating: it gives the opportunity to cross-check a variety of data streams, acting as a means of quality control in its own right. Through cross-correlations, it enables the identification of low-quality sources, which may still be considered but with lower confidence or only for specific traffic events.
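As a rough illustration of this kind of cross-checking, the Python sketch below scores each source by its average Pearson correlation with every other source and flags those that fall under a threshold. It is a simplified assumption of how such a check might look, not the method used by BDE: the source names, the speed values and the 0.3 threshold are all made up, and the streams are assumed to be aligned in time and of equal length.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def source_confidence(streams: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each source by its mean correlation with all other sources."""
    if len(streams) < 2:
        raise ValueError("need at least two streams to cross-check")
    scores = {}
    for name, values in streams.items():
        others = [pearson(values, other)
                  for other_name, other in streams.items()
                  if other_name != name]
        scores[name] = sum(others) / len(others)
    return scores


def low_quality(streams: Dict[str, List[float]], threshold: float = 0.3) -> List[str]:
    """Names of sources whose score is below the (hypothetical) threshold."""
    scores = source_confidence(streams)
    return sorted(name for name, score in scores.items() if score < threshold)


# Made-up average speeds (km/h) reported for the same road segment.
EXAMPLE_STREAMS = {
    "loop_detector": [50, 45, 30, 20, 35, 50],
    "floating_car": [52, 44, 31, 22, 33, 49],
    "faulty_camera": [40, 41, 39, 40, 42, 40],
}
print(low_quality(EXAMPLE_STREAMS))  # ['faulty_camera']
```

A flagged source need not be discarded: its score could instead be used as a confidence weight when the streams are combined.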
This is not only a matter of autonomous vehicles but of all vehicles equipped with connectivity technologies. Manufacturers are constantly developing new security measures to mitigate the risk of hacking; however, the idea that the threat relates only to autonomous vehicles is not accurate.
Positioning information is always considered personal data. This is the case for traffic information delivered from vehicles on the road. GPS data needs to be anonymised and consolidated early in the data chain in order to avoid any linkability of the data.
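One way to picture early anonymisation and consolidation is sketched below in Python: vehicle identifiers are dropped, positions are snapped to a coarse grid, and a cell is reported only if enough distinct vehicles contributed to it. This is a minimal sketch under stated assumptions, not the BDE implementation and not a guarantee of anonymity; the GpsFix record, the 0.01 degree cell size and the minimum of three vehicles are hypothetical choices for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class GpsFix(NamedTuple):
    """A single raw position report (hypothetical record layout)."""
    vehicle_id: str
    lat: float
    lon: float
    speed_kmh: float


Cell = Tuple[int, int]


def consolidate(fixes: Iterable[GpsFix],
                cell_deg: float = 0.01,
                min_vehicles: int = 3) -> Dict[Cell, float]:
    """Return the mean speed per grid cell, without vehicle identifiers.

    Cells seen by fewer than `min_vehicles` distinct vehicles are
    suppressed so that a lone vehicle cannot be singled out.
    """
    speeds = defaultdict(list)
    vehicles = defaultdict(set)
    for fix in fixes:
        # Snap the position to a coarse grid cell.
        cell = (math.floor(fix.lat / cell_deg), math.floor(fix.lon / cell_deg))
        speeds[cell].append(fix.speed_kmh)
        vehicles[cell].add(fix.vehicle_id)
    return {
        cell: sum(values) / len(values)
        for cell, values in speeds.items()
        if len(vehicles[cell]) >= min_vehicles
    }


# Made-up reports: three vehicles share one cell, a fourth is alone in another.
EXAMPLE_FIXES = [
    GpsFix("a", 50.8503, 4.3517, 30.0),
    GpsFix("b", 50.8507, 4.3512, 40.0),
    GpsFix("c", 50.8509, 4.3519, 50.0),
    GpsFix("d", 48.8566, 2.3522, 90.0),
]
print(consolidate(EXAMPLE_FIXES))  # {(5085, 435): 40.0}
```

The output contains only aggregated cells and speeds, so the identifiers and exact positions never leave the first processing step.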
The amount of data available related to climate is huge. Climatologists need to know what the impact of climate change will be and how it will affect lives. People generally want to know this in terms of a location so BDE will allow climatologists to see the past, present and expected future variance in different factors on a map. The specific benefit of the technology therefore is that rather than searching for datasets, the platform will know which datasets to invoke to answer questions phrased in terms of location.
There are many aspects of social sciences that can benefit directly from big data technologies. Very often the task is to associate statistics with geographical regions, implying the use of two disparate technologies. Within BDE, the focus is on public sector spending data. Government budget data can be very large and is usually far too technical in nature for a non-specialist. Collaborating with the Your Data Stories project and accessing data from municipalities across Europe, BDE will use big data technologies to reconcile heterogeneous datasets and make it easier to ask questions and make comparisons.
The security focus within BDE is on combining satellite data with news and social media. The volume of data being delivered from satellites is enormous, with every part of the Earth now being analysed by multiple space-borne instruments every day. It is possible to link this data to news and social media but it tends to be a labour intensive and therefore very expensive process. BDE promises to make it easier.
The code for the project is available in our GitHub repository.
We intend to have a low barrier to entry. A simplified version of the platform could be installed using a Vagrant setup. Developing Big Data components will be made easier by the provision of base Docker images published on the Big Data Europe GitHub repository. These base images and their documentation are continuously evolving.