
FAQ

This page answers the questions the BDE partners are asked most frequently.

General

What problems does BDE solve?

Big Data is famously characterised by the 4 Vs:

Volume: the platform is designed to handle arbitrarily large amounts of data.

Velocity: the platform is designed to handle real time data, such as climate, energy and transport sensor data. More complex computational tasks can be handled through batch processing, that is, one chunk at a time, with results returned after processing has been completed.

Variety: the platform makes use of Linked Data technologies to ‘semantify the data,’ that is, to add meaning to the data in whatever format it is in, allowing data from different sources, from different domains and with different licensing conditions to be integrated with relative ease.

Veracity: the provenance of all data handled by the platform is tracked.

What is Big Data Europe building?

Big Data Europe is building a powerful, flexible, customisable platform that can handle data ingestion, integration and processing at scale.

Who is Big Data Europe for?

Small, medium and large entities from industry, research or the public sector that have much to gain from making sense of large volumes of data (static or dynamic, and from various sources) to realise new and innovative use cases, not just within their own domain but also across sectors.

Will the BDE Platform handle real time data?

Yes.

Will the BDE Platform be able to handle real time processing?

Yes.

Will the BDE Platform be able to do batch processing of data?

Yes.

Will the BDE Platform allow me to work with large static datasets?

Yes.

Is the BDE platform usable by non-experts?

The platform needs to be installed and configured by someone with a good level of technical skill. The base platform can be installed using the Chef cookbook. To perform computations on the platform you will need a pipeline, which must be developed by people with knowledge of Big Data technologies. Once installed, however, the platform will be usable by non-specialists: it is designed for researchers and other specialists in the meaning of the data, not in the data-handling technologies.

Does BDE help/allow you to analyse small data in the context of big data?

Certainly. You can use the platform for big data computations as well as small data analysis. A concrete example could combine metadata stored in a triple store with large datasets stored in HDFS.
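As a rough illustration of that pattern, the Python sketch below uses a small set of metadata to decide which large files to process, then streams each selected file row by row. Everything in it is an assumption made for illustration: the list of triples stands in for a triple store, the in-memory strings stand in for files in HDFS, and the identifiers, paths and function names are hypothetical rather than part of the BDE platform.

```python
import csv
import io

# Small data: (subject, predicate, object) triples standing in for a triple store.
# All identifiers and paths here are hypothetical.
TRIPLES = [
    ("ds:turbine-a", "ex:domain", "energy"),
    ("ds:turbine-a", "ex:path", "/data/turbine-a.csv"),
    ("ds:field-7", "ex:domain", "agriculture"),
    ("ds:field-7", "ex:path", "/data/field-7.csv"),
]

# Big data: in-memory text standing in for large files held in HDFS.
FILES = {
    "/data/turbine-a.csv": "reading\n10.0\n12.0\n14.0\n",
    "/data/field-7.csv": "reading\n1.0\n3.0\n",
}


def paths_for_domain(triples, domain):
    """Use the metadata to find the storage paths of datasets in a domain."""
    subjects = {s for s, p, o in triples if p == "ex:domain" and o == domain}
    return sorted(o for s, p, o in triples if p == "ex:path" and s in subjects)


def mean_reading(path):
    """Stream one file row by row so it never has to be loaded as a whole."""
    total, count = 0.0, 0
    for row in csv.DictReader(io.StringIO(FILES[path])):
        total += float(row["reading"])
        count += 1
    return total / count if count else None


def mean_by_dataset(domain):
    """Combine both: metadata selects the files, streaming summarises them."""
    return {path: mean_reading(path) for path in paths_for_domain(TRIPLES, domain)}
```

In a real deployment the metadata lookup would typically be a query against the triple store and the per-file work would run on a distributed engine; the sketch only shows how the small data steers the large computation.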

Will people be able to use each other’s data? Is the platform open? Can it be?

The BDE platform neither restricts nor assists with these issues. You could deploy your own private platform or share the platform (and its content) with others.

Who is behind Big Data Europe?

The consortium comprises 14 Partners from across Europe that cover a wide range of disciplines from pharma to sociology. BDE is building and working with communities around each of the societal challenges identified by the European Commission to serve data intensive organisations and institutions.

Will there be a hosted supported version of the platform?

There are currently no plans to offer the project’s own instance of the platform for external use. The expectation is that organisations wishing to use the platform will install their own instance.

What will or could BDE integrate with?

The platform integrates with all software components that can be provided in a Docker container. The integration between the software components depends on their specific implementations.

Health

OpenPHACTS exists already, why do we need BDE?

OpenPHACTS has operated successfully in its current architecture since it was founded. However, the current solution doesn’t scale. Switching to the BDE Platform will make it easier to add more data and to manage the increasingly large volume and variety.

Food

What are the types of data that can benefit more from big data technologies in the area of food and agriculture?

Normally, Big Data in agriculture is associated with information collected by sensors, satellites or drones combined with genomic information or climate data, which can all help farmers to optimize their farm’s operations. These types of data are the most challenging in terms of Volume and Velocity.

In addition, existing communities of data managers in this area have identified challenges and opportunities around the heterogeneity of the data that need to be combined and integrated, both to foster new research and innovation and to provide meaningful information for decision making. Here, typical data types include those above as well as phenotypical data, soil observations, experiments, statistics and food traceability data, and the challenge is more one of Variety.

Images, especially plant and soil images and real-time streaming from experimental fields, are another area of application: image recognition and manipulation over huge collections or live streaming pose challenges in terms of both Volume and Velocity.

What are the advantages of BDE for food & agriculture?

Many researchers in the area of food and agriculture still use traditional technologies for managing and combining data (from Excel to RDBMS) and are only recently moving to grid solutions and specific tools for sensor data and spatial / raster data, such as ENVI, R and GDAL.
Current advances in data collection techniques (like sensors and drones), together with the availability of huge collections of scientific and experimental data that need to be integrated with other data to become meaningful and useful, make the manipulation and integration of such data very challenging with technologies that are not specifically designed for big data.

BDE will provide a (deployable) generic stack of technologies for big data plus a specific instance with ready-to-use demonstrators for the food and agriculture community.
Examples of application of such technologies: storage and processing of large volumes of data, computation over real-time data, scalable dataset processing for a number of standard formats, image pattern recognition and manipulation, access to data tables from within publications.
Overall, BDE will offer the kind of tool kit necessary to do research across a wide range of datasets more easily than is currently possible.

Energy

What are the advantages of BDE for the energy sector?

As well as electricity, wind turbines generate a lot of data. Managers need an infrastructure to ingest and process the data from large turbine arrays in real time using parallel processing. It is often the case that there is plenty of computing power available but that data is lost or not used for want of the tools for processing it. A local instance of BDE should go a long way to solving this problem.

Transport

What are the key application areas of big data in the transport domain?

Key areas include helping travellers schedule their own trips, whether by private or public transport; using big data to schedule work on transport infrastructure, such as road maintenance; and using the data generated by vehicles to improve both the vehicles themselves and road planning.

What are good sources for geo-traffic data?

There are numerous sources of traffic data in European Member States. The ITS Directive requires each Member State to provide a national access point for traffic data. However, there is no centralised and harmonised way to deliver and access this information; this is left to the discretion of the Member States.

Is there some EU-wide policy on data aggregation in transport?

A few European policies are relevant to data management: the PSI Directive, the ITS Directive and the wider INSPIRE Directive. In addition, the European regulations on data privacy are fully applicable.

How do you deal with quality control in the transport domain?

The beauty of big data is that it is self-regulating: cross-checking a variety of data streams acts as a means of quality control in its own right. Cross-correlation makes it possible to identify low-quality sources, which may still be used but with lower confidence or only for specific traffic events.
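A minimal Python sketch of this kind of cross-checking follows. It is an illustration under stated assumptions, not the project's method: it scores each source by how often it agrees with the median of all sources for the same time slot, which is a simpler stand-in for a full cross-correlation. The function name, the 10% tolerance and the sample speeds are all hypothetical.

```python
from statistics import median


def source_confidence(readings, tolerance=0.1):
    """Score each source by its agreement with the cross-source consensus.

    readings maps a source name to a list of values, one per time slot,
    with all sources covering the same slots in the same order.
    Returns, per source, the fraction of slots in which its value lies
    within `tolerance` (relative) of the median across all sources.
    """
    if not readings:
        return {}
    n_slots = min(len(values) for values in readings.values())
    if n_slots == 0:
        return {source: 0.0 for source in readings}

    agreements = {source: 0 for source in readings}
    for i in range(n_slots):
        consensus = median(values[i] for values in readings.values())
        for source, values in readings.items():
            if abs(values[i] - consensus) <= tolerance * abs(consensus):
                agreements[source] += 1
    return {source: hits / n_slots for source, hits in agreements.items()}


# Hypothetical average speeds (km/h) for the same road segment and time slots.
speeds = {
    "loop_detector": [50, 52, 48, 51],
    "probe_vehicles": [51, 50, 49, 50],
    "camera": [50, 80, 20, 52],
}
confidence = source_confidence(speeds)
```

Here the camera disagrees with the other two sources in half of the slots, so it receives a lower score. A low-scoring source need not be discarded: its data can be kept with a lower weight or used only for specific events.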

How can we ever be fully certain that autonomous vehicles will not be hacked?

This is a matter not only for autonomous vehicles but for all vehicles equipped with connectivity technologies. Manufacturers are constantly developing new security measures to mitigate the risk of hacking, but the idea that the threat relates only to autonomous vehicles is not accurate.

What is your insight regarding the GPS or GNSS location information related to the privacy concern?

Positioning information is always considered personal data. This is the case for traffic information delivered from vehicles on the road. GPS data needs to be anonymised and consolidated early in the data chain in order to prevent the data from being linked back to individuals.
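One way to picture "anonymise and consolidate early" is the Python sketch below. It is illustrative only and makes several assumptions: vehicle identifiers are dropped, coordinates are coarsened by rounding to a grid, and cells seen by fewer than a minimum number of distinct vehicles are suppressed. The function name, the grid size and the threshold are hypothetical, and this simple scheme is not on its own sufficient to guarantee anonymity.

```python
def consolidate(points, decimals=2, min_vehicles=3):
    """Turn raw GPS points into anonymous per-cell vehicle counts.

    points is an iterable of (vehicle_id, latitude, longitude).
    Coordinates are rounded to `decimals` places to form grid cells,
    vehicle identifiers are used only to count distinct vehicles, and
    cells seen by fewer than `min_vehicles` vehicles are dropped so that
    sparse cells cannot single out an individual.
    """
    vehicles_per_cell = {}
    for vehicle_id, lat, lon in points:
        cell = (round(lat, decimals), round(lon, decimals))
        vehicles_per_cell.setdefault(cell, set()).add(vehicle_id)
    # Only counts leave this function; identifiers are discarded here.
    return {
        cell: len(ids)
        for cell, ids in vehicles_per_cell.items()
        if len(ids) >= min_vehicles
    }


# Hypothetical raw points: three vehicles close together, one on its own.
raw_points = [
    ("v1", 50.841, 4.351),
    ("v2", 50.842, 4.352),
    ("v3", 50.843, 4.349),
    ("v4", 51.219, 4.402),
]
traffic = consolidate(raw_points)
```

Running the consolidation as close to ingestion as possible means that later stages of the pipeline only ever see the aggregated counts.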

Climate

What’s the Role of BDE in Climate Modelling?

The amount of data available related to climate is huge. Climatologists need to know what the impact of climate change will be and how it will affect lives. People generally want to know this in terms of a location so BDE will allow climatologists to see the past, present and expected future variance in different factors on a map. The specific benefit of the technology therefore is that rather than searching for datasets, the platform will know which datasets to invoke to answer questions phrased in terms of location.

Social sciences

What is BDE doing in the social sciences?

There are many aspects of social sciences that can benefit directly from big data technologies. Very often the task is to associate statistics with geographical regions, implying the use of two disparate technologies. Within BDE, the focus is on public sector spending data. Government budget data can be very large and is usually far too technical in nature for a non-specialist. Collaborating with the Your Data Stories project and accessing data from municipalities across Europe, BDE will use big data technologies to reconcile heterogeneous datasets and make it easier to ask questions and make comparisons.

Security

What is the role of BDE in Secure Societies?

The security focus within BDE is on combining satellite data with news and social media. The volume of data being delivered from satellites is enormous, with every part of the Earth now being analysed by multiple space-borne instruments every day. It is possible to link this data to news and social media but it tends to be a labour intensive and therefore very expensive process. BDE promises to make it easier.

Technology

Does the platform scale elastically/automatically?

The platform doesn’t currently include support for automatically adding additional instances of different components in response to demand. However, the Docker-based architecture and overall design means that this capability could readily be added in future.

How can you add other components that are not provided in the set of BD components?

This is very easy to do and works for any component that can be virtualised as a Docker component.

Where can I get & discuss the code?

The code for the project is available in our GitHub repository.

Will Big Data Europe provide a technical solution for organisations to make their first steps in Big Data?

We intend to have a low barrier to entry. A simplified version of the platform could be installed using a Vagrant setup. Developing Big Data components will be made easier by the provision of base Docker images published on the Big Data Europe GitHub repository. These base images and their documentation are continuously evolving.
