FAQ

This page includes answers to the questions the BDE partners are asked most frequently.

General

Why Should I Use BDE rather than other big data platforms?

Aad Versteden explains that the BDE Integrator Platform is designed to let you experiment with different big data tools within your own context, i.e. run your own software on your own data. BDE makes this really easy and doesn’t require you to pick one tool before you start.

How Can BDE Help Someone Get Started With Big Data?

The topic of big data can be daunting – there are so many tools, options and methods. Hajira Jabeen explains how the BDE Integrator Platform can help someone who, like her, is new to big data, get up and running quickly and with a minimum of hassle.

What Are The Main Features Of The BDE Platform?

Hajira Jabeen, one of BDE’s lead developers, talks about the main features of the Big Data Europe Integrator Platform.

What is the biggest achievement of BDE?

Aad Versteden explains the innovation behind the Big Data Integrator Platform. Recognising the work of others, he sets out how BDE allows you to work with components and tools that you may be unsure of but that BDE has already proved to work well in different combinations and toolchains in real life situations.

Can You Give An Example Of A Project Using the Big Data Europe Integrator Platform?

Ivan Irmilov gives an example of the kind of experiment that’s easy to do using the Big Data Europe Integrator Platform. See https://www.big-data-europe.eu/scalable-sparkhdfs-workbench-using-docker/ for more.

What Future Do You See for the Big Data Europe Platform?

Maritina Stavrakaki, a user rather than a developer of big data tools, gives her view of the platform’s future and what she hopes it will provide.

What New Opportunities Are Opened by Big Data?

Maritina Stavrakaki explains that much of what is being done with big data in her field, agronomy, is not new; however, it is now much easier to access information in open-access publications and in other publications to which a university may have access.

What Are You Trying To Achieve With Big Data?

As a viticulturalist, Maritina Stavrakaki is concerned with things like genetic data (the DNA of the vines), phenotypic data (how/which genes are expressed), phenolic contents of the wine produced, the effects of climate change on vines and more. Here she explains how Big Data Europe is helping in her research.

What Are The Most Innovative Features Of BDE?

Hajira Jabeen describes what BDE has contributed to the community and what its most innovative features have been. On the second point, she highlights BDE’s Semantic Data Lake and the SANSA Analytics engine.

Will The Semantic Data Lake Continue After BDE?

All too often, good research ideas die when projects come to an end. Not so the Semantic Data Lake, as Mohamed Nadjib Mami explains.

Will CRES Use BDE After The Project?

All too often, software is left on the shelf to gather dust as soon as the project that created it is over. Not so at CRES, as Fragiskos Mouzakis of CRES explains.

What Will TenForce Do with BDE After the Project?

Aad Versteden explains how the BDE project fits within his company, TenForce. No, they won’t be offering commercial services to install it (it’s too easy!), but they are committed to continuing to work on the platform for at least another three years (through the SPECIAL project). They are already using it internally within the company and plan to explore more ways to do so.

What Were the Biggest Challenges You Faced?

Aad Versteden talks about the challenges the team faced as they built the Big Data Integrator Platform. Sure, there were technical challenges to solve, but the biggest challenge of all was inspiring potential users.

What problems does BDE solve?

Big Data is famously characterised by the 4 Vs:

Volume: the platform is designed to handle arbitrarily large amounts of data.

Velocity: the platform is designed to handle real time data, such as climate, energy and transport sensor data. More complex computational tasks can be handled through batch processing, that is, one chunk at a time, with results returned after processing has been completed.
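
As a rough illustration of the difference between the two processing modes, here is a minimal Python sketch. It is not BDE code: the function names, the threshold alert and the per-chunk mean are hypothetical, chosen only to contrast acting on each reading as it arrives with processing data one chunk at a time.

```python
from typing import Iterable, Iterator, List


def process_realtime(readings: Iterable[float], threshold: float) -> Iterator[str]:
    """Streaming style: inspect each reading as soon as it arrives."""
    for i, value in enumerate(readings):
        if value > threshold:
            # Emit an alert immediately, without waiting for the rest of the stream
            yield f"alert at reading {i}: {value}"


def process_in_batches(readings: Iterable[float], batch_size: int) -> List[float]:
    """Batch style: group readings into fixed-size chunks and return one mean per
    chunk; each result is available only after its chunk has been processed."""
    results: List[float] = []
    batch: List[float] = []
    for value in readings:
        batch.append(value)
        if len(batch) == batch_size:
            results.append(sum(batch) / len(batch))
            batch = []
    if batch:  # final, possibly partial, chunk
        results.append(sum(batch) / len(batch))
    return results
```

With readings `[1.0, 5.0, 2.0, 7.0, 3.0]` and a threshold of `4.0`, the streaming function raises alerts for readings 1 and 3 as they arrive. By contrast, `process_in_batches(readings, 2)` returns `[3.0, 4.5, 3.0]` only once each chunk is complete.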

Variety: the platform makes use of Linked Data technologies to ‘semantify the data,’ that is, to add meaning to the data in whatever format it is in, allowing data from different sources, from different domains and with different licensing conditions to be integrated with relative ease.
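
Here is a minimal Python sketch of what "semantifying" can mean in practice. Records from two sources that use different field names are mapped onto a shared vocabulary and emitted as RDF-style triples, so they can be queried together. The `example.org` IRIs, the field mappings and the sample records are hypothetical. A real deployment would use established vocabularies and mapping tools rather than a hand-written dictionary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical shared vocabulary (assumption for illustration only)
EX = "http://example.org/vocab#"

# Hypothetical per-source mappings from local field names to shared predicates
MAPPINGS: Dict[str, Dict[str, str]] = {
    "station_csv": {"temp_c": EX + "temperature", "site": EX + "location"},
    "sensor_json": {"temperature": EX + "temperature", "place": EX + "location"},
}


def semantify(source: str, record_id: str, record: Dict[str, object]) -> List[Triple]:
    """Turn one flat record into (subject, predicate, object) triples."""
    subject = f"http://example.org/{source}/{record_id}"
    mapping = MAPPINGS[source]
    triples: List[Triple] = []
    for field, value in record.items():
        if field in mapping:  # fields without a mapping are skipped
            triples.append((subject, mapping[field], str(value)))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise triples as N-Triples lines, treating every object as a string literal."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)
```

After mapping, a `temp_c` column from one source and a `temperature` field from another end up under the same predicate. This is what lets data from different sources be integrated and queried as one.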

Veracity: the provenance of all data handled by the platform is tracked.

What is Big Data Europe building?

Big Data Europe is building a powerful, flexible, customisable platform that can handle data ingestion, integration and processing at scale.

Who is Big Data Europe for?

Small, medium and large organisations from any sector, whether industry, research or the public sector, that have much to gain from making sense of large volumes of data (both static and dynamic, and from various sources) to realise new and innovative use cases, not just within their own domain but also across sectors.

Will the BDE Platform handle real time data?

Yes.

Will the BDE Platform be able to handle real time processing?

Yes.

Will the BDE Platform be able to do batch processing of data?

Yes.

Will the BDE Platform allow me to work with large static datasets?

Yes.

Is the BDE platform usable by non-experts?

The platform needs to be installed and configured by someone with a good level of technical skill. The base platform can be installed using the Chef cookbook. To perform computations on the platform, you will need a pipeline developed by someone with knowledge of Big Data technologies. Once installed, however, the platform will be usable by non-specialists: it is designed for researchers and other specialists in the meaning of the data, not in the data-handling technologies.

Does BDE help/allow you to analyse small data in the context of big data?

Certainly. You can use the platform for big data computations as well as small data analysis. A concrete example could combine metadata stored in a triple store with large datasets stored in HDFS.
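
The pattern behind that example can be sketched in a few lines of Python. The small side is held in memory: here, a dictionary standing in for the result of a triple-store query. The large side is streamed past it: an iterable standing in for records read from HDFS. This is the idea behind the broadcast hash joins that engines such as Spark use when one input is small. The function name, record layout and sample values are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def broadcast_join(
    metadata: Dict[str, Dict[str, str]],
    records: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, str, float]]:
    """Enrich a potentially huge stream of (dataset_id, value) records with a small
    in-memory metadata lookup, without ever loading the large side into memory."""
    for dataset_id, value in records:
        meta = metadata.get(dataset_id)
        if meta is None:
            continue  # no metadata for this dataset: drop the record (inner-join semantics)
        yield dataset_id, meta["label"], value
```

Because only the small metadata table must fit in memory, the large side can be processed record by record or split across workers.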

Will people be able to use each other’s data? Is the platform open? Can it be?

The BDE platform neither restricts nor specifically supports data sharing. You could deploy your own private platform or share the platform (and its content) with others.

Who is behind Big Data Europe?

The consortium comprises 14 partners from across Europe, covering a wide range of disciplines from pharma to sociology. BDE is building and working with communities around each of the societal challenges identified by the European Commission to serve data-intensive organisations and institutions.

Will there be a hosted supported version of the platform?

There are currently no plans to offer the project’s own instance of the platform for external use. The expectation is that organisations wishing to use the platform will install their own instance.

What will or could BDE integrate with?

The platform integrates with all software components that can be provided in a Docker container. The integration between the software components depends on their specific implementations.

Health

OpenPHACTS exists already, why do we need BDE?

OpenPHACTS has operated successfully in its current architecture since it was founded. However, the current solution doesn’t scale. Switching to the BDE Platform will make it easier to add more data and to manage the increasingly large volume and variety.

Food

Do You Have To Be A Big Data Expert To Use BDE?

Maritina Stavrakaki is an agronomist specialising in viticulture, not a big data technologist. Here she explains her involvement in the BDE project’s pilot around the food and agriculture societal challenge.

What New Opportunities Are Opened by Big Data?

Maritina Stavrakaki explains that much of what is being done with big data in her field, agronomy, is not new; however, it is now much easier to access information in open-access publications and in other publications to which a university may have access.

What Are You Trying To Achieve With Big Data?

As a viticulturalist, Maritina Stavrakaki is concerned with things like genetic data (the DNA of the vines), phenotypic data (how/which genes are expressed), phenolic contents of the wine produced, the effects of climate change on vines and more. Here she explains how Big Data Europe is helping in her research.

Will This Work Be Useful to the Wine Industry and If So How?

Processing large amounts of data about vines might be an interesting academic exercise, but will it actually help the wine industry? Maritina Stavrakaki explains why she is confident that it will.

What are the types of data that can benefit more from big data technologies in the area of food and agriculture?

Normally, Big Data in agriculture is associated with information collected by sensors, satellites or drones combined with genomic information or climate data, which can all help farmers to optimize their farm’s operations. These types of data are the most challenging in terms of Volume and Velocity.

In addition, existing communities of data managers in this area have identified challenges and opportunities arising from the heterogeneity of the data that must be combined and integrated, both to foster new research and innovation and to provide meaningful information for decision making. Here, typical data types include those above as well as phenotypic data, soil observations, experimental data, statistics and food traceability data, and the main challenge is Variety.

Images, especially plant and soil images and real-time streaming from experimental fields, are another area of application: image recognition and manipulation over huge collections or live streaming pose challenges in terms of both Volume and Velocity.

What are the advantages of BDE for food & agriculture?

Many researchers in the area of food and agriculture still use traditional technologies for managing and combining data (from Excel to RDBMS). Only recently have they begun moving to grid solutions and to specific tools for sensor data and spatial/raster data, such as ENVI, R and GDAL.
Data collection techniques such as sensors and drones are advancing, and huge collections of scientific and experimental data must be integrated with other data to become meaningful and useful. Together, these make manipulating and integrating such data very challenging with technologies that are not designed specifically for big data.

BDE will provide a (deployable) generic stack of technologies for big data plus a specific instance with ready-to-use demonstrators for the food and agriculture community.
Examples of applications of such technologies include storage and processing of large volumes of data, computation over real-time data, scalable processing of datasets in a number of standard formats, image pattern recognition and manipulation, and access to data tables within publications.
Overall, BDE will offer the kind of tool kit necessary to do research across a wide range of datasets more easily than is currently possible.

Energy

Will CRES Use BDE After The Project?

All too often, software is left on the shelf to gather dust as soon as the project that created it is over. Not so at CRES, as Fragiskos Mouzakis of CRES explains.

What Is The Reaction Among Other Wind Turbine Specialists?

Fragiskos Mouzakis describes the (positive) reaction of his fellow wind turbine specialists who are not involved in the BDE project, and what it is that excites them about the platform.

Why Are You In The BDE Project?

Fragiskos Mouzakis is not a big data engineer but a wind energy specialist and, as such, represents the Energy societal challenge in the project. Here he explains his work at CRES and what big data technologies offer engineers like him.

What are the advantages of BDE for the energy sector?

As well as electricity, wind turbines generate a lot of data. Managers need an infrastructure to ingest and process the data from large turbine arrays in real time using parallel processing. It is often the case that there is plenty of computing power available but that data is lost or not used for want of the tools for processing it. A local instance of BDE should go a long way to solving this problem.
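
The ingest-and-process step can be sketched as partitioning readings by turbine and summarising each partition independently, which is what makes the work parallelisable. In this hypothetical Python sketch, a thread pool stands in for the cluster workers a real deployment would use, such as Spark executors. The `(turbine_id, power_kw)` record layout and the chosen summary (mean and peak output) are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Reading = Tuple[str, float]  # (turbine_id, power_kw): hypothetical record layout


def partition_by_turbine(readings: List[Reading]) -> Dict[str, List[float]]:
    """Group readings by turbine so each group can be processed independently."""
    parts: Dict[str, List[float]] = defaultdict(list)
    for turbine_id, power in readings:
        parts[turbine_id].append(power)
    return dict(parts)


def summarise(item: Tuple[str, List[float]]) -> Tuple[str, float, float]:
    """Summarise one turbine's readings as (turbine_id, mean output, peak output)."""
    turbine_id, values = item
    return turbine_id, sum(values) / len(values), max(values)


def process_array(readings: List[Reading], workers: int = 4) -> Dict[str, Tuple[float, float]]:
    """Partition the readings, summarise each partition in parallel, collect results."""
    parts = partition_by_turbine(readings)
    # Threads stand in for cluster workers; because partitions share no state,
    # the same structure carries over to processes or distributed executors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarise, parts.items()))
    return {turbine_id: (mean, peak) for turbine_id, mean, peak in summaries}
```

Partitions share no state, so adding turbines adds partitions rather than contention. This is why the workload suits parallel processing.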

Transport

What are the key application areas of big data in the transport domain?

Key application areas include letting travellers plan their own trips, whether by private or public transport. Big data can also be used to schedule work on transport infrastructure, such as road maintenance. Finally, the data generated by vehicles can be used to improve both the vehicles themselves and road planning.

What are good sources for geo-traffic data?

There are numerous sources of traffic data in the European member states. The ITS Directive requires each member state to provide a national access point for traffic data. However, there is no centralised, harmonised way to deliver and access this information; this is left to the discretion of the individual member states.

Is there some EU-wide policy on data aggregation in transport?

Several European policies are relevant to data management: the PSI Directive, the ITS Directive and the wider INSPIRE Directive. In addition, European data privacy regulations are fully applicable.

How do you deal with quality control in the transport domain?

A strength of big data is that it is, to a degree, self-regulating. Because a variety of data streams can be cross-checked against one another, the data itself acts as a means of quality control. Cross-correlation can identify low-quality sources, which may still be used, but with lower confidence or only for specific traffic events.
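
One way to turn the cross-checking idea into code is sketched below in Python. Each source's series is compared with the per-time-step median of all the other sources. Its Pearson correlation with that consensus then decides whether the source is trusted with high or low confidence. This technique was chosen for illustration and is not the method used in the BDE transport pilot. The source names, speed values and 0.7 threshold are hypothetical. At least three sources are needed; otherwise there is no independent consensus to compare against.

```python
from statistics import median
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (sx * sy)


def score_sources(streams: Dict[str, List[float]], min_corr: float = 0.7) -> Dict[str, str]:
    """Rate each source 'high' or 'low' by how well it tracks the consensus of the others."""
    if len(streams) < 3:
        raise ValueError("need at least three sources to form a consensus")
    ratings: Dict[str, str] = {}
    for name, series in streams.items():
        others = [s for other, s in streams.items() if other != name]
        # Consensus at each time step = median of the other sources' values
        consensus = [median(values) for values in zip(*others)]
        ratings[name] = "high" if pearson(series, consensus) >= min_corr else "low"
    return ratings
```

A source rated "low" is not discarded. In line with the answer above, it can still be used with reduced confidence.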

How can we ever be fully certain that autonomous vehicles will not be hacked?

This is not only a matter for autonomous vehicles but for all vehicles equipped with connectivity technologies. Manufacturers are constantly developing new security measures to mitigate the risk of hacking; however, the idea that the threat relates only to autonomous vehicles is not accurate.

What is your insight regarding the GPS or GNSS location information related to the privacy concern?

Positioning information is always considered personal data, and this includes traffic information delivered by vehicles on the road. GPS data therefore needs to be anonymised and consolidated early in the data chain so that records cannot be linked back to individuals.
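
A minimal Python sketch of one common way to do this, assuming the raw feed consists of `(vehicle_id, latitude, longitude)` fixes. Exact positions are generalised to coarse grid cells and identifiers are discarded. Only per-cell vehicle counts are released, and cells seen by fewer than `k` distinct vehicles are suppressed (a k-anonymity-style threshold). The cell size and `k` are hypothetical, and this is not presented as the anonymisation used by BDE's transport pilot. Real trajectory data would also need its time dimension coarsened, which this sketch omits.

```python
from typing import Dict, List, Set, Tuple

# (vehicle_id, latitude, longitude) as reported by a connected vehicle (assumed layout)
Fix = Tuple[str, float, float]


def anonymise(fixes: List[Fix], cell_deg: float = 0.01, k: int = 3) -> Dict[Tuple[int, int], int]:
    """Return distinct-vehicle counts per grid cell, with no vehicle identifiers,
    suppressing any cell seen by fewer than k distinct vehicles."""
    vehicles_per_cell: Dict[Tuple[int, int], Set[str]] = {}
    for vehicle_id, lat, lon in fixes:
        # Generalise the exact position to a coarse grid-cell index
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        vehicles_per_cell.setdefault(cell, set()).add(vehicle_id)
    # Only aggregate counts leave this function; identifiers are dropped here
    return {cell: len(ids) for cell, ids in vehicles_per_cell.items() if len(ids) >= k}
```

Consolidating at this early stage means downstream components only ever see counts per area, never individual positions or identifiers.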

Climate

What’s the Role of BDE in Climate Modelling?

The amount of data available related to climate is huge. Climatologists need to know what the impact of climate change will be and how it will affect lives. People generally want to know this in terms of a location, so BDE will allow climatologists to see the past, present and expected future variance in different factors on a map. The specific benefit of the technology, therefore, is that rather than requiring users to search for datasets, the platform will know which datasets to invoke to answer questions phrased in terms of location.
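
The idea of the platform knowing which datasets to invoke for a location can be sketched as a lookup over a catalogue of dataset footprints. In this hypothetical Python sketch, each catalogue entry records a bounding box and the variable it holds. The dataset names and extents are invented for illustration, and boxes that cross the 180° meridian are not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetEntry:
    """Hypothetical catalogue entry: a dataset's spatial footprint and what it measures."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    variable: str  # e.g. "temperature", "precipitation"


def datasets_for(catalogue: List[DatasetEntry], lat: float, lon: float, variable: str) -> List[str]:
    """Names of catalogued datasets that hold `variable` and whose bounding box covers (lat, lon)."""
    return [
        d.name
        for d in catalogue
        if d.variable == variable
        and d.min_lat <= lat <= d.max_lat
        and d.min_lon <= lon <= d.max_lon
    ]
```

A question such as "how has temperature changed around Athens?" then reduces to a catalogue lookup followed by queries against the returned datasets, rather than a manual search.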

Social sciences

What is BDE doing in the social sciences?

There are many aspects of social sciences that can benefit directly from big data technologies. Very often the task is to associate statistics with geographical regions, implying the use of two disparate technologies. Within BDE, the focus is on public sector spending data. Government budget data can be very large and is usually far too technical in nature for a non-specialist. Collaborating with the Your Data Stories project and accessing data from municipalities across Europe, BDE will use big data technologies to reconcile heterogeneous datasets and make it easier to ask questions and make comparisons.

Security

What is the role of BDE in Secure Societies?

The security focus within BDE is on combining satellite data with news and social media. The volume of data being delivered from satellites is enormous, with every part of the Earth now being analysed by multiple space-borne instruments every day. It is possible to link this data to news and social media, but it tends to be a labour-intensive and therefore very expensive process. BDE promises to make it easier.

Technology

Why Should I Use BDE rather than other big data platforms?

Aad Versteden explains that the BDE Integrator Platform is designed to let you experiment with different big data tools within your own context, i.e. run your own software on your own data. BDE makes this really easy and doesn’t require you to pick one tool before you start.

How Can BDE Help Someone Get Started With Big Data?

The topic of big data can be daunting – there are so many tools, options and methods. Hajira Jabeen explains how the BDE Integrator Platform can help someone who, like her, is new to big data, get up and running quickly and with a minimum of hassle.

What is the biggest achievement of BDE?

Aad Versteden explains the innovation behind the Big Data Integrator Platform. Recognising the work of others, he sets out how BDE allows you to work with components and tools that you may be unsure of but that BDE has already proved to work well in different combinations and toolchains in real life situations.

Can You Give An Example Of A Project Using the Big Data Europe Integrator Platform?

Ivan Irmilov gives an example of the kind of experiment that’s easy to do using the Big Data Europe Integrator Platform. See https://www.big-data-europe.eu/scalable-sparkhdfs-workbench-using-docker/ for more.

Why Did You Use Docker? Did You Consider Alternatives?

Aad Versteden explains that although Docker was not the obvious choice at the start of the project, as it has evolved, Docker, Docker Swarm and Docker Compose have become more mature and provide the flexibility needed.

Can You Explain the Concept of the Semantic Data Lake?

Two of the three Vs of big data – volume and velocity – have had a lot of attention and seem largely solved. Not so the problem of variety, that is, handling different and diverse sources and types of data. Mohamed Nadjib Mami explains how the Big Data Europe Integrator Platform solves the problem by abstracting all data into a semantic layer – the Semantic Data Lake.

At What Scale Have You Tested The Semantic Data Lake?

The Semantic Data Lake is an abstraction that allows you to execute a SPARQL query against any number of different data sources that can be in any number of different formats – it’s a key innovation of the Big Data Europe Integrator Platform. But does it scale? Mohamed Nadjib Mami has the answer.

What Are The Most Innovative Features Of BDE?

Hajira Jabeen describes what BDE has contributed to the community and what its most innovative features have been. On the second point, she highlights BDE’s Semantic Data Lake and the SANSA Analytics engine.

How Can You Run a SPARQL Query Over Non RDF Data?

The BDE Integrator Platform’s Semantic Data Lake allows you to execute a SPARQL query across multiple datasets, irrespective of their format. Mohamed Nadjib Mami explains how this is possible and the advantages of doing so.
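
Here is a toy Python sketch of the general idea. Mappings expose each source as virtual triples, and a query's triple patterns are evaluated against them by binding variables and joining on shared ones. This is not the Semantic Data Lake's implementation, which is designed to run at scale. The mapping functions, the `ex:` prefix and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def csv_as_triples(rows: Iterable[Dict[str, str]]) -> Iterator[Triple]:
    """Expose tabular rows as virtual triples: one subject per row, one predicate per column."""
    for row in rows:
        subject = "ex:" + row["id"]
        for column, value in row.items():
            if column != "id":
                yield subject, "ex:" + column, value


def json_as_triples(docs: Iterable[Dict[str, str]]) -> Iterator[Triple]:
    """Expose JSON-like sensor documents as virtual triples via a fixed, hypothetical mapping."""
    for doc in docs:
        subject = "ex:" + doc["sensor"]
        yield subject, "ex:locatedIn", "ex:" + doc["vineyard"]
        yield subject, "ex:temp", doc["temp"]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in result and result[term] != value:
                return None  # variable already bound to a different value
            result[term] = value
        elif term != value:
            return None  # constant term does not match
    return result


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern (a conjunction of triple patterns) by nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                m = match(pattern, t, b)
                if m is not None:
                    next_bindings.append(m)
        bindings = next_bindings
    return bindings
```

The patterns `("?s", "ex:locatedIn", "?v")`, `("?v", "ex:region", "?region")` and `("?s", "ex:temp", "?temp")` correspond to the SPARQL query `SELECT ?region ?temp WHERE { ?s ex:locatedIn ?v . ?v ex:region ?region . ?s ex:temp ?temp }`. Evaluated over a CSV-style vineyard table and a JSON-style sensor feed, the query joins the two sources through the shared `?v` variable.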

How Does BDE Handle RDF at Scale?

RDF is notorious for not being as scalable as relational data. Hajira Jabeen says that experiments carried out within BDE suggest this is no longer the case.

What Is There Left To Do?

Every development cycle ends with “if we had more time we would…” Here, Ivan Irmilov considers what he hopes to add to the platform after the Big Data Europe project ends.

What Will I Find In The Big Data Europe GitHub Repo?

Aad Versteden tells you what to look for in the Big Data Europe GitHub repository (take a look around – then go for the Readme!)

Does the platform scale elastically/automatically?

The platform doesn’t currently include support for automatically adding instances of components in response to demand. However, the Docker-based architecture and overall design mean that this capability could readily be added in future.
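
The platform does not do this today. As a hypothetical sketch of what such an addition might compute, the Python below applies a proportional scaling rule of the same form as the one documented for Kubernetes' Horizontal Pod Autoscaler. It also builds the `docker service scale` command that Docker Swarm accepts for changing a service's replica count. The service name, load metric and limits are assumptions. The controller loop that would measure load and run the command is not shown.

```python
import math
from typing import List


def desired_replicas(current: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Proportional rule: desired = ceil(current * current_load / target_load),
    clamped to [min_replicas, max_replicas]."""
    ratio = current_load / target_load
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: avoid flapping
    desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


def scale_command(service: str, replicas: int) -> List[str]:
    """Argument list for the Docker Swarm CLI call that would apply the new replica count."""
    return ["docker", "service", "scale", f"{service}={replicas}"]
```

For example, two replicas running at 90% load against a 60% target would be scaled to three. The tolerance band stops small fluctuations from triggering constant rescaling.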

How can you add other components that are not provided in the set of BDE components?

This is very easy to do and works for any component that can be packaged as a Docker container.

Where can I get and discuss the code?

The code for the project is available in our GitHub repository.

Will Big Data Europe provide a technical solution for organisations to make their first steps in Big Data?

We intend to keep the barrier to entry low. A simplified version of the platform could be installed using a Vagrant setup. Developing Big Data components will be made easier by base Docker images published on the Big Data Europe GitHub repository. These base images and their documentation are continuously evolving.