Add Semantics to your variety of Big Data Sources


Big Data poses several challenges, typically characterised by the three Vs: volume, velocity and variety (a fourth is veracity). Volume and velocity are largely solved by components such as HDFS, Spark and Flink. However, these tools don’t tackle the problem of variety, that is, different data types and non-matching terms in different datasets. Big Data Europe tackles the problem of variety head-on using Semantic Web technologies.

Big Data Europe features the concept of a Semantic Data Lake: data is stored in whatever format it arrives in, but it can be queried and analysed as if it were stored as RDF. Ontario, a realisation of the Semantic Data Lake, accepts SPARQL queries, rewrites them into the native query language of each relevant dataset and runs them there. The results are combined before being returned as a single result set. The SANSA Stack, in turn, takes RDF data as its input and can perform sophisticated analysis, e.g. querying, reasoning or machine learning, over the linked data available in the platform.
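
The query flow described above (one query against a shared vocabulary, rewritten per source, results merged) can be illustrated with a small toy sketch. This is not Ontario's API or implementation: the `ex:` terms, the mappings, the sample records and the function names are all hypothetical, and plain Python dictionaries stand in for SPARQL and for the native query languages of real stores.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical mappings: shared vocabulary term -> source-specific field name.
CSV_MAPPING = {"ex:name": "full_name", "ex:city": "town"}
JSON_MAPPING = {"ex:name": "name", "ex:city": "location"}

# Two toy datasets kept in their own "native" shapes (assumed sample data).
CSV_ROWS: List[Record] = [
    {"full_name": "Ada", "town": "London"},
    {"full_name": "Grace", "town": "New York"},
]
JSON_DOCS: List[Record] = [
    {"name": "Linus", "location": "Helsinki"},
    {"name": "Alan", "location": "London"},
]


def query_source(records: List[Record], mapping: Dict[str, str],
                 select: List[str], where: Dict[str, str]) -> List[Record]:
    """Rewrite a vocabulary-level query into source field names and run it."""
    # Rewriting step: translate shared terms into this source's field names.
    local_where = {mapping[term]: value for term, value in where.items()}
    results = []
    for record in records:
        if all(record.get(field) == value for field, value in local_where.items()):
            # Return results keyed by the shared vocabulary, not local names.
            results.append({term: record[mapping[term]] for term in select})
    return results


def federated_query(select: List[str], where: Dict[str, str]) -> List[Record]:
    """Run the same query over every source and merge into one result set."""
    sources: List[Tuple[List[Record], Dict[str, str]]] = [
        (CSV_ROWS, CSV_MAPPING),
        (JSON_DOCS, JSON_MAPPING),
    ]
    combined: List[Record] = []
    for records, mapping in sources:
        combined.extend(query_source(records, mapping, select, where))
    return combined


people_in_london = federated_query(select=["ex:name"], where={"ex:city": "London"})
print(people_in_london)
```

The caller only ever uses the shared terms `ex:name` and `ex:city`; the non-matching field names in each dataset stay hidden behind the mappings, which is the aspect of variety the Semantic Data Lake is meant to address.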

Thus, the Big Data Europe Integrator offers the full power of the Semantic Web to handle variety, but within a high-performance environment.

Learn more

Build your Big Data Pipeline through a simple graphical UI
Maximum flexibility brought to you by Docker