The day after the main European Data Forum, Big Data Europe teamed up with the HOBBIT and BYTE projects for a joint workshop. About 50 people gathered to hear Axel Ngonga of InfAI describe the HOBBIT project’s aim to provide a system to benchmark big data software and, later in the day, there was much discussion about how this can be used. For example, is it a service in its own right, to be used by others, or would it be a tool used by a consultant as part of a more general service?
Fraunhofer’s Sören Auer introduced BDE and, referring to his presentation in the main EDF event, talked about the idea of the Semantic Data Lake. Technologies like JSON-LD, CSVW and R2RML allow existing data in multiple formats to be converted into Linked Data for easier integration. Finally, Scott Cunningham of TU Delft introduced the BYTE project, which is exploring the economic, legal, social, ethical and political impacts of big data.
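As a rough illustration of the conversion idea behind the Semantic Data Lake, the sketch below turns the rows of a small CSV table into N-Triples using only the Python standard library. It is not an implementation of CSVW or R2RML, which describe such mappings declaratively. The `http://example.org/` namespace, the sample data and the function names are assumptions made for this example, and the input is assumed to be well-formed CSV with a header row.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text: str, key_column: str) -> list[str]:
    """Emit one triple per non-empty cell, using key_column to name the subject."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[key_column])}>"
        for column, value in row.items():
            # Skip the key itself, empty cells and any surplus unnamed fields.
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}vocab#{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


sample = "id,name,city\n1,Alice,Leipzig\n2,Bob,Delft\n"
for line in csv_to_ntriples(sample, "id"):
    print(line)
```

Every value is emitted as a plain string literal here; a real mapping would also assign datatypes and reuse existing vocabularies, which is the kind of detail that CSVW metadata and R2RML mappings are designed to capture.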
The rest of the morning comprised a panel discussion, chaired by W3C’s Phil Archer, that focussed on the apparent conflict between converting everything to Linked Data and the problems associated with complex queries across very large numbers of triples. Sören Auer emphasised that although RDF should be the lingua franca of the semantics, the physical storage is much more likely to be a relational database or a graph store and that access is much more likely to be via a simple API than a SPARQL endpoint. Sonja Zillner of BDVA and BYTE said that in her day job at Siemens, no one talks about semantics as it’s seen as esoteric – but it’s clearly very useful. The problem is that understanding a dataset may be a full-time job. Where’s the incentive to facilitate new uses? And of course both copyright and privacy issues may restrict the level of integration that is legal or wise. As ever, there were calls for better tooling, wider education and more inspirational examples. Martin Strohbach and Axel Ngonga saw plenty of opportunity for benchmarking as a means to help companies, particularly SMEs, make procurement decisions.
There was agreement that in areas like health, the advantages of semantics are clear but they are less well known in industry and that there is a job to do, not to tell manufacturers what the benefits of integrated data can be, but to show them. Peter Winstanley of the Scottish Government argued for greater use of XML within Linked Data frameworks, bemoaning the effort some people expend in turning tables into RDF only to turn it back into tables again. Phil Archer pointed to the recently formed RDF and XML Interoperability Community Group that is looking at exactly this kind of issue.
Sonja Zillner concluded that we need semantics for three reasons:
- for easier data integration;
- to make analytics more flexible;
- to increase the impact of insights by understanding the full context.
But this needs to happen through easy-to-use tools. Everyone agreed that few users want to be disturbed by thoughts of triples moving around.
The final session was much more relaxed as befits a Friday afternoon at the end of a very busy conference week. Most of the discussion centred on how tools like the HOBBIT benchmark can best be exploited and deployed. Is it something that a potential customer would install and run locally? Is it something that should be part of a consultant’s toolkit? Is the target market SMEs or enterprise? There was agreement that context is crucial for any meaningful benchmark. What is the context in which data will be processed and used? The same tool might score very differently depending on the local configuration.
The final topic was licensing. This is the subject of the W3C’s Permissions & Obligations Working Group, which is supported by the BDE project. The experience of most participants was that if data comes with any licence at all, it’s usually one of the well-known ones, such as GPL or Creative Commons. However, it’s not always clear what they mean in detail, which highlights the use case put forward by the European Data Portal that calls for a machine-readable form for the atomic duties and permissions expressed in such licences.
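To make the idea of atomic duties and permissions a little more concrete, here is a minimal sketch of how a licence might be represented so that software can answer simple questions about it. This is not the vocabulary being developed by the Working Group: the class, the function names and the labels such as `"attribute"` are assumptions invented for this example, and the two licence summaries are deliberately simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """A licence reduced to atomic permissions and the duties attached to them."""
    name: str
    permissions: frozenset
    duties: frozenset


# Simplified, illustrative summaries; the labels are hypothetical.
CC_BY = Licence(
    name="CC BY 4.0",
    permissions=frozenset({"reproduce", "distribute", "derive"}),
    duties=frozenset({"attribute"}),
)
CC_BY_SA = Licence(
    name="CC BY-SA 4.0",
    permissions=frozenset({"reproduce", "distribute", "derive"}),
    duties=frozenset({"attribute", "share-alike"}),
)


def is_permitted(licence: Licence, action: str) -> bool:
    """Return True if the licence explicitly permits the action."""
    return action in licence.permissions


def combined_duties(licences: list, action: str):
    """Duties that apply when the action is performed on data under all the licences.

    Returns None if any of the licences does not permit the action.
    """
    if not all(is_permitted(lic, action) for lic in licences):
        return None
    duties = set()
    for lic in licences:
        duties |= lic.duties
    return duties


print(sorted(combined_duties([CC_BY, CC_BY_SA], "derive")))
print(combined_duties([CC_BY], "sell-exclusively"))
```

A check like `combined_duties` is the sort of question that becomes answerable once licences are expressed in machine-readable terms, for instance when integrating datasets that carry different licences.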
Before closing, the workshop heard from the BYTE partners who had been continuing their discussion in parallel. They identified many areas for future research but, perhaps less helpfully, declared them all to be urgent! Their roadmap will be fed back to the Big Data Value Association at their summit in Valencia at the end of November.