The ISO/IEC Joint Technical Committee 1 Working Group 9 on Big Data Standards has been meeting in Dublin this week. Ahead of that meeting, members of WG 9 invited a range of stakeholders, including Big Data Europe, to Dublin City University on Monday to take an overview of the standards landscape. The International Workshop on Big Data Standards 2016 heard from speakers such as the head of Ireland’s Development Agency, Leo Clancy, the European Commission’s Head of Unit for Smart Cities and Sustainability, Colette Maloney, and chair of the CEN Workshop on ICT Skills, Dudley Dolan. All speakers emphasised the need for cooperation and standardisation if Big Data and the Internet of Things – increasingly seen as two aspects of the same thing – are to be the basis of European success.
Other speakers gave examples of working with standards. Dave Lewis of Trinity College Dublin talked about the importance of standards in his work in language technologies, and John Strassner, CTO of Huawei’s Software Labs in the Americas, focused on his work using the Semantic Sensor Network Ontology.
It was interesting to hear about the ALIGNED project in which BDE partners Semantic Web Company and the University of Leipzig are also participating. ALIGNED will be applying Big Data technologies to the task of extracting, processing and sharing Web data, dealing with the challenges of dynamism, complexity, scale and inconsistency.
Two themes recurred throughout the workshop:
- the need for the semantification of data;
- the importance of licences and access.
Data – on any scale – can only be integrated if its meaning is clear. Data is always collected or created within a specific context. Once the data is shared, that context can very easily be lost, rendering the data unusable by others. This is why the BDE tool chain will be engineered to take full advantage of available metadata associated with the datasets used.
A feature of the BDE platform is that it can be instantiated and configured locally. Although it can certainly make use of open data sources, it is equally useful in semi-open or closed environments. In this regard, the standardisation work getting under way at W3C on Permissions & Obligations is timely. This work is supported directly by the BDE project, and it was encouraging to hear so many people at the Dublin workshop talking about the need for machine-processable licences and rights statements.
In my own slot, I talked about the BDE platform, of course. The presentations were short, so I only had time to talk about one of our pilots – the re-engineering of OpenPHACTS to take advantage of the BDE platform’s ability to work at scale and the expected increase in the ease of data processing. The general topic of the semantification of data and its close link with IoT led me also to talk about the Semantic Sensor Network Ontology, which is being put through the formal standards process of both W3C and OGC through the Spatial Data on the Web Working Group. Top of the list of Best Practices recognised by that group? The use of persistent HTTP URIs as identifiers at entity level within any dataset. This in turn builds on W3C’s broader Data on the Web Best Practices.
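To make the idea of entity-level identifiers concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `data.example.org` namespace, the `mint_uri` helper, the choice to normalise identifiers to lower case and the sample sensor records are all hypothetical, not part of the BDE platform or of any W3C or OGC document.

```python
from urllib.parse import quote

# Hypothetical namespace; a real publisher would use a domain it controls
# and commits to keeping stable over time.
BASE = "http://data.example.org/sensor/"


def mint_uri(local_id: str) -> str:
    """Derive an HTTP URI from a record's local identifier alone,
    so the same entity always receives the same URI."""
    # Normalising case and percent-encoding are illustrative choices.
    return BASE + quote(local_id.strip().lower(), safe="")


# Made-up sample records standing in for rows of a dataset.
records = [
    {"id": "TH-001", "type": "thermometer"},
    {"id": "RG 17", "type": "rain gauge"},
]

# Attach an entity-level identifier to each record, leaving other fields as they are.
identified = [{**r, "uri": mint_uri(r["id"])} for r in records]
```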
But there is a problem.
Standards are not written by standards bodies. Nor are they written in isolation. They are written by members of the community who benefit by collaborating to define common methodologies; and in that regard I was very encouraged by Ed Curry’s presentation. As well as being a researcher at Ireland’s Insight Centre, Ed is Vice President of the Big Data Value Association. Prompted by the BDE project and supported directly by my W3C colleague Felix Sasaki, as well as one of the organisers of the Dublin event, Abdellatif Benjelloun Touimi of Huawei, BDVA has an increasingly active discussion around standardisation.
It was highlighted in Dublin that standards work only begins when a dominant method has been established after wide experimentation. Recognising that, the BDVA is working to identify which standards need developing and, crucially, which partners are most motivated to put in the effort to create each standard and which standards development body is most appropriate for each work item. As I emphasised, when a community believes that someone else should develop standards on its behalf, the meetings are entirely hypothetical.