This blog post is based on a forthcoming paper by Antonis Troumpoukis, Angelos Charalambidis, Giannis Mouchakis, Stasinos Konstantopoulos, Ronald Siebes, Victor de Boer, Stian Soiland-Reyes and Daniela Digles, which will be presented at the BLINK workshop at the ISWC conference.
In the Big-Data-Europe project, the technical infrastructure is based on requirements and insights from the seven societal domains. The Health domain provides a nice example of how domain-specific insights can be used to test generic infrastructure.
Measuring the performance of Semantic Web query processing platforms requires query benchmarks. Current benchmarks are often based on technological features of the queries and not so much on how realistic those queries are. The SC1 Health pilot replicates the functionality of the Open PHACTS drug discovery platform (https://dev.openphacts.org/). To both develop and validate the original platform, 20 questions based on realistic workflows were gathered by pharmacological domain experts. An example is Question 16: “Targets in Parkinson’s disease or Alzheimer’s disease are activated by which compounds?”. The effectiveness and efficiency of the platform were evaluated against these questions.
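To give a flavour of what such a question looks like when posed to a triple store, Question 16 could be sketched as a SPARQL query along the following lines. Note that the `ex:` vocabulary, class names, and property names below are illustrative placeholders, not the actual Open PHACTS schema:

```sparql
PREFIX ex: <http://example.org/pharma/>

# Hypothetical sketch of Question 16: which compounds activate
# targets associated with Parkinson's or Alzheimer's disease?
SELECT DISTINCT ?compound ?target
WHERE {
  ?target   ex:associatedWithDisease ?disease .   # target-disease links
  ?compound ex:activates ?target .                # compound-target activity
  FILTER (?disease IN (ex:ParkinsonsDisease, ex:AlzheimersDisease))
}
```

In a federated setting, the target-disease and compound-target triple patterns would typically reside in different source datasets, which is precisely what makes such questions a demanding test for federated query processors.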
These rather complex queries and workflows form the basis of the query set of our benchmark. As such, it is a query set that is simultaneously complex, frequently used, and independently motivated. The benchmark engine and the queries are available as open source (https://github.com/semagrow/kobe), and all datasets are also publicly available. This is work in progress, and we are currently using the new benchmark to test state-of-the-art federated query processing systems.