It gives the BDE Team great pleasure to announce the second public release of the open source BigDataEurope Integrator Platform (BDI). The platform aims to facilitate and simplify the use of big data technologies by providing easy-to-use interfaces and deployment options. Within this framework, the BDE Project offers practical solutions to big data challenges, as identified and elicited in the context of the seven Societal Challenges, which constitute BDE's Big Data focus areas.
Over the last months, the BDE Platform has been significantly improved and further developed. The Platform has been upgraded and currently uses Docker 1.12, thus benefiting from the new features added to Docker. For instance, Docker Engine now offers multi-host and multi-container orchestration that is simple to use and accessible to everyone. Docker 1.12 networking plays a key role in enabling these orchestration features. In this new release, BDI uses the following:
- Swarm-mode networking
- Routing Mesh
- Ingress and Internal Load-Balancing
- Service Discovery
- Multi-host networking with integrated KV-Store
- Fault tolerance
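To illustrate how these features fit together, the commands below sketch a minimal swarm-mode setup; the IP address, network name and service are illustrative placeholders, not part of the BDI release itself.

```shell
# Initialise swarm mode on the first node, which becomes a manager
docker swarm init --advertise-addr 192.168.99.100

# Additional nodes join with the token printed by `swarm init`:
#   docker swarm join --token <worker-token> 192.168.99.100:2377

# Create a multi-host overlay network; swarm mode uses its
# integrated KV-store, so no external store needs to be set up
docker network create --driver overlay bde-net

# Start a replicated service; the published port 8080 is exposed
# through the routing mesh, so any node in the swarm accepts
# requests and ingress load-balancing spreads them over replicas
docker service create --name demo --network bde-net \
  --replicas 3 -p 8080:80 nginx

# Built-in service discovery: other containers on bde-net can reach
# the service simply by its DNS name "demo"
docker service ls
```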
As a result, we can now create multiple containers on multiple nodes using Docker Compose. Docker Compose V2 and Docker Swarm aim for full integration: a Compose application can be pointed at a Swarm cluster and used in the same manner as if it were running on a single Docker host.
Briefly stated, the new BDI release uses Docker 1.12 and Docker Compose V2, while coordinating volume and variable management so that containers can be scaled dynamically on the Swarm cluster.
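As a minimal sketch of this workflow, a Compose V2 file for a small Spark setup might look as follows; the service and image names are illustrative assumptions, not the exact files shipped with BDI.

```yaml
version: "2"
services:
  spark-master:
    image: bde2020/spark-master
    ports:
      - "8080:8080"
  spark-worker:
    image: bde2020/spark-worker
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    depends_on:
      - spark-master
```

With the Docker client pointed at the swarm, `docker-compose up -d` schedules the containers across the cluster, and a command such as `docker-compose scale spark-worker=3` scales the workers dynamically.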
Previous Swarm versions were very simple and worked out-of-the-box with all Docker client commands, but lacked features such as service discovery, replication management and load balancing. With the Docker 1.12 release, these gaps have been narrowed, bringing Swarm much closer to Kubernetes in terms of orchestration features. Swarm now presents a better choice for shifting from a local/development environment to a cluster.
The resulting BDI remains easy to deploy, easy to use and adaptable (cluster-based and standalone) for the execution of big data frameworks and tools. The BDE Team provides baseline Docker images for Apache Hadoop, Apache Spark, Apache Flink and many others. We selected these components based on the requirements gathered from the participating Societal Challenges. The Platform thus makes it feasible to perform a great variety of big data tasks, including message passing (Kafka, Flume), storage (Hive, Cassandra) and publishing (Geotriples).
The Platform is tested by the use cases related to each of the Societal Challenges, where the Docker components provided by BDI are used to implement the desired workflow.
The BDE Team has also developed supporting components: the Integrator UI; an Init daemon that enables the creation of workflows and monitors the start-up status of interdependent Docker components; the Pipeline service and Pipeline Builder, which support the creation of workflows; a Pipeline Monitor frontend, which shows the current status of Docker components; and the Swarm UI, which visualises the status of your Swarm cluster.
Moreover, the BDE Team continues to advance its research efforts on smart data. The first release of SANSA (Semantic Analytics Stack) was announced on 9 December. SANSA makes it possible to perform analytics on semantically structured RDF data by providing numerous out-of-the-box scalable algorithms for massive datasets.
The BDI Platform released today represents our ongoing effort to exploit Docker, Docker Compose and Swarm, and to develop easy-to-use interfaces for monitoring and deployment. Users can follow a simple set of instructions to install the platform and then start working with it right away. In the upcoming release, you can expect an integrated interface that will allow the creation and monitoring of a custom workflow by selecting the desired components.