Artificial intelligence (AI) and machine learning are among the hottest and most hyped areas in big data. From robotics to self-driving vehicles to DeepMind’s AlphaGo, in recent years the media has shown us a multitude of potential applications of AI to a broad range of topics and research. The health sector is no exception – already we’re seeing the beginnings of AI and machine learning approaches to big data in health, from the earliest stages of drug discovery through to treating patients in the clinic and large-scale analytics of patient data.
Machine learning has been playing a role in data analysis for decades. The origins of this go all the way back to the 1950s, with Alan Turing’s seminal work on early “thinking machines”, and the area has been evolving ever since. So what has changed in the last five years for us to see such acceleration in AI and machine learning?
Part of this is simply down to the ever-increasing performance of computer processors (CPUs) and the growth of GPUs. Another factor is the explosion of accessible data across social and scientific fields.
One example of how access to large datasets supports machine learning is in the countless images that power Google and Facebook’s image recognition software. Without the billions of pictures we’ve shared of our furry pets, we would never have seen such quick evolution from basic image tools to machines that can spot a cat amongst other animals! These humble beginnings have grown to support applications in life sciences, benefitting the neural networks used for image analysis in research. For example, a number of pathology-related start-ups now offer AI-based image analysis that helps scientists with quicker detection and identification of issues.
Over the last 3-5 years, the development of accessible AI toolkits (e.g. Caffe, Theano, TensorFlow, Keras) has taken advantage of the improvements in computer processing and access to big data, bringing together the key factors to accelerate progress and open up all kinds of future potential.
So, what do we need to continue to realise the benefits of AI in health data?
First of all, AI still requires data, and lots of it. More specifically it requires good quality data; a machine learning model is only as good as the data it is based on, and good data are necessary to build robust models that can deal with edge cases and complex problems.
As we’ve seen in our SC1 pilot project, perhaps the biggest challenge to bringing together large, good-quality datasets in health is data variety. Many areas of health deal with data that comes from different places, is generated by different technologies, and is stored in different formats and databases. To realise the benefits of big data in health, all of this data needs to be integrated and mapped together in a way that still maintains the provenance and special characteristics of the original data.
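As a rough illustration of what integrating data while keeping its provenance can look like, here is a minimal Python sketch. Everything in it is an assumption made for the example and not a description of the SC1 pilot: the two source systems, their field names (`pid`, `glucose_mg_dl`, `patient`, `test`, `result`) and the `Observation` record are hypothetical. It maps records from two differently shaped sources onto one common schema, converts units where needed, and keeps both the name of the source and the untouched original record alongside each harmonised value.

```python
from dataclasses import dataclass

# Glucose has a molar mass of about 180.156 g/mol, so 1 mmol/L is about 18.0156 mg/dL.
GLUCOSE_MG_DL_PER_MMOL_L = 18.0156


@dataclass(frozen=True)
class Observation:
    """A harmonised measurement that still carries its provenance."""
    patient_id: str
    analyte: str
    value_mmol_per_l: float
    source: str      # provenance: which system the record came from
    original: dict   # provenance: copy of the unmodified source record


def from_clinic_row(row: dict) -> Observation:
    # Hypothetical source A: flat rows, glucose reported in mg/dL as text.
    return Observation(
        patient_id=row["pid"],
        analyte="glucose",
        value_mmol_per_l=float(row["glucose_mg_dl"]) / GLUCOSE_MG_DL_PER_MMOL_L,
        source="clinic_csv",
        original=dict(row),
    )


def from_lab_record(rec: dict) -> Observation:
    # Hypothetical source B: nested records, values already in mmol/L.
    return Observation(
        patient_id=rec["patient"]["id"],
        analyte=rec["test"].lower(),
        value_mmol_per_l=float(rec["result"]),
        source="lab_json",
        original=dict(rec),
    )


def integrate(clinic_rows: list, lab_records: list) -> list:
    """Map both sources onto the common schema without discarding the originals."""
    return ([from_clinic_row(r) for r in clinic_rows]
            + [from_lab_record(r) for r in lab_records])
```

Keeping the original record next to the harmonised value means a later analysis can always trace a number back to where it came from and how it was transformed, at the cost of storing more data per observation.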
Secondly, we need a step change in awareness of the key skills needed both to build good-quality models and to interpret their accuracy and utility. Various groups have already begun supporting greater awareness of AI applications in life sciences, and building up communities in this area – for example the Pistoia Alliance’s open community for AI projects.
AI is already having clear impacts in the drug discovery space. We are seeing enhancements to Quantitative Structure-Activity Relationship (QSAR) modelling and prediction processes for drug candidates, and Open PHACTS is supporting the use of existing datasets for model building and analysis in this area. GlaxoSmithKline has recently signed a deal with Exscientia to explore the impact of AI-driven chemistry approaches in early drug target identification, and companies like BenevolentAI are taking this a step further, using AI to build an extensive pre-clinical and clinical pipeline of drug programmes.
Looking further ahead at broader potential impacts, the foundational infrastructure that Big Data Europe has created will be a great starting point for using AI to support wider analysis across different domains. We can imagine applications connecting environment and population health data to build better models and analysis of relationships between the two.
Perhaps there is room here for an AI Europe (AIE) initiative to build on these big data themes, and ensure that the benefits of Big Data Europe continue to grow beyond 2020.
Again though, the one common factor needed for success in AI approaches to health data will be the ability to tackle data variety and provide AI models with the high-quality, well-integrated datasets they require. Big data approaches hold the key to unlocking value in health data, but tackling the data variety issue in this domain is where the real impact can be had.