Don’t settle for black box: why only explainable AI is built for scientific discovery

At first glance, it doesn’t seem so. We find that across life sciences fields of research, there has been only a moderate increase in AI-related publications since 1970 (especially in comparison to computer science and business management). Of all research fields surveyed, ‘pharmacology, toxicology and pharmaceutics’ (research on the clinical behaviour, pharmacokinetics, chemical properties and associated toxicity of drugs) saw the lowest rate of increase in AI publishing intensity. Here, the percentage of research utilising AI reached only 2.2% after more than 50 years (see Figure 1).2

AI is a very broad term, but there are a lot of things that AI is not: AI is not automation, it’s not a statistical model and it’s not pre-programmed software or a rule-based system. No matter what a vendor or supplier states, if what you’re using does not have the ability to learn or improve over time from new data, then it is not AI. So what actually is AI? When people say AI, they’re typically referring to artificial narrow intelligence (ANI). This includes subfields whose names are often used interchangeably – eg, ‘machine learning’, ‘deep learning’ and ‘neural networks’ – but in reality, there are nuances between them. Generative AI (the form of AI popularised by ChatGPT and DALL-E), although powerful, is still considered ANI.

Most forms of AI are good at generating an accurate prediction, but they are not good at explaining ‘the why’ behind their predictions. This is why AI is often referred to as a ‘black box’. Such predictions are generated when data is fed through thousands or even millions of interconnected nodes, arranged in layers, that form a network. A single node may be connected to several nodes in the layers before and after it. It’s a small comfort that data is ‘fed forward’, meaning that data only moves through these interconnected layers of nodes in one direction. It’s still an immensely complicated and tangled system that is practically impossible to unravel.
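To make the ‘fed forward’ idea concrete, here is a minimal sketch of a tiny feed-forward network in plain Python. The layer sizes, the random weights and the four input values are all hypothetical, chosen only for illustration; real models are far larger and are trained rather than left random. The point is that the output is a single number, and nothing in the hundred or so weights says why.

```python
import math
import random


def make_layer(n_inputs, n_outputs, rng):
    """Create one fully connected layer with random weights and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def forward(layers, features):
    """Pass the inputs through each layer in order, in one direction only."""
    activations = features
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sizes: 4 molecular descriptors -> 8 nodes -> 8 nodes -> 1 output.
sizes = [4, 8, 8, 1]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Count every connection (weight) between nodes in adjacent layers.
n_weights = sum(len(w) * len(w[0]) for w, _ in layers)

# Hypothetical descriptor values for one molecule.
score = forward(layers, [0.2, -1.3, 0.7, 0.05])[0]
print(f"{n_weights} weights produced a single score: {score:.3f}")
```

Even this toy network has 104 separate connections feeding one output; scaling the same structure to millions of nodes is what makes the route from input to prediction so hard to trace.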

To squeeze every bit of predictive power out of these millions of entangled and interlaced nodes, black-box models require you to feed them more and more data (this is what is meant by ‘data hungry’). The data first travels randomised routes through the network, and then the same routes over and over again as you ‘train’ it. Every data point you feed it goes through this process, whether or not it’s actually important to the prediction. This is why, at the end of the process, you’re left with only a prediction: a very tangled ball of millions of routes does not provide you with much value.

Let’s say you wanted to optimise lead candidates by modelling drug properties such as efficacy and toxicity. Black-box AI would be good at generating a single, accurate prediction (‘active, no’ or ‘toxic, yes’). But if that prediction wasn’t what you were looking for, you wouldn’t be able to determine the route travelled to arrive at that answer. You would simply have to try again and again with more data until you received your desired prediction (‘active, yes’ or ‘toxic, no’).

A scientist would read the model in this way: ‘There is a normal (Gaussian) distribution relationship between target binding energy and last dimers. I can predict the activity of a molecule if I consider that in addition to the first dimer.’ Now this is new knowledge, and it’s an explainable, understandable phenomenon, not just a prediction. A scientist can check that hypothesis against published literature, or it can serve as a new hypothesis to be tested. The icing on the proverbial cake is that symbolic AI requires much less data to come to this logical conclusion.
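As a rough illustration of what ‘reading the model’ can look like, the sketch below writes a Gaussian relationship as a short, human-readable formula. This is not the model discussed in this article: the function names, the way the two terms are combined and every numeric parameter (`centre`, `width`, `weight`) are assumptions made purely for illustration.

```python
import math


def gaussian_term(x, centre, width):
    """Bell-shaped response: equals 1.0 at the centre and falls off on both sides."""
    return math.exp(-((x - centre) / width) ** 2)


def predicted_activity(first_dimer_score, last_dimer_energy,
                       centre=-7.0, width=1.5, weight=0.6):
    """Hypothetical readable model: a weighted mix of a first-dimer score and a
    Gaussian term in the last-dimer binding energy. All defaults are invented."""
    bell = gaussian_term(last_dimer_energy, centre, width)
    return (1 - weight) * first_dimer_score + weight * bell


# Every part of the formula can be inspected and questioned on its own.
at_peak = predicted_activity(first_dimer_score=0.5, last_dimer_energy=-7.0)
off_peak = predicted_activity(first_dimer_score=0.5, last_dimer_energy=-3.0)
print(f"at peak: {at_peak:.2f}, off peak: {off_peak:.2f}")
```

Unlike a network of weights, a formula of this kind states its claim outright: activity is highest near one particular binding energy and drops away on either side, which is a statement a scientist can test.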

An example using black-box AI

Scientists ask ‘why’ questions. Can AI ever answer ‘why’?

Why can symbolic AI answer ‘why’ questions, while sub-symbolic AI cannot?

An example using symbolic AI

Symbolic AI is the path forward for science