Industry Insight


Cutting through the noise: Data science and COVID-19

Georgios Tsatsaronis, Vice President Data Science Research Content at Elsevier, illuminates the role data scientists play in the search for new and innovative therapeutics

While artificial intelligence (AI) and machine learning (ML) have long been considered the future of research within the life science industry, real-world, impactful applications for these technologies have been limited. Like many things, this changed during the COVID-19 pandemic. The urgency that the pandemic created, coupled with advances in software and hardware, has paved the way for scientists to apply AI and deliver significant impact. Today, AI is revolutionising how data is managed and used within the life science and R&D sectors.

Data scientists play an essential role in the search for new therapeutics, and although there is an abundance of data to work with, scientific questions are becoming only more complex, while the demand for answers is becoming more urgent. To overcome the mounting pressure on research and discovery teams, the industry requires tools that can help filter and select the relevant data quickly. This is where AI and ML come in.

Keeping up with the ever-growing encyclopaedia of knowledge

In a saturated and growing data landscape, finding the relevant information to inform a discovery project can seem impossible. The timely, accurate, peer-reviewed, and efficient communication of any novel research findings is key in driving innovation. Research teams cannot be expected to read and process all the published scientific literature on a topic before setting out on their own experiments. But it is also vitally important that work is not repeated, and teams are as well-informed as possible before they undertake new projects.

Research tools need to be capable of reliably searching for and extracting only the relevant information from literature, and of providing automated results so that teams can design better-informed projects. For AI and ML tools to help, they must be able to recognise which documents are relevant. However, poorly refined models will result in unrelated papers being flagged – or worse, researchers missing out on valuable data that was mistakenly filtered out.
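The two failure modes described above, unrelated papers being flagged and valuable papers being filtered out, correspond to the standard measures of precision and recall. The following is a minimal sketch in Python of how a team might quantify them for a relevance filter; the function name and the document IDs are hypothetical and used for illustration only, not drawn from any particular tool.

```python
def precision_recall(flagged, relevant):
    """Return (precision, recall) for a set of flagged document IDs.

    precision: share of flagged documents that are truly relevant
               (low precision = many unrelated papers flagged).
    recall:    share of truly relevant documents that were flagged
               (low recall = valuable papers mistakenly filtered out).
    """
    flagged, relevant = set(flagged), set(relevant)
    true_positives = len(flagged & relevant)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
relevant_docs = {"d1", "d2", "d3", "d4"}   # papers a domain expert judged relevant
flagged_docs = {"d1", "d2", "d5"}          # papers the filter returned

precision, recall = precision_recall(flagged_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the filter returns one unrelated paper and misses two relevant ones, so both measures fall short of 1.0. Which of the two matters more depends on the project; the article's point that missed data is the worse outcome would argue for prioritising recall.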

The COVID-19 data influx

As the research community mobilised in response to COVID-19, huge volumes of data and literature were produced in a short space of time. These resources were essential to those developing therapeutics and vaccines, but the volume and velocity of information produced made it difficult to find relevant data. As the rate of COVID-19 publications increased, so did the challenge of sifting through them.

The core needs of scientists in any area with rapidly growing amounts of data can be categorised as follows. For data to be of most value, it must be:

Timely: Clinicians, biologists, chemists, and others interested in the data need to be able to locate answers quickly. The more time spent searching for information, the less productive researchers can be. Finding pertinent data fast can speed up treatment discovery and development times, ultimately saving lives.

Accurate: Researchers need to be able to rely on results; inaccuracy can lead to wasted time and costs, hampering innovation, and even to potential safety issues that emerge further down the line.

Relevant: Often, researchers come across an article that is related to their topic but has no relevance to their study. For example, an article about the financial or social impact of COVID-19 does not help those developing vaccines and treatments for the disease.


Cutting through the noise

To meet these needs, researchers need a reliable data filtration system to cut through the noise and help them locate and extract the relevant insights.

The promise of such algorithms is not limited to COVID-19. AI and ML frameworks have far-reaching potential for other diseases. While the scale of COVID-19 is anomalous, methods that facilitate more efficient literature reviews can benefit research teams working across a wide range of diseases.

Scientific literature holds the answers to researchers’ questions, but thoroughly inspecting every publication manually is an insurmountable task. This hinders treatment progress, which ultimately impacts lives. It’s imperative that we harness the power of evidence-based data science to overcome these challenges, and help enable timely access to answers that guide treatment decisions.

Georgios Tsatsaronis is Vice President Data Science Research Content, Elsevier