Insider Interview

Using genomics to uncover the causes of rare diseases and improve their treatment

Neil Ward from PacBio tells Pharmafocus about the use of whole genome sequencing in diagnosing and treating rare diseases, and considers other areas in which it could be utilised

Pharmafocus: What is long-read WGS and how does it differ from other sequencing approaches?

Neil Ward (NW): Most of the whole genome sequencing (WGS) done to date has used short-read sequencing technology. This approach takes a DNA sample and smashes it into billions of tiny fragments. Those fragments are copied and sequenced in a parallel process that produces billions of sequences 150 base pairs in length. Sequences are aligned to a reference genome and computer programmes identify the differences. Each person’s genome has millions of small genetic variants and sophisticated algorithms can compare those variants to databases of other people’s genomes to determine if they are potentially causative of disease.
Long-read WGS follows the same path but produces accurate sequencing information from DNA molecules that are at least 100 times larger, in the range of 15,000-20,000 base pairs. This greatly simplifies the interpretation of the sequence data, in the same way that a child’s jigsaw with a few large pieces is much easier to assemble than one with thousands of pieces that look identical to one another. The computer programmes that align short-read fragments back to the reference genome often struggle to determine where fragments have come from as the human genome has many stretches of DNA that are very similar. There are approximately eight single nucleotide variants in each 10,000 base pair stretch of DNA, but since short reads are only 150 base pairs long, many of these reads have no variation to help determine which chromosomes they have originated from.
Overall, this means short reads lead to less accurate variant calls, and accurate calls are necessary for the effective diagnosis of disease. It also means more time and resources spent on bioinformatics for scientists in the lab. Historically, short-read genome sequencing has been significantly cheaper and higher throughput than long reads, but recent innovations have closed the gap. Many researchers are now turning to long reads to get a more complete understanding of the genome.
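The read-length arithmetic above can be made concrete with a short sketch. It takes the figure quoted in the interview (roughly eight single nucleotide variants per 10,000 base pairs) and estimates how often a read of a given length contains no variant at all. The Poisson model, which treats variants as evenly and independently spread along the genome, is a simplifying assumption introduced here for illustration and is not part of the interview; real variants cluster, so the numbers are indicative only. The function names are hypothetical.

```python
import math

# Figure quoted in the interview: ~8 single nucleotide variants per 10,000 bp.
SNV_PER_BP = 8 / 10_000


def expected_snvs(read_length_bp: int) -> float:
    """Average number of variants expected in a read of this length."""
    return SNV_PER_BP * read_length_bp


def prob_no_snv(read_length_bp: int) -> float:
    """Chance that a read carries no variant at all.

    Assumption (not from the interview): variants follow a Poisson
    distribution, i.e. they are spread uniformly and independently.
    """
    return math.exp(-expected_snvs(read_length_bp))


if __name__ == "__main__":
    for length in (150, 15_000, 20_000):
        print(
            f"{length:>6} bp read: "
            f"expected variants = {expected_snvs(length):.2f}, "
            f"P(no variant) = {prob_no_snv(length):.2e}"
        )
```

Under these assumptions, a 150 base pair read is expected to carry about 0.12 variants, so most short reads carry none, whereas a 15,000 base pair read is expected to carry about 12.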

Pharmafocus: How will understanding genetic causes of rare diseases help discover new treatments?

NW: Since more than 70% of rare diseases have genetic origins, deploying genomic technologies helps researchers better understand the causal mechanisms of rare diseases, and enables them to find potential new drug targets and biomarkers to develop novel treatments.1
There are hundreds of genes known to be clinically important to rare diseases. However, many of these genes span multiple regions of the genome and evade common sequencing tests. For example, exome sequencing is the first-line test for many rare diseases but is limited to looking for small genetic changes (under 50 base pairs) associated with specific rare diseases. Short-read whole genome sequencing (srWGS) is typically offered next but, as outlined, it has shortcomings in identifying longer variants and their location within the genome. Examples of such genes include human leukocyte antigen (HLA) genes associated with the immune system, which are highly polymorphic, or regions where variations in copy number or orientation can influence traits such as drug metabolism.
To unlock new routes to treatment, complex variations cannot be missed. The ability to see the full genome is critical to understanding how genetic variations drive susceptibility to disease, response to therapies and many other phenotypes. This depth of understanding can only be achieved with long-read sequencing.

Pharmafocus: How can this technology translate into noticeable improvements in rare disease care?

NW: The first step in improving rare disease care is for researchers to gain a complete understanding of the genetic variants that underpin diseases. Yet, many healthcare systems today aren’t doing a deep enough investigation. NHS England is one of the most advanced healthcare systems and short-read whole genome sequencing is routinely used in attempts to help diagnose rare diseases. However, the deficiencies of short-read sequencing mean that under half of the participants receive an explanation for their disease, with an average wait of four to five years for those that eventually do.2 Investing in long-read sequencing has the potential to significantly increase the number of participants gaining insight into their disease. Better understanding of specific genetic variants causing a disease can result in better management of the condition, including improved development of potential therapies.

Pharmafocus: Which rare diseases are most likely to benefit from developments in genomic sequencing technology and why?

NW: We believe long-read sequencing will help provide a more comprehensive picture of the genome and will have benefits across all diseases. Many rare diseases are difficult to solve because the regions of the genome typically affected are hard to assess with short-read sequencing. For example, diseases of the nervous system, such as myotonic dystrophy, are caused by the expansion of simple repeat sequences. Accurate long-read sequencing enables scientists to determine both the length of those repeat stretches and their nucleotide sequences, helping predict disease pathogenesis. Copy number variations have been shown to be a major contributor to many diseases and long-read sequencing characterises those complex genomic changes more accurately than short reads. We expect that this will have significant benefits to individuals with eye disorders or hearing loss.
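As a toy illustration of measuring a repeat stretch, the sketch below counts the longest uninterrupted run of a repeat unit inside a single read. It is a minimal example under stated assumptions, not a description of any production repeat-genotyping tool: the function name and the example read are hypothetical, and the CTG unit is used because CTG expansion is the commonly cited cause of myotonic dystrophy type 1, a detail that comes from general knowledge rather than from the interview.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match is one uninterrupted run; convert its length to a copy count.
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical read: unique flanking sequence on either side of 12 CTG copies.
read = "ACGTTGCA" + "CTG" * 12 + "GGATCCAT"

if __name__ == "__main__":
    print(longest_repeat_run(read, "CTG"))
```

The count is only meaningful when one read spans the whole repeat together with unique sequence on both sides, which is why read length matters for long expansions.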

Pharmafocus: Other than rare diseases, where do you think genomic sequencing will have the biggest impact?

NW: In cancer research, the two main current approaches to genomic sequencing, short and long reads, each bring value to different scenarios. In contrast to rare disease research, short-read sequencing has utility in cancer research where the depth and length of long reads are unnecessary, such as for single-nucleotide polymorphism (SNP) calling or sequencing microRNAs. Data from short reads helps to track residual cancer more accurately and aid ongoing cancer screening and early detection. Progress in the sensitivity and specificity of short reads means read results now have far fewer errors, reducing the number of false positives while increasing biological insight.
In other oncology use cases, long reads are required for exposing challenging variant types associated with cancer, like tandem repeats. This affords a deeper understanding of individual cancers and can advance precision oncology research. Long reads also reveal insights into the epigenome, which is where many early genetic changes related to cancer first show.3 With third-generation sequencing technologies making long-read technology more accessible, it’s more feasible to integrate long reads into cancer studies at scale, enabling the discovery of new biomarkers associated with the risk of cancer.
Another area that will benefit from advances in sequencing is pharmacogenomics (PGx) research. PGx analyses the role of the genome in drug response and enables better outcomes for patients by tailoring prescriptions according to their genomic profiles. For example, if the NHS had a better understanding of which CYP2D6 variant patients had, it could save more than £41m a year on prescribing of SSRIs and opioids, which are heavily linked to the gene. Advancing PGx research relies on having a comprehensive and accurate view of the human genome, which can be achieved with the latest long-read whole genome sequencing.

Pharmafocus: How do you see genomic sequencing developing in the next five years, specifically in terms of rare diseases?

NW: We expect long reads to become even more accessible as costs come down and the scale of sample processing increases. For example, a whole genome sequence test cost $100m in 2001, but is now available for $1,000 with a 24-hour sequencing time. These advances mean researchers and institutions are increasingly adopting long-read sequencing. With more long-read data being collected, the bank of rare disease research will grow, fostering a global rare disease research ecosystem that offers hope to patients and their families.
The next major breakthroughs will come from layering additional data types on top of long-read sequences. We believe that sequencing full-length RNA will afford researchers a better understanding of how non-coding variations in the genome impact the expression of different gene isoforms. We also expect other ‘omics data, plus the continued digitisation of health information, will improve the ability to understand the complex biology of diseases.

Neil Ward is vice president and general manager for PacBio across Europe, Middle East and Africa. Ward is a genomics industry veteran with more than two decades of global experience. He has a passion for the role genomics can play in bettering human health and believes that this can be achieved by accelerating the utility of in-depth, highly accurate genomic applications.
In his various commercial roles, Ward has served as a key contributor to many of the world’s largest genomics projects, including Genomics England’s 100,000 Genomes Project, the Estonian Genome Project and the whole genome sequencing of the 500,000 UK Biobank samples.