Using genomics to uncover the causes of rare diseases and improve their treatment

Neil Ward (NW): Most of the whole genome sequencing (WGS) done to date has used short-read sequencing technology. This approach takes a DNA sample and smashes it into billions of tiny fragments. Those fragments are copied and sequenced in a parallel process that produces billions of sequences 150 base pairs in length. Sequences are aligned to a reference genome and computer programmes identify the differences. Each person’s genome has millions of small genetic variants and sophisticated algorithms can compare those variants to databases of other people’s genomes to determine if they are potentially causative of disease.
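As a rough illustration of the align-and-compare step described above, the following Python sketch places a short read on a reference by trying every position and keeping the one with the fewest mismatches, then reports the differences. The function names, the toy sequences and the brute-force search are all assumptions made for illustration. Production aligners use indexed, gap-aware algorithms rather than this exhaustive scan.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str):
    """Return (fewest mismatches, list of positions achieving that score)."""
    best, hits = None, []
    for pos in range(len(reference) - len(read) + 1):
        distance = hamming(read, reference[pos:pos + len(read)])
        if best is None or distance < best:
            best, hits = distance, [pos]
        elif distance == best:
            hits.append(pos)
    return best, hits


def call_variants(read: str, reference: str, pos: int):
    """List (reference position, reference base, read base) for each mismatch."""
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if base != reference[pos + i]
    ]


# Toy data (hypothetical): the read carries one C>G change relative to the reference.
reference = "ACGTTGCAAGGCTTACGATC"
read = "GCAAGGGTTA"

mismatches, positions = align_read(read, reference)
variants = call_variants(read, reference, positions[0])
print(mismatches, positions, variants)
```

When `align_read` returns more than one position, the read cannot be placed unambiguously, which is the situation short reads face in repetitive parts of the genome.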

Long-read WGS follows the same path but produces accurate sequencing information from DNA molecules that are at least 100 times larger, in the range of 15,000-20,000 base pairs. This greatly simplifies the interpretation of the sequence data, in the same way that a child’s jigsaw with a few large pieces is much easier to assemble than one with thousands of pieces that look identical to one another. The computer programmes that align short-read fragments back to the reference genome often struggle to determine where fragments have come from as the human genome has many stretches of DNA that are very similar. There are approximately eight single nucleotide variants in each 10,000 base pair stretch of DNA, but since short reads are only 150 base pairs long, many of these reads have no variation to help determine which chromosomes they have originated from.
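The arithmetic behind that last point can be sketched in a few lines of Python. The rate of eight variants per 10,000 base pairs comes from the text above. Treating variants as independent and evenly spread along the genome (a Poisson model) is a simplifying assumption added here, so the probabilities are indicative only.

```python
import math

# From the text: roughly eight single nucleotide variants per 10,000 base pairs.
VARIANTS_PER_BP = 8 / 10_000


def expected_variants(read_length: int) -> float:
    """Average number of variants covered by a read of the given length."""
    return VARIANTS_PER_BP * read_length


def prob_no_variant(read_length: int) -> float:
    """Chance a read covers no variant at all, under the assumed Poisson model."""
    return math.exp(-expected_variants(read_length))


for length in (150, 15_000, 20_000):
    print(
        f"{length:>6} bp read: "
        f"{expected_variants(length):.2f} variants expected, "
        f"P(no variant) = {prob_no_variant(length):.2g}"
    )
```

Under these assumptions a 150 base pair read covers about 0.12 variants on average, so close to 89% of such reads would contain none, whereas a 15,000 base pair read would be expected to cover about 12.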

There are hundreds of genes known to be clinically important to rare diseases. However, many of these genes span multiple regions of the genome and evade common sequencing tests. For example, exome sequencing is the first-line test for many rare diseases but is limited to looking for small genetic changes (under 50 base pairs) associated with specific rare diseases. Short-read whole genome sequencing (srWGS) is typically offered next, but as outlined, has its shortcomings in identifying longer variants and their location within the genome. Examples of such genes include human leukocyte antigen (HLA) genes associated with the immune system, which are highly polymorphic, or regions where variations in copy number or orientation can influence traits such as drug metabolism.

NW: The first step in improving rare disease care is for researchers to gain a complete understanding of the genetic variants that underpin diseases. Yet, many healthcare systems today aren’t doing a deep enough investigation. NHS England is one of the most advanced healthcare systems and short-read whole genome sequencing is routinely used in attempts to help diagnose rare diseases. However, the deficiencies of short-read sequencing mean that under half of the participants receive an explanation for their disease, with an average wait of four to five years for those that eventually do.2 Investing in long-read sequencing has the potential to significantly increase the number of participants gaining insight into their disease. Better understanding of specific genetic variants causing a disease can result in better management of the condition, including improved development of potential therapies.

NW: We believe long-read sequencing will help provide a more comprehensive picture of the genome and will have benefits across all diseases. Many rare diseases are difficult to solve because the regions of the genome typically affected are hard to assess with short-read sequencing. For example, diseases of the nervous system, such as myotonic dystrophy, are caused by the expansion of simple repeat sequences. Accurate long-read sequencing enables scientists to determine both the length of those repeat stretches and their nucleotide sequences, helping predict disease pathogenesis. Copy number variations have been shown to be a major contributor to many diseases and long-read sequencing characterises those complex genomic changes more accurately than short reads. We expect that this will have significant benefits to individuals with eye disorders or hearing loss.
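To make the repeat-length point concrete, here is a minimal Python sketch that counts back-to-back copies of a repeat unit in a read and checks whether the read reaches the unique sequence on both sides. The CTG unit is chosen because it is the repeat expanded in myotonic dystrophy type 1, a detail not taken from the interview. The flanking sequences, the copy number of 60 and the function names are hypothetical toy values.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive copies of `unit` found in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


def measure_repeat(read: str, unit: str, left_flank: str, right_flank: str):
    """Return (copies seen, whether the read spans the repeat end to end)."""
    copies = longest_repeat_run(read, unit)
    spans = left_flank in read and right_flank in read
    return copies, spans


# Toy allele (hypothetical): 60 CTG copies between two made-up flanking sequences.
LEFT, RIGHT = "GATTACA", "TGGCCAT"
allele = LEFT + "CTG" * 60 + RIGHT

long_read = allele            # covers both flanks and the whole repeat
short_read = allele[10:160]   # 150 bases taken from inside the repeat

print(measure_repeat(long_read, "CTG", LEFT, RIGHT))
print(measure_repeat(short_read, "CTG", LEFT, RIGHT))
```

A read that sits entirely inside the repeat can only give a lower bound on its length, whereas a read anchored in both flanks yields the full copy number.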

NW: In cancer research, the two main current approaches to genomic sequencing, short and long reads, each bring value to different scenarios. In contrast to rare disease research, short-read sequencing has utility in cancer research where the depth and length of long reads are unnecessary, such as for single-nucleotide polymorphism (SNP) calling or sequencing microRNAs. Data from short reads helps to track residual cancer more accurately and aids ongoing cancer screening and early detection. Progress in the sensitivity and specificity of short reads means read results now have far fewer errors, reducing the number of false positives while increasing biological insight.

In other oncology use cases, long reads are required for exposing challenging variant types associated with cancer, like tandem repeats. This affords a deeper understanding of individual cancers and can advance precision oncology research. Long reads also reveal insights into the epigenome, which is where many early genetic changes related to cancer first show.3 With third-generation sequencing technologies making long-read technology more accessible, it’s more feasible to integrate long reads into cancer studies at scale, enabling the discovery of new biomarkers associated with the risk of cancer.

Another area that will benefit from advances in sequencing is pharmacogenomics (PGx) research. PGx analyses the role of the genome in drug response and enables better outcomes for patients by tailoring prescriptions according to their genomic profiles. For example, if the NHS had a better understanding of which CYP2D6 variant patients had, it could save more than £41m a year on prescribing of SSRIs and opioids, which are heavily linked to the gene. Advancing PGx research relies on having a comprehensive and accurate view of the human genome, which can be achieved with the latest long-read whole genome sequencing.

NW: We expect long reads to become even more accessible as costs come down and the scale of sample processing increases. For example, a whole genome sequence test cost $100m in 2001, but is now available for $1,000 with a 24-hour sequencing time. These advances mean researchers and institutions are increasingly adopting long-read sequencing. With more long-read data being collected, the bank of rare disease research will grow, fostering a global rare disease research ecosystem that offers hope to patients and their families.

Neil Ward is vice president and general manager for PacBio across Europe, Middle East and Africa. Ward is a genomics industry veteran with more than two decades of global experience. He has a passion for the role genomics can play in improving human health and believes that this can be achieved by accelerating the utility of in-depth, highly accurate genomic applications.
In his various commercial roles, Ward has served as a key contributor to many of the world’s largest genomics projects, including Genomics England’s 100,000 Genomes Project, the Estonian Genome Project and the whole genome sequencing of the 500,000 UK Biobank samples.

Pharmafocus: What is long-read WGS and how does it differ from other sequencing approaches?

Pharmafocus: How will understanding genetic causes of rare diseases help discover new treatments?

Pharmafocus: How can this technology translate into noticeable improvements in rare disease care?

Pharmafocus: Which rare diseases are most likely to benefit from developments in genomic sequencing technology and why?

Pharmafocus: Other than rare diseases, where do you think genomic sequencing will have the biggest impact?

Pharmafocus: How do you see genomic sequencing developing in the next five years, specifically in terms of rare diseases?