Ohio University

Next Generation Sequencing

Three possible scenarios for each DNA sequence in each nucleotide flow on the Ion Torrent PGM.

Next Generation Sequencing (NGS) refers to a number of different technologies that sequence relatively short nucleic acid fragments (25 to 600 nucleotides) in a parallel, high-throughput manner using sequencing by synthesis (SBS). Following the sequencing reaction, bioinformatic tools reassemble the reads into whole genomes, map RNA sequences to references for gene expression data, or support a multitude of more specialized, application-specific approaches including, but not limited to, chromatin immunoprecipitation sequencing (ChIP-seq), RNA structural sequencing, DNA methylation analysis, SNPseq, 16S metagenomics, and microbiome characterization.

NGS Workflow

All NGS experiments follow the same generalized workflow that is composed of three basic steps (Fig. 1):

  1. Library preparation: Nucleic acids (DNA or RNA) are fragmented (sonically, chemically, enzymatically, etc.) and ligated to specialized linkers, which include priming sites and indices/barcodes.
  2. Enrichment: The library is amplified using primers designed to the linkers, so that only sequenceable DNA (fragments carrying the linkers) is amplified. Quality control and absolute quantification then confirm that library is present before sequencing.
  3. Sequencing and Analysis: DNA is sequenced using one of a variety of SBS-based methods. The data are then assessed for quality (usually on board the sequencer) and visualized using bioinformatic software.

While the above describes the basic steps in any NGS experiment, there are many different NGS platforms. The goal of each NGS platform is the same: large quantities of high-quality data. The way these data are obtained, however, varies. The Ohio University Genomics Facility houses two different NGS platforms, the Ion Torrent Personal Genome Machine (PGM) and the Illumina MiSeq.

Ion Torrent PGM

SBS cannot be performed on mixed populations of DNA molecules, so the starting material must be separated into individual DNA sequences such that each signal detector reads a homogeneous population of DNA. To achieve this, Ion Torrent uses emulsion PCR to attach single DNA sequences to Ion Sphere Particles (ISPs) and amplify them until the entire surface of each ISP is covered by a single homogeneous DNA population. A reaction mix is set up that contains dNTPs, polymerase, and the necessary buffers. The reaction mix is then loaded into the emulsion PCR machine (the Ion OneTouch 2), where it is mixed with reaction oil to form droplets of water within the oil (emulsification). Each droplet is only large enough for a single ISP and DNA molecule. After emulsification, the reaction proceeds as in standard PCR, with cycles of denaturation, annealing, and extension repeated until the surface of the ISP is covered in copies of the original DNA sequence. It is these templated ISPs that are then loaded into the semiconductor chip (described below).

Ion Torrent uses semiconductor chip technology to sequence DNA by monitoring the pH changes that result from the H+ ions released with the addition of each nucleotide. This monitoring occurs at the level of the sequencing chip (Fig. 3). The sequencing chip consists of the wafer, which is the semiconductor portion, and the chip housing, which is just the physical packaging of the wafer (Fig. 3, top pictures). The wafer contains millions of sensor wells that capture and isolate single ISPs (Fig. 3, bottom right). Each well (Fig. 3, bottom left) contains the structures needed to sense the H+ ion release, and the chip can fill, drain, and wash the wells with each nucleotide in every cycle.

In each cycle of the sequencing machine, a single type of dNTP is flowed over the chip (and thus the ISPs) so that it can be incorporated into the growing DNA strand. Addition of a dNTP to the strand releases an H+ ion, which lowers the pH and changes the conductivity at that sensor (Fig. 4). This change in conductivity is recorded as the addition of a nucleotide. All unincorporated nucleotides are then washed away and the next dNTP is flowed through the chip. This is repeated hundreds of times to generate a sequence for each well.

During each sequencing flow, three different scenarios are possible (illustrated in Fig. 5). The dNTP is:

  1. Not complementary to the next base and no dNTP is incorporated.
  2. Complementary to the next base and a single dNTP is incorporated.
  3. Complementary to more than one base and more than one dNTP is incorporated.
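The three scenarios can be illustrated with a minimal Python sketch. It is not the instrument's software: the function name `simulate_flows`, the repeating flow order `TACG`, and the example sequence are all assumptions made for illustration. For simplicity the sequence is written as the strand being synthesized, so a flowed dNTP is incorporated whenever it matches the next unread base.

```python
def simulate_flows(new_strand, flow_order="TACG", n_flows=8):
    """Return (nucleotide, bases_incorporated) for each flow.

    new_strand is the sequence of the strand being synthesized, so the
    flowed nucleotide is incorporated when it equals the next unread base.
    flow_order is an assumed, simplified repeating order of dNTP flows.
    """
    position = 0
    results = []
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        # Keep incorporating while the next base matches: 0 matches is
        # scenario 1, 1 match is scenario 2, 2 or more is scenario 3.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            count += 1
            position += 1
        results.append((nucleotide, count))
    return results


for nucleotide, count in simulate_flows("TTAGC"):
    print(nucleotide, count)
```

For the hypothetical sequence `TTAGC`, the first flow (T) incorporates two bases (scenario 3), the second flow (A) incorporates one (scenario 2), and the third flow (C) incorporates none (scenario 1).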

The semiconductor chip registers each of the above scenarios as follows:

  1. No change in pH so no nucleotide identity is recorded.
  2. Change in pH directly related to the single nucleotide addition.
  3. Larger change in pH; the amplitude of the change depends directly on the number of nucleotides added to the growing chain.

The changes in pH are recorded and then interpreted by the computer and finally reported as nucleic acid sequences to the researcher.
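As a rough sketch of this interpretation step, the Python below turns a series of per-flow signal amplitudes into a sequence by rounding each amplitude to the nearest whole number of incorporated bases. The function name `call_bases`, the flow order `TACG`, the signal values, and the assumption that signals are normalized so that 1.0 corresponds to one incorporation are all hypothetical. Real base callers are considerably more involved than simple rounding.

```python
def call_bases(signals, flow_order="TACG"):
    """Convert normalized per-flow signals into a base sequence.

    Each signal is assumed to be scaled so that 1.0 means one nucleotide
    was incorporated; rounding estimates the homopolymer length per flow.
    """
    bases = []
    for flow, signal in enumerate(signals):
        nucleotide = flow_order[flow % len(flow_order)]
        # Round half up to the nearest non-negative integer.
        n_incorporated = max(0, int(signal + 0.5))
        bases.append(nucleotide * n_incorporated)
    return "".join(bases)


# Made-up signals for flows T, A, C, G, T, A, C, G.
signals = [2.08, 0.94, 0.06, 1.03, 0.02, 0.11, 0.97, 0.05]
print(call_bases(signals))
```

Because the amplitude grows with the number of bases added, distinguishing, say, seven from eight identical bases in a row requires resolving a proportionally smaller difference in signal than distinguishing one from two.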

Illumina MiSeq

Like Ion Torrent, Illumina chemistry relies on SBS, so the mixed population of adaptor-ligated DNA must be separated such that individual sequences can be detected. While the Ion Torrent technology relies on Ion Sphere Particles and emulsion PCR to immobilize DNA, Illumina uses a flow cell and bridge amplification.

The DNA library is denatured to single-stranded DNA using NaOH and is then washed over the flow cell. Within the flow cell are millions of spots of oligonucleotides that are complementary to the adaptor sequences and to which the library DNA will bind. After binding to the flow cell, the DNA, owing to its natural flexibility, bends over and comes into contact with another oligonucleotide that is complementary to the adaptor sequence on the other end of the DNA molecule. The second oligonucleotide is used as a priming site, and a copy of the original DNA sequence is made. This reaction is repeated many times until a small region of the flow cell is filled with copies of a DNA sequence arising from a single piece of DNA in the library. This region is known as a cluster, the process of forming clusters is known as cluster generation, and the technique used to form them is known as bridge amplification. After the clusters have been generated on the flow cell, the sequencing step can begin. First, a sequencing primer is flowed through the cell; it primes the DNA from a known sequence within the adaptors that have been added.

At this point, another difference between the Ion Torrent and Illumina technologies becomes apparent. Rather than detecting a change in conductivity caused by the addition of zero, one, or more than one nucleotide, Illumina uses fluorescently labeled dNTPs, with each of the four dNTPs labeled with a different fluorophore. Additionally, each dNTP carries a reversible stop (a reversible terminator) that prevents more than one dNTP from being added per flow.

During each flow, a single nucleotide is added to every cluster on the flow cell. After the nucleotide flow, the machine captures an image of the flow cell through two different filters, which allows it to identify the nucleotide that has been incorporated at each cluster. Once the image has been captured, the reversible stop is removed from each sequence and another flow of all four dNTPs occurs, generating another image. Repeated cycling of these steps generates a pool of images that together represent the entirety of the nucleic acid sequences loaded onto the flow cell. The Illumina system then analyzes all of the images to determine the sequence of each cluster and reports those sequences to the researcher.
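The one-base-per-cycle logic can be sketched in Python as follows. This is an illustration only: the function name `call_cluster`, the intensity values, and the simplification that each cycle yields one intensity per base are assumptions. How the instrument derives such values from its filtered images is not modeled here.

```python
def call_cluster(intensities_per_cycle, channels="ACGT"):
    """Call one base per cycle for a single cluster.

    intensities_per_cycle is a list of tuples, one tuple per cycle, holding
    an assumed intensity for each base in the order given by channels.
    The reversible stop guarantees exactly one base per cycle, so the
    read length always equals the number of cycles.
    """
    read = []
    for intensities in intensities_per_cycle:
        brightest = max(range(len(channels)), key=lambda i: intensities[i])
        read.append(channels[brightest])
    return "".join(read)


# Made-up intensities for one cluster over four cycles (order: A, C, G, T).
cycles = [
    (0.91, 0.05, 0.02, 0.04),
    (0.88, 0.03, 0.06, 0.02),
    (0.04, 0.07, 0.85, 0.03),
    (0.02, 0.04, 0.03, 0.90),
]
print(call_cluster(cycles))
```

Note that the two consecutive A bases in this made-up example are read in two separate cycles, whereas on the Ion Torrent they would be incorporated in a single flow and inferred from the size of the pH change.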

Regardless of the platform used for NGS, the next step is analysis of the data produced. A large number of bioinformatics resources exist to help interpret the data generated.