Introduction and Application of Transcriptome Sequencing

  • A transcriptome is the complete set of RNAs transcribed in a particular tissue or cell at a given developmental or functional state, mainly mRNA and non-coding RNA (ncRNA). Transcriptome studies examine gene function and structure at a global level and reveal the molecular mechanisms behind specific biological processes or disease development. Transcriptome sequencing (RNA-Seq) sequences cDNA with second-generation high-throughput sequencing technology, giving rapid and nearly complete access to the transcripts of a given organ or tissue in a given state. Transcriptome analysis is the basis and starting point for studying gene function and structure. Understanding the transcriptome is necessary for interpreting the functional elements of the genome and revealing the molecular composition of cells and tissues, and it plays an important role in understanding development and disease.

    Although there are many manufacturers of next-generation sequencing machines, the core principles are similar. In essence, high-throughput sequencing is used to obtain the RNA sequence information in a sample: the more reads that are detected for a transcript, the higher its estimated expression level, once sequencing depth is taken into account. Taking the Illumina sequencing platform as an example, we briefly introduce how extracted RNA becomes a sequencing library and, eventually, data stored on a hard disk. The following diagram briefly describes the general flow of second-generation sequencing:


    Figure: General flow of second-generation sequencing
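
The link between read counts and expression level can be illustrated with a small sketch. It assumes that reads have already been assigned to genes and scales the raw counts to counts per million (CPM) so that samples sequenced to different depths become comparable. The gene names and counts are hypothetical, and CPM is only one of several common normalizations, not necessarily the one used in any particular pipeline.

```python
def counts_per_million(counts):
    """Scale raw read counts so that the sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical raw read counts for one sample (illustrative numbers only).
sample = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(sample)

for gene, value in cpm.items():
    print(gene, value)
```

CPM corrects only for sequencing depth. Comparing different genes within one sample would additionally require accounting for transcript length.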

    Transcriptome sequencing requires sequencing libraries with specific sequences and fragment lengths that the sequencer can analyze directly. Library preparation therefore combines RNA enrichment, fragmentation, reverse transcription, adapter ligation, PCR amplification, and size selection to produce molecules that contain adapter (linker) sequences (for bridge PCR amplification), barcode sequences (for distinguishing different samples), and the inserted cDNA fragments to be sequenced.
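
To show how barcode sequences distinguish samples, here is a minimal demultiplexing sketch. It assumes each read arrives as a pair of (barcode read, insert read) and assigns it to a sample when the barcode is within one mismatch of exactly one known barcode. The barcodes, sample names, and reads are hypothetical, and production demultiplexing tools handle many more cases (dual indexes, quality scores, and so on).

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group insert sequences by sample according to their barcode read."""
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for barcode_read, insert_seq in reads:
        matches = [
            sample
            for barcode, sample in barcodes.items()
            if len(barcode) == len(barcode_read)
            and hamming(barcode, barcode_read) <= max_mismatches
        ]
        # Accept only unambiguous assignments.
        target = matches[0] if len(matches) == 1 else "undetermined"
        bins[target].append(insert_seq)
    return bins


# Hypothetical barcodes and reads (illustrative only).
barcodes = {"ACGTAC": "sample1", "TGCATG": "sample2"}
reads = [
    ("ACGTAC", "GGGTTTAAA"),  # exact match to sample1
    ("TGCATG", "CCCAAATTT"),  # exact match to sample2
    ("ACGTAA", "TTTGGGCCC"),  # one mismatch from sample1
    ("GGGGGG", "AAAAAAAAA"),  # matches no known barcode
]
bins = demultiplex(reads, barcodes)
print(bins)
```

Allowing a small number of mismatches tolerates sequencing errors in the barcode, which is why barcode sets are usually designed so that any two barcodes differ at several positions.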

    After the libraries are constructed, libraries from different samples can be pooled and sequenced together. The Illumina sequencing principle is shown in the figure below. In brief, the library is loaded onto the sequencing chip (flow cell), each single molecule is amplified by PCR into a cluster to increase signal intensity, and the fluorescence signal of each cluster is then collected and converted to the corresponding bases to obtain the sequencing data.


    Figure: The Illumina sequencing principle
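
The conversion of fluorescence signals into bases can be sketched in a highly simplified form. The example assumes an idealized four-channel measurement in which each sequencing cycle yields one intensity per base for a cluster, and it simply calls the brightest channel. The intensity values are hypothetical, and real base callers also correct for effects such as channel crosstalk and phasing and assign a quality score to every call.

```python
CHANNELS = "ACGT"  # assumed channel order for this sketch


def call_bases(intensities):
    """Call one base per cycle by picking the brightest of four channels."""
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over four cycles, ordered A, C, G, T.
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.8, 0.1, 0.0),
    (0.0, 0.1, 0.1, 0.7),
    (0.2, 0.1, 0.9, 0.1),
]
read = call_bases(cluster)
print(read)
```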


    The application of transcriptome technology in biology:

    • Quickly obtain the mRNA species in an organism and their abundance, helping researchers discover new mRNA isoforms that result from alternative splicing or alternative polyadenylation site selection.
    • Analyze differential mRNA expression and perform functional analysis of the differentially expressed genes. This reveals how gene expression changes as a whole during cell differentiation (especially of embryonic stem cells and neural stem cells), organism development, signal transduction, and other biological processes.
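
As a rough illustration of differential expression analysis, the sketch below computes a log2 fold change between two conditions from normalized expression values. The gene names and values are hypothetical, and the pseudocount of 1 is an assumption made here to avoid division by zero. A real analysis would use biological replicates and a statistical test rather than a bare ratio.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression, with a pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized expression values (illustrative only).
control = {"geneA": 99.0, "geneB": 49.0, "geneC": 7.0}
treated = {"geneA": 399.0, "geneB": 49.0, "geneC": 1.0}

lfc = {gene: log2_fold_change(treated[gene], control[gene]) for gene in control}

for gene, value in lfc.items():
    print(gene, value)
```

Positive values indicate higher expression in the treated condition, negative values indicate lower expression, and a value of zero indicates no change.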

    The application of transcriptome technology in medicine:

    • During the onset and progression of cancer and other complex diseases, intracellular gene expression patterns change significantly; profiling these changes has produced significant results for disease prevention and treatment.
    • Obtain transcript information for a species or tissue, and from it derive gene-level information such as gene structure and function: discover new genes, refine gene structure annotation, and detect alternative splicing or gene fusions.