An Introduction to Molecular Biology
In this article, the following topics will be covered:
- Introduction to Molecular Biology
- DNA and RNA – The Raw Materials of a Molecular Biologist
- From Genes to Proteins
- Transcription Termination
Introduction to Molecular Biology
Molecular biology has provided the basis for understanding each step of the central dogma of molecular biology: DNA replication, transcription of DNA into RNA, and translation of RNA into proteins. Together, these molecules give the cells of every organism the information they need to survive and reproduce under the environmental conditions of the moment. All of this information is stored in the genetic material of cells and passed on to their progeny.
DNA and RNA – The Raw Materials of a Molecular Biologist
The genetic information of cells is stored in nucleic acids, namely DNA and RNA. Deoxyribonucleic acid (DNA) carries the genetic code of the cell, whereas ribonucleic acid (RNA) carries this information and helps convert it into the amino acid sequences of proteins. Genetic information is encoded in the sequence of the nucleic acid monomers. Thus, unlike polysaccharides and lipids, which are typically built from long stretches of repeated units, nucleic acids are informational macromolecules. Their monomers are called nucleotides, so DNA and RNA are polynucleotides. A nucleotide has three components: a pentose sugar (ribose in RNA, deoxyribose in DNA), a nitrogenous base, and a phosphate group. DNA and RNA nucleotides are structurally very similar. The nitrogenous bases are either purines (adenine and guanine, which contain two heterocyclic rings) or pyrimidines (thymine, cytosine, and uracil, which contain a single heterocyclic ring).
Guanine, adenine, and cytosine occur in both DNA and RNA. With few exceptions, thymine is found only in DNA and uracil only in RNA. The nitrogenous base is attached to the pentose by an N-glycosidic bond between the 1′ carbon of the sugar and a nitrogen atom of the base (N9 in purines, N1 in pyrimidines). A base linked to a sugar is called a nucleoside; nucleotides are therefore nucleosides carrying one or more phosphate groups.
In 1953, based on X-ray diffraction data obtained by Wilkins and Franklin, Watson and Crick proposed a structural model for DNA. The model accounts for both the chemical and the biological properties of DNA, in particular its ability to be replicated. According to this model, the DNA molecule consists of two helical strands held together by hydrogen bonds between the nitrogenous bases of opposite strands. Base pairing is specific: adenine pairs with thymine (two hydrogen bonds) and guanine pairs with cytosine (three hydrogen bonds), so the sequence of one strand determines the sequence of the other. The two strands are antiparallel, meaning that their 5′→3′ phosphodiester backbones run in opposite directions. Each turn of the helix contains roughly 10 to 10.5 base pairs and spans about 3.4 to 3.6 nm.
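The pairing rules and the antiparallel orientation can be illustrated with a short Python sketch. It assumes a strand is given as a plain string of A/T/G/C written 5′→3′; the function name `complementary_strand` is hypothetical and only stands in for the pairing logic described above.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    Because the two strands are antiparallel, the partner is the
    complement of each base read in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand_5_to_3.upper()))


top = "ATGCGTAC"
bottom = complementary_strand(top)
print(bottom)  # GTACGCAT
```

Applying the function twice returns the original strand, which reflects the fact that each strand fully determines its partner.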
The size of a DNA molecule is given by its number of nucleotides or, for double-stranded DNA, its number of base pairs. A single-stranded molecule of 1000 nucleotides is 1 kilobase (kb) long; for a double helix, the equivalent unit is the kilobase pair (kbp). For instance, the chromosome of the bacterium Escherichia coli contains about 4640 kbp of DNA. Each base pair adds 0.34 nm to the length of the double helix, and each turn contains about 10 bp, so 1 kbp of DNA corresponds to 100 turns and a length of 0.34 µm. The E. coli genome of 4640 kbp is therefore about 1.58 mm long. Since an E. coli cell is only about 2 µm long, its chromosome is roughly 800 times longer than the cell itself, which is why DNA must be compacted and packaged to fit inside the cell.
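The length arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the figures given in the text (0.34 nm per base pair, about 10 bp per turn, 4640 kbp for E. coli, a cell length of about 2 µm); the function name `helix_length_um` is hypothetical.

```python
RISE_NM_PER_BP = 0.34  # length added by each base pair along the helix
BP_PER_TURN = 10       # approximate number of base pairs per helical turn


def helix_length_um(base_pairs: int) -> float:
    """Length of a double helix in micrometres (1 µm = 1000 nm)."""
    return base_pairs * RISE_NM_PER_BP / 1000


turns_per_kbp = 1000 / BP_PER_TURN
ecoli_bp = 4_640_000     # 4640 kbp
cell_length_um = 2.0     # approximate length of an E. coli cell

print(f"1 kbp: {turns_per_kbp:.0f} turns, {helix_length_um(1000):.2f} µm")
print(f"E. coli chromosome: {helix_length_um(ecoli_bp) / 1000:.2f} mm")
print(f"About {helix_length_um(ecoli_bp) / cell_length_um:.0f} times the cell length")
```

Running it prints 100 turns and 0.34 µm per kbp, about 1.58 mm for the E. coli chromosome, and a chromosome-to-cell length ratio of roughly 790.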
Chromosomes are the main genetic elements of cells, but there are others, such as viral genomes, plasmids, organelle genomes, and transposable elements. Prokaryotes usually have a single circular chromosome, whereas eukaryotic genomes are organized into several chromosomes. Plasmids are genetic elements that replicate independently of the cell's chromosomes. They are usually double-stranded DNA molecules, either circular or linear.
Transposable elements, or jumping genes, are DNA segments that can move from one site in a DNA molecule to another site in the same molecule or in a different DNA molecule. Transposons do not exist as independent DNA molecules; instead, they are found inserted into other DNA molecules such as chromosomes, plasmids, and viral genomes. Transposable elements occur in both eukaryotes and prokaryotes and play an important role in genetic variation.
From Genes to Proteins
The gene is the basic functional unit of genetic information. Genes are located on chromosomes or on the other large DNA molecules referred to above as genetic elements. In modern biology, organisms are classified according to the composition and variability of their genetic material.
When a gene is expressed, the genetic information stored in DNA is transferred to RNA. There are several types of RNA, but three main types take part in protein synthesis. Messenger RNA (mRNA) is a single-stranded molecule that carries genetic information from DNA to the ribosome, the structure responsible for protein synthesis. Transfer RNA (tRNA) acts as an adaptor that converts the nucleotide sequence of mRNA into the amino acid sequence of a protein. Ribosomal RNA (rRNA) is an important catalytic and structural component of ribosomes.
The flow of genetic information can be divided into three molecular processes:
- Replication: the DNA double helix is duplicated, producing two copies. Replication is carried out by DNA polymerase enzymes.
- Transcription: the transfer of genetic information from DNA to RNA. Transcription is catalyzed by RNA polymerase.
- Translation: the synthesis of proteins using the genetic information contained in mRNA.
As shown in the image below, during replication, the DNA double helix is duplicated through the action of the DNA polymerase enzyme.
Many different RNA molecules are transcribed from relatively short regions of the DNA molecule. In eukaryotes, each gene is usually transcribed into its own mRNA, whereas in prokaryotes a single mRNA molecule can carry the genetic information of several genes. The nucleotide sequence of a gene is colinear with the amino acid sequence of the polypeptide it encodes. Each group of three nucleotides in the mRNA, called a codon, specifies a single amino acid (or a stop signal). Codons are translated into amino acid sequences by ribosomes, tRNAs, and auxiliary proteins such as translation factors.
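Reading an mRNA codon by codon can be sketched in Python as follows. The table below is only a small excerpt of the standard genetic code (a complete table has 64 codons), so any codon not listed raises a `KeyError`; the function name `translate` is hypothetical and the sketch ignores start-codon selection, assuming the sequence already begins in the correct reading frame.

```python
# Excerpt of the standard genetic code: mRNA codon -> one-letter amino acid.
# Only a handful of the 64 codons are included in this sketch.
CODON_TABLE = {
    "AUG": "M",  # methionine (also the usual start codon)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def translate(mrna: str) -> str:
    protein = []
    # Step through consecutive, non-overlapping triplets (codons).
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # a stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCAAAUGGUAA"))  # MFGKW
```

Because codons are read three nucleotides at a time without overlap, shifting the starting position by one nucleotide would produce an entirely different set of codons.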
Here is a simple graphic depiction of the transcription and translation processes:
Now we will go into more detail about these processes, and briefly mention their differences.
Cells replicate their DNA before dividing, whether the division serves reproduction in unicellular organisms or the production of new cells in multicellular organisms. DNA replication is a complex process that requires many specific enzymes. As the parental double helix unwinds, each strand serves as a template for the synthesis of a new complementary strand. Each resulting double helix therefore contains one parental strand and one newly synthesized strand, which is why replication is described as semi-conservative.
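The semi-conservative outcome can be tracked with a toy Python model. It is a sketch under simplifying assumptions: each strand is a `(sequence, label)` pair, the two strands are stored aligned position by position rather than each in its own 5′→3′ orientation, and the names `complement` and `replicate` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    # Position-by-position complement; the two strands are kept aligned
    # rather than written in their own 5'->3' orientation.
    return "".join(PAIR[b] for b in seq)


def replicate(duplex):
    """Separate the two strands; each one templates a newly made partner."""
    return [(template, (complement(template[0]), "new")) for template in duplex]


# Each strand is a (sequence, label) pair; the starting duplex is all "old" DNA.
generation = [(("ATGC", "old"), ("TACG", "old"))]
for _ in range(2):  # two rounds of replication
    generation = [daughter for duplex in generation for daughter in replicate(duplex)]

labels = [tuple(label for _, label in duplex) for duplex in generation]
print(labels)
# [('old', 'new'), ('new', 'new'), ('old', 'new'), ('new', 'new')]
```

After two rounds, two of the four duplexes still carry one original strand, while the other two are made entirely of new DNA: the pattern expected from semi-conservative replication.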
The following diagram shows a simple color coded depiction of this semi-conservative process.
The precursor of each new nucleotide in a DNA strand is a deoxynucleoside 5′-triphosphate (dNTP). During polymerization, the two terminal phosphates are released as pyrophosphate, and the remaining innermost phosphate is covalently bonded to the deoxyribose of the growing DNA strand. Nucleotide addition requires a free hydroxyl group, which is available only at the 3′ end, so the phosphate group of the incoming nucleotide bonds to the 3′-hydroxyl (OH) of the previous nucleotide. The enzymes that catalyze the addition of deoxyribonucleotides are called DNA polymerases. There are several types, each with a specific role. All known DNA polymerases synthesize DNA in the 5′ to 3′ direction, and none can start a new strand on its own: because they can only add nucleotides to an existing 3′-OH, they require a primer.
A primer is a short nucleic acid molecule to which DNA polymerase can add nucleotides. In cells, the primer is usually a short RNA fragment rather than DNA. When the double helix is unwound at the start of replication, an RNA polymerase called primase synthesizes an RNA primer of 11-12 nucleotides that is complementary to the DNA template strand. The 3′-OH group at the end of the RNA primer is where DNA polymerase adds the first deoxyribonucleotide. The RNA primer is later removed and replaced by DNA.
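The division of labour between primase and DNA polymerase can be sketched in Python. This is a toy model, not a biochemical simulation: sequences are strings, the template is written 3′→5′, the primer is only 4 nucleotides long for readability (the text gives 11-12), and the function names `primase` and `dna_polymerase` are hypothetical.

```python
# Pairing of template DNA bases with incoming nucleotides.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}  # DNA polymerase adds DNA
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # primase adds RNA


def primase(template_3_to_5: str, length: int) -> str:
    """Make a short RNA primer complementary to the start of the template."""
    return "".join(RNA_PAIR[b] for b in template_3_to_5[:length])


def dna_polymerase(template_3_to_5: str, primer: str) -> str:
    """Extend an existing primer 5'->3' along the rest of the template."""
    if not primer:
        # No primer means no free 3'-OH, so synthesis cannot begin.
        raise ValueError("DNA polymerase requires a primer with a free 3'-OH")
    new_strand = primer
    for base in template_3_to_5[len(primer):]:
        new_strand += DNA_PAIR[base]  # each nucleotide joins the 3' end
    return new_strand


template = "TACGGATCCA"  # written 3'->5'
primer = primase(template, 4)
print(primer)                            # AUGC
print(dna_polymerase(template, primer))  # AUGCCTAGGT
```

The new strand begins with RNA (note the U) and continues as DNA, which is why the RNA primer must later be removed and replaced.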
Before DNA polymerase can synthesize a new strand, the existing double helix must be unwound to expose the template strands. Replication starts at specific sites called replication origins. The region where the helix is being opened and the new strands are being made is called the replication fork. DNA helicase unwinds and separates the two strands of the double helix in an ATP-dependent process, exposing a short stretch of single-stranded DNA, and moves along the double helix just ahead of the replication fork.
DNA synthesis always proceeds in the 5′ to 3′ direction, meaning that each new nucleotide is added to the 3′-OH group of the growing strand. The new strand that uses the 3′→5′ parental strand as template grows in the same direction as the replication fork moves; this is the leading (continuous) strand, which is synthesized continuously because a free 3′-OH is always available. The new strand that uses the 5′→3′ parental strand as template must grow away from the fork, so it is synthesized discontinuously: where the fork exposes new template, no free 3′-OH is available. On this lagging strand, primase must therefore synthesize primers repeatedly to provide new 3′-OH groups. Whereas the leading strand needs only one primer at the start of synthesis, the lagging strand is made in short pieces called Okazaki fragments, which are later joined into a continuous strand.
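The difference in primer requirements between the two strands can be expressed as simple bookkeeping. The sketch below assumes a fixed Okazaki fragment length, which in reality varies (roughly 1000-2000 nucleotides in bacteria and 100-200 in eukaryotes); the function names `okazaki_fragments` and `primers_needed` are hypothetical.

```python
import math


def okazaki_fragments(lagging_len_nt: int, fragment_len_nt: int):
    """(start, end) coordinates of the fragments that make up a lagging strand."""
    return [
        (start, min(start + fragment_len_nt, lagging_len_nt))
        for start in range(0, lagging_len_nt, fragment_len_nt)
    ]


def primers_needed(strand_len_nt: int, fragment_len_nt: int) -> dict:
    # Leading strand: a single primer at the origin.
    # Lagging strand: one primer for every Okazaki fragment.
    return {"leading": 1, "lagging": math.ceil(strand_len_nt / fragment_len_nt)}


print(okazaki_fragments(5000, 1500))
# [(0, 1500), (1500, 3000), (3000, 4500), (4500, 5000)]
print(primers_needed(10_000, 1000))
# {'leading': 1, 'lagging': 10}
```

The shorter the fragments, the more primers the lagging strand needs, while the leading strand always needs just one per origin in this simplified picture.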
A protein complex that includes DNA polymerase binds at the replication fork and slides along the template strand. Two DNA polymerases (one for each strand) and their associated protein complexes work at each replication fork. Once the leading and lagging strands have been synthesized, a DNA polymerase with exonuclease activity removes the RNA primers and fills the gaps with complementary DNA nucleotides. The final phosphodiester bond is formed by DNA ligase, which seals nicks in which a 5′-phosphate (5′-PO4) lies next to a 3′-OH group.
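Primer replacement and ligation can be sketched as two string operations. In this toy model each lagging-strand fragment is a hypothetical `(rna_primer, dna_body)` pair; because the template does not change, the DNA that replaces a primer has the same sequence with T in place of U. The names `replace_primer` and `ligate` are illustrative only.

```python
def replace_primer(fragment):
    """Exonuclease removes the RNA primer; DNA polymerase fills the gap with DNA.

    The template is unchanged, so the DNA patch is the primer sequence with U -> T.
    """
    rna_primer, dna_body = fragment
    return rna_primer.replace("U", "T") + dna_body


def ligate(pieces):
    # DNA ligase seals each nick (5'-phosphate next to 3'-OH), so adjacent
    # pieces become one continuous strand.
    return "".join(pieces)


fragments = [("AUG", "CCTA"), ("GGU", "ACCA")]
lagging = ligate(replace_primer(f) for f in fragments)
print(lagging)  # ATGCCTAGGTACCA
```

The finished lagging strand contains no ribonucleotides and no breaks, matching the two enzymatic steps described in the paragraph above.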
Transcription is the synthesis of ribonucleic acid (RNA) using DNA as a template. RNA differs from DNA in three significant chemical respects:
- RNA contains the sugar ribose instead of deoxyribose
- RNA contains uracil in place of the thymine found in DNA
- With the exception of some viruses, RNA is not found as a double strand
The replacement of deoxyribose by ribose changes the chemical properties of the nucleic acid, and enzymes that act on DNA generally do not act on RNA (and vice versa). By contrast, the substitution of uracil for thymine does not affect base pairing, because both thymine and uracil pair with adenine in the same way.
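Because uracil pairs with adenine just as thymine does, the RNA transcript has the same sequence as the non-template (coding) DNA strand, with U in place of T. A minimal Python sketch, assuming the template strand is written 3′→5′ so the transcript can be read off base by base; the name `transcribe` is hypothetical.

```python
# Template DNA base -> ribonucleotide added by RNA polymerase (A pairs with U).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the RNA 5'->3' from a DNA template strand written 3'->5'."""
    return "".join(RNA_PAIR[b] for b in template_3_to_5)


coding = "ATGGCATTT"    # coding (non-template) strand, 5'->3'
template = "TACCGTAAA"  # its complement, written 3'->5'
rna = transcribe(template)
print(rna)                              # AUGGCAUUU
print(rna == coding.replace("T", "U"))  # True
```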
In cells, all RNA molecules are produced by transcription of DNA. They act at two different levels, genetic and functional. At the genetic level, mRNA carries the genetic information from the genome to the ribosome. At the functional level, rRNA plays structural and catalytic roles in ribosomes, and tRNA carries amino acids to the ribosome for protein synthesis. Some RNA molecules, including rRNA, have enzymatic activity.
RNA polymerase transcribes genetic information in a manner similar to the way DNA polymerase replicates DNA. It catalyzes the formation of phosphodiester bonds between ribonucleotides, driven by the energy released when the two terminal phosphates of each incoming ribonucleoside triphosphate are cleaved off. Like DNA synthesis, RNA synthesis proceeds in the 5′→3′ direction, with each ribonucleotide added to the free 3′-OH group of the previous one.
Unlike DNA polymerase, RNA polymerase can start new strands on its own, so no primer is needed. To start RNA synthesis, RNA polymerase must recognize specific DNA initiation sequences called promoters. Once RNA polymerase binds a promoter, it unwinds the DNA double helix in that region, exposing the template strand, and transcription begins. When a DNA region contains two nearby promoters pointing in opposite directions, transcription uses both DNA strands as templates, proceeding in opposite directions. As the newly synthesized RNA dissociates from the DNA, the unwound DNA closes back into its original double helix.
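Promoter recognition can be loosely illustrated as a search for a short sequence motif. The sketch below is an assumption-laden simplification: it scans for the bacterial "-10" consensus sequence TATAAT (not given in the text above) and tolerates a set number of mismatches, whereas real RNA polymerase (guided by a sigma factor in bacteria) recognizes promoters through protein-DNA contacts, not exact string matching. The function name `find_motif` is hypothetical.

```python
def find_motif(dna: str, motif: str, max_mismatches: int = 1):
    """Return start positions where `motif` matches with few enough mismatches."""
    hits = []
    for i in range(len(dna) - len(motif) + 1):
        window = dna[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


seq = "CCGGTATAATCCGGTACAATCC"
print(find_motif(seq, "TATAAT"))                    # [4, 14]
print(find_motif(seq, "TATAAT", max_mismatches=0))  # [4]
```

Allowing a mismatch reflects the fact that natural promoters rarely match a consensus sequence exactly.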
Although RNA polymerase binds double-stranded DNA (dsDNA), only one of the two strands serves as the template for a given gene. Whereas replication copies the entire genome, transcription copies only short regions of DNA, usually corresponding to a single gene. This allows the cell to transcribe different genes at different frequencies, depending on its needs.
Because cells transcribe the genes they need at a given moment, it is critical that transcription stops at the right place. This termination process is controlled by specific nucleotide sequences in the DNA template strand.
Unlike bacterial transcripts, many eukaryotic transcripts contain introns (regions not needed for translation), which must be removed by further RNA processing. This processing, called splicing, occurs in the cell's nucleus. It is carried out by the spliceosome, a large complex of proteins and small RNAs that removes the introns from the transcript and joins the remaining sequences, called exons.
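The outcome of splicing can be modeled as keeping the exon segments and discarding what lies between them. This Python sketch assumes the exon coordinates are already known (0-based, end-exclusive), which sidesteps how the spliceosome actually identifies them; the names `splice` and `introns` are hypothetical. The example intron follows the common pattern of starting with GU and ending with AG.

```python
def splice(pre_mrna: str, exons):
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def introns(pre_mrna: str, exons):
    # The intron is whatever lies between the end of one exon and the start of the next.
    return [pre_mrna[a_end:b_start] for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"  # exon 1 + intron + exon 2
exons = [(0, 6), (18, 24)]
print(splice(pre_mrna, exons))   # AUGGCCUUUUAA
print(introns(pre_mrna, exons))  # ['GUAAGUUUUCAG']
```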
Eukaryotic mRNA undergoes two further processing steps. The first is capping, which takes place before transcription is terminated: a methylated guanine nucleotide (7-methylguanosine) cap is added to the 5′ end of the pre-mRNA. This cap is crucial for translation initiation. The second step is cleavage of the 3′ end of the transcript, followed by the addition of 100-200 adenylate residues that form a poly(A) tail. The tail stabilizes the mRNA, and its shortening is generally required before the mRNA can be degraded.
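The two end-processing steps can be represented with a toy string model. It is only a notation sketch: the cap is written as the hypothetical marker "m7G-", the cleavage position is supplied by hand rather than located from sequence signals, and the default tail of 150 adenylates is simply a value inside the 100-200 range given above. The function name `cap_and_polyadenylate` is hypothetical.

```python
def cap_and_polyadenylate(pre_mrna: str, cleavage_site: int, tail_length: int = 150) -> str:
    """Toy model of 5' capping and 3' processing.

    - "m7G-" marks the 7-methylguanosine cap added to the 5' end
    - the transcript is cut at `cleavage_site` (0-based, end-exclusive)
    - `tail_length` adenylates are appended as the poly(A) tail
    """
    return "m7G-" + pre_mrna[:cleavage_site] + "A" * tail_length


mature = cap_and_polyadenylate("AUGGCCUUUUAAGCGCUUGA", 12)
print(mature[:16])           # m7G-AUGGCCUUUUAA
print(mature.count("A", 16)) # 150
```

Everything after the cleavage site in the pre-mRNA is discarded, and the tail is added without a template, which is why it appears as a uniform run of A.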