Gene Letter Information Resources
The entire human genome sequence is written with just four letters: A, T, G
and C. Around three billion of these letters, in sequence, make up the human
genome. Each letter stands for a different nucleotide base; the bases are
ring-shaped chemical substances that help hold the DNA double helix together.
They are arranged in
sequence along the DNA strands and form the ‘rungs on the DNA ladder’. The
letter ‘A’ represents Adenine, ‘T’ represents Thymine, ‘G’ stands for
Guanine and finally ‘C’ represents Cytosine.
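Because the alphabet has only four letters, each base can in principle be
stored in two bits (2 x 2 = 4 combinations). The sketch below, a minimal
illustration rather than any particular file format, spells out a sequence
using the letter meanings above and packs it two bits per base. The
`TWO_BIT_CODE` ordering and the function names are hypothetical choices made
for this example.

```python
# Letter meanings as listed above.
BASE_NAMES = {"A": "Adenine", "T": "Thymine", "G": "Guanine", "C": "Cytosine"}

# Hypothetical 2-bit code per base; the ordering is an arbitrary assumption.
TWO_BIT_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def describe(sequence: str) -> list[str]:
    """Spell out each letter of a DNA sequence as the base it names."""
    return [BASE_NAMES[letter] for letter in sequence.upper()]


def pack(sequence: str) -> int:
    """Pack a DNA sequence into one integer, two bits per base."""
    value = 0
    for letter in sequence.upper():
        value = (value << 2) | TWO_BIT_CODE[letter]
    return value


def storage_bytes(num_bases: int) -> int:
    """Bytes needed for num_bases at two bits each (four bases per byte)."""
    return (num_bases * 2 + 7) // 8


print(describe("GATC"))  # ['Guanine', 'Adenine', 'Thymine', 'Cytosine']
print(pack("GATC"))  # 141, i.e. binary 10 00 11 01
print(storage_bytes(3_000_000_000))  # 750000000, roughly 750 MB
```

At two bits per letter, three billion letters fit in about 750 MB, compared
with about 3 GB if each letter were stored as a one-byte text character.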
Bases on opposite strands join in pairs, known as base-pairs, to form the
rungs on the ladder. Adenine always pairs with Thymine, and Cytosine always
pairs with Guanine, because only these combinations fit together. The DNA
double helix is stabilised by hydrogen bonds between the paired bases
attached to the two strands.
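Because the pairing rule is fixed, one strand fully determines the letters on
the other. The minimal sketch below writes out the partner strand letter by
letter, in the same left-to-right order as the input, and totals the hydrogen
bonds across the pairs, using the standard counts of two bonds for an A-T
pair and three for a G-C pair. The function names are illustrative only.

```python
# Pairing rule: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds per pair: two for A-T, three for G-C.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(strand: str) -> str:
    """Return the partner strand, written in the same order as the input."""
    return "".join(PAIR[base] for base in strand.upper())


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand.upper())


strand = "ATGCGC"
print(complement(strand))  # TACGCG
print(hydrogen_bonds(strand))  # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

Applying `complement` twice returns the original strand, which mirrors the
fact that each strand can serve as the template for the other.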
When attached to ribose, Adenine forms adenosine, a nucleoside; when attached
to deoxyribose, it forms deoxyadenosine. Adding three phosphate groups to
adenosine gives adenosine triphosphate (ATP), a nucleotide. Within cellular
metabolism, ATP is used to transfer chemical energy between chemical
reactions.
Thymine forms the nucleoside deoxythymidine when combined with deoxyribose.
Phosphorylating thymidine with one, two or three phosphate groups produces
thymidine mono-, di- or triphosphate (TMP, TDP or TTP).
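The naming pattern in the two paragraphs above (base plus sugar gives a
nucleoside; adding phosphates gives a nucleotide) is regular enough to
generate. The sketch below is a hypothetical helper, not a standard library
function. It uses the common convention of a "d" prefix for deoxyribose
(as in dATP); for thymidine the "d" is often dropped, as in the TMP, TDP and
TTP above.

```python
NUCLEOSIDE = {"A": "adenosine", "G": "guanosine", "C": "cytidine", "T": "thymidine"}
PHOSPHATE_WORDS = {1: "monophosphate", 2: "diphosphate", 3: "triphosphate"}
PHOSPHATE_CODES = {1: "MP", 2: "DP", 3: "TP"}


def name_compound(base: str, deoxy: bool, phosphates: int = 0) -> tuple[str, str]:
    """Return (full name, abbreviation) for a nucleoside or nucleotide.

    base: one of A, G, C, T.
    deoxy: True if the sugar is deoxyribose, False for ribose.
    phosphates: 0 for a nucleoside, 1-3 for a mono-, di- or triphosphate.
    """
    name = NUCLEOSIDE[base]
    prefix_word = "deoxy" if deoxy else ""
    prefix_code = "d" if deoxy else ""
    if phosphates == 0:
        # Nucleoside: base + sugar only.
        return prefix_word + name, prefix_code + base
    # Nucleotide: append the phosphate count to the nucleoside name.
    return (
        f"{prefix_word}{name} {PHOSPHATE_WORDS[phosphates]}",
        f"{prefix_code}{base}{PHOSPHATE_CODES[phosphates]}",
    )


print(name_compound("A", deoxy=False, phosphates=3))  # ('adenosine triphosphate', 'ATP')
print(name_compound("A", deoxy=True))  # ('deoxyadenosine', 'dA')
print(name_compound("T", deoxy=True, phosphates=1))  # ('deoxythymidine monophosphate', 'dTMP')
```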
Guanine was first isolated in 1844 from guano (the excreta of sea birds),
which at the time was used simply as fertiliser. About fifty years later, its
structure was determined by the chemist Emil Fischer, who also showed that
uric acid could be converted into guanine. Strong acids hydrolyse Guanine
into glycine, ammonia, carbon dioxide and carbon monoxide. Guanine oxidises
more readily than Adenine and is relatively insoluble in water.
Cytosine was discovered in 1894, when it was isolated from calf thymus
tissue. More recently, Cytosine has been used in quantum computation, where
the quantum-mechanical properties (nuclear spins) of the molecule can be
harnessed to process information.
When sequencing machines are used to ‘read’ the sequence of letters from the
DNA, the samples first undergo a complex process to put them into a
machine-readable form. By the time they are ready to go through the machines,
the strands have been cut into smaller sections, chemically altered and
labelled with fluorescent dyes so that sensors in the machine can identify
each nucleotide base.
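The cutting step can be pictured with a toy model. In a real laboratory the
DNA is broken at effectively random points; the sketch below instead cuts
fixed-length pieces that start every `step` letters, a simplifying
assumption that makes the overlap between neighbouring pieces easy to see.
The function `fragment` and its parameters are hypothetical.

```python
def fragment(sequence: str, length: int, step: int) -> list[str]:
    """Cut a sequence into pieces of `length` letters starting every `step` letters.

    Neighbouring pieces overlap by (length - step) letters when step < length.
    The last piece is shorter if the sequence does not divide evenly.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    pieces = []
    for start in range(0, len(sequence), step):
        pieces.append(sequence[start:start + length])
        if start + length >= len(sequence):
            break  # this piece already reaches the end of the sequence
    return pieces


print(fragment("ATGCGTACGTTA", length=5, step=3))
# ['ATGCG', 'CGTAC', 'ACGTT', 'TTA']
```

The overlaps matter later: they are what allows the sections to be put back
in the right order once each one has been read.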
Reading may be repeated six to ten times to make the sequence produced by the
machine as accurate as possible. Once the individual letters have been read,
the sections are pieced back together to form the entire sequence. This
stage of rejoining and checking the sequence is called ‘finishing’ and can
often take longer than the reading itself.
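Both ideas in this paragraph, repeated reading and rejoining, can be sketched
in a few lines. The example below assumes two simplifications that real
sequencing pipelines do not get for free: every repeated read of a section
has the same length and position, and the sections are already in the
correct order. It takes a simple per-position majority vote across the
repeats, then joins sections greedily at their longest overlap. The read data
and function names are made up for illustration.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across repeated reads of one section.

    Assumes all reads cover the same section and have equal length.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def merge(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two sections at the longest end of `left` that starts `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("sections do not overlap enough to be joined")


def assemble(sections: list[str], min_overlap: int = 3) -> str:
    """Rejoin sections that are already in order, one after another."""
    result = sections[0]
    for section in sections[1:]:
        result = merge(result, section, min_overlap)
    return result


# Six made-up reads of one section, three of them containing a single error.
reads = ["ATGCGTA", "ATGAGTA", "ATGCGTA", "TTGCGTA", "ATGCGTA", "ATGCGTC"]
print(consensus(reads))  # ATGCGTA
print(assemble(["ATGCGTA", "GTACCTT", "CTTGA"]))  # ATGCGTACCTTGA
```

Each misread letter is outvoted by the correct reads at that position, which
is why repeating the reading improves accuracy.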
Individual genes also have abbreviations, or symbols, made up of upper-case
letters and numerals. The symbols should be no longer than six characters.
The database of approved human gene symbols is maintained by the HUGO Gene
Nomenclature Committee (HGNC) and contained around 25,000 approved gene
symbols at the time of writing. A similar database, maintained by Mouse
Genome Informatics (MGI), catalogues the known gene symbols for the mouse.
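The format rule described above can be checked with a regular expression.
The sketch below tests only that stated rule (upper-case letters and
numerals, at most six characters); it is not the HGNC's own validation, and
some real approved symbols fall outside it, for example HLA-A, which
contains a hyphen. The pattern and function name are hypothetical.

```python
import re

# Only the rule stated above: upper-case letters and numerals, 1 to 6 characters.
SYMBOL_PATTERN = re.compile(r"[A-Z0-9]{1,6}")


def follows_stated_rule(symbol: str) -> bool:
    """True if the whole symbol matches the stated format rule."""
    return SYMBOL_PATTERN.fullmatch(symbol) is not None


for symbol in ["BRCA1", "TP53", "HBB", "brca1", "ABCDEFG", "HLA-A"]:
    print(symbol, follows_stated_rule(symbol))
# BRCA1 True, TP53 True, HBB True, brca1 False, ABCDEFG False, HLA-A False
```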