Gene
From Wikipedia, the free encyclopedia
A gene is the unit of heredity in living organisms. Genes are encoded in an organism's genome, which is composed of DNA or RNA, and direct the physical development and behavior of the organism. Most genes encode proteins, which are biological macromolecules comprising linear chains of amino acids that drive most of the chemical reactions carried out by the cell. Some genes do not encode proteins, but produce non-coding RNA molecules that play key roles in protein biosynthesis and gene regulation. Molecules that result from gene expression, whether RNA or protein, are collectively known as gene products.
Most genes contain non-coding regions that do not code for the gene product but often regulate gene expression. A critical non-coding region is the promoter, a short DNA sequence that is required for initiation of gene expression. The genes of eukaryotic organisms often contain non-coding regions called introns, which are removed from the messenger RNA in a process known as splicing. The regions that actually encode the gene product, which can be much shorter than the introns, are known as exons.
Two attempts at defining genes
There are two perspectives on the definition: a gene is defined either genetically as a region of non-complementation or physically as a string of nucleotides containing information needed for a gene to function. The former is the older definition. The physical basis of genes was discovered later. Although classical genetics and evolutionary biology use the term "gene" to refer to a conceptual entity or "unit of inheritance", modern molecular genetics typically uses the term to refer to a physical entity.
The Sequence Ontology Project, an effort directed by the larger Gene Ontology system, defines a gene physically as "a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions".[1]
The definition of the term gene is changing as biological evidence accumulates. For example, a gene was previously defined as a sequence coding for a protein. Many researchers no longer accept this restriction and also count as genes those sequences that produce only RNA and not protein (see the section on RNA-coding genes below).
The word "gene" is also used in common speech to refer to the inheritance of a trait, as in "a cancer gene" or "the gene for obesity". However, biologists rarely use the term in this sense because it is highly unlikely that such complex phenomena would be caused by a single gene.
The word "gene" was coined in 1909 by Danish botanist Wilhelm Johannsen for the fundamental physical and functional unit of heredity.[2] The word was derived from Hugo De Vries' term pangen, itself a derivative of the word pangenesis coined by Darwin (1868).[3] The word pangenesis is made from the Greek words pan (a prefix meaning "whole", "encompassing") and genesis ("birth") or genos ("origin").
Genes are stored as RNA or DNA
Most living organisms carry their genes and transmit them to offspring as DNA, but some viruses carry only RNA. Because they use RNA to store genes, their cellular hosts may synthesize their proteins as soon as they are infected and without the delay in waiting for transcription. On the other hand, RNA retroviruses, such as HIV, require the reverse transcription of their genome from RNA into DNA before their proteins can be synthesized.
In 2006, French researchers came across a puzzling example of RNA-mediated inheritance in mice. Mice with a loss-of-function mutation in the gene Kit have white tails. Offspring of these mutants can have white tails despite having only normal Kit genes. The research team traced this effect back to mutated Kit RNA.[4] While RNA is common as genetic storage material in viruses, RNA inheritance has been observed only very rarely in mammals.
Genes produce proteins or RNA
Protein-coding genes
In molecular biology, a gene is a region of DNA (or RNA, in the case of some viruses) that determines the amino acid sequence of a protein (the coding sequence) and the surrounding sequence that controls when and where the protein will be produced (the regulatory sequence). The genetic code determines how the coding sequence is converted into a protein sequence. The protein-coding regions of genes are composed of a series of three-nucleotide sequences called codons. Each codon specifies a particular amino acid to be added to the protein chain; thus genes determine the protein's primary structure. Most genes are expressed in a two-stage process: first, the DNA is transcribed by enzymes known as RNA polymerases to produce an RNA molecule known as messenger RNA (mRNA), and second, the mRNA is translated by specialized cellular machinery known as the ribosome into a polypeptide chain that then folds into a functional protein. The genetic code is essentially the same for all known life, from bacteria to humans.
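The two-stage process described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the cellular machinery: it assumes the standard genetic code, writes amino acids in one-letter codes, and approximates transcription by replacing T with U in the coding strand (in the cell, the polymerase reads the complementary template strand). The function names `transcribe` and `translate` and the input sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Simplified transcription: the mRNA matches the coding strand with T -> U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical nine-base coding sequence: start codon, one codon, stop codon.
mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Each codon maps to exactly one amino acid, so the coding sequence fixes the protein's primary structure, while several codons may map to the same amino acid.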
Through the proteins they encode, genes govern the cells in which they reside. In multicellular organisms, they control the development of the individual from the fertilized egg and the day-to-day functions of the cells that make up tissues and organs. The roles of their protein products range from mechanical support of the cell structure to the transportation and manufacture of other molecules and to the regulation of other proteins' activities.
RNA-coding genes
In most cases, RNA is an intermediate product in the process of manufacturing proteins from genes. However, for some gene sequences, the RNA molecules are the actual functional products. For example, RNAs known as ribozymes are capable of enzymatic function, and miRNAs have a regulatory role. The DNA sequences from which such RNAs are transcribed are known as non-coding RNA genes, or RNA genes.
The RNA-coding gene is a relatively recent concept. It breaks with the paradigm that all genes make proteins and holds instead that some genes exert their function via their RNA product. However, it is likely
Mutations change genes
Rare, spontaneous errors (e.g. in DNA replication) can introduce mutations into the sequence of a gene. Once propagated to the next generation, such a mutation may lead to variation within a species' population. Variants of a single gene are known as alleles, and differences in alleles may give rise to differences in traits, for example eye colour. A gene's most common allele is called the wild-type allele, and rare alleles are called mutants. (However, this does not imply that the wild-type allele is the ancestor from which the mutants are descended.)
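As a rough illustration of how a single-base change produces a new allele, the sketch below applies two substitutions to a made-up coding sequence and translates each variant with the standard genetic code. The sequence, the chosen positions and the helper names `substitute` and `translate` are all hypothetical. One substitution leaves the protein unchanged; the other changes one amino acid.

```python
from itertools import product

# Standard genetic code for DNA codons, in the conventional T, C, A, G order.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def substitute(dna, position, base):
    """Return a copy of the sequence with one base replaced (0-based position)."""
    return dna[:position] + base + dna[position + 1:]


wild_type = "ATGGAAGGC"                  # hypothetical allele: codons ATG GAA GGC
silent = substitute(wild_type, 5, "G")    # GAA -> GAG, same amino acid
missense = substitute(wild_type, 4, "T")  # GAA -> GTA, different amino acid

for name, allele in [("wild type", wild_type), ("silent", silent), ("missense", missense)]:
    print(name, allele, translate(allele))
```

All three sequences are distinct alleles at the DNA level, but only the last one differs in its protein product, which is one reason why not every allele gives rise to a visible difference in traits.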
Many steps lie between the gene and its effect
For various reasons, the relationship between a DNA sequence and a phenotypic trait is not direct. The same DNA sequence in two different individuals may result in different traits because of the effect of other DNA sequences or the environment.
- The DNA sequence is expressed as a trait only if it is transcribed to RNA. Because transcription starts from a specific base-pair sequence (a promoter) and stops at another (a terminator), the DNA sequence needs to be correctly placed between the two. If it is not, it is not expressed; such sequences are often called junk DNA.
- Cells regulate the activity of genes in part by increasing or decreasing their rate of transcription. Over the short term, this regulation occurs through the binding or unbinding of proteins, known as transcription factors, to specific non-coding DNA sequences called regulatory elements. Therefore, to be expressed, the DNA sequence needs to be properly regulated by other DNA sequences.
- The DNA sequence may also be silenced through DNA methylation or by chemical changes to the protein components of chromosomes (see histone). This form of transcriptional regulation is longer-lasting and can persist through cell divisions.
- The RNA is often edited before its translation into a protein. Eukaryotic cells splice the transcripts of a gene by keeping the exons and removing the introns. Therefore, the DNA sequence needs to be in an exon to be expressed. Because of the complexity of the splicing process, one pre-mRNA may be spliced in alternate ways to produce not one but a variety of proteins (alternative splicing). Prokaryotes can produce a similar effect by shifting reading frames during translation.
- Translation of the RNA into a protein likewise begins at a specific start sequence and ends at a specific stop sequence.
- Once produced, the protein interacts with the many other proteins in the cell, according to the cell's metabolism. This interaction finally produces the trait.
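The splicing step in the list above can be sketched as simple string slicing. This is only a toy model under stated assumptions: the pre-mRNA sequence and the exon coordinates are invented for illustration, and `splice` is a hypothetical helper, not a description of the spliceosome.

```python
def splice(pre_mrna, exons):
    """Join the exon intervals (start, end), end exclusive, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: three exons separated by two introns.
pre_mrna = (
    "AUGGCC"         # exon 1, positions 0-5
    "GUAAGUUUCAG"    # intron 1, positions 6-16
    "AAAGGG"         # exon 2, positions 17-22
    "GUAUGUCCAG"     # intron 2, positions 23-32
    "UUUUAA"         # exon 3, positions 33-38
)

exon_1, exon_2, exon_3 = (0, 6), (17, 23), (33, 39)

# Keeping every exon gives one mature mRNA ...
full_length = splice(pre_mrna, [exon_1, exon_2, exon_3])
# ... while skipping exon 2 gives a different mRNA from the same transcript.
exon_skipped = splice(pre_mrna, [exon_1, exon_3])

print(full_length)
print(exon_skipped)
```

Both outputs come from the same transcribed sequence, which is the sense in which one gene can yield more than one protein through alternative splicing.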
Genes are stored on chromosomes often in duplicate
All the genes and intervening DNA together make up the genome of an organism, which in many species is divided among several chromosomes and typically present in two or more copies. The location (or locus) of a gene and the chromosome on which it is situated are in a sense arbitrary. Genes that appear together on the chromosomes of one species, such as humans, may appear on separate chromosomes in another species, such as mice. Two genes positioned near one another on a chromosome may encode proteins that figure in the same cellular process or in completely unrelated processes. As an example of the former, many of the genes involved in spermatogenesis reside together on the Y chromosome.
Many species carry more than one copy of their genome within each of their somatic cells. These organisms are called diploid if they have two copies or polyploid if they have more than two copies. In such organisms, the copies are practically never identical. With respect to each gene, the copies that an individual possesses are liable to be distinct alleles, which may act synergistically or antagonistically to generate a trait or phenotype. The ways that gene copies interact are explained by chemical dominance relationships (see the articles on genetics, allele).
In the case of viruses the term chromosome is rarely used. Here the most common term is RNA or DNA genome.
The genome contains most genes of an organism
Organism | Genes | Base pairs |
---|---|---|
Plant | <50,000 | <10^11 |
Human, mouse or rat | 25,000 | 3×10^9 |
Fugu fish | 40,000 | 4×10^8 |
Fruit fly | 13,767 | 1.3×10^8 |
Worm | 19,000 | 9.7×10^7 |
Fungus | 6,000 | 1.3×10^7 |
Bacterium | 500–6,000 | 5×10^5–10^7 |
Mycoplasma genitalium | 500 | 580,000 |
DNA virus | 10–900 | 5,000–800,000 |
RNA virus | 1–25 | 1,000–23,000 |
Viroid | 0–1 | ~500 |
The attached table gives typical numbers of genes and genome size for some organisms. Estimates of the number of genes in an organism are somewhat controversial because they depend on the discovery of genes, and no techniques currently exist to prove that a DNA sequence contains no gene. (In early genetics, genes could be identified only if there were mutations, or alleles.) Nonetheless, estimates are made based on current knowledge.
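To get a feel for these numbers, one can divide genome size by gene count to obtain the average stretch of genome per gene. The sketch below does this for the rows of the table that give single values rather than ranges, reading the genome sizes as powers of ten. The figures are the table's approximate estimates, so the results are only rough averages, and the helper name `base_pairs_per_gene` is hypothetical.

```python
# (gene count, genome size in base pairs), taken from the table above.
genomes = {
    "Human, mouse or rat": (25_000, 3 * 10**9),
    "Fruit fly": (13_767, 13 * 10**7),
    "Worm": (19_000, 97 * 10**6),
    "Fungus": (6_000, 13 * 10**6),
    "Mycoplasma genitalium": (500, 580_000),
}


def base_pairs_per_gene(genes, base_pairs):
    """Average number of base pairs of genome per gene, rounded down."""
    return base_pairs // genes


for organism, (genes, base_pairs) in genomes.items():
    print(f"{organism}: about {base_pairs_per_gene(genes, base_pairs):,} bp per gene")
```

On these estimates a mammalian genome has roughly a hundred times more DNA per gene than Mycoplasma genitalium, which reflects how much of a eukaryotic genome does not encode proteins.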
In most eukaryotic species, very little of the DNA in the genome encodes proteins, and the genes may be separated by vast sequences of so-called junk DNA. Moreover, the genes are often fragmented internally by non-coding sequences called introns, which can be many times longer than the coding sequence. Introns are removed shortly after transcription by splicing. In the molecular sense, however, they are still parts of the gene.
Most organisms have more than one storage site for their genes. Bacteria, for example, store most of their genes in a circular double-stranded piece of DNA while some genes are stored in small plasmids. Usually the term bacterial genome does not include these plasmids. Eukaryotic cells store most of their genes in the nuclear genome composed of chromosomes while a few genes reside in the stripped-down DNA repositories of organelles like mitochondria.
The concept of a gene is still changing
When trying to understand the concept of a gene, keep in mind that it is not static. It has evolved considerably from a scarcely explained "unit of inheritance" without a physical basis (see history section) to a usually DNA-based unit that can exert its effects on the organism through RNA or protein products. It was also previously believed that one gene makes one protein. This concept has been overthrown by the discovery of alternative splicing.
The definition of a gene is still changing. The first cases of RNA-based inheritance have been discovered in mammals.[4] In plants, cases of traits reappearing after several generations of absence have led researchers to hypothesise RNA-directed overwriting of genomic DNA.[5] Evidence is also accumulating that the control regions of a gene do not necessarily have to be close to the coding sequence on the linear molecule or even on the same chromosome. Spilianakis and colleagues discovered that the promoter region of the IFN-γ gene on chromosome 10 and the regulatory regions of the T(H)2 cytokine locus on chromosome 11 come into close proximity in the nucleus, possibly to be co-regulated.[6]
The concept that genes are clearly delimited is also being eroded. There is evidence for fused proteins stemming from two adjacent genes that can also produce two separate protein products. While it is not clear whether these fusion proteins are functional, the phenomenon is more frequent than previously thought.[7] Even more ground-breaking than the discovery of fused genes is the observation that some proteins can be composed of exons from far away regions and even different chromosomes.[8]
Evolutionary concept of a gene
George C. Williams first explicitly advocated the gene-centric view of evolution in his 1966 book Adaptation and Natural Selection. He proposed an evolutionary concept of the gene to be used when talking about natural selection favoring some genes. The definition is: "that which segregates and recombines with appreciable frequency." According to this definition, even an asexual genome could be considered a gene, insofar as it has an appreciable permanency through many generations.
The difference is that the molecular gene is transcribed as a unit, whereas the evolutionary gene is inherited as a unit.
Richard Dawkins' The Selfish Gene and The Extended Phenotype defended the idea that the gene is the only replicator in living systems. This means that only genes transmit their structure largely intact and are potentially immortal in the form of copies. On this view, genes should be the unit of selection. In River Out of Eden, Dawkins further refined the idea of gene-centric selection by describing life as a river of compatible genes flowing through geological time. Scoop up a bucket of genes from the river, and we have an organism, which serves as a temporary body for those genes. A river of genes may fork into two branches representing two non-interbreeding species as a result of geographical separation.
History
The existence of genes was first suggested by Gregor Mendel, who, in the 1860s, studied inheritance in pea plants and hypothesized a factor that conveys traits from parent to offspring. Although he did not use the term gene, he explained his results in terms of inherited characteristics. Mendel was also the first to hypothesize independent assortment, the distinction between dominant and recessive traits, the distinction between a heterozygote and homozygote, and the difference between what would later be described as genotype and phenotype. Mendel's concept was finally named when Wilhelm Johannsen coined the word gene in 1909.
In the early 1900s, Mendel's work received renewed attention from scientists. In 1910, Thomas Hunt Morgan showed that genes reside on specific chromosomes. He later showed that genes occupy specific locations on the chromosome. With this knowledge, Morgan and his students began the first chromosomal map of the fruit fly Drosophila. In 1928, Frederick Griffith showed that genes could be transferred. In what is now known as Griffith's experiment, injections into a mouse of a deadly strain of bacteria that had been heat-killed transferred genetic information to a safe strain of the same bacteria, killing the mouse.
In 1941, George Wells Beadle and Edward Lawrie Tatum showed that mutations in genes caused errors in certain steps in metabolic pathways. This showed that specific genes code for specific proteins, leading to the "one gene, one enzyme" hypothesis. Oswald Avery, Colin MacLeod, and Maclyn McCarty showed in 1944 that DNA holds the gene's information. In 1953, James D. Watson and Francis Crick demonstrated the molecular structure of DNA. Together, these discoveries established the central dogma of molecular biology, which states that proteins are translated from RNA which is transcribed from DNA. This dogma has since been shown to have exceptions, such as reverse transcription in retroviruses.
Richard Roberts and Phillip Sharp discovered in 1977 that genes can be split into segments. This led to the idea that one gene can make several proteins. More recently (as of 2003-2006), biological results have made the notion of a gene appear more slippery. In particular, genes do not seem to sit side by side on DNA like discrete beads. Instead, regions of the DNA producing distinct proteins may overlap, so that the idea emerges that "genes are one long continuum".[9]
Human gene nomenclature
For each known human gene, the HUGO Gene Nomenclature Committee (HGNC) approves a gene name and symbol (short-form abbreviation). All approved symbols are stored in the HGNC Database. Each symbol is unique, and each gene is given only one approved gene symbol. A unique symbol for each gene is necessary so that the gene can be referred to unambiguously. It also facilitates electronic data retrieval from publications. Preferably, each symbol maintains parallel construction across the members of a gene family and can be used in other species, especially the mouse.
See also
- DNA
- Gene-centric view of evolution
- Gene expression
- Gene therapy
- Gene family
- Genetic programming
- Genetic algorithm
- Genetics
- Genomes
- Minimal genomes
- Genomics
- Homeobox
- Human Genome Project
- List of notable genes
- Meme
- Memetics
- Protein
- Pseudogene
- RNA
References
- [1] Sequence Ontology term browser: gene / SO:0000704
- [2] The Human Genome Project Timeline. Retrieved on 2006-09-13.
- [3] Darwin C. (1868). The Variation of Animals and Plants under Domestication.
- [4] Rassoulzadegan & colleagues (2006) RNA-mediated non-mendelian inheritance of an epigenetic change in the mouse. PMID 16724059
- [5] Lolle & colleagues (2005) Genome-wide non-mendelian inheritance of extra-genomic information in Arabidopsis. PMID 15785770
- [6] Spilianakis & colleagues (2005) Interchromosomal associations between alternatively expressed loci. PMID 15880101
- [7] Parra & colleagues (2006) Tandem chimerism as a means to increase protein complexity in the human genome. PMID 16344564
- [8] Kapranov & colleagues (2005) Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. PMID 15998911
- [9] Pearson H. (2006). Genetics: What is a gene? Nature 441(7092): 398-401.
- Dawkins, Richard (1990). The Selfish Gene. Oxford University Press. ISBN 0-19-286092-5.
- Dawkins, Richard (1995). River Out of Eden. Basic Books. ISBN 0-465-06990-8.
External links
- Science aid: Genetics Genetics for beginners/teens
- HUGO Gene Nomenclature Committee, HGNC
- Frederick Sanger, Gene Sequencing Freeview video interview with John Sanger and John Walker by the Vega Science Trust.
- Human Genome Organisation, HUGO
- Recount slashes number of human genes (from New Scientist magazine)
- National Human Genome Research Institute — News Release
- Nature - 21 October 2004 — Finishing the euchromatic sequence of the human genome
- Rat Genome
- Stanford Encyclopedia of Philosophy entry
- iHOP - Information Hyperlinked over Proteins
- UniProt
- IDconverter - Map your ids to other known public DBs
- Entrez Gene - A searchable database of genes