
Structure of vertebrate genes: A statistical analysis implicating selection

Journal of Molecular Evolution

Summary

This paper presents a statistical analysis of the size distributions of exons and six other gene parts [the transcription unit, introns, intervening DNA (the sum of introns), mRNA (the sum of exons), and the leader and trailer regions of mRNA], as well as of the number of exons, the percentage of introns, the placement of introns within the gene, and the potential for frameshifts arising from shifts of coding exons. These seven size variables, all measured in base pairs, fit lognormal distributions. Significant correlations exist between the sizes of intervening DNA and mRNA, between the sizes of the leader and trailer regions, and between the sizes of introns and their flanking exons. Introns occur at nonrandom frequencies within the codon frame, in untranslated regions, and with respect to the frameshift potential of exon movement or duplication. These nonrandom patterns in gene structure demonstrate that models of gene evolution must incorporate selective processes.
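The summary names the quantities analysed but not the computations behind them. The following Python sketch shows one plausible way to compute each kind of result: a lognormal fit to a size variable, a correlation between two size variables on the log scale, a chi-square test of intron placement within the codon frame, and the frameshift potential of shifting a coding exon. It is not the author's code, all function names are hypothetical, and it assumes the usual intron-phase convention (phase 0 introns fall between codons, phase 1 after the first nucleotide of a codon, phase 2 after the second).

```python
import math
from collections import Counter

# Hypothetical helper names throughout; an illustrative sketch,
# not the analysis code behind the paper.

def lognormal_fit(sizes_bp):
    """Maximum-likelihood lognormal fit: mean and SD of ln(size), SD with 1/n."""
    logs = [math.log(s) for s in sizes_bp]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma

def pearson_r(xs, ys):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_size_correlation(a_bp, b_bp):
    """Correlate two size variables (e.g. intervening DNA vs mRNA) on the log
    scale, where lognormally distributed sizes become approximately normal."""
    return pearson_r([math.log(a) for a in a_bp], [math.log(b) for b in b_bp])

# Upper 5% point of the chi-square distribution with 2 degrees of freedom.
CHI2_CRIT_2DF_05 = 5.991

def phase_chi_square(phases):
    """Chi-square statistic for intron phases 0, 1, 2 against equal expected counts."""
    counts = Counter(phases)
    expected = len(phases) / 3
    return sum((counts.get(p, 0) - expected) ** 2 / expected for p in (0, 1, 2))

def exon_length_mod3(phase_in, phase_out):
    """Coding length (mod 3) of an exon lying between introns of the given phases."""
    return (phase_out - phase_in) % 3

def shift_causes_frameshift(phase_in, phase_out):
    """Duplicating or deleting the exon shifts the downstream reading frame
    unless its length is a multiple of 3 (a 'symmetric' exon, phase_in == phase_out)."""
    return exon_length_mod3(phase_in, phase_out) != 0
```

With a hypothetical sample of 100 intron phases split 50/30/20 across phases 0, 1 and 2, `phase_chi_square` gives about 14.0, which exceeds `CHI2_CRIT_2DF_05` and would reject equal phase frequencies at the 5% level. Statistical libraries such as SciPy provide equivalent fits and tests; the plain-Python version simply keeps the sketch dependency-free.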




About this article

Cite this article

Smith, M.W. Structure of vertebrate genes: A statistical analysis implicating selection. J Mol Evol 27, 45–55 (1988). https://doi.org/10.1007/BF02099729


  • DOI: https://doi.org/10.1007/BF02099729
