Abstract
Statistical considerations are frequently to the fore in the analysis of microarray data, as researchers sift through massive amounts of data and adjust for various sources of variability in order to identify the important genes among the many that are measured. This chapter summarizes some of the issues involved and provides a brief review of the analysis tools that are available to researchers to deal with these issues.
Copyright information
© 2003 Humana Press Inc.
About this protocol
Cite this protocol
Smyth, G.K., Yang, Y.H., Speed, T. (2003). Statistical Issues in cDNA Microarray Data Analysis. In: Brownstein, M.J., Khodursky, A.B. (eds) Functional Genomics. Methods in Molecular Biology, vol 224. Humana Press. https://doi.org/10.1385/1-59259-364-X:111
DOI: https://doi.org/10.1385/1-59259-364-X:111
Publisher Name: Humana Press
Print ISBN: 978-1-58829-291-9
Online ISBN: 978-1-59259-364-4
eBook Packages: Springer Protocols