  • Analysis

Reproducible pharmacogenomic profiling of cancer cell line panels

Abstract

The use of large-scale genomic and drug response screening of cancer cell lines depends crucially on the reproducibility of results. Here we consider two previously published screens, plus a later critique of these studies. Using independent data, we show that consistency is achievable, and provide a systematic description of the best laboratory and analysis practices for future studies.

Figure 1: gCSI, CCLE and GDSC drug response statistics reveal different patterns of response for specifically versus broadly active drugs.
Figure 2: Highly concordant sensitive and resistant labels assigned via mean viability.
Figure 3: Consistent biomarker scoring for gCSI and CCLE drug response data.
Figure 4: Follow-up screen shows that the SYTO 60 assay closely agrees with CTG but is significantly less precise.

References

  1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012)

  2. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012)

  3. Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013)

  4. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)

  5. Ein-Dor, L., Kela, I., Getz, G., Givol, D. & Domany, E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21, 171–178 (2005)

  6. Chan, G. K. Y., Kleinheinz, T. L., Peterson, D. & Moffat, J. G. A simple high-content cell cycle assay reveals frequent discrepancies between cell number and ATP and MTS proliferation assays. PLoS ONE 8, e63583 (2013)

  7. Greshock, J. et al. Molecular target class is predictive of in vitro response profile. Cancer Res. 70, 3677–3686 (2010)

  8. Cancer Cell Line Encyclopedia Consortium & Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line data sets. Nature 528, 84–87 (2015)

  9. Yu, M. et al. A resource for cell line authentication, annotation and quality control. Nature 520, 307–311 (2015)

  10. Jiang, Y., Hascall, D., Li, D. & Pease, J. H. A charged aerosol detector/chemiluminescent nitrogen detector/liquid chromatography/mass spectrometry system for regular and fragment compound analysis in drug discovery. J. Chromatogr. A 1411, 63–68 (2015)

  11. Dawes, T. D. et al. Compound transfer by acoustic droplet ejection promotes quality and efficiency in ultra-high-throughput screening campaigns. J. Lab. Autom. 21, 64–75 (2016)

  12. McDermott, U. et al. Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling. Proc. Natl Acad. Sci. USA 104, 19936–19941 (2007)

  13. Klijn, C. et al. A comprehensive transcriptional portrait of human cancer cell lines. Nature Biotechnol. 33, 306–312 (2015)

  14. Zoppoli, G. et al. Putative DNA/RNA helicase Schlafen-11 (SLFN11) sensitizes cancer cells to DNA-damaging agents. Proc. Natl Acad. Sci. USA 109, 15030–15035 (2012)

  15. Duan, Z., Lamendola, D. E., Duan, Y., Yusuf, R. Z. & Seiden, M. V. Description of paclitaxel resistance-associated genes in ovarian and breast cancer cell lines. Cancer Chemother. Pharmacol. 55, 277–285 (2005)

  16. Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010)

  17. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)

  18. Greenman, C. D. et al. PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data. Biostatistics 11, 164–175 (2010)

  19. Rudin, C. M. et al. Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nature Genet. 44, 1111–1116 (2012)

  20. Tibshirani, R. & Wang, P. Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9, 18–29 (2008)

  21. Yourshaw, M., Taylor, S. P., Rao, A. R., Martín, M. G. & Nelson, S. F. Rich annotation of DNA sequencing variants by leveraging the Ensembl Variant Effect Predictor with plugins. Brief. Bioinform. 16, 255–264 (2015)

  22. Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011)

  23. González-Pérez, A. & López-Bigas, N. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am. J. Hum. Genet. 88, 440–449 (2011)

  24. Barretina, J. et al. Addendum: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 492, 290 (2012)

  25. Sigoillot, F. D. & King, R. W. Vigilance and validation: Keys to success in RNAi screening. ACS Chem. Biol. 6, 47–60 (2011)

  26. Fraley, C. & Raftery, A. E. Model-based clustering, discriminant analysis and density estimation. J. Amer. Statist. Assoc. 97, 611–631 (2002)

Acknowledgements

We thank G. Manning, T. Sandmann, B. Forrest and D. Stokoe for their valuable contributions to improving the manuscript. We also thank L. Shi and J. Wu for work on early phases of gCSI screening; R. Rodriguez, G. Yuen and D. Hascall for help preparing drug plates; Y. Jiang for assistance with drug plate quality control; and S. Selvaraj and M. Yu for banking and quality control of cell lines used in this study. This manuscript contains an analysis of data released by the Broad Institute (CCLE) and by the GDSC members. Those who carried out the original analysis and collection of these data bear no responsibility for their further analysis or interpretation. The relevant subset of the CCLE data is included in the supplementary software package with the explicit written consent of the CCLE group. The GDSC data have been included following the GDSC’s instructions for attribution and under their specified open-source license, Creative Commons 3.0.

Author information

Contributions

P.M.H. and R.B. drafted the manuscript, prepared figures and tables and designed the factorial screen. P.M.H. and R.B. performed computational analysis and interpreted results. S.L. contributed to the development of elastic net biomarker identification software. J.S. and R.L.Y. contributed to the experimental design, data analysis, and manuscript preparation. S.M. contributed to manuscript preparation and data interpretation. E.L., J.T., Y.Y. and B.L. performed primary cell-based screening experiments. E.L. and R.M.N. designed and managed the cell-based screening experiments.

Corresponding author

Correspondence to Richard Bourgon.

Ethics declarations

Competing interests

All authors are employees of Genentech Inc. and may be stockholders of Roche Pharmaceuticals.

Extended data figures and tables

Extended Data Figure 1 Comparison of gCSI, CCLE and GDSC mean viability response statistics: all pairwise comparisons.

a, b, IC50 values (the drug concentration estimated to produce a 50% reduction in viability relative to untreated cells) for the cell lines in Fig. 1. Because extrapolation beyond tested doses can produce statistically unstable results (Extended Data Fig. 2b), IC50 values are truncated to the tested dose range (which varies by drug and project). c, d, Mean viability (equivalent to the area under the log-dose/response curve, normalized by the width of the tested log-dose range) is plotted for all drugs that gCSI had in common with one or both of the other studies. 0% corresponds to complete growth inhibition at all doses; 100% corresponds to viability equal, on average, to that of untreated control wells. Values are capped at 100% for consistency with GDSC published data. gCSI mean viability and IC50 values were derived from three biological replicates (Methods).
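A minimal Python sketch of the two statistics in this legend follows. It is an illustration under stated assumptions, not the authors' pipeline: viability is assumed to be already expressed as a percentage of untreated controls, the raw points are integrated with the trapezoid rule on log10 dose rather than a fitted dose–response curve, and IC50 is located by linear interpolation on the log-dose scale. The function names `mean_viability` and `truncated_ic50` are hypothetical.

```python
import math


def mean_viability(doses, viability, cap=100.0):
    """Average viability (%) over the tested log10-dose range.

    Trapezoid-rule area under the log-dose/viability points divided by the
    width of the log-dose range. Assumption: raw points are integrated
    directly instead of a fitted dose-response curve.
    """
    x = [math.log10(d) for d in doses]
    area = 0.0
    for i in range(len(x) - 1):
        area += (x[i + 1] - x[i]) * (viability[i] + viability[i + 1]) / 2.0
    return min(area / (x[-1] - x[0]), cap)  # cap at 100%, as for GDSC values


def truncated_ic50(doses, viability):
    """Dose at which viability first falls to 50%, by linear interpolation
    on log10 dose, truncated to the tested range (no extrapolation)."""
    if viability[0] <= 50.0:
        return doses[0]  # already at or below 50%: lowest tested dose
    for i in range(len(doses) - 1):
        v0, v1 = viability[i], viability[i + 1]
        if v0 > 50.0 >= v1:  # first crossing of the 50% line
            x0, x1 = math.log10(doses[i]), math.log10(doses[i + 1])
            frac = (v0 - 50.0) / (v0 - v1)
            return 10 ** (x0 + frac * (x1 - x0))
    return doses[-1]  # never reaches 50%: highest tested dose


doses = [0.01, 0.1, 1, 10]       # hypothetical, ascending
viability = [100, 90, 40, 10]    # hypothetical, % of untreated control
print(round(mean_viability(doses, viability), 2))  # 61.67
print(round(truncated_ic50(doses, viability), 3))  # 0.631
```

A line that never falls below 50% is assigned the highest tested dose instead of an extrapolated value, which is the truncation the legend describes.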

Extended Data Figure 2 Assignment of sensitive and resistant labels for specifically active agents.

a–c, Untruncated mean viability statistics from CCLE and gCSI are plotted for three drugs with specific activity. Two-component Gaussian mixture models were fitted to these values to separate the non-responder and responder distributions, for CCLE and gCSI separately (ref. 26). Non-responder mean viability statistics were scattered around 100%, providing a good indication of assay noise levels. Point colour indicates hard sensitive (red) versus resistant (blue) thresholding based on the likelihood ratio, a strategy that ignores classification uncertainty. Dotted lines indicate the final thresholds selected by the method used for Fig. 2; this method quantifies uncertainty and assigns a no-call label to a cell line when that uncertainty is too high (Methods). Ellipses indicate the covariance from a two-dimensional, two-component fit and are for illustrative purposes only; response labels were assigned from separate fits for gCSI and CCLE, using all available cell lines in each data set. d–f, Hypothetical log-dose/viability curves illustrating the statistical challenges presented by the IC50 response metric. In small screens, it is often feasible to adjust the dosing scale so that 50% inhibition (relative to vehicle-treated wells at the same time point) is cleanly achieved. In large-scale screens, however, many cell lines commonly fail to reach this level of inhibition at any tested dose, fail to show a lower asymptote in their dose–response curve, or both. The mean viability statistic, by contrast, requires no extrapolation beyond tested doses. d, A hypothetical dose–response curve with characteristics ideal for relative or absolute IC50 computation. Tested doses yield viability values well above and well below 50%. Furthermore, both upper and lower asymptotes can be precisely estimated, enabling use of relative IC50 if desired (relative IC50 is computed after first rescaling to the observed viability dynamic range; in the present work, however, all three studies use absolute IC50). e, A hypothetical dose–response curve that yields precise upper and lower asymptote estimates but a highly imprecise absolute IC50 estimate: doses spanning more than two orders of magnitude are all consistent with 50% inhibition, and the raw data are too noisy for further refinement. Such IC50 values, which are frequently observed in gCSI, CCLE and GDSC results, may complicate cross-study comparisons. f, A hypothetical dose–response curve that requires potentially imprecise extrapolation to estimate both the absolute IC50 and the lower asymptote. We advise against using either absolute or relative IC50 for such curves, which are again common in all three studies. Note also that in settings such as this, where low doses provide no information and inhibition occurs only at the highest tested doses, any response statistic is forced to rank cell lines using just a subset of the data generated. As a consequence, results for such drugs may show reduced concordance regardless of analysis method.
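The two-component mixture classification in panels a–c can be sketched with a small expectation–maximization (EM) fit in pure Python. Assumptions: one one-dimensional mixture is fitted per data set, as the legend describes for gCSI and CCLE; the lower-mean component is taken to be the responder (sensitive) distribution; and the 95% posterior-probability cut-off, with a no-call in between, is a hypothetical stand-in for the uncertainty rule given in the Methods. All function names are illustrative; for real analyses an established model-based clustering implementation, such as the R package mclust associated with ref. 26, is more robust.

```python
import math
import statistics


def _log_weighted_density(x, weight, mean, sd):
    """log(weight * Normal(x | mean, sd))."""
    return (math.log(weight) - math.log(sd * math.sqrt(2.0 * math.pi))
            - 0.5 * ((x - mean) / sd) ** 2)


def posterior_sensitive(x, weights, means, sds):
    """P(component 0 | x); component 0 is the low-viability (responder) one.
    Uses a numerically stable logistic to avoid overflow."""
    d = (_log_weighted_density(x, weights[1], means[1], sds[1])
         - _log_weighted_density(x, weights[0], means[0], sds[0]))
    if d > 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def fit_two_gaussians(values, n_iter=200, min_sd=1e-3):
    """Fit a 1-D two-component Gaussian mixture by EM.
    Returns (weights, means, sds), ordered so component 0 has the lower mean."""
    xs = sorted(values)
    n, half = len(xs), len(xs) // 2
    means = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sds = [statistics.pstdev(xs) or 1.0] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each value
        r = [posterior_sensitive(x, weights, means, sds) for x in xs]
        n0 = max(sum(r), 1e-12)
        n1 = max(n - sum(r), 1e-12)
        # M-step: responsibility-weighted mixing weights, means and SDs
        weights = [n0 / n, n1 / n]
        means = [sum(ri * x for ri, x in zip(r, xs)) / n0,
                 sum((1 - ri) * x for ri, x in zip(r, xs)) / n1]
        var0 = sum(ri * (x - means[0]) ** 2 for ri, x in zip(r, xs)) / n0
        var1 = sum((1 - ri) * (x - means[1]) ** 2 for ri, x in zip(r, xs)) / n1
        sds = [max(math.sqrt(var0), min_sd), max(math.sqrt(var1), min_sd)]
    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order], [means[k] for k in order],
            [sds[k] for k in order])


def call_labels(values, weights, means, sds, confidence=0.95):
    """Hypothetical rule: call only confident lines, otherwise 'no-call'."""
    labels = []
    for x in values:
        p = posterior_sensitive(x, weights, means, sds)
        if p >= confidence:
            labels.append("sensitive")
        elif p <= 1.0 - confidence:
            labels.append("resistant")
        else:
            labels.append("no-call")
    return labels


values = [35, 38, 40, 42, 45, 95, 97, 98, 99, 100, 101, 102, 103, 105]  # hypothetical
w, m, s = fit_two_gaussians(values)
print(call_labels(values, w, m, s))
```

Lines lying between the two fitted components receive posteriors near 0.5 and therefore a no-call, which is the kind of uncertainty-aware labelling the dotted thresholds represent.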

Extended Data Figure 3 Comparison of gCSI, CCLE and GDSC primary drug response statistics: only cell lines assayed by all three projects.

Equivalent to Fig. 1, but restricted to cell lines in the three-way gCSI, CCLE and GDSC intersection. Figure 1 shows a substantial number of cell lines, interrogated by both gCSI and CCLE, that are sensitive to the three specifically active drugs. Few of these (if any) are visible here, because few were included in the gCSI–GDSC intersection. a–d, Mean viability statistics (a, b) and IC50 (c, d). Both are truncated as for Fig. 1 and Extended Data Fig. 1. gCSI mean viability and IC50 values were derived from three biological replicates (Methods).

Extended Data Figure 4 gCSI–GDSC concordance for drugs common to only gCSI and GDSC.

a, Genomic features were ranked by direction and strength of their association with GDSC mean viability (from strongest negative to strongest positive). Shaded regions indicate features with significant univariate association (FDR-adjusted P < 0.10). The univariate ranks of features selected by the elastic net using gCSI drug response are indicated with vertical bars, also coloured by direction of association. b, As for panel a, but applying the elastic net to GDSC drug response data and comparing results to univariate statistics obtained from gCSI drug response. c, As in Table 1, lines called sensitive or resistant by both gCSI and GDSC were assessed for concordant classification.
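The shaded "FDR-adjusted P < 0.10" regions rely on a multiple-testing correction that the legend does not name. The sketch below assumes the Benjamini–Hochberg step-up procedure and a generic signed univariate statistic per feature (for example, a correlation coefficient); `bh_adjust` and `rank_features` are hypothetical helper names.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_features(results, fdr=0.10):
    """results: {feature: (signed_statistic, p_value)}.
    Returns (feature, statistic, adjusted_p, significant) tuples ordered from
    the strongest negative to the strongest positive association."""
    names = list(results)
    adjusted = bh_adjust([results[f][1] for f in names])
    rows = [(f, results[f][0], q, q < fdr) for f, q in zip(names, adjusted)]
    return sorted(rows, key=lambda row: row[1])


demo = {"A": (-3.0, 0.01), "B": (0.5, 0.2), "C": (2.5, 0.03), "D": (-0.2, 0.04)}
for name, stat, q, significant in rank_features(demo):
    print(name, stat, round(q, 3), significant)
```

Sorting on the signed statistic gives the left-to-right ordering used in the panels; the positions of features selected by the elastic net can then be read off this ordering.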

Extended Data Figure 5 Consistent biomarker scoring for gCSI and CCLE drug response data.

The elastic net algorithm was applied to identify short lists of candidate biomarkers for drug response as assessed by either gCSI or CCLE mean viability values, using gCSI genomic data. Supplementary Table 4 describes the number of cell lines available for each comparison. a, Description of biomarkers expected to associate with response for each of the six drugs tested by both gCSI and CCLE; recovery of expected biomarkers; total selected feature count; and overlap between gCSI and CCLE selected feature sets. The asterisk denotes that ST7-AS1 is adjacent to MET and that ST7-AS1 expression correlates with MET copy number (Pearson correlation r = 0.47, P < 2.2 × 10^−16, n = 534). The gCSI genomics set does not include translocations, so ALK translocation could not be checked directly. b, Equivalent to panel a, but using CCLE rather than gCSI genomics data. c, d, Equivalent to Fig. 3a, b, but using CCLE rather than gCSI genomics data.
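The elastic net of ref. 4 selects a sparse feature set by combining lasso (L1) and ridge (L2) penalties; the reference implementation is the R package glmnet. The pure-Python sketch below shows only the cyclic coordinate-descent update for a Gaussian response, assuming standardized features and a centred response. The penalty strength `lam`, mixing parameter `alpha` and the toy data are hypothetical, and the authors' own software, tuning and any resampling steps are not reproduced here.

```python
import math


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, max_iter=1000, tol=1e-10):
    """Gaussian elastic net by cyclic coordinate descent.

    Minimizes (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2)
    on standardized columns of X (mean 0, variance 1) and centred y.
    X is a list of n rows with p features; returns p coefficients on the
    standardized scale.
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mu) / sd for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals while all coefficients are 0
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j, xj in enumerate(cols):
            # (1/n) * x_j . (partial residual that excludes feature j)
            z = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]  # two orthogonal toy features
y = [2.05, -1.95, 1.95, -2.05]            # driven mostly by feature 0
print(elastic_net(X, y, lam=0.2, alpha=0.5))  # about [1.727, 0.0]
```

With a response driven almost entirely by the first feature, the weak second coefficient is shrunk exactly to zero; this exact-zero behaviour is what produces short candidate lists.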

Extended Data Figure 6 Features identified as response biomarkers using gCSI and CCLE discretized mean viability drug response data versus gCSI genomics data.

Equivalent to Fig. 3 and Extended Data Fig. 5a, but using discretized sensitive/resistant labels rather than continuous drug response statistics (lines assigned a no-call label were ignored for this analysis). Recovery of expected biomarkers and scoring of features selected by the elastic net in the opposing project’s univariate analysis are both comparable to results obtained from continuous mean viability statistics (Fig. 3 and Extended Data Fig. 5a). a, Description of biomarkers expected to associate with response for each of the six drugs tested by both gCSI and CCLE; recovery of expected biomarkers; total selected feature count; and overlap between gCSI and CCLE selected feature sets. b, Genomic features were ranked by direction and strength of their association with CCLE mean viability (from strongest negative to strongest positive). Shaded regions indicate features with significant univariate association (FDR-adjusted P < 0.10). Blue indicates a negative and red a positive association with response. The univariate ranks of features selected by the elastic net using gCSI drug response are indicated with vertical bars, also coloured by direction of association. In general, the gCSI-selected features are strongly enriched at the extreme ends of the distributions, indicating that they are highly relevant to both drug response data sets. c, As for panel b, but applying the elastic net to CCLE drug response data and comparing results to univariate statistics obtained from gCSI drug response.
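For discretized labels, a univariate association can be scored by comparing each feature between sensitive and resistant lines after discarding no-calls. The legend does not state which test was used, so this sketch assumes a Welch two-sample t statistic; the sign convention (positive when a feature is higher in resistant lines, matching a positive association with mean viability) and the helper names are assumptions.

```python
import math
import statistics


def welch_t(feature, labels):
    """Welch two-sample t statistic for one genomic feature, comparing
    resistant with sensitive lines; 'no-call' lines are ignored.
    Positive: feature is higher in resistant lines (assumed convention).
    Each group needs at least two lines."""
    res = [v for v, lab in zip(feature, labels) if lab == "resistant"]
    sen = [v for v, lab in zip(feature, labels) if lab == "sensitive"]
    se = math.sqrt(statistics.variance(res) / len(res)
                   + statistics.variance(sen) / len(sen))
    return (statistics.fmean(res) - statistics.fmean(sen)) / se


def score_features(features, labels):
    """features: {name: [value per cell line]} -> {name: Welch t}."""
    return {name: welch_t(values, labels) for name, values in features.items()}


labels = ["sensitive", "sensitive", "resistant", "resistant", "no-call"]  # hypothetical
print(score_features({"FEATURE_1": [0.1, 0.3, 1.2, 1.5, 0.9]}, labels))
```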

Extended Data Figure 7 Features identified as response biomarkers using gCSI and GDSC discretized mean viability drug response data versus gCSI genomics data.

Equivalent to Extended Data Fig. 6, but contrasting gCSI with GDSC rather than CCLE. a, Description of biomarkers expected to associate with response for each of the six drugs tested by both gCSI and GDSC; recovery of expected biomarkers; total selected feature count; and overlap between gCSI and GDSC selected feature sets. b, Genomic features were ranked by direction and strength of their association with GDSC mean viability (from strongest negative to strongest positive). Shaded regions indicate features with significant univariate association (FDR-adjusted P < 0.10). Blue indicates a negative and red a positive association with response. The univariate ranks of features selected by the elastic net using gCSI drug response are indicated with vertical bars, also coloured by direction of association. In general, the gCSI-selected features are strongly enriched at the extreme ends of the distributions, indicating that they are highly relevant to both drug response data sets. c, As for panel b, but applying the elastic net to GDSC drug response data and comparing results to univariate statistics obtained from gCSI drug response.

Extended Data Figure 8 Media serum concentration affects mean viability for some drugs and cell lines.

Differences in mean viability between the 10% and 5% FBS conditions are shown for the follow-up screen. Only CTG-derived mean viability values were used. a, Changes in single-run mean viability for the variable-seeding data, before replicates were combined. In many cases, the direction of change was not consistent. In others (for example, 769P or CAL-12T treated with PD0325901), richer media had a consistent effect in each replicate. For each drug, the overall significance of this bidirectional effect was assessed by running a paired t-test for each cell line separately and then pooling the results by Fisher’s method. The overall media effect was significant for all drugs except lapatinib (erlotinib, P = 0.037; lapatinib, P = 0.288; paclitaxel, P = 0.014; PD0325901, P = 2.8 × 10^−4). b, Only PD0325901 showed an average media effect across cell lines that was both statistically significant and of practically important size (4 percentage points). Data for fixed and variable seeding are plotted together because no significant difference in mean viability shift was observed between them for any of the four drugs considered (two-sided, unpaired t-tests: erlotinib, n = 47, P = 0.86; lapatinib, n = 45, P = 0.29; paclitaxel, n = 48, P = 0.80; PD0325901, n = 46, P = 0.44). Mean viability values were derived from biological triplicate experiments (Methods). c, SYTO 60 produces more variable viability measurements than CTG. Shown are the widths of the 95% confidence intervals for mean viability for the 24 cell lines in the follow-up screen (on the viability scale; see Methods for confidence interval estimation). Symbol colour indicates seeding strategy; symbol shape indicates media condition. For all drugs and screening conditions, the vast majority of cell lines showed wider confidence intervals for SYTO 60 than for CTG, indicating increased noise in the SYTO 60 data and reduced precision in the estimation of mean viability (mean increase for erlotinib, 7 percentage points, P = 1.2 × 10^−11; lapatinib, 8 points, P = 1.6 × 10^−14; paclitaxel, 5 points, P = 2.7 × 10^−10; PD0325901, 5 points, P = 2.5 × 10^−7). This effect is not related to seeding strategy or media serum concentration.
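Fisher’s method, used in panel a to pool the per-cell-line paired t-test results, combines k independent p-values into a single statistic. The sketch below computes it with the standard library only. The per-line p-values are assumed to come from a separate paired test (for example, `scipy.stats.ttest_rel`), and `scipy.stats.combine_pvalues(pvalues, method="fisher")` offers an equivalent library routine for cross-checking; `fisher_combine` is a hypothetical name.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method for combining independent p-values.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
    freedom under the global null. For even degrees of freedom the upper tail
    has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    Returns (X, combined p-value).
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (X/2)^i / i!, built up incrementally
        total += term
    return x, math.exp(-half) * total


per_line_p = [0.04, 0.20, 0.51, 0.03]  # hypothetical per-cell-line p-values
statistic, combined_p = fisher_combine(per_line_p)
print(round(statistic, 2), combined_p)
```

The closed-form tail avoids a numerical chi-square routine; it is exact only because Fisher’s statistic always has an even number of degrees of freedom.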

Extended Data Figure 9 Seeding density affects drug response for multiple drugs and cell lines.

a, In the follow-up screen, the difference in estimated CTG mean viability between variable and fixed seeding is shown (averaged over the two media conditions). For broadly active compounds, slow-growing lines typically yielded higher mean viability estimates under variable seeding and therefore produced a positive difference, whereas fast-growing lines tended to yield lower mean viability estimates under variable seeding (Spearman’s ρ test, P = 1.71 × 10^−5). This trend was not significant for specifically active drugs (P = 8.77 × 10^−2). Lines of average growth rate, for which fixed and variable seeding were essentially the same and thus no difference was expected, are indicated by the dashed rectangle. b, Similar to panel a, but showing the difference between gCSI (variable seeding) and CCLE (fixed seeding). Because mean viability statistics are not directly comparable when different dosing schedules are used, the heat maps show residuals from a regression of gCSI mean viability on CCLE mean viability rather than a simple difference. The line plot shows a moving average (75-cell-line window) of the row averages of each heat map. As in the follow-up screen, growth rate was significantly associated with the impact of seeding density for broadly active drugs (Spearman’s ρ test, P = 7.76 × 10^−9), but not for specifically active drugs (P = 1.12 × 10^−1). Mean viability values were derived from biological triplicate experiments (Methods).

Extended Data Table 1 Key differences in assay conditions between projects

Supplementary information

Supplementary Information

This file contains a Supplementary Note and full legends for Supplementary Tables 1–6. (PDF 193 kb)

Supplementary Data

This file contains Supplementary Tables 1–6; please refer to the Supplementary Information document for full legends. (XLSX 1887 kb)

About this article

Cite this article

Haverty, P., Lin, E., Tan, J. et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333–337 (2016). https://doi.org/10.1038/nature17987

  • DOI: https://doi.org/10.1038/nature17987
