RT Journal Article
SR Electronic
T1 Effects of underlying gene-regulation network structure on prediction accuracy in high-dimensional regression
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.09.11.293456
DO 10.1101/2020.09.11.293456
A1 Yuichi Okinaga
A1 Daisuke Kyogoku
A1 Satoshi Kondo
A1 Atsushi J. Nagano
A1 Kei Hirose
YR 2020
UL http://biorxiv.org/content/early/2020/09/12/2020.09.11.293456.abstract
AB Motivation: The least absolute shrinkage and selection operator (lasso) and principal component regression (PCR) are popular methods for estimating traits from high-dimensional omics data such as transcriptomes. The prediction accuracy of these methods depends strongly on the covariance structure of the predictors, which is characterized by the gene regulation network. However, how the structure of a gene regulation network, together with the sample size, affects prediction accuracy has not yet been sufficiently investigated. In this study, Monte Carlo simulations are conducted to investigate prediction accuracy for several network structures under various sample sizes. Results: When the gene regulation network was a random graph, the simulations indicated that models with high estimation accuracy could be achieved with small sample sizes. However, real gene regulation networks are likely to exhibit a scale-free structure. In such cases, the simulations indicated that a relatively large number of observations is required to predict traits accurately from a transcriptome. Availability and implementation: Source code is available at https://github.com/keihirose/simrnet. Contact: hirose{at}imi.kyushu-u.ac.jp. Competing Interest Statement: The authors have declared no competing interest.
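The abstract describes the simulation design only in outline. Below is a minimal pure-Python sketch of that kind of Monte Carlo experiment, under stated assumptions that are not taken from the paper or the simrnet repository: expression is generated by a linear structural model along the network edges, the scale-free network is a Barabasi-Albert-style preferential-attachment tree, the random graph has the same number of edges, the trait depends on a few randomly chosen genes, and the lasso is fit by coordinate descent with a fixed penalty. All function names and parameter values (WEIGHT, N_SIGNAL, LAM) are hypothetical; PCR is omitted.

```python
import random
import statistics

# Hypothetical settings for illustration only; not the paper's parameters.
WEIGHT = 0.5    # effect of a regulating gene on its target
N_SIGNAL = 5    # number of genes that directly affect the trait
LAM = 0.1       # fixed lasso penalty (no cross-validation in this sketch)


def random_graph_edges(p, m, rng):
    """Random graph: m distinct edges (i, j), i < j, between uniformly chosen nodes."""
    edges = set()
    while len(edges) < m:
        i, j = rng.sample(range(p), 2)
        edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def scale_free_edges(p, rng):
    """Preferential-attachment tree: each new node t links to one earlier node,
    chosen with probability roughly proportional to that node's degree."""
    edges, pool = [], [0]
    for t in range(1, p):
        parent = rng.choice(pool)
        edges.append((parent, t))
        pool.extend([parent, t])  # both endpoints gain one unit of degree
    return edges


def soft_threshold(z, lam):
    return max(0.0, abs(z) - lam) * (1.0 if z > 0 else -1.0)


def fit_lasso(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumes each column has mean 0 and (1/n)*sum(x^2) == 1, and y is centered."""
    n, p = len(X), len(X[0])
    b, r = [0.0] * p, list(y)  # r is the current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam)
            if new != b[j]:
                for i in range(n):
                    r[i] -= X[i][j] * (new - b[j])
                b[j] = new
    return b


def simulate_once(edges, p, n_train, n_test, rng):
    """One Monte Carlo replicate; returns the lasso's test mean squared error."""
    parents = {j: [] for j in range(p)}
    for i, j in edges:
        parents[j].append(i)  # edge (i, j) with i < j: gene i regulates gene j
    signal = rng.sample(range(p), N_SIGNAL)

    def draw(n):
        X, y = [], []
        for _ in range(n):
            x = [0.0] * p
            for j in range(p):  # parents have smaller indices, so they are already set
                x[j] = rng.gauss(0, 1) + WEIGHT * sum(x[i] for i in parents[j])
            X.append(x)
            y.append(sum(x[k] for k in signal) + rng.gauss(0, 1))
        return X, y

    X_tr, y_tr = draw(n_train)
    X_te, y_te = draw(n_test)
    # Standardize using training statistics only.
    mu = [statistics.fmean(col) for col in zip(*X_tr)]
    sd = [statistics.pstdev(col) for col in zip(*X_tr)]
    y_mu = statistics.fmean(y_tr)
    Z_tr = [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X_tr]
    b = fit_lasso(Z_tr, [v - y_mu for v in y_tr], LAM)
    preds = [y_mu + sum(bj * (v - m) / s for bj, v, m, s in zip(b, row, mu, sd))
             for row in X_te]
    return statistics.fmean((yp - yt) ** 2 for yp, yt in zip(preds, y_te))


def mean_test_mse(kind, p, n_train, reps=5, n_test=100, seed=0):
    """Average test MSE over reps replicates, drawing a fresh network each time."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        edges = (random_graph_edges(p, p - 1, rng) if kind == "random"
                 else scale_free_edges(p, rng))
        errors.append(simulate_once(edges, p, n_train, n_test, rng))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n_train in (20, 100):
        for kind in ("random", "scale-free"):
            print(kind, n_train, round(mean_test_mse(kind, p=30, n_train=n_train), 3))
```

Running the script prints an average test error for each network type and training size. Because the settings here are arbitrary and small, the numbers should not be read as reproducing the paper's findings; the point is only to show how network structure enters the covariance of the predictors and how sample size is varied across Monte Carlo replicates.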