TY  - JOUR
T1  - Supervised and Unsupervised Classification of lncRNA Subtypes
JF  - bioRxiv
DO  - 10.1101/2020.07.20.211433
SP  - 2020.07.20.211433
AU  - Rituparno Sen
AU  - Jörg Fallmann
AU  - Maria Emília M. T. Walter
AU  - Peter F. Stadler
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/08/17/2020.07.20.211433.abstract
N2  - Many small nucleolar RNAs (snoRNAs) and many of the hairpin precursors of miRNAs are processed from long non-protein-coding RNA (lncRNA) host genes. In contrast to their highly conserved and heavily structured payloads, the host genes have poorly conserved sequences. Nevertheless, there is mounting evidence that the host genes have biological functions. No obvious connection between the functions of the host genes and the functions of their payloads has been reported. Here we investigate whether host gene function or mechanism is associated with the type of payload. To assess this hypothesis, we test whether miRNA host genes (MIRHGs), snoRNA host genes (SNHGs), and other lncRNAs can be distinguished based on sequence and structure features. A positive answer would imply a correlation between host genes and their payloads. The three classes can be distinguished reliably when the classifier is allowed to extract features from the payloads. This is no longer the case when classification uses only the sequence and structure of those parts of the host gene that lie distal to the snoRNA or miRNA payload. Our data indicate that the functions of MIRHGs and SNHGs are largely independent of the functions of their payloads. Furthermore, there is no evidence that MIRHGs and SNHGs form coherent classes of long non-coding RNAs distinguished by features other than their payloads. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: miRNA, microRNA; snoRNA, small nucleolar RNA; lncRNA, long non-coding RNA; MIRHG, miRNA host gene; SNHG, snoRNA host gene; NoHG, lncRNAs that harbor neither snoRNAs nor miRNAs.
ER  - 