PT - JOURNAL ARTICLE
AU - Jaime Abraham Castro-Mondragon
AU - Sébastien Jaeger
AU - Denis Thieffry
AU - Morgane Thomas-Chollier
AU - Jacques van Helden
TI - RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections
AID - 10.1101/065565
DP - 2016 Jan 01
TA - bioRxiv
PG - 065565
4099 - http://biorxiv.org/content/early/2016/12/21/065565.short
4100 - http://biorxiv.org/content/early/2016/12/21/065565.full
AB - Transcription Factor (TF) databases contain multitudes of motifs from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq peaks) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TF binding motifs (TFBMs) into multiple trees and automatically creates non-redundant collections of motifs. Features unique to matrix-clustering are its dynamic visualisation of aligned TFBMs and its ability to process multiple collections from various sources simultaneously. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools and highlights biologically relevant variations of similar motifs. By clustering 24 entire databases (>7,500 motifs), we show that matrix-clustering correctly groups motifs belonging to the same TF families and can drastically reduce motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/) and is accessible through a user-friendly web interface or from the command line for integration into pipelines.