PT - JOURNAL ARTICLE
AU - Alex D. Washburne
AU - Justin D. Silverman
AU - James T. Morton
AU - Daniel J. Becker
AU - Daniel Crowley
AU - Sayan Mukherjee
AU - Lawrence A. David
AU - Raina K. Plowright
TI - Phylofactorization: a graph-partitioning algorithm to identify phylogenetic scales of ecological data
AID - 10.1101/235341
DP - 2017 Jan 01
TA - bioRxiv
PG - 235341
4099 - http://biorxiv.org/content/early/2017/12/16/235341.short
4100 - http://biorxiv.org/content/early/2017/12/16/235341.full
AB - The problem of pattern and scale is a central challenge in ecology [27]. The problem of scale is central to community ecology, where functional ecological groups are aggregated and treated as a unit underlying an ecological pattern, such as aggregation of “nitrogen fixing trees” into a total abundance of a trait underlying ecosystem physiology. With the emergence of massive community ecological datasets, from microbiomes to breeding bird surveys, there is a need to objectively identify the scales of organization pertaining to well-defined patterns in community ecological data. The phylogeny is a scaffold for identifying key phylogenetic scales associated with macroscopic patterns. Phylofactorization was developed to objectively identify phylogenetic scales underlying patterns in relative abundance data. However, many ecological data, such as presence-absences and counts, are not relative abundances, yet the logic of defining phylogenetic scales underlying a pattern of interest is still applicable. Here, we generalize phylofactorization beyond relative abundances to a graph-partitioning algorithm for traits and community-ecological data from any exponential-family distribution. Generalizing phylofactorization yields many tools for analyzing community ecological data.
In the context of generalized phylofactorization, we identify three phylogenetic factors of mammalian body mass which arose during the K-Pg extinction event, consistent with other analyses of mammalian body mass evolution. We introduce a phylogenetic analysis of variance which refines our understanding of the major sources of variation in the human gut. We employ generalized additive modeling of microbes in Central Park soils to confirm that a large clade of Acidobacteria thrives in neutral soils. We demonstrate how to extend phylofactorization to generalized linear and additive modeling of any dataset of exponential family random variables. We finish with a discussion of how phylofactorization produces a novel species concept, a hybrid of phylogenetic and ecological species concepts in which the phylogenetic scales and units of interest are defined objectively by defining the ecological pattern and partitioning the phylogeny into clades based on different contributions to the pattern. All of these tools can be implemented with a new R package available online.