Institutional Differences in the Stewardship and Research Output of United States Herbaria

Public policy decisions regarding the institutional frameworks that govern the stewardship of biodiversity data at public and private institutions are of increasing importance. Museums, government agencies, and academic institutions across the United States maintain collections of biological specimens and information critical to scientific discovery. One subset of these natural history collections is herbaria, or collections of preserved plant matter and their associated data. In this study, I evaluate the current state of the digitization and databasing of herbaria contributing data to the SEINet Regional Network of North American Herbaria, and assess the impact of institutional characteristics, particularly institution type (cultural sector institutions, public universities, private universities, or public land institutions), on the metrics of herbarium richness, digitization, and research usage. The results of this study suggest that institution type is significantly associated with the size, diversity, and digitization efforts of a herbarium collection. Specifically, cultural sector institutions tend to have larger and more diverse collections, followed by public and private universities, and finally public land institutions. Additionally, as herbarium size and richness increase, the research output of associated staff also increases. These results highlight that some institutions, particularly larger herbaria located at universities or cultural sector institutions, may be better supported in the curation, stewardship, and digitization of large collections, allowing long-term access to the associated biodiversity data. Smaller herbaria at public land institutions may need additional support in these endeavors, and may represent an area of unmet need for digitization and curatorial funding and resources.


Introduction
2018; Nelson et al. 2015). Georeferencing involves using the text locality information to provide a mapped point location and associated error or uncertainty for the specimen to enable geoanalysis (Murphey et al. 2004). Imaging flat herbarium specimens typically includes the capture, processing, and archiving of a 2D image of the specimen (Nelson et al. 2015; Giraud et al. 2018). Imaging occasionally also includes 3D scans of structures such as fruit or buds of a specimen (Schneider et al. 2018), or microscopic images of pollen or other structures (Allan et al. 2019; Carranza-Rojas et al. 2017). Increasingly, digitization also includes genomics or phylogenetics data related to the specimen, and the digitization best practices may soon include standardized sequencing of specimen DNA (R. D. M. Page 2013; Taylor and Swann 1994; Leavitt et al. 2019).
The availability of digitized specimen data enables biodiversity big data research that was historically impossible. For example, georeferenced specimens can serve as baseline data for defining the impacts of climate change, invasive species, and other anthropogenic changes on plant communities (Lang et al. 2019; Dyderski et al. 2018; Ahern et al. 2010). Imaged specimens allow for studies of plant growth forms and coloration without requiring travel or loan of specimens, and also provide a labeled training dataset for deep learning and taxonomic classification (Jimenez-Mejias, Cohen, and Naczi 2017; Collins et al. 2018). These images also contain information about the plant life stage at the time of collection, allowing researchers to assess changes in plant phenology in response to climate change and land use change (Cleland et al. 2007; Everill et al. 2014; MacGillivray, Hudson, and Lowe 2010). Additionally, advances in genomics technology can give researchers a window into genetic changes and how species are distributed across a landscape through the sequencing and genetic profiling of collection specimens (Cozzolino et al. 2007; Konrade, Shaw, and Beck 2019; Snyman et al. 2018).
Digitizing collections and making data publicly available should allow researchers to expand research usage of these collections, but these outputs may be dependent on particular management and institutional approaches to enabling herbarium research.
Much of the digitization progress in the United States has been funded through federal grants, particularly the Advancing Digitization of Biological Collections (ADBC) program, established by the National Science Foundation in 2011. ADBC provides funding to organizations to improve access to digitized specimens in US natural history collections (National Science Foundation 2015). This program established iDigBio, a centralized organization that coordinates the integration of the digital data resulting from digitization projects (Paul et al. 2013; Matsunaga et al. 2013; Nelson 2014). ADBC also supports Thematic Collections Networks (TCNs), which provide funding to networks of institutions with a shared strategy to digitize specimens from a specific research theme, such as a taxonomic focus or geographic region (Nelson 2014; National Science Foundation 2015). Eligible institutions include two- and four-year public and private colleges and universities; non-profit, non-academic cultural institutions including museums, botanical gardens, research labs, and professional societies; and state and local governments. There are presently more than 124,858,708 specimen records, 39,689,496 media records, and 1,623 record sheets aggregated on the iDigBio portal resulting from funding to 925 collections at 317 institutions.
<Table 1 near here>
Different types of natural history institutions have different challenges in the process of managing and digitizing their collections (Mayernik et al. 2020).
Smaller herbaria often struggle with prohibitively small budgets and few curatorial staff to assist in collection management and digitization tasks (Snow 2005; Harris and Marsico 2017).
Herbaria located at larger universities may have a greater number of affiliated researchers, but herbarium management is rarely their primary role, and their particular research area may not meaningfully contribute to specimen collection or curatorial tasks (Feeley and Silman 2011).
Cultural sector herbaria are often co-located with botanical gardens, field research sites, or natural history museums. The shared goals of cultural institutions include serving the public good, attaining financial stability, and supporting staff (Selwood 1999; Giardina and Rizzo 1994; Falk and Dierking 2008). Serving the public good takes a variety of forms, including serving as storehouses of cultural and scientific information, supporting research work on collection holdings, supporting social impact, and providing educational opportunities to the public (Scott 2006; Stanziola 2008). These institutions can vary in size, but their herbarium staff members are more likely to have curatorial tasks and taxonomic assessment as their primary role compared to university staff members. Variations in the curatorial role types and number of curators may have significant impacts on not only the size and richness of a herbarium, but also the progress made on digitization and incorporation into public databases. This ultimately can affect the research output associated with an individual herbarium.
Herbaria located at public land institutions face unique challenges compared to those at academic institutions. Namely, the Sundry Civil Act of March 3, 1879 (20 U.S.C. 59), requires that all physical object collections, including herbarium specimens, eventually be archived in the Smithsonian Institution National Museum of Natural History. This means that, although a significant proportion of the collections are held by other organizations, like the Bureau of Land Management, the United States Geological Survey (USGS), the National Park Service, and the Fish and Wildlife Service, federally managed collections are not typically locally managed or archived for long-term stewardship. The lack of accountability for physical objects has led to significant criticism from the scientific and data management community, particularly of the USGS (Office of the Inspector General 2017; Ruch 2018, 2019). A 2018 report highlighted that the agency lacked a policy for biological specimens, and that its geological specimen policy had confusing language that could leave specimens at risk of destruction (Ruch 2018). In September of 2019, the USGS released its Policy on Scientific Working Collections (United States Geological Survey 2019), and though these policy changes are significant improvements, there are still concerns that they will not address risks to natural history collections under the agency's administration (Ruch 2019).
Natural history collections at public lands are rarely collected exclusively for research purposes, and digitization for public research may not be a priority at these institutions.
Additionally, the requirement that all physical objects collected by government entities must be eventually deposited into the Smithsonian Institution National Museum of Natural History for long-term archiving may leave publicly managed natural history collections without a strong incentive to manage their specimens for long-term research usage, as the specimens are not yet housed in their long-term home. In theory, requiring public land institutions to deposit specimens into the Smithsonian should enable research access and improve stewardship of federally collected physical objects. In practice, however, this policy leads to neglect and insufficient stewardship of the distributed working collections at the USGS and other institutions and often leaves specimens at risk of destruction or degradation due to storage under suboptimal conditions (Ruch 2018).
These academic, government-run, and cultural sector institutions across the world maintain substantial natural history collections containing biological specimens and physical objects (Suarez and Tsutsui 200). Because these institutions vary in their size, taxonomic diversity, and research goals, there are significant differences in management priorities. The variety of institutional frameworks under which natural history collections are managed provides both challenges and opportunities to researchers who rely on these data to produce scientific knowledge. The purpose of this study is to assess how institutional management type affects the size and richness of a herbarium, the progress towards digitization, and the research output.

Data Collection
Herbaria across North America submit data to the SEINet Regional Networks of North American Herbaria. I extracted collection-level statistics for a total of 339 submitting institutions on February 21, 2020 using the SouthEast Regional Network of Expertise and Collections portal. This provided aggregated measures of herbarium size: the total number of specimens and the total number of specimens identified to species level. It also provided measures of digitization effort in the collection, including the total number and percentage of specimens that are georeferenced or imaged. Additionally, the portal reports measures of herbarium taxonomic diversity, including the number of families, genera, species, and total taxonomic groups included in the collection.
Finally, the portal also reports the total number of type specimens, or specimens that have permanent taxonomic designations used by other researchers to confirm identifications and define taxonomic boundaries, in the collection. Herbaria outside of the United States were excluded from this analysis, and each remaining herbarium was classified as either associated with a public land institution, public university, private university, cultural sector institution, or for-profit organization.
In addition to measures of herbarium richness, five measures of research activity were collected for each herbarium. First, the total number of research staff for each institution was collected through the Index Herbariorum from the New York Botanical Garden, which provides a list of associated staff members for each indexed herbarium. If an institution did not have an associated staff list, the homepage of the herbarium was used as a source of the number of staff members. For each collection, the total number of Google Scholar search results from a search for the institution name was recorded as a proxy for the number of times the institution name appears in publications, either as an author affiliation, in the acknowledgments, in research methods, or in cited material. This is admittedly a coarse measure of research output, so additionally, each staff member affiliated with the herbarium institutions was screened for a Google Scholar page. The total number of research articles and associated citations for each of the staff members with a Google Scholar page was recorded.

Statistical Analysis
To determine the correlations between the variables of interest, the Pearson correlation was calculated for each of the pairwise comparisons. Additionally, a one-way ANOVA was used to determine the association between the institution type and measures of herbarium research output and richness. For outcome variables with an ANOVA significant at P < 0.05, a Tukey's HSD test was used to identify significant differences between the mean values of the institution types and the statistically significant groupings among them. Finally, the herbarium richness measures were compared to research outcome variables using linear regressions. All data analysis was performed in R (version 3.6.2) with RStudio (version 1.1.453).
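The first two steps of this analysis can be illustrated with a minimal sketch. The study itself was run in R; the Python below is only an illustration, and the function names, variable names, and toy values are hypothetical rather than taken from the study. It computes a Pearson correlation and the F statistic of a one-way ANOVA using only the standard library. The P value and the Tukey's HSD comparisons are omitted because they require the F and studentized range distributions, which a statistics package would normally supply.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group label -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Between-group variation: group means around the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Within-group variation: observations around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy values, not data from the study.
specimens = [120, 450, 900, 1500, 3000]
species = [40, 130, 260, 410, 700]
r = pearson_r(specimens, species)

families_by_type = {
    "cultural sector": [90, 110, 130],
    "public university": [60, 80, 100],
    "public land": [20, 30, 40],
}
f_stat, df_b, df_w = one_way_anova_f(families_by_type)
print(f"r = {r:.3f}; F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to its degrees of freedom indicates that the institution-type means differ by more than the within-type scatter would suggest, which is the condition under which the pairwise Tukey comparisons are then examined.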

Summary of data collection
<Table 2 near here>
Of the 339 total institutions that submit data to SEINet, 28 were omitted from the present study for having fewer than 10 databased specimens. A further 10 were omitted for being outside of the United States, leaving 301 institutions included for further analysis. After categorizing by type, the majority of the herbaria are housed at universities, with 52 at private universities and 174 at public universities. There are a total of 30 institutions that were categorized as cultural sector institutions, primarily housed in botanical gardens and museums. A total of 36 institutions are categorized as public land institutions, primarily housed at national parks, other federal institutions like the Bureau of Land Management and Forest Service, and state or local park departments. Finally, the remaining five herbaria collections are housed at fully private institutions, with two at for-profit companies and three at for-profit nature preserves. Due to the low number of representative collections, the fully private herbaria were excluded from further analysis by type, though the staff associated with the herbaria were included in analyses of research output and herbaria measures.
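The exclusion steps described above amount to two filters followed by a tally by institution type. The following Python sketch shows one way this could be expressed. The record structure, field names, and example herbaria are hypothetical, and only the 10-specimen threshold and the United States restriction come from the text.

```python
from dataclasses import dataclass

MIN_DATABASED_SPECIMENS = 10  # threshold stated in the text


@dataclass
class Herbarium:
    # Hypothetical record layout; the portal's actual export format may differ.
    name: str
    country: str
    databased_specimens: int
    institution_type: str


def select_for_analysis(records):
    """Drop very small collections first, then collections outside the US."""
    kept = [r for r in records if r.databased_specimens >= MIN_DATABASED_SPECIMENS]
    return [r for r in kept if r.country == "United States"]


def count_by_type(records):
    """Tally the retained herbaria by institution type."""
    counts = {}
    for r in records:
        counts[r.institution_type] = counts.get(r.institution_type, 0) + 1
    return counts


# Invented example records, not institutions from the study.
records = [
    Herbarium("Example Garden Herbarium", "United States", 52000, "cultural sector"),
    Herbarium("Example State University Herbarium", "United States", 18000, "public university"),
    Herbarium("Example Park Herbarium", "United States", 7, "public land"),
    Herbarium("Example Foreign Herbarium", "Mexico", 9000, "public university"),
]
included = select_for_analysis(records)
print(count_by_type(included))
```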
Across the institutions, a total of 1,102 staff members were found across all herbaria; because some herbaria shared staff members, this corresponded to 1,024 unique staff members. Of these staff members, 24.31% (268) had active Google Scholar profile pages (Table 3). The distribution of herbaria richness across the remaining types (cultural sector institution, public university, private university, and public land institution) is presented in Table 1.
Table 3). However, the percent of imaged specimens is negatively correlated with both the percent of georeferenced specimens (r = -0.21, P = 0.0001) and the total number of georeferenced specimens (r = -0.11, P = 0.04). Additionally, the number of research staff at a herbarium is significantly and positively correlated with measures of herbarium richness, except for the number of type specimens in the institution (Figure 1). In every herbarium richness, diversity, and digitization analysis, public land institutions were in the Tukey grouping with the lowest mean, indicating that they tended to have smaller and less diverse collections (Table 2). In contrast, cultural sector institutions were in the highest group, indicating larger and more diverse collections. In most assessments, the universities, both public and private, are in an intermediate group. However, in comparing within universities, public universities tended to have higher mean metrics of diversity than private universities (Table 2).
Table 3). Particularly, the number of articles by staff members increased with the total number of research staff at their home institution (b = 4.11, P < 0.0001, Figure 2A). This relationship also holds for the total Google Scholar results for the researcher name in the absence of a Google Scholar page (b = 3.77, P < 0.0001).
There is also a significant positive correlation between a number of herbaria richness measures and the number of research articles of research staff with Google Scholar pages, namely the number of collection specimens (P = 0.0054), the total taxonomic groups represented by the collection (P = 0.03), the number of species groups in the collection (P = 0.04), and the number of type specimens (P = 0.03) (Figure 2). Finally, there was a significant difference in the percentage of staff associated with each institution type that had a Google Scholar profile page (Table 3).
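The coefficient b in such results is the slope of a simple linear regression. As a minimal sketch, assuming an ordinary least-squares fit with a single predictor, the slope and intercept can be computed as follows. The function name and the toy values are hypothetical and are not data from the study, which fitted its models in R.

```python
def ols_fit(x, y):
    """Least-squares slope and intercept for the model y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy values: articles per staff member against staff count.
staff_counts = [1, 2, 4, 6, 10]
article_counts = [5, 9, 17, 25, 41]
slope, intercept = ols_fit(staff_counts, article_counts)
print(f"b = {slope:.2f}, intercept = {intercept:.2f}")
```

Read this way, a slope of 4.11 means that each additional research staff member at the home institution is associated with roughly four more articles per staff member, without implying a causal effect.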

Larger Collections Tend To Be More Diverse
One of the findings of this analysis is that herbarium richness in one measure is positively correlated with herbarium richness in other measures. As size increases, the diversity and digitization of the collection also increase. Institution type is significantly related to herbarium richness, with richer and larger herbaria at cultural sector institutions, followed by public universities, private universities, and finally public land institutions. Consistently, public land institutions lag behind both cultural sector institutions and university institutions across richness measures, suggesting management features of public land collection institutions may not lend themselves to large and diverse collections.
Cultural sector institutions outperform other institution types in the size and diversity of their collections. This may reflect greater overall research activity at these institutions, as cultural sector institutions were found to have a higher number of type specimens (an indicator of active systematics research) and a higher percentage of georeferenced specimens (an indicator of active big data diversity research and ongoing digitization) in their collections (Table 1, Table 2). Alternatively, this relationship may arise because cultural sector institutions are often co-located with botanical gardens (living collections), leading to easy access to exotic plants that may otherwise be difficult to collect as specimens. While this would increase the number of unique species in the collection, thus increasing the associated biodiversity, it may not adequately support the needs of researchers interested in documenting wild species occurrence and differences across a species' natural range. Increasingly, cultural institutions like museums and botanical gardens also harness visitor enthusiasm for digitization tasks through citizen science tools, which can support digitization activities (Garretson et al. 2020; Garretson Forthcoming). Additionally, staff at cultural sector institutions are often not subject to other non-curation tasks, while collectors at public lands may have obligations to collect specific specimens relevant to survey work, and collectors at universities may have additional restrictions on time, including teaching, mentorship, and service (Snow 2005; Ab Rahim et al. 2013; Adams and Griliches 2000).
However, the negative correlation between the percent of specimens that are imaged and the percent of specimens georeferenced may suggest that there is a tradeoff between digitization task types (Figure 1). Georeferencing specimens requires very different skill sets and technology than imaging specimens (Nelson et al. 2015; Blagoderov et al. 2012). Standardized imaging of specimens requires, at minimum, a high-quality digital camera, lighting fixtures, a camera stand, a color scale, and a ruler. This setup can cost upwards of US$1500, which may be prohibitive for a small herbarium (Harris and Marsico 2017). Therefore, institutions with a limited budget may need to focus on only one digitization task at a time, and may focus on georeferencing due to the financial investment required to appropriately image specimens. As digitization tasks expand to include generating genetic data for each specimen, the financial and technological needs to meet best practices of digitization will grow, increasing the disparity between institutions in digitization completion.

Research output and herbaria richness
The lack of association between institution type and individual researcher outcomes may be because there are too many confounding factors like career stage, position type, or age (Beaudry and Allaoui 2012; Gonzalez-Brambila and Veloso 2007; Wang et al. 2017). There may also be differences in the subset of researchers across all institution types that are likely to have active Google Scholar accounts. Particularly, younger researchers may be more likely to have an active online presence, as previous studies have found the age and career stage of a researcher can impact their assessment of the importance of an online academic presence (Mierzecka, Kisilowska, and Suminas 2020; Arshad and Ameen 2017; Wang et al. 2017). This may mean that the dataset of researchers used in this study is not a random sample of researchers at the institutions. However, the positive relationship between herbaria measures and research outcomes suggests that a larger dataset could reveal significant institutional trends. The association with the number of staff might mean that collaborations and staff community could lead to more research, suggesting that larger institutions may generally have greater research output for any given faculty member. This is in line with prior studies of research output of university faculty members that have suggested that larger public universities have greater research output compared to that of smaller private universities (Ab Rahim et al. 2013; Gonzalez-Brambila and Veloso 2007).

Conclusions
These results demonstrate that there are institutional differences in collection stewardship and that there is a link between institution type and effectiveness in contributing to publicly available biodiversity data. The results of this study show that institution type is significantly associated with the size, diversity, and digitization of a herbarium collection. However, past studies have found that critical occurrence records and rare taxa can be found in small natural history collections (Glon et al. 2017). Small and distributed herbaria may be better able to catalogue and collect unique local flora, but may not have the resources to support larger research endeavors. There are also certainly differences in institutional priorities across all institutions, particularly in the mission, vision, and objectives of the institutions, which can drive data collection methods, focuses, and digitization protocols. Additionally, the results demonstrate that as herbaria richness increases, the research output of associated staff increases in kind. This suggests that there may be economies of scale in research output because as the number of researchers grows, research output per researcher grows accordingly. Decentralized, small collections can benefit from local knowledge and unique records, but centralization has significant returns to scale in collection stewardship, digitization, and research output.

Implications of differences between cultural sector and public land institutions
While small herbaria can be important to researchers due to their ability to add unique occurrence information for rare plants (Glon et al. 2017), these findings suggest that smaller herbaria, particularly those located at public land institutions, may be under-digitized and under-incorporated into the publicly available biodiversity datasets. This may be because smaller herbaria tend to have fewer staff and fewer resources to support data collection, digitization, and access (Harris and Marsico 2017; Snow 2005; Blagoderov et al. 2012). Larger herbaria benefit from scale with respect to personnel, access to technology, and curatorial skill sets that may lead to greater opportunities for collaborative publication, collection research usage, and expansion of the collection. This demonstrates that smaller herbaria can act as polycentric institutions, better able to incorporate local knowledge and the particularities of their surrounding region, while other aspects of collection curation, namely digitization and long-term storage, are better suited to more centralized archives. This mismatch between the institutions and researchers best positioned to collect novel specimens and the institutions best positioned to digitize and manage the data in the long term represents an ongoing challenge in the management of small collections. This may also represent an important area for increased investment in supporting smaller institutions in building the capacity to support additional digitization and collection activities.
This finding supports the policy requiring federal entities to plan for submission of objects to the Smithsonian Institution. The Smithsonian Institution National Museum of Natural History is a public-nonprofit partnership, and contains more than 156 million natural history specimens. The Smithsonian museums benefit from economies of scale and often are the first institutions to have access to cutting-edge curatorial technology, including 3D scanners and genomics tools. However, the challenge in implementing this policy, particularly in the USGS, has been that it leaves USGS staff without significant investment in the long-term stewardship of the physical objects in question, and often without the training to appropriately store the items to prevent deterioration and destruction. Requiring the development of collection management plans may help address some of these concerns, but may not be sufficient to ensure proper management of the collection items before they reach the Smithsonian Institution. Like the USGS, many small herbaria need to maintain transition plans for their collections in the event that they lose funding, complete the associated research project, or face other risks to the datasets (Mayernik et al. 2020). Tools like persistent identifiers, such as IGSNs, can help prevent some of these hurdles by ensuring critical metadata is associated with natural history specimens throughout their lifecycle (Hobern, Hahn, and Robertson 2018).

Directions for Future Research
Herbaria are only one type of biodiversity information facility, so assessing whether these trends hold with museums, field stations, and published biodiversity data and literature may be key to further understanding how research institution types and digitization efforts impact biological discovery and knowledge generation. Other studies have suggested that substantial differences exist between universities with and without a medical school, and that these institutional differences may be more important than the private/public differentiation (Ahn, Charnes, and Cooper 1988). Finer-grain differences in the institutional type might also be relevant for the understanding of how institutional type might influence the collection richness and research effort, so future studies should consider the age of the institution and its operating budget.
Better understanding the institutional differences in the digitization of natural history collections can assist in targeting collections that might need additional support through granting programs like the ADBC, and can improve our understanding of how policies regulating the deposition of federally collected specimens might impact long-term collection outcomes.
Collections of natural history specimens are an invaluable resource in understanding scientific processes, and contribute to critical research in policy-relevant areas like public health and pandemic forecasting, monitoring of climate and ecological change, and basic biological research. Ensuring the ongoing protection, digital preservation, and research access to these items is a critical aspect of stewarding our research resources and better understanding our changing environment.

Private university herbaria are located at private universities. These universities are not operated by governments, although many receive public funding through tax breaks, publicly funded student loans, and government-administered research grants. Herbaria at these institutions are often maintained by faculty, graduate students, or curatorial staff and are typically used for academic research and teaching.
Public land herbaria may be located at national, state, or local park and recreation, agricultural, land management, or forestry administrations. These organizations are often tasked with collecting and retaining specimens relevant to their scope, e.g. a forestry department may collect specimens relevant to the tree species in their region or associated pests or a national park may collect specimens of endemic plants in their associated lands.

Potential Priorities
Visitor engagement, education, rare plant cultivation, recreation

Applying and Generalizing an Examination of Mercury Use in Preparing Herbarium
Specimens." Biodiversity Information Science and Standards 2 (July): e25699.