Changes in Forest Composition, Stem Density, and Biomass from the Settlement Era (1800s) to Present in the Upper Midwestern United States

EuroAmerican land use and its legacies have transformed forest structure and composition across the United States (US). More accurate reconstructions of historical states are critical to understanding the processes governing past, current, and future forest dynamics. Gridded (8×8km) estimates of pre-settlement (1800s) forests from the upper Midwestern US (Minnesota, Wisconsin, and most of Michigan) using 19th Century Public Land Survey (PLS) records provide relative composition, biomass, stem density, and basal area for 26 tree genera. This mapping is more robust than past efforts, using spatially varying correction factors to accommodate sampling design, azimuthal censoring, and biases in tree selection. We compare pre-settlement to modern forests using Forest Inventory and Analysis (FIA) data, with respect to structural changes and the prevalence of lost forests, pre-settlement forests with no current analogue, and novel forests, modern forests with no past analogs. Differences between PLSS and FIA forests are spatially structured as a result of differences in the underlying ecology and land use impacts in the Upper Midwestern United States. Modern biomass is higher than pre-settlement biomass in the northwest (Minnesota and northeastern Wisconsin, including regions that were historically open savanna), and lower in the east (eastern Wisconsin and Michigan), due to shifts in species composition and, presumably, average stand age. Modern forests are more homogeneous, and ecotonal gradients are more diffuse today than in the past. Novel forest assemblages represent 29% of all FIA cells, while 25% of pre-settlement forests no longer exist in a modern context. Lost forests are centered around the forests of the Tension Zone, particularly in hemlock dominated forests of north-central Wisconsin, and in oak-elm-basswood forests along the forest-prairie boundary in south central Minnesota and eastern Wisconsin. Novel FIA forest assemblages are distributed evenly across the region, but novelty shows a strong relationship to spatial distance from remnant forests in the upper Midwest, with novelty predicted at between 20 to 60km from remnants, depending on historical forest type. The spatial relationships between remnant and novel forests, shifts in ecotone structure and the loss of historic forest types point to significant challenges to land managers if landscape restoration is a priority in the region. The spatial signals of novelty and ecological change also point to potential challenges in using modern spatial distributions of species and communities and their relationship to underlying geophysical and climatic attributes in understanding potential responses to changing climate. The signal of human settlement on modern forests is broad, spatially varying and acts to homogenize modern forests relative to their historic counterparts, with significant implications for future management.

In areas of open prairie or other treeless areas, e.g. southwestern Minnesota, surveyors 275 recorded distances and bearings to 'Non Tree' objects. When points were to be located in 276 water bodies the point data indicates 'Water'. Points  transcription across all datasets, we exclude sites for which multiple large trees have a 293 distance of 1 link (20.12 cm) to plot center, trees with very large diameters (diameter at 294 breast height -dbh > 100 in; 254 cm), plots where the azimuth to the tree is unclear, and 295 plots where the tree is at plot center but has a recorded azimuth. All removed plots are documented in the code used for analysis (Supplement 1: step_one_clean_bind.R) and are 297 commented for review. 298 Data Aggregation 299 We binned the point data using an 64km 2 grid (Albers Gt. Lakes St Lawrence projection; 300 Supplement 1: base_calculations.R) to create a dataset that has sufficient numerical power 301 for spatial statistical modeling and sufficient resolution for regional scale analysis [63]. 302 This resolution is finer than the 100km 2 gridded scale used in Freidman and Reich [45], but 303 coarser than township grids used in other studies [19,56]  Our approach allows for spatial variation in surveyor methods by applying various 318 spatially different correction factors based not only on the empirical sample geometry, but 319 also on known surveyor biases deviating from this design [57]. These estimates are based 320 on empirical examination of the underlying data, and have been validated using 321 simulations on stem mapped stands [57]. 322 We estimate stem density (stems m -2 ) based on a on a modified form of the Morisita two-323 tree density estimator, which uses the distance-to-tree measurements for the two closest 324 (1) 332 where λ is density ; k is the number of sectors within which trees are sampled, N is the 333 number of points over which estimates are aggregated, r is the distance of point-to-tree (as 334 m). This estimate can be modified by a refinement of the Cottam quadrant factors [66,67] 335 which recognizes that different sampling designs, and the order of the distances in different 336 quadrants (or sectors) carry specific weights. This correction, herein called κ, accounts for 337 different sampling designs. When either four quadrants or trees are sampled (point quarter 338 design), or when two trees in opposite semicircles (point halves design) are sampled, the 339 equation is accurate and κ = 1; when the two trees are in the nearest of two quadrants (two 340 nearest quadrants design), κ = 0.857; and when two trees are in quadrants on the same 341 side of the direction of travel (one-sided or interior half design), κ = 2. This parameter, in 342 Cottam's notation [67], is a divisor of the denominator above, or here, the mathematically 343 equivalent multiplier in the numerator of the reciprocal of the squared distances. 344 We further simplify the density estimate in equation (1)  Trees near the cardinal directions were passed over, and a replacement was found within a 372 more restricted angular region. The correction for this bias is calculated following 373 Kronenfeld and Wang [55] in a manner similar to the sector bias. The factor ζ is the ratio of 374 the proportion of trees in the restricted area (p) divided by the proportion of the complete 375 circle (α) that is used. The azimuthal censoring parameter (ζ) ranges from 1.03 to 1.25 376 indicating an equivalent to complete elimination of trees from 10 o to 72 o azimuths adjacent 377 to the cardinal directions. 378 Diameter limit (ϕ). Examination of the diameter distributions from settlement era surveys 379 across the upper Midwest clearly demonstrate witness trees less than 8 inches in diameter 380 were undersampled [38,57,59]. We have confirmed this bias in our own inspection of plots of diameter frequency in the PLSS data, which show a strong mode at 8". This bias can be 382 accommodated by setting a diameter limit, and only calculating the density for trees with 383 diameters above this limit. Total density calculated from all trees is reduced to this 384 reference limit by simply multiplying the total by the percentage of trees above this limit. 385 This effectively eliminates the smaller trees from the total and normalizes the value of trees 386 above this standard. The parameter (ϕ) represents diameter size bias is simply the 387 percentage of trees ≥ 8" and, in practice, ranges from 0.6 -0.9. 388 Because all surveyor bias corrections are simple multipliers of the model density and 389 should be independent, the bias-minimized estimate of the point density of trees ≥ 8" is: 390 Estimates for each point i can be averaged for all N points in any region.    [83]. 462 We identify analogs and examine differences in composition between and within PLSS and 463 We test for the importance of this effect on our analog analyses via a sensitivity analysis in 481 which we test whether dissimilarities between FIA and PLSS grid cells are affected by the 482 number of PLSS plots per cell. We find a small effect (see below), suggesting that our 483 analyses are mainly sensitive to the compositional and structural processes operating on 484 large spatial scales. 485 To understand the extent to which the processes governing novelty operate at landscape 486 scales, we relate the novelty of a cell to the spatial distance between individual novel cells 487 and the nearest 'remnant' forest cell, i.e., how far away can you go from a remnant forest 488 cell before all cells are predicted to be novel. We examine whether this relationship varies 489 between forest types, and whether it is different than the relationship we might see if the 490 dissiminlarity values were distributed randomly on the landscape. The definition of 491 "remnant" forest is likely to be arbitrary and, possibly, contentious. We use a threshold, the 492 lowest 25%ile of compositional dissimilarity within the PLSS data, as our cutoff. This 493 means that all FIA cells with nearest neighbor dissimilarities to the PLSS era forests below 494 this cutoff are considered to be representative of the PLSS era forests. The analysis 495 presented below is robust to higher cutoffs for the remnant forest threshold. 496 We use a generalized linear model with a binomial family to relate novelty (as a binomial, 497 either novel or not) to the spatial distance from the nearest 'remnant' cell for each of the 498 five major forest types within the PLSS data (Oak savanna, Oak/Poplar/Basswood/Maple, 499 Pine, Hemlock/Cedar/Birch/Maple and Tamarack/Pine/Spruce/Poplar forests). Because 500 the geographic extent of this region is complex, with islands, peninsulas and political 501 boundaries, we use permutation, resampling the FIA to PLSS nearest neighbor distances 502 without replacement, to estimate the expected distance to novelty if FIA to PLSS nearest 503 neighbor dissimilarities were distributed randomly on the landscape. 504 We expect that a weak relationship will indicate that novelty, following landscape-scale 505 land use change, is moderated by a species pool culled from small remnant patches, 506 individual specimens, or local scale restoration efforts. A significant relationship between 507 distance from remant forest and novelty indicates that small patches have been insufficient 508 to restore natural forest cover within the region, and would indicate that greater efforts are 509 needed to restore landscapes at regional scales. Species assignments to genera were rarely problematic. Only 18 PLSS trees were assigned 529 to the Unknown Tree category, representing less than 0.01% of all points. These unknown 530 trees largely consisted of corner trees for which taxon could not be interpreted, but for 531 which diameter and azimuth data was recorded. A further 0.011% of trees were assigned 532 to the "Other hardwood" taxon (e.g., hawthorn, "may cherry", and "white thorn").
For maple the class has very high within-genera specificity for a number of assignments. A 534 total of 78478 trees are assigned to "Maple". Of these, surveyors do use common names 535 that can be ascribed to the species level (e.g., A. saccharum, n = 56331), but a large number 536 of the remaining assignments are above the species level (n = 21356). This lack of 537 specificity for a large number of records causes challenges in using the species level data. A 538 similar pattern is found for pine, where many individual trees (125639) can be identified to 539 the level of species (P. strobus, n = 41673; P. banksiana, n = 28784; P. resinosa, n = 28766), 540 but there remains a large class of pine identified only at the genus level, or with unclear 541 assignment (n = 17606). 542 For ash the data includes both surveyor attributions to "brown ash" (presumably a 543 colloquial term used by surveyors as this is not currently an accepted common name in the 544 region) and black ash (n=9312), and white ash (n = 2350), but again, also includes a large 545 class of ash for which no distinction is made within the genera (n = 7423). 546 These patterns are repeated throughout the data. For spruce this within-genera confusion 547 is even greater, with 50188 assignments to genera-level classes and only 20 to either black 548 or white spruce. 549  The mean stem density for the region (Fig 2a) is 153 stems ha -1 . Stem density exclusive of 552 prairie is 172 stems ha -1 and is 216 stems ha -1 when both prairie and savanna are excluded. 553

Spatial Patterns of Settlement-Era Forest Composition: Taxa and PFTs
The 95th percentile range is 0 -423 stems ha -1 , and within-cell standard deviations 554 between 0 and 441 stems ha -1 . Basal area in the domain (Fig 2c) has a 95th percentile 555 range between 0 and 63.5 m 2 ha -1 , a mean of 22.2 m 2 ha -1 , within-cell standard deviations 556 range from 0 to 76.7 m 2 ha -1 . Biomass ranges from 0 to 209 Mg ha -1 (Fig 2d), with cell level 557 standard deviations between 0 and 569 Mg ha -1 . High within-cell standard deviations 558 relative to mean values within cells for density, basal area and biomass indicate high levels 559 of heterogeneity within cells, as expected for the PLSS data, given its dispersed sampling 560 design. 561  show similar patterns to stem density (but see Fig 3). 566 In the PLSS data, stem density is lowest in the western and southwestern portions of the 567 region, regions defined as prairie and savanna (Fig 2b, Table 2 (Fig 2b). The highest stem densities occur in north-central Minnesota and in 571 north-eastern Wisconsin (Fig 2a), indicating younger forests and/or regions of lower forest 572 productivity. 573 Forest structure during the settlement era can be understood in part by examining the 580 ratio of stem density to biomass, a measure that incorporates both tree size and stocking. 581 Regions in northern Minnesota and northwestern Wisconsin have low biomass and high 582 stem densities (Fig 3, blue). This indicates the presence of young, small-diameter, even-583 aged stands, possibly due to frequent stand-replacing fire disturbance in the pre-584 EuroAmerican period or to poor edaphic conditions. Fire-originated vegetation is 585 supported by co-location with fire-prone landscapes in Wisconsin [88]. High-density, low-586 biomass regions also have shallower soils, colder climate, and resulting lower productivity. 587 High-biomass values relative to stem density (Fig 3, red) are found in Michigan and 588 southern Wisconsin. These regions have higher proportions of deciduous species, with 589 higher tree diameters than in northern Minnesota. 590  Wisconsin, and upper Michigan (Fig 5, yellow). This mixedwood assemblage is interspersed 644 by both Pine dominated landscapes (Fig 5, orange) and, to a lesser degree, the softwood 645 assemblage Tamarack/Pine/Spruce/Poplar (Fig 5, green), which dominates in north-646 eastern Minnesota. The softwood assemblage is itself interspersed with Pine dominated 647 landscapes, and grades into a mixed-hardwood assemblage of 648 Oak/Poplar/Basswood/Maple (Fig 5, light purple) to the west. Thismixed-softwood forest 649 assemblage grades south into mono-specific Oak savanna (Fig 5, dark blue). 650

Oak/Poplar/Basswood/Maple (light purple), Tamarack/Pine/Spruce/Poplar (light green), 654
Oak Savanna (dark purple) and Pine (orange). These forest types represent meso-scale 655 (64km 2 ) forest associations, rather than local-scale associations. 656 The broad distributions of most plant functional types results in patterns within individual 657 PFTs that are dissimilar to the forest cover classes (Fig 5). Thus overlap among PFT 658 distributions (Fig 6)  basal area, total biomass, and composition (Fig 2). The grassland PFT is mapped onto non-667 tree cells with the assumption that if trees were available surveyors would have sampled 668 them.

670
Differences between PLSS and FIA data shows strong spatial patterns, but overall estimates 671 can be examined. By cell, modern forests (FIA) have higher stem densities (146 stems ha -1 , 672 t 1,5177 = 51.8, p < 0.01) than PLSS forests, but slightly lower basal areas (-4.5 m 2 ha -1 , t 1,5177 673 = -16.4, p < 0.01) and lower biomass (-8.72 Mg ha -1 , t 1,5177 = -6.55, p < 0.01) (Fig 7). We use 674 only point pairs where both FIA and PLSS data occur since non-forested regions are 675 excluded from the FIA and as such cannot be directly compared with PLS estimates. The 676 similarity in biomass despite lower stem density and total basal area in the PLSS data is 677 surprising. Two likely factors are shifts in allometric scaling associated with changes in 678 species composition, or a higher mean diameter of PLSS trees (Fig 7d). Total biomass was 679 45,080 Mg higher in the PLSS when summed across all cells coincident between the FIA 680 and PLSS. 681

689
Every one of the five historical PLSS zones shows an increase in stem density (Table 3). The 690 two forest types bordering the prairie, Oak Savanna and Oak/Poplar/Basswood/Maple 691 both show increases in density that likely reflect, in part, the issues addressed earlier with 692 regards to the sampling of forested plots in the FIA (over 10% cover). Density in the Oak 693 Savanna increases from a mean 27 stems/ha to 217 stems/ha, with a mean biomass 694 increase of 62 Mg ha -1 per cell (Table 3), the highest of any of the zones. The 695 Oak/Poplar/Basswood/Maple forest had higher PLSS-era densities (90 stems/ha) 696 reflecting open forest status rather than savanna (Table 2), but also shows a large increase in estimated FIA-era stem density (to 218 stems/ha) but with a much lower increase in 698 biomass than the Oak Savanna, and a negligable increase in basal area (Table 3). The 699 largest forest zone, Hemlock/Cedar/Birch/Maple shows the largest decline in biomass (a 700 net loss of 56.7 MG ha 01 since the PLSS-era) and basal area (net loss of 13 m 2 ha -1 since the 701 PLSS-era), but with an average increase in FIA era stem density. 702

Fig 7. The relationship between (a) average stem density, (b) total basal area and (c) biomass 704
values in the PLSS and FIA datasets. Stem density and total basal area are higher in the FIA 705 than in the PLS, however average cell biomass is higher in the PLSS. A 1:1 line has been added 706 to panels a-c to indicate equality. 707 The PLSS has a lower overall mean diameter than the FIA (δ diam = -2.9 cm, 95%CI from -708 17.3 to 8.18cm). FIA diameters are higher than PLSS diameters in the northwestern parts 709 of the domain (on average 6.47 cm higher), overlapping almost exactly with regions where 710 we have shown low biomass-high density stands (Fig 3). At the same time, regions with 711 high biomass and low density stands, in northeastern Wisconsin, and the Upper and Lower 712 Peninsulas of Michigan, had higher average diameters during the PLSS than in the FIA, on 713 average 3.65 cm higher. Thus we are seeing an overal increase in tree size in the sub-boreal 714 region and a decrease in temperate mixedwood forests, where we find tree species with 715 much higher dbh:biomass ratios [68]. This is coupled with declining variance in dbh across 716 the domain (from within cell variance of 37.9 cm the PLSS to 30.7 cm in the FIA). Thus, the 717 mechanism by which low density and basal area produce roughly equivalent biomass 718 estimates between the FIA and PLSS is likely due to the differential impacts of land 719 clearence and subesequent forest management in the south east vs the northwest. The loss 720 of high biomass southern hardwood forests is balanced by higher biomass in the northeast 721 due to fire suppression and regeneration of hardwoods in the northwest. Declining 722 diameters from the PLSS to FIA are most strongly associated with higher abundances of 723 poplar, ironwood and oak, while declining diameters are associated with maple and 724 hemlock, further supporting the assertion that much of the loss in diameter, and, subsequently biomass, is occuring in southeastern mixedwood/hardwood forest, while 726 diameter and biomass increases are occuring in the northwest. 727 Differences between FIA and PLSS data in sampling design are unlikely to be a factor for 728 most measures (see below); these differences are expected to affect how these datasets 729 sample local-to landscape-scale heterogeneity, but should not affect the overall trends 730 between datasets. Differences in variability introduce noise into the relationship, but given 731 the large number of samples used here, the trends should be robust. 732 There is a small but significant positive relationship (F 1,5964 = 920, p < 0.001) between the 746 number of FIA plots and within-FIA dissimilarity. The relationship accounts for 13% of 747 total variance and estimates an increase of δ d = 0.0134 for every FIA plot within a cell. This 748 increase represents only 3.08% of the total range of dissimilarity values for the FIA data. 749

Compositional Changes Between PLSS and FIA Forests: Novel and Lost Forests
There is a gradient of species richness that is co-linear with the number of FIA plots within 750 a cell, where plot number increases from open forest in the south-west to closed canopy, 751 mixed forest in the Upper Peninsula of Michigan. Hence, differences in within-and 752 between-cell variability between the PLSS and FIA datasets seem to have only a minor 753 effect on these regional-scale dissimilarity analyses. 754 We define no-analog communities as those whose nearest neighbour is beyond the 95%ile 755 for dissimilarities within a particular dataset. In the PLSS dataset, forests that have no 756 modern analogs are defined as "lost forests", while forest types in the FIA with no past 757 analogs are defined as "novel forests". More than 25% of PLSS sites have no analog in the 758 FIA dataset ('lost forests'; PLS-FIA dissimilarity , Fig 8c), while 29% of FIA sites have no 759 analog in the PLSS data ('novel forests'; FIA-PLSS dissimilarity, Fig 8d). Lost forests show 760 strong spatial coherence, centered on the "Tension Zone" [85], the ecotone between 761 deciduous forests and hemlock-dominated mixed forest (Fig 4). 762 oak (x FIA = 12%). Tamarack in these forests has declined significantly, from 23% to only 777 5% in the FIA, while Poplar has increased from 10% to 22%, resulting in forests that look 778 less mesic and more like early seral forests. 779 Cedar/juniper-Hemlock-Pine accounts for 19.8% of all lost forests and 5.49% of the total 780 region. This forest type is found largely in northeastern Wisconsin, and the Upper and 781 Lower Peninsulas of Michigan. This lost forest type has been predominantly replaced by 782 maple, poplar, and pine, retaining relatively high levels of cedar (x PLS = 19%; x FIA = 14%). 783 The loss of hemlock is widespread across the region, but particularly within this forest 784 type, declining to only 3% from a pre-settlement average of 18%. Lastly, Beech-Maple-Hemlock accounts for 12.6% of all lost forests and 3.49% of the total 801 region. This forest type is found exclusively on the central, western shore of Lake Michigan 802 and in the Lower Peninsula, in part due to the limited geographic range of Beech in the 803 PLSS dataset (Fig 4). Beech is almost entirely excluded from the modern forests in this 804 region, declining from x PLS = 37% to x FIA = 4%. Pine in the region increases from 9% to 805 16%, while maple, the dominant taxa in the modern forests, increases from 16 -25%. 806 On average lost forests contain higher proportions of ironwood (r = 0.203), beech (r = 0.2), 807 birch (r = 0.189) and hemlock (r = 0.188) than the average PLSS forest, and lower 808 proportions of oak (r = -0.28), poplar (r = -0.145), and pine (r = -0.107). 809 The distribution of novel ecosystems (Fig 8d) is spatially diffuse relative to the lost forest of 810 the PLSS and the forest types tend to have fewer co-dominant taxa. FIA novel forest types 811 also have a more uneven distribution in proportion than the PLSS lost forests. Overall, 812 novel forests are associated with higher proportions of maple (r = 0.02), ash (r = 0.03) and 813 basswood (r = -0.04), although basswood is dominant in only one forest type (Poplar-814 Cedar/juniper-Maple). Novel forests are associated with lower proportions of oak (r = -815 0.28), and pine (r = -0.11). This analysis suggests that the loss of particular forest types 816 associated with post-settlement land use was concentrated in mesic deciduous forests and 817 the ecotonal transition between southern and northern hardwood forests, while the gains 818 in novelty were more dispersed, resulting from an overall decline in seral age. 819 By far the largest novel forest type is Maple, which accounts for 37.2% of all novel forests 820 and 2.68% of the total region. As with all novel forest types, this forest type is broadly 821 distributed across the region. This forest type is associated with co-dominant maple (x FIA = 822 23%) and ash (x FIA = 22%). Hemlock has declined significantly across this forest type, from 823 x PLS = 24% to x FIA = 4%. 824 Poplar-Cedar/juniper-Maple, accounts for 28.8% of all novel forests and 2.08% of the total 825 region. The broad distributiof these novel forests makes assigning a past forest type more 826 difficult than for the PLSS lost forests, the distribution replaces two classes of past forest, 827 one dominated by oak, in southern Wisconsin and Minnesota, the other by mixed hemlock, 828 beech, birch and cedar forests. 829 Pine-Cedar/juniper-Poplar-Maple forest accounts for 16.3% of all novel forests and 1.17% 830 of the total region. This forest type is again broadly distributed, and is widely distributed across the region, representing a homogenous, early seral forest type, likely associated 832 with more mesic sites. Oak forest accounts for 13.3% of all novel forests and 0.96% of the 833 total region. This grouping again shows a pattern of broad distribution across the region, 834 associated with cedar/juniper percentages near 40%, with smaller components of poplar 835 (14%) and maple (13%). 836  The GLM shows that distance from remnant forests in the FIA is significantly related to the 849 probability of a cell being novel (χ 1,4 =623, p < 0.001). The mean distance to novelty varies 850 by PLSS forest type, but is between approximately 20 and 60km for the four forest types 851 examined here (Fig 9), while the null model would predict a distance of 10 -20km to 852 novelty from remnant cells if dissimilarities were distributed randomly on the landscape 853 (Table 4). Novel forests are generally further from remnant patches than expected in the 854 null model, regardless of forest type, but the distance to novelty is greater for modern 855 forests that are, generally, more similar to their PLSS state (Pine and Tamarack dominated 856 forests), and closer for forests that are more dissimilar. 857