Local population structure in Cambridgeshire during the Roman occupation

The Roman period saw the empire expand across Europe and the Mediterranean, including much of what is today the United Kingdom. While there is written evidence of high mobility into and out of Britain for administrators, traders and the military, the impact of imperialism on local population structure is invisible in the textual record. The extent of genetic change that occurred in Britain before the Early Medieval Period and how closely linked by genetic kinship the local populations were, remains underexplored. Here, using genome-wide data from 52 ancient individuals from Cambridgeshire, we show low levels of genetic ancestry differentiation between Romano-British sites and lower levels of runs of homozygosity over 4 centimorgans (cM than in the Bronze Age and Neolithic. We find fourteen cases of genetic relatedness within and one between sites without evidence of patrilineal dominance and one case of temporary mobility within a family unit during the Late Romano-British period. We also show that the modern patterns of genetic ancestry composition in Modern Britain emerged after the Roman period.


Introduction
The most visible individuals from Roman Britain through texts are soldiers and administrators, many of whom came from other parts of the Empire (Eckardt and Müldner 2014). The Roman army was recruited from across the whole empire, and, in general, soldiers were posted to areas away from their homelands to avoid conflicts of loyalty (Haynes 2013)(pgs. 121-29). Migration from the Empire into Britain was likely dominated by these groups, as well as traders. Although local populations are archaeologically very well documented, the movement is less well understood and is invisible in the textual record. The extent of mobility in this period has been the subject of recent debate, with work largely focusing on the use of isotope data (Eckardt and Müldner 2014). However, as sampling has been dominated by the examination of unusual burials, our knowledge of the scale of migration and its impact on the overall population is impossible to assess. While the subsequent Early Medieval Period (5th -10th centuries CE) arguably resulted in major genetic shift towards higher affinities to Dutch, Danish, and other continental North Sea zone ancestries in eastern England, at the scale of 38-75% on average (Schiffels et al. 2016;Gretzinger et al. 2022), it is not clear whether this is due to migration only during the Early Medieval Period or if any of this change could be ascribed to gene flow during the Roman Period and before (Oosthuizen 2017); however, the long-standing ties between Britain and Gaul (Champion 2016)(pg. 155), both prior to and during the Roman period, may obscure the genetic distinction between local, indigenous Britons and incoming individuals.
In contrast to recent genomic studies on demographic changes in Bronze and Iron Age (Patterson et al. 2021) and Early Medieval (Gretzinger et al. 2022) periods, to date, few genomes from Great Britain from the Roman period have been published. A study of seven individuals from a cemetery in York with decapitations showed most individuals had a higher affinity to the modern Welsh than modern English, yet also highlighted the cosmopolitan nature of the Roman empire by identifying an individual with Middle Eastern / North African ancestry (Martiniano et al. 2016). However, York was a cosmopolitan urban centre and cannot be taken as typical of the province as a whole (Ottaway 2004). The population of Roman Britain totalled 2-4 million and was dominated by rural communities, who accounted for about 90% of the people (Millett 1990)(pg. 185). Although extensively networked, these communities were arguably little affected by migration (A. Smith, M. Allen, T. Brindle, M. Fulford 2016)(pg 416-17). However, this hypothesis has not yet been tested. The area that is today Cambridgeshire provides an extensively researched rural, agricultural region, not atypical of the province as a whole, so genetic information from farmstead communities in this region provides a key opportunity to improve our understanding of the local population(s) of Roman Britain.

Results
To explore the question of migration outside of cosmopolitan centres, we generated genome-wide data for 96 ancient individuals from eight sites (Table 1) in the Cambridgeshire region ( Figure 1A) to an average coverage of up to 3.8× (median 0.037×). Mitochondrial haplogroup could be determined for 71 individuals and a subset of 52 genomes which had autosomal coverage >0.01x were used in autosomal analyses including imputation-based analyses relying on 43 genomes (Data S1A). The individuals come from six sites across the Late Iron Age / Romano-British period and two earlier comparative sites from the Neolithic and Bronze Age (Table 1). The majority of our Romano-British sites were in use between 200-400 CE (Table 1, Data S1B) and encompass farmsteads and a cemetery with a number of burials thought to be decapitations (Wiseman et al. 2021) (Supplementary Note 1). In general, the genomes represent an equal distribution of males and females, as determined genetically, and a range of juveniles and adults of all ages (Data S1A). The average endogenous human DNA content varies by site, but overall is 12.03% and genome-wide coverage is 0.13×. Estimated contamination rates from mitochondrial DNA (mtDNA) range between 0 -3.78% (median 0.83%) and the average misincorporation of C > T in the first five base pairs (bp) is 8.39% (Data S1A). (C) PCA based on imputed genomes.The proportion of total variation explained by PC1 and PC2 is 0.0005 and 0.0004, respectively. Roman and Early Medieval Period genomes shown with x and + symbols, respectively, include those reported in this study as well as those from Martiniano et al. 2016 andSchiffels et al. 2016.

Population structure of Iron Age / Roman Cambridge
We studied the ancestry of the Roman period genomes from Cambridgeshire in the context of other contemporary material from Britain and modern genomes from Europe and the Middle East using Principal Component Analysis (PCA) (Data S2). We found that Roman genomes from Cambridgeshire all draw their genetic ancestry from Western Europe ( Figure   1C) and that similarly to the majority of Roman genomes from Yorkshire they cluster more closely with modern Welsh than local East Anglian genomes ( Figure 1C). Unlike the Yorkshire individuals, we do not detect outliers among the 25 Cambridgeshire Roman genomes with >0.1x coverage examined. All Roman period populations examined show homogeneity in their North/West European ancestry in relation to external reference populations in PCA analyses based on imputed data (Figure 1, Figure S1) or projections made from haploid genotype calls ( Figures S4 and S5).
We tested whether the imputed Roman genomes have different affinities to ancient and modern European populations using f4 statistics. Consistent with the increased Neolithic ancestry observed in Iron Age genomes from England by Patterson et al. (2022), all seven Roman Period sites we tested showed consistently higher drift sharing with Sardinian Neolithic genomes than genomes from Copper and Bronze Age England (-5.3<Z<-2.4; Figure 2A). All sites show higher affinity to Late Iron Age England than to Imperial Roman genomes ( Figure 2B). Unlike the Roman Period York cemetery that included a burial of a long distance migrant from present-day Syria or Jordan (Martiniano et al. 2016), we find no evidence of long-distance migration from the Mediterranean region among the 33 imputed genomes from Roman Cambridgeshire that we tested ( Figure S2A  Similarly to Roman Period genomes from York (Martiniano et al. 2016) we find higher proximity of the Cambridgeshire Roman genomes to present-day Dutch than French genomes ( Figure 2C, Data S3) and no higher affinity with Danish than Scottish genomes akin to the Early Medieval genomes ( Figure 2D). Neither do we observe any notable individual deviations from the patterns observed at site level ( Figure S3A-B). In sharp contrast to the single outlier, the long-distance migrant in York, we observe relatively little difference in the affinities of the Roman genomes to present-day groups from England, Kent and East England, with a third of the Roman Period individuals from Cambridgeshire (East England) showing, however, minor but significantly higher affinity to present-day Kent than average present-day genomes from East England ( Figure S3C).
We further examined IBD sharing patterns between imputed genomes of Roman individuals from Cambridgeshire in context of available Roman Period data from York, Late Iron Age France (Fischer et al. 2022) and Early Medieval West Europe (Gretzinger et al. 2022) as well as UK Biobank data for individuals born in the UK and elsewhere in Europe (Data S4, Figure 3). Unsurprisingly we find a relatively high level of IBD sharing among geographically close Roman sites in Cambridgeshire with an average probability of 25% of individuals from one site sharing an IBD segment longer than 4cM with individuals from another site, which is more than twice as high as sharing among present-day individuals from East or Southeast England. However, the mean rate of IBD sharing of Cambridgeshire Roman genomes with geographically more distant Roman site from York is lower (23%, p=0.42 by two-tailed t-test) than the local average in context of lower (16%, p=0.0014) diachronic IBD sharing between Cambridgeshire Roman and Early Medieval sites. Notably, IBD sharing among Early Medieval sites from across England is higher (32%, p=0.002) than sharing among Roman sites in Cambridgeshire alone, remaining high for the English Early Medieval sites across the Channel with Early Medieval sites from Lower Saxony and the Netherlands (28%). Compared to Roman sites, the Early Medieval sites from East England show (p=5x10 -7 ) increase in IBD sharing with present-day Scandinavian and Dutch genomes from approximately 10% to 15%, which is consistent with the major increase in that period of continental northern European ancestry detected by Gretzinger et al. (2022) (Gretzinger et al. 2022). At the same time, IBD sharing with Late Iron Age France drops in Cambridgeshire from the mean of 15.5% in the Roman to 10% in the Early Medieval and 8% in present-day East England which is comparable to the level of sharing between modern French and English (6.27%) (Data S4). We calculated runs of homozygosity (ROH) using HapROH ).
Using a two-tailed Student's T-test we find no difference in the average sum of ROH segments greater than 4cM or 8cM between the Roman era sites (Data S5). Nor, do we find a difference between the two newly generated Bronze Age (Over Barrows) samples and the Roman era populations (Data S5). We do find a difference between our Roman Period populations and the samples from York (4cM p = 0.014; 8cM p = 0.034), but only when comparing our imputed data (haploid mode) to the data available in the Allen Ancient DNA dataset (Anon). When comparing our imputed data to the York ROHs imputed in haploid and diploid mode from the shotgun data, there is no difference detected at either 4cM or 8cM lengths.

Uniparental marker diversity
To determine variation in the paternal lineages we called the genotypes of 161,140 Y chromosome haplogroup informative binary markers in 29 male samples from Late Iron Age and Roman Cambridgeshire with autosomal coverage >0.01x (Data S1C). All individuals could be assigned to haplogroups common in modern-day Europe. Majority of the samples (85%) belong to haplogroup R1b which became the predominant male lineage in Britain after the spread of the Beaker complex (Data S1D). Two first-degree related individuals from Duxford fall into the I2 clade which captures all previously known Y chromosome lineages in Britain before the Bell Beaker Culture (Data S1D-E). It is not clear, however, whether this particular lineage (I2-Y3722) of the Duxford father-son pair, reflects local continuity and survival from a pre-Beaker population as its present-day distribution is mainly focused on Ireland with only rare cases detected in England and Scotland (https://www.yfull.com/tree/I-Y3722/). Among R1b individuals with more coverage we identify distinct sub-clades, including the British/Irish Bell Beaker signature lineage R1b2-L21 (Patterson et al. 2021) as well as lineages from clades such as R1b11-Z2103 and R1b18-S1194 that have not been encountered in Britain in context of earlier time periods.
Notably, none of the four R1b samples with >0.2x Y chromosome coverage fall to the same sub-clade. Some of the identified subclades of R1b appear to be rare in a large, high resolution modern Y chromosome compendium of more than 60,000 FamilyTreeDNA customers (Data S1D). Overall, compared to Copper/Bronze Age periods we do not detect in our Roman Cambridgeshire samples any notable changes in the composition of the Y chromosome haplogroups apart from a single I1 (NWC010) and a single G2a (DUX006) lineage that, by their presence in the Iron Age data by (Patterson et al. 2021), were likely introduced to Britain from the mainland in the Iron Age. (Data S1E). In autosomal DNA and isotope analyses these two individuals (NWC010, DUX006) did not stand out as outliers with external origins or high mobility during their lifetime ( Figure 1C, Tables 2-3).
We determined mitochondrial (mtDNA) haplotypes for 71 individuals (Data S1F) and found high diversity, with no haplotype matches except in the case of close kinship (Data S1G).

Kinship structure
We examined relatedness within and among the Roman sites of Cambridgeshire using a pairwise mismatch approach on pseudo-haploid called data (Monroy Kuhn et al. 2018) to detect 1st-3rd degree related pairs of individuals and an IBD-based approach (Seidman et al. 2020) on imputed genomes to explore more distant forms of relatedness. Despite our relatively small sample sizes per site, we observed closely related pairs ( Figure 4) in all Roman age sites from Cambridgeshire except for Knobb's Farm (Data S1G -H). Both Knobb's Farm and the previously studied Driffield Terrace (2nd century Roman cemetery in York (Martiniano et al. 2016)), which also did not reveal related pairs, are sites where decapitated burials are common. None of the pairwise comparisons between the sites identified closely related individuals.  Kuhn et al. 2018. The lower boundary for 99% autosomally unrelated pairs is shown on the x-axis for guidance. Dots with high transparency correspond to pairs with low aggregate SNP coverage. B). Proportion of pairs with autosomal 1-3rd degree relatedness per site.
Notably within the relationships detected within the sites we find several triangular cases of relatedness with a female individual involved within more than one pair (e.g. Duxford DUX011 (female) related with DUX019 (male) and DUX001 (male)), or in case of North West Cambridge we find a relationship between three sampled male individuals (NWC004, NWC010, and NWC009) who appear to be related to each other through unsampled female(s) (either unexcavated or not sampled) because their pairwise X chromosomal differences are lower than population average despite the fact that they carry different mtDNA lineages ( Figure 4). Genetically related individuals appear not to be clustered or buried next to each other: for example, the members of a Duxford family DUX011 (mother), DUX008 (father) and their son (DUX001) are all buried in different groups of burials ( Figure S6) identified in the original site report (Lyons 2011 We further used IBIS to explore IBD sharing within and among Late Iron Age and Roman sites in Cambridgeshire. In case of all pairs of imputed individuals that were identified with READ as closely related we found multiple IBD segments supporting their close relatedness (Data S4). However, in all cases the observed total IBD shared expressed in kinship coefficient was less than expected from the 1st-3rd degree relationship suggesting that capturing long tracts of IBD at low coverage is hindered by their fragmentation due to imputation errors. Besides the kinship pairs already detected with READ ( Figure 4) we did not find any new relationships with IBIS within the sites. However, we detected a case of distant relatedness between DUX019 from Duxford and a previously reported sample 12884A (HI2, Schiffels et al. 2016) from Hinxton, who share five IBD segments longer than 7cM consistent with estimated kinship coefficient suggesting 6th degree relatedness. Given the Duxford and Hinxton sites are located only 3 kilometres from each other and are both in the Cam valley this finding points to local mobility between geographically adjacent sites (Data S4).

Phenotypic Changes / Health
To investigate potential changes in Cambridgeshire through the Roman Period, we imputed 114 SNPs known to be involved in phenotypic traits related to diet (carbohydrate, lipid and vitamin metabolism), immunity (response to pathogens, autoimmune diseases and other immunity traits) and pigmentation (eye, hair and skin colour) in the ancient individuals presented here plus 234 individuals from the literature, for a total of 277 samples divided into four groups from the Mesolithic to the Roman Period (Data S6A-D). Within Great Britain, from the Neolithic to modern Great Britain (1000 Genomes GBR) there are 34 significant SNPs (Data S6B). We can see two main "break-points" considering the significant group pairs after the Tukey test for each of the 34 SNPs, in line with previous findings: 1) after Neolithic and 2) after the Bronze Age. Most of the significant SNPs involve Neolithic or Chalcolithic/Bronze Age groups that differ from later periods. More specifically, they are involved in several different diet, immunity and pigmentation functions. In the diet group, we found six significant SNPs: two that confer lactase persistence, one involved in lipid When focusing on the group pairs involving Iron Age/Romans (IAR), we see eight SNPs are different between IAR and modern GBR. The MCM6 locus, with two Lactase-persistence SNPs, show a sharp increase after Iron Age/Romans, after the first great increase after Bronze Age; this is consistent with recent findings related to low frequency of LCT alleles in Bronze Age and increase in frequency in later periods (e.g. (Burger et al. 2020;Segurel et al. 2020)). Between different sites of Roman UK and Roman Italy, no significant SNPs appear. Differently from (Kerner et al. 2021), we do not observe frequency fluctuations for the TB risk factor rs34536443, which is low in frequency from the Neolithic with no significant changes over time. It reached present-day frequency from IAR.

Mobility through isotopic analysis
To further explore potential childhood origins and geographical mobility, we measured oxygen isotope ratios in the tooth enamel of Iron Age and Roman individuals excavated from the Cambridgeshire region. The oxygen isotope composition of local water sources is largely determined by the local climatic conditions (Dansgaard 1964;Pederzani and Britton 2019). The oxygen isotope ratios measured in archaeological human tooth enamel are a reflection of the water consumed during the formation of the enamel during childhood and can therefore provide information about the environment the individual grew up in (DeNiro and Epstein 1978;Longinelli 1984;Luz, Kolodny, and Horowitz 1984;Luz, Kolodny, and Kovach 1984). This can be broadly used to identify population mobility if this does not match the expected values for the environment in which they are buried (Pederzani and Britton 2019 (Wiseman et al. 2021). The datasets are broadly comparable, although due to the variety of teeth analysed, the data will not represent exactly the same period of life (Lightfoot in Wiseman et al. 2021, p. 160).  We investigated the presence of potential outliers further, following Lightfoot and O'Connell (2016) (Table 3)). The 1.5IQR method is considered most robust in this instance and identifies outliers only at Vicar's Farm: skeletons 2028, 2034 and 2055. However, these are well within the overall range of values seen at Knobb's Farm and may only appear as outliers due to small sample sizes. Interestingly, the individuals identified as outliers using 1.5IQR are not the same as those highlighted above as being outside the expected δ 18 O PO4 range, highlighting the difficulties of robustly identifying and interpreting potential childhood origins and geographical mobility using oxygen isotopes alone.

Discussion
The region of Cambridgeshire appears to have been composed mostly of homogenous, local populations when compared to the York samples of the same period where one out of seven randomly sampled individuals was a long-distance migrant. This evidence supports the hypothesis that the rural populations of Roman Britain were largely unaffected by migration. The differences between PCA mapping of Roman-period populations closer to Celtic populations and PiC score-based analyses show more similar regional affinities across different British geographic regions to Cambridgeshire Roman Period. This may potentially be explained by the relative homogeneity of the allele frequency pool before Anglo-Saxon migrations that makes the allele frequency approach more sensitive to detect the differences.
Regardless of relatively small sample sizes, we find close pairs of relatives in most of the Cambridgeshire sites. The exception being Knobb's Farm, a cemetery associated with a settlement that was possibly engaged in the processing of agricultural products and in which there were a significant number of burials that were decapitations. Knobb's Farm appears to have been more broadly networked than the other local farming communities sampled here, yet distinct from the cosmopolitan urban centre at York, which may explain the difference in population heterogeneity.
Generally, the oxygen isotope range of individuals from the three sites is wide, but there is no isotopic evidence for a large-scale change in population diversity (i.e. differences in the number of individuals who may have originated from elsewhere) between the Iron Age and Roman period sites. The data does appear to highlight several individuals that may have spent their childhoods elsewhere, however, the difficulties of interpreting δ 18 O data due to large ranges and broad, overlapping estimations of expected values means that these individuals can only be tentatively identified as potential migrants and further corroborating evidence, such as strontium isotope analysis would be required to make a more definitive interpretation.

Figure 4. Relatedness between ancient Iron Age/Roman genomes by autosomal and X chromosome calculated in two ways.
A) Mismatch probabilities. Each dot shown on the plot represents a pair of ancient genomes assessed (with READ) for their average pairwise differences normalised by population average. First, second and third degree boundaries for the autosomal relatedness are estimated as in Kuhn et al. 2018. The boundaries for maternal and paternal 1st degree relationships for sons are shown on the x-axis for guidance. Note that parent-daughter and sibling relationships (not shown) can take a wider range of normalised X P0 values. Dots with high transparency correspond to pairs with low aggregate SNP coverage at X chromosome SNPs. B).  Notes: * Skeletons 1 and 3 are almost certainly the same individual, thus the likely total individuals is only three even though four were reported in the original site report. ** Core occupancy date is likely starting at 260 CE.

Sample information and ethical statement
All skeletal elements were sampled with permissions from the representative bodies/host institutions. Samples were taken and processed to maximise research value and minimise destructive sampling. Teeth were sampled from skeletons using gloves. Molars were preferred due to having more roots and larger mass, but premolars were also sampled.

Over Barrows
At

Duxford
The site off Hinxton Road, Duxford, was excavated by CAM ARC in 2002 (Lyons 2011 (Lyons, A., 2011, pp. 10-12, 15-16), although aDNA analysis presented here indicates that these may be Late Iron Age. During the Late Iron Age, the higher ground was defined by a series of ditches that were repeatedly redug, surrounding a short-lived timber-framed rectangular shrine and a burial ground that was in operation c. 100 CE -125 CE (Lyons, A., 2011, pp. 38-49). The burials are believed to have 'formed a selected part of a community perhaps largely made up of a single family or other social grouping' (Lyons, A., 2011, p. 38  AD an inhumation cemetery was established on the southern edge of the settlement within the ditched enclosure system (Evans, C. and Lucas, G., 2020, pp. 314-37) (Fig. 3.46). This cemetery presumably served part or all of the nearby rural settlement, which shows some signs of being of higher status than most other local settlements and may have fulfilled some minor central place role within the local rural settlement hierarchy. There is evidence that neonates were buried within the settlement itself rather than the cemetery and it is possible that high status individuals were buried elsewhere.
The studied skeletons come from the inhumation cemetery, where thirty individuals were recovered from 29 graves. Eight individuals appear to have been buried in coffins, while hobnails indicate that seven were either wearing or accompanied by footwear. Grave goods accompanying seven individuals included bracelets, finger-rings, a glass bead necklace, ceramic vessels and a cache of glass fragments. Most graves are orientated roughly north-south or east-west and the burials were mainly extended and supine, with just one crouched burial.

North West Cambridge
Archaeological Several clusters of inhumations were found, in total containing 48 individuals, plus three cremations including a bustum. Graves were primarily NW-SE, inhumations extended or semi-flexed supine but some non-normative (prone, contracted, splayed knees, head to SE).
Many nails were within graves, suggesting coffins or biers, together with dress accessories and hobnail. One burial has apparent evidence for crucifixion.

Arbury
The burials were discovered in August of 1952 during construction work (Fell 1956) on Fortescue Rd, Arbury. One lead-lined coffin containing a female skeleton inspired the poem, "All the Dead Dears" by Sylvia Plath. Up to six burials were uncovered (Fell 1956). Pre-treatment of the enamel samples was carried out following a protocol based on methods in Balasse et al. (2002) (Balasse et al. 2002). To remove surface contaminants, the outer surface of the tooth enamel was abraded using a handheld Dremel drill with a round-headed, diamond-tipped drill bit. Following this, approximately 5.5-10.0mg of enamel powder was collected using a smaller round-headed diamond-tipped drill bit. Samples were then vortex mixed in approximately 0.1ml per mg of sample of 2-3% aq. sodium hypochlorite (NaOCl) and refrigerated for 24h. Samples were rinsed five times with distilled water and then vortex mixed in 0.1ml per mg of sample of 0.1M aq. acetic acid (CH 3 COOH) and left at room temperature for 4 h. Samples were then rinsed five times with distilled water, frozen and placed in a freeze dryer until full lyophilization. Approximately 2-4mg of the resultant enamel powder was weighed into glass gas bench tubes. For each batch of samples submitted for analysis, 2-4mg of two in-house faunal enamel standards were also weighed into glass gas bench tubes (eight standard tubes in total). The glass vials were vacuum sealed and the samples were reacted with 100% orthophosphoric acid at 90°C using a Micromass Multicarb Sample Preparation System. The CO 2 produced was then dried and transferred cryogenically into a Gas Bench II coupled to a Delta V mass spectrometer in the Godwin Laboratory, Department of Earth Sciences, Cambridge. All results for both carbon and oxygen are measured and reported on the international scale relative to VPDB calibrated through the NBS19 standard (Coplen 1988;Hoefs). Based on repeated measurements of the international and in-house standards, analytical error was < ± 0.10‰ for δ 18 O CO3 .

Generation and analysis of Isotopic data
All

Sampling, ancient DNA extraction and library preparation
Samples were processed in the clean room of the dedicated ancient DNA laboratory of the Institute of Genomics, University of Tartu, Estonia following established protocols as detailed most recently in (Saupe et al. 2021).

DNA sequencing
DNA was sequenced using the Illumina NextSeq500/550 High-Output single-end 75 cycle kit. As a norm, 15-20 samples were sequenced together on one flow cell; additional data was generated for 34 samples to increase coverage (Data S1A).

Mapping
Before mapping, the sequences of the adapters, indexes, and poly-G tales occuring due to the specifics of the NextSeq 500 technology were cut from the ends of DNA sequences using cutadapt-1.11 (Martin 2011). Sequences shorter than 30 bp were also removed with the same program to avoid random mapping of sequences from other species.
The sequences were aligned to the reference sequence GRCh37 (hg19) using Burrows-Wheeler Aligner (BWA 0.7.12) (Li and Durbin 2010) and the command aln with re-seeding disabled.
After alignment, the sequences were converted to BAM format and only sequences that mapped to the human genome were kept with samtools 1.3 (Li et al. 2009). Afterwards, the data from different flow cell lanes were merged and duplicates were removed using picard 2.12 (http://broadinstitute.github.io/picard/index.html).

aDNA authentication
As a result of degradation over time, aDNA can be distinguished from modern DNA by certain characteristics: short fragments and a high frequency of C=>T substitutions at the 5' ends of sequences due to cytosine deamination. The program mapDamage2.0 (Jónsson et al. 2013) was used to estimate the frequency of 5' C=>T transitions. Rates of contamination were estimated on mitochondrial DNA by calculating the percentage of non-consensus bases at haplogroup-defining positions as detailed in (Jones et al. 2017

Calculating genetic sex estimation
Genetic sex was calculated using the methods and script described in (Skoglund et al. 2013), estimating the fraction of reads mapping to Y chromosome out of all reads mapping to either X or Y chromosome. Genetic sex was calculated for samples with a coverage >0.01x and only reads with a mapping quality >30 were counted for the autosomal, X, and Y chromosome.

Determining mtDNA haplogroups
Mitochondrial DNA haplogroups were determined using Haplogrep2 on the command line (Kloss-Brandstätter et al. 2011). Subsequently, the identical results between the individuals were checked visually by aligning mapped reads to the reference sequence using samtools-1.3 (Li et al. 2009) command tview and confirming the haplogroup assignment in PhyloTree (accessed at: www.phylotree.org). Additionally, private mutations were noted for further kinship analysis.

Y chromosome variant calling and haplotyping
A total of 161,140 binary Y chromosome SNPs that have been detected as polymorphic in previous high coverage whole Y chromosome sequencing studies (Hallast et al. 2015;Karmin et al. 2015;Poznik et al. 2016) were called in 29 male samples with more than 0.01× autosomal coverage using ANGSD-0.916 (Korneliussen et al. 2014) '-doHaploCall' option.
A subset of 144,550 sites yielded a call in at least one of the samples and in the case of 5653 sites at least one of the 29 samples carried a derived allele (Table S8). Basal haplogroup affiliations (Table S7) (Table S7).

Preparing the datasets for autosomal analysis
Autosomal variants were called with the ANGSD-0.921 software (Korneliussen et al. 2014) command --doHaploCall keeping base for the 1,233,534 positions that are present in the 1240K capture set. Two compiled datasets were downloaded from David Reich Lab Samples from this study were merged with the comparative dataset using PLINK v1.90 (Chang et al. 2015) and in the resulting merged file #1, SNPS were filtered for a genotyping rate over 10% (--geno 0.9) leaving 1,232,632 SNPs from the base dataset. In file #2, SNPS were filtered for a genotyping rate over 10% (--geno 0.9) leaving 597,573 SNPs from the base dataset. Both filesets were converted to EIGENSTRAT format using the program convertf from the EIGENSOFT 7.2.0 package (Patterson et al. 2006).

Principal component analysis
PCA was performed using the program smartpca from the same package, projecting ancient individuals onto the components constructed based on the modern genotypes using autoshrink:YES and lsqproject: YES.

Kinship analysis
Variants were called with ANGSD (Korneliussen et al. 2014 Identical individuals and first degree relatives are consistently detected across groupings (Data S1H). In addition to 1st and 2nd degree relationships we also estimated P0 cutoffs (15/16 = 0.9375 as per Kuhn et al. 2018) for the detection of 3rd degree relatives while acknowledging that due the lack of relevant empirical data we cannot estimate the error rates for this class of relationship.
Imputed genomes were used to study in further detail cases of close degree of genetic relatedness detected with READ. We used the --genome function of PLINK 1.9.0 (Chang et al. 2015) to estimate pairwise proportions of IBD1 and IBD2 that are informative, for example, for distinguishing parent-offspring from sibling relationships (Data S1G).

Runs of Homozygosity
We used hapROH (Ringbauer et al. 2020) to detect runs of homozygosity (ROH) in ancient genomes. Using information from a reference panel, hapROH has been shown to work for genomes with more than 400K of the 1240K SNPs panel covered at an error rate lower than 3% in pseudo-haploid genotypes (Ringbauer et al. 2020). We note that the requirement is broadly in line with the imputation accuracy we get from coverages as low as 0.05x, where~60% of common variants (MAF≥0.05) in the HRC panel are recovered with an accuracy greater than 0.95 in diploid genotypes (Hui et al. 2020

LSAI sharing and individual connectedness inference
LSAI segments and kinship coefficients were estimated from merged plink files of 61 imputed ancient genomes, 503 Europeans from the 1000 Genome Project and UK Biobank data with IBIS version 1.20.9 using different minimum shared segment length (-min_L) threshold -4 cM for population genetic inference and 5 and 7 cM for kinship analysestogether with -maxDist 0.1 and -mt 300 parameters. In total, 269,319 binary SNPs with MAF >0.05 were used. Probabilities of IBD sharing among groups were estimated as in Kivisild et al. 2021.

Phenotype prediction
For 39 out of the 41 HIrisPlex-S set of SNPs we selected 2 Mb around the informative variants, merging the regions on the same chromosome, with the exception of the variants on chromosome 15, which have been analysed in two different regions since the distance between the two nearest SNPs was about 20 Mb. We selected 10 regions from 9 autosomes, spanning from about 1.5 Mb to 6 Mb. For the other phenotypic informative markers (diet, immunity and diseases), we selected 2 Mb around each variant and merged the overlapping region, for a total of 48 regions from 17 autosomes and the X chromosome.
We call the variants using ATLAS v0.9.0 (Link et al. 2017) (Antonio et al. 2019) and already analysed for the same phenotypic variants in (Saupe et al. 2021). For both the unpublished and published set of samples described above, the variant rs34536443, a risk factor for tuberculosis with fluctuating frequency in Europe over the last 2,000 years (Kerner et al. 2021), has been analysed here for the first time. After calling the variants separately for each sample, we merged them in one VCF file per region. We used the merged VCFs as input for the first step of our imputation pipeline (Hui et al. 2020) (genotype likelihood update), performed with Beagle 4.1 -gl command (Browning and Browning 2016) using the same panels as before as reference. We then discarded the variants with a genotype probability (GP) less than 0.99 and imputed the missing genotype with the -gt command of Beagle 5.0 (Browning et al. 2018) using the HRC as a reference panel for all groups of samples. We then discarded the variants with a GP < 0.99 and used the remaining SNPs to perform the phenotype prediction. Two markers of the HIrisPlex-S set, namely the rs312262906 indel and the rare (MAF=0 in the HRC) rs201326893 SNP, were not analysed because of the difficulties in the imputation of such variants. Results are reported in Data S6.
We performed this analysis on new and previously published ancient samples with a coverage ≥ 0.04x. We then grouped the individuals in different cohorts depending on both time and space.
First, we grouped the ancient individuals from the British Islands in five groups from the Mesolithic to the Early Medieval period and compared them with the modern GBR. We compared the groups with a sample size higher or equal to 5 performing an ANOVA test and, for the significant variants, we performed a Tukey test to identify the significantly different pairs of groups (Data S6B,C). Using the same approach, we also analysed the difference within Iron Age and Roman Britain by creating 8 local groups and comparing them with ancient Roman Italians, discarding the groups with a sample size less than 5. For both comparisons, we used a Bonferroni's correction on an alpha value of 0.01 for the number of tested SNPs to set the significance threshold. We performed our phenotype analysis in a set of 285 ancient individuals, composed of the 43 new individuals reported here for the first time, 231 ancient people from the British Islands and 11 ancient Italians from the literature and already analysed for phenotypes in (Saupe et al. 2021). Sample-by-sample phenotype prediction and genotype at the selected phenotype informative SNPs, reported as the number of effective alleles (0, 1 or 2) are shown in Data S6.

Male-biassed kinship in Prehistoric Britain and Ireland
Information regarding archaeological sites and genomes were pulled from the annotation file of the Allen Ancient DNA Resource (AADR) (Anon) and the supplementary materials of Olalde et al. 2018, Brace et al. 2019, Cassidy et al. 2016, Cassidy et al. 2020, Sanchez-Quinto et al. 2019and Scheib et al. 2019.
For each neolithic site the total number of estimated individuals, the number of genomes retrieved and number of genomes tested for kinship were noted.