Introgression, hominin dispersal and megafaunal survival in Late Pleistocene Island Southeast Asia

The hominin fossil record of Island Southeast Asia (ISEA) indicates that at least two endemic ‘super-archaic’ species – Homo luzonensis and H. floresiensis – were present around the time anatomically modern humans (AMH) arrived in the region >50,000 years ago. Contemporary human populations carry signals consistent with interbreeding events with Denisovans in ISEA – a species that is thought to be more closely related to AMH than the super-archaic endemic ISEA hominins. To query this disparity between fossil and genetic evidence, we performed a comprehensive search for super-archaic introgression in >400 modern human genomes. Our results corroborate widespread Denisovan ancestry in ISEA populations but fail to detect any super-archaic admixture signals. By highlighting local megafaunal survival east of the Wallace Line as a potential signature of deep, pre-H. sapiens hominin-faunal interaction, we propose that this understudied region may hold the key to unlocking significant chapters in Denisovan prehistory.


Main Text
Island Southeast Asia (ISEA) hosts a unique fossil record of hominin presence throughout the Pleistocene 1 . Homo erectus has a deep history in the region, from the early Pleistocene until ~108ka 2 , and at least two additional endemic species are known to have survived in ISEA until, or close to, the arrival of anatomically modern humans (AMH) >50 thousand years ago (ka) 3 If H. floresiensis and H. luzonensis do represent super-archaic human relatives, it is possible that they also admixed with AMH populations in ISEA and subsequently this highly divergent genetic ancestry might survive in present-day ISEA populations. Signals of super-archaic admixture have been observed in Altai Denisovans 23 and, potentially, in Andaman populations [24][25][26] , suggesting that additional super-archaic introgression may remain undetected in modern human genomes.

Results
To address these questions and provide further insights into the hominin prehistory of ISEA, we implemented the most comprehensive search for introgressed super-archaic regions in modern human genomes performed to date. We searched a total of 426 genomes from across the world, including 214 individuals from Papuan and ISEA populations 16 (Supplementary Table S1 28,29 . Importantly, HMMArchaic differs from CP and HMM in that it does not require a reference genome to guide the detection of introgressed DNA, making it suitable for identifying DNA from super-archaic groups for which no genome information currently exists. Accordingly, we were able to distinguish putative introgressed super-archaic blocks by running the three detection methods on all 426 genomes and only retaining those that did not overlap any of the Neanderthal and Denisovan blocks predicted by CP and/or HMM. We term the resulting set putative super-archaic sequences as residualArchaic blocks (see Methods).

No evidence for super-archaic introgression in AMH
Filtering the HMMArchaic introgressed blocks overlapping Neanderthal-and Denisovanintrogressed tracts identified ~12.5Mb of residualArchaic sequence per individual ( Figure 1A). The amount of detected residualArchaic sequence was consistent across worldwide populations, with a slightly higher amount found in East ISEA (~15Mb), and Papuan and Australian populations (~18Mb). In accordance with previous results, ISEA, Papuan, and Australian populations also had the largest amounts of Denisovan ancestry (reaching ~60Mb in Papuan and Australian genomes), meaning that these populations actually had the lowest proportion of residualArchaic sequence relative to the total archaic ancestry observed across all analysed populations (Supplementary Figure S1). Our results indicate that super-archaic ancestry could potentially comprise a small but consistent amount of the genomic ancestry of modern human populations outside of Africa; however the current lack of support for demographic scenarios involving widespread super-archaic admixture in previous studies suggests that this global residualArchaic signal is more likely a methodological artefact or a signal, ancient structure in human populations predating the out-of-Africa migration, or segregation of highly divergent AMHderived sequences that were not detected in our African reference samples that result from incomplete lineage sorting or balancing selection 30 .
Similarly, the additional ~2.5 to ~5Mb of residualArchaic sequence observed in Papuan and Australian populations may represent a small but meaningful amount of super-archaic ancestry specific to this region, or instead simply reflect inter-population variation in the power of the statistical methods to detect Denisovan fragments or some other methodological artefact.
To further discriminate if the residualArchaic blocks were truly introgressed super-archaic DNA, we searched for concordant signatures by investigating genetically distinct mutation motifs (i.e. The mutation motifs differed significantly between populations both when considering a linear model (ANOVA p-value 5.79x10 -224 ) but not a multinomial logistic regression (where motifs are not independent as is assumed for the linear model; Figure 1B and Supplementary Figure S2).
However, these differences are extremely subtle and correlate strongly with known archaic ancestry, suggesting a confounding effect ( Figure 1C  While precise accounting for all motif count differences is non-trivial, likely explanations include the misclassification of alleles as either ancestral or derived, complex demographic histories, and the persistence of Neanderthal and Denisovan archaic signals amongst the residualArchaic blocks that were not removed during the filtering step. For instance, the 2.5-5Mb extra residualArchaic sequence observed in Papuans and Australians might have resulted from these populations having substantially more introgression from a Denisovan-like source that is highly divergent from the Altai Denisovan genome 16 . This may result in some of the more diverged blocks being detected by the reference-free HMMArchaic scan, but not in the two methods that rely on reference genomes (i.e. CP and HMM). Indeed, while Denisovan and

Coalescent simulations support empirical observations
To rule out the possibility that the lack of evidence for super-archaic introgression into modern humans was due to a lack of power in our experimental design, we used the coalescent software msprime 31 to simulate Aboriginal Australian and Papuan histories under an empirically-informed demographic model 32 . These simulations included separate Neanderthal and Denisovan admixture events along with differing amounts of super-archaic introgression (2%, 1%, 0.1% and 0%) in the common ancestral population of Australo-Papuans (see Methods). We then applied our full analytical pipeline to these simulated genomic datasets to detect super-archaic blocks and quantified the power and false discovery rate for the different levels of super-archaic introgression.
Our simulation results demonstrate that HMMArchaic can confidently detect super-archaic blocks even in scenarios with extremely low levels of super-archaic ancestry -with true positive rates (TPR) ranging from ~50% to ~95% for models with 0.1% and 2% super-archaic ancestry, respectively ( Figure S9) -while maintaining extremely low false positive rates ( Figure S10).
The amount of residualArchaic sequences detected per individual in the 0.1% and 0% super-archaic introgression models (~20Mb - Figure 2a) is strikingly close to that observed in the Papuan and Australian empirical data (~18Mb - Figure 1A). For these models, the majority of the residualArchaic signal originates from Neanderthal and Denisovan introgression that went Similarly, the mutational motifs observed in 0.1% and 0% super-archaic introgression models provide a closer fit to the empirical data than higher levels of super-archaic introgression. For instance, the [1000] and [0111] mutational motifs comprise ~27% and ~6% on average in the empirical data, compared to ~26% and ~6.5% for the 0.1% model, and ~22.5% and ~4% for the 0% model ( Figure S13). The close fit of the 0% and 0.1% models to our empirical observations provide strong support for there being little to no introgressed super-archaic sequences in non-African human genomes ( Figure 2 and Supplementary Figure S12).

Location of ISEA Denisovan groups and hominin interactions
Despite the late survival of multiple hominin species in ISEA, our results demonstrate that detectable super-archaic ancestry is absent from genomes of present-day non-African human populations. While genetic data can provide a reasonable proxy for inferring historical locations of these AMH-Denisovan encounters 17 , making more robust conclusions is complicated by the absence of any fossil specimens in ISEA currently attributable to Denisovan lineages. Hence, to further explore the likely locations of the AMH-Denisovan contact points in ISEA, we considered the possibility that the current distribution of endemic megafauna on islands in the region may offer clues to the past occurrence of hominins prior to the arrival of H. sapiens.
During the Pleistocene, diverse assemblages of very large vertebrates (megafauna) characterized the terrestrial biotas of most areas of the world, including the Sunda Shelf (the continental extension of mainland Asia, including the large islands of Sumatra, Java, and Borneo), the Sahul Shelf (Australia, Tasmania, and New Guinea), and the islands in between that were never connected by land to either continental shelf during the Pleistocene (Wallacea and the Philippines; Figure 3). Terrestrial animals larger than modern humans that occurred on the Sunda Shelf during the late Pleistocene, when H. erectus occupied the region, include the elephant Elephas maximus, the rhinos Dicerorhinus sumatrensis and Rhinoceros sondaicus, the tapir Acrocodia indica, the wild cattle Bos javanicus, the deer Rusa spp. and Axis spp., pigs Sus, orangutans Pongo spp., and the tiger Panthera tigris, as well as large pythons, Python reticulata.
Though most of these are endangered, all of them survive in the Sunda Shelf as living species today. In contrast, a wide variety of terrestrial animals larger than modern humans occupied the Sahul Shelf during the Late Pleistocene before the arrival of hominins in the region, including diprotodontid marsupial genera (Diprotodon, Hulitherium, Maokopia, and Zygomaturus) and large kangaroo genera (Procoptodon, Simosthenurus, and Sthenurus), along with the marsupial predator Thylacoleo carnifex, the gigantic monitor Varanus priscus, the large snake Wonambi naracoortensis, the land crocodiles Quinkana and Pallimnarchus, and the turtles Meiolania and Ninjemys. All of these species became extinct after human arrival and no native animal weighing > 60 kg occurs in Australia, Tasmania, or New Guinea today.
The disparate extinction histories of Sunda and Sahul may be partly explained by historical distribution of hominin species in these regions. While the long term presence of hominin species across the Sunda Shelf (including islands like Sumatra, Borneo, and Java) 1 could have predisposed the local fauna to hominin impacts, the fauna across the Sahul shelf (including islands like New Guinea and Tasmania) were first exposed to hominins following AMH arrival.
Extending this logic to the islands between Sunda and Sahul -i.e. Wallacea and the Philippines -

Discussion
The lack of any detectable super-archaic introgression in our analyses stands in stark contrast to the strong evidence of Denisovan admixture with the ancestors of present-day ISEA populations [16][17][18][19]41,42 . Based on current paleoanthropological interpretations of H. luzonenesis and H. floresiensis as descendants of super-archaic hominin groups, our results indicate that interbreeding between these groups and AMH did not occur (at least at detectable levels), or that these encounters did not produce viable progeny, or that the offspring were viable but that these lineages have since died out. Evidence for super-archaic introgression into the ancestors of the Altai Denisovans 23 and, possibly, Andamanese populations [24][25][26] , suggests that viable reproduction may actually have been possible, though further evaluation of these hypotheses is not possible given the available data.
An alternative explanation is that H. luzonensis and H. floresiensis belong to a hominin clade that is considerably less divergent from AMH than is currently accepted, possibly being the late-

Methods
Samples. We examined 426 individuals from 10 distinct populations (Table S1), taking advantage of publicly available data from previous genomic studies, and a recent effort to sequence hundreds of Indonesian genomes through the Indonesian Genome Diversity Project (IGDP) 16 . For a description of data preparation (SNP calling, QC, phasing) see Jacobs et al. 16 . be applied to phased data, and hence extract putative introgressing haplotypes rather than unphased regions, allowing for downstream analysis that is more sensitive to the independent histories of homologous chromosomal regions. Hence, the model was trained and implemented on phased data, which was obtained as described in Jacobs et al. 16 . We used a 1,000bp slidingwindow approach, as performed in the original implementation of the method 45 , as the small size of the sliding-windows across the genome allows a fine-scale resolution of even small introgressed fragments where other methods [27][28][29] are likely to fail.
The HMMArchaic method outputs a posterior probability of introgression for each 1,000 bp window along each chromosome copy of each individual sample. These are called either 'Human' or 'Archaic' blocks, with each archaic block having posterior support >0.5; however, as we wish to focus on high-confidence introgressed blocks, we decided to drop archaic blocks with posterior probability support <=0.95. Therefore, the archaic blocks we examined were all regions directly estimated from HMMArchaic with posterior probability >0.95, with no further changes such as merging of the inferred archaic blocks.

Identifying Denisovan and Neanderthal introgressed fragments.
We first sought to detect genomic signals of Neanderthal and Denisovan introgression using the CP 27 and HMM 28,29 introgression-detection methods described in Jacobs et al 16 . These methods use phased data and seek to define haplotype blocks that are introgressed from an evolutionary relative of a sampled archaic genome, by detecting regions with a high density of variants that are shared with the archaic genome but not observed in an African outgroup sample. All parameters and details of the method implementations are given in Jacobs et al 16 Obtaining residualArchaic blocks. We then focused on regions inferred to be introgressed using HMMArchaic 45 , which contain the introgressed fragments from Neanderthals and Denisovans and, potentially, additional introgressed signals not captured by CP or HMM. By subtracting the introgressed regions inferred to be of Neanderthal or Denisovan origin from CP and HMM, we produced a residual HMMArchaic signal (residualArchaic) of blocks not overlapping Neanderthal or Denisovan fragments inferred with the other two methods. Specifically, for overlapping fragments, we subtract the overlapping HMMArchaic-CP/HMM regions, while still retaining the non-overlapping regions (refer to Supplementary Figure S8 for an illustration). This approach is allied to the residual S* calculated in Jacobs et al 16 , but differs in using more accurate phased archaic calls from HMMArchaic and in the detail of the residualArchaic block calling process. Note that identified residualArchaic blocks may be in close proximity to Denisovan or Neanderthal introgressed regions (as is the case in Supplementary Figure S8) and that these blocks are not suitable for some downstream analyses such as introgression time estimation based on introgressed block length, as they may correspond to subparts of larger introgressed blocks. We decided to adopt this strategy as there is potential for super archaic blocks, in case they are present, to segregate close to, or overlap with, Neanderthal and Denisovan fragments, given the potential for non-random segregation of archaic blocks within the genome. While in the current work we do not present the results for an alternative strategy of completely removing Neanderthal and Denisovan blocks to estimate residualArchaic, the findings are qualitatively similar to the ones presented here.
Looking at patterns of variation within residualArchaic blocks. In order to further disentangle the patterns seen in residualArchaic blocks, we looked at mutation-motif patterns. We defined the

Simulating super-archaic introgression using msprime
In order to test the power of our experimental design to detect introgression from a highly A major advantage of using msprime to implement coalescent simulations is that the software allows the genealogy of each portion of simulated sequence to be traced back through time, including the migration of genomic regions between archaic and human populations (i.e. introgression). This means that, for each individual, we are able to know the exact amount and location of the introgressed segments, and are thus able to directly compute the strength of our approaches for detecting super-archaic introgression in the empirical data.

Models of super-archaic introgression
We initially implemented two models of super-archaic introgression: a model containing 2% introgression into the ancestors of Australians occurring at the same time as Denisovan introgression, and a second model without super-archaic introgression (0%). To estimate the power of our analytical framework to detected super-archaic introgression at low levels of admixture, for each simulated individual we created datasets with ~1% and ~0.1% super-archaic introgression by masking a specific proportion of super-archaic blocks in the 2% model.
Specifically, this was achieved by 1) randomly sampling a proportion of introgressed superarchaic regions in each individual; and 2) merging all the regions sampled across all individuals and masking these merged super-archaic regions across all simulated individuals. This strategy ensured that the masked super-archaic regions were the same across all individuals. We were able to reduce the amount of super-archaic ancestry present in the simulated sequences to ~1% and ~0.1% by randomly sampling, per individual, ~10% and ~50% of introgressed super-archaic regions, respectively. Due to the masking of the introgressed regions, the 1% and 0.1% models contained slightly less genetic sequence than the 0% an 1% models (~2.88Gb and ~2.65Gb simulated sequence, respectively); however the masking did not alter the average proportion of introgressed sequences observed from either the Denisovan of Neanderthal lineages (Supplementary Figure S12).

Power to uncover archaic introgression
We evaluated the performance of the analytical pipeline by comparing the results from our empirical data to four models of Australian-super-archaic admixture at different introgression levels (i.e. 2%, 1%, 0.1% and 0%). First, we estimated the power of each of the three detection  Figure S9). Considering that both CP and HMM rely on the availability of a reference sequence for the putatively introgressing archaic population, this observation is consistent with the fact that the simulated introgressing Neanderthal population is genetically closer to the reference Altai Neanderthal than the simulated introgressing Denisovan population is to the reference Altai Denisovan. Nevertheless, both methods seem to perform only slightly better in the absence of super-archaic introgression, presumably because, at least in the case of CP, a very small proportion of inferred Neanderthal and Denisovan introgression derives from super-archaic introgression (see below). HMMArchaic has extremely high power to detect superarchaic segments (Supplementary Figure S9, top left) and, even though power decreases at lower levels of super-archaic introgression, it is always higher than the detection power for Neanderthal or Denisovan introgression across all four models (Supplementary Figure S9).

False positive rate
We next examined the False Positive Rate (FPR) of each method to detect archaic introgression.
For the CP and HMM methods we define FPR as the proportion of sequence misassigned to a particular archaic population when that sequence is either introgressed from another hominin lineage or is from the human genealogy. For HMMArchaic we simply estimated the proportion of In contrast, the amount of Neanderthal and Denisovan detected by both CP and HMM is essentially independent from the amount of super-archaic ancestry present (as expected from the TPRs shown in Supplementary Figure S9). As described above, the masking strategy adopted to reduce the amount of super-archaic in the simulations meant that models 1% and 0.1% contain a reduced amount of introgressed Neanderthal and Denisovan sequence overall (see explanation in Power to uncover archaic introgression).Therefore, we also present a corrected amount of simulated and detected archaic sequences by normalizing the total amounts to match the total amount of sequence considered in the empirical data (Supplementary Figure S12, panel b). This strategy also allowed us to compare the simulations directly to the results obtained for the empirical data, namely in terms of total residualArchaic sequence present. After determining the total detected sequence in each method, we obtained the residualArchaic regions by removing those regions that overlap with either the CP or HMM detected blocks (residualArchaic in Figure 2, overlapping blocks shown as overlapArchaic).

Effects of super-archaic ancestry to detect Neanderthal/Denisovan introgression
An interesting picture emerges when we consider the behaviour of HMMArchaic in the presence of super-archaic introgression. The ability of HMMArchaic to detect Neanderthal and Denisovan introgression is severely depleted at higher levels of super-archaic introgression, which appears to dominate the amount of detected archaic ancestry: less than 25% of truly introgressed Neanderthal and Denisovan sequences were detected when we simulate 2% super-archaic introgression, versus ~40-60% true rates for a model containing 0% super-archaic introgression ( Figure S9, top panel). This pattern is consistent with the power of HMMArchaic being proportionate to the divergence between the introgressing archaic population and the outgroup human population (i.e. Africa). Importantly, we have simulated a super-archaic source whose divergence to modern humans is significantly higher than that of Neanderthals and Denisovans to mimic introgression from H. floresiensis and H. luzonensis, assuming that the latter are earlier diverging lineages of Homo (see Methods). There is a considerably higher agreement between HMMArchaic and both CP and HMM for a model with no super-archaic introgression compared to a model containing even 0.1% super-archaic introgression ( Figure 2). The most important signal for differentiating these scenarios, which have similar total simulated residualArchaic, is the concordance between HMMArchaic and CP/HMM. Specifically, the excess divergence of superarchaic introgressed sequences means these blocks contain a higher amount of non-African variants and, therefore, are more efficiently detected by HMMArchaic. However, this process simultaneously impacts the internal optimisation of HMMArchaic emission parameters, causing the algorithm to seek more divergent introgressed blocks, which reduces the TPR for detecting known Denisovan and Neanderthal blocks. This is consistent with HMMArchaic having a higher TPR for introgressed Neanderthal and Denisovan sequences when no super-archaic introgression is present, which in turn leads to a higher amount of Neanderthal and Denisovan sequence detected by all three methods (Figure 2). This behaviour causes the concordance between methods to drop, and the residualArchaic signal to increase as a proportion of total HMMArchaic, even when simulating minimal amounts of super-archaic introgression. The higher concordance between HMMArchaic, CP, and HMM for the 0% model translates into a 27% proportion of residualArchaic in this model (Figure 2c) -consistent with residualArchaic regions computed in the empirical data (between ~15% in Papuan genomes and ~22% in West Eurasian genomes -Supplementary Figure S1) -and in contrast to ~33% to 60% for models with >=0.1% superarchaic introgression. Importantly, in simulations containing higher proportions of super-archaic ancestry (1% and 2% models), we observe a much higher proportion of residualArchaic sequence.

Investigating mutation motifs within residualArchaic simulated models
In order to further investigate the nature of genetic diversity within residualArchaic regions, we performed similar mutation motif analyses to those used in the empirical data (see above). In particular, we investigated the amount of shared ancestral and derived alleles between individuals carrying the residual sequence (i.e. test population), the simulated Altai Denisovan, the simulated Altai Neanderthal, and a simulated African genome -again, while all African variation was excluded from HMMArchaic analyses, we randomly sampled one individual and investigated allele sharing within residualArchaic regions after running the method.