Abstract
The French revolutionary Jean-Paul Marat was assassinated in 1793 in his bathtub, where he was trying to find relief from the debilitating skin disease he was suffering from. At the time of his death, Marat was annotating newspapers, which got stained with his blood and were subsequently preserved by his sister. We extracted and sequenced DNA from the blood stain and also from another section of the newspaper, which we used for comparison. Analysis of human DNA sequences supported the heterogeneous ancestry of Marat, with his mother being of French origin and his father born in Sardinia, although bearing more affinities to mainland Italy or Spain. Metagenomic analyses of the non-human reads uncovered the presence of fungal, bacterial and low levels of viral DNA. Relying on the presence/absence of microbial species in the samples, we could confidently rule out several putative infectious agents that had been previously hypothesised as the cause of his condition. Conversely, some of the detected species are uncommon as environmental contaminants and may represent plausible infective agents. Based on all the available evidence, we hypothesize that Marat may have suffered from a primary fungal infection (seborrheic dermatitis), superinfected with bacterial opportunistic pathogens.
Significance The advent of second-generation sequencing technologies allows for the retrieval of ancient genomes from long-dead people and, using non-human sequencing reads, of the pathogens that infected them. In this work we combined both approaches to gain insights into the ancestry and health of the controversial French revolutionary leader and physicist Jean-Paul Marat (1743-1793). Specifically, we investigate the pathogens, which may have been the cause of the debilitating skin condition that was affecting him, by analysing DNA obtained from a paper stained with his blood at the time of his death. This allowed us to confidently rule out several conditions that have been put forward. To our knowledge, this represents the oldest successful retrieval of genetic material from cellulose paper.
Background
Jean-Paul Marat (1743-1793) was a famous French physician, scientist and journalist, best known for his role as Jacobin leader during the French Revolution. Marat’s parents were Giovanni Mara, born in Cagliari, Sardinia, who later added a “t” to his family name to give it a French feel and Louise Cabrol, a French Huguenot from Castres. Marat was stabbed to death in his bathtub by the Girondist’ supporter Charlotte Corday on July 13th, 1793 (Figure 1a). Upon his death, his sister Charlotte Albertine kept two issues of Marat’s newspaper l’Ami du Peuple (nº506 and nº678, published on June 30th, 1791 and August 13th, 1792, respectively), which he was annotating the day of his assassination and that gost stained with his blood (Figure 1b). Albertine gave the issues to the collector François-Nicolas Maurin (1765-1848) in 1837. After his death, as explained by a handwritten note by writer Anatole France dated from October 10th, 1864, the two issues ended up in the possession of baron Carl De Vinck who in 1906 donated them to the Département des Estampes, Bibliothèque National de France, in Paris (see notice in the Catalogue Général: https://catalogue.bnf.fr/ark:/12148/cb40261215w).
Marat’s health during the last years before his assassination is shrouded in mystery. He suffered from a severe itching skin disease from which he found some relief by spending most of his time in a medicinal bathtub over which he placed a board to use as a writing desk. His condition, which he attributed to his stay in the sewers of Paris while hiding from his political enemies, has been the subject of numerous medical debates and has been alternatively attributed to scabies, syphilis, atopic eczema, seborrheic dermatitis or dermatitis herpetiformis (1–5), the latter as a potential manifestation of celiac disease (6). It has been suggested that his condition affected his character and turned it more violent (1).
With the intention of shedding light on these issues, we retrieved two samples from one of the newspapers stained with Marat’s blood, one sample from the blood stain and a second one from a non-stained area in the upper corner of the paper, to be used as a comparison. A principal concern was to use a non-destructive approach to explore Marat’s genomic footprint; therefore, the samples were taken with forensic swabs. The DNA extracted from both samples was used to build genomic libraries that were subjected to second-generation sequencing using the Illumina platform. DNA reads where subsequently classified, separating the human reads – most likely deriving from Marat’s blood – from those assigned to microbial species. The analysis of both sets of DNA sequences allowed characterisation of Marat’s ancestry as well as identification of the potential pathogens responsible for his debilitating skin condition.
Results
Human ancestry analysis
We generated 568,623,176 DNA reads from the blood stain, of which 74,244,610 reads mapped to the human reference genome (Table S1). From these, we retrieved a complete human mitochondrial (mtDNA) genome at a mean depth of coverage of 4.038x and the nuclear genome at 0.029x (Table S1). The predominant mtDNA haplotype was H2a2a1f, although we found evidence of some additional mtDNA sequences, notably a K1a15 haplotype. The ratio of sexual chromosome to autosomal DNA reads indicated that the sample donor was male (Fig. S1).
The human DNA reads showed evidence of post-mortem deamination occurring in 1% of the ends of sequencing reads, indicating authentic ancient DNA damage (Fig. S2). For further analyses we selected only those reads that displayed C to T or G to A substitutions at the 5’ or 3’ end, respectively. After this procedure, the degree of mitochondrial contamination was reduced to 0-0.01%.
To explore the ancestry of Marat in the context of modern European populations, we performed Principal Component Analysis (PCA) (Figure 2a and Fig. S3) and unsupervised clustering in ADMIXTURE (Figure 2b). Our sample projected among modern French individuals sampled from “Southern” France in the population genetic analyses. This result is broadly compatible with the mixed ancestry of Marat, especially if his paternal family was not Sardinian native but descendant of recent Italian or Spanish immigrants, as has been speculated (2). However, mixed ancestries are difficult to discern, especially when only limited genetic data is available.
Metagenomic analysis
We conducted metagenomic species assignments with the 9,788,947 deduplicated, quality controlled and low complexity filtered DNA reads (combined merged and non-merged) that did not map to the human genome (see Methods and Table S2). We used metaMix (7), a Bayesian mixture model framework developed to resolve complex metagenomic mixtures, which classified ∼9% of the non-human reads into 1,328 microbial species (Table S3). The species assignments were replicated with KrakenUniq (8), which led to largely consistent, if less accurate, results (∼7% classified into 3,213 species, Fig. S4, Table S4). Thus, we relied on the metaMix species assignments throughout the paper, unless stated otherwise.
We detected the presence of a wide range of microorganisms, including some expected to develop on decaying cellulose and/or dried blood, but also others recognized as opportunistic human pathogens from the following bacterial genera: Acidovorax, Acinetobacter, Burkholderia, Chryseobacterium, Corynebacterium, Cutibacterium, Micrococcus, Moraxella, Paraburkholderia, Paracoccus, Pseudomonas, Rothia, Staphylococcus, Streptococcus and the fungal genera Aspergillus, Penicillium, Talaromyces and Malassezia as well as HPV (type 179 and type 5) and HHV6B viruses, albeit the latter supported by a very low numbers of reads (Table S3, S4). Some of the DNA reads, notably from Aspergillus glaucus, Cutibacterium acnes, Malassezia restricta and Staphylococcus epidermis showed typical misincorporation patterns that are considered indicative of these sequences being authentically old (Fig. S5).
We additionally sequenced the swab taken from the unstained paper sample. In this case, only 96,252 paired DNA sequencing reads were obtained (56,616 merged, 25,712 non-merged, 36,434 deduplicated combined merged and non-merged), with 52% of the reads that could be classified with metaMix into 64 species and 36% with KrakenUniq into 953 species, respectively (see Methods and Table S3). Although very little DNA could be retrieved from the section of the document that had not been blood-stained, we tried to identify microorganisms that were statistically over-represented in the blood stain relative to the unstained paper. Amongst these and besides, as expected, Homo sapiens, different species of Aspergillus and Acinetobacter were significantly overrepresented in the blood stain (Table S3). It remains questionable however whether the unstained paper represents a suitable negative control given that the newspaper had been extensively manipulated by Marat. Significant over-representation of Aspergillus spp. and Acinetobacter spp. in the blood stain relative to the rest of the document could also be due to the blood providing better conditions for the growth of iron-limited microbes. Indeed, Aspergillus spp. and Acinetobacter spp. are commonly found in the environment but are also grown in blood agar. As such, it is plausible that these represent post-mortem contaminants. Indeed, for Acinetobacter spp. we identified no post-mortem damage pattern.
Metagenomic analysis of historical samples can be challenging as the resulting microbial communities typically comprise an unknown mixture of endogenous species as well as contaminants, both contemporary and modern. As such, we relied on a differential diagnostics approach (Table 1), where we specifically tested for the presence of reads from pathogens that could plausibly have led to Marat’s symptoms, most of which have been previously hypothesised in the literature (1–5).
Based on the absence of any sequencing reads in either the blood stain or the unstained paper we could rule out syphilis, leprosy, scrofula (tuberculosis) and diabetic candidiasis (thrush) (Table 1, Table S3). We additionally tested for scabies, which is caused by burrowing of the mite Sarcoptes scabiei under the skin. Since the metagenomic reference database did not include arthropod genomes, this was tested separately by blasting all the non-human reads against the Sarcoptes scabiei genome (GCA_000828355.1). Again, we detected not a single read matching to Sarcoptes scabiei, which also allowed ruling out scabies out as the cause of Marat’s skin disease (Table 1, Table S3).
Conversely, metaMix recovered 15,926 and 83 filtered DNA reads from the blood stain and the unstained paper respectively, assigned to Malassezia restricta a fungal pathogen causing seborrheic dermatitis, which has been previously hypothesized as one of the most plausible causes for Marat’s condition (1–4). Direct mapping of all reads to M. restricta (GCA_003290485.1) resulted in 19,194 reads from the blood stain dataset mapping over 17.17% of the reference genome. KrakenUniq failed to identify M. restricta, instead assigning 627 reads sequenced from the blood stain to M. sympodialis. However, further analysis of the Malassezia reads based on genome mapping pointed to most (80.3%) being uniquely assigned to M. restricta rather than M. sympodialis (Fig. S6). This allowed us to reconstruct a complete M. restricta mtDNA genome at 0.84X coverage. The Malassezia reads were evenly distributed along the full genome supporting no mixing or misclassification of the species (Fig. S7). We placed our Marat M. restricta genome in phylogenetic context by building a maximum likelihood phylogeny including our historical strain and available present-day mtDNA M. restricta genomes. Although the total number of samples is small, the fact that the M. restricta mtDNA molecule recovered from Marat’s blood is placed basal to modern strains (Fig. S8) and exhibits some post-mortem damage (Fig. S5) further support its authenticity.
We also recovered 587 filtered reads assigned by metaMix to Staphylococcus aureus in the blood stain but none in the reads obtained from the unstained paper. The differential representation in the two samples is not significantly different due to the far lower number of reads in the unstained sample (Table S3). S. aureus is a common human pathogen and the leading cause of atopic eczema. In order to confirm the metagenomic assignments to S. aureus, we mapped the raw microbial reads to a series of reference genomes from various species in the Staphylococcus genus. This allowed us to identify 888 reads mapping against the S. aureus reference genome, out of which 758 uniquely mapped to S. aureus (Fig. S9). The relatively low number of reads may point to a secondary infection by S. aureus rather than S. aureus being the initial cause of Marat’s condition.
The most prevalent microbial species in the blood stain was Cutibacterium acnes (formerly Propionibacterium acnes (9)), which was also present in the unstained paper (Table S3). C. acnes is largely a commensal and part of the normal skin biota present on most healthy adult humans’ skin, including in association with S. epidermis which we also observe in our sample (Table S3, Table S4, Fig. S10) (10). C. acnes is also a frequent contaminant in metagenomic samples (11, 12). However, C. acnes is also involved in severe acneiform eruptions (13) and could have contributed to Marat’s condition. 86,019 reads mapped to the C. acnes reference genome (GCF_000008345.1), yielding an alignment of 3.4X average coverage (Fig. S10) and exhibiting moderate post-mortem damage (Fig. S5).
A phylogeny of Marat C. acnes with a collection of publicly available modern strains (12, 14) places our historic genome on a short branch falling basal to Type I strains, supporting its age and authenticity (Fig. S11). This phylogenetic placement suggests our Marat strain falls into C. acnes phylotype I (C. acnes subsp. acnes) rather than II (C. acnes subsp. defendens). Whilst our Marat strain does not cluster with phylotype Ia, the type more commonly associated with skin surface associated acne vulgaris (15), its position, basal to Type Ib strains could be consistent with soft or deep tissue infections, with associated haemolytic activity (16).
Discussion
Over the last decade, ancient-pathogen genomics has made great progress by borrowing technological advances originally developed for the study of human ancient DNA (17, 18). Although most microbial data has been secondarily generated from the sequencing of ancient human bones or teeth (18–21) other, rare samples, such as preserved tissues (22, 23) or microscope slides from antique medical collections have been analysed (17, 24). We are aware of no previous attempt to leverage ancient DNA technology to diagnose infections in historical characters, despite previous sequencing of remains from other prominent historical figures such as King Richard III and the putative blood of Louis XVI (25, 26).
In this work we analysed both human and ‘off-target’ microbial reads to shed light on an important historical figure of the French Revolution and his skin condition. Due to the loss of Marat’s remains after their removal from the Panthéon in February 1795, the paper stained with his blood likely represents the only available biological material to study both his ancestry and the cause of his skin condition. Our work also represents the first instance where second-generation sequencing techniques have been applied to old cellulose paper.
The presence and relative abundance of different microorganisms in the documents Marat was annotating is affected by their endogenous presence as well as contemporary and modern contamination both for the blood and unstained sample. Some microorganisms present in the samples might reflect skin microbiome signatures. Whilst some other microorganisms represent environmental contaminants and are likely unrelated to Marat’s condition. Nevertheless, we can use classification of metagenomic samples to test some of the proposed diagnoses of Marat’s condition (Table 1). For instance, due to the conspicuous absence of DNA reads from several previously proposed pathogen candidates, we can confidently exclude syphilis, tuberculosis (scrofula), leprosy, diabetic candidiasis or scabies as causative agents.
The presence of Malassezia restricta is intriguing because this fungus is specialized to live on the skin (27). Malassezia has been described in various skin conditions, including dandruff, atopic eczema, folliculitis and seborrheic dermatitis (28, 29). Interestingly, the latter symptoms would fit those described in Marat (5). The M. restricta reads were not statistically significantly overrepresented in the blood’s stain relative to the unstained paper, although they could be expected to be present in both samples if someone heavily infected was holding the newspaper. Though, the damage patterns and phylogenetic placement relative to modern samples of the M. restricta mitochondrial genome assembled from the reads in Marat’s blood suggests they are indeed old (Fig. S5, Fig. S8). Also of interest is the widespread presence of Cutibacterium acnes subsp. acnes, causative of severe acneiform eruptions, which constitutes the top hit in the blood sample and falls basal to phylotype I strains currently in circulation. Staphylococcus aureus, which is frequently detected in cases of atopic dermatitis, is also present in reads obtained from the blood’s stain, although in low number.
Whilst our results do not allow us to reach a definite diagnosis of Marat’s condition, they allowed us to rule out several previous hypotheses and provide suggestive evidence that he could have been suffering from an advanced polymicrobial infection, either primary or secondary to another condition. Future metagenomic analysis of additional documents in Marat’s possession during his assassination could help confirm the microbial composition found in this study and strengthen these observations.
Our work further illustrates the potential of sequencing technologies for the generation of (meta-)genomic information from difficult, singular samples and opens new avenues to address medical hypotheses of major historical interest.
Material and methods
DNA extraction and sequencing
Forensic swabs were obtained from one of the newspapers Marat was annotating at the time of his assassination. One swab was taken from the blood stain and another from an area of the newspaper without visual evidence of blood. The blood swab was extracted with a buffer composed of 10 mM TrisHCl, 10 mM EDTA, 2 mM SDS, 50 mM DTT; proteinase K was added after one hour incubation. The extract was subsequently concentrated and purified using a Qiagen column kit. DNA extraction from both swabs was performed together with extraction blanks (no sample). A total of 35 ul of each sample was used for library preparation following the BEST protocol (30). Libraries were quantified using BioAnalyzer and sequenced by HiSeq 4000 (Illumina). Library blanks were also performed for each library batch.
Mapping and variant calling
Raw sequences adapters were removed using Cutadapt (31). Reads were then aligned against the Human Reference genome (GRCh37/hg19) and the mitochondrial reference genome (rCRS) as well as for a set of microbial candidates using BWA v.07.3 (32) and Bowtie2 (33). Duplicate reads were discarded using Picard tools (34). Unique mapped reads were filtered for a mapping quality equal of above 30. All mapped sequences (human and microbial) were assessed for post-mortem damage patterns at the ends of reads using MapDamage v.2 (35), which can be used as a sign of historic authenticity over modern contamination. Mapping statistics including the depth of coverage were recorded using Qualimap (36). Due to the low coverage of the human sample, we performed a pseudo-haploid calling approach, common to the processing of aDNA, using the SAMtools Pileup tool (37). This data was then merged with the Human Origins dataset for its use in population genetics analyses (38, 39).
Modern DNA Contamination
Schmutzi was used to estimate the amount of modern DNA contamination in the mitochondrial (mtDNA) genome (40) likely deriving from the DNA of those who have handled the newspaper in the years following Marat’s death. This allowed the modern DNA sequences to be delineated from the ancient DNA sequences using Jvarkit and a custom script (41) by selecting the human reads with mismatches in their first or last three nucleotides. To verify that all modern contamination was removed, we ran Schmutzi again with the filtered reads. The depth of coverage was then recorded using Qualimap.
Uniparental Markers and sex Determination analyses
The mtDNA haplogroup was determined using SAMtools pileup tool calling the positions defined in the Phylotree database (42). We used a genome browser (IGV.v2.4.14) to study the genomic context of each possible SNP (43). Only those SNPs that were present in two or more reads, and those which were not located at the ends of the reads, were considered. The contamination was estimated by calculating the ratio of discordant reads at haplogroup-diagnostic positions. Molecular sex was assigned with Ry_compute (44), a script designed for the sex identification of low coverage individuals (Fig. S1).
Population Genetics Analysis
A Principal Component Analysis (PCA) was performed using SmartPCA in EIG v6.0.1 with a subset of modern individuals from the Human Origins dataset (45). This subset contained 434 present-day Europeans and 616,938 autosomal SNPs, plus our sample. The Marat sample was projected using the option lsqproject. As projected individuals’ components tend to 0, we also carried out a control analysis using Han Chinese, French and Marat (Fig. S3). The results were visualised using the R package GGplot2 (46). This dataset confirmed that Marat is not artefactually placed at the centre of the plot.
We additionally ran an unsupervised clustering analysis using ADMIXTURE v1.3 and another subset of the Human Origins dataset (47). This subset included 881 individuals from Europe, West Asia and North Africa typed over 616,938 shared autosomal SNPs. We filtered the dataset by removing SNPs in high linkage disequilibrium using PLINK.v1.9 (48), removing all SNPs with a r2 threshold of 0.4 within a 200 SNP sliding window, advancing by 50 SNPs each time. We performed the clustering analysis using K values ranging from 1 to 10, with 10 replicates for each value of K. We selected K according to the lowest cross-validation error value (K=4). The ADMIXTURE results at K=4 were visualised using the R package pophelper (49).
Metagenomic analysis
We first removed adapters and merged the paired-end reads into longer single-end sequences using AdapterRemoval v2 (50). We removed PCR duplicates with exact sequence identity using dedupe from the BBMap suite of tools (https://sourceforge.net/projects/bbmap/) (51). We subsequently used the default preprocessing pipeline designed for metaMix which consists of removing human and rRNA sequences using bowtie2 followed by megaBLAST, as well as low quality and low complexity reads using prinseq (52) (-lc_method dust -lc_threshold 7 - min_qual_mean 15). We then screened the remaining DNA reads for the presence of possible pathogens using both KrakenUniq (8) against the Kraken database compiled in Lassalle et al 2018 (53) and metaMix (7) using megaBLAST and a local custom database consisting of the RefSeq sequences of bacteria, viruses, parasites, fungi and human, as of July 2019. KrakenUniq was run with default parameters. The metaMix-nucleotide mode was run with the default number of 12 MCMC chains x 10,000 iterations and the default read support parameter of 10 reads was used.
The relative proportion of reads assigned to different species by KrakenUniq and metaMix was highly correlated; R2=0.94 and R2=0.82, for the blood stain and the unstained paper, respectively (Fig. S4). However, metaMix tended to assign a higher number of reads to individual species, closer to the number found by mapping directly to the microbial genomes and we observed important discrepancies for the number of reads assigned to some of the species (Table S4). Additionally, metaMix results for both the blood stain and the unstained paper consisted of fewer species compared to KrakenUniq, even when the same read support threshold was applied to KrakenUniq, indicating increased specificity due to the MCMC exploration of the species space, that comes at an increased computational cost.
In order to compare the accuracy of the two assignment tools, we further explored the presence of clinically relevant species by mapping the quality-filtered subset of reads (Table S4) used in metagenomic assignment against the reference genomes of different candidate genera of fungi and bacteria using bowtie2 (33) and BWA v.07.3 (Fig. S5-S7, Fig. S9-S10). For all reads mapping to individual reference genomes, mapDamage v2 (35) was also run to assess evidence of nucleotide mis-incorporation characteristic of post-mortem damage. These mapping results were systematically supporting the metaMix assignments over those obtained with KrakenUniq (Table S4). This led us to rely on metaMix for all metagenomic assignments presented in the paper.
Besides testing for the presence and absence of species, we tested whether some microorganisms were overrepresented in the blood stain compared to the unstained section of the paper using a one-sided binomial test and a significance threshold of 0.95 (Table S3).
In the case of Malassezia, a phylogenetic analysis of the mitochondrial DNA genome with modern strains was performed. We called variant positions using GATK UnifiedGenotyper (54) and generated a Maximum Likelihood tree using RAxML-NG specifying a GTR substitution model and 100 bootstrap resamples (55). The tree was rooted with M. globosa (Fig. S8).
We also conducted a phylogenetic analysis for C. acnes, combining our historical strain with all C. acnes genomes deposited in the Short Read Archive (SRA) covering the reference at an average depth >10x, and with C. namnetense as an outgroup (SRR9222443). The only C. acnes genomes sequenced at medium to high depth were those reported by Gomes et al 2017 (14). A Maximum Likelihood tree was generated over the 21,751SNP alignment using RAxML-NG (Fig. S11) and clonal complexes and phylotypes were assigned based on the PubMLST C. acnes definitions database (https://pubmlst.org/bigsdb?db=pubmlst_pacnes_seqdef).
Acknowledgments
This work was supported by Obra Social ‘‘La Caixa’’ and Secretaria d’Universitats i Recerca (GRC2017-SGR880) (T.M.-B. and C.L.-F.), BFU2017-86471-P and PGC2018-101927-B-I00 (MINECO/FEDER, UE) (T.M.-B) and PGC2018-095931-B-100 (MINECO/FEDER, UE) (C.L.-F.). T.M-B. is also supported by a U01 MH106874 grant and Howard Hughes International Early Career and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya. S.M is funded by a Welcome Trust post-doctoral fellowship (206478/Z/17/Z). L.v.D and F.B. acknowledge financial support from the Newton Fund UK-China NSFC initiative (grant MR/P007597/1) and the BBSRC (equipment grant BB/R01356X/1).