Biodiversity Research and Innovation in Antarctica and the Southern Ocean

This article examines biodiversity research and innovation in Antarctica and the Southern Ocean based on a review of 150,401 scientific articles and 29,690 patent families for Antarctic species. The paper exploits the growing availability of open access databases, such as the Lens and Microsoft Academic Graph, along with taxonomic data from the Global Biodiversity Information Facility (GBIF) to explore the scientific and patent literature for the Antarctic at scale. The paper identifies the main contours of scientific research in Antarctica before exploring commercially oriented biodiversity research and development in the scientific literature and patent publications. The paper argues that biodiversity is not a free good and must be paid for. Ways forward in debates on commercial research and development in Antarctica can be found through increasing attention to the valuation of ecosystem services, new approaches to natural capital accounting and payment for ecosystem services that would bring the Antarctic, and the Antarctic Treaty System, into the wider fold of work on the economics of biodiversity. Economics based approaches can be criticised for reducing biodiversity to monetary exchange values at the expense of recognition of the wider values of biodiversity and its services. However, approaches grounded in the economics of biodiversity provide a transparent framework for approaching commercial activity in the Antarctic and introducing requirements for investments in the conservation of Antarctic biodiversity by those who seek to profit from it.

[...] and innovation in Antarctica and the Southern Ocean. The article is based on a review of 150,401 scientific articles and 29,690 patent families that make reference to the Antarctic or Southern Ocean in the open access Lens database of scientific and patent literature.

The Antarctic region is an important focus of scientific research in the context of the biodiversity and climate change crisis [1]. The impacts of climate change on terrestrial and marine biodiversity may be both positive and negative, with particular concern emerging over non-native species in terrestrial Antarctica and environmental warming and ocean acidification in the marine environment [1]. Commercial activity in Antarctica includes tourism and the harvesting of marine genetic resources such as Antarctic krill and Antarctic toothfish [2][3][4]. The region has also been a focus for bioprospecting or research [...]

[...] Antarctic Treaty System consists of a set of agreements that aim to ensure that the Antarctic is a "natural reserve, devoted to peace and science" for the benefit of humankind. However, to date, activity under the Antarctic Treaty System with respect to bioprospecting has been limited to information gathering by the Scientific Committee on Antarctic Research (SCAR).

The aim of this article is twofold. First, we improve the evidence base for debates on the governance of research in Antarctica and the Southern Ocean by making datasets of scientific and patent literature and taxonomic data about the Antarctic publicly available through the Open Science Framework. The datasets are intended to contribute to methodological development in areas such as scientometrics and machine learning based approaches to natural language processing [11][12][13][14][15][16]. We argue that further methodological development is desirable, including by data providers, in order to address weaknesses in data coverage and data quality.
Second, we examine the main features of the scientific and patent landscapes for Antarctica and the Southern Ocean with a focus on biodiversity based innovation. The paper argues that efforts to address commercial research and development could usefully be approached in the wider context of the ecosystem services provided by Antarctic biodiversity [17][18][19]. This could be extended to the application of natural capital accounting, presently being incorporated into Systems of National Accounting (SNAs), to the Antarctic [20]. The rise of ecosystem services and natural capital accounting is grounded in increasing recognition within the economics community that biodiversity and the services it provides are not free and must be paid for. If we accept that biodiversity is not a free good and that everyone must, proportionate to their means, pay something, we are able to ask other questions, such as: how much, by whom, in what form and to what ends? This paper does not aim to answer these questions but contributes to the evidence base for deliberation on the opportunities to address issues of fairness, equity and benefit-sharing for biodiversity based research and development in Antarctica and the Southern Ocean.

This paper is a contribution from anthropology and data science that combines analysis of the scientific and patent literature with taxonomic data from the Global Biodiversity Information Facility (GBIF) on Antarctic biodiversity. The method consists of five main steps:

1. Capturing the raw universe of scientific and patent publications making reference to Antarctica [...]
[...] focus on species names and a limited set of common names based on data from the Global Names Index (GNI) and GBIF;
4. Refining the data to focus on scientific literature and patent data containing a verifiable Antarctic species using a cleaned version of Antarctic country code AQ data from GBIF;
5. Text mining the results for Antarctic place names with a particular focus on patent data using data from the SCAR Composite Gazetteer of Antarctica (CGA) and the Geonames database of Antarctica (AQ) country code place names.

The steps above involved a number of elements and issues of interest to the data science community that can be summarised as follows.

Open access databases such as the Lens from Cambia and the Queensland University of Technology make it possible to search for data in multiple languages and, to a more limited degree, to search the full texts of scientific publications and patent documents. Based on a set of experimental tests the following multi-language query was developed to capture the available universe of publications about Antarctica and the Southern Ocean in multiple languages.

In considering the raw data in Table 1 it is important to note two points. First, the analysis in this paper is limited to the 135,150 papers from Microsoft Academic Graph. The reason for this is that the Lens does not directly provide access to affiliation data but it is possible to retrieve this data using the freely available Microsoft Academic Graph database tables. Second, cases where the Antarctic search query only appeared in CORE full texts merit more detailed investigation in future research. Except where they appear in Microsoft Academic Graph these texts are excluded from the quantitative analysis below.

The results of the search include any document that references Antarctica, the Southern Ocean or the South Pole anywhere in metadata (including author affiliations and bibliographic references) or the available full texts from CORE. This will inevitably include sources of objective noise, such as references to the South Pole of Mars or Titan or negations such as "except Antarctica", and subjective noise such as the exploration of the role Antarctica plays in the human imagination in literary or cultural studies that may not be of interest to some readers. A conventional approach to dealing with noise in bibliometrics/scientometrics is to attempt to exclude it at source. However, we adopted a different approach informed by the possibilities of the rise of machine learning approaches to natural language processing and their future application to polar research.

Machine learning based approaches to Natural Language Processing (NLP) involve training models to engage in probabilistic classification of texts and named entity recognition (e.g. place names, species names). At the time of writing popular libraries include keras, fasttext, scikit-learn and spaCy (among others).
The key condition for training models is the availability of preferably large volumes of labelled texts for use in training, testing and evaluating models. Viewed from this perspective, raw data that includes noise that is close to the subject matter (e.g. the South Pole of Titan or "everywhere except Antarctica") is valuable. Rather than excluding noise at source we therefore adopted the approach of leaving the data as is and adding logical TRUE/FALSE columns to the raw data table as labelled filters. The filters are based on text mining of publication metadata (titles, abstracts, author keywords, fields of study and MeSH (Medical Subject Headings) terms). Table 2 displays the filters.

Counts of terms appearing in paper metadata including titles, abstracts, keywords, fields of study and MeSH terms.
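The labelled-filter approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the published datasets: the filter names, the regular expressions and the sample records are all hypothetical, and the real filters were built from larger dictionaries of terms.

```python
import re

# Hypothetical filters: each maps a column name to a pattern searched in metadata.
FILTERS = {
    "mentions_mars_or_titan": re.compile(r"\b(mars|titan)\b", re.IGNORECASE),
    "negation": re.compile(r"\bexcept\s+antarctica\b", re.IGNORECASE),
    "southern_ocean": re.compile(r"\bsouthern\s+ocean\b", re.IGNORECASE),
}

# Metadata fields that are concatenated before matching (an assumption).
METADATA_FIELDS = ("title", "abstract", "keywords")


def label_record(record):
    """Return a copy of the record with one TRUE/FALSE column per filter."""
    text = " ".join(str(record.get(field, "")) for field in METADATA_FIELDS)
    labelled = dict(record)  # the raw record is left untouched
    for name, pattern in FILTERS.items():
        labelled[name] = bool(pattern.search(text))
    return labelled


# Made-up records for illustration only.
records = [
    {"title": "Krill biomass in the Southern Ocean", "abstract": ""},
    {"title": "Ice at the south pole of Mars", "abstract": ""},
    {"title": "A genus found on every continent except Antarctica", "abstract": ""},
]
labelled = [label_record(r) for r in records]
```

Because noisy records are labelled rather than deleted, the same table can serve both as an analysis dataset (by filtering on the columns) and as labelled training material for a classifier.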

The aim of the filters is to allow a user to restrict the data to areas of interest. For example, 'taxonomic name' is a filter for records containing a uninomial or binomial species name while 'antarctic species' refers to species that occur in Antarctica validated in the taxonomic data with an Antarctic location.

In the second step, data from the Lens was federated with Microsoft Academic Graph from Microsoft Academic (January 2019 release). Microsoft Academic Graph is based on data from the Bing search engine and is made available free of charge as a set of data tables that contain over 200 million scientific records. Federation was performed using a Databricks Apache Spark cluster on Microsoft Azure running R in RStudio with the sparklyr and tidyverse packages on the master node [21][22][23][24]. Data federation focused on table joins between the Lens data and the affiliations and authors tables of Microsoft Academic Graph using the shared identifier (the paperid). This yielded an affiliation table with 5,021 identified organisations (affiliationid) and an authors table with 244,778 authors (authorid). One important and known limitation of Microsoft Academic Graph is that the affiliations data is incomplete [11]. Thus, 69,805 of the papers in the dataset were recorded with an affiliation id, corresponding to 52% of the 135,150 papers. However, raw affiliation data is available in the authors [...]

[...] The same application may also be submitted to multiple countries where it will also be republished. This introduces radical multiplier effects into patent counts. [...]

[...] attention is required to improvements in the classification of marine species (e.g. to distinguish between terrestrial aquatic and marine organisms) in later updates of WoRMS when approaching this filter.
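The identifier-based federation of Lens records with the Microsoft Academic Graph affiliations table, and the resulting affiliation coverage figure, can be illustrated with a small standard-library sketch. The federation itself was carried out in R with sparklyr on Spark; the Python functions and toy rows below are hypothetical stand-ins that only show the shape of a left join on `paperid` and how a coverage share is derived from it.

```python
def left_join(papers, affiliations, key="paperid"):
    """Left-join paper rows to affiliation rows on a shared identifier."""
    by_key = {}
    for row in affiliations:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for paper in papers:
        matches = by_key.get(paper[key])
        if matches:
            # One output row per matching affiliation.
            for match in matches:
                joined.append({**paper, "affiliationid": match["affiliationid"]})
        else:
            # Keep papers without affiliation data, marked as missing.
            joined.append({**paper, "affiliationid": None})
    return joined


def coverage(joined, key="paperid"):
    """Share of distinct papers that have at least one affiliation id."""
    all_ids = {row[key] for row in joined}
    with_affiliation = {row[key] for row in joined if row["affiliationid"] is not None}
    return len(with_affiliation) / len(all_ids)


# Toy rows for illustration only.
papers = [{"paperid": 1}, {"paperid": 2}, {"paperid": 3}, {"paperid": 4}]
affiliations = [
    {"paperid": 1, "affiliationid": "A"},
    {"paperid": 1, "affiliationid": "B"},
    {"paperid": 3, "affiliationid": "A"},
]
joined = left_join(papers, affiliations)
share = coverage(joined)

# The figures reported in the text: 69,805 of 135,150 papers.
reported_share = 69_805 / 135_150
```

Counting distinct `paperid` values rather than joined rows matters here, because a paper with several affiliated organisations appears once per organisation after the join.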
The raw results of text mining with dictionaries were passed to the GBIF API using the taxize package from ROpenSci to retrieve the taxonomic hierarchy [30]. One issue when retrieving the taxonomic hierarchy for thousands of species is that a single species name may match to multiple records (e.g. as synonyms or homonyms). However, it is impractical to manually review thousands of results when retrieving data. Fortunately, the return from taxize includes a 'multiple matches' column that identifies these cases. The multiple matches filter is retained in the taxonomic data tables to allow taxonomic specialists to review and, as necessary, refine the data.

Scientific and patent publications that include taxonomic names commonly include multiple names. This is particularly true in patent documents and presents the challenge that a particular organism may or may not occur or have been collected in the Antarctic. GBIF maintains a dataset of occurrence records (observations) with country code AQ that in May 2019 consisted of 2,729,211 occurrence records [31]. However, at that time, over 1 million of the records were recorded at latitude -91 or -90, revealing unlikely and invalid records. To address this, the data was restricted to records containing a text entry for locality, and a second dataset for latitudes south of -60 degrees was generated and combined [32]. To address noisy records a multi-step procedure was adopted. The first step involved removing inaccurate coordinates with the ROpenSci CoordinateCleaner package in R [33]. In the second step, the SCAR Composite Gazetteer of Antarctica (CGA) of 23,833 names was used to text mine the locality field in GBIF data and single occurrence records were manually reviewed in VantagePoint from Search Technology Inc. In the third step, single species occurrence records that lacked locality information were identified.
In the fourth step, a filter was added for occurrences south of -60 degrees latitude as the demarcation point for the Southern Ocean and Antarctica. In the fifth step, a species occurrence count was added based on the observation that low species occurrence records that lack locality information are often noise. In a sixth step, a filter was added for fossil records based on the existing GBIF "basis of record" field. Occurrence records with a validated Antarctic location in the locality field became the basis for the 'antarctic species' filter applied across the dataset. The addition of an 'occurrence count' field allows the species related data to be progressively restricted to those with a validated Antarctic location in an ordered way. [...]

[...] means that the analysis presented in this paper is indicative rather than definitive. Nevertheless, highlighting these limitations presents opportunities to identify ways forward in improving data coverage and data quality to inform decision-making.

Figure 1 displays an overview of the raw dataset for the Antarctic search terms. In Figure 1A we can immediately observe that after a steep increase in the paper count to a peak in 2014 of 7,468 publications the data displays a declining trend. However, in our view this will reflect data availability issues with Microsoft Academic Graph rather than an actual decline in publications referencing Antarctica. The reason for this is that a steep decline from around the same point is observable for non-Antarctic data. [...]

[...] are displayed in red in Figure 1B. The remaining fields, shown in blue, are children of the MAG disciplines. Thus, in Figure 1B [...] including limited labels for taxonomic classification. Overall, this signifies that papers may be divided into very broad fields and may appear multiple times in the rankings at different levels of detail.
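The occurrence-record filters described earlier in this section (locality text, the -60 degree latitude boundary, the fossil flag and the species occurrence count) can be sketched as boolean columns added to each record. This is an illustrative Python sketch under stated assumptions, not the R workflow used in the study: the field names follow GBIF's Darwin Core conventions (`decimalLatitude`, `locality`, `basisOfRecord`), the treatment of latitudes at or beyond -90 as suspect is an assumption based on the text, and the sample records are invented.

```python
from collections import Counter


def add_occurrence_filters(occurrences):
    """Return copies of occurrence records with filter columns added."""
    # Number of occurrence records per species name across the whole table.
    counts = Counter(o["species"] for o in occurrences)
    labelled = []
    for o in occurrences:
        lat = o.get("decimalLatitude")
        locality = (o.get("locality") or "").strip()
        labelled.append({
            **o,
            # Assumption: a missing latitude, or one at or beyond -90, is suspect.
            "suspect_coordinate": lat is None or lat <= -90.0,
            "has_locality": bool(locality),
            # Assumption: -60 itself counts as inside the region.
            "south_of_60": lat is not None and -90.0 < lat <= -60.0,
            "is_fossil": o.get("basisOfRecord") == "FOSSIL_SPECIMEN",
            "occurrence_count": counts[o["species"]],
        })
    return labelled


# Invented records for illustration only.
occurrences = [
    {"species": "Euphausia superba", "decimalLatitude": -64.8,
     "locality": "Anvers Island", "basisOfRecord": "HUMAN_OBSERVATION"},
    {"species": "Euphausia superba", "decimalLatitude": -91.0,
     "locality": "", "basisOfRecord": "HUMAN_OBSERVATION"},
    {"species": "Nothofagus beardmorensis", "decimalLatitude": -85.1,
     "locality": "Beardmore Glacier", "basisOfRecord": "FOSSIL_SPECIMEN"},
]
rows = add_occurrence_filters(occurrences)

# Progressive restriction: located, inside the region, and not a fossil record.
validated = [
    r for r in rows
    if r["has_locality"] and r["south_of_60"] and not r["is_fossil"]
]
```

Keeping the filters as columns rather than deleting rows mirrors the approach taken for the literature data and lets a specialist relax or tighten each criterion independently.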
Figure 1C displays the available data on the number of papers per organisation. The data is counted by aggregating the papers linked to an organisation (which may include multiple authors from the same entity) and then counting the distinct papers. As noted above, it should be emphasised that this data is indicative rather than definitive. As the resolution of affiliation data improves we would expect the numbers and relative positions of organisations in the rankings to change. Nevertheless, the data is indicative of some of the most important organisations conducting research involving the Antarctic in recent decades.

Researchers from 134 countries appeared in the raw publication data relating to the Antarctic. However, rankings are affected by the availability of affiliation data. We can gain an initial idea of the geographic distribution of organisations involved by mapping organisations in the data that also appear in the public domain Global Research Identifier Database (GRID) https://www.grid.ac/. The GRID database forms part of a growing effort to harmonise institutional names for geographic mapping and other purposes. Figure 2 breaks out the full data from Figure 1C and displays a map of available geographic data for organisations publishing research relating to Antarctica and is accompanied by a ranking of countries based on the number of distinct publications of all types linked to Antarctica.

It is worth noting that some countries with organisations with a significant presence in Antarctic research are probably under-represented in the organisation map because their data is distributed across multiple organisations with no available georeference data, notably Russia (with 63 organisations).

Figure 1D ( [...]

The main focus of the present research was on identifying and extracting species level information from research on the Antarctic using text mining. As a starting point, research on species can be divided into two broad categories: a) direct field research involving Antarctic species, and b) indirect or follow on research, including classification and comparative analysis, and the exploration of the properties of organisms.

In total we identified 1,819 binomial species names with recorded occurrences in the scientific literature for the Antarctic. Of these, 1,666 had specific locality information. In the case of some animals such as whales, seals, penguins, and krill, common names, e.g. Blue whale or Adelie penguin, appear more frequently in the literature than their Latin names.
To address this, additional counts were performed for the major groups including both common and taxonomic names and marked in the accompanying data table. Information on a public collection of biodiversity literature for Antarctica and the Southern Ocean is provided in the supplementary material.

[...] krill biomass on predators [63]. Antarctic krill are also a focus of the ecosystem-based fisheries management approach of the [...]

[...] based patent application has become associated with the concept of biopiracy, or misappropriation, of genetic resources from countries and communities for commercial gain without returning benefits to countries, communities or biodiversity conservation. We now turn to the available data on patent activity for biodiversity from the Antarctic.

We identified patent activity referencing Antarctica using the search strategy described above across the full texts of patent documents worldwide. The raw data was reduced to 29,690 applications and then further reduced to 26,120 earliest first filings that form the basis of patent families. We then text mined the documents for any type of species name and reduced the results to those with a verifiable occurrence in Antarctica or the Southern Ocean in the available taxonomic record from GBIF. We identified a total of 3,907 patent applications and 2,738 first filings that contained a verifiable Antarctic species. In total we identified 1,212 species in the patent data of which 354 were verifiable Antarctic species based on locality information in the taxonomic record.

In approaching this data we would note that the data on Antarctic species that formed the basis for the search will inevitably be incomplete. As discussed below, we also note that the appearance of an Antarctic species in a patent document does not necessarily mean that an element of that species is claimed by the applicants. We will begin with an overview of the patent data containing Antarctic species and then progressively narrow the focus before concluding with examples of direct collection of samples in Antarctica.

[...] Candida antarctica (accepted name Moesziomyces antarcticus). This is followed by the ubiquitous E. coli. The presence of widespread species such as E. coli will in our view reflect the use of this organism as a tool in biotechnology rather than specific strains from Antarctica. This will also be true for other widely distributed species that have been recorded in the Antarctic.

One important feature of patent activity is that a species may be mentioned in different sections of a document.
As a general rule, patent documents that mention a species in the title, abstract or claims will in some fundamental sense involve that species in the invention, either as a source for the invention, such as a lipase, or as a target of the invention such as a pathogen. However, the main density of species references is found in the description section. Figure 6 shows the breakdown of species names in the patent data presented in Figure 5 by document section ranked on patent claims.

Figure 6: Antarctic Species in Patent Documents by Section Ranked on Claims

As Figure 6 reveals, the majority of references to a species appear in the description section with the remainder appearing in the claims.

References to species may appear in an application for a number of different reasons:

• As part of the claimed invention (the species is material to the invention);
• As part of experiments leading to the claimed invention;
• As an actual or potential component or ingredient in the invention, including in claims constructed on the genus, family, phylum or higher taxonomic levels;
• Literature citations (see below);
• Passing references (e.g. "in every species except…", or "species x has been used to do y") and long lists (notably for viruses);
• As DNA or amino acid sequences that are either used as comparative reference sequences or claimed.

In practice, determining whether a species is material to a claimed invention requires close attention to and interpretation of the texts. In the discussion below we provide examples of the different reasons that a species may appear in the text. Figure 7 presents an overview of the 2,738 first filings.

[...] Classification subclasses and has been edited for readability. Figure 5B suggests that the Antarctic data is dominated by biotechnology with pharmaceutical or medical preparations, detergents, foods, biocides and cosmetics as the other main product categories.

In terms of the number of first filings the data is clearly led by Novozymes with other companies and research organisations some distance behind. Here we would observe that Novozymes has a long standing policy of including information on the geographic origin of genetic material in patent applications. On balance, the number of filings overall and by organisation is relatively small and subject to significant yearly variation.

In practice, the emerging patent landscape for Antarctica can be divided into six main segments: a) sequence data, b) Candida antarctica, c) Antarctic krill, d) other species recorded in the Antarctic, e) citations of the Antarctic scientific literature, f) references to Antarctic place names as collection sites.
We now address each of these in turn.

[...] leading to a total of 562,789 sequences. This may readily give an impression of significant commercial interest until we recognise that 46% of activity over the period is made up of two filings, rising to 60% of activity across the 9 filings mentioned above. In short, cumulative trends can radically amplify otherwise weak underlying activity.

Digital Sequence Information
It is common practice in patent analytics to focus on documents where a subject of interest appears in the titles, abstracts or claims on the basis that the document will in a fundamental way be 'about' that subject. Figure 8(2) reproduces the approach in Figure 8(1) but restricts the data, after the exclusion of E. coli, to filings where an Antarctic species appears in the titles, abstracts or claims (TAC) of a filing. As the irregularity of this pattern in Figure 8(2A), and the associated spike in Figure 8(2B), serve to highlight, when viewed from this perspective commercial interest in Antarctic species, as reflected in sequence data, can reasonably be described as emergent rather than intense.

A need for caution in approaching sequence data in patent filings is also reflected in the fact that, as Jefferson et al. (2013) have ably demonstrated, sequences may appear in patent data either because they are comparative reference sequences, or because they are claimed [159]. However, disentangling referenced and claimed sequences requires close interpretation of patent claims and represents a weak area in existing methods in patent analytics. Tools such as PatSeq from the Lens are opening up the possibility of greater rigour in the interpretation of sequence data in patent documents.

In our view, cumulative counts of sequences can serve as a useful indicator of growing commercial interest in biodiversity in areas such as the Antarctic but should not be used in isolation from conventional counts. Cumulative counts are particularly useful for amplifying an otherwise weak signal. However, the method should logically only be used in conjunction with other counts in order to avoid giving a misleading impression of intense commercial interest in genetic resources when in practice activity is weak or emergent.
Furthermore, an exclusive focus on sequence data in the case of marine genetic resources has occurred at the expense of recognition that the majority of patent activity for biodiversity and marine biodiversity does not involve sequences [157,160]. Thus, in the case of the Antarctic data presented here the 928 filings containing sequences constitute 34% of the 2,738 first filings containing an Antarctic species. As such, a broader view that accommodates the full spectrum of patent activity for biodiversity is appropriate.
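The caution about cumulative counts can be made concrete with a short Python sketch that compares annual totals, the cumulative curve and the share of all sequences contributed by the largest filings. The per-filing sequence counts below are invented purely to show the effect and do not reproduce the dataset; only the final line uses figures reported in the text (928 of 2,738 first filings).

```python
from itertools import accumulate

# Hypothetical (year, sequences in filing) pairs; two filings dominate.
filings = [
    (2001, 40), (2002, 10), (2003, 5000),
    (2004, 25), (2005, 3000), (2006, 15),
]


def annual_totals(filings):
    """Sum sequence counts per year, returned in year order."""
    totals = {}
    for year, n in filings:
        totals[year] = totals.get(year, 0) + n
    return dict(sorted(totals.items()))


def cumulative(totals):
    """Running total of the annual counts, keyed by year."""
    return dict(zip(totals, accumulate(totals.values())))


def top_share(filings, k):
    """Fraction of all sequences contributed by the k largest filings."""
    counts = sorted((n for _, n in filings), reverse=True)
    return sum(counts[:k]) / sum(counts)


annual = annual_totals(filings)
running = cumulative(annual)
share_of_top_two = top_share(filings, 2)

# Share of first filings with sequences, using the figures given in the text.
sequence_share = 928 / 2_738
```

In this toy series the cumulative curve only ever rises and ends at a large total, even though most years contain a handful of sequences; reporting the top-filing share alongside the curve makes the underlying concentration visible.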

Candida antarctica
As noted above, the type specimen for Candida antarctica (accepted name Moesziomyces antarcticus) was originally collected from sediment in Lake Vanda. Figure 9 displays an overview of filing activity for Candida antarctica.

Figure 9: Overview of First Filings for Candida antarctica
Candida antarctica is a yeast species that is a source of industrially important lipases. A lipase is any enzyme that catalyses the hydrolysis of fats. Patent activity for [...]

[...] C. antarctica illustrates the point that species can be said to enjoy careers inside the patent system. These careers typically start with filings on the discovery of a useful property of an organism, are followed by claims to variants of that property and then expand to the actual or potential use of that element in a wider range of claimed inventions and products. As the uses of an element of an organism become established, research will also typically turn to identifying other useful properties of an organism and the increasing pursuit of alternatives from other sources to compete with those elements. Over time, the bulk of activity relates to the actual or potential use of the elements of an organism in a claimed invention rather than direct claims to elements of the organism. Experience suggests that the careers of many species in the patent system follow this type of pattern and this can also be observed in the case of Antarctic [...]

Comparison between Figure 10 and Figure 11 helps to clarify that a single application may lead to multiple applications and grants around the world. Applicants must pay fees at each stage of the application procedure and, where relevant, maintenance fees for patent grants in each country. Follow on filings therefore reflect the importance of the claimed inventions to the applicants in specific markets. This data also demonstrates that a relatively small number of filings can have a wider global impact as applicants seek to protect and commercialise their claimed inventions in multiple markets. However, while Figure 11 shows a steeply rising trend the numbers are not dramatic relative to activity in the wider patent system.
In the case of Antarctic krill we are witnessing a combination of an increasing number of claims to elements of krill, such as krill oil, and the use of krill as an actual or potential ingredient in a claimed invention (such as a foodstuff, animal feed or cosmetic [...]

[...] form part of a wider thematic set (such as anti-freeze proteins) that indirectly informs the claimed invention. In a third case, an element of an Antarctic species identified in the literature may directly form part of a composition, method or process. Finally, in a small number of cases, Antarctic researchers are both publishing and applying for patent protection for biodiversity components arising from their research. We now briefly explore this data.

The article on Antarctic biodiversity that has received the most patent citations, with over 60 citations, is a review entitled "Developments with Antarctic microorganisms: culture collections, bioactivity screening, taxonomy, PUFA production and cold-adapted enzymes" [183]. Patent applications citing this article have focussed on the production of polyunsaturated fatty acids (PUFAs) from bacterial microorganisms [184][185][186][187] [...]

[...] Inc and pertain to methods for cooling and treating subcutaneous lipid rich cells such as adipose tissue [191,192], and methods for interrupting or resuming treatments [193,194]. This is a second example where the Antarctic literature indirectly informs or inspires a claimed invention because the invention itself is a physical device for cooling tissue.

Patent claims involving biodiversity may be constructed on different taxonomic levels such as species, genus, family and order. In the case of order level claims, a 1974 article "Four new species of thraustochytrium from Antarctic regions…" [195] is referenced in 12 patent documents from 3 patent families filed by Martek Biosciences.
However, the specific reference to Antarctica is limited to comparison with the growth conditions of other Thraustochytrium. Patent documents within the three families include a process for growing Thraustochytrium and a food product which includes Thraustochytrium [196] and processes for growing microorganisms of the order Thraustochytriales [197,198]. The first claim of one filing is for: "A process for culturing a microorganism of the order Thraustochytriales…" in a culture medium to obtain PUFA lipids. In this case it is the process for obtaining the lipids from the organisms that is the focus of the invention rather than biochemical compounds from the organisms per se as in claims for compositions of matter [196].

Examples of patent claims at the genus level are provided in a set of 18 patent applications citing an article defining the genus Nocardiopsis, including Nocardiopsis antarctica [199]. These patent documents include direct claims relating to Nocardiopsis, such as filings by Novozymes in relation to proteases and associated DNA and amino acid sequences, but use species other than N. antarctica such as N. alba [200]. However, these types of application commonly anticipate the use of the same, or substantially similar, sequences from other members of the genus through reference to other species, such as N. antarctica, elsewhere in the application.

As these examples illustrate, patent documents involving biodiversity and the biodiversity literature may inform claimed inventions in a variety of ways and require considerable care in interpretation. We now turn to patent filings that cite the Antarctic literature where an Antarctic species is directly material to the claimed invention. [...]

[...] is able to adapt to cold conditions and high salinity [206] has been cited in 6 patent families (10 documents).
These include the use of Chlorella vulgaris in the production of natural oil for the purpose of manufacturing transportation fuels such as renewable diesel, biodiesel, and renewable jet fuel, as well as oleochemicals such as functional fluids, surfactants, soaps and lubricants [207]. This patent application has been cited by over 17 later filings. Another patent application utilising the species in the production of renewable fuels, which are also useful as feedstocks, also cites this article [208].

Research on alkaloids from the Antarctic sponge Kirkpatrickia varialosa in the mid 1990s has been cited in four patent families containing 10 documents, led by the Spanish pharmaceutical and marine biodiscovery company Pharma Mar and filed from 2000 onwards [209]. The patent families focus on the anti-tumour properties of Variolin and its derivatives [210][211][212][213]. Three of the patent families contain over 30 family members with protection sought in 21 countries, suggesting that the applicants believe that the claimed invention has significant commercial potential. [...]

[...] obtained from its native environment or grown in artificial settings [221]. As this suggests, genetic elements and compounds from Antarctic species may find applications in multiple industry sectors. In total, as highlighted in Figure 5, we identified 26 first filings involving Deschampsia antarctica.

An article examining the antifreeze protein gene from the Antarctic marine diatom Chaetoceros neogracile [222] is cited in a 2014 patent family filed by Samsung Electronics for an "Antifreeze Member". The focus of the claimed invention is the creation of a metal substrate for semiconductors, energy and biosensors that overcomes the problem of frost formation on cooling plates.
A 2017 US patent grant to Samsung claims that this problem can be solved by "a recombinant antifreeze protein in which a metal-binding protein is conjugated to an antifreeze protein derived from Chaetoceros neogracile" [223].

In what appears to be a small number of cases the authors of scientific articles are also applying for patent protection. [...]

[...] a pipeline approach to monitoring Antarctic research by streaming new scientific publications and patent data from database application programming interfaces (APIs), such as the Lens, through a machine learning model for classification, named entity recognition, analysis and distribution to the scientific and policy community. The growing popularity of pipeline approaches to dealing with data at scale reflects the widespread availability of open source libraries for analytics at scale. Implementing such a pipeline would require focused investment by one or more members of the Antarctic Treaty System and would logically be coordinated with SCAR. As this paper helps to demonstrate, this is an achievable goal.

The present research also points to potential ways forward in addressing harder questions around benefit-sharing from commercial research and development involving Antarctic biodiversity. Bioprospecting has been on the agenda of the Antarctic Treaty System for a number of years. However, as far as we are aware, beyond agreement to keep discussing the issue, no consensus has emerged on a need for practical action other than collecting more information to inform deliberations.
This has a certain logic in light of uncertainties about levels of activity and the actual or potential overlap between genetic resources inside the Antarctic Treaty System, those within national jurisdictions and those being considered in debates on the new treaty on marine biodiversity in areas beyond national jurisdiction under the Law of the Sea.

One challenge with the treatment of bioprospecting, or commercial research and development as we prefer, is that it is largely seen in isolation from other activities in Antarctica [...]

[...] should be emphasised that the valuation of ecosystem services is challenging and it is increasingly recognised that there is a risk that such approaches may seek to reduce biodiversity to an equivalent monetary value at the expense of recognition of the multiple values of biodiversity and its services. Nevertheless, despite these reservations, over the short and medium term this approach would place the assessment of activities such as commercial research and development or tourism within a clear and transparent framework that would bring Antarctica into the fold of wider work on the economics of biodiversity.

The year 2020 has been described as a super year for biodiversity. As countries scramble to address the formidable damage caused by Covid-19, it remains to be seen whether this will become a reality. However, one important lesson from the environmental and ecological economics literature is that biodiversity cannot be treated as a free good. The joint biodiversity and climate crisis has its origins in the treatment of the environment as a free good when in fact the costs are deferred elsewhere, including to future generations. When viewed from this perspective, biodiversity is not free but has to be paid for.
At present, as far as we are aware, the revenue generated by biodiversity based innovation from research in the Antarctic does not contribute to the conservation of biodiversity in the Antarctic. 2020 provides an opportunity to rethink the logic that produces this situation by recognising that biodiversity must be paid for. By accepting that biodiversity is not free we are then able to ask other questions focusing on returning tangible benefits to Antarctic biodiversity such as: how much, by whom, in what form, and to what ends? This paper seeks to contribute to the development of the evidence base for addressing these questions.
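
As a purely illustrative sketch of the pipeline approach to monitoring discussed above, the following Python example passes a stream of publication or patent records through a relevance filter and a species tagger. It is not the method used in this paper. The record fields (id, title, abstract), the keyword filter standing in for a trained classifier, and the small gazetteer standing in for a named entity recognition model are all assumptions introduced here for illustration, and no real database API call is shown; in practice the records might come from a service such as the Lens and the gazetteer from GBIF taxonomic data.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List

# Hypothetical gazetteer of species names (all taken from the text above).
SPECIES_GAZETTEER = (
    "Deschampsia antarctica",
    "Chaetoceros neogracile",
    "Chlorella vulgaris",
    "Nocardiopsis antarctica",
)

# Hypothetical keyword list standing in for a trained relevance classifier.
RELEVANCE_TERMS = ("antarctic", "southern ocean")


@dataclass
class AnnotatedRecord:
    """A record that passed the relevance filter, with any species found."""
    record_id: str
    species: List[str] = field(default_factory=list)


def is_relevant(text: str) -> bool:
    """Return True if the text mentions any relevance term (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in RELEVANCE_TERMS)


def find_species(text: str) -> List[str]:
    """Return gazetteer names that occur in the text, in gazetteer order."""
    found = []
    for name in SPECIES_GAZETTEER:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found


def run_pipeline(records: Iterable[Dict[str, str]]) -> Iterator[AnnotatedRecord]:
    """Lazily filter a stream of records and tag the relevant ones."""
    for record in records:
        text = f"{record.get('title', '')} {record.get('abstract', '')}"
        if not is_relevant(text):
            continue  # drop records with no Antarctic or Southern Ocean mention
        yield AnnotatedRecord(record["id"], find_species(text))
```

Because run_pipeline is a generator, records are processed one at a time as they arrive, which is the property that makes a streaming design attractive for ongoing monitoring. The keyword filter and gazetteer could later be replaced by trained models without changing the surrounding structure.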