RT Journal Article
SR Electronic
T1 Geocoding genomic databases using GBIF
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 469650
DO 10.1101/469650
A1 Roderic D. M. Page
YR 2018
UL http://biorxiv.org/content/early/2018/11/14/469650.abstract
AB Many nucleotide sequences in the publicly available genomics databases lack spatial information, such as the latitude and longitude coordinates for the locality where the sample for sequencing was taken. In this note I discuss several approaches to geocoding sequence records. The first method uses the Global Biodiversity Information Facility (GBIF: https://gbif.org) as a gazetteer. The availability of a simple full text search across GBIF data makes it possible to rapidly geocode locality information simply by searching for matching records within GBIF. Hence, if a sequence lacks coordinates but has some locality information, it could be rapidly geocoded. The second method matches voucher specimen codes for sequences with the corresponding specimen records in GBIF, which may be geocoded even if the sequence obtained from that specimen is not. Lastly, there will be cases where sequence records lack either locality or specimen information, but that information is available elsewhere, such as in the published literature or in supplementary data files. The possibility of publishing geocoded sequence records using GitHub is discussed.