RT Journal Article
SR Electronic
T1 Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 117812
DO 10.1101/117812
A1 McMurry, Julie A
A1 Juty, Nick
A1 Blomberg, Niklas
A1 Burdett, Tony
A1 Conlin, Tom
A1 Conte, Nathalie
A1 Courtot, Mélanie
A1 Deck, John
A1 Dumontier, Michel
A1 Fellows, Donal K
A1 Gonzalez-Beltran, Alejandra
A1 Gormanns, Philipp
A1 Grethe, Jeffrey
A1 Hastings, Janna
A1 Hermjakob, Henning
A1 Hériché, Jean-Karim
A1 Ison, Jon C
A1 Jimenez, Rafael C
A1 Jupp, Simon
A1 Kunze, John
A1 Laibe, Camille
A1 Le Novère, Nicolas
A1 Malone, James
A1 Martin, Maria Jesus
A1 McEntyre, Johanna R
A1 Morris, Chris
A1 Muilu, Juha
A1 Müller, Wolfgang
A1 Rocca-Serra, Philippe
A1 Sansone, Susanna-Assunta
A1 Sariyar, Murat
A1 Snoep, Jacky L
A1 Stanford, Natalie J
A1 Soiland-Reyes, Stian
A1 Swainston, Neil
A1 Washington, Nicole
A1 Williams, Alan R
A1 Wimalaratne, Sarala
A1 Winfree, Lilly
A1 Wolstencroft, Katherine
A1 Goble, Carole
A1 Mungall, Christopher J
A1 Haendel, Melissa A
A1 Parkinson, Helen
YR 2017
UL http://biorxiv.org/content/early/2017/03/20/117812.abstract
AB In many disciplines, data is highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline ten lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration.
Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers; we also outline important considerations for those referencing identifiers in various circumstances, including authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.