Abstract
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study that focuses on a specific member of that family. Experimentalists often perform this step only superficially or partially, because the most common approaches and tools for pursuing this objective are far from optimal. Using a previously gathered dataset of 284 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3) family, we evaluated the productivity of different databases and search tools, and devised a workflow that experimentalists can use to capture more information in less time. To complement this workflow, we reviewed web-based platforms that allow users to explore the distribution of members of several protein families across sequenced genomes, or to capture gene neighborhood information, and assessed them for versatility, completeness, and ease of use. Recommendations for experimentalists, as well as for educators, are provided and integrated into a customized, publicly accessible Wiki.
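The abstract does not state how the productivity of a database or search tool was scored. One plausible way, assumed here and not necessarily the metric used in the study, is recall: the fraction of a curated reference set (such as the 284 DUF34 references) that each source retrieves. The sketch below also computes the recall of several sources combined, which shows why a workflow that merges complementary sources can capture more references than any single source. The source names and reference identifiers are hypothetical placeholders.

```python
# Minimal sketch, assuming "productivity" = recall against a curated reference set.
# Source names and reference identifiers used with these functions are hypothetical.


def recall(retrieved: set[str], reference: set[str]) -> float:
    """Fraction of the reference set found among the retrieved identifiers."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(retrieved & reference) / len(reference)


def rank_sources(hits: dict[str, set[str]], reference: set[str]) -> list[tuple[str, float]]:
    """Return (source, recall) pairs, sorted from most to least productive."""
    scores = [(name, recall(ids, reference)) for name, ids in hits.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def union_recall(hits: dict[str, set[str]], reference: set[str]) -> float:
    """Recall obtained by pooling the results of every source."""
    combined = set().union(*hits.values())
    return recall(combined, reference)


if __name__ == "__main__":
    # Hypothetical toy data: four reference papers, two search sources.
    reference = {"R1", "R2", "R3", "R4"}
    hits = {"DatabaseA": {"R1", "R2", "X9"}, "DatabaseB": {"R3"}}
    print(rank_sources(hits, reference))  # [('DatabaseA', 0.5), ('DatabaseB', 0.25)]
    print(union_recall(hits, reference))  # 0.75
```

Hits outside the reference set (such as `X9`) do not raise recall; a fuller evaluation would probably also track precision, since irrelevant hits cost the reader time.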
Data summary
The authors confirm that all supporting data, code, and protocols have been provided within the article or through supplementary data files. The complete set of supplementary data sheets may be accessed via FigShare.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
+ APC Microbiome Ireland, Science Foundation Ireland Research Centre, Bioscience Building, School of Microbiology, University College Cork – National University of Ireland, Cork, Ireland
& University of Central Florida, 4000 Central Florida Blvd., Orlando, Florida 32816, USA