Molecular Biology Information Service: An innovative medical library-based bioinformatics support service for biomedical researchers

Biomedical researchers are increasingly reliant on obtaining bioinformatics training in order to conduct their research. Here we present a model that academic institutions may follow to provide such training for their researchers, based on the Molecular Biology Information Service (MBIS) of the Health Sciences Library System, University of Pittsburgh. The MBIS runs a four-facet service with the following goals: (1) identify, procure, and implement commercially-licensed bioinformatics software, (2) teach hands-on workshops using bioinformatics tools to solve research questions, (3) provide in-person and email consultations on software/databases, and (4) maintain a web portal providing overall guidance on the access and use of bioinformatics resources and MBIS-created webtools. This paper describes these facets of MBIS activities from 2006-2018, including outcomes from a survey measuring attitudes of University of Pittsburgh researchers about MBIS service and performance.


Introduction
Recent advances in molecular technologies, such as massively parallel DNA sequencing, microarray platforms, and other high-throughput methodologies, generate substantial amounts of scientific data. Following the successful completion of the Human Genome Project [1,2], initiatives such as the Human Microbiome Project [3,4], the ENCyclopedia of DNA Elements [5], The Cancer Genome Atlas [6], and the 1000 Genomes Project [7] continue to generate a massive catalog of biological datasets. In response to this data deluge, bioinformatics software and databases utilizing computational and statistical methods are evolving rapidly. Thriving in the current data-intensive life sciences research environment requires proficiency in bioinformatics tools, which assist with the formulation of new hypotheses, the design of studies to test these hypotheses, and the analysis, interpretation, and validation of experimental results.
Opportunities for experimental biologists to receive bioinformatics training are often limited. Undergraduate and even graduate curricula do not routinely include mandatory bioinformatics classes [8]. Additionally, the bioinformatics resources landscape changes at a rapid pace; the most sought-after resources can quickly become obsolete [9,10]. It is very challenging for biomedical researchers to self-train and stay current with this moving target.
To help such researchers at the University of Pittsburgh (Pitt), the Health Sciences Library System (HSLS) established the Molecular Biology Information Service (MBIS) in 2002 as an innovative bioinformatics support service. In most biomedical research-intensive institutions, bioinformatics support is typically delivered through departments such as computational biology or biomedical informatics, or facilities such as the sequencing core. Medical libraries traditionally support biomedical research by providing access to journals and books, procuring licenses for electronic resources, providing instructional workshops, and efficiently delivering digital content. As shared-use facilities, libraries can leverage their operational infrastructure to provide a bioinformatics-focused information service by incorporating molecular databases and software into their collections and offering training on the use of these resources.
HSLS was an early pioneer in the implementation of a health sciences library-based bioinformatics support service. The first library to offer a bioinformatics service was the University of Washington Health Sciences Library (UW). The UW service was initiated in 1995 and directed by a PhD scientist [11]. HSLS emulated the UW program in 2002 by hiring a PhD molecular biologist and developing a four-facet service with the following goals: (1) identify, procure, and implement commercially-licensed bioinformatics software, (2) teach hands-on workshops using bioinformatics tools to solve research questions, (3) provide in-person and email consultations on software/databases, and (4) maintain a web portal providing overall guidance on the access and use of bioinformatics resources and MBIS-created webtools.
MBIS has been well received by the Pitt research community and is now in its second decade of service. Program activities recorded from implementation to 2006 were previously described [12,13]. Several other university libraries have also embraced this type of bioinformatics-focused specialized service as a means to connect with their own biomedical research communities [14][15][16]; a 2006 special issue of the Journal of the Medical Library Association describes a few of these programs in detail [17][18][19].
MBIS workshops initially focused on information retrieval and searching strategies for molecular databases, and the original HSLS-licensed software products were intended for low-throughput DNA and protein sequence analysis. However, the attention of the research community has shifted to massively parallel sequencing technologies, also known as Next Generation Sequencing (NGS). Advances in NGS technology make it easier, cheaper, and faster to generate huge volumes of sequencing data. Bioinformatics software is routinely developed to analyze these complex, large datasets: RNA-Seq for gene expression studies [20], Exome-Seq for variant detection [21], and ChIP-Seq [22] and ATAC-Seq [23] for epigenomic experiments. Application of these computational tools is critical for experimental scientists to uncover molecular mechanisms underpinning intricate biological processes and diseases.
Proficiency with NGS software requires appropriate training with sufficient access to leading tools [24]. As Pitt researchers increasingly began to request assistance with NGS data analysis, MBIS refined its approach by developing collaborative partnerships with other university units and expanding services with the addition of new resources and personnel. This paper describes the four facets of MBIS activities from 2006-2018, as well as outcomes from a survey measuring attitudes of Pitt researchers about MBIS service and performance. Our intention is that universities with libraries considering or currently providing a specialized bioinformatics service might glean valuable information from the approaches taken and lessons learned by the Pitt HSLS MBIS program.

University & Library Environment
HSLS thrives in a rich environment of biomedical education, research, and clinical practice at Pitt. As of the 2018 fall term, total undergraduate/graduate student enrollment was more than 28,000, with over 5,000 full- or part-time faculty members and over 700 postdoctoral or research associates [25]. In terms of federal funding, Pitt ranks fifth in competitive grants awarded by the U.S. National Institutes of Health and ninth in federal science and engineering funding, according to a report by the National Science Foundation [26].

MBIS Survey
To evaluate the effectiveness of the MBIS program, an online Qualtrics survey was [...] Table 1 [...], and no one expressed dissatisfaction with current offerings. When asked how satisfied they are with the software registration process, including delivery of access instructions, 64% think the process works smoothly (n = 78), 23% indicate that the registration process is fine, but the access instructions could be improved (n = 28), and 2% express frustration (n = 2). Regarding access issues, 48% report no access problems (n = 58), 31% sometimes have access problems (n = 38), and only 2% often experience access problems (n = 3). When asked whether they reference any MBIS commercially-licensed software tools in their publications, 14% indicate yes (n = 17), 46% report "not yet but will soon" (n = 56), and 40% indicate no (n = 49).
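The reported percentages can be cross-checked against the raw counts. The short sketch below assumes that every question shares one denominator of 122 respondents; that total is inferred from the three answers to the publication question (17 + 56 + 49) and is not stated in the text. The labels in the dictionary are illustrative names, not survey wording.

```python
# Cross-check of the survey percentages quoted above.
# ASSUMPTION: all questions share one denominator, inferred from the
# publication question (17 + 56 + 49 = 122). The text does not state
# the total number of respondents.
TOTAL_RESPONDENTS = 17 + 56 + 49

# label -> (reported count n, reported percentage)
REPORTED = {
    "registration: works smoothly": (78, 64),
    "registration: instructions could improve": (28, 23),
    "registration: frustrated": (2, 2),
    "access: no problems": (58, 48),
    "access: sometimes problems": (38, 31),
    "access: often problems": (3, 2),
    "cited in publications: yes": (17, 14),
    "cited in publications: not yet": (56, 46),
    "cited in publications: no": (49, 40),
}


def percent(count, total=TOTAL_RESPONDENTS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def check(reported):
    """Map each label to True if the recomputed percentage matches."""
    return {label: percent(n) == pct for label, (n, pct) in reported.items()}


if __name__ == "__main__":
    for label, ok in check(REPORTED).items():
        n, pct = REPORTED[label]
        status = "ok" if ok else "MISMATCH"
        print(f"{label}: n = {n}, reported {pct}%, recomputed {percent(n)}% [{status}]")
```

Under this assumption every reported percentage is reproduced. The answers listed for the registration and access questions do not sum to 122, so the remaining responses to those questions are presumably not shown in this excerpt.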
The university-wide licensing of proprietary bioinformatics software by HSLS has proven to be very beneficial. It results in significant cost savings by relieving the financial burden of individual labs to license and manage essential commercial bioinformatics software packages [35].
Software is quite expensive for individual research groups, and vendors routinely release software updates at an additional expense. Bioinformatics software usage by individual research groups is normally a project-based process. In our experience, CLC Genomics Workbench is extensively utilized during transcriptomics data analysis, but then rarely used until a similar project starts in the lab. Since HSLS covers yearly maintenance for all licensed software, users can always access the latest version of the software, free of cost.

In bioinformatics, the recommended practice is to analyze the same dataset using multiple software packages and then compare the results [39]. The addition of these NGS workshops obliged MBIS to discontinue workshops with lower attendance in order to cope with the limited time and classroom space available each semester.

MBIS Bioinformatics Training
As a result, workshops on topics such as DNA/protein information retrieval, sequence similarity searching, and DNA/protein sequence analysis tools are no longer listed in the workshop roster (Figure 1). According to the MBIS survey, researchers do want specialized workshops covering statistics and data science. MBIS and the HSLS Data Services team [40] therefore developed a partnership with a faculty expert from the CRC to teach Python workshops in 2017, and a workshop on using R for genomics begins in 2019.

MBIS Bioinformatics Consultations
Biomedical researchers regularly contact MBIS for assistance. These interactions fall into two general categories: (1) prompt responses to relatively simple email questions, often on licensed software access issues and (2) [...] The analytical workflows for NGS software are complex for many researchers, and rigor and reproducibility in data analysis are a growing concern. MBIS workshops provide a solid overview of the multi-step process, but additional training is often necessary to ensure competence. In

MBIS Web Portal
MBIS maintains an active website [30] which serves as the digital gateway for HSLS bioinformatics services. The portal displays links to workshop guides, class calendar, MBIS blog/newsletter, consultation request form, and more. To provide a thorough overview of the MBIS software collection, we created an infographic embedded with links to guides dedicated to each of the listed tools [41]. These pages provide registration and access information, highlight key software features, and link to supportive webinars, tutorials, user manuals, and other relevant materials.
MBIS and the HSLS digital services team continually collaborate on HSLS-developed online resources such as data mining web tools and domain-specific search engines [42]. For example, a search tool prominently displayed on the MBIS webpage uses a federated search to help researchers identify useful molecular databases/software, experimental protocols, lecture videos, and peer-recommended articles. The search engine, powered by IBM Watson Explorer [43] and licensed by HSLS, clusters the search results into meaningful categories for easy navigation.
In 2015, MBIS released "InfoBoosters," an innovative, easy-to-install web browser widget that integrates digital text with databases to retrieve relevant information on demand [44] [45]. As life sciences research becomes increasingly interdisciplinary, scientific papers read by the research community often include genes, proteins, methodologies, and biological concepts outside of a reader's domain of expertise. In order to thoroughly comprehend such articles, it is necessary to explore any unknown terms. Information is readily available in various molecular databases, but the reading format of journal articles (PDF or Web-based) does not provide links capable of accessing these databases directly from the article. The typical multi-step method to learn more is to (1) leave the article, (2) go to a separate online database, (3) search for the term of interest, (4) identify a reputable knowledge source, (5) scan it for the pertinent information, and then (6) return to the original article to continue reading. InfoBoosters shorten this inefficient process by directly connecting readers to databases such as UniProt and NCBI resources, as well as general information sites such as Wikipedia and Vocabulary.com.
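The core idea of turning a highlighted term into direct database links can be illustrated with a minimal Python sketch. This is not the InfoBoosters implementation, which is a browser widget whose internals are not described here. The function name `build_lookup_urls` and the search-URL patterns are assumptions chosen for illustration and may not match the links the actual widgets use.

```python
from urllib.parse import quote_plus

# ASSUMPTION: illustrative search-URL patterns for public sites.
# They are not taken from the InfoBoosters source and may change.
SEARCH_TEMPLATES = {
    "UniProt": "https://www.uniprot.org/uniprotkb?query={q}",
    "NCBI Gene": "https://www.ncbi.nlm.nih.gov/gene/?term={q}",
    "Wikipedia": "https://en.wikipedia.org/w/index.php?search={q}",
}


def build_lookup_urls(selected_text):
    """Return one search URL per resource for the highlighted term.

    Hypothetical helper: it collapses whitespace in the selection,
    URL-encodes the term, and fills it into each template.
    """
    term = " ".join(selected_text.split())
    if not term:
        raise ValueError("no term selected")
    encoded = quote_plus(term)
    return {name: url.format(q=encoded) for name, url in SEARCH_TEMPLATES.items()}


if __name__ == "__main__":
    # Example: a reader highlights a gene symbol in an article.
    for name, url in build_lookup_urls("TP53").items():
        print(f"{name}: {url}")
```

A widget built along these lines would replace steps (1) through (3) of the manual process with a single click; fetching and summarizing the record, as the pop-up window does, would require additional calls to each resource.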
As an example, by highlighting a gene term in an article and then clicking on the "Protein-Info" InfoBooster, a pop-up window appears displaying protein-centric information for that gene as

Outcome
With the growth of genome sequencing, the use of computational tools in biomedical data analysis has become an integral part of research [35]. However, due to limited time and training opportunities, many in the biomedical experimental research workforce lack the necessary skillset [10]. Thus, some researchers must either collaborate with expert bioinformaticians or outsource data analysis efforts to institutional bioinformatics core facilities. Given the limited availability of skilled bioinformatics professionals, this approach often requires extra time and expense for project completion. To speed up the process, researchers may prefer to perform their own bioinformatics analyses, and consequently require appropriate training.
MBIS fulfills such a bioinformatics training need. Our services are valuable to the Pitt research community thanks to a service-oriented philosophy as a library-based program available to all.
The workshop format of 2-3 hours is particularly appealing compared with a more traditional semester-long for-credit bioinformatics class in that it (1) requires a smaller time commitment and (2) provides no-cost training to all members of a research team, including postdoctoral researchers and lab technicians.

Challenges
The maintenance of a library-based bioinformatics service is not without its obstacles. The primary challenge is keeping up with advancements in the field of bioinformatics; new technologies frequently replace older experimental methodologies, resulting in the perpetual emergence of new data analysis tools. Another considerable challenge is the management of the HSLS-purchased software licenses that have limited concurrent usage access. Users are occasionally denied access to software when the concurrency limit is reached, which is frustrating and results in an increased number of emails to MBIS staff with requests to troubleshoot. It is also demanding to create workshop content appropriate for all audiences, especially attendees with diverse levels of expertise and background knowledge. Condensing a convoluted, multi-step data analysis process, such as RNA-Seq analysis, into a three-hour session takes considerable time and care.

Conclusion
Establishing and maintaining a library-based bioinformatics program with scalable infrastructure and sustainable service requires strong support from research leadership, along with dedicated funding for resources, infrastructure, and personnel. We believe that the [...] We anticipate that the demand for bioinformatics software and training offered through MBIS will continue to increase in response to easy access to the latest NGS and informatics technology provided by the recently established UPMC Genome Center [48]. Additionally, the recent shift of research interest from the characterization of heterogeneous cell populations to the high-resolution study of single cells will also multiply the need for large-scale data analysis [49]. As it has for the past 17 years, the MBIS will continue to adapt its educational outreach, collaborative partnerships, and licensed tool resources as needed to best support the Pitt biomedical research community.
We hope that our experience with the HSLS Molecular Biology Information Service may serve as a guide to other institutions and libraries interested in developing a similar program or expanding upon their current services.