Automatic OpenAPI to Bio.tools Conversion

Computation has become a central component of life sciences research. Making computational services FAIR has attracted strong interest from the life sciences community over the past 15 years. Admittedly, uptake of any of the developed solutions has been limited, and the existence of multiple approaches has likely not helped. Interoperability between solutions may therefore be essential. This work introduces an interoperability layer between two approaches for FAIR annotation of web services: OpenAPI and bio.tools.


Introduction
For many years, finding webservices and documenting their functionality has been a key topic in Europe. For example, projects like EMBRACE and BioCatalogue annotated SOAP webservices [Rice2006, Bhagat2010]. Recently, ELIXIR started the bio.tools service [Ison2016], which has a broader scope and is no longer limited to SOAP webservices. With alternatives being proposed, such as the self-documenting XMPP web services [Wagener2009] and SADI [Wilkinson2011], this is welcome. Similarly, REST [Fielding2000] and REST-like webservices are starting to get documented and annotated with OpenAPI (https://www.openapis.org/), previously called Swagger. OpenAPI has the same goal as bio.tools and other annotation solutions, such as BioSchemas [Larcombe2017]: FAIR annotation [Wilkinson2016, Mons2017] of webservices. There is a growing notion that this is needed to move the field forward [Exner2017, Hanwell2017]. Therefore, being able to interconvert specifications is a useful interoperability feature. To implement this interoperability layer, ELIXIR-DK and Maastricht University set up a small project, focusing on converting OpenAPI 2.0 documentation files into bio.tools information. One point of interest here is that bio.tools supports ontological annotation (using the EDAM ontology [Ison2013]), something that is not available in OpenAPI 2.0. This document reports on the results of this project.

OpenAPIs for existing REST services
To demonstrate the potential impact of this work, we manually defined an OpenAPI 2.0 JSON specification file for an existing REST interface, namely that of the Ensembl database [Flicek2013]. The online Ensembl API documentation was used to identify the methods to be made available in the OpenAPI JSON.

Crowdsourcing OpenAPIs for the life sciences
In autumn 2016, Twitter and Biostars were used to invite people to crowdsource information about webservices in the life sciences that offer an OpenAPI specification [Willighagen2016a,Willighagen2016b]. It was decided to record the OpenAPI in Wikidata [Mietchen2015], where existing statement models were available to record "URL"s for databases, and the "instance of" qualifier was used to indicate that the URL represented an OpenAPI endpoint. The following model was selected to record the availability of APIs (alongside an example annotation of a SPARQL endpoint): These can be recovered from Wikidata with the following SPARQL query: This query can be easily repeated using this link. The resulting table shows the database entry in Wikidata, the name of the database, the license of the data (entry and name), and the location of the OpenAPI endpoint. This annotation in Wikidata was set up to avoid having to register these services in bio.tools manually, and to allow the conversion tool to be used for this instead.

JSON for ontology annotation
To overcome the lack of ontology annotation in the OpenAPI JSON, the concept of an external annotation file was introduced, which also uses the JSON format. The EDAM ontology is used for the annotation [Ison2013].
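As an illustration of how such an external annotation file could relate to an OpenAPI 2.0 document, the following minimal Python sketch pairs the parameters declared for one operation with EDAM terms. The layout of the annotation JSON (path, then HTTP method, then `operation`, `parameters` and `output`), the example path, and the function name `edam_terms_for` are assumptions made for this sketch and do not describe the actual file format used in the project.

```python
import json

# Hypothetical OpenAPI 2.0 fragment; path and parameter are illustrative only.
OPENAPI = json.loads("""
{
  "swagger": "2.0",
  "info": {"title": "Example service", "version": "1.0"},
  "paths": {
    "/lookup/id/{id}": {
      "get": {
        "summary": "Look up an entry by identifier",
        "parameters": [
          {"name": "id", "in": "path", "required": true, "type": "string"}
        ],
        "produces": ["application/json"]
      }
    }
  }
}
""")

# Hypothetical external annotation file: path -> method -> EDAM terms.
ANNOTATIONS = json.loads("""
{
  "/lookup/id/{id}": {
    "get": {
      "operation": "edam:operation_2422",
      "parameters": {"id": "edam:data_1033"},
      "output": {"data": "edam:data_0006", "format": "edam:format_3464"}
    }
  }
}
""")


def edam_terms_for(spec, annotations, path, method):
    """Collect the EDAM terms annotated for one operation of the OpenAPI document."""
    operation = spec["paths"][path][method]
    annotation = annotations.get(path, {}).get(method, {})
    parameter_terms = annotation.get("parameters", {})
    return {
        "operation": annotation.get("operation"),
        # Every declared parameter is listed; unannotated ones map to None.
        "inputs": {
            parameter["name"]: parameter_terms.get(parameter["name"])
            for parameter in operation.get("parameters", [])
        },
        "output": annotation.get("output", {}),
    }


terms = edam_terms_for(OPENAPI, ANNOTATIONS, "/lookup/id/{id}", "get")
print(json.dumps(terms, indent=2))
```

Keeping the terms in a separate file leaves the OpenAPI document itself untouched, at the cost of having to keep paths and parameter names in both files in sync.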

The convertor
The Java programming language was used to implement the convertor. The convertor is designed to support the OpenAPI 2.0 specification and has been tested by applying it to a number of OpenAPI 2.0 webservices. The tool can be compiled using the Maven build system.
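The convertor itself is written in Java; the Python sketch below only illustrates the general idea of the conversion and is not the project's code. It walks the operations of an OpenAPI 2.0 document and, for each operation with an annotation, emits a simplified record loosely modelled on a bio.tools function entry. The record fields (`operation`, `input`, `output`, `note`), the annotation layout, the host name, and the function names are assumptions for illustration, not the exact bio.tools schema.

```python
EDAM_PREFIX = "http://edamontology.org/"
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}

# Hypothetical OpenAPI 2.0 fragment and annotation file (illustrative only).
SPEC = {
    "swagger": "2.0",
    "host": "api.example.org",
    "basePath": "/v1",
    "paths": {
        "/lookup/id/{id}": {
            "get": {
                "summary": "Look up an entry by identifier",
                "parameters": [{"name": "id", "in": "path",
                                "required": True, "type": "string"}],
            }
        }
    },
}
ANNOTATIONS = {
    "/lookup/id/{id}": {
        "get": {
            "operation": "edam:operation_2422",
            "parameters": {"id": "edam:data_1033"},
            "output": {"data": "edam:data_0006", "format": "edam:format_3464"},
        }
    }
}


def to_uri(term):
    """Expand a compact 'edam:...' term into a full EDAM URI."""
    prefix, _, local = term.partition(":")
    if prefix != "edam" or not local:
        raise ValueError(f"not an EDAM term: {term!r}")
    return EDAM_PREFIX + local


def convert(spec, annotations):
    """Turn annotated OpenAPI operations into simplified function records."""
    base = spec.get("host", "") + spec.get("basePath", "")
    functions = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, operation in sorted(item.items()):
            if method not in HTTP_METHODS:
                continue  # path items may also hold shared 'parameters' etc.
            annotation = annotations.get(path, {}).get(method)
            if annotation is None:
                continue  # operations without ontology annotation are skipped
            output = annotation.get("output", {})
            functions.append({
                "operation": [{"uri": to_uri(annotation["operation"])}],
                "input": [
                    {"data": {"uri": to_uri(term)}}
                    for _, term in sorted(annotation.get("parameters", {}).items())
                ],
                "output": [{
                    "data": {"uri": to_uri(output["data"])},
                    "format": [{"uri": to_uri(output["format"])}],
                }] if output else [],
                "note": f"{method.upper()} {base}{path}: {operation.get('summary', '')}",
            })
    return functions


for function in convert(SPEC, ANNOTATIONS):
    print(function["note"])
```

Skipping operations that lack an annotation is a design choice of this sketch; a real tool might instead fall back to a general term or report the gap.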

Results
The following sections show the results of our efforts to demonstrate how community OpenAPI service documentation can be used in a semi-automated way to populate bio.tools. We first report on the OpenAPI JSON we created for a key European resource for the life sciences, Ensembl. We then report on a not very successful crowdsourcing of OpenAPI specifications of life sciences-related webservices. This is followed by results showing our annotation with EDAM ontology terms, and the conversion to bio.tools input. We conclude by reporting on new and updated bio.tools entries, enriched with specific API calls.

Wikidata OpenAPI listing
The crowdsourcing of OpenAPIs for the life sciences yielded 14 OpenAPI-based web services, as given in the table below. This table was generated using an appropriate SPARQL query.

EDAM Ontology Annotation
The ten remaining specifications were annotated with EDAM ontology terms. That annotation covers the input parameters, where Identifier (edam:data_0842) and Data (edam:data_0006) are frequently used, but also more specific terms, such as Ensembl Gene ID (edam:data_1033). Because most current OpenAPIs are for databases, rather than webservices, many operations are annotated as Data retrieval (edam:operation_2422). More detail about the nature of the operation is provided in a hover-over: Output is annotated with Data (edam:data_0006) or Identifier (edam:data_0842), depending on the service. The output formats are also annotated, for example with HTML (edam:format_2331), XML (edam:format_2332), and JSON (edam:format_3464). This API only has very general EDAM ontology annotations, though it should be noted that no specific identifier entry is available for some of the supported identifiers, such as the SPLASH [Wolgemuth2016].

Discussion
The first observation is that the number of publicly available OpenAPI specifications in the life sciences is very disappointing. With hundreds of updated and new databases reported in the database issue of Nucleic Acids Research, a meager fourteen webservices is not what we expected. Furthermore, not all of those use the current stable OpenAPI specification, version 2.0, further reducing the number of services for which the convertor works. On the other hand, many more REST and REST-like services exist that can be documented with an OpenAPI specification.
There is plenty of room for improvement. The workflow does not yet take advantage of the API of bio.tools itself. We envision that later versions will inject the service specification JSON directly into the bio.tools entry JSON using this API. A second interesting opportunity is provided by the upcoming OpenAPI 3.0 specification. This new version is more modular, and has the option of specification extensions ( https://github.com/OAI/OpenAPI-Specification/blob/OpenAPI.next/versions/3.0.md ). This approach can be used to integrate the EDAM ontology annotations directly into the OpenAPI JSON, removing the need for a separate JSON file. One point of attention is that input parameters of web service methods may accept different types within the same method. For example, a single method may take multiple types of identifiers as input and determine the type itself. In that case, the input of that method needs to be annotated with either a general Identifier term, or with multiple terms. However, the technical feasibility of that in bio.tools is yet to be explored. Similarly, we still need to explore whether bio.tools allows us to annotate input parameters as optional.
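To make the idea of carrying EDAM terms inside the specification more concrete, the following Python sketch copies annotations into `x-`-prefixed extension fields of an OpenAPI 3.0 fragment, including a parameter annotated with more than one term. The extension names `x-edam-operation` and `x-edam-data`, the annotation layout, and the example path are hypothetical choices for this sketch; no such extension has been defined by this work.

```python
import copy
import json

# Hypothetical OpenAPI 3.0 fragment (illustrative only).
SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "Example service", "version": "1.0"},
    "paths": {
        "/lookup/{id}": {
            "get": {
                "parameters": [
                    {"name": "id", "in": "path", "required": True,
                     "schema": {"type": "string"}}
                ],
                "responses": {"200": {"description": "OK"}},
            }
        }
    },
}

# Hypothetical annotations; 'id' accepts more than one kind of identifier,
# so it carries a specific term and the general Identifier term.
ANNOTATIONS = {
    "/lookup/{id}": {
        "get": {
            "operation": "edam:operation_2422",
            "parameters": {"id": ["edam:data_1033", "edam:data_0842"]},
        }
    }
}


def embed_annotations(spec, annotations):
    """Return a copy of the spec with EDAM terms added as x- extension fields."""
    enriched = copy.deepcopy(spec)  # leave the input document unchanged
    for path, methods in annotations.items():
        for method, annotation in methods.items():
            operation = enriched["paths"][path][method]
            if "operation" in annotation:
                operation["x-edam-operation"] = annotation["operation"]
            terms_by_name = annotation.get("parameters", {})
            for parameter in operation.get("parameters", []):
                terms = terms_by_name.get(parameter["name"])
                if terms is None:
                    continue
                # Always store a list, so one or several terms look the same.
                parameter["x-edam-data"] = (
                    [terms] if isinstance(terms, str) else list(terms)
                )
    return enriched


print(json.dumps(embed_annotations(SPEC, ANNOTATIONS), indent=2))
```

Storing the parameter terms as a list keeps the single-term and multiple-term cases uniform; the existing `required` field of the parameter already records whether an input is optional on the OpenAPI side.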
This discussion leads to many new wishes, and, particularly, these specific ideas:
• Support OpenAPI v3 (currently in draft)
• Develop an OpenAPI v3 extension to support ontology annotation
• Align with smartAPI for ontological annotation [Zaveri2017]
• Align with OpenRiskNet [Exner2017]
• Write outreach material about OpenAPI annotations in the life sciences
Adoption of OpenAPI v3 by existing services is helped by tools that convert OpenAPI v2 configuration files to v3 [Ralphson2017].

Conclusion
The project has shown how existing and new OpenAPI documentation files can be enriched with ontological annotation and automatically converted into bio.tools entry content. Using this approach, we converted nine OpenAPI JSON configuration files and enriched the matching entries in the bio.tools registry. The number of OpenAPI specifications is unexpectedly low, given the advantages this interactive documentation of webservices offers and the plethora of life science databases published yearly. That said, the upcoming OpenAPI specification is even more powerful, and bio.tools has an opportunity to export entries as OpenAPI 3.0 to further promote this piece of the FAIR data and interoperability landscape.