ABSTRACT
Objective ClinicalTrials.gov is a registry of clinical-trial metadata whose use is required by many funding agencies and scientific publishers. Metadata are essential to the reuse of data, but issues such as heterogeneous metadata schemas, inconsistent values, and usage of free text instead of controlled terms pervade many metadata repositories. Our objective is to evaluate the quality of metadata about clinical studies in ClinicalTrials.gov and to document strategies to improve metadata accuracy.
Methods Using 302,091 metadata records, we evaluated whether values adhere to type expectations for Boolean, integer, date, age, and value-set fields, and whether records contain fields required by the Food and Drug Administration. We tested whether values for condition and intervention use terms from biomedical ontologies, and whether values for eligibility criteria follow the recommended format.
Results For simple fields, records contain correctly typed values, but there are anomalies in value-set fields. Contact information, outcome measures, and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontology terms, and almost half of the values for condition are not drawn from MeSH, the recommended vocabulary. Eligibility criteria are stored as unstructured free text.
Conclusions ClinicalTrials.gov’s data-entry system enforces a schema with type restrictions, freeing records from common issues in metadata repositories. However, lack of ontology restrictions or structure for the condition, intervention, and eligibility criteria elements significantly impairs reusability. Searchability of the database depends on infrastructure that maps free-text values to terms from UMLS ontologies.
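
As an illustration of the type-adherence and required-field checks described under Methods, the following is a minimal sketch in Python. It is not the authors' implementation: the accepted Boolean strings, date formats, age pattern, field names, and the helper names (`is_boolean`, `is_date`, `check_record`, and so on) are all assumptions chosen for the example, since the abstract does not specify them.

```python
import re
from datetime import datetime

# Assumed formats for illustration only; the actual rules are not given here.
BOOLEAN_VALUES = {"Yes", "No"}
DATE_FORMATS = ("%B %d, %Y", "%B %Y", "%Y-%m-%d")
AGE_PATTERN = re.compile(r"(\d+) (Year|Month|Week|Day|Hour|Minute)s?")


def is_boolean(value: str) -> bool:
    """True if the value is one of the assumed Boolean strings."""
    return value in BOOLEAN_VALUES


def is_integer(value: str) -> bool:
    """True if the value consists only of digits."""
    return re.fullmatch(r"\d+", value) is not None


def is_date(value: str) -> bool:
    """True if the value parses under any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def is_age(value: str) -> bool:
    """True for 'N/A' or a number followed by an assumed time unit."""
    return value == "N/A" or AGE_PATTERN.fullmatch(value) is not None


def check_record(record: dict, schema: dict, required=()) -> dict:
    """Return {field: problem} for missing required fields and mistyped values."""
    problems = {}
    for field in required:
        if field not in record:
            problems[field] = "missing"
    for field, check in schema.items():
        if field in record and not check(record[field]):
            problems[field] = "bad type"
    return problems


# Hypothetical schema and record; field names are invented for the example.
schema = {"has_dmc": is_boolean, "enrollment": is_integer,
          "start_date": is_date, "minimum_age": is_age}
record = {"has_dmc": "Yes", "enrollment": "12a", "minimum_age": "18 Years"}
problems = check_record(record, schema, required=("start_date",))
```

Value-set fields could be handled the same way by adding a checker that tests membership in a set of permitted strings. Checks against ontology terms (for condition and intervention) would need a term lookup that is outside the scope of this sketch.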