Characteristics and data reporting of rare disease clinical trials: Getting better but still room for improvement

Background It is estimated that there are more than 7,000 rare diseases (RDs) worldwide, impacting the lives of approximately 400 million people and only 5% have an approved therapy. Facing special challenges, including patient scarceness, incomplete knowledge of the natural history and only few specialized clinical sites, clinical trials (CT) are limited, making the data from trials critical for research and clinical care. Despite the introduction of the U.S. Food and Drug Administration Amendment Act (FDAAA) in 2007 requiring certain CTs to post results on the registry ClinicalTrials.gov within 12 months following completion, compliance has been reportedly poor. Here, we describe general characteristics of RD CTs, identify trends, and evaluate result reporting practices under the FDAAA aiming to draw awareness to the problem of non-compliance. Methods CTs conducted between 2008 and 2015 were extracted from the public U.S. trial registry ClinicalTrials.gov using the text mining software I2E (Linguamatics). Disease names were matched with rare disease names from the Orphanet Rare Disease Ontology (ORDO, v2.5, Orphanet). Statistical analyses and data visualization were performed using GraphPad Prism 7 and R (v3.5). The Student’s t-test was employed to calculate significance using p-value cut-offs of <0.05 or <0.001. Results We analyzed 1,056 RD CTs of which 55.7% were phase 2, 7.7% phase 2/3 and 36.7% phase 3 trials. The studies were mostly one- and two-armed experimental CTs with the majority (60.2%) being funded by industry. Cystic fibrosis and sickle cell disease represented the most frequently investigated diseases (25.0% and 16.5%). Industry-led phase 2 RD CTs were significantly (p<0.0001) shorter than their equivalent led by academia/non-profit (22 vs. 33 months). Screening CTs completed before the end of 2015, we found that of the 725 analyzed studies, 55.2% predominantly phase 2 CTs, did not report results. 
Taking their potential applicability to the FDAAA into account, 25.2% of industry-funded and 28.0% of academia/non-profit-funded trials failed to disclose results on ClinicalTrials.gov. Conclusion RD CTs tend to be comparatively small, industry-funded studies focusing on genetic and neurologic conditions. Sponsor-related differences in study design, duration, and enrollment were observed. There are still substantial shortcomings when it comes to result publication.


Disease categories
To generate a more comprehensive view of the RD CT landscape, we analyzed the disease areas addressed in this dataset. By leveraging the Orphanet Diseases 2.5 ontology, we were able to map all but one trial to specific conditions. Each CT could be allocated to up to seven diseases and each disease could be assigned to more than one category. The ten most common rare disease categories were genetic disease (19.4%), neurologic disease (12.3%), respiratory disease (8.2%), eye disease (6.4%), systemic or rheumatologic disease (5.9%), hematologic disease (5.4%), renal disease (5.0%), infertility (4.7%), skin disease (4.4%) and developmental defect (4.1%). These accounted for 75.8% of all diseases. In fact, 92.9% (n=971) of the RD CTs were mapped to at least one of the top ten categories. In detail, 35.8% (n=378) of the studies were mapped to one top-ten category, 21.8% (n=230) were mapped to two and 20.5% (n=216) to three.
Furthermore, we found that the top ten diseases account for 86.2% of all CTs, with cystic fibrosis (CF) accounting for 25.0%, followed by sickle cell disease (16.5%), hemophilia (8.7%), Fabry disease (6.3%), fragile X syndrome (6.1%), Huntington disease (5.7%), sarcoidosis (4.9%), systemic sclerosis (4.9%), Friedreich ataxia (4.3%) and muscular dystrophies (primarily Duchenne, but also Becker and oculopharyngeal muscular dystrophy) (3.9%). To determine whether the ten most investigated diseases also reflect the leading ClinicalTrials.gov investigated cancers and other neoplasms (13.8%), followed by general pathology (10.5%), nervous system diseases (9.2%), digestive system diseases (6.7%), heart and blood diseases (6.6%) and behaviors and mental disorders (6.5%). Thus, RD CTs represent a different spectrum of disease areas as compared to more common diseases. In summary, the majority of the investigated diseases in our dataset are rare genetic and neurologic conditions, with CF and sickle cell disease being the most studied diseases.
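The counting behind statements such as "mapped to at least one of the top ten categories" can be illustrated with a minimal Python sketch. The trial identifiers and category assignments below are hypothetical placeholders, not data from the study, and the helper names (`top_categories`, `hits_per_trial`) are introduced here only for illustration; the authors' actual pipeline used I2E and ORDO and is not reproduced.

```python
from collections import Counter

# Hypothetical trial -> disease-category mapping (illustrative only).
trial_categories = {
    "trial-A": {"genetic disease", "respiratory disease"},
    "trial-B": {"neurologic disease"},
    "trial-C": {"genetic disease", "neurologic disease", "eye disease"},
    "trial-D": {"cardiac disease"},
}

def top_categories(mapping, k):
    """Return the k categories assigned to the largest number of trials."""
    counts = Counter(cat for cats in mapping.values() for cat in cats)
    return {cat for cat, _ in counts.most_common(k)}

def hits_per_trial(mapping, top):
    """For each trial, count how many of its categories fall in `top`."""
    return {trial: len(cats & top) for trial, cats in mapping.items()}

top = top_categories(trial_categories, k=2)
hits = hits_per_trial(trial_categories, top)

# Distribution of trials by number of top-k categories they map to.
distribution = Counter(hits.values())
share_at_least_one = sum(1 for h in hits.values() if h >= 1) / len(hits)
```

Because a trial can map to several categories, the per-category percentages and the per-trial shares are computed over different denominators, which is why they do not need to add up to the same total.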

Intervention type
Next, intervention types across study phases together with sponsors were analyzed (

Study location
The selection of investigational sites can be pivotal for enrollment and overall study success, and their number varies considerably depending on the sponsor. Out of the 1,056 trials in our dataset, 632 (59.8%) entered at least one study site and, analyzing both sponsor categories separately, no significant difference was found. Of the 635 CTs run by industry, 58.3% entered a study location, compared with 62.1% of the 420 CTs run by academia/non-profit. One study could not be allocated to a sponsor and was therefore excluded, leaving 1,055 trials for analysis. Although CTs from all around the world can be registered in ClinicalTrials.gov, we found that 44.3% (n=4,431) of the study sites were located in the U.S. Here, the sum of counts by location does not equal the total CT number, as each location indicated by a study is counted and a single study may be counted more than once. Analyzing the number of sites by continent, the majority of sites are located in North America (48.5%, n=4,845) and Europe (33.9%, n=3,387), followed by Asia (11.4%, n=1,141), Central and South America (2.9%, n=286), Australia and Oceania (2.7%, n=269) and Africa (0.7%, n=72). The countries with the most study sites were the U.S. with 4,431 sites, Germany with 621 sites, France with 566 sites, Japan with 429 sites, Canada with 414 sites and the UK with 395 sites, as well as Italy and Spain, with 326 and 251 sites, respectively. Approximately one third of RD CTs were run at a single site (34.7%, n=219), followed by 14.9% with up to five sites, 12.5% with up to ten sites and 9.7% with up to 15 sites. 4.6% and 5.4% of the studies had up to 15 and 25 study sites, respectively, and 1.6% indicated more than 100. The remaining 16.8% were scattered between 25 and 100 study sites. Moreover, industry-led trials significantly increased the number of study sites with ascending study phase. 
Accordingly, the mean location number for industry in phases 2, 2/3 and 3 was 15.0 (median of 7.5), 23.0 (median of 18.0) and 32.7 (median of 23.5), while for academia the location numbers per phase were 3.1 (median of 1.0), 4.6 (median of 1.0) and 9.6 (median of 1.0). Compared to the entire trial corpus present on ClinicalTrials.gov, where 40% of the CTs indicated trial sites in the U.S. (35% U.S. only and 5% U.S. and non-U.S.), we found that RD CTs tend to be more often conducted in the academia/non-profit CTs. This enrollment difference between sponsor types was significant (p<0.05) in phases 2 and 2/3. The mean enrollment was approximately 96% (industry) and 83% (academia/non-profit) higher than the median. Of the 73,071 participants enrolled in industry-funded trials, 59.4% had been recruited for phase 3 CTs, followed by 33.6% for phase 2 CTs, and 6.9% for phase 2/3 CTs. Academia/non-profit enrolled 44.0% of 24,959 participants in phase 3 CTs, 40.7% in phase 2 CTs, and 15.3% in phase 2/3 CTs. In analyzing gender eligibility of RD CTs, we found that the majority of the studies, 89.3%, admitted participants of any gender, whereas 7.1% and 3.6% of RD CTs enrolled only male and female subjects, respectively. In sum, industry-led trials consistently enrolled more patients than academia/non-profit-funded trials, with notable variability as reflected by the standard deviation in enrollment.
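As a quick arithmetic check of the site shares by continent reported in this section, the following Python sketch recomputes the percentages from the stated site counts. It assumes half-up rounding to one decimal place, which is an assumption made here and not something the text specifies; `share` is a helper name introduced for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Site counts by continent as reported in the text.
sites = {
    "North America": 4845,
    "Europe": 3387,
    "Asia": 1141,
    "Central and South America": 286,
    "Australia and Oceania": 269,
    "Africa": 72,
}

def share(count, total):
    """Percentage of total, rounded half-up to one decimal place (assumed rule)."""
    pct = Decimal(count * 100) / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

total_sites = sum(sites.values())
shares = {name: share(n, total_sites) for name, n in sites.items()}

# U.S. sites are a subset of the North American sites.
us_share = share(4431, total_sites)
```

The denominator here is the number of study sites, not the number of trials, which matches the note in the text that a single study may be counted more than once.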

Participant Age
An RD can affect anyone irrespective of age, but at least half of RDs manifest themselves in early childhood. To determine whether this is reflected in RD CTs, the participant age was assessed. To this end, the entries for the ClinicalTrials.gov categories "participant age", "participant minimum age" and "participant maximum age" were retrieved (Table 3).
of the studies indicated an upper age limit for participants. Of those, 41.6% indicated an extended age of eligibility above 65 years and a notable fraction (15.2%) of trials set the maximum age between 11 and 18 years. While extracting the minimum and maximum age limits was straightforward, the analysis of the participant age, filled out by 96.8% of the studies, proved complicated due to the lack of uniformity of the information entered. After manual categorization into age groups, we were able to visualize the data (Fig. 2). Even though around three quarters of the studies set the upper age limit to more than 60 years, numerous studies put their focus on a younger population, mirrored by the high proportion of studies indicating a minimum participant age below 18.
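To illustrate how minimum and maximum age entries might be normalized before grouping, here is a minimal Python sketch. It assumes that age limits are entered as strings of the form "18 Years" or "6 Months" and that a missing limit appears as "N/A"; both the string format and the bin boundaries are assumptions for illustration, not the authors' categorization scheme, which was performed manually.

```python
import re

# Number of each unit per year (approximate for weeks and days).
UNITS_PER_YEAR = {"year": 1, "month": 12, "week": 52, "day": 365}

def age_in_years(text):
    """Convert e.g. '18 Years' or '6 Months' to years; None if no usable limit."""
    if text is None:
        return None
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match:
        return None  # e.g. 'N/A' or free text that needs manual review
    unit = match.group(2).lower().rstrip("s")
    per_year = UNITS_PER_YEAR.get(unit)
    if per_year is None:
        return None
    return int(match.group(1)) / per_year

def max_age_group(max_age):
    """Assign a maximum age (in years) to a hypothetical age group."""
    if max_age is None:
        return "no upper limit"
    if max_age <= 10:
        return "up to 10 years"
    if max_age <= 18:
        return "11 to 18 years"
    if max_age <= 65:
        return "19 to 65 years"
    return "above 65 years"
```

Entries that do not parse are returned as `None` rather than guessed, so that they can be routed to manual categorization.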

Study duration
Another aspect of CTs is study duration, as calculated in this analysis from the study start to the study completion date. We found that phase 2 studies conducted by industry were, with a median duration of 22 months (IQR, 14-33), significantly (p<0.0001) shorter than their academia/non-profit-led equivalent, with a median duration of 33 months (IQR, 22-47) (Fig. 3). This trend could be similarly observed in phase 3 CTs, with industry conducting shorter studies with a median of 27 months (IQR, 18-41) versus 34.5 months (IQR, 19-52) for academia/non-profit CTs (p<0.05). In contrast, the study duration did not differ significantly in phase 2/3 CTs between industry and academia/non-profit, with a median of 29 (IQR, 21-37) and 31 (IQR, 18-48) months, respectively. In summary, industry-funded phase 2 and 3 RD CTs are of a shorter duration than studies led by academia/non-profit.
investigation has been manufactured in the U.S. or its territories, or if the CT has at least one study site in the U.S. or its territories. To examine result reporting, a second dataset (n=725 CTs) consisting of 55.9% phase 2, 7.6% phase 2/3, and 36.6% phase 3 CTs was generated (Supplementary table 1). Overall, 44.8% (n=325) of trials had entered results while 55.2% (n=400) had not. Of those that failed to report results, 65.8% were phase 2, 9.8% were phase 2/3, and 24.5% were phase 3 CTs. Industry reported consistently more results over all
researched disease area, followed by neoplasms and general pathologies. In contrast, digestive system diseases, heart and blood diseases, and behaviors and mental disorders were highly represented in ClinicalTrials.gov, but not in our RD dataset. Therefore, although rare diseases are numerous, the vast majority of clinical trials focus on a few disease categories and conditions, such as CF. 
Furthermore, we found Huntington disease (HD) representing the top disease in rare eye disorders, followed by uveitis. Although HD is not primarily an eye disorder, a possible explanation for this finding could be a common endpoint in this disease (Unified Huntington's Disease Rating Scale), which includes ocular assessments and may have caused its misclassification as an eye disorder. This example highlights the difficulties that can occur when analyzing data from ClinicalTrials.gov that have not been curated or reviewed critically.
Africa, respectively. However, 40.2% of CTs did not report a location, biasing the data and its interpretation. In accordance with a previous report analyzing the International Clinical Trial Registry Platform (ICTRP), we found Japan to be the country with the most CT sites in Asia [23]. While the mentioned report attributes the shift away from "traditional" CT sites in Western countries to lower study costs in other countries, the study location for RD CTs could be more influenced by patient prevalence, the availability of specialized care centers and specific legislation fostering RD research [24-25].
Considering the general scarceness of patients and experts/specialized care centers in RDs, it is not surprising that half of the trials indicate at most five sites. In contrast, almost two thirds of the trials indicated more than one study site, pointing towards the implementation of more, but smaller study sites in RD research. Interventional RD CTs that were previously found to have fewer study sites compared to non-RDs might provide support for this hypothesis [26]. The correlation we found between industry-funded CTs and an increased number of study sites could be linked to the different financial support received compared to academia/non-profit. 
Although phase 2 CTs prevail in our dataset, most participants were enrolled in phase 3 CTs, with the majority enrolled in industry-sponsored CTs. A study analyzing common disease CTs of all phases reported an overall enrollment rate below 100 participants for 62% of studies, which was the same percentage we observed in phase 3 CTs in ClinicalTrials.gov.
Examining phase 2 and 2/3 RD CTs, the proportion of studies enrolling 100 or fewer patients rose to 87.8% and 72.8%, respectively. Comparing our dataset to the aforementioned study based on the respective median enrollment, we found comparable accrual numbers between RD CTs overall and the oncology trials described therein, whilst mental health and cardiovascular CTs were higher at 85 and 100, respectively. It is somewhat surprising that the numbers of patients in RD CTs and non-RD CTs are similar, as we expected RD CTs to enroll fewer patients, as previously indicated in a study comparing rare with non-rare disease CTs [26]. While that study describes mostly early phase CTs with fewer participants, most of the patients in our dataset were enrolled in phase 3 CTs. Additionally, study sponsors might seek to avoid the limited power of small studies to show significant clinical efficacy, resulting in higher accrual numbers. Reasons for lower participant accrual in non-RD CTs could be competing trials targeting similar participant populations or even an increased focus on small sub-populations within a larger disease area. This indicates that CTs in RDs and more common diseases may be becoming more similar and both may have issues with recruitment of sufficient numbers of patients to draw firm conclusions. Notably, we observed great heterogeneity between trials enrolling few patients, sometimes in single digits, and trials enrolling hundreds or thousands, a phenomenon that has also been previously reported [17].
Gender disparity in biomedical research, which may introduce a bias in clinical findings and therefore may lead to a disadvantage in clinical practice for women, is recognized and has been described before [17,27,28,29]. 
Although we did not observe a significant gender bias in RD CTs, we found that out of all studies recruiting only patients of a specific gender, twice as many recruited only male patients as only female patients, which could be partially attributed to the aforementioned historic and general underrepresentation of women in CTs.
The age of eligibility for study subjects is crucial in many diseases, often influencing the overall study success. The majority of RDs are thought to be genetic disorders, which present already in early childhood or adolescence, thereby necessitating an early therapy start [30-32]. Consequently, RD CTs had a wide age range with the majority including pediatric patients, unlike in more common diseases that can occur throughout life [17,26].
CT participation is often associated with great efforts by (pediatric) patients and their family members, thus the study duration can have a major impact on participant compliance and retention in the study. We found a direct correlation between the study duration and study
apply to the FDAAA, industry funding is shown to be associated with a higher degree of non-reporting in RD CTs. However, including potentially applicable CTs, academia/non-profit and industry are on a par. This highlights the need for complete information in order for external parties to be able to draw transparent conclusions. Generally, industry trials in RDs were associated with better overall result posting even when not required by the FDAAA, potentially due to more strictly regulated and overseen follow-up processes and the fact that industry-funded trials tend to be larger and larger studies are more likely to be published [37]. 
The finding that RD CTs adhere more strictly to the FDAAA than CTs overall might be due to ethical obligations towards these patients and the high value of clinical data from a CT, which may be the only one performed in a patient population. RD CTs also include many pediatric patients and pediatric CTs were found more likely to be completed [38]. Regardless of the reasons, it is encouraging to note that RD CTs are reported at a higher rate than other CTs. Ultimately, CTs need to overcome the widely recognized and general deficit of consistent and transparent data sharing practices, not only to improve patient care, but also to value the participants who help advance science and benefit future patients.

CONCLUSION
This study provides insight into the landscape of RD CTs. RD clinical research tends to be industry-funded and focused on treatments with drugs or biologicals, and many trials involve a relatively small number of participants and patients starting from very early in life. Generally, industry funding has been associated with larger studies, including larger numbers of participants and more study sites, but slightly shorter study duration compared with academia/non-profit trials. Finally, CT results in rare diseases are being made publicly available more frequently than has been reported for studies entered in ClinicalTrials.gov overall, but improvements are still needed to make all data readily available to the public.

Limitations
There are some limitations in this study to be taken into consideration. First, ClinicalTrials.gov does not cover all trials conducted worldwide. However, there is a reported 80% overlap between ClinicalTrials.gov and the WHO ICTRP portal [18]. Secondly, there is no single standard ontology for the description of clinical research and, despite extensive manual data curation efforts, the dataset may still contain misclassified studies. Conversely, some CTs may have been excluded from our study because the disease under study did not conform to the ORDO naming convention. Efforts were made to identify, correct or remove erroneous, ambiguous and carelessly entered data prior to analysis; however, an element of uncertainty remains. Additionally, a notable fraction of studies may provide divergent information on ClinicalTrials.gov and EUCTR, making comparisons difficult [39]. Although the FDAAA was introduced to increase transparency, this legislation came with certain limitations, for example its applicability only to studies with at least one study site in, or an item manufactured in, the U.S. or U.S. territories. It is, however, not evident which studies are applicable and which ones are exempt. Even though we attempted to address this limitation by manually examining a random sample of 25% of the trials, errors may still be present. In this study, early proof-of-concept trials, which can be impactful for the scientific community, e.g. in order to avoid duplication of resources, are not reviewed. Finally, this study represents a snapshot of RD CTs as they were entered in March 2018 and some study details may have been added after we retrieved the data.
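The applicability problem described above can be made concrete with a small Python sketch. It encodes only the two criteria named in the text (a U.S. study site or a U.S.-manufactured item) together with the 12-month reporting window, and it treats missing information as "potentially applicable". The `Trial` record, its field names and the three-way classification are hypothetical simplifications for illustration, not the legal definition of an applicable clinical trial and not the authors' procedure.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Trial:
    has_us_site: Optional[bool]      # None when no location was entered
    us_manufactured: Optional[bool]  # None when the information is unavailable
    completion: date
    results_posted: Optional[date]   # None when no results were posted

def applicability(trial):
    """Classify a trial using only the two criteria named in the text."""
    if trial.has_us_site or trial.us_manufactured:
        return "applicable"
    if trial.has_us_site is None or trial.us_manufactured is None:
        return "potentially applicable"  # cannot be decided from the registry entry
    return "not applicable"

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def reported_on_time(trial, months=12):
    """True if results were posted within `months` of the completion date."""
    if trial.results_posted is None:
        return False
    return trial.results_posted <= add_months(trial.completion, months)
```

Keeping "potentially applicable" as its own class makes explicit how much of a compliance estimate depends on information that the registry entry does not provide.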