The safety and efficacy of Momordica charantia L. in animal models of type 2 diabetes mellitus: A systematic review and meta‐analysis

Type 2 diabetes mellitus is a chronic hyperglycemic condition caused by progressively impaired glucose regulation. Momordica charantia L. could potentially improve hyperglycemia because its fruit extracts can alleviate insulin resistance and beta‐cell dysfunction and increase serum insulin levels. We evaluated the effect of M. charantia L., compared with a vehicle, on glycemic control in animal models of type 2 diabetes mellitus. MEDLINE, Web of Science, Scopus, and CINAHL databases were searched without language restriction through April 2019. Sixty-six studies involving 1861 animals that examined the effect of M. charantia L. on type 2 diabetes mellitus were included. Fruit and seed extracts reduced fasting plasma glucose (FPG) (42 studies, 815 animals; SMD −6.86 [95% CI −7.95, −5.77]) and glycosylated hemoglobin A1c (3 studies, 59 animals; SMD −7.76 [95% CI −12.50, −3.01]) compared with vehicle control. The extracts also showed hepato‐renal protective effects at varying doses and durations of administration. Despite this significant glycemic control effect, the poor methodological quality of the primary studies calls for future research to standardize extracts based on chemical markers and to improve the quality of preclinical studies through measures such as sample size calculation, randomization, and blinding.


| Study design and animal model eligibility
The authors included preclinical randomized and non-randomized controlled studies. Only original full-length articles reporting studies conducted in animal models of T2DM were considered. The authors assessed animal models carefully and included only those with insulin resistance and β-cell failure, to ensure construct validity. Animals of any sex, age, species, and strain were eligible. Human, in vitro, ex vivo, and in silico studies, as well as before-after studies without a description of the control group, were excluded.

| Data items and collection process
Two authors independently extracted data from the included studies using a pilot-tested data collection form. Discrepancies between the authors were identified and resolved through consensus. Corresponding authors of included studies were contacted by email to obtain numerical data when results were presented only graphically or were missing, or when additional data were required.

| Taxonomical assessment
The taxonomical and nomenclatural accuracy was assessed by comparing reported taxonomical information with existing standards in an open botanical database accessible at www.theplantlist.org.
Analysis of potential taxonomical errors was done according to methods proposed by Rivera and colleagues (Rivera et al., 2014).
Articles received grade "A" if they presented full information on the plant species, the identification of the specimen, and a deposited voucher specimen. Grade "B" was given to articles that did not present information on specimen identification and a voucher specimen, or that contained inaccurate taxonomic information. Finally, the authors rated as "C" studies with incomplete or absent information on the plant species, or on the identification of specimens and a voucher specimen.

| Methodological quality and risk of bias assessment
We used SYRCLE's risk of bias tool to assess the risk of bias of each included study (Hooijmans et al., 2014). The tool covers the domains of random sequence generation, baseline characteristics, allocation concealment, random housing, blinding of investigators/caregivers, random outcome assessment, blinding of outcome assessors, incomplete outcome data, selective outcome reporting, and other sources of bias. Each domain was judged as high, low, or unclear risk of bias.
In addition, a modified CAMARADES checklist was used to assess the methodological quality of the included studies. This checklist comprises 10 quality indicators: (1) peer-reviewed publication, (2) statement of control of temperature, (3) random allocation to treatment or control, (4) blinded caregiver/investigator, (5) blinded assessment of outcome, (6) use of co-interventions/co-morbidities, (7) appropriate animal model (age, sex, species, strain), (8) sample size calculation, (9) compliance with animal welfare regulations, and (10) statement of potential conflict of interests (Dalgleish et al., 2007). Each study was given a quality score out of a possible total of 10 points. Finally, the authors calculated the mean score and categorized studies as "low quality" (score 1-5) or "high quality" (score 6-10).
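The scoring rule above can be sketched in Python. This is a minimal illustration, assuming one point per satisfied item; the field names and the two example studies are hypothetical, and treating a score of 0 as "low quality" is our assumption, since the text only defines the 1-5 and 6-10 bands.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass
class CamaradesItems:
    """Modified CAMARADES checklist: one point per item a study satisfies (hypothetical field names)."""
    peer_reviewed_publication: bool = False
    temperature_control_statement: bool = False
    random_allocation: bool = False
    blinded_caregiver_or_investigator: bool = False
    blinded_outcome_assessment: bool = False
    co_interventions_or_comorbidities: bool = False
    appropriate_animal_model: bool = False
    sample_size_calculation: bool = False
    animal_welfare_compliance: bool = False
    conflict_of_interest_statement: bool = False


def quality_score(items: CamaradesItems) -> int:
    """Count the satisfied items, giving a score between 0 and 10."""
    return sum(bool(getattr(items, f.name)) for f in fields(items))


def quality_category(score: int) -> str:
    """Scores 6-10 are 'high quality'; 1-5 (and, by assumption, 0) are 'low quality'."""
    if not 0 <= score <= 10:
        raise ValueError("CAMARADES score must lie between 0 and 10")
    return "high quality" if score >= 6 else "low quality"


# Hypothetical studies for illustration only (not data from this review).
studies = [
    CamaradesItems(peer_reviewed_publication=True, appropriate_animal_model=True,
                   animal_welfare_compliance=True),
    CamaradesItems(peer_reviewed_publication=True, temperature_control_statement=True),
]
scores = [quality_score(s) for s in studies]
print(scores, mean(scores), [quality_category(s) for s in scores])
```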

| Data synthesis
Quantitative data were pooled in a statistical meta-analysis using Review Manager (RevMan) software 5.3 (Copenhagen: The Nordic Cochrane Centre). The meta-analysis included studies with data on FPG, HbA1c, serum insulin level, number of insulin-positive cells, TGs, TC, HDL-c, LDL-c, liver glycogen, ALT, AST, ALP, urea, serum creatinine, and body weight. Because the same outcomes were reported on different measurement scales, we used the standardized mean difference (SMD) to evaluate the effect of M. charantia L. in comparison to vehicle control.
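For readers unfamiliar with the SMD, the sketch below computes Hedges' adjusted g, which we understand to be RevMan 5's default SMD, together with its variance, using the formulas given in RevMan 5's statistical methods notes. Because the mean difference is divided by the pooled standard deviation, results reported in mg/dL and in mmol/L become comparable. The function name and the example FPG values are hypothetical.

```python
import math


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Standardized mean difference (Hedges' adjusted g) and its variance.

    g = (m_t - m_c) / s_pooled * (1 - 3 / (4N - 9))
    Var(g) = N / (n_t * n_c) + g**2 / (2 * (N - 3.94)), with N = n_t + n_c.
    """
    n = n_t + n_c
    # Pooled within-group standard deviation of treated and control groups.
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n - 2))
    correction = 1 - 3 / (4 * n - 9)  # small-sample bias correction
    g = (m_t - m_c) / s_pooled * correction
    var_g = n / (n_t * n_c) + g**2 / (2 * (n - 3.94))
    return g, var_g


# Hypothetical FPG values (mmol/L) for one study: treated vs vehicle control.
g, var_g = hedges_g(m_t=8.0, sd_t=2.0, n_t=10, m_c=18.0, sd_c=3.0, n_c=10)
print(round(g, 2), round(math.sqrt(var_g), 2))
```

A negative g means lower glucose in the treated group, which is the direction of the pooled FPG effect reported in this review.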
The inverse-variance weighting method was used to determine the relative contribution of each included study to the pooled SMD of M. charantia L. and its 95% confidence interval. The authors used a random effects model for pooling effect estimates because effect sizes from animal studies are likely to differ owing to differences in design characteristics.
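A minimal sketch of inverse-variance random-effects pooling follows, using the DerSimonian-Laird estimator of the between-study variance (tau²), the random-effects method we understand RevMan 5 to apply. It is an illustrative reconstruction rather than RevMan's code, and the function name and input values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances, z=1.959963984540054):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2.

    Returns (pooled SMD, lower 95% CI, upper 95% CI, tau^2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]                      # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - z * se, pooled + z * se, tau2


# Hypothetical per-study SMDs and variances (not data from this review).
print(dersimonian_laird([-1.0, -5.0, -3.5], [0.10, 0.20, 0.15]))
```

Adding tau² to every study's variance makes the weights more even than under a fixed-effect model, which is why a random-effects analysis widens the confidence interval when studies disagree.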
Qualitative data were summarized in a table, with the signs (+) and (−) indicating an increase or a decrease, respectively. Variables analyzed qualitatively were HOMA-IR, HOMA-B, the morphological structure of the islets of Langerhans, the number of beta-cells, and the number of insulin secretory granules.

| Heterogeneity assessment
We used the I² statistic to quantify heterogeneity among primary studies (Higgins & Thompson, 2002). An I² of 75% or more was considered to indicate substantial heterogeneity (Borenstein, Hedges, Higgins, & Rothstein, 2010). Subgroup analyses were performed to examine potential sources of heterogeneity in the primary outcome (FPG).
For this analysis, we considered the risk of bias score, methodological quality score, study design (randomized and non-randomized design), duration of treatment, dose, mode of preparation of M. charantia L., animal species (mouse, rat, rabbit, dog, other), animal strains (Wistar, Long-Evans, KK mice, C57BL/6J mice, others), animal age, sex (male, female), and model of induction of type 2 diabetes mellitus (chemical, genetic, surgical, high-fat diet).
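The heterogeneity statistics can be sketched as follows: I² is derived from Cochran's Q, and a between-subgroup Q compares the pooled estimates of subgroups such as species or induction model. The section does not state which subgroup-difference test was used, so treating each subgroup estimate as a single "study" (which, to our understanding, is how RevMan's test for subgroup differences works) is an assumption; function names and inputs are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q around the inverse-variance (fixed-effect) weighted mean."""
    w = [1 / v for v in variances]
    weighted_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - weighted_mean) ** 2 for wi, yi in zip(w, effects))


def i_squared(effects, variances):
    """I^2 (%) = max(0, (Q - df) / Q) * 100 (Higgins & Thompson, 2002)."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    return 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100


def subgroup_difference_q(subgroup_estimates):
    """Q statistic for differences between subgroup pooled estimates.

    subgroup_estimates maps a label to (pooled SMD, SE). Compare Q with a
    chi-square distribution on (subgroups - 1) df, e.g. scipy.stats.chi2.sf(q, df).
    """
    effects = [est for est, _ in subgroup_estimates.values()]
    variances = [se**2 for _, se in subgroup_estimates.values()]
    return cochran_q(effects, variances), len(effects) - 1


# Hypothetical inputs (not data from this review).
i2 = i_squared([-1.0, -5.0, -3.5], [0.10, 0.20, 0.15])
print(round(i2, 1), "substantial" if i2 >= 75 else "not substantial")
print(subgroup_difference_q({"rat": (-7.1, 0.6), "mouse": (-5.2, 0.9)}))
```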

| Publication bias
Publication bias for each outcome was assessed by testing funnel plot asymmetry with Egger's test (Egger, Davey Smith, Schneider, & Minder, 1997). We assessed publication bias only for meta-analyses of 10 or more studies, because with fewer primary studies the test's power is generally too low to distinguish chance from real asymmetry (Egger et al., 1997; Sterne et al., 2011). When publication bias was detected, the trim and fill method was used to correct for it by imputing potentially missing studies and adjusting the effect size (Duval & Tweedie, 2000).
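Egger's test regresses each study's standardized effect (SMD/SE) on its precision (1/SE); an intercept far from zero signals funnel plot asymmetry. The sketch below implements that regression with ordinary least squares using only the standard library and flags meta-analyses with fewer than 10 studies. The function name and example values are hypothetical; a two-sided p-value could be obtained with, for example, `2 * scipy.stats.t.sf(abs(t), df)`. Trim and fill is not sketched here.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel plot asymmetry.

    Regresses effect / SE on 1 / SE by ordinary least squares and returns the
    intercept, its SE, the t statistic on k - 2 df, and a flag for k < 10.
    """
    k = len(effects)
    if k != len(ses) or k < 3:
        raise ValueError("need at least three studies with matching SEs")
    x = [1 / s for s in ses]                            # precision
    y = [e / s for e, s in zip(effects, ses)]           # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (k - 2)                      # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar**2 / sxx))
    return {
        "intercept": intercept,
        "se": se_intercept,
        "t": intercept / se_intercept,
        "df": k - 2,
        "underpowered": k < 10,                         # below the 10-study threshold
    }


# Hypothetical effects and standard errors (not data from this review).
print(egger_test([0.92, 1.26, 1.525, 2.45, 1.0], [0.2, 0.4, 0.5, 1.0, 0.25]))
```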

| Assessment of confidence in cumulative evidence
The authors used the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach as a framework to rate the certainty of the evidence from preclinical studies (Leeflang et al., 2018; Wei et al., 2016). The certainty for each outcome was rated by considering risk of bias (as assessed by SYRCLE's risk of bias tool), inconsistency (as assessed by heterogeneity tests, confidence intervals, and p values), imprecision, publication bias, and indirectness, as proposed by Leeflang and colleagues (Leeflang et al., 2018). After considering all factors, the authors rated the evidence as high, moderate, low, or very low quality.
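The downgrading logic can be expressed as a small function. This is a simplified sketch: it assumes evidence starts at "high" and loses one or two levels per serious concern in each of the five domains named above, and it ignores upgrading factors; the domain keys and function name are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = ("risk_of_bias", "inconsistency", "imprecision", "publication_bias", "indirectness")


def grade_certainty(downgrades, start="high"):
    """Rate certainty by subtracting downgrades (0, 1 or 2 levels per domain)."""
    unknown = set(downgrades) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domains: {sorted(unknown)}")
    if any(d not in (0, 1, 2) for d in downgrades.values()):
        raise ValueError("each domain can be downgraded by 0, 1 or 2 levels")
    level = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(level, 0)]  # certainty cannot fall below "very low"


# Hypothetical judgements for one outcome (illustration only).
print(grade_certainty({"risk_of_bias": 1, "inconsistency": 1}))  # low
```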

| Results of the search
We identified 443 articles through electronic and manual searching.
After removing duplicates and screening the articles by title and abstract, 181 articles remained. The full texts of these articles were examined for eligibility, and a further 115 articles were excluded: one was a thesis whose published article had also been retrieved, 12 were abstracts only, 16 had inappropriate designs, 45 had not induced T2DM before administering M. charantia L., 18 reported no outcomes of interest, 16 did not investigate the intervention of interest, and seven were duplicate publications. For each set of duplicate publications, we included the article with the most data.
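The exclusion counts reported above can be checked against the flow totals with a few lines of arithmetic; all numbers are taken from this section.

```python
# Counts taken from the text of this section.
identified = 443
after_screening = 181
exclusions = {
    "thesis with published article retrieved": 1,
    "abstract only": 12,
    "inappropriate design": 16,
    "T2DM not induced before M. charantia L.": 45,
    "no outcomes of interest": 18,
    "intervention of interest not investigated": 16,
    "duplicate publication": 7,
}

excluded_full_text = sum(exclusions.values())          # full-text exclusions
included = after_screening - excluded_full_text         # studies retained
removed_before_full_text = identified - after_screening # duplicates + title/abstract screening
print(excluded_full_text, included, removed_before_full_text)  # 115 66 262
```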
Only one of the contacted corresponding authors shared the full text. Therefore, we included 66 studies in the qualitative analysis and 48 studies in the meta-analysis. The PRISMA flow diagram shows the screened, excluded, and included articles (Figure 1).
Thirty-two of these studies used an aqueous extract of fresh or dried fruits, and 17 used an alcoholic extract. The remaining studies used acetone, hydroalcoholic, or petroleum ether extracts, a supernatant aqueous extract, or powdered dried fruits.
Sixty-two studies used the fruits of M. charantia L., three used the leaves, and one used the seeds. Saifi et al. (2014) was the only study that described quality control measures for the intervention; the remaining studies did not. The studies administered M. charantia L. for between 7 and 90 days. Table 1 summarizes the characteristics of the included studies.

| Taxonomical assessment of included studies
All 66 included studies used scientific names; however, the majority of these names (57; 86.4%) were incorrect. The most frequent errors were missing plant authority names (39; 68.4%) and missing plant family names (25; 43.9%). Table 2 illustrates the different types of errors identified.
Four of the 66 studies were given a taxonomical validation score of "A" because they presented full information about the plant name, identification of specimens, and a deposited voucher specimen.
In contrast, 10 studies were given a score of "B" because only partial information about the plant name and specimen identification was presented. Notably, the majority of included studies (52) had inadequate or no information about the taxonomical identification of the plant species (S2).
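The percentages in this section can be recomputed from their counts. The error-type percentages use the 57 incorrectly named studies as the denominator, whereas the overall share uses all 66 studies; the helper name is hypothetical.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


n_studies = 66
incorrect_names = 57
print(pct(incorrect_names, n_studies))  # share of studies with incorrect names
print(pct(39, incorrect_names))         # missing authority names, among incorrect names
print(pct(25, incorrect_names))         # missing family names, among incorrect names
print(4 + 10 + 52 == n_studies)         # grade A, grade B, and the rest cover all studies
```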

| Methodological quality
The quality scores of the majority of studies in this analysis (51; 77.3%) were between 2 and 3, with a median score of 3 (interquartile range 1).
In short, these studies had poor methodological quality.
All 66 studies were published in peer-reviewed journals. However, none of these studies described the method of random allocation of animals to the treatment or control group, blinding of caregivers/investigators, blinded assessment of outcome, or sample size calculation. Only one study described co-interventions in the animal models used at baseline (Hossain et al., 2014). Twenty-five studies reported compliance with animal welfare regulations, 21 reported maintaining a constant temperature, and only 14 provided a statement of potential conflict of interest.
Supplementary material (S3) summarizes the methodological quality assessment of studies included in the analysis.

| Risk of bias assessment
None of the included studies performed allocation concealment, random animal housing, blinding of animal caregivers and investigators, random outcome assessment, or blinding of outcome assessment. These design flaws may have made the studies prone to systematic errors that could overestimate the effect of M. charantia L. Four studies were judged at unclear risk of bias for random sequence generation because the method used to generate the random sequence was inadequately described (Aswar & Kuchekar, 2012; Ayoub et al., 2013; Rezaeizadeh et al., 2011; Singh & Gupta, 2007). A summary of the risk of bias across all studies and of the risk of bias of each included study is provided in

Insulin-positive cells
There

Triglycerides (TGs)
Data from 13 preclinical studies were pooled to assess triglycerides (Figure 6). There was very low-quality evidence that M. charantia L. significantly lowered TG levels in the treated group (n = 142) compared with the vehicle control group (n = 87) (SMD −9.12; 95% CI −11.76, −6.49). An I² of 92% indicated substantial heterogeneity among individual studies.
An I² of 95% indicated heterogeneity. The certainty of this evidence was assessed as low (S4 Appendix).

High-density lipoprotein cholesterol (HDL-c)
HDL-c was assessed by pooling data from eight studies (Figure 6). There was low-quality evidence that the HDL-c level in the M. charantia L. treated group (n = 72) increased compared with the vehicle control group (n = 50) (SMD 4.37; 95% CI 2.29, 6.45). An I² of 89% indicated substantial heterogeneity.

Low-density lipoprotein cholesterol (LDL-c)
The LDL-c level in the M. charantia L. treated group (n = 72) was significantly decreased compared with the vehicle control group (n = 50) (SMD −6.71; 95% CI −9.06, −4.36). An I² of 89% indicated substantial heterogeneity (Figure 6).
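As a quick consistency check on the pooled estimates reported in this review, the midpoint of a symmetric 95% CI should reproduce the SMD, and its half-width divided by 1.96 gives the standard error, from which a z statistic follows. The sketch below applies this to the FPG, HbA1c, and lipid results; it assumes normal-theory (Wald) intervals.

```python
Z = 1.959963984540054  # two-sided 95% standard normal quantile

# (SMD, lower, upper) as reported in this review.
reported = {
    "FPG": (-6.86, -7.95, -5.77),
    "HbA1c": (-7.76, -12.50, -3.01),
    "TGs": (-9.12, -11.76, -6.49),
    "HDL-c": (4.37, 2.29, 6.45),
    "LDL-c": (-6.71, -9.06, -4.36),
}

for outcome, (smd, lower, upper) in reported.items():
    midpoint = (lower + upper) / 2   # should match the SMD up to rounding
    se = (upper - lower) / (2 * Z)   # half-width divided by z
    z_stat = smd / se
    print(f"{outcome}: midpoint={midpoint:.3f}, SE={se:.2f}, z={z_stat:.1f}")
```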
Alanine aminotransferase (ALT), aspartate aminotransferase (AST), and alkaline phosphatase (ALP)
There was a significant reduction of ALT (SMD; −5.14; [95% CI;

tors that could lead to an asymmetric funnel plot, such as true heterogeneity, poor methodological quality, artefactual causes, and chance (Egger et al., 1997). Because these factors are ubiquitous in preclinical studies, as reported in other assessments (Gupta, 2019; Kilkenny et al., 2009; Pound, Ebrahim, Sandercock, Bracken, & Roberts, 2004), it may be reasonable to assume that they influenced funnel plot asymmetry (Almarzooq, 2009; Aswar & Kuchekar, 2012; Ayoub et al., 2013; Hafizur et al., 2011; Hossain et al., 2014

(Gheibi, Kash, & Ghasemi, 2017). A previous study indicated that a single 45 mg/kg dose of STZ leads to hyperglycemia and a higher mortality rate than multiple 30 mg/kg doses (Zhang, Lv, Li, Xu, & Chen, 2009). Informed by a growing understanding of disease pathophysiology, researchers have shown that combining a high-fat diet with low-dose STZ produces a T2DM model that closely mimics the natural history of T2DM in humans (Reed et al., 2000; Vatandoust et al., 2018). Our findings suggest that concern about how closely the different models of inducing T2DM resemble the human condition is warranted.
These design features could be sources of heterogeneity and, by extension, influence the construct validity of the studies.

| Strength of the study
This is the first, and a timely, systematic review and meta-analysis of M. charantia L. using animal studies. It provides in-depth insight into the current state and level of the available preclinical evidence. We also provide evidence of major methodological and taxonomical flaws and of risks of bias that could threaten the validity and clinical generalizability of preclinical studies of M. charantia L.

| CONCLUSION
Momordica charantia L. reduced elevated fasting plasma glucose levels in animal models of type 2 diabetes mellitus. It also significantly reduced glycosylated hemoglobin, alanine aminotransferase, aspartate aminotransferase, alkaline phosphatase, urea, serum creatinine, and several lipid profile parameters. These conclusions must be interpreted in light of strongly suspected publication bias, high risk of bias, and poor methodological quality of the primary studies. To enhance clinical generalizability, future research should focus on standardizing doses of M. charantia L. with known chemical markers, provide adequate quality control data, and conduct preclinical studies designed with random allocation, blinding of investigators and assessors, and sample size calculation based on statistical power.