ABSTRACT
Objective To evaluate the feasibility and value of creating an extensible framework for psychiatric phenotyping that indexes both strengths and weaknesses of behavioral dimensions. The Extended Strengths and Weaknesses Assessment of Normal Behavior (E-SWAN) reconceptualizes each diagnostic criterion for selected DSM-5 disorders as a behavior, which can range from high (strengths) to low (weaknesses). Initial efforts have focused on Panic Disorder, Social Anxiety, Major Depression, and Disruptive Mood Dysregulation Disorder.
Methods Data were collected from 523 participants (ages: 5-21 years old) in the Child Mind Institute Healthy Brain Network − an ongoing community-referred study. Parents completed each of the four E-SWAN scales and traditional unidirectional scales addressing the same disorders. Distributional properties, Item Response Theory Analysis (IRT) and Receiver Operating Characteristic (ROC) curves (for diagnostic prediction) were used to assess and compare the performance of E-SWAN and traditional scales.
Results In contrast to the traditional scales, which exhibited truncated distributions, all four E-SWAN scales were found to have near-normal distributions. IRT analyses indicate the E-SWAN subscales provided reliable information about respondents throughout the population distribution; in contrast, traditional scales only provided reliable information about respondents at the high end of the distribution. Predictive value for DSM-5 diagnoses was comparable to prior scales.
Conclusion E-SWAN bidirectional scales can capture the full spectrum of the population distribution for DSM disorders. The additional information provided can better inform examination of inter-individual variation in population studies, as well as facilitate the identification of factors related to resiliency in clinical samples.
1. INTRODUCTION
Myriad questionnaires are available for measuring psychiatric illness dimensionally. However, the vast majority are based on detection of the presence of problematic behaviors and symptoms. Although useful from a clinical perspective, the tendency to focus on only ‘one end’ of the distribution (i.e., the pathologic trait range) limits the ability of such tools to distinguish individuals from one another in less symptomatic or non-affected segments of the population (i.e., the distribution is truncated)1,2. This failure to consider differences in strengths among individuals is particularly problematic for psychiatric research, where efforts to model brain-behavior relationships are increasingly turning to broader community and transdiagnostic samples3.
The Strengths and Weaknesses Assessment of ADHD Symptoms and Normal Behavior (SWAN) provides a potentially valuable model for bidirectional questionnaire design4. Rather than attempting to quantify only the presence of ADHD symptoms, the SWAN probes a range of behaviors to identify relative strengths (i.e., abilities, which are indicative of adaptive behavior) and weaknesses (i.e., disabilities, which are indicative of problems requiring clinical attention, such as ADHD). This was accomplished by: 1) converting each DSM-IV ADHD symptom into a behavior and 2) expanding the typical 4-point scale of symptom presence (“not at all” to “very much”) to a 7-point scale (“far below average” to “far above average”). Numerous published studies have demonstrated that the SWAN generates bidirectional distributions that are nearnormal5–7. Importantly, among individuals with ADHD symptomatology (i.e., a clinical sample), there is generally a high degree of agreement between the SWAN and traditional scales1.
Here we report on the initial design and feasibility testing of the Extended Strengths and Weaknesses Assessment of Normal Behavior (E-SWAN) — a framework that extends the general methodology of the SWAN to questionnaires for other psychiatric disorders. Consistent with the SWAN, the clinical wisdom embodied in the DSM-5 was taken as the departure point to develop each scale. Four DSM-5 disorders were chosen to provide a sampling of challenges that can arise in the conversion of DSM symptoms to dimensional probes. Major Depressive Disorder and Social Anxiety were chosen for their high prevalence in the general population8,9. Disruptive Mood Dysregulation Disorder (DMDD) was chosen as this new disorder in DSM-5 does not have empirically defined criteria or many valid measures for assessing symptoms10. Finally, Panic Disorder was chosen to determine the feasibility of applying this framework to a disorder with physiologic symptoms11.
The present work makes use of initial data obtained in the Child Mind Institute Healthy Brain Network sample (ages: 5-21; N=523) that enabled comparison of E-SWAN results with those obtained using equivalent unidirectional questionnaires in the same individuals. Item response theory analyses are included to demonstrate the added value of the information obtained via the E-SWAN. Additionally, we obtained informant and self-report data via the Prolific Academic platform to verify the bidirectional distributional properties of the E-SWAN in an independent sample with distinct characteristics (n=250).
2. METHODS
2.1 Questionnaire Construction: Process
The present work focused on the development and testing of four E-SWAN questionnaires (Major Depression, Disruptive Mood Dysregulation, Social Anxiety, Panic Disorder) using a uniform method based on that previously employed for construction of the SWAN (see Figure 1). First, each DSM-5 criterion was broken down to reflect specific symptoms that are core to each of the DSM-5 disorders. Second, each specific symptom was transformed into its underlying ability or behavior, i.e., the ability/behavior that when impaired or dysfunctional gives rise to the symptom. Lastly, each item was worded to be answered on a 7-point scale representing deviation from children of the same age, following the statement: “When compared to children of the same age, how well does this child…” (See Figure 1 for detailed workflow).The results from this process were discussed by a committee of experts and the final versions were circulated to experienced clinicians for comments.
E-SWAN Questionnaire Development Workflow. Workflow diagram detailing the steps followed for developing the items of each E-SWAN questionnaire.
2.2 Questionnaire Construction: Considerations
2.2.1 Level of detail and nuance
When converting DSM criteria to behaviors, we worked to ensure that question items capture the level of detail and nuance of the original criteria. This is essential as even slight changes can impact the interpretation of and responses to an item. An example of the importance of this consideration is in the DMDD questionnaire. A key criterion of a DMDD diagnosis is that the behaviors must be present in more than on setting. Initially, we indicated in each question that the behavior being rated must have taken place in more than one setting. However, we found this to be problematic, as it required parents to think about a behavior over several settings at once, and was not informative as to the specific setting(s) in which the behavior is actually taking place. As a result, we changed the questionnaire to ask each question separately for each of the settings (home, school, and with friends).
2.2.2 Multiple phrases versus single
DSM diagnoses differ in number and complexity of criteria. Some DSM diagnoses have simple, clearly defined criteria, such as a symptom count, while other diagnoses contain long phrases capturing many symptoms or multiple contexts in one criterion. For example, when looking at ADHD, Depression or Social Anxiety, individual criteria are generally single symptoms. In contrast, for DMDD, each criterion encapsulates multiple symptoms. The first DMDD criterion reads “Severe recurrent temper outbursts manifested verbally and/or behaviorally that are grossly out of proportion in intensity or duration to the situation or provocation.” This one criterion captures several behaviors or symptoms across multiple settings. In the first step of the E-SWAN construction process, complex criteria such as this are broken down into multiple single items. For this one DMDD criterion, we determined that five symptoms or characteristics are being queried: verbal outbursts, behavioral outbursts, intensity, duration, and situation.
2.2.3 Conditional Criteria
Some DSM diagnoses have conditional criteria that cannot easily be translated into abilities or strengths. For example, Panic Disorder criteria are mostly physiological symptoms experienced during a panic attack. To address this, we first developed three questions focused on the presence and severity of panic attacks (phrased as “moments of intense fear or discomfort”). We then ask the parent to rate how well their child is able to regulate the physiological symptoms while experiencing “a moment of intense fear or discomfort”. This allows us to potentially capture what prevents a panic attack in one individual in the same context that elicits a panic attack in another individual.
2.2.4 PROMIS Guidelines
In addition to the principles that we developed, we followed the PROMIS Instrument Development Validation Scientific Standards (http://www.healthmeasures.net/images/PROMIS/PROMISStandards_Vers2.0_Final.pdf) - a set of guidelines proposed by NIH for the development of standardized assessments and rating scales. In particular, we focused on three PROMIS criteria: clarity, precision, and general applicability. Clarity means that each item is straightforward and easy to understand; not vague, confusing, or complex. To meet this goal, we used simple language characteristic of a fourth-grade reading level. Precision means that each question is specific, asking only about one behavior in one setting. We did not include multiple behaviors in one item. To meet this goal, several of the original questions were broken down into multiple questions. General applicability means that the questions do not require cultural or contextual knowledge.
2.3 Participants
2.3.1 CMI Healthy Brain Network
Data were collected from 523 participants of the Healthy Brain Network (HBN; http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/), which is designed to be a sample of 10,000 children and adolescents from the New York City area, collected using a community-referred model that recruits based on the presence of behavioral concerns (REF). Participants selected for the present work ranged in age from 6.0-17.0. As part of the HBN protocol, parents of participants completed all E-SWAN scales (Depression, Social Anxiety, Panic Disorder, DMDD, and ADHD [SWAN]). Additionally, they completed traditionally designed instruments to assess these same disorders, including: the Mood and Feelings Questionnaire (MFQ)12, the Screen for Child Anxiety and Related Disorders (SCARED)13, the Affective Reactivity Index (ARI)14 and the Strengths and Difficulties Questionnaire (SDQ)15. DSM-5 diagnoses were established for all participants using a computerized version of the Kiddie Schedule for Affective Disorders and Schizophrenia (KSADS) (https://clinicaltrials.gov/ct2/show/NCT01866956) that was administered by licensed/license-eligible clinicians.
2.3.2. Prolific Academic Sample
To confirm the generality of distributional properties for the E-SWAN, adadditional data were collected from 250 parents through Prolific Academic (an online crowdsourced data collection tool) (https://www.prolific.ac/). Users were screened based on having a child in the 6.0-17.0 age range. Parents then completed four questionnaires anonymously through a Google survey. These respondents completed the E-SWAN questionnaires only, and were given a small monetary compensation.
2.4 Statistical Analysis
E-SWAN Distributional Properties. For each of the bidirectional E-SWAN scales and corresponding traditional unidirectional scales, mean, median, skewness, and kurtosis were calculated.
Testing Correspondence between E-SWAN and Traditional Scales. For each of the E-SWAN scales, and its unidirectional counterpart, we calculated: 1) Cronbach’s alpha coefficients to measure internal consistency, 2) Kendall’s W coefficients to measure reliability of the different measures, and 3) Pearson correlation coefficients. To further examine the added value of measuring both strengths and weaknesses, for each questionnaire, we subdivided participants into those with mean E-SWAN scores higher for strengths (E-SWAN mean score < 0) and those with mean E-SWAN scores higher for weaknesses (E-SWAN mean score > 0), and then calculated each of these statistics again for each of the two groups (Cronbach’s alpha, Kendall’s W, Pearson correlation). Finally, we tested whether correlations between E-SWAN scales and traditional scales vary as a function of E-SWAN score using quantile regressions16.
Item Response Theory (IRT). Next, we tested the performance of individual items from the E-SWAN and their traditional counterparts in measuring the latent trait using IRT. Prior to the IRT analysis, Confirmatory Factor Analysis (CFA) models were fitted to test the assumption of unidimensionality. These models used mean and variance adjusted weighted least squares (WLSMV) estimators, which account for the ordinal nature of the data. From this we assessed model fit using the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI) and Root Mean Square Error of Approximation (RMSEA). Values of the CFI and TLI equal to or higher than 0.90 represent an acceptable fit, and higher than 0.95 represent a good fit. Values of the RMSEA equal to or lower than 0.08 represent an acceptable fit, and lower than 0.05 represent a good fit17,18.
For IRT analyses, we used Graded Response Models (GRM) to calculate each item’s discrimination parameter, which indexes the strength of the relationship between each item and the latent trait, and each item difficulty parameter, which indexes in each area of the latent trait the item that concentrates the ability to provide information19. Test Information Function (TIF) curves were plotted for each instrument. These plots depict how well the overall test discriminates individuals, and the precision of the measurement at various levels of the latent trait. Similarly, Item Information Function (IIF) curves were plotted, which shows how each individual item on the questionnaire performs. Finally, Item Characteristic Curves (ICC) were plotted for each item of each scale. These plots depict the difficulty parameter, which represents the probability of a respondent endorsing each of the response options (e.g., far above average to far below average) along the latent trait for each instrument.
ROC Curves for Diagnostic Prediction. To determine the ability of E-SWAN scales to predict DSM-based diagnoses, and their comparability to traditional scales, we generated Receiver Operating Characteristic (ROC) curves for all scales using clinician consensus diagnoses from the KSADS COMP (a computerized DSM-5-based KSADS tool)20. We then calculated and compared Area Under the Curve (AUC) between E-SWAN scales and unidirectional scales measuring the same disorder.
All analyses were carried out in R using the following packages: lavaan21, ltm22, psych23, quantreg24, irr25 and pROC26.
3. RESULTS
3.1 Distributional Properties
Consistent with the SWAN, all E-SWAN scales were approximately normally distributed, in contrast to their traditionally designed counterparts (Figure 2 and Table 1). Kendall’s W, Cronbach’s alpha and Pearson correlation coefficients showed significant concordance, reliability, and correlation between ratings on the E-SWAN subscales and their unidirectional counterparts, respectively. To examine the added benefit of the bidirectional scales, we split the E-SWAN data into those that scored at or above 0 (higher levels of symptomatology) and those that scored below 0 (low symptomatology; higher levels of strengths). We again calculated Kendall’s W, Cronbach’s alpha, and Pearson correlations on the two groups. We found that reliability, concordance and correlations were markedly lower among those that scored below 0 (strengths group). This indicates that the unidirectional measures no longer align with the E-SWAN scales when measuring positive behaviors rather than symptoms. This relationship was consistently seen across all five domains assessed by the E-SWAN (DMDD, Depression, Social Anxiety, Panic Disorder, ADHD) (Table 1). Quantile regression showed that the traditional scales vary as a function of the E-SWAN scores. Stronger correlations are seen between the traditional scales and the E-SWAN scales at the extreme (pathologic) end of the trait (Supplementary Figure 1). A sample of reports from 250 parents on Prolific Academic yielded highly similar distributional properties, confirming that our findings were not specific to HBN (Supplementary Figure 2).
Distribution of E-SWAN Scores and Unidirectional Measure Scores. Distribution of scores from E-SWAN scales, shown in the top panel, compared with the distribution of scores from their unidirectional counterparts, shown in the bottom panel.
Descriptive Statistics and Measures of Concordance and Reliability. Measures of concordance and reliability are shown between each E-SWAN scale and its unidirectional counterparts for all participants, and for participants grouped by mean E-SWAN score.
3.2 Confirmatory Factor Analyses
A prerequisite for IRT analyses is the demonstration that the data being analyzed meet assumptions of sufficient unidimensionality. In this regard, scree plots were used to confirm that all E-SWAN measures, ARI, SCARED, and MFQ questionnaires met the assumption of unidimensionality (Supplementary Figure 3). CFA results are shown in Table 2. Some models showed residual correlations between items of similar content. All models showed good fit based on CFIand TLI (all values above 0.9), and most showed acceptable to good fit for RMSEA (all values <= 0.08) (Supplementary Table 1).
3.3 Item Response Theory Analyses
Figure 3 shows the Test Information Function curves and reliability for each scale. As can be seen in this figure, the E-SWAN scales provide reliable information across the full latent trait, from -3 to 3 standard units above the mean (reliability values 0.77-0.97 [supplementary tables 2–9]). The unidirectional scales only capture reliable information from 0 to 3 standard units above the mean of the latent trait.
Test Information Function Plots. Plots depict the overall performance of each E-SWAN scale and its unidirectional counterpart at measuring the latent trait. Blue lines represent areas of the curve where information is reliable.
The Item Information Curves in Supplementary Figure 4 show that this same relationship exists for the individual items that make up the E-SWAN and unidirectional questionnaires. The E-SWAN questions capture information across the full latent trait, while the unidirectional scales only capture information at the high end of the latent trait.
Supplementary Figure 5 shows an example of the Item Characteristic Curve for one question on the E-SWAN Depression subscale and one question on the MFQ. These curves show the probability of a particular answer choice being endorsed at each level of the latent trait. Supplementary tables 2-9 indicate at which point along the latent trait there is a 50% probability of transitioning to the next response choice.
ROC Curves. Receiver Operating Characteristic curves representing diagnostic capabilities of both E-SWAN and unidirectional scales for DMDD, Depression, and Social Anxiety.
3.4 Predictive Value for DSM Diagnosis
A key concern that may arise regarding the E-SWAN is whether the resulting scales have comparable predictive value for DSM diagnoses relative to previously established unidirectional scales. Given the high correlation of scores at the high end of the latent trait, one would expect this to be the case. To test this, we generated ROC curves for all scales using diagnoses generated from the K-SADS (Figure 4). Both E-SWAN and traditional scales performed well (AUC values 0.7-0.89), indicating that they are comparable screening tools and giving increased support for the validity of the E-SWAN questionnaires.
4. DISCUSSION
Inspired by the SWAN, we developed and tested a generalized framework for constructing questionnaires to assess the full range of behavior defined by DSM symptoms, when considered as an endpoint of a dimension. Consistent with the SWAN, each of the E-SWAN questionnaires were constructed to be bidirectional, i.e., indexing both strengths (abilities) and weaknesses (disabilities). When compared to the unidirectional scales, the E-SWAN scales exhibited distributional properties that were near-normal rather than highly skewed or truncated. As predicted, for each trait, a strong correspondence was noted between the E-SWAN scores and traditional scale scores among individuals at the high (pathological) end, but not at the low end. IRT analyses suggested that in contrast to traditional scales, the E-SWAN subscales exhibited good discrimination and reliability across the full latent trait (z-scores from −3 to +3; reliabilities ranging from 0.77 to 0.97) — not just at the high end. Finally, we demonstrated the ability to generate self-report questionnaires using the E-SWAN framework. Consistent with the data from Healthy Brain Network participants, our online sample from Prolific Academic yielded a near-normal distribution, although shifted slightly to the left (i.e., less symptomatic), as would be expected given the differences in recruitment strategies (online crowdsourcing community vs. community-referred based on the presence of behavioral concerns).
The ability to meaningfully and reliably catalog variance across the entirety of a population becomes more important as biological and epidemiologic studies shift away from categorical, syndromic characterizations of psychiatric illness. Efforts such as the NIMH Research Domain Criteria Project have successfully drawn attention to the potential added value of dimensional characterizations that cut across the diagnostic boundaries specified by DSM and ICD3. Yet the vast majority of questionnaires focused on mental health are limited in their ability to characterize variation among individuals beyond the symptomatic segment of the population1,2. As demonstrated in the present work, the E-SWAN framework offers a viable alternative for improving our ability to differentiate individuals that are non-symptomatic in a given domain. It does so without losing track of the clinical significance of identifying a pathological range of the trait (which is used as the departure point for characterizing the trait). The data from the E-SWAN are more statistically appropriate for dimensional analysis. While the present work used the total score obtained for a given scale as the unit of analysis, future work may benefit from consideration of individual item scores. Similar to overall questionnaire scores, individual items of the E-SWAN are intended to represent a bidirectional dimension. This property can be particularly valuable for efforts focused on the identification of abilities that may have protective effects or confer resilience, as well as disabilities associated with impairment. For example, when evaluating individuals with equivalent symptom profiles (defined by one end of the underlying bidirectional dimensions), though differing outcomes, the presence or absence of strengths (on the other end of the dimensions — but not measured by traditional scales) may result in different outcomes. The E-SWAN scales can be used to capture such distinctions.
There are a number of limitations of the E-SWAN framework that suggest areas for improvement. First, a key assumption of the E-SWAN framework — that the underlying dimension of behavior described is bidirectional and normally distributed in the general population — may not hold for all disorders. While likely reasonable for most DSM disorders, some, such as PTSD and Substance Use Disorders, represent clear instances where this is not the case, as only a subset of the population has had exposure to trauma or a given substance11. For these disorders, the prompts and range of responses can be changed to create a distribution in a subset of the population defined by the presence of a particular exposure (e.g., stressor, substance use). Questionnaires for both of these disorders are under development (see eswan.org for current drafts). Second is the potential for biased reporting, which can arise from either a skewed perception of ‘other children the same age’ on the part of the informant, or a bias to see a child as more (or less) able than they are (e.g., the “Lake Wobegon Effect”)27. Arguably such biases are also present in unidirectional questionnaires, though centered more around ratings of frequency. As demonstrated with other questionnaire tools, one of the most promising ways of overcoming such bias is the collection of data from multiple informants.
As demonstrated in our comparison of results from Healthy Brain Network and Prolific Academic, the mean of the distribution obtained in a given sample can vary depending on the specific segment of the population sampled. Within a given study, this is not necessarily a problem. However, if not taken into account, such variation can lead to confounds or biases when attempting to compare or combine across studies, or to develop clinical cutoffs. An effective solution would arguably be the generation of appropriately-sized normative samples to serve as a reference. This need is not different from any other questionnaire. A positive aspect of the E-SWAN versus other questionnaires is that one can more easily compare distributional characteristics to understand differences among samples in their entirety.
In the spirit of collaboration and open science, all E-SWAN questionnaires are freely available for use and can be accessed at www.eswan.org, licensed Creative Commons (CC) BY 4.0 to encourage maximal dissemination and application of the questionnaires. It is our hope that other investigative teams will join in the effort to create the full range of E-SWAN questionnaires encompassing all major psychiatric syndromes.
Supplemental Figure Legends
Quantile Regression Plots. Quantile regression plots showing that the traditional scales vary as a function of the E-SWAN scores. Stronger correlations are seen between the traditional scales and the E-SWAN scales at the extreme (pathologic) end of the trait.
Prolific Academic Parent Report and HBN Parent Report. Plots comparing distribution of E-SWAN scores from HBN parent reports and online parent reports gathered through Prolific Academic. Both samples show similar distributions, with the online sample having a slightly lower mean.
Supplementary Figure 3. Scree plots. Scree plots indicate that all scales meet the assumption of unidimensionality required for CFA analyses.
Item Information Curves. Plots depict the performance of each item on each of the E-SWAN scales and its unidirectional counterpart at measuring the latent trait.
Sample Item Characteristic Curves. An example Item Characteristic Curve from the E-SWAN Depression scale, and the MFQ are shown. Lines represent the probability of endorsing a specific response choice at each area of the latent trait. For the sample item from the E-SWAN (item 7 of E-SWAN Depression scale) there is a 50% estimated probability that a participant will transition from an answer of ‘Far Below Average’ to ‘Below Average’ or higher in this item when latent trait levels are 2.06 standard units below the mean, 50% probability of transitioning from ‘Below Average’ to ‘Slightly Above Average’ when latent trait levels are 1.24 standard units below the mean, and 50% probability of transitioning from ‘Slightly Below Average’ to ‘About Average’ when latent trait levels are 0.7 standard units below the mean. Categories of ‘Slightly Above Average’ or higher, ‘Above Average’ or higher and ‘Far Above Average’ are endorsed with a 50% probability when latent trait levels are 0.6, 1.37, and 2.18 standard units above the mean, respectively. For the sample item of the MFQ (item 3 of the MFQ), there is a 50% estimated probability that a participant will transition from an answer of ‘Not True’ to ‘Sometimes’ or higher when latent trait levels are 1.58 standard units above the mean, and 50% probability of transitioning from ‘Sometimes’ to ‘True’ when latent trait levels are 3.49 standard units above the mean.
Supplementary Table Legends:
Measures of Fit for Confirmatory Factor Analysis.
Item Response Theory Parameters for E-SWAN Depression Scale.
Item Response Theory Parameters for E-SWAN Panic Disorder Scale.
Item Response Theory Parameters for E-SWAN Social Anxiety Scale.
Item Response Theory Parameters for E-SWAN DMDD Scale.
Item Response Theory Parameters for ARI.
Item Response Theory Parameters for MFQ.
Item Response Theory Parameters for SCARED Panic Disorder.
Item Response Theory Parameters for SCARED Social Anxiety.
Footnotes
No competing interests to report