Balancing Strengths and Weaknesses in Dimensional Psychiatry

Objective: To evaluate the feasibility and value of creating an extensible framework for psychiatric phenotyping that indexes both strengths and weaknesses of behavioral dimensions. The Extended Strengths and Weaknesses Assessment of Normal Behavior (E-SWAN) reconceptualizes each diagnostic criterion for selected DSM-5 disorders as a behavior, which can range from high (strengths) to low (weaknesses). Initial efforts have focused on Panic Disorder, Social Anxiety, Major Depression, and Disruptive Mood Dysregulation Disorder.
Methods: Data were collected from 523 participants (ages: 5-21 years old) in the Child Mind Institute Healthy Brain Network, an ongoing community-referred study. Parents completed each of the four E-SWAN scales and traditional unidirectional scales addressing the same disorders. Distributional properties, Item Response Theory (IRT) analyses, and Receiver Operating Characteristic (ROC) curves (for diagnostic prediction) were used to assess and compare the performance of E-SWAN and traditional scales.
Results: In contrast to the traditional scales, which exhibited truncated distributions, all four E-SWAN scales were found to have near-normal distributions. IRT analyses indicated that the E-SWAN subscales provided reliable information about respondents throughout the population distribution; in contrast, traditional scales only provided reliable information about respondents at the high end of the distribution. Predictive value for DSM-5 diagnoses was comparable to prior scales.
Conclusion: E-SWAN bidirectional scales can capture the full spectrum of the population distribution for DSM disorders. The additional information provided can better inform examination of inter-individual variation in population studies, as well as facilitate the identification of factors related to resiliency in clinical samples.


INTRODUCTION
Myriad questionnaires are available for measuring psychiatric illness dimensionally. However, the vast majority are based on detection of the presence of problematic behaviors and symptoms. Although useful from a clinical perspective, the tendency to focus on only 'one end' of the distribution (i.e., the pathologic trait range) limits the ability of such tools to distinguish individuals from one another in less symptomatic or non-affected segments of the population (i.e., the distribution is truncated) 1,2. This failure to consider differences in strengths among individuals is particularly problematic for psychiatric research, where efforts to model brain-behavior relationships are increasingly turning to broader community and transdiagnostic samples 3.

The Strengths and Weaknesses Assessment of ADHD Symptoms and Normal Behavior (SWAN) provides a potentially valuable model for bidirectional questionnaire design 4. Rather than attempting to quantify only the presence of ADHD symptoms, the SWAN probes a range of behaviors to identify relative strengths (i.e., abilities, which are indicative of adaptive behavior) and weaknesses (i.e., disabilities, which are indicative of problems requiring clinical attention, such as ADHD). This was accomplished by: 1) converting each DSM-IV ADHD symptom into a behavior and 2) expanding the typical 4-point scale of symptom presence ("not at all" to "very much") to a 7-point scale ("far below average" to "far above average"). Numerous published studies have demonstrated that the SWAN generates bidirectional distributions that are near-normal [5][6][7]. Importantly, among individuals with ADHD symptomatology (i.e., a clinical sample), there is generally a high degree of agreement between the SWAN and traditional scales 1.
Here we report on the initial design and feasibility testing of the Extended Strengths and Weaknesses Assessment of Normal Behavior (E-SWAN), a framework that extends the general methodology of the SWAN to questionnaires for other psychiatric disorders. Consistent with the SWAN, the clinical wisdom embodied in the DSM-5 was taken as the departure point to develop each scale. Four DSM-5 disorders were chosen to provide a sampling of challenges that can arise in the conversion of DSM symptoms to dimensional probes. Major Depressive Disorder and Social Anxiety were chosen for their high prevalence in the general population 8,9. Disruptive Mood Dysregulation Disorder (DMDD) was chosen because this new disorder in DSM-5 does not have empirically defined criteria or many valid measures for assessing symptoms 10. Finally, Panic Disorder was chosen to determine the feasibility of applying this framework to a disorder with physiologic symptoms 11.
The present work makes use of initial data obtained in the Child Mind Institute Healthy Brain Network sample (ages: 5-21; N=523) that enabled comparison of E-SWAN results with those obtained using equivalent unidirectional questionnaires in the same individuals. Item response theory analyses are included to demonstrate the added value of the information obtained via the E-SWAN. Additionally, we obtained informant and self-report data via the Prolific Academic platform to verify the bidirectional distributional properties of the E-SWAN in an independent sample with distinct characteristics (n=250).

Questionnaire Construction: Process.
The present work focused on the development and testing of four E-SWAN questionnaires (Major Depression, Disruptive Mood Dysregulation, Social Anxiety, Panic Disorder) using a uniform method based on that previously employed for construction of the SWAN (see Figure 1 for the detailed workflow). First, each DSM-5 criterion was broken down to reflect specific symptoms that are core to each of the DSM-5 disorders. Second, each specific symptom was transformed into its underlying ability or behavior, i.e., the ability/behavior that, when impaired or dysfunctional, gives rise to the symptom. Lastly, each item was worded to be answered on a 7-point scale representing deviation from children of the same age, following the statement: "When compared to children of the same age, how well does this child…". The results from this process were discussed by a committee of experts, and the final versions were circulated to experienced clinicians for comments.

Questionnaire Construction: Considerations.
Level of detail and nuance. When converting DSM criteria to behaviors, we worked to ensure that question items capture the level of detail and nuance of the original criteria. This is essential, as even slight changes can affect how an item is interpreted and answered. The DMDD questionnaire provides an example. A key criterion of a DMDD diagnosis is that the behaviors must be present in more than one setting. Initially, we indicated in each question that the behavior being rated must have taken place in more than one setting. However, we found this to be problematic, as it required parents to think about a behavior over several settings at once and was not informative as to the specific setting(s) in which the behavior actually takes place. As a result, we changed the questionnaire to ask each question separately for each of the settings (home, school, and with friends).
We then ask the parent to rate how well their child is able to regulate the physiological symptoms while experiencing "a moment of intense fear or discomfort". This allows us to potentially capture what prevents a panic attack in one individual in the same context that elicits a panic attack in another individual.

PROMIS Guidelines.
In addition to the principles that we developed, we followed the PROMIS Instrument Development Validation Scientific Standards (http://www.healthmeasures.net/images/PROMIS/PROMISStandards_Vers2.0_Final.pdf), a set of guidelines proposed by NIH for the development of standardized assessments and rating scales. In particular, we focused on three PROMIS criteria: clarity, precision, and general applicability. Clarity means that each item is straightforward and easy to understand; not vague, confusing, or complex. To meet this goal, we used simple language characteristic of a fourth-grade reading level. Precision means that each question is specific, asking only about one behavior in one setting. We did not include multiple behaviors in one item. To meet this goal, several of the original questions were broken down into multiple questions. General applicability means that the questions do not require cultural or contextual knowledge.

CMI Healthy Brain Network.
Data were collected from 523 participants of the Healthy Brain Network (HBN; http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/), which is designed to be a sample of 10,000 children and adolescents from the New York City area, collected using a community-referred model that recruits based on the presence of behavioral concerns (REF).

Prolific Academic Sample.
To confirm the generality of distributional properties for the E-SWAN, additional data were collected from 250 parents through Prolific Academic (an online crowdsourced data collection tool) (https://www.prolific.ac/). Users were screened based on having a child in the 6.0-17.0 age range. Parents then completed four questionnaires anonymously through a Google survey. These respondents completed the E-SWAN questionnaires only, and were given a small monetary compensation.
For IRT analyses, we used Graded Response Models (GRM) to calculate each item's discrimination parameter, which indexes the strength of the relationship between the item and the latent trait, and each item's difficulty parameters, which index the region of the latent trait in which the item is most informative 19. All analyses were carried out in R using the following packages: lavaan 21, ltm 22, psych 23, quantreg 24, irr 25 and pROC 26.
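As a rough illustration of the two kinds of GRM parameters described above, the following Python sketch computes response-category probabilities for a single 7-point item under a graded response model. It is not the analysis code used in this study (which was run in R); the function name, the discrimination value, and the threshold values are hypothetical and chosen only for illustration.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- latent trait value (z-score scale)
    a          -- item discrimination (hypothetical value in the example)
    thresholds -- increasing difficulty/threshold parameters b_1..b_{K-1}
    """
    # Cumulative probability of responding in category k or higher;
    # padded with 1.0 (lowest category or higher) and 0.0 (above the highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 7-point item: six thresholds spread across the latent trait.
example_thresholds = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
probs = grm_category_probs(theta=0.0, a=1.8, thresholds=example_thresholds)
print([round(p, 3) for p in probs])
```

In this sketch, a respondent whose latent trait value equals a given threshold has a 50% probability of responding at or above the corresponding category, and a larger discrimination value makes that transition sharper.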

Distributional Properties. Consistent with the SWAN, all E-SWAN scales were approximately normally distributed, in contrast to their traditionally designed counterparts (Figure 2 and Table 1). Kendall's W, Cronbach's alpha and Pearson correlation coefficients showed significant concordance, reliability, and correlation between ratings on the E-SWAN subscales and their unidirectional counterparts, respectively. To examine the added benefit of the bidirectional scales, we split the E-SWAN data into those that scored at or above 0 (higher levels of symptomatology) and those that scored below 0 (low symptomatology; higher levels of strengths). Stronger correlations are seen between the traditional scales and the E-SWAN scales at the extreme (pathologic) end of the trait (Supplementary Figure 1). A sample of reports from 250 parents on Prolific Academic yielded highly similar distributional properties, confirming that our findings were not specific to HBN (Supplementary Figure 2).
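The split analysis described above can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; the function names are hypothetical, and the scores are invented solely to show the mechanics of correlating the two scales separately on each side of the E-SWAN midpoint.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_correlations(eswan, traditional, cutoff=0.0):
    """Correlate the two scales separately at/above and below the cutoff."""
    pairs = list(zip(eswan, traditional))
    high = [(e, t) for e, t in pairs if e >= cutoff]
    low = [(e, t) for e, t in pairs if e < cutoff]
    r_high = pearson_r(*zip(*high))
    r_low = pearson_r(*zip(*low))
    return r_high, r_low

# Invented toy data: the traditional scale sits near its floor
# whenever the E-SWAN score is below 0.
eswan_scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
traditional_scores = [0, 1, 0, 1, 2, 4, 6, 9, 11]
r_high, r_low = split_correlations(eswan_scores, traditional_scores)
print(f"r (E-SWAN >= 0): {r_high:.2f}; r (E-SWAN < 0): {r_low:.2f}")
```

With real data, each half would need enough respondents and enough variance on both scales for the correlation to be defined and stable.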

Confirmatory Factor Analyses. A prerequisite for IRT analyses is the demonstration that the data being analyzed meet assumptions of sufficient unidimensionality. In this regard, scree plots were used to confirm that all E-SWAN measures, ARI, SCARED, and MFQ questionnaires met the assumption of unidimensionality (Supplementary Figure 3). CFA results are shown in Table 2. Some models showed residual correlations between items of similar content. All models showed good fit based on CFI and TLI (all values above 0.9), and most showed acceptable to good fit for RMSEA (all values <= 0.08) (Supplementary Table 1). Figure 3 shows Supplementary tables 2-9 indicate at which point along the latent trait there is a 50% probability of transitioning to the next response choice.

Predictive Value for DSM Diagnosis.
A key concern that may arise regarding the E-SWAN is whether the resulting scales have comparable predictive value for DSM diagnoses relative to previously established unidirectional scales. Given the high correlation of scores at the high end of the latent trait, one would expect this to be the case. To test this, we generated ROC curves for all scales using diagnoses generated from the K-SADS (Figure 4). Both E-SWAN and traditional scales performed well (AUC values 0.7-0.89), indicating that they are comparable screening tools and giving increased support for the validity of the E-SWAN questionnaires.
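For readers less familiar with ROC analysis, the AUC can be read as the probability that a randomly chosen participant with the diagnosis scores higher on the scale than a randomly chosen participant without it. The Python sketch below computes the AUC in that pairwise form. It is an illustration only, not the pROC-based analysis used here; the function name is hypothetical, and the scores and diagnoses are invented.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as one half, which makes this equivalent to the
    Mann-Whitney U statistic divided by the number of case/non-case pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented example: scale totals and binary diagnosis (1 = diagnosis present).
scale_totals = [-12, -5, -3, 0, 2, 4, 7, 9, 11, 15]
diagnosis = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
auc = roc_auc(scale_totals, diagnosis)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from non-cases.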

DISCUSSION
Inspired by the SWAN, we developed and tested a generalized framework for constructing questionnaires to assess the full range of behavior defined by DSM symptoms, with each symptom considered as one endpoint of a dimension. Consistent with the SWAN, each of the E-SWAN questionnaires was constructed to be bidirectional, i.e., indexing both strengths (abilities) and weaknesses (disabilities). When compared to the unidirectional scales, the E-SWAN scales exhibited distributional properties that were near-normal rather than highly skewed or truncated. As predicted, for each trait, a strong correspondence was noted between the E-SWAN scores and traditional scale scores among individuals at the high (pathological) end, but not at the low end. IRT analyses suggested that, in contrast to traditional scales, the E-SWAN subscales exhibited good discrimination and reliability across the full latent trait (z-scores from -3 to +3; reliabilities ranging from 0.77 to 0.97), not just at the high end. Finally, we demonstrated the ability to generate self-report questionnaires using the E-SWAN framework. Consistent with the data from Healthy Brain Network participants, our online sample from Prolific Academic yielded a near-normal distribution, although shifted slightly to the left (i.e., less symptomatic), as would be expected given the differences in recruitment strategies (online crowdsourcing community vs. community-referred based on the presence of behavioral concerns).
There are a number of limitations of the E-SWAN framework that suggest areas for improvement. First, a key assumption of the E-SWAN framework, namely that the underlying dimension of behavior described is bidirectional and normally distributed in the general population, may not hold for all disorders. While likely reasonable for most DSM disorders, some, such as PTSD and Substance Use Disorders, represent clear instances where this is not the case, as only a subset of the population has had exposure to trauma or a given substance 11. For these disorders, the prompts and range of responses can be changed to create a distribution in a subset of the population defined by the presence of a particular exposure (e.g., stressor, substance use). Questionnaires for both of these disorders are under development (see eswan.org for current drafts). Second, there is the potential for biased reporting, which can arise from either a skewed perception of 'other children the same age' on the part of the informant, or a bias to see a child as more (or less) able than they are (e.g., the "Lake Wobegon Effect") 27. Arguably, such biases are also present in unidirectional questionnaires, though centered more around ratings of frequency. As demonstrated with other questionnaire tools, one of the most promising ways of overcoming such bias is the collection of data from multiple informants.

Item Construction
Extract symptom(s) from each DSM-5 criterion.
Transform each symptom into a corresponding behavior.
Consideration: Each symptom is considered as one end-point of a dimension (a disability), with an ability as the other end-point. Using the ability-disability dimension allows the questionnaire to capture both strengths and weaknesses of individuals.
Construct question item text to ensure that deviations reflect strengths or weaknesses based on wording.
Consideration: Using a standardized process to develop the questionnaires allows them to be used individually, or together as a set, and allows the development of future scales to have consistent language and formatting throughout.

Response Construction
Each item was worded so a response on a 7-point scale (-3 to 3, with 0 as a midpoint anchor) would represent deviation from the average child at the same age.
Consideration: The prompt for each questionnaire asks the parent to compare the child to other children his or her age. This eliminates the need to have questions about abilities with age restrictions, or to have several versions of the same questionnaire for different age ranges.
All questionnaires use the following prompt: "When compared to children of the same age, how well does this child"
All questionnaires use the following response choices: -3=Far above average; -2=Above average; -1=Slightly above average; 0=Average;