Innovations In Clinical Neuroscience

NOV-DEC 2017

A peer-reviewed, evidence-based journal for clinicians in the field of neuroscience


Innovations in Clinical Neuroscience • November–December 2017 • Volume 14 • Number 11–12 • Original Research

Measures. The PANSS [17] is a 30-item rating instrument comprising three subscales: the seven-item Positive Symptoms subscale (P1–P7), the seven-item Negative Symptoms subscale (N1–N7), and the 16-item General Psychopathology subscale (G1–G16). All 30 items are rated on a seven-point scale (1=absent to 7=extreme). Currently, there are over 40 official language versions of the PANSS. Translations have been carried out per international guidelines, through collaborations between specific sponsors and translation agencies in the geo-cultural groups concerned. Translation standards for the PANSS follow internationally recognized guidelines, with the objective of achieving semantic equivalence as outlined by the Multi-Health Systems translation policy.

All raters participating in the 16 clinical trials received rater training and certification on the PANSS prior to conducting PANSS assessments. Processes for rater training differed across studies, but all raters received didactic training overseen by a PANSS subject matter expert. Didactic training on the PANSS consisted of a detailed overview of each PANSS item and its anchor points. Following the overview, all raters were required to view and score a PANSS "Gold Score" video, a recorded interview of a rater conducting a structured clinical interview with either a patient with schizophrenia or an actor trained to portray a patient with schizophrenia. The rater's scores on the interview were then compared with the consensus scores of two or more expert raters. To receive certification, a rater was expected to have an intra-class correlation (ICC) of at least 0.80 with the Gold Score ratings. For some studies there might have been exceptions to the ICC≥0.80 requirement, based on rater qualifications and experience.
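The ICC≥0.80 certification threshold can be illustrated with a minimal ICC(2,1) computation (two-way random effects, absolute agreement, single rater). This is the standard ANOVA-based formula, not code from the studies themselves, and the trainee/Gold Score ratings below are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: array of shape (n_items, n_raters) -- here, PANSS item
    scores from a trainee rater and the Gold Score consensus.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-items
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between-raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical trainee vs. Gold Score ratings on ten PANSS items:
gold =    [4, 3, 5, 2, 1, 4, 3, 2, 5, 4]
trainee = [4, 3, 4, 2, 1, 4, 3, 3, 5, 4]
icc = icc_2_1(np.column_stack([trainee, gold]))
print(f"ICC = {icc:.2f}, certified: {icc >= 0.80}")
# prints: ICC = 0.94, certified: True
```

A trainee who disagrees by one point on two of ten items still clears the 0.80 bar here; larger or more systematic disagreements would not.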
Specific inter-rater reliability values within and across studies were not available.

The categorization of data was based on country, culture, and language. Because a minimum of 100 subjects per group is recommended for performing DIF analysis [22], attention was paid to the number of subjects per country. Additionally, an attempt was made to match groups to raters who were more likely to share their language and culture. To the extent possible, given the available sample size, individual countries were maintained as individual categories. The resulting categories, and the rationales for combining multiple countries into single categories, are presented in Table 1. Despite its heterogeneity of language and culture, the United States (US) was kept as a separate category for several reasons. First, Gören's study [23] examining the most culturally diverse countries in the world places the United States near the middle of all countries; although New York and San Francisco are within the top 10 most culturally diverse cities, the only Western country ranked in the top 20 most diverse countries is Canada [23]. Second, the original scale development of the PANSS occurred in the United States, and its psychometric properties were validated in that population by a diverse group of United States raters [17]. Additionally, our team used the United States as the reference group in a previous DIF analysis of the PANSS [18].

Statistical analysis. We first conducted an exploratory factor analysis (EFA) to determine whether the dataset adhered to the seven PANSS negative symptom factor (NSF) items (i.e., N1 Blunted Affect, N2 Emotional Withdrawal, N3 Poor Rapport, N4 Passive Social Withdrawal, N6 Lack of Spontaneity and Flow of Conversation, G7 Motor Retardation, and G16 Active Social Avoidance). Next, we performed a confirmatory factor analysis (CFA) on the seven NSF items for the entire dataset.
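The country-categorization rule described earlier (keep a country as its own category when it meets the recommended 100-subject minimum, otherwise fold it into a regional category) can be sketched as follows. The subject counts and the regional mapping here are hypothetical, not the actual trial data.

```python
from collections import Counter

# Hypothetical per-subject country labels; real counts come from the trials.
countries = ["US"] * 250 + ["France"] * 120 + ["Denmark"] * 40 + \
            ["Finland"] * 35 + ["Norway"] * 30 + ["Sweden"] * 45

# Assumed regional mapping for countries too small to stand alone:
REGION = {"Denmark": "Nordic", "Finland": "Nordic",
          "Norway": "Nordic", "Sweden": "Nordic"}

MIN_PER_GROUP = 100  # recommended minimum for DIF analysis [22]

counts = Counter(countries)
groups = Counter()
for country, n in counts.items():
    # Keep well-populated countries as individual categories;
    # roll small ones into their regional category.
    key = country if n >= MIN_PER_GROUP else REGION.get(country, country)
    groups[key] += n

print(dict(groups))  # {'US': 250, 'France': 120, 'Nordic': 150}
```

Each resulting category then clears the 100-subject minimum, mirroring the combinations reported in Table 1.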
The Kaiser-Meyer-Olkin (KMO) measure was used to evaluate whether the sampling was adequate; Kaiser [24] recommends 0.50 as the minimum (barely acceptable) KMO value, with values between 0.70 and 0.80 considered acceptable and values above 0.90 considered excellent. We also assessed Bartlett's test of sphericity for significance (p<0.05), which indicated rejection of the null hypothesis. The following goodness-of-fit indices were computed and used for model evaluation: the chi-square difference test; the comparative fit index (CFI; values >0.90 represent acceptable fit); the Tucker-Lewis index (TLI; values >0.90 represent acceptable fit); the root mean square error of approximation (RMSEA; values <0.05 represent acceptable fit); and the goodness-of-fit index (GFI; values >0.90 represent satisfactory fit) [25–27]. CFA and chi-square difference tests were conducted using SPSS 23.0 [28] and R [29].

We investigated the validity of the PANSS expressive-experiential distinction across 15 countries or geographical regions, each compared with the United States: South America-Mexico; Austria-Germany; Belgium-Netherlands; Brazil; Canada; the Nordic region (Denmark, Finland, Norway, and Sweden); France; Great Britain; India; Italy; Poland; Eastern Europe (Romania, Slovakia, Ukraine, Croatia, Estonia, and the Czech Republic); Russia; South Africa; and Spain. The Mantel-Haenszel statistic was used in the DIF analysis because it creates meaningful comparisons of item performance for different geographical regions by comparing raters assessing subjects in similar countries, rather than by comparing overall group performance on an item. For DIF of the expressive and experiential deficit items, the expectation is that individual item responses have a probability of p≤0.05 in accordance with the Rasch model; α is the type I error for a single test (incorrectly rejecting a true null hypothesis).
Thus, when the data fit the model, the probability of a correct finding is (1-α) for one item and (1-α)^n for n items; consequently, the familywise type I error for n independent items is 1-(1-α)^n, and the significance level for each single test is set at α/n. For example, to reject the hypothesis that "the entire set of items fits the Rasch model" at p≤0.05 for the four items on the expressive factor and the three items on the experiential factor, at least one item would need to be reported with p≤0.013 and p≤0.017, respectively. Subjects were matched by severity level on the PANSS and grouped by geographic region; because DIF analysis only allows for two groups per comparison, each region was compared with the United States. DIF testing was based on the chi-square statistic and is highly sensitive to sample size [30]: if the sample size is large, statistical significance can emerge even when DIF is quite small. DIF effect sizes can be investigated to alleviate this concern, because although statistical significance is necessary for an item to demonstrate DIF, it is not sufficient. Zumbo et al [31] note that an item demonstrates DIF only if the significant difference in chi-square has at least a moderate effect size (0.30–0.79). Therefore, three criteria were used to flag items as differentially functioning: 1) a statistically significant chi-square test statistic (p≤0.05), 2) effect size (ES), and 3) Educational Testing Service (ETS) DIF classification criteria. Because a statistically significant test statistic does not indicate that the magnitude of the DIF is meaningful [32], a review of both the effect size
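As a concrete sketch of this flagging logic, the following implements a generic continuity-corrected Mantel-Haenszel chi-square for one dichotomized item stratified by matched severity level, then applies the Bonferroni-adjusted significance criterion (α/n) together with the moderate-effect-size criterion. The stratified counts, the dichotomization, and the effect-size value are all hypothetical; the actual analyses used polytomous PANSS items and the ETS classification rules.

```python
from scipy.stats import chi2

def mantel_haenszel_chi2(tables):
    """Mantel-Haenszel chi-square (continuity-corrected), p-value, and
    common odds ratio over 2x2 tables, one per matched severity stratum.
    Rows: focal region vs. reference (United States); columns: item
    scored high vs. low (dichotomized for this sketch only)."""
    num = var = or_num = or_den = 0.0
    for (a, b), (c, d) in tables:
        t = a + b + c + d
        num += a - (a + b) * (a + c) / t  # observed minus expected count
        var += (a + b) * (c + d) * (a + c) * (b + d) / (t ** 2 * (t - 1))
        or_num += a * d / t
        or_den += b * c / t
    stat = (abs(num) - 0.5) ** 2 / var
    return stat, chi2.sf(stat, df=1), or_num / or_den

def flag_dif(p, effect_size, n_items, alpha=0.05):
    """Flag an item only if it is significant at the adjusted level
    alpha/n_items AND shows at least a moderate effect size (>=0.30,
    per the 0.30-0.79 range of Zumbo et al. [31])."""
    return p <= alpha / n_items and effect_size >= 0.30

# Identical response patterns in every stratum -> no DIF expected:
strata = [[[30, 20], [30, 20]], [[15, 35], [15, 35]], [[40, 10], [40, 10]]]
stat, p, or_mh = mantel_haenszel_chi2(strata)
```

With α=0.05, the four expressive-factor items yield a per-item threshold of 0.05/4 = 0.0125 (reported as p≤0.013), so an item with p=0.02 is not flagged even if its effect size is moderate.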
