38 ICNS INNOVATIONS IN CLINICAL NEUROSCIENCE November-December 2017 • Volume 14 • Number 11–12 ORIGINAL RESEARCH
and India, the small number of DIF identified for these two items might be due to less subjectivity in interpretation on the part of the rater or to less variability in the presentation of these core, unmistakable features of the illness across most geographical regions.
There are several likely explanations for DIF among raters across and within diverse geographical locations. 45 One key reason might be variation across raters in measurement procedures and variability in the interpretation of measurement results. This variability in measurement procedures (e.g., PANSS administration, interview skill, interview environment) and in interpretation (i.e., scoring the PANSS NSF) implies that when differences occur once raters have agreed upon criteria for administering and scoring a symptom, they are the result of decision-making differences in the scoring of the item. 18
Since cultural differences cannot be standardized, the development of a standardized international PANSS training curriculum is not possible. However, training can be culturally adapted to manage these differences by supplementing the standard PANSS training with additional culture-specific training. Akin to the linguistic and cultural validation processes employed in the translation of rating scales, rater training could also include linguistic and cultural methodologies based on findings from cultural analysis of rating scales. For countries for which normative data are not available, this can be achieved by providing "cultural translations" of specific PANSS items, concepts, and symptoms. Such "cultural translation" could involve the employment of native culture-specific experts to provide detailed guidance on how specific items and concepts on the PANSS are manifested in their cultures.
For instance, when deploying rater training for negative symptom trials, training should be customized for geographical location, cultural and language norms, and expectations of what constitutes endorsing each anchor point for the items with large DIF. In particular, the training received by raters in the United States, India, Brazil, and other heterogeneous regions should capture within-region variability in language, culture, and social constructs.
Our study identified significant moderate-to-large DIF for items of the PANSS expressive deficit across geographical locations as compared with the United States. Dissimilar social interpretations due to geographical and cultural influences might lead to different ratings of social and emotional behaviors present in the PANSS expressive deficit, and these ratings should be subjected to interim item analysis throughout a clinical trial. Despite social, linguistic, and cultural differences between sites, large international clinical trials will continue to be conducted, and data from these trials will be combined to assess efficacy. For this reason, it is important to underscore that DIF does not denote that the scores provided by the raters are not appropriate for the culture, but that the interpretation of the anchor points as outlined in the PANSS can be further explored to lessen large scoring discrepancies among regions. For example, in a previous DIF study conducted by our group in which all subjects viewed the same PANSS interview video, we also found differences in the interpretation of anchors across geo-cultural regions. 18 The expectation of supplemental training is not to homogenize the understanding of a symptom, but rather to clearly define that symptom within a social and cultural context.
Limitations. The present study has some limitations.
First, we examined subjects with chronic schizophrenia who were screened for enrollment in various clinical trials and who were taking one or more antipsychotic medications. Consequently, this study's results are not generalizable to subjects in different illness courses, such as first-episode subjects or subjects who are not on an antipsychotic medication. Second, the analysis used data collected in 16 clinical trials that did not specifically focus on negative symptoms, although the overall Negative Symptom subscale score and the NSF score were higher than the overall Positive Symptom subscale score for this sample. Additionally, scores on the NSF ranged from 7 (lowest possible score) to 48 (highest possible score is 49). The baseline data from these 16 trials are also representative of individuals who enter multicenter international clinical trials. Third, this analysis focuses on raters from 15 geographical locations with varying levels of proficiency and experience in scoring the PANSS. Although all raters received rater training and certification prior to conducting PANSS assessments, training and certification processes differed across the 16 studies, and specific interrater reliability values were not available. Fourth, although this is a very diverse sample, it does not include every area in which clinical trials are commonly conducted (e.g., the Philippines). Additionally, some could argue that our groupings are themselves heterogeneous (e.g., Finland among the Nordic countries has language differences; grouping Mexico and South America together was not based on geographic location, but rather on language similarities). Fifth, as this study examines PANSS scores at baseline only and not longitudinally, treatment change was not addressed.
Sixth, our dataset did not contain the language in which the PANSS was administered, the specific site location within the geographical region, or rater information (e.g., experience level, qualifications). We recognize that these could influence the differences in scoring responses. Seventh, rater training could not be examined using the currently available data and should be addressed in future studies assessing cross-regional comparisons. Finally, we acknowledge that individuals with negative symptoms might not provide accurate information or enough information for adequate assessment of a symptom.
CONCLUSIONS
Following research conducted over the past 30 years, this study addressed how items of the PANSS expressive and experiential deficits function across cultures. Items of the PANSS expressive deficit show more DIF across 15 geographic regions, as compared with the items of the experiential deficit. These differences among geographical regions might be related to rater cultural interpretations, language differences, social experiences, probability of the subject endorsing negative symptoms, rater training, and/or subject geo-cultural variability. The results of this study could be useful in protocol development, rater training practices across geographical regions, and decision-making among clinicians and researchers. Furthermore, these results might highlight subtle phenomenological differences between expressive and experiential deficits that can be used to guide future research. Future efforts to develop scales assessing negative symptoms would benefit from examining whether a scale functions in the same way across regions, cultures, languages, and severity levels, and in relation to functional outcomes. Harvey et al 46 use these factor structures to examine their

