Revisiting validity in cross-cultural psychometric test development: A systems-informed shift towards qualitative research designs



The validity of clinical psychological tests remains a challenging issue, especially when the tests are applied in cultural groups other than those for which they were originally developed. Notwithstanding impressive developments over the past few decades in the field of clinical cross-cultural psychometric test development, reaching a high point in the International Test Commission (ITC) Guidelines on Adapting Tests, several authors point out the difficulties in obtaining validity or equivalence for adapted measures, particularly construct validity.1-6
With regard to terminology, validity usually refers to a test's ability to measure what it is supposed to measure and to capture the true meaning of the concept. It includes construct-, content- and criterion-related validity.1,7 The term equivalence indicates whether an adapted or translated test reliably measures what the original version measured, and in the same way.2,3,8,9 A related term, test fairness, refers to the comparability of score meanings across individuals, groups or settings.10 Closely linked to the issue of validity is the issue of bias, which detracts from the validity or equivalence of an adapted test.3,4 Societal values are also argued to underpin validity.11
This article focuses on the question of how to enhance validity in clinical cross-cultural psychometric test development, and explores a possible solution based on systems theory by building on the work of Kitayama.12 The scope of the article is limited to the field of clinical psychology, and excludes other fields where cross-cultural psychometric testing is used, e.g. industrial psychology or the human resources context. In the present scope, psychological or psychometric tests are taken to include measures of intelligence, personality inventories, projective tests, and self-report measures of various psychological variables such as anxiety.
We track some of the progress in the field of clinical cross-cultural psychometric test development before and since the appearance of the ITC Guidelines. We then propose a systems-informed paradigm shift and suggest that qualitative research designs might be used more frequently in clinical cross-cultural psychometric test development.

Before the ITC Guidelines
Until the early 1990s the adaptation of tests for cross-cultural use was characterised by a relatively low level of methodological sophistication in comparison with the sophisticated test-development, validation and norming techniques used for the original monolingual tests.8 In cross-cultural test development various combinations of approaches have been used to adapt tests, including translation, back-translation, decentring, adaptation of test content or administration procedures, and establishing culture-specific norms.6,9,10,13-20

In some previous test-development studies respondents' cultural contexts were not taken into consideration, whereas in others constructs were not clearly defined.10 Some studies failed to account for the effects of confounding variables such as gender, sexual orientation, religion, ethnicity, or educational and socio-economic factors.21,22 More importantly, there appears to have been hardly any attempt to address issues relating to societal bias and value judgements in clinical cross-cultural test adaptation, e.g. cross-cultural differences in the evaluation of sensitive topics such as honesty, work ethic, aggression, immorality, or sensitivity to gender-specific experiences.
The work of the ITC has aimed to reduce the impact of some of these limitations, mainly by drawing attention to the need to establish the equivalence of constructs as used in different cultures and the need to document evidence that sources of potential bias were considered in the process of test adaptation.5

The ITC Guidelines on Adapting Tests
The ITC project aimed to prepare and disseminate a set of validated guidelines for adapting tests and psychological instruments, and establishing score equivalence across cultural groups. Twelve psychologists from eight international organisations collaborated in developing 22 guidelines on adapting tests. Although the ITC represents the broad field of psychology as a whole and does not specifically serve the interests of clinical psychology, the guidelines do apply to the field of clinical psychology as well.
The guidelines cover the following areas: cultural context, test development and adaptation, test administration, and documentation/score interpretations. Each guideline is described by a rationale for including the guideline, steps for addressing the guideline in practice, a list of common errors, and a set of references for additional study.5,8,23
In summary, the ITC Guidelines on Adapting Tests encourage test adaptors to provide evidence that they have done the following:
• accounted for cultural and linguistic differences between the populations studied
• listed the changes made to the test
• used judgemental (i.e. evaluative) methods to ensure equivalence of all language versions
• checked for linguistic equivalence among all language versions
• tested statistically for construct equivalence among all language versions
• tested statistically for item equivalence among all language versions
• identified problematic or non-equivalent components of the test
• tested the validity of the test in the target populations
• checked that testing formats, techniques and procedures are familiar to all intended populations
• standardised administration procedures across all populations for whom the test was intended
• prepared appropriate materials and instructions in both/all languages for all intended populations
• minimised the role of the test administrator
• interpreted differences in scores by different populations in the light of other empirical evidence and the socio-cultural and ecological contexts of the populations.
In building on these guidelines for test adaptation, numerous methods have been suggested to improve equivalence, enhance validity and reduce various sources of bias. Forward-adaptation designs have been distinguished from backward-adaptation designs, suggestions have been offered for the selection criteria and training of translators, and three sources of bias (construct, method and item bias) have been identified, along with statistical procedures that detect their presence and assess equivalence. Furthermore, the ITC Guidelines on Adapting Tests have been field-tested in actual test-adaptation projects and the guidelines revised accordingly.1,3,4,7,9,23
In addition to the Guidelines on Adapting Tests, the ITC also published the International Guidelines for Test Use, according to which competent test users or administrators should take responsibility for ethical test use by acting in a professional and ethical manner, ensuring that they have the competence to use tests, keeping test materials securely, and treating results confidentially.24 Good practice in the use of tests further includes evaluating the potential utility of testing in an assessment situation, choosing technically sound tests that are appropriate for the situation, giving due consideration to issues of fairness in testing, making necessary preparations for the testing situation, administering the tests properly, scoring and analysing test results accurately, interpreting results appropriately, communicating the results clearly and accurately to relevant others, and reviewing the appropriateness of the test and its use.24 This set of guidelines for test use appears to be clearer, better refined and more comprehensive than the guidelines on adapting tests. This difference might reflect the greater complexity and difficulties around developing valid psychological tests for cross-cultural application.
The ITC has certainly shaped and streamlined current cross-cultural test-adaptation practice. However, it appears as if the emphasis has remained on quantitative methods.23 This focus on quantitative methods, by failing to account for all relevant variables, may have contributed to the neglect of one of the first-mentioned guidelines on adapting tests, guideline D.1, which insists that the adaptation process take 'full account of linguistic and cultural differences among the populations for whom adapted versions of the test are intended'.5,23 Despite attempts such as those by Jeanrie and Bertrand, who distilled from the ITC guidelines three main principles, the first and second of which emphasise the importance of construct validity and judgemental decisions, it seems that the meaning and purpose behind the concepts are often still being neglected.7 For instance, Van de Vijver and Poortinga acknowledge that construct and method bias are often not examined sufficiently.4
Working in a geographically and linguistically more isolated context, South African clinical psychometric test developers have had to deal with a challenging multicultural context and limited professional resources. These factors have influenced progress in the South African context.

The South African approach to test adaptation
Multicultural South Africa can be considered an especially challenging context for clinical psychometric test development because of the 11 official languages; the large variety of cultural, ethnic and religious belief systems and worldviews; the large rural population with little formal education; and widespread socio-economic deprivation, all of which contribute to 'culture'. Moreover, an individual may represent more than one cultural group and speak several languages, which can complicate testing. Furthermore, the local realities of less-than-ideal working conditions, a shrinking clinical workforce, an increasing workload on clinical psychologists, and inadequate language resources in public psychiatric service delivery contexts would constrain efforts to develop or adapt tests for all 11 official language groups.
Amid the above complexity, commendable test-adaptation work has been done in South Africa, e.g. the adaptation of the South African Wechsler Adult Intelligence Scale (SAWAIS) and the adaptation of clinical self-report measures (of depression, alcohol use and post-traumatic stress disorder) for a Xhosa-speaking population.6,14,22 On the whole, however, the frequency of South African academic publications on clinical psychometric test development has declined over the past few decades and theoretical developments have lagged behind.25 We believe that there is a need in South Africa for a greater number of valid tests of psychological constructs other than intelligence, such as personality inventories, projective tests and self-report measures of emotional variables such as anxiety and anger, for use in all cultural groups. With regard to personality, a promising local project in the industrial psychology context to develop a comprehensive questionnaire to assess personality among all South African language groups should be noted.26
Although the local practical and social constraints are not insurmountable, there are no guarantees that following the ITC Guidelines would necessarily yield cross-culturally valid, equivalent tests that are fair to all cultural groups. In particular, unresolved potential differences in cultural subgroup-based values, especially values concerning some of the sensitive topics mentioned earlier, might diminish progress. Perhaps most importantly, the question arises whether an adapted test (one that was not originally developed for a specific cultural group, and in which the intended meaning is not clearly embedded in the particular target-cultural context) could ever be regarded as a valid and useful instrument for that cultural group.
Might the solution lie in stepping back to a point where clinical cross-cultural psychometric test development is conceptualised from a different theoretical framework?

A systems-informed paradigm shift
The following section explores the potential benefits of applying systemic thinking in the field of clinical cross-cultural psychometric test development. In particular, we draw from the influence that systems theory has had in the field of family therapy, where the world is seen in relational, reciprocal terms, and where the concern is as much about the process as the content of a given relationship.27-32 Closer to the present cross-cultural focus, systemic family therapy has been shown to facilitate cultural sensitivity.33
More specifically, systems theory has been applied to the field of cross-cultural psychology before. For instance, Kitayama comments on the thorough review and meta-analyses conducted by Oyserman et al. of the proliferation of cross-cultural psychometric studies on the cultural values of individualism and collectivism.12,34 Kitayama expresses concern about the appropriateness of using attitudinal surveys to measure cultural values, and questions the utility of the underlying entity view of culture. Kitayama then presents an alternative meta-theory, a system view of culture, according to which the future of the study of culture in psychology might lie in analysing culture-dependent functional relations among variables rather than considering these variables in isolation.12 He stresses recognition of the dynamic interplay between psychological tendencies and social situations/cultural contexts. The functional relationships among variables might be interpreted as explanatory hypotheses, e.g. about two different ways in which social support might give rise to happiness, hypotheses that may be tested empirically.12 The dynamic interplay between psychological tendencies and cultural context might be expressed in research designs where descriptions of social situations are used to study the psychological reactions of people from different cultures.12
We concur with Kitayama that a systemic approach holds certain benefits for the field of cross-cultural psychological research. However, our appreciation of the benefits of systems thinking is seated at another level. We understand Kitayama's theoretical framework to remain positivist and propose a shift towards an interpretivist, even a critical, paradigm where researchers or test developers are open to the emergence of new meaning as created by participants from various cultures and where researchers collaborate cross-culturally in the interests of emancipation from the results of invalid research.
An application of certain systemic concepts helps us to make this paradigm shift. In particular, the characteristics of neutrality, circularity and relationality are worth considering for their potential value in the process of developing psychometric tests.
One of the characteristics of therapies based on systemic theory is the stance of neutrality that these approaches encourage, according to which the therapist models an attitude of respect, non-discrimination and acceptance of each system member's perspective as valid.27,29-32 It also implies an attitude of openness to the emergence of new and multiple layers of meaning. We should bear in mind, however, that neutrality may be misconstrued as the therapist (or, in our context, the test developer) hiding behind a mask of objectivity and disregarding the creative input of the system members. For this reason, the term 'neutrality' might carry lingering overtones of objectivistic positivism. On the contrary, what matters here is critical self-awareness on the side of test developers of their own and others' cultural biases. Hence, the application of systemic concepts could perhaps be supplemented in this instance by the phenomenological idea of highly reflexive intersubjective intentionality.35
Systemic authors also emphasise circular thinking, according to which interactional patterns are identified and potentially altered to improve relationships.30,31 Hence, one of the aims in systemic family therapy is to understand and explain any selected item of behaviour by viewing it in relation to its wider context of social interaction. Multiple individual perspectives provide a complex holistic picture.31,32 Applying circularity would mean engaging in a process of to-and-fro communication between the test developer, target group and other relevant parties, creating a feedback loop for mutual information and progressive communication. For example, a multidisciplinary, multicultural test-developing team might consist of a mental health care worker, a cross-cultural psychologist, a linguist, a psychometrist, a researcher and one or more members of the target culture (i.e. insiders from the relevant cultural group). The expertise provided by all these different disciplines would bring a variety of different, possibly conflicting perspectives and values, which would allow multiple layers of meaning to emerge.
Relationality in the context of psychometric test development would imply a reciprocal relationship between the test-developing team and the target group participants. Test developers cannot exclude their own impact on the process as if they were objective outsiders. For this reason, test developers should not develop tests in isolation without engaging with the target group. Instead, a systems-informed process of test development would be characterised by consultation and collaboration with the target group before embarking on test-adaptation or test-development activities. The test-developing process should be informed by the needs and belief systems of that specific community and the cultural context in which they live. A participant-observer stance would equip the test-developing team to interpret phenomena through the eyes of the participants.
Focusing more on process than on content issues amid conflicting perspectives would encourage the natural unfolding of creative potential in test development. The test developers would no longer concern themselves with following the 'correct' procedures as per Western guidelines, but would trust that the process would be appropriate for the 'target' group. Note that 'target' is becoming an inappropriate term if one considers the relatedness between the test developer and the participants in the context of a test developer-participant team.
Without such multidisciplinary, multicultural teamwork there would be a risk of role diffusion. For example, if one individual served simultaneously as the translator and as a member of the 'target' cultural group, this individual would stand with one leg in the Western psychometric tradition (as translator) and the other leg in the 'target' culture. The individual's psychometric training would already imply a personal history of westernisation and exposure to the accompanying positivistic research paradigm, and hence a contamination of his/her role in representing the interests of the 'target' cultural group. Although one would have hoped that a translator from the 'target' cultural group would notice and react if a specific concept did not translate easily, the quality of the outcome might be better if done through multidisciplinary, multicultural teamwork where different roles are clearly defined.
The outcome of the proposed test-development process would hopefully be a tailor-made psychometric test that is a valid measurement of the identified construct in that cultural group. Note that regular itemetric analyses would still have a role in contributing to a more comprehensive process of validation and reliability testing.5 One potential problem could be that such a newly developed test might turn out to be very different from the tests developed in or for other population groups, which would jeopardise or even preclude quantitative cross-cultural comparisons. The question may then arise how one could go about doing cross-cultural comparisons in the absence of cross-culturally equivalent and valid tests. One might even wonder what the point would be of doing quantitative cross-cultural comparisons using tests that are not truly valid for all cultures concerned.
On a more general level, one might wonder about the compatibility of two approaches: on the one hand celebrating cultural diversity and on the other hand taking into account cultural differences with a view to standardising some of the psychological tools that have become very valuable in modern industrialised societies. In response to a comment by Van de Vijver and Leung that '… it is naive to assume universality …' (p.146),9 perhaps the solution lies in making a qualitative shift. In the light of our growing appreciation of the rich diversity of human experience and the complex nature of the relationships between individual psychological characteristics and socio-cultural context, a shift towards using more qualitative methods in clinical cross-cultural psychometric test development might help researchers get to the meaning of the concepts that are being studied cross-culturally.

Limitations and conclusions
The challenge of enhancing validity in the field of clinical cross-cultural psychometric test development appears to have been fuelled by, inter alia, societal biases and value judgements, a positivistic paradigm, and challenges associated with multiculturalism that influenced cross-cultural research methodology. As a result, an overemphasis on quantitative methods and insufficient exploration of the meaning of the concepts to be measured have tended to threaten construct validity.
It is proposed that the field of clinical cross-cultural psychometric test development may develop more effectively if a shift could be made to include the use of systemic concepts in the design of cross-cultural studies. Such a systems-informed paradigm shift would involve the application of systemic concepts such as circularity, relationality, neutrality and a concern with process issues. This shift would help to ensure that the meaning of the relevant concept/s is captured in a valid way for each cultural group.
The approach suggested above could have a number of limitations. Although a multidisciplinary, multicultural test-developing team would be the ideal, it might not always be possible to obtain all the required members because of geographical and organisational constraints. For example, despite diversity development in South Africa, the field of clinical psychometric test development is small and highly specialised and not all cultures are yet represented in the field. At a professional level, this article might have implications for the training of test administrators and interpreters, who would ideally undergo cross-cultural sensitisation or training to be culturally sensitive to, if not knowledgeable about, the 'target' cultural group.
With regard to revision of the ITC Guidelines on Adapting Tests, our appeal would be to expand the guidelines to recommend a phase of pre-test-development consultation with members of the 'target' cultural group/s, with a special focus on the deeper meaning of the construct to be measured and its associated value judgements.
We propose that qualitative research designs be used more frequently in clinical cross-cultural psychometric test development in order to advance theoretical development in the field. This follows from increasing calls for a greater emphasis on the theoretical and conceptual aspects of cross-cultural test adaptation (rather than the psychometric aspects), as well as on construct equivalence.2,9 Emphasis would be better placed on building theory and generating hypotheses, in order to pursue a deeper understanding of the constructs under investigation. Not only are qualitative methods well suited to facilitate the interpretation of the meaning of the findings, but they also facilitate '… explanations that acknowledge the complexity, interdependence and mutual causality of phenomena, and the possibility of different interpretations and perspectives on reality' (p.49).36 Moreover, qualitative methods can serve as a means whereby theory itself can be developed or generated.36 Furthermore, qualitative designs would facilitate an ethical approach to 'target' cultural groups, ensuring that their voices are heard. Finally, qualitative approaches would also bear out the paradigm shift promoted earlier, away from the positivist paradigm that appears to have limited theoretical advances in the field of clinical cross-cultural psychometric test development.
What remains to be decided by individual test developers, however, is which qualitative designs and methods would best suit a specific project. For example, in order to develop cross-culturally valid subjective measures of depression for all language groups, which qualitative design type should be employed in the initial stages of the process to gain clarity on the meaning of the concept 'depression' in each group: a phenomenological, grounded theory, ethnographic, ethno-methodological, discourse analysis, instrumental case, or action research study? Each of these design types would yield unique empirical findings that could inform the subsequent quantitative stages of the test-development process in different ways.
Note that quantitative psychometric methods, including itemetric analyses, would not become obsolete. On the contrary, their role and value would be enhanced when construct validity is less of a problem. However, as mentioned earlier, unanswered questions remain about whether or not cross-cultural group comparisons would be meaningful if such initial qualitative studies demonstrate divergent meanings attached to the concept in question among different cultural groups.
In conclusion, this article presents a limited viewpoint on how systems theory might influence the process of clinical cross-cultural psychometric test development. Owing to space constraints, further exploration of examples of how the suggested procedures could be carried out and how such procedures would address the issues raised cannot be undertaken here. The actual application of the proposed approach to test development (including the procedures for how qualitative methods and the systemic approach would be applied to the test construction process) and