Invited symposia

Invited symposium 1 / Tuesday, 3rd July / 11.00-12.30 / Room: Grote zaal

The assessment of critical thinking: Cross-cultural and validity issues

Chair

Diane F. Halpern (Department of Psychology, Claremont McKenna College, CA, USA)

 

Symposium abstract

Blue Ribbon Panels such as the one assembled by the National Academy of Sciences in the United States, employers, accrediting agencies around the globe, teachers at all levels, and even social media "buzz" all agree that the key skill for citizens in the 21st century is critical thinking. With knowledge proliferating at an exponential rate and fewer jobs that rely on solving routine problems, a new type of intelligence is needed and with it a new type of assessment. The ability to think critically is what Stanovich calls "what intelligence tests miss." The Halpern Critical Thinking Assessment (HCTA) covers five broad categories of skills (verbal reasoning, argument analysis, hypothesis testing, likelihood and probability, and decision-making and problem solving) posed in the context of common everyday situations. Papers in this symposium examine cross-cultural analyses of this assessment and other measures of critical thinking.

 

Paper 1

Expanding the validity of the Halpern Critical Thinking Assessment (HCTA) with real-world outcomes on critical thinking

Heather Butler (Department of Psychology, Claremont Graduate University, CA, USA)

 

The Halpern Critical Thinking Assessment (HCTA) is a reliable and valid measure of critical thinking skills. It has been validated with many different populations and measures of academic success (Halpern, 2010). Few assessments of critical thinking have been validated with "real-world" outcomes of critical thinking beyond measures of academic performance. This study explored whether scores on the HCTA predicted real-world outcomes of critical thinking. Community adults (n = 50) and college students (n = 84) in the United States completed the HCTA and a behavioral inventory of life events. The life events varied from mildly negative (e.g., paying a late fee on a movie rental) to severely negative (e.g., filing for bankruptcy), and covered behavior in a wide range of domains (education, health, finance, and interpersonal relationships). Those with higher critical thinking scores reported fewer negative life events than those with lower critical thinking scores.

 

Paper 2

Formal Logic versus Critical Thinking: The Role of Language in Assessment

Kelly Y. L. Ku (Department of Education Studies, Hong Kong Baptist University, Hong Kong, China)

K. T. Hau (Department of Educational Psychology, Chinese University of Hong Kong, Hong Kong, China)

 

This presentation addresses the need to allow verbalization of reasoning in critical thinking assessment. I begin by orienting the audience to the conception of critical thinking, drawing a line between critical thinking and formal logic. This distinction is a prerequisite to understanding why the process of critical thinking relies on sophisticated use of language to reason. Empirical findings on the relationships between students' verbal reasoning ability and critical thinking performance are presented. Implications for the assessment of critical thinking are discussed.

 

Paper 3

Examining the influence of culture on critical thinking

Vivian Miu-Chi Lun (Department of Sociology and Social Policy, Lingnan University, Hong Kong, China)

 

The ability to think critically is a crucial attribute expected of university graduates. However, the endorsement of critical thinking in higher education has been challenged by the growing cultural diversity in university classrooms. There have been concerns about students of different cultural backgrounds showing systematic differences in critical thinking. This presentation highlights several areas in which culture may affect the teaching and learning of critical thinking in higher education. Related empirical findings will also be reported to illustrate these ideas.

 

Paper 4

Assessing critical thinking in Portugal: Translation and adaptation study of the Halpern Critical Thinking Assessment (HCTA)

Amanda H. Franco (Institute of Education, University of Minho, Braga, Portugal)

Leandro S. Almeida (Institute of Education, University of Minho, Braga, Portugal)

 

To successfully assess and intervene in educational and psychological settings, assessment tests must be valid, reliable and designed to respect the nature of the evaluation goals and the characteristics of the population they are intended for. Considering, on the one hand, the increasing importance attached to critical thinking, which has been correlated with variables such as academic performance and success in daily life, and, on the other hand, the lack of critical thinking assessment tests in Portugal, our study aims to translate and adapt an internationally used test, the Halpern Critical Thinking Assessment (Halpern, 2010), so that it can be used in assessment and intervention efforts. We present the translation and back-translation study, the qualitative analysis of the items (thinking-aloud method), the conclusions resulting from the discussion of the scoring prompts led by experts in the cognitive assessment field and, finally, the application data for the experimental version. We report our results and draw some conclusions about the translation and adaptation process and about the characteristics of the Portuguese version of the HCTA. Finally, we define future study goals aimed at the assessment of critical thinking in academic settings using the HCTA.

 

First Discussant

Diane F. Halpern (Department of Psychology, Claremont McKenna College, CA, USA)

 

Second Discussant

An Verburgh (Center for Instructional Psychology & Technology, Catholic University of Leuven, Belgium)

 

 

 

 

 

Invited symposium 2 / Tuesday, 3rd July / 13.45-15.15 / Room: Mauritszaal

Reflections on the International Journal of Testing: History, influential articles, and getting your research published

Chair

Stephen G. Sireci (University of Massachusetts Amherst, USA)

Moderator

John Hattie (University of Melbourne, Australia)

Symposium Abstract

The International Journal of Testing (IJT) is the flagship publication of the International Test Commission and is in its 12th year of publication. In this session, the current editorial team and the previous editor will draw upon their experience in editing the Journal to (a) highlight some of the most influential papers published in IJT and (b) describe the current trends in research and practice in educational and psychological assessment that are of interest to assessment researchers and practitioners throughout the world. Key themes of the session will be how to identify research worth publishing in IJT and other measurement journals, how to maximize the likelihood a manuscript submitted for publication will receive a favorable review, and emerging trends in international assessment research.

 

Paper 1

Conducting research worth publishing: Illustrations from the International Journal of Testing

Stephen G. Sireci (University of Massachusetts Amherst, USA)

 

In this presentation, examples from recent articles published in IJT will be used to illustrate how identifying an important research problem, and communicating the importance of that problem to readers, are critical factors in getting research published. The review process for IJT will be described to illustrate the importance of being tenacious when receiving feedback from the review process. Goals of this presentation include informing audience members about the publication process and inspiring them to create quality manuscripts that are likely to receive favorable reviews.

 

Paper 2

Some exemplary, unacceptable, and most improved submissions to the International Journal of Testing

Rob Meijer (University of Groningen, The Netherlands)

 

In this presentation I will share some experiences with the audience about what it is like to be an editor of IJT. To do this I will give examples of manuscripts that were rejected outright, manuscripts for which it was clear from the start that acceptance was very likely, and manuscripts that were accepted even though the initial round of reviews was not that positive.

Paper 3

Qualitative and quantitative analysis of published articles and publication processes of the International Journal of Testing

April Zenisky (University of Massachusetts Amherst, USA)

 

In recent years, approximately 80 to 100 unique manuscripts have been submitted annually to IJT for publishing consideration. This presentation will use submission and final manuscript decision data to provide insight into IJT's publication processes and editorial perspectives. The intent here is to describe the publication processes of IJT in the context of the full picture of what is submitted, with special attention given to illustrating content and methodological trends in publication decisions.

 

Discussant

The International Journal of Testing: Past, present, and future

John Hattie (University of Melbourne, Australia)

 

 

 

 

Invited symposium 3 / Tuesday, 3rd July / 15.45-17.15 / Room: Mauritszaal

Dealing with response distortions in applied questionnaires

Chair

Oleksandr (Sasha) Chernyshenko (Nanyang Technological University, Singapore)

 

Symposium abstract

This symposium presents several studies dealing with response biases commonly encountered in questionnaires. Approaches used in the four studies are very diverse and include: a) changing the format in which items are presented (e.g., forced-choice, SJT), b) modelling halo biases as a separate latent dimension, and c) identifying unmotivated examinees post hoc. Samples involving over 200,000 individuals assessed in different countries for research, personal development and organizational selection purposes are used to illustrate the effectiveness of each approach in improving validity and cross-cultural comparability of applied questionnaires.

 

Paper 1

Forced-choice vs. Likert scales in cross-cultural survey assessments

Jonas P. Bertling (ETS, USA)

In PISA and other large-scale assessments, a quite robust phenomenon is a reliable difference between individual and national level relationships for certain questionnaire scales. For example, in PISA 2003, interest in, self-concept in, and motivation for mathematics, measured by Likert scales, were found to correlate positively with mathematics achievement within countries (r = .20 to .50). But the correlation at the national level was simultaneously found to be strongly negative (r = -.30 to -.80). A question is whether this reflects true differences between countries in attitudinal factors (high achieving countries have worse attitudes towards mathematics), or merely a method artifact. In the PISA 2012 Field Trial (65 countries, N = 85,000), we compared Likert-scale and Forced-choice versions of certain questionnaire scales. No major differences between the two formats on within-country correlations were found. Correlations at the country level were in the same direction as correlations within country for the Forced-choice format but not for the Likert-scale format. Our results suggest that differences between national and individual level correlations might not reflect construct differences as much as method effects, and that the Forced-choice format may have advantages over the Likert-scale in cross-cultural research.
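The divergence between within-country and country-level correlations described above is easy to make concrete. The following sketch is a minimal illustration in Python (not the PISA analysis itself); the column names x, y and country are hypothetical, and the data frame df stands in for any student-level dataset with an attitude scale, an achievement score and a country code.

import pandas as pd

def within_and_between_correlations(df, x, y, group="country"):
    """Return the pooled within-country correlation and the correlation of country means."""
    # Within-country: correlate the two variables after centring them within each
    # country, which removes differences between country means.
    centred = df[[x, y]] - df.groupby(group)[[x, y]].transform("mean")
    r_within = centred[x].corr(centred[y])
    # Country level: correlate the country means (one observation per country).
    means = df.groupby(group)[[x, y]].mean()
    r_between = means[x].corr(means[y])
    return r_within, r_between

A positive r_within can coexist with a negative r_between, which is exactly the pattern reported above for the Likert-scale attitude measures.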

 

Paper 2

Counteracting halo / horn and other rater effects by forcing choice

Anna Brown (University of Cambridge, UK)

 

The stiffest challenge in interpreting assessments carried out by raters is separating the real effects (e.g. other people's behavioural styles, product features, patients' experiences with a service, etc.) from rater effects, such as 'leniency / severity' and 'halo / horn'. The latter are over-generalised judgements, often leading to effective redundancy of assessment facets, which are overshadowed by the overall 'halo' or 'horn'. One way to deal with these effects is to model them; another is to force raters to differentiate by using comparative judgements. This study examines the quality of measurement in a 360-degree feedback context using both of these approaches. The Inventory of Managerial Competencies (IMC) was administered to N = 817 line managers to seek assessments of their direct reports on 16 behavioural domains, using the single-stimulus format and the multidimensional forced-choice (MFC) format. The single-stimulus data yielded a strong halo, which could be modelled by an 'affective overtones' construct in an overall IRT model. The forced-choice data, modelled and scored with a Thurstonian IRT approach (Brown & Maydeu-Olivares, 2011), were free of halo. Moreover, they yielded better measurement of competencies than the single-stimulus ratings. It is concluded that comparative judgements in this context elicit finer, more nuanced evaluations than absolute judgements.

 

Paper 3

Development of a person-fit index for multidimensional pairwise preference tests

Stephen Stark (University of South Florida, USA)

 

This paper focuses on statistically identifying applicants who do not do their best on multidimensional pairwise preference tests and on making appropriate adjustments for them. Our proposed approach is based on ideas from appropriateness measurement (Levine & Drasgow, 1982), which is sometimes called person fit. Specifically, we developed a new appropriateness index that can be used with forced-choice tests of any dimensionality and devised a simulation-based strategy to derive critical cut-off values. We examined the effectiveness of the proposed method in simulation and empirical studies involving non-adaptive and adaptive personality tests. High power and low Type I error rates were observed in conditions where substantial random or fake-good responding was present.
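The simulation-based derivation of critical values can be illustrated generically. The sketch below is not the authors' appropriateness index for pairwise preference data; it assumes a simple logistic (2PL-style) model for dichotomous responses and merely shows how a lower-tail cut-off for a log-likelihood person-fit statistic can be obtained by simulating model-consistent respondents.

import numpy as np

def response_prob(theta, a, b):
    """Probability of endorsement under a simple 2PL-style model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_likelihood(responses, theta, a, b):
    """Log-likelihood of a response pattern given the trait level and item parameters."""
    p = response_prob(theta, a, b)
    return np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

def critical_value(theta, a, b, alpha=0.05, n_sim=10000, seed=0):
    """Simulate model-consistent respondents and return the lower alpha-quantile of the
    person-fit statistic; observed values below it flag potentially aberrant responding."""
    rng = np.random.default_rng(seed)
    p = response_prob(theta, a, b)
    sims = rng.random((n_sim, len(a))) < p          # model-consistent response patterns
    lls = np.array([log_likelihood(r, theta, a, b) for r in sims])
    return np.quantile(lls, alpha)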

 

Paper 4

Effects of unmotivated responding on validities of forced choice tests

Oleksandr (Sasha) Chernyshenko (Nanyang Technological University, Singapore)

 

This paper aims to investigate the effects of random and socially desirable responding on the validities of multidimensional forced-choice personality tests. In the first study, we simulated several realistic scenarios of unmotivated responding for non-adaptive and adaptive tests and found that validities diminish only marginally, unless the extent of aberrant responding was extreme. In the second study, we analyzed responses from U.S. Army applicants (N = 250,000) who had taken either a static or an adaptive version of a forced-choice personality test and found little evidence of unmotivated responding. As expected, excluding unmotivated respondents increased observed validities for a number of outcomes.

 

Discussant

Richard Roberts (ETS, USA)

 

 

 

 

Invited symposium 4 / Wednesday, 4th July / 08.45 - 10.30 / Room: Mauritszaal

International Test Commission Standards for Assessing Linguistic Minorities

 

Chair

Alina A. von Davier (Educational Testing Service, Princeton, USA)

 

Symposium abstract

Language can be a barrier between people who speak and operate in different languages; it can also be a barrier to valid measurement whenever an examinee is not fully proficient in the language in which the test is administered.

As an international organization devoted to promoting fair and valid assessments around the globe, the International Test Commission will develop a document to raise awareness about the problems in measuring the knowledge, skills, and abilities of linguistic minorities (i.e., the relatively small proportion of people who take a test in a language in which they are not fully proficient) and to provide guidance to test developers, test administrators and those who interpret test scores for such test-takers.

This symposium will illustrate the work on the standards from several perspectives: psychometrics, test validity, cross-cultural dimensions in assessments, quality control, and the special features of psychological tests. A discussion that synthesizes these perspectives will be provided in the context of operational experience with a specific language and country.

 

Paper 1

Standards for Testing Linguistic Minorities: Language and Acculturation Issues

Fons J. R. van de Vijver (Tilburg University, the Netherlands; North-West University, South Africa; University of Queensland, Australia)

 

This presentation will discuss assessment issues that become salient among populations with a very different cultural background from the target population of the instrument. I will consider two population characteristics with special relevance: the cultural distance between the tester and the target population, and knowledge of the language of assessment. These moderators have an impact on all types of assessment; cognitive performance is underestimated in such populations, although the degree of underestimation may be hard to estimate. However, non-cognitive measures may also be influenced (e.g., through response style differences). If there is doubt about the validity of test results because of these moderators, special measures can be taken to enhance validity, such as assessing cultural distance, acculturation, and proficiency in the language of the test. I will illustrate how instruments can be adapted so as to make them more widely applicable. Special attention will be paid to construct definitions, the link between these definitions and items, and documenting adaptations to enhance the validity of instruments.

 

Paper 2

Validity and Fairness Issues in Assessing Linguistic Minorities

Stephen G. Sireci (University of Massachusetts Amherst, USA)

Molly Faulkner-Bond (University of Massachusetts Amherst, USA)

 

The proposed ITC Standards for Assessing Linguistic Minorities were motivated by a desire to improve the validity of measures used to assess examinees who are not fully proficient in the language in which the assessment is administered. It has been acknowledged that tests in any subject inevitably end up being partial measures of language proficiency, in addition to whatever content knowledge or skill, or psychological construct, they intend to target. A consequence of this language load is that it becomes a source of construct-irrelevant variance that undermines accurate interpretation of the test performance of linguistic minorities. In this presentation we highlight the key validity issues that must be considered whenever linguistic minorities are tested. Many of these issues refer to test development activities (e.g., sensitivity review, inclusive pilot studies, DIF screening, cultural sensitivity in scoring constructed-response items), while others refer to validation activities that either support the appropriateness of the test for linguistic minorities, support the use of accommodations for linguistic minorities, or both. Based on the issues raised in the paper and research conducted in this area, we conclude with a list of the most critical factors to consider in test development and evaluation whenever linguistic minorities are tested.

 

Paper 3

Psychometrics in Support of a Valid Assessment of Linguistic Minorities

Alina A. von Davier (Educational Testing Service, Princeton, USA)

This presentation discusses the types of psychometric analyses needed to support the validity of a test taken by linguistic minorities. Content experts and psychometricians work together to ensure that the psychometric and validity requirements are met. Some of the psychometric tools that help meet these requirements are (a) designing the data collection and sampling schemes to account for the linguistic minorities; (b) recommending the appropriate test length, item types, and scoring rules, accounting for pre-existing knowledge about the linguistic minorities; (c) analyzing the factorial structure of the data in different groups of test takers to validate the content claims and the construct invariance claims; (d) designing a score reporting scale that is meaningful to all test users; (e) recommending a long-term data collection design that prevents security breaches; and (f) devising a plan for detecting security breaches. To ensure test fairness, psychometricians also contribute by (g) analyzing the performance of each item and each test form in subgroups of interest and (h) ensuring that the equating results are group independent.

 

Paper 4

Guidelines for Assessing Psychological Qualities of Linguistic Minorities

Thomas Oakland (University of Florida, USA)

 

This paper summarizes scholarship on issues associated with the clinical assessment of examinees who are immigrants and lack fluency in the host country's primary or preferred language. Portions of the paper may also be relevant to the clinical assessment of second-generation family members whose parents immigrated some years ago, who are transitioning into a new culture, and for whom proficiency in the country's primary or preferred language remains a goal. This review considers several broad issues that may impact the clinical assessment of examinees whose cultural and linguistic qualities differ from the local culture: the nature of the behavior being assessed, the quality of the measures used in the assessment, possible limitations that result from language and cultural differences, interpersonal relationships that may impact the assessment process, and ethical issues. Possible test modifications together with best-practice guidelines are suggested.

 

Paper 5

Quality Control Procedures in Testing Linguistic Minorities

Avi Allalouf (National Institute for Testing and Evaluation, Jerusalem, Israel)

 

Testing people who are not native speakers of the test language is complicated, with implications for all stages of the assessment process, from test construction to scoring. It is essential to ensure that the assessment tools are fair to all test-takers, and as a result to each linguistic minority / non-native speaker subgroup. Quality control (QC) procedures must be applied at all stages of the assessment. This presentation will focus on several procedures related to the accuracy and stability of scale scores - final scores as well as subscores - for the total group and for the relevant subgroups over time. One such QC procedure involves reviewing central measures, passing rates and extreme scores for each group. Another type of procedure deals with analysing repeater data - examinees who take the test more than once - for each group. If the outcomes reveal suspicious findings, additional QC procedures may be employed.

Discussant

Thomas Eckes (TestDaF, Hagen, Germany)

 

 

 

Invited symposium 5 / Wednesday, 4th July / 11.00-12.30 / Room: Grote zaal

Network Psychometrics

 

Chair

Denny Borsboom (University of Amsterdam, The Netherlands)

 

Symposium abstract

Network analysis offers powerful tools to a) study the dynamics of interconnected systems, b) analyze the architecture of networks involving large numbers of entities (e.g., neurons, people, genes, variables), and c) visualize connectivity structures in such networks. This symposium illustrates network methodology with applications to clinical psychology.

 

Paper 1

Items, systems, and constructs

Denny Borsboom (University of Amsterdam, The Netherlands)

 

The treatment of test scores in psychology is usually predicated on the idea that test scores measure psychological properties. This idea underlies psychometric techniques from factor analysis to Item Response Theory. In psychopathology research, however, the assumption that symptoms (e.g., 'desire to gamble', 'debts') are measures of disorders (e.g., 'gambling disorder') is implausible. It is much more plausible to assume that psychopathology symptoms influence each other (e.g., desire to gamble -> gambling -> debts). A way of modeling such a situation is by assuming that disorders can be characterized as causal systems in which the symptoms figure as components, so that disorders are networks constituted of symptoms. I will illustrate the power of network approaches by showing how they can accommodate findings from psychology, psychometrics, and behavior genetics.

 

Paper 2

The major depression network: key concepts and implications for diagnosis

Angélique O. J. Cramer (University of Amsterdam, The Netherlands)

 

From a network perspective, major depression (MD) is a network of symptoms that are directly connected: e.g., insomnia -> fatigue -> concentration problems. In line with this perspective, recent evidence shows that stressful life events impact individual symptoms differently (e.g., conflict triggers more depressed mood and thoughts of death than romantic loss). Differences between individuals likely arise in how strongly certain symptoms are connected: Bob gets tired after two sleepless nights while Alice is not tired after five sleepless nights (i.e., the connection insomnia -> fatigue is stronger in Bob than in Alice). As a result, some symptoms are more central than others: e.g., if Alice feels sad, this quickly triggers other symptoms, while Bob rarely develops other symptoms when he feels sad (i.e., feeling blue is relatively peripheral in Bob's network and central in Alice's network). Implications for diagnosis (of at-risk individuals) are discussed.

 

Paper 3

Applications of visualizing test data through networks

Sacha Epskamp (University of Amsterdam, The Netherlands)

 

Networks can be used to visualize relationships between the items of a test. For instance, a correlation matrix can be represented as a network in which each variable is a node and each correlation an edge; by varying the width of the edges according to the strength of the association, the structure of the correlation matrix can be visualized. This innovative technique has many applications, such as allowing a researcher to detect complex structures in a dataset, validating the measurement model of a test and comparing individuals on differences in the correlation structure of repeated measures. In this talk, I will discuss how these methods can be used and interpreted with real data examples.
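As a rough illustration of the idea, the sketch below builds such a network in Python with networkx and matplotlib (the original work used dedicated R tooling); the variable labels, the correlation threshold, and the edge-width scaling are arbitrary choices made for display purposes.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

def correlation_network(data, labels, threshold=0.1):
    """Build a graph whose nodes are variables and whose edges carry correlations."""
    corr = np.corrcoef(data, rowvar=False)        # columns of `data` are the variables
    g = nx.Graph()
    g.add_nodes_from(labels)
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            r = corr[i, j]
            if abs(r) >= threshold:               # hide negligible associations
                g.add_edge(labels[i], labels[j], weight=r)
    return g

def draw(g):
    """Edge width reflects the strength of the association, colour its sign."""
    pos = nx.spring_layout(g, seed=1)
    widths = [4 * abs(d["weight"]) for _, _, d in g.edges(data=True)]
    colors = ["tab:green" if d["weight"] > 0 else "tab:red"
              for _, _, d in g.edges(data=True)]
    nx.draw_networkx(g, pos, width=widths, edge_color=colors, node_color="lightgrey")
    plt.axis("off")
    plt.show()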

 

Paper 4

Classification and Comorbidity - Symptom vs System approach

Verena D. Schmittmann (University of Amsterdam, The Netherlands)

 

Mental disorders are highly comorbid: people classified as having one disorder are likely to have another as well. We explain empirical comorbidity patterns based on a network model of psychiatric symptoms, derived from an analysis of symptom overlap in the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV). Half of the symptoms in the DSM-IV network are connected, and distances between disorders in this structure predict empirical comorbidity rates. Network simulations of Major Depressive Episode and Generalized Anxiety Disorder show that the model faithfully reproduces empirical population statistics for these disorders. In the network view, mental disorders are inherently complex, and emerge from causal interrelations between symptoms. We outline a psychosystems approach to investigate and classify mental disorders according to the underlying system structure and system dynamics rather than the presence or absence of symptoms.
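A toy example can make the notion of distance in a symptom network concrete. The sketch below is not the published DSM-IV analysis; the disorders, symptom lists and criteria are invented for illustration, and the distance between two disorders is taken here as the shortest path between their closest pair of symptoms.

import itertools
import networkx as nx

disorders = {
    "MDE": {"sleep problems", "fatigue", "concentration problems", "depressed mood"},
    "GAD": {"sleep problems", "fatigue", "concentration problems", "worry"},
    "PanicDisorder": {"palpitations", "fear of dying", "worry"},
}

# Symptoms are nodes; two symptoms are linked if any disorder lists both of them.
g = nx.Graph()
for symptoms in disorders.values():
    g.add_edges_from(itertools.combinations(sorted(symptoms), 2))

def disorder_distance(d1, d2):
    """Shortest path between the closest pair of symptoms of two disorders."""
    return min(
        nx.shortest_path_length(g, s1, s2)
        for s1 in disorders[d1]
        for s2 in disorders[d2]
    )

print(disorder_distance("MDE", "GAD"))            # 0: the two disorders share symptoms
print(disorder_distance("MDE", "PanicDisorder"))  # 1: linked only through GAD's symptoms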

 

Discussant

John Hattie (University of Melbourne, Australia)

 

 

 

Invited symposium 6 / Wednesday, 4th July / 13.45-15.15 / Room: Mauritszaal

Combined emic-etic method in personality assessment: Cross-cultural relevance of CPAI-2

 

Chair

Fanny M. Cheung (Chinese University of Hong Kong, Hong Kong SAR, China)

 

Symposium abstract

The combined emic-etic method in personality assessment incorporates indigenous constructs to supplement universal personality factors, providing a more culture-sensitive assessment of personality. The Cross-cultural (Chinese) Personality Assessment Inventory (CPAI-2) was developed using a combined emic-etic method. Although the CPAI-2 was originally designed for the Chinese cultural context, cross-cultural research using the CPAI-2 and its adolescent version, the CPAI-A, in the United States, The Netherlands, and Romania illustrates the relevance of both the universal and the indigenous personality dimensions in predicting vocational behavior and academic performance beyond the Chinese context. This symposium reports recent findings from cross-cultural studies of the CPAI. The utility of the combined emic-etic method in cross-cultural personality assessment is discussed.

 

Paper 1

Personality, self-efficacy and vocational identity among American and Chinese adolescents

Federick T. L. Leong (Michigan State University, USA)

Sarah Wan (Chinese University of Hong Kong, Hong Kong SAR, China)

Weiqiao Fan (Shanghai Normal University, China)

Shu Fai Cheung (University of Macau, Macau SAR, China)

Fanny M. Cheung (Chinese University of Hong Kong, Hong Kong SAR, China)

 

This study assessed the effects of personality and career decision self-efficacy (CDSE) on the clarity of vocational identity among American (n = 218), Hong Kong (n = 544) and Shanghai (n = 324) high school students. Personality scales of CPAI-A were grouped into four factors (Social Potency, Dependability, Emotional Stability, and Interpersonal Relatedness) to measure both culture-general and culture-specific dimensions. In the final regression model, several culture-general personality scales predicted vocational identity. Particularly, Life Goal (from Dependability) was positively associated with identity in all cultural groups. In terms of Emotional Stability scales, Optimism predicted identity for both American and Hong Kong students. High Self-Acceptance also related to a clear identity in American students. Among culture-specific personality scales, Veraciousness (from Interpersonal Relatedness) was positively associated with identity in Hong Kong participants. Beyond the personality effects, subscales of CDSE explained vocational identity. Specifically, Self-Appraisal was related to a clear identity for Hong Kong students. Obtaining Occupational Information, however, had a negative association with identity in American students. Goal Selection was related to a clear identity for all cultural groups. Present findings highlight the importance of CDSE and the utility of CPAI-A for studying vocational identity in both American and Chinese adolescents. Implications of these findings will be discussed.

 

Paper 2

Prediction of career self-efficacy by personality and collective efficacy among American and Chinese adolescents

Sarah Wan (Chinese University of Hong Kong, Hong Kong SAR, China)

Fanny M. Cheung (Chinese University of Hong Kong, Hong Kong SAR, China)

 

Using the Cross-cultural (Chinese) Personality Assessment Inventory for Adolescents (CPAI-A; Cheung, Leung, & Cheung, 2005), this study investigated the roles of personality and collective career efficacy in predicting career decision self-efficacy (CDSE) among 544 Hong Kong Chinese, 324 Shanghai Chinese and 218 American high school students. Grouped into four factors (Social Potency, Dependability, Emotional Stability, and Interpersonal Relatedness), CPAI-A scales assess both culture-general and culture-specific personality dimensions. Among the culture-general scales, regression analyses indicate scales of Social Potency were useful in predicting CDSE among the two Chinese samples (e.g., Divergent Thinking). Life Goal (Dependability) related positively to CDSE in all cultural groups. Scales of Emotional Stability were associated with CDSE in the Hong Kong and the American samples only. Among the culture-specific scales, relational dimensions including Social Sensitivity (Social Potency) and Family Orientation (Interpersonal Relatedness) were associated with high CDSE in the American sample. Veraciousness (Interpersonal Relatedness) related negatively to CDSE among Hong Kong students. Beyond the effects of personality, collective career efficacy significantly predicted CDSE in the two Chinese samples but not the American sample. Present findings support the utility of the CPAI-A for studying the relationship between personality and CDSE in both American and Chinese adolescents.

 

Paper 3

Validity of CPAI-2 and HEXACO among Chinese-Dutch and indigenous Dutch students

Marise Ph. Born (Erasmus University Rotterdam and Free University, Amsterdam, The Netherlands)

 

Score differences on the CPAI-2 and HEXACO between 91 Chinese-Dutch and 363 indigenous Dutch students were investigated, following recent awareness of the importance of trait specificity in personality measurement (e.g., Foldes et al., 2008). In contrast to narrow personality traits (facets), the use of broad personality traits (factors) to compare ethnic groups may mask ethnic differences. Ethnic score differences on broad and narrow CPAI-2 and HEXACO traits were therefore compared, and we expected ethnic score differences to be more visible on narrow traits. Further, the predictive validity of broad and narrow traits for self-reported academic performance (GPA, counterproductive academic behavior, and study involvement) was examined for both personality questionnaires. Conscientiousness- and integrity-related facets and factors were expected to be important predictors. We addressed the question of whether the relationships between personality and academic performance are similar across ethnic groups and across the two personality questionnaires. The value of the CPAI-2 as an indigenously derived inventory in the Chinese cultural context is discussed in particular.

 

Paper 4

Incremental validity of CPAI-2 for Romanian and Chinese workers in Romania

Dragos Iliescu (National School of Political and Administrative Studies, Bucharest, Romania)

Dan Ispas (Illinois State University, Normal, Illinois, U.S.A.)

Alexandra Ilie (University of South Florida, Tampa, Florida, U.S.A.)

Andrei Ion (National School of Political and Administrative Studies, Bucharest, Romania)

 

The Cross-Cultural (Chinese) Personality Assessment Inventory-2 was administered in Romania to a sample of 439 workers employed by a Romanian textile producer: 121 males and 318 females, of whom 186 were Romanian workers and 253 were Chinese workers. Ages ranged from 20 to 52 years (M = 36.9). Together with the CPAI-2, the participants also completed the General Ability Measure for Adults (GAMA; Naglieri & Bardos, 1997), a non-verbal test of cognitive ability. Job performance ratings were collected from company records. Results show that the CPAI-2 has incremental validity beyond cognitive ability scores for the prediction of job performance. Both the total variance predicted and the incremental validity of the CPAI-2 traits are larger for Chinese workers in Romania than for Romanian workers. Emic scales in the CPAI-2 are strongly predictive of the performance of Chinese workers, even more so than some of the established etic personality traits (e.g., Conscientiousness). Emic scales in the CPAI-2 are not predictive of the performance of Romanian workers. The CPAI-2 thus shows an interesting picture of differential prediction by culture: it is a valid predictor of work performance with its etic scales, while its targeted emic scales contribute supplementary predictive variance for specific populations.

 

Discussant

Robert Roe (Maastricht University, The Netherlands)

 

 

 

Invited symposium 7 / Wednesday, 4th July / 15.45-17.15 / Room: Grote zaal

Advances in Psychometrics: Theory, Methods, and Practices

 

Organizer

Ronald K. Hambleton (University of Massachusetts, Amherst, USA)

Chairperson

April Zenisky (University of Massachusetts, Amherst, USA)

Symposium abstract

The field of psychometric methods is booming: at every level (school, district, regional, national, and international) there is an ever-increasing focus on technically sound and valid educational and psychological assessments. Psychometric theory, methods, and practices are developing fast, and the potential impact of these advances is great. The increasing role of the computer in item selection, item administration and scoring, the complexity of some of the advances, the wider application of new models, and the fact that advances are coming from all over the world have created a situation in which it is difficult, even for the most advanced scholars, to keep up with the progress being made. In this invited symposium, the contributors take on four big topics: the impact of item response modeling on psychological testing, test assembly with new techniques, model building, and the communication of test information to practitioners. Each topic is big and critically important to the advancement of technically sound and valid assessments for decision making in psychology and education.

 

Paper 1

Item Response Theory for Psychological Assessment

Paul De Boeck (University of Amsterdam, Netherlands)

The primary domains of psychological assessment are intelligence, personality, and clinical assessment, but for none of these domains has item response theory (IRT) become a popular approach, in contrast with its popularity for educational assessment. First, a brief overview will be given of IRT applications for psychological testing and of the underlying motivation. Second, the potential of IRT will be discussed. The traditional view is that IRT can improve the quality of measurement and creates the possibility of adaptive testing. I will instead focus on how test design and item design combined with IRT offer ways of investigating and improving the internal validity of tests, and how this in turn opens perspectives for more specific diagnosis (e.g., cognitive diagnosis) and, in some cases, for a criterion-referenced approach. These ideas will be illustrated with examples of applications in the cognitive domain. Third, issues will be discussed explaining why IRT often has only a minor surplus value for measurement, why building item banks is too difficult for some domains of assessment, and how traditional concepts of internal validity and of norming can make it difficult to make further progress.

 

Paper 2

A Universal Automated Test Assembly Engine

Wim J. van der Linden (CTB / McGraw-Hill, USA)

Qi Diao (CTB/McGraw-Hill, USA)

 

The current practice of educational and psychological testing shows a large variety of testing formats, ranging from group-based, fixed-form testing to individualized, fully adaptive formats. The goal of this presentation is to introduce an integrated automated test assembly (ATA) engine that enables us to assemble tests in any of the current fixed or adaptive formats that meet a given set of content specifications. The practical relevance of the engine is its support of at-the-push-of-a-button assembly of test forms with different formats while maintaining score comparability across them. In addition, the underlying methodology allows us to evaluate the relative efficiency of different testing formats against each other. We present empirical results illustrating the efficiency of different adaptive formats (including fully adaptive, multi-stage, and linear-on-the-fly formats) for a few given sets of content specifications.
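At its core, ATA is a constrained selection problem: choose items from a bank so that a form meets the content specifications while maximizing information at the abilities of interest. The toy sketch below (an exhaustive search over a six-item hypothetical bank, rather than the mixed-integer programming used at operational scale) illustrates that structure; the item identifiers, content areas, and information values are made up.

import itertools

# Hypothetical item bank: (item id, content area, Fisher information at theta = 0)
bank = [
    ("i1", "algebra", 0.42), ("i2", "algebra", 0.31), ("i3", "algebra", 0.18),
    ("i4", "geometry", 0.39), ("i5", "geometry", 0.22), ("i6", "geometry", 0.11),
]

form_length = 4
min_per_area = {"algebra": 2, "geometry": 2}      # the content specification

best_form, best_info = None, -1.0
for form in itertools.combinations(bank, form_length):
    counts = {area: 0 for area in min_per_area}
    for _, area, _ in form:
        counts[area] += 1
    if all(counts[a] >= k for a, k in min_per_area.items()):   # feasible form
        info = sum(item[2] for item in form)                   # objective: total information
        if info > best_info:
            best_form, best_info = form, info

print([item[0] for item in best_form], round(best_info, 2))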

 

Paper 3

New Psychometric Methods to Determine the Relative Importance of Explanatory or Predictor Variables in Assessment and Validation Studies

Bruno D. Zumbo (University of British Columbia, Canada)

 

The use of latent variable regression models presents one of the most exciting new developments in psychometrics, with implications for assessment research and validation studies in particular. Common research questions using regression with latent or observed variables are: which of the variables is most predictive of the criterion measure, and which variables best explain test score variation? These questions reflect two different, yet common, uses of regression. A new class of statistical methods developed by the author for latent variable models, which allows one to answer these questions more precisely and informatively, is described using examples with international assessment data.
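For readers unfamiliar with relative-importance indices, the sketch below shows one familiar observed-variable measure, the Pratt measure: the product of a predictor's standardized regression coefficient and its zero-order correlation with the criterion, which sums to R squared across predictors. It is a generic illustration only, not the author's new latent-variable methods.

import numpy as np

def pratt_measures(X, y):
    """Relative importance of each column of X for predicting y (the values sum to R^2)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize predictors
    yz = (y - y.mean()) / y.std()                   # standardize the criterion
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)  # standardized regression coefficients
    r = np.array([np.corrcoef(Xz[:, j], yz)[0, 1] for j in range(X.shape[1])])
    return beta * r                                 # one importance value per predictor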

 

Paper 4

Making test score scales and scores more meaningful to users

Ronald Hambleton (University of Massachusetts, USA)

April Zenisky (University of Massachusetts, USA)

 

Testing practices in education and psychology have advanced considerably in recent years through the introduction of item response theory models, generalizability theory, automated test assembly, and new test designs such as computer-adaptive testing. What has not kept pace is the technology for making score scales and score reports more understandable and meaningful for users. This is unfortunate because of the large amount of evidence suggesting that candidates and other score users such as teachers, policy-makers, psychologists, and the media, are often confused by test scores. The goals of this presentation include: (1) describing promising ideas for increasing the utility of score scales, (2) considering ways for communicating the concept of score imprecision, (3) addressing the problem of subtest reporting and reliability, and (4) offering guidelines for score report design based on our recent research.

 

Discussant

Ronald K. Hambleton (University of Massachusetts, Amherst, USA)

 

 

Invited symposium 8 / Thursday, 5th July / 08.30-10.00 / Room: Mauritszaal

Attitudes of psychologists towards tests and testing: The results of an international survey

 

Chair

Arne Evers (University of Amsterdam, The Netherlands)

 

Symposium Abstract

In this symposium the results of the EFPA / ITC survey on test use and test attitudes of psychologists are presented. The first two presentations concern the results of the EFPA / ITC survey in two particular countries: Brazil and Israel. In addition, specific problems relating to test use in these countries will be addressed. The other two presentations will deal with the results in a more global perspective. In the third presentation the universality of the factor structure of the attitude items will be tested and a multilevel approach will be used to investigate the influence of gender, field of specialisation and country on attitude scores. In the final presentation differences in test use and test attitudes will be highlighted and consequences will be discussed.

 

Paper 1

Brazilian psychologists' testing attitudes and practices

Solange Muglia Wechsler (Pontifical Catholic University of Campinas, Brazil)

 

In Brazil the use of tests is restricted to professionals who hold a psychologist's degree. A psychologist's degree can only be acquired by completing five years of undergraduate study in psychology. As one can get a master's degree in psychology by following different tracks, not all psychology masters will hold a psychologist's degree, and therefore not all psychology masters are allowed to use tests. A regulation passed by the Federal Council of Psychology in 2003 requires psychologists to use only tests whose technical qualities (validity, reliability, norms) have been approved by a commission of experts. Foreign tests have to be validated for use in Brazil. The results of the EFPA / ITC survey among 70 psychologists demonstrated that this regulation has had a very positive impact on the number of tests constructed and validated in recent years. National conferences held by the Brazilian Institute of Psychological Assessment (founded in 1997) have been attracting more than 1,000 participants. Nowadays, in contrast to the previous decade, psychologists with diverse specializations show a positive attitude toward test use and test relevance.

 

Paper 2

Psychological testing in Israel: New challenges, old issues

Saul Fine (Midot, Ltd., and the University of Haifa, Israel)

Dennis Bernstein (PsychTech, Israel)

 

Psychological tests are used extensively in Israel in clinical, educational and organizational settings. As such, Israeli psychologists are trained in the proper use of testing during their graduate studies and internships, as part of the requirements for professional licensure. However, unlike in some European countries and the US, no specific certifications in the area of testing are offered to Israeli psychologists; Israeli psychologists are not obligated to pursue continuing professional education programs; and the tests published in Israel are seldom reviewed independently for local usability. Overall, the relatively loose controls over testing in Israel may raise concerns regarding testing practices in the country. This study surveyed a representative sample of Israeli psychologists (N = 338), working in a variety of settings, regarding psychological testing in Israel. The results provide initial evidence highlighting some of the challenges that should be addressed. Specifically, the findings confirm the wide use and high value of testing among Israeli psychologists, but indicate that a greater level of professional supervision is still required to better ensure the competence of test users and the quality of the tests being used.

 

Paper 3

A multilevel approach to the EFPA / ITC questionnaire on test attitudes

Carina M. McCormick (Buros Center for Testing, University of Nebraska-Lincoln, USA)

Leslie H. Shaw (Buros Center for Testing, University of Nebraska-Lincoln, USA)

Arne Evers (University of Amsterdam, The Netherlands)

Kurt F. Geisinger (Buros Center for Testing, University of Nebraska-Lincoln, USA)

 

The EFPA / ITC Questionnaire on Test Attitudes has been administered to psychologists in more than 20 countries. Because education, restrictions on test use, testing practices, etc. may differ in these countries, and because these differences may influence test attitudes, the assumption of independent observations is not met. For the analyses a multi-level approach will be used to quantify and account for country-level dependency, thus avoiding the possibility of arriving at inaccurate results through more traditional procedures that ignore the dependency. First, a multilevel confirmatory factor analysis will be completed using Mplus 6.1 software, based on the structure suggested by previous principal components analysis. The analysis will take into account the complex sampling structure of the data in creating and evaluating the model. Next, multilevel modeling in SAS 9.2 software will be used to model the respondent subscores, including country as a random effect. First, the intraclass correlation for each subscore will be calculated to determine the proportion of variance in subscores between countries, compared to within countries. Then fixed effects of gender and specialization will be added to estimate the effect of these variables on subscores, while taking into account country-level dependency.
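The intraclass correlation step described above has a compact generic form. The sketch below is not the authors' SAS / Mplus code; it assumes a pandas data frame with a country column and one column per subscore, and uses a random-intercept model from Python's statsmodels to express the ICC as the between-country share of variance.

import statsmodels.formula.api as smf

def country_icc(df, subscore, group="country"):
    """ICC = between-country variance / (between-country + within-country variance)."""
    model = smf.mixedlm(f"{subscore} ~ 1", data=df, groups=df[group])
    fit = model.fit()
    between = float(fit.cov_re.iloc[0, 0])   # variance of the random country intercepts
    within = float(fit.scale)                # residual (within-country) variance
    return between / (between + within)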

 

Paper 4

Attitudes of psychologists towards tests and testing: Global results

Arne Evers (University of Amsterdam, The Netherlands)

José Muñiz (University of Oviedo, Spain)

Dave Bartram (SHL Group Ltd., UK)

and 22 co-authors

 

In order to take the right actions aimed at improving the quality of tests and test use, it is essential to know the opinions of psychologists about tests and testing practices. Therefore, in 1999 the Committee on Tests and Testing of the EFPA undertook a survey on test use and attitudes towards tests and testing among psychologists in six European countries. In 2009 this survey was repeated in 17 European countries (including the six countries of the first survey). The questionnaire was extended with items concerning computerized tests and testing via the Internet. In cooperation with the ITC, the survey was also carried out in countries outside Europe; these data were gathered in 2010, 2011 and 2012. In this presentation the results of the 1999 and 2009 surveys will be compared, and the differences among the participating countries in and outside Europe in the second survey will be discussed. Some possible actions for improving testing practices will be suggested.

 

Discussant

José Muñiz (University of Oviedo, Spain)

 

 

 

Invited symposium 9 / Thursday, 5th July / 10.30-12.00 / Room: Grote zaal

Psychometric modelling and the management of response biases in questionnaire-based assessment

 

Chair

Anna Brown (University of Cambridge, UK)

 

Symposium abstract

Asking people simple questions about themselves or about other people and things is by far the most popular way of gathering data because it is cheap; however, such data are commonly affected by conscious and unconscious response distortions. Examples include individual styles in using rating options, misresponses to reversed items, a tendency to present oneself in a positive light, halo / horn effects, etc. The extent to which respondents engage in such behaviours varies, which may alter their observed ordering on the trait of interest. Response distortions are therefore a great concern for the validity of assessments in personality, social attitudes, and all other areas relying on respondent-reported measures.

This symposium brings together research looking at biasing factors evoked by responding to questionnaire items with different features and in different contexts. Such studies require going beyond the basic CFA or IRT models, which make unrealistic assumptions such as that item parameters are fixed across respondents.

 

Paper 1

Modelling processes of responding to reversed Likert items

Luis Eduardo Garrido (Autonomous University of Madrid, Spain)

Balanced scales have been recommended since the inception of Likert scales. More recently, however, researchers have identified several problems with reversed items, such as distortion of the factorial structure, confusion of the respondents and misresponse. In the current study we examine the prevalence and effects of misresponse to reversed items using factor mixture analysis (FMA; Muthen, 2008).

A Service Quality instrument, measuring three different facets of satisfaction with 21 positively and 7 negatively phrased items, was administered to 2,031 library users from a Spanish university. Mixture analysis with two latent classes was applied to a confirmatory model with 3 factors underlying the Service Quality facets, and a 4th "method" factor indicated by the 7 reversed items. The first class (73% of respondents) showed an acceptable fit to the three-factor model. The second class (27%) showed an acceptable fit only when the method factor was included. This "misresponse" class had higher response latency (p < 0.01). These findings confirm the "item verification difficulty" theory (Swain et al., 2008), which attributes the problems with reversed items to increased cognitive difficulty involved in responding to them accurately using Likert scales. Based on these results, we discourage the use of reversed items in questionnaires.

 

Paper 2

The effect of labelling and numbering of response scales on the likelihood of response bias

Natalia Kieruj (CentERdata, Tilburg, The Netherlands)

Extreme response style (ERS) and acquiescence response style (ARS) are among the most frequently encountered problems in attitudinal research. We investigate whether response bias caused by these response styles varies with three aspects of question format, namely full versus end labeling, numbering of answer categories, and bipolar versus agreement response scales. A questionnaire was distributed to a random sample of 5,351 panel members from the LISS household panel, and respondents were assigned to one of five treatments with differing scale formats. We apply a latent class factor model that allows for diagnosing and correcting for ERS and ARS simultaneously.

Results show clearly that both response styles are present in our dataset, but ARS is less pronounced than ERS. With regard to format effects, it is found that end labeling evokes more ERS than full labeling, and that bipolar scales evoke more ERS than agreement-style scales. Format also affected the form of ERS: when full labeling was used, exhibiting ERS contrasted with opting for middle response categories, whereas when end labeling was used, exhibiting ERS contrasted with opting for any of the non-extreme response categories. ARS did not differ significantly across test conditions.

 

Paper 3

The general factor of personality or social desirability artefact? Assessment across different age cohorts and settings

Matthias Ziegler (Humboldt University of Berlin, Germany)

 

During the last few years, personality researchers have debated the existence of the General Factor of Personality (GFP) as the apex of personality. Empirical evidence is mixed, with some studies supporting the existence of the GFP and others supporting the view of the GFP as the result of socially desirable responding. The latter point of view is based on the idea that socially desirable responding increases the correlations between distorted domains, yielding a more saturated GFP. The present study compared two large samples, both of which had been administered the NEO-PI-R. One sample was collected in real applicant situations (n1 = 3,360) and contains data from applicants aged 15 to 20. Different factor solutions were tested and compared across all age groups. These findings were compared with the matching groups from the German NEO-PI-R norm sample (n2 = 2,166) using multiple-group structural equation modelling. Moreover, mixture models were used to investigate whether the combined sample would yield distinct classes of respondents. Again, the structure of the GFP was compared across the resulting classes. All findings are discussed in the light of GFP and social desirability theory, and practical implications will be discussed.

 

Paper 4

Modelling impression management in high-stakes personality assessments

Anna Brown (University of Cambridge, UK)

Test users have been concerned about the use of self-report questionnaires in high-stakes settings because these tools are open to intentional manipulations (impression management, faking) by respondents. Indeed, there is considerable evidence of distortions to the score distributions, factorial structure, and validity of scales in high-stakes assessments as compared to low-stakes ones. The task of remedying this situation - either by developing measures that are less open to motivated distortion, or by making statistical adjustments to the results - cannot be addressed without understanding the response process that test takers go through in high-stakes assessments.

The objective of the present research is to summarise common psychometric features of high-stakes data and to discuss the necessary properties of decision processes that would lead to such phenomena. Several such necessary properties are outlined, including varying degrees of distortion among respondents, the contextual nature of such variations, and the moderating effect of these variations on the relationship between the measured traits and the observed item responses. A plausible response process is suggested, which involves trade-offs between the 'honest' response and a response with acceptable desirability. Following Kuncel and Tellegen (2009), the relationship between response options and their perceived desirability is assumed to be nonlinear.

 

Discussant

Dave Bartram (SHL Group Ltd., UK)

 

 

 

Important Dates and Deadlines

Conference Dates:

July 3-5, 2012

July 2, 2012 (Pre-Conference Workshops)

 

Deadlines:

Submissions closed on 20 January 2012

Early bird registration closed on 15 April 2012

 


 

_____________________________________

 

DIAMOND SPONSORS:

 

GMAC

 

NIP

 

SHL

 

_____________________________________

 

PLATINUM SPONSORS:

 

BPS

 

BUROS

 

Thomas