Featured research in the ELI

Hitoshi Nishizawa, PhD candidate in Second Language Studies

Hitoshi’s recent research paper, “Construct validity and fairness of an operational listening test with World Englishes,” has been published in the prestigious journal Language Testing. His findings support the validity of the language placement test and offer support for including a variety of Englishes in listening tests.

For more details, see his abstract or the explanation below.

Nishizawa, H. (2023). Construct validity and fairness of an operational listening test with World Englishes. Language Testing.

From the author: In this study, I investigated the construct validity and fairness of the listening section of the English Language Institute Placement Test, using confirmatory factor analysis and Rasch-based differential item functioning analysis.

Construct validity concerns the interpretation of the test score. Confirmatory factor analysis statistically examines the underlying factors of a test. Many language tests report subscores for each skill (e.g., listening and reading), because some test takers are better at one skill than another; if confirmatory factor analysis finds multiple distinct factors, reporting separate subscores is recommended rather than simply reporting a total sum score. In this study, I was interested in whether the use of English varieties influences the underlying factor structure of the test, because the placement test uses some English varieties that are considered less familiar to the test takers.
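The one-factor question can be illustrated with a toy simulation (hypothetical data, not the study's dataset or software): when responses to every item are driven by a single latent ability, the inter-item correlation matrix is dominated by one large eigenvalue, which is one informal signal of unidimensionality.

```python
import numpy as np

# Toy illustration only (not the placement-test data): 12 listening items,
# all driven by ONE latent listening ability. If the test is unidimensional,
# the inter-item correlation matrix should have a single dominant eigenvalue.
rng = np.random.default_rng(0)
n_takers, n_items = 2000, 12

ability = rng.normal(size=n_takers)                # single latent factor
difficulty = rng.normal(scale=0.8, size=n_items)   # per-item difficulty
logits = ability[:, None] - difficulty[None, :]    # Rasch-style logits
prob = 1 / (1 + np.exp(-logits))
responses = (rng.random((n_takers, n_items)) < prob).astype(float)

# Eigenvalues of the inter-item correlation matrix, largest first
eigvals = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))[::-1]
ratio = eigvals[0] / eigvals[1]
print(f"largest/second eigenvalue ratio: {ratio:.2f}")
```

A single dominant eigenvalue is only a rough screen; the study itself used confirmatory factor analysis, which tests a specified factor structure directly.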

As for fairness, I used Rasch-based differential item functioning analysis to examine whether the likelihood of a correct response differed among test takers as a function of their country of origin. Systematic, statistically significant differences would raise a potential fairness issue: the test would be treating test takers unequally. For instance, on items recorded in a certain English variety, Japanese test takers might score higher than Korean test takers.
The fairness issue was further investigated in relation to item types. I coded the test items as either narrow or broad. Narrow items target a narrow level of understanding (e.g., listening for details), while broad items target a broad level of understanding (e.g., making an inference or getting the main idea).
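The basic logic of a DIF check can be sketched with another toy simulation (again hypothetical, not the author's analysis): give two equally able groups the same items, make one item artificially harder for one group, and look for the item with the largest between-group gap in estimated difficulty. Here difficulty is estimated crudely as the negative logit of the proportion correct, rather than with a full Rasch model.

```python
import numpy as np

# Toy DIF illustration (hypothetical data): two equally able groups answer
# 10 items; item 0 is made 1 logit harder for group B only. A crude
# per-group difficulty estimate, -logit(proportion correct), should then
# show its largest between-group gap on item 0.
rng = np.random.default_rng(1)
n_per_group, n_items = 3000, 10

difficulty = rng.normal(scale=0.5, size=n_items)

def simulate(extra_difficulty):
    """Simulate one group's binary responses under a Rasch-style model."""
    ability = rng.normal(size=n_per_group)
    logits = ability[:, None] - (difficulty + extra_difficulty)[None, :]
    return (rng.random((n_per_group, n_items)) < 1 / (1 + np.exp(-logits))).astype(float)

dif_shift = np.zeros(n_items)
dif_shift[0] = 1.0                       # item 0 disadvantages group B
group_a = simulate(np.zeros(n_items))
group_b = simulate(dif_shift)

def crude_difficulty(resp):
    p = resp.mean(axis=0)
    return -np.log(p / (1 - p))          # -logit of proportion correct

gap = crude_difficulty(group_b) - crude_difficulty(group_a)
flagged = int(np.argmax(np.abs(gap)))
print(f"largest difficulty gap on item {flagged}: {gap[flagged]:+.2f} logits")
```

A real Rasch-based DIF analysis conditions on ability estimates and tests whether each item's difficulty differs significantly between groups; this sketch only conveys the intuition of comparing item behavior across groups.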

The results showed that the placement test had only one underlying factor, meaning that separate subscore reporting based on English varieties is unwarranted. As for fairness, there were many items that advantaged or disadvantaged certain test-taker groups. Yet at the test level, the differences (or unfair treatment) were negligible because they were unlikely to influence the raw score. These findings support the validity of the placement test.