Using item response theory to evaluate a test


The Boiler House

Item response theory (IRT) is an approach used in psychometrics to analyse test instruments, and it can be a helpful tool for evaluating the performance of individual questions on a test. IRT assumes that each student taking the test has a certain “ability”, measured on a continuous scale, which the items on the test are assessing. The response pattern for each question can then be modelled using data from students’ attempts at the test, and the output gives useful information about individual questions (e.g. whether students found them too easy or too difficult).
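As a rough illustration of how a response pattern can be modelled, the sketch below uses a two-parameter logistic (2PL) item response function. This abstract does not say which IRT model was fitted, so the choice of 2PL is an assumption, and the function name and the parameters `difficulty` and `discrimination` are illustrative only.

```python
import math


def item_response_probability(ability, difficulty, discrimination=1.0):
    """Probability of a correct response under a 2PL logistic model.

    Assumed model (not taken from the talk):
        P(correct) = 1 / (1 + exp(-a * (theta - b)))
    where theta is the student's ability, b the item difficulty,
    and a the item discrimination.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical items: one easy (low difficulty), one hard (high difficulty).
abilities = [-2.0, 0.0, 2.0]
easy_item = [item_response_probability(a, difficulty=-1.5) for a in abilities]
hard_item = [item_response_probability(a, difficulty=1.5) for a in abilities]

for theta, p_easy, p_hard in zip(abilities, easy_item, hard_item):
    print(f"ability={theta:+.1f}  easy={p_easy:.2f}  hard={p_hard:.2f}")
```

In this sketch, a student whose ability equals an item's difficulty has a 50% chance of answering it correctly. An item that almost every student is predicted to answer correctly (or incorrectly) tells us little about differences in ability, which is the kind of information the fitted parameters can surface.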

While IRT is commonly used in research design and in evaluating high-stakes standardised tests, it is unusual to apply it routinely in university teaching, even though the tools are readily available. We will give an example of IRT being applied as part of a project to evaluate and improve the Mathematics Diagnostic Test used at the University of Edinburgh, which is taken by around 1000 new students each year. The findings helped to inform changes to the test while it was being re-implemented in STACK in September 2017, so we will be able to compare the performance of the old and new versions of the test.