This research initially set out to investigate comparability across three of the eleven language versions of the PIRLS 2016 reading test in South Africa. The original aim was to determine, using an Item Response Theory (IRT) based approach, whether the test presented a comparable level of difficulty for students taking it in different languages. However, initial analyses undertaken to establish model fit and check assumptions before conducting the planned IRT analysis exposed issues that went beyond comparability. This presentation discusses the challenges faced when ‘traditional’ model fit could not be established, the consequences of using data that deviate from expected patterns, and the next steps taken in this investigation of PIRLS 2016 in South Africa.