The widespread deployment of risk assessment instruments (RAIs) within the criminal justice system has renewed longstanding concerns about algorithmic bias in the administration of justice. Recent studies have proposed tests for the presence of algorithmic bias, but no clear consensus has emerged on which of these tests, if any, adequately assesses or addresses the problem of RAIs trained on data of unknown accuracy. The present paper compares the performance of risk prediction algorithms trained and tested on conventional law enforcement data with the performance of algorithms trained on ground-truth measures of self-reported criminal involvement. Comparing the predictions from these models, we find limited support for the ability of conventional RAIs to predict self-reported criminal offending. These results suggest that algorithms trained on conventional law enforcement data are not fit for purpose: they can neither reliably distinguish individuals with substantially different levels of self-reported offending risk nor group together individuals with similar levels of such risk. If replicated, this finding would indicate that conventionally trained RAIs lack criterion validity and should not be used to inform assessments of public safety risk or related criminal justice resource allocation decisions.
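The comparison design can be illustrated with a minimal synthetic sketch. Everything below is hypothetical and not drawn from the paper: the features, the latent propensity, and the detection mechanism are invented solely to show how one might train one model on an official-record label, train another on a self-reported label, and score both against self-reported offending (the criterion).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic cohort (illustrative only -- not the paper's data):
# a latent offending propensity drives self-reported offending, while the
# official-record label also depends on who gets detected.
n = 2000
x = rng.normal(size=(n, 3))                 # illustrative features
propensity = x @ np.array([1.0, 0.5, 0.0])  # x[:, 2] is irrelevant to offending

# Self-reported offending: noisy function of the latent propensity
y_self = (propensity + rng.normal(size=n) > 0).astype(float)

# Official record: offending appears only if detected, and detection here is
# made to depend on the irrelevant feature (nonrandom measurement error)
detected = rng.random(n) < 0.5 + 0.3 * (x[:, 2] > 0)
y_official = y_self * detected

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression (illustrative, not production)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def auc(scores, labels):
    """Mann-Whitney AUC: P(score of a positive case > score of a negative case)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = labels.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

w_official = fit_logistic(x, y_official)
w_self = fit_logistic(x, y_self)

# Criterion validity check: score both models against the self-report outcome
auc_official = auc(x @ w_official, y_self)
auc_self = auc(x @ w_self, y_self)
print(f"AUC vs self-report, official-trained:    {auc_official:.3f}")
print(f"AUC vs self-report, self-report-trained: {auc_self:.3f}")
```

In this toy setup the official-record model absorbs the detection bias through the irrelevant feature, so its ranking of individuals by self-reported risk can diverge from the self-report-trained model's ranking; the sketch is only a schematic of the validation logic, not a reproduction of the paper's analysis.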