A fast parallel implementation of CTC (Connectionist Temporal Classification), on both CPU and GPU. CTC is a loss function for supervised learning on sequence data that does not require an alignment between the input frames and the target labels. For example, CTC can be used to train end-to-end speech recognition systems, which is how we have been using it at Baidu's Silicon Valley AI Lab. warp-ctc was used to train Deep Speech 2, an end-to-end system that recognizes both English and Mandarin speech and, in some cases, transcribes speech more accurately than human transcribers.
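To illustrate how CTC avoids the need for an alignment: the loss is the negative log-probability of the label sequence, summed over *every* frame-level alignment that collapses to it (merging repeated symbols, then removing blanks). This sum is computed efficiently with the forward algorithm over a blank-interleaved label sequence. The sketch below is a minimal pure-Python version of that recurrence, checked against brute-force enumeration; it is illustrative only and is not the warp-ctc API, which is a C library with its own interface.

```python
import itertools
import math

BLANK = 0  # by convention here, index 0 of the alphabet is the blank symbol


def ctc_loss(probs, labels):
    """Negative log-probability of `labels` under CTC, summing over all
    frame-level alignments with the forward (alpha) recurrence.

    probs: per-frame probability distributions over the alphabet.
    labels: target label sequence (contains no blanks).
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [BLANK]
    for l in labels:
        ext += [l, BLANK]
    S, T = len(ext), len(probs)

    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][BLANK]       # start on the leading blank...
    alpha[0][1] = probs[0][ext[1]]      # ...or directly on the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                       # stay on the same symbol
            if s > 0:
                a += alpha[t - 1][s - 1]              # advance by one
            # Skipping over a blank is allowed unless it would merge
            # two identical consecutive labels.
            if s > 1 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # Valid paths end on the final label or the trailing blank.
    total = alpha[T - 1][S - 1] + alpha[T - 1][S - 2]
    return -math.log(total)


def brute_force_loss(probs, labels):
    """Reference implementation: enumerate every path and collapse it."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=T):
        # Collapse: merge repeats, then drop blanks.
        collapsed = [k for k, _ in itertools.groupby(path) if k != BLANK]
        if collapsed == list(labels):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return -math.log(total)
```

The forward recurrence runs in O(T · L) time for T frames and L labels, versus the exponential cost of enumerating alignments; warp-ctc parallelizes this computation (and the corresponding gradient) across the minibatch on CPU and GPU.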