March 28, 2025
Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints
Whisper is one of the best open source speech recognition models and definitely the one most widely used. Hugging Face Inference Endpoints make it very easy to deploy any Whisper model out of the box. However, if you’d like to introduce additional features, like a diarization pipeline to identify speakers, or assisted generation for speculative decoding, things get trickier. The reason is that you need to combine Whisper with additional models, while still exposing a single API endpoint. We’ll solve this challenge using a custom inference handler, which will implement the Automatic Speech Recogniton (ASR) and Diarization pipeline on Inference