March 15, 2025
How do I speed up my agent?
I get this question a bunch. Developers generally spend their time first on getting the agent to work, and only then turn their attention to speed and cost. There are a few things we see developers doing:

- Identifying where the latency is coming from
- Changing the UX to reduce the “perceived” latency
- Making fewer LLM calls
- Speeding up LLM calls
- Making LLM calls in parallel

Identifying where the latency is coming from

This may sound basic, but how you approach reducing latency will depend entirely on your specific bottleneck. Is the latency coming from one big LLM call, or from multiple small ones