AutoTTS Cuts LLM Token Use by 69.5% Through Automated Reasoning Strategies

By Julian Reed • Published: May 29, 2026 • 2 MIN READ

AutoTTS automates test‑time scaling

AutoTTS is a new framework that lets large language models allocate extra compute at inference without hand‑crafted rules. By treating strategy design as a search problem, the system explores thousands of width‑depth policies in an offline replay of pre‑generated reasoning traces.
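To make the idea of searching over policies in an offline replay concrete, here is a minimal sketch. It is not the AutoTTS implementation: the trace format (an answer paired with a token length), the meaning of "width" (how many stored traces to use) and "depth" (a per-trace token cap), and the function names `replay` and `search` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def replay(policy, problems):
    """Score a hypothetical (width, depth) policy against stored traces.

    problems: list of (traces, truth), where traces is a list of
    (answer, token_length) pairs generated ahead of time. No model is called.
    """
    width, depth = policy
    correct = tokens = 0
    for traces, truth in problems:
        votes = []
        for answer, length in traces[:width]:
            tokens += min(length, depth)   # tokens spent before the cutoff
            if length <= depth:            # trace finished within the depth cap
                votes.append(answer)
        # Majority vote over the traces that finished in time.
        if votes and Counter(votes).most_common(1)[0][0] == truth:
            correct += 1
    return correct / len(problems), tokens

def search(problems, widths, depths, min_accuracy):
    """Grid-search for the cheapest policy that meets an accuracy floor."""
    best = None
    for policy in product(widths, depths):
        accuracy, cost = replay(policy, problems)
        if accuracy >= min_accuracy and (best is None or cost < best[1]):
            best = (policy, cost, accuracy)
    return best  # (policy, token_cost, accuracy), or None if nothing qualifies

# Toy data, invented for this example.
problems = [
    ([("A", 10), ("A", 20), ("B", 30)], "A"),
    ([("C", 15), ("D", 5), ("C", 25)], "C"),
]
result = search(problems, widths=[1, 2, 3], depths=[10, 20, 30], min_accuracy=1.0)
```

Because every candidate is scored against traces that already exist, evaluating one more policy costs no additional model calls, which is presumably what makes a search over thousands of policies affordable.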

How the framework slashes token costs

The discovered Confidence Momentum Controller tracks an exponential moving average of confidence, couples branch widening with depth probing, and shifts budget toward branches that agree with the leading answer. In benchmark trials on Qwen‑3 models, the approach cut token consumption by 69.5% while keeping accuracy flat.
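The two ingredients named above, a smoothed confidence signal and a budget that favors branches agreeing with the leading answer, can be sketched in a few lines. This is an illustration under stated assumptions, not the discovered controller: the smoothing factor `alpha`, the `threshold`, the 3x `boost`, and the function names are all hypothetical.

```python
from collections import Counter

def ema_update(prev, value, alpha=0.3):
    """Exponential moving average: blend the new reading into the running value."""
    return alpha * value + (1 - alpha) * prev

def allocate_budget(branches, total_tokens, ema, threshold=0.7, boost=3.0):
    """Split a token budget across reasoning branches.

    branches: list of (answer, confidence) pairs, one per branch.
    When smoothed confidence is at or above the (assumed) threshold, branches
    that agree with the most common answer get `boost` times the share of
    the others; otherwise the budget is split evenly.
    """
    leader = Counter(answer for answer, _ in branches).most_common(1)[0][0]
    factor = boost if ema >= threshold else 1.0
    weights = [factor if answer == leader else 1.0 for answer, _ in branches]
    total = sum(weights)
    return leader, [int(total_tokens * w / total) for w in weights]

# Toy example: two branches agree on "42", one dissents.
branches = [("42", 0.9), ("42", 0.8), ("7", 0.4)]
confident = allocate_budget(branches, total_tokens=700, ema=0.8)
uncertain = allocate_budget(branches, total_tokens=700, ema=0.5)
```

With high smoothed confidence the agreeing branches receive most of the budget; with low confidence the split stays even, so a dissenting branch still has room to develop.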

“The automation removes the guesswork that has limited test‑time scaling for years,” a researcher noted.

Experiments covered the math benchmarks AIME‑24, AIME‑25 and HMMT‑25, plus the GPQA‑Diamond reasoning set. Compared with traditional Self‑Consistency (64 paths), Adaptive‑Consistency and Parallel‑Probe, AutoTTS matched or exceeded baseline accuracy, and in five of eight cases it set a new accuracy high.

The entire discovery loop ran in under three hours at a cost of $39.90, thanks to the offline replay environment. Enterprises can now generate custom controllers for proprietary models without a dedicated research budget.

Analysis by: Julian Reed
Consumer Electronics Expert

Senior Intel Analyst & Contributing Editor. Focused on deep-tier geopolitical and market strategies.