The Five Kinds of Model Routers That Cut AI Costs
Before we get to today’s agenda, check out Phoebe’s report Wednesday night that Nvidia has launched a new program to financially backstop customers’ purchases of its AI chips. It’s yet another way Nvidia is leveraging its powerful balance sheet to keep the GPU data center party going.
On to the column…
As more companies reassess rising prices for advanced AI models and tokenmaxxing among their employees, model routers are having a moment in the sun.
Rather than relying on users to manually select a (possibly pricey) model to answer their questions or write their code, routers aim to pick the right model for the job. These routers take different forms, including standalone products, features from cloud-computing providers or even DIY apps built by company IT departments, but they are all gaining attention as a way to save money on AI services without losing too much quality.
For instance, basic chores like summarizing emails or searching through documents can often run on open-source models or older proprietary models for a fraction of the cost of cutting-edge models. Firms including Snowflake and Palo Alto Networks have told us they found cost savings by swapping in cheaper models for certain tasks.
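To make the idea concrete, here is a minimal sketch of the simplest possible router: a lookup that sends basic chores to a cheap model and everything else to a frontier model. It is an illustration only, not how any particular product works. The model names, the prices and the task labels are all hypothetical, and the sketch assumes each request already arrives tagged with a task type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                       # hypothetical label, not a real product
    cost_per_million_tokens: float  # made-up price, for illustration only


# Two hypothetical tiers: a cheap model and a cutting-edge one.
CHEAP = Model("small-open-model", 0.20)
FRONTIER = Model("frontier-model", 15.00)

# Assumption: requests are already labeled with a task type.
# Real routers may infer this from the prompt instead.
SIMPLE_TASKS = {"summarize_email", "search_documents"}


def route(task_type: str) -> Model:
    """Send basic chores to the cheap model, everything else to the frontier model."""
    return CHEAP if task_type in SIMPLE_TASKS else FRONTIER


def estimated_cost(task_type: str, tokens: int) -> float:
    """Estimate the cost of a request under the made-up prices above."""
    model = route(task_type)
    return model.cost_per_million_tokens * tokens / 1_000_000


if __name__ == "__main__":
    for task in ("summarize_email", "write_code"):
        print(task, "->", route(task).name, estimated_cost(task, 50_000))
```

The savings in a setup like this come entirely from how much traffic lands in the cheap tier, which is why the choice of which tasks count as "simple" matters more than the routing code itself.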