AI Agenda

OpenAI Discovers New Way to Cut Inference Costs in Half

Art by Clark Miller.
By Stephanie Palazzolo

We closely track efforts by Anthropic, Google and OpenAI to get access to more server chips to run their models. But we don’t talk enough about the work these companies are doing to get more juice from the servers they already have.

In one previously unreported example, OpenAI engineers earlier this month told some colleagues they had figured out a way to more than halve the cost of inference, or running existing models, thanks to some newly discovered optimizations, according to a person with knowledge of those discussions.

When the engineers applied the new techniques to power ChatGPT for visitors who didn’t have a free or paid account, it reduced the number of Nvidia graphics processing units needed at one point to just a couple hundred—a shockingly small number. (That said, OpenAI likely doesn’t get much ChatGPT usage from such users, as the company limits how much they can use the chatbot that way.)

It isn’t clear what OpenAI did to get its latest efficiency gains, which might include techniques such as quantization; key-value caching, or helping the model remember information from prior calculations it made so it doesn’t need to repeat the work; sending queries to be answered in batches rather than one by one; and routing some queries to models or parts of models that require less power to answer them.
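To make one of those ideas concrete, here is a minimal toy sketch of key-value caching in Python. It is not OpenAI's method, which the reporting above does not describe. Every name in it (`project`, `attend`, `decode_without_cache`, `decode_with_cache`) is hypothetical, the "projections" are simple scalings standing in for learned weights, and the raw token vector doubles as the query to keep the example short. The point is only to show that caching keys and values produces the same outputs while doing far fewer projection computations.

```python
import math


def project(x, scale):
    """Hypothetical stand-in for a learned key or value projection."""
    return [scale * v for v in x]


def attend(query, keys, values):
    """Softmax attention of one query over all keys and values so far."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[i] for w, v in zip(weights, values)) / total
        for i in range(dim)
    ]


def decode_without_cache(tokens):
    """Recompute keys and values for the whole prefix at every step."""
    projections = 0
    outputs = []
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        projections += 2 * len(prefix)
        outputs.append(attend(prefix[-1], keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    """Project each token once and keep the results for later steps."""
    projections = 0
    keys, values, outputs = [], [], []
    for x in tokens:
        keys.append(project(x, 0.5))    # cached key for this token
        values.append(project(x, 2.0))  # cached value for this token
        projections += 2
        outputs.append(attend(x, keys, values))
    return outputs, projections


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow_out, slow_cost = decode_without_cache(tokens)
fast_out, fast_cost = decode_with_cache(tokens)
print(slow_cost, fast_cost, slow_out == fast_out)
```

In this toy, the uncached version's projection count grows with the square of the sequence length, while the cached version's grows linearly. That is the sense in which the model "doesn't need to repeat the work." Real systems trade that saved compute for the memory needed to hold the cache.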
