The LLM Paradox: High Expectations Coupled With Lack of Trust

Some call them plagiarism bots, while others see them as a new capital asset that will change economies and societies. Some call them stupid, while others believe it’s stupid not to recognize their full potential. So what’s the right way to think about large language models?

The Information asked its readers how they view LLMs, or computational models driven by machine-learning algorithms. The report below summarizes the results of a survey of 242 readers of The Information (see “Methodology” at the end of the report), augmented by anonymous comments from readers. The report also includes thoughts from founders of technology companies that are involved with LLMs.

The report covers these questions:

What will be the true impact of LLMs?
How widely will they be used?
How soon will they be rolled out?
How can the industry increase the accuracy of LLM-generated outcomes?

The True Impact of LLMs: A Case of FOMO or a Game-Changing Technology?

Enthusiasm prevails: Two-thirds of The Information’s survey respondents believe LLMs are a game-changing technology, while the rest say they will not live up to their hype.

The different outlooks for LLMs are illustrated by these opinions shared by The Information’s survey respondents. An enthusiast says: “LLMs represent a fundamental paradigm shift in technology. The hype will die down, but what will remain will be the fundamental shift in how work is done in every job—like with the arrival of the fax machine or the internet.”

A naysayer, on the other hand, sees LLMs as a fad: “Definitely a lot of FOMO-driven LLM projects right now—hopefully the dust will settle in six months and people will again look beyond LLMs at many of the more efficient, practical and cheaper [machine-learning] techniques.”

The dissonance between the enthusiasts and naysayers about the impact of LLMs may boil down to different perspectives—looking at what LLMs are now in the current ecosystem (naysayers), versus imagining them in the future as part of a new technology landscape (enthusiasts).

One of The Information’s survey respondents takes a forward-looking view of LLMs: “It feels like other fundamental technology shifts—disappointing in the near term as the technology is applied to existing applications, and profound in the long term as new applications and use cases are developed that are founded and developed for the new technology.”

The Scope of Use: Unlocking the Full Potential

How widely used will LLMs be? Most of The Information’s survey respondents see LLMs as useful enterprise-wide and serving multiple goals. The remaining third see them as limited mostly to specific functions and tasks.

The survey reveals that the top tasks for which LLMs are currently best suited are customer service (chatbots, 24/7 customer support), 69%; document review (due diligence, contract analysis, financial analysis and reporting), 60%; and personalized content creation, 54%.

More than half of respondents (56%) say companies currently use LLMs for software engineering, which is “one of the earliest successful use cases for LLMs,” says Morgante Pell, co-founder and CEO of Grit. His company offers an AI agent that can create, modify, test and deploy code. Grit has helped companies upgrade software to deal with a security vulnerability or convert to a more scalable database. The efficiency gains that come with Grit’s AI turn what would typically be a team effort into an individual one.

“Software engineering is the best place to start with automation, because software development can be very arduous. Everyone is happy to offload it to AI,” says Pell.

Content generation, on the other hand, is one of those areas where people want to play an active role and may be more resistant to adopting LLMs. Content is the domain of Narratize, an AI co-author that supports product innovation, development and marketing teams with creating the right technical story—all the way from the seed of an idea through consumer messaging.

With Narratize, human marketers are not cut out of the loop, says Katie Trauth Taylor, co-founder and CEO. Just the opposite: Narratize prompts a human with questions like: What’s the big idea you’re trying to share? “The AI sits with you like [a] great colleague would, and it just asks you one strategic question after another to help pool your best ideas,” says Taylor. One feature, the story infuser, further fosters collaboration by allowing users to send prompts to contributors, who can infuse the storytelling with their ideas.

What results will LLMs be able to deliver? Some survey respondents see LLMs’ capabilities as very narrow, limited to tasks within processes rather than achieving fully fledged business goals. Says one respondent: “They [LLMs] are great for text analysis and generation, but they can’t accurately and effectively perform tasks that add value to a company. They are lacking and unreliable when it comes to end-to-end work.”

Another respondent argued that the only way to justify investment in LLMs is to create very specific metrics: “Narrow definition of purpose and tasks, narrow parameters of operation, narrow focus of application are necessary both to determine up-front investment required and how to evaluate return on that investment. Without focus, LLMs cannot be measurable, manageable and profitable.”

Razi Raziuddin, co-founder and CEO of FeatureByte, thinks about LLM use cases in terms of not just the narrow end-to-end result but the real-life impact. FeatureByte offers an AI data copilot, which automates the end-to-end process of data preparation and pipeline deployment. Automation shortens that process by a factor of 50, Raziuddin estimates, often from months or weeks to hours or minutes. FeatureByte works with companies in multiple industries, including media and telecommunication, life sciences, financial services and healthcare, helping them better understand their customers.

No industry can offer a clearer delineation of real-life impacts than healthcare, where a machine-driven mistake can lead to a misdiagnosis or even death. Raziuddin believes LLMs are not yet ready to help doctors diagnose patients but are very useful for appointment scheduling, scheduling the staffing of the operating rooms or identifying the right patients for clinical trials.

Some survey respondents encourage evaluating the role of LLMs in achieving ultimate business goals instead of taking a narrow view. “It’s the business models behind the LLMs, stupid!” says one reader, while another sees a narrow approach as a missed opportunity: “Locking everyone’s thoughts into the idea that LLMs are the chatbot and can only be utilized through natural language is like mistaking a monitor for the computer.”

The Timeline: A Sprint or a Marathon?

Excitement about LLMs is running high, but not for everyone. Most companies are taking a wait-and-see approach.

“As with all previous enterprise technologies, companies will be slow to adopt,” says one reader. “Early adopters in the U.S. will be the same type of companies: VC-backed, tech-focused on the West Coast and financial services on the East Coast. The rest will adopt it two to three years later as it becomes more mainstream, proven and cost-effective.”

Among the fast crowd, the race to succeed is proceeding at breakneck speed. “Those without a strategy going forward will fall to those that implement first,” says one reader. “This is an incredibly disruptive technology, moving faster than anything I have ever seen. This is a land grab, a gold rush. Those at the head will reap the rewards, those at the tail will get crumbs. Nvidia is selling picks and shovels to the gold rush.”

The Information’s survey further segments companies by the maturity stages they are at in terms of LLMs: Preliminary, Active or Advanced (see “Chart 4: High Expectations”).

Today, more than half (57%) are at the preliminary stage: either planning whether to implement LLMs or exploring their potential benefits but not yet developing them. But within 18 months, the share of companies at the preliminary stage is expected to drop to less than a quarter (23%), with more companies pulling the trigger on developing, implementing and testing, and moving to the active stage (42%). What’s even more impressive is that the share of companies already seeing business value from LLMs should more than triple, from 10% today to 35% in 18 months.
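The arithmetic behind this shift can be checked with a short sketch. The figures are the survey's; two things are assumptions rather than statements from the report: that the companies "seeing business value" correspond to the Advanced stage, and that the three stages sum to 100%, which implies 33% of companies are at the Active stage today. The function name `stage_shift` is purely illustrative.

```python
# Share of companies at each LLM maturity stage, in percent (survey figures).
today = {"preliminary": 57, "advanced": 10}

# Assumption: the three stages cover all companies, so the Active share
# today is whatever remains. The report does not state this number.
today["active"] = 100 - sum(today.values())

in_18_months = {"preliminary": 23, "active": 42, "advanced": 35}


def stage_shift(before, after):
    """Return the percentage-point change and growth multiple per stage."""
    return {
        stage: {
            "change_pts": after[stage] - before[stage],
            "multiple": after[stage] / before[stage],
        }
        for stage in after
    }


shift = stage_shift(today, in_18_months)
for stage, s in shift.items():
    print(f"{stage}: {s['change_pts']:+d} points, x{s['multiple']:.1f}")
```

Under these assumptions, the Advanced share grows by a multiple of 3.5, while the Preliminary share falls by 34 percentage points.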

The Trust Issue: In Search of the Truth

The crux of the issue with LLMs is trust. The biggest group of respondents (87%) points to verifying the accuracy of machine-generated outcomes as a top challenge. Alarmingly, only 10% of respondents believe companies can manage this challenge.

“If I can’t trust the LLMs’ output, they are useless to me,” says one reader. Others agree: “[LLM-driven generative AI] creates realistic-looking footnotes that are completely made up and descriptions that are well written but either wrong or entirely fabricated. It’s good at words and syntax, bad at truth.”

Narratize’s Taylor makes sure such criticisms do not apply to her company’s AI co-author. Narratize feeds off domain-specific knowledge that has been validated and peer-reviewed. Additionally, the research hub feature allows users to ask the AI any research question and distill key insights across multiple publications to find gaps and opportunities within existing markets or research areas. The information comes with a source for further assurance of credibility.

Narratize securely incorporates an enterprise’s existing structured or unstructured data and brand guidelines, so that the content it generates is based on an understanding of the user’s industry, company, products and consumers.

The veracity of LLMs’ outcomes depends on what data LLMs are trained on. Getting clean and relevant data for model training is the second-biggest challenge in succeeding with LLMs (65%). And here also, the capabilities for obtaining that data are mediocre, with just 9% of corporations able to handle this challenge well or very well.

“Trust is critical,” says Grit’s Pell. “We focus on earning and maintaining customer trust by relying upon a lot of software engineering work to build tooling that can provide verifiable correctness.” He adds that the risk with writing code is less than the risk with other LLM uses, because there can be strong validation and testing of the code.

The way to increase trust in LLMs is to first focus on the data they are fed. “Corporations are overly concerned with hardware and models, where they should be working to reinvent data ecosystems to ensure they can effectively integrate [generative AI] into workflows and enhance models with accurate and, when necessary, real-time data,” says one reader.

FeatureByte’s Raziuddin addresses concerns about data quality this way: “No organization has ever claimed that their data is perfect. You work with imperfect data, but you can design LLMs to handle data-quality issues.” One way is to build in decision trees that limit the impact of data-driven mistakes. Another way to stay on the safe side is to use LLMs internally, as the tolerance for mistakes is much higher internally than for outputs that go outside the company, for example to customers.

“Ultimately, you have to understand who the end user is, what the risk of good and bad LLMs-driven decisions will be, and what it would take to convince the humans that ultimately they’re in control,” says Raziuddin.
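As a rough illustration of the decision-tree idea, the sketch below routes an LLM output to release, human review or rejection, with a stricter bar for outputs that leave the company. It is not FeatureByte's implementation. Every name and threshold here (`LLMOutput`, `route`, `MIN_CONFIDENCE`, `REJECT_BELOW`, the confidence score itself) is a hypothetical assumption chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds: external outputs must clear a higher bar than
# internal ones, reflecting the lower tolerance for mistakes outside the company.
MIN_CONFIDENCE = {"internal": 0.70, "external": 0.95}
REJECT_BELOW = 0.30  # hypothetical floor below which output is discarded


@dataclass
class LLMOutput:
    text: str
    confidence: float  # assumed score in [0, 1] from an upstream validation step
    audience: str      # "internal" or "external"
    has_sources: bool  # whether the output cites verifiable sources


def route(output: LLMOutput) -> str:
    """Return 'release', 'human_review' or 'reject' for one LLM output."""
    if output.audience not in MIN_CONFIDENCE:
        raise ValueError(f"unknown audience: {output.audience!r}")
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    # Very weak outputs are discarded regardless of audience.
    if output.confidence < REJECT_BELOW:
        return "reject"
    # Unsourced content bound for customers always goes to a person.
    if output.audience == "external" and not output.has_sources:
        return "human_review"
    # Release only when the audience-specific bar is met.
    if output.confidence >= MIN_CONFIDENCE[output.audience]:
        return "release"
    return "human_review"
```

For example, an output scored 0.8 would be released internally but sent to human review if destined for customers. A human remains the default path whenever a check fails.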

Conclusion

LLMs create mixed emotions within companies. Their potential has not yet been fully explored, and there are different opinions about how best to use them and what role they will ultimately play.

The survey of 242 readers and analysis of their comments conducted by The Information, and interviews with technology founders involved with LLMs, point to several considerations when implementing LLMs:

Consider the ultimate real-life impact of the LLMs. Go beyond the tactical or strategic uses and determine what impacts—and on whom—LLMs will have. Implement LLMs only when you can live with the consequences.
Recognize the role of the human as part of the collaboration with LLMs. Review the impact LLMs will have on the work of the human and set up the best working relationship between the human and the machine.
Create guardrails around machine-driven decision-making. Embed features that limit potential risks relating to data analytics, and make sure humans have ultimate control.

Methodology: Based on a survey of 242 respondents conducted by The Information in May 2024.

Size. A majority of survey respondents (53%) came from companies with revenues under $10 million, 19% had revenues between $10 million and $100 million, 9% had revenues between $100 million and $1 billion, and the remaining 19% had revenues above $1 billion.
Industries. The top industries represented in the survey were technology, media and telecommunications (42%), followed by professional services (19%), financial services (7%), and healthcare and life sciences (7%). All remaining industries each represented less than 5% of the survey.
Function. Top functional areas were general management (35%), followed by information technology and marketing and communications (both at 13%), and research and development (12%). All other functions represented less than 10% of the survey.
Rank. The biggest group of respondents were directors (19%), followed by CEOs (16%), employees (15%) and owners (13%). All remaining ranks represented less than 10% of survey respondents.
Gender. 74% of respondents were men and 19% were women. 7% preferred to self-describe or not disclose their gender.
Race or ethnicity. 70% of respondents were white, 12% Asian or Pacific Islander, 5% Hispanic or Latino, 4% multiple ethnicities, 2% black and 7% preferred not to disclose.
Age. The biggest group of respondents ranged in age from 45 to 54 (30%), followed by those from 55 to 64 (25%), those 35 to 44 (21%) and those 65 and over (15%). All other age groups represented less than 10% of respondents.