Welcome back to The Hidden Layer, my new twice-weekly email devoted to the opaque,
high-stakes world of artificial intelligence. I’m Ian Krietzberg.
Thanks for all the great feedback on my last issue. Keep it coming! If you’ve got questions, tips, critiques, thoughts about the Knicks or Rob Thomas (he has new music, okay?), or anything else you want to get
off your chest, just reply to this email. You can also message me on Signal at 732-804-1223. Also, if you’re not yet subscribed to Puck,
click here to change that.
In today’s issue, a close look at how hospitals are actually using A.I., and what that means for patient care. Plus, the realities behind so-called L.L.M. “reasoning.”
Mentioned in this issue: Elon Musk, Grok, OpenAI, xAI, Windsurf, Mark Zuckerberg, Dr. John Brownstein, Boston Children’s Hospital, “cognitive offloading,” Google, Keyon Vafa, Elsevier, Meta,
Mike Lindell (yes, that Mike), and many more…
But first…
Back in May, it seemed like OpenAI was close to its largest-ever acquisition: a $3 billion deal to
absorb Windsurf, a popular A.I. coding company. But on Friday, Google announced an agreement to hire Windsurf’s co-founder and C.E.O., Varun Mohan, plus other key Windsurf employees, in a deal reportedly worth $2.4 billion. (Google is also getting a nonexclusive license for Windsurf’s technology.) Then, on
Monday, A.I.-coding startup Cognition—the makers of the much-hyped Devin product—announced that it was acquiring Windsurf’s remaining assets. Jeff Wang, who was instated as Windsurf’s C.E.O. three days ago, described
those 72 hours as “the wildest rollercoaster ride” of his career. No kidding.
Of course, acqui-hires are increasingly popular among the giants of the A.I. space: Last year alone, Google did it with Character.AI, Microsoft with Inflection, and Amazon with Adept. Still, it’s remarkable how quickly the talent war is upending the venture ecosystem, in which
investors and early employees bet on a big exit for the company, not just its founders or star employees.
Three Things You Should Know
- “Potemkin” reasoning: Last week, after I wrote about Grok 4, the latest iteration of Elon Musk’s A.I. chatbot, I received an interesting note from a reader. This person pointed out that, in addition to acing a few other benchmarks, Grok 4 achieved a 16 percent on the
ARC-AGI-2 benchmark—doubling the previous high score of 8 percent secured by Claude 4 Opus. Among other things, the ARC-AGI-2 seeks to measure genuine machine intelligence via logic puzzles that are easy for humans but exceedingly difficult for A.I. models. Of course, we don’t know anything about Grok’s training data, which means it might have been exposed to similar logic puzzles. (Grok 4 was also
tested on ARC’s semi-private evaluation, which allows some degree of exposure to the benchmark before testing.)
I was reminded of a recent conversation I had with Keyon Vafa, a postdoctoral fellow at the Harvard Data Science Initiative, who has been studying the reasoning capability of L.L.M.s. Along with researchers from MIT and UChicago, he recently
published a paper that aimed to reconcile high benchmark performance with inconsistent, real-world efficacy. The problem, the researchers pointed out, is that many of these benchmarks are also used to test humans. “These benchmarks are only valid tests if L.L.M.s misunderstand concepts in ways that mirror human misunderstandings,” they wrote. “If L.L.M. misunderstandings diverge from human
patterns, models can succeed on benchmarks without understanding the underlying concepts.”
Their research found traces of what they termed “potemkin” reasoning across a variety of models: instances that suggest a capacity for answering questions correctly without “true concept comprehension.” In separate research, Vafa
found that, for now, L.L.M.s don’t possess an inherent “world model,” which I also wrote about last week. Obviously, properly evaluating the reasoning capabilities of L.L.M.s is
vitally important for adoption, but conducting meaningful evaluations is easier said than done, in part because L.L.M.s are used for such a wide variety of tasks.
- American Prometheus: On Monday, Mark Zuckerberg announced that Meta will invest “hundreds of billions of dollars” into A.I.-enabled
G.P.U.s to achieve his mission of machine “superintelligence.” The first multi-gigawatt cluster, called Prometheus, will apparently come online next year; Hyperion, a second cluster that could scale up to 5 gigawatts, is also in the works. The largest supercomputing cluster in existence, xAI’s Colossus, boasts 200,000 G.P.U.s and reportedly requires some 300 megawatts of power, most of which is supplied by a series of on-site gas turbines. (One gigawatt of power is enough to
supply nearly a million homes in the U.S.)
It’s unclear where Meta’s data centers will be located, how they will be powered, or how they might affect nearby populations. “Just one of these covers a significant part of the footprint of Manhattan,” Zuckerberg wrote. Meta didn’t respond to a request for comment.
- Grok slouches toward Washington: Just days after xAI was forced to apologize for antisemitic rants from its Grok chatbot, the company announced on Monday that it had secured a contract from the U.S. Department of Defense, and that its products will be available for purchase by every
federal department and agency. In a press release, the D.O.D. said that it had awarded contracts of up to $200 million each to Anthropic, Google, OpenAI, and xAI, all meant to “accelerate” the D.O.D.’s adoption of A.I. for “national security” purposes.
Hallucination of the Week
Two attorneys representing MyPillow C.E.O. Mike Lindell in a
recent defamation case were sanctioned and fined last week by a federal judge after they were caught using a generative A.I. system to prepare a court filing, which included citations to cases that don’t exist. The lawyer-gets-busted-using-A.I. trope has been playing out almost from the moment ChatGPT burst onto the scene. One researcher has
assembled a database of such scenarios that includes more than 200 cases. Somehow, they never learn. Maybe the fines are too low?
And now for the main event…
Nearly half of clinicians are now using A.I. for their work. Patients are turning
to ChatGPT to self-diagnose mysterious ailments. And everyone from the chief innovation officer of Boston Children’s Hospital to R.F.K. Jr. is excited about the revolution unfolding in plain sight. What could go wrong?
Long before ChatGPT infiltrated classrooms and became an obsession at cocktail parties,
Boston Children’s Hospital embarked on what Dr. John Brownstein, its chief innovation officer, described as an “A.I. journey.” For years, Brownstein told me, the hospital had been using machine learning in data-rich environments—like radiology, pathology, or the intensive care unit—to generate “predictions” about patient outcomes. Then came the generative A.I. explosion. Now, Brownstein said, his team is anticipating that A.I. is “going to be part of the fabric of almost all the
technologies we use in the hospital.” For many people in the A.I. field, the integration with medicine represents a potential holy grail.
Obviously, these technologies are still error-prone, and the stakes are much higher when you’re incorporating A.I. into potentially life-or-death healthcare decisions, rather than, say, enabling Gemini in your Gmail. But physicians are finding early success with A.I. tools, and the rate of adoption is steadily ticking up: According to Elsevier’s
fourth annual “Clinician of the Future” report, which was released today, 48 percent of clinicians had used A.I. for work in 2025, nearly double the 26 percent reported the year before, and more than triple the figure from the year before that. The 2,000 or so physicians who responded to the survey described their primary use cases for A.I. as identifying drug
interactions, analyzing medical images, and providing a patient’s medication summary.
This rapid adoption curve, Brownstein said, can be attributed in part to the industry’s seeming openness to this technology. At Boston Children’s, 30 percent of the hospital’s workforce has already started using A.I., although mostly via “low-risk” applications, like administrative tools. The hospital was also one of the earlier adopters of (controversial) ambient listening tools, which use A.I. to
auto-transcribe patient-doctor visits, and has partnered with OpenAI to advance its work on the diagnosis of rare diseases. “We’ve been very careful about the deployment of these tools, recognizing that some come with more risk than others,” Brownstein said, adding that the hospital has also started using physician-facing tools, at least in part, for “care guidance,” a step toward wide-scale, predictive, personalized healthcare.
Still, plenty of doctors remain cautious. In Elsevier’s
2024 survey, 85 percent of clinicians said that A.I. could cause critical errors, and 93 percent were worried about misinformation. In this year’s survey, only 40 percent of clinicians claimed that A.I. could be trusted to assist with clinical decision-making, and only 30 percent said their institutions were providing adequate training—an issue that Brownstein acknowledged as an impediment to adoption. “At the end of the day, whoever’s using them has to sign off and take responsibility for
whatever the output is,” he told me. “It still resides with the clinician to provide that consideration. Yes, there’s a future world where a lot of patients are going to turn directly to these tools, but that’s not where we are.”
The Revolution Comes to Washington
The feverish integration of A.I. tools across the medical industry has become a focus for virtually
every big tech player in the space. Microsoft is pursuing “medical superintelligence”; OpenAI says that “improving human health will be one of the defining impacts of A.G.I.”; Nvidia is
“transforming” healthcare with A.I.; Amazon is “reimagining primary care” with A.I.; and Google is “transforming”
healthcare. All are ostensibly focused on personalizing medicine with better analytics and predictive diagnostics; enhancing productivity to overcome staffing shortages; and improving operational efficiencies to address burnout while improving patient care.
The excitement has penetrated Washington, D.C., too. Secretary of Health and Human Services Robert F. Kennedy Jr., who
heralded the arrival of a so-called “A.I. revolution” at the department, has declared that “President Trump is
determined to end the hemorrhaging of rural hospitals, and he’s asked me to do that through the use of A.I.” It all sounds compelling, but things haven’t exactly gone according to plan on the ground. A 2024 survey from National Nurses United, the country’s largest union of registered nurses, found that only 40 percent of respondents trust that their institutions will implement A.I. with patient safety “as their first priority.” The survey
details cases of missing patient information and inaccurate algorithmic analyses that “often contradict and undermine nurses’ own clinical judgment and threaten patient safety.”
Michelle Mahon, the director of nursing practice at NNU, told me that while nurses are not anti-tech,
their primary concern is whether a given technology will actually improve patient care. “There is no demonstrated effectiveness” when it comes to A.I., she said. “It is not proven to be safe and effective prior to its implementation, and this has caused disruptions.” When asked about the technology’s propensity to “hallucinate,” she quickly noted that “we are talking about people’s lives. Even a small aberration in facts can be life-threatening.”
As with the electronic health
record, Mahon said that A.I. is “primarily designed to capture revenue. There’s been a long-standing goal of the industry to reduce labor costs by cutting back on nursing.” As long as the goal involves saving money by “distilling care,” she said, no technological intervention is “going to work in the best interest of patients.” Citing Kennedy’s statements on the supposed capabilities of “A.I. nurses,” she said she does not see a path to the integration of A.I. technologies that
would overcome or even address any of these concerns: “This is a red line for us.”
At the heart of these concerns is the slightly more abstract, somewhat longer-term threat of “cognitive offloading” and skill degradation associated with the use of generative A.I. Indeed,
several studies this year have identified a decrease in critical thinking associated with using and trusting the technology.
According to Dr. Rahul Goyal, Elsevier’s chief medical officer, an overreliance on A.I. in a medical context could potentially cause healthcare workers to lose their “knack.” While “A.I. is a brilliant servant,” he told me, it’s “not the best master.”
When it comes to the question of skill degradation—in which clinicians might become trained to respond to A.I.-powered alerts rather than their own intuition—the threat is multipronged, and dovetails with the problem of
cognitive offloading. To wit: When tools seem to work only some of the time (or if the A.I. experiences a hallucination), clinicians in a crisis might not have the skills necessary to address a problem. And even if the tech worked perfectly all the time, the risk of overreliance doesn’t go away.
Last year, the Ascension health system lost access to its electronic health record for more than a month due to a ransomware attack. Mahon said that downtime systems weren’t in
place, and staff didn’t know how to work from a physical chart. “As medical healthcare professionals, we are taught immediately, first and foremost, to treat the patient and not the monitor,” she told me. “What is happening with a person is more important than the data. This flips the script and emphasizes the value of data over all other kinds of knowledge. And that is dangerous.”
There were lots of great responses to my first two columns last week. Here’s a sampling of the
feedback in my inbox…
“I do not agree with this premise that videos or video games will eventually be generated for each individual user. People crave collective experiences, something they can engage other human beings about. If everyone is watching their own unique video or playing their own specially tailored game, it removes the ability to engage with others. Can someone who works in A.I. explain to me why they think people won’t care about the collective experience in favor of
something uniquely for them? This is a deep misreading of why people engage with things, and I find it frustrating that these experts never comment on that while claiming this is the future of media.” —A film industry person
“Your most recent newsletter includes a Quote of the Week from someone named Alexandra Ebert at Mostly AI. Her remarks reflect the usual ignorance of history of many of our tech innovators. The early days of car safety did not involve a
person with a red flag walking around to warn passengers, with seat belts coming online downstream. Once cars were commercialized, it became necessary to construct literal guardrails on roads. And traffic lights. And stop signs. And road infrastructure. As individuals it behooves us to take precautions to drive carefully: watch out for pedestrians and other cars, maintain a safe speed, and so on. But we shouldn’t expect that our personal decisions alone will be enough to ensure road safety.”
—A longtime Puck subscriber
Puck is published by Heat Media LLC. 107 Greenwich St, New York, NY 10006