Welcome back to The Hidden Layer. I’m Ian Krietzberg, gearing up for a trip to Seattle next week. If you’ll be in the area and want to meet up, drop me a line.
Today, I’ve got an exclusive look at Poetiq, a new startup from a couple of Google veterans that’s been quietly acing one of the industry’s toughest benchmarks. It’s been six months since this six-person team got started, and they’ve already captured the attention of heavyweights, including OpenAI and a bunch of major investors. Details below.
Plus, news and notes on the nudification apps that just won’t go away, Donald Trump’s latest bid to speed up nuclear energy development for A.I. companies, and the $211 billion venture capital A.I. gold rush.
Mentioned in this issue: Poetiq, Shumeet Baluja, Ian Fischer, Elon Musk, xAI, Tesla, Donald Trump, nuclear power, the Tech Transparency Project, Heidy Khlaaf, the ARC benchmark, and many more…
Let’s get into it…
Three Things You Should Know…
- A crisis of nudification: The nudification scandal that engulfed Elon Musk’s xAI at the start of the year has brought attention back to a long-standing crisis outside of Musk’s sphere of influence: all the other A.I. image generation models that digitally undress photos of women and girls. While X has been heavily scrutinized, Apple and Google, through their respective app stores, have also allowed users to download programs that create these sexualized, nonconsensual images. A recent investigation from the Tech Transparency Project discovered 55 such apps in the Google Play Store and 47 in the Apple App Store, and the report says they’ve been downloaded more than 700 million times in total, generating over $100 million in revenue. “T.T.P.’s findings show that Google and Apple have failed to keep pace with the spread of A.I. deepfake apps that can ‘nudify’ people without their permission,” the report says. “Both companies say they are dedicated to the safety and security of users, but they host a collection of apps that can turn an innocuous photo of a woman into an abusive, sexualized image.”
Since the investigation, Apple has removed 28 of these apps, and Google has taken down 31. But a quick search of my own revealed that plenty are still available. Perhaps unsurprisingly, the top result in the Apple App Store for the search term “nudify” is none other than xAI’s Grok.
- Unleash the reactor: While the A.I. industry scrambles to find more energy to fuel its growth, plenty of major developers have pushed nuclear as a solution. The Trump administration has been more than happy to support this strategy: According to NPR, the Department of Energy has issued new orders over the past few months effectively slashing long-established safety requirements for nuclear energy development. The orders have not been made public, and the D.O.E. did not respond to a request for comment. NPR obtained copies of several of the orders, which reportedly cut hundreds of pages of requirements for record-keeping, safety, and security at the reactors, in addition to environmental protections. Last year, OpenAI sent a letter to the White House Office of Science and Technology Policy asking the government to “step back by streamlining and modernizing outdated and onerous regulations to unlock energy innovation.” But as Dr. Heidy Khlaaf, the chief A.I. scientist at the AI Now Institute, told me in November: “Nuclear is safe only because we make it safe.”
- They raised how much??: A.I. companies raised a total of $211 billion in venture capital last year, accounting for half of all global venture investment—and nearly double the $114 billion they raised in 2024—according to a report released today by Crunchbase and HumanX. Roughly three-quarters of the funds came from a few hundred “mega-deals” worth at least $100 million. Don’t call it a bubble!
Deal of the Week: Tesla–xAI
Tesla has agreed to invest $2 billion in Elon Musk’s other company, xAI, as disclosed in a new earnings report from the electric car maker, which, incidentally, also reported a drop in annual revenue for the first time. The investment was part of the $20 billion funding round that xAI closed earlier this month, and “was made on market terms consistent with those previously agreed to by other investors in the financing round,” according to the report.
Meta and Microsoft were among the other Mag 7 companies that reported earnings last night. Wall Street was pleased only with Meta’s report, with the company’s stock rising some 10 percent; meanwhile, Microsoft took a 10 percent dive (erasing roughly $350 billion in value), despite the company reporting “net gains” from its investment in OpenAI for the first time, a result of “the dilution gain from the OpenAI Recapitalization,” according to its 10-Q. Tesla dropped 3.5 percent, and Apple rose slightly on a huge surge in iPhone demand. Google’s and Amazon’s earnings will arrive next week; Nvidia’s will come at the end of February.
And now for the main event…
Poetiq, a less-than-one-year-old A.I. startup, just crushed the ARC A.G.I. benchmark, beating Anthropic and Google with only six people and $40,000. An exclusive look inside the search for an A.I. holy grail.
Most early-stage A.I. companies focus on a handful of similar tasks: training and fine-tuning L.L.M.s, developing specialized models, buying G.P.U.s, et cetera. Poetiq (founded last year by Shumeet Baluja and Ian Fischer, former A.I. researchers at Google DeepMind who have been working together for over a decade) doesn’t do any of those things. Nevertheless, the startup just achieved one of the highest scores ever recorded on the gold-standard ARC-AGI-2 benchmark, beating out Anthropic, Google, and all but one of OpenAI’s attempts. They also pulled it off in just six months, with a six-person team and a total hardware bill of only $40,000.
It’s a wildly impressive feat, made more interesting by the fact that Baluja and Fischer, Poetiq’s co-C.E.O.s, are trying to scale up a meta-system to achieve “safe superintelligence” with unusual frugality: They raised $45.8 million in a previously unreported seed round, at an undisclosed valuation, co-led by Surface Ventures and Fyrfly Venture Partners with participation from Y Combinator, 468 Capital, Operator Collective, Hico Ventures, and Neuron Venture Partners. “Poetiq is helping foundation models achieve their potential without massive expenditures or needing to be the experts at training them,” Fischer told me. “We don’t view ourselves as competing with foundation models. We really view them as our most important partners.”
A MESSAGE FROM OUR SPONSOR
How should software companies redefine themselves in the AI era?
As AI-native startups achieve $100M+ ARR with radically different operating models, incumbents face a pivotal moment. Staying competitive requires more than embedding AI into products - it demands a fundamental reinvention of go-to-market, business models, internal operations, and talent strategy.
In The AI-centric imperative: Navigating the next software frontier, McKinsey draws on insights from a recent global survey of software top executives to outline the seven imperatives leaders must embrace to become truly AI-centric and stay ahead as AI rewrites the rules of software.
Read the full article here.
Baluja and Fischer’s opportunity centers on bridging the gap between L.L.M. capability and usability. The industry has poured ungodly sums into improving its models, yet for the companies trying to integrate L.L.M.s into their organizations, things haven’t been going so well. Most corporate A.I. experiments die in the pilot phase, and those that survive aren’t necessarily helping businesses: Employees report that A.I. tools don’t save them much time, arguably the single biggest value proposition of L.L.M.s. As I wrote earlier in the month, this has turned 2026 into a make-or-break year for the industry.
To address this problem, Poetiq’s co-C.E.O.s envisioned an alternative to both reinforcement learning and fine-tuning. Those approaches are costly, time-consuming, and require anywhere from thousands to millions of data points; Poetiq’s system requires far less data, and works by leveraging a given L.L.M., or an ensemble of L.L.M.s, to create what they describe as “expert agents,” which can more accurately and efficiently solve specific problems. Fischer described these agents as “code, prompts, and data sitting on top of one or more language models.”
The construction of these expert systems, according to Fischer, represents the company’s take on recursive self-improvement, a holy grail of A.I. in which a system continuously and autonomously makes itself more capable. As Baluja explained, L.L.M.s are better at determining how to solve problems than actually solving them. Poetiq’s approach is unique, he said, because it addresses that gap by extracting “the information from the L.L.M. on how to solve a problem, but then when we actually solve it, it’s done outside the L.L.M.”
The novel technique embraces the inherent stochasticity, or unpredictability, of A.I. models. As I’ve written before, L.L.M.s are probabilistic prediction machines, which means everything they output is fundamentally a hallucination. Sometimes those predictions are accurate; sometimes they’re not. But instead of attempting to overcome this issue, Poetiq views it as a strength. “If I ask you how to solve a problem and you explain it to me multiple different ways, and they all have some grain of truth in them, that’s a lot of information for me,” Baluja said. “So if you can gather that information and do something with it, all of a sudden, you’ve just said that the stochasticity and variableness of L.L.M.s is actually a good thing.”
“What’s crucial to understand about our approach is that we think there’s a ton of information already in the L.L.M.s that most people just can’t get to,” Baluja continued. “These models have given us a solid foundation on which to build intelligence. They are not the intelligence itself. We don’t need the L.L.M. to be the execution environment. That’s a very strange thing to do.”
“I Don’t Want to Oversell It, But…”
Ultimately, Baluja and Fischer hope to achieve one of the industry’s white whales: artificial superintelligence. Fischer told me their focus on superintelligence, rather than general intelligence, was the result of an explicit attempt to “sidestep some of the definitional questions around A.G.I.” Both remain theoretical, but roughly speaking, the goal of A.G.I. is to simulate human-level intelligence, while superintelligence seeks to transcend the capabilities of even the best human minds. Fischer said they’ll know they’re there when they “can solve every problem better than a human. Like, much better than a human.”
Who knows when, or if, this goal will be achieved? In the meantime, a focus on safety is baked into Poetiq’s approach. It largely comes down to controllability: Poetiq decides what tools and environments its system is allowed to access, which prevents the system from doing something unexpected. Then there’s the fact that the startup is “not deploying a general A.I. externally, so that means if somebody comes and wants to develop some biological attack agent, the only way for them to do that with Poetiq is if they have a customer relationship with us,” Fischer said. “I don’t want to oversell it,” he continued, “but it’s a huge advantage relative to the frontier labs, who are all trying to make a general-purpose A.I. that is generally available.”
Baluja said that this guardrailed, aspirationally superintelligent capability is central to Poetiq’s business model. To wit: Companies come to Poetiq with a very specific problem, Poetiq builds an expert agent to solve it, and that “expert” is then available when a different customer arrives with a similar problem. “What we’ll wind up with in a few years is thousands of these experts, so that when you get a new problem, it’s like, Okay, I’ve seen something like that before,” he said. Taken together, he added, these experts “have an amazing amount of intelligence, but the way they’re being used is not as one gigantic model.”
In many ways, Poetiq’s performance on the ARC benchmark validated not just their system, but also their business strategy. Fischer and Baluja approached the test as they would a customer: They spent a few thousand dollars to get the system going; plugged in a bunch of different models, including Gemini 3, GPT-5.1, and Grok 4; and trained the system on the dataset for ARC-AGI-1, the easier precursor to ARC-AGI-2. Poetiq’s expert agent then developed a general-enough solution to the benchmark, which is designed to test abstract problem-solving rather than knowledge, that it managed to achieve state-of-the-art results on ARC-AGI-2 without ever having seen the training data for the second version of the test. “It’s so important that that happened,” Baluja said. “Because once people come to us with a certain problem, we want to be able to learn from that. And the next time you come up to us with a related problem, we want to be able to do that problem faster.”
On the private evaluation conducted by ARC, Poetiq’s system, using Gemini 3, achieved a score of 54 percent, 10 percentage points above Google’s own score and at less than half the cost per task. On a public set of evaluations, which weren’t conducted by the ARC team, Poetiq used an ensemble of models to achieve a score of 65 percent, higher than the average human score of 60 percent. (A panel of people can get a perfect score on this benchmark.)
Shortly after Poetiq posted their results, OpenAI came to them and said they were about to release GPT-5.2. They asked Poetiq to test its system using the new model; that test returned a score of 75 percent. “We couldn’t hope for a better result,” Baluja told me. “We’re seeing a lot of transference across models, and we’re seeing it across tasks, which is exactly what we need for a system like Poetiq to actually achieve this whole idea of learning and generalization. It gave us a lot more confidence in our own approach.” He explained that Poetiq has plans to offer a self-service model, which would allow their technology to scale quickly.
Despite these early successes, Poetiq is entering a market that is both fiercely competitive and, depending on who you talk to, possibly teetering on the verge of a correction. When I brought this up, Baluja said he’s not sure whether the worst-case scenario will come to pass. (“Fingers crossed,” he said.) But he added that Poetiq’s model agnosticism provides a bit of insulation if we are in fact in a bubble, and that bubble bursts. Moreover, Poetiq’s approach of taking a model and making it better in specific domains means that for price-sensitive customers, smaller and cheaper models could be made to perform on par with very expensive ones. This is “really compelling,” he told me. “We’re not insulated, obviously, from price fluctuations. But we think if A.I. is going to be used anywhere, we’ll just get you a higher-value proposition for using it.”
That’s all for today. I’ll see you next week.
Ian
Puck is published by Heat Media LLC. 107 Greenwich St., New York, NY 10006