Dawn of the Self-Building A.I.

According to founder Arun Bahl, Aloe’s model leverages a combination of neurosymbolic reasoning, program synthesis, and confidence scoring to more helpfully respond to problems that tend to trip up stand-alone L.L.M.s. Photo: Carmen Jaspersen/picture alliance/Getty Images
Ian Krietzberg
September 16, 2025


Arun Bahl is hoping to build a different kind of A.I. company, one explicitly designed to steer our society away from a dystopian future and toward something more… agreeable. But what really distinguishes Aloe, which Bahl co-founded in 2023, is that the company’s A.I. model (also called Aloe) is apparently “self-building.” There’s a certain woo-woo aspect to the endeavor: The company’s employees are referred to as “gardeners” who shape and guide the technology, and the website emphasizes that Aloe isn’t going to sell your data or “mine your dopamine.”

Aloe’s mission, like that of Amazon’s AGI Labs, is to provide a technological remedy to the “distraction economy.” Bahl, a cognitive scientist turned serial entrepreneur, told me that his goal was to build a chatbot that can “help bolster our abilities, so that we can thrive in this environment that’s not the same as the one that we evolved for.” Plenty of scientific research links distraction to weakened cognitive performance and shows that attempting to process too much information leads to worse memory recall; Bahl believes that A.I. can eliminate the digital clutter that exacerbates this phenomenon.

In his view, getting there involves moving past the industry’s fixation on large language models and designing an auditable system that users will find genuinely trustworthy. Bahl told me this requires users to actually believe the chatbot is capable of sound reasoning; has access to good information; and isn’t a malicious actor. Of course, it also means dipping a bucket into the veritable river of V.C. cash that’s been flooding Silicon Valley since 2022. So far, Bahl told me, his small team of three has been able to bootstrap the company, helped along by a few angel investors. But a big capital raise is probably the next step—if the “right partners” with the right motivations come along.



It’s unclear how far along those conversations are, but Bahl told me that a raise would happen “soon.” In the meantime, the company’s main focus has been on pulling people off the waitlist to test how well Aloe works in the real world. “In a moment where sometimes it feels a bit hard to have optimism, I think that we can,” Bahl told me. “Being clear about what our expectations are, not just for our tools, but tech as an industry, I think that there’s a real opportunity to rewrite that script and nudge the boat in a different direction.”


Temet Nosce

According to Bahl, Aloe’s model leverages a combination of neurosymbolic reasoning, program synthesis, and confidence scoring to more helpfully respond to problems that tend to trip up stand-alone L.L.M.s. But that was as much as I could squeeze out of him. That said, the idea of neurosymbolic A.I., which has been pushed by a number of researchers in the field, including Gary Marcus, is to combine the two prominent A.I. architectures—symbolic A.I., which relies on rules-based reasoning and formal logic; and artificial neural networks, which power L.L.M.s and recognize patterns from data—in a way that overcomes each of their respective shortcomings.

Moreover, Aloe’s capabilities are based on an artificial version of metacognition, a jargony term that, in humans, essentially equates to self-awareness. In terms of A.I. systems, a metacognitive model would be trained to automatically question why it landed on a certain answer, which would, in theory, make the output more trustworthy. “If there’s a disease I want to bequeath to this thing, it’s crushing self-doubt,” Bahl told me. In short, through confidence scoring, Aloe seems capable of recognizing when its responses aren’t trustworthy, and can then leverage symbolic tools to write code autonomously as a means of problem-solving. One example Bahl offered is training a model to use a calculator to answer a math query, rather than relying on its own brittle “reasoning.” The trick is getting the model to actually reach for the calculator, which requires enough “self-awareness” to suspect that its initial response is incorrect.
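For readers who want to see the calculator idea in concrete terms, here is a rough Python sketch of confidence-gated tool use. It is an illustration only, not Aloe’s method, which Bahl did not describe in detail. The function names, the 0.9 threshold, and the assumption that a model hands back a confidence number alongside its guess are all hypothetical.

```python
import ast
import operator

# Arithmetic operators the illustrative calculator supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str):
    """Symbolic tool: evaluate +, -, *, / arithmetic exactly, without eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


def answer(expression: str, model_guess, confidence: float, threshold: float = 0.9):
    """Return (answer, source).

    `model_guess` and `confidence` stand in for whatever a language model
    would produce (hypothetical inputs). Below the threshold, the guess is
    discarded and the calculator is used instead.
    """
    if confidence >= threshold:
        return model_guess, "model"
    return calculator(expression), "calculator"


# A shaky guess (418) with low confidence is replaced by the tool's result.
print(answer("12 * 34", 418, confidence=0.4))
# A confident guess is passed through untouched.
print(answer("2 + 2", 4, confidence=0.99))
```

The hard part, as Bahl suggests, is not the calculator. It is producing a confidence number that is actually low when the guess is wrong, which this sketch simply takes as given.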

This is where the “self-building” component comes in. Beyond merely relying on an external tool when necessary, Bahl said that Aloe is “able to create new tools if it doesn’t have the tool that it needs for a certain situation.” He offered an anecdote in which Aloe “needed to be able to understand that there was some speech inside of an MP3 file.” The model wasn’t given a tool to identify the speech, but Bahl said that when it recognized the problem, “it stopped and wrote the tool to do that. It iterated through, and didn’t just write the code, but tested it. And when it looked like [the code] was running well enough, it could then bring that tool back into its toolbox and use it. Once it’s been able to create this new tool to work through a type of problem, that’s now available for all such types of problems.” (Neither the tool nor this specific capability has been independently tested or verified, so it’s almost impossible to know, at this point, just how effective, robust, or reliable Aloe actually is.)
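The write, test, and keep loop Bahl describes can be sketched in a few lines of Python. This is a toy under stated assumptions, not Aloe’s implementation: the `TOOLBOX` registry, the `propose_candidates` stand-in for a code-writing model, and the word-counting task (used here in place of the MP3 speech example) are all invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical registry of tools the system has already built.
TOOLBOX: Dict[str, Callable] = {}


def propose_candidates(task: str) -> Iterable[str]:
    """Hypothetical stand-in for a code-writing model: yields source drafts."""
    # The first draft is deliberately wrong (counts characters, not words).
    yield "def tool(text):\n    return len(text)\n"
    # The second draft is correct.
    yield "def tool(text):\n    return len(text.split())\n"


def passes(tool: Callable, checks: List[Tuple[str, int]]) -> bool:
    """Run a candidate against (input, expected output) pairs."""
    for arg, expected in checks:
        try:
            if tool(arg) != expected:
                return False
        except Exception:
            return False
    return True


def get_or_build_tool(task: str, checks: List[Tuple[str, int]]) -> Callable:
    """Reuse a stored tool, or draft candidates until one passes its checks."""
    if task in TOOLBOX:
        return TOOLBOX[task]
    for source in propose_candidates(task):
        namespace: dict = {}
        exec(source, namespace)  # a real system would sandbox this step
        candidate = namespace["tool"]
        if passes(candidate, checks):
            TOOLBOX[task] = candidate  # keep it for all later problems of this type
            return candidate
    raise RuntimeError(f"no candidate passed the checks for {task!r}")


count_words = get_or_build_tool("count_words", [("a b c", 3), ("", 0)])
print(count_words("hello there world"))
```

Even in this toy version, the tool is only as trustworthy as the checks it was tested against, which is one reason independent verification of a system like Aloe matters.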



Bahl also noted that Aloe is built on a number of L.L.M.s. “We’re totally model agnostic, so we can use open source too,” he said. “There were situations where Gemini couldn’t answer this question, but Gemini inside of Aloe can.” At some point, Bahl said he might be interested in building a proprietary language model to power Aloe and specifically expressed interest in the promise of a smaller, more efficient model. But he reiterated that, for now, the team’s focus is on the scaffolding that can be built around an existing L.L.M.

A month ago, Aloe released test results against the popular General A.I. Assistants benchmark (GAIA), in which its system beat the competition to achieve state-of-the-art scores by a healthy margin. As Bahl pointed out, perhaps the most significant revelation was that Aloe’s lead over the competition was at its highest on the most difficult questions. (Aloe achieved a score of 78.9 on the third level of the GAIA benchmark, compared to GPT-5 medium’s 38.4.) Of course, Aloe’s score has been neither verified nor peer reviewed—not to mention the fact that benchmarks rarely measure what they purport to measure. Unlike the verified scores on GAIA’s leaderboard, it’s also unclear what it cost Aloe to achieve its scores. But as a marker against which the entire industry is competing, Aloe seems to have slipped to the top.
