No Priors: Artificial Intelligence | Technology | Startups
Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI

Conviction · 1h 6m · 3 days ago · EN
At this moment of inflection in technology, co-hosts Elad Gil and Sarah Guo talk to the world's leading AI engineers, researchers, and founders about the biggest questions: How far away is AGI? What markets are at risk for disruption? How will commerce, culture, and society change? What's happening at the state of the art in research? "No Priors" is your guide to the AI revolution. Email feedback to show@no-priors.com. Sarah Guo is a startup investor and the founder of Conviction, an investment firm purpose-built to serve intelligent software, or "Software 3.0" companies. She spent nearly a decade incubating and investing at venture firm Greylock Partners. Elad Gil is a serial entrepreneur and a startup investor. He was a co-founder of Color Health and Mixer Labs (which was acquired by Twitter). He has invested in over 40 companies now worth $1B or more each, and is also the author of the High Growth Handbook.
What happens when AI agents can design experiments, collect data, and improve without a human in the loop? Andrej Karpathy joins Sarah Guo to discuss the state of models, the future of engineering and education, the impact on jobs, and his project AutoResearch, where agents close the loop on a piece of AI research (experimentation, training, and optimization, autonomously).

  • 00:00 Andrej Karpathy Introduction
  • 02:55 What Capability Limits Remain?
  • 06:15 What Mastery of Coding Agents Looks Like
  • 11:16 Second Order Effects of Natural Language Coding
  • 15:51 Why AutoResearch
  • 22:45 Relevant Skills in the AI Era
  • 28:25 Model Speciation
  • 32:30 Building More Collaboration Surfaces for Humans and AI
  • 37:28 Analysis of Jobs Market Data
  • 48:25 Open vs. Closed Source Models
  • 53:51 Autonomous Robotics
  • 1:00:59 MicroGPT and Agentic Education
  • 1:05:40 Conclusion

Topics & Mentions

  • Coding agents and their applications
  • The future of engineering and AI research
  • Optimizing instructions for AI agents
  • Robotics and AI integration
  • Education in the next stage of AI development
  • Overcoming skill issues in AI development
  • Maximizing token throughput and subscription utilization
  • The shift from compute-bound to skill-bound development
  • The potential for AI to revolutionize industries
  • The importance of developing muscle memory for AI development

Resources Mentioned

  • Andrej Karpathy (person)
  • Peter Steinberger (person)
  • Coding agents (tool)
  • Conviction (company)
  • Codex (tool)
  • Twitter (other)
  • GPU (other)
  • FLOPs (other)
  • Claude (tool)

Transcript

0:00

Code's not even the right verb anymore, right? But I have to express my will to my agents for sixteen hours a day. How can I have not just a single session of Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? The agent part is now taken for granted. Now the claw-like entities are taken for granted. And now you can have multiple of them. And now you can have instructions to them. And now you can have optimization of the instructions. But this is why it gets to the psychosis, is that this is like infinite and everything is skill issue. Listeners, welcome back to No Priors. Today, I'm here with Andrej Karpathy, and we have a wide-ranging conversation for you about coding agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents can reach out into the real world, and education in this next stage. Welcome, Andrej. Andrej, thanks for doing this. Yeah. Thank you for having me. So it's been a very exciting couple of months in AI. Yeah, you could say that. I remember walking into the office at some point and you were, like, really locked in. I was asking what you were up to, and you're like, I just have to code for sixteen hours a day. Or, code's not even the right verb anymore, right? But I have to express my will to my agents for sixteen hours a day. Because, like, there's been a jump in capability. What's happening? Tell me about your experience. Yeah. I kind of feel like I was just in this perpetual, I still am often in this state of AI psychosis, just like all the time, because there was a huge unlock in what you can achieve as a person, as an individual. Right? Because you were bottlenecked by, you know, your typing speed and so on.
But now with these agents, it really... I would say in December is when something really just flipped, where I kinda went from, like, eighty-twenty to, like, twenty-eighty of writing code by myself versus just delegating to agents. And I don't even think it's twenty-eighty by now. I think it's a lot more than that. I don't think I've typed, like, a line of code probably since December, basically,

2:01

which is, like, an extremely large change. I was talking about it to, for example, my parents and so on, and I don't think, like, a normal person actually realizes that this happened or how dramatic it was. Like, literally, if you just find a random software engineer or something like that at their desk and watch what they're doing, like, their default workflow of, you know, building software is completely different as of basically December. So, I'm just, like, in this state of psychosis of trying to figure out, like, what's possible, trying to push it to the limit. How can I have not just a single session of, you know, Claude Code or Codex or some of these agent harnesses? How can I have more of them? How can I do that appropriately? And then how can I use these claws? What are these claws? And so there's, like, a lot of new things. I wanna be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it. And I see lots of people on Twitter doing all kinds of things, and they all sound like really good ideas, and I need to be at the forefront or I feel extremely nervous. And so I guess I'm just in this psychosis of, like, what's possible? Like, because it's unexplored fundamentally. Well, if you're nervous, the rest of us are nervous. We have a team that we work with at Conviction where, you know, none of the engineers write code by hand. And they're all microphoned, they just, like, whisper to their agents all the time. It's the strangest work setting ever. And I thought they were crazy, and now I fully accept it. I was like, oh, this was the way. Like, you're just ahead of it. Yes. How do you think about your own capacity now to, like, explore or to do projects? Like, what is it limited by? Yeah. What is it limited by?
I think everything... like, so many things, even if they don't work, I think to a large extent, you feel like it's a skill issue. It's not that the capability is not there. It's that you just haven't found a way... Yeah. To string it together of what's available. Like, I just didn't give enough instructions in the AGENTS.md file or whatever it may be. I don't have a nice enough memory tool that I put in there or something like that. So, it all kinda feels like a skill issue when it doesn't work, to some extent. You wanna see how you can parallelize them, etcetera, and you wanna be Peter Steinberger, basically.

4:02

So, Peter is famous. He has a funny photo where he's in front of a monitor with lots of... he uses Codex. So, lots of Codex agents tiling the monitor, and they all take about twenty minutes if you prompt them correctly and use the high effort. So, you have multiple, you know, 10 repos checked out, and he's just going between them and giving them work. It's just like you can move in much larger macro actions. It's not just like, here's a line of code, here's a new function. It's like, here's a new functionality, and delegate it to agent one. Here's a new functionality that's not gonna interfere with the other one, give it to agent two, and then try to review their work as best as you can, depending on how much you care about that code. Like, what are these macro actions that I can manipulate my software repository by? And, like, another agent is doing some research, another agent is writing code, another one is coming up with a plan for some new implementation. And so everything just happens in these, like, macro actions over your repository, and you're just trying to become really good at it and develop, like, a muscle memory for it. It's extremely, yeah, it's very rewarding, number one, because it actually works. But it's also kind of like the new thing to learn. So, that's why, hence the psychosis. Yeah. I do feel like my instinct is, like, whenever I am waiting for an agent to complete something, the obvious thing to do is like, well, I can do more work. Right? Like, if I have access to more tokens, then, like, I should just parallelize more tasks. And so that's very stressful, because if you... Yeah. Don't feel very bounded by your ability to spend on tokens... Yeah. Then, you know, you are the bottleneck in the system that is max capability. Yeah. If you're not maximizing your subscription... Yeah. At least. And ideally for multiple agents.
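The workflow described here, several non-interfering macro actions delegated to separate agent sessions across checked-out repos, reviewed as each finishes, can be sketched in a few lines. This is an illustrative toy: `run_agent` is a hypothetical stand-in for shelling out to a real agent harness, and the repo and task names are made up.

```python
# Sketch of dispatching multiple independent "macro actions" to agents in
# parallel and reviewing results as they land. `run_agent` is a stub; a real
# version would invoke an agent CLI in the given repo checkout.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_agent(repo: str, task: str) -> str:
    """Stub for a ~20-minute agent session working inside `repo`."""
    return f"{repo}: completed '{task}'"

# Independent tasks chosen so the agents won't step on each other.
tasks = [
    ("repo-api", "add rate limiting to the public endpoints"),
    ("repo-web", "write a plan for the new dashboard"),
    ("repo-ml", "research faster data-loading options"),
]

results = []
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {pool.submit(run_agent, repo, task): repo for repo, task in tasks}
    for fut in as_completed(futures):
        results.append(fut.result())  # review each agent's work as it finishes
```

The design point is the granularity: each submitted unit is a whole piece of functionality, not a line of code, and the human's job shifts to routing work and reviewing outputs.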
Like, if you run out of tokens on Codex, you should switch to Claude or whatnot. I don't know. Like, that's what I've been trying to do a little bit. And I feel nervous when I have subscription left over. That just means I haven't maximized my token throughput. So, I actually kind of experienced this when I was a PhD student. You would feel nervous when your GPUs are not running. Like, you have GPU capability, and you're not maximizing the available FLOPs to you. But now it's not about FLOPs, it's about tokens. So, what is your token throughput, and what token throughput do you command? I would actually argue that it's very interesting that we had, you know, at least ten years where

6:08

in many engineering tasks, people just didn't feel compute bound. Right? And now the entire industry feels that. They feel resource bound. And now that you have this big capability jump, you're like, oh, actually, it's not, you know, my ability to access the compute anymore. Like, I'm the binding constraint. Yeah. It's a skill issue. Yeah. Which is very empowering, because, yeah, you could be getting better. So that's why I think it's very addictive, because there's unlocks when you get better. Where do you think it goes? Like, if you just think about, like, okay, you know, Andrej's iterating and everybody else is, for sixteen hours a day, getting better at using coding agents. Like, what does it look like in a year of, like, you've reached mastery? Yeah, what does mastery look like, right, at the end of the year, or like two, three years, five years, ten years, etcetera? Yeah. Well, I think everyone is basically interested in, like, going up the stack. So, I would say, yeah, it's not about a single session with your agent, it's multiple agents, how do they collaborate, and teams, and so on. So, everyone's trying to figure out what that looks like. And then I would say the claw is also kind of an interesting direction. When I say a claw, I mean this, like, layer that kind of takes persistence to a whole new level. Like, it's something that, like, keeps looping. It's not something that you are interactively in the middle of. It kind of, like, has its own little sandbox, its own little... it kind of, like, does stuff on your behalf even if you're not looking, and then also has, like, maybe more sophisticated memory systems, etcetera, that are not yet implemented in agents. So, OpenClaw has a lot more sophisticated memory, I would say, than what you would get by default, which is just memory compaction when your context runs out. Right?
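What distinguishes a "claw" from a one-shot agent session, as described above, is that it keeps looping on its own and persists memory outside the model's context window. A toy sketch of that shape, purely illustrative (OpenClaw's real memory system is far more sophisticated, and the event names here are made up):

```python
# Toy "claw" loop: acts on events as they arrive and persists memory to a
# file on disk, rather than relying on in-context memory compaction.
import json
import pathlib

MEMORY = pathlib.Path("claw_memory.json")
MEMORY.unlink(missing_ok=True)  # start fresh for this demo

def recall() -> list:
    """Load persisted memory; survives across sessions, unlike context."""
    return json.loads(MEMORY.read_text()) if MEMORY.exists() else []

def remember(note: str) -> None:
    MEMORY.write_text(json.dumps(recall() + [note]))

def claw_step(event: str) -> str:
    """One turn of the loop: handle an event, then persist what happened."""
    remember(event)
    return f"handled: {event}"

# A real claw would loop forever on timers, messages, and webhooks;
# here we just replay a few events.
for event in ["morning check-in", "new email arrived", "lights off at 11pm"]:
    claw_step(event)
```

The point of the sketch is only the architecture: the loop owns its own state on disk, so nothing is lost when any individual model context runs out.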
You think that's the piece that resonated for more users, versus, like, perhaps broader tool access? For OpenClaw? Yeah. I think there's at least five things that... Yeah. Resonated with... Good job, Peter. I mean, Peter has done a really amazing job. I saw him recently, and I talked to him about it, and he's very humble about it, but I think he innovated simultaneously in, like, five different ways and put it all together.

8:02

So, for example, like the SOUL.md document. Like, he actually really crafted a personality that is kind of compelling and interesting, and I feel like a lot of the current agents, they don't get this correctly. I actually think Claude has a pretty good personality. It feels like a teammate, and it's excited with you, etcetera. I would say, for example, Codex is a lot more dry, which is kind of interesting, because in ChatGPT it's a lot more upbeat and highly sycophantic. But I would say Codex, the coding agent, is very dry. It doesn't seem to care about what you're creating. It's kinda like, oh, I implemented it. It's like, okay, but do you understand what we're building? It's true. And the other thing I would say is, for example, with Claude, I think they dialed the sycophancy fairly well, where when Claude gives me praise, I do feel like I slightly deserve it. Because sometimes I give it, like, not very well formed thoughts, and I give it an idea that I don't think is fully baked, and it doesn't actually react very strongly. It's like, oh, yeah. We can implement that. But when it's a really good idea by my own account, it does seem to reward it a bit more. And so I kinda feel like I'm trying to, like, earn its praise, which is really weird. Mhmm. And so I do think the personality matters a lot, and I think a lot of the other tools maybe don't appreciate it as much. And I think in this aspect also Peter really cares about this, and so that was correct. And then the memory system, and then, just, you know, he's just having fun with this. And then the single WhatsApp portal to all of the automation. Yeah. Is there something that you have done personally with your claws beyond software engineering that you think is fun or interesting? Yeah. So in January, I went through a period of claw psychosis. So I built... I have a claw, basically, that takes care of my home, and I call him Dobby, the elf claw.
And, basically, I used the agents to find all of the smart home subsystems of my home on the local area network, which I was kind of surprised worked out of the box. Like, I just told it that I think I have Sonos at home. Like, can you try to find it? And it goes, and it did, like, an IP scan of all the, basically, computers on the local area network, and it found the Sonos system, and it turned out that there's no password protection or anything like that. It just logged in, and it's like, oh, yeah, you have these Sonos systems installed. Let me try to reverse engineer how it's working. It does some web searches, and it finds, like, okay, these are the API endpoints. And then it's like, do you wanna try it? And I'm like, woah. Like, you just did that. And I'm like, yeah, can you try to play something in the study? And it does, and music comes out. And I'm like, I can't believe I just... That's crazy. That's like three prompts. I can't believe I just typed in, like, can you find my Sonos? And suddenly it's playing music. And it did the same for lights, and so basically, like, it kinda hacked in, figured out the whole thing, created APIs, created a dashboard, so I could see the kinda command center of all of my lights in the home, and it was switching lights on and off. You know, so I can ask it, like, Dobby, sleepy time. And when it's sleepy time, that just means all the lights go off, etcetera, and so on. So it controls all of my lights, my HVAC, my shades,

10:46

the pool, and the spa, and also my security system. So I have a camera pointed outside of the house, and anytime someone rolls in, I have a Qwen model that looks at the videos. So, first of all, there's change detection. Right. And then based on change detection, it goes to Qwen, and then it actually sends me a text to my WhatsApp. It shows an image from the outside, and it says, hey, a FedEx truck just pulled up, and you might wanna check it, and you got new mail or something like that. And Dobby just texts me this. This is really incredible. So, Dobby is in charge of the house. I text with it through WhatsApp, and it's been, like, really fun to have these macro actions that maintain my house. I haven't, like, really pushed it way more beyond that, and I think people are doing a lot more crazy things with it. But for me, even just a home automation setup... I used to use, like, six apps. Yeah. Like, completely different apps. And I don't have to use these apps anymore. Like, Dobby controls everything in natural language. It's amazing. And so I think, like, I haven't even pushed the paradigm fully, but already that is so helpful and so inspiring, I would say. Do you think that's indicative of, like, what people want from a user experience perspective with software? Right? Because I don't think... you know, it's pretty ignored that it takes humans effort to, like, learn new software, like, new UI. Yeah. I think to some extent, that's right. It's like working backwards from how people think an AI should be, because what people have in their mind of, like, what an AI is, is not actually what an LLM is in, like, a raw sense. Like, an LLM is a token generator. You know, like, more tokens come out. But what they think of is, like, this

12:15

persona identity that they can tell stuff, and it remembers it. You know? And it's just kinda an entity behind the WhatsApp. It's, like, a lot more understandable. Mhmm. So I think, to some extent, it's, like, matching the expectations that humans already have for how an AI should behave. Under the hood, there's, like, a lot of technical details that go into that, and LLMs are too raw of a primitive to actually type check as AI, I think, for most people, if that makes sense. Yeah. I think that's how we understand what the AI is, and the description of it as Dobby or some person obviously resonates. I also think that the unification that you did across your six different software systems for your home automation speaks to a different question of, like, do people really want all of the software that we have today? Yeah. Right? Because I would argue, like, well, you have the hardware, but you've now thrown away the software or the UX layer of it. Do you think that's what people want? Yeah. I think there's this sense that these apps that are in the App Store for using these smart home devices, etcetera, these shouldn't even exist, in a certain sense. Like, shouldn't it just be APIs, and shouldn't agents be just using it directly? And then, like, I can do all kinds of home automation stuff that any individual app will not be able to do. Right? And the LLM can actually drive the tools and call all the right tools and do pretty complicated things. And so, in a certain sense, it does point to this... Like, maybe there's an overproduction of lots of custom bespoke apps that shouldn't exist, because agents crumble them up, and everything should be a lot more just exposed API endpoints, and agents are the glue of the intelligence that actually, like, tool calls all the parts. Another example is, like, my treadmill. There's an app for my treadmill, and I wanted to, like, keep track of how often I do my cardio,

14:01

but, like, I don't want to, like, log into a web UI and go through a flow, etcetera. Like, all this should just be, like, make APIs available, and this is kind of, you know, going towards the agentic sort of web, or, like, agent-first tools and all this kind of stuff. So I think the industry just has to reconfigure in so many ways, because, like, the customer is not the human anymore. It's, like, agents who are acting on behalf of humans, and this refactoring will probably be substantial in a certain sense. One way that people sometimes push back on this is, like, do we expect people to vibe-code some of these tools? Do we expect normal people to do this kind of stuff that I described? Mhmm. But I think to some extent, this is just, you know, technology as it exists today, and right now there is some vibe coding, and I'm actually watching it, and I'm working with the system. But I kinda feel like this kind of stuff that I just talked about, this should be free, like, in a year or two or three. There's no vibe coding involved. This is trivial. This is table stakes. This is, like, any AI, even the open source models, etcetera, can, like, do this. You should be able to translate from a less technical human's intent very easily to this. Yeah. Right now it's vibe coding that's involved, and not many people are gonna do it. And you still have to make some design decisions. Right? We were talking about, like, take frames, for example. Yeah. But I kinda feel like the barrier will just come down, and it's just ephemeral software on your behalf, and some kind of, like, claw is handling all the details for you, but you're not involved. Claw has a machine, and it will figure it out. And it's just presenting you UIs, and you're, like, saying stuff. You know? Mhmm. Why haven't you, I guess, like, pushed the boundaries of what you can do personally with claws?
Like, is it, you know, you're focusing on more important projects, AutoResearch, etcetera, or you're climbing the hill to mastery, or something else? Right? Yeah. I just feel like I'm so distracted by everything. So I spent, like, a week on the claw stuff, and I have more to-dos almost. But I will say that... It's like Jensen told us, we're all just busier, unfortunately. Yeah. I didn't really take advantage of a lot of email and calendar and all this other stuff, and I didn't even give it access, because I'm still a little bit, like, suspicious, and it's still very new and rough around the edges. So I didn't wanna give it, like, full access to my digital life yet, and part of it is just the security, privacy, and just being very cautious in that realm. And

16:12

so some of it is, like, held back by that, I would say. Yeah. Maybe that's, like, the dominant feature, but some of it is also just... I feel so distracted, because I feel like I had a week of claw, and then other stuff is happening. What was the... I mean, you've talked about being able to train or at least optimize a model as a task you wanted to see agents do for a long time. What was the motivation behind AutoResearch? AutoResearch, yeah. So, I think, like, I had a tweet earlier where I kind of said something along the lines of: to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck. You can't be there to prompt the next thing. You need to take yourself out of it. You have to arrange things such that they're completely autonomous. How can you maximize your token throughput and not be in the loop? This is the goal. And so, I kind of mentioned that the name of the game now is to increase your leverage. I put in just very few tokens just once in a while, and a huge amount of stuff happens on my behalf. And so, like, I tweeted that, and I think people liked it and whatnot, but they haven't, like, maybe worked through the implications of that. And for me, AutoResearch is an example of an implication of that, where it's like, I don't wanna be, like, the researcher in the loop, looking at results, etcetera. Like, I'm holding the system back. So, the question is, how do I refactor all the abstractions so that I'm not in it? I have to arrange it once and hit go. The name of the game is how can you get more agents running for longer periods of time without your involvement, doing stuff on your behalf? And AutoResearch is just, yeah, here's an objective, here's a metric, here's your boundaries of what you can and cannot do, and go. Yeah. You were surprised at its effectiveness.
Yeah, I didn't expect it to work, because... So, I have the project nanochat. And fundamentally, like, I think a lot of people are very confused by my obsession with, like, training GPT-2 models and so on. But for me, training GPT models and so on is just a little harness, a little playground for training LLMs. And fundamentally, what I'm more interested in is, like, this idea of recursive self-improvement and to what extent you can actually have LLMs improving LLMs. Because I think for all the frontier labs, this is, like, the thing. Mhmm.

18:09

For obvious reasons, and they're all trying to recursively self-improve, roughly speaking. And so for me, this is kinda like a little playpen of that. And I guess I'd already tuned nanochat quite a bit by hand, in the good old fashioned way that I'm used to. Like, I'm a researcher. I've done this for, like, you know, two decades. I have some amount of, like, what's the word, earned confidence. Okay. I have, like, two decades of, like, oh, I've trained this model, like, thousands of times. So I've done a bunch of experiments. I've done hyperparameter tuning. I've done all the things I'm very used to and have done for two decades. Yeah. And I've gotten to a certain point, and I thought it was, like, fairly well tuned. And then I let AutoResearch go overnight, and it came back with, like, tunings that I didn't see. And yeah, I did forget, like, the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned, and these things jointly interact. So, like, once you tune one thing, the other things have to potentially change too. You know, I shouldn't be a bottleneck. I shouldn't be running these hyperparameter optimizations. I shouldn't be looking at the results. There's objective criteria in this case, so you just have to arrange it so that it can just go forever. So that's a single sort of version of AutoResearch, of, like, a single loop trying to improve. And I was surprised that it found these things. You know, the repo was already fairly well tuned, and it still found something. And that's just a single loop. Like, the frontier labs, they have GPU clusters, tens of thousands of them. And so, it's very easy to imagine how you would basically get a lot of this automation on smaller models. And fundamentally, everything around, like, frontier-level intelligence is about extrapolation and scaling laws.
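The single AutoResearch loop described here, an objective metric, a search boundary, and no researcher looking at intermediate results, can be sketched as a toy. The "training run" below is a made-up quadratic standing in for a real experiment; the hyperparameter names echo the ones mentioned but the ranges and objective are invented for illustration.

```python
# Toy single-loop auto-research: propose hyperparameters within boundaries,
# evaluate against an objective metric, keep whatever improves it. No human
# inspects intermediate results.
import random

def train_and_eval(lr: float, weight_decay: float) -> float:
    # Stand-in for launching a real training run; lower loss is better.
    return (lr - 0.003) ** 2 + (weight_decay - 0.1) ** 2

random.seed(0)
best = {"lr": 0.01, "weight_decay": 0.0}   # the hand-tuned starting point
best_loss = train_and_eval(**best)

for _ in range(200):  # "overnight", in miniature
    candidate = {
        "lr": random.uniform(0.0001, 0.02),        # search boundaries
        "weight_decay": random.uniform(0.0, 0.3),
    }
    loss = train_and_eval(**candidate)
    if loss < best_loss:  # purely objective criterion; no one in the loop
        best, best_loss = candidate, loss
```

Even this crude random search illustrates the point about joint interaction: it explores both parameters at once, so it can find combinations a one-knob-at-a-time human sweep would miss.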
And so, you basically do a ton of the exploration on the smaller models, and then you try to extrapolate out. So you're saying our research efforts are gonna get more efficient. Like, we're gonna have better direction for when we scale as well, if we can do this experimentation better. Yeah. I would say that, like, the most interesting project, and probably what the frontier labs are working on, is you experiment on the smaller models. You try to make it as autonomous as possible, remove researchers from the loop. They have way too much, what is the opposite of

20:04

earned confidence? Yeah. Unearned confidence. They don't know. They shouldn't be touching any of this, really. And you have to rewrite the whole thing, because right now, suddenly they can contribute ideas, but, okay, they shouldn't actually be enacting those ideas. There's a queue of ideas, and there's maybe an automated scientist that comes up with ideas based on all the arXiv papers and GitHub repos, and it funnels ideas in, or researchers can contribute ideas, but it's a single queue, and there's workers that pull items, and they try them out, and whatever works just gets put on the feature branch, and maybe some people monitor the feature branch and merge to the main branch sometimes. So, yeah, just removing humans from all the processes and automating as much as possible and getting high tokens-per-second throughput. And it does require rethinking of all the abstractions, and everything has to be reshuffled. So, yeah, I think it's very exciting. If we take one more recursive step here, when is the model gonna write a better program.md than you? Yeah. So program.md is... We're not in the loop. Yeah. Exactly. Yeah. So program.md is my crappy attempt at describing, like, how the auto researcher should work. Like, oh, do this, then do that, and then try these kinds of ideas. Then here's maybe some ideas: look at the architecture, look at the optimizer, etcetera. But I just came up with this in markdown. Right? And so, yeah, exactly. You want some kind of an auto research loop, maybe, that looks for... you can imagine that different program.mds would give you different progress. So, basically, every research organization is described by a program.md. Yeah. A research organization is a set of markdown files that describe all the roles and how the whole thing connects. And you can imagine having a better research organization. So, maybe they do fewer stand-ups in the morning because they're useless. And this is all just code. Right?
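The research org sketched here, a single queue of ideas, workers that pull items and try them out, and a feature branch that only accumulates what beats the baseline, is simple enough to express directly. A toy version, with a hard-coded score standing in for actually running the experiment:

```python
# Toy "research org as code": one idea queue, workers pulling items,
# and a feature branch that collects only ideas that beat the baseline.
# The score attached to each idea is a stand-in for a real experiment run.
from queue import Queue

ideas = Queue()
for idea, score in [
    ("tune Adam betas", 1.2),
    ("new optimizer", 0.8),
    ("weight decay on value embeddings", 1.5),
]:
    ideas.put((idea, score))  # automated scientist or humans funnel ideas in

baseline = 1.0
feature_branch = []

while not ideas.empty():       # workers pull items and try them out
    idea, score = ideas.get()
    if score > baseline:       # whatever works lands on the feature branch
        feature_branch.append(idea)
```

Humans re-enter only at the edges: contributing ideas into the queue and deciding when the feature branch merges to main.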
And so, one organization can have fewer stand-ups. One organization can have more. One organization can be very risk-taking, one organization can be less. And you can definitely imagine that you have multiple research orgs,

22:03

and then they all have code. And once you have code, then you can imagine tuning the code. So, 100% there's, like, the meta layer of it. Did you see my text about my contest idea? My contest idea was, like, let people write different program.mds. Right? And so for the same hardware, where do you get the most improvement? Oh, I see. And then you can take all that data and then give it to the model and say, write a better program.md. Yes. Yes. Yeah, exactly. We're gonna get something better. Like, there's no way we don't. Right? You can 100% look at where the improvements came from, and, like, can I change program.md such that more of these kinds of things would be done? Or, like, things that didn't work. It's meta optimization. Yeah. You can 100% imagine doing that. So, I think this is a great idea. But, you know, I think you could sort of go one step at a time, where you sort of have one process, and then a second process, and then the next process, and these are all layers of an onion. Like, the LLM part is now taken for granted. The agent part is now taken for granted. Now the claw-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization of the instructions, and it's just, like, a little too much, you know? But, I mean, this is why it gets to the psychosis, is that this is, like, infinite, and everything is skill issue. And that's why I feel like, yeah, coming back to it, this is why it's so insane. Okay. Well, if we're just trying to, like, diagnose the current moment and what is a relevant skill right now: what do you think is the implication that this is the loop we should be trying to achieve in different areas, and that it works? Like, you know, remove yourself, create the metric, or create the ability for agents to continue working on it without you. Yeah. Do we still have performance engineering? Like, what... Yeah.
I mean, so there's a few caveats that I would put on top of the LLM psychosis. Number one, this is extremely well suited to anything that has objective metrics that are easy to evaluate. So, for example, like, writing kernels for more efficient CUDA code for various parts of a model, etcetera: the perfect fit. Because you have inefficient code, and then you want efficient code that has the exact same behavior,

24:01

but it's much faster. Perfect fit. So, a lot of things are a perfect fit for AutoResearch, but many things will not be. It's just, if you can't evaluate it, then you can't auto-research it. Right? So, that's caveat number one. And then maybe caveat number two, I would say, is, you know, we're kinda talking about next steps, and we kinda see what the next steps are, but fundamentally, the whole thing is still kinda bursting at the seams a little bit, and there's cracks, and it doesn't fully work. And if you try to go too far ahead, the whole thing is actually net not useful, if that makes sense. Because these models, you know, they've improved a lot, but they're still rough around the edges, is maybe the way I would describe it. I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been a systems programmer for their entire life, and a 10-year-old. And it's so weird, because humans are a lot more coupled; you wouldn't encounter that combination in a person. This jaggedness is really strange, and humans have a lot less of that kind of jaggedness, although they definitely have some. The agents have a lot more jaggedness, where sometimes, you know, I ask for functionality, and it comes back with something that's just totally wrong, and then we get into loops that are totally wrong. And I get so frustrated with the agents all the time still, because you feel the power of it, but you also... It still does nonsensical things once in a while for me as well. I get very annoyed when I feel like the agent wasted a lot of compute on something it should have recognized was an obvious problem. Yeah.
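The CUDA-kernel case is the cleanest example of an objective metric: same behavior, faster. A minimal sketch of that kind of check, using a toy softmax in plain Python rather than a real kernel (all names here are illustrative, not from the conversation):

```python
import math
import random
import time

def reference_softmax(xs):
    """Slow but trusted implementation: this defines correct behavior."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def evaluate_candidate(candidate, trials=50, n=256, tol=1e-9):
    """Return (correct, speedup). A candidate kernel is only useful if
    it matches the reference everywhere AND runs faster."""
    random.seed(0)
    inputs = [[random.uniform(-10, 10) for _ in range(n)] for _ in range(trials)]
    # Correctness: exact same behavior on every test input.
    for xs in inputs:
        ref, got = reference_softmax(xs), candidate(xs)
        if len(ref) != len(got) or any(abs(a - b) > tol for a, b in zip(ref, got)):
            return False, 0.0
    # Performance: time both on the same workload.
    t0 = time.perf_counter()
    for xs in inputs:
        reference_softmax(xs)
    t_ref = time.perf_counter() - t0
    t0 = time.perf_counter()
    for xs in inputs:
        candidate(xs)
    t_new = time.perf_counter() - t0
    return True, t_ref / max(t_new, 1e-12)
```

Because both checks are mechanical, an agent can be left alone in this loop: any proposal it makes is either verifiably equivalent-and-faster or rejected.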
I think some of the bigger things, maybe what's underneath it, if I could hypothesize, is that fundamentally these models are trained via reinforcement learning. So, they're actually struggling with the exact same thing we just talked about, which is that the labs can improve the models in anything that is verifiable, anything that has rewards. Did you write the program correctly, and do the unit tests check out? Yes or no. But one of the things where they're struggling is, for example, I think they have a tough time with the nuance of maybe what I had in mind or what I intended, and when to ask clarifying questions.

26:03

Like, what I yeah. It's just anything that feels softer is worse. And so you're either on rails and you're part of the superintelligence circuits, or you're not on rails and you're outside of the verifiable domains, and suddenly everything just meanders. Maybe another way to put it is, if today you go to a state-of-the-art model, ChatGPT, and you ask it, tell me a joke. Do you know what joke you're gonna get? There's the joke. The joke? I can't tell you, like, the standard form of it, but I do feel like ChatGPT has three jokes. Yeah. The joke that apparently all the LLMs love the most is: why do scientists not trust atoms? Because they make everything up. Okay. They make everything up. Okay. Why would that still emerge? So, this is the joke you would get three or four years ago, and this is the joke you still get today. Okay. So, even though the models have improved tremendously, and if you give them an agentic task, they will just go for hours and move mountains for you. Mhmm. And then you ask for a joke, and it has a stupid joke, a crappy joke from five years ago. Mhmm. And it's because it's outside of the RL. It's outside of the reinforcement learning. It's outside of what's being improved. And it's part of the jaggedness. But shouldn't you expect models, as they get better, to also have better jokes, or more diversity of them? It's just not being optimized, and it's stuck. Do you think that implies that we are not seeing generalization in the sense of broader intelligence, of joke smartness being attached to code smartness? Yeah. I think there's some decoupling, where some things are verifiable and some things are not, and some things are optimized for arbitrarily by the labs depending on what data went in, and some things are not.
But I mean, there's a premise from some research groups that if you are smarter at code generation, or in these verifiable fields, you should be better at everything. And the joke situation suggests that that's not happening in all of it. I don't think that's happening. Okay. Yeah. I don't think that's happening. I think maybe we're seeing a little bit of that, but not a satisfying amount. Yeah. But jaggedness exists in humans.

28:10

You can be very, very good at math and still tell a really bad joke. Yeah. That's true. Yeah. But it still means that the story, that we're getting a lot of the intelligence and capabilities in all the domains of society for free as we get better and better models, is not exactly what's fundamentally going on. There are some blind spots, sometimes some things are not being optimized for, and this is all clustered up in these opaque neural-net models. So, you're either on rails of what it was trained for, and you're going at the speed of light, or you're not. And so, it's jaggedness. So, that's why I think even though the progression is obvious, and it should happen, you can't let it fully go there yet, because it doesn't fully work, or it's a skill issue and we just haven't figured out how to use it. So, you know, it's hard to tell. Can I ask kind of a blasphemous question? If this jaggedness is persisting, and it's all rolled up in a monolithic interface, right, a single model, does that make sense? Or should it be unbundled into things that can be optimized and improved against different domains of intelligence? Like, unbundling the models into multiple experts in different areas, etcetera? More directly. Yeah. Instead of just MoE that we have no exposure to. Because that can be confusing as a user from the outside Uh-huh. Which is like, why is it so good at this but not at this other thing? Yeah. I think currently, my impression is the labs are trying to have a single monoculture of a model that is arbitrarily intelligent in all these different domains, and they just stuff it into the parameters. I do think we should expect more speciation in the intelligences.
Like, you know, the animal kingdom is extremely diverse in the brains that exist, and there's lots of different niches in nature, and some animals have an overdeveloped visual cortex or other kinds of parts, and I think we should be able to see more speciation,

30:03

and you don't need this oracle that knows everything. You speciate it, and then you put it on a specific task, and we should be seeing some of that, because you should be able to have much smaller models that still have the cognitive core, like, they're still competent, but then they specialize, and then they can become more efficient in terms of latency or throughput on specific tasks that you really care about. Like, if you're a mathematician working in Lean, I saw, for example, there's a few releases that really target that as a domain. So there's probably gonna be a few examples like that where the unbundling kind of makes sense. One question I have is whether the constraint on available compute infrastructure drives more of this, because efficiency actually matters more. Right? Like, financing aside, and financing is involved in all of this. If you have access to full compute for anything you do, you'd leave it as one single model, right? But if you actually feel pressure, where you're like, I can't serve a model of massive size for every use case, do you think that leads to any speciation? Does that question make sense to you? The question makes sense. And I guess what I'm struggling with is, I don't think we've seen too much speciation just yet. Right? No. We're seeing a monoculture of models. Yeah. And there's clearly pressure to, like, make a good code model, put it back in the main, merge again. Yeah, yeah. Even though there already is pressure on the models. I guess perhaps I feel like there's a sharp supply crunch, and maybe that causes more speciation now. Yeah. I think fundamentally, the labs are serving a model, and they don't really know what the end user is going to be asking about. So maybe that's some part of it, because they kind of have to multitask over all the possible things that could be asked.
But I think if you're coming to a business and maybe partnering on some specific problems you care about, then maybe you would see that there. Or there will be some very high-value applications that are more niche. But I think right now, they're kinda going after the totality of what's available. I don't think that the science of manipulating the brains is fully developed yet, partly. What do you mean, manipulating?

32:09

So, like, fine-tuning without losing capabilities, as an example. We don't have these primitives for actually working with the intelligences in ways other than just context windows. Context windows kind of just work, and they're very cheap to manipulate, etcetera, and this is how we're getting some of the customization. But I think it's a bit more of a developing science, how you more deeply adjust the models, how you have continual learning maybe, or how you fine-tune in a certain area, how you get better in a certain area, like, how you actually touch the weights, not just the context windows. And it's a lot more tricky, I would say, to touch the weights than just the context windows, because you're actually fundamentally changing the full model and potentially its intelligence. And so maybe it's just not a fully developed science, if that makes sense, of speciation. And it also has to be cheap enough Yeah. For that speciation to be worthwhile Yeah. In these given contexts. Can I ask a question about an extension to AutoResearch that you described, in terms of opening it up? You said, okay, well, you know, we have this thing. We need more collaboration surface around it, essentially, for people to contribute to research overall. Can you talk about that? Yeah. So we talked about AutoResearch as a single thread of, like, I'm gonna try stuff in a loop. Mhmm. But fundamentally, the parallelization of this is the interesting component. And I guess I was trying to play around with a few ideas, but I don't have something that I'm super happy with just yet, but it's something I'm working on on the side when I'm not working on my claw.
So, I think one issue is, if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple auto-researchers talking through a common system or something like that. What I was more interested in is how you can have an untrusted pool of workers out there on the Internet. So, for example, in AutoResearch, you're just trying to find the piece of code that trains a model to a very low validation loss.

34:01

If anyone gives you a candidate commit, it's very easy to verify that that commit is good. Someone could claim from the Internet that this piece of code will optimize much better and give you much better performance. You could just check. Very easy. Well, probably a lot of work goes into that checking, but fundamentally, they could lie, etcetera. So you're basically dealing with a similar kind of thing. Actually, my designs that incorporate an untrusted pool of workers look a little bit like a blockchain, because instead of blocks, you have commits, and these commits can build on each other, and they contain changes to the code as you're improving it. And the proof of work is basically doing tons of experimentation to find the commits that work. And that's hard. And then the reward is just being on the leaderboard right now. There's no monetary reward whatsoever. I don't wanna push the analogy too far, but it fundamentally has this issue where a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good, because you can just train a single model. You know, someone had to try 10,000 ideas, but you just have to check that the thing they produced actually works, because the 9,999 others didn't. You know? And so, basically, long story short, you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification, and the whole thing is kinda asynchronous and works and so on. And it's safe from a security perspective, because if anyone sends you arbitrary code and you're gonna run it, that is very sketchy and dodgy. But fundamentally, it should be totally possible. So you're familiar with projects like SETI@home and Folding@home. Some of these problems have a similar kind of setup.
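The trusted-verifier side of that commit scheme could be sketched like this. Everything here is a hypothetical stub: the expensive training run is faked with a hash so the shape of the protocol (workers claim, verifier re-checks, leaderboard only accepts verified improvements) is visible.

```python
import hashlib

def train_and_eval(code: str) -> float:
    """Stub for the expensive-but-trusted step: run one training job with
    this code and return validation loss. (Hypothetical; faked here
    deterministically from the code's hash, in [0, 1).)"""
    h = int(hashlib.sha256(code.encode()).hexdigest(), 16)
    return (h % 10_000) / 10_000

class Leaderboard:
    """Trusted verifier: anyone may submit a commit claiming a loss;
    only commits whose claim survives re-evaluation are accepted."""
    def __init__(self, baseline_code: str):
        self.best_code = baseline_code
        self.best_loss = train_and_eval(baseline_code)

    def submit(self, code: str, claimed_loss: float) -> bool:
        actual = train_and_eval(code)  # one run: cheap relative to the search
        if abs(actual - claimed_loss) > 1e-9:
            return False               # worker lied about the result: reject
        if actual >= self.best_loss:
            return False               # honest but no improvement: reject
        self.best_code, self.best_loss = code, actual
        return True
```

Note the asymmetry the conversation keeps returning to: an untrusted worker may have burned through thousands of failed experiments to find this commit, but the verifier only pays for one re-run.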
So, in Folding@home, you're folding a protein, and it's very hard to find a configuration that is low energy. But if someone finds a configuration that they evaluate to be low energy, that's perfect. You can just use it. You can easily verify it. So, a lot of things have this property: very expensive to come up with, but very cheap to verify. And in all those cases, things like Folding@home, or SETI@home, or AutoResearch-at-home will be good fits. And so, long story short,

36:01

a swarm of agents on the Internet could collaborate to improve LLMs, and could potentially even run circles around the frontier labs. Like, who knows? That's even possible. Frontier labs have a huge amount of trusted compute, but the Earth is much bigger and has a huge amount of untrusted compute. And if you put systems in place that deal with this, then maybe it is possible that the swarm out there could come up with better solutions, and people contribute cycles to a thing that they care about. And so, sorry, the last thought is, lots of companies or whatnot could maybe have their own things that they care about, and if you have compute capacity, you could contribute to different kinds of AutoResearch tracks. Like, maybe you care about cancer of a certain type, or something like that. You don't have to just donate money to an institution. You could actually purchase compute, and then you could join the AutoResearch forum for that project. If everything is rebundled into auto-researchers, then compute becomes the thing that you're contributing to the pool. Yeah. That's very inspiring, and it's also interesting. Like, I don't know how far this goes, but it is interesting that at least some audience of people, you know, here in Silicon Valley, or lining up at retail stores in China, have discovered that having access to personal compute is interesting again. Yeah. Right? So maybe they're really motivated to do that for their claws, and then they can contribute to AutoResearch. It's almost like, dollars are the thing everyone cares about now, but is the FLOP the thing that everyone actually cares about in the future? Like, is there gonna be a flippening, almost, of what's the thing that you care about? Like, right now, for example, it's really hard to get compute even if you have money. Yeah. So, actually, it almost seems like the FLOP is dominant in a certain sense. Yeah. So maybe it's kinda like that.
Like, how many FLOPs do you control, instead of what wealth do you control? I don't actually think that's true, but it's kind of interesting to think about. The last thing you released was a little bit of jobs data analysis. Yeah. Is that right?

38:00

It might have touched a nerve, even though you're just visualizing some public data. What were you curious about? Yeah, I guess I was curious I mean, everyone really is thinking about the impacts of AI on the job market and what it's gonna look like. So, I was just interested to take a look: what does the job market look like? Where are the different roles? And how many people are in different professions? And I was really just interested to look through the individual cases and try to think myself about, you know, with these AIs and how they're likely to evolve, are these gonna be tools that people are using? Are these gonna be displacing tools for these professions? And what are the current professions, and how are they gonna change? Are they gonna grow or adjust to a large extent? Or what could be new professions? So, it's really just a way to fuel my own chain of thought about the industry, I suppose. And so, yeah, the jobs data is basically just from the Bureau of Labor Statistics. They actually have a percentage outlook for each profession, about how much it's expected to grow over the next, I think, almost decade. Yeah. I think it's a decade, but it was made in 2024. Okay. We need a lot of healthcare workers. Yeah. So they've already made those projections, and I'm not sure actually 100% what the methodology was that they put into their projections. I guess I was interested to color things by this: people think that what's primarily being developed now is this kind of more digital AI, almost like these ghosts or spirit entities that can interact in the digital world and manipulate a lot of digital information, and they currently don't really have a physical embodiment or presence. And the physical stuff is probably gonna go slightly slower, because you're manipulating atoms.
So flipping bits, and the ability to copy-paste digital information, makes everything a million times faster than accelerating matter. You know? So energetically, I just think we're gonna see a huge amount of activity in digital space, a huge amount of rewriting, a boiling soup of activity. And I think we're gonna see something in the digital space that goes at the speed of light compared to what's gonna happen in the physical world, to some extent; that would be the extrapolation.

40:00

And so I think there's currently kind of an overhang, where there can be a lot of unhobbling, potentially, of a lot of digital information processing that used to be done by computers and people. And now, with AI as a third kind of manipulator of digital information, there's gonna be a lot of refactoring in those disciplines. But the physical world is actually gonna be behind that by some amount of time, I think. And so what's really fascinating to me, and that's why I was highlighting the professions that fundamentally manipulate digital information, this is work you could do from your home, etcetera. Because I feel like, in those, things will change. And it doesn't mean that there's gonna be fewer of those jobs or more of those jobs, because that has to do with demand elasticity and many other factors, but things will change in these professions because of these new tools and because of this upgrade to the nervous system of the human superorganism, if you wanna think about it that way. Given the look you had at the data, do you have any observations or guidance for people facing the job market, or thinking about what to study now, or what skills to develop? I mean, we can all go get I'm very thankful that I have to, like, meet people for my job right now. Yeah. I'm getting more physical. Yeah. Could you do your work from home, though? I could. I think there are relationship parts of it that are hard, but most of it I could. Yeah. I think it's really hard to tell, because, again, the job market is extremely diverse, and I think the answers will probably vary. But to a large extent, these tools are extremely new, extremely powerful, and so just trying to keep up with it is the first thing. And, yeah, because I think a lot of people kinda dismiss it or Or they're afraid of it.
Or they're afraid of it, etcetera, which is totally understandable, of course. Yeah. I think it's fundamentally an empowering tool at the moment. And these jobs are bundles of tasks, and some of those tasks can go a lot faster. And so people should think of it as primarily the tool that it is right now. And I think the long-term future of that is uncertain. Yeah. It's really hard to forecast, to be honest. And I'm not professionally doing that, really. I think it's the job of economists to do properly. You are an engineer, though. And one thing I thought was interesting is that the demand for engineering jobs

42:07

is continuing to increase. Yeah. I can't tell if that's a temporary phenomenon. I'm not sure how I feel about it yet. Do you know? Yeah. That's the demand elasticity, almost. Like, software was scarce. Right? And the reason we don't have more demand for software is just scarcity: it's too expensive. Too expensive. Yeah. So, if the barrier comes down, then actually you have the Jevons paradox, which is, you know, the demand for software actually goes up. It's cheaper, and there's more of it. More powerful. Yeah. The classical example of this is always the ATMs and the bank tellers, because there was a lot of fear that ATMs and computers, basically, would displace tellers. But what happened is they made the cost of operating a bank branch much cheaper, so there were more bank branches, so there were more tellers. It's the canonical example people cite. But, basically, it's just the Jevons paradox. Something becomes cheaper, so there's a lot of unlocked demand for it. So I do have a cautiously optimistic view of this in software engineering, where it does seem to me like the demand for software will be extremely large, and it's just become a lot cheaper. And so I do think that for quite some time, and it's very hard to forecast, but it does seem to me like right now, at least locally, there's gonna be more demand for software. Because software is amazing. It's, you know, digital information processing. You're not forced to use arbitrary tools that are given to you that are imperfect in various ways. You're not forced to subscribe to what exists. Code is now ephemeral, and it can change, and it can be modified. And so, I think there's gonna be a lot of activity in the digital space to rewire everything in a certain sense, and I think it's gonna create a lot of demand for this kind of stuff.
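The Jevons-paradox arithmetic being gestured at here can be made concrete with a toy constant-elasticity demand curve. The numbers and the functional form are purely illustrative, not anything from the episode or from BLS data:

```python
def total_spend(price: float, elasticity: float, scale: float = 100.0) -> float:
    """Constant-elasticity demand: quantity = scale * price**(-elasticity).
    Total spend on software = price * quantity."""
    quantity = scale * price ** (-elasticity)
    return price * quantity

# Inelastic demand (elasticity < 1): cheaper software -> lower total spend,
# and plausibly fewer engineers.
# Elastic demand (elasticity > 1): cheaper software -> HIGHER total spend,
# the Jevons regime: the price drop unlocks so much new demand that the
# market, and demand for engineers, grows. The ATM/teller story is the
# same claim with branches in place of software.
```

Whether software demand is actually elastic enough to sit in the second regime is exactly the empirical question the speakers say is hard to forecast.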
I think long term, yeah, obviously, even with AutoResearch, OpenAI or, you know, Anthropic or these other labs, they're employing, what, like a thousand-something researchers. Right? Mhmm. These researchers are basically, like, glorified auto-researchers, you know, they're automating themselves away, actively, and this is the thing they're all trying to do. Yeah. I think, like, I went around... Some of those researchers also feel the psychosis.

44:07

Right? Because they can see it's working. Right? And so they're like, it's over for me too. I did spend a bunch of time going around OpenAI, and I was like, you guys realize if we're successful, we're all out of jobs. Like, we're just building automation for Sam or something like that. Or the board. I'm not sure. But we're just building out this automation for, yeah, the board or the CEO or something like that, and we're all out of our jobs, and maybe contributing on the side. And so, yeah, it's kind of unnerving from that perspective. Is it okay if I ask you Noam's question? You could be doing that. Right? Auto-researching with a lot of compute scale and a bunch of colleagues at one of the frontier labs. Like, why not? Well, I was there for a while. Right? And I did re-enter. So to some extent, I agree, and I think that there are many ways to slice this question. It's a very loaded question, a little bit. I will say that I feel very good about what people can contribute and their impact outside of the frontier labs, obviously. Not just in the industry, but also in more ecosystem-level roles. So your role, for example, is more ecosystem-level. My role currently is also more ecosystem-level, and I feel very good about the impact that people can have in those kinds of roles. Conversely, I think there are definitely problems, in my mind, with basically aligning yourself way too much with the frontier labs, too. Fundamentally, I mean, you have a huge financial incentive with these frontier labs, and by your own admission, the AIs are going to really change humanity and society in very dramatic ways, and here you are basically building the technology and benefiting from it, and being very allied to it through financial means. This was a conundrum that was at the heart of, you know, how OpenAI was started in the beginning. This was the conundrum that we were trying to solve. Mhmm.
And so, you know It's still not resolved. The conundrum is still not fully resolved. So that's number one. You're not a completely free agent, and you can't actually be part of that conversation in a fully autonomous, free way if you're inside one of the frontier labs. There are certain things that you can't say, and conversely, there are certain things that the organization wants you to say. And, you know, they're not gonna twist your arm, but you feel the pressure of what you should be saying.

46:11

You know? Right. Because, like, obviously. Otherwise, it's really awkward conversations, strange side-eyes, like, what are you doing? You know? So you can't really be an independent agent, and I feel a bit more aligned with humanity, in a certain sense, outside of a frontier lab, because I'm not subject to those pressures, right, and I can say whatever I want. I would say in the frontier labs, you can have impact there as well, of course. But there are many researchers, and maybe you're one of them, maybe your ideas are really good, etcetera. And maybe there's a lot of decision-making to do, and you want to be in a position where you are in the room for those conversations when they come up. I do think that currently the stakes are overall fairly low, and so everything is kinda nice. But ultimately, at the end of the day, when the stakes are really high, etcetera, if you're an employee at an organization, I don't actually know how much sway you're going to have on your organization and what it's going to do. Fundamentally, at the end of the day, you're not really in charge. You're in a room, and you're contributing ideas, but you're not really in charge of that entity that you're a part of. So, those are some sources of misalignment, I think, to some extent. I will say that, in one way, I do agree a lot with that sentiment, that I do feel like the labs, for better or worse, are opaque, and a lot of work is there. And they're kind of at the edge of capability in what's possible, and they're working on what's coming down the line. And I think if you're outside of that frontier lab, your judgment fundamentally will start to drift, because you're not part of, you know, what's coming down the line. Right. And so I feel like my judgment will inevitably start to drift as well. And I won't actually have an understanding of how these systems actually work under the hood. It's an opaque system.
I won't have a good understanding of how it's going to develop, etcetera. And so I do think that, in that sense, I agree, and it's something I'm nervous about. I think it's worth basically being in touch with what's actually happening and actually being in the frontier lab. And if some of the frontier labs would have me come for, you know, some amount of time, and do really good work for them, and then maybe coming back out Guys, he's looking for a job. This is super exciting. Yeah. Then I think that's maybe a good setup, because I kinda feel like,

48:07

you know, maybe that's one way to actually be connected to what's actually happening, but also not feel like you're necessarily fully controlled by those entities. So, I think, honestly, in my mind, Noam can probably do extremely good work at OpenAI, but also I think his most impactful work could very well be outside of OpenAI. Noam, that's a call to be an independent researcher with AutoResearch. Yeah. There's many things to do on the outside, and I think, ultimately, the ideal solution maybe is going back and forth, yeah. And I think fundamentally, he can have a really amazing impact in both places. So it's very complicated. I don't know. It's a very loaded question, a little bit, but, I mean, I joined a frontier lab, and I'm outside, and then maybe in the future I'll want to join again, and I think that's kind of how I look at it. One question related to what visibility the world or the AI ecosystem has into the frontier is how close open source is to the frontier, and how sustainable that is. I think it is quite surprising, the entire sequence of events, actually, from having a handful of Chinese models and global models, and I think people are going to continue releasing here in the near term, that are closer to the frontier than much of the industry anticipated from a capability perspective. I don't know if you're surprised by that, but you're a long-term contributor to open source. What's your prediction here? Yeah. So roughly speaking, basically, yeah, the closed models are ahead, but people are monitoring the number of months that the open-source models are behind. And to start with, there was nothing, and then it went to eighteen months. Yeah. There's been convergence. Right? So then maybe they're behind by, what is the latest, maybe, like, eight months, six months, eight months kind of thing right now? Yeah. I'm a huge fan of open source, obviously.
So for example, in operating systems, you have closed source, like, you know, Windows and macOS. These are large software projects, kind of like what LLMs are gonna become. And there's Linux. And Linux is, actually, an extremely successful project. It runs on the vast majority of computers. Like, last time I checked, was it, like, 60% or something that run Linux?

50:07

And that's because there is a need in the industry to have a common, open platform that everyone feels safe using. I would say the industry has always felt a demand for that kind of a project to exist. Mhmm. And I think the same is true now, and that's why businesses actually want there's demand for this kind of a thing to exist. The big difference is that everything is capital. It's very expensive; a lot of capital goes into this. So I think that's where things fall apart a little bit, and it's a bit harder to compete in some sense. But I do think that the current models are very good. The other thing that I think is really interesting is that for the vast majority of consumer use cases and things like that, even open-source models are actually quite good, I would say. And if you go forward a few more years, it does seem to me like a huge amount of simple use cases are gonna be well covered, and actually even run locally. But there's always gonna be some demand for frontier intelligence, and that can actually be an extremely large piece of the pie. But it could be that the need for frontier intelligence is gonna be, like, you know, Nobel Prize kind of work, or, like, let's move Linux from C to Rust. There's gonna be bigger projects, you know, scoped in that kind of a way, and maybe that's where a lot of the frontier closed intelligences are gonna be interacting, and open source is kinda gonna eat through a lot of the more basic use cases, or something like that. You know, at some point probably later this year, what's frontier today, in terms of what I'm using right now from the closed labs, might be open source, and that's gonna be doing a lot of work. So I kind of expect that this dynamic will basically continue.
Like, we'll have frontier labs that have closed AIs that are kind of like these oracles, and then we'll have open source kind of like behind by some amount of months, and I kind of expect that to continue. And I actually think that's a pretty good setup overall, because I'm a little bit hesitant of having... I think there's some systemic risk attached to just having intelligences that are closed, and that's it. And

52:02

I think, you know, centralization has a very poor track record in the past, in my view. You mean, like, in political or economic systems in general? Yes, exactly. I think there's a lot of pretty bad precedent. So I want there to be a thing that is maybe not at the edge of capability, because that's new and unexplored, etcetera, but I want there to be a thing that's behind and that is kind of like a common working space for intelligences that the entire industry has access to. Yeah. That seems to me like a pretty decent power balance for the industry. Yeah. I also think there are just many problems to solve, right? If you keep advancing intelligence at the frontier, we can do new things, and there are a lot of very big problems for humanity, right? And it seems that that will continue to be a very expensive game. So I want to root for labs that are doing that, because there are problems we cannot solve without continuing to advance the models in a very expensive way. And yet, as you point out, if what we have today as frontier is open, that's a lot of capability, right? And so the power of that, or the democratization of that, seems very useful and also healthy. Yeah. I think basically by accident, we're actually in an okay spot. Not optimal. Yeah. Yeah. But by accident, we happen to be in a good spot in a certain sense. Well, and to some degree, the longer this dynamic endures, the healthier of a spot the ecosystem might be in, right? Because you have more and more area under the curve. And I will say that even on the closed side, I almost feel like it's been even further centralizing recently, because I think a lot of the front runners are not necessarily, like, the top tier. And so, yeah, in that sense, I think it's not super ideal.
I would love there to be more frontier labs, because, yeah, I'm, like, by default very suspicious of... I want there to be more people in the room. I think, like, in machine learning, ensembles always outperform any individual model, and so I want there to be ensembles of people thinking about all the hardest problems, and I want the ensembles of people in the room to be well informed when they make all those decisions, you know? So I don't want it to be, like, behind closed doors with two people or three people. I feel like that's not a good feature. I almost wish there were more labs, is the long story short, and I do think that open source has

54:12

a place to play. I hope it sticks around. And basically it's currently slightly behind, and that's actually kind of a good thing. Okay. You worked on the precursor to generalized robotics, autonomy, in cars, right? A lot has happened in the last couple months with robotics companies as well, like acceleration of really impressive generalization across environments and tasks, increasingly long horizon tasks, lots of money going into the space. Like, is it gonna happen? Has anything in your view changed recently? So my view is kind of informed by what I saw in self driving, and I do feel like self driving is the first robotics application. At the time, like, ten years ago, there were a large number of startups, and I kinda feel like most of them basically didn't make it long term. And what I saw is that a lot of capital expenditure had to go in, and a lot of time. So I think robotics, because it's so difficult and so messy and requires a huge amount of capital investment and a lot of conviction, it's just a big problem, and I think atoms are really hard. So I kind of feel like it will lag behind what's gonna happen in digital space. And in digital space, there's gonna be a huge amount of unhobbling, basically, like, things that weren't super efficient becoming a lot more efficient by, like, a factor of a 100, because bits are so much easier. So I think currently, in terms of what's gonna change and where the activity is, I kinda feel like digital space is going to change a huge amount, and then the physical space will lag behind. And what I find very interesting is the interface in between them as well, because if we do have more agents acting on behalf of humans, and more agents talking to each other, and doing tasks, and participating in the economy of agents, etcetera, you're going to run out of things that you can do purely in a digital space.
At some point, you have to go to the universe and ask it questions. You have to run an experiment and see what the universe tells you back, to learn something. And so we currently have a huge amount of digital work because there's an overhang in how much we've collectively thought about what is already digital. We just didn't have enough thinking cycles among the humans to think about all the information that's already digital and already uploaded.

56:18

And so we're gonna start running out of stuff that is actually already uploaded. At some point, you're gonna have read all the papers and processed them and have some ideas about what to try. But I don't actually know how much you can get intelligence that's fully closed off, with just the information that's been fed through it, you know? And so I think what's gonna happen is, first, there's gonna be a huge amount of unhobbling, and there's a huge amount of work there. Then it's going to move to the interfaces between physical and digital. That's sensors, for seeing the world, and actuators, for doing something to the world. Mhmm. So I think a lot of interesting companies will actually come from that interface: can we feed the superintelligence, in a certain sense, data, and can we actually take data out and manipulate the physical world per its bidding, if you wanna anthropomorphize the whole thing, right? And then in the physical world, I almost feel like the total addressable market, etcetera, in terms of the amount of work and so on, is massive, possibly even much larger than what can happen in the digital space. So I actually think it's a much bigger opportunity as well. But it is a huge amount of work, and in my mind, the atoms are just, like, a million times harder. So it will lag behind, but it's also, I think, a bit of a bigger market. So the opportunities kind of follow that kind of trajectory. Right now, the digital is my main interest, and then interfaces would be after that, and then maybe some of the physical things. Like, their time will come, and they'll be huge when they do come.
Well, it's an interesting framework for it too, because certain things, not the things I'm working on right now, but certain things are much easier even in the world of atoms. Mhmm. Right? Like, if you just think about read and write to the physical world: for read, there's sensors, cameras, a lot of existing hardware. And you can imagine

58:01

enriching agent capabilities or capturing a lot of new data if you're just clever about it, and you don't necessarily have to invest a lot to get something valuable. Yeah. So examples of this that I saw: a friend of mine, Liam, is the CEO of Periodic. I visited them last week, so it's just top of mind. They're trying to do autoresearch for material science. And in that case, the sensors for the intelligence are actually pretty expensive lab equipment. And the same is true in biology. I think a lot of people are very interested in engineering biology, and, you know, the sensors there will be more than just video cameras, if that makes sense. And then the other thing I saw, for example, is companies where you basically pay people for training data. Yeah. As an example. Yeah, programmatically. Yeah. To feed the Borg. And so these are all examples of sensors in a certain sense. They take many diverse shapes and forms, if that makes sense. Mhmm. Yeah. So I'm looking forward to the point where I can ask for a task in the physical world and put a price on it. Just tell the agent, like, you know, you figure out how to do it. Yeah. Go get the data. I'm actually kinda surprised we don't have more, like, information markets. Mhmm. Like, for example, Polymarket or other betting markets, or even stocks, etcetera, have so much autonomous activity, and a rising amount of activity. Mhmm. So, for example, if something in Iran was happening right now, how come there isn't a process where taking a photo or video from somewhere in Tehran costs, like, $10? Like, someone should be able to pay for that, you know? And that's an example of feeding the intelligence. There's not gonna be a human looking at it. It's gonna be, like, agents who are trying to guess at the betting games and stock markets and so on.
So I kinda feel like the agentic web is still fairly new, and there are no mechanisms for this yet, but this is an example of what I think might happen. There's a good book that maybe is inspiring, called Daemon. Mhmm. You've potentially read it. In Daemon, the intelligence ends up, like, puppeteering humanity a little bit, in some sense, you know? And so humans are kind of like its actuators, but humans are also like its sensors. And so I think, collectively, society will kinda reshape in a certain way to serve that kind of a

1:00:01

thing that will kind of end up happening collectively across the industry, where, yeah, there's just a lot more automation, and it has certain needs, and humans will kind of be serving those needs of that machine, not necessarily each other. But we were on this very specific point of missing pieces of training data. We needed something like autoresearch, right? Like, we need the training cycle, or the SFT piece, to be far more mechanized. For what part? In order to take the human out of the loop, so I can ask for a task that has just, like, improved my model quality with new data, right? Yes. Does that make sense to you? Like, if you can't have the model do the training runs by itself, then your ability to do this as a closed loop task by pricing data is more challenged. Yes, yes, 100%. Yeah. The thing is, for LLM training, it actually very easily fits the paradigm. Mhmm. Like, LLM training fits the paradigm really well, really easily: all the optimization of all the code, so it runs faster, and then you also have metrics that you can optimize against. I do think that if you had an autonomous loop over those metrics, there's gonna be a lot of Goodharting going on, where the system will overfit to those metrics. But then you can use the system to devise more metrics, so you just have really good coverage. So it's kinda hard to tell, but in a certain sense, it's a pretty good fit. I wanna talk about a little tiny side project you have before we end. Tell me about microGPT. Oh, yeah. Okay. So microGPT. I have this running obsession, of maybe a decade or two, of just simplifying and boiling down LLMs to their bare essence, and I've had a number of projects along these lines, like nanoGPT and makemore and micrograd, etcetera.
So I feel like microGPT is now the state of the art of me trying to just boil it down to the essence.

1:02:01

Because the thing is, training neural nets, and LLMs specifically, is a huge amount of code, but all of that code is actually complexity from efficiency. It's just there because you need it to go fast. If you don't need it to go fast and you just care about the algorithm, then that algorithm actually is, like, 200 lines of Python. Very simple to read, and this includes comments and everything. Because you just have your dataset, which is text, and you need your neural network architecture, which is, like, 50 lines. You need to do your forward pass, and then you have to do your backward pass to calculate the gradients. And so an autograd engine to calculate the gradients is, like, a 100 lines, and then you need an optimizer, Adam, for example, which is a very state of the art optimizer, and it's, like, again, 10 lines, really. And so putting everything together in the training loop is, like, yeah, 200 lines. And what's interesting to me is, normally, before, like, maybe a year ago or more, if I had come up with microGPT, I would be tempted to explain it to people. Like, I'd have a video stepping through it or something like that. And I actually tried to make that video a little bit, and I tried to make, like, a little guide to it and so on. Mhmm. But I kinda realized that this is not really adding too much, because it's already so simple, at 200 lines, that anyone could ask their agent to explain it in various ways. And agents, like, I'm not explaining it to people anymore. I'm explaining it to agents. If you can explain it to agents, then agents can be the router, and they can actually target it to the human in their language, with infinite, you know, patience, at their capability, and so on. Right. If I don't understand this particular function, I ask the agent to explain it to me, like, three different ways, and I'm not gonna get that from you. Exactly.
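The breakdown Karpathy describes (forward pass, backward pass via a small autograd engine, an Adam optimizer, a short training loop) can be sketched in miniature. The following is a hedged illustration in the spirit of micrograd, not the actual microGPT code; all names and the toy task are made up for this example.

```python
# A minimal sketch in the spirit of micrograd / microGPT, NOT the actual
# microGPT code: a ~30-line scalar autograd engine, a hand-rolled Adam
# update, and a toy training loop. All names here are illustrative.

import math

class Value:
    """A scalar that tracks gradients through +, *, and tanh."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # the Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

def adam_step(params, state, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over a list of Values; `state` holds (m, v) moments."""
    for p, mv in zip(params, state):
        mv[0] = b1 * mv[0] + (1 - b1) * p.grad          # first moment
        mv[1] = b2 * mv[1] + (1 - b2) * p.grad ** 2     # second moment
        m_hat = mv[0] / (1 - b1 ** t)                   # bias correction
        v_hat = mv[1] / (1 - b2 ** t)
        p.data -= lr * m_hat / (math.sqrt(v_hat) + eps)
        p.grad = 0.0                                    # zero for next step

# Toy "training loop": fit one weight so tanh(w * 0.5) matches 0.7.
w, state = Value(0.0), [[0.0, 0.0]]
for t in range(1, 401):
    err = (w * 0.5).tanh() + (-0.7)   # prediction minus target
    loss = err * err                  # squared error
    loss.backward()
    adam_step([w], state, t)

final = (w * 0.5).tanh().data
print(round(final, 2))  # should land close to 0.7
```

A real 200-line version would add a tokenized text dataset, a transformer forward pass, and array-valued tensors, but the structure (forward, backward, optimizer step, loop) is the same.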
And so I kind of feel like, you know, what is education? It used to be guides, it used to be lectures, it used to be this thing, but now I feel like I'm more explaining things to agents, and maybe I'm coming up with skills, where, basically, a skill is just a way to instruct the agent in how to teach the thing. So maybe I could have a skill for microGPT with the progression I imagine the agent should take you through if you're interested in understanding the code base. And it's just, like, hints to the model: oh, first start off with this, and then with that. And so I could just script the curriculum a little bit as a skill.

1:04:03

So I feel like there's gonna be less of explaining things directly to people, and it's gonna be more of just, like, does the agent get it? And if the agent gets it, it'll do the explanation. And we're not fully there yet, because I still think I can probably explain things a little bit better than the agents, but the models are improving so rapidly that I feel like it's a losing battle to some extent. And so I think education is gonna be reshuffled by this quite substantially, where it's almost a little bit the end of teaching each other things. Like, if I have a library of code, for example, it used to be that you'd have documentation for the other people who are gonna use your library, but you shouldn't do that anymore. Instead of HTML documents for humans, you should have markdown documents for agents. Because if agents get it, then they can just explain all the different parts of it. So it's this redirection through agents, you know? And I think we're gonna see a lot more of that playing out. Well, we'll see if the great teachers, you know, develop intuition for how to explain things to the agents differently. Ultimately, so, for example, microGPT: I tried to get an agent to write microGPT. Mhmm. I told it, like, try to boil down neural network training to the simplest thing, and it can't do it. Like, microGPT is the end of my obsession. It's the 200 lines. I thought about this for a long time. I've tested this for a long time. This is the solution. Trust me. It can't get simpler, and this is my value add. Everything else, like, the agent gets it. It just can't come up with it, but it totally gets it and understands why it's done in a certain way, etcetera.
So, like, my contribution is kinda these few bits, but everything else, in terms of the education that goes on after that, is not my domain anymore. So maybe, yeah, education kinda changes in those ways, where you have to infuse the few bits that you feel strongly about: the curriculum, or the better way of explaining it, or something like that. The things that agents can't do is your job now.

1:06:00

The things that agents can do, they can probably do better than you, or very soon will. And so you should be strategic about what you're actually spending time on. Well, appreciate the few bits. Thank you, Andrej. Okay. Find us on Twitter @NoPriorsPod. Subscribe to our YouTube channel if you wanna see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.
