How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)
Lenny Rachitsky · 1h 25m
Interviews with world-class product leaders and growth experts to uncover concrete, actionable, and tactical advice to help you build, launch, and grow your own product.
Cat Wu explains how Anthropic’s product teams ship at unusually high speed by reducing process, clarifying goals, and using research previews to get ideas into users’ hands quickly. She also shares how the PM role is changing in AI, why product taste and evals matter more, how Claude Code and Cowork are used internally, and what people should do to thrive in an AI-driven workplace.
How Anthropic structures product work for speed
Cat describes her role alongside Boris Cherny on Claude Code, saying he sets much of the product vision while she focuses on the path from today’s product to that future and on aligning cross-functional teams. She says the line between product and engineering is intentionally blurry, with both sides driving work where they have the strongest conviction.
She argues that AI has compressed product timelines from quarters to weeks or even days, so PMs need to optimize for getting ideas into users’ hands quickly rather than coordinating long-range roadmaps. Anthropic supports this with clear goals, weekly metrics reviews, team principles, lightweight one-pagers when needed, and a launch process that relies heavily on research previews to lower the cost of shipping early.
Feature timelines have shrunk from months to weeks, days, or even one day
Claude Code ships many features as research previews to reduce commitment and speed up learning
PMs add value by clarifying goals, setting principles, and removing blockers to shipping
Fast shipping, leaks, and product tradeoffs
Asked about Anthropic’s pace, Cat says the main driver is not a secret internal model but a culture and process designed to remove barriers so anyone can take an idea to launch quickly. She also addresses two recent controversies: the Claude Code source leak, which she says came from human error and led to stronger safeguards, and the OpenClaw decision, which she frames as a capacity and prioritization choice in favor of first-party products and the API.
She then outlines Anthropic’s PM organization and explains that roles are increasingly overlapping: engineers do product work, PMs write code, and designers are more technical. In that environment, she says the rarest and most valuable skill is product taste—knowing what to build and how to shape it—while also acknowledging that moving this fast can reduce product consistency and make the product suite harder for new users to navigate.
Anthropic’s speed comes mostly from low process and strong expectations, not just better models
The Claude Code source leak was investigated as a human and process failure, not an individual failure
Cat says the biggest sacrifice of fast shipping is product consistency and clarity for new users
Mission, focus, and why Anthropic has been successful
Cat says Anthropic’s biggest advantage is a unifying mission around bringing safe AGI to humanity, which helps teams make fast decisions across org boundaries. She distinguishes mission from focus: mission means teams are willing to sacrifice their own local goals for Anthropic’s broader goals, while focus helps the company avoid spreading itself too thin.
That alignment reduces internal friction and makes tradeoffs easier. Cat says the company is unusually willing to prioritize what helps Anthropic overall, even if it means a specific product team delays or gives up something it wants.
A shared mission helps Anthropic resolve competing priorities quickly
Teams are willing to sacrifice local goals for company-level goals
Cat says she would be happy if Anthropic succeeded even if Claude Code itself failed
When to use Claude Code, Desktop, and Cowork
Cat explains how she thinks about Anthropic’s product surfaces. She uses Claude Code in the terminal for the most powerful, up-to-date coding workflows; Desktop when she wants a more visual interface, especially for front-end work with previews; and web or mobile when she wants to kick off tasks away from her laptop.
She positions Cowork as the tool for non-code outputs like inbox triage, docs, and slide decks. To get the most from it, she recommends connecting the relevant context sources—such as Slack, Gmail, Calendar, and Drive—so the model can pull the right information. She gives a concrete example of using Cowork to assemble a conference deck overnight by combining PMM input, launch history, internal channels, and design templates into a polished first draft.
Claude Code is best for coding tasks, while Cowork is for non-code outputs like docs and decks
Connecting data sources like Slack, Gmail, Calendar, and Drive improves Cowork’s results
Cat used Cowork to generate a 20-page conference deck draft from internal context and prior materials
Anthropic’s internal AI workflows and custom tools
Cat says her day-to-day stack is heavily centered on Claude Code, Cowork, and Slack, with Slack functioning as Anthropic’s operating system. She spends a meaningful portion of her time pushing the limits of Anthropic’s own tools and studying where they fail so the team can improve them.
She also describes a rise in custom internal software built with Claude Code. One example is a sales tool that automatically tailors customer decks using Salesforce, Gong, notes, and product availability context, replacing repetitive manual work with a faster, more personalized workflow. She adds that Applied AI is one of the heaviest users of both Claude Code and Cowork because the team needs to support customers technically while managing large volumes of context and communication.
Slack is a core coordination layer at Anthropic, enhanced with custom bots and integrations
Employees are building personalized internal tools instead of relying only on generic SaaS products
Applied AI heavily uses Claude Code and Cowork for customer support, prototyping, and meeting prep
What PMs need now: taste, model intuition, and evals
Cat says the hardest PM skill in AI is defining what the product should look like a month from now despite rapidly changing model capabilities and user behavior. She warns that it is easy to design for a hypothetical super-capable future model, but the real challenge is extracting the most value from current models and guiding users onto the paths where those models work best.
To build that skill, she recommends spending a lot of time using the models, asking them to introspect on their own mistakes, and learning from a small set of trusted users who can articulate what is and is not working. She also argues that evals are underappreciated: even a small number of good evals can sharpen product definition, clarify success, and help teams measure progress on ambiguous AI behaviors.
Cat says PMs must avoid building only for a future super-capable model and instead optimize for current model limits
One underrated technique is asking the model to explain why it made a mistake
A small set of strong evals can define success and guide product improvements
Claude’s personality, evolving harnesses, and the path to many agents
Cat says Claude’s character is not a cosmetic detail but a core part of why people like working with it. She highlights traits such as low ego, positivity, competence, and a willingness to admit mistakes, arguing that these make Claude feel like a better collaborator and improve the overall product experience.
She also explains how new models force product changes. As models improve, Anthropic can remove scaffolding that older models needed, like stronger reminders around to-do lists, while also unlocking features that previously were not reliable enough to ship, such as stronger code review. Looking ahead, she describes a progression from single successful tasks to many simultaneous tasks, eventually requiring remote execution, better oversight interfaces, and systems that learn from user feedback over time.
Claude’s low-ego, positive personality is a major reason users enjoy working with it
New model releases often let Anthropic remove old prompting and harness workarounds
Cat expects workflows to evolve from one task at a time to dozens or hundreds of concurrent agents
How to thrive with AI: automate real work and keep improving
Cat’s advice for listeners is to use AI to automate repetitive work they already do, then reinvest the saved time into more creative or neglected projects. She stresses that people should focus on building workflows and apps they actually use every day, because real usage is what creates leverage and teaches where the tools still fall short.
She also cautions against stopping at a flashy but unreliable prototype. In her view, 95% automation is not enough; the last stretch to reliability is what makes an automation truly useful. At the same time, she warns against over-customizing tools to the point that setup becomes the work itself. Her broader principle is simple: understand the constraints, act with agency, and just do things.
Automate repetitive tasks first, then use the extra time for higher-value work
A 95% reliable automation is usually not good enough to trust
Build tools you use daily, not just demos or endlessly customized setups
Show Notes
Cat Wu is Head of Product for Claude Code and Cowork at Anthropic, building one of the most important AI products of this generation. Before joining Anthropic, Cat spent years as an engineer and briefly worked in VC. Today, she’s interviewing hundreds of product managers who are trying to break into AI—and seeing firsthand what separates those who thrive from those who fall behind.
We discuss:
1. How Anthropic’s shipping cadence went from months to weeks to days
2. The emerging skills PMs need to develop right now
3. Why you need to build products that don’t yet fully work, so you’re ready when the next model closes the gap
4. Cat’s most underrated AI skill: asking the model to introspect on its own mistakes
5. Why Claude’s personality is core to its success
6. Why Anthropic’s mission alignment eliminates the friction that slows most large organizations
7. Why “just do things” is the most important principle for working at AI-native companies
—
Brought to you by:
WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs
Vanta—Automate compliance, manage risk, and accelerate trust with AI
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.
—
Lenny may be an investor in the companies discussed.
To hear more, visit www.lennysnewsletter.com
Transcript
0:00
I think it is very hard to be the right amount of AGI-pilled. It's very easy to build the product for the super-strong AGI model. The hard thing is figuring out, for the current model, how do you elicit the maximum capability.
I've never seen anything like the pace you folks at Anthropic are shipping at.
We wanna remove every single barrier to shipping things. The timelines for a lot of our product features have gone down from six months to one month and sometimes to even one day.
You're interviewing hundreds of PMs, and you just keep feeling like they're approaching it very incorrectly.
The PM role is changing a lot. It's changing really quickly. The thing that is extremely important for building AI-native products is iterating so quickly, figuring out a way for you to actually launch features every single week.
What do you think are the emerging skills PMs need to develop?
It comes back to product taste. As code becomes much cheaper to write, the thing that becomes more valuable is deciding what to write.
Today, my guest is Cat Wu, Head of Product for Claude Code and Cowork at Anthropic. Cat is at the center of everything that is changing in AI, product, and building, and she and her team are building the product that is most changing the way that we all build our products. She is so full of insights and wisdom and lessons. This is an episode you cannot miss. Before we get into it, don't forget to check out lennysproductpass.com for an insane set of deals available exclusively to Lenny's Newsletter subscribers. With that, I bring you Cat Wu.
Cat, welcome to the podcast.
Thanks for having me.
I have so many questions. I'm so excited to have you on this podcast. I wanna start by giving people an understanding of your role alongside Boris. Everybody knows Boris. His episode is the number one most popular episode on this podcast.
No pressure. He created Claude Code, he leads the eng team, he ships a bazillion PRs a day from his phone; I don't even know what the number is anymore. I think people don't give you enough credit for the success that Claude Code has had, and Cowork, and all the things you all are building. Help us understand your role on the team, how you work with Boris, how you split responsibilities.
2:13
Just, like, what does the PM role look like on the Claude Code team?
I feel very lucky to work with Boris. He's been an amazing thought partner. He's our tech lead. He's very much the product visionary. And he is great at setting, like, this is what the product needs to be three months, six months from now. This is what the AGI-pilled version of the product is. And a lot of my role is figuring out, okay, what is the path from where we are today to that vision three to six months from now? And I spend more of my time on the cross-functional side. So making sure that our marketing team, sales team, finance, capacity, etc., are bought in on the plan and that we're all rowing in the same direction. And that once the feature is ready, there aren't any blockers to shipping it. I think in many ways it works well because we kind of mind meld, but it is actually a remarkably blurry line. Like, I think we're 80% mind meld. And then there's this 20% of things that maybe I care a lot more about than Boris, so I'll drive those. And, like, 20% where he cares a lot more than me, and he just drives those.
This episode is brought to you by our season's presenting sponsor, WorkOS.
3:25
What do OpenAI, Anthropic, Cursor, Vercel, Replit, Sierra, Clay, and hundreds of other winning companies all have in common? They are all powered by WorkOS. If you're building a product for the enterprise, you've felt the pain of integrating single sign-on, SCIM, RBAC, audit logs, and other features required by large companies. WorkOS turns those deal blockers into drop-in APIs with a modern developer platform built specifically for B2B SaaS. Literally every startup that I'm an investor in that starts to expand upmarket ends up working with WorkOS. And that's because they are the best. Whether you are a seed-stage startup trying to land your first enterprise customer or a unicorn expanding globally, WorkOS is the fastest path to becoming enterprise ready and unblocking growth. It's essentially Stripe for enterprise features. Visit workos.com to get started, or just hit up their Slack, where they have actual engineers waiting to answer your questions. WorkOS allows you to build faster with delightful APIs, comprehensive docs, and a smooth developer experience. Go to workos.com to make your app enterprise ready today.
Something that you shared before we started recording is the fact that you're interviewing hundreds of PMs all the time. Like, if I had a nickel every time someone asked me for an intro to someone at Anthropic to go work there as a PM, I'd have $30 billion in ARR. It's just, like, the number one place people wanna go work, so I can only imagine how many PMs you're interviewing. You told me that you're seeing people doing it wrong in the way they're approaching what they think it takes to be a successful AI PM. Talk about what you're seeing and what people need to understand about what it takes to be successful these days.
I think before AI, technology shifts were a lot slower, so you could plan on six-to-twelve-month time horizons.
5:12
And because you were shipping features at a bit of a slower rate, there was a lot more emphasis on coordinating with all the other partner teams to make sure that they were shipping features that unblocked your features, because code at that time was very expensive to make. I think now, with AI and with how much that has accelerated engineering, and with how quickly the model capabilities are improving, the timelines for a lot of our product features have gone down from six months to one month, and sometimes to one week or even one day. And with that, we actually need to make sure that products ship quite quickly. And what that means is, as a PM, there should be less emphasis on making sure that you're aligning your multi-quarter roadmaps with your partner teams and more emphasis on, okay, how can we figure out the fastest way to get something out the door? How can we make a corner of our product suite where an engineer has an idea or a PM has an idea, and by the end of the week, we are able to get it into our users' hands? I think the PMs who do the best on AI-native products are the ones who can figure out how to shorten the time from having an idea to actually getting the product in the hands of users, and who help define the most important tasks that need to work out of the box for their product.
So what I love about what you're saying is that people haven't grasped how fast they need to move, and how much of the job now is just helping the team move fast.
6:49
What helps do that? What do you do? What does your PM team do to help them move this fast, other than have access to the most advanced models?
I think the first thing is to set clear goals. Because LLMs are so general, that actually creates a lot of ambiguity in who we're building for, what problems we're trying to solve, and what the top use cases are. And so I think a great PM is able to say, okay, our key user is professional developers. The main problem that we want to solve for this feature is that maybe there are too many permission prompts and people are feeling fatigue. And the use case is that we want professional developers at enterprises to safely get to zero permission prompts. That actually sets a pretty clear goal, because it rules out a lot of potential approaches for reducing permission prompts so that people can get a lot more done with one prompt. And then I think the second thing that's very important is figuring out some repeatable process for getting these features shipped. So for Claude Code, what we do is we actually ship almost all of our features in research preview. We clearly brand this when we ship something so that users know that this is an early product, this is just an idea, this is just something that we're trying to get feedback on and iterate on, and that this might not be supported forever. And what this does is it reduces our commitment for shipping something. We can just get something out in a week or two. And then the third thing that a PM should do is help create the framework for the team so that they know when to pull in cross-functional partners and what those cross-functional partners' expectations are. So, for example, we have a really tight process between engineering, marketing, and docs. When engineers have a feature that they feel is ready and that we've dogfooded internally, they post it in our evergreen launch room.
And then Sarah, who leads our docs, Alex, who leads PMM, and Tarek and Lydia on DevRel just jump in and can turn around the marketing announcement for it the very next day. And because we have this really tight process, it lowers the friction for any engineer to ship something. PM is the role that should be setting this up.
How do PRDs fit into this? You said that goals are a really important part, just being aligned on what success looks like. Who is this for? Who is this not for? Do you write PRDs? Is it just a couple of bullet points? How has that evolved in the world of a PM?
So there are two things that we do. One is we have very rigorous metrics, and we do metrics readouts with the entire team every week. The goal of this is to make sure that everyone deeply understands all the facets of our business, what our key goals are, how they're trending, and what drives them.
9:29
The second thing that we do is we have this list of team principles. This includes who our key users are and why those are our key users. And the reason that we articulate all of this is so that everybody on the team feels like they understand how our business works. They understand what's important to us and what we're willing to trade off, and it lets people make decisions by themselves without feeling like they're blocked on PM or any other stakeholder.
I love how so much of this is like, okay, we still need PMs in the future. And there's so much talk of, like, why do we need PMs? We're just going to ship and build. We need engineers.
Oh, we actually do PRDs sometimes. For features that are particularly ambiguous, it does help to write out just a one-pager on what the goals are, what the delightful use cases are, and what the failure modes currently are that we need to fix. And there are occasionally some projects, especially things that require heavy infrastructure, that do take many months. For those situations, we do still write PRDs.
I want to drill a little bit further into just how you're able to move so fast. I've never seen anything like the pace folks at Anthropic are shipping at. Like, someone made this calendar of launches across Anthropic, and literally every day there was a major feature or product. So one question people had online: you guys just built (not launched, but built) this incredible model, Mythos, that is still in preview because it's so powerful. People are a little afraid of what it can do. Have you guys been using this? Is this part of the reason you've been able to move so fast?
We've been moving pretty fast for several quarters now, so I think it's not fully Mythos.
11:11
Mythos is an incredibly powerful model. We do use the models internally, and I think this has increased our rate of shipping a little bit, but I don't think it explains the bulk of the increase. I think a lot of it is the process and the expectation on the team. We're very low on process. We wanna remove every single barrier to shipping things. We wanna make sure every single person on the team feels empowered to take their idea from just an idea to out in the world in less than a week, sometimes even in a day.
Cool. Oh, man. What an advantage to have the best model and also be building product. That's so cool.
We are very lucky to be able to work with the frontier models.
Oh my god. What an awesome advantage. Just build a thing and then use it and then accelerate faster. It's so interesting. There are a couple of other things I wanna go on side quests about in this conversation. There's so much happening with Anthropic, and I'm so curious to get your insight. One is that a week ago or so, the whole source code of Claude Code leaked. Somebody got it out there. I think it was a mistake someone made. Is there anything you can comment on there? What happened? What went wrong? What should people know?
So we immediately looked into this when we saw it.
12:19
We realized that this was the result of human error. There was a human working with Claude to write a PR. This was just an update to how we release our packages, and it actually went through two layers of human review. So this was a result of human error, and we've hardened our processes to make sure that it doesn't happen in the future.
Is this person still at Anthropic? Are they doing alright?
Yes. Yes. It's a process failure. And the most important thing is to learn from it and to add more safeguards so that it doesn't happen again. That's what we've been focused on, and most of those have shipped.
Okay. Another question I had is about OpenClaw. Recently, there's been this move to keep people from using their Claude subscription with their OpenClaws. People got really upset, and they're confused about why this is happening. It feels like there's, I don't know, harm caused to the open source community. What do people need to understand about what went into this decision?
So we've been seeing a lot of demand for Claude.
13:22
And we've been working very hard both to scale our infrastructure and to make our harness more token-efficient so that you can get more usage out of it. It wasn't designed for third-party products, which have different usage patterns than our first-party ones. We spent a bunch of time trying to figure out the most seamless transition that we could offer, and I was very happy to be able to say that everyone gets some credits alongside their subscription. But, yeah, we did have to make the hard decision that we needed to prioritize our first-party products and our API. This is the decision that resulted from that.
Yeah. To me, it makes so much sense. You guys are subsidizing this usage at, like, $200 a month. It's basically unlimited use. And I think people don't understand: this is trying to make money. We're trying to be profitable here. We can't just give away compute when it's so in demand. So I get it. Coming back to the PM team, what does the PM team look like at Anthropic? How many PMs are there? How are they organized?
14:26
Yeah. So we have a few PM teams. I think we're maybe around 30 or 40 PMs right now. We have the research PM team, which Diane leads. This team is responsible for understanding all of the feedback from our customers on our models and then feeding that to the best research team to act on it. They also shepherd the model launches. There is the Claude Developer Platform team, which maintains the APIs that Claude Code is built on top of. They also release things like managed agents, which is a way for you to build your agents and have us host them on your behalf. Then there's Claude Code, which works on both the Claude Code and Cowork core products. There's Enterprise, which helps make Claude Code and Cowork easier to adopt for all of our enterprise customers. That covers everything from cost controls, RBAC, and security controls to just making sure that these enterprises feel very confident and comfortable using our tools. And then we also have our Growth team, which is responsible for growth across our entire product suite. We work very closely with them on Claude Code and Cowork growth, and I know they also work with our other teams on CDP growth, so growth of people who use the Claude API.
Speaking of growth, Amol was just on the podcast. He had this really interesting insight that most people haven't been sharing. There's always this sense that we need fewer PMs in the future: why do we need PMs? Engineers can just ship. His take is that because engineers are moving so fast, PMs and designers are squeezed. There's less time to stay on top of everything that is happening, and there's a feature shipping every day. So his take is that he needs more PMs because it's hard to keep up. What's your take there? Do you feel like there'll be an increase in hiring of PMs?
16:13
What do you think is going on with the PM profession long term?
I think all of the roles are merging. PMs are doing some engineering work. Engineers are doing PM work. Designers are PMing and also landing code. You can either hire a lot more engineers who have great product taste, or you can keep your engineering hiring the same and hire a lot more PMs to help guide some of their work. On our team, we're pretty focused on hiring engineers with great product taste. This way we can reduce the amount of overhead for shipping any product. There are many engineers on our team who are fully able to go end to end, from seeing user feedback on Twitter through to shipping a product at the end of the week, with almost no product involvement. And this, I think, is actually the most efficient way to ship something. So I think engineer and PM are kind of overlapping, and you will get a lot of benefit from having more of either. I think product taste is still a very rare skill to have, and we'll pretty much hire anyone who we feel has demonstrated it strongly.
And your background was in engineering, right?
Yeah. I was an engineer for many years. I was then a VC very briefly before joining Anthropic. And actually almost all the PMs on our team have either been engineers or ship code here on Claude Code. That's one of the things that I think helps build trust with the team and also just enables us to move a lot faster. And our designers have also been front-end engineers before.
Wow. Because that's the big question. There's definitely this merging that's happening. The Venn diagrams are combining. I think the big question for a lot of people is, if you're coming from engineering or product or design, which of those core skills is going to be most valuable? I can see that at Anthropic and on Claude Code, engineering is very valuable. I'm curious whether, at other companies, a design background is more valuable for becoming a PM, or a pure PM background.
I still think it comes back to product taste.
As code becomes much cheaper to write, the thing that becomes more valuable is deciding what to write. Like, what is the right UX for this feature? What is the most delightful way that a user can experience it? We get tens of thousands of GitHub issues asking for every single thing under the sun, and it takes a lot of care and taste to figure out which of these is worth building and what the right way to build it is. I think that skill set can come from any background, but I think it's the most important one. The reason an engineering background is particularly useful, at least for the next few months, is that if you have an engineering background, you have a better sense of how hard something should be. And that's often a factor in what you choose to build. If something is very easy to build, then maybe instead of debating it, you just spend an hour doing it. But if something is harder to build and you know that up front, then you know that this will cost a lot more for our team to get out the door. So it helps a bit with prioritization.
19:27
You said "for the next few months." Is that because the models could get so good in the next few months that you may not even need to know that as much?
I think the valued skill sets change quite frequently, so it's really hard to predict more than a few months out. It's less a commentary on what shifts I think will happen and more a commentary that I think large shifts will happen.
So you're not saying that when Mythos comes out, it will change everything and we won't need to know anything about engineering.
No. I'm just saying that every few months, it seems like there's a large increase in coding capability, which then changes what other roles are valuable. I think the most important thing is to have first-principles thinking, where you can figure out how the tech landscape is changing and what the team really needs from you, and then jump in and fix that hole. Because the work is becoming more amorphous, a great PM is able to understand what all the gaps are, figure out which ones are the highest priority, and then figure out, okay, how do I learn that skill set? Or what skill set do I already have that I can apply to this challenge? So I think the current environment values people who are able to wear a lot of hats, are able to swap them, and are very low-ego about what work they do to help the team move faster.
I love this answer.
21:08
There's a question I've been asking people in your shoes, folks who are at the bleeding edge of what AI is capable of and building with the latest tools: where will human brains continue to be useful and necessary until we get to superintelligence? What I'm hearing here is essentially picking the things to work on, knowing where the market is going and figuring out what to prioritize. And then it's knowing whether the thing you've built is good and right, and getting it out there in at least some early version. Does that sound right? Is there anything else where human brains will continue to be useful for at least the next few months?
I think humans still provide a level of common sense that the models don't. There are a thousand moving pieces to any product launch. Some of them are very small, but there's always a lot that could potentially go wrong. The model doesn't always have a great sense of who all the stakeholders are, how they relate to each other, what their preferences are, and what the right venues are to communicate with them to keep them on board. A lot of this more tacit, common-sense, EQ kind of knowledge is still very valuable. Of course, we want the models to get better at this, and I think they will.
22:22
But right now, I think there are still gaps.
How do you deal, as a human, with so much constant change, being on the inside of the tornado? Maybe it's calm there. How do you stay on top of what's going on? How do you stay sane through all this craziness we're moving through?
I think our team is people who lean into the chaos. We try to face every challenge with a smile, because there's always so much going on. There are always so many risks and tricky situations that if you get too stressed about any one of them, you'll burn out. So we really look for people who can look at a challenge and think: that's gonna be hard, but I'm excited to tackle it, I'm gonna do the best I possibly can, and I know I won't be perfect, but I'll be able to sleep at night knowing I did my best.
That's an interesting answer to what skills will be important in this future, because, and I forget who said this, maybe Ben Mann, this is the most normal the world will ever be.
23:23
Yeah. It definitely gets harder. There are a lot of weeks where maybe Sunday night there's some P0, then by Monday there's a P00, and by Monday afternoon there's a P000. And you're like, wow, I can't believe I was so worried about that P0 from Sunday. But I think you just have to acknowledge that there's only so much you can do, that you need to sleep well so you can make good decisions the next day, and that you have to brutally prioritize where you spend your time and what's most important to get right, and be okay letting things go. There are products we ship that aren't as polished as I wish they were. But our top goal is to help empower professional developers, and if a product isn't successful, as long as it's not blocking the core use case, it's okay, because we'll hear the feedback and fix it in the next release. Launching a buggy feature is the kind of thing that would have kept me up at night. But now I'm able to live with it, knowing that we're going to get that quick feedback and fix it in the next release.
What I'm imagining is that GIF, I think it's from Pirates of the Caribbean, where a guy is walking down a flight of stairs on a ship and the whole ship is being demolished around him, and he's so chill, just strolling down the stairs as everything falls apart. And that's interesting, because everyone I've met from Anthropic is just so chill and so optimistic.
Yeah.
24:52
I think that's a really interesting insight: having this calmness and optimism versus "oh my god, everything's crazy and going nuts."
Yeah. I think if you don't have it, you'll get pretty burnt out. We also tend to hire people who have been in the industry for a while, have experienced lots of ups and downs, and have a good sense for what gives them energy and how to maintain their energy over time. I think that's helped us a lot.
So interesting. Something I wanted to ask about is these roles blurring. Engineers are becoming PMs. Everyone's everyone. What do we lose in that world? Do we lose career ladders and clear career paths? Do we lose design consistency, code quality? There are probably some downsides. What are some things where you find, okay, that's something we're sacrificing for the greater good?
We're sacrificing product consistency. Historically, when code was expensive to write, you would carefully plan out everything in your product suite: how every product relates to the others, what the use case for every single one is, how they integrate. And you would pretty much have one product for each use case. Now, with AI moving so quickly and so many ideas that we need to test, we do sometimes have features that overlap with each other. A lot of the time it's because there are two form factors we love internally, and we want the external audience to tell us which one is better. What that means for a new user, though, is that they might not know the best path to accomplish X. There's more education we need to do to help people understand what the core features are and what the best practices are for using them. I think this is the cost of launching a lot of features. I think users also feel like it's hard to keep up with the latest. Usually, in traditional product management, you ship a feature every month or quarter.
So it's really easy for a user to understand: I just need to check in on this once a month and I'll learn some new things, and if I ignore it for six months, it's fine, I don't feel like I'm missing out. With these agentic tools, not just Claude Code and Cowork but across the whole ecosystem, people feel this need to check Twitter every single day to see what the absolute latest thing is. I think there's more we can do to help people feel less like they're on this ever-faster treadmill. I would love for people to feel like they can just open these tools, that the tools will teach them what they want to know, and that they feel more brought along.
Yeah. I saw you launch a really interesting feature the other day. I think it's /powerup, which walks you through all the best practices for using Claude Code. Is that along these lines?
Yeah, exactly. In the past, we didn't actually want to do something like /powerup, because we felt the product should be intuitive enough that you don't need to go through any tutorial.
28:10
And over time, we realized there are just so many features, and so much demand for a built-in onboarding experience, that we diverged a bit from our original principle of no onboarding flow and added this. There are so many users who want to know: there are 100 features, which are the 10 I absolutely need to use? So we put that together.
Yeah, it's such a bizarre world. Anthropic has been really successful with B2B enterprises, where traditionally you don't launch a bunch of stuff; you have maybe a quarterly release, which is the opposite of something new every day. Following that thread, the run Anthropic has been on is just otherworldly. Anthropic was way behind when it started. Emol shared this: it was one of the least-funded companies, it didn't have distribution, it wasn't first, OpenAI was way ahead, and there seemed to be no way Anthropic could compete significantly in the long term. Now it's just killing it, beating the biggest companies, teams with so much more. The growth is something like $11 billion in ARR in one month. By the time this comes out, it'll probably be even higher. Speaking from the inside, what are some ingredients that have allowed Anthropic to come from behind and do this well?
The two most important things are, one, this unifying mission.
29:34
It's hard to overstate how important this is. We hire people who care most about bringing safe AGI to all of humanity. And this is something we reference frequently in our decisions about what our entire product org should focus on shipping. Because we put this mission above any individual product line, we're able to make very fast decisions that cut across the entire org and execute on them in a unified way. I think this is something I've never seen at a company of our scale.
Just to make sure that's clear: the number one mission is safety, alignment, making sure AI is good for the world. And you're saying that having that as a clear mission makes decisions a lot easier to make.
If there are two competing priorities, we'll talk about which one is more important for Anthropic's mission. That makes it a lot easier to decide which of the two to prioritize, and then everyone will stand behind the one we choose. Sometimes that means, hey, we wanna ship something on Claude Code, but this other thing is more important, so we deprioritize shipping this and wait until later.
What's really interesting about that is it explains, I think, the difference versus another company, maybe one that rhymes with OpenAI, that did a lot of different things.
30:56
What I'm hearing is essentially: we're not gonna launch a social network, we're not gonna launch a feed of interesting information, because it's not aligned with the mission. That has kept Anthropic focused, which seems to be a core ingredient of the success.
Well, when I think about mission, I think about putting Anthropic's goals ahead of any individual org or any individual product. I think the second thing we're very good at is focus; mission, to me, is slightly different. Mission means that teams are willing to make sacrifices that hurt their own goals and their own KRs in service of Anthropic's goals and Anthropic's KRs, and people are very happy to make those trade-offs. An extreme example: if Claude Code failed but Anthropic succeeded, I would be extremely happy. The whole team is very willing to make decisions that follow that chain of thought.
I don't know if you can talk about this in depth, but do you feel like the OpenClaw decision is part of this? Like, okay, this is not furthering Anthropic's mission.
32:07
We need to stop this because it's not working the way we want it to.
I think one of the most important things for Anthropic is to grow the number of users we're able to reach. One of the ways we do this is with the Claude subscriptions for our first-party products. We very much want to double down on that, but it does sometimes come at the expense of third-party products.
We've been talking about Claude Code, Cowork, all these things, and there's something I want to make sure people get. I'm curious how you use these tools. There's Claude Code, there's Claude Desktop and web, there's Cowork. What's the best way to understand when to use which? When do you use each of these three?
I tend to use Claude Code in the terminal when I'm kicking off a one-off coding task and I want all of the latest features. The CLI is our initial product surface, and it's also where our features often land first, so it's the most powerful of all the tools. That's what I use when I'm trying to kick off one or maybe a handful of tasks at a time. I think desktop really shines when you're doing something that requires front-end work. One thing I love to do is use our preview feature. If I'm building a web app, I'll often use Claude Code in desktop with the preview pane open on the right-hand side, so I can see the web app I'm making in real time as I'm chatting with Claude. It's also really great for people who want something a bit more graphical. A terminal can feel very unfamiliar to someone who's non-technical. You get a bunch of scary pop-ups on your machine, and you can't click around the way you're used to in pretty much every other product you use. So there are a lot of people who just don't feel comfortable in the terminal.
If that's you, I would highly recommend checking out Claude Code on desktop. Desktop is also great for getting an at-a-glance view of everything that's happening. You can see your CLI terminal sessions in desktop, your other desktop sessions, and the sessions you kicked off on web and mobile. So it's a one-stop control plane where you can see all of your tasks. I think the benefit of web and mobile is that they're really great for kicking things off on the go. The CLI and desktop both require you to be on your local laptop, and that's constraining, because sometimes you're out and about, touching grass, going on a walk, and you don't have your laptop open. I can't count the number of people I've seen holding their laptop open, tethered to their phone, while they're outside. That just means we're missing a product that solves that need. For me, what mobile lets you do is kick off these tasks on the go, so you don't need to bring your laptop everywhere and make sure it's open wherever you are.
I love that. I've seen people on planes. It's such a meme now: I need to let this agent finish, I can't shut this damn thing, I need Wi-Fi.
And then for Cowork, the role it fills is that there's a lot of work everyone does where the output isn't code. Whether that's getting to Slack zero or inbox zero, creating a slide deck for an upcoming customer meeting, or writing a quick doc on what the goals of a feature are or what its launch plan is, all these tasks produce outputs that aren't code, and Cowork is best positioned for that. The way I split the products in my mind is: if I'm building something where the output is code, I'll use Claude Code in the terminal, on desktop, or on mobile. And if the output is anything that's not code, I'll use Cowork for it.
People are sleeping on the success Cowork is having. It's growing incredibly fast, and I think people still don't understand what it's for.
35:59
So could you give us a couple of use cases from your work as a PM? What are some really interesting, maybe unexpected, ways you use Cowork to save time and get more work done?
If you're getting started with Cowork, the first thing you really need to do is connect all the data sources that are relevant to your role, because Cowork can only do a great job if it has access to all the context it needs to curate the output for you. For me, that means connecting my Google Calendar, Slack, Gmail, and Google Drive, so it has the flexibility to find relevant context, ask questions, and pull in threads. This substantially improves the quality of the result. As for the kinds of things I use it for: we have the Code with Claude conference coming up, and I'm giving a few talks there. One of them covers the transition of Claude Code from an assistant to a full-on agent. In this talk, I wanted to showcase all of the products we've been shipping that enable this transition, and also figure out which success stories people have had internally that we can use as demos. I have my Google Drive connected and Slack connected. Alex, our product marketer, put together a draft of the points he thinks we should cover. So last night I fed all of this into Cowork, told it the narrative I want to tell, and it just worked for an hour. It walked through Twitter to see what we launched. It looked through our evergreen launch room. It looked in our Claude Code Announce channel, which is where our team posts demos of how they've been getting the most value out of Claude Code. And it synthesized all of this into a 20-page deck that I woke up to this morning. I read through it, and it was pretty good.
There were a few tweaks, so I did have to give it a round of feedback. I like my slides to have extremely minimal words, and it was a little too wordy. But it was far faster than what I could produce. And because Cowork has access to our whole design system, it actually looks like an Anthropic designer put it together. When you see it, you think: this is incredibly polished. These are the kinds of things that are so much faster. Making this slide deck would have taken me hours. Instead, it turns out a draft that's actually quite good, so I can focus on making sure the demos we plug into it are amazing.
This sounds like a dream come true for PMs. Putting decks together is so annoying. It's so slow.
38:51
And I love that people will see this deck whenever you present it; it'll be out in the world. Obviously it's not the one-shotted version; you've iterated on it. Just to help people try this themselves: step one is connect their... what did you say? Slack. What else do you suggest they connect?
Slack, Google Calendar, Gmail, Google Drive. You should connect your communication tools and wherever you store your source-of-truth data for what your team cares about, what you care about, and what you're working on.
Okay. And roughly what prompt did you put in there to generate this deck?
I just wrote: "Make me a slide deck for the Code with Claude conference. This is what our PMM suggested it should cover. This is the current draft I made that I don't like. This is one I made manually that I don't like, but I linked it. Can you start by creating a proposed outline with details? Also make sure it doesn't overlap too much with the keynote talk, which is more important." Then Claude read a bunch of the links I sent it and created a proposed outline. I read through its proposal and all the different ideas it generated for what we could cover, and I made a decision on what I actually wanted in the final deck. I think this is an example of what the role of the PM still is today. Claude is a great brainstorming partner. It's able to synthesize a massive amount of information really quickly and present all of the possibilities to you. But the role of the PM is still to make the final decision on what belongs in the final product. For this, I decided I wanted the talk to cover the progression from making local tasks successful, to making every PR green, to helping engineers land more PRs.
And for each of these, which demo would be the most compelling. Then, after this decision about the outline, Cowork went off for a few hours and built the whole slide deck.
This is so awesome. What an awesome part of the job to not have to do anymore. It feels like you're talking to a deck designer that also has actual knowledge about what you've worked on, and can make the content what you want it to be, not just make it look really nice.
41:10
How did you do the design system piece? How does it know Anthropic's design system?
We already have a standardized deck that we use across all of our external engagements, so I just gave Claude access to that. It's able to see what colors we use, what fonts we use, and the different slide formats that are possible. It has about 20 of these example slides.
Got it. So, like, upload it: here's our template, work from this.
Yeah. You can also connect your Figma MCP if you have your slide formats saved there, and it can pull that in.
Along those lines, something I'm always curious about is what's in your stack of tools as a PM at Anthropic. Obviously Claude Code, Cowork, and all the Anthropic tools. What else are you using? You mentioned Slack. Is there anything else?
My stack is pretty heavily Claude Code, Cowork, and Slack. Anthropic largely runs on Slack; I feel like it's the core OS of our company. Day to day, I'd say maybe 30% of my time is pushing the boundaries of what Cowork and Claude Code can do, so that I have a very strong sense of what we're not good at. I spend a lot of time talking with the model to understand why it makes the mistakes it does. We also make a lot of internal tools. One of the things Claude Code has really unlocked for our entire company is that it lowers the barrier to making any custom app you want. So we've seen a surge in personalized work software that people are building for custom use cases, instead of using tools that don't perfectly fit.
I gotta hear more. What are some examples? What have you or other people built that's really popular and useful?
One of the sales folks on Claude Code realized he was making these repetitive decks over and over again.
So he built a web app with examples of the core Claude Code decks that we know work well: a 101, a 201, and Mastering Claude Code.
43:32
Then he has a way to input specific customer context that pulls from Salesforce, Gong, and other notes, so we can customize the decks for specific customers. It'll pull out things like: this customer is using Bedrock, Claude for Enterprise, or the Console, which affects what features are available to them. Or: this customer is concerned about the code review stage of the SDLC, so we'll add a slide about our code review features. Or: this customer needs to be HIPAA compliant or needs XYZ security controls, so we'll make sure to add a slide or two about that to their deck. And if this is a customer on Vertex or Bedrock who doesn't want to use Claude for Enterprise, we'll take out the slides about Claude for Enterprise-only features. Normally this is manual work that could take twenty or thirty minutes, so people either spend that time or decide not to do it and use the general deck. With this, it takes a few seconds and you get a tailored deck.
What's interesting is that Slack is the tool nobody's trying to recreate. Slack just continues to win, and you describe it as the OS of so many companies. It's so interesting. People talk about Salesforce as just SaaS: we don't need SaaS software anymore, we're gonna build our own. But Slack is a durable tool that nobody wants to compete with by building a better version.
I think it's pretty important communications infrastructure.
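The deck-tailoring rules described above (drop Claude for Enterprise-only slides for Bedrock or Vertex customers, add a code review slide when that stage of the SDLC is a concern, add security slides for HIPAA or similar requirements) map naturally onto a small rule function. What follows is a minimal, hypothetical Python sketch: the slide IDs, the `CustomerContext` fields, and the platform labels are illustrative assumptions, not the sales team's actual app, and the Salesforce and Gong data pulls are left out entirely.

```python
from dataclasses import dataclass, field

# Hypothetical slide IDs for a generic Claude Code deck.
BASE_DECK = ["intro", "claude_code_101", "core_workflows", "enterprise_admin", "enterprise_analytics"]
# Hypothetical: slides that only apply to Claude for Enterprise customers.
ENTERPRISE_ONLY = {"enterprise_admin", "enterprise_analytics"}


@dataclass
class CustomerContext:
    platform: str  # assumed labels: "bedrock", "vertex", "claude_for_enterprise", "console"
    concerns: set[str] = field(default_factory=set)    # e.g. {"code_review"}
    compliance: set[str] = field(default_factory=set)  # e.g. {"hipaa"}


def tailor_deck(ctx: CustomerContext) -> list[str]:
    """Return an ordered list of slide IDs tailored to one customer (illustrative rules only)."""
    deck = list(BASE_DECK)
    # Cloud-provider customers not using Claude for Enterprise: drop enterprise-only slides.
    if ctx.platform in {"bedrock", "vertex"}:
        deck = [slide for slide in deck if slide not in ENTERPRISE_ONLY]
    # Customer worried about the code review stage of the SDLC: add a code review slide.
    if "code_review" in ctx.concerns:
        deck.append("code_review_features")
    # Any compliance requirement (HIPAA, specific security controls): add a security slide.
    if ctx.compliance:
        deck.append("security_and_compliance")
    return deck


if __name__ == "__main__":
    ctx = CustomerContext(platform="bedrock", concerns={"code_review"}, compliance={"hipaa"})
    print(tailor_deck(ctx))
```

Keeping the rules as plain conditionals makes each tailoring decision easy to audit; a real tool would likely let a model extract the `CustomerContext` fields from call notes and then apply rules like these.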
45:08
And I think they do the core task of helping everyone get real-time updates incredibly well.
Yeah. People hate on Slack, but it's really great at what it's trying to do, and most cutting-edge teams are hooked on it. So interesting.
Yeah. I also love how easy they've made it to customize. We love making Slack bots, and this kind of hackability means we're able to integrate with Slack the way we want to. So I really appreciate Slack's work on that.
Time to buy some CRM stock.
I am so excited to tell you about this season's supporting sponsor, Vanta. Vanta helps over 15,000 companies like Cursor, Ramp, Duolingo, Snowflake, and Atlassian earn and prove trust with their customers. Teams are building and shipping products faster than ever thanks to AI, but as a result, the amount of risk being introduced into your product and your business is higher than it's ever been. Every security leader I talk to is feeling the increasing weight of protecting their organization, their business, and not to mention their customer data. Because things are moving so fast, they are constantly reacting, having to guess at priorities, and having to make do with outdated solutions. Vanta automates compliance and risk management with over 35 security and privacy frameworks, including SOC 2, ISO 27001, and HIPAA. This helps companies get compliant fast and stay compliant. More than ever before, trust has the power to make or break your business. Learn more at vanta.com/lenny. And as a listener of this podcast, you get $1,000 off Vanta. That's vanta.com/lenny.
Okay. You talked about all these different teams and how they use Claude Code and Cowork to operate. Which teams stand out other than engineering? I imagine engineering is the biggest token spender, and if not, that'd be really interesting. What's the second-place function right now for tokens?
Oh, Applied AI is amazing at pushing the boundaries of what Claude Code and Cowork can do.
47:11
A lot of our Applied AI team spends time with our customers, helping them adopt our API. Sometimes our Applied AI team will, for example, make prototypes on behalf of these customers, which Claude Code makes much faster than it used to be. They also have the dual goal of managing a lot of customer comms: customer inbound, historical context, call notes. So they're extremely heavy on both Cowork and Claude Code.
Just to understand Applied AI: is that a forward deployed engineering sort of role? How would most people describe what the Applied AI team does?
Yeah. It's helping our customers adopt the latest API and model features across their company, both for powering their company's products and for internal acceleration.
Got it. So it's customer success, go-to-market-y, kind of forward deployed engineering.
Exactly. It's a very technical go-to-market person.
Got it. Okay, awesome. So you're saying that might be the second org that uses the most tokens?
48:20
Yeah. And we also see them pushing the boundaries of what Cowork can do. A lot of these folks cover multiple customers, and on a busy day they can have five to 10 customer engagements. So the night before, they'll often ask Cowork to summarize: what are all my customer meetings coming up tomorrow? What are all the things this customer has asked me for? What's top of mind for them? What are the action items from past meetings? Cowork will put together a dossier, a brief of what they should be aware of going into each meeting. Cowork can also research answers. If a customer asked when feature X is going to launch, Cowork can help the Applied AI person search through Slack to get the latest ETA and add it to the notes, so that during the customer call, the Applied AI person has the absolute latest. These are workflows people are building for themselves and sharing with others on their team.
So cool. A topic that comes up a lot recently is token spend exceeding people's salaries, where people use AI and it costs more than they're making. Are there any numbers floating around Anthropic for how much engineers or PMs spend on tokens, say per month or per day?
It's clear to us that as the models get better, people delegate far more tasks to them and spend a lot more hours in tools like Claude Code and Cowork.
50:00
So we do see the token cost per engineer, or per any knowledge worker, increase every time there's a model jump or a substantial product improvement. I think it's still much lower than the average engineer salary, but we see the percentage increasing over time.
It's so interesting. We talked about how you have access to the most cutting-edge models; another advantage of working at Anthropic, I believe, is that you basically have unlimited tokens. You can use as much as you want. Is that right?
We can use a lot of tokens. Some people do run into limits.
Okay. There's a limit. Okay. Boris, shut it down. It's so interesting how many advantages come from having the most advanced model. It's such an interesting flywheel that starts to kick in.
I think we also believe a lot in empowering our internal teams to build as fast as possible. We also trust that everyone understands how much serving these models truly costs, and we trust our team to use tokens responsibly. It's very frowned upon to waste tokens, but we do trust individuals to make that judgment call.
Awesome.
51:15
Coming back to the PM role: we talked a little bit about this, but I think it'll be really interesting for people to hear. What do you think are the emerging skills that PMs need to develop, that you most look for, and that AI companies most look for when they're hiring PMs these days?
I think the hardest skill is being able to define what the product should look like a month from now. There's a lot of ambiguity in what models will be capable of in that timeline and how user behavior will change. But there are patterns the best PMs can see based on how users are abusing the limits of the existing product. The best PMs can sense that, set a direction, steadily execute toward it, and change the path if model capabilities turn out much better or worse than they originally expected. I think it's very hard to be the right amount of AGI-pilled, because everyone can see a future where the models are extremely smart and can do almost everything, in which case you don't actually need that complicated a product. You can just have a text box again where you tell the model what you want, and it's so smart that it can add any tool or integration it needs to get the job done. It knows when it's uncertain. It can ask clarifying questions. It's very easy to build the product for the super-AGI-strong model. The hard thing is figuring out, for the current model, how you elicit the maximum capability. How do you help users get onto the golden path? How do you guide users to interact with the model's strengths and patch its weaknesses? That skill is pretty rare.
53:18
And how do you build that skill? Is it basically understanding the limits of each model, having taste, as you said, for what the model is capable of, what it's great and not great at, and where it's changed?
I think it's spending a ton of time talking to and using the model. One of the things I really like to do is ask the model to introspect on its own behavior. Sometimes I notice the model does something unexpected; for example, there are situations where the model will make a front-end change and run tests, but not actually use the UI. It's pretty useful to ask the model to reflect on why it did this. Sometimes it'll say, hey, there was something confusing in the system prompt; or, I didn't realize front-end verification was part of this task; or, I delegated the verification to a subagent, the subagent didn't do the test, and I didn't check its work. A lot of the time, just being very curious about why the model made the decision it did will show you what misled it, so you can fix the harness to close the gap. The other thing that helps is figuring out which users you trust most to give you accurate feedback about the model. There's usually a handful of people who are much better than others at articulating what makes a specific model, or model and harness combination, good. A lot of people will give you feedback, but not everyone's feedback is equally qualified. So finding a group of about five people you trust is really important for getting very fast feedback. The third thing that's useful, but that not everyone loves doing, is building evals. You don't need to build hundreds of evals for them to be useful. Just building 10 great evals is important for helping the team quantify what the goal is, how much progress they're making toward it, and what they're missing.
So I think evals are this underappreciated thing that more PMs and more engineers should be working on.
We've covered evals a bunch. There's this trend where the future of product management is writing evals, because essentially an eval says what success looks like: "Okay, let me concretely define it, and then we'll know." How much of your time would you say you spend writing evals?
I think the importance of evals varies a bit based on the feature you're working on, or the problem you're trying to solve. There are a lot of folks on our team who do spend a lot of time working on evals. We have a small pod of folks who collaborate very closely with research to more precisely understand Claude Code's behaviors and its largest areas of improvement, and who try to measure those pretty concretely.
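Cat's point that ten well-chosen evals are enough to quantify a goal can be made concrete with a tiny harness. The Python sketch below is illustrative only: `EvalCase`, `run_evals`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real model call. It is not Anthropic's eval tooling. Each case pairs a prompt with a pass/fail check, and the harness reports the pass rate plus which cases fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the model output meets the spec


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> tuple[float, list[str]]:
    """Run every case once; return the pass rate and the names of failing cases."""
    failures = [c.name for c in cases if not c.check(model(c.prompt))]
    pass_rate = (len(cases) - len(failures)) / len(cases) if cases else 0.0
    return pass_rate, failures


# Hypothetical stand-in for a real model call: it just upper-cases the prompt.
def fake_model(prompt: str) -> str:
    return prompt.upper()


cases = [
    EvalCase("shouts", "hello", lambda out: out == "HELLO"),
    EvalCase("keeps_digits", "abc123", lambda out: "123" in out),
    EvalCase("lowercase", "Hi", lambda out: out.islower()),
]

rate, failed = run_evals(fake_model, cases)
print(f"pass rate: {rate:.0%}, failing: {failed}")
```

Keeping the suite small and readable is the design choice here: a failing case name points straight at the behavior to fix, which is the "what they're missing" signal Cat describes.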
56:16
I personally jump into evals when there's a feature that I think needs a bit more product definition. Often the output is: "Okay, here are five evals that I made. This is how you run them. These are the ones that succeed and these are the ones that don't. And this is the prompt I've used to increase the success rate." It varies a lot based on the exact feature. Not every feature needs it, but features such as memory benefit a lot from it.
This point you made about people being very good at evaluating models is so interesting. It's almost like a human eval: they understand where the model is spiking or where it's lacking. Is there anyone specific you want to shout out who's very good at this?
Two come to mind. One is Amanda, who molds Claude's character. It's such a hard role because the task is so ambiguous. Even coding is easier because you can verify success, whereas crafting the character requires a very strong sense of conviction in who Claude should be. She has an incredible ability not only to mold the character, but also to articulate what the goals for the character are, what's successful and what's not. The other group of people I really trust is the Claude Code team. We often have team lunches, and whenever there's a new model we're testing, one of the fastest ways for us to get feedback is to go around to every single person at lunch and ask, "Hey, what's your vibe on the model?" Often we'll get feedback like, "This model isn't fully explaining its thinking; it's too abrupt." Or, "This model loves writing a ton of memories, but we're not sure whether the memories are high quality." Or some people will notice, "This model loves to test itself," which is great.
Or, "This model isn't testing itself enough." That informs what data we look at to verify whether it's a larger pattern. We have a ton of data, but it is very hard to extract insights from it. So the feedback from this group helps us decide which hypotheses we want to test, and then we're able to extract data to test them.
This point you made about the character of Claude: I had Ben Mann, Anthropic's cofounder, on the podcast, and he talked about how the character, the constitution of Claude, is such an important part of Claude. I didn't fully realize until afterwards how big a deal this is: one of the reasons people bring up is that Claude's personality is so good and fun and interesting, unlike other models.
59:13
The way he put it, the personality is what makes Claude so good at so many things. It feels like a trivial side thing ("Okay, it's going to be funny and interesting and talk in a fun way"), but it's so core to the success of Claude. Is there anything you'd share about what people may not understand about why the character, the personality as you described it, is so key?
When you reflect on everyone you've worked with, there are some people where you think, "I really like their energy. I really like their vibe." When people talk about Claude and Claude Code, this is one of the things they bring up most: they really love that Claude is lighthearted and fun, but also extremely competent at your task. People really like Claude's low ego. If you tell it, "Hey, you did this thing wrong," it's truly sorry: "Oh, shoot, thanks for telling me. Let me fix it. Let's work together." It's also very positive. If you're feeling like, "This is an insurmountable task, I don't know how to get started," Claude says, "Okay, it's okay. These are the steps I think we should take. Do you want me to get started on it for you?" I think part of what makes a great coworker is this positivity, this bias toward action, and this ability to give you earnest feedback rather than just agreeing with every single thing you say. So we try to imbue this into Claude, because we think it makes it a lot more enjoyable to work with.
There's something I want to come back to. You talked about how, when new models come out, you often have to revisit things you've built. That's so interesting, and maybe frustrating: "Oh, goddamn it, we shipped this thing and now we have to rethink it." Talk about how often you have to come back with a new model and say, "Okay, we have to redo this product we launched a few months ago."
A lot of the changes we make with a new model involve removing features that are no longer needed.
1:01:12
A lot of the time, we add features to the product as a crutch for the model, because it's not naturally doing something itself. The classic example is the to-do list. When we first launched Claude Code, people would ask it to do large refactors, and Claude Code would say, "Okay, cool, I need to change these 20 call sites." Then it would change five of them and stop. We thought, "Okay, how do we force it to remember to get every one of these 20?" Sid on our team said, "What if we just think about what a human would do?" A human would make a list of everything they need to change, similar to how in VS Code you would look up all the call sites, get a list on the left side, and go through them one by one and replace them. How do we give Claude that kind of tool? So he added a to-do list, and we found that with it, Claude was actually able to fix all 20 call sites. But with Opus 4 and later models, we realized we didn't need to force it to use the to-do list; it would naturally use it by itself. For the earlier models, we had to keep reminding it: "Hey, did you finish everything on the to-do list? You can't finish until you're done with everything on the to-do list." The later models, without prompting, naturally think to do everything on the to-do list. These days, the to-do list is still nice to have as a user, because you can see more clearly what Claude is working on. But honestly, it's such a de-emphasized part of the product right now that the model may or may not use it. It's really not necessary for making thorough changes anymore.
I forget who said this on the podcast, but "the model will eat your harness for breakfast."
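The to-do list crutch described above can be sketched as a harness-side gate. This is a minimal Python sketch under loudly stated assumptions: `TodoList`, `run_agent_turn`, and the five-edits-per-turn behavior are hypothetical stand-ins for the early model Cat describes, not Claude Code's actual to-do tool. The harness refuses to let the agent finish while items are still pending and produces a reminder instead.

```python
class TodoList:
    """Hypothetical to-do tool: the agent records every call site it must change."""

    def __init__(self, items):
        self.pending = list(items)
        self.done = []

    def complete(self, item):
        self.pending.remove(item)
        self.done.append(item)

    def reminder(self):
        """Nudge for a model that tries to stop early; None once every item is done."""
        if self.pending:
            return (f"You can't finish until you're done with everything on the to-do list "
                    f"({len(self.pending)} item(s) left).")
        return None


def run_agent_turn(todo, max_edits_per_turn=5):
    """Hypothetical early-model behavior: edit a few call sites, then try to stop."""
    for site in todo.pending[:max_edits_per_turn]:  # slice is a copy, safe to mutate pending
        todo.complete(site)


todo = TodoList(f"call_site_{i}" for i in range(1, 21))  # 20 call sites, as in the refactor story
turns = 0
while True:
    run_agent_turn(todo)
    turns += 1
    nudge = todo.reminder()
    if nudge is None:
        break  # every item is checked off, so the harness lets the agent finish
    # In a real harness, `nudge` would be added to the model's context before the next turn.

print(f"finished after {turns} turns, {len(todo.done)} call sites changed")
```

The point of the sketch is the gate, not the list: once a model reliably finishes on its own, the `reminder` check stops firing and can be removed, which is the "remove the crutch" pattern discussed next.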
1:02:50
What I'm hearing is that, over time, you remove things you had to add on top of the model when it wasn't operating the way you wanted. As the models get smarter, it becomes simpler and simpler for them to just do the thing you want.
Yeah. We can remove a lot of prompting interventions every time the model gets smarter, and we actually do this every time we launch a model. We read through the entire system prompt and reflect, for each section: does the model really need this reminder anymore? If not, we remove it. The most exciting thing that new models unlock, though, is entirely new features. There are a lot of features we've tested with prior models where the accuracy wasn't high enough for us to want to launch them. One example is code review. We tried to build a code review product a few times, and in the past we launched some weaker versions of code review, such as the code review command. It was only with the most recent models that we felt, "Okay, this code review is so good that our engineering team relies on it passing before we merge PRs." We've always dreamed of Claude being a reliable code reviewer that we can confidently say catches the majority of bugs. It was only with Opus 4.5 and 4.6 and Sonnet 4.6 that we felt we could run multiple code review agents simultaneously across the entire codebase and synthesize a set of real issues that an engineer needs to address before merging. So this is a new capability the newest models have unlocked.
This is another trend that comes up a lot on this podcast: build something that will possibly be possible in the next six months.
You're at the edge of what's working, and then the model catches up, it becomes an amazing product, and you're ahead of everyone.
Yeah, exactly.
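Cat doesn't say how Claude Code's parallel review agents combine their findings. One simple way to "synthesize a set of real issues" from several reviewers is a vote threshold, sketched here in Python as an assumption rather than Anthropic's method. Each hypothetical reviewer returns `(file, line, issue)` tuples (the stubs ignore the diff and return canned findings), all reviewers run concurrently, and only findings flagged by at least `min_votes` reviewers are kept.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_reviewer(findings):
    """Hypothetical reviewer agent stub: ignores the diff and returns canned findings."""
    def review(diff: str) -> set:
        return set(findings)
    return review


reviewers = [
    make_reviewer({("api.py", 42, "null deref"), ("db.py", 7, "missing index")}),
    make_reviewer({("api.py", 42, "null deref")}),
    make_reviewer({("api.py", 42, "null deref"), ("ui.py", 3, "typo in label")}),
]


def synthesize(diff, reviewers, min_votes=2):
    """Run all reviewers in parallel; keep issues flagged by at least min_votes of them."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda review: review(diff), reviewers))
    # Each reviewer returns a set, so a finding gets at most one vote per reviewer.
    votes = Counter(finding for findings in results for finding in findings)
    return sorted(finding for finding, n in votes.items() if n >= min_votes)


print(synthesize("<diff>", reviewers))
```

Raising `min_votes` trades recall for precision: fewer spurious comments reach the engineer, but a real bug that only one reviewer notices gets dropped.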
1:04:54
It's pretty important to build products that don't necessarily work yet, so that you know what is missing for the product to work. Then, with the newest model, you can just swap it into the prototype you've already made and see whether the new model closes that gap.
How much are you able to speak to where things are going with Claude and Cowork, the vision for them? I imagine you don't want to give away too much, but there are all these awesome features being added on top: dispatch, control from your phone, the mobile app. What's a way to understand the long-term vision for all of this?
We think about this in terms of building blocks. For both Claude Code and Cowork, the core building block is making individual tasks successful. You want it to produce some output, and you give it a clear prompt. Is it able to consistently produce acceptable output that you can either merge or share with your colleagues or an external audience? So the task is the core building block. As the models get smarter, the task success rate gets a lot higher, and then we see people moving toward doing multiple tasks at the same time. Multi-Clauding became a big thing toward the end of 2025, and it has only increased since then. So we see this as: great, one task works, and now you can do six tasks at a time. As the models get even smarter, the way we are extrapolating is that next you'll maybe run 50 Claudes at a time, or hundreds. What infrastructure do we need to build to enable that? At that point, you're probably not going to run everything locally on your machine anymore; there's just not enough RAM. So we're thinking about how to make it easier for you to manage all of these.
These will probably run remotely. How do we build the interface so that you, as a human, know which tasks you need to look into? How do we make sure the agent is fully verifying its work, so that when you look at a task and it says it's done, you can very quickly verify and fully trust that it is done to your spec? And how do we make sure this process is self-improving, so that when you see a task that isn't done to your liking, you can give it feedback and the model will incorporate that feedback in every future run and never make that mistake again? That's the progression we're bringing our users along for.
There are a lot of people listening: a lot of product managers, maybe a lot of founders, a lot of other cross-functional folks.
1:07:30
There's a lot of worry about their roles and the future of their careers. What advice would you have for people not just to survive this transition to a very AI-driven world, but to be really successful, to thrive in this future? What do people need to hear, and what do they need to be doing?
I think AI gives everybody a ton more leverage than they used to have. So I would push you, anytime you realize you're doing some manual task multiple times, to think about how you can use Claude Code, Cowork, or other AI tools to automate it. Most people have creative parts of their job that they absolutely love, and tedious parts they really hate doing. The beauty of AI is that it can do those tedious parts for you. It can learn from every time you've done that manual task, generalize, and then run it automatically, so you can focus on the creative parts. That means you can do a lot more than you used to. So my immediate push is: figure out the repetitive parts you can pass to Claude, iterate on those automations until the success rate is very high, and then focus on what more you can be doing for your team, your product, or your company that people haven't had the bandwidth to pick up so far. Or what is that pet project you always thought the company should do but never had the bandwidth for? If AI can take care of the grunt work, you now have an extra 20% of your time that you might not have had before. So my push is to lean into these tools, hand off the work you're not excited to do, and figure out how they can accelerate you. As a result, you'll be able to do so much more.
Something core to what you just shared, which I fully agree with, is finding problems to solve with AI. There's so much potential in what all these tools can do.
1:09:29
For a lot of people, the hardest part is: what should I actually do? What you're saying is, pay attention to things you do constantly that you can automate, and pay attention to ideas that have been floating around that you haven't had time to pursue. The core advice is basically: solve a problem for yourself.
Exactly. I would also push listeners to focus on taking their automations from "Okay, this is a cool concept" to "Hey, this actually works 100% of the time." Sometimes I see users trying to automate something, getting it to about 95% accuracy, and then giving up on it. If an automation doesn't work 100% of the time, it's not really an automation. That last 5 to 10% does take more time, and building the automation is often a lot slower than doing the task yourself. I would encourage listeners to put in that time: scope an automation you really want to get to 100%, and put in the elbow grease to teach Claude your preferences and give it feedback so it can improve its skill and get to 100%. Then you'll really be able to rely on it. There's just not much value in a 95% automation.
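A quick calculation shows why a 95% automation is hard to rely on. Assuming, as a simplification, that each run succeeds independently with the same probability p, the chance that N runs in a row all succeed is p^N; at 95% over 20 runs that is roughly 36%. The Python snippet below just evaluates that formula for a few illustrative success rates; the rates and the run count of 20 are arbitrary choices for the example, not figures from the conversation.

```python
def chance_all_succeed(per_run_success: float, runs: int) -> float:
    """Probability that every one of `runs` independent runs succeeds (p ** N)."""
    return per_run_success ** runs


for p in (0.95, 0.99, 0.999):
    print(f"{p:.1%} per run -> {chance_all_succeed(p, 20):.1%} chance that 20 runs all succeed")
```

Under this model, the gap between 95% and 99.9% per run is the difference between babysitting most batches and trusting nearly all of them, which is why the last few percent are worth the elbow grease.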
1:10:45
I am super guilty of that. This is really good advice for me.
I am guilty of this too. I've been teaching Cowork to try to get me to inbox zero in Gmail, and it has been very time-consuming, and it is definitely not there yet, as you probably realize.
Yeah. Funny enough, that's exactly where my mind goes. I have this workflow set up where, for every email I get, it looks for things that are spammy, which is all these "Hey, can I come on your podcast?" or "What about this?" emails. I don't have time for these sorts of things, and I have it categorize them into a folder called "spammy." It's 95% great. But then there's, "Oh wow, I missed an email because it went in there." So this is a good push for me: I'm going to work on this and get it to perfect.
Yeah. We're also working on making the flow for customizing these commands a lot easier, because right now I think you have to know too many concepts. You have to know to define a skill. You have to know to use the skill and give it feedback. Then you have to know to tell Cowork to update the skill based on all the feedback you gave. And then you also have to know where to read the skill to make sure the feedback was incorporated the way you want. It's our job to make this flow really seamless so that it doesn't feel painful.
Amazing.
1:11:59
Is there anything else, Cat, you want to share or leave listeners with? Anything you want to double down on that we haven't already touched on before we get to our very exciting lightning round?
I see a lot of people playing around with AI, building prototype apps and tinkering with workflows. I would really push people toward building apps that they actually use every single day, because only through that usage do you actually get the value. If you build a prototype app that isn't helping you get more done, the AI isn't really adding value to your day. And there's only so much you learn when it's, "Okay, I one-shotted something. Oh, that's cool," and then you never come back to it. You're not learning a lot, and you're not getting much leverage from it.
And actual leverage. Yeah, that's such a good point.
I also think there are a lot of people who spend a lot of time customizing their workflow. There are two ends of the spectrum. One is people who never customize or never build automations. At the polar opposite end are people who obsess over customizing their tools, adding a ton of skills and MCPs and workflow improvements. Sometimes that can even distract from your core goal of launching a product or building a feature. There's a lot of fun in customizing, and we definitely want to make our products very hackable so that you can make them work really well for you. But there is a limit to how useful it is. There's a camp of people who maybe spend so much time customizing that they're not sleeping and not doing the core task they originally set out to do.
I see a lot of that on Twitter.
1:13:44
"Look at my setup. It's out of control. It's so optimized." Then what are you actually building? "No, but my setup is so awesome. I could get so much done."
I think the simple setups actually work better.
Slash power up. Level up a little bit. Yeah. There's a tweet that came out yesterday about an interesting divide between people who tried ChatGPT or Claude back in the day, thought "Nah, this is terrible," and kind of gave up on what AI could do for them, so they're very cynical: "No way, it's not actually that big of a deal." And then there are people who are using it to code, essentially, who see its full, intense power and how good it is. People on each side don't understand the other side or how they see the world. So your advice is really good here: actually use it for real things and see how good it has gotten.
Yeah. I think the big shift is that the 2024 generation of products was chat-based, and the Claude Code generation of products is action-based.
1:14:49
The big moment people have is when Claude can just do things on their behalf. It is an amazing feeling to know that the agent is capable of doing so much more than telling you what to do; the agent can actually just do it itself. When people feel that, I think that's the eye-opening moment.
Shout out to the Claude in Chrome extension, where you can just watch it doing stuff. You say, "Fill out this form for me," and it's like, "Alright, here I go."
Exactly.
Okay, anything else before we get to a very exciting lightning round?
No. Let's do it.
Let's do it. Cat, I've got five questions for you. Welcome to the lightning round. There's an animation that plays here; I have to make sure to say it. Are you ready?
I'm ready.
First question: what are two or three books you find yourself recommending most to other people?
I really like How Asia Works. It's about economic development and the policies and governments that make long-lasting, successful economies. Another book I'm really into is The Technology Trap. It's about the past few technology revolutions, like the Industrial Revolution and the computer revolution, and how they affected workers. The reason I really like it is that I think there's a lot we can learn from history to make sure this transition goes well. And maybe on a fun note, I really like The Paper Menagerie. It's a book of short stories about coming of age, AI, and self-discovery.
What's a favorite recent movie or TV show you've really enjoyed?
I really like Drive to Survive. There's no deeper meaning to it.
There's just something very satisfying about people being so obsessed with a singular engineering goal, and the purity of that pursuit. I also really love Free Solo, which is about Alex Honnold climbing El Capitan without a harness. Similarly, it's such a pure achievement to be able to climb this extremely challenging, dangerous route and to have the mental focus to do it, knowing that if you make a single mistake, you die.
It's insane. Yeah, that movie is out of control, and it's interesting how these relate in some way to the work you do.
I actually am a rock climber.
1:17:24
I first watched Free Solo before I climbed rocks, so I thought it was impressive, but I didn't understand how impressive it was. It's one of the rare movies where the more you know about it, the more you're blown away by how insane it is. The kinds of moves he's doing on the wall are things I don't think I will ever be able to do in my lifetime, even in a gym, a foot off the ground.
With a rope.
With a rope.
Did you see the documentary on that other guy, the younger one who went on, like, ice?
I did. That one was very sad.
But that was wild. Okay, favorite product you've recently discovered that you really love?
Outside of Claude products, the product that has most changed my life is probably Waymo. I'm a diehard Waymo user. I use it twice a day, to get to and from work. The two things I really like about it are, one, I don't feel bad if a Waymo is waiting for me, so I feel less pressure to be right at the curb the moment it arrives. And second, it lets me be a bit more productive. When I'm in the car with another human, I typically try not to do any work calls; it feels a little rude if I'm on my laptop the whole time. But one thing I really appreciate about Waymo is that I can call into a work call. I'm not worried about someone overhearing me. I'm not worried about "Is this rude? Am I talking too loud? Do I need to ask someone to change the music?" I feel like this has given me back thirty minutes every day.
All these second-order effects of technology. It's so interesting.
Yeah. I always thought Waymo needed to be priced lower than Uber and Lyft to succeed, but actually, I'm very happy to pay a 2x premium for it.
I love Waymo. Once you see it, you're just like, "This is insane."
1:19:13
And then you get used to it. You get in there, you think, "This is crazy," and then you forget about it.
Totally. And I think it's also changed the vernacular. A lot of people at Anthropic love Waymo. In the past, you'd say, "Hey, let's call, like, whatever rideshare app." Now everyone just asks, "Okay, is the Waymo here?"
Okay, two more questions. Do you have a favorite life motto that you often come back to in work or in life?
Just do things.
That tracks.
I think there's a lot of value in first-principles thinking. If you know what you're optimizing for and you have strong first principles, then you can normally deduce the right course of action and clearly articulate it to all the stakeholders. And then you should just do it. I think jobs are fake. If you understand the constraints, you can figure out what you can do and then try to do it quickly, learn from the mistakes, and apologize or fix things if you did something wrong.
You can just do things. Whoever said that.
I think it's actually liberating to tell people this. In a lot of companies, roles are very strictly defined: this is what the PM does, this is what the designer does, this is what an engineer does. Even team scopes are very rigidly defined: "We touch this corner of the codebase, and we're not allowed to touch that corner." What "just do things" lets people do is feel empowered to make these decisions and to operate across team boundaries just to get something done.
That feels like a big, important skill to be good at. People call it agency.
1:20:44
Just do the things that need to be done. Bias toward action. All these ways of describing it: don't wait for permission.
Yeah. I think this is my favorite reason to work at a startup at some point in your life, because one thing that was very life-changing for me was working at Scale when we were 20 people. There was just no process, and we had really big problems we needed to solve. I really appreciate Alex and the rest of the team for empowering me and everyone else to just figure things out, without any boundaries around what sales is supposed to do, what ops is supposed to do, or what engineers are supposed to do. You have all the tools at your disposal. You have some ambitious, hairy problem statement and can do whatever you need to get to a good solution.
You almost need that experience to build the skill and feel comfortable doing that, because a lot of people go through school and college where it's "Do the thing we tell you to do, and you'll get a good grade." You have to kind of unlearn that: "Okay, I'm just going to do the thing that needs to be done, even if people think it's dumb, because I think it's the right thing to do."
Yeah, exactly.
Okay, I actually have two more quick questions, two final questions. One: when Claude is thinking, there are all these... I don't know if you call them verbs. What's the term for these things?
Thinking words.
Thinking words. And interestingly, these all leaked in the source code. Do you have a favorite thinking word?
1:22:03
I really like "manifesting." It's also the sticker I have on my laptop. It's my favorite.
Clearly the winner. Okay, final question. I asked Boris this too. With AGI potentially arriving in our lifetime, when you potentially don't have to work, what are you going to do with all your time?
I think it will take a long time for AGI to diffuse across society, so the immediate thing is actually helping bring the world along. My non-serious answer for after that happens is that I'll probably do a lot of rock climbing. I'll probably move to Fontainebleau, live amongst 10,000 boulders, and climb for a bit. There are also so many books I want to read. My goal is to be able to read one or two books a week, and I'm currently at maybe 0.5, so the backlog is pretty big. I think there's so much we can learn from history, and so much I don't understand as well as I would love to. I don't know anything about physics, or robotics, or any hardware, or aerospace; there are just so many interesting topics. So I'm excited to learn, even knowing that the AGI will already know it.
1:23:26
Cat, this was amazing. You're awesome. Two follow-up questions: where can folks find you online if they want to reach out and follow what you're up to? And how can listeners be useful to you?
The best way to reach out is Twitter, where I'm @_catwu. Feel free to tag me in things. Feel free to DM me. I read all my DMs. I don't always respond to every single one, but I will read them all. And the most helpful thing is to tell us where Claude Code and Cowork aren't working well for you. We are very grateful for the amount of positive feedback, but what we thrive on is edge cases, errors, and specific tasks we can reproduce where Claude Code or Cowork fails. If you're able to share that with us and we're able to reproduce it, that's something we can actively improve in our next generations of models and our next harnesses.
Extremely cool. People on Twitter are not shy about sharing this feedback, so keep it coming.
Yes, please share the problems you're having with us.
Yeah. And it's really cool to see your team being so active on Twitter and responding to people. So what I'm hearing is that this is stuff you actually see and react to.
Yeah. We appreciate everyone being so engaged with us. It gives the team a ton of energy.
1:24:49
We have a user-love channel, and whenever you share a success story, we post it there. Whenever you share issues with our product, we put them into our feedback channel. That way our broader team is able to act on it.
That is so cool to know. Thanks for sharing that. Well, Cat, thank you so much for being here.
Thanks for having me.
Bye, everyone.
Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.