<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>Field Notes</title>
  <link>https://www.popsikle.net/blog/</link>
  <description>Essays, notes, links, and research in progress from NERV.</description>
  <language>en-us</language>
  <atom:link href="https://www.popsikle.net/blog/feed.xml" rel="self" type="application/rss+xml" />
  <item>
  <title>Faster drafts, denser days: the burnout risk on AI-heavy teams</title>
  <link>https://www.popsikle.net/blog/faster-drafts-denser-days-ai-burnout-risk/</link>
  <guid isPermaLink="true">https://www.popsikle.net/blog/faster-drafts-denser-days-ai-burnout-risk/</guid>
  <pubDate>Fri, 17 Apr 2026 12:00:00 GMT</pubDate>
  <description>AI can reduce drafting time without making the workday lighter. On AI-heavy teams, more of the day shifts into review, supervision, and judgment, which may require more recovery time, not less.</description>
  
  <category>ai</category>
  <category>software engineering</category>
  <category>work</category>
  <category>management</category>
  <category>burnout</category>
  <content:encoded><![CDATA[<p>AI coding tools are usually sold as time savers. In one narrow sense, that is
true. GitHub’s early Copilot study reported a 55.8% speedup on a bounded task,
Google measured roughly a 21% reduction in time-on-task in an enterprise randomized controlled trial (RCT),
and a multi-company field paper found a 26.08% increase in completed tasks
(<a href="https://arxiv.org/abs/2302.06590" target="_blank" rel="noreferrer">Peng et al.</a>,
<a href="https://arxiv.org/abs/2410.12944" target="_blank" rel="noreferrer">Paradis et al.</a>,
<a href="https://economics.mit.edu/sites/default/files/inline-files/draft_copilot_experiments.pdf" target="_blank" rel="noreferrer">Cui et al.</a>).</p>
<p>But “faster” is not the same thing as “lighter.” The better question is whether
AI changes the weight and texture of the workday. My view is that it often does.
AI removes some drafting and lookup friction, then shifts more of the
remaining day toward verification, integration, review, prompt steering, and
accountability for the merge (<a href="https://doi.org/10.1145/3706598.3713778" target="_blank" rel="noreferrer">Lee et al.</a>,
<a href="https://dora.dev/insights/balancing-ai-tensions/" target="_blank" rel="noreferrer">DORA</a>).</p>
<p>That distinction matters because reduced keystrokes do not necessarily mean
reduced strain. On AI-heavy teams, faster drafting can shift more of the day
into review, judgment, supervision, and exception handling. If that is true,
leaders may need to design for more recovery, not less.</p>
<p>Software engineering was already cognitively heavy before AI. The work has long
been dominated by understanding code, not just typing it, so “AI replaced the
boring part with the hard part” is too simple
(<a href="https://doi.org/10.1145/3546576" target="_blank" rel="noreferrer">Feitelson</a>). What AI really changes is the
texture of the leftover work. When the draft appears instantly, the human job
becomes deciding whether the thing is correct, safe, locally appropriate, and
worth shipping at all
(<a href="https://www.sciencedirect.com/science/article/pii/0005109883900468" target="_blank" rel="noreferrer">Bainbridge</a>,
<a href="https://doi.org/10.1145/3706598.3713778" target="_blank" rel="noreferrer">Lee et al.</a>).</p>
<p>Here is the loop I keep coming back to:</p>
<p><img src="https://www.popsikle.net/blog/assets/content/ai-work-intensification/work-intensification-flow.png" alt="Flow chart showing how AI-assisted coding can turn local drafting gains into denser supervisory work, heavier verification, and burnout risk if teams absorb the savings as more scope instead of more recovery."></p>
<p><em>One plausible path from faster drafting to denser work.</em></p>
<h2>The problem is the almost-right output</h2>
<p>The expensive failure mode is not obviously wrong code. It is plausible code.
Obviously wrong output can be rejected quickly. “Almost right” output has to be read line
by line, checked against local conventions, run through edge cases, and tested
against system behavior that only the team actually knows.</p>
<p>DORA has a good phrase for this: the <strong>verification tax</strong>. Time saved in
generation gets repaid in auditing, prompt refinement, review, and rework
(<a href="https://dora.dev/insights/balancing-ai-tensions/" target="_blank" rel="noreferrer">DORA</a>). The 2025 Stack
Overflow survey points in the same direction: 66% of developers said AI answers
that are “almost right” are frustrating, and 45.2% said debugging AI-generated
code takes more time
(<a href="https://survey.stackoverflow.co/2025/ai" target="_blank" rel="noreferrer">Stack Overflow 2025</a>).</p>
<p>There is also direct workload evidence here. A study on validating and repairing
LLM-generated code found that when developers knew code was AI-generated, they
performed better checks but also showed higher cognitive workload
(<a href="https://arxiv.org/abs/2405.16081" target="_blank" rel="noreferrer">Tang et al.</a>). That matches the lived
experience pretty closely: the tool removes some keystrokes, but it can leave
you with more supervisory attention per hour.</p>
<h2>Productivity is real, but it is not uniform</h2>
<p>The mixed productivity literature is not a contradiction. It is a clue. AI
looks strongest when the task is self-contained and the success condition is
clear. It looks weaker, or even negative, when the task depends on repository
history, tacit conventions, and careful integration.</p>
<p>The most useful counterweight to the upbeat productivity story is METR’s RCT
with experienced open-source maintainers working in their own repositories.
Those developers were 19% slower with early-2025 frontier tools, even though
they estimated afterward that AI had reduced their completion time by about 20%
(<a href="https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study-paper.pdf" target="_blank" rel="noreferrer">Becker et al.</a>).
That does not erase the positive findings from Copilot, Google, or the field
experiments. It tells you the gains are conditional, and that supervision and
integration costs can be large enough to flip the sign in realistic work.</p>
<p>That is also why I do not find self-reported productivity persuasive on its own.
The work can feel smoother while still becoming denser. Generating a first draft
faster is not the same thing as reducing end-to-end effort once review,
integration, and defect cleanup are included
(<a href="https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study-paper.pdf" target="_blank" rel="noreferrer">Becker et al.</a>,
<a href="https://dora.dev/insights/balancing-ai-tensions/" target="_blank" rel="noreferrer">DORA</a>).</p>
<h2>The burnout argument should be made carefully</h2>
<p>I do not think the evidence supports the lazy claim that AI simply causes
burnout. DORA’s work found higher flow, higher satisfaction, and less burnout
among heavier AI users, which is real counterevidence
(<a href="https://dora.dev/insights/value-of-development-work/" target="_blank" rel="noreferrer">Storer</a>).</p>
<p>But that is not the whole picture. Berkeley researchers following AI use inside
an actual company found a different dynamic: expanded scope, more simultaneous
threads, fewer natural stopping points, and work seepage into lunch, evenings,
and other recovery time
(<a href="https://newsroom.haas.berkeley.edu/ai-promised-to-free-up-workers-time-uc-berkeley-haas-researchers-found-the-opposite/" target="_blank" rel="noreferrer">Ye and Ranganathan</a>).</p>
<p>The clean way to reconcile those findings is to stop asking whether AI is good
or bad in the abstract. The better question is what the organization does with
the local speedup. If faster drafting gets translated into bigger PRs, more
scope, and machine-paced expectations without redesigning review and recovery,
the day gets denser even if some people feel more productive in the short run
(<a href="https://dora.dev/insights/value-of-development-work/" target="_blank" rel="noreferrer">DORA</a>,
<a href="https://newsroom.haas.berkeley.edu/ai-promised-to-free-up-workers-time-uc-berkeley-haas-researchers-found-the-opposite/" target="_blank" rel="noreferrer">Ye and Ranganathan</a>).</p>
<p>I want to be careful not to overclaim here. I am not saying the research is
settled, or that I can prove AI causes burnout. I am saying that after more
than 25 years running engineering teams across different industries, I
recognize the shape of work intensification when I see it. On AI-heavy teams,
the hours on the calendar do not always go up, but more of the day gets spent
in the most mentally expensive mode: judgment, review, supervision, and
exception handling. People end the day more drained. Breaks get squeezed.
Focus gets chopped up. Work-life balance feels worse even when the timesheet
looks roughly the same. That is not proof, but it is a signal strong enough
that leaders should redesign the work before the damage is obvious.</p>
<h2>If the workday gets denser, shorter schedules make more sense</h2>
<p>This is why I think shorter schedules deserve more serious attention than “keep
the same hours and supervise at machine tempo.” Long hours are associated with
worse cognitive outcomes in Whitehall II, and newer diary research suggests
longer days can help same-day performance while hurting next-day performance
through worse sleep and lower morning resilience
(<a href="https://pubmed.ncbi.nlm.nih.gov/19126590/" target="_blank" rel="noreferrer">Virtanen et al.</a>,
<a href="https://doi.org/10.1002/job.2847" target="_blank" rel="noreferrer">ten Brummelhuis et al.</a>).</p>
<p>The strongest modern evidence points to real hour reduction, not compression. A
large six-country four-day-week study found lower burnout and better mental health,
physical health, and job satisfaction
(<a href="https://doi.org/10.1038/s41562-025-02259-6" target="_blank" rel="noreferrer">Fan et al.</a>). The case for a
six-hour day is still directionally positive, but the evidence is thinner and older
(<a href="https://www.jstage.jst.go.jp/article/jhe1972/30/1-2/30_1-2_197/_article" target="_blank" rel="noreferrer">Akerstedt et al.</a>,
<a href="https://doi.org/10.5271/sjweh.3610" target="_blank" rel="noreferrer">Schiller et al.</a>).</p>
<p>If AI is increasing the cognitive density of engineering work, then reducing
hours is not just a perk. It is one way to keep the gains from turning into
continuous supervision with no recovery buffer.</p>
<h2>What I would change to protect judgment and recovery on an AI-heavy team</h2>
<p>The first thing I would change is the assumption that local coding speed should
automatically become end-to-end schedule compression. That is where I think a
lot of teams are going to get this wrong. If a project used to take two to
three weeks of design, one to two weeks of coding, and one week of review, I
would not try to turn that into one week of design, three days of coding, and
one week of review just because the model can draft faster. I would meet in the
middle: maybe two weeks for design and implementation together and one week for
team review.</p>
<p>That might sound like leaving productivity on the table. I think it is the
opposite. The gain is not just calendar math. It is better design quality, more
time for the author to read their own work with fresh eyes, and less pressure
to jam review into lunch, evenings, and context-switched gaps. The draft may
appear faster, but judgment does not. The real constraint moves from typing to
thinking, and teams should plan accordingly.</p>
<p>I would also make author review a first-class part of the schedule. AI can
produce a lot of plausible code quickly, but the person closest to the change
still has the best chance of catching local mistakes before they hit the rest of
the team. That means the author should have time blocked for a real self-review
pass, not just generation followed by immediate PR creation. For larger changes,
I would normalize a cooling-off period between drafting and review so the author
can come back with fresher judgment instead of merging on momentum.</p>
<p>Reviewer capacity has to be protected the same way on-call capacity is
protected. Human review does not scale at the same rate as machine generation.
So I would push harder on smaller batches, tighter PR boundaries, explicit
evidence requirements, and clearer ownership. A reviewer should not have to
reconstruct intent from a wall of generated code. The PR should say what
changed, why it changed, what the risky parts are, what tests were run, and
where the reviewer should spend attention. That does not remove review work, but
it lowers the cognitive tax.</p>
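<p>The PR-description requirement is easy to check mechanically. Here is a minimal sketch of a check a team could run in CI. It assumes PR descriptions use <code>## </code> Markdown headings. The section names in <code>REQUIRED_SECTIONS</code> and the helper <code>missing_sections</code> are hypothetical, not a standard template. The check only flags headings that are missing or left empty. It cannot tell whether the content under them is any good.</p>
<pre><code class="language-python">import re

# Hypothetical headings a team might require in every AI-assisted PR description.
REQUIRED_SECTIONS = ("What changed", "Why", "Risky parts", "Tests run", "Review focus")


def missing_sections(description):
    """Return required headings that are absent or have no body text."""
    missing = []
    for heading in REQUIRED_SECTIONS:
        # Match a line such as "## Risky parts", then lazily capture text
        # up to the next "## " heading or the end of the description.
        pattern = rf"^##\s*{re.escape(heading)}\s*$(.*?)(?=^##\s|\Z)"
        match = re.search(pattern, description, re.MULTILINE | re.DOTALL | re.IGNORECASE)
        if match is None or not match.group(1).strip():
            missing.append(heading)
    return missing


description = """## What changed
Retry logic moved into the client wrapper.

## Why
Timeouts were handled inconsistently.

## Risky parts

## Tests run
Unit tests for the wrapper.
"""

print(missing_sections(description))  # ['Risky parts', 'Review focus']</code></pre>
<p>A failing check like this is a prompt for the author, not a verdict on the change. The point is that the reviewer no longer has to reconstruct intent and risk from the diff alone.</p>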
<p>I would also deliberately add recovery time back into the system. No lunch-hour
review expectations. No quiet assumption that people will finish the thinking
work at night because the coding part got faster. No back-to-back days packed
with deep review, meetings, and prompt steering. On an AI-heavy team, recovery
is not a nice extra. It is part of keeping judgment sharp enough to be
trustworthy. That can mean fewer simultaneous threads, protected focus blocks,
review rotations, quiet hours, or even shorter schedules if more of the day is
now spent in high-attention work.</p>
<p>I would resist the urge to turn every local efficiency gain into more scope.
This is one of the easiest management mistakes to make. The model saves a few
days on drafting, so the organization quietly fills those days with more
tickets, more PRs, or more parallel work. That is how denser work gets
normalized without anyone saying it out loud. A healthier choice is to spend
some of that gain on better design, more complete testing, smaller changes, and
actual breathing room between cognitively expensive tasks.</p>
<p>Finally, I would measure whether the system is getting healthier, not just
faster. Review time, PR size, rework, after-hours activity, reopened work, and
escaped defects tell you far more than prompt counts or raw merge volume. The
real question is not whether the model wrote code quickly. The real question is
whether the team can absorb that speed without turning every day into continuous
supervision.</p>
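<p>A rough version of that measurement can be computed from data most code hosts already export. This is a minimal sketch under stated assumptions. The <code>PullRequest</code> fields, the weekday 09:00 to 18:00 working window in <code>is_after_hours</code>, and the metric names are hypothetical choices, not an established standard. The sketch also assumes every PR has at least one review event. It reports median PR size, median hours to first review, the share of review activity that happens after hours, and the reopen rate.</p>
<pre><code class="language-python">from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical fields; map them from whatever your code host exports.
    lines_changed: int
    opened_at: datetime
    review_times: list[datetime]  # review comments and approvals; assumed non-empty
    reopened: bool


def is_after_hours(ts, start_hour=9, end_hour=18):
    """Assumed working window: weekdays, from start_hour up to (not including) end_hour."""
    return ts.weekday() >= 5 or ts.hour not in range(start_hour, end_hour)


def health_summary(prs):
    """Summarize review load and recovery signals instead of raw merge volume."""
    reviews = [ts for pr in prs for ts in pr.review_times]
    after_hours = sum(map(is_after_hours, reviews))
    return {
        "median_pr_size": median(pr.lines_changed for pr in prs),
        "median_hours_to_first_review": median(
            (min(pr.review_times) - pr.opened_at).total_seconds() / 3600 for pr in prs
        ),
        "after_hours_review_share": round(after_hours / max(len(reviews), 1), 2),
        "reopened_rate": round(sum(pr.reopened for pr in prs) / len(prs), 2),
    }


prs = [
    PullRequest(120, datetime(2026, 4, 13, 10),
                [datetime(2026, 4, 13, 15), datetime(2026, 4, 13, 21)], False),
    PullRequest(900, datetime(2026, 4, 14, 9),
                [datetime(2026, 4, 15, 12, 30)], True),
    PullRequest(300, datetime(2026, 4, 16, 11),
                [datetime(2026, 4, 16, 13), datetime(2026, 4, 18, 10)], False),
]
print(health_summary(prs))</code></pre>
<p>The trend over several weeks matters more than any single number. A rising after-hours share alongside flat merge volume is exactly the kind of quiet intensification that raw throughput charts hide.</p>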
<p>Some practical changes fall out of that pretty quickly:</p>
<ul>
<li>Require authors to include a risk note and test summary for larger
AI-assisted PRs.</li>
<li>Put a soft cap on how many AI-heavy reviews one person is expected to do in a
day.</li>
<li>Normalize overnight or half-day cooldowns before opening large PRs.</li>
<li>Protect at least one meeting-light block each day for deep review or
decompression.</li>
<li>Ban the quiet cultural habit of “just review it over lunch” or “take one more
look tonight.”</li>
<li>Spend some AI time savings on design reviews and edge-case thinking instead
of immediately increasing scope.</li>
</ul>
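<p>The soft cap on AI-heavy reviews can also be built into how reviews get assigned. This is a minimal sketch under stated assumptions. The input list contains only the PRs a team has tagged as AI-heavy. The cap of three per reviewer per day is a placeholder. <code>assign_reviews</code> is a hypothetical helper, not a feature of any particular code host. Each PR goes to the least-loaded reviewer. Once everyone is at the cap, the rest are deferred to the next day instead of being stacked onto the same people.</p>
<pre><code class="language-python"># Placeholder value; the right number depends on PR size and the team.
DAILY_SOFT_CAP = 3


def assign_reviews(pr_ids, reviewers, cap=DAILY_SOFT_CAP):
    """Assign each AI-heavy PR to the least-loaded reviewer, deferring overflow."""
    load = {name: 0 for name in reviewers}
    assigned, deferred = {}, []
    for pr_id in pr_ids:
        # min() returns the first reviewer with the fewest reviews so far,
        # so ties fall back to the order of the reviewers list.
        name = min(reviewers, key=lambda r: load[r])
        if load[name] >= cap:
            # Everyone is at the cap: defer to tomorrow rather than overload anyone.
            deferred.append(pr_id)
            continue
        load[name] += 1
        assigned[pr_id] = name
    return assigned, deferred


assigned, deferred = assign_reviews(["pr-1", "pr-2", "pr-3", "pr-4", "pr-5"], ["ana", "ben"], cap=2)
print(assigned)  # {'pr-1': 'ana', 'pr-2': 'ben', 'pr-3': 'ana', 'pr-4': 'ben'}
print(deferred)  # ['pr-5']</code></pre>
<p>The cap is "soft" in the sense that deferral is only the default. A person can still pick up a deferred review deliberately. What changes is that the overflow becomes visible, instead of turning into an unspoken expectation to review over lunch.</p>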
<p>My view is not that AI makes software work easier or harder in general. It makes
the work different. It removes some friction and concentrates more of the
remaining day in supervision, exception handling, and judgment. That can become
a real productivity gain, or a faster path to mental saturation. The difference
is whether teams redesign the system around human review and human recovery
instead of pretending people should work at machine tempo.</p>
<p>The management mistake is to treat faster drafting as permission to squeeze the
rest of the system. That is backwards. The faster the draft appears, the more
deliberate teams have to be about design quality, review load, and recovery
time. Otherwise AI does not lighten the work. It just removes the pauses that
used to keep people from operating at continuous supervisory intensity.</p>
]]></content:encoded>
</item>
</channel>
</rss>