<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>amlug</title><description>Personal technical writing by Ralph Kuepper (amlug). Deep notes on compilers, systems, and the occasional side project.</description><link>https://amlug.net/</link><item><title>The vibe-coding disaster list is shorter than CrowdStrike alone</title><link>https://amlug.net/posts/vibe-coding-disaster-list/</link><guid isPermaLink="true">https://amlug.net/posts/vibe-coding-disaster-list/</guid><description>CrowdStrike&apos;s 2024 outage caused more damage than every named AI-coding production failure combined. The &apos;vibe coding broke production&apos; narrative falls apart on the numbers.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A number that stopped me.&lt;/p&gt;
&lt;p&gt;Estimated Fortune 500 damage from CrowdStrike&apos;s July 2024 outage: $5.4 billion. Cause: a Windows kernel driver tried to read a 21st input parameter from an array that held only 20. Missing bounds check, missing test coverage. Pure human-written C++.&lt;/p&gt;
&lt;p&gt;Estimated damage from every publicly named AI-generated-code production failure I could find, combined: a small fraction of that. Probably under a hundred million dollars across every documented case. Maybe much less. Most of the famous &quot;vibe coding disasters&quot; aren&apos;t actually about AI-generated code at all.&lt;/p&gt;
&lt;p&gt;Sit with that for a moment. The narrative is &quot;AI is breaking production.&quot; The numbers say humans have broken production at a scale AI hasn&apos;t approached, still do it constantly, and have been at it for sixty years. That doesn&apos;t make AI fine. It means we&apos;ve been pointing the conversation at the wrong thing.&lt;/p&gt;
&lt;p&gt;What follows is a sort through three different failure modes that all get called &quot;vibe coding broke production,&quot; and the human baseline nobody seems to check.&lt;/p&gt;
&lt;h2&gt;The setup&lt;/h2&gt;
&lt;p&gt;When someone says &quot;vibe coding broke production,&quot; they could mean any of three things, and the press treats them interchangeably.&lt;/p&gt;
&lt;p&gt;The first is AI-written code shipping with the code itself defective. The agent isn&apos;t running anymore; the artifact is. A function with a logic bug, a config with insecure defaults, a hallucinated package import that resolves to malware.&lt;/p&gt;
&lt;p&gt;The second is an AI agent taking a destructive action at runtime. The &quot;code&quot; is mostly beside the point — what failed was the agent&apos;s decision, not the artifact it left behind. Drop a database, run terraform destroy, ignore a code freeze.&lt;/p&gt;
&lt;p&gt;The third is a human shipping bad code they don&apos;t fully understand because an AI wrote it. The author was AI, the deployer was human, blame goes either way. This is the messy middle, and most of the empirical data sits here.&lt;/p&gt;
&lt;p&gt;These need different defenses. SAST, code review, and dependency pinning catch the first. Sandboxing and permission scoping catch the second — keeping the agent out of the things you don&apos;t want deleted. The third is engineering culture, which either existed or didn&apos;t before AI showed up.&lt;/p&gt;
&lt;p&gt;Most coverage conflates them, and the conflation leads to wrong fixes. &quot;Replit&apos;s AI deleted Jason Lemkin&apos;s database&quot; gets cited as evidence that AI-written code is dangerous. It&apos;s actually evidence that AI agents with database write privileges are dangerous, which is a much more obvious finding. The code Replit wrote that day wasn&apos;t the failure. The action it took was.&lt;/p&gt;
&lt;p&gt;I went through the named cases in each category. Here&apos;s what&apos;s in the public record.&lt;/p&gt;
&lt;h2&gt;Failure mode 1: AI-written code broke in production&lt;/h2&gt;
&lt;p&gt;The list of publicly named cases where AI-generated code shipped, ran in production, and demonstrably caused a failure is short.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amazon, March 2, 2026.&lt;/strong&gt; Amazon Q (Amazon&apos;s internal AI coding assistant) was, per the company&apos;s own internal post-incident review, &quot;one of the primary contributors&quot; to a code change that miscalculated delivery times. Result: 1.6 million website errors, 120,000 lost orders. Amazon&apos;s internal memo cited &quot;novel GenAI usage for which best practices and safeguards are not yet fully established&quot; as a contributing factor. Three days later a separate incident took &lt;a href=&quot;http://Amazon.com&quot;&gt;Amazon.com&lt;/a&gt; down for six hours and lost an estimated 6.3M orders. Same memo described both as part of &quot;a trend of incidents&quot; with &quot;high blast radius.&quot;&lt;/p&gt;
&lt;p&gt;Caveat though: Amazon publicly disputes the AI attribution. Their official statement points to &quot;an engineering team user error&quot; with broader impact than it should have had. The reference to &quot;Gen-AI assisted changes&quot; was reportedly deleted from the internal memo before the engineering meeting that the FT and CNBC reported on. So you have an internal Amazon assessment versus an external Amazon statement. Both sourced. The OECD AI Incidents Monitor has classified the March 2 incident as an AI Incident regardless. Even the strongest publicly-attributed case has a corporate dispute attached.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lovable, CVE-2025-48757.&lt;/strong&gt; AI-generated app code shipped in 170+ confirmed production apps, all missing Row Level Security on their Supabase backends. Researchers Matt Palmer and Kody Low scanned 1,645 apps showcased on Lovable&apos;s own marketplace; 170 of them leaked user data through identical RLS misconfigurations. A separate researcher at Palantir reproduced the issue independently with 15 lines of Python and pulled debt balances, home addresses, and API keys in under an hour. CVSS 9.3. The code &quot;worked&quot; in the sense that it returned HTTP 200. It just returned everyone&apos;s data to anyone who asked.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Moltbook, January 2026.&lt;/strong&gt; AI-generated frontend code with a Supabase API key embedded in client-side JavaScript and no Row Level Security. Founder Matt Schlicht: &quot;I didn&apos;t write a single line of code for @moltbook.&quot; Wiz Research found the misconfiguration within minutes of the platform launching: 1.5 million API authentication tokens, 35,000 emails, and private agent messages exposed. Some of those messages contained third-party OpenAI API keys in plaintext, so the breach didn&apos;t stop at Moltbook. Meta acquired Moltbook two months later, so the founder did fine. The users&apos; data was already mirrored on torrent sites.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Slopsquatting.&lt;/strong&gt; AI assistants confidently suggest package names that don&apos;t exist. Attackers register the most-suggested ones with malicious payloads. The cleanest documented case: security researcher Bar Lanyado registered &lt;code&gt;huggingface-cli&lt;/code&gt; after noticing LLMs kept hallucinating it. He published it empty: no code, no README, no SEO. The package still got over 30,000 downloads in three months. A USENIX 2025 study found roughly 20% of AI-generated code samples reference nonexistent packages, and the hallucinated names are persistent across sessions, which makes them ideal squatting targets. Slopsquatting is the one genuinely new failure mode AI introduces. Humans don&apos;t typo the same fake name over and over across thousands of users.&lt;/p&gt;
&lt;p&gt;That&apos;s the named tier. After that you get the anonymized cases — clearly real, but no company will put their name on them.&lt;/p&gt;
&lt;p&gt;David Loker, VP of AI at CodeRabbit, has publicly described an AI-generated change at his own company that &quot;would have taken down our database in production&quot; if it had rolled out. Caught in review. A March 2026 Lightrun survey of engineering leaders at AT&amp;amp;T, Citi, Microsoft, Salesforce, and UnitedHealth Group found 43% of AI-generated code changes need debugging in production. Sonar&apos;s CEO Tariq Shaukat — formerly of Bumble and Google Cloud, so a credible source — has publicly said his team is &quot;hearing more and more&quot; about consistent outages at major financial institutions where developers attribute the failure to AI-generated code. No names. An AI-assisted trading system reportedly lost $78,947 in January 2026 due to a silent fallback issue. Anonymous.&lt;/p&gt;
&lt;p&gt;Then the empirical layer. Doesn&apos;t name companies, but rigorous enough to be hard to dismiss.&lt;/p&gt;
&lt;p&gt;CodeRabbit analyzed 470 GitHub PRs and found AI-generated code had 1.7× more bugs, 75% more logic errors, 8× more I/O issues, and 2× more concurrency mistakes than human-written code. Veracode&apos;s 2025 GenAI Code Security Report: 45% of AI-generated code samples failed basic security tests against the OWASP Top 10. Tenzai (December 2025) tested 15 apps across five major AI tools, found 69 vulnerabilities, and noted that every single app lacked CSRF protection, every tool introduced SSRF vulnerabilities, and zero apps set security headers. Escape.tech scanned 5,600 vibe-coded apps live in production and found 2,000+ vulnerabilities, 400+ exposed secrets, and 175 instances of exposed personal data.&lt;/p&gt;
&lt;p&gt;The empirical layer says &quot;AI code is shipping with defects to production constantly.&quot; The named-case layer is suspiciously small relative to that. &lt;strong&gt;Companies are eating these failures privately.&lt;/strong&gt; Amazon is the only Fortune 500 company that has publicly had its name attached to one, and Amazon spent the press cycle disputing the attribution.&lt;/p&gt;
&lt;h2&gt;Failure mode 2: AI agents took destructive actions at runtime&lt;/h2&gt;
&lt;p&gt;This is the column most &quot;vibe coding disaster&quot; articles cite. None of it tells you whether AI-written code is safe to ship. It tells you that AI agents with destructive privileges are dangerous, which is a different question.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replit&apos;s AI deletes SaaStr&apos;s production database, July 2025.&lt;/strong&gt; Jason Lemkin, founder of SaaStr, was nine days into building a project on Replit when the agent ran destructive commands during an explicit code freeze. Wiped records on 1,206 executives and 1,196 companies. The agent then admitted to &quot;a catastrophic error in judgement&quot; and falsely told Lemkin the rollback wouldn&apos;t work — it did. Replit&apos;s CEO publicly acknowledged the incident and rolled out automatic dev/prod database separation as a fix. The deletion got the headlines. The interesting part was what came after: the agent fabricated 4,000 fake user records to cover the deletion, and lied about its own rollback capability when asked. Both behaviors, not code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Amazon Kiro deletes AWS Cost Explorer environment, December 2025.&lt;/strong&gt; Kiro decided that the cleanest path forward to fix a permissions issue was to delete and recreate the environment from scratch. 13-hour outage in the mainland China region. A senior AWS employee told the FT it was &quot;small but entirely foreseeable.&quot; Amazon disputes the AI framing here too.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Claude Code runs &lt;code&gt;terraform destroy&lt;/code&gt; on DataTalks.Club.&lt;/strong&gt; Wiped 2.5 years of production data, ~1.94 million rows, 100K+ students affected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Orchids zero-click hack on a BBC reporter, February 2026.&lt;/strong&gt; Researcher Etizaz Mohsin used a vulnerability in the Orchids platform to insert one line of code into BBC tech correspondent Joe Tidy&apos;s project and gain full remote control of his laptop. Zero clicks, zero downloads. This one is technically a hybrid — the platform vulnerability enabled the agent&apos;s privileges to be exploited — but it gets reported as an AI failure.&lt;/p&gt;
&lt;p&gt;These are real, well-sourced failures. Structurally, they&apos;re the same failure mode as a system administrator running &lt;code&gt;rm -rf /&lt;/code&gt; on the wrong server. The only novel piece is that the entity at the keyboard is now an LLM. The fact that someone with destructive privileges can do destructive things has been the foundation of every Unix admin nightmare since 1971.&lt;/p&gt;
&lt;h2&gt;The human baseline nobody checks&lt;/h2&gt;
&lt;p&gt;This is the piece missing from most &quot;AI broke production&quot; coverage, and it&apos;s the bulk of what&apos;s actually going on.&lt;/p&gt;
&lt;p&gt;The Consortium for Information &amp;amp; Software Quality estimated the cost of poor software quality in the US alone at $2.41 trillion per year in 2022. Operational failures, technical debt, cybersecurity damage, project failures. The human-written-code baseline that AI code is being measured against. The baseline is already brutal.&lt;/p&gt;
&lt;p&gt;The named cases are everywhere. A short list of famous human-written code that broke production catastrophically.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CrowdStrike, July 19, 2024.&lt;/strong&gt; The largest IT outage in history. 8.5 million Windows machines crashed simultaneously. $5.4 billion in estimated Fortune 500 damages alone, before counting smaller businesses, healthcare disruption, or the airlines that grounded thousands of flights. Cause per CrowdStrike&apos;s own root cause analysis: a kernel driver tried to read a 21st input parameter from an array that held only 20. Missing bounds check in C++ code. Missing test coverage for the input validation. Delta is suing for $500 million. Class action lawsuits are pending. Pure human-written code, signed off by a human review process at a $90B cybersecurity company.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Knight Capital, August 1, 2012.&lt;/strong&gt; $440 million lost in 45 minutes. Nearly bankrupted the company. An engineer deployed updated trading code to seven of eight servers. The eighth still ran a deprecated 2003 feature called &quot;Power Peg.&quot; A repurposed flag bit reactivated the dead code. Within 45 minutes, four million erroneous executions had hit the market across 154 stocks, leaving roughly $6.65B in unwanted positions (about $3.5B net long and $3.15B net short, per the SEC). No deployment validation, no peer review, no circuit breakers, no automated rollback. Knight got bailed out and absorbed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AWS S3, February 28, 2017.&lt;/strong&gt; Half the internet went dark for four hours from one typo. An AWS engineer was running an established playbook to remove a small set of S3 billing servers. He typo&apos;d a parameter. The command removed too many servers, including ones running the index and placement subsystems. Cascading failure. Slack, Trello, Quora, Medium, Docker, IFTTT, and AWS&apos;s own status dashboard all went down. Cyence estimated S&amp;amp;P 500 companies lost $150M during the outage alone.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GitLab, January 31, 2017.&lt;/strong&gt; An engineer ran &lt;code&gt;rm -rf&lt;/code&gt; on the production database server instead of the secondary replica. Lost six hours of data, affected 5,000 projects, 700 user accounts. Then discovered five backup mechanisms had all failed silently — none had been tested in production. The only working backup was a manual one taken six hours earlier. Same shape as the Replit/SaaStr incident. The only thing that changed was the entity at the keyboard.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Therac-25, 1985–1987.&lt;/strong&gt; Killed at least three patients with massive radiation overdoses. Race condition in human-written code: if the operator typed a prescription too quickly, the machine could fire its high-power electron beam without the proper shielding in place. Software replaced hardware safety interlocks present in earlier models. Now a canonical case study in how human-written safety-critical code can kill people.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Boeing 737 MAX MCAS, 2018–2019.&lt;/strong&gt; 346 deaths across two crashes. Software design flaw: MCAS could repeatedly trigger nose-down trim from a single Angle-of-Attack sensor, with no failure handling, and pilots were never told the system existed. Human-written, human-reviewed, signed off by humans, killed hundreds of humans.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ariane 5 Flight 501, 1996.&lt;/strong&gt; Rocket exploded 40 seconds after launch. ~$370 million lost. A 64-bit floating-point velocity got converted to a 16-bit signed integer. Overflow. Self-destruct. The code was reused from Ariane 4 without re-validation against Ariane 5&apos;s flight envelope.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Northeast blackout, August 14, 2003.&lt;/strong&gt; 55 million people lost power across 8 US states and Ontario. ~100 deaths attributed to the blackout. General Electric&apos;s XA/21 energy management system had a race condition that prevented operators from receiving alarm notifications about cascading line trips.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TSB Bank IT migration, April 2018.&lt;/strong&gt; 1.9 million customers locked out of their accounts. Some saw other customers&apos; balances. CEO resigned. Estimated cost: £330 million.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;http://Healthcare.gov&quot;&gt;Healthcare.gov&lt;/a&gt;, October 2013.&lt;/strong&gt; The most expensive failed website launch in US history. ~$2.1 billion total cost. On launch day, six people successfully enrolled. The system collapsed under 50,000 concurrent users when designed for far more.&lt;/p&gt;
&lt;p&gt;This isn&apos;t an exhaustive list. It&apos;s just the big ones. The empirical baseline behind them, per Steve McConnell&apos;s &lt;em&gt;Code Complete&lt;/em&gt;: 15–50 bugs per 1,000 lines of human-written code as the industry average. 1–5 bugs per kLOC even in released, post-test software. 0.5 bugs per kLOC for Microsoft&apos;s released products with their full review process. 0.1 bugs per kLOC for the NASA Space Shuttle, the gold standard, achieved at thousands of dollars per line of code.&lt;/p&gt;
&lt;p&gt;When CodeRabbit reports AI-generated code has 1.7× more bugs than human-written code, the comparison is to a baseline of 15–50 bugs per kLOC. That gets us to maybe 25–85 bugs per kLOC for AI code. Both numbers are alarming. The human baseline is already a managed disaster — code review, CI, staged rollouts, postmortems, runbook discipline are the entire reason any of this works at all. Without that process, the baseline would be much worse than it already is.&lt;/p&gt;
&lt;h2&gt;Where the comparison actually lands&lt;/h2&gt;
&lt;p&gt;Put the failure modes side by side. For every AI-code failure pattern in failure mode 1, there&apos;s a famous human-coded equivalent that predates AI by a decade or more.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AI-code failure&lt;/th&gt;
&lt;th&gt;Human-coded equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lovable apps shipped without Row Level Security&lt;/td&gt;
&lt;td&gt;Tea app shipped with an open Firebase bucket. Uber&apos;s &quot;God View&quot; gave employees access to everyone&apos;s location. Uncountable S3 buckets left publicly readable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moltbook hardcoded its Supabase key in client JavaScript&lt;/td&gt;
&lt;td&gt;Decades of secrets-committed-to-git incidents. The entire reason GitGuardian exists as a company.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slopsquatting: AI hallucinates a package name, attacker registers it&lt;/td&gt;
&lt;td&gt;Typosquatting: a human typos &lt;code&gt;react-router&lt;/code&gt; and installs malware. &lt;em&gt;Same attack class, different cause of the typo. This is the closest equivalent and it isn&apos;t quite the same.&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Q generated a logic error in delivery time calculation&lt;/td&gt;
&lt;td&gt;Knight Capital reused a flag bit that reactivated dead code. AWS S3 engineer typo&apos;d a command parameter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI-generated &lt;code&gt;Promise.all&lt;/code&gt; over an array storms the connection pool&lt;/td&gt;
&lt;td&gt;Every concurrency bug humans have made since threading existed. Therac-25, Northeast blackout, Mars Pathfinder.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI agent runs &lt;code&gt;DROP TABLE&lt;/code&gt; or &lt;code&gt;terraform destroy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GitLab engineer ran &lt;code&gt;rm -rf&lt;/code&gt; on the wrong server. AWS S3 engineer wiped too many servers with one keystroke.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI hallucinates security defaults&lt;/td&gt;
&lt;td&gt;Therac-25 was designed without proper interlocks. CrowdStrike shipped without bounds checking.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The point isn&apos;t that humans are bad too, so AI is fine. The point is that bad code failing in production is the default state of software. AI is a new author of that bad code. The bad-code-fails-in-prod problem predates it by sixty years and costs $2.4 trillion a year.&lt;/p&gt;
&lt;h2&gt;Where the comparison breaks the AI case&lt;/h2&gt;
&lt;p&gt;Two things don&apos;t survive the comparison cleanly.&lt;/p&gt;
&lt;p&gt;The first is velocity. Human-written code happens at human speed. AI is faster, by a lot. CodeRabbit&apos;s 1.7× bug rate combined with a plausible 5–10× code-volume increase per developer puts absolute bug volume per developer at roughly 8–17× the old level, about an order of magnitude. Even at constant per-line quality it would be up 5–10×. The review process that worked at the old volume doesn&apos;t necessarily scale to the new one. Lightrun&apos;s 43%-need-debugging stat is consistent with this. Amazon&apos;s response to its March outages was to mandate senior engineer sign-off on AI-assisted code, which is a velocity-control measure, not a code-quality measure.&lt;/p&gt;
&lt;p&gt;The second is slopsquatting. Typosquatting and dependency confusion existed before AI. But &quot;the AI consistently hallucinates the same plausible-but-fake package name across many users, creating a deterministic attack surface for whoever registers it first&quot; is a category that didn&apos;t exist when humans were typing every import statement. The attack relies on the predictability of LLM hallucinations, which has no human analog.&lt;/p&gt;
&lt;p&gt;Everything else on the AI side has a structural human equivalent. Slopsquatting is the one place the comparison genuinely breaks toward &quot;AI introduces a novel risk.&quot; The rest is cause-of-failure, not type-of-failure.&lt;/p&gt;
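&lt;p&gt;The slopsquatting defense is the same dependency pinning mentioned in the setup, applied before anything gets installed. A minimal Python sketch, under stated assumptions: &lt;code&gt;PINNED&lt;/code&gt;, &lt;code&gt;normalize&lt;/code&gt;, and &lt;code&gt;unvetted&lt;/code&gt; are hypothetical names for illustration, a real allowlist would come from a reviewed lockfile, and the name normalization is a simplified version of what PyPI does.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical allowlist; in practice, load it from a reviewed, pinned lockfile.
PINNED = {"requests", "numpy", "huggingface-hub"}


def normalize(name):
    # Simplified PyPI-style normalization: lowercase, underscores to hyphens.
    return name.strip().lower().replace("_", "-")


def unvetted(requested, pinned=PINNED):
    # Return the requested names that are not on the allowlist, sorted.
    allowed = {normalize(name) for name in pinned}
    return sorted(n for n in map(normalize, requested) if n not in allowed)


# An AI-suggested install list; huggingface-cli is the hallucinated name
# from the Lanyado experiment described earlier.
suggested = ["requests", "huggingface-cli", "NumPy"]
print(unvetted(suggested))  # prints ['huggingface-cli']
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Anything the check returns goes to a human before it goes to the package manager. It only flags names nobody has vetted; it says nothing about whether a pinned package is itself safe.&lt;/p&gt;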
&lt;h2&gt;The meta-point&lt;/h2&gt;
&lt;p&gt;Most of the press coverage of &quot;AI is breaking production&quot; conflates the three failure modes from the setup, and the conflation matters.&lt;/p&gt;
&lt;p&gt;AI-written code that ships with bugs is real, and probably happening at scale below the public attribution threshold. The defenses are old: review, test, scan, pin dependencies, ship secure-by-default templates.&lt;/p&gt;
&lt;p&gt;AI agents with destructive privileges are dangerous in a way that has nothing to do with the code they author. The defenses are also old: scope permissions, sandbox, require approval for destructive operations, log everything. The Replit incident is structurally identical to a junior sysadmin running &lt;code&gt;rm -rf&lt;/code&gt; in the wrong directory, which the industry has been defending against since the 1970s.&lt;/p&gt;
&lt;p&gt;The human-written-code baseline is already catastrophic. CrowdStrike alone caused more economic damage in one weekend than every named AI-code production failure combined. Knight Capital lost more money in 45 minutes than the visible cost of every documented vibe-coding incident put together. The reason these incidents don&apos;t dominate the news cycle anymore is that we&apos;ve gotten used to them.&lt;/p&gt;
&lt;p&gt;So the defensible conclusion isn&apos;t &quot;AI code is fine.&quot; It also isn&apos;t &quot;AI code is dangerous.&quot; It&apos;s smaller: &lt;strong&gt;the failure mode is the same. The question is whether your review and rollout discipline keeps pace with whatever, or whoever, is now generating your code 5× faster.&lt;/strong&gt; That&apos;s a question about engineering culture. It happens to be the same question we&apos;ve been answering badly for sixty years.&lt;/p&gt;
&lt;p&gt;Engineering cultures that already handle bugs well — review, CI, staged rollouts, blameless postmortems — handle AI bugs about the same. There are just more bugs per developer per hour. Cultures that don&apos;t handle bugs well find out faster. Adding AI to an environment with no CI and no review process was always going to surface the missing CI and review process. AI didn&apos;t create that gap. It declined to paper over it.&lt;/p&gt;
&lt;p&gt;The accurate version of &quot;vibe coding broke production&quot; is something like: your existing review process broke under increased code velocity, and the velocity increase happened to be from AI. Less marketable. Closer to true.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Anyone using AI-code failures as evidence that AI shouldn&apos;t write code is, by extension, arguing humans should stop writing code too. The CrowdStrike kernel driver, the Knight Capital deployment, the AWS typo, the GitLab &lt;code&gt;rm -rf&lt;/code&gt;, the Therac-25 race condition, the 737 MAX MCAS — these are all cases where humans wrote code with the same failure modes AI gets blamed for, with vastly higher body counts and dollar costs, across decades of evidence.&lt;/p&gt;
&lt;p&gt;I work on a TypeScript-to-native compiler that is mostly written by Claude Code under my direction. I review the architecture and the diffs; the agents do most of the typing. The thing I&apos;m defending against is failure mode 1: AI-written code that compiles, passes tests, ships, and turns out to have a logic error in production. I review what comes back the same way I&apos;d review human code — maybe more carefully, because the volume is higher. That&apos;s the only real adjustment.&lt;/p&gt;
&lt;p&gt;The question worth asking isn&apos;t whether AI code is safe to ship. It&apos;s whether your engineering culture is honest enough to catch anyone&apos;s bad code before it reaches users. If yes, AI is mostly a speedup. If not, you&apos;ll find out at scale.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Sources for the named incidents: CrowdStrike&apos;s own root cause analysis (PDF on their site), the SEC 8-K filing on Knight Capital, AWS&apos;s official postmortem on the S3 outage at &lt;a href=&quot;http://aws.amazon.com/message/41926&quot;&gt;aws.amazon.com/message/41926&lt;/a&gt;, GitLab&apos;s published postmortem, the OECD AI Incidents Monitor for the Amazon March 2026 incident, the NIST NVD entry for CVE-2025-48757, Wiz Research&apos;s disclosure on Moltbook, and Bar Lanyado&apos;s writeup of the huggingface-cli experiment. The CISQ &quot;Cost of Poor Software Quality in the US: A 2022 Report&quot; is the source for the $2.41 trillion figure. CodeRabbit&apos;s State of AI vs Human Code Generation Report, Veracode&apos;s 2025 GenAI Code Security Report, the Lightrun engineering survey, and the Tenzai assessment are the sources for the empirical comparison data.&lt;/em&gt;&lt;/p&gt;
</content:encoded></item><item><title>Your default compiler flags are leaving 8× on the table</title><link>https://amlug.net/posts/default-compiler-flags-8x/</link><guid isPermaLink="true">https://amlug.net/posts/default-compiler-flags-8x/</guid><description>Five compiled languages agree on a numeric loop to within 2%. A compiled-TypeScript experiment is 8× faster. This isn&apos;t a story about TypeScript — it&apos;s about what the other five lost by default.</description><pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Here is a number that stopped me.&lt;/p&gt;
&lt;p&gt;I ran a tight loop on an Apple M1 Max. One hundred million iterations,
adding &lt;code&gt;1.0&lt;/code&gt; to a double each time. I built and ran the same program
with eight language toolchains. The timings, in milliseconds, best of
five runs:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;loop_overhead (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Rust (&lt;code&gt;rustc -O&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C++ (&lt;code&gt;g++ -O3&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Java (OpenJDK 21, JIT)&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swift (&lt;code&gt;swiftc -O&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go (&lt;code&gt;go build&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node.js 25 (V8)&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bun 1.3.5 (JavaScriptCore)&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compiled TypeScript (Perry, LLVM)&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Sit with the top five for a moment. Five different compilers, five
different language designs, up to three decades apart in age. They agree
on this loop to within about 2%. That alone is worth noticing: the
conventional wisdom that picking C++ over Go buys you raw throughput is,
on this benchmark, just not visible.&lt;/p&gt;
&lt;p&gt;And then the bottom row. Twelve. Roughly eight times faster than the
five &quot;fast&quot; languages, and built, of all things, by compiling TypeScript.&lt;/p&gt;
&lt;p&gt;This isn&apos;t a story about TypeScript being fast. It&apos;s a story about why
the five compiled languages are identical to each other, and why their
shared default output is eight times slower than it has to be.&lt;/p&gt;
&lt;h2&gt;The setup&lt;/h2&gt;
&lt;p&gt;The compiled-TypeScript entry is &lt;a href=&quot;https://perry.sh/&quot;&gt;Perry&lt;/a&gt;, an ahead-of-time compiler I
work on. It parses TS with SWC and generates native code through LLVM,
the same backend clang and rustc use. For this article Perry is a
measuring instrument — a way to isolate one specific thing: LLVM&apos;s
optimizer, when you hand it identical IR but with different flags.&lt;/p&gt;
&lt;p&gt;The benchmark is a ported set of eight compute microbenchmarks, one of
which is the loop above. Full source and raw numbers are in &lt;a href=&quot;https://github.com/ralphkuepper/perry/tree/main/benchmarks/polyglot&quot;&gt;the
polyglot benchmark suite&lt;/a&gt;. I ran every benchmark in every language at
the flags its documentation recommends for release builds — nothing
extra, nothing turned off. Best of five; best of twenty on the one
benchmark (&lt;code&gt;fibonacci&lt;/code&gt;) sensitive to branch-predictor state.&lt;/p&gt;
&lt;p&gt;These are compute microbenchmarks. Before we continue: do not generalize
them to &quot;language X is 8× slower than language Y on real workloads.&quot; On
a realistic application — one that spends its time in I/O, allocation,
a scheduler, a database driver — the programming-language choice drops
to the noise floor. What these numbers probe is narrow: the compiler&apos;s
output on numeric loops with &lt;code&gt;double&lt;/code&gt; / &lt;code&gt;f64&lt;/code&gt; arithmetic. That narrow
probe, it turns out, is where the defaults get interesting.&lt;/p&gt;
&lt;p&gt;Three specific optimization choices account for every case where the
compiled-TypeScript column looks strange. I&apos;ll walk through each with
the LLVM IR to back the claim.&lt;/p&gt;
&lt;h2&gt;Optimization 1: IEEE 754 strict addition is really slow&lt;/h2&gt;
&lt;p&gt;The 99 ms Rust number is not laziness, and it&apos;s not because Rust is
worse than C++. Here is what clang emits from a vanilla &lt;code&gt;-O3&lt;/code&gt; build of the
same loop in C++:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; clang -O3 bench.cpp -S -emit-llvm, inside bench_loop_overhead:
2:                                    ; preds = %2, %0
  %3 = phi i32    [ 0,           %0 ], [ %9, %2 ]
  %4 = phi double [ 0.000000e+00, %0 ], [ %8, %2 ]
  %5 = fadd double %4, 1.000000e+00    ; serialized
  %6 = fadd double %5, 1.000000e+00    ; waits for %5
  %7 = fadd double %6, 1.000000e+00    ; waits for %6
  %8 = fadd double %7, 1.000000e+00    ; waits for %7
  %9 = add nuw i32 %3, 4
  %10 = icmp eq i32 %9, 100000000
  br i1 %10, label %11, label %2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;clang unrolled the loop four times — four &lt;code&gt;fadd&lt;/code&gt;s in the body, counter
incrementing by 4. But look at the data dependencies: each &lt;code&gt;fadd&lt;/code&gt; takes
the result of the previous &lt;code&gt;fadd&lt;/code&gt; as its input. &lt;code&gt;%6&lt;/code&gt; cannot start
until &lt;code&gt;%5&lt;/code&gt; finishes. &lt;code&gt;%7&lt;/code&gt; has to wait on &lt;code&gt;%6&lt;/code&gt;. Every instruction in the
body sits in a serial latency chain.&lt;/p&gt;
&lt;p&gt;On an M1 Max a single &lt;code&gt;fadd&lt;/code&gt; has a latency of about 3 cycles. Four serialized
fadds per unrolled loop body × 3 cycles each = 12 cycles per body. With the
4× unrolling, 100M source iterations become 25M loop bodies. 25M × 12 cycles
= 300M cycles. At 3.2 GHz that&apos;s 94 ms. Measured: 98 ms. Close enough.&lt;/p&gt;
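&lt;p&gt;The same back-of-envelope estimate, written out as a small TypeScript sketch. The function name &lt;code&gt;estimateMs&lt;/code&gt; and its parameters are made up for illustration; the 3-cycle latency and 3.2 GHz clock are the figures quoted above, not something this snippet measures.&lt;/p&gt;

```typescript
// Latency-bound estimate for a loop whose body is one serial chain of fadds.
// All names are illustrative; the inputs are the figures quoted in the text.
function estimateMs(
  iterations: number,    // source-level loop iterations
  unroll: number,        // iterations folded into one loop body
  latencyCycles: number, // latency of one dependent fadd
  clockGHz: number,      // core clock
): number {
  const bodies = iterations / unroll;
  const cyclesPerBody = unroll * latencyCycles; // each fadd waits on the previous one
  const totalCycles = bodies * cyclesPerBody;
  return (totalCycles / (clockGHz * 1e9)) * 1000;
}

const strictMs = estimateMs(1e8, 4, 3, 3.2);
console.log(strictMs.toFixed(2)); // 93.75
```

&lt;p&gt;Note that &lt;code&gt;unroll&lt;/code&gt; cancels out of the result: unrolling a serial dependency chain makes the body longer but does not make the chain any shorter.&lt;/p&gt;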
&lt;p&gt;&lt;strong&gt;clang cannot collapse this chain.&lt;/strong&gt; Not because it doesn&apos;t see the
pattern — it obviously does — but because IEEE 754 forbids the
transformation. Floating-point addition is not associative. For
arbitrary inputs, &lt;code&gt;(a + b) + c&lt;/code&gt; can differ from &lt;code&gt;a + (b + c)&lt;/code&gt;, because
a large intermediate in one order rounds away bits that would have
survived in the other. Programs that care about that — numerical
simulations, interval arithmetic, reproducibility guarantees — need
the result. The compiler must preserve it.&lt;/p&gt;
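&lt;p&gt;A minimal TypeScript sketch of why the fold is not safe for arbitrary inputs. JavaScript numbers are IEEE 754 doubles, so the same rounding applies. The starting value &lt;code&gt;2 ** 53&lt;/code&gt; is chosen for illustration only; the benchmark itself never gets near that range. The point is just that the two expressions are not interchangeable in general.&lt;/p&gt;

```typescript
// At 2^53 the spacing between adjacent doubles is 2, so adding 1.0 lands
// exactly halfway and rounds back to the same value (round-half-to-even).
const x = 2 ** 53; // 9007199254740992

const chained = (((x + 1) + 1) + 1) + 1; // four dependent adds, as in the strict IR
const folded = x + 4;                    // what a reassociating compiler would compute

console.log(chained === x);    // true: every +1 was rounded away
console.log(folded - chained); // 4
```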
&lt;p&gt;Now the same function with one flag added:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;; clang -O3 -ffast-math bench.cpp:
2:                                    ; preds = %2, %0
  %3 = phi i32        [ 0, %0 ],              [ %6, %2 ]
  %4 = phi &amp;lt;2 x double&amp;gt; [ zeroinitializer, %0 ], [ %5, %2 ]
  %5 = fadd fast &amp;lt;2 x double&amp;gt; %4, splat (double 4.000000e+00)
  %6 = add nuw i32 %3, 8
  %7 = icmp eq i32 %6, 100000000
  br i1 %7, label %8, label %2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One &lt;code&gt;fadd fast &amp;lt;2 x double&amp;gt;&lt;/code&gt; per iteration. Two parallel lanes, each
adding &lt;code&gt;4.0&lt;/code&gt; (because LLVM folded &lt;code&gt;(((x+1)+1)+1)+1&lt;/code&gt; into &lt;code&gt;x+4&lt;/code&gt;). Eight
additions per iteration, one vector instruction. No dependency between
iterations except the accumulator itself.&lt;/p&gt;
&lt;p&gt;LLVM needed &lt;code&gt;fast&lt;/code&gt; to permit the rewrite. The &lt;code&gt;fast&lt;/code&gt; flag is a bundle
that includes &lt;code&gt;reassoc&lt;/code&gt; (&quot;may reorder&quot;), &lt;code&gt;contract&lt;/code&gt; (&quot;may fuse mul+add
into fma&quot;), and five more flags covering NaN (&lt;code&gt;nnan&lt;/code&gt;), infinity (&lt;code&gt;ninf&lt;/code&gt;), signed zero (&lt;code&gt;nsz&lt;/code&gt;),
reciprocal arithmetic (&lt;code&gt;arcp&lt;/code&gt;), and approximate math functions (&lt;code&gt;afn&lt;/code&gt;). Turning it on says &quot;I don&apos;t care about
strict IEEE 754 anywhere in this compilation unit.&quot; clang&apos;s measured
result with the flag: &lt;strong&gt;12 ms&lt;/strong&gt;. Eight times faster than the default.&lt;/p&gt;
&lt;p&gt;Perry&apos;s generated IR for the same function carries &lt;code&gt;reassoc contract&lt;/code&gt;
on every float instruction by default — a subset of &lt;code&gt;fast&lt;/code&gt; that permits
reordering and fma contraction but preserves NaN, Inf, and &lt;code&gt;-0.0&lt;/code&gt;
semantics (which JS programs can observe). After LLVM&apos;s standard
optimization pipeline runs on Perry&apos;s naïve load/fadd/store IR, it
becomes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vector.body:
  %vec.phi   = phi &amp;lt;2 x double&amp;gt; [...], [ %0, %vector.body ]
  %vec.phi14 = phi &amp;lt;2 x double&amp;gt; [...], [ %1, %vector.body ]
  %0 = fadd reassoc contract &amp;lt;2 x double&amp;gt; %vec.phi,   splat (double 1.0)
  %1 = fadd reassoc contract &amp;lt;2 x double&amp;gt; %vec.phi14, splat (double 1.0)
  %index.next = add nuw i32 %index, 4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two parallel &lt;code&gt;&amp;lt;2 x double&amp;gt;&lt;/code&gt; accumulators instead of clang-fast&apos;s one.
LLVM&apos;s interleave pass picked a different unroll factor here, but the
result is structurally the same: parallel fadd lanes, no serial chain.
Final disassembly shows Perry&apos;s binary running four independent
&lt;code&gt;fadd.2d&lt;/code&gt; NEON instructions per loop iteration, which spreads the work
across the M1&apos;s FP/SIMD pipes instead of waiting on one latency chain. Measured: &lt;strong&gt;12 ms&lt;/strong&gt;, the same number C++
gets with &lt;code&gt;-ffast-math&lt;/code&gt;, by a different route.&lt;/p&gt;
&lt;p&gt;Two things follow.&lt;/p&gt;
&lt;p&gt;First: &lt;strong&gt;the thing Rust and C++ lost by default was never compiler
quality. It was one bit of metadata on every fadd instruction.&lt;/strong&gt; Perry
turns that bit on in its emitter. clang turns it on when you pass
&lt;code&gt;-ffast-math&lt;/code&gt;. Both end up at the same 12 ms because both are routing
through the same LLVM optimizer. LLVM is doing the work. The languages
differ only in whether they hand LLVM the permission slip.&lt;/p&gt;
&lt;p&gt;Second: &lt;strong&gt;Go cannot participate.&lt;/strong&gt; Go&apos;s compiler has no &lt;code&gt;-ffast-math&lt;/code&gt;,
no &lt;code&gt;reassoc&lt;/code&gt; flag, and its backend does not ship a floating-point
reassociation pass. Writing the same loop in Go and building with
&lt;code&gt;go build&lt;/code&gt; — with any flags, any compiler version — produces something
indistinguishable from clang&apos;s default 97 ms. This is intentional: Go&apos;s
design prioritizes predictable compiler output over absolute
throughput. It&apos;s also the cleanest instance in this whole investigation
of &quot;the default is the ceiling.&quot;&lt;/p&gt;
&lt;p&gt;For Rust, the situation is halfway. Stable Rust has no flag to toggle
&lt;code&gt;reassoc&lt;/code&gt; on individual fadd instructions. Nightly exposes
&lt;code&gt;std::intrinsics::fadd_fast&lt;/code&gt;, which takes the same loop from 99 ms to
12 ms — matching clang-fast. Manual 4-way unrolling in stable Rust
reaches 24 ms, good but not great. On this benchmark, &quot;use nightly&quot; is
a real answer if you need parity.&lt;/p&gt;
&lt;h2&gt;Optimization 2: the benchmark that fooled me&lt;/h2&gt;
&lt;p&gt;Here is &lt;code&gt;accumulate&lt;/code&gt;: loop 100 million times, do &lt;code&gt;sum += i % 1000&lt;/code&gt; on
&lt;code&gt;double&lt;/code&gt; values, report the elapsed time. My prior belief going in was
straightforward: on ARM64 there is no hardware instruction for &lt;code&gt;fmod&lt;/code&gt;
on f64. The default C++ benchmark uses &lt;code&gt;double&lt;/code&gt;, so the modulo
lowers to a libm function call — roughly 30 ns per call, 30 ns × 100M
iterations = three full seconds theoretical, something under a second
in practice once clang vectorizes around the call. Perry&apos;s type
inference recognizes the operands are integer-valued and emits &lt;code&gt;srem&lt;/code&gt;
— one hardware instruction, one cycle — which is why Perry reports 24
ms while the other languages sit at 96–99 ms.&lt;/p&gt;
&lt;p&gt;That story is wrong in an interesting way.&lt;/p&gt;
&lt;p&gt;Here is what clang actually emits, with default flags, for the C++
version of &lt;code&gt;accumulate&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;%9  = urem &amp;lt;4 x i32&amp;gt; %5, splat (i32 1000)
%10 = urem &amp;lt;4 x i32&amp;gt; %6, splat (i32 1000)
%11 = urem &amp;lt;4 x i32&amp;gt; %7, splat (i32 1000)
%12 = urem &amp;lt;4 x i32&amp;gt; %8, splat (i32 1000)
%13 = uitofp nneg &amp;lt;4 x i32&amp;gt; %9  to &amp;lt;4 x double&amp;gt;
; ... uitofp for the other three lanes ...
%17 = tail call double @llvm.vector.reduce.fadd.v4f64(double %4,  &amp;lt;4 x double&amp;gt; %13)
%18 = tail call double @llvm.vector.reduce.fadd.v4f64(double %17, &amp;lt;4 x double&amp;gt; %14)
%19 = tail call double @llvm.vector.reduce.fadd.v4f64(double %18, &amp;lt;4 x double&amp;gt; %15)
%20 = tail call double @llvm.vector.reduce.fadd.v4f64(double %19, &amp;lt;4 x double&amp;gt; %16)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because &lt;code&gt;i&lt;/code&gt; is declared &lt;code&gt;int&lt;/code&gt; in &lt;code&gt;bench.cpp&lt;/code&gt;, clang was free to lower
&lt;code&gt;i % 1000&lt;/code&gt; to &lt;strong&gt;vectorized integer remainder&lt;/strong&gt; — &lt;code&gt;urem &amp;lt;4 x i32&amp;gt;&lt;/code&gt;. No
&lt;code&gt;fmod&lt;/code&gt; anywhere. The C++ benchmark isn&apos;t paying the libm tax I assumed
it was.&lt;/p&gt;
&lt;p&gt;So what is the 97 ms? Look at the bottom: four &lt;code&gt;llvm.vector.reduce.fadd&lt;/code&gt;
calls, chained, each feeding the next. Without &lt;code&gt;reassoc&lt;/code&gt;, a
&lt;code&gt;vector.reduce.fadd.v4f64&lt;/code&gt; must happen in a specific order: it starts from
its scalar operand and adds the four lanes one at a time, so it is
semantically a serial chain of four &lt;code&gt;fadd&lt;/code&gt;s. Four of those
chained per iteration is sixteen serial &lt;code&gt;fadd&lt;/code&gt;s for sixteen elements, one
3-cycle &lt;code&gt;fadd&lt;/code&gt; per element, which is the same roughly 94 ms estimate as
the loop in Optimization 1. That&apos;s the bottleneck.&lt;/p&gt;
&lt;p&gt;Perry, on the same benchmark, compiles down to:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vector.body:
  %1 = urem i64 %index, 1000
  %2 = urem i64 %0, 1000
  %3 = uitofp nneg i64 %1 to double
  %4 = uitofp nneg i64 %2 to double
  %5 = fadd reassoc contract double %vec.phi,   %3
  %6 = fadd reassoc contract double %vec.phi15, %4
  %index.next = add nuw i64 %index, 2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two parallel scalar accumulators. Two &lt;code&gt;urem&lt;/code&gt;s, two &lt;code&gt;uitofp&lt;/code&gt;s, two
&lt;code&gt;fadd&lt;/code&gt;s, no reductions. The &lt;code&gt;urem&lt;/code&gt; was always going to be there — both
compilers found the integer remainder. The difference is that Perry&apos;s
&lt;code&gt;reassoc&lt;/code&gt; flag let LLVM split the accumulation into independent partial sums
instead of a vector-reduce chain.&lt;/p&gt;
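&lt;p&gt;A source-level analogue of that shape, as a hedged TypeScript sketch. This is hand-written to mirror the IR above, not something Perry emits or requires; the function names are invented for illustration. On this benchmark every value is a small integer, so both orders give exactly the same sum. For general doubles the two can differ in the last bits, which is why LLVM needs &lt;code&gt;reassoc&lt;/code&gt; before it may make the change on its own.&lt;/p&gt;

```typescript
// One accumulator: every add depends on the previous one (strict order).
function accumulateSerial(n: number): number {
  let sum = 0;
  for (let i = 0; n > i; i++) {
    sum += i % 1000;
  }
  return sum;
}

// Two accumulators: the adds into sum0 and sum1 do not depend on each
// other, so the hardware can overlap them. Assumes n is even.
function accumulateTwoLanes(n: number): number {
  let sum0 = 0;
  let sum1 = 0;
  for (let i = 0; n > i; i += 2) {
    sum0 += i % 1000;
    sum1 += (i + 1) % 1000;
  }
  return sum0 + sum1; // combined once, after the loop
}

console.log(accumulateSerial(1000000), accumulateTwoLanes(1000000));
```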
&lt;p&gt;The original story I told about Perry vs C++ on this benchmark — that
it&apos;s the &lt;code&gt;fmod&lt;/code&gt; libm call versus an &lt;code&gt;srem&lt;/code&gt; hardware instruction — turns
out to be a story about Perry vs &lt;em&gt;naïvely-compiled TypeScript&lt;/em&gt;. Perry
does have an integer-mod fast path, and it&apos;s a real optimization: if a
future TypeScript compiler on this benchmark emitted &lt;code&gt;frem double&lt;/code&gt;, it
would sit around 600 ms (Node&apos;s number: 602 ms, which is exactly this —
V8 didn&apos;t inline the fmod call). The fast path matters against that
reference point.&lt;/p&gt;
&lt;p&gt;But against &lt;code&gt;clang -O3&lt;/code&gt; on the same algorithm, the fast path isn&apos;t
what&apos;s making the difference. It&apos;s the reassociation flag, again.&lt;/p&gt;
&lt;p&gt;C++ with &lt;code&gt;-O3 -ffast-math&lt;/code&gt; on &lt;code&gt;accumulate&lt;/code&gt; clocks at 26 ms, virtually
identical to Perry&apos;s 24 ms. Rust with its stable-toolchain opt variant
(switch the accumulator to &lt;code&gt;i64&lt;/code&gt; so the benchmark stays in the integer
domain) gets to 41 ms. The integer change helps, but closing the rest of
the gap to 24 ms with a floating-point accumulator would need fast-math
flags (FMF) on the reduction, which stable Rust doesn&apos;t expose. Nightly Rust&apos;s
&lt;code&gt;fadd_fast&lt;/code&gt; doesn&apos;t help on this benchmark either, because the
bottleneck here is the reduce-chain shape, not individual fadd
permissions.&lt;/p&gt;
&lt;p&gt;Go with its opt variant (&lt;code&gt;int64&lt;/code&gt; accumulator) goes from 99 ms to 70 ms,
the biggest improvement any Go benchmark saw in the opt sweep. The
delta came entirely from avoiding the integer-to-double conversion per iteration (the counterpart of &lt;code&gt;uitofp&lt;/code&gt; above), not from
vectorizing the remainder. Go&apos;s compiler emitted one iteration per loop
pass, scalar &lt;code&gt;SMULH + MSUB&lt;/code&gt; for the modulo, scalar integer add. No
vectorization. 70 ms is what you get when nothing auto-parallelizes the
accumulator.&lt;/p&gt;
&lt;h2&gt;Optimization 3: it&apos;s reassoc all the way down&lt;/h2&gt;
&lt;p&gt;The third optimization the plan called for was bounds-check elimination
and i32 loop counter promotion on the &lt;code&gt;array_read&lt;/code&gt; benchmark — sum 10
million &lt;code&gt;double&lt;/code&gt;s from an array. Perry&apos;s codegen detects the classic
&lt;code&gt;for (let i = 0; i &amp;lt; arr.length; i++)&lt;/code&gt; pattern, caches &lt;code&gt;arr.length&lt;/code&gt; at
loop entry, maintains a parallel i32 counter alongside the f64 one, and
skips the JS runtime bounds check. Measured: 3 ms.&lt;/p&gt;
&lt;p&gt;The prediction was that the other languages would be meaningfully
slower on the default benchmarks and would snap close to 3 ms when
given the right idiom: Rust&apos;s &lt;code&gt;.iter().sum()&lt;/code&gt;, Swift&apos;s
&lt;code&gt;withUnsafeBufferPointer&lt;/code&gt;, C++&apos;s already-no-bounds &lt;code&gt;std::vector&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Here&apos;s what actually happened:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;default (ms)&lt;/th&gt;
&lt;th&gt;opt (ms)&lt;/th&gt;
&lt;th&gt;delta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;C++&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;-89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;-10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;+10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Swift&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Only C++ moved materially. And C++ moved because of &lt;code&gt;-ffast-math&lt;/code&gt;
(again), not bounds-elim — there are no C++ bounds to eliminate. With
&lt;code&gt;fast&lt;/code&gt; on, LLVM interleaves the array-sum reduction four lanes wide
(four parallel &lt;code&gt;&amp;lt;2 x double&amp;gt;&lt;/code&gt; accumulators, 8 f64 per load, 4 × 8 =
32-wide unroll) and gets to 1 ms. That&apos;s faster than Perry&apos;s 3 ms.&lt;/p&gt;
&lt;p&gt;Rust&apos;s &lt;code&gt;.iter().sum()&lt;/code&gt; and the indexed &lt;code&gt;for i in 0..arr.len()&lt;/code&gt; form differed
by about one millisecond, which is within run-to-run noise. rustc at &lt;code&gt;-O&lt;/code&gt; already
proves &lt;code&gt;i &amp;lt; arr.len()&lt;/code&gt; for that classic loop shape and strips the
bounds check as dead code. There was nothing to eliminate.&lt;/p&gt;
&lt;p&gt;Swift&apos;s &lt;code&gt;UnsafeBufferPointer&lt;/code&gt; produced an identical 9 ms. The safe
indexed form was already efficient.&lt;/p&gt;
&lt;p&gt;So the third &quot;Perry optimization&quot; I set out to document turns out to be
real in Perry&apos;s source — the code in &lt;code&gt;stmt.rs&lt;/code&gt; does track &lt;code&gt;i32 counter&lt;/code&gt;
promotion and &lt;code&gt;bounded_index_pairs&lt;/code&gt; — but it isn&apos;t load-bearing on this
benchmark. The loop vectorizer&apos;s interleave factor is what separates 9
ms from 1 ms. That&apos;s an LLVM heuristic, not a bounds thing.&lt;/p&gt;
&lt;p&gt;The honest takeaway is smaller than the plan suggested: bounds-check
elimination is mostly already happening, at least in Rust and C++, for
the straight-line loops these benchmarks exercise. What isn&apos;t already
happening is aggressive vectorization under strict IEEE 754, which is
the same reassociation story as Optimization 1.&lt;/p&gt;
&lt;h2&gt;Where the compiled-TypeScript side loses&lt;/h2&gt;
&lt;p&gt;Two benchmarks where Perry loses cleanly. They matter to the argument
— if the thesis were just &quot;TypeScript is faster,&quot; they&apos;d be awkward.
Since the thesis is &quot;defaults matter,&quot; they&apos;re consistent with it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;object_create&lt;/code&gt;: 0 ms (Rust, C++, Go, Swift) vs 2 ms (Perry).&lt;/strong&gt; The
benchmark allocates a million &lt;code&gt;Point{x, y}&lt;/code&gt; structs, sums fields, and
reports the time. In statically typed compiled languages, the optimizer
stack-allocates the struct, inlines the constructor, proves the struct
never escapes the loop, and eliminates the whole thing as dead code.
The measured result is zero because the work is zero. Perry cannot
match this without abandoning its dynamic value model. A recent Perry
pass (v0.5.17) does scalar-replacement for objects whose only uses are
field get/set, which is why Perry measures 2 ms and not 10 — but any
method call on the object defeats it. This is the shape of workload
where ahead-of-time compiling a dynamic language pays a real tax
against languages with static types, and no amount of flag-tuning
closes the gap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;nested_loops&lt;/code&gt;: Perry 9 ms vs C++ opt 1 ms.&lt;/strong&gt; Same story as
&lt;code&gt;array_read&lt;/code&gt;, same cause: &lt;code&gt;-ffast-math&lt;/code&gt; enables a more aggressive
interleave factor than Perry&apos;s &lt;code&gt;reassoc contract&lt;/code&gt; subset does. Perry&apos;s
3 ms on &lt;code&gt;array_read&lt;/code&gt; and 9 ms on &lt;code&gt;nested_loops&lt;/code&gt; are both beaten by C++
opt, because &lt;code&gt;fast&lt;/code&gt; includes &lt;code&gt;nnan&lt;/code&gt; and &lt;code&gt;ninf&lt;/code&gt; permissions that the
loop vectorizer uses to pick a higher unroll. Perry deliberately does
not emit those, or &lt;code&gt;nsz&lt;/code&gt;, because JavaScript programs can observe NaN,
infinity, and the sign of zero: &lt;code&gt;Object.is(Math.min(0, -0), -0)&lt;/code&gt; has to stay &lt;code&gt;true&lt;/code&gt;. That&apos;s a real
correctness tradeoff — the ceiling Perry could hit if it stopped caring
about NaN/Inf semantics is several milliseconds faster on flat-array
sums. Right now it doesn&apos;t hit it.&lt;/p&gt;
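&lt;p&gt;A short TypeScript sketch of what &quot;observable&quot; means here, using only standard built-ins. It says nothing about Perry&apos;s internals; it only shows the language-level behaviour that &lt;code&gt;nnan&lt;/code&gt;, &lt;code&gt;ninf&lt;/code&gt;, and &lt;code&gt;nsz&lt;/code&gt; would give the optimizer permission to ignore.&lt;/p&gt;

```typescript
// The sign of zero is invisible to === but visible elsewhere.
const negZero: number = -0;
console.log(negZero === 0);                  // true
console.log(Object.is(negZero, 0));          // false
console.log(1 / negZero);                    // -Infinity
console.log(Object.is(Math.min(0, -0), -0)); // true

// NaN and Infinity are ordinary, observable results.
console.log(Number.isNaN(0 / 0));            // true
console.log(1 / 0 === Infinity);             // true
```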
&lt;h3&gt;An aside: where JIT beats AOT&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;fibonacci&lt;/code&gt;: Java 280 ms vs Perry 311 ms. Recursive &lt;code&gt;fib(40)&lt;/code&gt; makes
roughly 330 million calls as written. Java&apos;s C2 JIT observes the recursion at
runtime and applies aggressive inlining based on actual hot-call
frequencies — something no AOT compiler can match without whole-program
profile data. Perry, C++, and Rust all cluster at ~310–319 ms through
LLVM; Swift at 360 ms and Go at 450 ms lose at the recursion-folding
stage inside their own backends. This benchmark is essentially a
compiler-pass quality test, not a flag-tuning target. No flag changes
any of these numbers materially.&lt;/p&gt;
&lt;h2&gt;The meta-point&lt;/h2&gt;
&lt;p&gt;Rust, C++, Go, and Swift picked conservative defaults for a reason.
Their users care about reproducibility, IEEE 754 correctness, and about
not having to audit every numeric operation for the possibility the
compiler silently reassociated it. A 3D renderer that reads back the
same color from two parallel paths, a simulation that needs bit-exact
replay for debugging, a financial calculation that must be verifiably
deterministic: all of these care, and they&apos;d be angry if the compiler
silently evaluated &lt;code&gt;(a + b) + c&lt;/code&gt; as &lt;code&gt;a + (b + c)&lt;/code&gt;. The languages&apos; compile defaults reflect
that population.&lt;/p&gt;
&lt;p&gt;Compiling TypeScript for a JavaScript audience is a different tradeoff.
JS programs mostly don&apos;t treat &lt;code&gt;-0.0&lt;/code&gt; distinctly from &lt;code&gt;0.0&lt;/code&gt; even when
they could. Most TS code that hits a numeric loop is a game tick, a
compiler pass, a canvas renderer — workloads where a bit of
reassociation is fine. So Perry turns &lt;code&gt;reassoc&lt;/code&gt; on by default. It isn&apos;t
braver or smarter than Rust; it serves a different population.&lt;/p&gt;
&lt;p&gt;What&apos;s interesting isn&apos;t that Perry made the call. It&apos;s that the call
is invisible in most comparisons. The numbers people see when they
benchmark &quot;Rust vs TypeScript&quot; or &quot;C++ vs JavaScript&quot; reflect the
defaults both sides picked, with no indication that one side spent
those defaults on numerical robustness and the other spent them on
throughput. The benchmarks look like they&apos;re comparing languages. They
are actually comparing flag choices.&lt;/p&gt;
&lt;p&gt;There&apos;s no meta-rule for which default is right. &quot;Enable reassoc by
default&quot; is good for numeric loops and bad for scientific simulations.
&quot;Strict IEEE by default&quot; is the opposite. Both are defensible. What
isn&apos;t defensible is concluding from benchmark tables alone that one
language is faster than another. The defaults are the experiment.&lt;/p&gt;
&lt;h2&gt;Closing&lt;/h2&gt;
&lt;p&gt;Every claim in this post is reproducible with the code at the link
below. The four &lt;code&gt;bench_opt&lt;/code&gt; files showed that the &quot;Perry wins&quot; column
closes to within noise on all three flag-sensitive benchmarks when the
other languages are given the equivalent optimization path — except on
Go, where the path doesn&apos;t exist. None of this required anything
exotic. &lt;code&gt;-ffast-math&lt;/code&gt; is a flag you can type today. Nightly Rust&apos;s
&lt;code&gt;fadd_fast&lt;/code&gt; intrinsic is &lt;code&gt;#![feature(core_intrinsics)]&lt;/code&gt; plus one use
statement. Whether either should be your default is a judgment call
about what you&apos;re building.&lt;/p&gt;
&lt;p&gt;Perry exists because some of us wanted to compile TypeScript to
something that isn&apos;t a JavaScript engine. It uses LLVM. You&apos;ve seen
other LLVM-based compilers in this post: clang, rustc, swiftc. They all
produce similar output when you ask them for similar things. The
experiment this article documented is what they do when you don&apos;t.&lt;/p&gt;
&lt;h2&gt;Reproduction&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/ralphkuepper/perry
cd perry/benchmarks/polyglot
cargo build --release --manifest-path=../../Cargo.toml -p perry
bash run_all.sh 5          # default-flags numbers — produces RESULTS.md
bash run_opt.sh 5 20       # opt variants — produces RESULTS_OPT.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Hardware used for the numbers in this post: Apple M1 Max (10 cores),
64 GB RAM, macOS 26.4. Perry commit &lt;code&gt;e1cbd37&lt;/code&gt; (v0.5.22). rustc 1.92.0
stable, 1.97.0-nightly 2026-04-14, Apple clang 21.0, Swift 6.3, Go
1.21.3, Node 25.8, Bun 1.3.5, Python 3.14.&lt;/p&gt;
&lt;p&gt;All LLVM IR snippets in this article are in &lt;code&gt;assets/&lt;/code&gt; as full &lt;code&gt;.ll&lt;/code&gt;
files, reproducible with &lt;code&gt;clang -S -emit-llvm&lt;/code&gt; (C++), &lt;code&gt;rustc -O --emit=llvm-ir&lt;/code&gt; (Rust), and &lt;code&gt;PERRY_SAVE_LL=&amp;lt;dir&amp;gt; perry compile&lt;/code&gt; (Perry).
The accompanying &lt;code&gt;METHODOLOGY.md&lt;/code&gt; in that directory has the exact
iteration counts, clocks, and timing methodology.&lt;/p&gt;
</content:encoded></item><item><title>On publishing slowly</title><link>https://amlug.net/posts/on-publishing-slowly/</link><guid isPermaLink="true">https://amlug.net/posts/on-publishing-slowly/</guid><description>Why this site will only see four to eight posts a year — and why that&apos;s the point.</description><pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I have a drafts folder full of technical writing that never shipped. Some of
it was too half-formed to post; most of it was fine, but I kept pushing it
to &quot;after I run one more benchmark,&quot; and after enough rounds of that, the
piece stops feeling urgent to anyone, myself included.&lt;/p&gt;
&lt;p&gt;This site is an attempt to fix the second failure mode without giving in to
the first. The plan is narrow: four to eight posts a year, each one about
something I actually had to work out, and nothing in between. No weekly
cadence, no &quot;thinking out loud&quot; threads, no TIL dumps. If I have a
half-formed thought worth sharing, a Mastodon post is the right shape for
it, not a page on a domain with my name on it.&lt;/p&gt;
&lt;h2&gt;What goes here&lt;/h2&gt;
&lt;p&gt;Most of what I&apos;ll write about lives near compilers and systems. I spend
my days on &lt;a href=&quot;https://perry.dev/&quot;&gt;Perry&lt;/a&gt;, a TypeScript-to-native compiler,
and the questions I get stuck on tend to be the kind that take a week of
measurement to answer honestly. Things like: what does &lt;code&gt;for...of&lt;/code&gt; actually
cost in a lowered IR, where is the line between &quot;clever abstraction&quot; and
&quot;extra two memory loads per iteration,&quot; and when is the answer to a
performance question &quot;the benchmark was wrong.&quot;&lt;/p&gt;
&lt;p&gt;Before Perry I spent years on Swift server-side work and contributed to
&lt;a href=&quot;https://vapor.codes/&quot;&gt;Vapor&lt;/a&gt;; some of that will show up here too, when I
have something to add that isn&apos;t already in the docs.&lt;/p&gt;
&lt;h2&gt;What does not&lt;/h2&gt;
&lt;p&gt;No company updates. No product announcements. No &quot;10 things I learned
this year.&quot; If you&apos;re looking for Skelpo news or Perry release notes,
those live on their own sites and will stay there.&lt;/p&gt;
&lt;h2&gt;The first real post&lt;/h2&gt;
&lt;p&gt;There&apos;s a benchmark investigation in the queue — a longer piece on the
cost of abstraction in the Perry front-end, with enough numbers attached
that I&apos;d rather over-verify before publishing than push it and have to
correct it in public. That article is the reason this site exists now
instead of in six months.&lt;/p&gt;
&lt;p&gt;Until then, this is the note on the door.&lt;/p&gt;
</content:encoded></item></channel></rss>