“Write It So It’s Right”: A New Rite in Dev-Assist AI?

J. Paul Reed
4 min read · Jun 22, 2023

Last week, I attended Sapphire Ventures’ second annual Hypergrowth Engineering Summit. I had the honor of speaking last year at the inaugural event, so it was great to be back chatting with such an astute, engaged, and interesting group of technology leaders and entrepreneurs.

Two trends were the talk of the conference, both in the talks and during the breaks:

  • First was the shift in focus from organizational hypergrowth to demonstrating profitability. (An irony Sapphire’s David Carter gave us all a good laugh about, given the event’s name!)
  • The second: the impact of ChatGPT, Copilot, and AI generally, not only on the tech industry, but on engineering organizations themselves.

This likely surprises approximately… no one.

I work with a lot of organizations on engineering operational resilience, so the impact of general AI, and how it changes the joint cognitive systems we participate in, always piques my interest. Resilience Engineering, too, has had a long-standing focus on automation, humans’ interactions with it, and its effects on socio-technical systems.

Given this, two statements made during the conference really stood out to me: both were from Eddie Aftandilian, who spoke on the current state of AI-powered developer tools. Eddie is a principal researcher at GitHub Next, studying how developers interact with tools like Copilot.

The first was an example prompt a developer might give a developer-assistant AI:

Write a [Continuous Integration] script for my project and make sure that it works.

The CI Script to End All CI Scripts

Given that many developers are tasked with their own release engineering work these days, it’s a very plausible example. As a (recovering?) build / release engineer, I found it to be a fascinating one, too.

Here’s why: years ago, Thoughtworker Ken Mugrage and I had an interesting conversation about precisely these types of CI/CD scripts and infrastructure. Ken is a recovering build engineer too, and has spent a ton of time in the (release engineering) trenches at various organizations, same as me.

We started talking about how, as DevOps practices proliferated, more developers were writing these types of scripts. We noticed we could always tell just by looking at a build script whether it had been written by a developer, an operations engineer, or a build engineer. (I’d always meant to write a longer-form post about what, exactly, made this so obvious to us… but like so many things, it distills down to: with enough expertise, “you know it — or the lack of it — when you see it.”)

Back to the example AI prompt: it should be clear by now that the latter half of that sentence — “and make sure that it works” — is doing some heavy lifting.

A lot of heavy lifting.

Much of Copilot’s training corpus is pulled from open source projects. Open source has some of the most amazing build systems around. (The build scripts for the Linux kernel come to mind.) But for many open source projects, an “elegant build system” is not an itch anyone wants to spend time (or has the reach) to scratch. So developers cobble something together that works and move on. Nothing wrong with that.

But assuming the AI’s “knowledge” of “CI scripts” is built from training corpora rife with examples of “just get it done and move on,” we’re going to get results that are a pretty mixed bag, at best.

For instance, one aspect of elegant build systems is that they are pluggable and modularized, such that they do reasonable things both in the CI/CD infrastructure and on developers’ desktops. They support hooks, so adding advanced custom debugging tools or producing various types of packaging output is easy. They handle secrets correctly, if they need to. If the AI is tasked with grepping its way through GitHub for CI scripts, we’re likely to see it produce a lot of riffs on “while true; do ./configure; make; make install; done”.

Which brings us to the second half of that sentence: “make sure that it works.” Would the above script work? Probably. Most of the time. You might be wasting a lot of cycles (read: “cloud spend”) on pointless rebuilds. You’ll get into weird states, because errors get swallowed. (Good luck debugging that if they’re swallowed in ways that never make it to a log file!) But it would work. Until it doesn’t.
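
To make the “swallowed errors” point concrete, here’s a minimal sketch of the difference a few lines make. This isn’t any particular project’s pipeline (./configure, make, and make check are just stand-ins), but contrast it with the retry loop above:

    #!/usr/bin/env bash
    # Minimal sketch: fail fast and keep evidence, instead of looping until
    # something appears to succeed. (./configure and make are stand-ins.)
    set -euo pipefail                      # stop on the first error, unset variable, or broken pipe

    ./configure 2>&1 | tee configure.log   # keep logs so failures are actually debuggable
    make 2>&1 | tee build.log              # build exactly once; no retry loop to hide flakiness
    make check 2>&1 | tee test.log         # "it works" at least means the tests ran and passed

Even this tiny version changes what “make sure that it works” means: the script has an opinion about failure, rather than retrying until something eventually looks like success.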

So, fundamentally, we get back to the fact that “make sure that it works” is… a very loaded statement. What does “make sure” mean here? How do we validate “it works” in any meaningful way that isn’t just a new form of “Well, it runs on my machine”? Heck, what does “work” even actually mean? If you ask a developer, an operations engineer, and a build engineer whether their CI/CD infrastructure “works,” you’re liable to get four answers. At least.

Fundamentally, AI and “prompt engineering” start to raise some interesting questions for technologists precisely around the idea of “work-iness” and “correctness.”

“Write me some unit tests… and make sure that they work!”

It raises the question: who becomes the arbiter of whether a CI script (or any source code, for that matter) “works”? Is it thousands of random CI scripts from all over the Internet, many of them copied from each other, warts and all? Or do we think expertise from an engineer who’s seen everything “under the Sun” [sic] is still… relevant? And how will organizations react if — when, really — those two definitions don’t match?

This brings us to the second comment Eddie made, which not only gave me pause… but some hope, too:

“Fundamentally, we don’t really understand how [large language models] work, which is why it’s so important that humans remain in the loop.”

Indeed.

