Intellectual debt: The hidden costs of machine learning

Many of us are already familiar with technical debt. And while the concept is not new, the disruptions of cloud transformation and increasing time-to-market pressures shine a bright light on its many downsides.

Summarized: technical debt is measured by the catch-up rework required to compensate for poor decisions. And while there’s always more than one way to solve a problem, short-sightedness should not rule the decision.

Commonly applied to development processes, technical debt accrues over time when we choose an inefficient path of least resistance. Whether it’s reusing what we already have, sticking with platforms we know, or opting for a shortcut instead of a longer but better approach, these decisions may mean less work now, but can mean much more work in the future.

Our good intentions promise that we’ll revisit the shortcomings later—but of course “later” rarely arrives. Even small amounts of technical debt compound as new code branches from old, further embedding the shortcomings into the system. At some point the debt reaches a tipping point where the high costs of maintenance prevent innovation.

Technical Debt—Dilbert Comic Strip on 2017-01-03

Martin Fowler provides a great summary of technical debt here, and offers additional resource links.

Intellectual debt

Intellectual debt follows a similar curve. It’s not a new concept, but most of us are probably unfamiliar with its meaning.

Intellectual debt can be defined as “the gap between what works and our knowledge of why it works”. Perhaps the most common examples lie in the pharmaceutical world; drugs are sometimes tested and approved based on statistically meaningful outcomes in patients, even though nobody knows how they work. “We have an answer; it seems to work. We’ll figure out the theory later.” But what happens when, as with technical debt, later never comes? We lose our understanding of how things work. Or worse: early successes may lull us into complacency, leading us to stop caring. Our ability to foresee interactions and conflicts suffers. We can’t predict when things might go wrong. And when the answers fail, we’re unable to make corrections quickly.

In the days of human-centric IT operations, domain experts filled the ranks of operations teams. The best of these helped their companies grow by improving business agility and customer experience, even before these became digital mantras. Business Darwinism naturally (and sometimes abruptly) took care of those that couldn’t keep up. The sudden lure of artificial intelligence (AI) and machine learning (ML) systems designed for IT brings new urgency to the topic of intellectual debt.

(For a thorough, non-IT article on intellectual debt, read Jonathan Zittrain’s “Intellectual Debt: With Great Power Comes Great Ignorance.”)

Intellectual debt and IT: A machine learning inflection point

Today’s APM landscape buzzes with both AI and machine learning messaging, and the terms are frequently used interchangeably. But beneath the hype, ML-based systems dominate vendor offerings; they’re relatively simple to retrofit to existing solutions and can sift through mountains of data to highlight relationships that might be interesting.

Machine learning systems suggest answers to open-ended or fuzzy questions; the greater (and more consistent) the pool of relevant data, the more accurate the answers. But while some scientific disciplines have many fuzzy edges—medicine, or perhaps facial recognition, are good examples—IT systems certainly do not. Computers are inherently logical and deterministic; for each observable outcome there’s a knowable specific cause.

Machine learning systems work—often quite well—through correlation, not causation. They can’t explain cause and effect because that’s not how they operate; instead, they point out varying degrees of correlation. In simpler terms, they apply heuristics rather than “thinking” logically. Treating IT systems as fuzzy problems, as machine learning does, offers real improvements in scale and accuracy over human analysis, but it isn’t the most effective approach, and it quietly builds intellectual debt.
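To make the correlation-versus-causation distinction concrete, here is a minimal, hypothetical sketch. All names and numbers are invented for illustration: a hidden root cause (slow database responses) drives two observable metrics, and a correlation-based monitor correctly reports that the metrics move together, yet it cannot say that neither one causes the other.

```python
import random

random.seed(42)

# Hidden root cause: database latency (ms), spiking every 10th interval.
# In a real environment this driver might not be instrumented at all.
db_latency = [20 + (80 if i % 10 == 0 else 0) + random.gauss(0, 2)
              for i in range(100)]

# Two observable symptoms, both driven by the same hidden cause.
api_response_ms = [d * 1.5 + random.gauss(0, 3) for d in db_latency]
queue_depth = [d * 0.4 + random.gauss(0, 1) for d in db_latency]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(api_response_ms, queue_depth)
print(f"correlation between symptoms: {r:.2f}")

# The monitor sees a strong correlation between API latency and queue
# depth, but neither metric causes the other; both stem from the
# unobserved database. Acting on the correlation alone (say, scaling
# the queue workers) treats a symptom and leaves the root cause intact.
```

The correlation here is nearly perfect, which is exactly the trap: a high correlation score reads like an explanation, but it encodes no causal direction, and the answer-without-theory gap it leaves behind is the intellectual debt.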

Don’t misunderstand; if machine learning can help you rapidly narrow your focus and solve complex problems, that’s not a bad thing—if your frame of reference is the manual, human intervention approach of Gen2 APM solutions. But machine learning systems have a myopic focus on answers without theory, conclusions without explanation. Zittrain points out that they “traffic in byzantine patterns with predictive utility, not neat articulations of relationships between cause and effect.” They’re like the lone IT hero who glances at a bunch of metric charts and conjures up an answer based on “gut feel” gained through experience over time. But by blindly accepting either of these—the IT hero or the machine learning magician—we accrue intellectual debt.

If machine learning systems can’t offer a reason for their answers, it’s difficult to know when they’ve misfired. As monitored environments become larger, more dynamic, and more heterogeneous, these misfires will become more common. In fact, the practice of suggesting multiple candidate answers could be considered a broad-brush way to improve the perception of accuracy—even though, without human intervention, actual accuracy decreases.

What does intellectual debt look like? In the next blog, we’ll look at a few examples of how intellectual debt might begin to accrue unnoticed, with an eye towards its impact on IT efficiency. We’ll also present the case that efficiency alone isn’t the best approach to judging value.

Further reading

Part 2: The impact of intellectual debt on IT operations

Part 3: Could intellectual debt derail your journey to the autonomous cloud?