The Systems Thinking Approach to Reliable AI

A practical look at using systems thinking to build trustworthy AI workflows with context, utilities, and feedback loops.

June 25, 2026

An LLM left on its own is a closed system, and a closed system is unpredictable. Systems thinking is how you turn it into something you can actually trust.

On a recent Snowflake migration, I told the model the same rule several times. One table, automation_sandbox.stage_prod, was for DDL reference only. Never a source. It mostly listened. Then somewhere in a thousand-line file it sourced that table once, fully qualified, and I did not catch it. The mistake sat quietly until dev integration, where it surfaced late and cost real time to unwind.

Here is the uncomfortable part. The model did what I asked most of the time, and “most of the time” is exactly the problem. A rule that holds ninety-five percent of the time is not a rule, it is a coin flip with good odds, and you find the five percent at the worst possible moment.

That miss is what pushed me toward a different way of working with AI. Not better prompts. A better system.

Everything Reduces to Inputs, Process, Outputs

There is a line that gets passed around in systems circles: the simplest form of any system is a series of inputs and outputs. It is a paraphrase of the Input-Process-Output model, and no single person owns it. It is foundational across systems theory, cybernetics, and computing.

It sounds almost too obvious to be useful. It is not.

Once you start seeing your work as inputs flowing through a process into outputs, you start noticing that the process is the part you have the least control over when an AI is in the loop.

The whole job, it turns out, is getting control of the middle.

The Five Things Every System Does

Systems theory is the framework for studying interconnected, interdependent components that form a complex whole. Five of its aspects are worth carrying with you, because every technique later in this piece is one of them applied to AI.

Emergence: the whole being greater than the sum of its parts.
Interdependence: changing one component affects the others.
Open versus closed: whether a system interacts with its environment or not.
Boundaries: what is inside the system and what is context.
Feedback: the mechanism by which a system holds or changes its structure.

That is the entire toolkit. The rest of this piece just shows those five at work in AI.

You Already Work in Systems, You Just Have Not Named Them

Think about how you actually do a piece of work. On a migration, I have a reusable approach: shim the bronze layer, migrate the pipelines, orchestrate them, deploy through CI/CD, validate the outputs. Inside “migrate the pipelines” sits a finer-grained procedure for delta control, eight steps I run without thinking about them anymore.

That nesting is systems work. The method is a system, and each step is itself a smaller system. The reason this matters is transferability: the eight-step procedure is muscle memory for me, so it lives nowhere but my head, and a procedure that lives in one head cannot be handed to another engineer, let alone to a model.

Writing it down is what makes it transferable at all. Externalizing tacit process is the prerequisite for everything that follows. Naming is also the first act of drawing a boundary: it fixes what is inside the system and what is merely context, which is the exact line a model needs in order to work on it.

You cannot hand a model a system you have never named.

An LLM On Its Own Is a Closed System

This is the reframe that changed how I work.

A model by itself processes within its own boundary. It does not interact with an environment, so its behavior stays probabilistic. That is genuinely fine for some workloads. LLMs are excellent at probabilistic tasks backed by a large corpus, things like document generation and code, and they are most dependable on work that sits well inside their training distribution.

But probabilistic is exactly why my migration rule slipped. A closed system has no environment to check itself against. So the fix is not to push harder inside the boundary by restating the rule louder. The fix is to open the system up and give the model an environment to interact with.

There are three ways to open it, and the rest of the techniques are just instances of these three:

Give it an environment to read (context and constraints)
Give it an environment to act on (utilities)
Give it an environment that responds (feedback)

Opening Move One: An Environment to Read

The cheapest, highest-leverage move is a plain markdown file of persistent instructions that the model reads at the start of every session. In the tooling I use, it is AGENTS.md. It carries boundaries and environment guidance in plain text.

This is the direct fix for my migration story. The rule about automation_sandbox.stage_prod held most of the time when I said it in the prompt. Put in AGENTS.md, it is in front of the model every session, every time, with no re-saying. The persistence is the whole point. A quick practitioner tip: use AGENTS.md as the standard and symlink CLAUDE.md to it, so you get one source of truth and no tool lock-in.
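The symlink tip is a one-time setup step. Here is a minimal Python sketch of it, assuming AGENTS.md already exists at the repository root; the helper name `link_claude_md` is hypothetical. It refuses to overwrite an existing CLAUDE.md, since that file may still hold rules that have not been merged into AGENTS.md.

```python
from pathlib import Path


def link_claude_md(repo_root: Path) -> Path:
    """Make CLAUDE.md a symlink to AGENTS.md so both names share one source of truth.

    Hypothetical helper; assumes AGENTS.md already exists in repo_root.
    """
    agents = repo_root / "AGENTS.md"
    claude = repo_root / "CLAUDE.md"
    if not agents.is_file():
        raise FileNotFoundError(f"{agents} does not exist")
    if claude.is_symlink() or claude.exists():
        # Do not overwrite: an existing CLAUDE.md may hold rules not yet in AGENTS.md.
        raise FileExistsError(f"{claude} already exists; merge it into AGENTS.md first")
    # A relative target keeps the link valid when the repository is cloned elsewhere.
    claude.symlink_to("AGENTS.md")
    return claude
```

Call it once from the repository root, for example `link_claude_md(Path.cwd())`, and commit the link. On Windows, creating symlinks can require Developer Mode or administrator rights.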

The same idea scales. A rule for one project becomes a reusable skill, a named set of steps the model loads when relevant. A set of skills becomes a plugin, a bundled unit you can ship to a team or an org.

That progression, from project rules to reusable steps to bundled units, is the path from a workflow that lives on your laptop to a capability your whole org can use.

Opening Move Two: An Environment to Act On

Some things should never be probabilistic. Loading a file, executing a known piece of SQL the same way every time, parsing a pipeline and generating column-level lineage. For those, you write a utility, a deterministic script the model can orchestrate but does not have to reason its way through.

This is interdependence in practice. The model orchestrates, the script guarantees. You take the part that must be exact off the model’s plate entirely, and you are left with the model doing what it is good at and the script doing what it is good at.
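Here is what a utility for my migration rule could look like: a deterministic check that flags any FROM or JOIN reading from automation_sandbox.stage_prod, while leaving DDL references such as a CREATE TABLE LIKE statement alone. It is a minimal Python sketch under stated assumptions, a regex check rather than a SQL parser, and the function and constant names are hypothetical.

```python
import re
from dataclasses import dataclass

# The rule from the migration story: this table is a DDL reference only, never a source.
FORBIDDEN_SOURCES = ("automation_sandbox.stage_prod",)


@dataclass(frozen=True)
class Violation:
    line: int   # 1-based line number in the SQL text
    table: str  # forbidden table that was read from
    text: str   # the offending line, stripped


def find_forbidden_sources(sql: str, forbidden=FORBIDDEN_SOURCES) -> list[Violation]:
    """Flag any FROM or JOIN that reads from a forbidden table.

    Regex sketch, not a SQL parser: it matches unquoted identifiers that follow
    FROM or JOIN on the same line, with an optional database prefix, case-insensitively.
    It does not understand quoted identifiers, comments, or string literals.
    """
    patterns = [
        (table, re.compile(
            r"\b(?:from|join)\s+(?:\w+\.)?" + re.escape(table) + r"\b",
            re.IGNORECASE,
        ))
        for table in forbidden
    ]
    violations = []
    for lineno, line in enumerate(sql.splitlines(), start=1):
        for table, pattern in patterns:
            if pattern.search(line):
                violations.append(Violation(lineno, table, line.strip()))
    return violations
```

Run from a pre-commit hook or CI step, a check like this would likely have caught the single fully qualified reference in the thousand-line file well before dev integration, and it hands back line numbers the model can act on instead of a vague warning.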

Opening Move Three: An Environment That Responds

The last move is feedback, and it is where reliability actually comes from. Individual techniques help, but the system holds together because of how the pieces connect over time.

A weak validation step returns pass or fail. A strong one returns specifics and routes them back to the step that can act on them. That is a feedback loop, and it is the difference between a system that silently fails and one that corrects itself.
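To make the weak versus strong distinction concrete, here is a minimal Python sketch that validates a migrated table's schema. It assumes the expected and actual schemas arrive as column-to-type dictionaries (for example, pulled from INFORMATION_SCHEMA.COLUMNS). The step names `migrate_pipelines` and `shim_bronze_layer` are hypothetical labels for steps in the migration method, and sending type mismatches to the bronze shim is an illustrative assumption, not a fixed rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    check: str     # which check produced the finding
    detail: str    # specific, actionable description of the problem
    route_to: str  # hypothetical name of the step that can fix it


def weak_validate(expected: dict[str, str], actual: dict[str, str]) -> bool:
    # Pass/fail only: the caller learns that something is wrong, not what or where.
    return expected == actual


def strong_validate(expected: dict[str, str], actual: dict[str, str]) -> list[Finding]:
    """Compare column name -> data type maps and return specific findings."""
    findings = []
    for col in sorted(expected.keys() - actual.keys()):
        findings.append(Finding("schema", f"missing column {col}", "migrate_pipelines"))
    for col in sorted(actual.keys() - expected.keys()):
        findings.append(Finding("schema", f"unexpected column {col}", "migrate_pipelines"))
    for col in sorted(expected.keys() & actual.keys()):
        if expected[col] != actual[col]:
            findings.append(Finding(
                "types",
                f"{col}: expected {expected[col]}, got {actual[col]}",
                "shim_bronze_layer",
            ))
    return findings


def route(findings: list[Finding]) -> dict[str, list[str]]:
    """Group findings by the step that should receive them, closing the loop."""
    inbox: dict[str, list[str]] = defaultdict(list)
    for f in findings:
        inbox[f.route_to].append(f.detail)
    return dict(inbox)
```

The weak version can only say "try again." The strong version tells each step exactly what to fix, which is what lets the loop close.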

If you look back at that eight-step delta procedure, the feedback was already in it: audit the entry, audit the output against source, do the counts make sense. The loop was there before any AI was involved. Naming it is what lets you rebuild it deliberately.
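The "do the counts make sense" audit is a natural candidate for a utility. A minimal Python sketch, assuming per-partition row counts have already been pulled from source and target (for example, with a GROUP BY on a load-date column); the function name and the choice of partition key are hypothetical. Like the strong validation step, it reports which partitions disagree and by how much rather than returning a single pass or fail.

```python
def reconcile_counts(source: dict[str, int], target: dict[str, int]) -> list[str]:
    """Compare per-partition row counts and describe every mismatch.

    source/target map a partition key (for example a load date) to a row count.
    An empty list means every partition matches.
    """
    problems = []
    for key in sorted(source.keys() | target.keys()):
        s, t = source.get(key), target.get(key)
        if s is None:
            problems.append(f"{key}: in target ({t} rows) but not in source")
        elif t is None:
            problems.append(f"{key}: in source ({s} rows) but missing from target")
        elif s != t:
            problems.append(f"{key}: source {s} rows, target {t} rows (diff {t - s:+d})")
    return problems
```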

Where This Breaks

None of this is a magic happy path, and the honest part is that every technique above has a failure mode. The good news is that these are not AI failures but systems failures, which means the same five aspects that built the system can also diagnose it.

A boundary goes stale. An AGENTS.md that no longer matches reality silently steers every session wrong, and because it is read every time, it is wrong every time.

Skills start to conflict. Two skills that both claim the same job produce nondeterministic selection, which is the exact opposite of what you built them for.

Feedback breaks. A validation step that only returns pass or fail cannot close the loop, so the system loses the one mechanism it had to correct itself.

A loop runs away. An agentic workflow with no termination condition does not converge, it just spins, because feedback with no damping is not feedback, it is noise.

Emergence cuts both ways. The same property that makes the whole greater than the sum of its parts is what lets a small bad instruction propagate into a large bad outcome. At scale, your skill quality becomes your output quality.

You can also over-constrain. Determinism bought by removing all judgment leaves you with an expensive script. The goal is reliable, not lobotomized.

Look at that list again and notice what it is made of: boundaries, interdependence, feedback, emergence. These are not new problems. They are the same five aspects from the start of this piece, showing up as failure instead of structure. The framework that builds the system is the framework that debugs it.
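Of these failure modes, the runaway loop is the simplest to guard against in code. Below is a minimal Python sketch of a generate-validate loop with three explicit exits: validation passes, the number of findings stops shrinking, or an iteration cap is reached. The callables `generate` and `validate` are hypothetical stand-ins for a model call and a validation utility, and "fewer findings than last round" is a deliberately crude damping rule, an assumption rather than a standard.

```python
from typing import Callable


def run_bounded_loop(
    generate: Callable[[list[str]], str],
    validate: Callable[[str], list[str]],
    max_iterations: int = 5,
) -> tuple[str, list[str], str]:
    """Generate, validate, and feed findings back until a termination condition fires.

    generate(feedback) stands in for a model call that receives the previous round's
    findings; validate(output) returns specific findings, where an empty list means pass.
    Returns (last_output, remaining_findings, reason_for_stopping).
    """
    output = ""
    feedback: list[str] = []
    previous_count = None  # findings in the last round, used as a crude damping signal
    for attempt in range(1, max_iterations + 1):
        output = generate(feedback)
        findings = validate(output)
        if not findings:
            return output, [], f"converged on attempt {attempt}"
        if previous_count is not None and len(findings) >= previous_count:
            # The loop is not making progress, so stop instead of spinning.
            return output, findings, f"no progress on attempt {attempt}"
        previous_count = len(findings)
        feedback = findings
    return output, feedback, f"hit max_iterations={max_iterations}"
```

Every exit returns a reason alongside the remaining findings, so a loop that stops early ends with something a person can act on rather than silent spinning.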

The Skill That Outlasts the Tools

It still reduces to inputs, process, and outputs. You just have a lot more control over what happens in the middle now.

That is the part worth holding onto. The tool names will change. AGENTS.md might be called something else in a year, and the specific frameworks will churn. The systems lens will not. Find the closed system and open it up: give it an environment to read, one to act on, and one that responds. That is the whole game, and it is the same game whether you are working with an AI or not.

If you’re looking to design AI systems that behave predictably, bring deterministic discipline to your LLM workflows, or build the context and feedback loops that make AI reliable in production, Hakkoda and IBM can help. Contact our team to start the conversation today.
