Field Notes

Reasoning vs Execution: Where AI Actually Fits in Automation

Durable automation systems separate AI-driven interpretation from deterministic execution instead of treating models as the entire workflow engine.

May 17, 2026 / AI Systems, Automation, Systems Design

For the past couple of years, I’ve noticed a pattern repeating itself across teams building AI-powered workflows.

Someone identifies a manual process that should be automated.

The first instinct is almost always:

Input → AI Model → Output

Maybe the workflow gets wrapped in a custom prompt (skill), an agent, or a chain of tools. At first, it feels magical. The demo works. The automation appears flexible. The possibilities seem endless.

Then reality starts to set in.

The workflow becomes slow. Outputs drift. Edge cases pile up. Reliability drops. Nobody fully understands why certain failures happen. The prompt grows from 20 lines to 400. Eventually, the system becomes harder to maintain than the original process it was supposed to replace.

This isn’t because AI is bad. It’s because we’re often using AI for the wrong parts of the system.

The most reliable automation systems I’ve seen don’t treat AI as the entire workflow engine. They separate reasoning from execution.

This distinction matters more than most people realize.

The Core Mistake: Using AI to Hide Broken Processes

One of the biggest traps in AI automation is using models as a band-aid for operational problems that already existed.

Messy workflows.
Inconsistent inputs.
Poorly defined business rules.
Tribal knowledge.
Weak data models.
Undocumented edge cases.

Instead of fixing the underlying process, teams wrap it in a prompt.

The system may initially appear more intelligent, but often it’s just obscuring complexity instead of removing it.

This creates what I’d call fuzzy infrastructure:

difficult to reason about
difficult to debug
difficult to test
difficult to scale

When a deterministic system fails, it usually produces a reproducible bug.

When an AI-driven system fails, it often produces probabilistic degradation:

inconsistent outputs
silent corruption
edge-case hallucinations
confidence without reliability

That difference becomes painfully obvious in production environments.

AI Is Excellent at Interpretation

The important nuance is that AI is incredibly valuable in automation systems, just not always where people first apply it.

AI excels at:

interpreting messy inputs
semantic classification
extracting structure from unstructured data
summarization
translation
identifying intent
generating candidate actions
filling gaps humans would normally reason through

In other words, AI is strongest where ambiguity genuinely exists.

But many workflows don’t actually contain much ambiguity.

A lot of enterprise workflows are fundamentally:

state management
business rules
orchestration
validation
deterministic execution

These are areas where traditional software engineering still massively outperforms AI systems in reliability, observability, and scalability.

The dangerous pattern looks like this:

Entire workflow → AI black box

More durable patterns look like this:

Messy input → AI interpretation → deterministic execution pipeline

Or, wherever possible:

Structured input → deterministic execution pipeline

That difference is subtle, but it completely changes the reliability profile of the system.
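As a minimal sketch of the hybrid pattern above: the model only produces a candidate intent, a deterministic validator decides whether that intent is acceptable, and ordinary code performs the action. Everything named here (Intent, ALLOWED_ACTIONS, validate, execute, run_pipeline, and the fake_interpret stand-in for a model call) is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical whitelist of actions the pipeline is allowed to perform.
ALLOWED_ACTIONS = {"refund", "cancel"}


@dataclass(frozen=True)
class Intent:
    action: str
    order_id: str


def validate(raw: dict) -> Intent:
    """Deterministic gate: reject anything outside the explicit schema."""
    action = raw.get("action")
    order_id = raw.get("order_id")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not isinstance(order_id, str) or not order_id.isdigit():
        raise ValueError(f"invalid order_id: {order_id!r}")
    return Intent(action=action, order_id=order_id)


def execute(intent: Intent) -> str:
    """Deterministic execution: the same intent always yields the same result."""
    handlers = {
        "refund": lambda i: f"refund issued for order {i.order_id}",
        "cancel": lambda i: f"order {i.order_id} cancelled",
    }
    return handlers[intent.action](intent)


def run_pipeline(message: str, interpret: Callable[[str], dict]) -> str:
    raw = interpret(message)  # probabilistic step (a model call in practice)
    intent = validate(raw)    # explicit, testable rules
    return execute(intent)    # no model involved past this point


def fake_interpret(message: str) -> dict:
    """Stand-in for a model call so the sketch runs offline (an assumption)."""
    digits = "".join(ch for ch in message if ch.isdigit())
    action = "refund" if "money back" in message.lower() else "cancel"
    return {"action": action, "order_id": digits}


print(run_pipeline("I want my money back for order 1042", fake_interpret))
# prints "refund issued for order 1042"
```

The point of the split is that a bad interpretation fails loudly at validate instead of silently reaching execute, and everything after the interpreter can be unit tested without a model.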

Deterministic Systems Compound Over Time

One reason traditional automation ages better is that deterministic systems compound operationally.

They become:

easier to understand
easier to monitor
easier to optimize
easier to debug
easier to scale

Failures are usually diagnosable. Rules are explicit. Outputs are reproducible.

Meanwhile, many AI-first workflows accumulate hidden complexity over time:

larger prompts
fragile assumptions
undocumented business logic
prompt chaining
model-specific behavior
vendor / tooling dependency
growing evaluation burden

Ironically, some “AI automations” end up recreating the same problems legacy systems had, just with less observability.

I’ve started thinking about some custom AI skills and prompt systems the same way people used to think about giant stored procedures in databases:

deeply coupled
hard to reason about
difficult to test
maintained by tribal knowledge

The technology is new.
The architectural mistakes are not.

The AI Tax Is Real

Every AI call introduces overhead.

Not just financially, but operationally.

Using AI adds:

latency
nondeterminism
prompt maintenance
model drift risk
evaluation complexity
infrastructure cost
vendor dependency
debugging challenges

That doesn’t mean AI isn’t worth using. It means AI should justify its complexity.

The question shouldn’t be:

“Can AI do this?”

The question should be:

“Is ambiguity intrinsic to this problem?”

If the workflow already has:

structured inputs
clear business rules
deterministic outputs
stable requirements

…then traditional software, such as a script, is often the better tool.
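To make that concrete, here is a small sketch of a workflow with no intrinsic ambiguity: routing an expense for approval by amount. The function name, tier names, and thresholds are all hypothetical. The point is only that explicit rules over structured input fit in a few lines of auditable code, with no model call to pay for.

```python
from decimal import Decimal


def approval_tier(amount: Decimal) -> str:
    """Route an expense by amount. The thresholds are illustrative assumptions."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if amount <= Decimal("500"):
        return "auto_approve"
    if amount <= Decimal("5000"):
        return "manager"
    return "finance_review"


print(approval_tier(Decimal("120.00")))   # prints "auto_approve"
print(approval_tier(Decimal("7500.00")))  # prints "finance_review"
```

Each branch is a boundary that can be tested exactly, which is hard to say about the same rule expressed in a prompt.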

AI Should Increase Clarity, Not Reduce It

One of the most important lessons I’ve learned is that AI works best when paired with high-quality systems design.

Good AI systems are built on:

clean interfaces
structured metadata
explicit workflows
observable state
strong evaluation pipelines
reliable data models

AI is not an alternative to maintaining good data.

If anything, it increases the importance of it.

A lot of organizations assume AI will compensate for inconsistent systems. In reality, poor data quality and broken workflows eventually surface through reliability issues, scaling problems, and operational debt.

The limitations may be temporarily masked but are rarely eliminated.

Backtesting and Evaluation Matter More Than Ever

Another issue I see constantly is teams treating AI workflows as inherently difficult to test, and using that as a reason to test them less.

This is backwards.

AI systems require more testing discipline, not less.

Many AI automations are evaluated emotionally:

“The demo looked good”
“It usually works”
“It handled my example”
“The outputs feel intelligent”

That isn’t enough for production systems.

Whether the workflow is deterministic, AI-driven, or hybrid, reliability requires:

replayable test cases
edge-case analysis
regression testing
benchmark datasets
confidence thresholds
backtesting
evaluation frameworks
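A replayable benchmark does not need heavy tooling. The sketch below assumes a tiny labeled dataset and a classifier passed in as a plain function. BENCHMARK, evaluate, and keyword_classifier are hypothetical names, and the keyword classifier merely stands in for whatever component (model-backed or not) is under test.

```python
from typing import Callable

# Hypothetical labeled cases: (input text, expected label).
BENCHMARK = [
    ("Please cancel my subscription", "cancel"),
    ("I was charged twice, refund me", "refund"),
    ("How do I reset my password?", "support"),
    ("Cancel the order and refund it", "cancel"),  # deliberate edge case
]


def evaluate(classify: Callable[[str], str], cases, min_accuracy: float) -> dict:
    """Replay every case and report accuracy plus the exact failures."""
    failures = []
    for text, expected in cases:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return {
        "accuracy": accuracy,
        "passed": accuracy >= min_accuracy,
        "failures": failures,
    }


def keyword_classifier(text: str) -> str:
    """Stand-in for the component under test (an assumption for this sketch)."""
    lowered = text.lower()
    if "refund" in lowered:
        return "refund"
    if "cancel" in lowered:
        return "cancel"
    return "support"


report = evaluate(keyword_classifier, BENCHMARK, min_accuracy=0.9)
print(report["accuracy"], report["passed"])  # prints "0.75 False"
for text, expected, got in report["failures"]:
    print(f"expected {expected!r}, got {got!r}: {text}")
```

Run on every prompt or model change, the same cases turn "it usually works" into a number that can regress visibly.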

One of the best uses of AI, ironically, is helping engineers rapidly build and iterate on deterministic systems that can actually be tested rigorously.

That’s increasingly become my preferred workflow:

Use AI to accelerate development
Use AI to identify edge cases
Use AI to interpret unstructured inputs
Use deterministic systems for execution
Escalate uncertain cases to humans

This tends to produce systems that are both flexible and reliable.
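The escalation step can be as simple as a threshold check between interpretation and execution. This sketch assumes the interpreter returns a confidence score between 0.0 and 1.0. Interpretation, route, and the 0.85 threshold are hypothetical, and in practice the threshold would come from backtesting, since a model-reported score is not guaranteed to be well calibrated.

```python
from dataclasses import dataclass

# Hypothetical cutoff; in practice, chosen from backtests rather than by feel.
CONFIDENCE_THRESHOLD = 0.85


@dataclass(frozen=True)
class Interpretation:
    label: str
    confidence: float  # assumed to lie in [0.0, 1.0]


def route(interp: Interpretation, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Send confident interpretations to execution and the rest to a person."""
    if not 0.0 <= interp.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {interp.confidence}")
    if interp.confidence >= threshold:
        return "execute"
    return "human_review"


print(route(Interpretation("refund", 0.97)))  # prints "execute"
print(route(Interpretation("refund", 0.60)))  # prints "human_review"
```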

A Practical Framework

The simplest framework I’ve landed on is this:

Use deterministic logic when:

rules are explicit
outputs must be auditable
workflows are repeatable
precision matters
failures are expensive
state consistency is critical
systems need to scale predictably

Use AI when:

inputs are unstructured
ambiguity is unavoidable
semantic interpretation is required
humans currently rely on judgment
edge cases are difficult to enumerate
flexibility matters more than exact precision

Use hybrid systems when:

inputs are messy but execution must be reliable
AI can normalize or classify before deterministic processing
workflows need human escalation paths
you want flexibility without sacrificing observability

In practice, the hybrid model is where I think most durable automation systems will land.

The Real Opportunity

AI should help systems understand. Deterministic infrastructure should help systems act reliably. This distinction matters because reasoning is probabilistic, whereas execution in critical workflows cannot be.

The future of automation probably isn’t fully deterministic systems or fully autonomous AI agents.

It’s systems that clearly separate reasoning from execution.