Field Notes
Reasoning vs Execution: Where AI Actually Fits in Automation
Durable automation systems separate AI-driven interpretation from deterministic execution instead of treating models as the entire workflow engine.
For the past couple of years, I’ve noticed a pattern repeating itself across teams building AI-powered workflows.
Someone identifies a manual process that should be automated.
The first instinct is almost always:
Input → AI Model → Output
Maybe the workflow gets wrapped in a custom prompt (skill), an agent, or a chain of tools. At first, it feels magical. The demo works. The automation appears flexible. The possibilities seem endless.
Then reality starts to set in.
The workflow becomes slow. Outputs drift. Edge cases pile up. Reliability drops. Nobody fully understands why certain failures happen. The prompt grows from 20 lines to 400. Eventually, the system becomes harder to maintain than the original process it was supposed to replace.
This isn’t because AI is bad. It’s because we’re often using AI for the wrong parts of the system.
The most reliable automation systems I’ve seen don’t treat AI as the entire workflow engine. They separate reasoning from execution.
This distinction matters more than most people realize.
The Core Mistake: Using AI to Hide Broken Processes
One of the biggest traps in AI automation is using models as a band-aid for operational problems that already existed.
Messy workflows.
Inconsistent inputs.
Poorly defined business rules.
Tribal knowledge.
Weak data models.
Undocumented edge cases.
Instead of fixing the underlying process, teams wrap it in a prompt.
The system may initially appear more intelligent, but often it’s just obscuring complexity instead of removing it.
This creates what I’d call fuzzy infrastructure:
- difficult to reason about
- difficult to debug
- difficult to test
- difficult to scale
When a deterministic system fails, it usually produces a reproducible bug.
When an AI-driven system fails, it often produces probabilistic degradation:
- inconsistent outputs
- silent corruption
- edge-case hallucinations
- confidence without reliability
That difference becomes painfully obvious in production environments.
AI Is Excellent at Interpretation
The important nuance is that AI is incredibly valuable in automation systems, just not always where people first apply it.
AI excels at:
- interpreting messy inputs
- semantic classification
- extracting structure from unstructured data
- summarization
- translation
- identifying intent
- generating candidate actions
- filling gaps humans would normally reason through
In other words, AI is strongest where ambiguity genuinely exists.
But many workflows don’t actually contain much ambiguity.
A lot of enterprise workflows are fundamentally:
- state management
- business rules
- orchestration
- validation
- deterministic execution
These are areas where traditional software engineering still massively outperforms AI systems in reliability, observability, and scalability.
The dangerous pattern looks like this:
Entire workflow → AI black box
More durable patterns look like this:
Messy input → AI interpretation → deterministic execution pipeline
Or, wherever possible:
Structured input → deterministic execution pipeline
That difference is subtle, but it completely changes the reliability profile of the system.
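Here is a minimal sketch of the hybrid pattern, assuming a hypothetical `interpret` function that stands in for the model call. It is stubbed with keyword matching so the example runs offline. The names `Interpretation`, `ALLOWED_INTENTS`, `validate`, `execute`, and `handle` are illustrative and not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical closed set of intents the deterministic layer knows how to handle.
ALLOWED_INTENTS = {"refund", "cancel", "unknown"}


@dataclass(frozen=True)
class Interpretation:
    intent: str
    order_id: Optional[str]


def interpret(message: str) -> Interpretation:
    """Stand-in for the AI step. A real system would call a model here;
    this stub uses keyword matching so the sketch runs offline."""
    text = message.lower()
    if "refund" in text:
        intent = "refund"
    elif "cancel" in text:
        intent = "cancel"
    else:
        intent = "unknown"
    # Take the first token that looks like an order id (e.g. "#1234").
    order_id = next(
        (tok.strip("#.,") for tok in text.split() if tok.startswith("#")), None
    )
    return Interpretation(intent, order_id)


def validate(result: Interpretation) -> Interpretation:
    """Boundary check: the interpreted output is untrusted until it passes here."""
    if result.intent not in ALLOWED_INTENTS:
        raise ValueError(f"unexpected intent: {result.intent!r}")
    return result


def execute(result: Interpretation) -> str:
    """Deterministic execution: the same interpretation always yields the same action."""
    if result.intent == "unknown" or result.order_id is None:
        return "escalate_to_human"
    return f"{result.intent}:{result.order_id}"


def handle(message: str) -> str:
    return execute(validate(interpret(message)))
```

The point of the split is that only `interpret` is allowed to be fuzzy. Everything after `validate` can be unit tested, logged, and replayed like ordinary code.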
Deterministic Systems Compound Over Time
One reason traditional automation ages better is that deterministic systems compound operationally.
They become:
- easier to understand
- easier to monitor
- easier to optimize
- easier to debug
- easier to scale
Failures are usually diagnosable. Rules are explicit. Outputs are reproducible.
Meanwhile, many AI-first workflows accumulate hidden complexity over time:
- larger prompts
- fragile assumptions
- undocumented business logic
- prompt chaining
- model-specific behavior
- vendor / tooling dependency
- growing evaluation burden
Ironically, some “AI automations” end up recreating the same problems legacy systems had, just with less observability.
I’ve started thinking about some custom AI skills and prompt systems the same way people used to think about giant stored procedures in databases:
- deeply coupled
- hard to reason about
- difficult to test
- maintained by tribal knowledge
The technology is new.
The architectural mistakes are not.
The AI Tax Is Real
Every AI call introduces overhead.
Not just financially, but operationally.
Using AI adds:
- latency
- nondeterminism
- prompt maintenance
- model drift risk
- evaluation complexity
- infrastructure cost
- vendor dependency
- debugging challenges
That doesn’t mean AI isn’t worth using. It means AI should justify its complexity.
The question shouldn’t be:
“Can AI do this?”
The question should be:
“Is ambiguity intrinsic to this problem?”
If the workflow already has:
- structured inputs
- clear business rules
- deterministic outputs
- stable requirements
…then traditional software, such as a script, is often the better tool.
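As an illustration of a workflow with no intrinsic ambiguity, here is a sketch of a made-up expense approval rule. The limits and route names are assumptions chosen for the example. Nothing in it benefits from a model call.

```python
from decimal import Decimal

# Hypothetical business rule: expenses up to AUTO_LIMIT are approved automatically,
# expenses up to MANAGER_LIMIT go to a manager, and anything larger goes to finance.
AUTO_LIMIT = Decimal("100.00")
MANAGER_LIMIT = Decimal("1000.00")


def approval_route(amount: Decimal) -> str:
    """Pure function: the same amount always yields the same route."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if amount <= AUTO_LIMIT:
        return "auto_approve"
    if amount <= MANAGER_LIMIT:
        return "manager"
    return "finance"
```

A rule like this is cheap to run, trivially auditable, and covered by a handful of boundary tests.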
AI Should Increase Clarity, Not Reduce It
One of the most important lessons I’ve learned is that AI works best when paired with high-quality systems design.
Good AI systems are built on:
- clean interfaces
- structured metadata
- explicit workflows
- observable state
- strong evaluation pipelines
- reliable data models
AI is not an alternative to maintaining good data.
If anything, it makes good data more important.
A lot of organizations assume AI will compensate for inconsistent systems. In reality, poor data quality and broken workflows eventually surface through reliability issues, scaling problems, and operational debt.
The limitations may be temporarily masked but are rarely eliminated.
Backtesting and Evaluation Matter More Than Ever
Another issue I see constantly is teams treating AI workflows as inherently difficult to test, and therefore testing them less.
This is backwards.
AI systems require more testing discipline, not less.
Many AI automations are evaluated emotionally:
- “The demo looked good”
- “It usually works”
- “It handled my example”
- “The outputs feel intelligent”
That isn’t enough for production systems.
Whether the workflow is deterministic, AI-driven, or hybrid, reliability requires:
- replayable test cases
- edge-case analysis
- regression testing
- benchmark datasets
- confidence thresholds
- backtesting
- evaluation frameworks
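A few of the practices in this list (replayable test cases, a benchmark dataset, regression testing) can be sketched as a small harness. The `BENCHMARK` data, the stubbed `classify` step, and the function names are assumptions for illustration. In a real setup the step under test could be a model call and the cases would come from recorded production inputs.

```python
from typing import Callable

# Hypothetical benchmark: (input, expected label) pairs captured from past runs.
BENCHMARK = [
    ("I want my money back", "refund"),
    ("Please cancel my subscription", "cancel"),
    ("What are your opening hours?", "other"),
]


def classify(text: str) -> str:
    """Stand-in for the step under test (AI-driven or deterministic)."""
    lowered = text.lower()
    if "money back" in lowered or "refund" in lowered:
        return "refund"
    if "cancel" in lowered:
        return "cancel"
    return "other"


def replay(step: Callable[[str], str], cases: list[tuple[str, str]]) -> dict:
    """Run every recorded case through the step and collect mismatches."""
    failures = []
    for text, expected in cases:
        got = step(text)
        if got != expected:
            failures.append({"input": text, "expected": expected, "got": got})
    accuracy = 1 - len(failures) / len(cases)
    return {"accuracy": accuracy, "failures": failures}


def check_regression(report: dict, baseline_accuracy: float) -> bool:
    """Pass only if accuracy has not dropped below the recorded baseline."""
    return report["accuracy"] >= baseline_accuracy
```

Running `replay(classify, BENCHMARK)` after every prompt, model, or rule change turns "it usually works" into a number that can be compared against the previous run.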
One of the best uses of AI, ironically, is helping engineers rapidly build and iterate on deterministic systems that can actually be tested rigorously.
That’s increasingly become my preferred workflow:
- Use AI to accelerate development
- Use AI to identify edge cases
- Use AI to interpret unstructured inputs
- Use deterministic systems for execution
- Escalate uncertain cases to humans
This tends to produce systems that are both flexible and reliable.
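The last two steps of that workflow, deterministic execution plus human escalation, might look like the sketch below. It assumes the AI step returns a label with a confidence score in [0, 1]. The `Classification` type, the `route` function, and the 0.85 cutoff are all hypothetical. A real threshold would be tuned against a benchmark dataset.

```python
from dataclasses import dataclass

# Assumed cutoff for illustration only.
CONFIDENCE_THRESHOLD = 0.85


@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float  # assumed to be in [0, 1], supplied by the AI step


def route(result: Classification, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Send confident results to automation and everything else to a person."""
    if not 0.0 <= result.confidence <= 1.0:
        # Malformed scores (including NaN) are treated as uncertain, not trusted.
        return "human_review"
    if result.confidence >= threshold:
        return f"auto:{result.label}"
    return "human_review"
```

Routing on an explicit threshold keeps the escalation policy in code, where it can be reviewed and tested, rather than buried in a prompt.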
A Practical Framework
The simplest framework I’ve landed on is this:
Use deterministic logic when:
- rules are explicit
- outputs must be auditable
- workflows are repeatable
- precision matters
- failures are expensive
- state consistency is critical
- systems need to scale predictably
Use AI when:
- inputs are unstructured
- ambiguity is unavoidable
- semantic interpretation is required
- humans currently rely on judgment
- edge cases are difficult to enumerate
- flexibility matters more than exact precision
Use hybrid systems when:
- inputs are messy but execution must be reliable
- AI can normalize or classify before deterministic processing
- workflows need human escalation paths
- you want flexibility without sacrificing observability
In practice, the hybrid model is where I think most durable automation systems will land.
The Real Opportunity
AI should help systems understand. Deterministic infrastructure should help systems act reliably. This distinction matters because reasoning is probabilistic, whereas execution in critical workflows cannot be.
The future of automation probably isn’t fully deterministic systems or fully autonomous AI agents.
It’s systems that clearly separate reasoning from execution.