This article did not start as an abstract attempt to invent yet another standard.
It came from practice.
I use AI agents a lot in software development, documentation, project research, knowledge-vault maintenance, task planning, and sometimes even ordinary work scenarios. The more I work with them, the more one repeated problem becomes visible.
We often think we are talking about the same thing.
In practice, we often mean different things.
There are more AI-assisted development tools now. More agents. More copilots. More automation layers. Many products talk about workflows, memory, context, tools, permissions, approvals, MCP, agents, and human-in-the-loop systems.
But a shared language is still missing.
One tool says “agent” and means a chat interface with repository access. Another says “agent” and means an autonomous worker. A third says “workflow,” but that may mean a prompt chain, a CI pipeline, a task plan, or just a nice diagram. “Memory” can mean chat history, a vector database, project notes, a user profile, or organizational knowledge. “Permission” can mean a system prompt, an approve button, a GitHub token, a policy file, or an informal team agreement.
As long as one person works with one assistant, this is tolerable.
But when work involves several people, several agents, several repositories, several tools, and real actions inside a project, the problem becomes serious.
At that point, the question is no longer only how smart the model is.
The question is how the work itself is described.
The Problem Is Not Only Agents
In AI developer tooling, we often jump too quickly to the question: “What can this agent do?”
That question matters. But more boring and more fundamental questions come before it.
Who participates in the project? What is each actor responsible for? Which context sources may be read? Which ones are off-limits? What memory can survive after the task? Which actions can an agent only suggest, and which actions can it perform? Where is review required? Who approves risky changes? What counts as a handoff? What should appear in the audit trail?
If these things are not described explicitly, the system may still work.
It will simply work through habits, hidden prompts, tool-specific settings, and human memory.
That is fine in the beginning. It does not scale well.
When AI becomes part of engineering work, the missing layer is not another beautiful UI or another list of prompts.
The missing layer is something that can be read, discussed, checked, compared, and moved between tools.
In simpler terms: a specification layer.
Why I Started NexFlow
I started NexFlow as an attempt to create that common language.
NexFlow is an open specification-first project for describing AI developer teams.
The repository is public here:
https://github.com/iwizy/NexFlow
The boundary is important: NexFlow is not an AI coding agent, an LLM wrapper, a chat application, or a production runtime.
At this stage, it consists of a specification, documentation, JSON Schemas, reference examples, and an RFC process.
I wanted to start with the language, not with the runtime.
If you start with a runtime too early, it is easy to hard-code your current preferences into it: a specific provider, a specific memory model, a specific permission model, a specific workflow. Then the specification becomes just a description of one product.
I wanted the opposite.
First, describe the language.
How can a team declaratively describe a project, agents, tasks, handoffs, permissions, capabilities, context sources, memory scopes, providers, events, and extensions? How can this stay simple enough to read, but structured enough to validate with schemas and maybe execute later?
The current version is still an early draft.
But it is already useful as a way to discuss the work more precisely.
Capability Is Not Permission
One of the central distinctions in NexFlow is capability versus permission.
A capability is what an actor can technically do.
For example: read a repository, modify a file, create a pull request, execute a command, access Linear, call an MCP server, or read documentation.
A permission is a policy decision: whether that action is allowed, denied, or approval-gated.
On paper, this distinction sounds obvious. In real AI tools, it is often blurred.
If an agent can technically call a tool, does that mean it is allowed to call it? If it can read a GitHub issue, can it edit the issue? If it can see documentation, can it save the result as long-term memory? If it has local filesystem access, can it write anywhere?
In a serious system, the answer should not depend on the mood of the current chat.
It should be declared explicitly. A capability should not automatically authorize action.
That is a small distinction, but a more governable system can be built around it.
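A minimal Python sketch of this distinction follows. The names `Decision` and `decide`, and the deny-by-default rule for actions without an explicit permission, are assumptions made for illustration; they are not taken from the NexFlow draft.

```python
from enum import Enum


class Decision(Enum):
    ALLOWED = "allowed"
    REQUIRES_APPROVAL = "requires_approval"
    DENIED = "denied"


def decide(action: str, capabilities: set[str], permissions: dict[str, Decision]) -> Decision:
    """Return the policy decision for an action.

    Assumption: a capability alone never authorizes anything, so an action
    without an explicit permission entry is denied by default.
    """
    if action not in capabilities:
        # The actor cannot technically perform the action at all.
        return Decision.DENIED
    return permissions.get(action, Decision.DENIED)


# Hypothetical actor: it can technically edit files, but no policy covers that yet.
capabilities = {"read_repository", "edit_files", "push_changes"}
permissions = {
    "read_repository": Decision.ALLOWED,
    "push_changes": Decision.REQUIRES_APPROVAL,
}

for action in ("read_repository", "edit_files", "push_changes"):
    print(action, "->", decide(action, capabilities, permissions).value)
```

Here `edit_files` is a declared capability, yet the sketch denies it because no permission was written down. A real policy could choose a different default.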
Context Should Be Explicit Too
An AI agent does not work in empty space.
It reads repositories, tasks, documentation, design files, issue trackers, knowledge bases, decision history, Obsidian vaults, Linear projects, Figma files, MCP servers, web sources, and local files.
But context is not just “give the model more text.”
Context has a source, access mode, freshness, classification, owner, limitations, and risk level.
Reading a public README is one thing. Reading an internal roadmap is another. Opening production logs is another. Using personal data is another. Persisting the result into long-term memory is something else again.
If these sources are not described, the AI system begins to operate in a fog.
It feels as if the agent “knows the project.”
But where did that knowledge come from? Is it current? Is it allowed? Who authorized the access? What will be retained after the task?
In NexFlow, context sources should be declared explicitly.
Not because that looks elegant.
Because without it, security, quality, and responsibility are hard to discuss honestly.
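As a rough illustration, a declared context source could be modeled as a small record that a reviewer or a script can inspect. The field names, the 90-day freshness threshold, and the `review_notes` helper are hypothetical, and the sketch covers only some of the attributes listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContextSource:
    name: str
    access_mode: str      # e.g. "read" or "read_write"
    classification: str   # e.g. "public", "internal", "restricted"
    owner: str
    last_verified: date   # a simple stand-in for freshness
    may_persist: bool     # may results be written to long-term memory?


def review_notes(source: ContextSource, today: date, max_age_days: int = 90) -> list[str]:
    """Collect points a human reviewer may want to look at (assumed rules)."""
    notes = []
    age = (today - source.last_verified).days
    if age > max_age_days:
        notes.append(f"{source.name}: last verified {age} days ago")
    if source.classification != "public" and source.may_persist:
        notes.append(f"{source.name}: non-public content may be persisted to memory")
    return notes


roadmap = ContextSource(
    name="internal_roadmap",
    access_mode="read",
    classification="internal",
    owner="product_team",
    last_verified=date(2025, 1, 10),
    may_persist=True,
)

for note in review_notes(roadmap, today=date(2025, 6, 1)):
    print(note)
```

The check itself matters less than the fact that questions like "is it current?" and "what will be retained?" now have a place where they can be answered.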
Memory Should Not Be Magical
Everyone wants AI to “remember the context.”
I want that too. But memory without boundaries quickly becomes a junk drawer, a privacy risk, and a source of strange decisions. Especially when an agent carries context from one task into another, from one project into another, or from one user into another.
So memory should not be described as a vague “let the agent remember.”
It should be scoped.
Ephemeral memory for the current interaction. Task memory for a specific task. Project memory for a project. Team memory for a team. User memory for an individual. Organization memory for an organization.
Each scope should have retention, ownership, visibility, update rules, sensitivity, and allowed consumers.
That sounds bureaucratic only until an agent saves the wrong thing, uses it in the wrong place, or confidently relies on outdated information.
Memory should be useful.
It should not be uncontrolled.
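One way to picture scoped memory is as a policy record attached to each scope. This is only a sketch: `MemoryPolicy`, its fields, and the `may_read` check are assumptions for illustration, and the sketch omits visibility and update rules.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryScope(Enum):
    EPHEMERAL = "ephemeral"
    TASK = "task"
    PROJECT = "project"
    TEAM = "team"
    USER = "user"
    ORGANIZATION = "organization"


@dataclass(frozen=True)
class MemoryPolicy:
    scope: MemoryScope
    owner: str
    retention_days: int                # assumed convention: 0 means "discard after the interaction"
    allowed_consumers: frozenset[str]  # actors that may read this memory
    sensitive: bool = False


def may_read(policy: MemoryPolicy, consumer: str) -> bool:
    """Only consumers listed explicitly may read the memory."""
    return consumer in policy.allowed_consumers


task_memory = MemoryPolicy(
    scope=MemoryScope.TASK,
    owner="implementation_agent",
    retention_days=7,
    allowed_consumers=frozenset({"implementation_agent", "human_reviewer"}),
)

print(may_read(task_memory, "human_reviewer"))  # True
print(may_read(task_memory, "support_agent"))   # False
```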
Handoff Matters More Than It Seems
In a normal team, we intuitively understand how work moves from one person to another.
An analyst prepares requirements. A developer implements. QA checks. A tech lead reviews risk. A manager makes a decision.
In AI-assisted work, handoff becomes even more important.
If an implementation agent finishes a task, what exactly does it hand over to a reviewer? Which artifacts are involved? What are the acceptance criteria? What remains blocked? What was tested? What was not tested? Which decisions were made? Where is human judgment required?
Without a proper handoff, work becomes a stream of messages.
With a proper handoff, the work has state.
And state can be read, reviewed, automated, and retained.
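To make "the work has state" concrete, here is a small sketch of a handoff record with a completeness check. The `Handoff` fields and the choice of required fields are hypothetical; the sketch assumes that `None` means "not stated" while an empty list means "stated, and there is nothing to report".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handoff:
    sender: str
    reviewer: str
    summary: str = ""
    changed_files: Optional[list[str]] = None
    tests_run: Optional[list[str]] = None
    not_tested: Optional[list[str]] = None
    open_risks: Optional[list[str]] = None


# Assumed set of fields a reviewer needs before accepting the work.
REQUIRED = ("summary", "changed_files", "tests_run", "open_risks")


def missing_fields(handoff: Handoff) -> list[str]:
    """Return required fields that were never filled in."""
    return [name for name in REQUIRED if getattr(handoff, name) in (None, "")]


handoff = Handoff(
    sender="implementation_agent",
    reviewer="human_reviewer",
    summary="Add retry logic to the upload client",
    changed_files=["client/upload.py"],
    tests_run=["unit"],
)

print(missing_fields(handoff))  # ['open_risks']
```

An explicit `open_risks=[]` would pass the check, because it records a decision rather than an omission.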
Why Specification-First
It is fair to ask: why not build a CLI or runtime immediately?
Because a runtime without a clear model starts making architectural decisions too early.
It decides how state is stored. How agents are named. How permissions work. How memory is written. How providers are selected. How approvals happen. How events are logged.
Too often, these decisions remain inside one product.
I am interested in a different layer.
A layer that is useful before a runtime exists.
A team should be able to describe a project with manifests: who participates, which agents exist, what they can do, which context sources are available, where approvals are required, which memory is allowed, and which events should be audited.
Even if nothing is executed automatically, that configuration is already useful as reviewable documentation.
The runtime can come later.
Not the other way around.
What Exists Now
NexFlow currently has a draft 0.1 manifest vocabulary.
The core file set includes:
project.yaml, agents.yaml, agent-definitions.yaml, workflow.yaml, tasks.yaml, handoffs.yaml, permissions.yaml, capabilities.yaml, context.yaml, memory.yaml, providers.yaml, model-profiles.yaml, prompt-sets.yaml, retrieval-profiles.yaml, events.yaml, and extensions.yaml.
There is documentation for the core concepts, manifest reference, context model, memory model, autonomy model, capability model, handoff protocol, event model, agent definitions, model profiles, prompt sets, retrieval profiles, extensions, provider abstraction, security, governance, validation, and conformance.
There are practical draft JSON Schemas for the manifests.
There are reference examples for minimal, software, startup, enterprise, and product delivery teams.
There is an RFC process, with accepted project-vision and core-manifest RFCs and several active draft RFCs around conformance, validation, extension namespaces, approval gates, and agent definition versioning.
And there is an important limitation: this is not a runtime yet.
NexFlow does not execute workflows. It does not call providers. It does not run agents. It does not persist production memory. It also does not provide a production CLI yet.
That is intentional.
First the language. Then validation. Then runtime decisions.
What This Means In Practice
The practical question matters too: what does a developer or team get from this description if the runtime does not execute workflows yet?
The answer is simple: a reviewable artifact.
Instead of discussing an AI-assisted workflow at the level of “the agent helps,” the team can describe the system explicitly.
For example:
actor:
  role: implementation_agent

capabilities:
  - read_repository
  - edit_files
  - run_tests

permissions:
  edit_files: allowed
  push_changes: requires_approval
  delete_data: denied

context:
  sources:
    - repository
    - issue_tracker
    - project_docs

memory:
  scope: task
  retention: temporary

handoff:
  reviewer: human_reviewer
  requires:
    - summary
    - changed_files
    - tests_run
    - open_risks
This is not a final schema and not a promise of runtime behavior. It is an example of the kind of explicitness that matters.
With a description like this, a team can discuss concrete boundaries rather than an abstract “AI agent”: what it can read, what it can change, where approval is required, what memory is retained, what must be handed to the reviewer, and which risks remain open.
Even if the workflow is still executed manually, such a description is useful.
It helps compare tools, prepare reviews, explain security models, design handoffs, and avoid mixing technical capability with organizational permission.
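Even without a runtime, a description like the example above can be checked mechanically. The sketch below copies the capabilities and permissions from that example into a plain Python dictionary and lists points a reviewer might want to discuss. The `review_findings` function and its rules are assumptions for illustration, not NexFlow validation logic.

```python
VALID_DECISIONS = {"allowed", "requires_approval", "denied"}

# Values copied from the example above; the dictionary layout is an assumption.
manifest = {
    "capabilities": ["read_repository", "edit_files", "run_tests"],
    "permissions": {
        "edit_files": "allowed",
        "push_changes": "requires_approval",
        "delete_data": "denied",
    },
}


def review_findings(manifest: dict) -> list[str]:
    """List discussion points where capabilities and permissions do not line up."""
    capabilities = set(manifest.get("capabilities", []))
    permissions = manifest.get("permissions", {})
    findings = []

    # Capabilities that no policy decision covers.
    for action in sorted(capabilities - set(permissions)):
        findings.append(f"capability '{action}' has no explicit permission")

    for action, decision in sorted(permissions.items()):
        if decision not in VALID_DECISIONS:
            findings.append(f"permission '{action}' has unknown decision '{decision}'")
        elif action not in capabilities and decision != "denied":
            # A non-denied permission for something the actor cannot do.
            findings.append(f"permission '{action}' is '{decision}' but no capability is declared")
    return findings


for finding in review_findings(manifest):
    print(finding)
```

Whether each finding is a real problem is a policy choice. The point is that the conversation starts from an explicit artifact rather than from assumptions about what the agent can do.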
Why This May Matter Later
I do not think every team will start writing AI-team manifests tomorrow.
That is not how these things usually happen.
But I think the need for such a language will grow.
While an AI agent behaves like a personal assistant, chat is enough. But when AI agents become part of development, support, analysis, QA, documentation, release processes, and product delivery, companies will want to know what is actually happening.
Who had access to what.
Why an agent was allowed to perform an action.
Who approved a risky operation.
Which context was used.
Which memory was written.
Why work moved from one actor to another.
What was done automatically, and what was only suggested.
Without a shared language, each tool answers these questions in its own way.
With a shared language, there is a chance for portability, auditability, comparison, and a calmer evolution of AI-assisted engineering.
The Main Point
There is no magic in NexFlow.
Honestly, I like that.
Good foundational systems often look boring at first. They do not promise to replace a team in a week. They do not say an agent will “just do everything.” They give a way to describe reality more precisely.
In AI, that matters especially.
The stronger models and tools become, the more important it becomes not only to do something, but to understand who did it, why it was allowed, which context was used, and under whose responsibility it happened.
NexFlow is my attempt to start from that side.
Not from yet another agent.
From a language for describing AI developer teams before they act.
The project is open on GitHub:
https://github.com/iwizy/NexFlow
It is an early draft.
But if we want AI to become a normal part of engineering work, we will need to agree not only on models and tools.
We will need to agree on how to describe the work itself.
