I recently spent some time looking at the current state of coding agents. I found that they are already useful for investigation and some implementation work. In the MPS context, however, their usefulness depends heavily on the tools we give them. Without tools, the agents are capable of exploration and simple edits, but not serious autonomous work.

I have tried Claude Code (Opus 4.7), Codex (GPT 5.5), and Copilot (Opus 4.6). Overall, in my rather simple usage scenarios, the model was more important than the agent harness.

One useful shift in my experience was realizing that I did not have to use an agent only by asking it questions or assigning it coding tasks. I could also ask it to interview me, which made it much better at extracting missing context before doing any work. A simple /grill-me skill from Matt Pocock turns an agent into a surprisingly thorough interviewing machine. (The skill seems to work better with Claude Code than with Codex.) This is most useful when building new functionality with an agent: the interview aligns the agent with what you want without you having to write a fully detailed prompt up front.

Another thing I realized was that coding agents are very resourceful. I asked Claude Code to investigate an mbeddr build after a Renovate version update: it was failing on TeamCity but passing for me locally. Given only a link to the TeamCity build, Claude downloaded the logs and looked up the jars of the updated library in the Gradle cache on my machine. It then extracted a class file from the jar and ran a hex dump on it to check the class file version. Finally, it compared my local build with the TeamCity build and pinpointed the root cause: a transitive dependency was being resolved to a different version locally than on TeamCity.
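
The check Claude did with a hex dump is easy to reproduce. A Java class file starts with the magic number 0xCAFEBABE, followed by a 2-byte minor version and a 2-byte major version, all big-endian. Below is a minimal Go sketch that reads these fields from the first eight bytes; the function names are my own and not part of any tool mentioned here.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// classFileVersion reads the version fields from the first eight bytes of a
// Java class file: the magic number 0xCAFEBABE, a 2-byte minor version and a
// 2-byte major version, all stored big-endian.
func classFileVersion(header []byte) (major, minor uint16, err error) {
	if len(header) < 8 {
		return 0, 0, errors.New("header too short")
	}
	if binary.BigEndian.Uint32(header[0:4]) != 0xCAFEBABE {
		return 0, 0, errors.New("not a class file: bad magic number")
	}
	minor = binary.BigEndian.Uint16(header[4:6])
	major = binary.BigEndian.Uint16(header[6:8])
	return major, minor, nil
}

// javaRelease converts a major version to a Java release number.
// The mapping release = major - 44 holds for Java 5 (major 49) and later.
func javaRelease(major uint16) int {
	return int(major) - 44
}

func main() {
	// The first eight bytes of a class file compiled for Java 17 (major 61 = 0x3D).
	header := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x3D}
	major, minor, err := classFileVersion(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("major=%d minor=%d -> Java %d\n", major, minor, javaRelease(major))
}
```

In practice you would pass in the bytes of the .class file extracted from the jar. javap -v on the same file prints the major version as well, which is often the quicker route when a JDK is at hand.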

Agents can also analyze thread dumps and stack traces well, especially if they can look at the relevant code. Their resourcefulness shows up there too: don’t be surprised if an agent clones a repository from GitHub just to look at a single file, or falls back to curl when you haven’t given it access to Git or the GitHub CLI.

I have since asked Claude and Codex to investigate a few more bugs and failing builds, such as a difference in classloading between docx4j 11.4 and 11.5 and a deadlock in MPS. They were often able to pinpoint the cause much faster than I would have, saving me hours of work.

The MPS baseline

Several people have mentioned in the MPS forum that recent agents can manipulate XML files “directly and correctly”. I was intrigued and wanted to understand the baseline for myself, i.e. what the agents can do out of the box. I chose a narrow scenario: user projects, meaning projects where previously developed MPS languages are used to model something, not projects where the languages themselves are being developed.

Agents did a good job of exploring the repository by directly reading the MPS model files, which are stored as XML. I would tell the agent to explore the repository with the intention of making a certain change and ask it to point me to a good place to make that change. The agent would then explore the models, using Python scripts to extract bits of information from the files instead of reading entire files into its context. The only issue was that after gathering all that information, the agent would start talking to me cryptically.

MPS model files begin with “registry” entries that serve as a dictionary: they map the fully qualified names and IDs of concepts, roles, properties, and links to short indices. For example, the DotExpression concept from BaseLanguage gets the index 2OqwBi, which is then used throughout the file for every dot expression. So the agent would tell me that to implement my request, it was necessary to change the 2YfkLk of THw35q and insert a 2EnYce into it. The LLM apparently assumed that because these indices appear so often in the XML, they must be meaningful to humans.
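
To make the registry idea concrete, here is a minimal Go sketch that builds an index-to-name dictionary from it. The XML layout it expects (registry, language, concept, property, child and reference elements with name and index attributes) is my assumption of a simplified MPS persistence format, and the sample is hypothetical apart from the DotExpression index quoted above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// regEntry is a property, child or reference declaration inside a concept.
type regEntry struct {
	Name  string `xml:"name,attr"`
	Index string `xml:"index,attr"`
}

// regConcept is a concept declaration together with its features.
type regConcept struct {
	Name       string     `xml:"name,attr"`
	Index      string     `xml:"index,attr"`
	Properties []regEntry `xml:"property"`
	Children   []regEntry `xml:"child"`
	References []regEntry `xml:"reference"`
}

type regLanguage struct {
	Concepts []regConcept `xml:"concept"`
}

// modelFile maps only the registry section; everything else is ignored.
type modelFile struct {
	Languages []regLanguage `xml:"registry>language"`
}

// buildDictionary maps every short index in the registry to a readable name:
// concepts to their fully qualified name, features to "<concept>.<feature>".
func buildDictionary(data []byte) (map[string]string, error) {
	var m modelFile
	if err := xml.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	dict := make(map[string]string)
	for _, lang := range m.Languages {
		for _, c := range lang.Concepts {
			dict[c.Index] = c.Name
			for _, group := range [][]regEntry{c.Properties, c.Children, c.References} {
				for _, e := range group {
					dict[e.Index] = c.Name + "." + e.Name
				}
			}
		}
	}
	return dict, nil
}

// Hypothetical, heavily simplified model excerpt. Only the DotExpression
// index comes from the text; the ids and child indices are made up.
const sample = `<model ref="r:00000000-0000-0000-0000-000000000000(example.model)">
  <registry>
    <language id="00000000-0000-0000-0000-000000000001" name="jetbrains.mps.baseLanguage">
      <concept id="1" name="jetbrains.mps.baseLanguage.structure.DotExpression" index="2OqwBi">
        <child id="2" name="operand" index="AAAAA1" />
        <child id="3" name="operation" index="AAAAA2" />
      </concept>
    </language>
  </registry>
  <node concept="2OqwBi" id="AAAAA3" />
</model>`

func main() {
	dict, err := buildDictionary([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, idx := range []string{"2OqwBi", "AAAAA1", "AAAAA2"} {
		fmt.Printf("%s = %s\n", idx, dict[idx])
	}
}
```

With such a dictionary at hand, "change the 2YfkLk of THw35q" can be translated back into concept and role names before anything is shown to a human.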

The agent would also offer to perform the modifications it suggested. This worked well for simple cases: modifying a property value is trivial, as are simple changes such as adjusting a few paths in a build language model. However, because it did not know how MPS node IDs are generated, the agent would make up human-readable IDs for new nodes, such as requirement001 and requirement002. MPS then failed to parse these IDs, and the resulting model could not be loaded.

So the baseline was mixed: agents could inspect MPS XML and make small edits, but they did not understand enough of the model format to explain their findings in human terms or create valid new nodes reliably.

Since the agent was already happy to write its own Python scripts, I assumed it would fare even better with ready-made tools for working with MPS models. So that was my next step.

Basic tooling

My first attempt at tooling let the agents “decompress” or “expand” MPS files: the tool filters the XML, adding next to each use of an index the fully qualified name of the thing it represents. I also added a command to list all models in a subdirectory, because I found that the LLM struggled to figure out where on the file system a particular model, identified only by its reference, was located.

This alone improved the agents’ ability to explore the projects. I called the tool mops (short for “MPS OPerationS”) and gave the agents a simple instruction: “Use the mops CLI tool to explore MPS models. Run mops --help for more information.” That was enough for the model to start using it. I wrote the tool in Go, and it was very fast.

Next, I added subcommands to generate new IDs for a given model and to check an MPS model for basic structural validity. All of this could be implemented without access to a running MPS instance. It enabled the agent to make some simple edits in a user model (“add a requirement to a list of requirements in the project and implement it in the architecture, tracing your changes back to the original requirement”). The agent made the changes in separate models and linked them together.

One pattern I noticed in these successful edits was that agents almost never tried to create a completely new structure from scratch. Instead, they found an existing node that looked similar, copied its shape, and adapted it by changing the few properties and references that mattered for the task, much like a developer unfamiliar with a codebase looks for similar existing code, copies it, and modifies it.

That suggested another useful primitive for MPS tooling: let the agent add a node by duplicating an existing node (or an entire subtree) and modifying the copy in one operation. For many modeling tasks, that would be closer to how humans work in MPS than constructing XML node by node.

These growing requirements made me realize that the simple tooling would not be enough to close the feedback/validation loop that agents need to perform well. For that, we need access to a running MPS instance.

Advanced tooling

There are two main ways to give coding agents access to another system. One is to create an MCP (Model Context Protocol) server that exposes operations to the agent through a tool interface. The other is to write CLI tools (the GitHub CLI being the primary example). I am aware of a few initiatives developing custom MCP servers for MPS, but none are public yet, so I cannot talk about them (though I expect them to become public quite soon).

My research indicates, however, that while an MCP server running directly within MPS is easier to develop, a CLI tool may prove to be the better approach in the long run. MCP tool descriptions, schemas, and results all occupy the model’s context, and together they can easily consume a meaningful part of it, leaving less room for the task itself and making the model effectively dumber. MCP tools are also difficult to invoke outside of a coding agent. A CLI tool, by contrast, is available in the terminal, in a script, or in a CI job. CLI tools are also composable: the agent can pipe the tool’s output into a filter such as jq or grep, or combine several tools in novel ways. These arguments, along with the fact that enough people are already investigating the MCP approach, led me to choose the CLI direction.
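
Composability is easiest when the CLI can print machine-readable output. A common choice is one compact JSON object per line, which jq and grep both handle well. The ModelInfo record and its fields below are hypothetical and not part of mops.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// ModelInfo is a hypothetical per-model record; the fields are illustrative.
type ModelInfo struct {
	Name  string `json:"name"`
	Path  string `json:"path"`
	Roots int    `json:"roots"`
}

// renderJSONLines emits one compact JSON object per line, which jq reads
// value by value and grep can filter line by line.
func renderJSONLines(models []ModelInfo) (string, error) {
	var sb strings.Builder
	enc := json.NewEncoder(&sb)
	for _, m := range models {
		// Encode writes the object followed by a newline.
		if err := enc.Encode(m); err != nil {
			return "", err
		}
	}
	return sb.String(), nil
}

func main() {
	out, err := renderJSONLines([]ModelInfo{
		{Name: "example.requirements", Path: "models/example.requirements.mps", Roots: 12},
		{Name: "example.architecture", Path: "models/example.architecture.mps", Roots: 4},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

An agent could then narrow the result itself, for example with a hypothetical mops models --json | jq -r 'select(.roots > 10) | .name', and only the filtered lines would end up in its context.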

The main technical challenge for the CLI approach is that an MCP server can easily run inside a running MPS instance, while a CLI tool would have to either talk to a plugin installed inside MPS (complicated, and probably with little advantage over MCP) or start its own headless MPS instance and communicate with it. The latter option suffers from MPS’s long startup time, which is on the order of tens of seconds.

The advantage of starting a separate headless MPS instance, however, is that the tool becomes independent of any running “headful” MPS instance. This enables autonomous agents to work on tasks in several worktrees in parallel, without fighting over the running MPS instance and available ports. The long startup time can be offset with a strategy also used by Gradle and the Kotlin compiler: the CLI starts a long-running daemon and keeps it running in the background for a while to serve subsequent calls.

So now I am rewriting the tool in Kotlin/Java and introducing the daemon. The work is currently still on a branch on GitHub, but I expect to finish it in the coming weeks and release an initial version.

My vision is for the tool to become a single entry point for working with MPS projects from the command line. It should enable agents to edit MPS models, ideally in large chunks. Eventually, it should also incorporate the MPS build backends (generate, checkmodels, migrate, remigrate, execute-method) and make them easier to use for both humans and LLMs, for example by providing less verbose logging and better output.

So what can agents do in MPS today? Investigate, explore, and make simple changes in MPS projects. Before we can have models edit models and do serious autonomous work, we need to give them better tooling for bulk edits, model checking and generation, with compact result reporting.


P.S. I’m currently open to one new consulting engagement in the MPS space. The best fit would be work around MPS development, migrations to new MPS versions, build infrastructure, CI, custom tooling, or making AI agents useful in real projects. If this sounds useful for your team, I’d be happy to talk.