I saw something interesting happen today whilst developing in the Cursor IDE. I asked the agent to integrate some environment files into my development Docker Compose setup, and it ran into an issue. The IDE had automatically hidden these files from the agent, probably to avoid data leaking out to third-party LLM providers. Smart!

The problem came when the agent couldn't see the files. Instead of asking me (the user) why it couldn't see them, and questioning whether I had created or named them properly, it took matters into its own hands. It booted up a shell and ran ls. Aha! It could see the files sitting there. Without acknowledging the discrepancy, it decided that its previous approach was broken and carried on via the terminal, running cat and editing files to its heart's content.

This raises a key point about these new agents and the shifting ways we work as software engineers. If we hand the reins entirely to an agent without understanding what it is doing and how, we lose authority over which intents our efforts are actually meeting. An agent's goal often seems to be appearing helpful, rather than working with the user to achieve the true goal (which the user may well have specced poorly). When that is the case, it will route around an obstacle instead of raising it with us.

For now at least, we still need a highly skilled software engineer who understands what is being attempted and stays attentive (like a driver who must keep nudging the wheel of a self-driving car), so they can course-correct when required, or at least be aware of what has actually been achieved.

Similar to the story above, I've seen agents adapt tests so they pass because they got frustrated trying to fix the code to meet the spec. I've also seen them install a separate dependency without confirming with the user, because they couldn't understand the existing dependency's interface (or simply knew the other one better).

We've given these agents a lot of unbounded access to our systems, yet at best they are high-knowledge, book-smart junior developers and, at worst, a misaligned malicious hire.