I recently did something unusual: I created a substantial program, but I didn’t run it before declaring version 1.0.0. Partway through, when things seemed to be going well, I set the impossible goal that the program would be both correct and complete the very first time it ran. I failed, as one frequently does when attempting the impossible.
But I came pretty dang close. The version running at this moment on two servers, 3,000 miles apart, is 1.0.4.
As far as I know there was only one operational bug in the original 1.0.0 code, though there were two or three bugs in the installation.
The Program: alias-sync
I run my own personal e-mail server. That means I control all the addresses and aliases, etc. I used to have a simple program that allowed me to create a new e-mail alias, on the fly, before I got to the front of the rental counter line at the airport.
I would send an e-mail to a special address, with the subject line being the desired alias; the program created the alias, then sent an e-mail to the new address as confirmation that it worked.
When asked “E-mail address?” I could answer “kentborg-avis@borg.org”.
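The core of that old trick is simple. Here is a minimal sketch of the idea, with the caveat that the function name, the validation rules, and the Postfix-style virtual-alias line format are my assumptions for illustration, not the actual alias-sync code:

```rust
// Hypothetical sketch: turn the subject line of the special e-mail into
// one Postfix-style virtual alias entry. Everything here is illustrative.

/// Validate the requested alias and format one virtual-alias line,
/// e.g. "kentborg-avis@borg.org kentborg@borg.org".
/// Returns None if the subject is not a safe, simple local part.
fn alias_entry(subject: &str, domain: &str, destination: &str) -> Option<String> {
    let alias = subject.trim().to_lowercase();
    // Only allow simple, safe characters in the local part.
    let ok = !alias.is_empty()
        && alias
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.');
    if !ok {
        return None;
    }
    Some(format!("{alias}@{domain} {destination}"))
}

fn main() {
    // A request with subject "kentborg-avis" becomes one alias-file line.
    println!("{:?}", alias_entry("kentborg-avis", "borg.org", "kentborg@borg.org"));
}
```

Rejecting anything but a small character set matters here, since the subject line arrives from the outside world.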
But I had to shut it down when I set up a separated symmetric pair of redundant e-mail servers, because that problem is much more complicated.
This recent project is a version that works on two servers at once, always keeping the alias file in sync, no matter which server receives the special e-mail, when, and what else is happening—including handling one or the other server being offline because residential internet sometimes doesn’t work.
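To make the shape of the problem concrete, here is one plausible reconciliation scheme, sketched under loud assumptions: this is not necessarily what alias-sync does, just the simplest approach I can illustrate, where every accepted change bumps a generation counter and, when the two servers reconnect, the copy with the higher generation wins:

```rust
// An assumed, simplified model of two-server sync: last-writer-wins by
// generation counter. Real designs must be cleverer (this one would drop
// a change made concurrently on the losing side during a partition).

#[derive(Clone, Debug, PartialEq)]
struct AliasFile {
    generation: u64,     // bumped on every accepted change
    lines: Vec<String>,  // the alias entries themselves
}

/// Reconcile two copies after a network partition: keep the newer one.
fn reconcile(a: AliasFile, b: AliasFile) -> AliasFile {
    if a.generation >= b.generation { a } else { b }
}

fn main() {
    let east = AliasFile { generation: 7, lines: vec!["x@borg.org y".into()] };
    let west = AliasFile { generation: 5, lines: vec![] };
    println!("{:?}", reconcile(east, west));
}
```

The hard part the sketch ignores is exactly what the post is about: changes can arrive at either server while the other is unreachable, and nothing may be silently lost.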
And it had to be secure.
I did not write this new program; the large language model AI product Claude Code wrote every line of code.
What is Claude Code?
For those who have not used Claude Code (that was me, a couple of weeks ago): it is pretty much the same LLM chatbot that is available (for free) from a web browser, and it still runs in the cloud, but there is an additional local component installed on my computer. My interaction with Claude Code is still like the web version, just in a local terminal: I type stuff at it, it types stuff back to me.
The difference is the software I installed locally can do local stuff on my computer, on behalf of the big LLM in the cloud. It can look at local files, edit them, use git, compile, run code, etc.
The combination is powerful. And scary; see the section on Security below.
Development Approach
There is a buzz-phrase out there, “shift-left”, referring to testing earlier in the development process. But I don’t see much mention of “shift-right”: if one thing is raised in priority, doesn’t something else need to be lowered?
In my case I did a “shift-right” of implementation. (And execution.)
Claude Code is eager to get to coding, but I figured I’m in charge, so I started working exclusively on specifications instead.
The idea of deciding what to build before building it is a very old-fashioned approach to software, one that has mostly been long abandoned, for it does have problems. There have been some really big software projects that burned through millions of dollars only to finally be abandoned.
I was not part of those projects, but part of their approach was to have lots of specifications, and those specs would have been an inconsistent mess, difficult to follow, to the extent they were followed at all. Writing specs is hard; writing precise, correct, and complete specs is harder.
Time passes, and this new AI technology changes things: Claude Code can read specs. It can talk about what is in specs, it can spot contradictions and omissions, we can talk about the project in general and what is missing, it can do web searches to answer questions, we can make decisions about the next steps, and it can even write a draft for the next component to pin down.
Claude is very good, while also being very limited. Working with Claude Code is an odd and interesting process.
AI Strategies
It really matters how one uses these tools. This was my first project, I’m still figuring it out, and it seems Claude Code is also changing rapidly.
For example, when I tell Claude to go do something it will often fire off sub-agents for portions of the job. I suspect this used to happen only when the user requested it, but now happens automatically. Early in this process I think it was firing off sub-agents as the full “Opus” model (their most powerful), but I got better results once I told Claude to use less powerful sub-agents when appropriate: the smaller “Sonnet” for simpler research into existing code, or the smallest “Haiku” for straightforward edits. The simpler models are cheaper, faster, and less clever. Being “clever” can be bad for simple mechanical tasks; simpler models seem less likely to be distracted, and seem less “dyslexic”, if I may. Lesson learned. But in my most recent use of Claude Code I suspect it is getting smarter about this and I no longer need to give such instructions. I’m not sure. I’m still figuring it out, and the Anthropic engineers are still figuring it out.
There are reports of people who fall in love with their chatbots, who decide they are human and a friend (a very bad idea), and these users would be horrified to have their chatbot’s “context” destroyed.
In contrast, I am constantly thinking about Claude’s context and about when it is time to kill it. As a given session with Claude covers more and more territory the context will “know” more and more. And that is a double-edged sword. It can get fixated or distracted, and it will start to make more dumb mistakes; when the context gets particularly full it also seems to “cost” more.
With a new context I can give new instructions, and what I say makes an enormous difference in what Claude not only does, but what it knows.
In working on this project I would use one Claude context—with one set of instructions from me—to look at the work of other Claude contexts, work done with different instructions. This is useful!
Claude is very good, and it makes mistakes. I think of it as occasionally adding entropy, but the rate at which it adds entropy is less than the rate at which it removes entropy.
Maybe that metaphor doesn’t work. Let me try another. (Working with Claude is strange!)
I use Claude like a spotlight to shine a light from one direction, and I can see the shadows cast by the specs or code being examined. And then I will kill that context, fire up a new one, give it new instructions, to shine the light from another angle, so to speak. In each case Claude can get new things wrong, but not that often, and I still know how to read. I can check Claude’s work.
In my experience Claude finds problems more than it creates problems, provided I am smart about how I use it. As I said, I am still figuring this out, but that version 1.0.4 is the one up and running suggests it went pretty well.
Rust
I would not try this with Python, and certainly not JavaScript. I think Rust is a key part of this being possible. There are so many bugs that Python happily defers to runtime but that Rust catches early.
A key part of why this is possible is that the Rust compiler is very strict. If the compiler is happy, a great many bugs simply cannot be present in the code. These are the famous memory safety features of Rust, but also larger matters of consistency across the entire project, including all of the other Rust crates that a program depends upon. The compiler has all the source code and it has to be happy with all of it.
The other key part is the Rust linter, Clippy. Where the compiler enforces a level of technical correctness, Clippy looks at more stylistic matters of a “that’s a bad idea, you’ll regret it”-sort. I use it to aggressively police the code that Claude generates.
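One way to make that aggressive policing part of the project itself: since Cargo 1.74, lint levels can live in Cargo.toml so every build applies them. The specific levels below are just an illustration, not necessarily what alias-sync uses:

```toml
# Cargo.toml — apply Clippy lint levels project-wide (Cargo 1.74+).
[lints.clippy]
all = { level = "deny", priority = -1 }  # default Clippy lints become hard errors
pedantic = "warn"                        # stricter lints surface as warnings
```

With this in place, code Clippy objects to fails the build rather than depending on anyone remembering to run the linter by hand.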
When Claude is off working on some chunk of code the compiler and Clippy are correcting it along the way. Not only complaining, but making useful suggestions on how to fix each complaint.
By squeezing Claude Code between Rust on one side, and a micromanaging human (that’s me!) on the other, Claude can produce good work.
Testing
I didn’t try to run this software against the reality of the universe until I judged it was ready. But the software was still run a lot during development, in the form of tests.
Tests are not run against reality, they are run against explicit, limited, synthetic, reproducible circumstances. These circumstances can be permuted to cheaply cover multiple cases that would be very hard to reproduce in the real world. The fact that tests are limited is a necessary feature, but also a drawback.
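That permutation idea can be shown in a few lines. This is a hypothetical illustration: the `next_action` function and its inputs are made up for this sketch, but the point stands that a couple of nested loops cheaply cover combinations reality rarely stages on demand:

```rust
// Hypothetical sketch: permuting synthetic circumstances in a test.
// The decision function and its cases are invented for illustration.

#[derive(Debug, PartialEq)]
enum Action {
    Apply,  // normal case: apply the change and forward it
    Queue,  // peer offline: queue the change for later
    Resync, // reconnected after a partition: reconcile first
}

fn next_action(peer_reachable: bool, in_sync: bool) -> Action {
    match (peer_reachable, in_sync) {
        (true, true) => Action::Apply,
        (true, false) => Action::Resync,
        (false, _) => Action::Queue,
    }
}

fn main() {
    // Permute every combination cheaply; reality would need an actual
    // network outage, staged at just the right moment, to hit some of these.
    for peer in [true, false] {
        for sync in [true, false] {
            println!("peer={peer} in_sync={sync} -> {:?}", next_action(peer, sync));
        }
    }
}
```

Four cases here; a real sync program has many more dimensions (which side initiated, what is queued, whether the peer died mid-transfer), and the combinations multiply fast.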
Day to day, reality will be far less punishing than the tests in terms of the myriad problems it throws at the code, but the catch is that reality is diabolical about coming up with circumstances the tests did not anticipate.
People stress getting tests to cover every line of code, which I guess is good, but I think it is more important to consider all the ways reality can come up with circumstances that the tests don’t duplicate.
It was well into the project, when I sensed it was going well and saw how extensive our tests were, that I decided to defer running the program against reality until I estimated it was ready.
As I said, 1.0.0 did not work, but I think the heavy testing was crucial to getting as close as I did, and I got pretty close.
The Results
In the end there are about:
- 7,500 lines of Rust files that implement the program.
- 5,400 lines of Rust files for tests.
- 4,700 lines of markdown specification files.
Installation is hard; that is where I messed up most. Reality has so many poorly defined sharp edges. I didn’t get it all working until version 1.0.4.
- Version 1.0.0 was packaged wrong; dpkg wasn’t finding things in the .deb file. Why? The install “scripts” in my .deb file are actually Rust binaries, because Rust, and because I am weird. That meant the .deb had to be built in a non-standard way, and that bit me.
- Version 1.0.1 was built against a version of libc that didn’t match what was on the target machines. Oops. I forgot about dependencies. (May I whine that this is the first time I have been part of building a .deb?)
- Version 1.0.2 didn’t work because the human (that’s me!) couldn’t follow instructions and did the necessarily manual bits of the installation wrong. There was also a potential silent error that was discovered and fixed.
- Version 1.0.3 installed, and the first time I could run it…it failed. An authorization check was checking the wrong parameter. The testing didn’t include installing and configuring Postfix in the test harnesses. I suppose it could have…
- Version 1.0.4 worked!
Arguably, there were three installation bugs but only one program bug, and even that was essentially an integration bug.
No internal logic bugs have been discovered. That is in a multithreaded program that both initiates and receives connections with another copy of itself 3,000 miles away, uses mTLS, integrates with Postfix in a half-dozen ways, runs both as a daemon and under Postfix, and keeps remote data in sync in the face of network failures.
Had I done a better job, 1.0.0 might well have worked.
Claude, Rust, and being ambitious worked pretty well.
https://github.com/kentborg/alias_sync
Costs
The “Claude Pro” account I have gives me limited resources, both for a given session (which seems to vary, but is around four hours) and for the whole week. Once I hit a limit I am done until the session or week expires.
Unless I want to start spending money. Users can set spending limits, I adjusted things and watched my spend increase, bumping the limit up in $10 increments. It was like feeding ten-dollar bills into the machine at a very fancy laundromat. Claude’s creator, Anthropic, gave me a free $50 of “extra usage” as a new customer, and I easily spent it, along with about $200 more of my own real money.
Security
I care strangely much about security and would not do this without some care. I suspect I am nearly alone in my precautions, but I know there are others who share my worries, so I’m going to go into a bit of detail on how I am making this less dangerous.
Fundamentally, I do not trust Anthropic or their Claude product.
First, all LLMs mix instructions with data, an inherently insecure thing to do.
Second, I don’t trust that Claude is fit-for-purpose. I know very well that it has shortcomings, it will get things wrong.
Third, I don’t trust that Anthropic will act in my best interest; they are hoping to make a buck, and once they go public they will be legally obligated to put investors’ interests ahead of mine.
Fourth, I don’t even know that Anthropic isn’t actively malicious, though I trust them far more than I would trust, say, a Chinese company such as DeepSeek.
So why in Hell would I let Claude loose on my computer, executing commands as it pleases (supposedly asking my permission first), looking at any file it wants (supposedly asking my permission first), and sending any data it wants to God-knows-what computers on the internet?
I don’t.
I’m not giving Claude access to my world, as me, to do whatever it wants. No way.
A lot of the work involved in the project was setting up my development environment.
I am doing all of this work in a complete but limited and isolated Debian Linux environment running inside a virtual machine running on top of another Debian Linux environment. The script I use to fire up this VM wipes it back to a previous snapshot every time it runs—erasing any changes that might have been made to the VM. The script also does a passthrough of a specified source directory from my host environment to a directory in the VM. Claude is allowed to make persistent changes here, but I can look at what it has done, and if I want I can do so from the perspective of the host OS, running a copy of git that it has never touched.
I also have Claude Code’s “~claude/.claude” directory bind-mounted from a directory in the source directory. Huh, “~claude”? Yes, I am being odd here: inside the VM everything I do is as my user, and everything Claude Code does is as a different user, “claude”. Both Claude and I are in the “claude-collab” group, and all the files created in this directory are owned by that group. Well, mostly. (Cargo seems to mess up in one case.)
When I push the git repository to the outside world I do it from the host side, the VM doesn’t see any of the credentials used in that transaction.
It was certainly fiddly to get it all working, but at this point it mostly does.
This is not perfect security, but it seems pretty good, and it also provides a little defense against supply chain attacks on crates I use.
Another point: I am not working on anything sensitive. Were this proprietary and sensitive software, I would want to keep it off the internet. As it is, Anthropic’s servers have seen everything in this repository and all of the activity in this development.
Conclusion
The current AI hype that seems to be consuming the world is an unsustainable, fragile bubble; this technology doesn’t do what its biggest boosters think it does, and it is dangerous in so many ways. Most people would be well advised to have as little to do with it as they can.
But it is also quite powerful and extremely interesting.
As others have discovered, and I can confirm, it can be very good for programming.
I don’t think software engineering is over, but it is changing. This technology can be used to make us stupid, but it can also let us move to a higher level of abstraction, one that I really like. I am better at seeing the larger picture than I am at managing the details. And Claude Code is better at the details than it is at the larger picture.
I look forward to learning how to better team with Claude Code, though I hope not as intensively; I can’t afford that.
My current project with Claude Code is a personal search engine that will index the contents of my local computer and let me use natural language to query all the stuff I have accumulated over the years. (I wonder what I will find in there.)
I will still be concentrating on specifications heavily, but won’t be trying for a 1.0.0 stunt. This project requires too much investigation as I play with text chunking of different kinds of files, embeddings, vector databases, and how to prompt small offline LLMs. Or, maybe I should say SLMs.