Thoughts on AI Code in Production
AI Code in Production
AI-generated code seems to prioritise simple, repetitive solutions to the problems it is posed. This paper highlights the difference in complexity between human-written and AI-written solutions across a range of providers. Models will often repeat themselves and, often unnecessarily, add complexity to codebases in order to produce “accessible” solutions. A frequent misunderstanding in model output concerns error handling. For example, asking a model to ensure that data is correctly validated by a Pydantic object can produce:
from pydantic import BaseModel

class Foo(BaseModel):
    bar: str

try:
    foobar = {"bar": "hello world"}
    foobar = Foo(**foobar)
except Exception as e:
    print(f"Unable to validate data: {e}")
Functionally the code runs as expected, but Pydantic already handles validation errors and surfaces them with clear messages. The try-except block adds three needless lines and, compounded over a complex codebase, these additions can make the code far more difficult to read.
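For comparison, here is a minimal sketch of the simpler version (assuming the caller is happy for a validation failure to surface as an exception): just construct the model and let Pydantic's own ValidationError propagate with its detailed message.

from pydantic import BaseModel

class Foo(BaseModel):
    bar: str

# If "bar" is missing or has the wrong type, Pydantic raises a ValidationError
# with a clear message of its own, so no wrapping try-except is needed here.
foobar = Foo(**{"bar": "hello world"})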
When building production code, maintainability should be prioritised over most other concerns. A new developer should be able to pick up failing code and a) quickly understand what is causing the error and b) be able to diagnose and fix it.
Having spent some time developing on an existing codebase using AI agentic coding tools (GitHub’s Copilot and Claude Code), I found myself adding needless complexity to fix bugs that would directly impact maintainability.
With complex codebases come complex interactions between classes and functionality that AI models seem to struggle to manage. They are often very effective at solving the problem they are given, but do so with no regard for the overall functionality of the app. I was stopping a flood in one area by taking flood defences from another.
These issues would compound, often unseen, until a deployment to dev and a full run-through of the app would crash the server. AI-generated logging messages were often unnecessary and frequent, making proper debugging a nightmare. This is not the fault of the AI systems themselves but of how they are trained: designed to solve single issues for the user with no regard for the wider audience of the codebase.
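As a sketch of the kind of log noise I mean (the function and messages here are invented for illustration, not taken from the real codebase):

import logging

logger = logging.getLogger(__name__)

def load_user(user_id: str) -> dict:
    # AI-generated code tends to narrate every step like this...
    logger.info("Entering load_user")
    logger.info("Validating user_id")
    logger.info("Fetching user from database")
    record = {"id": user_id}  # placeholder for the real lookup
    logger.info("User fetched successfully")
    logger.info("Exiting load_user")
    return record

One informative message at the point where something can actually fail would tell a future maintainer far more than this running commentary.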
They are also buggers for bad object naming, often ignoring wider context and adding their own weird “flavours” when they need to output a lot of code (eg when “vibe coding”).
Good practice looks like consistent naming, such as
jons_variables = ["var1", "var2", "var3"]
for jons_variable in jons_variables:
    ...
where it is clear that the loop variable is a single element of the original object.
AI coding agents have an annoying habit of giving elements bad names and then duplicating objects for different needs, such as
jons_variables = ["var1", "var2", "var3"]
for j_v in jons_variables:
    another_var = j_v
Over a large file, having to go back and work out what another_var actually means makes maintenance difficult, especially when this behaviour is repeated.
These AI tools prioritise overly simplistic outputs that work well for a local issue but become substantial tech debt across a whole repository.
I would estimate I’ve wasted hours on features where AI solutions have not met full-app requirements, including clean up.
Where I find AI useful
I think AI can be really helpful for understanding new features or codebases from other developers. Claude Code especially is good at understanding broader context and explaining it to the user, as well as homing in on specific uncertainties a user has. It is able to find relevant context, including from different services (eg Lambdas that my app invokes), and spends time working out how the code is implemented in order to respond to queries.
What Claude Code is not good at is turning these insights into robust and maintainable code. Even with the full repository context it constantly misses edge cases.
AI can be a very effective tool for context retrieval, but it is a lousy software developer. It creates buggy code that doesn’t fit with the wider project.
I have found Claude Code a very useful automated code documentation tool. I have been able to save real time by asking it questions, having it retrieve context and then offer suggestions for solutions. I now start every prompt to the tool with "Do not write any code - ...". This works well to stop Claude before it starts making shaky architectural decisions.
It helps me get up to speed with the existing codebase and plan new solutions but is not good enough to write robust code.