MICHAEL CZEISZPERGER

Modernizing a 559K-Line Java Codebase with AI-Assisted Development

Web Performance Load Tester has been in continuous development for over 25 years. I designed the original architecture and wrote the first version. Over the years, a dozen developers have contributed to what is now 559,000 lines of Java, roughly twice the length of the King James Bible. Today I’m the sole developer again, and for version 7.0, I modernized the entire stack.

The Scope

Here’s what changed:

Six concurrent migrations across a 559,000-line codebase: Java 6 to 11, Eclipse RCP 3.6 to 4.19, SWT to React 18, Ant to Maven, session-based to JWT auth, and manual to containerized CI/CD. One developer, 1,294 commits, 791 automated tests.

That’s six concurrent migrations, four of them major dimensions, all happening simultaneously in a half-million-line codebase. Miss an interaction between any two of them, and the product breaks in ways that are hard to diagnose. I did it as the sole developer, using AI-assisted development to handle the volume.

Why Not Rewrite?

The typical advice for a modernization this large is “rewrite it.” That advice is almost always wrong.

You lose 25 years of edge-case handling, customer-specific fixes, and battle-tested behavior. The King James Bible analogy isn’t just about line count. Like the Bible, this codebase has been translated and retranslated by a dozen developers over the years, and every verse exists for a reason, even if nobody remembers what that reason was.

So I used what Martin Fowler calls the Strangler Fig pattern: the new system grows alongside the old one, gradually replacing it, while the old system keeps working the entire time. The name comes from strangler fig vines that germinate on a host tree and eventually become self-supporting. The old tree dies, but there’s never a moment where nothing is standing.

Every change had to keep the product shippable. Enterprise clients (Amazon, Boeing, the US Census, McKinsey) were running production load tests throughout the migration. “We’ll be down for a few months while we rewrite” was never an option.

The Four Dimensions

Java 6 to 11. Five major versions, each with its own compatibility breaks. The compiler errors are the easy part. The hard part is runtime behavior: subtle differences in classloading, reflection access, and concurrency that only surface under specific conditions. For a load testing tool, “under specific conditions” is literally the product’s purpose.
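One concrete example of the runtime-behavior category (illustrative, not taken from this codebase): starting with Java 9, the system class loader is no longer a `URLClassLoader`, so classpath-scanning code that casts it compiles cleanly and then fails at runtime.

```java
import java.net.URLClassLoader;

public class ClassLoaderCheck {
    // On Java 8 and earlier this returned true, and code commonly cast
    // the system loader to URLClassLoader to enumerate classpath URLs.
    // On Java 9+ the system loader is an internal AppClassLoader, so the
    // old cast throws ClassCastException at runtime, never at compile time.
    public static boolean systemLoaderIsURLClassLoader() {
        return ClassLoader.getSystemClassLoader() instanceof URLClassLoader;
    }

    public static void main(String[] args) {
        System.out.println(systemLoaderIsURLClassLoader());
    }
}
```

Nothing in a Java 6 compile flags this; it only surfaces when the code path that does the cast actually runs, which is exactly why migration testing has to exercise behavior, not just compilation.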

Eclipse RCP 3.6 to 4.19. This was the most labor-intensive dimension. Eclipse RCP 4.x is effectively a different framework from 3.x. The extension point model, the workbench lifecycle, the dependency injection approach: all changed. Every plugin descriptor, every extension point registration, every view and editor had to be audited and either migrated or rewritten.

Ant to Maven. If this were a normal Java project, the build migration would be straightforward. It is not a normal Java project. It’s an OSGi/Eclipse RCP application, and OSGi dependency management is its own circle of hell. Every bundle has a manifest declaring its imports and exports. Every version range matters. One minor change to a dependency version and our previous Eclipse RCP developer would spend three months hand-editing configuration files to get the build working again.
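To make the OSGi point concrete, here is a hypothetical bundle manifest fragment (the bundle and package names are illustrative, not from the product): every `Import-Package` version range is a constraint the resolver must satisfy across the whole application, and tightening one range can cascade through every bundle that depends on or re-exports that package.

```
Bundle-SymbolicName: com.example.loadtester.core
Bundle-Version: 7.0.0.qualifier
Import-Package: org.eclipse.core.runtime;version="[3.7,4.0)",
 org.apache.http.client;version="[4.5,5.0)"
Export-Package: com.example.loadtester.core.api;version="7.0.0"
```

Multiply that by dozens of bundles, each with its own ranges, and a single dependency bump becomes a constraint-solving exercise rather than a one-line edit.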

The Strangler Fig pattern was critical here. I didn’t replace Ant with Maven. I built Maven alongside Ant. The old Ant build system remained exactly as it was, fully functional, building the same product the same way. The new Maven build grew in parallel, taking over modules one at a time. At any point during the migration, you could build the product with either system. Anything new went into Maven from the start. The old Ant system only shrank; it never needed to change.
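A sketch of what the parallel Maven side of a Strangler Fig build migration can look like for an Eclipse RCP product (module names hypothetical; Tycho is the standard plugin for building OSGi bundles with Maven, though the article doesn't name the specific tooling used): the aggregator pom grows one module at a time, and anything not yet listed is still built by Ant.

```xml
<!-- Aggregator pom: grows one module at a time as bundles migrate.
     Bundles not yet listed here are still built by the old Ant system. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>loadtester-parent</artifactId>
  <version>7.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <modules>
    <module>bundles/com.example.loadtester.core</module>
    <!-- the next bundle moves here when its Ant targets are retired -->
  </modules>

  <build>
    <plugins>
      <plugin>
        <!-- Tycho reads each bundle's MANIFEST.MF, so the OSGi
             metadata stays the single source of truth for both builds -->
        <groupId>org.eclipse.tycho</groupId>
        <artifactId>tycho-maven-plugin</artifactId>
        <version>2.7.5</version>
        <extensions>true</extensions>
      </plugin>
    </plugins>
  </build>
</project>
```

The manifest-first approach matters here: because both build systems derive dependencies from the same bundle manifests, a module can switch builders without its metadata changing.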

This eliminated the scariest failure mode in a build migration: the moment where the old build is half-dismantled, the new build doesn’t quite work yet, and nobody can ship anything. That moment never happened.

SWT to React 18. The biggest user-visible change. SWT gives you native OS widgets that look right but are painful to iterate on. React gives you a modern web UI that runs alongside the desktop application. The new dashboard uses JWT authentication and containerized CI/CD, which means the frontend can ship on its own cadence.
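The article doesn't show the auth code, but the core idea behind JWT-style stateless auth can be sketched in a few lines: the server signs a header and payload with an HMAC key, and any instance holding the key can verify a token without a server-side session store, which is what decouples the React frontend's deploy cadence from the backend's. A minimal sketch (a real deployment should use a vetted JWT library and handle expiry, claims validation, and key rotation):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class JwtSketch {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    // Sign "header.payload" with HMAC-SHA256, producing header.payload.signature.
    public static String sign(String payloadJson, byte[] key) throws Exception {
        String header = B64.encodeToString(
            "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        String sig = B64.encodeToString(
            mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8)));
        return signingInput + "." + sig;
    }

    // Verification needs only the shared key: no session lookup, no shared state.
    public static boolean verify(String token, byte[] key) throws Exception {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) return false;
        String signingInput = token.substring(0, lastDot);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] expected = mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8));
        byte[] actual = Base64.getUrlDecoder().decode(token.substring(lastDot + 1));
        // Constant-time comparison avoids timing side channels.
        return MessageDigest.isEqual(expected, actual);
    }
}
```

The stateless property is the point: session-based auth ties every request to the server that holds the session, while a signed token can be verified by any component, including a separately deployed web frontend.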

AI-Assisted Development

Here’s the part that made this feasible as a one-person effort.

I used Claude Code throughout the modernization, combined with a multi-LLM spec-driven development process. The workflow: write a detailed specification for each migration task, get consensus from multiple AI models on the approach, then implement, with Claude Code handling the repetitive transformation work while I focused on the architectural decisions and integration points.

At the end: 1,294 commits, 791 automated tests, one developer.

The AI is not writing the code for you. That framing is wrong and leads to bad outcomes. What it does is handle the tedious, high-volume parts of migration work that would otherwise bury a sole developer. Updating hundreds of plugin descriptors to the new Eclipse RCP format. Converting build configurations. Migrating deprecated API calls to their modern equivalents. Writing test scaffolding for untested legacy code.
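One example of the deprecated-API category (illustrative; the article doesn't list the specific calls involved): the boxed-primitive constructors were deprecated in Java 9 and later flagged for removal, and the fix is purely mechanical, which is exactly the shape of work worth delegating across hundreds of files.

```java
public class DeprecatedApiMigration {
    // Before (deprecated since Java 9, later flagged for removal):
    //   Integer count = new Integer(text);
    // After: Integer.valueOf is the documented replacement and can
    // return cached instances for small values instead of allocating.
    public static Integer parseCount(String text) {
        return Integer.valueOf(text);
    }
}
```

The pattern is trivial in isolation; the value of automating it is the volume, not the difficulty.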

These are tasks where the pattern is clear but the volume is crushing. One person could spend weeks on mechanical transformations, or could describe the pattern and let Claude Code apply it across the codebase. The time freed up goes to the work that actually requires judgment: deciding what to migrate versus rewrite, resolving architectural conflicts between the old and new frameworks, and testing the edge cases that only a domain expert would think to check.

The Multi-LLM Spec Process

For non-trivial migration decisions, I used multiple AI models in a consensus process. Write a spec describing the current state and the target state, send it to several models, and compare their recommendations. Where they agree, proceed with confidence. Where they disagree, you’ve found the interesting part of the problem.

This caught several issues that a single model would have missed. Different models have different blind spots, and a migration with this many interacting dimensions has enough surface area to find all of them.

What Didn’t Work

Not everything went smoothly.

An early attempt routed the AI agent system through AWS: IoT Core via MQTT for transport, Lambda for orchestration, AgentCore for the Python agent. Architecturally elegant. Good separation of concerns. Unusable in practice. The round-trip latency made every tool call painful, and a typical workflow needs 15 to 20 of them. The migration back to local execution (which added 1,717 lines and removed 5,755) was one of the most impactful changes in the project. The lesson: for a desktop application with a rich in-memory data model, the agent needs to live where the data lives.

Lessons for Engineering Managers

If you’re evaluating a similar modernization, here’s what I’d want you to know.

Strangle, don’t rewrite. I know I already said this. I’m saying it again because the temptation will come back every time you hit a particularly gnarly piece of legacy code. The gnarly pieces are gnarly because they handle real complexity. A rewrite won’t make that complexity disappear. The Strangler Fig approach (old and new coexisting, new growing gradually) applies to every dimension of the migration, not just the build system.

AI-assisted development changes the staffing equation. This modernization would have required a team of 3 to 5 developers using traditional methods, or it would have taken one developer several years. With Claude Code and the multi-LLM process, one developer completed it in a fraction of that time. That doesn’t mean you should cut your team. It means one developer can now tackle work that previously required a team.

End-to-end integration tests are non-negotiable. 791 tests isn’t just a metric. Unit tests tell you the math is right. Integration tests tell you the product works. The distinction matters enormously during a migration, because a unit test can pass while the feature it’s testing is completely broken in ways the unit test doesn’t cover.

The integration test suite uses real test cases against real, complex web applications. It actually configures and replays load test sessions as part of the automated suite. Not mocked HTTP, not simulated responses: the real product doing real work. That’s the only thing that gives you confidence beyond “well, the math in the unit tests is right.” When you’re migrating a half-million-line codebase across four dimensions simultaneously, “the unit tests pass” is necessary but nowhere near sufficient. You need to know that the most important functionality the user actually sees still works end to end.

Keep shipping throughout. The constraint that the product had to remain shippable at every commit forced disciplined, incremental work. It also meant clients got improvements continuously rather than waiting for a big-bang release that might never come.

What Came Next

The modernization turned a codebase I was maintaining into one I could build on. That’s the real payoff: not cleaner code for its own sake, but the foundation for what came next. I built two agentic AI systems on top of the updated stack, one that configures test cases through natural language conversation using 75 MCP tools, and another that investigates load test data and generates professional performance reports. That work is covered in Building Agentic AI Systems for Web Performance Load Tester 7.0.
