March 31, 2026

Modernizing a 550K-Line Java Codebase with AI-Assisted Development

legacy-modernization AI-assisted-development Java Claude-Code

Web Performance Load Tester has been in continuous development for over 25 years. I designed the original architecture and wrote the first version. Over the years, a dozen developers have contributed to what is now 550,000 lines of hand-written Java. Longer than the entire Lord of the Rings trilogy. Today I’m the sole developer again, and for version 7.0, I modernized the platform and started moving the product to the cloud.

The Scope

Here’s what changed:

Four concurrent migrations across a 550,000-line codebase: Java 6 to 11, Eclipse RCP 3.6 to 4.19, Ant to Maven, and a new React reporting module that runs both embedded in the desktop app, alongside the existing SWT reports, and in a new cloud dashboard. One developer, 1,294 commits, 791 automated tests.

That’s four major migration dimensions happening simultaneously in a half-million-line codebase. Miss an interaction between any two of them, and the product breaks in ways that are hard to diagnose. I did it as the sole developer, using AI-assisted development to handle the volume.

Why Not Rewrite?

The typical advice for a modernization this large is “rewrite it.” That advice is almost always wrong.

You lose 25 years of edge-case handling, customer-specific fixes, and battle-tested behavior. A codebase this old has been written and rewritten by a dozen developers, and every odd corner of it exists for a reason, even if nobody remembers what that reason was. One of the hardest decisions in writing software is build versus buy, and we often chose to take advantage of open source. The result is that the program now has 3,437 external Java JAR files, each with its own requirements and dependencies.

So I used what Martin Fowler calls the Strangler Fig pattern: the new system grows alongside the old one, gradually replacing it, while the old system keeps working the entire time. The name comes from strangler fig vines that germinate on a host tree and eventually become self-supporting. The old tree dies, but there’s never a moment where nothing is standing.
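In code, the Strangler Fig pattern usually takes the form of a routing facade: callers keep talking to one interface, and the facade decides, slice by slice, whether the old or the new implementation handles each request. The sketch below is a minimal illustration of that idea, not code from the product; `ReportService`, both implementations, and the `migrated` set are hypothetical names.

```java
import java.util.Set;

// Hypothetical interface that existing callers already depend on; it does not change.
interface ReportService {
    String render(String reportName);
}

// The old implementation, left exactly as it was.
class LegacyReportService implements ReportService {
    @Override
    public String render(String reportName) {
        return "legacy:" + reportName;
    }
}

// The new implementation, which takes over one slice at a time.
class ModernReportService implements ReportService {
    @Override
    public String render(String reportName) {
        return "modern:" + reportName;
    }
}

// Facade that routes migrated slices to the new code and everything else to the old code.
class StranglerFacade implements ReportService {
    private final ReportService legacy;
    private final ReportService modern;
    private final Set<String> migrated;

    StranglerFacade(ReportService legacy, ReportService modern, Set<String> migrated) {
        this.legacy = legacy;
        this.modern = modern;
        this.migrated = Set.copyOf(migrated); // immutable snapshot (Java 10+)
    }

    @Override
    public String render(String reportName) {
        ReportService target = migrated.contains(reportName) ? modern : legacy;
        return target.render(reportName);
    }
}
```

Moving another slice is a one-entry change to the `migrated` set, and rolling it back is just as cheap, which is what keeps the product shippable between steps.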

Every change had to keep the product shippable. Clients were running production load tests throughout the migration. “We’ll be down for a few months while we rewrite” was never an option.

The Four Dimensions

Java 6 to 11. Five major versions, each with its own compatibility breaks. The compiler errors are the easy part. The hard part is runtime behavior: subtle differences in classloading, reflection access, and concurrency that only surface under specific conditions. For a load testing tool, “under specific conditions” is literally the product’s purpose.
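One well-known example of a change that compiles cleanly but fails only at runtime (offered as a general illustration, not as a bug this codebase is known to have had): Java 6-era code often cast the system class loader to `URLClassLoader` to list the classpath. Since Java 9 the application class loader is no longer a `URLClassLoader`, so that cast throws `ClassCastException` on Java 11. A defensive sketch, using the hypothetical helper name `ClasspathEntries`:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the product.
final class ClasspathEntries {
    private ClasspathEntries() {}

    // Java 6-era code typically did the following, which compiles on Java 11 but throws
    // ClassCastException at runtime because the application class loader changed in Java 9:
    //   URL[] urls = ((URLClassLoader) ClassLoader.getSystemClassLoader()).getURLs();
    static List<String> list() {
        List<String> entries = new ArrayList<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        if (loader instanceof URLClassLoader) {
            // Java 8 and earlier: the old approach still works (entries are URL strings).
            for (URL url : ((URLClassLoader) loader).getURLs()) {
                entries.add(url.toString());
            }
            return entries;
        }
        // Java 9 and later: read the documented system property instead (entries are paths).
        for (String part : System.getProperty("java.class.path", "").split(File.pathSeparator)) {
            if (!part.isEmpty()) {
                entries.add(part);
            }
        }
        return entries;
    }
}
```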

The other hard part is the dependency graph. The product pulls in over fifty third-party libraries. Each has its own supported Java range, and most of them had to be upgraded in step with the main code. Some depend on each other, so a version bump in one library constrains the next. Picking the right version of each library in the right order, so that no two choices conflict and every library runs on the Java version you’re currently targeting, is the kind of combinatorial puzzle that AI is genuinely good at. Claude Code walked that matrix, identified the safe upgrade paths, and flagged the libraries that needed to be replaced rather than upgraded.
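The same puzzle can also be framed mechanically, as a backtracking search over candidate releases. The sketch below assumes a simplified, hypothetical data model (`Release`, `UpgradePlanner`): each release declares the Java feature versions it supports and, optionally, which versions of other libraries it is known to work with. Real compatibility data would come from release notes and POMs; this illustrates the shape of the problem and is not the tooling used for the migration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// One release of a library (hypothetical model).
final class Release {
    final String library;
    final String version;
    final int minJava; // lowest Java feature version supported, e.g. 6
    final int maxJava; // highest Java feature version supported, e.g. 11
    // Other library name -> versions of it that this release is known to work with.
    final Map<String, Set<String>> compatibleWith;

    Release(String library, String version, int minJava, int maxJava,
            Map<String, Set<String>> compatibleWith) {
        this.library = library;
        this.version = version;
        this.minJava = minJava;
        this.maxJava = maxJava;
        this.compatibleWith = compatibleWith;
    }

    boolean runsOn(int javaVersion) {
        return javaVersion >= minJava && javaVersion <= maxJava;
    }

    // A release only rejects another if it constrains that library and the version isn't listed.
    boolean accepts(Release other) {
        Set<String> allowed = compatibleWith.get(other.library);
        return allowed == null || allowed.contains(other.version);
    }
}

// Hypothetical planner: picks one release per library that all run on targetJava and agree.
final class UpgradePlanner {
    private UpgradePlanner() {}

    // Candidates should be listed newest first, so the search tries newer releases first.
    static Optional<Map<String, Release>> plan(Map<String, List<Release>> candidates, int targetJava) {
        List<String> libraries = new ArrayList<>(candidates.keySet());
        Map<String, Release> chosen = new LinkedHashMap<>();
        return assign(libraries, 0, candidates, targetJava, chosen)
                ? Optional.of(chosen) : Optional.empty();
    }

    private static boolean assign(List<String> libraries, int index,
                                  Map<String, List<Release>> candidates, int targetJava,
                                  Map<String, Release> chosen) {
        if (index == libraries.size()) {
            return true; // every library has a compatible release
        }
        String library = libraries.get(index);
        for (Release candidate : candidates.get(library)) {
            if (!candidate.runsOn(targetJava) || !compatibleWithAll(candidate, chosen)) {
                continue;
            }
            chosen.put(library, candidate);
            if (assign(libraries, index + 1, candidates, targetJava, chosen)) {
                return true;
            }
            chosen.remove(library); // backtrack and try the next (older) release
        }
        return false;
    }

    // Constraints are checked in both directions against every library chosen so far.
    private static boolean compatibleWithAll(Release candidate, Map<String, Release> chosen) {
        for (Release other : chosen.values()) {
            if (!candidate.accepts(other) || !other.accepts(candidate)) {
                return false;
            }
        }
        return true;
    }
}
```

An empty result means no combination works for that Java version, which is the signal that some library has to be replaced rather than upgraded.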

For anyone who wants to see what “fifty libraries” actually means in this codebase:

Category (number of libraries)	Libraries
Runtime platform (1)	Eclipse RCP 4.19 / Equinox / SWT / JFace / WST / EMF / ECF / JDT / LTK / Nebula / ecj. Roughly 200 `org.eclipse.*` JARs treated as one platform.
AWS (2)	AWS SDK v2 2.33.11 (around 40 modules: s3, ec2, cloudwatch, sts, ses, iot, bedrockruntime, netty-nio-client, apache-client, jackson-core, …); legacy AWS SDK v1.
Web / HTTP (6)	Jetty 9.4.53 (jetty-*, websocket-*, http2-*, fcgi-*, infinispan-*, apache-jsp; ~60 JARs); Netty 4.1.126; Apache HttpComponents (httpclient, httpcore, httpasyncclient, mime4j); Jersey 2.x (JAX-RS impl); HK2 + AOP Alliance (Jersey DI); Reactive Streams.
Browser automation (3)	Selenium 4.41.0; HtmlUnit (+ nekohtml, cssparser, sac); legacy PhantomJSDriver, OperaDriver.
Document / data (5)	Apache POI 5.2.5 (+ xmlbeans, curvesapi, SparseBitSet); Jackson 2.x (core, databind, annotations, jaxrs, module-jaxb); Gson; JSON.org (org.json); Minimal-JSON.
Apache Commons (10)	cli, codec, collections (v1 + v4), compress, exec, io, jxpath, lang3, logging, math3. Each released independently.
Other Apache (5)	Ant 1.9.16 + sub-modules; Jasper / taglibs (JSP); Xalan + serializer; Lucene 1.9.1; Log4j API.
Scripting / JS (2)	Rhino (+ htmlunit-core-js); FreeMarker.
Charts (2)	JFreeChart + jcommon; XChart 3.8.8.
Observability (2)	OpenTelemetry 1.59; SLF4J.
Testing (3)	JUnit + Hamcrest; TestNG (+ jcommander, bsh); Byte Buddy / CGLIB / Javassist (mock backends). Mostly test-scope.
System / crypto / misc (9)	ASM 9.6; JNA + JNA-Platform; Bouncy Castle; Eclipse Paho MQTT; JCIFS; JSch 0.1.55; Ini4j; Joda-Time; Guava (+ failureaccess).
OSGi / logging (1)	Pax Logging + Pax ConfigManager.
Standards APIs (1)	javax / JSR: servlet, websocket, jaxrs, annotation, inject, persistence, mail, jsp, auth.message. 11 spec JARs treated as one.
Google / annotations (3)	Protobuf-java; auto-service-annotations; jspecify.
Compression (1)	Brotli dec.

Eclipse RCP 3.6 to 4.19. This was the most labor-intensive dimension. Eclipse RCP 4.x is effectively a different framework from 3.x. The extension point model, the workbench lifecycle, the dependency injection approach: all changed. Every plugin descriptor, every extension point registration, every view and editor had to be audited and either migrated or rewritten.
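Part of that audit can be mechanized. The sketch below scans a `plugin.xml` descriptor with the JDK’s built-in DOM parser and reports every `<extension point="...">` whose ID appears on a review list. The class name and the review list are hypothetical; which extension points actually need attention in a 3.x to 4.x migration has to come from the Eclipse documentation and from the project itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical audit helper, not from the product.
final class PluginXmlAudit {
    private PluginXmlAudit() {}

    // Returns, in document order, the extension point IDs in pluginXml that are on reviewList.
    static List<String> flaggedExtensionPoints(String pluginXml, Set<String> reviewList) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pluginXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable plugin.xml", e);
        }
        List<String> flagged = new ArrayList<>();
        NodeList extensions = doc.getElementsByTagName("extension");
        for (int i = 0; i < extensions.getLength(); i++) {
            String point = ((Element) extensions.item(i)).getAttribute("point");
            if (reviewList.contains(point)) {
                flagged.add(point);
            }
        }
        return flagged;
    }
}
```

Run over every bundle’s descriptor, this turns “audit every extension point registration” into a worklist; deciding whether each hit is migrated or rewritten is still a human call.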

Ant to Maven. If this were a normal Java project, the build migration would be straightforward. It is not a normal Java project. It’s an OSGi/Eclipse RCP application, and OSGi dependency management is its own circle of hell, made harder still by 3,437 external JAR files. Every bundle has a manifest declaring its imports and exports. Every version range matters. In the past, a single minor change to one dependency version could cost our previous Eclipse RCP developer three months of hand-editing configuration files before the build worked again.
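To make “every version range matters” concrete: OSGi manifest headers such as `Import-Package` and `Require-Bundle` carry version ranges like `[3.10,4.0)` (at least 3.10, below 4.0), and a bare version such as `1.2` means “1.2 or newer”. The simplified sketch below implements that matching rule; it skips parts of the OSGi specification’s validation (such as the allowed qualifier characters), and a real build would rely on the framework’s own `org.osgi.framework.Version` and `VersionRange` classes rather than these hypothetical hand-rolled ones.

```java
// Simplified OSGi version: major.minor.micro.qualifier; missing numeric parts default to 0.
final class OsgiVersion implements Comparable<OsgiVersion> {
    final int major;
    final int minor;
    final int micro;
    final String qualifier;

    private OsgiVersion(int major, int minor, int micro, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
        this.qualifier = qualifier;
    }

    static OsgiVersion parse(String text) {
        String[] parts = text.trim().split("\\.", 4);
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        int micro = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        String qualifier = parts.length > 3 ? parts[3] : "";
        return new OsgiVersion(major, minor, micro, qualifier);
    }

    @Override
    public int compareTo(OsgiVersion other) {
        int c = Integer.compare(major, other.major);
        if (c == 0) c = Integer.compare(minor, other.minor);
        if (c == 0) c = Integer.compare(micro, other.micro);
        if (c == 0) c = qualifier.compareTo(other.qualifier); // qualifiers compare as plain strings
        return c;
    }
}

// Range as written in manifest headers: "[low,high)", "(low,high]", etc., or a bare minimum version.
final class OsgiVersionRange {
    private final OsgiVersion low;
    private final boolean lowInclusive;
    private final OsgiVersion high; // null means no upper bound
    private final boolean highInclusive;

    private OsgiVersionRange(OsgiVersion low, boolean lowInclusive, OsgiVersion high, boolean highInclusive) {
        this.low = low;
        this.lowInclusive = lowInclusive;
        this.high = high;
        this.highInclusive = highInclusive;
    }

    static OsgiVersionRange parse(String text) {
        String s = text.trim();
        char first = s.charAt(0);
        if (first != '[' && first != '(') {
            // A bare version is a minimum: "1.2" means 1.2 or anything newer.
            return new OsgiVersionRange(OsgiVersion.parse(s), true, null, false);
        }
        char last = s.charAt(s.length() - 1);
        String[] ends = s.substring(1, s.length() - 1).split(",");
        if (ends.length != 2 || (last != ']' && last != ')')) {
            throw new IllegalArgumentException("malformed range: " + text);
        }
        return new OsgiVersionRange(OsgiVersion.parse(ends[0]), first == '[',
                OsgiVersion.parse(ends[1]), last == ']');
    }

    boolean includes(OsgiVersion v) {
        int vsLow = v.compareTo(low);
        if (vsLow < 0 || (vsLow == 0 && !lowInclusive)) {
            return false;
        }
        if (high == null) {
            return true;
        }
        int vsHigh = v.compareTo(high);
        return vsHigh < 0 || (vsHigh == 0 && highInclusive);
    }
}
```

A single dependency bump has to satisfy every range like this that mentions it, across every manifest in the product.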

The Strangler Fig pattern was critical here. I didn’t replace Ant with Maven. I built Maven alongside Ant. The old Ant build system remained exactly as it was, fully functional, building the same product the same way. The new Maven build grew in parallel, taking over modules one at a time. At any point during the migration, you could build the product with either system. Anything new went into Maven from the start. The Ant build only shrank as modules moved out; whatever remained in it never had to be modified.

This eliminated the scariest failure mode in a build migration: the moment where the old build is half-dismantled, the new build doesn’t quite work yet, and nobody can ship anything. That moment never happened.
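How parity between the two builds was verified isn’t described here; one plausible check, offered purely as an assumption, is to fingerprint every file each build produces and diff the two trees. `BuildParity` is a hypothetical helper. JAR files usually embed timestamps, so a byte-level comparison like this works best on unpacked bundle contents or on reproducible builds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical parity check between two build output directories.
final class BuildParity {
    private BuildParity() {}

    // Relative path (with '/' separators) -> SHA-256 hex digest, for every regular file under root.
    static Map<String, String> fingerprint(Path root) {
        Map<String, String> digests = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> files = paths.filter(Files::isRegularFile).iterator();
            while (files.hasNext()) {
                Path file = files.next();
                String relative = root.relativize(file).toString().replace('\\', '/');
                digests.put(relative, sha256(Files.readAllBytes(file)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return digests;
    }

    static String sha256(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must provide SHA-256", e);
        }
    }

    // Lists files missing from one build, or whose contents differ between the two builds.
    static List<String> differences(Path antOutput, Path mavenOutput) {
        Map<String, String> ant = fingerprint(antOutput);
        Map<String, String> maven = fingerprint(mavenOutput);
        Set<String> allPaths = new TreeSet<>(ant.keySet());
        allPaths.addAll(maven.keySet());
        List<String> diffs = new ArrayList<>();
        for (String path : allPaths) {
            String a = ant.get(path);
            String m = maven.get(path);
            if (a == null) {
                diffs.add("only in Maven output: " + path);
            } else if (m == null) {
                diffs.add("only in Ant output: " + path);
            } else if (!a.equals(m)) {
                diffs.add("contents differ: " + path);
            }
        }
        return diffs;
    }
}
```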

Reporting in React 18, alongside SWT. The biggest user-visible change, and a textbook Strangler Fig move. The old SWT reports were static: you saw what the engine produced. The new React reports are interactive. Users can sort by any column, change percentiles in a popup, adjust the criteria that define capacity, and explore the same run from several angles without re-running it. The old SWT reports are still in the desktop app for users who want them; the new interactive reports sit next to them in the same window.
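Recomputing a different percentile is cheap as long as the raw per-request timings are kept. Which percentile definition the product uses isn’t stated, so the sketch below assumes the nearest-rank method (other tools interpolate between neighboring samples and can report slightly different values); `Percentiles.nearestRank` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the nearest-rank percentile definition.
final class Percentiles {
    private Percentiles() {}

    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    // p must be in (0, 100]; the input array is not modified.
    static long nearestRank(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // Multiply before dividing so whole-number cases such as p=90, n=10 stay exact.
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

Under that assumption, changing the percentile in the report means re-running this over the stored samples, not re-running the load test.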

The same React code runs in two places: in the cloud, and embedded inside the desktop application via SWT’s browser widget. One codebase, two deployment targets. Only the UI changed. The engine that generates load, captures results, and computes statistics is the same code it was before. Reporting was the first module to move because that’s where customers spend the most time and where interactivity pays off fastest. The plan from here is to walk the rest of the UI the same way: replace each SWT piece with React, deploy it to both the desktop browser widget and the cloud, and end up with one UI codebase running in both places. The rest of the desktop UI was modernized visually in the meantime but kept on the SWT base.

This is the Strangler Fig pattern doing exactly what it’s supposed to do: old and new coexist, the old subsystem keeps working for customers who rely on it, and the new one ships on its own cadence. Nothing in the pattern requires you to replace the entire app for the move to be worthwhile. The cloud side uses JWT authentication; the desktop product still uses the license-key mechanism that existing customers already have deployed. The deploy scripts that push the product were updated for Maven; the product itself is not containerized.

AI-Assisted Development

Here’s the part that made this feasible as a one-person effort.

I used Claude Code throughout the modernization, combined with a multi-LLM spec-driven development process. The workflow: write a detailed specification for each migration task, get consensus from multiple AI models on the approach, then implement, with Claude Code handling the repetitive transformation work while I focused on the architectural decisions and integration points.

At the end: 1,294 commits, 791 automated tests, one developer.

The AI wrote a lot of the code. I’m not going to pretend otherwise. What I did was act as product engineer, designer, architect, and domain expert: deciding what should happen and how the pieces should fit together. What I didn’t do was sit and type out low-level code for an extremely complicated application. Claude Code did that at volume: updating hundreds of plugin descriptors to the new Eclipse RCP format, converting build configurations, migrating deprecated API calls to their modern equivalents, writing test scaffolding for untested legacy code.

This doesn’t mean a non-programmer could have done it. You still need an experienced understanding of how a complex application is put together: the build system, the test suite, the deployment pipeline, the way modules depend on each other. Without that, you can’t tell whether what the AI just produced is correct, or whether it’s going to wedge something three steps from now. The experience that paid off the most, though, was organizing large software projects.

AI, left to itself, does not produce a self-consistent codebase. It doesn’t group similar functionality together. It doesn’t have a clean sense of what belongs in which module or package. Without coaching, the output trends toward spaghetti. My job was to specify the structure up front, notice when a new piece belonged somewhere different than where Claude Code had put it, and keep the source tree coherent as it grew. That’s where the time saved by not typing actually went, alongside the judgment work a human still has to do: deciding what to migrate versus rewrite, resolving architectural conflicts between the old and new frameworks, and testing the edge cases only a domain expert would think to check.

The Multi-LLM Spec Process

For non-trivial migration decisions, I used multiple AI models in a consensus process. Write a spec describing the current state and the target state, send it to several models, and compare their recommendations. Where they agree, proceed with confidence. Where they disagree, you’ve found the interesting part of the problem.

This caught several issues that a single model would have missed. Different models have different blind spots, and a migration with this many interacting dimensions has enough surface area to run into every one of them.
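The comparison step itself is simple enough to sketch. This version assumes each model’s recommendation for a question has already been reduced to a short option label (such as “upgrade” or “replace”); real model output is free text, and boiling it down is where most of the effort would go. `SpecConsensus` and the model names in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical consensus check over per-model recommendation labels.
final class SpecConsensus {
    private SpecConsensus() {}

    // Returns the shared recommendation if every model gave the same label (ignoring case and
    // surrounding whitespace), or empty if they disagree or there are no answers.
    static Optional<String> consensus(Map<String, String> recommendationByModel) {
        Set<String> distinct = new HashSet<>();
        for (String label : recommendationByModel.values()) {
            distinct.add(label.trim().toLowerCase(Locale.ROOT));
        }
        return distinct.size() == 1 ? Optional.of(distinct.iterator().next()) : Optional.empty();
    }

    // Questions without consensus, in the map's iteration order: where a human decision is needed.
    static List<String> disagreements(Map<String, Map<String, String>> recommendationsByQuestion) {
        List<String> open = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : recommendationsByQuestion.entrySet()) {
            if (!consensus(entry.getValue()).isPresent()) {
                open.add(entry.getKey());
            }
        }
        return open;
    }
}
```

Questions where `consensus` returns a value can proceed; the list from `disagreements` is the interesting part of the problem.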

Lessons for Engineering Managers

If you’re evaluating a similar modernization, here’s what I’d want you to know.

Strangle, don’t rewrite. I know I already said this. I’m saying it again because the temptation will come back every time you hit a particularly gnarly piece of legacy code. The gnarly pieces are gnarly because they handle real complexity. A rewrite won’t make that complexity disappear. The Strangler Fig approach (old and new coexisting, new growing gradually) is how Ant-to-Maven stayed shippable throughout, and it’s how the React reporting module now runs alongside the existing SWT reports. You don’t have to strangle everything for the pattern to pay off. Each slice that moves cleanly is a win on its own.

AI-assisted development changes the staffing equation. This modernization would have required a team of 3 to 5 developers using traditional methods, or it would have taken one developer several years. With Claude Code and the multi-LLM process, one developer completed it in a fraction of that time. That doesn’t mean you should cut your team. It means one developer can now tackle work that previously required a team.

End-to-end integration tests are non-negotiable. 791 tests isn’t just a metric. Unit tests tell you the math is right. Integration tests tell you the product works. The distinction matters enormously during a migration, because a unit test can pass while the feature it’s testing is completely broken in ways the unit test doesn’t cover.

The integration test suite uses real test cases against real, complex web applications. It actually configures and replays load test sessions as part of the automated suite. Not mocked HTTP, not simulated responses: the real product doing real work. That’s the only thing that gives you confidence beyond “well, the math in the unit tests is right.” When you’re migrating a half-million-line codebase across four dimensions simultaneously, “the unit tests pass” is necessary but nowhere near sufficient. You need to know that the most important functionality the user actually sees still works end to end.
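The shape of such a test can be sketched with nothing but the JDK: replay a recorded sequence of requests over real sockets and compare each response with what the recording saw. This is a deliberately tiny stand-in. The product’s own suite drives the real load tester against real applications, while `RecordedStep` and `SessionReplayer` here are hypothetical and only check status codes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one request from a recorded session and the status code the recording observed.
final class RecordedStep {
    final String method;
    final String path;
    final int expectedStatus;

    RecordedStep(String method, String path, int expectedStatus) {
        this.method = method;
        this.path = path;
        this.expectedStatus = expectedStatus;
    }
}

// Hypothetical replayer: real HTTP over real sockets, no mocks.
final class SessionReplayer {
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Sends every step to baseUri and returns one message per mismatch (empty list = pass).
    List<String> replay(URI baseUri, List<RecordedStep> steps) {
        List<String> failures = new ArrayList<>();
        for (RecordedStep step : steps) {
            String label = step.method + " " + step.path;
            HttpRequest request = HttpRequest.newBuilder(baseUri.resolve(step.path))
                    .method(step.method, HttpRequest.BodyPublishers.noBody())
                    .timeout(Duration.ofSeconds(30))
                    .build();
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() != step.expectedStatus) {
                    failures.add(label + ": expected " + step.expectedStatus
                            + ", got " + response.statusCode());
                }
            } catch (IOException e) {
                failures.add(label + ": " + e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop replaying
                failures.add(label + ": interrupted");
                break;
            }
        }
        return failures;
    }
}
```

A real suite checks far more than status codes, but the principle is the same: nothing between the test and the server is mocked.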

Keep shipping throughout. The constraint that the product had to remain shippable at every commit forced disciplined, incremental work. It also meant clients got improvements continuously rather than waiting for a big-bang release that might never come.

What Came Next

The modernization turned a codebase I was maintaining into one I could build on. That’s the real payoff: not cleaner code for its own sake, but the foundation for what came next. I built two agentic AI systems on top of the updated stack, one that configures test cases through natural language conversation using 75 MCP tools, and another that investigates load test data and generates professional performance reports. That work is covered in Building Agentic AI Systems for Web Performance Load Tester 7.0.