Michael Czeiszperger | Software Architect

MICHAEL CZEISZPERGER

Software Architect | Agentic AI Architecture | AI-Assisted Development

I've been building developer tools for 25 years, most recently modernizing a 550,000-line Java codebase and integrating AI agents directly into the product: 75 MCP tools, agentic test configuration, and automated report generation. The modernization itself was done using Claude Code and multi-LLM spec-driven development. Clients have included the US Bureau of Economic Analysis, the New York Marathon, the US Census, the Governments of Canada and France, and scores of others. I got my start doing R&D for Yamaha Music, then worked on Solaris at Sun Microsystems, and have been building and shipping software ever since.

Professional Work

At Web Performance, I've served enterprise clients including the US Census, the Canadian government, and New York Road Runners.

Legacy Modernization

Modernizing a 550K-Line Codebase

As sole developer, modernized a 25-year-old suite of 550,000 lines of hand-written Java using Claude Code and multi-LLM spec-driven development. The migration covered Java 6 to 11, Eclipse RCP 3.6 to 4.19, SWT to React 18, and Ant to Maven.

Java 6→11, Eclipse RCP 3.6→4.19, Ant→Maven, SWT→React 18
550K lines modernized as sole developer
1,294 commits and more than 3,000 automated tests
React 18 dashboard with JWT auth and containerized CI/CD
Multi-model AI consensus for spec-driven migration

Read the Blog Post → WebPerformance.com View Live Demo

Agentic AI Systems

Agentic AI for Load Testing

Built agentic AI systems for Web Performance Load Tester 7.0: a test configuration agent with 75 MCP tools and a 3-stage router, agentic report generation with an AI-driven investigation loop, and structured RAG using triage hierarchy, recipes, and live data retrieval.

75 MCP tools for autonomous test configuration and analysis
3-stage router: UI context, intent classification, tool selection
Agentic report generation with AI-driven investigation loop
Structured RAG: diagnostic triage, recipe system, live data retrieval
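
One way to picture the 3-stage router listed above is as three successive filters over the tool catalog. This is a minimal sketch only: the tool names, the `Tool` record, and the keyword rule standing in for intent classification are all hypothetical, not Load Tester's actual code, which presumably uses a model call at the classification stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    ui_context: str   # hypothetical: the screen this tool applies to
    intent: str       # hypothetical: "configure" or "analyze"
    keywords: tuple


# Hypothetical four-tool catalog; the real product exposes 75 tools.
CATALOG = [
    Tool("set_user_count", "test_config", "configure", ("users", "load")),
    Tool("set_ramp_rate", "test_config", "configure", ("ramp",)),
    Tool("find_slow_pages", "report", "analyze", ("slow", "page")),
    Tool("summarize_errors", "report", "analyze", ("error", "failure")),
]


def classify_intent(request: str) -> str:
    """Stage 2 stand-in: a keyword rule in place of a model call."""
    words = request.lower().split()
    verbs = ("set", "change", "configure")
    return "configure" if any(w in verbs for w in words) else "analyze"


def route(request: str, ui_context: str) -> list[str]:
    # Stage 1: keep only tools valid for the current UI context.
    candidates = [t for t in CATALOG if t.ui_context == ui_context]
    # Stage 2: narrow by the classified intent.
    intent = classify_intent(request)
    candidates = [t for t in candidates if t.intent == intent]
    # Stage 3: select the tools whose keywords appear in the request.
    text = request.lower()
    return [t.name for t in candidates if any(k in text for k in t.keywords)]
```

The point of staging is that each step shrinks the candidate set before the next, so the final selection works on a handful of tools rather than the whole catalog.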

Read the Blog Post →

Multi-LLM Workflow

Multi-LLM Spec-Driven Development

A three-pillar workflow for agentic coding: spec-driven planning, multi-LLM review, and hand-curated tests with golden datasets. Each pillar catches a class of failure the other two can't.

Claude Code drives; OpenAI, Gemini, and DeepSeek review every stage in debate mode
Oracle analysis across 39 models: +13.4 pp on SWE-Bench Verified, +21.0 pp on SWE-Bench Pro
Hand-curated golden datasets catch the bugs semantic review can't
Each pillar catches a different class of failure: intent, implementation, output

Read the Blog Post →

Personal Projects

Featured

Theme Park Analytics Platform

Tracks ride downtime and wait times across major US theme parks and calculates "shame scores" for unreliable attractions. Built with Claude Code, GitHub SpecKit, and multi-LLM consensus. Backed by 935+ tests, pre-flight validation, an automatic snapshot before each deployment, and pre-commit validation.

Python/Flask with SQLAlchemy Core, MySQL with partitioned time-series tables and atomic table swaps
Three-layer caching: browser (5-min TTL), server-side thread-safe cache, pre-aggregated ranking tables
Scikit-survival ML models for breakdown prediction; Levenshtein fuzzy matching for data reconciliation
935+ tests: unit, integration, contract (OpenAPI), property-based (Hypothesis), and golden-data regression

GitHub Repository →

Mobile App

WalkOnAlerts

iPhone/Android app that sends real-time notifications when theme park rides reopen after breakdowns, helping visitors catch walk-on opportunities before crowds return. Built on breakdown data from 86,500+ analyzed events across 31 US theme parks.

React Native / Expo with TypeScript; Python/Flask REST API backend
Scikit-survival gradient-boosted models with 23-feature vectors for breakdown duration prediction
Tiered push notifications via Expo Push API with deduplication and batching
Sub-60-second reopening detection across 31 US parks
Data-driven stay-or-leave recommendations using survival curves at 97 time-point intervals
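
The stay-or-leave idea can be illustrated with the standard conditional-survival identity: if S(t) is the probability a ride is still down after t minutes, then the chance it reopens within the next w minutes, given it has already been down for t minutes, is 1 - S(t + w) / S(t). The sketch below assumes a step-function curve; the sample curve, the five-point time grid, and the 0.5 threshold are invented for illustration and are not the app's model output or decision rule.

```python
from bisect import bisect_right


def survival_at(times, surv, t):
    """Step-function lookup: S(t) is 1.0 before the first time point,
    otherwise the value at the latest time point <= t."""
    i = bisect_right(times, t)
    return 1.0 if i == 0 else surv[i - 1]


def prob_reopen_within(times, surv, elapsed, wait):
    """P(reopens within `wait` more minutes | still down at `elapsed`)."""
    s_now = survival_at(times, surv, elapsed)
    if s_now == 0.0:
        return 1.0  # the curve says the ride should already be open
    return 1.0 - survival_at(times, surv, elapsed + wait) / s_now


def recommend(times, surv, elapsed, wait, threshold=0.5):
    """Hypothetical rule: stay if reopening is at least `threshold` likely."""
    p = prob_reopen_within(times, surv, elapsed, wait)
    return "stay" if p >= threshold else "leave"


# Invented example curve: minutes since breakdown -> P(still down).
TIMES = [10, 20, 30, 60, 120]
SURV = [0.9, 0.7, 0.5, 0.2, 0.05]
```

With this made-up curve, a visitor 20 minutes into a breakdown who is willing to wait 40 more would compare S(60) against S(20) rather than reading S(60) on its own.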

WalkOnAlerts.com →

Agentic AI/MCP

AI Financial Analysis System

Natural-language financial reporting for Copilot Money. Reverse-engineered the GraphQL API and built a Chrome extension with 12 specialized tools that answer complex analytical questions about your spending.

TypeScript/React 18/Vite Chrome Extension (Manifest v3) with MCP tool pattern in service worker
12 specialized tools with branded types for compile-time USD safety
Deterministic math via a dedicated calculator tool; the AI never touches arithmetic
Levenshtein-distance fuzzy matching with 4-strategy confidence scoring for category resolution
Multi-model AI (Claude + GPT fallback) with autonomous error recovery
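
As a rough illustration of Levenshtein-based category resolution with tiered confidence, here is a sketch in Python (the extension itself is TypeScript). The four strategies shown, their order, and the confidence values attached to them are assumptions; the source does not say which four strategies the extension uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, kept to two rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def resolve_category(query, categories, min_confidence=0.6):
    """Return (category, confidence), or (None, 0.0) if nothing is close."""
    q = query.strip()
    # Strategy 1 (assumed): exact match.
    if q in categories:
        return q, 1.0
    # Strategy 2 (assumed): case-insensitive match.
    lowered = {c.lower(): c for c in categories}
    if q.lower() in lowered:
        return lowered[q.lower()], 0.95
    # Strategy 3 (assumed): unambiguous prefix.
    prefixed = [c for c in categories if c.lower().startswith(q.lower())]
    if q and len(prefixed) == 1:
        return prefixed[0], 0.85
    # Strategy 4: edit distance normalized to a 0..1 similarity.
    best, best_score = None, 0.0
    for c in categories:
        longest = max(len(q), len(c)) or 1
        score = 1.0 - levenshtein(q.lower(), c.lower()) / longest
        if score > best_score:
            best, best_score = c, score
    if best_score >= min_confidence:
        return best, best_score
    return None, 0.0
```

Returning a confidence alongside the match lets the caller decide whether to proceed silently or ask the user to confirm the category.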

Read the Blog Post →

Open Source

ScrollKit

A niche labor of love: a CircuitPython library for driving scrolling LED displays showing real-time data from anywhere. Powers the physical LED product sold at ThemeParkWaits.com.

Three-process async architecture: display (~50 FPS), data updates, and embedded HTTP server
Pygame-based LED matrix simulator with factory pattern for platform abstraction
OTA live code updates via GitHub integration with circuit-breaker restart protection
Embedded web UI (Adafruit HTTP Server) for WiFi config, park selection, and brightness control
Pre-allocated fixed-size queues and explicit GC to manage ~200KB ESP32 RAM
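
A pre-allocated fixed-size queue of the kind mentioned above can be sketched as a ring buffer whose backing list is created once, so steady-state use allocates no new list storage. The class name, the overwrite-oldest policy when full, and the `drain` helper are assumptions for illustration, not ScrollKit's actual API.

```python
import gc


class FixedQueue:
    """Ring buffer whose storage is allocated once, up front."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._capacity = capacity
        self._head = 0   # index of the oldest item
        self._size = 0

    def put(self, item):
        """Add an item; when full, overwrite the oldest (assumed policy)."""
        tail = (self._head + self._size) % self._capacity
        self._buf[tail] = item
        if self._size == self._capacity:
            self._head = (self._head + 1) % self._capacity
        else:
            self._size += 1

    def get(self):
        """Remove and return the oldest item, or None if empty."""
        if self._size == 0:
            return None
        item = self._buf[self._head]
        self._buf[self._head] = None  # drop the reference so it can be freed
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return item

    def __len__(self):
        return self._size


def drain(queue, handle):
    """Process everything queued, then collect garbage at a known point."""
    count = 0
    while len(queue):
        handle(queue.get())
        count += 1
    gc.collect()
    return count
```

Calling `gc.collect()` at a chosen point, rather than letting collection trigger mid-frame, is one way to keep pauses away from the display loop on a memory-constrained board.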

See It In Action → GitHub

Theme Park Analytics Platform

I've got a bunch of software ideas that have sat on the back burner because each would have taken weeks of full-time work, and who has that much spare time? When I came across GitHub SpecKit a few months ago, I wanted to test it on a complicated personal project before risking it at work, and this is one I've been wanting to build for ages. The twist is that I integrated the Zen MCP server into the dev process, so it uses multi-LLM consensus for design and implementation details.

"Theme Park Analytics Platform" tracks ride downtime and wait times across major US theme parks and calculates "shame scores" for unreliable attractions. There are plenty of sites that track wait times; the point here is to provide industry-wide insights and analysis. The interesting part isn't the app itself; it's the development guardrails I've built into the Claude Code workflow.

Tech Stack

Python 3.11+ / Flask 3.0+ with SQLAlchemy 2.0+ Core (no ORM, raw SQL with centralized helper functions)
MySQL/MariaDB with partitioned time-series tables, CTEs, window functions, and atomic table swaps (RENAME) for zero-downtime live ranking updates
Three-layer caching: browser-side (5-min TTL with prefetch), thread-safe server-side QueryCache (MD5-keyed), and pre-aggregated ranking tables (<10ms response)
Scikit-survival gradient-boosted models for breakdown duration prediction with 23-feature vectors
Pydantic 2.5+ for validation, python-Levenshtein for fuzzy data reconciliation, Hypothesis for property-based testing
Weather integration (OpenMeteo API) with tenacity retry logic; holidays library for feature extraction
Chart.js 4.4 frontend; Gunicorn behind Apache reverse proxy; AWS SSM Parameter Store for production config
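
To make the server-side caching layer concrete, here is a minimal sketch of a thread-safe, MD5-keyed query cache with expiry. Only the name `QueryCache` and the MD5 keying come from the list above; the method names, the key format, the injectable clock, and the 300-second default TTL are assumptions, since the source gives a TTL only for the browser layer.

```python
import hashlib
import threading
import time


class QueryCache:
    """Thread-safe cache keyed by an MD5 digest of SQL text plus parameters."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds          # assumed default, not from the source
        self._clock = clock              # injectable so tests can control time
        self._lock = threading.Lock()
        self._entries = {}               # md5 hex digest -> (expires_at, value)

    @staticmethod
    def make_key(sql, params=()):
        raw = sql + "|" + repr(tuple(params))
        return hashlib.md5(raw.encode("utf-8")).hexdigest()

    def get_or_compute(self, sql, params, compute):
        key = self.make_key(sql, params)
        now = self._clock()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]
        # Run the query outside the lock so slow queries do not block readers.
        value = compute()
        with self._lock:
            self._entries[key] = (now + self._ttl, value)
        return value
```

Computing outside the lock means two threads that miss at the same moment may both run the query; that trade-off keeps the lock short and is harmless when results are idempotent.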

Testing Regime (935+ tests)

Layered testing: Unit tests (~800, mocked DB, <5 sec total) + Integration tests (~135, real MySQL with transaction rollback)
Contract tests validating OpenAPI schemas
Golden data tests with hand-computed expected values for regression catching
Replicated dev database for new feature development and integration tests
TDD enforced: Red-green-refactor or it doesn't ship
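
A golden-data test, in the sense used above, pins a calculation to expected values worked out by hand on a dataset small enough to check on paper. The sketch below uses a deliberately simple, hypothetical metric (`downtime_percent`); it is not the site's shame-score formula, and the snapshot fields are invented for illustration.

```python
def snap(park_open, ride_open):
    """Build one hypothetical status snapshot."""
    return {"park_open": park_open, "ride_open": ride_open}


def downtime_percent(snapshots):
    """Percent of snapshots, taken while the park was open, in which the
    ride was down. Hypothetical metric for illustration only."""
    open_snaps = [s for s in snapshots if s["park_open"]]
    if not open_snaps:
        return 0.0
    down = sum(1 for s in open_snaps if not s["ride_open"])
    return round(100.0 * down / len(open_snaps), 1)


# Golden dataset: expected values computed by hand, not by the code under test.
GOLDEN_CASES = [
    ("down in 1 of 4 open snapshots",
     [snap(True, False), snap(True, True), snap(True, True), snap(True, True)],
     25.0),
    ("closed-park snapshots are ignored",
     [snap(False, False), snap(False, False), snap(True, False), snap(True, True)],
     50.0),
    ("park never open",
     [snap(False, False)],
     0.0),
]


def test_downtime_percent_matches_golden_values():
    for description, snapshots, expected in GOLDEN_CASES:
        assert downtime_percent(snapshots) == expected, description
```

Because the expected numbers are derived independently of the implementation, a refactor that quietly changes the arithmetic fails the test even when the code still looks plausible in review.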

Deployment Safeguards

Pre-flight validation (syntax, imports, deps)
Automatic snapshot before deploy with one-command rollback
Pre-service validation runs before gunicorn starts
Smoke tests with automatic rollback on failure
Pre-commit validation via Zen AI before any commit
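
The ordering of these safeguards can be sketched as a single function: validate before touching anything, snapshot, then roll back if any later step fails. Every name below is hypothetical, and each step is passed in as a callable so the sketch stays independent of how the real deploy scripts are written.

```python
class DeployError(Exception):
    """Raised when a deployment stage fails."""


def deploy(preflight, take_snapshot, install, pre_service_check,
           start_service, smoke_test, rollback, log=print):
    """Run the deploy stages in order; restore the snapshot on any failure
    that happens after the snapshot was taken."""
    # Pre-flight runs first, so a failure here leaves the server untouched.
    if not preflight():
        raise DeployError("pre-flight validation failed; nothing was changed")
    snapshot_id = take_snapshot()
    try:
        install()
        if not pre_service_check():
            raise DeployError("pre-service validation failed")
        start_service()
        if not smoke_test():
            raise DeployError("smoke tests failed")
    except Exception as exc:
        log(f"deploy failed ({exc}); rolling back to {snapshot_id}")
        rollback(snapshot_id)
        raise
    return snapshot_id
```

Re-raising after the rollback keeps the failure visible to whatever invoked the deploy, so an automatic recovery is never mistaken for a successful release.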

Design

The front-end was iteratively designed with Claude Desktop. The original prompt was to create a website theme based on the work of Disney legend Mary Blair, best known for creating "It's a Small World" and widely considered the most influential of Disney's conceptual artists.

What's Next

The next thing to add is pattern recognition, once there's more than a couple of weeks of data. I'm interested in seeing whether there's any relationship between weather and reliability.

Screenshots

Main Dashboard

Charts showing park performance over time

Performance Charts

Wait Times Heat Map

Shame Scores by Park

Performance Metrics


Enterprise Clients

Government & Public Sector

Fortune 500 & Major Corporations

Consulting & Professional Services

Healthcare, Education & Nonprofit

Technology & Global Services
