
ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

AI News Desk

Hugging Face

Jun 30, 2026


Modernizing enterprise applications is one of the largest and most expensive software engineering activities organizations undertake. Teams migrate applications across frameworks to improve maintainability, cloud readiness, developer productivity, and access to modern capabilities.

Recent advances in coding agents have sparked excitement around AI-assisted modernization. But an important question remains:

Can AI agents reliably modernize real-world enterprise applications?

Existing software engineering benchmarks have demonstrated impressive progress in bug fixing and code generation, but framework migration presents a fundamentally different challenge. Success requires not only translating code, but also preserving behavior, adapting build systems, and navigating runtime dependencies.

To address this gap, we introduce ScarfBench (Self-Contained Application Refactoring Benchmark), an open benchmark for evaluating AI agents on cross-framework migration tasks in Enterprise Java.

ScarfBench focuses on migrations across three major Java ecosystems: Spring, Jakarta EE, and Quarkus.

Unlike traditional benchmarks that compare generated code against reference implementations, ScarfBench evaluates whether migrated applications actually build, deploy, and preserve behavior.
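The build, deploy, and behavior criteria can be read as a sequence of gates, where a failure at one stage makes the later stages meaningless. The following is a minimal sketch of that idea, not ScarfBench's actual harness: the class name, the Outcome values, and the three boolean checks are hypothetical stand-ins for real build, deployment, and behavioral test steps.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of staged evaluation; names are illustrative assumptions.
class StagedMigrationCheck {

    // The first stage that fails determines the outcome.
    enum Outcome { BUILD_FAILED, DEPLOY_FAILED, BEHAVIOR_CHANGED, PASSED }

    // Each supplier stands in for a real step (compiling the project,
    // starting the application, running behavioral tests against it).
    static Outcome evaluate(BooleanSupplier builds,
                            BooleanSupplier deploys,
                            BooleanSupplier behaviorPreserved) {
        if (!builds.getAsBoolean()) {
            return Outcome.BUILD_FAILED;      // later stages are skipped
        }
        if (!deploys.getAsBoolean()) {
            return Outcome.DEPLOY_FAILED;     // behavior cannot be checked
        }
        if (!behaviorPreserved.getAsBoolean()) {
            return Outcome.BEHAVIOR_CHANGED;
        }
        return Outcome.PASSED;
    }

    public static void main(String[] args) {
        // A migration that compiles but fails to start.
        Outcome outcome = evaluate(() -> true, () -> false, () -> true);
        System.out.println(outcome); // prints DEPLOY_FAILED
    }
}
```

Under this reading, a migration whose source looks plausible but does not start is still counted as a failure, which a comparison against reference code alone would not reveal.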

Framework migration is much more than replacing annotations.

A simple repository migration can require changes across dependency injection, persistence configuration, queries, and framework descriptors. Small mistakes in any of these pieces can prevent successful deployment.
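To make the point concrete, here is a rough sketch of what a purely textual annotation swap does and does not cover. The two-entry mapping and the list of leftover markers are illustrative assumptions chosen for this example, not a complete or authoritative Spring-to-Jakarta mapping, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: naive annotation substitution plus a leftover check.
class NaiveAnnotationRewrite {

    // Illustrative one-to-one swaps (assumed for this example only).
    static final Map<String, String> SWAPS = new LinkedHashMap<>();
    static {
        SWAPS.put("@Autowired", "@Inject");
        SWAPS.put("@Service", "@ApplicationScoped");
    }

    // Spring Data constructs that this sketch has no textual replacement for.
    static final List<String> NEEDS_REDESIGN = List.of("JpaRepository", "@Query");

    // Apply every swap as a plain string replacement.
    static String rewrite(String source) {
        String out = source;
        for (Map.Entry<String, String> swap : SWAPS.entrySet()) {
            out = out.replace(swap.getKey(), swap.getValue());
        }
        return out;
    }

    // Report markers that remain and therefore need manual migration work.
    static List<String> leftovers(String source) {
        List<String> found = new ArrayList<>();
        for (String marker : NEEDS_REDESIGN) {
            if (source.contains(marker)) {
                found.add(marker);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String spring = String.join("\n",
            "@Service",
            "public class OrderService {",
            "    @Autowired private OrderRepository orders;",
            "}",
            "interface OrderRepository extends JpaRepository<Order, Long> { }");
        String rewritten = rewrite(spring);
        System.out.println(rewritten);
        System.out.println("Still needs manual work: " + leftovers(rewritten));
    }
}
```

In this sketch the injection annotations are swapped, but the repository interface still depends on Spring Data, and nothing here touches imports, build files, or persistence descriptors. Those are the pieces the paragraph above describes as able to block deployment.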

Figure: Spring → Jakarta Migration Example

Framework migration requires translating framework semantics, not just source code.

ScarfBench provides a systematic way to evaluate AI agents on enterprise Java framework migration tasks.

Checking that a migrated application builds, deploys, and preserves behavior provides a much more realistic measure of modernization quality.

ScarfBench includes both focused migration tasks and whole-application migrations.

Starting from a JSR-based taxonomy of enterprise Java features, expert-written migrations provide verified implementations across Spring, Jakarta EE, and Quarkus.

We evaluated several state-of-the-art coding agents on ScarfBench.

Source: Hugging Face