SPLASH 2026
Sat 3 – Fri 9 October 2026, Oakland, California, United States
co-located with SPLASH/ISSTA 2026

Large language models (LLMs) for code editing have achieved remarkable progress, yet recent empirical studies reveal a fundamental disconnect between technical accuracy and developer productivity. Despite the models' strong benchmark performance, developers complete tasks 19% slower when using AI assistance, and 68.81% of recommendations disrupt their mental flow. This misalignment stems from the use of static commit snapshots that lack temporal information, causing models to optimize for end results rather than for the incremental, context-sensitive steps that align with developers' natural reasoning process. To bridge this gap, we present EditFlow, a comprehensive framework for benchmarking and optimizing subsequent code edit recommendation systems through the reconstruction of developer editing flows. EditFlow addresses three key challenges. First, collecting edit-order data is inherently difficult: manual annotation introduces prohibitive overhead, while development logs capture only single trajectories instead of all plausible editing flows. Second, benchmarking recommendation performance during developers' ongoing editing requires a digital-twin-like simulation that can faithfully reproduce editing behaviors and interoperate with heterogeneous recommendation systems. Third, existing systems vary drastically in scale and architecture, posing challenges for developing a unified optimization strategy that endows all models with mental-flow awareness regardless of design or capability.
To overcome these challenges, EditFlow integrates three tightly coupled components: (1) a prompt auto-tuning mechanism that learns to infer the relative order between two edits from a small annotated dataset, (2) a digital twin environment that replays reconstructed edit sequences to simulate developers' original editing process, and (3) a unified optimization strategy (EditFlow) that evaluates flow coherence between user edits and recommendations, filtering out flow-breaking suggestions. Evaluations on 100 annotated and 500 industry commits show that EditFlow improves order reconstruction accuracy by 63.81%, reduces flow violations by over 75%, and boosts recommendation precision by 90.70%. A user study with 32 developers further demonstrates 25.11% faster task completion and significantly higher perceived recommendation quality. To our knowledge, EditFlow is the first framework to evaluate and optimize code edit recommendation systems from the perspective of developers' mental flow, establishing flow-awareness as a new dimension for advancing human-AI code collaboration.
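Component (1) can be pictured as turning pairwise order judgments into a full editing flow. The sketch below is a minimal illustration, not the paper's implementation: the `infer_order` comparator is a hypothetical stand-in (a simple dependency heuristic) for the auto-tuned LLM prompt, and the edit records and their `depends_on` fields are invented for the example.

```python
from functools import cmp_to_key

# Hypothetical edits extracted from one commit; "depends_on" is an
# invented field standing in for whatever signal the learned comparator
# would actually consume (diff hunks, surrounding context, etc.).
edits = [
    {"id": "call_site", "depends_on": {"signature"}},
    {"id": "signature", "depends_on": set()},
    {"id": "test_update", "depends_on": {"call_site"}},
]

def infer_order(a, b):
    """Predict the relative order of two edits.

    Returns -1 if a plausibly precedes b, 1 if b precedes a, 0 if
    either order is fine. Stand-in for the auto-tuned prompt: an edit
    that another edit depends on should come first.
    """
    if a["id"] in b["depends_on"]:
        return -1
    if b["id"] in a["depends_on"]:
        return 1
    return 0

# Reconstruct one plausible editing flow from the pairwise judgments.
# (A sort assumes the judgments are mutually consistent; with noisy
# comparisons a topological or rank-aggregation step would be needed.)
flow = sorted(edits, key=cmp_to_key(infer_order))
print([e["id"] for e in flow])  # -> ['signature', 'call_site', 'test_update']
```

The point of the sketch is the decomposition: a small annotated dataset only has to teach the model *pairwise* order, and the full sequence is then assembled from those local judgments.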