Process-Centric Analysis of Agentic Software Systems
Agentic systems are modern software systems: they consist of orchestrated modules, expose interfaces, and are deployed in software pipelines. Unlike conventional programs, their executions (trajectories) are inherently stochastic and adapt to the problem being solved. Evaluation of such systems is often outcome-centric, judging performance by success or failure at the final step. This narrow focus fails to explain how agents reason, plan, act, or change their strategies. Inspired by the structured representation of conventional software systems as graphs, we introduce Graphectory to systematically encode the temporal and semantic relations in such systems. Graphectory facilitates the design of process-centric metrics and analyses to assess the quality of agentic workflows.
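To make the idea of a graph-encoded trajectory concrete, the following minimal sketch shows one way a sequence of agent actions could be turned into a graph and summarized with simple process-centric metrics. It is an illustration under stated assumptions, not Graphectory's actual definition: the `TrajectoryGraph` class, the `build_graph` and `process_metrics` functions, the action labels, and the choice of metrics (distinct nodes, distinct temporal edges, revisits) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrajectoryGraph:
    """Hypothetical graph encoding of one agent trajectory (assumption, not the paper's definition)."""
    nodes: set = field(default_factory=set)
    # Maps a temporal edge (source action, next action) to how often it was traversed.
    edges: Counter = field(default_factory=Counter)


def build_graph(actions):
    """Create one node per distinct action and one edge per consecutive pair of actions."""
    graph = TrajectoryGraph()
    graph.nodes.update(actions)
    for src, dst in zip(actions, actions[1:]):
        graph.edges[(src, dst)] += 1
    return graph


def process_metrics(actions):
    """Compute illustrative process-centric metrics for a trajectory."""
    graph = build_graph(actions)
    seen = set()
    revisits = 0
    for action in actions:
        if action in seen:
            revisits += 1  # the agent returned to an action it had already taken
        seen.add(action)
    return {
        "steps": len(actions),
        "nodes": len(graph.nodes),
        "edges": len(graph.edges),
        "revisits": revisits,
    }


# Hypothetical trajectory: the agent edits and tests twice before submitting.
trajectory = ["search", "open", "edit", "test", "edit", "test", "submit"]
metrics = process_metrics(trajectory)
print(metrics)
```

In this toy encoding, a trajectory with many revisits relative to its number of distinct nodes would indicate repetition or backtracking, which an outcome-only evaluation would not reveal.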
Using Graphectory, we automatically analyze 4000 trajectories of two dominant agentic programming workflows, namely SWE-agent and OpenHands, combined with four backbone Large Language Models (LLMs), attempting to resolve SWE-bench Verified issues. Our fully automated analyses (completed within four minutes) reveal that: (1) agents using richer prompts or stronger LLMs exhibit more complex Graphectory, reflecting deeper exploration, broader context gathering, and more thorough validation before patch submission; (2) agents’ problem-solving strategies vary with both problem difficulty and the underlying LLM: for resolved issues, the strategies often follow coherent localization–patching–validation steps, while for unresolved ones they exhibit chaotic, repetitive, or backtracking behaviors; and (3) even when successful, agentic programming systems often display inefficient processes, leading to unnecessarily prolonged trajectories.
We also implement a novel technique for real-time construction and analysis of Graphectory and Langutory during the agent’s execution to flag trajectory issues. Upon detecting such an issue, the technique notifies the agent with a diagnostic message and, when applicable, rolls back the trajectory. The experimental results show that online monitoring and process-centric analysis, when accompanied by proper interventions, can improve resolution rates by 6.9% to 23.5% across different models for problematic instances, while significantly shortening trajectories with near-zero overhead.
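The detect, notify, and roll back loop described above can be sketched as follows. This is a simplified illustration only: the `OnlineMonitor` class, its `max_repeats` threshold, the "same action repeated consecutively" rule, and the wording of the diagnostic message are hypothetical stand-ins, not the detection rules or interventions used in the work itself.

```python
class OnlineMonitor:
    """Hypothetical online monitor that watches an agent's actions as they occur."""

    def __init__(self, max_repeats=3):
        # Assumed threshold: this many identical consecutive actions counts as an issue.
        self.max_repeats = max_repeats
        self.history = []

    def observe(self, action):
        """Record one action; return a diagnostic message if an issue is detected, else None."""
        self.history.append(action)
        tail = self.history[-self.max_repeats:]
        if len(tail) == self.max_repeats and len(set(tail)) == 1:
            # Roll back: keep only the first occurrence of the repeated run.
            keep = len(self.history) - self.max_repeats + 1
            self.history = self.history[:keep]
            return (
                f"Diagnostic: action '{action}' was repeated "
                f"{self.max_repeats} times in a row; rolled back to step {keep}."
            )
        return None


# Hypothetical run: the agent gets stuck repeating the same edit.
monitor = OnlineMonitor(max_repeats=3)
messages = [monitor.observe(a) for a in ["search", "edit", "edit", "edit"]]
print(messages[-1])
print(monitor.history)
```

Because each call to `observe` inspects only the last few recorded actions, the per-step cost of this kind of check stays small, which is in the spirit of the near-zero overhead reported above.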