RandSet: Randomized Corpus Reduction for Fuzzing Seed Scheduling (SPLASH 2026 - OOPSLA)

Sat 3 - Fri 9 October 2026 Oakland, California, United States

co-located with SPLASH/ISSTA 2026

Who

Yuchong Xie, Kaikai Zhang, Yu Liu, Rundong Yang, Ping Chen, Shuai Wang, Dongdong She

Track

SPLASH 2026 OOPSLA

Abstract

Seed explosion is a fundamental problem in fuzzing seed scheduling. It occurs when a fuzzer maintains a corpus with a huge number of seeds and fails to choose a promising one. Existing seed scheduling work focuses on seed prioritization but still suffers from seed explosion because the seed corpus remains huge. We tackle seed explosion from a new perspective, corpus reduction, i.e., computing a subset of the seed corpus. Corpus reduction can eliminate redundant seeds and significantly reduce corpus size. However, it can lead to poor diversity in seed selection and severely degrade fuzzing performance. Moreover, effective corpus reduction incurs large runtime overhead, so in practice it is challenging to adopt corpus reduction in fuzzing seed scheduling. Prior techniques such as cull_queue, AFL-Cmin, and MinSet all suffer from poor seed diversity. AFL-Cmin and MinSet additionally incur prohibitive runtime overhead and are hence applicable only to the one-time task of initial seed selection rather than to high-frequency seed scheduling.

We propose a novel randomized corpus reduction technique, RandSet, that reduces the corpus size and yields diverse seed selection at the same time. The runtime overhead of RandSet is minimal, which suits the high-frequency seed scheduling task. Our key insight is to introduce randomness into corpus reduction so as to enjoy the two benefits of a randomized algorithm: randomized output (i.e., diverse seed selection) and low runtime overhead. Specifically, we formulate corpus reduction in seed scheduling as a classic set cover problem and compute a randomized subset of the seed corpus as a set cover that covers all features of the entire corpus. We then develop a novel seed scheduling approach using the randomized corpus subset. Our technique can effectively mitigate seed explosion by scheduling a small and randomized subset of the corpus rather than the entire corpus.
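The set cover formulation above can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out; it is one simple way to obtain a randomized cover, assuming each seed is represented by the set of features (for example, coverage edges) it exercises. The names `randomized_cover`, `pick_seed`, and the toy `corpus` are hypothetical and exist only for illustration.

```python
import random
from typing import Dict, Hashable, List, Optional, Set


def randomized_cover(
    corpus: Dict[str, Set[Hashable]],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a random subset of seeds that covers every feature in the corpus.

    Illustrative assumption: visit seeds in a random order and keep a seed
    only if it contributes at least one feature not yet covered.
    """
    rng = rng or random.Random()
    order = list(corpus)
    rng.shuffle(order)  # the source of randomness in the output subset

    covered: Set[Hashable] = set()
    chosen: List[str] = []
    for seed in order:
        new_features = corpus[seed] - covered
        if new_features:  # seed adds coverage, so keep it
            chosen.append(seed)
            covered |= new_features
    return chosen


def pick_seed(
    corpus: Dict[str, Set[Hashable]],
    rng: Optional[random.Random] = None,
) -> str:
    """Schedule one seed from a freshly computed randomized cover."""
    rng = rng or random.Random()
    subset = randomized_cover(corpus, rng)
    if not subset:
        raise ValueError("corpus has no seeds with features")
    return rng.choice(subset)


# Toy corpus (hypothetical): seed name -> set of covered feature ids.
corpus = {
    "s1": {1, 2},
    "s2": {2, 3},
    "s3": {1, 2, 3},
    "s4": {3},
}
subset = randomized_cover(corpus, random.Random(0))
print(subset, pick_seed(corpus, random.Random(0)))
```

Each call makes a single pass over the corpus, and different random orders produce different covers, which is the intuition behind pairing low overhead with diverse selection. The sketch makes no attempt to find a minimum cover.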

We implement RandSet on three popular fuzzers, AFL++, LibAFL, and Centipede, to demonstrate the generality of its algorithmic design. We perform a comprehensive evaluation of RandSet on three benchmarks: standalone programs, FuzzBench, and Magma. Our evaluation results show that RandSet achieves significantly more diverse seed selection than other corpus reduction techniques. RandSet also yields a high reduction ratio, achieving average subset ratios of 4.03% and 5.99% after corpus reduction on standalone programs and FuzzBench programs, respectively. In terms of fuzzing performance gain from our randomized corpus reduction, RandSet achieves a 16.58% gain on standalone programs and up to a 3.57% gain on FuzzBench programs in AFL++. RandSet triggers up to 7 more ground-truth bugs than the state-of-the-art fuzzer on Magma, while introducing only 3.93% overhead on standalone programs and as low as 1.17% overhead on FuzzBench.

Yuchong Xie

Hong Kong University of Science and Technology

Kaikai Zhang

Hong Kong University of Science and Technology