Designing GPU Data Structures for Efficient Memory Oversubscription
Efficient concurrent data structures are important building blocks for accelerating applications on GPUs. With ever-increasing memory footprint of GPU workloads, data structures used by kernels can exceed global memory capacity. Using the unified virtual memory (UVM) model is a popular approach for kernels to oversubscribe GPU memory without the need for explicit memory management by a programmer. However, data structures executing with UVM can suffer from performance degradation due to the high overheads associated with data migration and thrashing under irregular access patterns.
In this paper, we propose hash table and skip list designs to handle use cases where the data structure significantly oversubscribes GPU memory. We propose two-level hierarchical designs for both data structures that aim to maximize access locality. The outer-level container enables efficient jumps to desired regions of the data structure, while the inner container allows operating on the data. The inner container is sized to facilitate efficient data transfers between the CPU and the GPU. In addition, we also propose several optimizations to overcome bottlenecks in the state-of-the-art skip list algorithm for GPUs. Our experimental results show that our custom data structure designs substantially improve performance over optimized UVM baselines while supporting high degree of GPU memory oversubscription.