ReasonAlloc: Hierarchical Decoding-Time KV Cache Budget Allocation for Reasoning Models
The paper introduces ReasonAlloc, a hierarchical method for allocating key-value (KV) cache budgets during the decoding phase of reasoning models. It addresses the problem of managing limited cache resources by distributing the budget through a structured, multi-level allocation strategy. The approach aims to preserve decoding speed and accuracy on complex reasoning tasks. In the reported experiments, it improves on baseline allocation methods. The work argues that resource-aware inference matters for scaling reasoning models in practical applications.
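
To make the idea of a multi-level budget split concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a two-level hierarchy (layers, then attention heads within each layer) and assumes each head comes with a non-negative importance score. The summary specifies neither the levels nor the scoring rule, so the names `proportional_split`, `hierarchical_allocate`, and `head_scores` are hypothetical.

```python
from typing import List


def proportional_split(total: int, weights: List[float]) -> List[int]:
    """Split an integer budget in proportion to non-negative weights.

    Largest-remainder rounding keeps the parts summing exactly to `total`.
    """
    if not weights:
        return []
    weight_sum = sum(weights)
    if weight_sum <= 0:
        # No weight carries information, so fall back to a uniform split.
        weights = [1.0] * len(weights)
        weight_sum = float(len(weights))
    raw = [total * w / weight_sum for w in weights]
    parts = [int(r) for r in raw]  # floor, since every value is non-negative
    leftover = total - sum(parts)
    # Give the remaining slots to the entries with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - parts[i], reverse=True)
    for i in order[:leftover]:
        parts[i] += 1
    return parts


def hierarchical_allocate(total_budget: int, head_scores: List[List[float]]) -> List[List[int]]:
    """Two-level allocation (an assumed hierarchy): layers first, then heads.

    head_scores[l][h] is a hypothetical importance score for head h of layer l.
    """
    # Level 1: a layer's weight is the sum of its head scores.
    layer_weights = [sum(scores) for scores in head_scores]
    layer_budgets = proportional_split(total_budget, layer_weights)
    # Level 2: each layer divides its own budget among its heads.
    return [
        proportional_split(budget, scores)
        for budget, scores in zip(layer_budgets, head_scores)
    ]


if __name__ == "__main__":
    scores = [[3.0, 1.0], [2.0, 2.0]]  # 2 layers x 2 heads, illustrative values
    print(hierarchical_allocate(16, scores))  # [[6, 2], [4, 4]]
```

In this sketch the total number of cached token slots is fixed up front, and every level only redistributes what it received from the level above, so the overall budget is never exceeded. A real system would also need an eviction policy to decide which cached entries to keep within each head's share, which is outside the scope of this example.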