vllm.v1.kv_offload.reuse_manager ¶
Reuse-frequency gating for CPU KV-cache offload stores.
FilterReusedOffloadingManager — OffloadingManager decorator that skips storing blocks that have not yet been seen enough times.
FilterReusedOffloadingManager ¶
Bases: OffloadingManager
An OffloadingManager decorator that skips storing blocks whose reuse frequency is below store_threshold.
All methods are delegated to the backing manager. Two methods are intercepted:
- lookup: records each visited block hash in an internal LRU counter.
- prepare_store: filters out block hashes that have not yet crossed the threshold before calling the backing manager's prepare_store.
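The interception pattern described above can be sketched in a few lines. This is a toy illustration, not the vLLM implementation: `ToyBackingManager` and `ToyReuseFilter` are hypothetical names, block hashes are plain strings, the return values are simplified, and the counter is unbounded rather than an LRU table.

```python
from collections import Counter
from typing import Iterable


class ToyBackingManager:
    """Hypothetical stand-in for a backing manager (not a vLLM class)."""

    def __init__(self) -> None:
        self.stored: list[str] = []

    def lookup(self, block_hashes: Iterable[str]) -> int:
        # Toy behaviour: count how many leading hashes are already stored.
        hits = 0
        for h in block_hashes:
            if h not in self.stored:
                break
            hits += 1
        return hits

    def prepare_store(self, block_hashes: Iterable[str]) -> list[str]:
        # Toy behaviour: store every hash that is not stored yet.
        new = [h for h in block_hashes if h not in self.stored]
        self.stored.extend(new)
        return new


class ToyReuseFilter:
    """Wraps a backing manager and gates stores on how often a hash was seen."""

    def __init__(self, backing: ToyBackingManager, store_threshold: int = 2) -> None:
        self.backing = backing
        self.store_threshold = store_threshold
        self.counts = Counter()  # assumption: unbounded; the real tracker is an LRU table

    def lookup(self, block_hashes: Iterable[str]) -> int:
        hashes = list(block_hashes)
        self.counts.update(hashes)  # record each visited hash first
        return self.backing.lookup(hashes)  # then delegate

    def prepare_store(self, block_hashes: Iterable[str]) -> list[str]:
        # Keep only hashes seen at least store_threshold times, then delegate.
        eligible = [h for h in block_hashes if self.counts[h] >= self.store_threshold]
        return self.backing.prepare_store(eligible)


backing = ToyBackingManager()
manager = ToyReuseFilter(backing, store_threshold=2)

manager.lookup(["a", "b"])                  # "a" and "b" each seen once
first = manager.prepare_store(["a", "b"])   # neither hash is eligible yet
manager.lookup(["a"])                       # "a" seen twice
second = manager.prepare_store(["a", "b"])  # only "a" passes the filter
```

In this sketch the backing manager never sees "b", because it was looked up only once.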
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backing | OffloadingManager | The underlying manager that calls are delegated to. | required |
| store_threshold | int | A block must be seen at least this many times in lookup before it is eligible to be stored. | 2 |
| max_tracker_size | int | Maximum entries in the internal tracker's LRU table. | 64000 |
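To illustrate what a size-bounded LRU counter might look like, here is a minimal sketch built on `collections.OrderedDict`. The class name `LRUCounter` and its methods are hypothetical, and the eviction policy shown (drop the least recently seen key once the table exceeds its maximum size) is an assumption about the general technique, not a description of vLLM's internal tracker.

```python
from collections import OrderedDict


class LRUCounter:
    """Hypothetical bounded counter: counts sightings, evicts the stalest key."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._counts: OrderedDict = OrderedDict()

    def record(self, key) -> int:
        # Remove and re-insert so the key moves to the most-recent end.
        count = self._counts.pop(key, 0) + 1
        self._counts[key] = count
        if len(self._counts) > self.max_size:
            # Evict the least recently seen key; its count is forgotten.
            self._counts.popitem(last=False)
        return count

    def count(self, key) -> int:
        return self._counts.get(key, 0)


tracker = LRUCounter(max_size=2)
tracker.record("a")
tracker.record("b")
tracker.record("a")  # "a" becomes the most recent entry, with count 2
tracker.record("c")  # table is over capacity, so "b" is evicted
```

One consequence of a bounded table is that an evicted hash starts again from zero, so a rarely seen block has to be seen store_threshold times within the tracker's memory window before it qualifies.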
Source code in vllm/v1/kv_offload/reuse_manager.py
lookup ¶
Record each hash, then delegate the lookup to the backing manager.
prepare_store ¶
prepare_store(
block_hashes: Iterable[BlockHash],
) -> PrepareStoreOutput | None
Filter out blocks below the threshold, then delegate to the backing manager.
Filtering is evaluated before calling the backing manager's prepare_store so that blocks that would be skipped do not consume any CPU offload capacity.
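The effect of filtering before delegation can be shown with a toy fixed-capacity store. Everything here is hypothetical: `TinyStore`, `filtered_store`, and the `seen_counts` dictionary are illustrative names, and returning `None` when the request does not fit is a simplification chosen for this sketch rather than the documented meaning of `None` in the real API.

```python
from typing import Iterable, Optional


class TinyStore:
    """Hypothetical fixed-capacity store standing in for CPU offload space."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: list[str] = []

    def prepare_store(self, block_hashes: Iterable[str]) -> Optional[list[str]]:
        new = [h for h in block_hashes if h not in self.blocks]
        if len(self.blocks) + len(new) > self.capacity:
            return None  # toy behaviour: refuse requests that do not fit
        self.blocks.extend(new)
        return new


def filtered_store(store, block_hashes, seen_counts, store_threshold=2):
    # Drop under-threshold hashes BEFORE the store sees the request,
    # so they never compete for capacity.
    eligible = [h for h in block_hashes if seen_counts.get(h, 0) >= store_threshold]
    return store.prepare_store(eligible)


seen_counts = {"hot": 3, "cold": 1}

# Without filtering, both blocks compete for a single slot and the request fails.
unfiltered = TinyStore(capacity=1)
unfiltered_result = unfiltered.prepare_store(["cold", "hot"])

# With filtering, only the frequently seen block reaches the store.
filtered = TinyStore(capacity=1)
filtered_result = filtered_store(filtered, ["cold", "hot"], seen_counts)
```

Applying the filter after the backing call would be too late in this toy model, since the store would already have reserved or refused space for the blocks that end up skipped.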