We use both GPU memory and host storage to cache KV data, as in AsyncKVCacheManager. This helps reduce recomputation of KV data. All KV-cache-related operations are implemented as asynchronous ...
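To make the idea concrete, here is a minimal sketch of a GPU-plus-host KV cache with asynchronous offload and onload. It is not the actual AsyncKVCacheManager code: the class `TwoTierKVCache`, its methods, the LRU eviction policy, and the `compute` callback are all hypothetical, and plain Python containers stand in for GPU memory and host storage.

```python
import asyncio
from collections import OrderedDict


class TwoTierKVCache:
    """Hypothetical two-tier cache: a small "GPU" tier backed by a "host" tier."""

    def __init__(self, gpu_capacity: int):
        self.gpu = OrderedDict()  # stands in for GPU memory, kept in LRU order
        self.host = {}            # stands in for host storage
        self.gpu_capacity = gpu_capacity
        self.recomputed = 0       # how many times KV data had to be recomputed

    async def _offload(self, key, value):
        await asyncio.sleep(0)    # placeholder for an async device-to-host copy
        self.host[key] = value

    async def _onload(self, key):
        await asyncio.sleep(0)    # placeholder for an async host-to-device copy
        return self.host[key]

    async def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        # Evict least recently used entries to the host tier (assumed policy).
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_value = self.gpu.popitem(last=False)
            await self._offload(old_key, old_value)

    async def get(self, key, compute):
        if key in self.gpu:       # hit in the GPU tier
            self.gpu.move_to_end(key)
            return self.gpu[key], "gpu"
        if key in self.host:      # hit in the host tier: copy back, no recompute
            value = await self._onload(key)
            await self.put(key, value)
            return value, "host"
        self.recomputed += 1      # miss in both tiers: recompute the KV data
        value = compute(key)
        await self.put(key, value)
        return value, "recomputed"


async def demo():
    cache = TwoTierKVCache(gpu_capacity=2)
    sources = []
    for key in ["a", "b", "c", "a", "c"]:
        _, source = await cache.get(key, lambda k: f"kv({k})")
        sources.append(source)
    return sources, cache.recomputed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy run the second lookup of `"a"` is served from the host tier instead of being recomputed, which is the saving described above. How often a lookup lands in the GPU tier versus the host tier depends on the workload and the eviction policy, both of which are assumptions here.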
Thanks for your reply, @geoffreyQiu. I still have two questions. First, does your assumption (that the KV data is hit in the GPU KV cache) always hold in real-world scenarios? Have you conducted any ...