vllm.attention.ops.rocm_aiter_paged_attn
AITERPagedAttention
Bases: PagedAttention
Source code in vllm/attention/ops/rocm_aiter_paged_attn.py
forward_decode (staticmethod)
forward_decode(
    query: Tensor,
    key_cache: Tensor,
    value_cache: Tensor,
    block_tables: Tensor,
    seq_lens: Tensor,
    max_seq_len: int,
    kv_cache_dtype: str,
    num_kv_heads: int,
    scale: float,
    alibi_slopes: Optional[Tensor],
    k_scale: Tensor,
    v_scale: Tensor,
    tp_rank: int = 0,
    blocksparse_local_blocks: int = 0,
    blocksparse_vert_stride: int = 0,
    blocksparse_block_size: int = 64,
    blocksparse_head_sliding_step: int = 0,
) -> Tensor
Source code in vllm/attention/ops/rocm_aiter_paged_attn.py
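A minimal decode-step call might be wired as in the sketch below. It is illustrative only: the paged-KV cache layout, the flat k_scale/v_scale tensors, and the dtypes are assumptions carried over from vLLM's generic PagedAttention ops, and actually running it requires a ROCm build of vLLM with the AITER kernels available (ROCm PyTorch exposes GPUs under the "cuda" device name).

```python
# Illustrative sketch only -- shapes and cache layout are assumptions based on
# vLLM's generic PagedAttention ops and may differ across versions.
import torch
from vllm.attention.ops.rocm_aiter_paged_attn import AITERPagedAttention

num_seqs, num_heads, num_kv_heads, head_size = 4, 8, 8, 128
block_size, num_blocks, max_blocks_per_seq = 16, 64, 8

# One query vector per sequence for a single decode step.
query = torch.randn(num_seqs, num_heads, head_size,
                    dtype=torch.float16, device="cuda")

# Assumed paged-KV layout: key [num_blocks, num_kv_heads, head_size // x,
# block_size, x], value [num_blocks, num_kv_heads, head_size, block_size].
x = 16 // query.element_size()
key_cache = torch.randn(num_blocks, num_kv_heads, head_size // x, block_size, x,
                        dtype=torch.float16, device="cuda")
value_cache = torch.randn(num_blocks, num_kv_heads, head_size, block_size,
                          dtype=torch.float16, device="cuda")

# block_tables maps each sequence to its physical cache blocks.
block_tables = torch.randint(0, num_blocks, (num_seqs, max_blocks_per_seq),
                             dtype=torch.int32, device="cuda")
seq_lens = torch.randint(1, max_blocks_per_seq * block_size, (num_seqs,),
                         dtype=torch.int32, device="cuda")

out = AITERPagedAttention.forward_decode(
    query=query,
    key_cache=key_cache,
    value_cache=value_cache,
    block_tables=block_tables,
    seq_lens=seq_lens,
    max_seq_len=max_blocks_per_seq * block_size,
    kv_cache_dtype="auto",
    num_kv_heads=num_kv_heads,
    scale=head_size ** -0.5,
    alibi_slopes=None,
    k_scale=torch.ones(1, device="cuda"),
    v_scale=torch.ones(1, device="cuda"),
    # blocksparse_* parameters are left at their defaults (dense attention).
)
# out: [num_seqs, num_heads, head_size] attention output for one decode step.
```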
write_to_paged_cache (staticmethod)
write_to_paged_cache(
    key: Tensor,
    value: Tensor,
    key_cache: Tensor,
    value_cache: Tensor,
    slot_mapping: Tensor,
    kv_cache_dtype: str,
    k_scale: Tensor,
    v_scale: Tensor,
) -> None
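write_to_paged_cache is the companion write path: it scatters the current step's key/value vectors into the paged cache at the positions given by slot_mapping, before forward_decode reads them back. A hedged sketch follows, reusing the tensors from the example above; the flat slot encoding (block_index * block_size + offset_in_block) is how vLLM's generic cache ops interpret slot_mapping and is assumed to carry over here.

```python
# Illustrative sketch, same assumptions and tensors as the forward_decode
# example above. One new key/value vector per sequence for this decode step.
key = torch.randn(num_seqs, num_kv_heads, head_size,
                  dtype=torch.float16, device="cuda")
value = torch.randn(num_seqs, num_kv_heads, head_size,
                    dtype=torch.float16, device="cuda")

# Assumed flat slot encoding: block_index * block_size + offset_in_block.
# Here each sequence writes into the first slot of a distinct block.
slot_mapping = torch.tensor([0, 16, 32, 48], dtype=torch.int64, device="cuda")

AITERPagedAttention.write_to_paged_cache(
    key=key,
    value=value,
    key_cache=key_cache,
    value_cache=value_cache,
    slot_mapping=slot_mapping,
    kv_cache_dtype="auto",
    k_scale=torch.ones(1, device="cuda"),
    v_scale=torch.ones(1, device="cuda"),
)
# key_cache/value_cache are updated in place; the method returns None.
```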