vllm.model_executor.layers.fused_moe.moe_pallas
_histogram
Compute the histogram of an int32 tensor. The bin edges run from the min value to the max value with a step of 1, so each integer in that range has its own bin.
Source code in vllm/model_executor/layers/fused_moe/moe_pallas.py
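The binning rule described above can be illustrated with a minimal pure-Python sketch. This is not the module's implementation, which operates on tensors; the name `histogram_int`, the use of plain lists, and the choice to ignore out-of-range values are all assumptions made for illustration.

```python
def histogram_int(values, min_val, max_val):
    """Count integers in [min_val, max_val], using one bin per integer (step = 1).

    Hypothetical helper for illustration only; not part of vLLM.
    """
    if min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    # One bin for every integer from min_val to max_val inclusive.
    counts = [0] * (max_val - min_val + 1)
    for v in values:
        # Assumption: values outside the range are skipped.
        if min_val <= v <= max_val:
            counts[v - min_val] += 1
    return counts


# Example: token-to-expert assignments for 4 experts (ids 0..3).
print(histogram_int([0, 2, 2, 3], 0, 3))
```

In a mixture-of-experts setting, a histogram like this is a natural way to count how many tokens were routed to each expert id.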
fused_moe
fused_moe(
hidden_states: Tensor,
w1: Tensor,
w2: Tensor,
gating_output: Tensor,
topk: int,
global_num_experts: int,
expert_map: Tensor = None,
renormalize: bool = False,
) -> Tensor
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_states` | `Tensor` | `[*, hidden_size]` | required |
| `w1` | `Tensor` | `[num_experts, intermediate_size * 2, hidden_size]` | required |
| `w2` | `Tensor` | `[num_experts, hidden_size, intermediate_size]` | required |
| `gating_output` | `Tensor` | `[*, num_experts]` | required |
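The shapes in the table constrain each other, and a short sketch can make that explicit. The helper below is hypothetical (it is not part of vLLM) and works on plain shape tuples rather than tensors. It also assumes that the leading `*` dimensions of `hidden_states` and `gating_output` must match, which the table suggests but does not state.

```python
def check_fused_moe_shapes(hidden_states, w1, w2, gating_output):
    """Validate shape tuples against the documented layout and return the sizes.

    Hypothetical helper for illustration only; arguments are shape tuples.
    """
    if len(w1) != 3 or len(w2) != 3:
        raise ValueError("w1 and w2 must be 3-dimensional")
    # w1: [num_experts, intermediate_size * 2, hidden_size]
    num_experts, double_intermediate, hidden_size = w1
    if double_intermediate % 2 != 0:
        raise ValueError("w1 dim 1 must be intermediate_size * 2 (even)")
    intermediate_size = double_intermediate // 2
    # w2: [num_experts, hidden_size, intermediate_size]
    if tuple(w2) != (num_experts, hidden_size, intermediate_size):
        raise ValueError("w2 shape is inconsistent with w1")
    # hidden_states: [*, hidden_size]
    if hidden_states[-1] != hidden_size:
        raise ValueError("hidden_states last dim must equal hidden_size")
    # gating_output: [*, num_experts]
    if gating_output[-1] != num_experts:
        raise ValueError("gating_output last dim must equal num_experts")
    # Assumption: both tensors share the same leading (token) dimensions.
    if tuple(hidden_states[:-1]) != tuple(gating_output[:-1]):
        raise ValueError("leading dims of hidden_states and gating_output differ")
    return {
        "num_experts": num_experts,
        "hidden_size": hidden_size,
        "intermediate_size": intermediate_size,
    }


# 4 tokens, hidden_size 16, 8 experts, intermediate_size 32.
sizes = check_fused_moe_shapes((4, 16), (8, 64, 16), (8, 16, 32), (4, 8))
print(sizes)
```

With real tensors, the same check could be run on each argument's `.shape` before calling `fused_moe`.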