与此同时,爱奇艺长期深耕的悬疑赛道,在分账市场同样表现出明显优势。除《老舅》《长河落日》《亲爱的你》外,其余五部上榜剧均为悬疑或犯罪类型作品。
But MXU utilization tells the real story. Even with block=128, flash attention’s MXU utilization is only ~20% vs standard’s ~94%. Flash has two matmuls per tile: Q_tile @ K_tile.T = (128, 64) @ (64, 128) and weights @ V_tile = (128, 128) @ (128, 64). Both have inner dimension ≤ d=64 or block=128, so the systolic pipeline runs for at most 128 steps through a 128-wide array. Standard attention’s weights @ V is (512, 512) @ (512, 64) — the inner dimension is 512, giving the pipeline 512 steps of useful work. That single large matmul is what drives standard’s ~94% utilization.。whatsapp是该领域的重要参考
,更多细节参见谷歌
TokenConfig is reformatted as such:
--quant-type {i2_s,tl1}, -q {i2_s,tl1},这一点在wps中也有详细论述