Commit 6cf8629

fix typo and inst

1 parent e80f04d commit 6cf8629

File tree

2 files changed: +2, -2 lines changed


_posts/2024-01-03-introduce-flashinfer.md

Lines changed: 1 addition & 1 deletion

@@ -186,7 +186,7 @@ Figure 10: Fused RoPE attention performance, use Llama2-7B setting: um_kv_heads=
 </p>
 
 RoPE has negligible overhead on all 4 GPUs, especially for RTX 6000 Ada and RTX 4090 GPU which has
-strong CUDA Cores performance (RoPE requires `sin`/`cos` computation that can only be accelerated with Tensor Cores).
+strong CUDA Cores performance (RoPE requires `sin`/`cos` computation that can not be accelerated with Tensor Cores).
 
 ### Low-Precision Attention
 
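The corrected line is about why RoPE stresses CUDA cores rather than Tensor Cores. As a rough illustrative sketch (NumPy pseudocode, not FlashInfer's actual CUDA kernel): RoPE applies a position-dependent rotation to each pair of feature dimensions, which is purely elementwise `sin`/`cos` work with no matrix multiply for Tensor Cores to accelerate.

```python
import numpy as np

def rope(x, positions, theta=10000.0):
    """Rotary position embedding on x of shape (seq_len, head_dim).

    Illustrative only: the per-element cos/sin evaluations below are
    the transcendental work that runs on CUDA cores; Tensor Cores only
    accelerate matrix multiplies, of which there are none here.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # one rotation frequency per pair of dimensions
    freqs = theta ** (-np.arange(half) / half)       # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1, x2) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because the rotation is norm-preserving and vanishes at position 0, the sketch is easy to sanity-check: `rope(x, np.zeros(n))` returns `x` unchanged.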

_posts/2024-01-08-cascade-inference.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ layout: post
 title: "Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding"
 date: 2024-02-02
 comments: true
-author: Zihao Ye (UW), Ruihang Lai (CMU), Bo-Ru Lu (UW), Chien-Yu Lin (UW), Size Zheng (UW & PKU), Lequn Chen (UW), Tianqi Chen (CMU & OctoML), Luis Ceze (UW & OctoML)
+author: Zihao Ye (UW), Ruihang Lai (CMU), Bo-Ru Lu (UW), Chien-Yu Lin (UW), Size Zheng (UW & PKU), Lequn Chen (UW), Tianqi Chen (CMU & OctoAI), Luis Ceze (UW & OctoAI)
 redirect_from: "/2024/01/08/cascade-inference"
 ---
99
