Commit 6cf8629

fix typo and inst

1 parent e80f04d commit 6cf8629

File tree

2 files changed: +2, -2 lines changed


_posts/2024-01-03-introduce-flashinfer.md

Lines changed: 1 addition & 1 deletion

@@ -186,7 +186,7 @@ Figure 10: Fused RoPE attention performance, use Llama2-7B setting: um_kv_heads=
 </p>
 
 RoPE has negligible overhead on all 4 GPUs, especially for RTX 6000 Ada and RTX 4090 GPU which has
-strong CUDA Cores performance (RoPE requires `sin`/`cos` computation that can only be accelerated with Tensor Cores).
+strong CUDA Cores performance (RoPE requires `sin`/`cos` computation that can not be accelerated with Tensor Cores).
 
 ### Low-Precision Attention
 
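The corrected line is about why RoPE stresses CUDA cores rather than Tensor Cores. As a rough illustrative sketch (NumPy pseudocode, not FlashInfer's actual CUDA kernel): RoPE applies a position-dependent rotation to each pair of feature dimensions, which is purely elementwise `sin`/`cos` work with no matrix multiply for Tensor Cores to accelerate.

```python
import numpy as np

def rope(x, positions, theta=10000.0):
    """Rotary position embedding on x of shape (seq_len, head_dim).

    Illustrative only: the per-element cos/sin evaluations below are
    the transcendental work that runs on CUDA cores; Tensor Cores only
    accelerate matrix multiplies, of which there are none here.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # one rotation frequency per pair of dimensions
    freqs = theta ** (-np.arange(half) / half)       # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1, x2) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because the rotation is norm-preserving and vanishes at position 0, the sketch is easy to sanity-check: `rope(x, np.zeros(n))` returns `x` unchanged.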

_posts/2024-01-08-cascade-inference.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ layout: post
 title: "Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding"
 date: 2024-02-02
 comments: true
-author: Zihao Ye (UW), Ruihang Lai (CMU), Bo-Ru Lu (UW), Chien-Yu Lin (UW), Size Zheng (UW & PKU), Lequn Chen (UW), Tianqi Chen (CMU & OctoML), Luis Ceze (UW & OctoML)
+author: Zihao Ye (UW), Ruihang Lai (CMU), Bo-Ru Lu (UW), Chien-Yu Lin (UW), Size Zheng (UW & PKU), Lequn Chen (UW), Tianqi Chen (CMU & OctoAI), Luis Ceze (UW & OctoAI)
 redirect_from: "/2024/01/08/cascade-inference"
 ---
99
