Loading / 加载中

Self-Hosted Gemma 2 9B: FP8 Quantization Imposes 58% Prefill Latency Penalty on NVIDIA L4 but Improves Decoding and Frees VRAM | thinkgap