User Seeks to Reduce High Time-to-First-Token on GPT Pro API Relay Deployed on US CN2 Server
English summary
A user set up a GPT Pro subscription relay for colleagues using sub2api on a US-based CN2 server with a ping latency of approximately 160ms. The relay exhibits high time-to-first-token (TTFT), making responses slow. The user is seeking optimization advice without clear direction.
Chinese summary
用户为同事搭建了 GPT Pro 共享订阅中转,使用 sub2api 部署在一台美国 CN2 服务器上,ping 延迟约 160ms。目前中转首 token 生成时间(TTFT)很高,反应缓慢,用户寻求优化思路。
Key points
GPT Pro subscription is shared among colleagues via a relay.
GPT Pro 订阅通过中转分享给同事。
The relay uses sub2api and is hosted on a US CN2 server with ~160ms ping.
中转使用 sub2api,部署在 ping 约 160ms 的美国 CN2 服务器上。
Time-to-first-token (TTFT) is noticeably high, affecting responsiveness.
首 token 生成时间(TTFT)明显偏高,影响响应速度。
User requests optimization strategies for this setup.
用户寻求针对此配置的优化思路。