<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Technical Analysis on Svtter's Blog</title><link>https://svtter.cn/en/categories/technical-analysis/</link><description>Recent content in Technical Analysis on Svtter's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 23 Jan 2026 11:52:52 +0800</lastBuildDate><atom:link href="https://svtter.cn/en/categories/technical-analysis/index.xml" rel="self" type="application/rss+xml"/><item><title>The Mathematical Trap of Big Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?</title><link>https://svtter.cn/en/p/the-mathematical-trap-of-big-model-coding-plan-packages-can-promised-usage-be-delivered-under-concurrency-limits/</link><pubDate>Fri, 23 Jan 2026 11:52:52 +0800</pubDate><guid>https://svtter.cn/en/p/the-mathematical-trap-of-big-model-coding-plan-packages-can-promised-usage-be-delivered-under-concurrency-limits/</guid><description>&lt;img src="https://svtter.cn/p/%E5%A4%A7%E6%A8%A1%E5%9E%8B-coding-plan-%E5%A5%97%E9%A4%90%E7%9A%84%E6%95%B0%E5%AD%A6%E9%99%B7%E9%98%B1%E5%B9%B6%E5%8F%91%E9%99%90%E5%88%B6%E4%B8%8B%E7%9A%84%E6%89%BF%E8%AF%BA%E9%87%8F%E8%83%BD%E5%90%A6%E5%85%91%E7%8E%B0/cover.png" alt="Featured image of post The Mathematical Trap of Big Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?" /&gt;&lt;h2 id="preface"&gt;Preface
&lt;/h2&gt;&lt;p&gt;Recently, several domestic big model manufacturers have launched Coding Plan subscription packages for developers, promoting &amp;ldquo;low prices for massive usage,&amp;rdquo; claiming that for just tens to hundreds of RMB per month, you can get &amp;ldquo;hundreds of billions of tokens&amp;rdquo; of usage quota.&lt;/p&gt;
&lt;p&gt;It sounds wonderful, but as a developer accustomed to speaking with data, I decided to do some calculations: &lt;strong&gt;Under concurrency limits, can these promised usage amounts really be consumed?&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="typical-package-structure"&gt;Typical Package Structure
&lt;/h2&gt;&lt;p&gt;Taking the common three-tier packages on the market as an example:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Monthly Fee&lt;/th&gt;
&lt;th&gt;Promised Usage (every 5 hours)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lite&lt;/td&gt;
&lt;td&gt;~20 RMB&lt;/td&gt;
&lt;td&gt;About 120 prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;~100 RMB&lt;/td&gt;
&lt;td&gt;About 600 prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;~200 RMB&lt;/td&gt;
&lt;td&gt;About 2,400 prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Officials will also add: &amp;ldquo;Each prompt is expected to call the model 15-20 times, with a total monthly usage of up to tens to hundreds of billions of tokens.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It seems like incredible value, but the devil is in the details.&lt;/p&gt;
&lt;h2 id="key-limitation-concurrency"&gt;Key Limitation: Concurrency
&lt;/h2&gt;&lt;p&gt;Most manufacturers&amp;rsquo; documentation will casually mention: &amp;ldquo;Package usage is subject to concurrency limits (number of in-flight request tasks).&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But what exactly is the limit? Often not explicitly stated. According to community feedback and actual measurements, typical concurrency limits are as follows:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Concurrency (in-flight requests)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lite&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;~4-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;~7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This number directly determines your actual throughput ceiling.&lt;/p&gt;
&lt;h2 id="math-time-can-the-max-package-use-2400-prompts"&gt;Math Time: Can the Max Package Use 2,400 Prompts?
&lt;/h2&gt;&lt;p&gt;Let&amp;rsquo;s take the highest-tier Max package as an example and do a simple calculation.&lt;/p&gt;
&lt;h3 id="known-conditions"&gt;Known Conditions
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Promised Usage&lt;/strong&gt;: 2,400 prompts every 5 hours&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency Limit&lt;/strong&gt;: 7&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model calls triggered per prompt&lt;/strong&gt;: 15-20 times (official data)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model generation speed&lt;/strong&gt;: About 50-60 tokens/second&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 hours = 18,000 seconds&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="calculation-process"&gt;Calculation Process
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Step 1: Estimate single API call time&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A complete API call includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Input processing: ~1 second&lt;/li&gt;
&lt;li&gt;Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds&lt;/li&gt;
&lt;li&gt;Network round-trip delay: ~1 second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Total: About 10-12 seconds/call&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 2: Calculate maximum calls in 5 hours&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Maximum calls = Concurrency × (Total time ÷ Single call time)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; = 7 × (18,000 ÷ 10)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; = 12,600 calls
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 3: Convert to prompts&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;According to official claims, each prompt triggers 15-20 calls:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Completable prompts = 12,600 ÷ 17.5 ≈ 720 prompts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id="conclusion"&gt;Conclusion
&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Official Promise&lt;/th&gt;
&lt;th&gt;Concurrency Limit&lt;/th&gt;
&lt;th&gt;Achievement Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompts per 5 hours&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;~720&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="harsher-reality-call-inflation-in-agent-mode"&gt;Harsher Reality: Call Inflation in Agent Mode
&lt;/h2&gt;&lt;p&gt;The above calculation is still based on the official claim of &amp;ldquo;15-20 calls per prompt.&amp;rdquo; But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.&lt;/p&gt;
&lt;h3 id="how-agent-mode-works"&gt;How Agent Mode Works
&lt;/h3&gt;&lt;p&gt;When you give an AI programming assistant a task, it typically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Analyzes requirements, creates a plan&lt;/li&gt;
&lt;li&gt;Reads relevant files (each file may trigger a call)&lt;/li&gt;
&lt;li&gt;Writes code&lt;/li&gt;
&lt;li&gt;Runs tests&lt;/li&gt;
&lt;li&gt;Discovers errors, fixes them&lt;/li&gt;
&lt;li&gt;Repeats 3-5 until successful&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A seemingly simple prompt may trigger &lt;strong&gt;50-100+ model calls&lt;/strong&gt; in an Agent loop.&lt;/p&gt;
&lt;h3 id="actual-measurement-case"&gt;Actual Measurement Case
&lt;/h3&gt;&lt;p&gt;User feedback:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit&amp;rdquo;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Reverse calculation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each prompt consumes about 19M tokens&lt;/li&gt;
&lt;li&gt;If calculated at 128K context, equivalent to &lt;strong&gt;~127 model calls/prompt&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is &lt;strong&gt;6-8 times higher&lt;/strong&gt; than the official &amp;ldquo;15-20 times.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="revised-actual-usable-amount"&gt;Revised Actual Usable Amount
&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Calls per prompt&lt;/th&gt;
&lt;th&gt;Usable prompts in 5 hours&lt;/th&gt;
&lt;th&gt;Achievement Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Official ideal&lt;/td&gt;
&lt;td&gt;17.5&lt;/td&gt;
&lt;td&gt;720&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Light usage&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;252&lt;/td&gt;
&lt;td&gt;10.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moderate usage&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;168&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Agent usage&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;&amp;lt;126&lt;/td&gt;
&lt;td&gt;&amp;lt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="why-is-this-happening"&gt;Why Is This Happening?
&lt;/h2&gt;&lt;h3 id="1-token-calculation-includes-context"&gt;1. Token Calculation Includes Context
&lt;/h3&gt;&lt;p&gt;Big model token consumption isn&amp;rsquo;t just output, it includes input. In Coding scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each call must send complete conversation history&lt;/li&gt;
&lt;li&gt;Code project context can easily reach tens of K tokens&lt;/li&gt;
&lt;li&gt;128K context window means each call may consume 100K+ tokens&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-concurrency-is-a-hard-constraint"&gt;2. Concurrency is a Hard Constraint
&lt;/h3&gt;&lt;p&gt;Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. This is a &lt;strong&gt;physical bottleneck&lt;/strong&gt;, not something commercial strategies can bypass.&lt;/p&gt;
&lt;h3 id="3-promises-based-on-ideal-assumptions"&gt;3. Promises Based on Ideal Assumptions
&lt;/h3&gt;&lt;p&gt;Manufacturers&amp;rsquo; promotional numbers are often based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each call uses only small context&lt;/li&gt;
&lt;li&gt;Each prompt triggers only a few calls&lt;/li&gt;
&lt;li&gt;Users won&amp;rsquo;t use continuously at high intensity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But these assumptions rarely hold true in real AI Coding scenarios.&lt;/p&gt;
&lt;h2 id="a-table-to-see-the-truth"&gt;A Table to See the Truth
&lt;/h2&gt;&lt;p&gt;Taking the Max package (~200 RMB/month) as an example:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Official Promotion&lt;/th&gt;
&lt;th&gt;Theoretical Limit&lt;/th&gt;
&lt;th&gt;Actual Expectation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompts per 5 hours&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;720&lt;/td&gt;
&lt;td&gt;150-400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly prompts&lt;/td&gt;
&lt;td&gt;345,600&lt;/td&gt;
&lt;td&gt;103,680&lt;/td&gt;
&lt;td&gt;21,600-57,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly tokens&lt;/td&gt;
&lt;td&gt;&amp;ldquo;Hundreds of billions&amp;rdquo;&lt;/td&gt;
&lt;td&gt;~10 billion&lt;/td&gt;
&lt;td&gt;1-3 billion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Achievement Rate&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5-17%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="advice-for-developers"&gt;Advice for Developers
&lt;/h2&gt;&lt;h3 id="1-dont-be-fooled-by-hundreds-of-billions-of-tokens"&gt;1. Don&amp;rsquo;t Be Fooled by &amp;ldquo;Hundreds of Billions of Tokens&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;Token count is a highly misleading metric. In Coding Agent scenarios, context takes up the majority, with truly effective output tokens possibly only 1-5%.&lt;/p&gt;
&lt;h3 id="2-focus-on-concurrency"&gt;2. Focus on Concurrency
&lt;/h3&gt;&lt;p&gt;This is the core metric that determines actual experience. If manufacturers don&amp;rsquo;t disclose concurrency limits, it&amp;rsquo;s likely because the numbers don&amp;rsquo;t look good.&lt;/p&gt;
&lt;h3 id="3-calculate-cost-per-prompt"&gt;3. Calculate Cost per Prompt
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Actual cost per prompt = Monthly fee ÷ Actual usable prompts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Taking the Max package as an example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Official promotion: 200 ÷ 345,600 = 0.0006 RMB/prompt&lt;/li&gt;
&lt;li&gt;Actual situation: 200 ÷ 30,000 = &lt;strong&gt;0.007 RMB/prompt&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A 10x difference.&lt;/p&gt;
&lt;h3 id="4-consider-pay-as-you-go"&gt;4. Consider Pay-as-You-Go
&lt;/h3&gt;&lt;p&gt;If your usage isn&amp;rsquo;t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won&amp;rsquo;t pay for &amp;ldquo;unusable quotas.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="conclusion-1"&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;The emergence of big model Coding Plan packages is itself a good thing, lowering the barrier for developers to use AI programming assistants. But when choosing packages, be sure to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Require manufacturers to disclose concurrency limits&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calculate throughput limits yourself&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don&amp;rsquo;t be misled by the big numbers of &amp;ldquo;hundreds of billions of tokens&amp;rdquo;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After all, &lt;strong&gt;promised usage that can&amp;rsquo;t be consumed equals a disguised price increase.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This article is based on public information and mathematical derivation; specific values may vary due to manufacturer adjustments. Readers are advised to verify through actual measurements.&lt;/em&gt;&lt;/p&gt;</description></item></channel></rss>