Preface
Recently, several domestic large-model vendors have launched Coding Plan subscription packages for developers, promoting “low prices for massive usage” and claiming that for just tens to hundreds of RMB per month you get a usage quota of “hundreds of billions of tokens.”
It sounds wonderful, but as a developer who prefers to let the data speak, I ran the numbers: under the concurrency limits, can the promised usage actually be consumed?
Typical Package Structure
Taking the common three-tier packages on the market as an example:
| Package | Monthly Fee | Promised Usage (every 5 hours) |
|---|---|---|
| Lite | ~20 RMB | About 120 prompts |
| Pro | ~100 RMB | About 600 prompts |
| Max | ~200 RMB | About 2,400 prompts |
The vendors also add: “Each prompt is expected to call the model 15-20 times, with a total monthly usage of up to tens to hundreds of billions of tokens.”
It seems like incredible value, but the devil is in the details.
Key Limitation: Concurrency
Most vendors’ documentation mentions only in passing: “Package usage is subject to concurrency limits (number of in-flight request tasks).”
The exact limit is often not stated. According to community feedback and hands-on measurements, typical concurrency limits are as follows:
| Package | Concurrency (in-flight requests) |
|---|---|
| Lite | 2 |
| Pro | ~4-5 |
| Max | ~7 |
This number directly determines your actual throughput ceiling.
Math Time: Can the Max Package Use 2,400 Prompts?
Let’s take the highest-tier Max package as an example and do a simple calculation.
Known Conditions
- Promised Usage: 2,400 prompts every 5 hours
- Concurrency Limit: 7
- Model calls triggered per prompt: 15-20 times (official data)
- Model generation speed: About 50-60 tokens/second
- 5 hours = 18,000 seconds
Calculation Process
Step 1: Estimate single API call time
A complete API call includes:
- Input processing: ~1 second
- Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds
- Network round-trip delay: ~1 second
Total: About 10-12 seconds/call
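The per-call estimate above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `estimate_call_seconds` and its parameter names are illustrative, and the 500-token output, the 1-second overheads, and the 50-60 tokens/second range are the assumptions stated in this section, not measured values.

```python
def estimate_call_seconds(output_tokens=500, tokens_per_second=55.0,
                          input_seconds=1.0, network_seconds=1.0):
    """Rough wall-clock time for one API call, in seconds."""
    generation_seconds = output_tokens / tokens_per_second
    return input_seconds + generation_seconds + network_seconds


# Sweep the assumed 50-60 tokens/second generation speed.
for speed in (50, 55, 60):
    print(speed, round(estimate_call_seconds(tokens_per_second=speed), 1))
```

Across that speed range the estimate stays between roughly 10 and 12 seconds per call, which is where the figure above comes from.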
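Step 2: Calculate maximum calls in 5 hours
Taking the optimistic 10 seconds per call and keeping all 7 slots busy for the whole window:
18,000 seconds ÷ 10 seconds/call × 7 concurrent requests = 12,600 calls
Step 3: Convert to prompts
According to official claims, each prompt triggers 15-20 calls. Using the midpoint of 17.5:
12,600 calls ÷ 17.5 calls/prompt = 720 prompts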
Conclusion
| Metric | Official Promise | Ceiling Under Concurrency Limit | Achievement Rate |
|---|---|---|---|
| Prompts per 5 hours | 2,400 | ~720 | 30% |
Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.
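The ceiling can be checked with a short Python sketch. The function `max_prompts_per_window` is a name introduced here for illustration; it assumes every concurrency slot stays busy for the full window, with 10 seconds per call and 17.5 calls per prompt (the midpoint of the official 15-20).

```python
def max_prompts_per_window(concurrency, seconds_per_call, calls_per_prompt,
                           window_seconds=5 * 3600):
    """Upper bound on prompts per window when every slot stays busy."""
    calls_per_slot = window_seconds / seconds_per_call
    max_calls = concurrency * calls_per_slot
    return max_calls / calls_per_prompt


ceiling = max_prompts_per_window(concurrency=7, seconds_per_call=10,
                                 calls_per_prompt=17.5)
print(ceiling)          # 720.0
print(ceiling / 2400)   # 0.3
```

Plugging in the pessimistic ends of the ranges (12 seconds per call, 20 calls per prompt) lowers the bound further, to 525 prompts.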
Harsher Reality: Call Inflation in Agent Mode
The above calculation is still based on the official claim of “15-20 calls per prompt.” But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.
How Agent Mode Works
When you give an AI programming assistant a task, it typically:
- Analyzes requirements, creates a plan
- Reads relevant files (each file may trigger a call)
- Writes code
- Runs tests
- Discovers errors, fixes them
- Repeats the write, test, and fix steps until successful
A seemingly simple prompt may trigger 50-100+ model calls in an Agent loop.
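To see how quickly the loop adds up, here is a rough call counter. It is only a sketch: `estimate_agent_calls` is a hypothetical helper, and it assumes one model call per planning step, per file read, and per write, test, or fix step. Real agents such as Claude Code or Cline may batch or split these differently.

```python
def estimate_agent_calls(files_read, fix_iterations, calls_per_iteration=3,
                         planning_calls=1):
    """Count model calls for one prompt: plan, read files, then loop.

    calls_per_iteration=3 assumes one call each for write, test, and fix.
    """
    return planning_calls + files_read + fix_iterations * calls_per_iteration


print(estimate_agent_calls(files_read=5, fix_iterations=3))    # 15
print(estimate_agent_calls(files_read=20, fix_iterations=10))  # 51
```

Under these assumptions, a small task lands near the official 15-20 calls, while a task that touches 20 files and needs 10 fix rounds already exceeds 50.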
Actual Measurement Case
User feedback:
“2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit”
Reverse calculation:
- Each prompt consumes about 19M tokens
- If each call carries a full 128K context, that is equivalent to ~148 model calls per prompt (19M ÷ 128K)
This is roughly 7-10 times higher than the official “15-20 times.”
Revised Actual Usable Amount
| Scenario | Calls per prompt | Usable prompts in 5 hours | Achievement Rate |
|---|---|---|---|
| Official ideal | 17.5 | 720 | 30% |
| Light usage | 50 | 252 | 10.5% |
| Moderate usage | 75 | 168 | 7% |
| Heavy Agent usage | 100+ | <126 | <5% |
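The rows of this table can be regenerated by dividing one call ceiling by each scenario's calls per prompt. The sketch below assumes the same inputs as the earlier calculation (7 concurrent requests, 10 seconds per call, an 18,000-second window); the constant and variable names are illustrative.

```python
WINDOW_SECONDS = 5 * 3600
CONCURRENCY = 7
SECONDS_PER_CALL = 10      # optimistic end of the 10-12 second estimate
PROMISED_PROMPTS = 2400

# Maximum calls in one window if all slots stay busy.
max_calls = CONCURRENCY * WINDOW_SECONDS // SECONDS_PER_CALL

scenarios = {
    "Official ideal": 17.5,
    "Light usage": 50,
    "Moderate usage": 75,
    "Heavy Agent usage": 100,
}

# Usable prompts per window for each scenario.
usable = {name: max_calls / calls for name, calls in scenarios.items()}

for name, prompts in usable.items():
    rate = prompts / PROMISED_PROMPTS
    print(f"{name}: {prompts:.0f} prompts, {rate:.2%} of the promise")
```

Because the ceiling is fixed by concurrency, the usable prompt count falls in direct proportion to the number of calls each prompt triggers.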
Why Is This Happening?
1. Token Calculation Includes Context
Large-model token consumption includes input as well as output. In coding scenarios:
- Each call must resend the complete conversation history
- Project code context can easily reach tens of thousands of tokens
- With a 128K context window, a single call may consume 100K+ tokens
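A small simulation makes the effect of resending history concrete. This is a sketch under invented numbers: the 20,000-token starting context, the 1,000 tokens of growth per call, and the 500 output tokens per call are assumptions chosen for illustration, and `total_input_tokens` is a hypothetical helper.

```python
def total_input_tokens(num_calls, base_context=20_000, growth_per_call=1_000,
                       context_limit=128_000):
    """Sum input tokens over a loop where the full history is resent each call."""
    total = 0
    for call_index in range(num_calls):
        context = base_context + call_index * growth_per_call
        total += min(context, context_limit)  # context cannot exceed the window
    return total


calls = 20
input_tokens = total_input_tokens(calls)
output_tokens = calls * 500  # assumed 500 output tokens per call
output_share = output_tokens / (input_tokens + output_tokens)
print(input_tokens, f"{output_share:.1%}")  # 590000 1.7%
```

Even with this modest starting context, input dominates the bill: under these assumptions, output is under 2% of the tokens consumed.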
2. Concurrency is a Hard Constraint
Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. This is a physical bottleneck, not something commercial strategies can bypass.
3. Promises Based on Ideal Assumptions
Vendors’ promotional numbers are often based on assumptions such as:
- Each call uses only a small context
- Each prompt triggers only a few calls
- Users won’t use the service continuously at high intensity
But these assumptions rarely hold true in real AI Coding scenarios.
A Table to See the Truth
Taking the Max package (~200 RMB/month) as an example:
| Metric | Official Promotion | Theoretical Limit | Actual Expectation |
|---|---|---|---|
| Prompts per 5 hours | 2,400 | 720 | 150-400 |
| Monthly prompts | 345,600 | 103,680 | 21,600-57,600 |
| Monthly tokens | “Hundreds of billions” | ~10 billion | 1-3 billion |
| Achievement Rate | 100% | 30% | 6-17% |
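The monthly prompt figures are consistent with 144 five-hour windows per month, that is, 30 days of round-the-clock use. The sketch below makes that assumption explicit; `monthly_prompts` is an illustrative name.

```python
HOURS_PER_WINDOW = 5
DAYS_PER_MONTH = 30  # assumption: 30-day month, used 24 hours a day

windows_per_month = 24 * DAYS_PER_MONTH / HOURS_PER_WINDOW


def monthly_prompts(prompts_per_window):
    """Scale a per-window prompt count to a full month of nonstop use."""
    return prompts_per_window * windows_per_month


print(windows_per_month)      # 144.0
print(monthly_prompts(2400))  # 345600.0
print(monthly_prompts(720))   # 103680.0
```

A developer who works 8 hours a day covers far fewer windows, so every monthly figure in the table shrinks accordingly.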
Advice for Developers
1. Don’t Be Fooled by “Hundreds of Billions of Tokens”
Token count is a highly misleading metric. In coding-agent scenarios, context makes up the bulk of consumption, and useful output tokens may account for only 1-5% of the total.
2. Focus on Concurrency
This is the core metric that determines the actual experience. If a vendor doesn’t disclose its concurrency limit, it’s likely because the number doesn’t look good.
3. Calculate Cost per Prompt
Cost per prompt = monthly fee ÷ prompts actually usable per month
Taking the Max package as an example:
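- Official promotion: 200 ÷ 345,600 ≈ 0.0006 RMB/prompt
- Actual situation: 200 ÷ 30,000 ≈ 0.007 RMB/prompt (taking about 30,000 usable prompts per month, within the actual expectation range above)
More than a 10x difference.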
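The same comparison in Python, as a minimal sketch. `cost_per_prompt` is an illustrative helper, and the 30,000 usable prompts per month is the estimate used in this section rather than a measured value.

```python
def cost_per_prompt(monthly_fee_rmb, usable_prompts_per_month):
    """Effective price of one prompt, in RMB."""
    return monthly_fee_rmb / usable_prompts_per_month


promoted = cost_per_prompt(200, 345_600)
realistic = cost_per_prompt(200, 30_000)
print(f"{promoted:.4f} {realistic:.4f} {realistic / promoted:.1f}x")
# 0.0006 0.0067 11.5x
```

Swapping in your own measured monthly prompt count gives the effective price you are actually paying.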
4. Consider Pay-as-You-Go
If your usage isn’t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won’t pay for “unusable quotas.”
Conclusion
The emergence of large-model Coding Plan packages is itself a good thing, since it lowers the barrier for developers to use AI programming assistants. But when choosing a package, be sure to:
- Require vendors to disclose concurrency limits
- Calculate throughput limits yourself
- Look past big numbers like “hundreds of billions of tokens”
After all, promised usage that can’t be consumed equals a disguised price increase.
This article is based on public information and mathematical derivation; specific values may vary as vendors adjust their plans. Readers are advised to verify through actual measurements.
