The Mathematical Trap of Large Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?

Preface

Recently, several domestic large model vendors have launched Coding Plan subscription packages for developers. The pitch is “low prices for massive usage”: for just tens to hundreds of RMB per month, you supposedly get a usage quota of “hundreds of billions of tokens.”

It sounds wonderful, but as a developer who prefers to let the data speak, I decided to run the numbers: under concurrency limits, can the promised usage actually be consumed?

Typical Package Structure

Taking the common three-tier packages on the market as an example:

Package	Monthly Fee	Promised Usage (every 5 hours)
Lite	~20 RMB	About 120 prompts
Pro	~100 RMB	About 600 prompts
Max	~200 RMB	About 2,400 prompts

The official descriptions also add: “Each prompt is expected to call the model 15-20 times, with a total monthly usage of up to tens to hundreds of billions of tokens.”

It seems like incredible value, but the devil is in the details.

Key Limitation: Concurrency

Most vendors’ documentation mentions only in passing: “Package usage is subject to concurrency limits (number of in-flight request tasks).”

But what exactly is the limit? It is often not stated explicitly. According to community feedback and actual measurements, typical concurrency limits are as follows:

Package	Concurrency (in-flight requests)
Lite	2
Pro	~4-5
Max	~7

This number directly determines your actual throughput ceiling.

Math Time: Can the Max Package Use 2,400 Prompts?

Let’s take the highest-tier Max package as an example and do a simple calculation.

Known Conditions

Promised Usage: 2,400 prompts every 5 hours
Concurrency Limit: 7
Model calls triggered per prompt: 15-20 times (official data)
Model generation speed: About 50-60 tokens/second
5 hours = 18,000 seconds

Calculation Process

Step 1: Estimate single API call time

A complete API call includes:

Input processing: ~1 second
Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds
Network round-trip delay: ~1 second

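Total: About 11 seconds/call (roughly 10-12 seconds depending on generation speed). The next step uses 10 seconds, the optimistic end of that range.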
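The latency estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name and its default values are illustrative, taken from the assumptions listed in this step (500 output tokens, about 55 tokens/second, about 1 second each for input processing and network round trip).

```python
def estimate_call_seconds(output_tokens=500, tokens_per_second=55.0,
                          input_processing_s=1.0, network_s=1.0):
    """Rough wall-clock time of one API call, in seconds.

    All defaults are the article's assumptions, not measured values.
    """
    generation_s = output_tokens / tokens_per_second
    return input_processing_s + generation_s + network_s


# Evaluate at the slow, middle, and fast generation speeds quoted above.
for speed in (50, 55, 60):
    print(speed, "tokens/s ->", round(estimate_call_seconds(tokens_per_second=speed), 1), "s")
```

At 50 to 60 tokens/second the estimate lands between roughly 10 and 12 seconds, which is where the range above comes from.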

Step 2: Calculate maximum calls in 5 hours

Maximum calls = Concurrency × (Total time ÷ Single call time)
              = 7 × (18,000 ÷ 10)
              = 12,600 calls

Step 3: Convert to prompts

According to official claims, each prompt triggers 15-20 calls. Taking the midpoint of 17.5:

Completable prompts = 12,600 ÷ 17.5 ≈ 720 prompts

Conclusion

Metric	Official Promise	Achievable Under Concurrency Limit	Achievement Rate
Prompts per 5 hours	2,400	~720	30%

Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.
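For readers who want to plug in their own numbers, the three steps can be collapsed into one small function. This is a minimal sketch under the same assumptions as the calculation above (concurrency 7, 10 seconds per call, 17.5 calls per prompt); the function name `throughput_ceiling` is made up for illustration.

```python
def throughput_ceiling(concurrency, window_s, call_s, calls_per_prompt):
    """Upper bound on calls and prompts in one quota window.

    Assumes every concurrency slot is busy for the whole window,
    which is the best case for the vendor's promise.
    """
    max_calls = concurrency * (window_s / call_s)
    max_prompts = max_calls / calls_per_prompt
    return max_calls, max_prompts


PROMISED_PROMPTS = 2_400  # Max package, per 5-hour window

calls, prompts = throughput_ceiling(
    concurrency=7, window_s=5 * 3600, call_s=10, calls_per_prompt=17.5
)
print(f"max calls: {calls:.0f}")
print(f"max prompts: {prompts:.0f}")
print(f"achievement rate: {prompts / PROMISED_PROMPTS:.0%}")
```

Changing `call_s` to 11 or 12 seconds, or `concurrency` to the Lite or Pro values, shows how sensitive the ceiling is to each input.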

Harsher Reality: Call Inflation in Agent Mode

The above calculation is still based on the official claim of “15-20 calls per prompt.” But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.

How Agent Mode Works

When you give an AI programming assistant a task, it typically:

1. Analyzes requirements, creates a plan
2. Reads relevant files (each file may trigger a call)
3. Writes code
4. Runs tests
5. Discovers errors, fixes them
6. Repeats steps 3-5 until successful

A seemingly simple prompt may trigger 50-100+ model calls in an Agent loop.
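To see how quickly the count grows, here is a toy model of the loop. It is purely illustrative: the function, its parameters, and the example values (20 files, 10 fix rounds, one call each for writing, testing, and fixing) are hypothetical and do not describe how any particular agent is implemented.

```python
def agent_calls(files_read, fix_rounds, calls_per_round=3, planning_calls=1):
    """Toy estimate of model calls triggered by one prompt.

    Hypothetical model: one planning call, one call per file read,
    and `calls_per_round` calls (write, test, fix) per repair round.
    """
    return planning_calls + files_read + fix_rounds * calls_per_round


# Example with made-up inputs: a task touching 20 files and needing 10 repair rounds.
print(agent_calls(files_read=20, fix_rounds=10))
```

Even these modest inputs already land in the 50+ range, well above the official 15-20 calls per prompt.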

Actual Measurement Case

User feedback:

“2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit”

Reverse calculation:

Each prompt consumes about 19M tokens (38M ÷ 2)
If every call filled a 128K context, that is equivalent to ~148 model calls/prompt (19M ÷ 128K)

This is roughly 7-10 times the official “15-20 times.”
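The reverse calculation is a single division, and its result depends heavily on how many tokens one assumes each call consumes. The sketch below makes that assumption an explicit parameter; the function name is illustrative, and the inputs are the figures from the user report plus the 128K-per-call assumption.

```python
def implied_calls_per_prompt(total_tokens, prompts, tokens_per_call):
    """Back out the number of model calls per prompt from total token usage.

    Assumes every call consumes the same number of tokens, which is a
    simplification: real calls vary in context size.
    """
    tokens_per_prompt = total_tokens / prompts
    return tokens_per_prompt / tokens_per_call


# 38M tokens over 2 prompts, assuming each call consumes a full 128K context.
print(implied_calls_per_prompt(38_000_000, 2, 128_000))
```

A smaller per-call assumption pushes the implied call count even higher, so the full-context case is the conservative reading of the report.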

Revised Actual Usable Amount

Scenario	Calls per prompt	Usable prompts in 5 hours	Achievement Rate
Official ideal	17.5	720	30%
Light usage	50	252	10.5%
Moderate usage	75	168	7%
Heavy Agent usage	100+	≤126	≤5.3%
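The table rows follow from dividing the 12,600-call ceiling by each scenario's calls per prompt. A short script, with the scenario labels and values copied from the table and the heavy case evaluated at exactly 100 calls, regenerates them:

```python
MAX_CALLS_PER_WINDOW = 12_600  # 7 concurrent slots x 1,800 calls each, from Step 2
PROMISED_PROMPTS = 2_400       # Max package promise per 5-hour window

SCENARIOS = [
    ("Official ideal", 17.5),
    ("Light usage", 50),
    ("Moderate usage", 75),
    ("Heavy Agent usage", 100),  # lower bound of the "100+" row
]


def scenario_rows(scenarios):
    """Return (label, calls_per_prompt, usable_prompts, achievement_pct) tuples."""
    rows = []
    for label, calls_per_prompt in scenarios:
        usable = MAX_CALLS_PER_WINDOW / calls_per_prompt
        rate_pct = 100 * usable / PROMISED_PROMPTS
        rows.append((label, calls_per_prompt, usable, rate_pct))
    return rows


for label, cpp, usable, rate in scenario_rows(SCENARIOS):
    print(f"{label:<18} {cpp:>6} {usable:>7.0f} {rate:>6.1f}%")
```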

Why Is This Happening?

1. Token Calculation Includes Context

Large model token consumption counts input as well as output. In Coding scenarios:

Each call must send the complete conversation history
Code project context can easily reach tens of thousands of tokens
A 128K context window means each call may consume 100K+ tokens
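Because the whole history is resent on every call, input tokens grow much faster than the number of calls. The sketch below illustrates this with a simple linear-growth model; the function and the example values (30K starting context, 5K added per call) are hypothetical, chosen only to show the shape of the effect.

```python
def cumulative_input_tokens(n_calls, base_context, growth_per_call, window=128_000):
    """Total input tokens billed across a chain of calls.

    Hypothetical model: call i resends `base_context + i * growth_per_call`
    tokens of history, capped at the context window.
    """
    total = 0
    for i in range(n_calls):
        total += min(base_context + i * growth_per_call, window)
    return total


# 20 calls, starting from 30K tokens of context and adding 5K per call.
total_in = cumulative_input_tokens(20, 30_000, 5_000)
total_out = 20 * 500  # 500 output tokens per call, as assumed earlier
print(total_in, total_out)
```

In this example the input side runs to about 1.5M tokens while the output is only 10K, which is why headline token quotas drain so quickly in agent workflows.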

2. Concurrency is a Hard Constraint

Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. This is a hard throughput ceiling, not something pricing or marketing can work around.

3. Promises Based on Ideal Assumptions

Vendors’ promotional numbers are often based on the assumptions that:

Each call uses only small context
Each prompt triggers only a few calls
Users won’t use the service continuously at high intensity

But these assumptions rarely hold true in real AI Coding scenarios.

A Table to See the Truth

Taking the Max package (~200 RMB/month) as an example:

Metric	Official Promotion	Theoretical Limit	Actual Expectation
Prompts per 5 hours	2,400	720	150-400
Monthly prompts	345,600	103,680	21,600-57,600
Monthly tokens	“Hundreds of billions”	~10 billion	1-3 billion
Achievement Rate	100%	30%	6-17%
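The monthly prompt figures in this table assume a 30-day month used around the clock, which gives 24 × 30 ÷ 5 = 144 five-hour windows. A minimal sketch of that conversion (the function name is illustrative):

```python
def monthly_prompts(prompts_per_window, window_hours=5, days=30):
    """Scale a per-window prompt count to a month of nonstop usage.

    Assumes every window in the month is fully used, which no real
    developer will achieve; it is the upper bound the table relies on.
    """
    windows_per_month = 24 * days / window_hours  # 144 for a 30-day month
    return prompts_per_window * windows_per_month


for label, per_window in [("official", 2_400), ("theoretical", 720),
                          ("actual low", 150), ("actual high", 400)]:
    print(f"{label:<12} {monthly_prompts(per_window):>10,.0f}")
```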

Advice for Developers

1. Don’t Be Fooled by “Hundreds of Billions of Tokens”

Token count is a highly misleading metric. In Coding Agent scenarios, input context accounts for the vast majority of tokens, and the output tokens that do the actual work may be only 1-5% of the total.

2. Focus on Concurrency

This is the core metric that determines actual experience. If a vendor doesn’t disclose its concurrency limits, it’s likely because the numbers don’t look good.

3. Calculate Cost per Prompt

Actual cost per prompt = Monthly fee ÷ Actual usable prompts

Taking the Max package as an example:

Official promotion: 200 ÷ 345,600 ≈ 0.0006 RMB/prompt
Actual situation (assuming ~30,000 usable prompts per month, within the 21,600-57,600 range above): 200 ÷ 30,000 ≈ 0.007 RMB/prompt

More than a 10x difference.
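The same comparison in code, as a sketch using the Max package figures from this section; the 30,000 usable prompts per month is the article's working estimate, not a measured value.

```python
def cost_per_prompt(monthly_fee_rmb, usable_prompts):
    """Effective price of one prompt in RMB."""
    return monthly_fee_rmb / usable_prompts


official = cost_per_prompt(200, 345_600)  # promised monthly prompts
actual = cost_per_prompt(200, 30_000)     # assumed realistic monthly prompts

print(f"official: {official:.4f} RMB/prompt")
print(f"actual:   {actual:.4f} RMB/prompt")
print(f"ratio:    {actual / official:.1f}x")
```

Substituting your own measured monthly prompt count for the 30,000 figure gives a price you can compare directly against pay-as-you-go rates.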

4. Consider Pay-as-You-Go

If your usage isn’t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won’t pay for “unusable quotas.”

Conclusion

The emergence of large model Coding Plan packages is itself a good thing: it lowers the barrier for developers to use AI programming assistants. But when choosing a package, be sure to:

Require vendors to disclose concurrency limits
Calculate throughput limits yourself
Don’t be misled by the big numbers of “hundreds of billions of tokens”

After all, promised usage that can’t be consumed equals a disguised price increase.

This article is based on public information and mathematical derivation; specific values may vary due to manufacturer adjustments. Readers are advised to verify through actual measurements.