
The Mathematical Trap of Large-Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?

Working out how much of a large-model Coding Plan package is actually usable under concurrency and throughput limits, and how far that falls short of the promise.


Preface

Recently, several Chinese large-model vendors have launched Coding Plan subscription packages for developers, advertising “low prices for massive usage” and claiming that for just tens to hundreds of RMB per month you get a quota of “hundreds of billions of tokens.”

It sounds wonderful, but as a developer who likes to back claims with data, I decided to do the math: under concurrency limits, can the promised usage actually be consumed?

Typical Package Structure

Taking the common three-tier packages on the market as an example:

Package | Monthly Fee | Promised Usage (every 5 hours)
Lite | ~20 RMB | About 120 prompts
Pro | ~100 RMB | About 600 prompts
Max | ~200 RMB | About 2,400 prompts

Vendors usually add: “Each prompt is expected to call the model 15-20 times, for a total monthly usage of tens to hundreds of billions of tokens.”

It seems like incredible value, but the devil is in the details.

Key Limitation: Concurrency

Most vendors’ documentation mentions only in passing: “Package usage is subject to concurrency limits (number of in-flight request tasks).”

But what exactly is the limit? It is often not stated explicitly. Based on community feedback and hands-on measurements, typical concurrency limits are:

Package | Concurrency (in-flight requests)
Lite | 2
Pro | ~4-5
Max | ~7

This number directly determines your actual throughput ceiling.

Math Time: Can the Max Package Use 2,400 Prompts?

Let’s take the highest-tier Max package as an example and do a simple calculation.

Known Conditions

  • Promised Usage: 2,400 prompts every 5 hours
  • Concurrency Limit: 7
  • Model calls triggered per prompt: 15-20 times (official data)
  • Model generation speed: About 50-60 tokens/second
  • 5 hours = 18,000 seconds

Calculation Process

Step 1: Estimate single API call time

A complete API call includes:

  • Input processing: ~1 second
  • Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds
  • Network round-trip delay: ~1 second

Total: about 10-12 seconds per call (1 + 9 + 1 ≈ 11 seconds; the calculation below uses the optimistic 10 seconds)
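To make the range concrete, here is a small Python sketch of Step 1. The helper `call_seconds` is hypothetical, and its defaults are simply the rough figures listed above, not measured values.

```python
# Sketch of the Step 1 estimate. call_seconds() is a hypothetical helper;
# its defaults are the article's rough figures, not measured values.
def call_seconds(output_tokens: int = 500,
                 tokens_per_second: float = 55.0,
                 input_processing_s: float = 1.0,
                 network_rtt_s: float = 1.0) -> float:
    """Estimated wall-clock seconds for one API call."""
    generation_s = output_tokens / tokens_per_second  # 500 / 55 ≈ 9.1 s
    return input_processing_s + generation_s + network_rtt_s

# Sweep the 50-60 tokens/second generation speed quoted above.
for tps in (50, 55, 60):
    print(f"{tps} tokens/s -> {call_seconds(tokens_per_second=tps):.1f} s per call")
```

The sweep lands between roughly 10.3 and 12 seconds per call, consistent with the 10-12 second range.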

Step 2: Calculate maximum calls in 5 hours (assuming all 7 slots stay busy the entire time)

Maximum calls = Concurrency × (Total time ÷ Single call time)
              = 7 × (18,000 ÷ 10)
              = 12,600 calls
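The formula above is the throughput form of Little's law (throughput = in-flight requests ÷ latency). A minimal Python sketch, assuming each slot runs calls back to back with no idle time; `max_calls` and `WINDOW_S` are illustrative names.

```python
WINDOW_S = 5 * 60 * 60  # 5 hours = 18,000 seconds

def max_calls(concurrency: int, window_s: float, seconds_per_call: float) -> int:
    """Upper bound on calls completed in one window.

    Assumes every concurrency slot runs calls back to back for the whole
    window, with no idle time and no rate limiting beyond the slot count.
    """
    calls_per_slot = int(window_s // seconds_per_call)  # whole calls only
    return concurrency * calls_per_slot

for latency_s in (10, 11, 12):
    print(f"{latency_s} s/call -> {max_calls(7, WINDOW_S, latency_s):,} calls")
```

At 10 seconds per call this reproduces the 12,600 figure; at the 12-second end of the Step 1 range the ceiling drops to 10,500 calls.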

Step 3: Convert to prompts

According to official claims, each prompt triggers 15-20 calls (midpoint 17.5):

Completable prompts = 12,600 ÷ 17.5 ≈ 720 prompts
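The same division can be run across the whole official range rather than just the midpoint. In this sketch `completable_prompts`, `MAX_CALLS` and `PROMISED` are illustrative names, and the 12,600-call ceiling is taken from Step 2.

```python
MAX_CALLS = 12_600  # Step 2 ceiling: 7 slots, 10 s per call, 5 hours
PROMISED = 2_400    # Max package promise per 5-hour window

def completable_prompts(max_calls: float, calls_per_prompt: float) -> int:
    """Whole prompts that fit under the call ceiling."""
    return int(max_calls // calls_per_prompt)

# Cover both ends of the official 15-20 calls/prompt range plus the midpoint.
for cpp in (15, 17.5, 20):
    prompts = completable_prompts(MAX_CALLS, cpp)
    print(f"{cpp} calls/prompt -> {prompts} prompts ({prompts / PROMISED:.1%} of promise)")
```

Even at the generous end (15 calls per prompt), the ceiling is 840 prompts, about 35% of the promise.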

Conclusion

Metric | Official Promise | Concurrency Limit | Achievement Rate
Prompts per 5 hours | 2,400 | ~720 | 30%

Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.

Harsher Reality: Call Inflation in Agent Mode

The above calculation is still based on the official claim of “15-20 calls per prompt.” But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.

How Agent Mode Works

When you give an AI programming assistant a task, it typically:

  1. Analyzes requirements, creates a plan
  2. Reads relevant files (each file may trigger a call)
  3. Writes code
  4. Runs tests
  5. Discovers errors, fixes them
  6. Repeats steps 3-5 until the task succeeds

A seemingly simple prompt may trigger 50-100+ model calls in an Agent loop.
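To see how quickly the loop adds up, here is a toy cost model. It is purely hypothetical: `agent_calls` assumes planning and each file read cost one model call, and each write/test/fix iteration costs a fixed number of calls. Real agents batch tool calls differently, and the task sizes below are made-up inputs, not measurements.

```python
def agent_calls(files_read: int, fix_iterations: int,
                calls_per_iteration: int = 3, planning_calls: int = 1) -> int:
    """Model calls for one prompt in a plan -> read -> (write, test, fix)* loop.

    Hypothetical cost model: planning and each file read cost one call, and
    every write/test/fix iteration costs calls_per_iteration calls.
    """
    return planning_calls + files_read + fix_iterations * calls_per_iteration

# Illustrative task sizes (made-up inputs, not measurements).
for files, iterations in ((5, 3), (20, 10), (30, 25)):
    print(f"{files} files, {iterations} iterations -> {agent_calls(files, iterations)} calls")
```

A small task stays near the official 15-20 calls, but a moderate number of file reads and fix iterations pushes a single prompt past 50 and then past 100 calls.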

Actual Measurement Case

User feedback:

“2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit”

Reverse calculation:

  • Each prompt consumes about 19M tokens
  • If every call filled the full 128K (128,000-token) context, that is equivalent to ~148 model calls per prompt

This is roughly 7-10 times the official “15-20 times.”
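The reverse calculation is a single division. This sketch uses the reported 38M tokens and 2 prompts; `implied_calls` is an illustrative helper, 128K is read as 128,000 tokens, and every call is assumed to use the same context size.

```python
def implied_calls(total_tokens: float, prompts: int, tokens_per_call: float) -> float:
    """Calls per prompt implied by a token total, if every call uses tokens_per_call."""
    return total_tokens / prompts / tokens_per_call

REPORTED_TOKENS = 38_000_000  # "38M Tokens" from the user report
REPORTED_PROMPTS = 2

# Full 128K window (read as 128,000) versus a somewhat smaller average context.
for per_call in (128_000, 100_000):
    calls = implied_calls(REPORTED_TOKENS, REPORTED_PROMPTS, per_call)
    print(f"{per_call:,} tokens/call -> {calls:.0f} calls/prompt "
          f"({calls / 17.5:.1f}x the 17.5 midpoint)")
```

Because no call can exceed the window, the full-window figure is a lower bound on the implied call count: smaller average contexts imply more calls per prompt.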

Revised Actual Usable Amount

Scenario | Calls per prompt | Usable prompts in 5 hours | Achievement Rate
Official ideal | 17.5 | 720 | 30%
Light usage | 50 | 252 | 10.5%
Moderate usage | 75 | 168 | 7%
Heavy Agent usage | 100+ | ≤126 | ≤5.25%
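Each row divides the 12,600-call ceiling from Step 2 by the scenario's calls per prompt. This sketch regenerates the table; the names are illustrative, and “100+” is treated as exactly 100, so the heavy row is an upper bound on prompts.

```python
MAX_CALLS = 12_600  # Step 2 ceiling for the Max package (7 slots, 10 s/call)
PROMISED = 2_400    # promised prompts per 5-hour window

SCENARIOS = {
    "Official ideal": 17.5,
    "Light usage": 50,
    "Moderate usage": 75,
    "Heavy Agent usage": 100,  # lower bound of "100+"
}

def usable_prompts(calls_per_prompt: float) -> float:
    """Prompts per 5-hour window under the call ceiling."""
    return MAX_CALLS / calls_per_prompt

for name, cpp in SCENARIOS.items():
    prompts = usable_prompts(cpp)
    print(f"{name:<18} {cpp:>5} calls/prompt  {prompts:6.0f} prompts  {prompts / PROMISED:6.2%}")
```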

Why Is This Happening?

1. Token Calculation Includes Context

Token consumption counts input as well as output. In Coding scenarios:

  • Each call must send complete conversation history
  • Code project context can easily reach tens of thousands of tokens
  • With a 128K context window, a single call may consume 100K+ tokens
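Because the whole history is resent on every call, billed input grows much faster than output. This toy model is hypothetical: `total_input_tokens` assumes the context starts at a fixed size, grows by a fixed amount after each call, and is capped at 128K. The starting size and growth rate below are made-up inputs.

```python
def total_input_tokens(calls: int, base_context: int, growth_per_call: int,
                       window: int = 128_000) -> int:
    """Input tokens billed across a session that resends the full history each call.

    Hypothetical model: the context starts at base_context, grows by
    growth_per_call after every call (tool output, new code, replies),
    and is capped at the context window.
    """
    total = 0
    context = base_context
    for _ in range(calls):
        total += min(context, window)  # the whole current context is sent again
        context += growth_per_call
    return total

OUTPUT_PER_CALL = 500  # same output size assumed in Step 1

calls = 50
input_tokens = total_input_tokens(calls, base_context=20_000, growth_per_call=3_000)
output_tokens = calls * OUTPUT_PER_CALL
print(f"input {input_tokens:,} tokens, output {output_tokens:,} tokens")
```

With these made-up inputs, a 50-call session bills about 4.4 million input tokens against 25,000 output tokens. The exact split depends entirely on the assumed context sizes.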

2. Concurrency is a Hard Constraint

Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. For the user this is a hard bottleneck that no marketing figure can get around.
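Applying the Step 2 and Step 3 arithmetic to all three tiers shows the point: what you can use is the smaller of the quota and the throughput ceiling. This is a sketch under the same assumptions as above (10 s per call, 17.5 calls per prompt, all slots busy). `deliverable_prompts` is an illustrative helper, and Pro uses the low end of its “~4-5” concurrency.

```python
WINDOW_S = 18_000        # 5 hours
SECONDS_PER_CALL = 10    # optimistic per-call latency from Step 1
CALLS_PER_PROMPT = 17.5  # midpoint of the official 15-20 calls per prompt

def deliverable_prompts(promised: float, concurrency: int) -> float:
    """Usable prompts per window: the quota or the throughput ceiling, whichever is lower."""
    ceiling = concurrency * (WINDOW_S / SECONDS_PER_CALL) / CALLS_PER_PROMPT
    return min(promised, ceiling)

# (promised prompts per 5 hours, concurrency); Pro uses the low end of "~4-5".
TIERS = {"Lite": (120, 2), "Pro": (600, 4), "Max": (2_400, 7)}

for name, (promised, concurrency) in TIERS.items():
    got = deliverable_prompts(promised, concurrency)
    print(f"{name}: {got:.0f} of {promised:,} promised prompts ({got / promised:.0%})")
```

Under these assumptions the small Lite quota sits below its ceiling (about 206 prompts), while Pro and Max hit theirs (about 411 and 720). Beyond the ceiling, extra quota simply goes unused.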

3. Promises Based on Ideal Assumptions

Vendors’ promotional numbers are often based on:

  • Each call uses only a small context
  • Each prompt triggers only a few calls
  • Users won’t use the service continuously at high intensity

But these assumptions rarely hold true in real AI Coding scenarios.

A Table to See the Truth

Taking the Max package (~200 RMB/month) as an example:

Metric | Official Promotion | Theoretical Limit | Actual Expectation
Prompts per 5 hours | 2,400 | 720 | 150-400
Monthly prompts | 345,600 | 103,680 | 21,600-57,600
Monthly tokens | “Hundreds of billions” | ~10 billion | 1-3 billion
Achievement Rate | 100% | 30% | 6-17%
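The monthly prompt figures multiply each 5-hour number by the number of 5-hour windows in a month. This sketch assumes a 30-day month with every window used back to back, which no real user does, so the monthly numbers are upper bounds. `per_month` is an illustrative name.

```python
WINDOWS_PER_MONTH = 30 * 24 // 5  # 144 five-hour windows in a 30-day month

def per_month(per_window: float) -> float:
    """Scale a per-window figure to a month of back-to-back 5-hour windows."""
    return per_window * WINDOWS_PER_MONTH

PROMISED = 2_400
for label, per_window in (("Official", 2_400), ("Theoretical", 720),
                          ("Actual (low)", 150), ("Actual (high)", 400)):
    print(f"{label:<14} {per_month(per_window):>9,.0f} prompts/month  "
          f"{per_window / PROMISED:.0%} of promise")
```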

Advice for Developers

1. Don’t Be Fooled by “Hundreds of Billions of Tokens”

Token count is a highly misleading metric. In Coding Agent scenarios, context accounts for the vast majority of tokens, and genuinely useful output may be only 1-5% of the total.

2. Focus on Concurrency

This is the core metric that determines the actual experience. If vendors don’t disclose concurrency limits, it’s likely because the numbers don’t look good.

3. Calculate Cost per Prompt

Actual cost per prompt = Monthly fee ÷ Actual usable prompts

Taking the Max package as an example:

  • Official promotion: 200 ÷ 345,600 ≈ 0.0006 RMB/prompt
  • Actual situation (taking ~30,000 prompts/month, within the 21,600-57,600 estimate above): 200 ÷ 30,000 ≈ 0.007 RMB/prompt

More than a 10x difference (about 11.5x).
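The comparison reduces to the ratio of the two monthly prompt counts. In this sketch `cost_per_prompt` is an illustrative helper, and 30,000 is the representative monthly figure used above.

```python
def cost_per_prompt(monthly_fee_rmb: float, prompts_per_month: float) -> float:
    """RMB paid per prompt actually completed in a month."""
    return monthly_fee_rmb / prompts_per_month

official = cost_per_prompt(200, 345_600)  # promised monthly prompts
actual = cost_per_prompt(200, 30_000)     # representative value from the 21,600-57,600 estimate
print(f"official {official:.4f} RMB/prompt, actual {actual:.4f} RMB/prompt, "
      f"{actual / official:.1f}x")
```

The ratio is just 345,600 ÷ 30,000 ≈ 11.5. Across the full 21,600-57,600 range it spans roughly 6x to 16x.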

4. Consider Pay-as-You-Go

If your usage isn’t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won’t pay for “unusable quotas.”

Conclusion

The emergence of large-model Coding Plan packages is a good thing in itself, lowering the barrier for developers to use AI programming assistants. But when choosing a package, be sure to:

  1. Press vendors to disclose their concurrency limits
  2. Calculate throughput limits yourself
  3. Don’t be misled by the big numbers of “hundreds of billions of tokens”

After all, promised usage that can’t be consumed is effectively a disguised price increase.


This article is based on public information and mathematical derivation; specific values may change as vendors adjust their plans. Readers are advised to verify with their own measurements.