<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Coding on Svtter's Blog</title><link>https://svtter.cn/en/tags/coding/</link><description>Recent content in Coding on Svtter's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Fri, 23 Jan 2026 11:52:52 +0800</lastBuildDate><atom:link href="https://svtter.cn/en/tags/coding/index.xml" rel="self" type="application/rss+xml"/><item><title>The Mathematical Trap of Big Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?</title><link>https://svtter.cn/en/p/the-mathematical-trap-of-big-model-coding-plan-packages-can-promised-usage-be-delivered-under-concurrency-limits/</link><pubDate>Fri, 23 Jan 2026 11:52:52 +0800</pubDate><guid>https://svtter.cn/en/p/the-mathematical-trap-of-big-model-coding-plan-packages-can-promised-usage-be-delivered-under-concurrency-limits/</guid><description>&lt;img src="https://svtter.cn/p/%E5%A4%A7%E6%A8%A1%E5%9E%8B-coding-plan-%E5%A5%97%E9%A4%90%E7%9A%84%E6%95%B0%E5%AD%A6%E9%99%B7%E9%98%B1%E5%B9%B6%E5%8F%91%E9%99%90%E5%88%B6%E4%B8%8B%E7%9A%84%E6%89%BF%E8%AF%BA%E9%87%8F%E8%83%BD%E5%90%A6%E5%85%91%E7%8E%B0/cover.png" alt="Featured image of post The Mathematical Trap of Big Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?" /&gt;&lt;h2 id="preface"&gt;Preface
&lt;/h2&gt;&lt;p&gt;Recently, several domestic big model manufacturers have launched Coding Plan subscription packages for developers, promoting &amp;ldquo;low prices for massive usage,&amp;rdquo; claiming that for just tens to hundreds of RMB per month, you can get &amp;ldquo;hundreds of billions of tokens&amp;rdquo; of usage quota.&lt;/p&gt;
&lt;p&gt;It sounds wonderful, but as a developer accustomed to speaking with data, I decided to do some calculations: &lt;strong&gt;Under concurrency limits, can these promised usage amounts really be consumed?&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="typical-package-structure"&gt;Typical Package Structure
&lt;/h2&gt;&lt;p&gt;Taking the common three-tier packages on the market as an example:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Monthly Fee&lt;/th&gt;
&lt;th&gt;Promised Usage (every 5 hours)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lite&lt;/td&gt;
&lt;td&gt;~20 RMB&lt;/td&gt;
&lt;td&gt;About 120 prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;~100 RMB&lt;/td&gt;
&lt;td&gt;About 600 prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;~200 RMB&lt;/td&gt;
&lt;td&gt;About 2,400 prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Officials will also add: &amp;ldquo;Each prompt is expected to call the model 15-20 times, with a total monthly usage of up to tens to hundreds of billions of tokens.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It seems like incredible value, but the devil is in the details.&lt;/p&gt;
&lt;h2 id="key-limitation-concurrency"&gt;Key Limitation: Concurrency
&lt;/h2&gt;&lt;p&gt;Most manufacturers&amp;rsquo; documentation will casually mention: &amp;ldquo;Package usage is subject to concurrency limits (number of in-flight request tasks).&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But what exactly is the limit? Often not explicitly stated. According to community feedback and actual measurements, typical concurrency limits are as follows:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Concurrency (in-flight requests)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lite&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;~4-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max&lt;/td&gt;
&lt;td&gt;~7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This number directly determines your actual throughput ceiling.&lt;/p&gt;
&lt;h2 id="math-time-can-the-max-package-use-2400-prompts"&gt;Math Time: Can the Max Package Use 2,400 Prompts?
&lt;/h2&gt;&lt;p&gt;Let&amp;rsquo;s take the highest-tier Max package as an example and do a simple calculation.&lt;/p&gt;
&lt;h3 id="known-conditions"&gt;Known Conditions
&lt;/h3&gt;&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Promised Usage&lt;/strong&gt;: 2,400 prompts every 5 hours&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency Limit&lt;/strong&gt;: 7&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model calls triggered per prompt&lt;/strong&gt;: 15-20 times (official data)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model generation speed&lt;/strong&gt;: About 50-60 tokens/second&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 hours = 18,000 seconds&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="calculation-process"&gt;Calculation Process
&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;Step 1: Estimate single API call time&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A complete API call includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Input processing: ~1 second&lt;/li&gt;
&lt;li&gt;Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds&lt;/li&gt;
&lt;li&gt;Network round-trip delay: ~1 second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Total: About 10-12 seconds/call&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 2: Calculate maximum calls in 5 hours&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;span class="lnt"&gt;2
&lt;/span&gt;&lt;span class="lnt"&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Maximum calls = Concurrency × (Total time ÷ Single call time)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; = 7 × (18,000 ÷ 10)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; = 12,600 calls
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Step 3: Convert to prompts&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;According to official claims, each prompt triggers 15-20 calls:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Completable prompts = 12,600 ÷ 17.5 ≈ 720 prompts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;h3 id="conclusion"&gt;Conclusion
&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Official Promise&lt;/th&gt;
&lt;th&gt;Concurrency Limit&lt;/th&gt;
&lt;th&gt;Achievement Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompts per 5 hours&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;~720&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="harsher-reality-call-inflation-in-agent-mode"&gt;Harsher Reality: Call Inflation in Agent Mode
&lt;/h2&gt;&lt;p&gt;The above calculation is still based on the official claim of &amp;ldquo;15-20 calls per prompt.&amp;rdquo; But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.&lt;/p&gt;
&lt;h3 id="how-agent-mode-works"&gt;How Agent Mode Works
&lt;/h3&gt;&lt;p&gt;When you give an AI programming assistant a task, it typically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Analyzes requirements, creates a plan&lt;/li&gt;
&lt;li&gt;Reads relevant files (each file may trigger a call)&lt;/li&gt;
&lt;li&gt;Writes code&lt;/li&gt;
&lt;li&gt;Runs tests&lt;/li&gt;
&lt;li&gt;Discovers errors, fixes them&lt;/li&gt;
&lt;li&gt;Repeats 3-5 until successful&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A seemingly simple prompt may trigger &lt;strong&gt;50-100+ model calls&lt;/strong&gt; in an Agent loop.&lt;/p&gt;
&lt;h3 id="actual-measurement-case"&gt;Actual Measurement Case
&lt;/h3&gt;&lt;p&gt;User feedback:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit&amp;rdquo;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Reverse calculation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each prompt consumes about 19M tokens&lt;/li&gt;
&lt;li&gt;If calculated at 128K context, equivalent to &lt;strong&gt;~127 model calls/prompt&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is &lt;strong&gt;6-8 times higher&lt;/strong&gt; than the official &amp;ldquo;15-20 times.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="revised-actual-usable-amount"&gt;Revised Actual Usable Amount
&lt;/h3&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Calls per prompt&lt;/th&gt;
&lt;th&gt;Usable prompts in 5 hours&lt;/th&gt;
&lt;th&gt;Achievement Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Official ideal&lt;/td&gt;
&lt;td&gt;17.5&lt;/td&gt;
&lt;td&gt;720&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Light usage&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;252&lt;/td&gt;
&lt;td&gt;10.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Moderate usage&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;168&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy Agent usage&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;&amp;lt;126&lt;/td&gt;
&lt;td&gt;&amp;lt;5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="why-is-this-happening"&gt;Why Is This Happening?
&lt;/h2&gt;&lt;h3 id="1-token-calculation-includes-context"&gt;1. Token Calculation Includes Context
&lt;/h3&gt;&lt;p&gt;Big model token consumption isn&amp;rsquo;t just output, it includes input. In Coding scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each call must send complete conversation history&lt;/li&gt;
&lt;li&gt;Code project context can easily reach tens of K tokens&lt;/li&gt;
&lt;li&gt;128K context window means each call may consume 100K+ tokens&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-concurrency-is-a-hard-constraint"&gt;2. Concurrency is a Hard Constraint
&lt;/h3&gt;&lt;p&gt;Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. This is a &lt;strong&gt;physical bottleneck&lt;/strong&gt;, not something commercial strategies can bypass.&lt;/p&gt;
&lt;h3 id="3-promises-based-on-ideal-assumptions"&gt;3. Promises Based on Ideal Assumptions
&lt;/h3&gt;&lt;p&gt;Manufacturers&amp;rsquo; promotional numbers are often based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each call uses only small context&lt;/li&gt;
&lt;li&gt;Each prompt triggers only a few calls&lt;/li&gt;
&lt;li&gt;Users won&amp;rsquo;t use continuously at high intensity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But these assumptions rarely hold true in real AI Coding scenarios.&lt;/p&gt;
&lt;h2 id="a-table-to-see-the-truth"&gt;A Table to See the Truth
&lt;/h2&gt;&lt;p&gt;Taking the Max package (~200 RMB/month) as an example:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Official Promotion&lt;/th&gt;
&lt;th&gt;Theoretical Limit&lt;/th&gt;
&lt;th&gt;Actual Expectation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompts per 5 hours&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;720&lt;/td&gt;
&lt;td&gt;150-400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly prompts&lt;/td&gt;
&lt;td&gt;345,600&lt;/td&gt;
&lt;td&gt;103,680&lt;/td&gt;
&lt;td&gt;21,600-57,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly tokens&lt;/td&gt;
&lt;td&gt;&amp;ldquo;Hundreds of billions&amp;rdquo;&lt;/td&gt;
&lt;td&gt;~10 billion&lt;/td&gt;
&lt;td&gt;1-3 billion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Achievement Rate&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5-17%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="advice-for-developers"&gt;Advice for Developers
&lt;/h2&gt;&lt;h3 id="1-dont-be-fooled-by-hundreds-of-billions-of-tokens"&gt;1. Don&amp;rsquo;t Be Fooled by &amp;ldquo;Hundreds of Billions of Tokens&amp;rdquo;
&lt;/h3&gt;&lt;p&gt;Token count is a highly misleading metric. In Coding Agent scenarios, context takes up the majority, with truly effective output tokens possibly only 1-5%.&lt;/p&gt;
&lt;h3 id="2-focus-on-concurrency"&gt;2. Focus on Concurrency
&lt;/h3&gt;&lt;p&gt;This is the core metric that determines actual experience. If manufacturers don&amp;rsquo;t disclose concurrency limits, it&amp;rsquo;s likely because the numbers don&amp;rsquo;t look good.&lt;/p&gt;
&lt;h3 id="3-calculate-cost-per-prompt"&gt;3. Calculate Cost per Prompt
&lt;/h3&gt;&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Actual cost per prompt = Monthly fee ÷ Actual usable prompts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Taking the Max package as an example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Official promotion: 200 ÷ 345,600 = 0.0006 RMB/prompt&lt;/li&gt;
&lt;li&gt;Actual situation: 200 ÷ 30,000 = &lt;strong&gt;0.007 RMB/prompt&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A 10x difference.&lt;/p&gt;
&lt;h3 id="4-consider-pay-as-you-go"&gt;4. Consider Pay-as-You-Go
&lt;/h3&gt;&lt;p&gt;If your usage isn&amp;rsquo;t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won&amp;rsquo;t pay for &amp;ldquo;unusable quotas.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="conclusion-1"&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;The emergence of big model Coding Plan packages is itself a good thing, lowering the barrier for developers to use AI programming assistants. But when choosing packages, be sure to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Require manufacturers to disclose concurrency limits&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Calculate throughput limits yourself&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don&amp;rsquo;t be misled by the big numbers of &amp;ldquo;hundreds of billions of tokens&amp;rdquo;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After all, &lt;strong&gt;promised usage that can&amp;rsquo;t be consumed equals a disguised price increase.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;This article is based on public information and mathematical derivation; specific values may vary due to manufacturer adjustments. Readers are advised to verify through actual measurements.&lt;/em&gt;&lt;/p&gt;</description></item><item><title>Can GLM 4.6 Be Strengthened Through Spec-Kit</title><link>https://svtter.cn/en/p/can-glm-4.6-be-strengthened-through-spec-kit/</link><pubDate>Fri, 14 Nov 2025 15:41:46 +0800</pubDate><guid>https://svtter.cn/en/p/can-glm-4.6-be-strengthened-through-spec-kit/</guid><description>&lt;img src="https://svtter.cn/p/%E9%80%9A%E8%BF%87-spec-kit-%E5%8A%A0%E5%BC%BA%E5%BC%B1%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%A1%A8%E7%8E%B0/pics/bg.png" alt="Featured image of post Can GLM 4.6 Be Strengthened Through Spec-Kit" /&gt;&lt;blockquote&gt;
&lt;p&gt;Another article on how to mitigate losses with glm4.6. Our old friend glm 4.6. The new friend doubao-seed-code has also arrived.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;a class="link" href="https://github.com/github/spec-kit" target="_blank" rel="noopener"
&gt;github spec-kit&lt;/a&gt; is a coding agent enhancement tool launched by GitHub, aimed at making engineering more standardized and easier.&lt;/p&gt;
&lt;p&gt;I initially looked down on this, thinking I have the claude code max plan, so why bother using it? Then:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://svtter.cn/p/%E9%80%9A%E8%BF%87-spec-kit-%E5%8A%A0%E5%BC%BA%E5%BC%B1%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%A1%A8%E7%8E%B0/pics/limit.png"
width="2118"
height="126"
srcset="https://svtter.cn/p/%E9%80%9A%E8%BF%87-spec-kit-%E5%8A%A0%E5%BC%BA%E5%BC%B1%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%A1%A8%E7%8E%B0/pics/limit_hu_5db449d2dc02deb2.png 480w, https://svtter.cn/p/%E9%80%9A%E8%BF%87-spec-kit-%E5%8A%A0%E5%BC%BA%E5%BC%B1%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%A1%A8%E7%8E%B0/pics/limit_hu_9964d513d07b6656.png 1024w"
loading="lazy"
class="gallery-image"
data-flex-grow="1680"
data-flex-basis="4034px"
&gt;&lt;/p&gt;
&lt;p&gt;This is actually the result of using spec kit, leading to a huge token consumption. Otherwise, based on my usual usage, it should have been just right.&lt;/p&gt;
&lt;p&gt;This means that cheaper models might be more cost-effective to use. Because they are less capable, constraining their behavior with extensive specs might lead to better performance than before.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s try out &lt;a class="link" href="https://github.com/github/spec-kit" target="_blank" rel="noopener"
&gt;spec-kit&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="installation"&gt;Installation
&lt;/h2&gt;&lt;p&gt;For installation, it&amp;rsquo;s recommended to take a dual approach.&lt;/p&gt;
&lt;p&gt;One is to use it directly without worrying too much about installation:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;uvx --from git+https://github.com/github/spec-kit.git specify init . --github-token&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$GITHUB_TOKEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Here, &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; refers to the GitHub personal token.&lt;/p&gt;
&lt;p&gt;Another method is to install it first and then use it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;pipx install git+https://github.com/github/spec-kit.git
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Each has its pros and cons. The former requires no installation but needs to pull from git each time; the latter requires a one-time installation but involves dependency management.&lt;/p&gt;
&lt;h2 id="specification-driven-development"&gt;Specification Driven Development
&lt;/h2&gt;&lt;p&gt;SDD is a newly emerging concept. It uses extensive constraints to enable coding agents to write production-ready code.&lt;/p&gt;
&lt;p&gt;This article explains it well:&lt;/p&gt;
&lt;a href="https://mp.weixin.qq.com/s/zVvkSCFiknLZcolKjYLoIA" target="_blank" rel="noopener" style="text-decoration:none; display:block; max-width:600px; border: 1px solid #e0e0e0; border-radius:8px; overflow:hidden; color:inherit; font-family:-apple-system, BlinkMacSystemFont, sans-serif; margin:1em 0;"&gt;
&lt;div style="position:relative; padding-top:56.25%; background:#f0f0f0;"&gt;
&lt;!-- Cover image example: recommended to replace with actual cover image URL --&gt;
&lt;img src="pics/sdd.jpg" alt="Article cover" style="position:absolute; top:0; left:0; width:100%; height:100%; object-fit:cover;"&gt;
&lt;/div&gt;
&lt;div style="padding:12px;"&gt;
&lt;h3 style="margin:0 0 8px; font-size:18px; line-height:1.2; color:#000"&gt;
Follow-up on Spec-Driven Development Two Months Later: spec-kit and Ecosystem Development Research
&lt;/h3&gt;
&lt;p style="margin:0 0 10px; color:#555; font-size:14px; line-height:1.4;"&gt;
This article follows up on the rapid development of GitHub's spec-kit project two months after its release, including its community growth, feature iterations, and ecosystem status. It also explores the core concepts of Specification Driven Development (SDD), compares main tools, discusses challenges, and outlines industry trends for 2025, providing developers with practical advice and outlook.
&lt;/p&gt;
&lt;div style="font-size:12px; color:#999;"&gt;
WeChat Official Account · Publication Date
&lt;/div&gt;
&lt;/div&gt;
&lt;/a&gt;
&lt;p&gt;Based on the concept of SDD, I decided to use spec-kit to test the performance of weaker models. The test task was: to add a publish button to hugo-admin.&lt;/p&gt;
&lt;p&gt;Unfortunately, the performance is still not quite up to par. What&amp;rsquo;s rather amusing is the content it output in the middle:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;div class="chroma"&gt;
&lt;table class="lntable"&gt;&lt;tr&gt;&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code&gt;&lt;span class="lnt"&gt; 1
&lt;/span&gt;&lt;span class="lnt"&gt; 2
&lt;/span&gt;&lt;span class="lnt"&gt; 3
&lt;/span&gt;&lt;span class="lnt"&gt; 4
&lt;/span&gt;&lt;span class="lnt"&gt; 5
&lt;/span&gt;&lt;span class="lnt"&gt; 6
&lt;/span&gt;&lt;span class="lnt"&gt; 7
&lt;/span&gt;&lt;span class="lnt"&gt; 8
&lt;/span&gt;&lt;span class="lnt"&gt; 9
&lt;/span&gt;&lt;span class="lnt"&gt;10
&lt;/span&gt;&lt;span class="lnt"&gt;11
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class="lntd"&gt;
&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;● Bash&lt;span class="o"&gt;(&lt;/span&gt;python -c &lt;span class="s2"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; import sys…)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; ⎿ Error: Exit code 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; Traceback (most recent call last):
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; File &amp;#34;&lt;/span&gt;&amp;lt;string&amp;gt;&lt;span class="s2"&gt;&amp;#34;, line 4, in &amp;lt;module&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; File &amp;#34;&lt;/span&gt;/home/svtter/work/blog/hugo-admin/services/post_service.py&lt;span class="s2"&gt;&amp;#34;, line 15, in &amp;lt;module&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; import frontmatter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; ModuleNotFoundError: No module named &amp;#39;frontmatter&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt; ● The tests show the functionality is implemented correctly but the frontmatter module isn&amp;#39;t installed in the current environment.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s2"&gt;That&amp;#39;s fine for our implementation - the module is specified in requirements.txt and will be available when the application runs.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Then it skipped the tests. I can only say that ChatGLM 4.6 is an AI model that doesn&amp;rsquo;t get bogged down in internal struggles. Here is the &lt;a class="link" href="https://github.com/Svtter/hugo-admin/commit/ad7891e0038f154f37a4b65325746dce1c00d1b4" target="_blank" rel="noopener"
&gt;commit&lt;/a&gt; it submitted.&lt;/p&gt;
&lt;p&gt;Afterwards, I switched to &lt;code&gt;doubao-seed-code&lt;/code&gt; to continue testing other features, but the performance of doubao-seed-code combined with Claude Code wasn&amp;rsquo;t great either. You can check out its &lt;a class="link" href="https://github.com/Svtter/hugo-admin/commit/9acd83940da558c1335f036e2dc475062166869d" target="_blank" rel="noopener"
&gt;commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the end, I completed the entire functionality using Trae (which does not support spec-kit). The corresponding &lt;a class="link" href="https://github.com/Svtter/hugo-admin/commit/53f895017e9d2b94880f5385562993274076d7a7" target="_blank" rel="noopener"
&gt;commit&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;If you can manually manage the current context and some obvious &amp;ldquo;information the model tends to forget,&amp;rdquo; then you can completely avoid using &lt;a class="link" href="https://github.com/github/spec-kit" target="_blank" rel="noopener"
&gt;spec-kit&lt;/a&gt; when working with Claude Code. This thing is a token hog—it essentially uses a sledgehammer to crack a nut.&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://github.com/github/spec-kit" target="_blank" rel="noopener"
&gt;spec-kit&lt;/a&gt; does not support Trae, and Trae doesn&amp;rsquo;t need that support to perform well.&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Why Agent</title><link>https://svtter.cn/en/p/why-agent/</link><pubDate>Tue, 30 Sep 2025 11:54:06 +0800</pubDate><guid>https://svtter.cn/en/p/why-agent/</guid><description>&lt;img src="https://svtter.cn/p/why-agent/pics/why-agent-background.svg" alt="Featured image of post Why Agent" /&gt;&lt;p&gt;I&amp;rsquo;ve always had a question: Why do we need agent frameworks? Aren&amp;rsquo;t large models enough on their own? This article reflects my current understanding of the subject.&lt;/p&gt;
&lt;p&gt;After using several tools extensively and participating in multiple agent projects recently, I&amp;rsquo;ve reached some conclusions.&lt;/p&gt;
&lt;h2 id="the-limitations-of-llms"&gt;The Limitations of LLMs
&lt;/h2&gt;&lt;p&gt;The primary reason for using agents is the inherent limitations of LLMs.&lt;/p&gt;
&lt;p&gt;First and foremost is the &lt;strong&gt;context window&lt;/strong&gt;, as explicitly mentioned in &lt;a class="link" href="https://docs.langchain.com/oss/python/deepagents/subagents#why-use-subagents%3F" target="_blank" rel="noopener"
&gt;langchain/subagent&lt;/a&gt;. Although many modern models have significantly expanded context windows (GPT-4 Turbo 128K, Claude-3.5 Sonnet 200K, Gemini-1.5 Pro up to 2M), they are still insufficient for truly complex tasks. For example, processing a massive codebase or analyzing hundreds of documents quickly exhausts these limits. Furthermore, processing extremely long contexts is both expensive and slow.&lt;/p&gt;
&lt;p&gt;Beyond context, there are other capability gaps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Vision Capabilities&lt;/strong&gt;: While modern VLMs (Vision Language Models) are powerful, traditional CV (Computer Vision) models often perform better in specific scenarios. Additionally, some models (like DeepSeek-V3) don&amp;rsquo;t have native vision capabilities.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource Access&lt;/strong&gt;: LLMs cannot directly interact with databases, file systems, or network services.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specialized Tools&lt;/strong&gt;: Tools for code execution, complex mathematics, or data analysis require protocols like MCP to be accessible to an LLM.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="what-agents-can-do"&gt;What Agents Can Do
&lt;/h2&gt;&lt;p&gt;Beyond addressing the limitations above, here are some practical ways agents add value.&lt;/p&gt;
&lt;h3 id="domain-specific-text-processing"&gt;Domain-Specific Text Processing
&lt;/h3&gt;&lt;p&gt;Agents can process different text segments (contexts) independently.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Context Optimization&lt;/strong&gt;: Agents can compress or selectively provide context, effectively extending the usable context window.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance Gains&lt;/strong&gt;: An LLM within an agent can focus on a single, specific task, leading to better performance. When given too much text, LLMs often struggle to identify key information; smaller, targeted context makes this much easier.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specialized Knowledge&lt;/strong&gt;: LLMs are trained on general data. To make an agent a domain expert, we can inject specific knowledge directly into its context.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="visual-capability-integration"&gt;Visual Capability Integration
&lt;/h3&gt;&lt;p&gt;Through agents, we can integrate traditional vision models to handle tasks that LLMs struggle with. For example, using an MCP (Model Context Protocol) to bridge an agent with vision capabilities.&lt;/p&gt;
&lt;p&gt;A notable example is &lt;a class="link" href="https://docs.bigmodel.cn/cn/coding-plan/mcp/vision-mcp-server" target="_blank" rel="noopener"
&gt;Zhipu&amp;rsquo;s Vision MCP&lt;/a&gt;. Using this MCP in conjunction with an agent significantly enhances visual processing power. This highlights the value of MCP servers that integrate specialized services.&lt;/p&gt;
&lt;h2 id="further-reading"&gt;Further Reading
&lt;/h2&gt;&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="zh" dir="ltr"&gt;大家经常聊的 Agent，很多时候其实只是一个 Workflow。这两个概念混用，会导致产品设计和技术选型上走很多弯路。&lt;br&gt;&lt;br&gt;Anthropic 给了一个很清晰的划分，核心区别在于：&lt;br&gt;系统执行任务时，是由代码预设路径（Code-Driven），还是由LLM自己动态决定下一步（LLM-Driven）。前者是 Workflow，后者才是…&lt;/p&gt;&amp;mdash; 一泽Eze (@eze_is_1) &lt;a href="https://twitter.com/eze_is_1/status/1982740850070425826?ref_src=twsrc%5Etfw"&gt;October 27, 2025&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src="https://platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
&lt;p&gt;Agents and workflows allow LLMs to use tools. While the input and output remain text, the nature of what that text represents has changed. The creator of the text is no longer necessarily a human.&lt;/p&gt;
&lt;h2 id="agent-frameworks"&gt;Agent Frameworks
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="https://ai.pydantic.dev/" target="_blank" rel="noopener"
&gt;Pydantic AI&lt;/a&gt;: I find this particularly useful because it integrates Pydantic models into the agent framework, making it much easier to debug. I&amp;rsquo;ve tested its integration with &lt;a class="link" href="https://ai.pydantic.dev/" target="_blank" rel="noopener"
&gt;Qwen3&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="https://www.langchain.com/" target="_blank" rel="noopener"
&gt;LangChain&lt;/a&gt;: I haven&amp;rsquo;t used this in production, only for basic debugging. The API changes frequently, which can be challenging. One minor issue is prompt handling; &lt;a class="link" href="https://svtter.cn/p/string-template-in-prompt.md/" &gt;I used Jinja to solve this&lt;/a&gt;. Alternatively, the &amp;ldquo;LangChain way&amp;rdquo; involves using &lt;a class="link" href="https://python.langchain.com/docs/concepts/prompt_templates/#string-prompttemplates" target="_blank" rel="noopener"
&gt;PromptTemplates&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>