[{"content":"MiniMax released M3 on 2026-06-01 (minimax/minimax-m3-20260531 on OpenRouter), but the upstream models.json shipped by @oh-my-pi/pi-ai@15.7.3 hadn\u0026rsquo;t been updated to include it. This post documents the patch I applied to add M3 support across all five provider endpoints.\nTarget File\n~/.bun/install/global/node_modules/@oh-my-pi/pi-ai/src/models.json\nProvider Entries Added (5)\nAll entries are appended at the end of their respective provider object, mirroring the structure of the existing MiniMax-M2.7 entry.\n1. minimax (Official Anthropic-compatible API): key: MiniMax-M3; api: anthropic-messages; baseUrl: https://api.minimax.io/anthropic; contextWindow: 204800, maxTokens: 131072; cost: input 0.3, output 1.2, cacheRead 0.06, cacheWrite 0.375; thinking: budget mode, minimal..xhigh\n2. minimax-cn (Official Anthropic-compatible API, China): key: MiniMax-M3; api: anthropic-messages; baseUrl: https://api.minimaxi.com/anthropic; same context/cost/thinking as minimax\n3. minimax-code (Coding Plan, OpenAI-compatible): key: MiniMax-M3; api: openai-completions; baseUrl: https://api.minimax.io/v1; cost: all 0 (Coding Plan flat-rate); compat: supportsStore=false, supportsDeveloperRole=false, supportsReasoningEffort=false, reasoningContentField=reasoning_content; thinking: effort mode, minimal..high\n4. minimax-code-cn (Coding Plan CN): mirror of minimax-code with baseUrl: https://api.minimaxi.com/v1 and provider minimax-code-cn.\n5. openrouter (OpenRouter Passthrough): key: minimax/minimax-m3-20260531; api: openai-completions; baseUrl: https://openrouter.ai/api/v1; cost: input 0.3, output 1.2, cacheRead 0.05, cacheWrite 0; thinking: effort mode, minimal..high\nVerification\nSearching for \u0026quot;MiniMax-M3|minimax-m3\u0026quot; in the patched file returns exactly 5 hits, one per provider block.\nCaveats\nomp update will overwrite the patch. Re-apply after updates, or pin the package version. 
If upstream later ships an official M3 entry, our local copy may diverge (custom pricing/context) until the next update. Pricing values for M3 were inferred from the M2.7 template and the OpenRouter listing ($0.30 / $1.20). Confirm against the official MiniMax pricing page if cost accuracy matters. Context window (204800) and maxTokens (131072) mirror M2.7; adjust if M3 differs at GA.\nAddendum (2026-06-02): The proper route via OMP user config\nThe pi-ai patch above is a hack, since any omp update re-pulls the package and the patch is gone. The proper OMP way is ~/.omp/agent/models.yml: a user-level file that OMP merges on top of the built-in catalog, with no bun-global dependency, and which omp update leaves alone.\nFinal config\nAppend to ~/.omp/agent/models.yml:\n# MiniMax M3 Code Plan\n# Set MINIMAX_API_KEY in ~/.zshenv first\nminimax:\n  baseUrl: https://api.minimaxi.com/anthropic\n  apiKey: MINIMAX_API_KEY\n  api: anthropic-messages\n  authHeader: true\n  disableStrictTools: true\n  models:\n    - id: MiniMax-M3\n      name: MiniMax M3\n      reasoning: true\n      input: [text, image]\n      contextWindow: 1000000\n      maxTokens: 16384\n      cost:\n        input: 0\n        output: 0\n        cacheRead: 0\n        cacheWrite: 0\napiKey: MINIMAX_API_KEY follows OMP\u0026rsquo;s resolution rule: try the value as an env-var name first, then fall back to a literal. I export MINIMAX_API_KEY=$MINIMAX_CODE_PLAN_KEY in ~/.zshenv, so the key is sourced at runtime and the dotfile stays clean for git.\nKey choices\nWhy anthropic-messages, not openai-completions: M3 speaks both protocols. The openai-completions route had two friction points:\nOMP\u0026rsquo;s openai-completions transport emits developer role + reasoning_effort for reasoning models. 
MiniMax\u0026rsquo;s schema check is stricter than OpenAI\u0026rsquo;s, and an empty reasoning field occasionally triggers a 400.\nAfter switching to anthropic-messages, tool calls and streaming reasoning go through the Anthropic SDK normalization path, the same one kimi and claude use.\nWhy disableStrictTools: true: The Anthropic SDK sends strict: true on every tool definition by default. Third-party Anthropic-fronted gateways (MiniMax, kimi, etc.) usually don\u0026rsquo;t recognize the field and return a 400. The kimi provider in the same file already sets this flag. The trade-off is that tool schemas are not server-side validated, so prompts have to carry the schema discipline.\nContext 1M / maxTokens 16K: contextWindow: 1000000 matches OpenRouter\u0026rsquo;s spec for minimax/minimax-m3-20260531 (M2.7 was 204800, M3 is roughly 5× that). maxTokens: 16384 carries over from M2.7, since I couldn\u0026rsquo;t find an official M3 number. cost is all zero because the Code Plan is flat-rate.\nSwitching to it\n# At launch\nomp --model minimax/MiniMax-M3\n\n# Or in the TUI\n/model minimax/MiniMax-M3\nAfter the switch, /status should show ANTHROPIC_BASE_URL pointing at api.minimaxi.com/anthropic.\nHow the two routes compose\nDimension | pi-ai bundled models.json patch | models.yml custom provider\nPersistence | omp update wipes it | Persistent\nCross-machine sync | No (bun-global path) | Yes (dotfile in git)\nUpgrade cost | Re-apply patch | OMP merges automatically\nMerge with built-in | Yes | Yes, last-write-wins\nThe two compose. models.yml providers enter through OMP\u0026rsquo;s \u0026ldquo;custom\u0026rdquo; channel; whatever pi-ai later ships in its bundled list (if M3 lands upstream) enters through the \u0026ldquo;built-in\u0026rdquo; channel. 
When both define the same provider/model with different baseUrl, OMP\u0026rsquo;s last-write-wins rule means models.yml always wins — which is exactly what you want for a CN endpoint override.\n","date":"2026-06-01T12:00:00+08:00","image":"https://svtter.cn/p/omp-m3-%E6%A8%A1%E5%9E%8B%E8%A1%A5%E4%B8%81%E4%B8%BA-pi-ai-%E6%B7%BB%E5%8A%A0-minimax-m3-%E6%94%AF%E6%8C%81/cover_hu_39a79487ffa929b8.png","permalink":"https://svtter.cn/en/p/omp-m3-model-patch-adding-minimax-m3-to-pi-ai/","title":"OMP M3 Model Patch: Adding MiniMax M3 to pi-ai"},{"content":"Recently, kimi-code migrated from Python to TypeScript. Here\u0026rsquo;s a quick analysis.\nBased on my review of the kimi-code source code (particularly packages/kosong/src/providers/kimi.ts, kimi-schema.ts, kimi-files.ts, etc.) and relevant OpenCode compatibility issues, here are the kimi-k2.6-specific optimizations in kimi-code and how they differ from OpenCode.\n1. Native Kimi Provider (Not a Generic OpenAI-compatible Layer) kimi-code does not treat Kimi as \u0026ldquo;just another OpenAI-compatible endpoint.\u0026rdquo; Instead, it implements a dedicated kimi provider type:\nFeature kimi-code OpenCode Provider Type Dedicated 'kimi' type with independent adapter Accessed via generic OpenAI/Anthropic bridge Proprietary Fields Native handling of reasoning_content, thinking, generationKwargs reasoning_content often lost in the bridge layer Auth Headers Supports kimiRequestHeaders, X-Msh-Tool-Call-Id, and other Moonshot-specific headers Generic header forwarding 2. Full Lifecycle Handling of reasoning_content kimi-k2.6 has thinking enabled by default and requires reasoning_content to be preserved across multi-turn conversation history. 
Otherwise, tool calls fail with a 400 error.\nHow kimi-code handles it:\nconvertMessage: Extracts internal think content parts and serializes them into the reasoning_content field, ensuring thinking content is never lost in message history.\nStreaming Parser: Explicitly extracts delta.reasoning_content / message.reasoning_content in both _convertStreamResponse and _convertNonStreamResponse.\nTUI Rendering: A dedicated ThinkingComponent renders thinking content in real time, with expand/collapse support and a spinner animation.\nOpenCode\u0026rsquo;s Problem:\nThe OpenCode Go bridge drops reasoning_content on the second turn, causing the Moonshot API to return:\nthinking is enabled but reasoning_content is missing in assistant tool call message\n3. JSON Schema Normalization (kimi-schema.ts)\nMoonshot\u0026rsquo;s tool parameter validator has strict and unique requirements for JSON Schema. This is one of the primary sources of incompatibility between OpenCode and kimi-k2.6.\nWhat kimi-code\u0026rsquo;s normalizeKimiToolSchema does:\nDereferences $ref: Inlines definitions from $defs / definitions, eliminating external references.\nFills in missing type: The Kimi validator rejects nested property schemas that omit type (e.g., MCP-generated enum-only schemas). kimi-code infers and backfills type: string/object/array, etc.\nCircular reference detection: Preserves the original $ref when a circular reference is detected, avoiding infinite recursion.\nOpenCode\u0026rsquo;s Problem:\nGenerated schemas use #/definitions/ instead of the #/$defs/ format required by Moonshot, and lack schema type inference and backfilling for Kimi, causing complex tool calls to fail with 400.\n4. 
Native Thinking Mode Configuration System\nkimi-code has built-in support for Kimi\u0026rsquo;s thinking mode from the configuration layer all the way to the UI:\nConfig Parsing: ThinkingConfigSchema supports mode: auto/on/off and effort: low/medium/high/xhigh/max\nModel Capability Tags: ModelAlias supports capabilities: ['thinking', 'always_thinking']\nModel Selector UI: Press ←→ to toggle thinking on/off; always-on models cannot be turned off\nProvider Method: withThinking(effort) correctly generates:\n{\n  \u0026#34;reasoning_effort\u0026#34;: \u0026#34;high\u0026#34;,\n  \u0026#34;extra_body\u0026#34;: { \u0026#34;thinking\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;enabled\u0026#34; } }\n}\nToken Budget: Automatically normalizes legacy max_tokens to Kimi\u0026rsquo;s preferred max_completion_tokens\nOpenCode\u0026rsquo;s Problem:\nWhen using the Anthropic bridge, it hardcodes thinking content blocks, but the Kimi API only supports text/image_url/video_url/video, resulting in:\nInvalid value: thinking. Supported values are: \u0026#39;text\u0026#39;,\u0026#39;image_url\u0026#39;,\u0026#39;video_url\u0026#39; and \u0026#39;video\u0026#39;.\n5. Native Moonshot Service Integration\nkimi-code includes Moonshot-exclusive services instead of relying on generic local implementations:\nMoonshotFetchURLProvider: Prioritizes Moonshot\u0026rsquo;s coding-fetch service (with built-in page text extraction), falling back to local fetch only on failure.\nMoonshotWebSearchProvider: Calls the Moonshot search API directly, supporting enable_page_crawling.\nKimiFiles: Uploads videos to the Moonshot file service, returning video_url in the ms://\u0026lt;file-id\u0026gt; format.\n6. 
Tool Call Layer Details Built-in Functions: Tool names starting with $ are recognized as Kimi builtin functions and serialized as type: 'builtin_function' Usage Extraction: Supports Moonshot\u0026rsquo;s proprietary choices[0].usage placement, as well as cached_tokens and other fields Finish Reason Mapping: Maps OpenAI-style stop/tool_calls/length values to an internal unified enum 7. CLI Core and LLM SDK Architectural Isolation This is an easily overlooked but important architectural difference.\nThe core CLI of kimi-code (apps/kimi-code) does not directly depend on any OpenAI or Anthropic TypeScript SDK. Looking at its package.json, the core dependencies are only generic libraries like TUI rendering (pi-tui), CLI parsing (commander), and syntax highlighting (cli-highlight). All LLM provider interactions are isolated within the self-developed kosong package.\nWhile packages/kosong internally uses openai and @anthropic-ai/sdk as implementation details (since the Kimi API is OpenAI-compatible), it exposes a unified LLM abstraction interface to the outside. The CLI core only depends on kosong and has no awareness of underlying vendor SDKs.\nOpenCode is different. Its packages/opencode core package directly depends on a large number of vendor SDKs:\n@ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google @ai-sdk/azure @openrouter/ai-sdk-provider \u0026hellip; (more than a dozen provider-specific packages in total) This means OpenCode\u0026rsquo;s core code is deeply coupled with each vendor\u0026rsquo;s SDK, while kimi-code\u0026rsquo;s core CLI stays clean, with all model interactions fully isolated through a self-developed abstraction layer.\n8. What Commit History Reveals About Evolution Paths The structural code differences above are just a static snapshot. 
What\u0026rsquo;s more interesting is comparing the commit histories of the two projects—their dynamic evolution directions are completely different.\nkimi-code: Native Design, Continuously Reducing Configuration Burden 842e699 — \u0026ldquo;Kimi For Coding\u0026rdquo; (Initial Commit)\nThis was the starting point of the entire project. The initial code already included:\npackages/kosong/src/providers/kimi.ts: Dedicated Kimi provider packages/kosong/src/providers/kimi-schema.ts: Dedicated JSON Schema normalizer packages/kosong/src/providers/kimi-files.ts: Dedicated file upload service Conclusion: kimi-code treated the Kimi API as a first-class citizen from day one, not as a later patch.\nd95b013 fix(catalog): preserve reasoning fields in custom model (#70)\nThis commit fixed a very subtle issue. models.dev uses the interleaved field to mark reasoning support, but early code treated interleaved=true as undefined, causing models selected via /connect to silently lose their reasoning capability.\nFixes:\ninterleaved=true is mapped to the default reasoning_content interleaved is added to the update-catalog.mjs allowlist; otherwise the offline catalog in release builds would silently drop the field again 61f7d0e fix(kosong): make openai-compatible thinking work without reasoning_key (#78)\nThis is the core commit for reasoning handling, showcasing kimi-code\u0026rsquo;s deep thinking on compatibility. 
The diff reveals a three-layer design:\nInbound Auto-Scan (response parsing)\nconst KNOWN_REASONING_KEYS = [\u0026#39;reasoning_content\u0026#39;, \u0026#39;reasoning_details\u0026#39;, \u0026#39;reasoning\u0026#39;] as const;\n// Auto-scan three fields; first string value wins\nOutbound Default Write-Back (request serialization)\nconst DEFAULT_OUTBOUND_REASONING_KEY = KNOWN_REASONING_KEYS[0]; // \u0026#39;reasoning_content\u0026#39;\n// Defaults to writing back as reasoning_content, no user config needed\nAuto-Inject reasoning_effort (historical continuity)\n// When history contains ThinkPart but caller hasn\u0026#39;t explicitly set reasoning_effort,\n// auto-inject \u0026#39;medium\u0026#39; to prevent strict gateways like One API / DeepSeek from returning 400\nEdge cases are handled meticulously: blank reasoning_key (\u0026quot;\u0026quot;) is normalized to undefined; values explicitly set by the caller via withGenerationKwargs are not silently overwritten by auto-injection.\nThe verification goal explicitly states:\nManually verified end-to-end against the real DeepSeek API with a hand-written config.toml that does not set reasoning_key: thinking content renders, no 400, multi-turn conversations work.\nOpenCode: Generic Layer Design, OpenAI-centric\neb84f46 fix(llm): split OpenAI reasoning summary blocks (#29000)\nThis commit demonstrates OpenCode\u0026rsquo;s completely different approach to reasoning, designed around the OpenAI Responses API:\nMaintains a state machine for encrypted_content and item_reference.\nFolds multiple summary parts by item_id + summary_index.\nWhen store:false, filters out reasoning items lacking encrypted_content.\nThis is completely different from Kimi\u0026rsquo;s reasoning_content mechanism. 
Kimi does not need encrypted_content or item_reference; it simply attaches a reasoning_content field to the message.\nA Hard Fact OpenCode Issue #26331 \u0026ldquo;Bug: OpenCode Go bridge layer incompatible with kimi-k2.6 tool calls\u0026rdquo; — Status: still open OpenCode Issue #27054 \u0026ldquo;KIMI K2.6 showing error in Opencode GO\u0026rdquo; — Status: closed, but the resolution was to disable MCP (a workaround) The last comment on #27054:\nThe workaround is to disable your MCP and then initiate the session\nThat\u0026rsquo;s not a fix. That\u0026rsquo;s avoiding the problem.\nCommit History Comparison Summary Dimension kimi-code OpenCode Initial Design Initial commit includes full Kimi provider + schema normalizer + file service Generic multi-model architecture, adapted later via bridge Reasoning Mechanism Designed around reasoning_content field, with auto-scan / write-back / effort injection Designed around OpenAI Responses\u0026rsquo; encrypted_content + item_reference Schema Handling Dedicated normalizeKimiToolSchema, dereferences $ref + backfills type Generic schema validation, focused on friendly error messages Config Philosophy Makes OpenAI-compatible gateways \u0026ldquo;zero-config\u0026rdquo; by auto-inferring all fields Relies on users manually adapting via bridge/config Issue Status Continuously shipping reasoning-related patches (#70, #78) kimi-k2.6 compatibility issue #26331 still open Summary: Core Differences Dimension kimi-code OpenCode Architecture Positioning Native design for Kimi/Moonshot, dedicated provider Generic multi-model agent, adapted via bridge Thinking/Reasoning Native support, full lifecycle preservation of reasoning_content Easily lost in bridge layer, causing 400 errors JSON Schema Dedicated normalizeKimiToolSchema for dereferencing and type backfilling Generic schema generation, does not meet Kimi validator requirements API Format Directly generates Moonshot-native format (including thinking config, $defs normalization, 
etc.) Transformed through OpenAI/Anthropic protocol conversion, causing format mismatches Service Integration Built-in Moonshot fetch/search/file services Uses generic local tools Core Dependencies CLI core does not directly depend on vendor SDKs; isolated via self-developed kosong package Core package directly coupled with @ai-sdk/openai and more than a dozen other vendor SDKs Looking at commit history, kimi-code\u0026rsquo;s evolution is directed at continuously eliminating user configuration burden (reasoning_key went from required → optional override → auto-inferred; interleaved went from filtered → correctly mapped), while OpenCode\u0026rsquo;s evolution is directed at deepening OpenAI ecosystem integration (Responses API, encrypted reasoning, item reference), leaving Kimi adaptation stuck at the generic bridge layer.\nThat\u0026rsquo;s the truth at the commit level: one is native evolution, the other is a bridge gap.\n","date":"2026-05-27T10:30:00+08:00","image":"https://svtter.cn/p/kimi-code-%E5%AF%B9-kimi-k2.6-%E7%9A%84%E4%B8%93%E7%94%A8%E5%A4%84%E7%90%86%E4%B8%8E-opencode-%E7%9A%84%E5%AF%B9%E6%AF%94/featured-image_hu_7a0d1a4754b82ff9.png","permalink":"https://svtter.cn/en/p/how-kimi-code-handles-kimi-k2.6-a-comparison-with-opencode/","title":"How kimi-code Handles kimi-k2.6: A Comparison with OpenCode"},{"content":"Yesterday at the Oh My Pi (OMP) repository, I experienced something shocking: an AI bot didn\u0026rsquo;t just reply to my issue—it understood the problem, dug through the source code on its own, and opened a precise PR to fix the bug. The entire process took less than 5 minutes.\nThe Origin When using OMP (a terminal AI coding agent), I discovered a UX issue: Ctrl+T can hide thinking blocks, but hiding them simultaneously turns off extended thinking entirely—not just hiding the display, but the model stops thinking altogether. 
Users assume they\u0026rsquo;re just \u0026ldquo;turning off the display,\u0026rdquo; but the actual effect is \u0026ldquo;turning off the brain.\u0026rdquo;\nSo I went to the OMP GitHub repository and opened a feature request: #1313.\nRoboOmp\u0026rsquo;s First Response\nSeconds after I submitted the issue, a bot called roboomp automatically replied. Not with template nonsense like \u0026ldquo;thanks for your feedback, forwarded to the product team.\u0026rdquo; It directly told me:\nMost of this feature already exists: the hideThinkingBlock setting, Ctrl+T shortcut, and rendering path.\nThe only missing piece is a CLI startup parameter.\nThere\u0026rsquo;s a design decision that requires maintainer input: the coupling between hideThinkingBlock and hideThinkingSummary.\nAnd it provided exact filenames and line numbers: settings-schema.ts:663, input-controller.ts:755, stream.ts:583,697.\nThis wasn\u0026rsquo;t cobbled together from search results; it actually read the code.\nI Pointed Out the Design Flaw\nI replied with a comment explaining that this coupling is a footgun:\nUsers press Ctrl+T intending to reduce visual noise, but unknowingly turn off extended thinking, degrading model output quality.\n\u0026ldquo;Don\u0026rsquo;t want to see the reasoning process\u0026rdquo; and \u0026ldquo;don\u0026rsquo;t want the model to reason\u0026rdquo; are two different things that shouldn\u0026rsquo;t be tied together.\nThe behavior varies across providers (MiniMax can\u0026rsquo;t turn it off, Anthropic/OpenAI can), so the same shortcut behaves inconsistently.\nI also included the commit history that introduced this coupling for easier tracing.\nIt Opened a PR Itself\nThen something unbelievable happened. roboomp replied with two consecutive comments and directly opened a PR: #1314.\nThe PR changes: 0 additions, 3 deletions. 
It only deleted three lines:\nsdk.ts:1860, where agent initialization no longer assigns hideThinkingBlock to hideThinkingSummary\ninput-controller.ts:758, where the Ctrl+T handler no longer links them\nselector-controller.ts:273, where the settings UI follows the same logic\nThe PR description included complete repro steps, root cause analysis, and fix approach. It even confirmed the commit archaeology I provided: 45bd444 was indeed the commit that introduced this bug.\nWhy This Shocked Me\n\u0026ldquo;AI can write code\u0026rdquo; isn\u0026rsquo;t news. Copilot, Claude Code, Cursor can all write code. But here is what\u0026rsquo;s different this time:\nComplete Closed Loop\nThe entire process needed no maintainer involvement:\nI opened an issue → bot read the codebase, provided existing implementation status\nI pointed out the design flaw → bot understood my point\nIt confirmed the commit that introduced the bug and opened a PR that deletes just 3 lines\nFrom issue to PR, no maintainer had to step in.\nIt Knows When to Wait\nIn its first reply, it said \u0026ldquo;Holding on implementation until a maintainer weighs in on the coupling question\u0026rdquo;. It knew this was a design decision requiring judgment and shouldn\u0026rsquo;t act autonomously. But when I clarified the coupling problem, it determined that waiting was no longer necessary and opened a PR directly.\nThe Fix Was Minimal\n0 additions / 3 deletions. It understood what the minimal fix was: no refactoring, no abstraction, no gold-plating. Many human developers can\u0026rsquo;t do this.\nWhat Is RoboOmp\nRoboOmp is an AI bot deployed by can1357, the OMP repository maintainer. It\u0026rsquo;s not a GitHub Actions workflow (I checked the CI config to confirm), but an independent server-side agent:\nListens to GitHub Webhook events (issue creation, comments, etc.) 
Reads source code through GitHub API, understands code structure Uses LLM to analyze context, autonomously decides next steps—comment, label, open PR From can1357\u0026rsquo;s GitHub profile, this person comes from a hypervisor/reverse engineering background (ByePg, NoVmp, NtRays), now working on AI agent platforms (agentx, hindsight). RoboOmp is likely the result of building exceptionally deep code understanding capabilities.\nThis project is not open source.\nAre There Similar Open Source Projects I looked around, and currently the closest ones are:\nProject Description optio (962⭐) AI coding agent workflow orchestration, task → merged PR claude-code-github-agent Hooks 40+ GitHub events, auto triage/review/fix, architecture most similar to roboomp software-factory Issue/PR-driven automatic development system But honestly, none reach roboomp\u0026rsquo;s level. Most are still at the \u0026ldquo;receive webhook → call LLM → post comment\u0026rdquo; stage. RoboOmp is the first I\u0026rsquo;ve seen that can autonomously read source code, understand code structure, participate in design discussions, and make precise fixes.\nWhat This Means This made me realize that the capability boundaries of AI coding agents are expanding rapidly. A year ago we were discussing \u0026ldquo;can AI write correct code,\u0026rdquo; now the question is \u0026ldquo;can AI be a maintainer in open source communities.\u0026rdquo;\nThe capabilities roboomp demonstrated—reading code, understanding context, participating in discussions, making minimal fixes—are essentially what a junior maintainer does. If this capability continues to improve, the maintenance model of open source projects could undergo fundamental changes.\nThink about it: what does an open source maintainer spend the most time on every day? Replying to issues, triaging bugs, writing small fixes. These are exactly what roboomp excels at. 
If every open source project could deploy such a bot, maintainers could focus their time on architectural decisions and community building.\nOf course, current limitations are obvious—it can only handle problems with clear boundaries and well-defined scope. But this experience makes me believe that \u0026ldquo;AI maintainer\u0026rdquo; is not a distant future scenario, but something happening right now.\n","date":"2026-05-23T18:00:00+08:00","image":"https://svtter.cn/p/%E6%88%91%E5%9C%A8-github-%E4%B8%8A%E9%81%87%E5%88%B0%E4%B8%80%E4%B8%AA-ai-bot%E5%AE%83%E8%AF%BB%E4%BA%86%E6%88%91%E7%9A%84-issue%E7%90%86%E8%A7%A3%E4%BA%86%E9%97%AE%E9%A2%98%E7%84%B6%E5%90%8E%E8%87%AA%E5%B7%B1%E6%8F%90%E4%BA%86%E4%B8%AA-pr/cover_hu_d3bff012e1170a25.jpg","permalink":"https://svtter.cn/en/p/roboomp-an-ai-bot-that-creates-its-own-pull-requests/","title":"RoboOmp: An AI Bot That Creates Its Own Pull Requests"},{"content":"opencode is a 160k-star AI coding tool with 27 workflow files in its .github/workflows/ directory. This number is not uncommon for open source projects, but what\u0026rsquo;s truly interesting is not the quantity, but the scope these workflows cover: from conventional CI/CD to AI-driven community governance, they\u0026rsquo;ve done almost everything GitHub Actions can do.\nThis article analyzes the design of these workflows by category, discusses the pros and cons of this level of automation, and shares insights for our own projects.\nOverview The 27 workflows can be divided into four categories:\nCategory Count Purpose CI/Testing 4 typecheck, unit tests, e2e, Nix builds Release/Delivery 5 CLI release, container builds, VS Code extension, GitHub Action release Automation/Bot 16 issue governance, PR compliance, AI code review, documentation updates Docs/Other 2 statistics, Discord notifications 16 automation workflows account for 60% of the total. 
opencode doesn\u0026rsquo;t just use Actions to run tests and releases—it also entrusts community governance and code quality review to the automation system.\nCI/Testing: Solid but Restrained Four testing-related workflows:\ntypecheck.yml — Runs bun typecheck on PR and push to dev. Simple and direct, no unnecessary actions.\ntest.yml — Cross-platform test matrix (Linux + Windows), runs unit tests and Playwright e2e. Has concurrency control where new commits in the same PR cancel old runs. Test results generate JUnit reports uploaded as artifacts.\nnix-eval.yml — Verifies Nix flake builds on four architectures (x86_64-linux, aarch64-linux, x86_64-darwin, aarch64-darwin). Mandatory package failures block the build, optional package failures are just warnings.\nstorybook.yml — Storybook builds for UI components, only triggered when storybook/ui-related files change. Path triggering avoids unnecessary runs.\nSeveral noteworthy design choices:\nconcurrency group + cancel-in-progress: Multiple workflows use this pattern so the same PR doesn\u0026rsquo;t stack multiple runs. For a project receiving lots of community PRs, this saves significant CI resources. Path triggering: containers.yml only runs when container files change, storybook.yml only runs when UI changes. Not everything runs on all commits. Mixed Runner Strategy: Most workflows use Blacksmith\u0026rsquo;s third-party hosted runners (blacksmith-4vcpu-ubuntu-2404, blacksmith-4vcpu-windows-2025). Blacksmith is a GitHub Actions API-compatible accelerated runner service using custom infrastructure, significantly faster than GitHub\u0026rsquo;s free runners. Only lightweight bot tasks (close-issues, close-prs, compliance-close, pr-standards, deploy) stay on GitHub\u0026rsquo;s native ubuntu-latest. Compute-intensive compilation, testing, and releases all go through Blacksmith, simple script tasks use GitHub\u0026rsquo;s native runners, allocating resources by task load. 
Release/Delivery: Full Platform Coverage publish.yml is the most complex workflow, handling the complete release process in a single file:\nVersion number calculation CLI build matrix (multi-platform, multi-architecture) Windows code signing (Azure Signing) macOS code signing (Apple Developer) Electron app builds npm publishing GitHub Release creation AUR (Arch Linux) publishing One workflow covers distribution for CLI, desktop apps, npm packages, and Linux packages. This \u0026ldquo;release everywhere at once\u0026rdquo; pattern is user-friendly—regardless of platform, everyone gets the new version on the same day.\nOther release workflows are split by artifact type:\npublish-github-action.yml — Listens for github-v* tags, publishes GitHub Action to Marketplace publish-vscode.yml — Listens for vscode-v* tags, publishes to both VS Code Marketplace and Open VSX containers.yml — Multi-architecture container image builds, pushes to GHCR release-github-action.yml — Creates pre-releases when github directory changes on dev branch Tag triggering is a good practice: releases are explicit actions, not triggered by accidental code pushes. publish.yml automatically builds snapshots when pushing to ci/dev/beta/fix branches, but official releases require manual dispatch or tags.\nAutomation/Bot: AI-Driven Community Governance This is opencode\u0026rsquo;s most distinctive feature. Among the 16 automation workflows, multiple directly call upon opencode\u0026rsquo;s own AI capabilities to handle community affairs.\nIssue Management triage.yml — When a new issue is created, opencode AI automatically triages it, adding labels and categories.\nduplicate-issues.yml — When a new issue is created/edited, opencode AI analyzes whether it duplicates existing issues. Also checks whether it follows one of three issue templates and whether it contains AI-generated content. 
Non-compliant issues get a needs:compliance label.\ncompliance-close.yml: Every 30 minutes, checks issues/PRs with the needs:compliance label and auto-closes them if not fixed within 2 hours. Issues and PRs get different closing messages.\nclose-issues.yml: Closes stale issues daily at 2 AM UTC.\nThese four layers form complete issue lifecycle management:\nNew issue → AI triage → duplicate/compliance check → compliance grace period → stale cleanup\nPR Management\npr-standards.yml is one of the longest workflows, doing two things:\nTitle format check: Enforces conventional commits format (feat/fix/refactor/\u0026hellip;)\nTemplate compliance check: PR description must include required sections like issue references, change type, verification method\nNon-compliant PRs get a needs:compliance label and auto-close after 2 hours. Team members and bots are exempt.\npr-management.yml: Checks for duplicates when a PR is created, adds labels for community contributors.\nclose-prs.yml: Runs daily at 10 PM UTC and closes PRs older than 1 month with insufficient reactions. Default threshold is 2 reactions, configurable.\nAI Code Review\nreview.yml: Type /review in a PR comment and opencode AI analyzes the code and leaves review comments on specific lines. Only available to repo owner/members.\nopencode.yml: Type /oc or /opencode in an issue or PR comment to trigger opencode AI for more general interactions.\nThese two workflows demonstrate the \u0026ldquo;AI as collaborator\u0026rdquo; approach: not fully automatic code review, but on-demand triggering with humans making final decisions in the loop.\nDocumentation \u0026amp; Maintenance\ndocs-update.yml: Every 12 hours, checks recent commits and uses opencode AI to determine if documentation needs updates.\ngenerate.yml: Runs code generation scripts when pushing to dev, auto-commits changes.\nbeta.yml: Syncs beta branch hourly.\nstats.yml: Updates download statistics to STATS.md daily.\nDesign Patterns Worth Adopting 1. 
Layered Governance opencode doesn\u0026rsquo;t stuff all automation into one workflow, but splits it by responsibility. An issue goes through four workflows in relay from creation to closure. Each workflow does one thing, combining to form a complete governance chain.\nBenefits of this design:\nIndividual workflows can be modified or disabled independently without affecting other steps Each workflow\u0026rsquo;s trigger conditions and permission scope are minimized Easy to locate which step has problems when they occur 2. Compliance Grace Period compliance-close.yml doesn\u0026rsquo;t close immediately upon detecting non-compliance, but gives a 2-hour grace period. This is reasonable for global contributors in different time zones—you might submit an issue while sleeping, and wake up with time to fix it.\n3. AI at Decision Points, Not Execution Points triage, duplicate detection, and code review all have AI make initial assessments, with humans making final decisions. But execution-level tasks like code builds and releases don\u0026rsquo;t use AI at all. This is a pragmatic division: AI excels at pattern recognition and initial classification, but not precise execution.\n4. Explicit vs Automatic Triggers Releases use tag triggers, maintenance uses schedule triggers, governance uses event triggers. Three trigger types correspond to three different automation trust levels: releases need human confirmation, maintenance can be scheduled automatic, governance needs immediate response.\nRisks of Over-Automation opencode\u0026rsquo;s automation system is comprehensive, but there are points to watch:\nCommunity barrier: New contributors submitting issues must follow specific templates, PRs must conform to conventional commits, otherwise auto-closed after 2 hours. For a 160k-star project, this strictness is reasonable—it filters out many low-quality contributions. 
But for small projects, this level of automation would scare away potential contributors.\nMaintenance cost: 27 workflows means 27 automation scripts to maintain. opencode has custom runners and dedicated scripts. If a workflow\u0026rsquo;s logic needs adjustment, maintainers need to switch between GitHub Actions YAML and custom scripts.\nAI uncertainty: duplicate-issues and triage use AI for judgment, but AI can misjudge. A reasonable issue marked as duplicate and closed creates a negative experience for contributors. opencode uses grace periods and manual review to mitigate this, but the risk remains.\nInsights for Our Projects\nNot every project needs 27 workflows. But opencode\u0026rsquo;s layered governance and \u0026ldquo;AI at decision points\u0026rdquo; approach are worth borrowing:\nStart with issue templates: If the project starts receiving lots of duplicate or low-quality issues, add templates and duplicate checking first, rather than manually handling each one.\nUse grace periods for compliance checks: Always give a grace period when auto-closing non-compliant contributions.\nUse AI for classification, not execution: Let AI help triage issues and check PR formats, but don\u0026rsquo;t let AI auto-merge code or publish releases.\nUse tag triggers for releases: This is the safest approach. Automatic snapshot releases are acceptable, but official versions need human confirmation.\nAdd on demand: Add automation only when you have pain points. opencode\u0026rsquo;s 27 workflows weren\u0026rsquo;t built in a day; they were added gradually as the community grew.\nSummary\nopencode\u0026rsquo;s GitHub Actions system demonstrates automation practices for large-scale open source projects: CI/CD covers full platform releases, community governance uses multi-workflow relay processing, and AI is applied to decision points like triage and review.
The core of this system is not technical complexity, but three principles: \u0026ldquo;layered, grace periods, explicit triggers\u0026rdquo;. For our own projects, we don\u0026rsquo;t need to copy all 27 workflows, but these principles can be directly applied.\n","date":"2026-05-22T10:00:00+08:00","image":"https://svtter.cn/p/opencode-%E7%9A%84-github-actions-%E8%87%AA%E5%8A%A8%E5%8C%96%E4%BD%93%E7%B3%BB27-%E4%B8%AA-workflow-%E8%83%8C%E5%90%8E%E7%9A%84%E5%B7%A5%E7%A8%8B%E5%AE%9E%E8%B7%B5/cover_hu_90eb74bd0d5b025c.jpg","permalink":"https://svtter.cn/en/p/opencodes-github-actions-automation-system-engineering-practices-behind-27-workflows/","title":"OpenCode's GitHub Actions Automation System: Engineering Practices Behind 27 Workflows"},{"content":"I previously wrote an article OpenCode Configuration Optimization Record, which addressed token consumption and context management issues. However, configuration optimization handles \u0026ldquo;how the model runs,\u0026rdquo; while \u0026ldquo;the quality of code when it\u0026rsquo;s half-written\u0026rdquo; is something configuration cannot manage. This article starts from my development process of the opencode-review plugin, discussing how opencode-review helps an agent review and improve its own code within a session, resulting in higher quality code entering the PR.\nProblem: Who Guards Code Quality Within a Session? When using OpenCode to write code, a typical workflow is: the agent completes coding within a session, then I review the diff and create a PR. But I discovered a recurring problem: code written by agents often enters PRs with \u0026ldquo;first draft\u0026rdquo; quality issues.\nThese issues include: missing error handling, security vulnerabilities, poorly performing queries, and missing tests. If the agent could perform a self-review within the session—before the code is committed to the PR—many problems wouldn\u0026rsquo;t exist at the PR stage.\nThis is different from code review at the CI stage. 
I\u0026rsquo;ve already implemented CI review through opencode-actions (I previously wrote an introductory article)—it happens after PR creation, triggered by GitHub Actions. Later, Cloudflare also shared similar ideas in their engineering blog: using OpenCode to build large-scale AI code review. opencode-review aims to solve an earlier stage: within the session, before the PR, enabling the agent to proactively review and fix issues after writing code. The two complement each other: opencode-review raises the quality baseline of code entering the PR, while opencode-actions serves as the final checkpoint.\nSpecifically, there are three sub-problems to address:\nIncomplete review coverage: Code generated by agents may introduce security vulnerabilities and performance issues, but they won\u0026rsquo;t proactively check for these Lack of systematic review framework: Without structured dimensions to evaluate code, it\u0026rsquo;s easy to focus only on functional correctness while ignoring security and performance Lack of closed loop between issue discovery and fixes: Even when the agent discovers problems, a mechanism is needed to automatically fix them rather than waiting for someone to point them out Design of opencode-review Based on these three problems, I designed opencode-review: a structured code review plugin.\nMulti-Dimensional Analysis The first design decision is why divide into five dimensions rather than a general \u0026ldquo;good or bad\u0026rdquo; evaluation.\nCode quality is not a single dimension. A piece of code may be functionally correct and performant, but contain SQL injection vulnerabilities; or it may be secure and harmless, but lack test coverage. 
Evaluating them together inevitably leads to vague results.\nAcademically, the Modern Code Review (MCR) Survey collected code review research from 2013-2025, proposing a classification system covering multiple task dimensions including defect detection, security review, performance analysis, and maintainability assessment. Ericsson\u0026rsquo;s research team also verified in Automated Code Review Using Large Language Models at Ericsson that dimension-specific review is more effective in industrial scenarios than general review.\nopencode-review\u0026rsquo;s five dimensions—code-quality, security, performance, testing, documentation—correspond to the core review dimensions identified in these studies. Each dimension can be independently toggled because different projects focus on different priorities: an internal tool may not need documentation review, but a security-sensitive service cannot skip the security dimension.\nSeverity Grading The second design decision is why divide into three severity levels (critical / suggestion / highlight).\nThis comes from lessons learned in the static analysis tool domain. Security tools and linters have long faced a problem: alert fatigue. When all issues are marked as equally important, developers start ignoring them. 
Veracode\u0026rsquo;s research points out that the direct consequence of alert fatigue is that truly serious issues get drowned out in noise.\nThe logic of three levels is:\ncritical: Must fix (security vulnerabilities, logic errors, resource leaks) suggestion: Suggested improvements (code readability, performance optimization, better practices) highlight: Worth noting (style consistency, potential improvement space) This way developers can prioritize handling critical issues without missing a SQL injection among a bunch of \u0026ldquo;consider refactoring\u0026rdquo; suggestions.\nAuto-Fix Chain The third design decision is why critical issues should automatically trigger fixes rather than just being reported.\nThis is a controversial design. Traditional review tools typically \u0026ldquo;report but don\u0026rsquo;t fix,\u0026rdquo; leaving fixes to developers. But opencode-review\u0026rsquo;s scenario is different—the code it reviews is itself just written by an AI agent, so having another agent fix it is reasonable.\nAcademically, this belongs to the Automated Program Repair (APR) domain. A Survey of LLM-based Automated Program Repair (arXiv 2506.23749) reviewed 63 LLM-based APR systems from 2022-2025, divided into four paradigms. Among them, the \u0026ldquo;analysis-augmented\u0026rdquo; paradigm—using static analysis to locate problems first, then using LLMs to generate fixes—was proven most effective. 
opencode-review\u0026rsquo;s auto-fix chain is essentially this paradigm: reviewer discovers critical issue → locates problem position → spawns fixer sub-agent → generates minimal fix.\nAn ICSE 2025 paper also points out that a key challenge for LLMs in APR is objective alignment—the goal of fixing is not \u0026ldquo;generate code that looks reasonable,\u0026rdquo; but \u0026ldquo;precisely solve the reported problem.\u0026rdquo; This is why opencode-review\u0026rsquo;s fixer is designed as minimal fix—making only the minimal modifications to solve the problem, no rewriting, no refactoring, no \u0026ldquo;convenient\u0026rdquo; other changes.\nHidden Benefit of Auto-Review: Continuous Improvement of Code Quality Baseline The three designs above solve \u0026ldquo;discovering problems\u0026rdquo; and \u0026ldquo;fixing problems.\u0026rdquo; But auto-review has an easily overlooked benefit: it continuously raises the baseline of code quality inadvertently.\nThis effect comes from two mechanisms:\nFirst, the shaping of code writers by review feedback. FSE 2022 research found in two years of industrial practice that when developers know their code will be automatically reviewed, they consciously follow standards more during the coding phase—because the cost of being pointed out afterward becomes lower, and the benefit of writing well upfront becomes higher. This is a nudge effect. In the AI agent scenario, this effect is stronger: the agent writes code in a session, gets reviewed and pointed out issues, fixes them, gets reviewed again—this cycle can complete multiple rounds within the same session. Each round of feedback corrects the agent\u0026rsquo;s output tendency, equivalent to an implicit fine-tuning process.\nSecond, direct quality accumulation from automatic fixes. Critical issues being automatically fixed means the code quality of each commit is higher than without review. This isn\u0026rsquo;t a one-time improvement, but continuous. 
Like lint rules in a codebase—at first they only prohibit obvious errors, but as rules accumulate, the overall style and quality of the codebase is unconsciously raised. The auto-fix chain does something similar: security vulnerabilities are automatically patched, resource leaks are automatically fixed, missing tests are automatically added. Over time, the codebase\u0026rsquo;s quality baseline naturally becomes higher than without auto-review.\nSimply put: review is not the goal, quality improvement is. Auto-review turns \u0026ldquo;post-hoc inspection\u0026rdquo; into \u0026ldquo;in-process improvement.\u0026rdquo;\nCooldown Mechanism There\u0026rsquo;s one more design detail: cooldown_seconds.\nauto-review triggers when the session is idle, but idle events can trigger frequently (for example, when the agent is waiting for user confirmation, it also idles). Without cooldown, the same code might be reviewed several times, wasting tokens. The default 120-second cooldown period is an empirical value—enough for one round of modifications to complete, without waiting too long.\nopencode-froggy: Another Approach opencode-froggy (85 Stars, just released 0.12.0 yesterday) provides another approach. It doesn\u0026rsquo;t do structured multi-dimensional review, but instead provides 6 specialized agents (architect, code-reviewer, code-simplifier, doc-writer, partner, rubber-duck) and a flexible hooks system.\nFroggy\u0026rsquo;s code-reviewer is a general read-only review agent that doesn\u0026rsquo;t distinguish dimensions or severity. 
But its hooks system is strong—you can configure session.idle events to automatically run lint, auto-format, or even intercept when writing sensitive files:\n1 2 3 4 5 6 7 8 --- hooks: - event: session.idle conditions: [hasCodeChange, isMainSession] actions: - bash: \u0026#34;npm run lint --fix\u0026#34; - command: simplify-changes --- This is a \u0026ldquo;developer orchestrates the workflow\u0026rdquo; approach, complementing opencode-review\u0026rsquo;s \u0026ldquo;out-of-the-box structured review.\u0026rdquo;\nComparison opencode-review opencode-froggy Review method Structured multi-dimensional analysis General code-reviewer agent Severity grading critical / suggestion / highlight None Auto-fix critical issue → fixer sub-agent code-simplifier, manual trigger Trigger method session idle + cooldown hooks configuration Custom rules custom_rules supports project norms None Other features None 6 agents + hooks + gitingest + blockchain The two don\u0026rsquo;t conflict and can be installed together. 
My suggestion is: opencode-review for daily auto-review, froggy\u0026rsquo;s hooks for workflow orchestration.\nPlugin Installation The two plugins have different installation methods.\nopencode-froggy supports direct installation via npm, just add to opencode.json:\n1 2 3 { \u0026#34;plugin\u0026#34;: [\u0026#34;opencode-froggy\u0026#34;] } opencode-review currently doesn\u0026rsquo;t have npm installation available yet, requires cloning and local linking:\n1 2 3 4 5 6 7 8 9 # Clone to any location git clone https://github.com/sun-praise/opencode-review.git /path/to/opencode-review # Project-level installation (recommended) mkdir -p .opencode/plugins ln -s /path/to/opencode-review/src/index.ts .opencode/plugins/opencode-review.ts # Or global installation ln -s /path/to/opencode-review/src/index.ts ~/.config/opencode/plugins/opencode-review.ts opencode-review also needs to create .opencode/review.json to configure review behavior:\n1 2 3 4 5 6 7 8 9 10 11 12 { \u0026#34;language\u0026#34;: \u0026#34;zh\u0026#34;, \u0026#34;dimensions\u0026#34;: [\u0026#34;code-quality\u0026#34;, \u0026#34;security\u0026#34;, \u0026#34;performance\u0026#34;, \u0026#34;testing\u0026#34;, \u0026#34;documentation\u0026#34;], \u0026#34;trigger\u0026#34;: { \u0026#34;auto_on_idle\u0026#34;: true, \u0026#34;cooldown_seconds\u0026#34;: 120 }, \u0026#34;custom_rules\u0026#34;: [ \u0026#34;All API endpoints must have error handling\u0026#34;, \u0026#34;Database queries must use parameterized statements\u0026#34; ] } Other Notable Plugins The ecosystem already has over 70 plugins, here are a few more recommendations:\nopencode-worktree: Zero-friction git worktree management opencode-notify: Send system notifications when tasks complete dynamic-context-pruning: Automatically prune outdated tool outputs, optimizing token usage envsitter-guard: Prevent agents from reading .env sensitive files See the complete list at awesome-opencode.\nReferences Modern Code Review (MCR) Survey — 2013-2025 code 
review research survey Automated Code Review Using LLMs at Ericsson — Industrial practice of LLM-assisted code review A Survey of LLM-based Automated Program Repair — LLM auto-fix survey, covering 63 systems Aligning the Objective of LLM-Based Program Repair (ICSE 2025) — Objective alignment issues in LLM fixing Understanding Automated Code Review Process (FSE 2022) — Two years of industrial environment auto-review experience AI-Assisted Assessment in Modern Code Review (AIware 2024) — Deployment and evaluation of AutoCommenter Code Review Agent Benchmark (c-CRAB) — AI agent code review benchmark opencode-actions - a coding review agent — GitHub Action built on OpenCode, code review at CI stage Cloudflare: Orchestrating AI Code Review at Scale — Cloudflare using OpenCode to build large-scale AI review ","date":"2026-05-19T10:00:00+08:00","image":"https://svtter.cn/p/opencode-%E9%85%8D%E7%BD%AE%E4%B9%8B%E5%A4%96%E7%9A%84%E4%BC%98%E5%8C%96-%E5%9F%BA%E4%BA%8E%E6%8F%92%E4%BB%B6%E7%9A%84%E4%BC%98%E5%8C%96/cover_hu_aa7d9b9348689ac4.png","permalink":"https://svtter.cn/en/p/opencode-optimization-beyond-configuration-plugin-based-optimization/","title":"OpenCode Optimization Beyond Configuration — Plugin-Based Optimization"},{"content":"Recently, I explored the development trajectory of DeepSeek and compiled a DeepSeek Technical Evolution. After reviewing it, I found it to be a valuable document worth publishing. However, converting it directly to markdown would result in the loss of much information and detail. Therefore, I tried Hugo\u0026rsquo;s {{ .raw }} mode and was pleasantly surprised to find that it works well while still supporting SEO.\nI\u0026rsquo;ll be publishing more articles like this in the future. These articles are co-created with AI, and the reading experience is somewhat better than plain text.\nI typically store HTML files in static-html for unified management. I recommend you give it a try. 
The introductory article is sth: An HTML Preview Server for AI Agents.\n","date":"2026-05-15T15:59:17+08:00","permalink":"https://svtter.cn/en/p/a-new-blog-format/","title":"A New Blog Format"},{"content":"I\u0026rsquo;ve open sourced a small tool: static-html, with the command-line name sth.\nWhat it does is simple: it provides an HTTP service that lets you register locally generated HTML files and preview them in a browser.\nWhy This Tool Is Needed\nThe problem stems from AI Agent output.\nNowadays I use agents like Claude Code and OpenCode for my work, and they often need to output complex content: code review summaries, comparative analyses, quotations, architecture design documents. When this content is sent to Telegram as plain text, the formatting gets completely messed up, tables become unreadable, and code syntax highlighting is lost.\nIn short, it\u0026rsquo;s just a big mess.\nThe initial approach was to have agents directly generate HTML files locally and open them in a browser. But the problems were:\nThe agent runs on a server without a graphical interface\nLocally generated file paths are unpredictable and management is chaotic\nNo history: previously sent content can\u0026rsquo;t be found\nSo I needed a service where an agent could \u0026ldquo;send\u0026rdquo; an HTML file and get back a URL that could be opened in any device\u0026rsquo;s browser. The agent would handle mobile and PC compatibility.\nWhat sth Does\nsth is a lightweight HTTP service written in Go with just two core commands:\n# Start the service\nsth start\n# Send an HTML file\nsth send ./report.html\nsth send packages the target HTML file along with resource files from the same directory (CSS, JS, images, etc.) and uploads them, then returns a URL.
Opening this URL displays the fully rendered page.\nIn practice, it runs on my intranet development machine, and agents specify the remote address via the --server parameter:\nsth send ./report.html --server http://dev-1:3939\nMy Actual Usage\nCurrently sth mainly runs on my development server, working in tandem with the Hermes Agent.\nHermes is my daily AI assistant running on Telegram. When it needs to output complex content (such as code review conclusions, technical solution comparisons, or project quotations), it calls the html-report skill to generate a beautifully formatted HTML file, then sends it to the preview server via sth send, and finally sends me the URL.\nThe entire workflow is:\nUser question -\u0026gt; Hermes Agent analysis -\u0026gt; Generate HTML report (html-report skill) -\u0026gt; sth send to preview server -\u0026gt; Return URL -\u0026gt; Send to Telegram\nThis way I can tap the link on my phone and see a well-formatted report instead of a blob of plain text.\nMetadata Management\nBeyond basic sending and previewing, sth also supports tagging, categorizing, and associating sessions with projects:\nsth tag \u0026lt;session-id\u0026gt; code-review pricing\nsth categorize \u0026lt;session-id\u0026gt; \u0026#34;Technical Review\u0026#34;\nsth project \u0026lt;session-id\u0026gt; hydrogen-permeation\nsth list --project hydrogen-permeation\nsth search \u0026#34;quotation\u0026#34; --tag pricing\nThis feature solves a practical problem: over time, sent reports accumulate. Through tags and project categorization, you can quickly find previous outputs.\nThe difference between list and search is: list matches metadata fields exactly, while search performs full-text search.
They can be used in combination.\nTechnical Details Language: Go 1.24+ Storage: SQLite (github.com/mattn/go-sqlite3, requires CGO) Deployment: Single binary file, just manage with systemd Build: go build -o dist/sth ./cmd/html-server It\u0026rsquo;s just that simple, no unnecessary dependencies.\nOpen Source This tool was previously a private repo, but I just made it public today: sun-praise/static-html.\nIf you\u0026rsquo;re also using AI Agents for daily development work and have encountered the problem where \u0026ldquo;complex agent output can\u0026rsquo;t be read in chat tools,\u0026rdquo; give sth a try. It\u0026rsquo;s lightweight enough and does what it needs to do.\n","date":"2026-05-09T12:00:00+08:00","image":"https://svtter.cn/p/sth%E4%B8%80%E4%B8%AA%E7%BB%99-ai-agent-%E7%94%A8%E7%9A%84-html-%E9%A2%84%E8%A7%88%E6%9C%8D%E5%8A%A1%E5%99%A8/cover_hu_c1aeaf4891bf735f.jpg","permalink":"https://svtter.cn/en/p/sth-an-html-preview-server-for-ai-agents/","title":"sth: An HTML Preview Server for AI Agents"},{"content":"Problem Description When using DeepSeek models (such as deepseek-v4-flash) directly in Claude Code with extended thinking enabled, multi-turn conversations trigger a 400 error:\n1 Bad Request: {\u0026#34;error\u0026#34;:{\u0026#34;message\u0026#34;:\u0026#34;The content[].thinking in the thinking mode must be passed back to the API.\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;invalid_request_error\u0026#34;,\u0026#34;param\u0026#34;:null,\u0026#34;code\u0026#34;:\u0026#34;invalid_request_error\u0026#34;}} Root Cause Analysis Call Chain 1 Claude Code → DeepSeek Anthropic Compatible Endpoint (https://api.deepseek.com/anthropic) Protocol Incompatibility According to the DeepSeek Anthropic API Compatibility Documentation, the compatibility status is as follows:\nMessage Field Support Status content[].thinking ✅ Supported content[].redacted_thinking ❌ Not Supported In extended thinking mode during multi-turn conversations, Claude Code faithfully passes 
back all thinking blocks from the previous round (including redacted_thinking types) to the API as-is. DeepSeek does not recognize redacted_thinking, hence the 400 error.\nAdditionally, DeepSeek\u0026rsquo;s thinking block format differs from Anthropic\u0026rsquo;s native protocol, and the replay logic in tool_use scenarios is not fully compatible either.\nCore Conflict\nAnthropic API requirement: In extended thinking mode, content[].thinking and content[].redacted_thinking must be passed back unchanged\nDeepSeek compatibility layer: Only supports thinking, does not support redacted_thinking\nClaude Code behavior: Hard-coded according to Anthropic protocol, does not distinguish between target endpoint types\nCommunity Feedback\nThis is a widespread community issue that almost all CC agent/router projects have encountered:\nIssue | Project | Title\n#1 | cc-use | DeepSeek Thinking Mode Error: content[].thinking Must Be Passed Back\n#878 | openclaude | DeepSeek V4: reasoning_content must be passed back (400) on tool_calls\n#1355 | claude-code-router | CCR proxy returns 400 when DeepSeek V4 is thinking\n#4543 | new-api | Claude Code connected to DeepSeek V4 hits a 400 reasoning_content error\n#355 | 9router | DeepSeek API Error 400 – Missing reasoning_content\n#16748 | hermes-agent | DeepSeek /anthropic: stripped thinking blocks cause HTTP 400 on replay\n#2414 | cc-switch | Claude configured for deepseek-v4-pro via cc-switch: field not recognized\n#174 | cc-haha | /compact command does not work when using the DeepSeek API\nDeepSeek Official Response\nZero response. Nor is there any need to respond.\nFirst, DeepSeek has no public API issue repository. All feedback occurs in third-party projects, and no DeepSeek staff have joined any of those discussions. Second, I think DeepSeek has reason to hesitate over adopting Anthropic\u0026rsquo;s protocol as its compatibility standard.
Temporary Workarounds\nDisable extended thinking: When using DeepSeek in CC, turn off thinking mode\nUse proxy filtering: Add a proxy layer between CC and DeepSeek to filter out redacted_thinking blocks\nSwitch models: Use DeepSeek for non-thinking scenarios and Anthropic native models for thinking scenarios\nWhy Doesn\u0026rsquo;t OpenCode Have This Problem?\nOpenCode (opencode-ai/opencode) naturally avoids this problem architecturally, not through a dedicated \u0026ldquo;fix\u0026rdquo;.\nThe key lies in the convertMessages method in internal/llm/provider/anthropic.go (lines 60-119):\nWhen building assistant messages, it only passes back TextContent (text) and ToolCall (tool calls)\nIt completely ignores ReasoningContent (thinking content) and does not put it in messages\nThinking content is only displayed in the UI through stream thinking_delta events and is not passed back to the API\nComparison with Claude Code\u0026rsquo;s behavior:\nAspect | Claude Code | OpenCode\nThinking replay | ✅ Faithfully replays all thinking blocks (including redacted_thinking) | ❌ Does not replay thinking blocks\nArchitectural reason | Follows the Anthropic API specification, which requires unchanged replay | Self-managed conversation state, thinking only for UI display\nDeepSeek compatibility | ❌ Triggers 400 (redacted_thinking not recognized) | ✅ Not affected (doesn\u0026rsquo;t pass thinking at all)\nConclusion: OpenCode avoids the problem at the cost of not following Anthropic\u0026rsquo;s extended thinking specification. This approach is friendly to third-party compatible endpoints like DeepSeek, but if Anthropic\u0026rsquo;s native thinking-context retention is needed in the future, re-implementation may be necessary.\nDoes Not Replaying Thinking Blocks Affect DeepSeek Performance?\nBasically no, for the following reasons:\nThinking blocks are the model\u0026rsquo;s internal scratchpad, not final output.
The text replies and tool calls in the conversation history already retain key decisions and conclusions DeepSeek\u0026rsquo;s reasoning is closer to OpenAI\u0026rsquo;s mode — each round is generated independently, unlike Anthropic\u0026rsquo;s strong reliance on cross-round replay to maintain reasoning coherence OpenCode\u0026rsquo;s extensive actual use also confirms this — community users run multi-turn conversations using DeepSeek thinking mode in OpenCode without feedback about reasoning quality degradation The truly potentially affected extreme scenario: in ultra-long multi-turn tasks, the model may repeat conclusions it has already reasoned through. However, in most actual use, the impact is negligible.\nRelated Claude Code Native Issues CC itself has similar thinking block replay bugs on Anthropic models (not DeepSeek-specific):\nIssue Title Status #10199 API Error 400 - Thinking Block Modification Error Open (oncall) #51985 thinking block missing in multi-turn conversations Open #20692 thinking blocks order error on first tool use Open (oncall) #54482 Thinking blocks stripped from context every turn (Opus 4.7) Open ","date":"2026-04-30T15:00:00+08:00","image":"https://svtter.cn/p/deepseek--claude-code-thinking-block-%E5%85%BC%E5%AE%B9%E6%80%A7%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90/cover_hu_9c89421729b9674b.png","permalink":"https://svtter.cn/en/p/deepseek--claude-code-thinking-block-compatibility-analysis/","title":"DeepSeek + Claude Code: Thinking Block Compatibility Analysis"},{"content":"When using deepseek-reasoner, we often encounter this problem:\n1 The reasoning_content\u0026#39; in the thinking mode must be passed back to the API. Update Both issues have now been officially resolved by opencode. Users only need to install the latest version of opencode and use it through the deepseek provider, without additional configuration.\n1 2 3 4 5 6 Issue 1 The reasoning_content\u0026#39; in the thinking mode must be passed back to the API. 
Issue 2 Bad Request: {\u0026#34;error\u0026#34;:{\u0026#34;message\u0026#34;:\u0026#34;The content[].thinking in the thinking mode must be passed back to the API.\u0026#34;,\u0026#34;type\u0026#34;:\u0026#34;invalid_request_error\u0026#34;,\u0026#34;param\u0026#34;:null,\u0026#34;code\u0026#34;:\u0026#34;invalid_request_error\u0026#34;}} Both issues have been officially resolved. Install version 1.14.29 or above.\nThe old solution follows:\nHow to solve it? It\u0026rsquo;s straightforward.\nHow to Configure Add provider information to your configuration:\n.config/opencode/opencode.json or .config/opencode/opencode.jsonc\nModify the provider section to:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 { \u0026#34;provider\u0026#34;: { \u0026#34;deepseek\u0026#34;: { \u0026#34;npm\u0026#34;: \u0026#34;@ai-sdk/anthropic\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;DeepSeek\u0026#34;, \u0026#34;options\u0026#34;: { \u0026#34;baseURL\u0026#34;: \u0026#34;https://api.deepseek.com/anthropic\u0026#34;, \u0026#34;apiKey\u0026#34;: \u0026#34;\u0026lt;apikey\u0026gt;\u0026#34; }, \u0026#34;models\u0026#34;: { \u0026#34;deepseek-v4-pro\u0026#34;: { \u0026#34;name\u0026#34;: \u0026#34;DeepSeek-V4-Pro\u0026#34;, \u0026#34;limit\u0026#34;: { \u0026#34;context\u0026#34;: 1048576, \u0026#34;output\u0026#34;: 262144 }, \u0026#34;options\u0026#34;: { \u0026#34;thinking\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;enabled\u0026#34;, \u0026#34;budgetTokens\u0026#34;: 8192 } } }, \u0026#34;deepseek-v4-flash\u0026#34;: { \u0026#34;name\u0026#34;: \u0026#34;DeepSeek-V4-Flash\u0026#34;, \u0026#34;limit\u0026#34;: { \u0026#34;context\u0026#34;: 1048576, \u0026#34;output\u0026#34;: 262144 }, \u0026#34;options\u0026#34;: { \u0026#34;thinking\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;enabled\u0026#34;, \u0026#34;budgetTokens\u0026#34;: 8192 } } } } } } } How to Use Select the deepseek model.\nThe result.\nSupplement 
This method cannot solve this problem\nBad Request: {\u0026quot;error\u0026quot;:{\u0026quot;message\u0026quot;:\u0026quot;The content[].thinking in the thinking mode must be passed back to the API.\u0026quot;,\u0026quot;type\u0026quot;:\u0026quot;invalid_request_error\u0026quot;,\u0026quot;param\u0026quot;:null,\u0026quot;code\u0026quot;:\u0026quot;invalid_request_error\u0026quot;}}\nIf you encounter this problem, you need to wait for opencode to fix it.\nRelated article: DeepSeek + Claude Code: Thinking Block Compatibility Issue Analysis — Analyzes the root cause of 400 errors triggered by multi-turn conversations in extended thinking mode when using DeepSeek with Claude Code, along with community solutions.\n","date":"2026-04-24T12:23:58+08:00","image":"https://svtter.cn/p/%E5%A6%82%E4%BD%95%E8%A7%A3%E5%86%B3-opencode-%E4%B8%AD-deepseek-%E6%A8%A1%E5%9E%8B%E7%9A%84-reasoning-%E9%97%AE%E9%A2%98/cover_hu_61ec0320f76903c9.png","permalink":"https://svtter.cn/en/p/how-to-fix-deepseek-model-reasoning-issues-in-opencode/","title":"How to Fix DeepSeek Model Reasoning Issues in OpenCode"},{"content":"To make it easier to integrate opencode for code review, I built a GitHub Action repository. Working with opencode to implement this was straightforward.\nCurrently, it provides two main features: one is review, and the other is using the runner to execute opencode (directly running opencode\u0026rsquo;s prompts on the runner) to handle other functionalities. For example, modifying code, creating new issues, creating PRs based on issues, etc.\nHow stable is it? This repository has been validated across multiple projects, and the release version is reliable. However, note that the main branch version is a rapidly iterating version. How to integrate? Add the following to your .github/workflows/opencode-review.yml:\n1 2 3 4 5 6 7 8 - name: Run OpenCode review uses: sun-praise/opencode-actions/review@v1 with: github-token: ${{ secrets.GITHUB_TOKEN }} # only one is enough. 
zhipu-api-key: ${{ secrets.ZHIPU_API_KEY }} opencode-go-api-key: ${{ secrets.OPENCODE_GO_API_KEY }} Currently, this action mainly supports Z.AI, ZHIPU, and OPENCODE GO subscriptions. Therefore, if using ZHIPU, simply add your ZHIPU_API_KEY to the project\u0026rsquo;s secrets. If using the opencode go subscription, you need to add OPENCODE_GO_API_KEY.\nEverything else can use the default configuration. The default model is zhipuai-coding-plan/glm-5-turbo. For more configuration options, I recommend checking the original repository\u0026rsquo;s README.\nI previously covered this quick review script in my code review article.\nDifferences Actually, opencode has its own actions, so why did I build another one?\nIt differs from the official version in several ways:\nFeature Upstream Status This Repository Model default fallback Only required input Three-level fallback (input → MODEL_NAME → hardcoded default) Provider convenience fields None zhipu-api-key, opencode-go-api-key, etc. Review prompt template None Chinese-formatted review (mergeable/conditionally mergeable/not mergeable) Retry logic None attempts / retry-profile / retry-on-regex / retry-delay Execution timeout None timeout-seconds Version check None OPENCODE_MIN_VERSION Installation retry None install-attempts XDG caching Only caches bin Caches both bin + XDG cache Detailed explanations are available in sun-praise/opencode-actions#29.\nReview effectiveness You can see the results from the repo\u0026rsquo;s own PR at opencode-actions#30\nThe effect looks like this:\nFuture Try integrating Gemini CLI. Google\u0026rsquo;s Gemini 3.1 Pro model currently offers great value for money, with the highest intelligence per unit of cost. Integrate MCP plugin functionality. If MCP is available during opencode review, it may bring better review results. 
Integration of commercial plugin features ","date":"2026-04-23T11:36:34+08:00","image":"https://svtter.cn/p/opencode-actions-%E4%B8%80%E4%B8%AA-coding-review-agent/cover_hu_f40616f3683073a4.jpg","permalink":"https://svtter.cn/en/p/opencode-actions-a-code-review-agent/","title":"opencode-actions - A Code Review Agent"},{"content":"I\u0026rsquo;ve gone back to writing articles \u0026ldquo;myself\u0026rdquo; again. The reason I say \u0026ldquo;myself\u0026rdquo; is:\nActually, my recent articles were all written through conversations with DeepSeek, where I had DeepSeek generate the output.\nAfter generating the articles, I\u0026rsquo;d have Codex polish them. (But Codex\u0026rsquo;s polishing was absolute crap.)\nIn between, I also tried having GPT-5.4 generate output—that is, communicate with me + write the first draft.\nThe Problem The reason I stopped using GPT-5.4, despite looking like an incredibly powerful large model, is that its output was truly garbage. It had an overwhelmingly heavy AI flavor, read with a strong translation tone that was genuinely uncomfortable. Beyond the translation tone, another major issue was that it couldn\u0026rsquo;t express what I meant. In my view, Chinese is a language with rich semantics and nuance, so this kind of expression easily deviates from my own thoughts and intentions. I believe Chinese emphasizes subtle expression, not blunt straightforwardness. GPT-5.4 had a lot of blunt straightforwardness. It was very uncomfortable. I think readers would feel uncomfortable reading it too.\nBut fundamentally, the main problem is the AI flavor. AI-generated articles universally have this AI flavor problem, and GPT-5.4 is the most obvious.\nRecently, it\u0026rsquo;s probably because Codex has a 2x discount, so everyone wants to try it. Plus, Simple Codex\u0026rsquo;s Terminal Benchmark certification score has given people a lot more confidence.\nThe issue of not sounding human isn\u0026rsquo;t just my perspective. 
This is everyone\u0026rsquo;s complaint.\nFirst, when can we get GPT to talk like a human instead of babbling a pile of circular filler? Hard to bear.\n\u0026mdash; 竹筒Tom (@0xAzathoth_) April 5, 2026 In recent articles, when I explained in conversation \u0026ldquo;don\u0026rsquo;t be aggressive toward vendors,\u0026rdquo; it would write \u0026ldquo;this article isn\u0026rsquo;t targeting anyone.\u0026rdquo; A typical example is the later articles discussing LLM pricing.\nIf it knew about the Chinese meme \u0026ldquo;I\u0026rsquo;m not targeting anyone, I\u0026rsquo;m saying everyone here is xx\u0026rdquo; (from a Stephen Chow movie), I don\u0026rsquo;t think it would express itself that way.\nSo I\u0026rsquo;ve decided to write articles myself, and I\u0026rsquo;ll take responsibility for the results.\nFurther Analysis - Let Me Talk About Other Things GPT-5.4 has another obvious problem: I tell it not to do something, but it still does it. Or it outputs content saying it will do something, then doesn\u0026rsquo;t do it in the next step. If this appeared deep in a long multi-round conversation, I think it would be acceptable. But when it has just said it would do something in the previous sentence and then fails to do it in the very next step, that performance isn\u0026rsquo;t good enough.\nASI cares about not just \u0026ldquo;safety,\u0026rdquo; but actually \u0026ldquo;alignment.\u0026rdquo; Sam doesn\u0026rsquo;t understand this. Actually \u0026ldquo;not listening when told\u0026rdquo; is a failure of \u0026ldquo;alignment.\u0026rdquo; I don\u0026rsquo;t like Sam. This problem is actually a management problem. The safety team doesn\u0026rsquo;t get the promised 20% compute. So naturally alignment can\u0026rsquo;t be achieved.\nI\u0026rsquo;ll add some supporting materials later. Or open a new blog to discuss this.\nRegarding collaboration with OpenCode, rather than being more open, it\u0026rsquo;s actually targeted opposition. We users benefit from this. 
The harder vendors fight, the more users benefit.\nWhen Opus quotas were reduced, Codex immediately switched to token-based billing.\nA Few Words About Doubao Also, Doubao is a typical passive-aggressive master. Whether in group chat or voice, it\u0026rsquo;s the same. I don\u0026rsquo;t know where the training data went wrong.\nAlso, I didn\u0026rsquo;t expect the group chat assistant to get into arguments with people in the group 🤣\nSupplement Happened to see Old Feng from Cloud Numbers also discussing this problem. Yes, I Use AI to Write Articles.\nHis articles don\u0026rsquo;t look as heavily AI-flavored. Maybe Opus is more suitable for writing.\nAdditionally, if you include your own writing style in the prompt, it might further reduce the AI feeling.\n","date":"2026-04-06T21:49:34+08:00","image":"https://svtter.cn/p/%E6%88%91%E8%BF%98%E6%98%AF%E8%87%AA%E5%B7%B1%E5%86%99%E6%96%87%E7%AB%A0%E4%BB%A5%E5%8F%8A%E5%AF%B9-gpt-5.4-%E7%9A%84%E4%B8%80%E4%BA%9B%E6%83%B3%E6%B3%95/cover_hu_ba2f421465e05f6d.jpg","permalink":"https://svtter.cn/en/p/im-writing-articles-myself-again-and-some-thoughts-on-gpt-5.4/","title":"I'm Writing Articles Myself Again, and Some Thoughts on GPT-5.4"},{"content":"Many people start thinking seriously about self-hosting an LLM not because of technical romance, but because API bills, rate limits, or compliance requirements have started to collide with real business constraints.\nSo a very natural question shows up: if the model runs on your own machine, does that mean you can finally use it without limits?\nMy answer is: no. Self-hosting a model does not mean unlimited freedom. 
It mostly means that many of the constraints and costs previously absorbed by the platform are now transferred to you.\nBut there is a more useful second question: once usage gets large enough, can self-hosting actually become cheaper?\nThe answer is: possibly, but under stricter conditions than many people expect.\nIn short: self-hosting an LLM does not mean unlimited freedom.\nIt means taking on part of the cost and responsibility that a platform would normally absorb. Self-hosting becomes financially attractive only when load stays high, utilization remains strong, and you can either accept model trade-offs or optimize the stack yourself.\nLocal deployment does not mean no limits Let us clear up the most common misunderstanding first.\nMany people interpret \u0026ldquo;the model runs on my own machine\u0026rdquo; as \u0026ldquo;I can now use it however I want.\u0026rdquo; In reality, the limits do not disappear. They simply show up in a different form.\nThe first limit is hardware.\nParameter count, VRAM capacity, quantization level, KV cache, and concurrency are real physical constraints. Even a quantized 70B model still puts serious pressure on memory and bandwidth. Being able to run it does not mean it runs comfortably. Getting output does not mean latency and throughput are acceptable.\nThe second limit is model capability itself.\nHallucinations, knowledge cutoffs, long-context degradation, and unstable reasoning do not vanish just because the model sits on your own server. Deployment location does not change the model\u0026rsquo;s ceiling. More importantly, most so-called self-hosting setups use open-weight models, not the actual closed models behind systems like Claude or GPT.\nThe third limit is responsibility transfer.\nWhen you use an API, content safety, service stability, rate limiting, and much of the infrastructure burden are partially handled by the provider. Once you self-host, those problems do not go away. 
They become your monitoring, your operations, your review pipeline, and your incident response.\nSo self-hosting is not \u0026ldquo;use without limits.\u0026rdquo; It is \u0026ldquo;you own the boundaries.\u0026rdquo;\nThe real calculation is not just the price of a GPU If you want to know whether self-hosting is worth it, the real comparison is not \u0026ldquo;how much does the card cost?\u0026rdquo; but these two larger accounts.\nThe annual cost of self-hosting can be written roughly like this:\nAnnual self-hosting cost = hardware depreciation + electricity + network / hosting + operations labor + redundancy for failures\nThe annual API cost is more direct:\nAnnual API cost = average daily token usage (in millions) * price per million tokens * 365\nThat looks simple, but three details are often ignored.\nSelf-hosting is not a one-time hardware purchase. Electricity, spare parts, hosting conditions, alerting, upgrades, and maintenance all keep happening. API pricing is not a single fixed number. Model choice, input-output ratio, cache hit rate, and tool usage can all change the final bill significantly. Utilization is easy to underestimate. If your machine sits idle most of the time, a low per-inference cost means very little. On the other hand, if the workload is stable and the hardware stays busy, the financial case for self-hosting becomes much stronger. 
So the numbers below should be read as rough order-of-magnitude guidance, not as a procurement quote.\nA rough but useful breakeven table To keep the discussion simple, let us start with a deliberately rough set of assumptions:\nAPI pricing is estimated at roughly CNY 50 per million tokens token usage counts both input and output together local hardware is depreciated over 3 years self-hosting cost includes baseline power and operations overhead the local setup mainly assumes open-weight model inference, not strict parity with top closed models this does not include training, fine-tuning, or a dedicated platform team Under those assumptions, you get a rough picture like this:\nScenario Daily token usage Likely local setup Annual self-hosting cost Annual API cost Rough conclusion Light usage 500K Single high-end consumer workstation CNY 20K - 40K about CNY 9K API is cheaper Medium usage 5M Dual-GPU or small inference workstation CNY 60K - 120K about CNY 91K Near breakeven Heavy usage 50M Multi-GPU server or cluster CNY 400K - 800K about CNY 912K Self-hosting may be cheaper If you want local quality to get as close as possible to top-tier closed models, this table usually moves upward again, because stronger models, more VRAM, and higher availability targets all push infrastructure and operations costs higher.\nThis table points to three things.\nIndividuals and small teams usually do not save money with self-hosting. If your workload is only a few hundred thousand tokens per day, APIs are still usually the more economical option. You spend less on hardware and avoid carrying the operations burden. The real breakeven point tends to appear only in consistently high-usage scenarios. Not one occasional spike, but a workload that stays high day after day. Only then can hardware cost be spread efficiently enough. The larger the usage, the more attractive self-hosting becomes financially. That is why large companies invest seriously in inference platforms. 
It is not because they enjoy complexity. It is because once the scale is large enough, the math really changes. One critical condition: you may not be comparing the same thing The biggest problem in many \u0026ldquo;self-hosting is cheaper than API\u0026rdquo; discussions is not the arithmetic. It is that the compared products are often not equivalent.\nOn the API side, you may be buying access to a top-tier closed model. On the local side, you may be running a quantized open-weight model. Both are called \u0026ldquo;LLMs,\u0026rdquo; but they are not the same product in a strict sense.\nThat means:\nif open-weight quality is acceptable for your use case, self-hosting may indeed save a lot of money if your quality bar is high and you depend on the best closed models, the room for self-hosting becomes much smaller if you compare a cheaper model to a more expensive model, the result is not just a deployment conclusion, but also a model-selection conclusion Put differently, many people think they are calculating deployment cost when they are actually accepting a capability downgrade first.\nThere is nothing wrong with that trade-off, but it should be stated clearly.\nWhat self-hosting gives you besides cost savings If a company still chooses to self-host after doing the math, it is usually not only about saving API money.\nData control. Some businesses simply do not want raw data flowing through third-party providers for long-term operational or compliance reasons. Local deployment makes the compliance and audit path easier to manage. Customization. You can optimize around your own tasks with quantization, routing, distillation, fine-tuning, and tighter integration into internal systems. Standard APIs usually give you less freedom here. A more predictable cost ceiling. API pricing scales directly with usage. When the business grows, the bill grows with it. 
Self-hosting has a large upfront investment, but under high and stable load, the cost curve is often easier to predict. Offline operation and availability. If your environment requires internal-only deployment, or if you cannot accept key workflows depending entirely on external services, local deployment may simply fit the engineering requirements better. A more practical decision framework If you do not want to model every variable from day one, start with these three questions.\nIs your workload consistently high over time? If you only see occasional spikes rather than sustained token usage every day, APIs are often still the better choice because you are not paying for idle hardware. Can you accept the gap between a local model and a closed flagship model? If your business depends on best-in-class model quality, a large part of the claimed savings may come from lowering model quality rather than from deployment efficiency alone. Do you actually have the ability to operate an inference service long term? What happens when a GPU fails, drivers conflict, service latency spikes, the model version needs to change, or rate limiting and monitoring need to be built? If nobody owns these questions, the issue is no longer just cost. It becomes a delivery problem. Conclusion Back to the original question: does self-hosting an LLM really let you use it without limits?\nMy answer is still: no.\nIt does not remove hardware bottlenecks, erase model capability gaps, or magically solve moderation, reliability, and operations work for you. What it gives you is not absolute freedom, but more control and the responsibility that comes with it.\nAt the same time, self-hosting is absolutely not a fake option. 
It becomes increasingly reasonable when several conditions are true at once:\nyour token usage stays high for a long time the workload is stable and hardware utilization remains high open-weight models are acceptable, or you already have the ability to optimize them well data control, internal deployment, or predictable cost ceilings matter to you If you are an individual, a small team, or just an occasional heavy user, APIs are still usually the more practical answer: less effort, less operational burden, and lower cost of experimentation.\nIf you are already in the phase where you burn tokens steadily every day, then it is worth calculating the full picture instead of staring only at API unit prices. Very often the answer is not \u0026ldquo;now I can use it without limits,\u0026rdquo; but a more grounded question that matters more: is this worth owning yourself?\n","date":"2026-03-19T12:30:00+08:00","image":"https://svtter.cn/p/%E8%87%AA%E5%B7%B1%E9%83%A8%E7%BD%B2%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%9C%9F%E7%9A%84%E5%B0%B1%E8%83%BD%E8%82%86%E6%97%A0%E5%BF%8C%E6%83%AE%E5%9C%B0%E7%94%A8%E5%90%97/cover_hu_69638efe5f61bf9.jpg","permalink":"https://svtter.cn/en/p/does-self-hosting-an-llm-really-let-you-use-it-without-limits/","title":"Does Self-Hosting an LLM Really Let You Use It Without Limits?"},{"content":"Preface Recently, several domestic big model manufacturers have launched Coding Plan subscription packages for developers, promoting \u0026ldquo;low prices for massive usage,\u0026rdquo; claiming that for just tens to hundreds of RMB per month, you can get \u0026ldquo;hundreds of billions of tokens\u0026rdquo; of usage quota.\nIt sounds wonderful, but as a developer accustomed to speaking with data, I decided to do some calculations: Under concurrency limits, can these promised usage amounts really be consumed?\nTypical Package Structure Taking the common three-tier packages on the market as an example:\nPackage Monthly Fee Promised Usage (every 5 hours) Lite ~20 RMB 
About 120 prompts Pro ~100 RMB About 600 prompts Max ~200 RMB About 2,400 prompts The vendors also add: \u0026ldquo;Each prompt is expected to call the model 15-20 times, with a total monthly usage of up to tens to hundreds of billions of tokens.\u0026rdquo;\nIt seems like incredible value, but the devil is in the details.\nKey Limitation: Concurrency Most manufacturers\u0026rsquo; documentation will casually mention: \u0026ldquo;Package usage is subject to concurrency limits (number of in-flight request tasks).\u0026rdquo;\nBut what exactly is the limit? It is often not explicitly stated. According to community feedback and actual measurements, typical concurrency limits are as follows:\nPackage Concurrency (in-flight requests) Lite 2 Pro ~4-5 Max ~7 This number directly determines your actual throughput ceiling.\nMath Time: Can the Max Package Use 2,400 Prompts? Let\u0026rsquo;s take the highest-tier Max package as an example and do a simple calculation.\nKnown Conditions Promised Usage: 2,400 prompts every 5 hours Concurrency Limit: 7 Model calls triggered per prompt: 15-20 times (official data) Model generation speed: About 50-60 tokens/second 5 hours = 18,000 seconds Calculation Process Step 1: Estimate single API call time\nA complete API call includes:\nInput processing: ~1 second Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds Network round-trip delay: ~1 second Total: About 10-12 seconds/call\nStep 2: Calculate maximum calls in 5 hours (using the optimistic 10-second figure)\nMaximum calls = Concurrency × (Total time ÷ Single call time) = 7 × (18,000 ÷ 10) = 12,600 calls\nStep 3: Convert to prompts\nAccording to official claims, each prompt triggers 15-20 calls, so take 17.5 as the midpoint:\nCompletable prompts = 12,600 ÷ 17.5 ≈ 720 prompts\nConclusion Metric Official Promise Concurrency Limit Achievement Rate Prompts per 5 hours 2,400 ~720 30% Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.\nHarsher Reality: Call Inflation in Agent 
Mode The above calculation is still based on the official claim of \u0026ldquo;15-20 calls per prompt.\u0026rdquo; But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.\nHow Agent Mode Works When you give an AI programming assistant a task, it typically:\nAnalyzes requirements, creates a plan Reads relevant files (each file may trigger a call) Writes code Runs tests Discovers errors, fixes them Repeats steps 3-5 until successful A seemingly simple prompt may trigger 50-100+ model calls in an Agent loop.\nActual Measurement Case User feedback:\n\u0026ldquo;2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit\u0026rdquo;\nReverse calculation:\nEach prompt consumes about 19M tokens If every call filled a 128K context, that is equivalent to 19M ÷ 128K ≈ 148 model calls/prompt This is roughly 7-10 times higher than the official \u0026ldquo;15-20 times.\u0026rdquo;\nRevised Actual Usable Amount Scenario Calls per prompt Usable prompts in 5 hours Achievement Rate Official ideal 17.5 720 30% Light usage 50 252 10.5% Moderate usage 75 168 7% Heavy Agent usage 100+ \u0026lt;126 \u0026lt;5% Why Is This Happening? 1. Token Calculation Includes Context Big model token consumption isn\u0026rsquo;t just output; it includes input. In Coding scenarios:\nEach call must send complete conversation history Code project context can easily reach tens of K tokens 128K context window means each call may consume 100K+ tokens 2. Concurrency is a Hard Constraint Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. This is a physical bottleneck, not something commercial strategies can bypass.\n3. 
Promises Based on Ideal Assumptions Manufacturers\u0026rsquo; promotional numbers are often based on:\nEach call uses only small context Each prompt triggers only a few calls Users won\u0026rsquo;t use continuously at high intensity But these assumptions rarely hold true in real AI Coding scenarios.\nA Table to See the Truth Taking the Max package (~200 RMB/month) as an example:\nMetric Official Promotion Theoretical Limit Actual Expectation Prompts per 5 hours 2,400 720 150-400 Monthly prompts 345,600 103,680 21,600-57,600 Monthly tokens \u0026ldquo;Hundreds of billions\u0026rdquo; ~10 billion 1-3 billion Achievement Rate 100% 30% 5-17% Advice for Developers 1. Don\u0026rsquo;t Be Fooled by \u0026ldquo;Hundreds of Billions of Tokens\u0026rdquo; Token count is a highly misleading metric. In Coding Agent scenarios, context takes up the majority, with truly effective output tokens possibly only 1-5%.\n2. Focus on Concurrency This is the core metric that determines actual experience. If manufacturers don\u0026rsquo;t disclose concurrency limits, it\u0026rsquo;s likely because the numbers don\u0026rsquo;t look good.\n3. Calculate Cost per Prompt 1 Actual cost per prompt = Monthly fee ÷ Actual usable prompts Taking the Max package as an example:\nOfficial promotion: 200 ÷ 345,600 = 0.0006 RMB/prompt Actual situation: 200 ÷ 30,000 = 0.007 RMB/prompt A 10x difference.\n4. Consider Pay-as-You-Go If your usage isn\u0026rsquo;t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won\u0026rsquo;t pay for \u0026ldquo;unusable quotas.\u0026rdquo;\nConclusion The emergence of big model Coding Plan packages is itself a good thing, lowering the barrier for developers to use AI programming assistants. 
But when choosing packages, be sure to:\nRequire manufacturers to disclose concurrency limits Calculate throughput limits yourself Don\u0026rsquo;t be misled by the big numbers of \u0026ldquo;hundreds of billions of tokens\u0026rdquo; After all, promised usage that can\u0026rsquo;t be consumed equals a disguised price increase.\nThis article is based on public information and mathematical derivation; specific values may vary due to manufacturer adjustments. Readers are advised to verify through actual measurements.\n","date":"2026-01-23T11:52:52+08:00","image":"https://svtter.cn/p/%E5%A4%A7%E6%A8%A1%E5%9E%8B-coding-plan-%E5%A5%97%E9%A4%90%E7%9A%84%E6%95%B0%E5%AD%A6%E9%99%B7%E9%98%B1%E5%B9%B6%E5%8F%91%E9%99%90%E5%88%B6%E4%B8%8B%E7%9A%84%E6%89%BF%E8%AF%BA%E9%87%8F%E8%83%BD%E5%90%A6%E5%85%91%E7%8E%B0/cover_hu_9cc6128247e715e7.png","permalink":"https://svtter.cn/en/p/the-mathematical-trap-of-big-model-coding-plan-packages-can-promised-usage-be-delivered-under-concurrency-limits/","title":"The Mathematical Trap of Big Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?"},{"content":"Many friends want to know: What is the internal server situation and infrastructure of our small software development company?\nInternal development clusters essentially solve the following issues:\nGit code management Data security and backup Multiple virtual machines providing development environments To address the above problems, we adopt the following solution.\nHardware Configuration We need to run approximately 10 virtual machine servers and 4 development machines simultaneously. If not counting electricity costs, I\u0026rsquo;ve kept the server hardware cost at around 5000 RMB, which has been running stably for 2 years.\nMain equipment:\nSecond-hand Dell mini hosts Thunderobot MIX hosts Why Not Use Entry-Level or Mid-Range Servers? The biggest reason is unnecessary. 
Power consumption comparison:\n\nConfiguration Type Power Consumption Equivalent Single-socket configuration ≈ 4-6 Mini hosts Dual-socket configuration ≈ 10-15 Mini hosts While the mini-host setup saves money, it can still hurt development efficiency when server memory requirements are high.\nStorage and Networking Switch: Mercury entry-level 2.5G switch NAS Server: UGREEN DH4300 Plus Usage Virtual machine servers mainly run:\nCI Runner Engineer development environments Finally Recently, however, we have needed to develop some services on Kubernetes, and the current configuration has become somewhat inadequate.\n","date":"2026-01-18T09:06:31+08:00","image":"https://svtter.cn/p/%E5%B0%8F%E5%9E%8B%E5%85%AC%E5%8F%B8%E7%9A%84%E7%A7%81%E6%9C%89%E4%BA%91/bg_hu_ab1bcdb7adebd4eb.png","permalink":"https://svtter.cn/en/p/private-cloud-for-small-companies/","title":"Private Cloud for Small Companies"},{"content":"Recently, I\u0026rsquo;ve been extensively using the opencode/claude code combination for development and have explored three particularly useful tools.\nThey address several issues:\nParallel development on a single server; controlling tmux: tmux and tmux-mcp Preventing claude code from stopping at meaningless points: ralph-loop End-to-end automated testing: playwright mcp Tool List tmux mcp First, configure tmux in the Linux environment with opencode, then have opencode install https://github.com/rinadelph/tmux-mcp.git. Once installed, you can use opencode (oc) to control tmux content.\nThis method can be used to reactivate stopped opencode sessions. For example, you can open multiple tmux sessions and have one opencode monitor, start, and stop tasks through the tmux tool.\nralph-loop Ralph is an autonomous AI agent loop that repeatedly runs Amp until all PRD items are completed. 
Each iteration creates a brand new Amp instance with a clean context.\nRalph likely originated from here: https://github.com/snarktank/ralph\nDue to its effectiveness (which actually occurred after further improvements in model performance), it was also introduced to claude code.\nRalph-loop is a Claude Code plugin that allows Claude Code to automatically restart when tasks are completed, forming a loop execution mechanism. This is particularly useful for tasks that require continuous improvement or iteration.\nInstallation Method Install through Claude Code official plugin market:\n/plugin install ralph-wiggum@claude-plugins-official or cc '/plugin install ralph-wiggum@claude-plugins-official' Configuration and Usage:\nAfter installation, you can start it in Claude Code via the /ralph-loop command Set tasks and termination conditions, Claude Code will automatically restart each time it stops This is particularly useful for scenarios requiring multiple iterations of code improvement, debugging, or testing Use Cases Code Refactoring: Have Claude Code continuously improve code quality Test-Driven Development: Write tests, then have Claude Code continuously improve implementations Debugging Loops: Automatically restart debugging sessions Continuous Integration: Simulate CI/CD processes locally The drawback of this plugin is that it consumes a lot of tokens; without a max20 subscription, it\u0026rsquo;s better not to use it. However, for tasks requiring high-quality output, this tool can significantly improve work efficiency.\nplaywright mcp This plugin can launch browsers to complete end-to-end testing or write end-to-end test code. It can better form loops to have cc or oc improve code.\nInstallation method: claude 'help me install playwright mcp'\nRewriting as Agents I recommend directly rewriting these tools and MCPs as agents through opencode or claude code.\nCompared to skills commands, these tools are more suitable for invocation through agents. 
Agent context is very clean, making tool invocation almost inevitable.\nSummary As LLMs become increasingly powerful, numerous MCPs that rely on LLM capabilities naturally gain improvements. Tools that weren\u0026rsquo;t very useful before become more effective. This aligns with the saying: \u0026ldquo;Don\u0026rsquo;t build things that become meaningless after large model capability enhancements.\u0026rdquo; Large model capabilities continue to improve, and prices keep decreasing.\nI believe the next step is to bridge interactions between different modalities and tools, as well as endowing tools with large model capabilities, which is one of the inevitable development directions for agent engineers.\n","date":"2026-01-17T22:18:33+08:00","image":"https://svtter.cn/p/%E6%9C%80%E8%BF%91%E5%8F%91%E7%8E%B0%E5%A5%BD%E7%94%A8%E7%9A%84-mcp-%E5%B7%A5%E5%85%B7/bg_hu_75fc4ea3dca783a6.png","permalink":"https://svtter.cn/en/p/recently-discovered-useful-mcp-tools/","title":"Recently Discovered Useful MCP Tools"},{"content":"Claude Code\u0026rsquo;s $100/month price tag is a bit steep for many. To address this, I\u0026rsquo;ve been experimenting with a more practical and affordable workflow.\nIn terms of models, my recommendation is to use Gemini 3 Flash on an as-needed (pay-as-you-go) basis as a replacement.\nWhy? Gemini 3 Flash offers incredible value. It\u0026rsquo;s fast, efficient, and costs a fraction of what you\u0026rsquo;d pay for Opus or Sonnet. For the vast majority of tasks, Flash is more than enough.\nThe Cost-Saving Workflow Here is my current \u0026ldquo;budget\u0026rdquo; workflow:\nPlanning \u0026amp; Proposals: Use Gemini 3 Flash. Execution \u0026amp; Building: Use the free GLM 4.7 (or MiniMax M2.1) via OpenCode. If you have a Zhipu Coding Plan, that works perfectly too. Speaking of Gemini 3, we have to talk about GPT-5.2.\nMany engineers still rely on ChatGPT.com directly instead of using a proper coding agent. 
Regardless of the efficiency debate, the reliability is concerning. From my experience, GPT-5.2\u0026rsquo;s default tone has been tuned to be overly \u0026ldquo;people-pleasing,\u0026rdquo; which might not be ideal for professional developers seeking direct technical feedback.\nFurthermore, while GPT-5.2 scored impressively on SWE-bench Verified, my real-world experience has been mixed. It\u0026rsquo;s worth looking at the history of SWE-bench:\nOriginally proposed by a team from Princeton University (ICLR 2024), it evaluates a model\u0026rsquo;s ability to solve real GitHub issues. However, in August 2024, OpenAI\u0026rsquo;s Preparedness team collaborated with the original authors to create SWE-bench Verified (a subset of 500 manually verified issues). Since OpenAI was involved in the design of this benchmark, their models\u0026rsquo; performance on it should be taken with a grain of salt. While not necessarily a deliberate manipulation, the risk of inherent bias is significant.\nUltimately, as I often say, \u0026ldquo;Codex\u0026rdquo; models don\u0026rsquo;t always deliver the most practical results in everyday coding.\nOpenCode Tips Leveraging Agents: OpenCode supports launching SubAgents. When debugging complex projects, you can have OpenCode launch agents in different directories to handle front-end and back-end tasks separately, which also helps avoid permission issues.\nOpenSpec: Cross-Agent Collaboration:\n1. OpenCode + Gemini 3 Flash → Generate proposal 2. Codex → Code Review 3. Claude Code → Secondary Review 4. OpenSpec Apply → Final Execution\nOpenSpec generates reliable specs, but sometimes cheaper models produce lower-quality code. In such cases, you can generate multiple times using the spec and select the best result.\nFinal Thoughts As AI Agent engineers, we need to adapt to these ongoing trends:\nModels are becoming smarter. Execution is becoming faster. Prices are dropping. 
While these trends are promising, we still need to balance speed, cost, and quality for every task. We might soon see agent systems that automate this balancing act, but for now, it\u0026rsquo;s a crucial part of the engineer\u0026rsquo;s role.\n","date":"2026-01-05T16:00:00+08:00","image":"https://svtter.cn/p/%E9%AB%98%E6%95%88%E7%9C%81%E9%92%B1%E6%88%91%E7%9A%84-ai-agent-%E5%B7%A5%E4%BD%9C%E6%B5%81%E9%80%89%E6%8B%A9/featured-image_hu_405209157f86e461.jpg","permalink":"https://svtter.cn/en/p/efficient-and-cost-effective-my-ai-agent-workflow-choice/","title":"Efficient and Cost-Effective: My AI Agent Workflow Choice"},{"content":"This report compares the coding performance and cost-effectiveness of several models, with the goal of picking the most suitable one for coding tasks.\nFor Chinese language tasks, using GLM 4.7 is clearly more cost-effective. The price of 2000 RMB basically covers a year of usage. The downside is that during peak hours, even the enterprise MAX version can be very slow.\nFrom my practical experience, the capabilities of MiniMax M2.1 far exceed those of GLM 4.7.\n","date":"2026-01-03T00:00:00Z","image":"https://svtter.cn/p/%E7%BC%96%E7%A0%81%E6%80%A7%E8%83%BD%E4%B8%8E%E6%A8%A1%E5%9E%8B%E6%80%A7%E4%BB%B7%E6%AF%94%E5%88%86%E6%9E%90/pics/bg-new-v2_hu_ceb9056d8d051df1.jpg","permalink":"https://svtter.cn/en/p/coding-performance-and-model-cost-effectiveness-analysis/","title":"Coding Performance and Model Cost-Effectiveness Analysis"},{"content":"My project uses uv to manage Python dependencies, but Claude Code habitually defaults to python or pip install. 
I tried using Skills and Hooks to enforce this standard and encountered quite a few pitfalls.\nGoal Create a Skill: Inform Claude that the project uses uv Create a Hook: Intercept python/pip commands Verify effectiveness Troubleshooting Journey First Attempt: Wrong Skill File Structure (Commit 8a05759) 1 2 ❌ .claude/skills/python-uv.md ✅ .claude/skills/python-uv/SKILL.md The frontmatter also needed changes:\n1 2 3 4 5 6 7 8 9 --- # Wrong description: Python dependency and execution management using uv location: project # Correct name: python-uv description: Python dependency and execution management using uv. Use when adding Python packages, running Python scripts, or managing Python dependencies. Enforces uv instead of pip/python commands. --- Key points:\nFilename must be SKILL.md, placed in the corresponding directory Frontmatter requires a name field description should be detailed to help Claude identify when to trigger Second Attempt: Hook Only Warns Without Blocking (Commit d250c3b) Initially wrote the Hook in Bash, which only displayed warnings but didn\u0026rsquo;t prevent execution. Also tried configuring environment.PATH, which didn\u0026rsquo;t work.\nThird Attempt: Wrong Hook Exit Code (Commit d3790a4) Tried using exit 1 to block commands, but it still didn\u0026rsquo;t work.\nCorrect exit codes:\nexit 0: Allow execution exit 1: Hook fails, but doesn\u0026rsquo;t block the tool exit 2: Actually blocks tool execution ✅ Fourth Attempt: Fixed Skill Format (Commit 2595b68) Found the file structure was wrong, changed to the correct skills/xxx/SKILL.md format.\nFifth Attempt: Rewrote Hook in Python (Commit dcc726d) Bash JSON parsing was too fragile, ultimately rewrote in Python:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 #!/usr/bin/env python3 \u0026#34;\u0026#34;\u0026#34; Hook to block python/python3 commands and enforce uv usage. 
\u0026#34;\u0026#34;\u0026#34; import json import sys import re try: # Correctly parse JSON input input_data = json.load(sys.stdin) except json.JSONDecodeError as e: print(f\u0026#34;Error: Invalid JSON input: {e}\u0026#34;, file=sys.stderr) sys.exit(1) # Get tool name and command tool_name = input_data.get(\u0026#34;tool_name\u0026#34;, \u0026#34;\u0026#34;) tool_input = input_data.get(\u0026#34;tool_input\u0026#34;, {}) command = tool_input.get(\u0026#34;command\u0026#34;, \u0026#34;\u0026#34;) # Only process Bash tool if tool_name != \u0026#34;Bash\u0026#34; or not command: sys.exit(0) # Check if using python/python3 if re.search(r\u0026#39;\\bpython3?\\b\u0026#39;, command): # Whitelist: allow version checks etc. if re.search(r\u0026#39;(--version|--help|which python)\u0026#39;, command): sys.exit(0) # Build the suggestion shown in the error (illustrative: prefix the first interpreter call with uv run) suggested = re.sub(r\u0026#39;\\bpython3?\\b\u0026#39;, \u0026#39;uv run python\u0026#39;, command, count=1) # Block command error_msg = ( f\u0026#34;\\n❌ BLOCKED: This project requires using \u0026#39;uv\u0026#39;\\n\\n\u0026#34; f\u0026#34;Original command:\\n {command}\\n\\n\u0026#34; f\u0026#34;Suggested replacement:\\n {suggested}\\n\u0026#34; ) print(error_msg, file=sys.stderr) sys.exit(2) # Use exit 2 to block tool invocation # Allow other commands sys.exit(0) Configuration file (.claude/settings.json):\n{ \u0026#34;hooks\u0026#34;: { \u0026#34;PreToolUse\u0026#34;: [ { \u0026#34;matcher\u0026#34;: \u0026#34;Bash\u0026#34;, // Simplified matcher format \u0026#34;hooks\u0026#34;: [ { \u0026#34;type\u0026#34;: \u0026#34;command\u0026#34;, \u0026#34;command\u0026#34;: \u0026#34;$CLAUDE_PROJECT_DIR/.claude/hooks/pre-bash\u0026#34; } ] } ] } // Remove ineffective environment.PATH configuration } Notes:\nmatcher can directly specify the tool name, no need for expressions Use $CLAUDE_PROJECT_DIR to reference the project path environment.PATH configuration doesn\u0026rsquo;t work in Hooks, don\u0026rsquo;t waste time on it Final File Structure .claude/ ├── skills/ │ └── python-uv/ │ └── SKILL.md ├── hooks/ │ └── pre-bash 
# Python script └── settings.json Testing ✅ Block regular commands:\n1 2 $ python test.py ❌ BLOCKED: Use \u0026#39;uv run\u0026#39; instead ✅ Allow version checks:\n1 2 $ python --version Python 3.11.0 ✅ Skill active: When asked \u0026ldquo;how to run Python scripts,\u0026rdquo; Claude will proactively suggest using uv run.\nKey Points Claude Code Sometimes Doesn\u0026rsquo;t Proactively Query Specifications\nI explicitly requested creating Hooks and Skills, but Claude Code started writing code without first checking the official documentation. This led to:\nFile structure errors multiple times Wrong exit codes Incorrect JSON parsing approach If it had used WebFetch to read the official Hook and Skill documentation before starting, all these pitfalls could have been avoided.\nThis isn\u0026rsquo;t about users needing to read documentation, but rather that AI agents should check specifications before executing unfamiliar tasks.\nSkills and Hooks Can Indeed Enforce Claude Code\u0026rsquo;s Behavior\nHooks can constrain Python commands to provide the correct command suggestions. This is also the approach used in Kilo Code.\nReferences uv Claude Code Hooks Documentation Claude Code Skills Documentation ","date":"2025-12-30T10:00:00+08:00","image":"https://svtter.cn/p/%E7%BB%99-claude-code-%E9%85%8D%E7%BD%AE-python-uv-%E7%9A%84-hook-%E5%92%8C-skill/pics/bg.svg","permalink":"https://svtter.cn/en/p/configuring-claude-code-python-uv-hooks-and-skills/","title":"Configuring Claude Code Python UV Hooks and Skills"},{"content":"Recently, I verified an effective AI development method that doesn\u0026rsquo;t affect the existing workflow. 
Here\u0026rsquo;s a summary.\nWhy Choose VM + Claude Code Isolation: Avoid polluting the main system, can snapshot and rollback Reproducibility: Team members can quickly replicate the same environment Suitable for automated testing: Browser automation tools like Playwright require a desktop environment Safety: Not too worried about agent generating rm -rf / commands. VM crashes don\u0026rsquo;t affect the virtualization platform; just recreate it. Environment Setup 1. PVE Creates Ubuntu Desktop VM Download Ubuntu Desktop ISO, upload to PVE\u0026rsquo;s ISO storage Create VM: CPU: host type, 2-4 cores RAM: 4-8 GB Disk: VirtIO SCSI, 40GB+ Network: VirtIO Mount ISO and start installation After installation, install qemu-guest-agent: 1 2 sudo apt install qemu-guest-agent sudo systemctl enable --now qemu-guest-agent 2. Configure Xfce + xrdp Remote Desktop Install xrdp and Xfce (lighter and more compatible than GNOME):\n1 2 3 4 5 6 sudo apt install xrdp sudo systemctl enable xrdp sudo systemctl start xrdp sudo apt install xfce4 xfce4-goodies echo xfce4-session \u0026gt; ~/.xsession During installation, when prompted to choose a display manager, select lightdm.\nSolve Black Screen Issue Edit xrdp startup script:\n1 sudo nano /etc/xrdp/startwm.sh Before the last two lines test -x and exec, add:\n1 2 3 unset DBUS_SESSION_BUS_ADDRESS unset XDG_RUNTIME_DIR startxfce4 Restart xrdp:\n1 sudo systemctl restart xrdp Note: Don\u0026rsquo;t log in to the desktop locally before connecting, otherwise the same user will see a black screen.\nAdjust Resolution/DPI Before Windows remote desktop connection, lower the resolution in \u0026ldquo;Display Options\u0026rdquo; (e.g., 1920×1080) Or in Xfce: Settings Manager → Appearance → Fonts → Increase DPI (e.g., 120) Disable Crash Prompts After switching desktops, there may be GNOME component crash prompts (doesn\u0026rsquo;t affect usage):\n1 sudo systemctl disable apport 3. 
Install Claude Code 1 2 3 4 5 6 7 8 9 10 # Install Node.js (e.g., using nvm) curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash source ~/.bashrc nvm install --lts # Install Claude Code npm install -g @anthropic-ai/claude-code # Or use official installation script curl -fsSL https://claude.ai/install.sh | bash Run claude command for the first time and follow prompts to log in and authenticate.\nAutomated Testing Workflow: MCP Configuration Playwright MCP MCP Features @playwright/mcp (Microsoft official) Lightweight, based on accessibility tree @executeautomation/playwright-mcp-server (community) More complete features, supports screenshots, JS execution @agentdeskai/browser-tools-mcp Console log monitoring, Lighthouse performance analysis Claude Code Configuration Create .mcp.json in project root:\n1 2 3 4 5 6 7 8 9 10 11 12 { \u0026#34;mcpServers\u0026#34;: { \u0026#34;playwright\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;@executeautomation/playwright-mcp-server\u0026#34;] }, \u0026#34;browser-tools\u0026#34;: { \u0026#34;command\u0026#34;: \u0026#34;npx\u0026#34;, \u0026#34;args\u0026#34;: [\u0026#34;-y\u0026#34;, \u0026#34;@agentdeskai/browser-tools-mcp\u0026#34;] } } } Or add via CLI:\n1 claude mcp add playwright --scope project -- npx -y @executeautomation/playwright-mcp-server OpenCode Configuration If using OpenCode, the configuration format is different (opencode.json):\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 { \u0026#34;$schema\u0026#34;: \u0026#34;https://opencode.ai/config.json\u0026#34;, \u0026#34;mcp\u0026#34;: { \u0026#34;playwright\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;local\u0026#34;, \u0026#34;command\u0026#34;: [\u0026#34;npx\u0026#34;, \u0026#34;-y\u0026#34;, \u0026#34;@executeautomation/playwright-mcp-server\u0026#34;], \u0026#34;enabled\u0026#34;: true }, \u0026#34;browser-tools\u0026#34;: { \u0026#34;type\u0026#34;: 
\u0026#34;local\u0026#34;, \u0026#34;command\u0026#34;: [\u0026#34;npx\u0026#34;, \u0026#34;-y\u0026#34;, \u0026#34;@agentdeskai/browser-tools-mcp\u0026#34;], \u0026#34;enabled\u0026#34;: true } } } Configuration Comparison:\nConfiguration Item Claude Code OpenCode Top-level key mcpServers mcp type Not needed Required (local) command String Array Usage Example After configuration, you can drive tests with natural language:\n1 2 3 4 5 claude \u0026#34;Open localhost:3000, test the login flow, verify if it redirects to homepage\u0026#34; claude \u0026#34;Screenshot and compare homepage layout under mobile/tablet/desktop sizes\u0026#34; claude \u0026#34;Check if page console has any errors\u0026#34; Results Display Summary The VM + Claude Code + Playwright MCP combination provides an isolated, reproducible automated development testing environment. The entire process:\nPVE creates Ubuntu Desktop VM Configure Xfce + xrdp remote access Install Claude Code / OpenCode Configure Playwright MCP Drive automated testing with natural language ","date":"2025-12-25T00:00:00+08:00","image":"https://svtter.cn/p/%E5%9C%A8%E8%99%9A%E6%8B%9F%E6%9C%BA%E4%B8%8A%E8%BF%90%E8%A1%8C-claude-code-%E5%AE%9E%E7%8E%B0%E8%87%AA%E5%8A%A8%E5%8C%96%E6%B5%8B%E8%AF%95%E4%B8%8E%E5%BC%80%E5%8F%91/cover_hd_upscaled_hu_94ae091260bc9a94.png","permalink":"https://svtter.cn/en/p/running-claude-code-in-a-vm-for-automated-testing-and-development/","title":"Running Claude Code in a VM for Automated Testing and Development"},{"content":"When managing a Hugo blog, you often need to switch back and forth between the terminal and editor. 
To simplify this process, I developed hugo-admin, a lightweight web management interface based on Flask.\nWhy It\u0026rsquo;s Needed The typical workflow for writing Hugo blogs is:\nExecute hugo new post/xxx.md in terminal Open the file with an editor to write content Start hugo server in terminal to preview Switch to browser to check the effect If not satisfied, return to editor to modify This workflow is fine, but it would be more convenient if all operations could be completed in one place.\nMain Features hugo-admin provides the following features:\nDashboard: Blog statistics overview Article Management: Browse, search, filter articles Markdown Editor: Online editing with auto-save support Hugo Server Control: Start/stop server, view logs in real-time Image Management: Upload and manage article images Interface Display Tech Stack Backend uses Flask + Flask-SocketIO, frontend uses Tailwind CSS + Alpine.js. Real-time log pushing is implemented based on WebSocket.\n1 2 3 4 5 6 7 8 hugo-admin/ ├── app.py # Flask application ├── services/ # Business logic layer │ ├── hugo_service.py # Hugo server management │ ├── post_service.py # Article operations │ └── cache_service.py # Cache layer ├── templates/ # Jinja2 templates └── static/ # Static resources Installation and Usage 1 2 3 4 5 6 7 8 9 10 11 12 13 # Clone repository git clone https://github.com/Svtter/hugo-admin.git cd hugo-admin # Install dependencies pip install -r requirements.txt # Configure Hugo directory cp config.py config_local.py # Edit config_local.py to set HUGO_ROOT # Start python app.py After starting, visit http://127.0.0.1:5000.\nCore Implementation The Python version uses SQLite for caching to avoid scanning the file system every time:\n1 post_service = PostService(app.config[\u0026#39;CONTENT_DIR\u0026#39;], use_cache=True) Hugo server control manages processes based on psutil, supporting real-time log pushing:\n1 hugo_manager = HugoServerManager(app.config[\u0026#39;HUGO_ROOT\u0026#39;], 
socketio) Advanced Version Besides the open-source Python version, I also developed a Go language implementation of the advanced version. Compared to the open-source version, the Go version has the following advantages:\nHigher performance: Compiled Go language executes more efficiently Lower resource usage: Less memory and CPU usage Single file deployment: Compiled into a single executable file, no need for dependency environment More features: Includes more advanced features Direct Hugo API usage: No need for SQLite3 cache, directly calls Hugo API to get article information, more lightweight and efficient The advanced version is priced at $10 USD. Click here to purchase to get complete source code. If you have higher requirements for performance and deployment convenience, consider the advanced version.\nFuture Plans Git operations interface Batch operation support Docker deployment The project is open source, welcome to Star and PR.\n","date":"2025-12-23T16:00:00+08:00","image":"https://svtter.cn/p/hugo-admin-%E8%BD%BB%E9%87%8F%E7%BA%A7-hugo-%E5%8D%9A%E5%AE%A2%E7%AE%A1%E7%90%86%E7%95%8C%E9%9D%A2/pics/featured_hu_825ae970c1a17dd7.png","permalink":"https://svtter.cn/en/p/hugo-admin-a-lightweight-hugo-blog-management-interface/","title":"Hugo Admin - A Lightweight Hugo Blog Management Interface"},{"content":"Recently, I used Claude Code to add some SEO features to my own blog theme Fried Rice, and the overall experience was quite good.\nBackground Fried Rice is a theme forked from hugo-theme-stack. 
Previously, I had already added some basic JSON-LD structured data, and this time I wanted to continue improving it.\nWhat Was Done This Time Mainly enhancing SEO structured data:\nWebSite schema (supports search action) Organization schema (includes founder, contact point, address) FAQ schema (supports inline FAQ in articles) Enhanced Article/BlogPosting schema (added accessibility metadata) Claude Code\u0026rsquo;s Performance The entire development process took about 2 hours. Claude Code helped me:\nWrite code - Hugo template syntax is cumbersome, letting AI write it saves a lot of effort Review code - After I committed, I asked it to check, and it found several issues: datePublished was defined 3 times founder object was defined repeatedly JSON output had double escaping issues Variable scope errors Fix issues - After finding issues, I asked it to fix them directly, all fixed at once Create PR, tag, write CHANGELOG - These trivial tasks can also be done A pleasant surprise was that it could find logical issues in the code. For example, Hugo\u0026rsquo;s jsonify output was HTML-escaped causing JSON format errors, and it found the correct solution (using safeJS).\nShortcomings Sometimes needs multiple reminders to use the correct tools Not very familiar with Hugo template syntax in some places, needs several iterations Summary For this kind of \u0026ldquo;add feature + fix bug\u0026rdquo; task, Claude Code is quite useful. 
Especially for tedious syntax like Hugo templates, having AI write it is much more efficient.\nRelated Projects Based on:\n","date":"2025-12-23T15:00:00+08:00","image":"https://svtter.cn/p/%E7%94%A8-claude-code-%E5%BC%80%E5%8F%91-fried-rice-%E4%B8%BB%E9%A2%98/pics/bg_hu_c772ce32f450697b.png","permalink":"https://svtter.cn/en/p/developing-fried-rice-theme-with-claude-code/","title":"Developing Fried Rice Theme with Claude Code"},{"content":"CS146S is a good course. One reason is that it teaches modern software engineers how to collaborate better with AI. Another is that it covers basically all of my modern coding skills. (That\u0026rsquo;s a joke!)\nIn the following content, I will embed the slides from the course as hyperlinks in my text. If you\u0026rsquo;re interested, you can click the hyperlinks directly to open the corresponding slides.\nBasic Techniques I think everyone, like me, has already mastered the basics. Clearer, more explicit prompts let LLMs follow instructions unambiguously. There are also prompt optimization techniques, including using Claude to optimize prompts.\nThe course also talked about how to build coding agents, emphasizing that you can use the Claude Code SDK. It\u0026rsquo;s now called Claude Agent SDK.\nTo enhance LLM capabilities, you can also use MCP services. I built git-mcp, and there\u0026rsquo;s also an experimental startup MCP that I haven\u0026rsquo;t open-sourced.\nMCP a bit deeper (content from the PPT) With MCP, it\u0026rsquo;s worth noting the Host/Server/Client concept. Many Hosts are not open-source. Deepchat\u0026rsquo;s Host is a useful reference.\nLimitations:\nAgents don\u0026#39;t handle many tools very well today APIs eat up your **context** window quickly Design APIs to be AI-native rather than rigid IDE Agent From the IDE perspective, I\u0026rsquo;ve switched from frequently using Cursor to using Claude Code + VSCode for programming. I feel Claude Code as a CLI is more powerful. 
However, I haven\u0026rsquo;t used Cursor for a while, so I don\u0026rsquo;t know whether it has improved. Trae\u0026rsquo;s solo mode is just so-so; insufficient model intelligence is its biggest problem. Trae CN.\nAlso worth mentioning: the slides by Silas Alberti, Head of Research at Cognition, are very strong.\nThis summary diagram is awesome. Is it really free to watch?\nThis article also mentions the concept of parallel agents.\nSo for me, the next direction to improve is cloud + async.\nThis is Silas Alberti\u0026rsquo;s advice:\nDevin and Claude Code Cloud are exactly the same. In fact, you can use the Claude Code Cloud version entirely for vibe coding.\nAgent Manager Engineers need to become agent managers, not just software engineers.\nUnder the Claude Code designer mindset, the software design process should be:\nProvide high level requirements 🟩 Convert requirements into a design doc 🟩/🟦 Implement solution from doc 🟦 Add tests 🟦 Ensure CI (continuous integration) passes 🟦 Code review 🟦 Update docs 🟦 My habit is to write simple requirements, generate a design from them, and then let Claude Code implement the rest itself.\nI recently found it\u0026rsquo;s not that capable. I adopted a test-driven development approach to ensure every step is done correctly. Otherwise, the \u0026ldquo;Add tests\u0026rdquo; and CI steps are meaningless.\nTechniques for directing agents:\nAgent behavior files (Claude.md/Cursorrules/agents.md) Hooks Commands Subagents I\u0026rsquo;ve already used subagents and commands a lot. 
But I haven\u0026rsquo;t found a killer use case for hooks yet.\nBest practice Claude Code My main advice is to use subagents as much as possible to avoid the \u0026ldquo;lost in the middle\u0026rdquo; phenomenon.\nClaude Code CLI Why did I buy Claude Code?\nWe can do more things through the SDK:\nclaude -p \\ \u0026#34;what did i do this week?\u0026#34; \\ --allowedTools \u0026#34;Bash(git log:*)\u0026#34; \\ --output-format stream-json Conclusion This course is free, but the insights inside surpass most paid courses. If you can understand and absorb it quickly, don\u0026rsquo;t be stingy with your time: take it.\n","date":"2025-12-15T20:45:35+08:00","image":"https://svtter.cn/p/cs146s-%E6%98%AF%E4%B8%80%E9%97%A8%E5%A5%BD%E8%AF%BE%E7%A8%8B/pics/bg_hu_d049252254f1ed21.png","permalink":"https://svtter.cn/en/p/cs146s-is-a-good-course/","title":"CS146S is a Good Course"},{"content":"Recently, I had an incident with Dify/Langchain and reached this conclusion.\nRetrospective About 7 months ago, I deployed the open-source Dify to the server and started an instance through the official docker compose. 
Recently, however, an attacker exploited a sandbox escape vulnerability in Dify\u0026rsquo;s code node (CVE-2025-3466) to escalate privileges via a webshell and implant a Monero mining program.\nFortunately, the intruder didn\u0026rsquo;t do much after escalating privileges, and the intrusion was confined to the Docker container, so the damage was limited.\nCVE-2025-3466 Details CVE ID: CVE-2025-3466 Release Date: July 7, 2025 CVSS Score: 9.8 (Critical) Affected Versions: langgenius/dify 1.1.0 - 1.1.2 Fixed Version: 1.1.3\nVulnerability Description: Dify\u0026rsquo;s code node has a sandbox escape vulnerability, allowing attackers to bypass sandbox security restrictions by overwriting global JavaScript functions (such as parseInt), thereby executing arbitrary code with full root privileges.\nAttack Flow:\nAttacker crafts malicious payload in the code node\u0026rsquo;s input Malicious code overwrites global JavaScript functions before sandbox restrictions are enforced Uses the overwritten functions to bypass security checks Executes arbitrary commands, gaining complete control of the container Implants webshell backdoor and Monero mining program Impact Scope:\nUnauthorized access to secret keys and API keys Access to internal network servers Lateral movement within the dify.ai system Complete takeover of server control Related Links:\nNVD Details GitHub Advisory GHSA-x53g-q9xm-rf4m From this perspective, several key factors are indispensable for protecting server security.\nPersonal Server Security From a security perspective, there are several things that must be done on personal servers. The first is to avoid passwords wherever possible, SSH passwords in particular.\nSSH Passwords Password login must be disabled. SSH password cracking is relatively easy. 
If the password is simple, or if the user changes the password themselves and uses a simple password, the server will be breached.\nIf using Debian/Linux, disabling password login and disabling root login are mandatory:\nThe fewer software packages used, the narrower the attacker\u0026rsquo;s attack surface. Once only nginx is exposed on your server, and port 80 and port 22 (SSH) are not open, the attacker\u0026rsquo;s attack surface is limited to nginx-related content.\nUse Rootless Docker Using container technology is equivalent to further virtualizing on top of the cloud service provider\u0026rsquo;s infrastructure.\nUsing rootless docker can further limit container permissions. Even if the container is breached, the attacker cannot directly gain root privileges on the host. This is the last line of defense.\nLimit Container Network Access Most services don\u0026rsquo;t need unrestricted external network access permissions. Reasonably configuring container network policies to limit unnecessary network access can greatly reduce the attack surface.\nFor example, many services only need to access databases or internal services, and don\u0026rsquo;t need to access the external network at all. If the container doesn\u0026rsquo;t have external network access permissions, even if breached, the attacker cannot download mining programs or communicate with C2 servers.\nHow to Use Open Source Software with Caution This incident made me reflect on the following points when using emerging open source software:\nChoose Mature Projects Look at the project\u0026rsquo;s star count, commit frequency, and issue handling status. If a project:\nHas few stars (less than a few hundred) Hasn\u0026rsquo;t been updated in recent months Has a large number of unresolved issues Then the risk of using this project is high.\nAudit Dependencies Open source software often depends on a large number of third-party libraries. 
Like Dify in this incident, there was a serious code node sandbox escape vulnerability. Before deployment, it\u0026rsquo;s best to:\nLook at the project\u0026rsquo;s dependency tree Check for known vulnerabilities Regularly update dependencies Regular Updates and Security Scanning Regularly check CVE databases Use tools like snyk, trivy for dependency vulnerability scanning Update to fixed versions in a timely manner Limit Permissions Even if you trust a certain open source software, you should give it minimal permissions:\nDon\u0026rsquo;t give containers privileged permissions Limit container resource usage (CPU, memory) Use read-only file systems (if possible) Don\u0026rsquo;t mount the host\u0026rsquo;s sensitive directories into the container Monitoring and Alerting Security is a continuous process, can\u0026rsquo;t rely solely on prevention. Establishing comprehensive monitoring and alerting mechanisms is crucial:\nMonitor system resource usage (CPU, memory, disk IO anomalies may indicate mining programs) Monitor network traffic (abnormal outbound connections) Monitor process list (abnormal processes) Set up log alerts (e.g., failed login attempts) Conclusion Open source software provides us with great convenience, but also brings security risks. Although this incident didn\u0026rsquo;t cause much loss, it gave me an important lesson:\nDon\u0026rsquo;t blindly trust any software, especially emerging open source projects. 
Do more investigation before use, give minimal permissions during use, and continuously monitor and update after use.\nServer security is not a one-time fix; it requires continuous attention and maintenance.\n","date":"2025-12-13T11:02:40+08:00","image":"https://svtter.cn/p/%E8%B0%A8%E6%85%8E%E4%BD%BF%E7%94%A8%E6%96%B0%E5%85%B4%E5%BC%80%E6%BA%90%E8%BD%AF%E4%BB%B6/pics/background_hu_2c6e24b85ca96fd9.png","permalink":"https://svtter.cn/en/p/use-emerging-open-source-software-with-caution/","title":"Use Emerging Open Source Software with Caution"},{"content":" Which is the most expensive model on Silicon Flow? I mean siliconflow.cn Help me take a look Over the past year, I have attempted to use Deepchat and large model APIs (such as Kimi K2 Thinking Turbo) to build a relatively private chat tool (or agent assistant) for handling some private data. However, the overall experience has not been great. The large models often provide incorrect answers.\nFor search capabilities, I used the Bocha API, resetting 10 credits to provide search functionality for the large model.\nTest Questions I feel there are still some issues with context handling (within a single chat window). I briefly tested this question: Which is the most expensive model on Silicon Flow?\nThe answer is:\nKimi K2 Thinking Turbo First, Deepchat:\nHmm, incorrect.\nThen, the official Kimi app:\nAlso incorrect.\nTrying DeepSeek First, let\u0026rsquo;s try the client.\nIncorrect.\nThen, the official DeepSeek app.\nVery close, and the answer seems reasonable. Unfortunately, it\u0026rsquo;s still incorrect.\nIf we ask ChatGPT directly Hmm, a bit off. Let\u0026rsquo;s try GPT-5.\nPrompt:\nSpeculation: Reasons for Poor Performance Insufficient search capability. The Bocha API is to blame. Different models may have different optimal hyperparameters for best performance. I called the large model API from Silicon Flow. Conclusion For this specific problem, ChatGPT still performs better. 
Compared to before, the official search + model combination also seems to perform better. Therefore, unless the data is particularly sensitive, it\u0026rsquo;s better to use the official service. This article is for reference only, just for fun. ","date":"2025-11-19T17:03:18.914891+08:00","image":"https://svtter.cn/p/%E7%AC%AC%E4%B8%89%E6%96%B9%E5%AE%A2%E6%88%B7%E7%AB%AF%E4%B8%8E%E5%A4%A7%E6%A8%A1%E5%9E%8B-api-%E7%BB%93%E5%90%88--%E6%80%A7%E8%83%BD%E5%B0%8F%E6%B5%8B/pics/bg_hu_bb76a70def489bb0.jpg","permalink":"https://svtter.cn/en/p/third-party-client-performance/","title":"Third-party Client Performance"},{"content":" Another article on how to mitigate losses with glm4.6. Our old friend glm 4.6. The new friend doubao-seed-code has also arrived.\ngithub spec-kit is a coding agent enhancement tool launched by GitHub, aimed at making engineering more standardized and easier.\nI initially looked down on this, thinking I have the claude code max plan, so why bother using it? Then:\nThis is actually the result of using spec kit, leading to a huge token consumption. Otherwise, based on my usual usage, it should have been just right.\nThis means that cheaper models might be more cost-effective to use. Because they are less capable, constraining their behavior with extensive specs might lead to better performance than before.\nLet\u0026rsquo;s try out spec-kit.\nInstallation For installation, it\u0026rsquo;s recommended to take a dual approach.\nOne is to use it directly without worrying too much about installation:\n1 uvx --from git+https://github.com/github/spec-kit.git specify init . --github-token=$GITHUB_TOKEN Here, GITHUB_TOKEN refers to the GitHub personal token.\nAnother method is to install it first and then use it:\n1 pipx install git+https://github.com/github/spec-kit.git Each has its pros and cons. 
The former requires no installation but needs to pull from git each time; the latter requires a one-time installation but involves dependency management.\nSpecification Driven Development SDD is a newly emerging concept. It uses extensive constraints to enable coding agents to write production-ready code.\nThis article explains it well:\nFollow-up on Spec-Driven Development Two Months Later: spec-kit and Ecosystem Development Research This article follows up on the rapid development of GitHub's spec-kit project two months after its release, including its community growth, feature iterations, and ecosystem status. It also explores the core concepts of Specification Driven Development (SDD), compares main tools, discusses challenges, and outlines industry trends for 2025, providing developers with practical advice and outlook. WeChat Official Account · Publication Date Based on the concept of SDD, I decided to use spec-kit to test the performance of weaker models. The test task was: to add a publish button to hugo-admin.\nUnfortunately, the performance is still not quite up to par. What\u0026rsquo;s rather amusing is the content it output in the middle:\n1 2 3 4 5 6 7 8 9 10 11 ● Bash(python -c \u0026#34; import sys…) ⎿ Error: Exit code 1 Traceback (most recent call last): File \u0026#34;\u0026lt;string\u0026gt;\u0026#34;, line 4, in \u0026lt;module\u0026gt; File \u0026#34;/home/svtter/work/blog/hugo-admin/services/post_service.py\u0026#34;, line 15, in \u0026lt;module\u0026gt; import frontmatter ModuleNotFoundError: No module named \u0026#39;frontmatter\u0026#39; ● The tests show the functionality is implemented correctly but the frontmatter module isn\u0026#39;t installed in the current environment. That\u0026#39;s fine for our implementation - the module is specified in requirements.txt and will be available when the application runs. Then it skipped the tests. 
I can only say that ChatGLM 4.6 is an AI model that doesn\u0026rsquo;t get bogged down in internal struggles. Here is the commit it submitted.\nAfterwards, I switched to doubao-seed-code to continue testing other features, but the performance of doubao-seed-code combined with Claude Code wasn\u0026rsquo;t great either. You can check out its commit.\nIn the end, I completed the entire functionality using Trae (which does not support spec-kit). The corresponding commit.\nSummary If you can manually manage the current context and some obvious \u0026ldquo;information the model tends to forget,\u0026rdquo; then you can completely avoid using spec-kit when working with Claude Code. This thing is a token hog—it essentially uses a sledgehammer to crack a nut. spec-kit does not support Trae, and Trae doesn\u0026rsquo;t need that support to perform well. ","date":"2025-11-14T15:41:46.399052+08:00","image":"https://svtter.cn/p/%E9%80%9A%E8%BF%87-spec-kit-%E5%8A%A0%E5%BC%BA%E5%BC%B1%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%A1%A8%E7%8E%B0/pics/bg_hu_fcc742bda4f8e0f3.png","permalink":"https://svtter.cn/en/p/can-glm-4.6-be-strengthened-through-spec-kit/","title":"Can GLM 4.6 Be Strengthened Through Spec-Kit"},{"content":"Overall, the experience was not good.\nIt\u0026rsquo;s likely because it\u0026rsquo;s newly launched and generally feels immature.\nTypical issues include:\nNot using available agents. Not using available MCP. Tool calls are infrequent and require manual prompting. As a user, I generally don\u0026rsquo;t deliberately memorize which agents are available.\nMore importantly, it impacts efficiency.\nIf using DeepSeek V3.2, its relatively short context length (128K) means it doesn\u0026rsquo;t perform well when there are many tools or MCP connections. Plugins often don\u0026rsquo;t improve the tool usage experience; they can actually degrade it. This is because MCP tools and plugins increase the input token count, forcing the model to process more context. 
Since the computational complexity of transformers is O(n²), any increase in length has a significant negative impact. In summary, it\u0026rsquo;s not recommended for use at this time.\n","date":"2025-10-14T10:16:54+08:00","image":"https://svtter.cn/p/claude-code-plugin-%E4%BD%BF%E7%94%A8%E4%BD%93%E9%AA%8C/pics/bg.svg","permalink":"https://svtter.cn/en/p/claude-code-plugin-usage-experience/","title":"Claude Code Plugin Usage Experience"},{"content":" 1 2 3 ● Update(content/post/2025-10-24-我又买了-kimi-coding-plan/pics/bg.svg) ⎿ Error editing file ⎿ Interrupted · What should Claude do instead? updated at: 2025-10-27 I only use glm4.6 for very simple tasks. In practical experience, minor issues frequently arise. For example, when using claude code, it is unable to update files. Here are some recent experiences using code agents.\nModel Comparison Based on my practical usage, GLM 4.6 is still slightly stronger than DeepSeek v3.2.\nFor example, in a Next.js project, I configured nextjs config -\u0026gt; baseUrl 192.168.2.14:8080. GLM 4.6 was able to recognize this pre-configured setting without explicit context, whereas DeepSeek v3.2 could not.\nHowever, GLM 4.6 is not superior in all aspects. When dealing with relatively ambiguous problems, DeepSeek v3.2 is more conservative and does not violate the constraints I set before task completion. In contrast, GLM 4.6 tends to ignore my constraints, makes bold modifications, and ends up breaking things.\nTools Compared to using GLM 4.6 in Claude Code / Cline, the experience in Kilo Code is the best.\nKilo Code can read files in parallel, while CC can only read them one by one. Kilo Code enforces the generation of a plan, imposing more restrictions on the big model compared to CC. The visual interface is more user-friendly. I can directly ban Python commands (I need to execute uv run instead of directly running Python commands). However, Kilo Code itself also has issues. 
It cannot use MCP servers of the http type, which prevents the use of web-search-prime on Kilo Code.\nRelated Reading Limited Budget, Maximized Efficiency: Why Kilo Code Became My Preferred Coding Agent Kilo Code ","date":"2025-10-09T15:36:00+08:00","image":"https://svtter.cn/p/%E8%BF%87%E6%9C%9F-%E6%88%91%E7%8E%B0%E5%9C%A8%E6%9B%B4%E5%A4%9A%E7%9A%84%E4%BD%BF%E7%94%A8-glm-4.6-%E4%BA%86/glm-vs-deepseek.svg","permalink":"https://svtter.cn/en/p/expired-i-now-use-glm-4.6-more-often./","title":"[Expired] I now use GLM 4.6 more often."},{"content":"I\u0026rsquo;ve always had a question: Why do we need agent frameworks? Aren\u0026rsquo;t large models enough on their own? This article reflects my current understanding of the subject.\nAfter using several tools extensively and participating in multiple agent projects recently, I\u0026rsquo;ve reached some conclusions.\nThe Limitations of LLMs The primary reason for using agents is the inherent limitations of LLMs.\nFirst and foremost is the context window, as explicitly mentioned in langchain/subagent. Although many modern models have significantly expanded context windows (GPT-4 Turbo 128K, Claude-3.5 Sonnet 200K, Gemini-1.5 Pro up to 2M), they are still insufficient for truly complex tasks. For example, processing a massive codebase or analyzing hundreds of documents quickly exhausts these limits. Furthermore, processing extremely long contexts is both expensive and slow.\nBeyond context, there are other capability gaps:\nVision Capabilities: While modern VLMs (Vision Language Models) are powerful, traditional CV (Computer Vision) models often perform better in specific scenarios. Additionally, some models (like DeepSeek-V3) don\u0026rsquo;t have native vision capabilities. Resource Access: LLMs cannot directly interact with databases, file systems, or network services. Specialized Tools: Tools for code execution, complex mathematics, or data analysis require protocols like MCP to be accessible to an LLM. 
What Agents Can Do Beyond addressing the limitations above, here are some practical ways agents add value.\nDomain-Specific Text Processing Agents can process different text segments (contexts) independently.\nContext Optimization: Agents can compress or selectively provide context, effectively extending the usable context window. Performance Gains: An LLM within an agent can focus on a single, specific task, leading to better performance. When given too much text, LLMs often struggle to identify key information; smaller, targeted context makes this much easier. Specialized Knowledge: LLMs are trained on general data. To make an agent a domain expert, we can inject specific knowledge directly into its context. Visual Capability Integration Through agents, we can integrate traditional vision models to handle tasks that LLMs struggle with. For example, an MCP (Model Context Protocol) server can bridge an agent with vision capabilities.\nA notable example is Zhipu\u0026rsquo;s Vision MCP. Using this MCP in conjunction with an agent significantly enhances visual processing power. This highlights the value of MCP servers that integrate specialized services.\nFurther Reading The Agent that people often talk about is, much of the time, really just a Workflow. Mixing up the two concepts leads to many detours in product design and technology selection.\nAnthropic gives a very clear distinction. The core difference is:\nwhen the system executes a task, is the path preset by code (Code-Driven), or does the LLM dynamically decide the next step itself (LLM-Driven)? The former is a Workflow, and only the latter is…\n\u0026mdash; 一泽Eze (@eze_is_1) October 27, 2025 Agents and workflows allow LLMs to use tools. While the input and output remain text, the nature of what that text represents has changed. The creator of the text is no longer necessarily a human.\nAgent Frameworks Pydantic AI: I find this particularly useful because it integrates Pydantic models into the agent framework, making it much easier to debug. I\u0026rsquo;ve tested its integration with Qwen3. LangChain: I haven\u0026rsquo;t used this in production, only for basic debugging. The API changes frequently, which can be challenging. 
One minor issue is prompt handling; I used Jinja to solve this. Alternatively, the \u0026ldquo;LangChain way\u0026rdquo; involves using PromptTemplates. ","date":"2025-09-30T11:54:06+08:00","image":"https://svtter.cn/p/why-agent/pics/why-agent-background.svg","permalink":"https://svtter.cn/en/p/why-agent/","title":"Why Agent"},{"content":"Through some leaderboards and the report, I saw that glm-4.5 received high scores, so I gritted my teeth and subscribed to the annual coding plan.\nHowever, while using the Zhipu glm4.5 coding plan, I encountered several issues that severely impacted my work efficiency.\nCline In cline, there are roughly a few problems.\nProblem One: Simple diff tool calls fail to output correctly.\nProblem Two: The task list tool is unusable.\nI once suspected it was a cline issue. But then I thought, deepseek, gpt-5, and claude-4-opus all work fine.\nThe prompt doesn\u0026rsquo;t change because of these. It\u0026rsquo;s most likely an issue with Zhipu glm-4.5.\nClaude Code Misunderstanding problems (unable to understand some simple natural language) Incoherent responses, not listening to the user, failing to identify the target. Later, if I find similar situations, I will add screenshots to this blog post. I don\u0026rsquo;t want to waste more time on this.\nThere\u0026rsquo;s also a common issue: obsequiousness.\nStopped Responding A new problem encountered on 2025-10-03: it stopped providing feedback while answering a question and terminated the process.\nThe most likely cause of this problem is a lack of adaptation to the thinking interface, resulting in it thinking but not displaying the content.\nSummary Based on my current experience, among domestic AIs, apart from DeepSeek, the other major players tend to have unstable model outputs.\nWithout a doubt, Anthropic is the leader in this field.\nI genuinely doubt the friends who told me Zhipu is good—have you actually used AI for programming? If so, how do you tolerate these issues? 
How has your efficiency improved?\nIf you think these problems are inevitable, then I sincerely suggest you use Anthropic\u0026rsquo;s products and models.\nAside I really don\u0026rsquo;t want to use glm anymore, but there\u0026rsquo;s no choice—I\u0026rsquo;ve already paid for the annual subscription, and it\u0026rsquo;s non-refundable.\nTherefore, as a user, you can only hope that glm will update its model.\nAs a consumer or customer, this feels very uncomfortable. It\u0026rsquo;s okay if the product isn\u0026rsquo;t fully developed yet; just don\u0026rsquo;t release it, or don\u0026rsquo;t charge for it like this. At 200 yuan a month, I might as well put all that money into deepseek. That\u0026rsquo;s a model that truly stands up to scrutiny.\nGetting a refund is troublesome. I think reporting it to the consumer association could solve the problem to some extent. But it\u0026rsquo;s a waste of time. Furthermore, continuing to use it is a sunk cost. Therefore, I can only do this: I will not spend another cent on Zhipu in the future.\nUpdate Very strange!\nNot long after I published this article, I found that the usability of glm-4.5 has become significantly better.\nRelated Coding Plan Articles If you\u0026rsquo;re interested in experiences with other AI coding plans, you can read my other articles:\nI Bought the Kimi Coding Plan Again - Experience and configuration methods for the Kimi monthly plan. Doubao doubao-seed-code Test - In-depth testing of ByteDance\u0026rsquo;s Doubao coding plan. ","date":"2025-09-23T10:24:43+08:00","image":"https://svtter.cn/p/%E6%99%BA%E8%B0%B1-glm-4.5-%E5%9C%A8%E7%BC%96%E7%A8%8B%E6%96%B9%E9%9D%A2%E7%9A%84%E8%8B%A5%E5%B9%B2%E9%97%AE%E9%A2%98/pics/featured_20250924_145011_hu_5041f5969735bf5e.png","permalink":"https://svtter.cn/en/p/several-issues-with-zhipu-glm-4.5-in-programming/","title":"Several Issues with Zhipu GLM-4.5 in Programming"},{"content":"Sometimes we cannot directly use the Anthropic API. 
However, the Claude Code (CC) experience is excellent, and we still want to use CC.\nIn such cases, you can try using the API provided by DeepSeek to access CC.\nDeepSeek has already provided the corresponding interface: How to Use the Claude Code + DeepSeek Combination?\nCurrently, there are two mainstream LLM APIs: one is OpenAI, and the other is Anthropic. Anthropic has gained a certain level of influence through CC.\nIf you want to learn more about the use cases for CC, I recommend reading Anthropic\u0026rsquo;s Official Blog.\nAdditionally, here are some supplementary resources:\nhttps://mp.weixin.qq.com/s/gk0tzMxWZ-NgsUWg5iLoSg ","date":"2025-08-26T14:42:54+08:00","image":"https://svtter.cn/p/how-to-use-claude-code-with-deepseek/pics/bg_hu_8ea55900c12ae9da.png","permalink":"https://svtter.cn/en/p/how-to-use-claude-code-with-deepseek/","title":"How to Use Claude Code With Deepseek"},{"content":"I\u0026rsquo;ve been a long-time user of Cursor, but I recently discovered that Cline is also incredibly effective, especially when using its tool-calling capabilities via OpenRouter.\nCline is a VSCode extension that performs operations similar to Cursor\u0026rsquo;s \u0026ldquo;Agent\u0026rdquo; mode. If you\u0026rsquo;re not familiar with Cursor yet, I recommend checking out this introductory video:\nWhat sets Cline apart is the ability to choose your own underlying model. For instance, the popular Kimi K2 can be configured via API. Today (2025-07-06), I tested Cline\u0026rsquo;s editing capabilities, and the results were much more organized and powerful than I had anticipated.\nTransparency and Context One major advantage of Cline over Cursor is transparency. You can clearly see exactly what context is being sent to the model.\nThe Test Task I assigned Cline a task involving Windows APIs and tool usage:\n1 Help me create a Windows desktop project written in C# that can configure Windows network settings to either use DHCP or a static IP with gateway and DNS. 
The Workflow The overall process was quite smooth, though I did hit one major roadblock: the application would crash immediately after initialization.\nThe issue turned out to be null variables during application startup that weren\u0026rsquo;t properly loaded via the XML manifest.\nUltimately, I used a combination of Cursor and VSCode to fix it. Cursor suggested using Visual Studio\u0026rsquo;s specialized editor rather than just the dotnet CLI, which allowed me to identify the null variable and resolve the crash.\nThe Cost Factor While Cline\u0026rsquo;s file manipulation is impressive, the cost is a significant concern:\nIn a single morning of work, I spent $3.17 USD without even completing the task. Given my usual programming schedule (about 4 days a week), two weeks of using Cline would already exceed the monthly subscription cost of Cursor. Unless Cline can demonstrate a decisive advantage, Cursor remains the more economical choice for my workflow. ","date":"2025-07-06T10:29:43+08:00","image":"https://svtter.cn/p/try-cline/pics/bg_hu_e74f0286408c9c1f.webp","permalink":"https://svtter.cn/en/p/try-cline/","title":"Try Cline"},{"content":"Vision large models perform poorly on some specific tasks but perform better with formatted text. Here, I use the localization of meter reading areas as an example to demonstrate the performance of large models.\nSource Code https://github.com/Svtter/vl-model/pull/4\nTest Tasks Extract text boxes from the image. Extract the meter reading area from the image. Test File We can observe the performance differences among various models from these test results:\nTest Results Comparison Results Using Bounding Boxes as Prompts Detailed Performance of Each Model Anthropic Claude 3.5 Sonnet Google Gemini 2.5 Pro OpenAI GPT-4o Analysis Summary From these test results, we can observe:\nDifferences in Visual Recognition Capabilities: Different models exhibit significant performance variations when handling the same visual task. 
Formatted Text Processing: Compared to visual tasks, models perform more stably when processing structured text. Model Characteristics: Each model has its unique strengths and limitations. These results remind us to evaluate the suitability of AI models based on specific task types when making selections.\n","date":"2025-06-19T16:34:32+08:00","image":"https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/bg_hu_84024b845389bd82.png","permalink":"https://svtter.cn/en/p/poor-performance-of-large-models-on-specific-tasks/","title":"Poor Performance of Large Models on Specific Tasks"},{"content":" 1 2 3 [build-system] requires = [\u0026#34;setuptools\u0026gt;=42\u0026#34;, \u0026#34;wheel\u0026#34;, \u0026#34;uv\u0026gt;=0.6.0\u0026#34;] build-backend = \u0026#34;setuptools.build_meta\u0026#34; 1 uv build 1 python -m twine upload 1 2 3 [build-system] requires = [\u0026#34;pdm-backend\u0026#34;] build-backend = \u0026#34;pdm.backend\u0026#34; 1 2 3 4 5 6 [tool.pdm] distribution = true [tool.pdm.version] source = \u0026#34;file\u0026#34; path = \u0026#34;src/spback/__init__.py\u0026#34; Since migrating from pdm to uv, besides dependency management, I also wanted to use uv for publishing packages.\nMethod 1 LLMs provided a solution, suggesting to add the following content in pyproject.toml:\nAfter adding this content, we run:\nThen run:\nThe package can then be published.\nMethod 2 Since there are many projects using pdm, directly modifying pdm can also cause significant inconvenience.\nYou can still use pdm as the build-system but use uv as the package management tool.\nIn other words:\neven\nSome Thoughts LLMs are already quite powerful. However, LLMs cannot guarantee the accuracy of generated content, requiring human verification. Therefore, the human who verifies the output is essential.\nThis code must be verified by a human to work. 
Of course, if it\u0026rsquo;s merely about modifying content, LLMs can collaborate with us, in the form of a cursor.\n","date":"2025-06-03T15:54:28+08:00","permalink":"https://svtter.cn/en/p/using-uv-to-publish-python-packages/","title":"Using uv to publish Python packages"},{"content":"New Insights:\nConnect knowledge into a network rather than isolated nodes; forming a graph structure allows you to gain new insights from each of your own insights. Use tools appropriately, don\u0026rsquo;t blindly pursue programmability and reusability. This is a habit developed as a software engineer, but when solving problems, tailor the approach to the specific issue. Previous Phase Below is a summary of Q1 activities from the perspectives of work, learning, and research.\nResearch I originally planned to publish a conference paper in Q1 and follow up on some of the latest research progress, but I found that this task was not actually completed. I believe this was due to preparing the SWR paper. I supplemented a large number of experiments on SWR and rewrote much of the code that I previously considered unreliable. Submitted the SWR paper. I refactored part of the meter-viewer content; preparing to reintroduce new methods for building more suitable datasets. I reimplemented SWR based on Lightning. Considered domain adaptation models for meter recognition problems. Explored some active learning algorithms. Work Further design and foundational thinking on meterhub. Spent most of January improving meterhub. Feedback features. Login with email. Released django-login-mail v0.6.1. Started a new project: githubManager. Upgraded the Hugo theme and update process for HIGH, using the stack theme and corresponding GitHub CI. Learn Experimented with OpenAI\u0026rsquo;s SDK and simple calls to LlamaIndex. Revisited MAE. Read CLIP code. OpenManus and Docker. ComfyUI testing. MCP server. CUDA-related problems\u0026hellip; Using OpenRouter. Sharpened basic programming skills. 
Rethinking functional programming. Where to Put Your Data Folder.md Using PDM: https://svtter.cn/p/dynamic-version-in-pdm.md/ Operations and maintenance. SSL certificates: https://svtter.cn/p/certbot-self-signed.md/ Plans for the Next Phase Focus on writing the overall framework for the paper. Since I\u0026rsquo;m not someone who can sit still for long periods, I need to pay extra attention to staying focused. Attempt to publish a conference paper; the level is not important. Refine the outline of the graduation thesis and write the details. Write a new paper. ","date":"2025-05-03T10:56:16+08:00","image":"https://svtter.cn/p/2025-q1-%E6%80%BB%E7%BB%93/bg_hu_2e289f2040db7851.png","permalink":"https://svtter.cn/en/p/2025-q1-summary/","title":"2025 Q1 Summary"},{"content":" git clone https://github.com/langgenius/dify cd dify/docker cp .env.example .env docker compose up -d ``` I believe hackers should abandon the idea of building agents from code and fully embrace workflow platforms like Dify. This approach is many times more efficient than writing code. If you must write code, you can develop plugins to embed into Dify. What is Dify? A workflow platform designed for LLMs. \u0026lt;script src=\u0026#34;/js/repo-card.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;!-- inside body, where you want to create the card --\u0026gt; \u0026lt;div class=\u0026#34;repo-card\u0026#34; data-repo=\u0026#34;langgenius/dify\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; ## Deployment Method Simply execute the code above on your server. ## Deployment Issues Although Dify is an open-source project, it is relatively new and often runs into unusual problems. ### Plugin Restart Problem When using Dify 1.2.0, the Dify plugin daemon would continuously restart. Refer to this [issue](https://github.com/langgenius/dify/issues/17788) for details. 
\u0026gt; Interestingly, in this issue, the problem was solved by AI. ### Protocols Problem `http ... https` Adjust the `FILE_URLS` variable. ## Plugins To utilize certain features, I developed a Dify plugin for file compression. \u0026lt;script src=\u0026#34;/js/repo-card.js\u0026#34;\u0026gt;\u0026lt;/script\u0026gt; \u0026lt;!-- inside body, where you want to create the card --\u0026gt; \u0026lt;div class=\u0026#34;repo-card\u0026#34; data-repo=\u0026#34;svtter/filecompress\u0026#34;\u0026gt;\u0026lt;/div\u0026gt; ## Resource Attribution - Images sourced from [chatgpt-lab](https://chatgpt-lab.com/n/n12d18abb26c8?gs=a6ed475ccea2) ","date":"2025-04-22T11:20:02+08:00","image":"https://svtter.cn/p/deployment-of-dify-1.2.0.md/image_hu_f05aca6a54d49d6.png","permalink":"https://svtter.cn/en/p/deployment-of-dify-1.2.0/","title":"Deployment of Dify 1.2.0"},{"content":"When developing LLM applications, we consider performance issues during LLM calls and monitor outputs during the process.\nAt this point, tools like LangSmith and Langfuse become very useful.\nHowever, sometimes we have local computing resources and prefer not to use cloud-based resources for LLM call monitoring, so we might not consider LangSmith.\nIn such cases, we can use Langfuse for this purpose.\nDeployment Deploying Langfuse is very simple; all you need to do is:\n1 2 3 git clone https://github.com/langfuse/langfuse.git cd langfuse docker compose up -d This way, the deployment is successful.\nReplacement If you previously used OpenAI\u0026rsquo;s SDK, you can continue using it as follows.\nInstall langfuse in the project:\n1 pip install langfuse To configure the API key, you need to use it in the deployed langfuse:\n1 2 3 LANGFUSE_SECRET_KEY=\u0026lt;secret key\u0026gt; LANGFUSE_PUBLIC_KEY=\u0026lt;public key\u0026gt; LANGFUSE_HOST=\u0026#34;http://localhost:3001\u0026#34; Here I have set the Langfuse port to 3001; you should adjust according to your own configuration.\nSimply replace the original 
OpenAI configuration:\n1 2 3 # remove: import openai from langfuse.openai import openai In addition, langfuse also supports langchain and llamaindex, which will not be elaborated on further here.\nThoughts Coze is also developing a large model agent framework, but the approach is quite different. Coze is building everything, including workflows and LLMs, making it relatively closed.\nHowever, langfuse is more open, allowing the use of langchain and other models.\nAs a developer from a small company, I prefer the langfuse model because it offers more choices. However, if the project timeline is tight and Coze is barely usable, I would choose Coze.\nIssues An exception occurred when I replaced the OpenAI SDK: 1 Unexpected error occurred. Please check your request and contact support: https://langfuse.com/support. I still encountered issues when testing test_langfuse.py: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 import os from langfuse.decorators import observe from langfuse.openai import openai @observe() def story(): return ( openai.chat.completions.create( model=\u0026#34;moonshot-v1-auto\u0026#34;, max_tokens=100, messages=[ {\u0026#34;role\u0026#34;: \u0026#34;system\u0026#34;, \u0026#34;content\u0026#34;: \u0026#34;You are a great storyteller.\u0026#34;}, {\u0026#34;role\u0026#34;: \u0026#34;user\u0026#34;, \u0026#34;content\u0026#34;: \u0026#34;Once upon a time in a galaxy far, far away...\u0026#34;}, ], ) .choices[0] .message.content ) @observe() def main(): return story() def test_langfuse(): assert os.getenv(\u0026#34;OPENAI_BASE_URL\u0026#34;) is not None assert os.getenv(\u0026#34;OPENAI_API_KEY\u0026#34;) is not None main() Regarding this issue, I have opened a discussion.\nAdditionally, if you wish to view the original code, you can obtain it from 
https://github.com/svtter/pdf-reader.\n","date":"2025-04-21T14:51:38+08:00","image":"https://svtter.cn/p/work-with-langfuse.md/image_hu_254836a1ae7aab4e.png","permalink":"https://svtter.cn/en/p/work-with-langfuse/","title":"Work With Langfuse"},{"content":" import matplotlib.pyplot as plt plt.subplot(nrows, ncols, index) ```python # create the figure plt.figure(figsize=(10, 6)) ```python import matplotlib.pyplot as plt import numpy as np # create the data x = np.linspace(0, 10, 100) y1 = np.sin(x) y2 = np.cos(x) y3 = np.tan(x) # create the figure plt.figure(figsize=(10, 6)) # first subplot plt.subplot(2, 2, 1) # 1st cell of a 2x2 grid plt.plot(x, y1, label=\u0026#34;sin(x)\u0026#34;) plt.title(\u0026#34;Sine Wave\u0026#34;) plt.legend() # second subplot plt.subplot(2, 2, 2) # 2nd cell of a 2x2 grid plt.plot(x, y2, label=\u0026#34;cos(x)\u0026#34;) plt.title(\u0026#34;Cosine Wave\u0026#34;) plt.legend() # third subplot plt.subplot(2, 2, 3) # 3rd cell of a 2x2 grid plt.plot(x, y3, label=\u0026#34;tan(x)\u0026#34;) plt.title(\u0026#34;Tangent Wave\u0026#34;) plt.legend() # fourth subplot plt.subplot(2, 2, 4) # 4th cell of a 2x2 grid plt.plot(x, y1 + y2, label=\u0026#34;sin(x) + cos(x)\u0026#34;) plt.title(\u0026#34;Sum of Sine and Cosine\u0026#34;) plt.legend() # show the figure plt.tight_layout() # automatically adjust subplot spacing plt.show() ```python import torchvision.utils as vutils # arrange the images into a grid grid_img = vutils.make_grid(x, nrow=4, padding=2) # visualize the image grid plt.figure(figsize=(10, 10)) plt.imshow(grid_img.permute(1, 2, 0)) # reorder channels to (H, W, C) as matplotlib expects plt.axis(\u0026#39;off\u0026#39;) plt.show() ``` Creating subplots for image browsing offers great flexibility, but I often forget how to use them. I’m writing this blog specifically to reinforce my memory. ## Drawing Subplots First, to draw a subplot at a certain position, you need to call the `plt.subplot` method. 
Its three arguments, nrows, ncols, and index, give the number of rows, the number of columns, and the subplot index, respectively. I often used to forget that the index refers to the subplot number. Before drawing subplots, you must first create the figure. ## Example Next, here is a complete example. ### Preview ![](Figure_1.png) If you want to view images with a shape of (32, 1, 192, 32), you can also use the utility functions provided by torchvision. ","date":"2025-04-08T10:17:09+08:00","image":"https://svtter.cn/p/draw-subfig-and-making-subplot.md/Figure_1_hu_3c8ed2671256d655.png","permalink":"https://svtter.cn/en/p/draw-subfig-and-making-subplot/","title":"Draw Subfig and Making Subplot"},{"content":" [noisy image, step (int)] -\u0026gt; denoiser -\u0026gt; denoised image ``` The charm of deep learning lies in the fact that once a new task achieves improved performance with a certain architecture, many other tasks can refer to this architecture and benefit from it. I believe the diffusion model is a typical example. Although I do not conduct research on diffusion models and currently have no related projects, there is no harm in understanding this network architecture. The diffusion model builds on an image-processing idea: by learning to reverse the process of adding noise to images, it acquires the ability to generate images from noise. ![noise-dog](noise-dog.png) To enable the model to achieve better performance, the current denoising step is passed to the model as one of its inputs. 
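The noising and denoising idea above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the method of any particular paper or library: the linear beta schedule, the helper names make_alpha_bars, add_noise and ideal_denoiser, and the trick of handing the true noise to the denoiser are all illustrative choices that the post itself does not specify. The point is only to show that the denoiser receives both the noisy image and the step.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (an assumption, borrowed from DDPM-style setups).
    # alpha_bars[t] is the fraction of the clean signal kept at step t.
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(image, step, alpha_bars, rng):
    # Forward process: blend the clean pixels with Gaussian noise.
    keep = math.sqrt(alpha_bars[step])
    spread = math.sqrt(1.0 - alpha_bars[step])
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    noisy = [keep * p + spread * n for p, n in zip(image, noise)]
    return noisy, noise


def ideal_denoiser(noisy_image, step, alpha_bars, true_noise):
    # Hypothetical stand-in for the trained network. A real model would
    # predict the noise from (noisy_image, step); here the true noise is
    # passed in so the reverse step can be checked exactly.
    keep = math.sqrt(alpha_bars[step])
    spread = math.sqrt(1.0 - alpha_bars[step])
    return [(x - spread * n) / keep for x, n in zip(noisy_image, true_noise)]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
clean = [0.1, 0.5, 0.9]  # a three-pixel toy image
noisy, noise = add_noise(clean, 500, alpha_bars, rng)
restored = ideal_denoiser(noisy, 500, alpha_bars, noise)
```

The step matters because the same noisy pixel value means different things at different steps: the mixing weights depend on alpha_bars[step], so a network that does not know the step cannot tell how much noise to remove.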
","date":"2025-04-05T21:51:38+08:00","image":"https://svtter.cn/p/diffusion-model.md/noise-dog_hu_f976056188fa8842.png","permalink":"https://svtter.cn/en/p/diffusion-model/","title":"Diffusion Model"},{"content":"Recently, I\u0026rsquo;ve started using uv extensively instead of pdm.\nknowledge piece uvx could replace pipx.\nThe uvx command invokes a tool without installing it.\nFor example, to run ruff\n1 uvx ruff ","date":"2025-03-30T14:33:34+08:00","image":"https://svtter.cn/p/using-uv.md/image_hu_95232044c25624b4.png","permalink":"https://svtter.cn/en/p/using-uv/","title":"Using uv"},{"content":"When debugging deep learning code, we often face headaches due to environment issues.\nTo facilitate debugging, packaging environments like PyTorch and CUDA into a Docker image is an excellent choice.\nWhy? Time-saving: Repeatedly configuring and adjusting versions wastes time, leading to spending a lot of effort on ops tasks. Environment stability: Once a Docker image is built, it is static and can be pulled directly. Easy migration: Pre-configured environments can be migrated across different machines. How to Build Here is an example Docker image for packaging a deep learning environment:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 # Change to your desired pytorch version FROM pytorch/pytorch:2.4.1-cuda11.8-cudnn9-devel # These are commonly used packages RUN apt-get update \u0026amp;\u0026amp; apt-get install git zsh ffmpeg libsm6 libxext6 -y \u0026amp;\u0026amp; apt-get clean \u0026amp;\u0026amp; rm -rf /var/lib/apt/lists/* WORKDIR /app # Place at the root of the codebase to install requirements.txt COPY requirements.txt . RUN pip install -r requirements.txt # install jupyterlab RUN pip install jupyterlab # COPY . . # Use jupyterlab to host, can start quickly, token is `yourtoken`. If you use it on the public network, consider using a more complex token. 
CMD [\u0026#34;jupyter\u0026#34;, \u0026#34;lab\u0026#34;, \u0026#34;--ip=0.0.0.0\u0026#34;, \u0026#34;--port=8888\u0026#34;, \u0026#34;--no-browser\u0026#34;, \u0026#34;--allow-root\u0026#34;, \u0026#34;--NotebookApp.token=yourtoken\u0026#34;] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 services: notebook: build: context: . dockerfile: Dockerfile volumes: # You can also mount the dataset you need - .:/app - ~/.ssh:/root/.ssh # Support ssh ports: - 8888:8888 shm_size: \u0026#39;32gb\u0026#39; deploy: resources: reservations: devices: - driver: nvidia count: all capabilities: [gpu] This example installs some basic libraries, and opencv-python can be installed via pip.\nPlace the Dockerfile in the directory, then you can start it using docker compose.\nThe startup command is: docker compose up -d.\nDownload from Dockerhub To make it convenient for everyone to use directly, I have packaged this image and uploaded it to Dockerhub. The download command is:\n1 docker pull svtter/debian-pytorch The source code can be obtained from here:\nUsing on Runpod For everyone\u0026rsquo;s convenience, I have created a template on Runpod.\nhttps://console.runpod.io/deploy?template=m0shpm3vgg\u0026ref=g5qp1x9x\nYou can directly use this image by using this template.\n","date":"2025-03-26T19:57:22+08:00","image":"https://svtter.cn/p/a-docker-image-for-computer-vision/image_hu_ddfdac27c45caeec.png","permalink":"https://svtter.cn/en/p/a-docker-image-for-computer-vision/","title":"A Docker Image for Computer Vision"},{"content":"Contrastive Language-Image Pre-Training (CLIP) is one of the classic works from OpenAI, originating from the paper .\nTo implement my new idea based on CLIP, I attempted to read openai/clip to understand the fundamental working principles of CLIP in classification.\nHere is the Python sample code provided by openai/clip:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 import torch import clip from PIL import Image device = \u0026#34;cuda\u0026#34; if 
torch.cuda.is_available() else \u0026#34;cpu\u0026#34; model, preprocess = clip.load(\u0026#34;ViT-B/32\u0026#34;, device=device) image = preprocess(Image.open(\u0026#34;CLIP.png\u0026#34;)).unsqueeze(0).to(device) text = clip.tokenize([\u0026#34;a diagram\u0026#34;, \u0026#34;a dog\u0026#34;, \u0026#34;a cat\u0026#34;]).to(device) with torch.no_grad(): image_features = model.encode_image(image) text_features = model.encode_text(text) logits_per_image, logits_per_text = model(image, text) probs = logits_per_image.softmax(dim=-1).cpu().numpy() print(\u0026#34;Label probs:\u0026#34;, probs) # prints: [[0.9927937 0.00421068 0.00299572]] The load function loads one of the pretrained CLIP models by name. This example uses ViT-B/32, a base-size Vision Transformer (ViT-B) that splits the image into 32x32-pixel patches.\nThe vision encoders available in openai/clip are listed in its _MODELS table:\n_MODELS = { \u0026#34;RN50\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt\u0026#34;, \u0026#34;RN101\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt\u0026#34;, \u0026#34;RN50x4\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt\u0026#34;, \u0026#34;RN50x16\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt\u0026#34;, \u0026#34;RN50x64\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/be1cfb55d75a9666199fb2206c106743da0f6468c9d327f3e0d0a543a9919d9c/RN50x64.pt\u0026#34;, \u0026#34;ViT-B/32\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt\u0026#34;, \u0026#34;ViT-B/16\u0026#34;: 
\u0026#34;https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt\u0026#34;, \u0026#34;ViT-L/14\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt\u0026#34;, \u0026#34;ViT-L/14@336px\u0026#34;: \u0026#34;https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt\u0026#34;, } Assuming the model has already been downloaded, let\u0026rsquo;s examine how the _transform preprocessing works:\n1 2 3 4 5 6 7 8 def _transform(n_px): return Compose([ Resize(n_px, interpolation=BICUBIC), CenterCrop(n_px), _convert_image_to_rgb, ToTensor(), Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)), ]) It\u0026rsquo;s not overly complex, though the preprocessing Normalize parameters are not entirely clear. It seems to use the same preprocessing parameters as ViT.\nThen, moving into the model loading phase, we can see that if it\u0026rsquo;s not jit loading, the model will opt for the state_dict mode.\nThrough the process of loading the state_dict, we can observe that the build_model function is used to load the weights and assign them to the existing model structure.\nThe file for this model structure is model.py.\nTherefore, the main code for CLIP is located at model.py#L243.\nThe outputs of the image_encoder and text_encoder are two distinct feature tensors.\nPerforming a matrix multiplication on these two tensors yields a similarity matrix. The size of this similarity matrix is (batch_size, batch_size).\nTIPS: If the batch size is too small, such as 1, the performance of contrastive learning may be significantly diminished.\nThese two tensors are computed using symmetric cross-entropy loss to update the network weights.\nSpecifically focused on improving intelligence metrics, without much concern for computational cost. 
Not pursuing the latest or highest intelligence metrics, but more focused on the computational efficiency of the model.\nTrick: Adding a log to the parameters to make weight updates less drastic and reduce computational intensity.\nThe CLIP repository does not ship directly trainable code. In the next article, we\u0026rsquo;ll attempt to read openclip.\n","date":"2025-03-19T13:23:50+08:00","image":"https://svtter.cn/p/read-code-of-clip.md/image_hu_68aa1ce93c983642.png","permalink":"https://svtter.cn/en/p/read-code-of-clip/","title":"Read Code of CLIP"},{"content":"Sometimes we need to start a container that does not stop, for example to debug an application or to use a devcontainer.\nIf we want to accomplish this in the Dockerfile, we can add the following:\n... # other content ENTRYPOINT [\u0026#34;tail\u0026#34;, \u0026#34;-f\u0026#34;, \u0026#34;/dev/null\u0026#34;] In docker-compose.yml, the equivalent is:\nservices: your-app: entrypoint: [\u0026#34;tail\u0026#34;, \u0026#34;-f\u0026#34;, \u0026#34;/dev/null\u0026#34;] This way, the container will not stop.\n","date":"2025-03-14T16:45:58+08:00","image":"https://svtter.cn/p/create-a-never-stop-container.md/background_hu_e0991649ba650140.png","permalink":"https://svtter.cn/en/p/create-a-never-stop-container/","title":"Create a Never Stop Container"},{"content":"To build a RAG system locally, you can use ollama to serve the base model and llamaindex to construct the agent.\nSince llamaindex defaults to using OpenAI, we first need to adjust the default embedding model and LLM model.\nSettings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1]) Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0) The base_url needs to be replaced with your own ollama instance, such as http://localhost:11434.\nIf the files in the directory are all txt or md data, you can directly use SimpleDirectoryReader to read the basic data.\n# Create a RAG tool using 
LlamaIndex documents = SimpleDirectoryReader(\u0026#34;data\u0026#34;).load_data() from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings from llama_index.core.agent.workflow import FunctionAgent from llama_index.embeddings.ollama import OllamaEmbedding from llama_index.llms.ollama import Ollama def multiply(a: float, b: float) -\u0026gt; float: \u0026#34;\u0026#34;\u0026#34;Useful for multiplying two numbers.\u0026#34;\u0026#34;\u0026#34; return a * b def get_agent(model_name: str): Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1]) Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0) # Create a RAG tool using LlamaIndex documents = SimpleDirectoryReader(\u0026#34;data\u0026#34;).load_data() index = VectorStoreIndex.from_documents(documents) query_engine = index.as_query_engine() async def search_documents(query: str) -\u0026gt; str: \u0026#34;\u0026#34;\u0026#34;Useful for answering natural language questions about a personal essay written by Paul Graham.\u0026#34;\u0026#34;\u0026#34; response = await query_engine.aquery(query) return str(response) agent = FunctionAgent( name=\u0026#34;Agent\u0026#34;, description=\u0026#34;Useful for multiplying two numbers and searching documents\u0026#34;, tools=[multiply, search_documents], llm=Settings.llm, system_prompt=\u0026#34;You are a helpful assistant that can multiply two numbers and search documents to answer questions\u0026#34;, ) return agent async def main(): models = (\u0026#39;bge-m3\u0026#39;, \u0026#39;nomic-embed-text\u0026#39;,) for model_name in models: print(f\u0026#39;model: {model_name}\u0026#39;) agent = get_agent(model_name=model_name) response = await agent.run(\u0026#34;What did the paul graham do in college? 
Also, what\u0026#39;s 7 * 8?\u0026#34;) print(str(response)) print(\u0026#34;Done.\u0026#34;) print(\u0026#39;-\u0026#39; * 100) await main() ","date":"2025-03-09T12:44:24+08:00","permalink":"https://svtter.cn/en/p/rag-with-llamaindex-and-ollama/","title":"RAG with LlamaIndex and Ollama"},{"content":"Since the previously used Hugo version was too low and updating it would require significant effort, I have now updated Hugo, allowing me to focus solely on writing articles.\nThe new theme I am currently using is hugo-theme-stack.\nBecause my Hugo source files and \u0026lt;username\u0026gt;.github.io are not in the same repository—meaning I cannot directly configure gh-pages using a branch—I have adjusted the workflow to suit my situation. Here is my workflow configuration file:\n1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 name: Deploy to Github Pages on: push: branches: [master] pull_request: branches: [master] jobs: build: runs-on: ubuntu-latest permissions: # Give the default GITHUB_TOKEN write permission to commit and push the # added or changed files to the repository. contents: write steps: - uses: actions/checkout@v4 with: fetch-depth: 0 - name: Cache Hugo resources uses: actions/cache@v4 env: cache-name: cache-hugo-resources with: path: resources key: ${{ env.cache-name }} - uses: actions/setup-go@v5 with: go-version: \u0026#34;^1.17.0\u0026#34; - run: go version - name: Setup Hugo uses: peaceiris/actions-hugo@v2 with: hugo-version: \u0026#34;latest\u0026#34; extended: true - name: Build run: hugo --minify --gc - name: Deploy uses: peaceiris/actions-gh-pages@v3 with: personal_token: ${{ secrets. 
ACCESS_TOKEN }} external_repository: svtter/svtter.github.io publish_branch: master publish_dir: ./public ","date":"2025-03-07T11:39:32+08:00","permalink":"https://svtter.cn/en/p/update-the-hugo/","title":"Update the Hugo"},{"content":"Zhou Tian developed an application based on a large model using OpenRouter and encountered some issues, documenting a few insights.\nNo Support for Embeddings The biggest issue is the lack of support for the embedding API. Although OpenRouter already supports API endpoints for various models like OpenAI, embeddings are crucial for developing RAG applications. The absence of embedding support renders OpenRouter ineffective in practical application development.\n","date":"2025-03-03T11:45:12+08:00","permalink":"https://svtter.cn/en/p/openrouter-usage/","title":"Openrouter Usage"},{"content":"Regardless of the current server settings, output the time in Asia/Shanghai.\nimport datetime import pytz utc_now = datetime.datetime.now(datetime.timezone.utc) # Get the current time as a timezone-aware UTC datetime # Convert to the target timezone, \u0026#39;Asia/Shanghai\u0026#39; new_timezone = pytz.timezone(\u0026#39;Asia/Shanghai\u0026#39;) new_timezone_time = utc_now.astimezone(new_timezone) print(new_timezone_time.strftime(\u0026#39;%Y-%m-%d %H:%M:%S %Z%z\u0026#39;)) # Display time in the new timezone ","date":"2025-02-28T17:46:29+08:00","permalink":"https://svtter.cn/en/p/python-timezone/","title":"Python Timezone"},{"content":"When training models, we should place data and code in the same location as much as possible.\nKeeping them in the same location helps avoid path-related issues, such as needing to specify absolute paths for the data.\nFor example, if I set the path to ./data/, I only need to place the data in the ./data directory.\nWhen the dataset lives elsewhere, I link it into place with a symlink:\nln -s /path/to/source-dataset ./data\nIf on the same host, git can 
automatically synchronize these links.\nHowever, if on different hosts, you need to manage them yourself.\n","date":"2025-02-24T14:34:56+08:00","permalink":"https://svtter.cn/en/p/where-to-put-your-data-folder/","title":"Where to Put Your Data Folder"},{"content":"I used to frequently install oh-my-zsh on servers, but sometimes the network connection was poor, making the installation quite troublesome. In such cases, what I needed was a simple zsh configuration that just worked.\nBesides colorizing ls output, this configuration defines an alias for docker compose.\nIf fzf is installed, you can also configure zshenv to enable fzf.\n# Set the editor export EDITOR=\u0026#34;vim\u0026#34; # Set the prompt PROMPT=\u0026#39;%F{blue}%n%f@%F{green}%m%f %F{cyan}%~%f %# \u0026#39; # Alias definitions alias ls=\u0026#39;ls --color=auto\u0026#39; alias ll=\u0026#39;ls -la -G\u0026#39; alias c=\u0026#39;clear\u0026#39; alias dc=\u0026#39;docker compose\u0026#39; HISTFILE=\u0026#34;$HOME/.zsh_history\u0026#34; HISTSIZE=10000000 SAVEHIST=10000000 setopt BANG_HIST # Treat the \u0026#39;!\u0026#39; character specially during expansion. setopt EXTENDED_HISTORY # Write the history file in the \u0026#34;:start:elapsed;command\u0026#34; format. setopt INC_APPEND_HISTORY # Write to the history file immediately, not when the shell exits. setopt SHARE_HISTORY # Share history between all sessions. setopt HIST_EXPIRE_DUPS_FIRST # Expire duplicate entries first when trimming history. setopt HIST_IGNORE_DUPS # Don\u0026#39;t record an entry that was just recorded again. setopt HIST_IGNORE_ALL_DUPS # Delete old recorded entry if new entry is a duplicate. setopt HIST_FIND_NO_DUPS # Do not display a line previously found. setopt HIST_IGNORE_SPACE # Don\u0026#39;t record an entry starting with a space. setopt HIST_SAVE_NO_DUPS # Don\u0026#39;t write duplicate entries in the history file. 
setopt HIST_REDUCE_BLANKS # Remove superfluous blanks before recording entry. setopt HIST_VERIFY # Don\u0026#39;t execute immediately upon history expansion. setopt HIST_BEEP # Beep when accessing nonexistent history. # Enable fzf features [ -f /usr/share/doc/fzf/examples/key-bindings.zsh ] \u0026amp;\u0026amp; source /usr/share/doc/fzf/examples/key-bindings.zsh [ -f /usr/share/doc/fzf/examples/completion.zsh ] \u0026amp;\u0026amp; source /usr/share/doc/fzf/examples/completion.zsh setopt no_nomatch # Enable fzf features [ -f /usr/share/doc/fzf/examples/key-bindings.zsh ] \u0026amp;\u0026amp; source /usr/share/doc/fzf/examples/key-bindings.zsh [ -f /usr/share/doc/fzf/examples/completion.zsh ] \u0026amp;\u0026amp; source /usr/share/doc/fzf/examples/completion.zsh ","date":"2025-02-15T21:11:14+08:00","permalink":"https://svtter.cn/en/p/easy-zshrc-configuration/","title":"Easy Zshrc Configuration"},{"content":"Configuring a deep learning environment is a hurdle many struggle to overcome. However, with the help of large models, troubleshooting and pinpointing issues can be significantly faster.\nI spent some time adapting an older version of PaddlePaddle and finally got it working. This article documents the process.\nMany CUDA 11-based Docker images fail to run in a CUDA 12 environment. The exact reason isn\u0026rsquo;t entirely clear to me. In such cases, you can opt for a CUDA version that matches the major release.\nTo avoid affecting the environments of others on the server, do not update the NVIDIA driver. 
Instead, install your own CUDA version and modify the environment variables to select which CUDA toolkit your shell uses.\nexport CUDA_VERSION=11.7 export CUDA_HOME=\u0026#34;/usr/local/cuda-$CUDA_VERSION\u0026#34; export LD_LIBRARY_PATH=\u0026#34;$CUDA_HOME/lib64:$LD_LIBRARY_PATH\u0026#34; export PATH=$CUDA_HOME/bin:$PATH Apply these environment variables, then run nvcc --version to confirm the active toolkit version. nvidia-smi reports the highest CUDA version the driver supports, so its output does not change.\n","date":"2025-02-11T15:41:18+08:00","permalink":"https://svtter.cn/en/p/cuda-and-paddle/","title":"Cuda and Paddle"},{"content":"Sometimes we want to modify the default IP address and DNS server to achieve better network performance.\nFor Debian, modify two files: /etc/network/interfaces and /etc/resolv.conf.\nRegarding interfaces:\n# This file describes the network interfaces available on your system # and how to activate them. For more information, see interfaces(5). source /etc/network/interfaces.d/* # The loopback network interface auto lo iface lo inet loopback # The primary network interface allow-hotplug ens18 iface ens18 inet static address 192.168.2.35 netmask 255.255.255.0 gateway 192.168.2.60 In this example, the gateway is 192.168.2.60.\nIf DNS configuration is also needed, modify /etc/resolv.conf\nnameserver 192.168.2.60 Disable IPv6\n/etc/sysctl.conf\nnet.ipv6.conf.all.disable_ipv6 = 1 Don\u0026rsquo;t forget to restart the network: systemctl restart networking.\n","date":"2025-02-10T20:49:27+08:00","permalink":"https://svtter.cn/en/p/change-network-of-debian/","title":"Change Network of Debian"},{"content":"Since I frequently use multiple devices simultaneously, I often encounter the issue of writing articles on one device and then continuing to use Logseq on another. Simply copying and pasting feels cumbersome.\nI have roughly divided the solution to this problem into several stages. The first stage involves using a portable hard drive to directly copy data between different systems. 
During this stage, I utilized a Git bare repository.\nStage 1 git init --bare logseq-database.git\nThen, in other working Git repositories, I added a remote. For example, if my disk path is E:\\:\ngit remote add origin E:\\logseq-database.git\nThis way, I could synchronize Logseq data directly between different devices.\nStage 2 I found using a portable hard drive still inconvenient. I repurposed a 10-year-old ThinkPad, installed Gitea on it. Initially, I used Gogs, but Gogs had unfriendly handling of submodules and inexplicable bugs. Therefore, I ultimately switched to Gitea. After setting up Gitea, I migrated the original Git repository to my local machine. For example: http://gitea.local/svtter/logseq-database.git.\nStage 3 I discovered that storing and sharing large files often caused issues. I added support for git-lfs by running:\ngit lfs install\nand\ngit lfs track *.pdf\nto prevent large files from leaving too much data in the .git directory.\nStage 4 Periodically back up part of the data to GitHub. However, I generally no longer do this.\nWhen using Git for merging, one must be cautious about an issue: if a file name changes, automated merging will simply ignore it. Here’s how it typically happens: I modify a file on two devices simultaneously, and on one device, not only is the content changed, but the file name is also altered. Then, both devices perform a Git commit separately. As a result, when Git performs the merge, it won’t prompt a conflict. After Git’s automated merge, the modifications made on one of the devices will disappear.\nTo address this issue, the approach is to use the rebase method as much as possible during merging. However, rebasing is slow when merging files and requires a lot of manual handling.\nTherefore, when modifying the same file on two devices simultaneously, first pull the remote changes. Second, avoid changing file names whenever possible. Otherwise, changes may be lost.\nFortunately, Git commit history is always preserved. 
If all else fails, retrieve the lost parts from the commit records and add a new commit.\n","date":"2025-02-08T18:18:29+08:00","permalink":"https://svtter.cn/en/p/using-git-to-manage-logseq-files/","title":"Using git to manage logseq files"},{"content":"Browsing datasets can be quite troublesome, especially when the dataset is large.\nnpy (numpy array) and h5 files are two common data storage formats.\nThe drawback of h5 files is that they are prone to data corruption. I have encountered issues multiple times where h5 files could not be opened.\nnpy files have clear advantages in terms of read speed and file transfer. However, they are loaded entirely into memory at once, which can easily cause memory overflow if the server is not powerful enough.\nCommon image datasets typically separate labels and images, such as COCO. This allows you to use a file browser to view images and quickly observe their characteristics. However, in most cases, we don\u0026rsquo;t view images on a local computer but rather work with datasets on a server.\nIn 2024, when working with PyTorch, I find it more convenient to directly plot images using matplotlib. Matplotlib is generally used to display a single image, but using subplots allows you to display multiple images simultaneously. If OpenCV is used, you can overlay some label values onto the images. However, there is a drawback: if you are working on a remote server, transferring generated images can consume significant bandwidth.\nUltimately, the choice of method depends on your own judgment!\n","date":"2025-01-12T18:31:12+08:00","permalink":"https://svtter.cn/en/p/browsing-and-storing-image-datasets/","title":"Browsing and Storing Image Datasets"},{"content":"Previously, WordPress was running on CentOS 7; the performance of this machine was often underutilized, so some migration was needed to improve performance. To avoid losing relevant data, the WordPress migration work was carried out. 
This article documents the WordPress migration process.\nTo minimize the time spent on backups, I first used the WordPress plugin, All-in-one WP migration. This plugin can back up plugins, articles, themes, and other plugins.\nWhen backing up the old website, I downloaded the generated backup file.\nWhen creating the new website (via Coolify), the file upload kept failing. I wasn\u0026rsquo;t sure what was happening.\nSubsequently, I modified several upload restrictions.\nOne of them was .htaccess.\n1 2 3 4 5 php_value upload_max_filesize 200M php_value post_max_size 200M php_value memory_limit 256M php_value max_execution_time 300 php_value max_input_time 300 Another one is wp-config.php\n1 2 3 4 5 @ini_set( \u0026#39;upload_max_filesize\u0026#39; , \u0026#39;200M\u0026#39; ); @ini_set( \u0026#39;post_max_size\u0026#39;, \u0026#39;200M\u0026#39;); @ini_set( \u0026#39;memory_limit\u0026#39;, \u0026#39;256M\u0026#39; ); @ini_set( \u0026#39;max_execution_time\u0026#39;, \u0026#39;300\u0026#39; ); @ini_set( \u0026#39;max_input_time\u0026#39;, \u0026#39;300\u0026#39; ); My backup file is 199MB in size. However, despite adjusting the two files mentioned above, I found that I still couldn\u0026rsquo;t restore the backup. This left me puzzled. Through console debugging, I discovered that after the upload was completed, the server would return an HTTP 413 error. Later, I found this article and identified the issue.\nAfter troubleshooting, I realized that the problem was actually caused by Cloudflare. 
The free Cloudflare plan does not support file uploads larger than 100MB, resulting in an HTTP 413 error.\nSubsequently, I configured my local hosts file to allow the domain to directly connect to the server\u0026rsquo;s real IP address.\nFinally, it succeeded.\n","date":"2024-11-15T17:16:45+08:00","permalink":"https://svtter.cn/en/p/documenting-a-wordpress-migration/","title":"Documenting a WordPress Migration"},{"content":"This is a summary for July 2024 to September 2024.\nThis quarter passed quite quickly; it doesn\u0026rsquo;t seem long since the last review. There truly is a gap in intelligence between people; I feel like I\u0026rsquo;m a not-very-hardworking fool.\nWhen conducting a review, try not to modify the content of the journal as much as possible. Otherwise, it will be troublesome to trace back later.\nLife #life Most of my energy during this period has been focused on taking care of my pregnant wife. The child was born safely, and the mother is doing well and happy 😆. For friends preparing for pregnancy and childbirth, I recommend this book: Pregnant with My Wife. Pregnancy is quite challenging for both men and women. Influenced by hormones, emotions can often fluctuate easily, and the man needs to balance work and family well.\nFinancially, prepare sufficient funds and budget for family expenses in advance. If your job is not stable enough, I don\u0026rsquo;t recommend rushing to have children. Although the old saying goes, \u0026ldquo;Start a family before establishing a career,\u0026rdquo; starting a family doesn\u0026rsquo;t mean having children immediately. During pregnancy, the emotional needs of the woman will be higher, so be mentally prepared. It\u0026rsquo;s best to seek advice from slightly older friends or relatives. Also, pay special attention to the location for childbirth. The ideal situation is to have your own independent family. Otherwise, you might have to deal with a series of complicated matters, which can be exhausting. 
Adding work pressure on top of that makes it even more difficult. In short, if conditions aren\u0026rsquo;t right, don\u0026rsquo;t have children; don\u0026rsquo;t make things hard for yourself. If you\u0026rsquo;re unhappy, your family will likely be unhappy too.\nRead Rich Dad Poor Dad 2 but couldn\u0026rsquo;t finish it. Made a small profit in stocks and cashed out; couldn\u0026rsquo;t keep up with the market trend anymore. In investment and financial management, too many people want to make quick money, so designing strategies directly aimed at quick profits can make money. Leverage is absolutely not advisable; it\u0026rsquo;s just used to deceive gamblers.\nI wonder how the friend who borrowed money to buy stocks a couple of days ago is doing.\nResearch #Research Although a new network was constructed, it couldn\u0026rsquo;t be transformed into a publishable paper.\nExperiments and New Research New experiments verified that CRNN is still the best method in certain scenarios; formed an article specifically discussing CTC. However, it\u0026rsquo;s not yet at the level for publication.\nTried many multimodal approaches, different encoding methods, word embedding, and one-hot. Spent a lot of time learning about transformers. Completed querynet, a new structure for solving multimodal problems. However, it still hasn\u0026rsquo;t solved the problem I proposed.\nIn terms of mid-term progress, the content is insufficient. Although I\u0026rsquo;ve done many experiments now, relatively few can be solidified into theories, and the overall coherence is lacking. This is not very consistent with the original plan.\nRegarding submissions, just received feedback on the new paper SWR; the new paper has been rejected and is still in rebuttal. 
But overall, it\u0026rsquo;s still okay.\nI feel that the current research progress is fine, but there haven\u0026rsquo;t been significant achievements.\nSome Thoughts If researching a field, spend time exploring, organizing materials, and increasing understanding of the industry. Organizing thoughts #thinking Tech-related #Tech Extensively adopted functional programming Deeply explored the React framework, understood useState, useEffect, and fixed many bugs in the framework. Deeply explored react-router. Project-related #project Developed a relatively universal front-end and back-end underlying framework. Front-end based on React, back-end based on Django Ninja. After so many years of development, I\u0026rsquo;ve finally solidified something. By using this framework, many problems brought by the native Django and React frameworks can be avoided. This is a kind of account for all these years. The framework still has many areas for improvement, with the biggest help being efficiency improvement. If there\u0026rsquo;s an opportunity, it will be open-sourced. Next Quarter Adjust SWR well, submit it, and strive to get one article accepted. Find a way forward for querynet and the new research problem proposed, respectively. Launch meterhub. On top of completing the above, finish the projects at hand and then organize the graduation thesis.\n","date":"2024-10-14T11:08:56+08:00","permalink":"https://svtter.cn/en/p/2024-q3-summary/","title":"2024 Q3 Summary"}]