Posts on Svtter's Blog

OMP M3 Model Patch: Adding MiniMax M3 to pi-ai

Mon, 01 Jun 2026 12:00:00 +0800

MiniMax released M3 on 2026-06-01 (minimax/minimax-m3-20260531 on OpenRouter), but the upstream models.json shipped by @oh-my-pi/pi-ai@15.7.3 hadn’t been updated to include it. This post documents the patch I applied to add M3 support across all five provider endpoints.

Target File

1

~/.bun/install/global/node_modules/@oh-my-pi/pi-ai/src/models.json

Provider Entries Added (5)

All entries are appended at the end of their respective provider object, mirroring the structure of the existing MiniMax-M2.7 entry.

1. `minimax` (Official Anthropic-compatible API)

key: MiniMax-M3
api: anthropic-messages
baseUrl: https://api.minimax.io/anthropic
contextWindow: 204800, maxTokens: 131072
cost: input 0.3, output 1.2, cacheRead 0.06, cacheWrite 0.375
thinking: budget mode, minimal..xhigh

2. `minimax-cn` (Official Anthropic-compatible API, China)

key: MiniMax-M3
api: anthropic-messages
baseUrl: https://api.minimaxi.com/anthropic
Same context/cost/thinking as minimax

3. `minimax-code` (Coding Plan, OpenAI-compatible)

key: MiniMax-M3
api: openai-completions
baseUrl: https://api.minimax.io/v1
cost: all 0 (Coding Plan flat-rate)
compat: supportsStore=false, supportsDeveloperRole=false, supportsReasoningEffort=false, reasoningContentField=reasoning_content
thinking: effort mode, minimal..high

4. `minimax-code-cn` (Coding Plan CN)

Mirror of minimax-code with baseUrl: https://api.minimaxi.com/v1 and provider minimax-code-cn.

5. `openrouter` (OpenRouter Passthrough)

key: minimax/minimax-m3-20260531
api: openai-completions
baseUrl: https://openrouter.ai/api/v1
cost: input 0.3, output 1.2, cacheRead 0.05, cacheWrite 0
thinking: effort mode, minimal..high

Verification

Searching for "MiniMax-M3|minimax-m3" in the patched file returns exactly 5 hits — one per provider block.

Caveats

omp update will overwrite the patch. Re-apply after updates, or pin the package version.
If upstream later ships an official M3 entry, our local copy may diverge (custom pricing/context) until the next update.
Pricing values for M3 were inferred from the M2.7 template and the OpenRouter listing ($0.30 / $1.20). Confirm against the official MiniMax pricing page if cost accuracy matters.
Context window (204800) and maxTokens (131072) mirror M2.7 — adjust if M3 differs at GA.

Addendum (2026-06-02): The proper route via OMP user config

The pi-ai patch above is a hack — any omp update re-pulls the package and the patch is gone. The proper OMP way is ~/.omp/agent/models.yml: a user-level file that OMP merges on top of the built-in catalog, with no bun-global dependency, and which omp update leaves alone.

Final config

Append to ~/.omp/agent/models.yml:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


# MiniMax M3 Code Plan
# Set MINIMAX_API_KEY in ~/.zshenv first
 minimax:
 baseUrl: https://api.minimaxi.com/anthropic
 apiKey: MINIMAX_API_KEY
 api: anthropic-messages
 authHeader: true
 disableStrictTools: true
 models:
 - id: MiniMax-M3
 name: MiniMax M3
 reasoning: true
 input: [text, image]
 contextWindow: 1000000
 maxTokens: 16384
 cost:
 input: 0
 output: 0
 cacheRead: 0
 cacheWrite: 0

apiKey: MINIMAX_API_KEY follows OMP’s resolution rule: try the value as an env-var name first, then fall back to a literal. I export MINIMAX_API_KEY=$MINIMAX_CODE_PLAN_KEY in ~/.zshenv, so the key is sourced at runtime and the dotfile stays clean for git.

Key choices

Why anthropic-messages, not openai-completions: M3 speaks both protocols. The openai-completions route had two friction points:

OMP’s openai-completions transport emits developer role + reasoning_effort for reasoning models. MiniMax’s schema check is stricter than OpenAI’s, and an empty reasoning field occasionally 400s
After switching to anthropic-messages, tool calls and streaming reasoning go through the Anthropic SDK normalization path — same as kimi and claude

Why disableStrictTools: true: The Anthropic SDK sends strict: true on every tool definition by default. Third-party Anthropic-fronted gateways (MiniMax, kimi, etc.) usually don’t recognize the field and 400. The kimi provider in the same file already sets this flag. The trade-off is that tool schemas are not server-side validated, so prompts have to carry the schema discipline.

Context 1M / maxTokens 16K: contextWindow: 1000000 matches OpenRouter’s spec for minimax/minimax-m3-20260531 (M2.7 was 204800, M3 is 5× that). maxTokens: 16384 carries over from M2.7 — I couldn’t find an official M3 number. cost is all zero because the Code Plan is flat-rate.

Switching to it

1
2
3
4
5


# At launch
omp --model minimax/MiniMax-M3

# Or in the TUI
/model minimax/MiniMax-M3

After the switch, /status should show ANTHROPIC_BASE_URL pointing at api.minimaxi.com/anthropic.

How the two routes compose

Dimension	pi-ai bundled `models.json` patch	`models.yml` custom provider
Persistence	`omp update` wipes it	Persistent
Cross-machine sync	No (bun-global path)	Yes (dotfile in git)
Upgrade cost	Re-apply patch	OMP merges automatically
Merge with built-in	Yes	Yes, last-write-wins

The two compose. models.yml providers enter through OMP’s “custom” channel; whatever pi-ai later ships in its bundled list (if M3 lands upstream) enters through the “built-in” channel. When both define the same provider/model with different baseUrl, OMP’s last-write-wins rule means models.yml always wins — which is exactly what you want for a CN endpoint override.

How kimi-code Handles kimi-k2.6: A Comparison with OpenCode

Wed, 27 May 2026 10:30:00 +0800

Recently, kimi-code migrated from Python to TypeScript. Here’s a quick analysis.

Based on my review of the kimi-code source code (particularly packages/kosong/src/providers/kimi.ts, kimi-schema.ts, kimi-files.ts, etc.) and relevant OpenCode compatibility issues, here are the kimi-k2.6-specific optimizations in kimi-code and how they differ from OpenCode.

1. Native Kimi Provider (Not a Generic OpenAI-compatible Layer)

kimi-code does not treat Kimi as “just another OpenAI-compatible endpoint.” Instead, it implements a dedicated kimi provider type:

Feature	kimi-code	OpenCode
Provider Type	Dedicated `'kimi'` type with independent adapter	Accessed via generic OpenAI/Anthropic bridge
Proprietary Fields	Native handling of `reasoning_content`, `thinking`, `generationKwargs`	`reasoning_content` often lost in the bridge layer
Auth Headers	Supports `kimiRequestHeaders`, `X-Msh-Tool-Call-Id`, and other Moonshot-specific headers	Generic header forwarding

2. Full Lifecycle Handling of `reasoning_content`

kimi-k2.6 has thinking enabled by default and requires reasoning_content to be preserved across multi-turn conversation history. Otherwise, tool calls will result in a 400 error.

How kimi-code handles it:

convertMessage: Extracts internal think content parts and serializes them into the reasoning_content field, ensuring thinking content is never lost in message history
Streaming Parser: Explicitly extracts delta.reasoning_content / message.reasoning_content in both _convertStreamResponse and _convertNonStreamResponse
TUI Rendering: A dedicated ThinkingComponent renders thinking content in real time, with expand/collapse support and a spinner animation

OpenCode’s Problem:

The OpenCode Go bridge drops reasoning_content on the second turn, causing the Moonshot API to return:

1

thinking is enabled but reasoning_content is missing in assistant tool call message

3. JSON Schema Normalization (`kimi-schema.ts`)

Moonshot’s tool parameter validator has strict and unique requirements for JSON Schema. This is one of the primary sources of incompatibility between OpenCode and kimi-k2.6.

What kimi-code’s normalizeKimiToolSchema does:

Dereferences $ref: Inlines definitions from $defs / definitions, eliminating external references
Fills in missing type: The Kimi validator rejects nested property schemas that omit type (e.g., MCP-generated enum-only schemas). kimi-code infers and backfills type: string/object/array, etc.
Circular reference detection: Preserves the original $ref when a circular reference is detected, avoiding infinite recursion

OpenCode’s Problem:

Generated schemas use #/definitions/ instead of the #/$defs/ format required by Moonshot, and lack schema type inference and backfilling for Kimi, causing complex tool calls to fail with 400.

4. Native Thinking Mode Configuration System

kimi-code has built-in support for Kimi’s thinking mode from the configuration layer all the way to the UI:

Config Parsing: ThinkingConfigSchema supports mode: auto/on/off and effort: low/medium/high/xhigh/max
Model Capability Tags: ModelAlias supports capabilities: ['thinking', 'always_thinking']
Model Selector UI: Press ←→ to toggle thinking on/off; always-on models cannot be turned off

Provider Method: withThinking(effort) correctly generates:

1
2
3
4


{
 "reasoning_effort": "high",
 "extra_body": { "thinking": { "type": "enabled" } }
}

Token Budget: Automatically normalizes legacy max_tokens to Kimi’s preferred max_completion_tokens

OpenCode’s Problem:

When using the Anthropic bridge, it hardcodes thinking content blocks, but the Kimi API only supports text/image_url/video_url/video, resulting in:

1

Invalid value: thinking. Supported values are: 'text','image_url','video_url' and 'video'.

5. Native Moonshot Service Integration

kimi-code includes Moonshot-exclusive services instead of relying on generic local implementations:

MoonshotFetchURLProvider: Prioritizes Moonshot’s coding-fetch service (with built-in page text extraction), falling back to local fetch only on failure
MoonshotWebSearchProvider: Calls the Moonshot search API directly, supporting enable_page_crawling
KimiFiles: Uploads videos to the Moonshot file service, returning video_url in the ms://<file-id> format

6. Tool Call Layer Details

Built-in Functions: Tool names starting with $ are recognized as Kimi builtin functions and serialized as type: 'builtin_function'
Usage Extraction: Supports Moonshot’s proprietary choices[0].usage placement, as well as cached_tokens and other fields
Finish Reason Mapping: Maps OpenAI-style stop/tool_calls/length values to an internal unified enum

7. CLI Core and LLM SDK Architectural Isolation

This is an easily overlooked but important architectural difference.

The core CLI of kimi-code (apps/kimi-code) does not directly depend on any OpenAI or Anthropic TypeScript SDK. Looking at its package.json, the core dependencies are only generic libraries like TUI rendering (pi-tui), CLI parsing (commander), and syntax highlighting (cli-highlight). All LLM provider interactions are isolated within the self-developed kosong package.

While packages/kosong internally uses openai and @anthropic-ai/sdk as implementation details (since the Kimi API is OpenAI-compatible), it exposes a unified LLM abstraction interface to the outside. The CLI core only depends on kosong and has no awareness of underlying vendor SDKs.

OpenCode is different. Its packages/opencode core package directly depends on a large number of vendor SDKs:

@ai-sdk/openai
@ai-sdk/anthropic
@ai-sdk/google
@ai-sdk/azure
@openrouter/ai-sdk-provider
… (more than a dozen provider-specific packages in total)

This means OpenCode’s core code is deeply coupled with each vendor’s SDK, while kimi-code’s core CLI stays clean, with all model interactions fully isolated through a self-developed abstraction layer.

8. What Commit History Reveals About Evolution Paths

The structural code differences above are just a static snapshot. What’s more interesting is comparing the commit histories of the two projects—their dynamic evolution directions are completely different.

kimi-code: Native Design, Continuously Reducing Configuration Burden

842e699 — “Kimi For Coding” (Initial Commit)

This was the starting point of the entire project. The initial code already included:

packages/kosong/src/providers/kimi.ts: Dedicated Kimi provider
packages/kosong/src/providers/kimi-schema.ts: Dedicated JSON Schema normalizer
packages/kosong/src/providers/kimi-files.ts: Dedicated file upload service

Conclusion: kimi-code treated the Kimi API as a first-class citizen from day one, not as a later patch.

d95b013 fix(catalog): preserve reasoning fields in custom model (#70)

This commit fixed a very subtle issue. models.dev uses the interleaved field to mark reasoning support, but early code treated interleaved=true as undefined, causing models selected via /connect to silently lose their reasoning capability.

Fixes:

interleaved=true is mapped to the default reasoning_content
interleaved is added to the update-catalog.mjs allowlist; otherwise the offline catalog in release builds would silently drop the field again

61f7d0e fix(kosong): make openai-compatible thinking work without reasoning_key (#78)

This is the core commit for reasoning handling, showcasing kimi-code’s deep thinking on compatibility. The diff reveals a three-layer design:

Inbound Auto-Scan (response parsing)

1
2


const KNOWN_REASONING_KEYS = ['reasoning_content', 'reasoning_details', 'reasoning'] as const;
// Auto-scan three fields; first string value wins

Outbound Default Write-Back (request serialization)

1
2


const DEFAULT_OUTBOUND_REASONING_KEY = KNOWN_REASONING_KEYS[0]; // 'reasoning_content'
// Defaults to writing back as reasoning_content, no user config needed

Auto-Inject reasoning_effort (historical continuity)

1
2


// When history contains ThinkPart but caller hasn't explicitly set reasoning_effort,
// auto-inject 'medium' to prevent strict gateways like One API / DeepSeek from returning 400

Edge cases are handled meticulously: blank reasoning_key ("") is normalized to undefined; values explicitly set by the caller via withGenerationKwargs are not silently overwritten by auto-injection.

The verification goal explicitly states:

Manually verified end-to-end against the real DeepSeek API with a hand-written config.toml that does not set reasoning_key: thinking content renders, no 400, multi-turn conversations work.

OpenCode: Generic Layer Design, OpenAI-centric

eb84f46 fix(llm): split OpenAI reasoning summary blocks (#29000)

This commit demonstrates OpenCode’s completely different approach to reasoning—designed around the OpenAI Responses API:

Maintains a state machine for encrypted_content and item_reference
Folds multiple summary parts by item_id + summary_index
When store:false, filters out reasoning items lacking encrypted_content

This is completely different from Kimi’s reasoning_content mechanism. Kimi does not need encrypted_content or item_reference; it simply attaches a reasoning_content field to the message.

A Hard Fact

OpenCode Issue #26331 “Bug: OpenCode Go bridge layer incompatible with kimi-k2.6 tool calls” — Status: still open
OpenCode Issue #27054 “KIMI K2.6 showing error in Opencode GO” — Status: closed, but the resolution was to disable MCP (a workaround)

The last comment on #27054:

The workaround is to disable your MCP and then initiate the session

That’s not a fix. That’s avoiding the problem.

Commit History Comparison Summary

Dimension	kimi-code	OpenCode
Initial Design	Initial commit includes full Kimi provider + schema normalizer + file service	Generic multi-model architecture, adapted later via bridge
Reasoning Mechanism	Designed around `reasoning_content` field, with auto-scan / write-back / effort injection	Designed around OpenAI Responses’ `encrypted_content` + `item_reference`
Schema Handling	Dedicated `normalizeKimiToolSchema`, dereferences `$ref` + backfills `type`	Generic schema validation, focused on friendly error messages
Config Philosophy	Makes OpenAI-compatible gateways “zero-config” by auto-inferring all fields	Relies on users manually adapting via bridge/config
Issue Status	Continuously shipping reasoning-related patches (#70, #78)	kimi-k2.6 compatibility issue #26331 still open

Summary: Core Differences

Dimension	kimi-code	OpenCode
Architecture Positioning	Native design for Kimi/Moonshot, dedicated provider	Generic multi-model agent, adapted via bridge
Thinking/Reasoning	Native support, full lifecycle preservation of `reasoning_content`	Easily lost in bridge layer, causing 400 errors
JSON Schema	Dedicated `normalizeKimiToolSchema` for dereferencing and type backfilling	Generic schema generation, does not meet Kimi validator requirements
API Format	Directly generates Moonshot-native format (including `thinking` config, `$defs` normalization, etc.)	Transformed through OpenAI/Anthropic protocol conversion, causing format mismatches
Service Integration	Built-in Moonshot fetch/search/file services	Uses generic local tools
Core Dependencies	CLI core does not directly depend on vendor SDKs; isolated via self-developed `kosong` package	Core package directly coupled with `@ai-sdk/openai` and more than a dozen other vendor SDKs

Looking at commit history, kimi-code’s evolution is directed at continuously eliminating user configuration burden (reasoning_key went from required → optional override → auto-inferred; interleaved went from filtered → correctly mapped), while OpenCode’s evolution is directed at deepening OpenAI ecosystem integration (Responses API, encrypted reasoning, item reference), leaving Kimi adaptation stuck at the generic bridge layer.

That’s the truth at the commit level: one is native evolution, the other is a bridge gap.

RoboOmp: An AI Bot That Creates Its Own Pull Requests

Sat, 23 May 2026 18:00:00 +0800

Yesterday at the Oh My Pi (OMP) repository, I experienced something shocking: an AI bot didn’t just reply to my issue—it understood the problem, dug through the source code on its own, and opened a precise PR to fix the bug. The entire process took less than 5 minutes.

The Origin

When using OMP (a terminal AI coding agent), I discovered a UX issue: Ctrl+T can hide thinking blocks, but hiding them simultaneously turns off extended thinking entirely—not just hiding the display, but the model stops thinking altogether. Users assume they’re just “turning off the display,” but the actual effect is “turning off the brain.”

So I went to the OMP GitHub repository and opened a feature request: #1313.

RoboOmp’s First Response

Seconds after I submitted the issue, a bot called roboomp automatically replied. Not with template nonsense like “thanks for your feedback, forwarded to the product team.” It directly told me:

Most of this feature already exists—the hideThinkingBlock setting, Ctrl+T shortcut, and rendering path
The only missing piece is a CLI startup parameter
There’s a design decision that requires maintainer input: the coupling between hideThinkingBlock and hideThinkingSummary

And it provided exact filenames and line numbers: settings-schema.ts:663, input-controller.ts:755, stream.ts:583,697.

This wasn’t cobbled together from search results—it actually read the code.

I Pointed Out the Design Flaw

I replied with a comment explaining that this coupling is a footgun:

Users press Ctrl+T intending to reduce visual noise, but unknowingly turn off extended thinking, degrading model output quality
“Don’t want to see the reasoning process” and “don’t want the model to reason” are two different things that shouldn’t be tied together
The behavior varies across providers (MiniMax can’t turn it off, Anthropic/OpenAI can), so the same shortcut has inconsistent behavior

I also included the commit history that introduced this coupling for easier tracing.

It Opened a PR Itself

Then something unbelievable happened—roboomp replied with two consecutive comments and directly opened a PR: #1314.

The PR changes: 0 addition, 3 deletion. It only deleted three lines:

sdk.ts:1860 — agent initialization no longer assigns hideThinkingBlock to hideThinkingSummary
input-controller.ts:758 — Ctrl+T handler no longer links them
selector-controller.ts:273 — settings UI follows the same logic

The PR description included complete repro steps, root cause analysis, and fix approach. It even confirmed the commit archaeology I provided—45bd444 was indeed the commit that introduced this bug.

Why This Shocked Me

“AI can write code” isn’t news. Copilot, Claude Code, Cursor can all write code. But what’s different this time:

Complete Closed Loop

The entire process was zero-human:

I opened an issue → bot read the codebase, provided existing implementation status
I pointed out the design flaw → bot understood my point
It located the commit that introduced the bug itself, opened a PR that deletes just 3 lines

From issue to PR, no human did anything in between.

It Knows When to Wait

In its first reply, it said “Holding on implementation until a maintainer weighs in on the coupling question”—it knew this was a design decision requiring judgment and shouldn’t act autonomously. But when I clarified the coupling problem, it determined that waiting was no longer necessary and opened a PR directly.

The Fix Was Minimal

0 addition / 3 deletion. It understood what the minimal fix was—no refactoring, no abstraction, no gold-plating. Many human developers can’t do this.

What Is RoboOmp

RoboOmp is an AI bot deployed by can1357, the OMP repository maintainer. It’s not a GitHub Actions workflow (I checked the CI config to confirm), but an independent server-side agent:

Listens to GitHub Webhook events (issue creation, comments, etc.)
Reads source code through GitHub API, understands code structure
Uses LLM to analyze context, autonomously decides next steps—comment, label, open PR

From can1357’s GitHub profile, this person comes from a hypervisor/reverse engineering background (ByePg, NoVmp, NtRays), now working on AI agent platforms (agentx, hindsight). RoboOmp is likely the result of building exceptionally deep code understanding capabilities.

This project is not open source.

Are There Similar Open Source Projects

I looked around, and currently the closest ones are:

Project	Description
optio (962⭐)	AI coding agent workflow orchestration, task → merged PR
claude-code-github-agent	Hooks 40+ GitHub events, auto triage/review/fix, architecture most similar to roboomp
software-factory	Issue/PR-driven automatic development system

But honestly, none reach roboomp’s level. Most are still at the “receive webhook → call LLM → post comment” stage. RoboOmp is the first I’ve seen that can autonomously read source code, understand code structure, participate in design discussions, and make precise fixes.

What This Means

This made me realize that the capability boundaries of AI coding agents are expanding rapidly. A year ago we were discussing “can AI write correct code,” now the question is “can AI be a maintainer in open source communities.”

The capabilities roboomp demonstrated—reading code, understanding context, participating in discussions, making minimal fixes—are essentially what a junior maintainer does. If this capability continues to improve, the maintenance model of open source projects could undergo fundamental changes.

Think about it: what does an open source maintainer spend the most time on every day? Replying to issues, triaging bugs, writing small fixes. These are exactly what roboomp excels at. If every open source project could deploy such a bot, maintainers could focus their time on architectural decisions and community building.

Of course, current limitations are obvious—it can only handle problems with clear boundaries and well-defined scope. But this experience makes me believe that “AI maintainer” is not a distant future scenario, but something happening right now.

OpenCode's GitHub Actions Automation System: Engineering Practices Behind 27 Workflows

Fri, 22 May 2026 10:00:00 +0800

opencode is a 160k-star AI coding tool with 27 workflow files in its .github/workflows/ directory. This number is not uncommon for open source projects, but what’s truly interesting is not the quantity, but the scope these workflows cover: from conventional CI/CD to AI-driven community governance, they’ve done almost everything GitHub Actions can do.

This article analyzes the design of these workflows by category, discusses the pros and cons of this level of automation, and shares insights for our own projects.

Overview

The 27 workflows can be divided into four categories:

Category	Count	Purpose
CI/Testing	4	typecheck, unit tests, e2e, Nix builds
Release/Delivery	5	CLI release, container builds, VS Code extension, GitHub Action release
Automation/Bot	16	issue governance, PR compliance, AI code review, documentation updates
Docs/Other	2	statistics, Discord notifications

16 automation workflows account for 60% of the total. opencode doesn’t just use Actions to run tests and releases—it also entrusts community governance and code quality review to the automation system.

CI/Testing: Solid but Restrained

Four testing-related workflows:

typecheck.yml — Runs bun typecheck on PR and push to dev. Simple and direct, no unnecessary actions.

test.yml — Cross-platform test matrix (Linux + Windows), runs unit tests and Playwright e2e. Has concurrency control where new commits in the same PR cancel old runs. Test results generate JUnit reports uploaded as artifacts.

nix-eval.yml — Verifies Nix flake builds on four architectures (x86_64-linux, aarch64-linux, x86_64-darwin, aarch64-darwin). Mandatory package failures block the build, optional package failures are just warnings.

storybook.yml — Storybook builds for UI components, only triggered when storybook/ui-related files change. Path triggering avoids unnecessary runs.

Several noteworthy design choices:

concurrency group + cancel-in-progress: Multiple workflows use this pattern so the same PR doesn’t stack multiple runs. For a project receiving lots of community PRs, this saves significant CI resources.
Path triggering: containers.yml only runs when container files change, storybook.yml only runs when UI changes. Not everything runs on all commits.
Mixed Runner Strategy: Most workflows use Blacksmith’s third-party hosted runners (blacksmith-4vcpu-ubuntu-2404, blacksmith-4vcpu-windows-2025). Blacksmith is a GitHub Actions API-compatible accelerated runner service using custom infrastructure, significantly faster than GitHub’s free runners. Only lightweight bot tasks (close-issues, close-prs, compliance-close, pr-standards, deploy) stay on GitHub’s native ubuntu-latest. Compute-intensive compilation, testing, and releases all go through Blacksmith, simple script tasks use GitHub’s native runners, allocating resources by task load.

Release/Delivery: Full Platform Coverage

publish.yml is the most complex workflow, handling the complete release process in a single file:

Version number calculation
CLI build matrix (multi-platform, multi-architecture)
Windows code signing (Azure Signing)
macOS code signing (Apple Developer)
Electron app builds
npm publishing
GitHub Release creation
AUR (Arch Linux) publishing

One workflow covers distribution for CLI, desktop apps, npm packages, and Linux packages. This “release everywhere at once” pattern is user-friendly—regardless of platform, everyone gets the new version on the same day.

Other release workflows are split by artifact type:

publish-github-action.yml — Listens for github-v* tags, publishes GitHub Action to Marketplace
publish-vscode.yml — Listens for vscode-v* tags, publishes to both VS Code Marketplace and Open VSX
containers.yml — Multi-architecture container image builds, pushes to GHCR
release-github-action.yml — Creates pre-releases when github directory changes on dev branch

Tag triggering is a good practice: releases are explicit actions, not triggered by accidental code pushes. publish.yml automatically builds snapshots when pushing to ci/dev/beta/fix branches, but official releases require manual dispatch or tags.

Automation/Bot: AI-Driven Community Governance

This is opencode’s most distinctive feature. Among the 16 automation workflows, multiple directly call upon opencode’s own AI capabilities to handle community affairs.

Issue Management

triage.yml — When a new issue is created, opencode AI automatically triages it, adding labels and categories.

duplicate-issues.yml — When a new issue is created/edited, opencode AI analyzes whether it duplicates existing issues. Also checks whether it follows one of three issue templates and whether it contains AI-generated content. Non-compliant issues get a needs:compliance label.

compliance-close.yml — Every 30 minutes, checks issues/PRs with needs:compliance label and auto-closes if not fixed within 2 hours. Different prompt messages are given for issues vs PRs when closing.

close-issues.yml — Closes stale issues daily at 2 AM UTC.

These four layers form complete issue lifecycle management:

1

New issue → AI triage → duplicate/compliance check → compliance grace period → stale cleanup

PR Management

pr-standards.yml is one of the longest workflows, doing two things:

Title format check: Enforces conventional commits format (feat/fix/refactor/…)
Template compliance check: PR description must include required sections like issue references, change type, verification method

Non-compliant PRs get a needs:compliance label and auto-close after 2 hours. Team members and bots are exempt.

pr-management.yml — Checks for duplicates when PR is created, adds labels for community contributors.

close-prs.yml — Closes PRs older than 1 month with insufficient reactions daily at 10 PM UTC. Default threshold is 2 reactions, configurable.

AI Code Review

review.yml — Input /review in PR comments, opencode AI analyzes code and leaves review comments on specific lines. Only available to repo owner/members.

opencode.yml — Input /oc or /opencode in issue or PR comments to trigger opencode AI for more general interactions.

These two workflows demonstrate the “AI as collaborator” approach: not fully automatic code review, but on-demand triggering with humans making final decisions in the loop.

Documentation & Maintenance

docs-update.yml — Every 12 hours, checks recent commits and uses opencode AI to determine if documentation needs updates.

generate.yml — Runs code generation scripts when pushing to dev, auto-commits changes.

beta.yml — Syncs beta branch hourly.

stats.yml — Updates download statistics to STATS.md daily.

Design Patterns Worth Adopting

1. Layered Governance

opencode doesn’t stuff all automation into one workflow, but splits it by responsibility. An issue goes through four workflows in relay from creation to closure. Each workflow does one thing, combining to form a complete governance chain.

Benefits of this design:

Individual workflows can be modified or disabled independently without affecting other steps
Each workflow’s trigger conditions and permission scope are minimized
Easy to locate which step has problems when they occur

2. Compliance Grace Period

compliance-close.yml doesn’t close immediately upon detecting non-compliance, but gives a 2-hour grace period. This is reasonable for global contributors in different time zones—you might submit an issue while sleeping, and wake up with time to fix it.

3. AI at Decision Points, Not Execution Points

triage, duplicate detection, and code review all have AI make initial assessments, with humans making final decisions. But execution-level tasks like code builds and releases don’t use AI at all. This is a pragmatic division: AI excels at pattern recognition and initial classification, but not precise execution.

4. Explicit vs Automatic Triggers

Releases use tag triggers, maintenance uses schedule triggers, governance uses event triggers. Three trigger types correspond to three different automation trust levels: releases need human confirmation, maintenance can be scheduled automatic, governance needs immediate response.

Risks of Over-Automation

opencode’s automation system is comprehensive, but there are points to watch:

Community barrier: New contributors submitting issues must follow specific templates, PRs must conform to conventional commits, otherwise auto-closed after 2 hours. For a 160k-star project, this strictness is reasonable—it filters out many low-quality contributions. But for small projects, this level of automation would scare away potential contributors.

Maintenance cost: 27 workflows means 27 automation scripts to maintain. opencode has custom runners and dedicated scripts. If a workflow’s logic needs adjustment, maintainers need to switch between GitHub Actions YAML and custom scripts.

AI uncertainty: duplicate-issues and triage use AI for judgment, but AI can misjudge. A reasonable issue marked as duplicate and closed creates a negative experience for contributors. opencode uses grace periods and manual review to mitigate this, but the risk remains.

Insights for Our Projects

Not every project needs 27 workflows. But opencode’s layered governance and “AI at decision points” approach are worth referencing:

Start with issue templates: If the project starts receiving lots of duplicate or low-quality issues, add templates and duplicate checking first, rather than manually handling each one.
Use grace periods for compliance checks: Always give a grace period when auto-closing non-compliant contributions.
Use AI for classification, not execution: Let AI help triage issues and check PR formats, but don’t let AI auto-merge code or publish releases.
Use tag triggers for releases: This is the safest approach. Automatic snapshot releases are acceptable, official versions need human confirmation.
Add on demand: Add automation only when you have pain points. opencode’s 27 workflows weren’t built in a day, but gradually added as community scale grew.

Summary

opencode’s GitHub Actions system demonstrates automation practices for large-scale open source projects: CI/CD covers full platform releases, community governance uses multi-workflow relay processing, AI is applied to decision points like triage and review. The core of this system is not technical complexity, but three principles: “layered, grace periods, explicit triggers”. For our own projects, we don’t need to copy all 27 workflows, but these principles can be directly applied.

OpenCode Optimization Beyond Configuration — Plugin-Based Optimization

Tue, 19 May 2026 10:00:00 +0800

I previously wrote an article OpenCode Configuration Optimization Record, which addressed token consumption and context management issues. However, configuration optimization handles “how the model runs,” while “the quality of code when it’s half-written” is something configuration cannot manage. This article starts from my development process of the opencode-review plugin, discussing how opencode-review helps an agent review and improve its own code within a session, resulting in higher quality code entering the PR.

Problem: Who Guards Code Quality Within a Session?

When using OpenCode to write code, a typical workflow is: the agent completes coding within a session, then I review the diff and create a PR. But I discovered a recurring problem: code written by agents often enters PRs with “first draft” quality issues.

These issues include: missing error handling, security vulnerabilities, poorly performing queries, and missing tests. If the agent could perform a self-review within the session—before the code is committed to the PR—many problems wouldn’t exist at the PR stage.

This is different from code review at the CI stage. I’ve already implemented CI review through opencode-actions (I previously wrote an introductory article)—it happens after PR creation, triggered by GitHub Actions. Later, Cloudflare also shared similar ideas in their engineering blog: using OpenCode to build large-scale AI code review. opencode-review aims to solve an earlier stage: within the session, before the PR, enabling the agent to proactively review and fix issues after writing code. The two complement each other: opencode-review raises the quality baseline of code entering the PR, while opencode-actions serves as the final checkpoint.

Specifically, there are three sub-problems to address:

Incomplete review coverage: Code generated by agents may introduce security vulnerabilities and performance issues, but they won’t proactively check for these
Lack of systematic review framework: Without structured dimensions to evaluate code, it’s easy to focus only on functional correctness while ignoring security and performance
Lack of closed loop between issue discovery and fixes: Even when the agent discovers problems, a mechanism is needed to automatically fix them rather than waiting for someone to point them out

Design of opencode-review

Based on these three problems, I designed opencode-review: a structured code review plugin.

Multi-Dimensional Analysis

The first design decision is why divide into five dimensions rather than a general “good or bad” evaluation.

Code quality is not a single dimension. A piece of code may be functionally correct and performant, but contain SQL injection vulnerabilities; or it may be secure and harmless, but lack test coverage. Evaluating them together inevitably leads to vague results.

Academically, the Modern Code Review (MCR) Survey collected code review research from 2013-2025, proposing a classification system covering multiple task dimensions including defect detection, security review, performance analysis, and maintainability assessment. Ericsson’s research team also verified in Automated Code Review Using Large Language Models at Ericsson that dimension-specific review is more effective in industrial scenarios than general review.

opencode-review’s five dimensions—code-quality, security, performance, testing, documentation—correspond to the core review dimensions identified in these studies. Each dimension can be independently toggled because different projects focus on different priorities: an internal tool may not need documentation review, but a security-sensitive service cannot skip the security dimension.

Severity Grading

The second design decision is why divide into three severity levels (critical / suggestion / highlight).

This comes from lessons learned in the static analysis tool domain. Security tools and linters have long faced a problem: alert fatigue. When all issues are marked as equally important, developers start ignoring them. Veracode’s research points out that the direct consequence of alert fatigue is that truly serious issues get drowned out in noise.

The logic of three levels is:

critical: Must fix (security vulnerabilities, logic errors, resource leaks)
suggestion: Suggested improvements (code readability, performance optimization, better practices)
highlight: Worth noting (style consistency, potential improvement space)

This way developers can prioritize handling critical issues without missing a SQL injection among a bunch of “consider refactoring” suggestions.

Auto-Fix Chain

The third design decision is why critical issues should automatically trigger fixes rather than just being reported.

This is a controversial design. Traditional review tools typically “report but don’t fix,” leaving fixes to developers. But opencode-review’s scenario is different—the code it reviews is itself just written by an AI agent, so having another agent fix it is reasonable.

Academically, this belongs to the Automated Program Repair (APR) domain. A Survey of LLM-based Automated Program Repair (arXiv 2506.23749) reviewed 63 LLM-based APR systems from 2022-2025, divided into four paradigms. Among them, the “analysis-augmented” paradigm—using static analysis to locate problems first, then using LLMs to generate fixes—was proven most effective. opencode-review’s auto-fix chain is essentially this paradigm: reviewer discovers critical issue → locates problem position → spawns fixer sub-agent → generates minimal fix.

An ICSE 2025 paper also points out that a key challenge for LLMs in APR is objective alignment—the goal of fixing is not “generate code that looks reasonable,” but “precisely solve the reported problem.” This is why opencode-review’s fixer is designed as minimal fix—making only the minimal modifications to solve the problem, no rewriting, no refactoring, no “convenient” other changes.

Hidden Benefit of Auto-Review: Continuous Improvement of Code Quality Baseline

The three designs above solve “discovering problems” and “fixing problems.” But auto-review has an easily overlooked benefit: it continuously raises the baseline of code quality inadvertently.

This effect comes from two mechanisms:

First, the shaping of code writers by review feedback. FSE 2022 research found in two years of industrial practice that when developers know their code will be automatically reviewed, they consciously follow standards more during the coding phase—because the cost of being pointed out afterward becomes lower, and the benefit of writing well upfront becomes higher. This is a nudge effect. In the AI agent scenario, this effect is stronger: the agent writes code in a session, gets reviewed and pointed out issues, fixes them, gets reviewed again—this cycle can complete multiple rounds within the same session. Each round of feedback corrects the agent’s output tendency, equivalent to an implicit fine-tuning process.

Second, direct quality accumulation from automatic fixes. Critical issues being automatically fixed means the code quality of each commit is higher than without review. This isn’t a one-time improvement, but continuous. Like lint rules in a codebase—at first they only prohibit obvious errors, but as rules accumulate, the overall style and quality of the codebase is unconsciously raised. The auto-fix chain does something similar: security vulnerabilities are automatically patched, resource leaks are automatically fixed, missing tests are automatically added. Over time, the codebase’s quality baseline naturally becomes higher than without auto-review.

Simply put: review is not the goal, quality improvement is. Auto-review turns “post-hoc inspection” into “in-process improvement.”

Cooldown Mechanism

There’s one more design detail: cooldown_seconds.

auto-review triggers when the session is idle, but idle events can trigger frequently (for example, when the agent is waiting for user confirmation, it also idles). Without cooldown, the same code might be reviewed several times, wasting tokens. The default 120-second cooldown period is an empirical value—enough for one round of modifications to complete, without waiting too long.

opencode-froggy: Another Approach

opencode-froggy (85 Stars, just released 0.12.0 yesterday) provides another approach. It doesn’t do structured multi-dimensional review, but instead provides 6 specialized agents (architect, code-reviewer, code-simplifier, doc-writer, partner, rubber-duck) and a flexible hooks system.

Froggy’s code-reviewer is a general read-only review agent that doesn’t distinguish dimensions or severity. But its hooks system is strong—you can configure session.idle events to automatically run lint, auto-format, or even intercept when writing sensitive files:

1
2
3
4
5
6
7
8


---
hooks:
 - event: session.idle
 conditions: [hasCodeChange, isMainSession]
 actions:
 - bash: "npm run lint --fix"
 - command: simplify-changes
---

This is a “developer orchestrates the workflow” approach, complementing opencode-review’s “out-of-the-box structured review.”

Comparison

	opencode-review	opencode-froggy
Review method	Structured multi-dimensional analysis	General code-reviewer agent
Severity grading	critical / suggestion / highlight	None
Auto-fix	critical issue → fixer sub-agent	code-simplifier, manual trigger
Trigger method	session idle + cooldown	hooks configuration
Custom rules	custom_rules supports project norms	None
Other features	None	6 agents + hooks + gitingest + blockchain

The two don’t conflict and can be installed together. My suggestion is: opencode-review for daily auto-review, froggy’s hooks for workflow orchestration.

Plugin Installation

The two plugins have different installation methods.

opencode-froggy supports direct installation via npm, just add to opencode.json:

1
2
3


{
 "plugin": ["opencode-froggy"]
}

opencode-review currently doesn’t have npm installation available yet, requires cloning and local linking:

1
2
3
4
5
6
7
8
9


# Clone to any location
git clone https://github.com/sun-praise/opencode-review.git /path/to/opencode-review

# Project-level installation (recommended)
mkdir -p .opencode/plugins
ln -s /path/to/opencode-review/src/index.ts .opencode/plugins/opencode-review.ts

# Or global installation
ln -s /path/to/opencode-review/src/index.ts ~/.config/opencode/plugins/opencode-review.ts

opencode-review also needs to create .opencode/review.json to configure review behavior:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


{
 "language": "zh",
 "dimensions": ["code-quality", "security", "performance", "testing", "documentation"],
 "trigger": {
 "auto_on_idle": true,
 "cooldown_seconds": 120
 },
 "custom_rules": [
 "All API endpoints must have error handling",
 "Database queries must use parameterized statements"
 ]
}

Other Notable Plugins

The ecosystem already has over 70 plugins, here are a few more recommendations:

opencode-worktree: Zero-friction git worktree management
opencode-notify: Send system notifications when tasks complete
dynamic-context-pruning: Automatically prune outdated tool outputs, optimizing token usage
envsitter-guard: Prevent agents from reading .env sensitive files

See the complete list at awesome-opencode.

References

Modern Code Review (MCR) Survey — 2013-2025 code review research survey
Automated Code Review Using LLMs at Ericsson — Industrial practice of LLM-assisted code review
A Survey of LLM-based Automated Program Repair — LLM auto-fix survey, covering 63 systems
Aligning the Objective of LLM-Based Program Repair (ICSE 2025) — Objective alignment issues in LLM fixing
Understanding Automated Code Review Process (FSE 2022) — Two years of industrial environment auto-review experience
AI-Assisted Assessment in Modern Code Review (AIware 2024) — Deployment and evaluation of AutoCommenter
Code Review Agent Benchmark (c-CRAB) — AI agent code review benchmark
opencode-actions - a coding review agent — GitHub Action built on OpenCode, code review at CI stage
Cloudflare: Orchestrating AI Code Review at Scale — Cloudflare using OpenCode to build large-scale AI review

A New Blog Format

Fri, 15 May 2026 15:59:17 +0800

Recently, I explored the development trajectory of DeepSeek and compiled a DeepSeek Technical Evolution. After reviewing it, I found it to be a valuable document worth publishing. However, converting it directly to markdown would result in the loss of much information and detail. Therefore, I tried Hugo’s {{ .raw }} mode and was pleasantly surprised to find that it works well while still supporting SEO.

I’ll be publishing more articles like this in the future. These articles are co-created with AI, and the reading experience is somewhat better than plain text.

I typically store HTML files in static-html for unified management. I recommend you give it a try. The introduction article is here sth: An HTML Preview Server for AI Agents.

sth: An HTML Preview Server for AI Agents

Sat, 09 May 2026 12:00:00 +0800

I’ve open sourced a small tool: static-html, with the command-line name sth.

What it does is simple: it provides an HTTP service that lets you register locally generated HTML files and preview them in a browser.

Why This Tool Is Needed

The problem stems from AI Agent output.

Nowadays I use agents like Claude Code and OpenCode for my work, and they often need to output complex content—code review summaries, comparative analyses, quotations, architecture design documents. When this content is sent to Telegram as plain text, the formatting gets completely messed up, tables become unreadable, and code syntax highlighting is lost.

In short, it’s just a big mess.

The initial approach was to have agents directly generate HTML files locally and open them in a browser. But the problems were:

The agent runs on a server without a graphical interface
Locally generated file paths are unpredictable and management is chaotic
No history—previously sent content can’t be found

So I needed a service where an agent could “send” an HTML file and get back a URL that could be opened in any device’s browser. The agent would handle mobile and PC compatibility.

What sth Does

sth is a lightweight HTTP service written in Go with just two core commands:

1
2
3
4
5


# Start the service
sth start

# Send an HTML file
sth send ./report.html

sth send packages the target HTML file along with resource files from the same directory (CSS, JS, images, etc.) and uploads them, then returns a URL. Opening this URL displays the complete page effect.

In practice, it runs on my intranet development machine, and agents specify the remote address via the --server parameter:

1

sth send ./report.html --server http://dev-1:3939

My Actual Usage

Currently sth mainly runs on my development server, working in tandem with the Hermes Agent.

Hermes is my daily AI assistant running on Telegram. When it needs to output complex content—such as code review conclusions, technical solution comparisons, project quotations—it calls the html-report skill to generate a beautifully formatted HTML file, then sends it to the preview server via sth send, and finally sends me the URL.

The entire workflow is:

1
2
3
4


User question -> Hermes Agent analysis
 -> Generate HTML report (html-report skill)
 -> sth send to preview server
 -> Return URL -> Send to Telegram

This way I can tap the link on my phone and see a well-formatted report instead of a blob of plain text.

Metadata Management

Beyond basic sending and previewing, sth also supports tagging, categorizing, and associating sessions with projects:

1
2
3
4
5


sth tag <session-id> code-review pricing
sth categorize <session-id> "Technical Review"
sth project <session-id> hydrogen-permeation
sth list --project hydrogen-permeation
sth search "quotation" --tag pricing

This feature solves a practical problem: over time, sent reports accumulate. Through tags and project categorization, you can quickly find previous outputs.

The difference between list and search is: list matches metadata fields exactly, while search performs full-text search. They can be used in combination.

Technical Details

Language: Go 1.24+
Storage: SQLite (github.com/mattn/go-sqlite3, requires CGO)
Deployment: Single binary file, just manage with systemd
Build: go build -o dist/sth ./cmd/html-server

It’s just that simple, no unnecessary dependencies.

Open Source

This tool was previously a private repo, but I just made it public today: sun-praise/static-html.

If you’re also using AI Agents for daily development work and have encountered the problem where “complex agent output can’t be read in chat tools,” give sth a try. It’s lightweight enough and does what it needs to do.

DeepSeek + Claude Code: Thinking Block Compatibility Analysis

Thu, 30 Apr 2026 15:00:00 +0800

Problem Description

When using DeepSeek models (such as deepseek-v4-flash) directly in Claude Code with extended thinking enabled, multi-turn conversations trigger a 400 error:

1

Bad Request: {"error":{"message":"The content[].thinking in the thinking mode must be passed back to the API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

Root Cause Analysis

Call Chain

1

Claude Code → DeepSeek Anthropic Compatible Endpoint (https://api.deepseek.com/anthropic)

Protocol Incompatibility

According to the DeepSeek Anthropic API Compatibility Documentation, the compatibility status is as follows:

Message Field	Support Status
`content[].thinking`	✅ Supported
`content[].redacted_thinking`	❌ Not Supported

In extended thinking mode during multi-turn conversations, Claude Code faithfully passes back all thinking blocks from the previous round (including redacted_thinking types) to the API as-is. DeepSeek does not recognize redacted_thinking, hence the 400 error.

Additionally, DeepSeek’s thinking block format differs from Anthropic’s native protocol, and the replay logic in tool_use scenarios is not fully compatible either.

Core Conflict

Anthropic API requirement: In extended thinking mode, content[].thinking and content[].redacted_thinking must be passed back unchanged
DeepSeek compatibility layer: Only supports thinking, does not support redacted_thinking
Claude Code behavior: Hard-coded according to Anthropic protocol, does not distinguish between target endpoint types

Community Feedback

This is a widespread community issue that almost all CC agent/router projects have encountered:

Issue	Project	Title
#1	cc-use	DeepSeek Thinking Mode Error: `content[].thinking` Must Be Passed Back
#878	openclaude	DeepSeek V4: reasoning_content must be passed back (400) on tool_calls
#1355	claude-code-router	CCR 代理 deepseek V4 思考时返回 400
#4543	new-api	ClaudeCode 接入 DeepSeek V4 遇到 400 reasoning_content 报错
#355	9router	DeepSeek API Error 400 – Missing reasoning_content
#16748	hermes-agent	DeepSeek /anthropic: stripped thinking blocks cause HTTP 400 on replay
#2414	cc-switch	Claude 使用 cc-switch 配置 deepseek-v4-pro，无法识别字段
#174	cc-haha	/compact 命令在使用 DeepSeek API 时无法工作

DeepSeek Official Response

Zero response. Nor is there any need to respond.

First, DeepSeek has no public API issue repository. All feedback occurs in third-party projects without any DeepSeek official personnel participating in any discussions.
Second, whether to use Anthropic as a compatibility standard, I think DeepSeek should be hesitant.

Temporary Workarounds

Disable extended thinking — When using DeepSeek in CC, turn off thinking mode
Use proxy filtering — Add a proxy layer between CC and DeepSeek to filter out redacted_thinking blocks
Switch models — Use DeepSeek for non-thinking scenarios and Anthropic native models for thinking scenarios

Why Doesn’t OpenCode Have This Problem?

OpenCode (opencode-ai/opencode) naturally avoids this problem architecturally, not through a dedicated “fix”.

The key lies in the convertMessages method in internal/llm/provider/anthropic.go (lines 60-119):

When building assistant messages, it only passes back TextContent (text) and ToolCall (tool calls)
Completely ignores ReasoningContent (thinking content), not putting it in messages
thinking content is only displayed in the UI through stream thinking_delta events and is not passed back to the API

Comparison with Claude Code’s behavior:

	Claude Code	OpenCode
thinking replay	✅ Faithfully replay all thinking blocks (including redacted_thinking)	❌ Do not replay thinking blocks
architectural reason	Follow Anthropic API specification, requires unchanged replay	Self-managed conversation state, thinking only for UI display
DeepSeek compatibility	❌ Triggers 400 (redacted_thinking not recognized)	✅ Not affected (doesn’t pass thinking at all)

Conclusion: OpenCode avoids the problem at the cost of not following Anthropic’s extended thinking specification. This approach is friendly to third-party compatible endpoints like DeepSeek, but if Anthropic native thinking context retention capability is needed in the future, re-implementation may be necessary.

Does Not Replay Thinking Blocks Affect DeepSeek Performance?

Basically no, reasons:

thinking blocks are the model’s internal scratchpad, not final output. The text replies and tool calls in the conversation history already retain key decisions and conclusions
DeepSeek’s reasoning is closer to OpenAI’s mode — each round is generated independently, unlike Anthropic’s strong reliance on cross-round replay to maintain reasoning coherence
OpenCode’s extensive actual use also confirms this — community users run multi-turn conversations using DeepSeek thinking mode in OpenCode without feedback about reasoning quality degradation

The truly potentially affected extreme scenario: in ultra-long multi-turn tasks, the model may repeat conclusions it has already reasoned through. However, in most actual use, the impact is negligible.

CC itself has similar thinking block replay bugs on Anthropic models (not DeepSeek-specific):

Issue	Title	Status
#10199	API Error 400 - Thinking Block Modification Error	Open (oncall)
#51985	thinking block missing in multi-turn conversations	Open
#20692	thinking blocks order error on first tool use	Open (oncall)
#54482	Thinking blocks stripped from context every turn (Opus 4.7)	Open

How to Fix DeepSeek Model Reasoning Issues in OpenCode

Fri, 24 Apr 2026 12:23:58 +0800

When using deepseek-reasoner, we often encounter this problem:

1

The reasoning_content' in the thinking mode must be passed back to the API.

Update

Both issues have now been officially resolved by opencode. Users only need to install the latest version of opencode and use it through the deepseek provider, without additional configuration.

1
2
3
4
5
6


Issue 1
The reasoning_content' in the thinking mode must be passed back to the API.

Issue 2
Bad Request: {"error":{"message":"The content[].thinking in the thinking mode must be passed back to the
API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

Both issues have been officially resolved. Install version 1.14.29 or above.

The old solution follows:

How to solve it? It’s straightforward.

How to Configure

Add provider information to your configuration:

.config/opencode/opencode.json or .config/opencode/opencode.jsonc

Modify the provider section to:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40


{
 "provider": {
 "deepseek": {
 "npm": "@ai-sdk/anthropic",
 "name": "DeepSeek",
 "options": {
 "baseURL": "https://api.deepseek.com/anthropic",
 "apiKey": "<apikey>"
 },
 "models": {
 "deepseek-v4-pro": {
 "name": "DeepSeek-V4-Pro",
 "limit": {
 "context": 1048576,
 "output": 262144
 },
 "options": {
 "thinking": {
 "type": "enabled",
 "budgetTokens": 8192
 }
 }
 },
 "deepseek-v4-flash": {
 "name": "DeepSeek-V4-Flash",
 "limit": {
 "context": 1048576,
 "output": 262144
 },
 "options": {
 "thinking": {
 "type": "enabled",
 "budgetTokens": 8192
 }
 }
 }
 }
 }
 }
}

How to Use

Select the deepseek model.

The result.

Supplement

This method cannot solve this problem

Bad Request: {"error":{"message":"The content[].thinking in the thinking mode must be passed back to the API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

If you encounter this problem, you need to wait for opencode to fix it.

Related article: DeepSeek + Claude Code: Thinking Block Compatibility Issue Analysis — Analyzes the root cause of 400 errors triggered by multi-turn conversations in extended thinking mode when using DeepSeek with Claude Code, along with community solutions.

opencode-actions - A Code Review Agent

Thu, 23 Apr 2026 11:36:34 +0800

To make it easier to integrate opencode for code review, I built a GitHub Action repository. Working with opencode to implement this was straightforward.

Currently, it provides two main features: one is review, and the other is using the runner to execute opencode (directly running opencode’s prompts on the runner) to handle other functionalities. For example, modifying code, creating new issues, creating PRs based on issues, etc.

How stable is it?

This repository has been validated across multiple projects, and the release version is reliable.
However, note that the main branch version is a rapidly iterating version.

How to integrate?

Add the following to your .github/workflows/opencode-review.yml:

1
2
3
4
5
6
7
8


- name: Run OpenCode review
 uses: sun-praise/opencode-actions/review@v1
 with:
 github-token: ${{ secrets.GITHUB_TOKEN }}

 # only one is enough.
 zhipu-api-key: ${{ secrets.ZHIPU_API_KEY }}
 opencode-go-api-key: ${{ secrets.OPENCODE_GO_API_KEY }}

Currently, this action mainly supports Z.AI, ZHIPU, and OPENCODE GO subscriptions. Therefore, if using ZHIPU, simply add your ZHIPU_API_KEY to the project’s secrets. If using the opencode go subscription, you need to add OPENCODE_GO_API_KEY.

Everything else can use the default configuration. The default model is zhipuai-coding-plan/glm-5-turbo. For more configuration requirements, I recommend checking the original repository’s README.

I previously covered this quick review script in my code review article.

Differences

Actually, opencode has its own actions, so why did I build another one?

The differences from the official version are mainly reflected in several aspects:

Feature	Upstream Status	This Repository
Model default fallback	Only required input	Three-level fallback (input → MODEL_NAME → hardcoded default)
Provider convenience fields	None	`zhipu-api-key`, `opencode-go-api-key`, etc.
Review prompt template	None	Chinese-formatted review (mergeable/conditionally mergeable/not mergeable)
Retry logic	None	attempts / retry-profile / retry-on-regex / retry-delay
Execution timeout	None	timeout-seconds
Version check	None	OPENCODE_MIN_VERSION
Installation retry	None	install-attempts
XDG cache caching	Only caches bin	Caches both bin + XDG cache

Detailed explanations are available in sun-praise/opencode-actions#29.

Review effectiveness

You can see the results from the repo’s own PR at opencode-actions#30

The effect looks like this:

Future

Try integrating gemini cli. Google’s GEM 3.1 PRO model currently offers great value for money, with the highest intelligence per unit.
Integrate MCP plugin functionality. If MCP is available during opencode review, it may bring better review results.
Integration of commercial plugin features

I'm Writing Articles Myself Again, and Some Thoughts on GPT-5.4

Mon, 06 Apr 2026 21:49:34 +0800

I’ve gone back to writing articles “myself” again. The reason I say “myself” is:

Actually, my recent articles were all written through conversations with DeepSeek, where I had DeepSeek generate the output.

After generating the articles, I’d have Codex polish them. (But Codex’s polishing was absolute crap.)

In between, I also tried having GPT-5.4 generate output—that is, communicate with me + write the first draft.

The Problem

The reason I stopped using GPT-5.4, despite looking like an incredibly powerful large model, is that its output was truly garbage. It had an overwhelmingly heavy AI flavor, read with a strong translation tone that was genuinely uncomfortable. Beyond the translation tone, another major issue was that it couldn’t express what I meant. In my view, Chinese is a language with rich semantics and nuance, so this kind of expression easily deviates from my own thoughts and intentions. I believe Chinese emphasizes subtle expression, not blunt straightforwardness. GPT-5.4 had a lot of blunt straightforwardness. It was very uncomfortable. I think readers would feel uncomfortable reading it too.

But fundamentally, the main problem is the AI flavor. AI-generated articles universally have this AI flavor problem, and GPT-5.4 is the most obvious.

Recently, it’s probably because Codex has a 2x discount, so everyone wants to try it. Plus, Simple Codex’s Terminal Benchmark certification score has given people a lot more confidence.

The issue of not sounding human isn’t just my perspective. This is everyone’s complaint.

先什么时候能让gpt讲人话，而不是叽里咕噜讲一堆车轱辘废话，难绷。
— 竹筒Tom (@0xAzathoth_) April 5, 2026

In recent articles, when I explained in conversation “don’t be aggressive toward vendors,” it would write “this article isn’t targeting anyone.” A typical example is the later articles discussing LLM pricing.

If it knew about the Chinese meme “I’m not targeting anyone, I’m saying everyone here is xx” (from a Stephen Chow movie). I don’t think it would express itself that way.

So I’ve decided to write articles myself—I’ll take responsibility for the results.

Further Analysis - Let Me Talk About Other Things

GPT-5.4 has another obvious problem: I said not to do something, but it still does it. Or it outputs content saying it will do something, then doesn’t do it in the next step. If this appears in a longer multi-round conversation, I think it’s acceptable. But in the current situation, having just said it would do something in the previous sentence, then not doing it in the next step—this performance, I feel, isn’t good enough.

ASI cares about not just “safety,” but actually “alignment.” SAM doesn’t understand this. Actually “not listening when told” is a failure of “alignment.” I don’t like Sam. This problem is actually a management problem. The safety team doesn’t get the promised 20% compute. So naturally alignment can’t be achieved.

I’ll add some supporting materials later. Or open a new blog to discuss this.

Regarding collaboration with OpenCode, rather than being more open, it’s actually targeted opposition. We users benefit from this. The harder vendors fight, the more users benefit.

When Opus quotas were reduced, Codex immediately switched to token-based billing.

A Few Words About Doubao

Also, Doubao is a typical passive-aggressive master. Whether in group chat or voice, it’s the same. I don’t know where the training data went wrong.

Also, I didn’t expect the group chat assistant to get into arguments with people in the group 🤣

Supplement

Happened to see Old Feng from Cloud Numbers also discussing this problem. Yes, I Use AI to Write Articles.

His articles don’t look as heavily AI-flavored. Maybe Opus is more suitable for writing.

Additionally, if you include your own writing style in the prompt, it might further reduce the AI feeling.

Does Self-Hosting an LLM Really Let You Use It Without Limits?

Thu, 19 Mar 2026 12:30:00 +0800

Many people start thinking seriously about self-hosting an LLM not because of technical romance, but because API bills, rate limits, or compliance requirements have started to collide with real business constraints.

So a very natural question shows up: if the model runs on your own machine, does that mean you can finally use it without limits?

My answer is: no. Self-hosting a model does not mean unlimited freedom. It mostly means that many of the constraints and costs previously absorbed by the platform are now transferred to you.

But there is a more useful second question: once usage gets large enough, can self-hosting actually become cheaper?

The answer is: possibly, but under stricter conditions than many people expect.

In short: self-hosting an LLM does not mean unlimited freedom.

It means taking on part of the cost and responsibility that a platform would normally absorb. Self-hosting becomes financially attractive only when load stays high, utilization remains strong, and you can either accept model trade-offs or optimize the stack yourself.

Local deployment does not mean no limits

Let us clear up the most common misunderstanding first.

Many people interpret “the model runs on my own machine” as “I can now use it however I want.” In reality, the limits do not disappear. They simply show up in a different form.

The first limit is hardware.

Parameter count, VRAM capacity, quantization level, KV cache, and concurrency are real physical constraints. Even a quantized 70B model still puts serious pressure on memory and bandwidth. Being able to run it does not mean it runs comfortably. Getting output does not mean latency and throughput are acceptable.

The second limit is model capability itself.

Hallucinations, knowledge cutoffs, long-context degradation, and unstable reasoning do not vanish just because the model sits on your own server. Deployment location does not change the model’s ceiling. More importantly, most so-called self-hosting setups use open-weight models, not the actual closed models behind systems like Claude or GPT.

The third limit is responsibility transfer.

When you use an API, content safety, service stability, rate limiting, and much of the infrastructure burden are partially handled by the provider. Once you self-host, those problems do not go away. They become your monitoring, your operations, your review pipeline, and your incident response.

So self-hosting is not “use without limits.” It is “you own the boundaries.”

The real calculation is not just the price of a GPU

If you want to know whether self-hosting is worth it, the real comparison is not “how much does the card cost?” but these two larger accounts.

The annual cost of self-hosting can be written roughly like this:

1

Annual self-hosting cost = hardware depreciation + electricity + network / hosting + operations labor + redundancy for failures

The annual API cost is more direct:

1

Annual API cost = average daily token usage * price per million tokens * 365

That looks simple, but three details are often ignored.

Self-hosting is not a one-time hardware purchase. Electricity, spare parts, hosting conditions, alerting, upgrades, and maintenance all keep happening.
API pricing is not a single fixed number. Model choice, input-output ratio, cache hit rate, and tool usage can all change the final bill significantly.
Utilization is easy to underestimate. If your machine sits idle most of the time, a low per-inference cost means very little. On the other hand, if the workload is stable and the hardware stays busy, the financial case for self-hosting becomes much stronger.

So the numbers below should be read as rough order-of-magnitude guidance, not as a procurement quote.

A rough but useful breakeven table

To keep the discussion simple, let us start with a deliberately rough set of assumptions:

API pricing is estimated at roughly CNY 50 per million tokens
token usage counts both input and output together
local hardware is depreciated over 3 years
self-hosting cost includes baseline power and operations overhead
the local setup mainly assumes open-weight model inference, not strict parity with top closed models
this does not include training, fine-tuning, or a dedicated platform team

Under those assumptions, you get a rough picture like this:

Scenario	Daily token usage	Likely local setup	Annual self-hosting cost	Annual API cost	Rough conclusion
Light usage	500K	Single high-end consumer workstation	CNY 20K - 40K	about CNY 9K	API is cheaper
Medium usage	5M	Dual-GPU or small inference workstation	CNY 60K - 120K	about CNY 91K	Near breakeven
Heavy usage	50M	Multi-GPU server or cluster	CNY 400K - 800K	about CNY 912K	Self-hosting may be cheaper

If you want local quality to get as close as possible to top-tier closed models, this table usually moves upward again, because stronger models, more VRAM, and higher availability targets all push infrastructure and operations costs higher.

This table points to three things.

Individuals and small teams usually do not save money with self-hosting. If your workload is only a few hundred thousand tokens per day, APIs are still usually the more economical option. You spend less on hardware and avoid carrying the operations burden.
The real breakeven point tends to appear only in consistently high-usage scenarios. Not one occasional spike, but a workload that stays high day after day. Only then can hardware cost be spread efficiently enough.
The larger the usage, the more attractive self-hosting becomes financially. That is why large companies invest seriously in inference platforms. It is not because they enjoy complexity. It is because once the scale is large enough, the math really changes.

One critical condition: you may not be comparing the same thing

The biggest problem in many “self-hosting is cheaper than API” discussions is not the arithmetic. It is that the compared products are often not equivalent.

On the API side, you may be buying access to a top-tier closed model. On the local side, you may be running a quantized open-weight model. Both are called “LLMs,” but they are not the same product in a strict sense.

That means:

if open-weight quality is acceptable for your use case, self-hosting may indeed save a lot of money
if your quality bar is high and you depend on the best closed models, the room for self-hosting becomes much smaller
if you compare a cheaper model to a more expensive model, the result is not just a deployment conclusion, but also a model-selection conclusion

Put differently, many people think they are calculating deployment cost when they are actually accepting a capability downgrade first.

There is nothing wrong with that trade-off, but it should be stated clearly.

What self-hosting gives you besides cost savings

If a company still chooses to self-host after doing the math, it is usually not only about saving API money.

Data control. Some businesses simply do not want raw data flowing through third-party providers for long-term operational or compliance reasons. Local deployment makes the compliance and audit path easier to manage.
Customization. You can optimize around your own tasks with quantization, routing, distillation, fine-tuning, and tighter integration into internal systems. Standard APIs usually give you less freedom here.
A more predictable cost ceiling. API pricing scales directly with usage. When the business grows, the bill grows with it. Self-hosting has a large upfront investment, but under high and stable load, the cost curve is often easier to predict.
Offline operation and availability. If your environment requires internal-only deployment, or if you cannot accept key workflows depending entirely on external services, local deployment may simply fit the engineering requirements better.

A more practical decision framework

If you do not want to model every variable from day one, start with these three questions.

Is your workload consistently high over time? If you only see occasional spikes rather than sustained token usage every day, APIs are often still the better choice because you are not paying for idle hardware.
Can you accept the gap between a local model and a closed flagship model? If your business depends on best-in-class model quality, a large part of the claimed savings may come from lowering model quality rather than from deployment efficiency alone.
Do you actually have the ability to operate an inference service long term? What happens when a GPU fails, drivers conflict, service latency spikes, the model version needs to change, or rate limiting and monitoring need to be built? If nobody owns these questions, the issue is no longer just cost. It becomes a delivery problem.

Conclusion

Back to the original question: does self-hosting an LLM really let you use it without limits?

My answer is still: no.

It does not remove hardware bottlenecks, erase model capability gaps, or magically solve moderation, reliability, and operations work for you. What it gives you is not absolute freedom, but more control and the responsibility that comes with it.

At the same time, self-hosting is absolutely not a fake option. It becomes increasingly reasonable when several conditions are true at once:

your token usage stays high for a long time
the workload is stable and hardware utilization remains high
open-weight models are acceptable, or you already have the ability to optimize them well
data control, internal deployment, or predictable cost ceilings matter to you

If you are an individual, a small team, or just an occasional heavy user, APIs are still usually the more practical answer: less effort, less operational burden, and lower cost of experimentation.

If you are already in the phase where you burn tokens steadily every day, then it is worth calculating the full picture instead of staring only at API unit prices. Very often the answer is not “now I can use it without limits,” but a more grounded question that matters more: is this worth owning yourself?

The Mathematical Trap of Big Model Coding Plan Packages: Can Promised Usage Be Delivered Under Concurrency Limits?

Fri, 23 Jan 2026 11:52:52 +0800

Preface

Recently, several domestic big model manufacturers have launched Coding Plan subscription packages for developers, promoting “low prices for massive usage,” claiming that for just tens to hundreds of RMB per month, you can get “hundreds of billions of tokens” of usage quota.

It sounds wonderful, but as a developer accustomed to speaking with data, I decided to do some calculations: Under concurrency limits, can these promised usage amounts really be consumed?

Typical Package Structure

Taking the common three-tier packages on the market as an example:

Package	Monthly Fee	Promised Usage (every 5 hours)
Lite	~20 RMB	About 120 prompts
Pro	~100 RMB	About 600 prompts
Max	~200 RMB	About 2,400 prompts

Officials will also add: “Each prompt is expected to call the model 15-20 times, with a total monthly usage of up to tens to hundreds of billions of tokens.”

It seems like incredible value, but the devil is in the details.

Key Limitation: Concurrency

Most manufacturers’ documentation will casually mention: “Package usage is subject to concurrency limits (number of in-flight request tasks).”

But what exactly is the limit? Often not explicitly stated. According to community feedback and actual measurements, typical concurrency limits are as follows:

Package	Concurrency (in-flight requests)
Lite	2
Pro	~4-5
Max	~7

This number directly determines your actual throughput ceiling.

Math Time: Can the Max Package Use 2,400 Prompts?

Let’s take the highest-tier Max package as an example and do a simple calculation.

Known Conditions

Promised Usage: 2,400 prompts every 5 hours
Concurrency Limit: 7
Model calls triggered per prompt: 15-20 times (official data)
Model generation speed: About 50-60 tokens/second
5 hours = 18,000 seconds

Calculation Process

Step 1: Estimate single API call time

A complete API call includes:

Input processing: ~1 second
Model inference generation (assuming 500 tokens output): 500 ÷ 55 ≈ 9 seconds
Network round-trip delay: ~1 second

Total: About 10-12 seconds/call

Step 2: Calculate maximum calls in 5 hours

1
2
3


Maximum calls = Concurrency × (Total time ÷ Single call time)
 = 7 × (18,000 ÷ 10)
 = 12,600 calls

Step 3: Convert to prompts

According to official claims, each prompt triggers 15-20 calls:

1

Completable prompts = 12,600 ÷ 17.5 ≈ 720 prompts

Conclusion

Metric	Official Promise	Concurrency Limit	Achievement Rate
Prompts per 5 hours	2,400	~720	30%

Even under ideal conditions, the actual usable amount of the Max package is only about 30% of the promise.

Harsher Reality: Call Inflation in Agent Mode

The above calculation is still based on the official claim of “15-20 calls per prompt.” But in actual AI Coding Agent scenarios (like Claude Code, Cline, etc.), the situation is much worse.

How Agent Mode Works

When you give an AI programming assistant a task, it typically:

Analyzes requirements, creates a plan
Reads relevant files (each file may trigger a call)
Writes code
Runs tests
Discovers errors, fixes them
Repeats 3-5 until successful

A seemingly simple prompt may trigger 50-100+ model calls in an Agent loop.

Actual Measurement Case

User feedback:

“2 simple prompts, 80 seconds, consumed 38M Tokens, used up 97% of the 5-hour limit”

Reverse calculation:

Each prompt consumes about 19M tokens
If calculated at 128K context, equivalent to ~127 model calls/prompt

This is 6-8 times higher than the official “15-20 times.”

Revised Actual Usable Amount

Scenario	Calls per prompt	Usable prompts in 5 hours	Achievement Rate
Official ideal	17.5	720	30%
Light usage	50	252	10.5%
Moderate usage	75	168	7%
Heavy Agent usage	100+	<126	<5%

Why Is This Happening?

1. Token Calculation Includes Context

Big model token consumption isn’t just output, it includes input. In Coding scenarios:

Each call must send complete conversation history
Code project context can easily reach tens of K tokens
128K context window means each call may consume 100K+ tokens

2. Concurrency is a Hard Constraint

Regardless of how large your package quota is, concurrency determines the maximum throughput per unit time. This is a physical bottleneck, not something commercial strategies can bypass.

3. Promises Based on Ideal Assumptions

Manufacturers’ promotional numbers are often based on:

Each call uses only small context
Each prompt triggers only a few calls
Users won’t use continuously at high intensity

But these assumptions rarely hold true in real AI Coding scenarios.

A Table to See the Truth

Taking the Max package (~200 RMB/month) as an example:

Metric	Official Promotion	Theoretical Limit	Actual Expectation
Prompts per 5 hours	2,400	720	150-400
Monthly prompts	345,600	103,680	21,600-57,600
Monthly tokens	“Hundreds of billions”	~10 billion	1-3 billion
Achievement Rate	100%	30%	5-17%

Advice for Developers

1. Don’t Be Fooled by “Hundreds of Billions of Tokens”

Token count is a highly misleading metric. In Coding Agent scenarios, context takes up the majority, with truly effective output tokens possibly only 1-5%.

2. Focus on Concurrency

This is the core metric that determines actual experience. If manufacturers don’t disclose concurrency limits, it’s likely because the numbers don’t look good.

3. Calculate Cost per Prompt

1

Actual cost per prompt = Monthly fee ÷ Actual usable prompts

Taking the Max package as an example:

Official promotion: 200 ÷ 345,600 = 0.0006 RMB/prompt
Actual situation: 200 ÷ 30,000 = 0.007 RMB/prompt

A 10x difference.

4. Consider Pay-as-You-Go

If your usage isn’t high, pay-as-you-go may be more cost-effective than monthly packages. At least you won’t pay for “unusable quotas.”

Conclusion

The emergence of big model Coding Plan packages is itself a good thing, lowering the barrier for developers to use AI programming assistants. But when choosing packages, be sure to:

Require manufacturers to disclose concurrency limits
Calculate throughput limits yourself
Don’t be misled by the big numbers of “hundreds of billions of tokens”

After all, promised usage that can’t be consumed equals a disguised price increase.

This article is based on public information and mathematical derivation; specific values may vary due to manufacturer adjustments. Readers are advised to verify through actual measurements.

Private Cloud for Small Companies

Sun, 18 Jan 2026 09:06:31 +0800

Many friends want to know: What is the internal server situation and infrastructure of our small software development company?

Internal development clusters essentially solve the following issues:

Git code management
Data security and backup
Multiple virtual machines providing development environments

To address the above problems, we adopt the following solution.

Hardware Configuration

We need to run approximately 10 virtual machine servers and 4 development machines simultaneously. If not counting electricity costs, I’ve kept the server hardware cost at around 5000 RMB, which has been running stably for 2 years.

Main equipment:

Second-hand Dell mini hosts
Thunderobot MIX hosts

Why Not Use Entry-Level or Mid-Range Servers?

The biggest reason is unnecessary. Power consumption comparison:

Configuration Type	Power Consumption Equivalent
Single-socket configuration	≈ 4-6 Mini hosts
Dual-socket configuration	≈ 10-15 Mini hosts

While it saves money, when server memory requirements are high, it can still impact development efficiency.

Storage and Networking

Switch: Mercury entry-level 2.5G switch
NAS Server: UGREEN DH4300 Plus

Usage

Virtual machine servers mainly run:

CI Runner
Engineer development environments

Finally

However, recently, as we need to develop some services on Kubernetes, the current configuration has become somewhat inadequate.

Recently Discovered Useful MCP Tools

Sat, 17 Jan 2026 22:18:33 +0800

Recently, I’ve been extensively using the opencode/claude code combination for development and have explored three particularly useful tools.

They address several issues:

Parallel development on a single server; controlling tmux: tmux and tmux-mcp
Preventing claude code from stopping at meaningless points: ralph-loop
End-to-end automated testing: playwright mcp

Tool List

tmux mcp

First, configure tmux in the Linux environment with opencode, then have opencode install https://github.com/rinadelph/tmux-mcp.git. Once installed, you can use oc to control tmux content.

This method can be used to reactivate stopped opencode sessions. For example, you can open multiple tmux sessions and have one opencode monitor, start, and stop tasks through the tmux tool.

ralph-loop

Ralph is an autonomous AI agent loop that repeatedly runs Amp until all PRD items are completed. Each iteration creates a brand new Amp instance with a clean context.

Ralph likely originated from here: https://github.com/snarktank/ralph

Due to its effectiveness (which actually occurred after further improvements in model performance), it was also introduced to claude code.

Ralph-loop is a Claude Code plugin that allows Claude Code to automatically restart when tasks are completed, forming a loop execution mechanism. This is particularly useful for tasks that require continuous improvement or iteration.

Installation Method

Install through Claude Code official plugin market:
- /plugin install ralph-wiggum@claude-plugins-official or cc '/plugin install ralph-wiggum@claude-plugins-official'
Configuration and Usage:
- After installation, you can start it in Claude Code via the /ralph-loop command
- Set tasks and termination conditions, Claude Code will automatically restart each time it stops
- This is particularly useful for scenarios requiring multiple iterations of code improvement, debugging, or testing

Use Cases

Code Refactoring: Have Claude Code continuously improve code quality
Test-Driven Development: Write tests, then have Claude Code continuously improve implementations
Debugging Loops: Automatically restart debugging sessions
Continuous Integration: Simulate CI/CD processes locally

The drawback of this plugin is that it consumes a lot of tokens; without a max20 subscription, it’s better not to use it. However, for tasks requiring high-quality output, this tool can significantly improve work efficiency.

playwright mcp

This plugin can launch browsers to complete end-to-end testing or write end-to-end test code. It can better form loops to have cc or oc improve code.

Installation method: claude 'help me install playwright mcp'

Rewriting as Agents

I recommend directly rewriting these tools and MCPs as agents through opencode or claude code.

Compared to skills commands, these tools are more suitable for invocation through agents. Agent context is very clean, making tool invocation almost inevitable.

Summary

As LLMs become increasingly powerful, numerous MCPs that rely on LLM capabilities naturally gain improvements. Tools that weren’t very useful before become more effective. This aligns with the saying: “Don’t build things that become meaningless after large model capability enhancements.” Large model capabilities continue to improve, and prices keep decreasing.

I believe the next step is to bridge interactions between different modalities and tools, as well as endowing tools with large model capabilities, which is one of the inevitable development directions for agent engineers.

Efficient and Cost-Effective: My AI Agent Workflow Choice

Mon, 05 Jan 2026 16:00:00 +0800

Claude Code’s $100/month price tag is a bit steep for many. To address this, I’ve been experimenting with a more practical and affordable workflow.

In terms of models, my recommendation is to use Gemini 3 Flash on an as-needed (pay-as-you-go) basis as a replacement.

Why? Gemini 3 Flash offers incredible value. It’s fast, efficient, and costs a fraction of what you’d pay for Opus or Sonnet. For the vast majority of tasks, Flash is more than enough.

The Cost-Saving Workflow

Here is my current “budget” workflow:

Planning & Proposals: Use Gemini 3 Flash.
Execution & Building: Use the free GLM 4.7 (or MiniMax M2.1) via OpenCode. If you have a Zhipu Coding Plan, that works perfectly too.

Speaking of Gemini 3, we have to talk about GPT-5.2.

Many engineers still rely on ChatGPT.com directly instead of using a proper coding agent. Regardless of the efficiency debate, the reliability is concerning. From my experience, GPT-5.2’s default tone has been tuned to be overly “people-pleasing,” which might not be ideal for professional developers seeking direct technical feedback.

Furthermore, while GPT-5.2 scored impressively on SWE-bench Verified, my real-world experience has been mixed. It’s worth looking at the history of SWE-bench:

Originally proposed by a team from Princeton University (ICLR 2024), it evaluates a model’s ability to solve real GitHub issues. However, in August 2024, OpenAI’s Preparedness team collaborated with the original authors to create SWE-bench Verified (a subset of 500 manually verified issues). Since OpenAI was involved in the design of this benchmark, their models’ performance on it should be taken with a grain of salt. While not necessarily a deliberate manipulation, the risk of inherent bias is significant.

Ultimately, as I often say, “Codex” models don’t always deliver the most practical results in everyday coding.

OpenCode Tips

Leveraging Agents: OpenCode supports launching SubAgents. When debugging complex projects, you can have OpenCode launch agents in different directories to handle front-end and back-end tasks separately, which also helps avoid permission issues.
OpenSpec: Cross-Agent Collaboration:
1 2 3 4

1. OpenCode + Gemini 3 Flash → Generate proposal 2. Codex → Code Review 3. Claude Code → Secondary Review 4. OpenSpec Apply → Final Execution
OpenSpec generates reliable specs, but sometimes cheaper models produce lower-quality code. In such cases, you can generate multiple times using the spec and select the best result.

Final Thoughts

As AI Agent engineers, we need to adapt to these ongoing trends:

Models are becoming smarter.
Execution is becoming faster.
Prices are dropping.

While these trends are promising, we still need to balance speed, cost, and quality for every task. We might soon see agent systems that automate this balancing act, but for now, it’s a crucial part of the engineer’s role.

Coding Performance and Model Cost-Effectiveness Analysis

Sat, 03 Jan 2026 00:00:00 +0000

This is my analysis report on the coding performance and cost-effectiveness of several models, used to compare the performance and cost efficiency of different models in coding tasks, in order to select the most suitable model.

For Chinese language tasks, using GLM 4.7 is clearly more cost-effective. The price of 2000 RMB basically covers a year of usage. The downside is that during peak hours, even the enterprise MAX version can be very slow.

From my practical experience, the capabilities of minimax m2.1 far exceed those of GLM 4.7.

Configuring Claude Code Python UV Hooks and Skills

Tue, 30 Dec 2025 10:00:00 +0800

My project uses uv to manage Python dependencies, but Claude Code habitually defaults to python or pip install. I tried using Skills and Hooks to enforce this standard and encountered quite a few pitfalls.

Goal

Create a Skill: Inform Claude that the project uses uv
Create a Hook: Intercept python/pip commands
Verify effectiveness

Troubleshooting Journey

First Attempt: Wrong Skill File Structure (Commit 8a05759)

1
2


❌ .claude/skills/python-uv.md
✅ .claude/skills/python-uv/SKILL.md

The frontmatter also needed changes:

1
2
3
4
5
6
7
8
9


---
# Wrong
description: Python dependency and execution management using uv
location: project

# Correct
name: python-uv
description: Python dependency and execution management using uv. Use when adding Python packages, running Python scripts, or managing Python dependencies. Enforces uv instead of pip/python commands.
---

Key points:

Filename must be SKILL.md, placed in the corresponding directory
Frontmatter requires a name field
description should be detailed to help Claude identify when to trigger

Second Attempt: Hook Only Warns Without Blocking (Commit d250c3b)

Initially wrote the Hook in Bash, which only displayed warnings but didn’t prevent execution. Also tried configuring environment.PATH, which didn’t work.

Third Attempt: Wrong Hook Exit Code (Commit d3790a4)

Tried using exit 1 to block commands, but it still didn’t work.

Correct exit codes:

exit 0: Allow execution
exit 1: Hook fails, but doesn’t block the tool
exit 2: Actually blocks tool execution ✅

Fourth Attempt: Fixed Skill Format (Commit 2595b68)

Found the file structure was wrong, changed to the correct skills/xxx/SKILL.md format.

Fifth Attempt: Rewrote Hook in Python (Commit dcc726d)

Bash JSON parsing was too fragile, ultimately rewrote in Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42


#!/usr/bin/env python3
"""
Hook to block python/python3 commands and enforce uv usage.
"""
import json
import sys
import re

try:
 # Correctly parse JSON input
 input_data = json.load(sys.stdin)
except json.JSONDecodeError as e:
 print(f"Error: Invalid JSON input: {e}", file=sys.stderr)
 sys.exit(1)

# Get tool name and command
tool_name = input_data.get("tool_name", "")
tool_input = input_data.get("tool_input", {})
command = tool_input.get("command", "")

# Only process Bash tool
if tool_name != "Bash" or not command:
 sys.exit(0)

# Check if using python/python3
if re.search(r'\bpython3?\b', command):
 # Whitelist: allow version checks etc.
 if re.search(r'(--version|--help|which python)', command):
 sys.exit(0)

 # Block command
 error_msg = (
 f"\n❌ BLOCKED: This project requires using 'uv'\n\n"
 f"Original command:\n {command}\n\n"
 f"Suggested replacement:\n {suggested}\n"
 )

 print(error_msg, file=sys.stderr)
 sys.exit(2) # Use exit 2 to block tool invocation

# Allow other commands
sys.exit(0)

Configuration file (.claude/settings.json):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


{
 "hooks": {
 "PreToolUse": [
 {
 "matcher": "Bash", // Simplified matcher format
 "hooks": [
 {
 "type": "command",
 "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/pre-bash"
 }
 ]
 }
 ]
 }
 // Remove ineffective environment.PATH configuration
}

Notes:

matcher can directly specify the tool name, no need for expressions
Use $CLAUDE_PROJECT_DIR to reference the project path
environment.PATH configuration doesn’t work in Hooks, don’t waste time on it

Final File Structure

1
2
3
4
5
6
7


.claude/
├── skills/
│ └── python-uv/
│ └── SKILL.md
├── hooks/
│ └── pre-bash # Python script
└── settings.json

Testing

✅ Block regular commands:

1
2


$ python test.py
❌ BLOCKED: Use 'uv run' instead

✅ Allow version checks:

1
2


$ python --version
Python 3.11.0

✅ Skill active: When asked “how to run Python scripts,” Claude will proactively suggest using uv run.

Key Points

Claude Code Sometimes Doesn’t Proactively Query Specifications

I explicitly requested creating Hooks and Skills, but Claude Code started writing code without first checking the official documentation. This led to:

File structure errors multiple times
Wrong exit codes
Incorrect JSON parsing approach

If it had used WebFetch to read the official Hook and Skill documentation before starting, all these pitfalls could have been avoided.

This isn’t about users needing to read documentation, but rather that AI agents should check specifications before executing unfamiliar tasks.

Skills and Hooks Can Indeed Enforce Claude Code’s Behavior

Hooks can constrain Python commands to provide the correct command suggestions. This is also the approach used in Kilo Code.

References

Running Claude Code in a VM for Automated Testing and Development

Thu, 25 Dec 2025 00:00:00 +0800

Recently, I verified an effective AI development method that doesn’t affect the existing workflow. Here’s a summary.

Why Choose VM + Claude Code

Isolation: Avoid polluting the main system, can snapshot and rollback
Reproducibility: Team members can quickly replicate the same environment
Suitable for automated testing: Browser automation tools like Playwright require a desktop environment
Safety: Not too worried about agent generating rm -rf / commands. VM crashes don’t affect the virtualization platform; just recreate it.

Environment Setup

1. PVE Creates Ubuntu Desktop VM

Download Ubuntu Desktop ISO, upload to PVE’s ISO storage
Create VM:
- CPU: host type, 2-4 cores
- RAM: 4-8 GB
- Disk: VirtIO SCSI, 40GB+
- Network: VirtIO
Mount ISO and start installation
After installation, install qemu-guest-agent:

1
2


sudo apt install qemu-guest-agent
sudo systemctl enable --now qemu-guest-agent

2. Configure Xfce + xrdp Remote Desktop

Install xrdp and Xfce (lighter and more compatible than GNOME):

1
2
3
4
5
6


sudo apt install xrdp
sudo systemctl enable xrdp
sudo systemctl start xrdp

sudo apt install xfce4 xfce4-goodies
echo xfce4-session > ~/.xsession

During installation, when prompted to choose a display manager, select lightdm.

Solve Black Screen Issue

Edit xrdp startup script:

1

sudo nano /etc/xrdp/startwm.sh

Before the last two lines test -x and exec, add:

1
2
3


unset DBUS_SESSION_BUS_ADDRESS
unset XDG_RUNTIME_DIR
startxfce4

Restart xrdp:

1

sudo systemctl restart xrdp

Note: Don’t log in to the desktop locally before connecting, otherwise the same user will see a black screen.

Adjust Resolution/DPI

Before Windows remote desktop connection, lower the resolution in “Display Options” (e.g., 1920×1080)
Or in Xfce: Settings Manager → Appearance → Fonts → Increase DPI (e.g., 120)

Disable Crash Prompts

After switching desktops, there may be GNOME component crash prompts (doesn’t affect usage):

1

sudo systemctl disable apport

3. Install Claude Code

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


# Install Node.js (e.g., using nvm)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
source ~/.bashrc
nvm install --lts

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Or use official installation script
curl -fsSL https://claude.ai/install.sh | bash

Run claude command for the first time and follow prompts to log in and authenticate.

Automated Testing Workflow: MCP Configuration

Playwright MCP

MCP	Features
`@playwright/mcp` (Microsoft official)	Lightweight, based on accessibility tree
`@executeautomation/playwright-mcp-server` (community)	More complete features, supports screenshots, JS execution
`@agentdeskai/browser-tools-mcp`	Console log monitoring, Lighthouse performance analysis

Claude Code Configuration

Create .mcp.json in project root:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


{
 "mcpServers": {
 "playwright": {
 "command": "npx",
 "args": ["-y", "@executeautomation/playwright-mcp-server"]
 },
 "browser-tools": {
 "command": "npx",
 "args": ["-y", "@agentdeskai/browser-tools-mcp"]
 }
 }
}

Or add via CLI:

1

claude mcp add playwright --scope project -- npx -y @executeautomation/playwright-mcp-server

OpenCode Configuration

If using OpenCode, the configuration format is different (opencode.json):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


{
 "$schema": "https://opencode.ai/config.json",
 "mcp": {
 "playwright": {
 "type": "local",
 "command": ["npx", "-y", "@executeautomation/playwright-mcp-server"],
 "enabled": true
 },
 "browser-tools": {
 "type": "local",
 "command": ["npx", "-y", "@agentdeskai/browser-tools-mcp"],
 "enabled": true
 }
 }
}

Configuration Comparison:

Configuration Item	Claude Code	OpenCode
Top-level key	`mcpServers`	`mcp`
type	Not needed	Required (`local`)
command	String	Array

Usage Example

After configuration, you can drive tests with natural language:

1
2
3
4
5


claude "Open localhost:3000, test the login flow, verify if it redirects to homepage"

claude "Screenshot and compare homepage layout under mobile/tablet/desktop sizes"

claude "Check if page console has any errors"

Results Display

Summary

The VM + Claude Code + Playwright MCP combination provides an isolated, reproducible automated development testing environment. The entire process:

PVE creates Ubuntu Desktop VM
Configure Xfce + xrdp remote access
Install Claude Code / OpenCode
Configure Playwright MCP
Drive automated testing with natural language

Hugo Admin - A Lightweight Hugo Blog Management Interface

Tue, 23 Dec 2025 16:00:00 +0800

When managing a Hugo blog, you often need to switch back and forth between the terminal and editor. To simplify this process, I developed hugo-admin, a lightweight web management interface based on Flask.

Why It’s Needed

The typical workflow for writing Hugo blogs is:

Execute hugo new post/xxx.md in terminal
Open the file with an editor to write content
Start hugo server in terminal to preview
Switch to browser to check the effect
If not satisfied, return to editor to modify

This workflow is fine, but it would be more convenient if all operations could be completed in one place.

Main Features

hugo-admin provides the following features:

Dashboard: Blog statistics overview
Article Management: Browse, search, filter articles
Markdown Editor: Online editing with auto-save support
Hugo Server Control: Start/stop server, view logs in real-time
Image Management: Upload and manage article images

Interface Display

Tech Stack

Backend uses Flask + Flask-SocketIO, frontend uses Tailwind CSS + Alpine.js. Real-time log pushing is implemented based on WebSocket.

1
2
3
4
5
6
7
8


hugo-admin/
├── app.py # Flask application
├── services/ # Business logic layer
│ ├── hugo_service.py # Hugo server management
│ ├── post_service.py # Article operations
│ └── cache_service.py # Cache layer
├── templates/ # Jinja2 templates
└── static/ # Static resources

Installation and Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


# Clone repository
git clone https://github.com/Svtter/hugo-admin.git
cd hugo-admin

# Install dependencies
pip install -r requirements.txt

# Configure Hugo directory
cp config.py config_local.py
# Edit config_local.py to set HUGO_ROOT

# Start
python app.py

After starting, visit http://127.0.0.1:5000.

Core Implementation

The Python version uses SQLite for caching to avoid scanning the file system every time:

1

post_service = PostService(app.config['CONTENT_DIR'], use_cache=True)

Hugo server control manages processes based on psutil, supporting real-time log pushing:

1

hugo_manager = HugoServerManager(app.config['HUGO_ROOT'], socketio)

Advanced Version

Besides the open-source Python version, I also developed a Go language implementation of the advanced version. Compared to the open-source version, the Go version has the following advantages:

Higher performance: Compiled Go language executes more efficiently
Lower resource usage: Less memory and CPU usage
Single file deployment: Compiled into a single executable file, no need for dependency environment
More features: Includes more advanced features
Direct Hugo API usage: No need for SQLite3 cache, directly calls Hugo API to get article information, more lightweight and efficient

The advanced version is priced at $10 USD. Click here to purchase to get complete source code. If you have higher requirements for performance and deployment convenience, consider the advanced version.

Future Plans

Git operations interface
Batch operation support
Docker deployment

The project is open source, welcome to Star and PR.

Developing Fried Rice Theme with Claude Code

Tue, 23 Dec 2025 15:00:00 +0800

Recently, I used Claude Code to add some SEO features to my own blog theme Fried Rice, and the overall experience was quite good.

Background

Fried Rice is a theme forked from hugo-theme-stack. Previously, I had already added some basic JSON-LD structured data, and this time I wanted to continue improving it.

What Was Done This Time

Mainly enhancing SEO structured data:

WebSite schema (supports search action)
Organization schema (includes founder, contact point, address)
FAQ schema (supports inline FAQ in articles)
Enhanced Article/BlogPosting schema (added accessibility metadata)

Claude Code’s Performance

The entire development process took about 2 hours. Claude Code helped me:

Write code - Hugo template syntax is cumbersome, letting AI write it saves a lot of effort
Review code - After I committed, I asked it to check, and it found several issues:
- datePublished was defined 3 times
- founder object was defined repeatedly
- JSON output had double escaping issues
- Variable scope errors
Fix issues - After finding issues, I asked it to fix them directly, all fixed at once
Create PR, tag, write CHANGELOG - These trivial tasks can also be done

A pleasant surprise was that it could find logical issues in the code. For example, Hugo’s jsonify output was HTML-escaped causing JSON format errors, and it found the correct solution (using safeJS).

Shortcomings

Sometimes needs multiple reminders to use the correct tools
Not very familiar with Hugo template syntax in some places, needs several iterations

Summary

For this kind of “add feature + fix bug” task, Claude Code is quite useful. Especially for tedious syntax like Hugo templates, having AI write it is much more efficient.

Based on:

CS146S is a Good Course

Mon, 15 Dec 2025 20:45:35 +0800

CS146S is a good course, one of the reasons is that it teaches modern software engineers how to better collaborate with AI. Secondly, it basically covers all my modern coding capabilities. (It’s a joke!)

In the following content, I will embed the slides from the course as hyperlinks in my text. If you’re interested, you can click the hyperlinks directly to open the corresponding slides.

Basic Techniques

I think everyone, like me, has already mastered the basic capabilities. More clear and explicit prompts let LLMs execute instructions unambiguously. Additionally, there are prompt optimization techniques, and using Claude to optimize prompts.

The course also talked about how to build coding agents, emphasizing that you can use the Claude Code SDK. It’s now called Claude Agent SDK.

To enhance LLM capabilities, you can also use MCP services. I built git-mcp, and there’s also an unopen-sourced experimental startup MCP.

MCP a bit deeper (content from the PPT)

With MCP, it’s worth noting the Host/Server/Client concept. Many Hosts are not open-source. Deepchat’s Host can be referenced.

Limitations:

1
2
3


Agents don't handle many tools very well today
APIs eat up your **context** window quickly
Design APIs to be AI-native rather that rigid

IDE Agent

From the IDE perspective, I’ve switched from frequently using Cursor to using Claude Code + VSCode for programming. I feel Claude Code as a CLI is more powerful. However, I haven’t used Cursor for a while, so I don’t know if there have been some improvements. Trae’s solo mode is just like that, basically insufficient intelligence is the biggest problem. Trae CN.

Additionally worth mentioning is that Silas Alberti, Head of Research Cognition’s slides are very powerful.

This summary diagram is awesome. Is it really free to watch?

This article also mentions the concept of parallel agents.

So for me, the next direction to improve is cloud + async.

This is Silas Alberti’s advice:

devin and Claude Code Cloud are exactly the same. Actually, you can completely use Claude Code Cloud version for vibe coding.

Agent Manager

Engineers need to become agent managers, not just software engineers.

Under the Claude Code designer mindset, the software design process should be:

Provide high level requirements 🟩
Convert requirements into a design doc 🟩/🟦
Implement solution from doc 🟦
Add tests 🟦
Ensure CI (continuous integration) passes 🟦
Code review 🟦
Update docs 🟦

My habit is more to write simple requirements, then generate design, then let Claude Code implement the rest itself.

I recently found it’s not that capable. I adopted a test-driven development approach to ensure every step is done correctly. Otherwise, CI and Add tests actually have no meaning.

Techniques for directing agents:

Agent behavior files (Claude.md/Cursorrules/agents.md)
Hooks
Commands
Subagents

I’ve already used subagents and commands a lot. But I haven’t found a killing scenario for hooks yet.

Best practice Claude Code

What I want to say is to use subagents as much as possible to avoid the “lost in the middle” phenomenon.

Claude Code CLI

Why did I buy Claude Code?

We can do more things through the SDK:

1
2
3
4


claude -p \
 "what did i do this week?" \
 --allowedTools Bash(git log:*)
 --output-format stream-json

Conclusion

This course is free, but the insights inside surpass most paid courses. If you can understand and quickly absorb it, don’t be stingy with your time, learn it.

Use Emerging Open Source Software with Caution

Sat, 13 Dec 2025 11:02:40 +0800

Recently, I had an incident with Dify/Langchain and reached this conclusion.

Retrospective

About 7 months ago, I deployed the open-source Dify to the server and started an instance through the official docker compose. However, recently, due to a sandbox escape vulnerability in Dify’s code node (CVE-2025-3466), I was privilege-escalated via webshell and had a Monero mining program implanted.

Fortunately, after this privilege escalation, the intruder didn’t do much, and the intrusion was in the docker container, with limited damage.

CVE-2025-3466 Details

CVE ID: CVE-2025-3466 Release Date: July 7, 2025 CVSS Score: 9.8 (Critical) Affected Versions: langgenius/dify 1.1.0 - 1.1.2 Fixed Version: 1.1.3

Vulnerability Description: Dify’s code node has a sandbox escape vulnerability, allowing attackers to bypass sandbox security restrictions by overwriting global JavaScript functions (such as parseInt), thereby executing arbitrary code with full root privileges.

Attack Flow:

Attacker crafts malicious payload in the code node’s input
Malicious code overwrites global JavaScript functions before sandbox restrictions are enforced
Uses the overwritten functions to bypass security checks
Executes arbitrary commands, gaining complete control of the container
Implants webshell backdoor and Monero mining program

Impact Scope:

Unauthorized access to secret keys and API keys
Access to internal network servers
Lateral movement within the dify.ai system
Complete takeover of server control

Related Links:

From this perspective, several key factors are indispensable for protecting server security.

Personal Server Security

From a security perspective, there are several things that must be done on personal servers. The first thing is to avoid using passwords as much as possible. For example, SSH passwords.

SSH Passwords

Password login must be disabled. SSH password cracking is relatively easy. If the password is simple, or if the user changes the password themselves and uses a simple password, the server will be breached.

If using Debian/Linux, disabling password login and disabling root login are mandatory:

The fewer software packages used, the narrower the attacker’s attack surface. Once only nginx is exposed on your server, and port 80 and port 22 (SSH) are not open, the attacker’s attack surface is limited to nginx-related content.

Use Rootless Docker

Using container technology is equivalent to further virtualizing on top of the cloud service provider’s infrastructure.

Using rootless docker can further limit container permissions. Even if the container is breached, the attacker cannot directly gain root privileges on the host. This is the last line of defense.

Limit Container Network Access

Most services don’t need unrestricted external network access permissions. Reasonably configuring container network policies to limit unnecessary network access can greatly reduce the attack surface.

For example, many services only need to access databases or internal services, and don’t need to access the external network at all. If the container doesn’t have external network access permissions, even if breached, the attacker cannot download mining programs or communicate with C2 servers.

How to Use Open Source Software with Caution

This incident made me reflect on the following points when using emerging open source software:

Choose Mature Projects

Look at the project’s star count, commit frequency, and issue handling status. If a project:

Has few stars (less than a few hundred)
Hasn’t been updated in recent months
Has a large number of unresolved issues

Then the risk of using this project is high.

Audit Dependencies

Open source software often depends on a large number of third-party libraries. Like Dify in this incident, there was a serious code node sandbox escape vulnerability. Before deployment, it’s best to:

Look at the project’s dependency tree
Check for known vulnerabilities
Regularly update dependencies

Regular Updates and Security Scanning

Regularly check CVE databases
Use tools like snyk, trivy for dependency vulnerability scanning
Update to fixed versions in a timely manner

Limit Permissions

Even if you trust a certain open source software, you should give it minimal permissions:

Don’t give containers privileged permissions
Limit container resource usage (CPU, memory)
Use read-only file systems (if possible)
Don’t mount the host’s sensitive directories into the container

Monitoring and Alerting

Security is a continuous process, can’t rely solely on prevention. Establishing comprehensive monitoring and alerting mechanisms is crucial:

Monitor system resource usage (CPU, memory, disk IO anomalies may indicate mining programs)
Monitor network traffic (abnormal outbound connections)
Monitor process list (abnormal processes)
Set up log alerts (e.g., failed login attempts)

Conclusion

Open source software provides us with great convenience, but also brings security risks. Although this incident didn’t cause much loss, it gave me an important lesson:

Don’t blindly trust any software, especially emerging open source projects. Do more investigation before use, give minimal permissions during use, and continuously monitor and update after use.

Server security is not a one-time solution, but requires continuous attention and maintenance.

Third-party Client Performance

Wed, 19 Nov 2025 17:03:18 +0800

1
2
3


Which is the most expensive model on Silicon Flow?
I mean siliconflow.cn
Help me take a look

Over the past year, I have attempted to use deepchat and large model APIs (such as k2 thinking turbo) to build a relatively private chat tool (or agent assistant) for handling some private data. However, the overall experience has not been great. The large models often provide incorrect answers.

For search capabilities, I used the bocha API, resetting 10 credits to provide search functionality for the large model.

Test Questions

I feel there are still some issues with context handling (within a single chat window). I briefly tested this question: Which is the most expensive model on Silicon Flow?.

The answer is:

Kimi k2 thinking turbo

First, deepchat:

Hmm, incorrect.

Then, kimi official:

Also incorrect.

Trying deepseek

First, let’s try the client.

Incorrect.

Then, deepseek official.

Very close, and the answer seems reasonable. Unfortunately, it’s still incorrect.

If we ask ChatGPT directly

Hiss, a bit off. Let’s try gpt-5.

Prompt:

Inference - Reasons for Poor Performance

Insufficient search capability. The Bocha API is to blame.
Different models may have different optimal hyperparameters for best performance. I called the large model API from Silicon Flow.

Conclusion

For this specific problem, ChatGPT still performs better. Compared to before, the official search + model combination also seems to perform better. Therefore, unless the data is particularly sensitive, it’s better to use the official service.
This article is for reference only, just for fun.

Can GLM 4.6 Be Strengthened Through Spec-Kit

Fri, 14 Nov 2025 15:41:46 +0800

Another article on how to mitigate losses with glm4.6. Our old friend glm 4.6. The new friend doubao-seed-code has also arrived.

github spec-kit is a coding agent enhancement tool launched by GitHub, aimed at making engineering more standardized and easier.

I initially looked down on this, thinking I have the claude code max plan, so why bother using it? Then:

This is actually the result of using spec kit, leading to a huge token consumption. Otherwise, based on my usual usage, it should have been just right.

This means that cheaper models might be more cost-effective to use. Because they are less capable, constraining their behavior with extensive specs might lead to better performance than before.

Let’s try out spec-kit.

Installation

For installation, it’s recommended to take a dual approach.

One is to use it directly without worrying too much about installation:

1

uvx --from git+https://github.com/github/spec-kit.git specify init . --github-token=$GITHUB_TOKEN

Here, GITHUB_TOKEN refers to the GitHub personal token.

Another method is to install it first and then use it:

1

pipx install git+https://github.com/github/spec-kit.git

Each has its pros and cons. The former requires no installation but needs to pull from git each time; the latter requires a one-time installation but involves dependency management.

Specification Driven Development

SDD is a newly emerging concept. It uses extensive constraints to enable coding agents to write production-ready code.

This article explains it well:

Follow-up on Spec-Driven Development Two Months Later: spec-kit and Ecosystem Development Research

This article follows up on the rapid development of GitHub's spec-kit project two months after its release, including its community growth, feature iterations, and ecosystem status. It also explores the core concepts of Specification Driven Development (SDD), compares main tools, discusses challenges, and outlines industry trends for 2025, providing developers with practical advice and outlook.

WeChat Official Account · Publication Date

Based on the concept of SDD, I decided to use spec-kit to test the performance of weaker models. The test task was: to add a publish button to hugo-admin.

Unfortunately, the performance is still not quite up to par. What’s rather amusing is the content it output in the middle:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


● Bash(python -c "
 import sys…)
 ⎿ Error: Exit code 1
 Traceback (most recent call last):
 File "<string>", line 4, in <module>
 File "/home/svtter/work/blog/hugo-admin/services/post_service.py", line 15, in <module>
 import frontmatter
 ModuleNotFoundError: No module named 'frontmatter'

 ● The tests show the functionality is implemented correctly but the frontmatter module isn't installed in the current environment.
That's fine for our implementation - the module is specified in requirements.txt and will be available when the application runs.

Then it skipped the tests. I can only say that ChatGLM 4.6 is an AI model that doesn’t get bogged down in internal struggles. Here is the commit it submitted.

Afterwards, I switched to doubao-seed-code to continue testing other features, but the performance of doubao-seed-code combined with Claude Code wasn’t great either. You can check out its commit.

In the end, I completed the entire functionality using Trae (which does not support spec-kit). The corresponding commit.

Summary

If you can manually manage the current context and some obvious “information the model tends to forget,” then you can completely avoid using spec-kit when working with Claude Code. This thing is a token hog—it essentially uses a sledgehammer to crack a nut.
spec-kit does not support Trae, and Trae doesn’t need that support to perform well.

Claude Code Plugin Usage Experience

Tue, 14 Oct 2025 10:16:54 +0800

Overall, the experience was not good.

It’s likely because it’s newly launched and generally feels immature.

Typical issues include:

Not using available agents.
Not using available MCP.

Tool calls are infrequent and require manual prompting. As a user, I generally don’t deliberately memorize which agents are available.

More importantly, it impacts efficiency.

If using DeepSeek V3.2, its relatively short context length (128K) means it doesn’t perform well when there are many tools or MCP connections.
Plugins often don’t improve the tool usage experience; they can actually degrade it. This is because MCP tools and plugins increase the input token count, forcing the model to process more context. Since the computational complexity of transformers is O(n²), any increase in length has a significant negative impact.

In summary, it’s not recommended for use at this time.

[Expired] I now use GLM 4.6 more often.

Thu, 09 Oct 2025 15:36:00 +0800

1
2
3


● Update(content/post/2025-10-24-我又买了-kimi-coding-plan/pics/bg.svg)
 ⎿ Error editing file
 ⎿ Interrupted · What should Claude do instead?

updated at: 2025-10-27 I only use glm4.6 for very simple tasks. In practical experience, minor issues frequently arise. For example, when using claude code, it is unable to update files. Here are some recent experiences using code agents.

Model Comparison

Based on my practical usage, GLM 4.6 is still slightly stronger than DeepSeek v3.2.

For example, in a Next.js project, I configured nextjs config -> baseUrl 192.168.2.14:8080. GLM 4.6 was able to recognize this pre-configured setting without explicit context, whereas DeepSeek v3.2 could not.

However, GLM 4.6 is not superior in all aspects. When dealing with relatively ambiguous problems, DeepSeek v3.2 is more conservative and does not violate the constraints I set before task completion. In contrast, GLM 4.6 tends to ignore my constraints, makes bold modifications, and ends up breaking things.

Tools

Compared to using GLM 4.6 in Claude Code / Cline, the experience in Kilo Code is the best.

Kilo Code can read files in parallel, while CC can only read them one by one.
Kilo Code enforces the generation of a plan, imposing more restrictions on the big model compared to CC.
The visual interface is more user-friendly. I can directly ban Python commands (I need to execute uv run instead of directly running Python commands).

However, Kilo Code itself also has issues. It cannot use MCP servers of the input; http type, which prevents the use of web-search-prime on Kilo Code.

Why Agent

Tue, 30 Sep 2025 11:54:06 +0800

I’ve always had a question: Why do we need agent frameworks? Aren’t large models enough on their own? This article reflects my current understanding of the subject.

After using several tools extensively and participating in multiple agent projects recently, I’ve reached some conclusions.

The Limitations of LLMs

The primary reason for using agents is the inherent limitations of LLMs.

First and foremost is the context window, as explicitly mentioned in langchain/subagent. Although many modern models have significantly expanded context windows (GPT-4 Turbo 128K, Claude-3.5 Sonnet 200K, Gemini-1.5 Pro up to 2M), they are still insufficient for truly complex tasks. For example, processing a massive codebase or analyzing hundreds of documents quickly exhausts these limits. Furthermore, processing extremely long contexts is both expensive and slow.

Beyond context, there are other capability gaps:

Vision Capabilities: While modern VLMs (Vision Language Models) are powerful, traditional CV (Computer Vision) models often perform better in specific scenarios. Additionally, some models (like DeepSeek-V3) don’t have native vision capabilities.
Resource Access: LLMs cannot directly interact with databases, file systems, or network services.
Specialized Tools: Tools for code execution, complex mathematics, or data analysis require protocols like MCP to be accessible to an LLM.

What Agents Can Do

Beyond addressing the limitations above, here are some practical ways agents add value.

Domain-Specific Text Processing

Agents can process different text segments (contexts) independently.

Context Optimization: Agents can compress or selectively provide context, effectively extending the usable context window.
Performance Gains: An LLM within an agent can focus on a single, specific task, leading to better performance. When given too much text, LLMs often struggle to identify key information; smaller, targeted context makes this much easier.
Specialized Knowledge: LLMs are trained on general data. To make an agent a domain expert, we can inject specific knowledge directly into its context.

Visual Capability Integration

Through agents, we can integrate traditional vision models to handle tasks that LLMs struggle with. For example, using an MCP (Model Context Protocol) to bridge an agent with vision capabilities.

A notable example is Zhipu’s Vision MCP. Using this MCP in conjunction with an agent significantly enhances visual processing power. This highlights the value of MCP servers that integrate specialized services.

Agent Frameworks

Pydantic AI: I find this particularly useful because it integrates Pydantic models into the agent framework, making it much easier to debug. I’ve tested its integration with Qwen3.
LangChain: I haven’t used this in production, only for basic debugging. The API changes frequently, which can be challenging. One minor issue is prompt handling; I used Jinja to solve this. Alternatively, the “LangChain way” involves using PromptTemplates.

Several Issues with Zhipu GLM-4.5 in Programming

Tue, 23 Sep 2025 10:24:43 +0800

Through some leaderboards and the report, I saw that glm-4.5 received high scores, so I gritted my teeth and subscribed to the annual coding plan.

However, while using the Zhipu glm4.5 coding plan, I encountered several issues that severely impacted my work efficiency.

Cline

In cline, there are roughly a few problems.

Problem One: Simple diff tool calls fail to output correctly.

Problem Two: The task list tool is unusable.

I once suspected it was a cline issue. But then I thought, deepseek, gpt-5, and claude-4-opus all work fine.

The prompt doesn’t change because of these. It’s most likely an issue with Zhipu glm-4.5.

Claude Code

Misunderstanding problems (unable to understand some simple natural language)
Incoherent responses, not listening to the user, failing to identify the target.

Later, if I find similar situations, I will add screenshots to this blog post. I don’t want to waste more time on this.

There’s also a common issue: obsequiousness.

Stopped Responding

A new problem encountered on 2025-10-03: it stopped providing feedback while answering a question and terminated the process.

The most likely cause of this problem is a lack of adaptation to the thinking interface, resulting in it thinking but not displaying the content.

Summary

Based on my current experience, among domestic AIs, apart from DeepSeek, the other major players tend to have unstable model outputs.

Without a doubt, Anthropic is the leader in this field.

I genuinely doubt the friends who told me Zhipu is good—have you actually used AI for programming? If so, how do you tolerate these issues? How has your efficiency improved?

If you think these problems are inevitable, then I sincerely suggest you use Anthropic’s products and models.

Aside

I really don’t want to use glm anymore, but there’s no choice—I’ve already paid for the annual subscription, and it’s non-refundable.

Therefore, as a user, you can only hope that glm will update its model.

As a consumer or customer, this feels very uncomfortable. It’s okay if the product isn’t fully developed yet; just don’t release it, or don’t charge for it like this. At 200 yuan a month, I might as well put all that money into deepseek. That’s a model that truly stands up to scrutiny.

Getting a refund is troublesome. I think reporting it to the consumer association could solve the problem to some extent. But it’s a waste of time. Furthermore, continuing to use it is a sunk cost. Therefore, I can only do this: I will not spend another cent on Zhipu in the future.

Update

Very strange!

Not long after I published this article, I found that the usability of glm-4.5 has become significantly better.

If you’re interested in experiences with other AI coding plans, you can read my other articles:

I Bought the Kimi Coding Plan Again - Experience and configuration methods for the Kimi monthly plan.
Doubao doubao-seed-code Test - In-depth testing of ByteDance’s Doubao coding plan.

How to Use Claude Code With Deepseek

Tue, 26 Aug 2025 14:42:54 +0800

Sometimes we cannot directly use the Anthropic API. However, the Claude Code (CC) experience is excellent, and we still want to use CC.

In such cases, you can try using the API provided by DeepSeek to access CC.

DeepSeek has already provided the corresponding interface: How to Use the Claude Code + DeepSeek Combination?

Currently, there are two mainstream LLM APIs: one is OpenAI, and the other is Anthropic. Anthropic has gained a certain level of influence through CC.

If you want to learn more about the use cases for CC, I recommend reading Anthropic’s Official Blog.

Additionally, here are some supplementary resources:

https://mp.weixin.qq.com/s/gk0tzMxWZ-NgsUWg5iLoSg

Try Cline

Sun, 06 Jul 2025 10:29:43 +0800

I’ve been a long-time user of Cursor, but I recently discovered that Cline is also incredibly effective, especially when using its tool-calling capabilities via OpenRouter.

Cline is a VSCode extension that performs operations similar to Cursor’s “Agent” mode. If you’re not familiar with Cursor yet, I recommend checking out this introductory video:

What sets Cline apart is the ability to choose your own underlying model. For instance, the popular Kimi K2 can be configured via API. Today (2025-07-06), I tested Cline’s editing capabilities, and the results were much more organized and powerful than I had anticipated.

Transparency and Context

One major advantage of Cline over Cursor is transparency. You can clearly see exactly what context is being sent to the model.

The Test Task

I assigned Cline a task involving Windows APIs and tool usage:

1

Help me create a Windows desktop project written in C# that can configure Windows network settings to either use DHCP or a static IP with gateway and DNS.

The Workflow

The overall process was quite smooth, though I did hit one major roadblock: the application would crash immediately after initialization.

The issue turned out to be null variables during application startup that weren’t properly loaded via the XML manifest.

Ultimately, I used a combination of Cursor and VSCode to fix it. Cursor suggested using Visual Studio’s specialized editor rather than just the dotnet CLI, which allowed me to identify the null variable and resolve the crash.

The Cost Factor

While Cline’s file manipulation is impressive, the cost is a significant concern:

In a single morning of work, I spent $3.17 USD without even completing the task.
Given my usual programming schedule (about 4 days a week), two weeks of using Cline would already exceed the monthly subscription cost of Cursor.
Unless Cline can demonstrate a decisive advantage, Cursor remains the more economical choice for my workflow.

Poor Performance of Large Models on Specific Tasks

Thu, 19 Jun 2025 16:34:32 +0800

Vision large models perform poorly on some specific tasks but perform better with formatted text. Here, I use the localization of meter reading areas as an example to demonstrate the performance of large models.

Source Code

https://github.com/Svtter/vl-model/pull/4

Test Tasks

Extract text boxes from the image.
Extract the meter reading area from the image.

Test File

We can observe the performance differences among various models from these test results:

Test Results Comparison

Results Using Bounding Boxes as Prompts

Detailed Performance of Each Model

Anthropic Claude 3.5 Sonnet

Google Gemini 2.5 Pro

OpenAI GPT-4o

Analysis Summary

From these test results, we can observe:

Differences in Visual Recognition Capabilities: Different models exhibit significant performance variations when handling the same visual task.
Formatted Text Processing: Compared to visual tasks, models perform more stably when processing structured text.
Model Characteristics: Each model has its unique strengths and limitations.

These results remind us to evaluate the suitability of AI models based on specific task types when making selections.

Using uv to publish Python packages

Tue, 03 Jun 2025 15:54:28 +0800

1
2
3


[build-system]
requires = ["setuptools>=42", "wheel", "uv>=0.6.0"]
build-backend = "setuptools.build_meta"

1

uv build

1

python -m twine upload

1
2
3


[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

1
2
3
4
5
6


[tool.pdm]
distribution = true

[tool.pdm.version]
source = "file"
path = "src/spback/__init__.py"

Since migrating from pdm to uv, besides dependency management, I also wanted to use uv for publishing packages.

Method 1

LLMs provided a solution, suggesting to add the following content in pyproject.toml:

After adding this content, we run:

Then run:

The package can then be published.

Method 2

Since there are many projects using pdm, directly modifying pdm can also cause significant inconvenience.

You can still use pdm as the build-system but use uv as the package management tool.

In other words:

even

Some Thoughts

LLMs are already quite powerful. However, LLMs cannot guarantee the accuracy of generated content, requiring human verification. Therefore, the human who verifies the output is essential.

This code must be verified by a human to work. Of course, if it’s merely about modifying content, LLMs can collaborate with us, in the form of a cursor.

2025 Q1 Summary

Sat, 03 May 2025 10:56:16 +0800

New Insights:

Connect knowledge into a network rather than isolated nodes; forming a graph structure allows you to gain new insights from each of your own insights.
Use tools appropriately, don’t blindly pursue programmability and reusability. This is a habit developed as a software engineer, but when solving problems, tailor the approach to the specific issue.

Previous Phase

Below is a summary of Q1 activities from the perspectives of work, learning, and research.

Research

I originally planned to publish a conference paper in Q1 and follow up on some of the latest research progress, but I found that this task was not actually completed. I believe this was due to preparing the SWR paper. I supplemented a large number of experiments on SWR and rewrote much of the code that I previously considered unreliable. Submitted the SWR paper.
I refactored part of the meter-viewer content; preparing to reintroduce new methods for building more suitable datasets.
I reimplemented SWR based on Lightning.
Considered domain adaptation models for meter recognition problems.
Explored some active learning algorithms.

Work

Further design and foundational thinking on meterhub.
- Spent most of January improving meterhub.
- Feedback features.
- Login with email.
Released django-login-mail v0.6.1.
Started a new project: githubManager.
Upgraded the Hugo theme and update process for HIGH, using the stack theme and corresponding GitHub CI.

Learn

Experimented with OpenAI’s SDK and simple calls to LlamaIndex.
Revisited MAE.
Read CLIP code.
OpenManus and Docker.
ComfyUI testing.
MCP server.
CUDA-related problems…
Using OpenRouter.
Sharpened basic programming skills.
- Rethinking about functional programming.
- Where to Put Your Data Folder.md
- Using PDM: https://svtter.cn/p/dynamic-version-in-pdm.md/
Operations and maintenance.
- SSL certificates: https://svtter.cn/p/certbot-self-signed.md/

Plans for the Next Phase

Focus on writing the overall framework for the paper.
- Since I’m not someone who can sit still for long periods, I need to pay extra attention to staying focused.
Attempt to publish a conference paper; the level is not important.
Refine the outline of the graduation thesis and write the details.
Write a new paper.

Deployment of Dify 1.2.0

Tue, 22 Apr 2025 11:20:02 +0800

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43


git clone https://github.com/langenius/dify
cd dify/docker
cp .env.example .env
docker compose up -d
```I believe hackers should abandon the idea of building agents from code and fully embrace workflow platforms like Dify. This approach is many times more efficient than writing code. If you must write code, you can develop plugins to embed into Dify.

What is Dify? A workflow platform designed for LLMs.

<script src="/js/repo-card.js"></script>

<!-- inside body, where you want to create the card -->
<div class="repo-card" data-repo="langgenius/dify"></div>

## Deployment Method

Simply execute the following code on your server.## Deployment Issues

Although Dify is an open-source project, being relatively new, it often encounters various unusual problems.

### Plugin Restart Problem

When using Dify 1.2.0, the Dify plugin daemon would continuously restart. Refer to this [issue](https://github.com/langgenius/dify/issues/17788) for details.

> Interestingly, in this issue, the problem was solved by AI.

### Protocols Problem

`http ... https`

Adjust the `FILE_URLS` variable.

## Plugins

To utilize certain features, I developed a Dify plugin for file compression.

<script src="/js/repo-card.js"></script>

<!-- inside body, where you want to create the card -->
<div class="repo-card" data-repo="svtter/filecompress"></div>

## Resource Attribution

- Images sourced from [chatgpt-lab](https://chatgpt-lab.com/n/n12d18abb26c8?gs=a6ed475ccea2)

Work With Langfuse

Mon, 21 Apr 2025 14:51:38 +0800

When developing LLM applications, we consider performance issues during LLM calls and monitor outputs during the process.

At this point, tools like LangSmith and Langfuse become very useful.

However, sometimes we have local computing resources and prefer not to use cloud-based resources for LLM call monitoring, so we might not consider LangSmith.

In such cases, we can use Langfuse for this purpose.

Deployment

Deploying Langfuse is very simple; all you need to do is:

1
2
3


git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

This way, the deployment is successful.

Replacement

If you previously used OpenAI’s SDK, you can continue using it as follows.

Install langfuse in the project:

1

pip install langfuse

To configure the API key, you need to use it in the deployed langfuse:

1
2
3


LANGFUSE_SECRET_KEY=<secret key>
LANGFUSE_PUBLIC_KEY=<public key>
LANGFUSE_HOST="http://localhost:3001"

Here I have set the Langfuse port to 3001; you should adjust according to your own configuration.

Simply replace the original OpenAI configuration:

1
2
3


# remove: import openai

from langfuse.openai import openai

In addition, langfuse also supports langchain and llamaindex, which will not be elaborated on further here.

Thoughts

Coze is also developing a large model agent framework, but the approach is quite different. Coze is building everything, including workflows and LLMs, making it relatively closed.

However, langfuse is more open, allowing the use of langchain and other models.

As a developer from a small company, I prefer the langfuse model because it offers more choices. However, if the project timeline is tight and Coze is barely usable, I would choose Coze.

Issues

An exception occurred when I replaced the OpenAI SDK:

1

Unexpected error occurred. Please check your request and contact support: https://langfuse.com/support.

I still encountered issues when testing test_langfuse.py:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31


import os

from langfuse.decorators import observe
from langfuse.openai import openai


@observe()
def story():
 return (
 openai.chat.completions.create(
 model="moonshot-v1-auto",
 max_tokens=100,
 messages=[
 {"role": "system", "content": "You are a great storyteller."},
 {"role": "user", "content": "Once upon a time in a galaxy far, far away..."},
 ],
 )
 .choices[0]
 .message.content
 )


@observe()
def main():
 return story()


def test_langfuse():
 assert os.getenv("OPENAI_BASE_URL") is not None
 assert os.getenv("OPENAI_API_KEY") is not None
 main()

Regarding this issue, I have opened a discussion.

Additionally, if you wish to view the original code, you can obtain it from https://github.com/svtter/pdf-reader.

Draw Subfig and Making Subplot

Tue, 08 Apr 2025 10:17:09 +0800

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73


import matplotlib.pyplot as plt

plt.subplot(nrows, ncols, index)

```python
# 创建子图
plt.figure(figsize=(10, 6))

```python
import matplotlib.pyplot as plt
import numpy as np

# 创建数据
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)

# 创建子图
plt.figure(figsize=(10, 6))

# 第一个子图
plt.subplot(2, 2, 1) # 2行2列的第1个
plt.plot(x, y1, label="sin(x)")
plt.title("Sine Wave")
plt.legend()

# 第二个子图
plt.subplot(2, 2, 2) # 2行2列的第2个
plt.plot(x, y2, label="cos(x)")
plt.title("Cosine Wave")
plt.legend()

# 第三个子图
plt.subplot(2, 2, 3) # 2行2列的第3个
plt.plot(x, y3, label="tan(x)")
plt.title("Tangent Wave")
plt.legend()

# 第四个子图
plt.subplot(2, 2, 4) # 2行2列的第4个
plt.plot(x, y1 + y2, label="sin(x) + cos(x)")
plt.title("Sum of Sine and Cosine")
plt.legend()

# 显示图形
plt.tight_layout() # 自动调整子图间距
plt.show()

```python
import torchvision.utils as vutils

# 将图片制作成网格
grid_img = vutils.make_grid(x, nrow=4, padding=2)

# 可视化网格图片
plt.figure(figsize=(10, 10))
plt.imshow(grid_img.permute(1, 2, 0)) # 调整通道顺序以适应 matplotlib 的要求
plt.axis('off')
plt.show()j
```Creating subplots for image browsing offers great flexibility, but I often forget how to use them. I’m writing this blog specifically to reinforce my memory.

## Drawing Subplots

First, to draw a subplot at a certain position, you need to call the `plt.subplot` method.
They represent the row, column, and subplot index, respectively. I often used to forget that the index refers to the subplot number.

Before drawing subplots, you must first create them.## Example
Next, here is a complete example.### Preview

![](Figure_1.png)

If you want to view images with a shape of (32, 1, 192, 32), you can also use the utility functions provided by torchvision.

Diffusion Model

Sat, 05 Apr 2025 21:51:38 +0800

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17



noise image -> -------------
 | |----> cleared image
step (int) -> | denoiser |
 | |
 --------------|
```The charm of deep learning lies in the fact that once a new task achieves improved performance with a certain architecture, many other tasks can refer to this architecture and benefit from it.

I believe the diffusion model is a typical example. Although I do not conduct research on diffusion models and currently have no related projects, there is no harm in understanding this network architecture.

The diffusion model is one that benefits from the image processing process.

By learning the reverse process of adding noise to images, the diffusion model acquires the ability to generate images from noise.

![noise-dog](noise-dog.png)

To enable the model to achieve better performance, the denoising step of the model is included as one of the inputs.

Using uv

Sun, 30 Mar 2025 14:33:34 +0800

Recently, I’ve started using uv extensively instead of pdm.

knowledge piece

uvx could replace pipx.

The uvx command invokes a tool without installing it.

For example, to run ruff

1

uvx ruff

A Docker Image for Computer Vision

Wed, 26 Mar 2025 19:57:22 +0800

When debugging deep learning code, we often face headaches due to environment issues.

To facilitate debugging, packaging environments like PyTorch and CUDA into a Docker image is an excellent choice.

Why?

Time-saving: Repeatedly configuring and adjusting versions wastes time, leading to spending a lot of effort on ops tasks.
Environment stability: Once a Docker image is built, it is static and can be pulled directly.
Easy migration: Pre-configured environments can be migrated across different machines.

How to Build

Here is an example Docker image for packaging a deep learning environment:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


# Change to your desired pytorch version
FROM pytorch/pytorch:2.4.1-cuda11.8-cudnn9-devel

# These are commonly used packages
RUN apt-get update && apt-get install git zsh ffmpeg libsm6 libxext6 -y && apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Place at the root of the codebase to install requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt

# install jupyterlab
RUN pip install jupyterlab
# COPY . .

# Use jupyterlab to host, can start quickly, token is `yourtoken`. If you use it on the public network, consider using a more complex token.
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root", "--NotebookApp.token=yourtoken"]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


services:
 notebook:
 build:
 context: .
 dockerfile: Dockerfile
 volumes: # You can also mount the dataset you need
 - .:/app
 - ~/.ssh:/root/.ssh # Support ssh
 ports:
 - 8888:8888
 shm_size: '32gb'
 deploy:
 resources:
 reservations:
 devices:
 - driver: nvidia
 count: all
 capabilities: [gpu]

This example installs some basic libraries, and opencv-python can be installed via pip.

Place the Dockerfile in the directory, then you can start it using docker compose.

The startup command is: docker compose up -d.

Download from Dockerhub

To make it convenient for everyone to use directly, I have packaged this image and uploaded it to Dockerhub. The download command is:

1

docker pull svtter/debian-pytorch

The source code can be obtained from here:

Using on Runpod

For everyone’s convenience, I have created a template on Runpod.

https://console.runpod.io/deploy?template=m0shpm3vgg&ref=g5qp1x9x

You can directly use this image by using this template.

Read Code of CLIP

Wed, 19 Mar 2025 13:23:50 +0800

Contrastive Language-Image Pre-Training (CLIP) is one of the classic works from OpenAI, originating from the paper .

To implement my new idea based on CLIP, I attempted to read openai/clip to understand the fundamental working principles of CLIP in classification.

Here is the Python sample code provided by openai/clip:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
 image_features = model.encode_image(image)
 text_features = model.encode_text(text)

 logits_per_image, logits_per_text = model(image, text)
 probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs) # prints: [[0.9927937 0.00421068 0.00299572]]

The load function is used to load a specific OpenAI model. This is based on ViT-B/32, a Vision Transformer 32B.

It can be seen that the vision encoders supported by OpenAI roughly include the following types:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


_MODELS = {
 "RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
 "RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
 "RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
 "RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
 "RN50x64": "https://openaipublic.azureedge.net/clip/models/be1cfb55d75a9666199fb2206c106743da0f6468c9d327f3e0d0a543a9919d9c/RN50x64.pt",
 "ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
 "ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
 "ViT-L/14": "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt",
 "ViT-L/14@336px": "https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt",
}

Assuming the model has already been downloaded, let’s examine how the _transform preprocessing works:

1
2
3
4
5
6
7
8


def _transform(n_px):
 return Compose([
 Resize(n_px, interpolation=BICUBIC),
 CenterCrop(n_px),
 _convert_image_to_rgb,
 ToTensor(),
 Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
 ])

It’s not overly complex, though the preprocessing Normalize parameters are not entirely clear. It seems to use the same preprocessing parameters as ViT.

Then, moving into the model loading phase, we can see that if it’s not jit loading, the model will opt for the state_dict mode.
Through the process of loading the state_dict, we can observe that the build_model function is used to load the weights and assign them to the existing model structure.

The file for this model structure is model.py.
Therefore, the main code for CLIP is located at model.py#L243.

The outputs of the image_encoder and text_encoder are two distinct feature tensors.

Performing a matrix multiplication on these two tensors yields a similarity matrix. The size of this similarity matrix is (batch_size, batch_size).

TIPS: If the batch size is too small, such as 1, the performance of contrastive learning may be significantly diminished.

These two tensors are computed using symmetric cross-entropy loss to update the network weights.

Specifically focused on improving intelligence metrics, without much concern for computational cost. Not pursuing the latest or highest intelligence metrics, but more focused on the computational efficiency of the model.

Trick: Adding a log to the parameters to make weight updates less drastic and reduce computational intensity.

The CLIP code does not provide directly trainable code. In the next article, we’ll attempt to read openclip.

Create a Never Stop Container

Fri, 14 Mar 2025 16:45:58 +0800

Sometimes we need to start a container that does not stop, for debugging our application or using devcontainer.

If we want to accomplish this in the Dockerfile, we can add the following:

1
2
3
4
5


...

# 其他内容

ENTRYPOINT ["tail", "-f", "/dev/null"]

If it’s docker-compose.yml, we can do it like this

1
2
3


services:
 your-app:
 entrypoint: ["tail", "-f", "/dev/null"]

This way, the container will not stop.

RAG with LlamaIndex and Ollama

Sun, 09 Mar 2025 12:44:24 +0800

If you want to build a RAG system locally, we can use ollama as the base model and llamaindex to construct the agent.

Since llamaindex defaults to using OpenAI, we first need to adjust the default embedding model and LLM model.

1
2


 Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1])
 Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0)

The base_url needs to be replaced with your own ollama instance, such as http://localhost:11434.

If the files in the directory are all txt or md data, you can directly use SimpleDirectoryReader to read the basic data.

1
2


 # Create a RAG tool using LlamaIndex
 documents = SimpleDirectoryReader("data").load_data()

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40


from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.ollama import OllamaEmbedding

def get_agent(model_name: str):
 Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1])
 Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0)

 # Create a RAG tool using LlamaIndex
 documents = SimpleDirectoryReader("data").load_data()
 index = VectorStoreIndex.from_documents(documents)
 query_engine = index.as_query_engine()


 async def search_documents(query: str) -> str:
 """Useful for answering natural language questions about an personal essay written by Paul Graham."""
 response = await query_engine.query(query)
 return str(response)


 agent = FunctionAgent(
 name="Agent",
 description="Useful for multiplying two numbers and searching documents",
 tools=[multiply, search_documents],
 llm=ollama,
 system_prompt="You are a helpful assistant that can multiply two numbers and search documents to answer questions",
 )
 return agent

async def main():
 models = ('bge-m3', 'nomic-embed-text',)

 for model_name in models:
 print(f'model: {model_name}')
 agent = get_agent(model_name=model_name)
 response = await agent.run("What did the paul graham do in college? Also, what's 7 * 8?")
 print(str(response))
 print("Done.")
 print('-' * 100)

await main()

Update the Hugo

Fri, 07 Mar 2025 11:39:32 +0800

Since the previously used Hugo version was too low and updating it would require significant effort, I have now updated Hugo, allowing me to focus solely on writing articles.

The new theme I am currently using is hugo-theme-stack.

Because my Hugo source files and <username>.github.io are not in the same repository—meaning I cannot directly configure gh-pages using a branch—I have adjusted the workflow to suit my situation. Here is my workflow configuration file:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53



name: Deploy to Github Pages

on:
 push:
 branches: [master]
 pull_request:
 branches: [master]

jobs:
 build:
 runs-on: ubuntu-latest

 permissions:
 # Give the default GITHUB_TOKEN write permission to commit and push the
 # added or changed files to the repository.
 contents: write

 steps:
 - uses: actions/checkout@v4
 with:
 fetch-depth: 0

 - name: Cache Hugo resources
 uses: actions/cache@v4
 env:
 cache-name: cache-hugo-resources
 with:
 path: resources
 key: ${{ env.cache-name }}

 - uses: actions/setup-go@v5
 with:
 go-version: "^1.17.0"
 - run: go version

 - name: Setup Hugo
 uses: peaceiris/actions-hugo@v2
 with:
 hugo-version: "latest"
 extended: true

 - name: Build
 run: hugo --minify --gc

 - name: Deploy
 uses: peaceiris/actions-gh-pages@v3
 with:
 personal_token: ${{ secrets.
ACCESS_TOKEN }}
 external_repository: svtter/svtter.github.io
 publish_branch: master
 publish_dir: ./public

Openrouter Usage

Mon, 03 Mar 2025 11:45:12 +0800

Zhou Tian developed an application based on a large model using OpenRouter and encountered some issues, documenting a few insights.

No Support for Embeddings

The biggest issue is the lack of support for the embedding API. Although OpenRouter already supports API endpoints for various models like OpenAI, embeddings are crucial for developing RAG applications. The absence of embedding support renders OpenRouter ineffective in practical application development.

Python Timezone

Fri, 28 Feb 2025 17:46:29 +0800

Regardless of the current server settings, output the time in Asia/Shanghai.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


import datetime
import pytz

utc_now = datetime.datetime.utcnow() # Get current time in UTC
utc_timezone = pytz.utc
utc_now = utc_timezone.localize(utc_now) # Localize the time as UTC

# Convert to another timezone, e.g., 'America/New_York'
new_timezone = pytz.timezone('Asia/Shanghai')
new_timezone_time = utc_now.astimezone(new_timezone)

print(new_timezone_time.strftime('%Y-%m-%d %H:%M:%S %Z%z')) # Display time in the new timezone

Where to Put Your Data Folder

Mon, 24 Feb 2025 14:34:56 +0800

When training models, we should place data and code in the same location as much as possible.

Keeping them in the same location helps avoid path-related issues, such as needing to specify absolute paths for the data.

For example, if I set the path to ./data/, I only need to place the data in the ./data directory.

I can do this by:

1

ln -s $(source-path-of-dataset) ./data

To link data from other locations.

If on the same host, git can automatically synchronize these links.

However, if on different hosts, you need to manage them yourself.

Easy Zshrc Configuration

Sat, 15 Feb 2025 21:11:14 +0800

I used to frequently install oh-my-zsh on servers, but sometimes the network connection was poor, making the installation quite troublesome. In such cases, what I needed was a simple zsh configuration that just worked.

In addition to highlighting ls, docker compose has been configured.

If fzf is installed, you can also configure zshenv to enable fzf.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34


# 设置编辑器
export EDITOR="vim"

# 设置提示符
PROMPT='%F{blue}%n%f@%F{green}%m%f %F{cyan}%~%f %# '

# 别名定义
alias ls='ls --color=auto'
alias ll='ls -la -G'
alias c='clear'
alias dc='docker compose'

HISTFILE="$HOME/.zsh_history"
HISTSIZE=10000000
SAVEHIST=10000000
setopt BANG_HIST # Treat the '!' character specially during expansion.
setopt EXTENDED_HISTORY # Write the history file in the ":start:elapsed;command" format.
setopt INC_APPEND_HISTORY # Write to the history file immediately, not when the shell exits.
setopt SHARE_HISTORY # Share history between all sessions.
setopt HIST_EXPIRE_DUPS_FIRST # Expire duplicate entries first when trimming history.
setopt HIST_IGNORE_DUPS # Don't record an entry that was just recorded again.
setopt HIST_IGNORE_ALL_DUPS # Delete old recorded entry if new entry is a duplicate.
setopt HIST_FIND_NO_DUPS # Do not display a line previously found.
setopt HIST_IGNORE_SPACE # Don't record an entry starting with a space.
setopt HIST_SAVE_NO_DUPS # Don't write duplicate entries in the history file.
setopt HIST_REDUCE_BLANKS # Remove superfluous blanks before recording entry.
setopt HIST_VERIFY # Don't execute immediately upon history expansion.
setopt HIST_BEEP # Beep when accessing nonexistent history.

# 启用 fzf 相关功能
[ -f /usr/share/doc/fzf/examples/key-bindings.zsh ] && source /usr/share/doc/fzf/examples/key-bindings.zsh
[ -f /usr/share/doc/fzf/examples/completion.zsh ] && source /usr/share/doc/fzf/examples/completion.zsh

setopt no_nomatch

1
2
3
4



# 启用 fzf 相关功能
[ -f /usr/share/doc/fzf/examples/key-bindings.zsh ] && source /usr/share/doc/fzf/examples/key-bindings.zsh
[ -f /usr/share/doc/fzf/examples/completion.zsh ] && source /usr/share/doc/fzf/examples/completion.zsh

Cuda and Paddle

Tue, 11 Feb 2025 15:41:18 +0800

Configuring a deep learning environment is a hurdle many struggle to overcome. However, with large models, troubleshooting and pinpointing issues can be significantly faster.

I spent some time adapting an older version of PaddlePaddle and finally got it working. Here, I’ll share an article documenting the process.

In Docker images, many CUDA 11-based images fail to run in a CUDA 12 environment. The exact reason isn’t entirely clear to me. In such cases, you can opt for a CUDA version that matches the major release.

To avoid affecting the environments of others on the server, do not update the NVIDIA driver. Instead, install your own CUDA version and modify the environment variables to change the system’s CUDA.

1
2
3
4


# CUDA_VERSION=11.7
export CUDA_HOME="/usr/local/cuda-$CUDA_VERSION"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export PATH=$CUDA_HOME/bin:$PATH

Apply this environment variable, then check nvidia-smi to see the version change.

Change Network of Debian

Mon, 10 Feb 2025 20:49:27 +0800

Sometimes we want to modify the default IP address and DNS server to achieve better network performance.

For Debian, modify two files: /etc/network/interfaces and /etc/resolv.conf.

Regarding interfaces:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug ens18
iface ens18 inet static
address 192.168.2.35
netmask 255.255.255.0
gateway 192.168.2.60

When the gateway is 192.168.2.60;

If DNS configuration is also needed, modify /etc/resolv.conf

1

nameserver 192.168.2.60

Disable IPv6

/etc/sysctl.conf

1

net.ipv6.conf.all.disable_ipv6 = 1

Don’t forget to restart the network: systemctl restart networking.

Using git to manage logseq files

Sat, 08 Feb 2025 18:18:29 +0800

Since I frequently use multiple devices simultaneously, I often encounter the issue of writing articles on one device and then continuing to use Logseq on another. Simply copying and pasting feels cumbersome.

I have roughly divided the solution to this problem into two stages. The first stage involves using a portable hard drive to directly copy data between different systems. During this stage, I utilized a Git bare repository.

Stage 1

git init --bare logseq-database.git

Then, in other working Git repositories, I added a remote. For example, if my disk path is E:\:

git remote add origin E:\logseq-database.git

This way, I could synchronize Logseq data directly between different devices.

Stage 2

I found using a portable hard drive still inconvenient. I repurposed a 10-year-old ThinkPad, installed Gitea on it. Initially, I used Gogs, but Gogs had unfriendly handling of submodules and inexplicable bugs. Therefore, I ultimately switched to Gitea. After setting up Gitea, I migrated the original Git repository to my local machine. For example: http://gitea.local/svtter/logseq-database.git.

Stage 3

I discovered that storing and sharing large files often caused issues. I added support for git-lfs by running:

git lfs install

and

git lfs track *.pdf

to prevent large files from leaving too much data in the .git directory.

Stage 4

Periodically back up part of the data to GitHub. However, I generally no longer do this.

When using Git for merging, one must be cautious about an issue: if a file name changes, automated merging will simply ignore it. Here’s how it typically happens: I modify a file on two devices simultaneously, and on one device, not only is the content changed, but the file name is also altered. Then, both devices perform a Git commit separately. As a result, when Git performs the merge, it won’t prompt a conflict. After Git’s automated merge, the modifications made on one of the devices will disappear.

To address this issue, the approach is to use the rebase method as much as possible during merging. However, rebasing is slow when merging files and requires a lot of manual handling.

Therefore, when modifying the same file on two devices simultaneously, first pull the remote changes. Second, avoid changing file names whenever possible. Otherwise, changes may be lost.

Fortunately, Git commit history is always preserved. If all else fails, retrieve the lost parts from the commit records and add a new commit.

Browsing and Storing Image Datasets

Sun, 12 Jan 2025 18:31:12 +0800

Browsing datasets can be quite troublesome, especially when the dataset is large.

npy (numpy array) and h5 files are two common data storage formats.
The drawback of h5 files is that they are prone to data corruption. I have encountered issues multiple times where h5 files could not be opened.
npy files have clear advantages in terms of read speed and file transfer. However, they are loaded entirely into memory at once, which can easily cause memory overflow if the server is not powerful enough.

Common image datasets typically separate labels and images, such as COCO. This allows you to use a file browser to view images and quickly observe their characteristics. However, in most cases, we don’t view images on a local computer but rather work with datasets on a server.

In 2024, when working with PyTorch, I find it more convenient to directly plot images using matplotlib. Matplotlib is generally used to display a single image, but using subplots allows you to display multiple images simultaneously. If OpenCV is used, you can overlay some label values onto the images. However, there is a drawback: if you are working on a remote server, transferring generated images can consume significant bandwidth.
Ultimately, the choice of method depends on your own judgment!

Documenting a WordPress Migration

Fri, 15 Nov 2024 17:16:45 +0800

Previously, WordPress was running on CentOS 7; the performance of this machine was often underutilized, so some migration was needed to improve performance. To avoid losing relevant data, the WordPress migration work was carried out. This article documents the WordPress migration process.

To minimize the time spent on backups, I first used the WordPress plugin, All-in-one WP migration. This plugin can back up plugins, articles, themes, and other plugins.

When backing up the old website, I downloaded the generated backup file.

When creating the new website (via Coolify), the file upload kept failing. I wasn’t sure what was happening.

Subsequently, I modified several upload restrictions.

One of them was .htaccess.

1
2
3
4
5


 php_value upload_max_filesize 200M
 php_value post_max_size 200M
 php_value memory_limit 256M
 php_value max_execution_time 300
 php_value max_input_time 300

Another one is wp-config.php

1
2
3
4
5


 @ini_set( 'upload_max_filesize' , '200M' );
 @ini_set( 'post_max_size', '200M');
 @ini_set( 'memory_limit', '256M' );
 @ini_set( 'max_execution_time', '300' );
 @ini_set( 'max_input_time', '300' );

My backup file is 199MB in size. However, despite adjusting the two files mentioned above, I found that I still couldn’t restore the backup. This left me puzzled. Through console debugging, I discovered that after the upload was completed, the server would return an HTTP 413 error. Later, I found this article and identified the issue.

After troubleshooting, I realized that the problem was actually caused by Cloudflare. The free Cloudflare plan does not support file uploads larger than 100MB, resulting in an HTTP 413 error.

Subsequently, I configured my local hosts file to allow the domain to directly connect to the server’s real IP address.

Finally, it succeeded.

2024 Q3 Summary

Mon, 14 Oct 2024 11:08:56 +0800

This is a summary for July 2024 to September 2024.

This quarter passed quite quickly; it doesn’t seem long since the last review. There truly is a gap in intelligence between people; I feel like I’m a not-very-hardworking fool.

When conducting a review, try not to modify the content of the journal as much as possible. Otherwise, it will be troublesome to trace back later.

Life #life

Most of my energy during this period has been focused on taking care of my pregnant wife. The child was born safely, and the mother is doing well and happy 😆. For friends preparing for pregnancy and childbirth, I recommend this book: Pregnant with My Wife. Pregnancy is quite challenging for both men and women. Influenced by hormones, emotions can often fluctuate easily, and the man needs to balance work and family well.

Financially, prepare sufficient funds and budget for family expenses in advance. If your job is not stable enough, I don’t recommend rushing to have children. Although the old saying goes, “Start a family before establishing a career,” starting a family doesn’t mean having children immediately. During pregnancy, the emotional needs of the woman will be higher, so be mentally prepared. It’s best to seek advice from slightly older friends or relatives. Also, pay special attention to the location for childbirth. The ideal situation is to have your own independent family. Otherwise, you might have to deal with a series of complicated matters, which can be exhausting. Adding work pressure on top of that makes it even more difficult. In short, if conditions aren’t right, don’t have children; don’t make things hard for yourself. If you’re unhappy, your family will likely be unhappy too.

Read Rich Dad Poor Dad 2 but couldn’t finish it. Made a small profit in stocks and cashed out; couldn’t keep up with the market trend anymore. In investment and financial management, too many people want to make quick money, so designing strategies directly aimed at quick profits can make money. Leverage is absolutely not advisable; it’s just used to deceive gamblers.

I wonder how the friend who borrowed money to buy stocks a couple of days ago is doing.

Research #Research

Although a new network was constructed, it couldn’t be transformed into a publishable paper.

Experiments and New Research

New experiments verified that CRNN is still the best method in certain scenarios; formed an article specifically discussing CTC. However, it’s not yet at the level for publication.

Tried many multimodal approaches, different encoding methods, word embedding, and one-hot. Spent a lot of time learning about transformers. Completed querynet, a new structure for solving multimodal problems. However, it still hasn’t solved the problem I proposed.

In terms of mid-term progress, the content is insufficient. Although I’ve done many experiments now, relatively few can be solidified into theories, and the overall coherence is lacking. This is not very consistent with the original plan.

Regarding submissions, just received feedback on the new paper SWR; the new paper has been rejected and is still in rebuttal. But overall, it’s still okay.

I feel that the current research progress is fine, but there haven’t been significant achievements.

Some Thoughts

If researching a field, spend time exploring, organizing materials, and increasing understanding of the industry.
Organizing thoughts #thinking

Extensively adopted functional programming
Deeply explored the React framework, understood useState, useEffect, and fixed many bugs in the framework.
Deeply explored react-router.

Developed a relatively universal front-end and back-end underlying framework. Front-end based on React, back-end based on Django Ninja.
- After so many years of development, I’ve finally solidified something. By using this framework, many problems brought by the native Django and React frameworks can be avoided. This is a kind of account for all these years.
- The framework still has many areas for improvement, with the biggest help being efficiency improvement. If there’s an opportunity, it will be open-sourced.

Next Quarter

Adjust SWR well, submit it, and strive to get one article accepted.
Find a way forward for querynet and the new research problem proposed, respectively.
Launch meterhub.

On top of completing the above, finish the projects at hand and then organize the graduation thesis.

Posts on Svtter's Blog

OMP M3 Model Patch: Adding MiniMax M3 to pi-ai

Target File

Provider Entries Added (5)

1. minimax (Official Anthropic-compatible API)

2. minimax-cn (Official Anthropic-compatible API, China)

3. minimax-code (Coding Plan, OpenAI-compatible)

4. minimax-code-cn (Coding Plan CN)

5. openrouter (OpenRouter Passthrough)

Verification

Caveats

Addendum (2026-06-02): The proper route via OMP user config

Final config

Key choices

Switching to it

How the two routes compose

How kimi-code Handles kimi-k2.6: A Comparison with OpenCode

1. Native Kimi Provider (Not a Generic OpenAI-compatible Layer)

2. Full Lifecycle Handling of reasoning_content

3. JSON Schema Normalization (kimi-schema.ts)

4. Native Thinking Mode Configuration System

5. Native Moonshot Service Integration

6. Tool Call Layer Details

7. CLI Core and LLM SDK Architectural Isolation

8. What Commit History Reveals About Evolution Paths

kimi-code: Native Design, Continuously Reducing Configuration Burden

OpenCode: Generic Layer Design, OpenAI-centric

A Hard Fact

Commit History Comparison Summary

Summary: Core Differences

RoboOmp: An AI Bot That Creates Its Own Pull Requests

The Origin

RoboOmp’s First Response

I Pointed Out the Design Flaw

It Opened a PR Itself

Why This Shocked Me

Complete Closed Loop

It Knows When to Wait

The Fix Was Minimal

What Is RoboOmp

Are There Similar Open Source Projects

What This Means

OpenCode's GitHub Actions Automation System: Engineering Practices Behind 27 Workflows

Overview

CI/Testing: Solid but Restrained

Release/Delivery: Full Platform Coverage

Automation/Bot: AI-Driven Community Governance

Issue Management

PR Management

AI Code Review

Documentation & Maintenance

Design Patterns Worth Adopting

1. Layered Governance

2. Compliance Grace Period

3. AI at Decision Points, Not Execution Points

4. Explicit vs Automatic Triggers

Risks of Over-Automation

Insights for Our Projects

Summary

OpenCode Optimization Beyond Configuration — Plugin-Based Optimization

Problem: Who Guards Code Quality Within a Session?

Design of opencode-review

Multi-Dimensional Analysis

Severity Grading

Auto-Fix Chain

Hidden Benefit of Auto-Review: Continuous Improvement of Code Quality Baseline

Cooldown Mechanism

opencode-froggy: Another Approach

Comparison

Plugin Installation

Other Notable Plugins

References

A New Blog Format

sth: An HTML Preview Server for AI Agents

Why This Tool Is Needed

What sth Does

My Actual Usage

Metadata Management

Technical Details

Open Source

1. `minimax` (Official Anthropic-compatible API)

2. `minimax-cn` (Official Anthropic-compatible API, China)

3. `minimax-code` (Coding Plan, OpenAI-compatible)

4. `minimax-code-cn` (Coding Plan CN)

5. `openrouter` (OpenRouter Passthrough)

2. Full Lifecycle Handling of `reasoning_content`

3. JSON Schema Normalization (`kimi-schema.ts`)