AI on Svtter's Blog

RoboOmp: An AI Bot That Creates Its Own Pull Requests

Sat, 23 May 2026 18:00:00 +0800

Yesterday at the Oh My Pi (OMP) repository, I experienced something shocking: an AI bot didn’t just reply to my issue—it understood the problem, dug through the source code on its own, and opened a precise PR to fix the bug. The entire process took less than 5 minutes.

The Origin

When using OMP (a terminal AI coding agent), I discovered a UX issue: Ctrl+T can hide thinking blocks, but hiding them simultaneously turns off extended thinking entirely—not just hiding the display, but the model stops thinking altogether. Users assume they’re just “turning off the display,” but the actual effect is “turning off the brain.”

So I went to the OMP GitHub repository and opened a feature request: #1313.

RoboOmp’s First Response

Seconds after I submitted the issue, a bot called roboomp automatically replied. Not with template nonsense like “thanks for your feedback, forwarded to the product team.” It directly told me:

Most of this feature already exists—the hideThinkingBlock setting, Ctrl+T shortcut, and rendering path
The only missing piece is a CLI startup parameter
There’s a design decision that requires maintainer input: the coupling between hideThinkingBlock and hideThinkingSummary

And it provided exact filenames and line numbers: settings-schema.ts:663, input-controller.ts:755, stream.ts:583,697.

This wasn’t cobbled together from search results—it actually read the code.

I Pointed Out the Design Flaw

I replied with a comment explaining that this coupling is a footgun:

Users press Ctrl+T intending to reduce visual noise, but unknowingly turn off extended thinking, degrading model output quality
“Don’t want to see the reasoning process” and “don’t want the model to reason” are two different things that shouldn’t be tied together
The behavior varies across providers (MiniMax can’t turn it off, Anthropic/OpenAI can), so the same shortcut has inconsistent behavior

I also included the commit history that introduced this coupling for easier tracing.

It Opened a PR Itself

Then something unbelievable happened—roboomp replied with two consecutive comments and directly opened a PR: #1314.

The PR changes: 0 addition, 3 deletion. It only deleted three lines:

sdk.ts:1860 — agent initialization no longer assigns hideThinkingBlock to hideThinkingSummary
input-controller.ts:758 — Ctrl+T handler no longer links them
selector-controller.ts:273 — settings UI follows the same logic

The PR description included complete repro steps, root cause analysis, and fix approach. It even confirmed the commit archaeology I provided—45bd444 was indeed the commit that introduced this bug.

Why This Shocked Me

“AI can write code” isn’t news. Copilot, Claude Code, Cursor can all write code. But what’s different this time:

Complete Closed Loop

The entire process was zero-human:

I opened an issue → bot read the codebase, provided existing implementation status
I pointed out the design flaw → bot understood my point
It located the commit that introduced the bug itself, opened a PR that deletes just 3 lines

From issue to PR, no human did anything in between.

It Knows When to Wait

In its first reply, it said “Holding on implementation until a maintainer weighs in on the coupling question”—it knew this was a design decision requiring judgment and shouldn’t act autonomously. But when I clarified the coupling problem, it determined that waiting was no longer necessary and opened a PR directly.

The Fix Was Minimal

0 addition / 3 deletion. It understood what the minimal fix was—no refactoring, no abstraction, no gold-plating. Many human developers can’t do this.

What Is RoboOmp

RoboOmp is an AI bot deployed by can1357, the OMP repository maintainer. It’s not a GitHub Actions workflow (I checked the CI config to confirm), but an independent server-side agent:

Listens to GitHub Webhook events (issue creation, comments, etc.)
Reads source code through GitHub API, understands code structure
Uses LLM to analyze context, autonomously decides next steps—comment, label, open PR

From can1357’s GitHub profile, this person comes from a hypervisor/reverse engineering background (ByePg, NoVmp, NtRays), now working on AI agent platforms (agentx, hindsight). RoboOmp is likely the result of building exceptionally deep code understanding capabilities.

This project is not open source.

Are There Similar Open Source Projects

I looked around, and currently the closest ones are:

Project	Description
optio (962⭐)	AI coding agent workflow orchestration, task → merged PR
claude-code-github-agent	Hooks 40+ GitHub events, auto triage/review/fix, architecture most similar to roboomp
software-factory	Issue/PR-driven automatic development system

But honestly, none reach roboomp’s level. Most are still at the “receive webhook → call LLM → post comment” stage. RoboOmp is the first I’ve seen that can autonomously read source code, understand code structure, participate in design discussions, and make precise fixes.

What This Means

This made me realize that the capability boundaries of AI coding agents are expanding rapidly. A year ago we were discussing “can AI write correct code,” now the question is “can AI be a maintainer in open source communities.”

The capabilities roboomp demonstrated—reading code, understanding context, participating in discussions, making minimal fixes—are essentially what a junior maintainer does. If this capability continues to improve, the maintenance model of open source projects could undergo fundamental changes.

Think about it: what does an open source maintainer spend the most time on every day? Replying to issues, triaging bugs, writing small fixes. These are exactly what roboomp excels at. If every open source project could deploy such a bot, maintainers could focus their time on architectural decisions and community building.

Of course, current limitations are obvious—it can only handle problems with clear boundaries and well-defined scope. But this experience makes me believe that “AI maintainer” is not a distant future scenario, but something happening right now.

OpenCode's GitHub Actions Automation System: Engineering Practices Behind 27 Workflows

Fri, 22 May 2026 10:00:00 +0800

opencode is a 160k-star AI coding tool with 27 workflow files in its .github/workflows/ directory. This number is not uncommon for open source projects, but what’s truly interesting is not the quantity, but the scope these workflows cover: from conventional CI/CD to AI-driven community governance, they’ve done almost everything GitHub Actions can do.

This article analyzes the design of these workflows by category, discusses the pros and cons of this level of automation, and shares insights for our own projects.

Overview

The 27 workflows can be divided into four categories:

Category	Count	Purpose
CI/Testing	4	typecheck, unit tests, e2e, Nix builds
Release/Delivery	5	CLI release, container builds, VS Code extension, GitHub Action release
Automation/Bot	16	issue governance, PR compliance, AI code review, documentation updates
Docs/Other	2	statistics, Discord notifications

16 automation workflows account for 60% of the total. opencode doesn’t just use Actions to run tests and releases—it also entrusts community governance and code quality review to the automation system.

CI/Testing: Solid but Restrained

Four testing-related workflows:

typecheck.yml — Runs bun typecheck on PR and push to dev. Simple and direct, no unnecessary actions.

test.yml — Cross-platform test matrix (Linux + Windows), runs unit tests and Playwright e2e. Has concurrency control where new commits in the same PR cancel old runs. Test results generate JUnit reports uploaded as artifacts.

nix-eval.yml — Verifies Nix flake builds on four architectures (x86_64-linux, aarch64-linux, x86_64-darwin, aarch64-darwin). Mandatory package failures block the build, optional package failures are just warnings.

storybook.yml — Storybook builds for UI components, only triggered when storybook/ui-related files change. Path triggering avoids unnecessary runs.

Several noteworthy design choices:

concurrency group + cancel-in-progress: Multiple workflows use this pattern so the same PR doesn’t stack multiple runs. For a project receiving lots of community PRs, this saves significant CI resources.
Path triggering: containers.yml only runs when container files change, storybook.yml only runs when UI changes. Not everything runs on all commits.
Mixed Runner Strategy: Most workflows use Blacksmith’s third-party hosted runners (blacksmith-4vcpu-ubuntu-2404, blacksmith-4vcpu-windows-2025). Blacksmith is a GitHub Actions API-compatible accelerated runner service using custom infrastructure, significantly faster than GitHub’s free runners. Only lightweight bot tasks (close-issues, close-prs, compliance-close, pr-standards, deploy) stay on GitHub’s native ubuntu-latest. Compute-intensive compilation, testing, and releases all go through Blacksmith, simple script tasks use GitHub’s native runners, allocating resources by task load.

Release/Delivery: Full Platform Coverage

publish.yml is the most complex workflow, handling the complete release process in a single file:

Version number calculation
CLI build matrix (multi-platform, multi-architecture)
Windows code signing (Azure Signing)
macOS code signing (Apple Developer)
Electron app builds
npm publishing
GitHub Release creation
AUR (Arch Linux) publishing

One workflow covers distribution for CLI, desktop apps, npm packages, and Linux packages. This “release everywhere at once” pattern is user-friendly—regardless of platform, everyone gets the new version on the same day.

Other release workflows are split by artifact type:

publish-github-action.yml — Listens for github-v* tags, publishes GitHub Action to Marketplace
publish-vscode.yml — Listens for vscode-v* tags, publishes to both VS Code Marketplace and Open VSX
containers.yml — Multi-architecture container image builds, pushes to GHCR
release-github-action.yml — Creates pre-releases when github directory changes on dev branch

Tag triggering is a good practice: releases are explicit actions, not triggered by accidental code pushes. publish.yml automatically builds snapshots when pushing to ci/dev/beta/fix branches, but official releases require manual dispatch or tags.

Automation/Bot: AI-Driven Community Governance

This is opencode’s most distinctive feature. Among the 16 automation workflows, multiple directly call upon opencode’s own AI capabilities to handle community affairs.

Issue Management

triage.yml — When a new issue is created, opencode AI automatically triages it, adding labels and categories.

duplicate-issues.yml — When a new issue is created/edited, opencode AI analyzes whether it duplicates existing issues. Also checks whether it follows one of three issue templates and whether it contains AI-generated content. Non-compliant issues get a needs:compliance label.

compliance-close.yml — Every 30 minutes, checks issues/PRs with needs:compliance label and auto-closes if not fixed within 2 hours. Different prompt messages are given for issues vs PRs when closing.

close-issues.yml — Closes stale issues daily at 2 AM UTC.

These four layers form complete issue lifecycle management:

1

New issue → AI triage → duplicate/compliance check → compliance grace period → stale cleanup

PR Management

pr-standards.yml is one of the longest workflows, doing two things:

Title format check: Enforces conventional commits format (feat/fix/refactor/…)
Template compliance check: PR description must include required sections like issue references, change type, verification method

Non-compliant PRs get a needs:compliance label and auto-close after 2 hours. Team members and bots are exempt.

pr-management.yml — Checks for duplicates when PR is created, adds labels for community contributors.

close-prs.yml — Closes PRs older than 1 month with insufficient reactions daily at 10 PM UTC. Default threshold is 2 reactions, configurable.

AI Code Review

review.yml — Input /review in PR comments, opencode AI analyzes code and leaves review comments on specific lines. Only available to repo owner/members.

opencode.yml — Input /oc or /opencode in issue or PR comments to trigger opencode AI for more general interactions.

These two workflows demonstrate the “AI as collaborator” approach: not fully automatic code review, but on-demand triggering with humans making final decisions in the loop.

Documentation & Maintenance

docs-update.yml — Every 12 hours, checks recent commits and uses opencode AI to determine if documentation needs updates.

generate.yml — Runs code generation scripts when pushing to dev, auto-commits changes.

beta.yml — Syncs beta branch hourly.

stats.yml — Updates download statistics to STATS.md daily.

Design Patterns Worth Adopting

1. Layered Governance

opencode doesn’t stuff all automation into one workflow, but splits it by responsibility. An issue goes through four workflows in relay from creation to closure. Each workflow does one thing, combining to form a complete governance chain.

Benefits of this design:

Individual workflows can be modified or disabled independently without affecting other steps
Each workflow’s trigger conditions and permission scope are minimized
Easy to locate which step has problems when they occur

2. Compliance Grace Period

compliance-close.yml doesn’t close immediately upon detecting non-compliance, but gives a 2-hour grace period. This is reasonable for global contributors in different time zones—you might submit an issue while sleeping, and wake up with time to fix it.

3. AI at Decision Points, Not Execution Points

triage, duplicate detection, and code review all have AI make initial assessments, with humans making final decisions. But execution-level tasks like code builds and releases don’t use AI at all. This is a pragmatic division: AI excels at pattern recognition and initial classification, but not precise execution.

4. Explicit vs Automatic Triggers

Releases use tag triggers, maintenance uses schedule triggers, governance uses event triggers. Three trigger types correspond to three different automation trust levels: releases need human confirmation, maintenance can be scheduled automatic, governance needs immediate response.

Risks of Over-Automation

opencode’s automation system is comprehensive, but there are points to watch:

Community barrier: New contributors submitting issues must follow specific templates, PRs must conform to conventional commits, otherwise auto-closed after 2 hours. For a 160k-star project, this strictness is reasonable—it filters out many low-quality contributions. But for small projects, this level of automation would scare away potential contributors.

Maintenance cost: 27 workflows means 27 automation scripts to maintain. opencode has custom runners and dedicated scripts. If a workflow’s logic needs adjustment, maintainers need to switch between GitHub Actions YAML and custom scripts.

AI uncertainty: duplicate-issues and triage use AI for judgment, but AI can misjudge. A reasonable issue marked as duplicate and closed creates a negative experience for contributors. opencode uses grace periods and manual review to mitigate this, but the risk remains.

Insights for Our Projects

Not every project needs 27 workflows. But opencode’s layered governance and “AI at decision points” approach are worth referencing:

Start with issue templates: If the project starts receiving lots of duplicate or low-quality issues, add templates and duplicate checking first, rather than manually handling each one.
Use grace periods for compliance checks: Always give a grace period when auto-closing non-compliant contributions.
Use AI for classification, not execution: Let AI help triage issues and check PR formats, but don’t let AI auto-merge code or publish releases.
Use tag triggers for releases: This is the safest approach. Automatic snapshot releases are acceptable, official versions need human confirmation.
Add on demand: Add automation only when you have pain points. opencode’s 27 workflows weren’t built in a day, but gradually added as community scale grew.

Summary

opencode’s GitHub Actions system demonstrates automation practices for large-scale open source projects: CI/CD covers full platform releases, community governance uses multi-workflow relay processing, AI is applied to decision points like triage and review. The core of this system is not technical complexity, but three principles: “layered, grace periods, explicit triggers”. For our own projects, we don’t need to copy all 27 workflows, but these principles can be directly applied.

OpenCode Optimization Beyond Configuration — Plugin-Based Optimization

Tue, 19 May 2026 10:00:00 +0800

I previously wrote an article OpenCode Configuration Optimization Record, which addressed token consumption and context management issues. However, configuration optimization handles “how the model runs,” while “the quality of code when it’s half-written” is something configuration cannot manage. This article starts from my development process of the opencode-review plugin, discussing how opencode-review helps an agent review and improve its own code within a session, resulting in higher quality code entering the PR.

Problem: Who Guards Code Quality Within a Session?

When using OpenCode to write code, a typical workflow is: the agent completes coding within a session, then I review the diff and create a PR. But I discovered a recurring problem: code written by agents often enters PRs with “first draft” quality issues.

These issues include: missing error handling, security vulnerabilities, poorly performing queries, and missing tests. If the agent could perform a self-review within the session—before the code is committed to the PR—many problems wouldn’t exist at the PR stage.

This is different from code review at the CI stage. I’ve already implemented CI review through opencode-actions (I previously wrote an introductory article)—it happens after PR creation, triggered by GitHub Actions. Later, Cloudflare also shared similar ideas in their engineering blog: using OpenCode to build large-scale AI code review. opencode-review aims to solve an earlier stage: within the session, before the PR, enabling the agent to proactively review and fix issues after writing code. The two complement each other: opencode-review raises the quality baseline of code entering the PR, while opencode-actions serves as the final checkpoint.

Specifically, there are three sub-problems to address:

Incomplete review coverage: Code generated by agents may introduce security vulnerabilities and performance issues, but they won’t proactively check for these
Lack of systematic review framework: Without structured dimensions to evaluate code, it’s easy to focus only on functional correctness while ignoring security and performance
Lack of closed loop between issue discovery and fixes: Even when the agent discovers problems, a mechanism is needed to automatically fix them rather than waiting for someone to point them out

Design of opencode-review

Based on these three problems, I designed opencode-review: a structured code review plugin.

Multi-Dimensional Analysis

The first design decision is why divide into five dimensions rather than a general “good or bad” evaluation.

Code quality is not a single dimension. A piece of code may be functionally correct and performant, but contain SQL injection vulnerabilities; or it may be secure and harmless, but lack test coverage. Evaluating them together inevitably leads to vague results.

Academically, the Modern Code Review (MCR) Survey collected code review research from 2013-2025, proposing a classification system covering multiple task dimensions including defect detection, security review, performance analysis, and maintainability assessment. Ericsson’s research team also verified in Automated Code Review Using Large Language Models at Ericsson that dimension-specific review is more effective in industrial scenarios than general review.

opencode-review’s five dimensions—code-quality, security, performance, testing, documentation—correspond to the core review dimensions identified in these studies. Each dimension can be independently toggled because different projects focus on different priorities: an internal tool may not need documentation review, but a security-sensitive service cannot skip the security dimension.

Severity Grading

The second design decision is why divide into three severity levels (critical / suggestion / highlight).

This comes from lessons learned in the static analysis tool domain. Security tools and linters have long faced a problem: alert fatigue. When all issues are marked as equally important, developers start ignoring them. Veracode’s research points out that the direct consequence of alert fatigue is that truly serious issues get drowned out in noise.

The logic of three levels is:

critical: Must fix (security vulnerabilities, logic errors, resource leaks)
suggestion: Suggested improvements (code readability, performance optimization, better practices)
highlight: Worth noting (style consistency, potential improvement space)

This way developers can prioritize handling critical issues without missing a SQL injection among a bunch of “consider refactoring” suggestions.

Auto-Fix Chain

The third design decision is why critical issues should automatically trigger fixes rather than just being reported.

This is a controversial design. Traditional review tools typically “report but don’t fix,” leaving fixes to developers. But opencode-review’s scenario is different—the code it reviews is itself just written by an AI agent, so having another agent fix it is reasonable.

Academically, this belongs to the Automated Program Repair (APR) domain. A Survey of LLM-based Automated Program Repair (arXiv 2506.23749) reviewed 63 LLM-based APR systems from 2022-2025, divided into four paradigms. Among them, the “analysis-augmented” paradigm—using static analysis to locate problems first, then using LLMs to generate fixes—was proven most effective. opencode-review’s auto-fix chain is essentially this paradigm: reviewer discovers critical issue → locates problem position → spawns fixer sub-agent → generates minimal fix.

An ICSE 2025 paper also points out that a key challenge for LLMs in APR is objective alignment—the goal of fixing is not “generate code that looks reasonable,” but “precisely solve the reported problem.” This is why opencode-review’s fixer is designed as minimal fix—making only the minimal modifications to solve the problem, no rewriting, no refactoring, no “convenient” other changes.

Hidden Benefit of Auto-Review: Continuous Improvement of Code Quality Baseline

The three designs above solve “discovering problems” and “fixing problems.” But auto-review has an easily overlooked benefit: it continuously raises the baseline of code quality inadvertently.

This effect comes from two mechanisms:

First, the shaping of code writers by review feedback. FSE 2022 research found in two years of industrial practice that when developers know their code will be automatically reviewed, they consciously follow standards more during the coding phase—because the cost of being pointed out afterward becomes lower, and the benefit of writing well upfront becomes higher. This is a nudge effect. In the AI agent scenario, this effect is stronger: the agent writes code in a session, gets reviewed and pointed out issues, fixes them, gets reviewed again—this cycle can complete multiple rounds within the same session. Each round of feedback corrects the agent’s output tendency, equivalent to an implicit fine-tuning process.

Second, direct quality accumulation from automatic fixes. Critical issues being automatically fixed means the code quality of each commit is higher than without review. This isn’t a one-time improvement, but continuous. Like lint rules in a codebase—at first they only prohibit obvious errors, but as rules accumulate, the overall style and quality of the codebase is unconsciously raised. The auto-fix chain does something similar: security vulnerabilities are automatically patched, resource leaks are automatically fixed, missing tests are automatically added. Over time, the codebase’s quality baseline naturally becomes higher than without auto-review.

Simply put: review is not the goal, quality improvement is. Auto-review turns “post-hoc inspection” into “in-process improvement.”

Cooldown Mechanism

There’s one more design detail: cooldown_seconds.

auto-review triggers when the session is idle, but idle events can trigger frequently (for example, when the agent is waiting for user confirmation, it also idles). Without cooldown, the same code might be reviewed several times, wasting tokens. The default 120-second cooldown period is an empirical value—enough for one round of modifications to complete, without waiting too long.

opencode-froggy: Another Approach

opencode-froggy (85 Stars, just released 0.12.0 yesterday) provides another approach. It doesn’t do structured multi-dimensional review, but instead provides 6 specialized agents (architect, code-reviewer, code-simplifier, doc-writer, partner, rubber-duck) and a flexible hooks system.

Froggy’s code-reviewer is a general read-only review agent that doesn’t distinguish dimensions or severity. But its hooks system is strong—you can configure session.idle events to automatically run lint, auto-format, or even intercept when writing sensitive files:

1
2
3
4
5
6
7
8


---
hooks:
 - event: session.idle
 conditions: [hasCodeChange, isMainSession]
 actions:
 - bash: "npm run lint --fix"
 - command: simplify-changes
---

This is a “developer orchestrates the workflow” approach, complementing opencode-review’s “out-of-the-box structured review.”

Comparison

	opencode-review	opencode-froggy
Review method	Structured multi-dimensional analysis	General code-reviewer agent
Severity grading	critical / suggestion / highlight	None
Auto-fix	critical issue → fixer sub-agent	code-simplifier, manual trigger
Trigger method	session idle + cooldown	hooks configuration
Custom rules	custom_rules supports project norms	None
Other features	None	6 agents + hooks + gitingest + blockchain

The two don’t conflict and can be installed together. My suggestion is: opencode-review for daily auto-review, froggy’s hooks for workflow orchestration.

Plugin Installation

The two plugins have different installation methods.

opencode-froggy supports direct installation via npm, just add to opencode.json:

1
2
3


{
 "plugin": ["opencode-froggy"]
}

opencode-review currently doesn’t have npm installation available yet, requires cloning and local linking:

1
2
3
4
5
6
7
8
9


# Clone to any location
git clone https://github.com/sun-praise/opencode-review.git /path/to/opencode-review

# Project-level installation (recommended)
mkdir -p .opencode/plugins
ln -s /path/to/opencode-review/src/index.ts .opencode/plugins/opencode-review.ts

# Or global installation
ln -s /path/to/opencode-review/src/index.ts ~/.config/opencode/plugins/opencode-review.ts

opencode-review also needs to create .opencode/review.json to configure review behavior:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


{
 "language": "zh",
 "dimensions": ["code-quality", "security", "performance", "testing", "documentation"],
 "trigger": {
 "auto_on_idle": true,
 "cooldown_seconds": 120
 },
 "custom_rules": [
 "All API endpoints must have error handling",
 "Database queries must use parameterized statements"
 ]
}

Other Notable Plugins

The ecosystem already has over 70 plugins, here are a few more recommendations:

opencode-worktree: Zero-friction git worktree management
opencode-notify: Send system notifications when tasks complete
dynamic-context-pruning: Automatically prune outdated tool outputs, optimizing token usage
envsitter-guard: Prevent agents from reading .env sensitive files

See the complete list at awesome-opencode.

References

Modern Code Review (MCR) Survey — 2013-2025 code review research survey
Automated Code Review Using LLMs at Ericsson — Industrial practice of LLM-assisted code review
A Survey of LLM-based Automated Program Repair — LLM auto-fix survey, covering 63 systems
Aligning the Objective of LLM-Based Program Repair (ICSE 2025) — Objective alignment issues in LLM fixing
Understanding Automated Code Review Process (FSE 2022) — Two years of industrial environment auto-review experience
AI-Assisted Assessment in Modern Code Review (AIware 2024) — Deployment and evaluation of AutoCommenter
Code Review Agent Benchmark (c-CRAB) — AI agent code review benchmark
opencode-actions - a coding review agent — GitHub Action built on OpenCode, code review at CI stage
Cloudflare: Orchestrating AI Code Review at Scale — Cloudflare using OpenCode to build large-scale AI review

sth: An HTML Preview Server for AI Agents

Sat, 09 May 2026 12:00:00 +0800

I’ve open sourced a small tool: static-html, with the command-line name sth.

What it does is simple: it provides an HTTP service that lets you register locally generated HTML files and preview them in a browser.

Why This Tool Is Needed

The problem stems from AI Agent output.

Nowadays I use agents like Claude Code and OpenCode for my work, and they often need to output complex content—code review summaries, comparative analyses, quotations, architecture design documents. When this content is sent to Telegram as plain text, the formatting gets completely messed up, tables become unreadable, and code syntax highlighting is lost.

In short, it’s just a big mess.

The initial approach was to have agents directly generate HTML files locally and open them in a browser. But the problems were:

The agent runs on a server without a graphical interface
Locally generated file paths are unpredictable and management is chaotic
No history—previously sent content can’t be found

So I needed a service where an agent could “send” an HTML file and get back a URL that could be opened in any device’s browser. The agent would handle mobile and PC compatibility.

What sth Does

sth is a lightweight HTTP service written in Go with just two core commands:

1
2
3
4
5


# Start the service
sth start

# Send an HTML file
sth send ./report.html

sth send packages the target HTML file along with resource files from the same directory (CSS, JS, images, etc.) and uploads them, then returns a URL. Opening this URL displays the complete page effect.

In practice, it runs on my intranet development machine, and agents specify the remote address via the --server parameter:

1

sth send ./report.html --server http://dev-1:3939

My Actual Usage

Currently sth mainly runs on my development server, working in tandem with the Hermes Agent.

Hermes is my daily AI assistant running on Telegram. When it needs to output complex content—such as code review conclusions, technical solution comparisons, project quotations—it calls the html-report skill to generate a beautifully formatted HTML file, then sends it to the preview server via sth send, and finally sends me the URL.

The entire workflow is:

1
2
3
4


User question -> Hermes Agent analysis
 -> Generate HTML report (html-report skill)
 -> sth send to preview server
 -> Return URL -> Send to Telegram

This way I can tap the link on my phone and see a well-formatted report instead of a blob of plain text.

Metadata Management

Beyond basic sending and previewing, sth also supports tagging, categorizing, and associating sessions with projects:

1
2
3
4
5


sth tag <session-id> code-review pricing
sth categorize <session-id> "Technical Review"
sth project <session-id> hydrogen-permeation
sth list --project hydrogen-permeation
sth search "quotation" --tag pricing

This feature solves a practical problem: over time, sent reports accumulate. Through tags and project categorization, you can quickly find previous outputs.

The difference between list and search is: list matches metadata fields exactly, while search performs full-text search. They can be used in combination.

Technical Details

Language: Go 1.24+
Storage: SQLite (github.com/mattn/go-sqlite3, requires CGO)
Deployment: Single binary file, just manage with systemd
Build: go build -o dist/sth ./cmd/html-server

It’s just that simple, no unnecessary dependencies.

Open Source

This tool was previously a private repo, but I just made it public today: sun-praise/static-html.

If you’re also using AI Agents for daily development work and have encountered the problem where “complex agent output can’t be read in chat tools,” give sth a try. It’s lightweight enough and does what it needs to do.

DeepSeek + Claude Code: Thinking Block Compatibility Analysis

Thu, 30 Apr 2026 15:00:00 +0800

Problem Description

When using DeepSeek models (such as deepseek-v4-flash) directly in Claude Code with extended thinking enabled, multi-turn conversations trigger a 400 error:

1

Bad Request: {"error":{"message":"The content[].thinking in the thinking mode must be passed back to the API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

Root Cause Analysis

Call Chain

1

Claude Code → DeepSeek Anthropic Compatible Endpoint (https://api.deepseek.com/anthropic)

Protocol Incompatibility

According to the DeepSeek Anthropic API Compatibility Documentation, the compatibility status is as follows:

Message Field	Support Status
`content[].thinking`	✅ Supported
`content[].redacted_thinking`	❌ Not Supported

In extended thinking mode during multi-turn conversations, Claude Code faithfully passes back all thinking blocks from the previous round (including redacted_thinking types) to the API as-is. DeepSeek does not recognize redacted_thinking, hence the 400 error.

Additionally, DeepSeek’s thinking block format differs from Anthropic’s native protocol, and the replay logic in tool_use scenarios is not fully compatible either.

Core Conflict

Anthropic API requirement: In extended thinking mode, content[].thinking and content[].redacted_thinking must be passed back unchanged
DeepSeek compatibility layer: Only supports thinking, does not support redacted_thinking
Claude Code behavior: Hard-coded according to Anthropic protocol, does not distinguish between target endpoint types

Community Feedback

This is a widespread community issue that almost all CC agent/router projects have encountered:

Issue	Project	Title
#1	cc-use	DeepSeek Thinking Mode Error: `content[].thinking` Must Be Passed Back
#878	openclaude	DeepSeek V4: reasoning_content must be passed back (400) on tool_calls
#1355	claude-code-router	CCR 代理 deepseek V4 思考时返回 400
#4543	new-api	ClaudeCode 接入 DeepSeek V4 遇到 400 reasoning_content 报错
#355	9router	DeepSeek API Error 400 – Missing reasoning_content
#16748	hermes-agent	DeepSeek /anthropic: stripped thinking blocks cause HTTP 400 on replay
#2414	cc-switch	Claude 使用 cc-switch 配置 deepseek-v4-pro，无法识别字段
#174	cc-haha	/compact 命令在使用 DeepSeek API 时无法工作

DeepSeek Official Response

Zero response. Nor is there any need to respond.

First, DeepSeek has no public API issue repository. All feedback occurs in third-party projects without any DeepSeek official personnel participating in any discussions.
Second, whether to use Anthropic as a compatibility standard, I think DeepSeek should be hesitant.

Temporary Workarounds

Disable extended thinking — When using DeepSeek in CC, turn off thinking mode
Use proxy filtering — Add a proxy layer between CC and DeepSeek to filter out redacted_thinking blocks
Switch models — Use DeepSeek for non-thinking scenarios and Anthropic native models for thinking scenarios

Why Doesn’t OpenCode Have This Problem?

OpenCode (opencode-ai/opencode) naturally avoids this problem architecturally, not through a dedicated “fix”.

The key lies in the convertMessages method in internal/llm/provider/anthropic.go (lines 60-119):

When building assistant messages, it only passes back TextContent (text) and ToolCall (tool calls)
Completely ignores ReasoningContent (thinking content), not putting it in messages
thinking content is only displayed in the UI through stream thinking_delta events and is not passed back to the API

Comparison with Claude Code’s behavior:

	Claude Code	OpenCode
thinking replay	✅ Faithfully replay all thinking blocks (including redacted_thinking)	❌ Do not replay thinking blocks
architectural reason	Follow Anthropic API specification, requires unchanged replay	Self-managed conversation state, thinking only for UI display
DeepSeek compatibility	❌ Triggers 400 (redacted_thinking not recognized)	✅ Not affected (doesn’t pass thinking at all)

Conclusion: OpenCode avoids the problem at the cost of not following Anthropic’s extended thinking specification. This approach is friendly to third-party compatible endpoints like DeepSeek, but if Anthropic native thinking context retention capability is needed in the future, re-implementation may be necessary.

Does Not Replay Thinking Blocks Affect DeepSeek Performance?

Basically no, reasons:

thinking blocks are the model’s internal scratchpad, not final output. The text replies and tool calls in the conversation history already retain key decisions and conclusions
DeepSeek’s reasoning is closer to OpenAI’s mode — each round is generated independently, unlike Anthropic’s strong reliance on cross-round replay to maintain reasoning coherence
OpenCode’s extensive actual use also confirms this — community users run multi-turn conversations using DeepSeek thinking mode in OpenCode without feedback about reasoning quality degradation

The truly potentially affected extreme scenario: in ultra-long multi-turn tasks, the model may repeat conclusions it has already reasoned through. However, in most actual use, the impact is negligible.

CC itself has similar thinking block replay bugs on Anthropic models (not DeepSeek-specific):

Issue	Title	Status
#10199	API Error 400 - Thinking Block Modification Error	Open (oncall)
#51985	thinking block missing in multi-turn conversations	Open
#20692	thinking blocks order error on first tool use	Open (oncall)
#54482	Thinking blocks stripped from context every turn (Opus 4.7)	Open

How to Fix DeepSeek Model Reasoning Issues in OpenCode

Fri, 24 Apr 2026 12:23:58 +0800

When using deepseek-reasoner, we often encounter this problem:

1

The reasoning_content' in the thinking mode must be passed back to the API.

Update

Both issues have now been officially resolved by opencode. Users only need to install the latest version of opencode and use it through the deepseek provider, without additional configuration.

1
2
3
4
5
6


Issue 1
The reasoning_content' in the thinking mode must be passed back to the API.

Issue 2
Bad Request: {"error":{"message":"The content[].thinking in the thinking mode must be passed back to the
API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

Both issues have been officially resolved. Install version 1.14.29 or above.

The old solution follows:

How to solve it? It’s straightforward.

How to Configure

Add provider information to your configuration:

.config/opencode/opencode.json or .config/opencode/opencode.jsonc

Modify the provider section to:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40


{
 "provider": {
 "deepseek": {
 "npm": "@ai-sdk/anthropic",
 "name": "DeepSeek",
 "options": {
 "baseURL": "https://api.deepseek.com/anthropic",
 "apiKey": "<apikey>"
 },
 "models": {
 "deepseek-v4-pro": {
 "name": "DeepSeek-V4-Pro",
 "limit": {
 "context": 1048576,
 "output": 262144
 },
 "options": {
 "thinking": {
 "type": "enabled",
 "budgetTokens": 8192
 }
 }
 },
 "deepseek-v4-flash": {
 "name": "DeepSeek-V4-Flash",
 "limit": {
 "context": 1048576,
 "output": 262144
 },
 "options": {
 "thinking": {
 "type": "enabled",
 "budgetTokens": 8192
 }
 }
 }
 }
 }
 }
}

How to Use

Select the deepseek model.

The result.

Supplement

This method cannot solve this problem

Bad Request: {"error":{"message":"The content[].thinking in the thinking mode must be passed back to the API.","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

If you encounter this problem, you need to wait for opencode to fix it.

Related article: DeepSeek + Claude Code: Thinking Block Compatibility Issue Analysis — Analyzes the root cause of 400 errors triggered by multi-turn conversations in extended thinking mode when using DeepSeek with Claude Code, along with community solutions.

Does Self-Hosting an LLM Really Let You Use It Without Limits?

Thu, 19 Mar 2026 12:30:00 +0800

Many people start thinking seriously about self-hosting an LLM not because of technical romance, but because API bills, rate limits, or compliance requirements have started to collide with real business constraints.

So a very natural question shows up: if the model runs on your own machine, does that mean you can finally use it without limits?

My answer is: no. Self-hosting a model does not mean unlimited freedom. It mostly means that many of the constraints and costs previously absorbed by the platform are now transferred to you.

But there is a more useful second question: once usage gets large enough, can self-hosting actually become cheaper?

The answer is: possibly, but under stricter conditions than many people expect.

In short: self-hosting an LLM does not mean unlimited freedom.

It means taking on part of the cost and responsibility that a platform would normally absorb. Self-hosting becomes financially attractive only when load stays high, utilization remains strong, and you can either accept model trade-offs or optimize the stack yourself.

Local deployment does not mean no limits

Let us clear up the most common misunderstanding first.

Many people interpret “the model runs on my own machine” as “I can now use it however I want.” In reality, the limits do not disappear. They simply show up in a different form.

The first limit is hardware.

Parameter count, VRAM capacity, quantization level, KV cache, and concurrency are real physical constraints. Even a quantized 70B model still puts serious pressure on memory and bandwidth. Being able to run it does not mean it runs comfortably. Getting output does not mean latency and throughput are acceptable.

The second limit is model capability itself.

Hallucinations, knowledge cutoffs, long-context degradation, and unstable reasoning do not vanish just because the model sits on your own server. Deployment location does not change the model’s ceiling. More importantly, most so-called self-hosting setups use open-weight models, not the actual closed models behind systems like Claude or GPT.

The third limit is responsibility transfer.

When you use an API, content safety, service stability, rate limiting, and much of the infrastructure burden are partially handled by the provider. Once you self-host, those problems do not go away. They become your monitoring, your operations, your review pipeline, and your incident response.

So self-hosting is not “use without limits.” It is “you own the boundaries.”

The real calculation is not just the price of a GPU

If you want to know whether self-hosting is worth it, the real comparison is not “how much does the card cost?” but these two larger accounts.

The annual cost of self-hosting can be written roughly like this:

1

Annual self-hosting cost = hardware depreciation + electricity + network / hosting + operations labor + redundancy for failures

The annual API cost is more direct:

1

Annual API cost = average daily token usage * price per million tokens * 365

That looks simple, but three details are often ignored.

Self-hosting is not a one-time hardware purchase. Electricity, spare parts, hosting conditions, alerting, upgrades, and maintenance all keep happening.
API pricing is not a single fixed number. Model choice, input-output ratio, cache hit rate, and tool usage can all change the final bill significantly.
Utilization is easy to underestimate. If your machine sits idle most of the time, a low per-inference cost means very little. On the other hand, if the workload is stable and the hardware stays busy, the financial case for self-hosting becomes much stronger.

So the numbers below should be read as rough order-of-magnitude guidance, not as a procurement quote.

A rough but useful breakeven table

To keep the discussion simple, let us start with a deliberately rough set of assumptions:

API pricing is estimated at roughly CNY 50 per million tokens
token usage counts both input and output together
local hardware is depreciated over 3 years
self-hosting cost includes baseline power and operations overhead
the local setup mainly assumes open-weight model inference, not strict parity with top closed models
this does not include training, fine-tuning, or a dedicated platform team

Under those assumptions, you get a rough picture like this:

Scenario	Daily token usage	Likely local setup	Annual self-hosting cost	Annual API cost	Rough conclusion
Light usage	500K	Single high-end consumer workstation	CNY 20K - 40K	about CNY 9K	API is cheaper
Medium usage	5M	Dual-GPU or small inference workstation	CNY 60K - 120K	about CNY 91K	Near breakeven
Heavy usage	50M	Multi-GPU server or cluster	CNY 400K - 800K	about CNY 912K	Self-hosting may be cheaper

If you want local quality to get as close as possible to top-tier closed models, this table usually moves upward again, because stronger models, more VRAM, and higher availability targets all push infrastructure and operations costs higher.

This table points to three things.

Individuals and small teams usually do not save money with self-hosting. If your workload is only a few hundred thousand tokens per day, APIs are still usually the more economical option. You spend less on hardware and avoid carrying the operations burden.
The real breakeven point tends to appear only in consistently high-usage scenarios. Not one occasional spike, but a workload that stays high day after day. Only then can hardware cost be spread efficiently enough.
The larger the usage, the more attractive self-hosting becomes financially. That is why large companies invest seriously in inference platforms. It is not because they enjoy complexity. It is because once the scale is large enough, the math really changes.

One critical condition: you may not be comparing the same thing

The biggest problem in many “self-hosting is cheaper than API” discussions is not the arithmetic. It is that the compared products are often not equivalent.

On the API side, you may be buying access to a top-tier closed model. On the local side, you may be running a quantized open-weight model. Both are called “LLMs,” but they are not the same product in a strict sense.

That means:

if open-weight quality is acceptable for your use case, self-hosting may indeed save a lot of money
if your quality bar is high and you depend on the best closed models, the room for self-hosting becomes much smaller
if you compare a cheaper model to a more expensive model, the result is not just a deployment conclusion, but also a model-selection conclusion

Put differently, many people think they are calculating deployment cost when they are actually accepting a capability downgrade first.

There is nothing wrong with that trade-off, but it should be stated clearly.

What self-hosting gives you besides cost savings

If a company still chooses to self-host after doing the math, it is usually not only about saving API money.

Data control. Some businesses simply do not want raw data flowing through third-party providers for long-term operational or compliance reasons. Local deployment makes the compliance and audit path easier to manage.
Customization. You can optimize around your own tasks with quantization, routing, distillation, fine-tuning, and tighter integration into internal systems. Standard APIs usually give you less freedom here.
A more predictable cost ceiling. API pricing scales directly with usage. When the business grows, the bill grows with it. Self-hosting has a large upfront investment, but under high and stable load, the cost curve is often easier to predict.
Offline operation and availability. If your environment requires internal-only deployment, or if you cannot accept key workflows depending entirely on external services, local deployment may simply fit the engineering requirements better.

A more practical decision framework

If you do not want to model every variable from day one, start with these three questions.

Is your workload consistently high over time? If you only see occasional spikes rather than sustained token usage every day, APIs are often still the better choice because you are not paying for idle hardware.
Can you accept the gap between a local model and a closed flagship model? If your business depends on best-in-class model quality, a large part of the claimed savings may come from lowering model quality rather than from deployment efficiency alone.
Do you actually have the ability to operate an inference service long term? What happens when a GPU fails, drivers conflict, service latency spikes, the model version needs to change, or rate limiting and monitoring need to be built? If nobody owns these questions, the issue is no longer just cost. It becomes a delivery problem.

Conclusion

Back to the original question: does self-hosting an LLM really let you use it without limits?

My answer is still: no.

It does not remove hardware bottlenecks, erase model capability gaps, or magically solve moderation, reliability, and operations work for you. What it gives you is not absolute freedom, but more control and the responsibility that comes with it.

At the same time, self-hosting is absolutely not a fake option. It becomes increasingly reasonable when several conditions are true at once:

your token usage stays high for a long time
the workload is stable and hardware utilization remains high
open-weight models are acceptable, or you already have the ability to optimize them well
data control, internal deployment, or predictable cost ceilings matter to you

If you are an individual, a small team, or just an occasional heavy user, APIs are still usually the more practical answer: less effort, less operational burden, and lower cost of experimentation.

If you are already in the phase where you burn tokens steadily every day, then it is worth calculating the full picture instead of staring only at API unit prices. Very often the answer is not “now I can use it without limits,” but a more grounded question that matters more: is this worth owning yourself?