I'm Writing Articles Myself Again, and Some Thoughts on GPT-5.4

Mon, 06 Apr 2026 21:49:34 +0800

I’ve gone back to writing my articles “myself.” I put “myself” in quotes because my recent articles were actually produced through conversations with DeepSeek, with DeepSeek generating the text.

Once a draft was generated, I had Codex polish it. (Codex’s polishing, however, was absolute crap.)

Along the way, I also tried letting GPT-5.4 handle the generation, meaning it both discussed the ideas with me and wrote the first draft.

The Problem

I stopped using GPT-5.4 because, although it looks like an incredibly powerful model, its writing was truly garbage. It had an overwhelming AI flavor and read like a translation, which was genuinely uncomfortable. Beyond the translated tone, another major issue was that it couldn’t express what I meant. In my view, Chinese is a language rich in semantics and nuance, so this kind of phrasing easily drifts away from my actual thoughts and intentions. Chinese favors subtle, indirect expression over bluntness, and GPT-5.4 was very blunt. Reading it made me uncomfortable, and I think readers would feel the same.

Fundamentally, though, the core problem is the AI flavor. Nearly every AI-generated article has it, and in GPT-5.4 it is the most obvious.

Lately everyone wants to try it, probably because Codex is running a 2x promotion. On top of that, Simple Codex’s certified Terminal Benchmark score has given people a lot more confidence.

I’m not the only one who feels it doesn’t sound human. Plenty of people are complaining about the same thing.

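First off, when will GPT learn to talk like a human instead of rambling in circles with a pile of nonsense? Hard to keep a straight face.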
— 竹筒Tom (@0xAzathoth_) April 5, 2026

In recent articles, when I told it during our conversation not to be aggressive toward vendors, it would write “this article isn’t targeting anyone.” The later articles discussing LLM pricing are a typical example.

Had it known the Chinese meme “I’m not targeting anyone, I’m saying everyone here is xx” (from a Stephen Chow movie), I don’t think it would have phrased things that way.

So I’ve decided to write my articles myself, and I’ll take responsibility for the results.

Further Analysis: A Few Other Things

GPT-5.4 has another obvious problem: I tell it not to do something, and it does it anyway. Or it says it will do something, then doesn’t do it in the next step. If this happened deep into a long multi-turn conversation, I’d find it acceptable. But here it announces an action in one sentence and skips it in the very next step, and I don’t think that is good enough.

What matters for ASI is not just “safety” but “alignment,” and Sam doesn’t understand this. Ignoring instructions is itself a failure of alignment. I don’t like Sam. At root this is a management problem: the safety team doesn’t get the promised 20% of compute, so naturally alignment can’t be achieved.

I’ll add supporting material later, or write a separate post on this.

As for the collaboration with OpenCode, it isn’t really about being more open; it’s a targeted move against a competitor. We users benefit from this: the harder vendors fight, the more users gain.

When Opus quotas were reduced, Codex immediately switched to token-based billing.

A Few Words About Doubao

Also, Doubao is a textbook master of passive aggression, in group chats and voice alike. I don’t know where its training data went wrong.

I also didn’t expect its group chat assistant to get into arguments with people in the group 🤣

Supplement

I happened to see Old Feng from Cloud Numbers discussing the same problem in his post “Yes, I Use AI to Write Articles.”

His articles don’t read as heavily AI-flavored. Maybe Opus is better suited to writing.

Including a description of your own writing style in the prompt may also reduce the AI flavor further.

Coding Performance and Model Cost-Effectiveness Analysis

Sat, 03 Jan 2026 00:00:00 +0000

This is my analysis of the coding performance and cost-effectiveness of several models, written to compare them on coding tasks and choose the most suitable one.

For Chinese-language tasks, GLM 4.7 is clearly more cost-effective: about 2000 RMB covers roughly a year of usage. The downside is that at peak hours, even the enterprise MAX plan can be very slow.

In my hands-on experience, MiniMax M2.1 is far more capable than GLM 4.7.