<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Machine Learning on Svtter's Blog</title><link>https://svtter.cn/en/categories/machine-learning/</link><description>Recent content in Machine Learning on Svtter's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Mon, 06 Apr 2026 21:49:34 +0800</lastBuildDate><atom:link href="https://svtter.cn/en/categories/machine-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>I'm Writing Articles Myself Again, and Some Thoughts on GPT-5.4</title><link>https://svtter.cn/en/p/im-writing-articles-myself-again-and-some-thoughts-on-gpt-5.4/</link><pubDate>Mon, 06 Apr 2026 21:49:34 +0800</pubDate><guid>https://svtter.cn/en/p/im-writing-articles-myself-again-and-some-thoughts-on-gpt-5.4/</guid><description>&lt;img src="https://svtter.cn/p/%E6%88%91%E8%BF%98%E6%98%AF%E8%87%AA%E5%B7%B1%E5%86%99%E6%96%87%E7%AB%A0%E4%BB%A5%E5%8F%8A%E5%AF%B9-gpt-5.4-%E7%9A%84%E4%B8%80%E4%BA%9B%E6%83%B3%E6%B3%95/cover.jpg" alt="Featured image of post I'm Writing Articles Myself Again, and Some Thoughts on GPT-5.4" /&gt;&lt;p&gt;I&amp;rsquo;ve gone back to writing articles &amp;ldquo;myself&amp;rdquo; again. The reason I say &amp;ldquo;myself&amp;rdquo; is:&lt;/p&gt;
&lt;p&gt;Actually, my recent articles were all written through conversations with DeepSeek, where I had DeepSeek generate the output.&lt;/p&gt;
&lt;p&gt;After generating the articles, I&amp;rsquo;d have Codex polish them. (But Codex&amp;rsquo;s polishing was absolute crap.)&lt;/p&gt;
&lt;p&gt;In between, I also tried having GPT-5.4 generate the output, that is, talk things through with me and write the first draft.&lt;/p&gt;
&lt;h2 id="the-problem"&gt;The Problem
&lt;/h2&gt;&lt;p&gt;GPT-5.4 looks like an incredibly powerful large model, but I stopped using it because its output was truly garbage. It had an overwhelmingly heavy AI flavor and read like a stiff translation, which was genuinely uncomfortable. Beyond the translation tone, the other major issue was that it couldn&amp;rsquo;t express what I meant. In my view, Chinese is a language rich in semantics and nuance that favors subtle expression over bluntness, so GPT-5.4&amp;rsquo;s blunt, straightforward phrasing easily drifted away from my own thoughts and intentions. It was very uncomfortable to read, and I think readers would feel the same.&lt;/p&gt;
&lt;p&gt;But fundamentally, the main problem is the AI flavor. AI-generated articles universally have this problem, and GPT-5.4 is the most obvious case.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Lately, probably because Codex has a 2x discount, everyone wants to try it. Plus, Simple Codex&amp;rsquo;s Terminal Benchmark certification score has given people a lot more confidence.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The complaint that it doesn&amp;rsquo;t sound human isn&amp;rsquo;t just mine. Everyone is complaining about it.&lt;/p&gt;
&lt;blockquote class="twitter-tweet"&gt;&lt;p lang="zh" dir="ltr"&gt;先什么时候能让gpt讲人话，而不是叽里咕噜讲一堆车轱辘废话，难绷。&lt;/p&gt;&amp;mdash; 竹筒Tom (@0xAzathoth_) &lt;a href="https://twitter.com/0xAzAzathoth_/status/2040752766860329461?ref_src=twsrc%5Etfw"&gt;April 5, 2026&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src="https://platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
&lt;p&gt;For example, in recent articles, when I told it in conversation &amp;ldquo;don&amp;rsquo;t be aggressive toward vendors,&amp;rdquo; it would write &amp;ldquo;this article isn&amp;rsquo;t targeting anyone.&amp;rdquo; The later articles discussing LLM pricing are typical examples.&lt;/p&gt;
&lt;p&gt;If it knew about the Chinese meme &amp;ldquo;I&amp;rsquo;m not targeting anyone, I&amp;rsquo;m saying everyone here is xx&amp;rdquo; (from a Stephen Chow movie), I don&amp;rsquo;t think it would express itself that way.&lt;/p&gt;
&lt;p&gt;So I&amp;rsquo;ve decided to write articles myself, and I&amp;rsquo;ll take responsibility for the results.&lt;/p&gt;
&lt;h2 id="further-analysis---let-me-talk-about-other-things"&gt;Further Analysis - Let Me Talk About Other Things
&lt;/h2&gt;&lt;p&gt;GPT-5.4 has another obvious problem: I tell it not to do something, and it does it anyway. Or it says it will do something, then doesn&amp;rsquo;t do it in the next step. If this happened deep into a long multi-round conversation, I would find it acceptable. But when it says it will do something in one sentence and then fails to do it in the very next step, that performance isn&amp;rsquo;t good enough.&lt;/p&gt;
&lt;p&gt;What matters for ASI is not just &amp;ldquo;safety&amp;rdquo; but &amp;ldquo;alignment.&amp;rdquo; Sam doesn&amp;rsquo;t understand this. &amp;ldquo;Not listening when told&amp;rdquo; is precisely a failure of &amp;ldquo;alignment.&amp;rdquo; I don&amp;rsquo;t like Sam. This is really a management problem: the safety team doesn&amp;rsquo;t get the promised 20% compute, so naturally alignment can&amp;rsquo;t be achieved.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll add some supporting materials later, or write a separate post to discuss this.&lt;/p&gt;
&lt;p&gt;As for the collaboration with OpenCode, it is less about being more open and more about targeted opposition. We users benefit from this. The harder vendors fight, the more users benefit.&lt;/p&gt;
&lt;p&gt;When Opus quotas were reduced, Codex immediately switched to token-based billing.&lt;/p&gt;
&lt;h2 id="a-few-words-about-doubao"&gt;A Few Words About Doubao
&lt;/h2&gt;&lt;p&gt;Doubao, meanwhile, is a typical passive-aggressive master. It&amp;rsquo;s the same whether in group chat or voice. I don&amp;rsquo;t know where the training data went wrong.&lt;/p&gt;
&lt;p&gt;Also, I didn&amp;rsquo;t expect the group chat assistant to get into arguments with people in the group 🤣&lt;/p&gt;
&lt;h2 id="supplement"&gt;Supplement
&lt;/h2&gt;&lt;p&gt;I happened to see Old Feng from Cloud Numbers discussing this problem too. &lt;a class="link" href="https://mp.weixin.qq.com/s/TINtWWri5ghccVnJ9BIEPw" target="_blank" rel="noopener"
&gt;&lt;em&gt;Yes, I Use AI to Write Articles&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;His articles don&amp;rsquo;t look as heavily AI-flavored. Maybe Opus is more suitable for writing.&lt;/p&gt;
&lt;p&gt;Additionally, if you include your own writing style in the prompt, it might further reduce the AI flavor.&lt;/p&gt;</description></item><item><title>Coding Performance and Model Cost-Effectiveness Analysis</title><link>https://svtter.cn/en/p/coding-performance-and-model-cost-effectiveness-analysis/</link><pubDate>Sat, 03 Jan 2026 00:00:00 +0000</pubDate><guid>https://svtter.cn/en/p/coding-performance-and-model-cost-effectiveness-analysis/</guid><description>&lt;img src="https://svtter.cn/p/%E7%BC%96%E7%A0%81%E6%80%A7%E8%83%BD%E4%B8%8E%E6%A8%A1%E5%9E%8B%E6%80%A7%E4%BB%B7%E6%AF%94%E5%88%86%E6%9E%90/pics/bg-new-v2.jpg" alt="Featured image of post Coding Performance and Model Cost-Effectiveness Analysis" /&gt;&lt;p&gt;This is my analysis report comparing the coding performance and cost-effectiveness of several models, written to help select the most suitable model for coding tasks.&lt;/p&gt;
&lt;iframe src="model-comparison.pdf" style="width:100%; height:85vh; border:0;"&gt;&lt;/iframe&gt;
&lt;p&gt;For Chinese language tasks, using GLM 4.7 is clearly more cost-effective. The price of 2000 RMB basically covers a year of usage.
The downside is that during peak hours, even the enterprise MAX version can be very slow.&lt;/p&gt;
&lt;p&gt;In my practical experience, the capabilities of MiniMax M2.1 far exceed those of GLM 4.7.&lt;/p&gt;</description></item></channel></rss>