LLMs on Svtter's Blog

CS146S is a Good Course

Mon, 15 Dec 2025 20:45:35 +0800

CS146S is a good course. One reason is that it teaches modern software engineers how to collaborate better with AI. Another is that it covers basically all of my modern coding skills. (Just kidding!)

Below, I link to the course slides directly in the text. If you're interested, click a link to open the corresponding slides.

Basic Techniques

I think everyone, like me, has already mastered the basics: clearer, more explicit prompts let an LLM execute instructions without ambiguity. There are also prompt optimization techniques, including using Claude itself to optimize prompts.

The course also covered how to build coding agents, emphasizing the Claude Code SDK (now renamed the Claude Agent SDK).

To extend LLM capabilities further, you can also use MCP servers. I built git-mcp, and I also have an experimental "startup" MCP that has not been open-sourced.

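A Bit Deeper into MCP (content from the slides)

With MCP, it's worth understanding the Host/Client/Server split: the host is the application the user interacts with, it creates the clients, and each client connects to exactly one server. Many hosts are not open source; DeepChat's host implementation is a useful reference.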
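To make the Host/Client/Server split concrete, here is a toy, in-process sketch. It is not the official MCP SDK: it only mimics the shape of the JSON-RPC 2.0 messages MCP uses for `tools/list` and `tools/call`, skips the `initialize` handshake, and trims tool descriptions to bare names. The `add` tool and the `ToyServer`/`ToyClient`/`ToyHost` classes are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional


class ToyServer:
    """Exposes tools. A real MCP server runs as its own process (stdio or HTTP)."""

    def __init__(self) -> None:
        # Hypothetical tool registry: tool name -> Python callable
        self.tools: Dict[str, Callable[..., Any]] = {"add": lambda a, b: a + b}

    def handle(self, raw: str) -> str:
        request = json.loads(raw)
        method, params = request["method"], request.get("params", {})
        if method == "tools/list":
            result: Any = {"tools": [{"name": name} for name in self.tools]}
        elif method == "tools/call":
            value = self.tools[params["name"]](**params["arguments"])
            result = {"content": [{"type": "text", "text": str(value)}]}
        else:
            # -32601 is the standard JSON-RPC "Method not found" error code
            error = {"code": -32601, "message": "Method not found"}
            return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


class ToyClient:
    """Holds a 1:1 connection to a single server and speaks JSON-RPC 2.0 to it."""

    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.next_id = 0

    def request(self, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        self.next_id += 1
        message = {"jsonrpc": "2.0", "id": self.next_id, "method": method, "params": params or {}}
        return json.loads(self.server.handle(json.dumps(message)))


class ToyHost:
    """The user-facing app (IDE, chat UI): owns the LLM and one client per server."""

    def __init__(self, servers: Dict[str, ToyServer]) -> None:
        self.clients = {name: ToyClient(server) for name, server in servers.items()}

    def call_tool(self, server: str, tool: str, arguments: Dict[str, Any]) -> str:
        response = self.clients[server].request("tools/call", {"name": tool, "arguments": arguments})
        return response["result"]["content"][0]["text"]


host = ToyHost({"math": ToyServer()})
print(host.call_tool("math", "add", {"a": 2, "b": 3}))  # prints 5
```

In a real host, the LLM decides which tool to call; here `call_tool` is invoked by hand to keep the message flow visible.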

Limitations:

Agents don't handle many tools very well today
APIs eat up your context window quickly
Design APIs to be AI-native rather than rigid
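One way to act on the last two points is to hand the agent a compact, task-shaped view of an API response instead of the raw payload. A minimal sketch, assuming a hypothetical verbose issue-tracker response; the field names and the rough four-characters-per-token estimate are illustrative assumptions, not a real API.

```python
import json
from typing import Any, Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text
    return max(1, len(text) // 4)


def compact_issues(raw: List[Dict[str, Any]],
                   keep: Tuple[str, ...] = ("id", "title", "state")) -> List[Dict[str, Any]]:
    """Project each record down to the fields the agent actually needs."""
    return [{key: item[key] for key in keep if key in item} for item in raw]


# Hypothetical verbose payload, shaped like a typical REST response
verbose = [
    {
        "id": 1,
        "title": "Login fails on Safari",
        "state": "open",
        "body": "Long description ... " * 20,
        "author": {"login": "alice", "avatar_url": "https://example.com/a.png"},
        "labels": [{"name": "bug", "color": "d73a4a"}],
    }
]
compact = compact_issues(verbose)
print(estimate_tokens(json.dumps(verbose)), "->", estimate_tokens(json.dumps(compact)))
```

An AI-native API would return something like `compact` by default, leaving the agent to ask for details only when it needs them.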

IDE Agent

On the IDE side, I've switched from mainly using Cursor to using Claude Code + VSCode. Claude Code as a CLI feels more powerful to me, though I haven't used Cursor for a while, so it may have improved since. Trae's SOLO mode is mediocre; insufficient intelligence is its biggest problem. (I tried it in Trae CN.)

Also worth mentioning: the slides from Silas Alberti, Head of Research at Cognition, are excellent.

This summary diagram is awesome. Is it really free to watch?

This article also mentions the concept of parallel agents.

So for me, the next direction to improve is cloud + async.

This is Silas Alberti’s advice:

Devin and the cloud version of Claude Code work in essentially the same way. In fact, you can do vibe coding entirely with the cloud version of Claude Code.

Agent Manager

Engineers need to become agent managers, not just software engineers.

From the Claude Code designers' point of view, the software design process should be:

Provide high level requirements 🟩
Convert requirements into a design doc 🟩/🟦
Implement solution from doc 🟦
Add tests 🟦
Ensure CI (continuous integration) passes 🟦
Code review 🟦
Update docs 🟦

My own habit is to write simple requirements, generate a design from them, and then let Claude Code implement the rest on its own.

Recently, though, I found it's not that capable on its own, so I adopted test-driven development to make sure every step is done correctly. Without that, the "Add tests" and "Ensure CI passes" steps are essentially meaningless.
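A minimal sketch of the test-first loop: the human writes (or at least reviews) the tests first so they pin down the behaviour, and the agent is then asked to write code until they pass. `slugify` is a hypothetical example function, not something from the course.

```python
import re
import unittest


# Step 1: write the tests first; they define "done" for the agent.
class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("CS146S: a good course!"), "cs146s-a-good-course")


# Step 2: the implementation the agent is asked to produce, checked against the tests above.
def slugify(text: str) -> str:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


if __name__ == "__main__":
    # exit=False so the runner reports results without terminating the interpreter
    unittest.main(argv=["tdd_sketch"], exit=False)
```

The point of the ordering is that the agent cannot quietly weaken the tests to match whatever it wrote.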

Techniques for directing agents:

Agent behavior files (Claude.md/Cursorrules/agents.md)
Hooks
Commands
Subagents

I already use subagents and commands a lot, but I haven't found a killer use case for hooks yet.
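One possible scenario for hooks, sketched under assumptions: a `PreToolUse` hook that stops the agent from editing secret files. As I understand Claude Code's hook contract, the hook command receives a JSON payload on stdin containing `tool_name` and `tool_input`, and exit code 2 blocks the tool call and feeds stderr back to Claude; check this against the current hooks documentation before relying on it. The protected-suffix list, the script path, and the `--hook` flag are this sketch's own conventions.

```python
import json
import sys
from typing import Any, Dict, Tuple

# Register in .claude/settings.json (the script path is a placeholder):
# {
#   "hooks": {
#     "PreToolUse": [
#       {"matcher": "Edit|Write",
#        "hooks": [{"type": "command", "command": "python3 /path/to/guard_hook.py --hook"}]}
#     ]
#   }
# }

# Hypothetical policy: files the agent must never edit
PROTECTED_SUFFIXES = (".env", ".pem", ".key")


def decide(payload: Dict[str, Any]) -> Tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks Claude Code to block the tool call."""
    path = payload.get("tool_input", {}).get("file_path", "")
    if path.endswith(PROTECTED_SUFFIXES):
        return 2, f"Blocked: {path} is a protected file."
    return 0, ""


def run_hook(raw_json: str) -> int:
    code, message = decide(json.loads(raw_json))
    if message:
        # stderr is what gets reported back to Claude when the call is blocked
        print(message, file=sys.stderr)
    return code


# Only act as a hook when invoked with the --hook flag from the settings above
if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    sys.exit(run_hook(sys.stdin.read()))
```

A deterministic guard like this is the kind of rule a prompt in Claude.md can only ask for, while a hook enforces it on every tool call.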

Claude Code Best Practices

My main advice: use subagents as much as possible to avoid the "lost in the middle" phenomenon, where a model overlooks information buried in the middle of a long context.

Claude Code CLI

Why did I buy Claude Code?

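We can do even more through the SDK's command-line mode (`claude -p`):

claude -p "what did i do this week?" \
  --allowedTools "Bash(git log:*)" \
  --output-format stream-json \
  --verbose

Recent versions of the CLI require `--verbose` when `-p` is combined with `--output-format stream-json`, and the tool pattern must be quoted so the shell does not interpret the parentheses.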
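The stream-json output can then be consumed from a script. A minimal sketch, assuming (from my understanding of the format, to be checked against the docs) that each stdout line is one JSON event and that the final answer arrives in an event with `"type": "result"` and a `result` field; `final_result` and `weekly_summary` are hypothetical helper names.

```python
import json
import subprocess
from typing import Iterable, Optional


def final_result(lines: Iterable[str]) -> Optional[str]:
    """Return the `result` text of the last {"type": "result"} event in stream-json output."""
    result = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        if event.get("type") == "result":
            result = event.get("result")
    return result


def weekly_summary() -> Optional[str]:
    """Run the command above and extract the final answer (requires the claude CLI on PATH)."""
    proc = subprocess.run(
        [
            "claude", "-p", "what did i do this week?",
            "--allowedTools", "Bash(git log:*)",
            "--output-format", "stream-json",
            "--verbose",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return final_result(proc.stdout.splitlines())
```

Passing the arguments as a list avoids the shell-quoting pitfalls of the one-liner entirely.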

Conclusion

This course is free, but the insights in it surpass those of most paid courses. If you can understand and absorb it quickly, don't be stingy with your time: take it.

Why Agent

Tue, 30 Sep 2025 11:54:06 +0800

I’ve always had a question: Why do we need agent frameworks? Aren’t large models enough on their own? This article reflects my current understanding of the subject.

After using several tools extensively and participating in multiple agent projects recently, I’ve reached some conclusions.

The Limitations of LLMs

The primary reason for using agents is the inherent limitations of LLMs.

First and foremost is the context window, as the langchain/subagent documentation explicitly points out. Although many modern models have significantly expanded context windows (GPT-4 Turbo 128K, Claude 3.5 Sonnet 200K, Gemini 1.5 Pro up to 2M), they are still insufficient for truly complex tasks. For example, processing a massive codebase or analyzing hundreds of documents quickly exhausts these limits. Furthermore, processing extremely long contexts is both expensive and slow.
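When the input is larger than the window, a common workaround, and one of the jobs an agent framework takes on, is to split it and process the pieces separately. A minimal sketch of greedy, budget-based chunking; it assumes roughly four characters per token instead of a real tokenizer, and `chunk_by_budget` is a hypothetical helper.

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Assumption: ~4 characters per token; swap in a real tokenizer for production use
    return (len(text) + 3) // 4


def chunk_by_budget(paragraphs: List[str], budget: int) -> List[List[str]]:
    """Greedily pack paragraphs into chunks whose estimated size stays within `budget` tokens."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        cost = approx_tokens(para)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(para)  # an oversized paragraph still gets a chunk of its own
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then go to a separate LLM call (or subagent), with a final call combining the partial results.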

Beyond context, there are other capability gaps:

Vision Capabilities: While modern VLMs (Vision Language Models) are powerful, traditional CV (Computer Vision) models often perform better in specific scenarios. Additionally, some models (like DeepSeek-V3) don’t have native vision capabilities.
Resource Access: LLMs cannot directly interact with databases, file systems, or network services.
Specialized Tools: Tools for code execution, complex mathematics, or data analysis must be exposed to an LLM through a tool-calling mechanism such as function calling or protocols like MCP.

What Agents Can Do

Beyond addressing the limitations above, here are some practical ways agents add value.

Domain-Specific Text Processing

Agents can process different text segments (contexts) independently.

Context Optimization: Agents can compress or selectively provide context, effectively extending the usable context window.
Performance Gains: An LLM within an agent can focus on a single, specific task, leading to better performance. When given too much text, LLMs often struggle to identify key information; smaller, targeted context makes this much easier.
Specialized Knowledge: LLMs are trained on general data. To make an agent a domain expert, we can inject specific knowledge directly into its context.
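A minimal sketch of selective context: rank snippets by how many words they share with the question and pass only the top few to the model. Word overlap is a deliberately crude stand-in for embedding similarity, and `select_context`/`build_prompt` are hypothetical helpers.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_context(question: str, snippets: List[str], k: int = 2) -> List[str]:
    """Keep the k snippets sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(snippets, key=lambda s: len(q & tokenize(s)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, snippets: List[str], k: int = 2) -> str:
    context = "\n".join(f"- {s}" for s in select_context(question, snippets, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The model now sees a short, focused prompt instead of every snippet, which is the performance gain described above.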

Visual Capability Integration

Through agents, we can integrate traditional vision models to handle tasks that LLMs struggle with, for example by connecting the agent to an MCP (Model Context Protocol) server that provides vision capabilities.

A notable example is Zhipu’s Vision MCP. Using this MCP in conjunction with an agent significantly enhances visual processing power. This highlights the value of MCP servers that integrate specialized services.

Agent Frameworks

Pydantic AI: I find this particularly useful because it integrates Pydantic models into the agent framework, making it much easier to debug. I’ve tested its integration with Qwen3.
LangChain: I haven't used this in production, only for basic debugging. Its API changes frequently, which can be challenging. One minor issue is prompt handling, which I solved with Jinja templates; the "LangChain way" is to use PromptTemplates instead.
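Both approaches share one idea: keep prompts as templates, separate from the agent code. To stay dependency-free, the sketch below uses the standard library's `string.Template` as a stand-in for Jinja (so it has no loops or conditionals); the template text and variable names are hypothetical.

```python
from string import Template

# Hypothetical prompt; in practice this could live in its own file
SUMMARIZE = Template(
    "You are a $role.\n"
    "Summarize the following text in at most $max_words words:\n\n"
    "$text"
)


def render_prompt(template: Template, **values: object) -> str:
    # substitute() raises KeyError when a placeholder is missing, which catches typos early
    return template.substitute(**values)


prompt = render_prompt(SUMMARIZE, role="technical editor", max_words=50,
                       text="Agents add tools to LLMs.")
print(prompt)
```

With Jinja the structure is the same; you would gain conditionals and loops for things like optional few-shot examples.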

Using uv to publish Python packages

Tue, 03 Jun 2025 15:54:28 +0800

Since migrating from pdm to uv, I have wanted to use uv not only for dependency management but also for publishing packages.

Method 1

An LLM suggested adding the following to pyproject.toml:

[build-system]
requires = ["setuptools>=42", "wheel", "uv>=0.6.0"]
build-backend = "setuptools.build_meta"

(Listing uv under requires is not actually necessary: `uv build` only needs the setuptools backend itself in the build environment.)

After adding this, run:

uv build

Then run:

python -m twine upload dist/*

The package can then be published.

Method 2

Since many of my projects use pdm, switching all of them away from pdm's build configuration would be quite inconvenient.

Instead, you can keep pdm as the build system and use uv only as the package management tool.

In other words, keep pdm-backend in [build-system]:

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

and even keep the existing [tool.pdm] settings, such as reading the version from a file:

[tool.pdm]
distribution = true

[tool.pdm.version]
source = "file"
path = "src/spback/__init__.py"

Some Thoughts

LLMs are already quite powerful, but they cannot guarantee that the content they generate is accurate, so a human still has to verify the output. That human reviewer is essential.

Code like the configuration above, for example, had to be verified by a human before it worked. Of course, if the task is merely editing content, an LLM can collaborate with us directly, as it does in Cursor.

Deployment of Dify 1.2.0

Tue, 22 Apr 2025 11:20:02 +0800

I believe hackers should abandon the idea of building agents in code and fully embrace workflow platforms like Dify. This approach is many times more efficient than writing code. If you must write code, you can develop plugins and embed them into Dify.

What is Dify? A workflow platform designed for LLM applications.

<script src="/js/repo-card.js"></script>

<div class="repo-card" data-repo="langgenius/dify"></div>

## Deployment Method

Simply execute the following commands on your server:

```bash
git clone https://github.com/langgenius/dify
cd dify/docker
cp .env.example .env
docker compose up -d
```

## Deployment Issues

Dify is open source but relatively new, so deployments often run into unusual problems.

### Plugin Restart Problem

When using Dify 1.2.0, the Dify plugin daemon would continuously restart. Refer to this [issue](https://github.com/langgenius/dify/issues/17788) for details.

> Interestingly, in this issue, the problem was solved by AI.

### Protocols Problem

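If file links come back as `http` while the site is served over `https` (or vice versa), adjust the `FILES_URL` variable in `.env` so that its protocol matches.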

## Plugins

To utilize certain features, I developed a Dify plugin for file compression.

<script src="/js/repo-card.js"></script>

<div class="repo-card" data-repo="svtter/filecompress"></div>

## Resource Attribution

- Images sourced from [chatgpt-lab](https://chatgpt-lab.com/n/n12d18abb26c8?gs=a6ed475ccea2)

Work With Langfuse

Mon, 21 Apr 2025 14:51:38 +0800

When developing LLM applications, we need to keep an eye on performance during LLM calls and monitor the outputs along the way.

At this point, tools like LangSmith and Langfuse become very useful.

However, sometimes we have local computing resources and prefer not to send LLM call monitoring to a cloud service, which makes LangSmith less attractive.

In such cases, we can use Langfuse for this purpose.

Deployment

Deploying Langfuse is very simple; all you need to do is:

git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d

That's it: Langfuse is now deployed.

Replacing the OpenAI SDK

If you previously used OpenAI’s SDK, you can continue using it as follows.

Install langfuse in the project:

pip install langfuse

Create an API key pair in your deployed Langfuse instance, then configure it through environment variables:

LANGFUSE_SECRET_KEY=<secret key>
LANGFUSE_PUBLIC_KEY=<public key>
LANGFUSE_HOST="http://localhost:3001"

Here I have set the Langfuse port to 3001; you should adjust according to your own configuration.

Then simply replace the original OpenAI import:

# remove: import openai

from langfuse.openai import openai
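Conceptually, the drop-in client and Langfuse's `@observe()` decorator (which also appears in the test code in the Issues section) record each call as a span in a trace. The toy decorator below is not Langfuse's implementation; it only illustrates nested spans with timing and outputs. Unlike `@observe()` it takes no arguments, and it keeps spans in a local list instead of sending them to a server.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Collected spans; Langfuse would send these to its server instead
TRACE: List[Dict[str, Any]] = []
_stack: List[str] = []


def observe(func: Callable[..., Any]) -> Callable[..., Any]:
    """Toy tracing decorator: records name, parent, duration, and output of each call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        parent = _stack[-1] if _stack else None
        _stack.append(func.__name__)
        start = time.perf_counter()
        try:
            output = func(*args, **kwargs)
        finally:
            _stack.pop()
        TRACE.append({
            "name": func.__name__,
            "parent": parent,
            "seconds": time.perf_counter() - start,
            "output": output,
        })
        return output

    return wrapper


@observe
def story() -> str:
    return "Once upon a time..."  # stands in for the LLM call


@observe
def main() -> str:
    return story()


result = main()
```

Spans are appended when a call finishes, so the inner `story` span lands first and points to `main` as its parent.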

In addition, Langfuse also supports LangChain and LlamaIndex, which I won't cover here.

Thoughts

Coze is also developing an LLM agent framework, but its approach is quite different: Coze builds everything itself, including workflows and LLMs, which makes it relatively closed.

Langfuse, by contrast, is more open: it works with LangChain and with other models.

As a developer at a small company, I prefer the Langfuse model because it offers more choices. However, if the project timeline is tight and Coze is good enough for the job, I would choose Coze.

Issues

An exception occurred when I replaced the OpenAI SDK:

Unexpected error occurred. Please check your request and contact support: https://langfuse.com/support.

I still encountered issues when testing test_langfuse.py:

import os

from langfuse.decorators import observe
from langfuse.openai import openai


@observe()
def story():
    return (
        openai.chat.completions.create(
            model="moonshot-v1-auto",
            max_tokens=100,
            messages=[
                {"role": "system", "content": "You are a great storyteller."},
                {"role": "user", "content": "Once upon a time in a galaxy far, far away..."},
            ],
        )
        .choices[0]
        .message.content
    )


@observe()
def main():
    return story()


def test_langfuse():
    assert os.getenv("OPENAI_BASE_URL") is not None
    assert os.getenv("OPENAI_API_KEY") is not None
    main()

Regarding this issue, I have opened a discussion.

The original code is available at https://github.com/svtter/pdf-reader.

RAG with LlamaIndex and Ollama

Sun, 09 Mar 2025 12:44:24 +0800

If you want to build a RAG system locally, you can use Ollama to serve the base models and LlamaIndex to construct the agent.

Since LlamaIndex defaults to OpenAI, we first need to switch the default embedding model and LLM.

Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=sdmicl[1])
Settings.llm = Ollama(model=sdmicl[0], base_url=sdmicl[1], request_timeout=360.0)

Here `sdmicl` comes from my own setup: `sdmicl[0]` is the LLM model name and `sdmicl[1]` is the Ollama base URL. Replace the base_url with your own Ollama instance, such as http://localhost:11434.

If the files in the directory are all .txt or .md, you can read them directly with SimpleDirectoryReader.

# Create a RAG tool using LlamaIndex
documents = SimpleDirectoryReader("data").load_data()

import asyncio

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Stand-ins for sdmicl[1] and sdmicl[0]; point these at your own Ollama instance and chat model
OLLAMA_BASE_URL = "http://localhost:11434"
LLM_MODEL = "qwen2.5"  # hypothetical: use any chat model you have pulled


def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return a * b


def get_agent(model_name: str) -> FunctionAgent:
    # Replace the OpenAI defaults with Ollama-served models
    Settings.embed_model = OllamaEmbedding(model_name=model_name, base_url=OLLAMA_BASE_URL)
    Settings.llm = Ollama(model=LLM_MODEL, base_url=OLLAMA_BASE_URL, request_timeout=360.0)

    # Create a RAG tool using LlamaIndex
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    async def search_documents(query: str) -> str:
        """Useful for answering natural language questions about a personal essay written by Paul Graham."""
        # aquery is the async variant; query() is synchronous and cannot be awaited
        response = await query_engine.aquery(query)
        return str(response)

    agent = FunctionAgent(
        name="Agent",
        description="Useful for multiplying two numbers and searching documents",
        tools=[multiply, search_documents],
        llm=Settings.llm,
        system_prompt="You are a helpful assistant that can multiply two numbers and search documents to answer questions",
    )
    return agent


async def main():
    models = ("bge-m3", "nomic-embed-text")

    for model_name in models:
        print(f"model: {model_name}")
        agent = get_agent(model_name=model_name)
        response = await agent.run("What did Paul Graham do in college? Also, what's 7 * 8?")
        print(str(response))
        print("Done.")
        print("-" * 100)


# In a notebook you can `await main()` directly; in a script, use asyncio.run
asyncio.run(main())
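Conceptually, `VectorStoreIndex.from_documents` embeds the documents' chunks, and the query engine retrieves the most similar chunks for each question before the LLM answers. The toy sketch below swaps the Ollama embedding model for bag-of-words counts so it runs without any service; `embed`, `cosine`, and `retrieve` are hypothetical helpers, not LlamaIndex APIs, and the documents are made-up examples.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy 'embedding': a bag-of-words count vector (the real setup calls OllamaEmbedding)."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    """Index step: embed every doc. Query step: rank docs by cosine similarity to the query."""
    index = [(embed(doc), doc) for doc in docs]
    q = embed(query)
    scored = sorted(((cosine(q, vec), doc) for vec, doc in index), reverse=True)
    return scored[:top_k]


docs = [
    "The essay describes what the author did in college.",
    "Ollama serves local models over HTTP.",
    "LlamaIndex builds a vector index over the documents.",
]
print(retrieve("What did the author do in college?", docs))
```

This is also why the choice of embedding model (`bge-m3` vs `nomic-embed-text` in the loop above) matters: it decides which chunks count as similar.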

Openrouter Usage

Mon, 03 Mar 2025 11:45:12 +0800

Zhou Tian built an LLM-based application on OpenRouter, ran into some issues, and documented a few insights.

No Support for Embeddings

The biggest issue is the lack of an embedding API. Although OpenRouter already offers OpenAI-compatible API endpoints for many models, embeddings are crucial for building RAG applications, and without them OpenRouter falls short for practical application development.
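One workaround is to keep chat completions on OpenRouter and send embedding requests to a different OpenAI-compatible provider. A minimal routing sketch, assuming a local Ollama instance (whose OpenAI-compatible API includes `/v1/embeddings`) as the embedding provider; the model names are placeholders, and `ROUTES`/`endpoint_url` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    model: str


# Assumption: chat goes to OpenRouter, embeddings to a local Ollama; model names are placeholders
ROUTES: Dict[str, Endpoint] = {
    "chat": Endpoint("https://openrouter.ai/api/v1", "openai/gpt-4o-mini"),
    "embeddings": Endpoint("http://localhost:11434/v1", "nomic-embed-text"),
}


def endpoint_url(task: str) -> str:
    """Build the full OpenAI-style URL for a task, e.g. .../chat/completions or .../embeddings."""
    paths = {"chat": "/chat/completions", "embeddings": "/embeddings"}
    return ROUTES[task].base_url.rstrip("/") + paths[task]
```

Because both providers speak the OpenAI wire format, the same client code can be pointed at either base URL.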