| |
Over the past year, I have attempted to use deepchat and large model APIs (such as k2 thinking turbo) to build a relatively private chat tool (or agent assistant) for handling some private data. However, the overall experience has not been great. The large models often provide incorrect answers.
For search capabilities, I used the bocha API, resetting 10 credits to provide search functionality for the large model.
Test Questions
I feel there are still some issues with context handling (within a single chat window). I briefly tested this question: Which is the most expensive model on Silicon Flow?.
The answer is:

Kimi k2 thinking turbo
First, deepchat:

Hmm, incorrect.
Then, kimi official:

Also incorrect.
Trying deepseek
First, let’s try the client.

Incorrect.
Then, deepseek official.

Very close, and the answer seems reasonable. Unfortunately, it’s still incorrect.
If we ask ChatGPT directly
Hiss, a bit off. Let’s try gpt-5.
Prompt:

Inference - Reasons for Poor Performance
- Insufficient search capability. The Bocha API is to blame.
- Different models may have different optimal hyperparameters for best performance. I called the large model API from Silicon Flow.
Conclusion
- For this specific problem, ChatGPT still performs better. Compared to before, the official search + model combination also seems to perform better. Therefore, unless the data is particularly sensitive, it’s better to use the official service.
- This article is for reference only, just for fun.
