<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Meter Reading on Svtter's Blog</title><link>https://svtter.cn/en/tags/meter-reading/</link><description>Recent content in Meter Reading on Svtter's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Thu, 19 Jun 2025 16:34:32 +0800</lastBuildDate><atom:link href="https://svtter.cn/en/tags/meter-reading/index.xml" rel="self" type="application/rss+xml"/><item><title>Poor Performance of Large Models on Specific Tasks</title><link>https://svtter.cn/en/p/poor-performance-of-large-models-on-specific-tasks/</link><pubDate>Thu, 19 Jun 2025 16:34:32 +0800</pubDate><guid>https://svtter.cn/en/p/poor-performance-of-large-models-on-specific-tasks/</guid><description>&lt;img src="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/bg.png" alt="Featured image of post Poor Performance of Large Models on Specific Tasks" /&gt;&lt;p&gt;Large vision models perform poorly on some specific tasks, although they do better when processing formatted text. Here, I use localizing the reading area of a meter as an example to show how several large models perform.&lt;/p&gt;
&lt;h2 id="source-code"&gt;Source Code
&lt;/h2&gt;&lt;p&gt;&lt;a class="link" href="https://github.com/Svtter/vl-model/pull/4" target="_blank" rel="noopener"
&gt;https://github.com/Svtter/vl-model/pull/4&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="test-tasks"&gt;Test Tasks
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;Extract text boxes from the image.&lt;/li&gt;
&lt;li&gt;Extract the meter reading area from the image.&lt;/li&gt;
&lt;/ol&gt;
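&lt;p&gt;To make the second task concrete, here is a minimal Python sketch of what can happen after a model answers: the reply is parsed into a bounding box, and the box is clipped to the image before cropping. The reply format &lt;code&gt;{"box": [x1, y1, x2, y2]}&lt;/code&gt; and the names &lt;code&gt;Box&lt;/code&gt;, &lt;code&gt;parse_box&lt;/code&gt;, and &lt;code&gt;clamp_box&lt;/code&gt; are assumptions made for illustration; they are not taken from the linked repository. The coordinates in the example are made up, and 1280 by 1707 is the size listed for the test image below.&lt;/p&gt;

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1


def parse_box(reply: str) -> Box:
    """Parse a reply shaped like '{"box": [x1, y1, x2, y2]}' (hypothetical format)."""
    data = json.loads(reply)
    x1, y1, x2, y2 = (int(round(v)) for v in data["box"])
    return Box(x1, y1, x2, y2)


def clamp_box(box: Box, image_width: int, image_height: int) -> Box:
    """Clip a model-returned box to the image; raise if no area is left."""
    x1 = max(0, min(box.x1, image_width))
    y1 = max(0, min(box.y1, image_height))
    x2 = max(0, min(box.x2, image_width))
    y2 = max(0, min(box.y2, image_height))
    if x1 >= x2 or y1 >= y2:
        raise ValueError(f"box {box} does not overlap the image")
    return Box(x1, y1, x2, y2)


# Made-up reply for illustration; not an actual model output from this post.
reply = '{"box": [100, 200, 400, 300]}'
box = clamp_box(parse_box(reply), image_width=1280, image_height=1707)
print(box.width, box.height)  # prints "300 100"
```

&lt;p&gt;The clipped coordinates could then be handed to a cropping routine, for example Pillow's &lt;code&gt;Image.crop((x1, y1, x2, y2))&lt;/code&gt;. Clipping first means a box that runs past the image edge still yields a usable crop, while a box that misses the image entirely fails loudly.&lt;/p&gt;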
&lt;h2 id="test-file"&gt;Test File
&lt;/h2&gt;&lt;p&gt;&lt;img src="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/meter-2.jpg"
width="1280"
height="1707"
srcset="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/meter-2_hu_771f0f2490b85ed1.jpg 480w, https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/meter-2_hu_8848c0cff3902819.jpg 1024w"
loading="lazy"
alt="Original Meter"
class="gallery-image"
data-flex-grow="74"
data-flex-basis="179px"
&gt;&lt;/p&gt;
&lt;p&gt;The results below show how differently the models perform on this image.&lt;/p&gt;
&lt;h2 id="test-results-comparison"&gt;Test Results Comparison
&lt;/h2&gt;&lt;h3 id="results-using-bounding-boxes-as-prompts"&gt;Results Using Bounding Boxes as Prompts
&lt;/h3&gt;&lt;p&gt;&lt;img src="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image.png"
width="1280"
height="1707"
srcset="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_hu_41dee455ef817364.png 480w, https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_hu_4c0f4ae31905bed5.png 1024w"
loading="lazy"
alt="Overall Test Results"
class="gallery-image"
data-flex-grow="74"
data-flex-basis="179px"
&gt;&lt;/p&gt;
&lt;h3 id="detailed-performance-of-each-model"&gt;Detailed Performance of Each Model
&lt;/h3&gt;&lt;h4 id="anthropic-claude-35-sonnet"&gt;Anthropic Claude 3.5 Sonnet
&lt;/h4&gt;&lt;p&gt;&lt;img src="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_anthropic_claude-3.5-sonnet.png"
width="187"
height="56"
srcset="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_anthropic_claude-3.5-sonnet_hu_fef09b134291fdf1.png 480w, https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_anthropic_claude-3.5-sonnet_hu_d1e74a2cb60d339a.png 1024w"
loading="lazy"
alt="Claude 3.5 Sonnet Test Results"
class="gallery-image"
data-flex-grow="333"
data-flex-basis="801px"
&gt;&lt;/p&gt;
&lt;h4 id="google-gemini-25-pro"&gt;Google Gemini 2.5 Pro
&lt;/h4&gt;&lt;p&gt;&lt;img src="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_google_gemini-2.5-pro.png"
width="690"
height="142"
srcset="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_google_gemini-2.5-pro_hu_75fca6815db4fee4.png 480w, https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_google_gemini-2.5-pro_hu_50b7c46ce946b5fc.png 1024w"
loading="lazy"
alt="Gemini 2.5 Pro Test Results"
class="gallery-image"
data-flex-grow="485"
data-flex-basis="1166px"
&gt;&lt;/p&gt;
&lt;h4 id="openai-gpt-4o"&gt;OpenAI GPT-4o
&lt;/h4&gt;&lt;p&gt;&lt;img src="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_openai_gpt-4o.png"
width="120"
height="60"
srcset="https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_openai_gpt-4o_hu_e7a4998fc04bc3f0.png 480w, https://svtter.cn/p/poor-performance-of-large-models-on-specific-tasks/pics/cropped_image_openai_gpt-4o_hu_3305e7a6fcb0125a.png 1024w"
loading="lazy"
alt="GPT-4o Test Results"
class="gallery-image"
data-flex-grow="200"
data-flex-basis="480px"
&gt;&lt;/p&gt;
&lt;h2 id="analysis-summary"&gt;Analysis Summary
&lt;/h2&gt;&lt;p&gt;From these test results, we can observe:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Differences in Visual Recognition Capabilities&lt;/strong&gt;: Given the same image and the same task, the three models return noticeably different regions. The crops shown above measure 187×56 (Claude 3.5 Sonnet), 690×142 (Gemini 2.5 Pro), and 120×60 (GPT-4o) pixels.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Formatted Text Processing&lt;/strong&gt;: Compared with visual tasks, the models behave more consistently when processing structured text.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Characteristics&lt;/strong&gt;: Each model has its own strengths and limitations.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These results are a reminder to evaluate an AI model against the specific type of task before selecting it.&lt;/p&gt;</description></item></channel></rss>