Messy Handwriting OCR: A Comparative Study Between Aya-Vision-8B and Qwen2VL-OCR-2B

Optical character recognition (OCR) on messy or hard-to-read handwriting remains difficult, and knowing how different models behave on such input helps when choosing one for automated text extraction. This article compares two models, Aya-Vision-8B and Qwen2VL-OCR-2B, on recognizing and transcribing handwritten text from images.

The experiment explores the performance differences between these two models, focusing on how the larger 8B-parameter model (Aya-Vision-8B) compares with a smaller, 2B-parameter OCR model (Qwen2VL-OCR-2B). Both models were tested on a set of diverse images with varying levels of handwriting clarity. The goal of this comparison is not to make a definitive judgment but to observe how lower-parameter models perform in a domain traditionally dominated by large, high-parameter models.

OCR Test Results

This section presents the output of each model on a series of messy handwriting samples, with the two transcriptions shown side by side for comparison:

Test Sample 1:

Aya-Vision-8B: “The snowman feels positive today. The snowman made himself a large fluffy dog. The show man with his hooves and scarf feels happy and a little girly. You give him a hug. He likes it. He lives in a lovely igloo. He is the very tidy.”
Qwen2VL-OCR-2B: “the snow man feels positive every day, the snow man made him self a large fluffy dog. the snow man is happy with his hat and scarf. he feel happy and gives you a hug. he lives in a lonely igloo. he is very grumpy.”

Test Sample 2:

Aya-Vision-8B: “Dear Dad, I am really crediting school. It’s making me very hard with all my stuff. I simply can’t think of anything I need, so if you want, you could just send me a card. I would love to hear from you. Love, [Your Name]”
Qwen2VL-OCR-2B: “Dear Dad, School is really great. I am making lots of friends and studying very hard. With all my stuff, I simply can’t think of anything I need. So if you want, you can just send me a card, and I would love to hear from you. Love, Your Son”

Test Sample 3:

Aya-Vision-8B: “NOTE 1 Here goes the text. It’s a handwritten note with some scribbles and cursive writing. The text seems to be a mix of thoughts or reminders, possibly related to personal growth or advice.”
Qwen2VL-OCR-2B: “Note 1 Please take these pills three times a day, and you should begin to feel much better.”

Test Sample 4:

Aya-Vision-8B: “Records. Archaeology has records of sports-like activities in the lives of those cultures.”
Qwen2VL-OCR-2B: “records. Archaeology has a long history of sports-like activities in the cultures.”

Test Sample 5:

Aya-Vision-8B: “The lion laughed at the mouse and let him go. A few days later, the same lion was caught in a hunter’s net.”
Qwen2VL-OCR-2B: “The lion laughed at the mouse and let him go. A few days later, the same lion was caught in a hunter’s net.”
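
The original images and their ground-truth text are not reproduced here, so one rough way to quantify the outputs above is to measure how closely the two models agree with each other. The following is a minimal sketch using Python's standard difflib module. The helper names (tokenize, agreement, differing_words) are illustrative assumptions and not part of either model's tooling, and agreement between two models is not the same as accuracy against the real text.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def agreement(a, b):
    """Word-level similarity ratio in [0, 1] between two transcriptions."""
    return difflib.SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def differing_words(a, b):
    """List (words_in_a, words_in_b) pairs for every span where the outputs differ."""
    ta, tb = tokenize(a), tokenize(b)
    matcher = difflib.SequenceMatcher(None, ta, tb)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ta[i1:i2]), " ".join(tb[j1:j2])))
    return diffs


# Outputs copied from Test Samples 4 and 5 above (straight apostrophes used).
aya_4 = ("Records. Archaeology has records of sports-like activities "
         "in the lives of those cultures.")
qwen_4 = ("records. Archaeology has a long history of sports-like "
          "activities in the cultures.")
aya_5 = ("The lion laughed at the mouse and let him go. A few days later, "
         "the same lion was caught in a hunter's net.")
qwen_5 = aya_5  # both models produced the same text for Sample 5

print(agreement(aya_5, qwen_5))  # identical outputs give 1.0
print(agreement(aya_4, qwen_4))
print(differing_words(aya_4, qwen_4))
```

A high agreement score only shows that the models read the image the same way. Both could still be wrong, which is why a reference transcription is needed for any real accuracy figure.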

What Undercode Says:

The experiment reveals several key insights into how both models handle messy handwriting. While Aya-Vision-8B, with its 8B parameters, produces impressive results on many of the test samples, Qwen2VL-OCR-2B, with only 2B parameters, does surprisingly well in some cases, matching or even surpassing Aya-Vision-8B in accuracy.
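
Accuracy comparisons of this kind are commonly expressed as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a model's output into the reference text, divided by the number of reference words. The sketch below is a plain-Python illustration of that metric. The function name word_error_rate and the example strings are hypothetical, since no ground-truth transcriptions are published with these samples.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical placeholder strings, NOT taken from the test images.
reference = "the cat sat on the mat"
model_output = "the cat sat on mat"
print(word_error_rate(reference, model_output))
```

Lower is better, and 0.0 means the output matches the reference word for word. Splitting on whitespace keeps punctuation attached to words, so a stricter comparison would normalize punctuation first.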

Aya-Vision-8B appears to struggle with certain handwriting styles, particularly scribbled or cursive writing. In Test Sample 3, for example, it returns a general description of the note rather than a transcription of it. In such cases its output tends to be long and fluent but garbled or inaccurate wherever the handwriting is too unclear for the model to resolve.

On the other hand, Qwen2VL-OCR-2B handles simpler handwriting more robustly, though it sometimes sacrifices accuracy for fluency. In Test Samples 1 and 2, the model introduces some minor spelling or grammatical errors (such as “him self” and “he feel” in Test Sample 1), but overall it provides more cohesive transcriptions. The model’s smaller parameter count seems to limit its ability to resolve complex ambiguities, especially in cursive and poorly written text.

Moreover, Qwen2VL-OCR-2B tends to perform better where the handwriting has a clearer structure or less ambiguity. This suggests that lower-parameter models like Qwen2VL-OCR-2B might excel when the handwriting is relatively clean but fall short on more chaotic styles.

From a broader perspective, this experiment highlights an important trend: models with fewer parameters can still perform competitively against larger models, especially in controlled environments or with less complicated data. However, for more unpredictable handwriting or harder-to-read text, larger models like Aya-Vision-8B may have an edge due to their greater capacity to handle intricate patterns in handwriting.

In practice, the Qwen2VL-OCR-2B

Fact Checker Results:

Accuracy of Extracted Text: Both models show promising results, but Aya-Vision-8B generally provides more coherent text, though it can be less accurate in interpreting highly messy handwriting.
Performance of Qwen2VL-OCR-2B: Despite being a smaller model, Qwen2VL-OCR-2B manages to perform well on clearer handwriting samples and is close to Aya-Vision-8B in many cases.
Overall Comparison: Aya-Vision-8B is better suited for more complex OCR tasks, while Qwen2VL-OCR-2B shows potential for real-time applications with less challenging handwriting.

References:

Reported By: https://huggingface.co/blog/prithivMLmods/aya-vision-vs-qwen2vl-ocr-2b