Sarvam AI’s Vision Model Beats Google Gemini in OCR Benchmarks, Draws Global Attention

Deepanker Verma February 8, 2026 Internet

When people talk about advanced AI models, the focus is usually on the US and China. But a few Indian companies are doing impressive work in this field. This week, Sarvam AI drew attention for releasing updates to two of its in-house models, Sarvam Vision and Bulbul. Both focus on areas that large global AI labs have often overlooked, especially Indian languages and real-world document use cases.

Sarvam Vision is an optical character recognition (OCR) model designed to read complex documents. According to the company, the model achieved an accuracy score of 84.3 percent on the olmOCR Bench. That is higher than the scores of Google Gemini 3 Pro and several recent OCR models, while ChatGPT ranked much lower on the same benchmark.

The model also performed strongly on OmniDocBench v1.5, which tests how well AI systems read and understand real-world documents. Sarvam Vision scored 93.28 percent overall, with particularly strong results on complex layouts, technical tables, and mathematical formulas. These areas are usually difficult for traditional OCR systems because of dense formatting and inconsistent structure.

These results have started to change perceptions around Sarvam AI. The company was earlier questioned for focusing heavily on Indic language models. That scepticism is now turning into approval as real benchmark results and user feedback come in.

Pratyush Kumar, co-founder of Sarvam AI, also shared screenshots demonstrating the model's OCR capabilities, including how accurately it converts handwritten text to typed text.

He also shared how well the model scanned a Tamil page from an old book.

Alongside Sarvam Vision, the company has also launched Bulbul V3, its latest text-to-speech model. Bulbul is designed to generate natural-sounding voices for Indian languages and is positioned as an alternative to global tools like ElevenLabs.

According to Sarvam, Bulbul V3 is built to deliver stable and expressive speech while reducing common errors seen in AI voice systems. The model currently supports more than 35 voices across 11 Indian languages, with plans to expand to 22 languages in the future.

It is important to note that Sarvam AI is not trying to compete with global AI giants in general intelligence. It is focusing on depth, local data, and specific problems that global models do not prioritize. This could be its path to success.

