
Navigating AI Independence: India’s Quest for Homegrown Innovation Amid Linguistic Challenges

Inside India’s Scramble for AI Independence

By Shadma Shaikh | MIT Technology Review | July 4, 2025

India is at a critical juncture in its pursuit of artificial intelligence independence, striving to overcome unique structural and linguistic challenges to develop foundational AI models. Despite being a global technology hub, the country has lagged behind powerhouses like the US and China in creating homegrown AI technologies. Renewed governmental urgency, combined with rising entrepreneurial efforts, is shaping a new chapter in India’s AI story.

The Landscape: Enthusiasm Meets Frustration

In Bengaluru, Adithya Kolavi, 20, founder of AI startup CognitiveLab, was excited earlier this year when a Chinese startup called DeepSeek launched a powerful new language model that rivaled Western AI giants in performance. DeepSeek reached these benchmarks with comparatively little capital and in a shorter time frame, which Kolavi saw as an inspiring example of disruption through lean innovation.

However, for Abhishek Upperwal, founder of Soket AI Labs and an early pioneer developing India’s own foundational model—Pragna-1B—the moment was bittersweet. His multilingual model, designed to reduce what is called the “language tax” caused by India’s diversity of languages, remained a proof of concept due to sparse funding and limited scale. He reflected, “If we had been funded two years ago, there’s a good chance we’d be the ones building what DeepSeek just released.”

This contrast underscores the opportunities and obstacles faced by India’s AI builders. Although the nation has world-class tech talent and infrastructure, it has historically underinvested in research and development, particularly in the deep tech needed for groundbreaking AI.

Structural Challenges and Linguistic Diversity

India’s tech ecosystem has traditionally emphasized software services rather than invention, with giants like Infosys and Tata Consultancy Services focusing on efficient delivery rather than innovation. This has translated into a chronic underfunding of research, with India’s R&D expenditure standing at 0.65% of GDP ($25.4 billion) in 2024—far below China’s 2.68% ($476.2 billion) and the US’s 3.5% ($962.3 billion).
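
To make the size of that gap concrete, the figures above can be compared directly. The snippet below is only an illustrative calculation using the numbers quoted in this article; the variable and function names are assumptions made for the example.

```python
# R&D figures as quoted in the article (2024): spend in billions of USD
# and spend as a percentage of GDP. Names here are illustrative only.
rd_spend_usd_bn = {"India": 25.4, "China": 476.2, "US": 962.3}
rd_share_of_gdp_pct = {"India": 0.65, "China": 2.68, "US": 3.5}


def spend_ratio(country: str, baseline: str = "India") -> float:
    """How many times larger a country's absolute R&D spend is than the baseline's."""
    return rd_spend_usd_bn[country] / rd_spend_usd_bn[baseline]


def share_ratio(country: str, baseline: str = "India") -> float:
    """How many times larger a country's R&D share of GDP is than the baseline's."""
    return rd_share_of_gdp_pct[country] / rd_share_of_gdp_pct[baseline]


for name in ("China", "US"):
    print(f"{name}: {spend_ratio(name):.1f}x India's spend, "
          f"{share_ratio(name):.1f}x India's share of GDP")
```

On these figures, China and the US outspend India by roughly 19 and 38 times in absolute terms, while the gap in share of GDP is closer to 4 and 5 times.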

Compounding this is one of India’s defining features: its linguistic diversity. With 22 official languages and hundreds of dialects, India poses a rare challenge for AI development. Most large language models (LLMs) are trained mainly on English and a handful of other global languages, and Indian languages collectively make up less than 1% of online content. The scarcity of digitized, labeled, and cleaned data for languages such as Bhojpuri, Kannada, and Gujarati severely limits the ability to build models that accurately understand and generate Indian language text.

Indian languages also feature complex scripts and agglutinative grammar, in which a single word can contain several meaningful units. Standard tokenizers (the tools that segment text for AI processing) often mishandle these words and split the same sentence into more tokens than necessary. The resulting inputs are inefficient, and the model finds it harder to grasp nuances or generate coherent responses.
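
One simple way to see part of this overhead is to look at raw text encoding. The sketch below is an illustration only, not how any of the models in this article tokenize text: it assumes a hypothetical byte-level tokenizer with no learned merges, so every UTF-8 byte becomes one token. Under that assumption, Devanagari characters cost three tokens each, while basic Latin letters cost one. The function name is made up for the example.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per non-whitespace Unicode code point.

    With a pure byte-level tokenizer (an assumption for this sketch),
    this equals the average number of tokens spent per character.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(len(c.encode("utf-8")) for c in chars) / len(chars)


english = "hello"
hindi = "नमस्ते"  # a Hindi greeting written in Devanagari

print(utf8_bytes_per_char(english))  # 1.0
print(utf8_bytes_per_char(hindi))    # 3.0
```

Real tokenizers learn merges that shrink this gap for well-represented languages, which is why custom Indic tokenizers trained on Indian language text can reduce the token count for the same sentence.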

Initiatives and Innovations

Amid these challenges, Indian researchers have begun developing tailored solutions. Sarvam AI created OpenHathi-Hi-v0.1, an open-source Hindi language model based on Meta’s Llama 2 architecture, trained on 40 billion tokens of Hindi and related content. It is one of the largest Hindi language models openly available.

Upperwal’s Pragna-1B introduced “balanced tokenization,” a novel technique that enabled a modest 1.25-billion-parameter model to perform comparably to much larger counterparts by optimizing for Indian language complexities. Despite modest funding of just $250,000, his team trained the model on 300 billion tokens, demonstrating that intelligent engineering can partially overcome resource constraints.

Startups like Krutrim AI aim even higher. Their latest model, Krutrim-2, is a 12-billion-parameter multilingual system designed for English and 22 Indian languages. Krutrim’s approach includes building custom Indic tokenizers, optimizing training infrastructure, and focusing on multimodal and voice-first AI use cases—critical in India where literacy and digital access vary widely.

Governmental Response: Rapid Mobilization

The launch of DeepSeek-R1 acted as a wake-up call for Indian policymakers. Ten days after its debut in January 2025, the Ministry of Electronics and Information Technology (MeitY) issued a public tender inviting companies to provide GPU compute power for government-led AI research. Major data center and cloud companies, including Jio, Tata, AWS partners, and CDAC, responded.

This initiative gave MeitY access to nearly 19,000 GPUs at subsidized rates, prompted a flurry of proposals to develop foundational AI models, and accelerated India’s AI ambitions. Within weeks, 67 proposals had been submitted, and that number tripled by March. By April, plans were announced to develop six large-scale AI models by the end of 2025, alongside 18 AI applications focused on sectors like agriculture, education, and climate action. Sarvam AI was selected to build a 70-billion-parameter model optimized for Indian languages.

Experts note that India’s blend of talent, political will, and cost-effective innovation offers a promising path forward—akin to the success of the Mangalyaan Mars orbiter mission. As Gautam Shroff of IIIT-Delhi remarked, “India could do a Mangalyaan in AI.” AI literacy advocates like Jaspreet Bindra emphasize the importance of this momentum: “DeepSeek is probably the best thing that happened to India. It gave us a kick in the backside to stop talking and start doing something.”

The Road Ahead

Building AI sovereignty in India will require sustained investments beyond infrastructure—fostering research ecosystems that connect innovation to commercial pathways, attracting and retaining talent, and developing long-term capital for deep tech breakthroughs. Institutional mechanisms akin to the US’s DARPA could help translate research successes into civilian applications.

Moreover, AI models must address India’s linguistic and cultural diversity to truly serve its population, especially rural and multilingual users often left behind by English-centric technologies. Solutions like Upperwal’s speech APIs for 22 Indian languages represent steps toward inclusive AI.

India’s AI journey is just beginning, and while challenges are formidable, the combined determination of government, startups, and researchers signals a strong desire to carve out a leadership role in the next wave of technological innovation. With continued support, India could not only achieve AI independence but also pioneer models that work for the Global South.
