BharatGen, a recent generative AI initiative of India’s Ministry of Science & Technology, is among the first in the world with a mandate to improve the delivery of public services to citizens. To the best of the researchers’ knowledge, it is the world’s first government-sponsored multimodal large language model (MLLM) project dedicated to Indian languages. The initiative aims to build three core AI capabilities (language, speech, and computer vision) tailored to India’s linguistic, cultural, and socio-economic profile.
Led by IIT Bombay with support from the NM-ICPS, BharatGen includes partnerships with other leading academic institutions, including IITs and IIM Indore. It concentrates on curating India-specific data in order to strengthen India’s control over its digital assets. BharatGen rests on four strategic pillars: multilingual and multimodal models; training on Bhartiya datasets; an open-source platform; and building a generative AI research community in the country. The goal of the project is to create models that can generate text, visual, and audio content.
BharatGen is a pioneering effort to create high-quality generative AI models for Indian users that account for the country’s linguistic, cultural, and economic diversity. At its core, the initiative seeks to build AI models that better capture India’s linguistic diversity.
To ensure that its AI models cover Indian languages, BharatGen has set up the “Bharat Data Sagar” project. This program focuses on primary data collection for underrepresented Indian languages, which are largely missing from existing datasets, thereby providing training data for AI models.
Beyond translation: introducing multilingual and multimodal foundation models.
Developing and training on datasets local to India.
A public platform for fostering original work in AI development.
Development, research, and the expansion of AI applications are expected to continue through 2026.
Enhance Public Service Delivery: Apply AI models to deliver a wide range of public services with improved quality and availability, taking into account India’s cultural and linguistic diversity.
Develop Multimodal AI Models: Build AI models that understand and generate language and speech and that handle computer vision tasks, tailored to India’s needs.
Promote Indian Language Support: Emphasize models with stronger language understanding, built on vocabulary characteristic of Indian languages that are poorly represented in existing datasets.
Curate India-Centric Data: Create channels to find and collect first-hand data most relevant to India, giving the country more control over its digital realm and AI.
Foster Generative AI Research: Build a strong research environment in generative AI that fosters cooperation among universities, government, industry, and startups.
Support Cultural Identity and Regional Development: Offer solutions that support regional development, such as convenient speech translation between local languages and dialects.
Encourage Open-Source AI: Create a free, open-source web platform for applying, comparing, and ranking AI models and tools, serving the research community and industry.
Empower Marginalized Communities: Ensure that underrepresented and marginalized groups also gain access to technology through suitable AI applications.
Position India as a Global AI Leader: Raise India’s standing in the global AI ecosystem by rapidly developing solutions that are impactful and inclusive.
Viewed in the context of India’s generative AI landscape, BharatGen is a significant step toward filling an existing gap: it offers natural language processing solutions for a multilingual country and aims to improve public services. Given its focus on collecting India-specific data and its open-source approach, it should yield not only better AI technologies but also a rebalancing of power in the AI industry, with India gaining more control over the direction taken. For these reasons, BharatGen is a project of strategic value for India’s future and for the digital transformation the country is actively pursuing.