The launch of ChatGPT by OpenAI in 2022 ignited a global surge of interest in artificial intelligence (AI), placing the spotlight squarely on Large Language Models (LLMs).
The resulting race has extended beyond the tech giants of Silicon Valley, with governments now joining the fray to bolster national competitiveness, uphold data sovereignty, and reduce dependence on foreign AI systems.
In an op-ed titled “An LLM for ASEAN, by ASEAN”, Satria Mahesya Muhammad, Assistant for Research Activities at the Economic Research Institute for ASEAN and East Asia (ERIA), highlighted this momentum in Southeast Asia.
The region’s AI market, he notes, is growing at a compound annual growth rate (CAGR) of 27.71 per cent for 2025–2030 and is projected to reach USD30.3 billion.
Yet, despite the impressive trajectory, Satria pointed out a notable gap: the absence of a truly collaborative LLM that reflects ASEAN’s rich linguistic and cultural mosaic.
A step in this direction has emerged through AI Singapore’s (AISG) Southeast Asian Languages in One Network (SEA-LION) model – a multilingual LLM benefiting from open-source contributions.
However, as Satria noted, funding and technical development remain largely centralised in Singapore.
According to the ERIA One ASEAN Start-up White Paper 2024, Singapore’s investment in localised LLM development reportedly totals SGD70 million – a testament to its AI ambitions.
In comparison, Europe’s multilingual LLM efforts are often driven by collective contributions – a hybrid of university-led initiatives, private-sector partnerships, and European Union (EU)-backed funding.
Satria contended that ASEAN should embrace a similar model, advocating for a regional LLM built by and for the region – one that ensures inclusive participation and represents the linguistic and cultural diversity of all 10 member states.
WHY ASEAN NEEDS ITS OWN LLM
LLMs are sophisticated AI systems trained to understand human language and generate human-like responses.
At their core, they rely on deep learning models – particularly transformer architectures – trained on immense datasets ranging from online texts to code and digital media.
These models enable natural language processing, pattern recognition, and content generation across various formats, from simple sentences to images, sound and video.
However, many of today’s widely used LLMs – such as ChatGPT and Meta’s LLaMA – are primarily trained on English-dominated Internet data, with English accounting for nearly half (49.2 per cent) of the training corpus.
Satria believes this heavy skew places non-English-speaking regions, particularly Southeast Asia with its 700 million people and over 1,200 native languages, at a disadvantage.
Research indicates that low-resource Southeast Asian languages are poorly handled by existing LLMs, leading to inaccuracies and confusion between similar languages like Bahasa Indonesia and Bahasa Melayu.
These models also struggle with “code-mixed” language – the informal blend of English, local dialects, and national languages common in the region.
Furthermore, Satria highlighted that bias in AI-generated language remains a growing concern.
A report by Singapore’s Infocomm Media Development Authority (IMDA) on AI safety revealed that nearly half of AI-generated English responses exhibited bias – with this figure rising to two-thirds when assessing regional languages.
This suggests that without intervention, AI systems risk reinforcing linguistic inequities and misrepresenting cultural nuances.
ASEAN’S JOURNEY TOWARDS AI SELF-RELIANCE
To bridge this gap, both multilingual and monolingual LLMs have been emerging across Southeast Asia.
SEA-LION, for instance, has been trained on 980 billion tokens across English and several regional languages.
On a national scale, Vietnam’s PhoGPT and Indonesia’s Sahabat-AI – trained on 102 billion and 50 billion tokens respectively – have demonstrated promising results in localising AI.
These models incorporate local linguistic quirks, slang and cultural references often overlooked by mainstream models.
Yet despite these gains, ASEAN’s LLM development remains fragmented. Challenges include disjointed data governance, varying degrees of AI readiness among member states, and the absence of a unified AI roadmap.
By contrast, the EU has adopted both national and supranational strategies. EU-funded projects like EuroLLM, TildeLM and OpenEuroLLM aim to strengthen underrepresented languages through cross-border collaboration.
Nationally, Estonia has developed open-source language models in partnership with Meta, while France has committed EUR109 billion to advance its AI ecosystem.
TOWARDS A SHARED REGIONAL VISION
While AISG’s efforts have laid a strong foundation, ASEAN’s next step must be more ambitious. A regional AI research hub, backed by pooled financial and human capital, could act as the backbone for collective innovation.
Such a centre would facilitate the development of high-quality, multilingual datasets and enable meaningful collaboration across borders.
A critical element of this initiative would be the establishment of an ASEAN Language Repository – a central, open-access platform to collect, preserve and share structured linguistic, domain-specific and multimodal datasets.
This repository would serve not only as a resource for training LLMs but also as a vital instrument in preserving the region’s linguistic heritage.
Given that the bulk of AI expertise and infrastructure lies within the private sector, ASEAN must actively cultivate robust public–private partnerships.
While partnerships with established tech giants like Nvidia, Microsoft, Google and Amazon are valuable, ASEAN should also deepen ties with leading Asian firms such as Alibaba, Baidu, NEC, SoftBank and Naver.
Broader engagement with countries like the United States, China, Japan and South Korea – combined with deeper integration into Asia’s start-up ecosystem – could unlock vital investment and knowledge-sharing opportunities.
Such collaborations would bolster regional capacity to innovate while ensuring ASEAN maintains agency in its AI future.
As LLM development becomes increasingly central to AI progress, ASEAN must rise to the challenge.
A regional approach – built on cooperation, cultural sensitivity, and mutual investment – is vital not just to protect digital sovereignty but also to ensure inclusive technological advancement.
By investing in a collective AI research hub, building an ASEAN-wide language repository, and embracing both regional and global partnerships, the bloc can build LLMs that speak not only the languages of its people, but also their stories, contexts and identities.
In doing so, ASEAN won’t just join the global AI race – it will help shape it.
– Izah Azahari