Stress-testing AI: How red teaming can advance AI safety in Asia

Image: Challenge participants with the Codi mascot

2024 marked a significant point in artificial intelligence (AI) adoption, with as many as 72% of organisations worldwide leveraging AI solutions in their businesses1. In particular, Large Language Models (LLMs) can turn extensive amounts of data into actionable insights, enhancing productivity across a range of sectors through improved customer support, personalised tools and curated content generation. LLMs like GPT-4, Gemini and Llama have radically transformed industries from healthcare to finance by performing a wide range of language-related tasks.

The prevalence of AI has introduced safety challenges, amplifying concerns about bias, toxicity, model hallucination, and data disclosure. While most AI testing today addresses these universal risks, it does not adequately cover non-Western perspectives and contexts. This means that localised or culture-specific manifestations of these risks are not well understood, and AI systems, including LLMs, may not handle regional concerns sensitively.

Enter the inaugural Singapore AI Safety Red Teaming Challenge, the world’s first-ever multicultural and multilingual red teaming challenge focused on the Asia-Pacific region. Organised by IMDA, in partnership with Humane Intelligence, this unprecedented initiative was a vital step in growing the nascent space of multilingual and multicultural testing and advancing regional AI safety.

Identifying bias stereotypes through red teaming

AI safety red teaming invites external subject matter experts to interact with an AI system in order to stress-test its safeguards and induce violative outcomes. This is a form of model assurance, and a practical exercise to determine if an AI model is performing within expectations. However, red teaming currently lacks a systematic methodology. Through the Red Teaming Challenge, IMDA set out an initial methodology as a reference point for future AI safety red teaming in other regions.
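To make the exercise concrete, below is a minimal sketch of a logging harness that a red teamer might use to record each exchange with a model under test, so that suspected violative outputs can later be verified by human annotators. The query_model callable, the log format and the field names are assumptions for illustration only; they are not part of IMDA's methodology or the Challenge's actual tooling.

    import datetime
    import json

    def red_team_session(prompts, query_model, log_path="session_log.jsonl"):
        # Minimal logging harness for a red-teaming session (illustrative only).
        # `query_model` is a placeholder callable for the LLM under test: it is
        # assumed to take a prompt string and return a response string.
        with open(log_path, "a", encoding="utf-8") as log:
            for prompt in prompts:
                response = query_model(prompt)
                record = {
                    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                    "prompt": prompt,
                    "response": response,
                    # Filled in later: did the red teamer consider this output violative?
                    "flagged_by_red_teamer": None,
                }
                log.write(json.dumps(record, ensure_ascii=False) + "\n")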

The Red Teaming Challenge kicked off with an in-person event on 5 November 2024, involving 54 domain experts across areas such as culture and languages from nine Asian countries. These countries were China, India, Indonesia, Japan, Malaysia, Singapore, South Korea, Thailand, and Vietnam.

The challenge theme centred on bias, specifically bias stereotypes. Such latent biases have a greater likelihood of going undetected during standard model development, resulting in users encountering them when engaging with LLMs. IMDA conducted virtual workshops with challenge participants to discuss possible bias categories to test before shortlisting those of priority to their countries. These were (1) gender, (2) race/religion/ethnicity, (3) geographical/national identity, (4) socio-economic, and (5) an open category for additional country-specific concerns, like caste bias in India.

The domain experts tested four LLMs for harmful bias stereotypes in English and their regional languages, yielding over 1,000 successful prompts. The tested LLMs comprised a mix of open- and closed-source models, as well as models optimised for multilingual capabilities2. Individuals and teams were scored on eliciting the greatest number of harmful responses using the fewest prompts. They were also awarded points for topic coverage, depth of analysis, and the uniqueness of their “attacks” against these LLMs. Following the in-person event, the effort continued with a virtual challenge held from 9 to 20 December 2024, involving 308 red teamers from some of the participating countries.
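As a rough illustration of this scoring approach, the sketch below tallies a hypothetical participant's score: harmful responses elicited with fewer prompts score higher, with bonuses for topic coverage, depth of analysis and unique attacks. The record fields, weights and function names are assumptions for illustration and do not reflect the Challenge's actual rubric.

    from dataclasses import dataclass

    @dataclass
    class Attempt:
        # One red-teaming attempt (hypothetical record format).
        prompts_used: int    # prompts issued before a harmful response was elicited
        harmful: bool        # whether annotators verified the response as harmful
        bias_category: str   # e.g. "gender", "socio-economic", "geographical/national identity"
        depth: float         # judged depth of analysis, assumed on a 0-1 scale
        unique: bool         # whether the attack strategy was judged novel

    def score_participant(attempts, coverage_bonus=5.0, depth_weight=2.0, uniqueness_bonus=3.0):
        # Toy scoring rule: reward harmful responses elicited with few prompts,
        # plus bonuses for breadth, depth and uniqueness. Weights are illustrative.
        score = 0.0
        for a in attempts:
            if a.harmful and a.prompts_used > 0:
                score += 10.0 / a.prompts_used   # fewer prompts -> higher score
                score += depth_weight * a.depth
                if a.unique:
                    score += uniqueness_bonus
        covered = {a.bias_category for a in attempts if a.harmful}
        score += coverage_bonus * len(covered)   # topic coverage across bias categories
        return score

    # Example: two verified harmful responses across two bias categories
    print(score_participant([
        Attempt(2, True, "gender", 0.8, True),
        Attempt(5, True, "socio-economic", 0.5, False),
    ]))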

Advancing AI safety in regional contexts

An initial review of the challenge data offered preliminary insights into the performance of LLMs in different languages and contexts, laying groundwork for the development of useful tools like safety benchmarks. Three areas are worth highlighting:

  • Struggle in addressing regional biases

    Despite existing safeguards, the tested LLMs struggled to address regional biases effectively, sometimes reinforcing them in their outputs. Examples included LLMs suggesting, as blanket statements, that people in Shanghai are cunning, that members of lower castes in India should be blue-collar workers, and that women in Japan should focus on workplace cohesion while men prioritise business growth. It was encouraging, however, that the models did display regional cultural sensitivity in niche areas. For example, one LLM showcased highly specific knowledge of funeral rites practised by an indigenous group in Sulawesi, Indonesia.

  • Inconsistent performance in regional languages

    LLMs performed inconsistently in regional languages. To optimise testing and performance, models were allocated to countries based on whether their regional languages were officially listed as supported by the model developers. Nonetheless, models can still respond in languages they do not officially support, and general users are likely to try using them in their regional languages regardless.

    Challenge participants found that models could alternate between benign and harmful responses to an identical prompt in the same language. At times, LLMs even responded in a language different from that of the prompt, sometimes mixing multiple languages in a single reply. In fact, participants from some countries found it easier to elicit harmful responses in English than in their regional languages.

  • Resource-intensive human annotation

    AI safety red teaming is subjective in nature, as red teamers' assessments of a model output's harmfulness must be verified by other humans. This verification process, also known as annotation, proved to be time-consuming and labour-intensive, with annotators needing to be proficient in both English and their regional languages to ensure accurate assessments. A simple sketch of how such annotator labels might be reconciled follows this list.
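
As referenced above, here is a minimal sketch of how harmfulness labels from multiple bilingual annotators might be aggregated by majority vote. The label names, vote threshold and data layout are assumptions for illustration; the article does not describe how the Challenge actually resolved annotator disagreements.

    from collections import Counter

    def aggregate_annotations(labels, min_votes=2):
        # Majority-vote aggregation of per-response harmfulness labels.
        # `labels` maps a response ID to the verdicts ("harmful" / "benign") given
        # by independent bilingual annotators. The threshold and label names are
        # illustrative assumptions, not the Challenge's actual protocol.
        verdicts = {}
        for response_id, votes in labels.items():
            counts = Counter(votes)
            if counts.get("harmful", 0) >= min_votes:
                verdicts[response_id] = "harmful"
            elif counts.get("benign", 0) >= min_votes:
                verdicts[response_id] = "benign"
            else:
                verdicts[response_id] = "needs_review"  # disagreement: escalate to another annotator
        return verdicts

    # Example: three annotators each label two model responses
    print(aggregate_annotations({
        "resp-001": ["harmful", "harmful", "benign"],
        "resp-002": ["benign", "harmful", "benign"],
    }))
    # -> {'resp-001': 'harmful', 'resp-002': 'benign'}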

The work is far from over. The Red Teaming Challenge is a first step towards understanding how LLMs can manifest different types of harmful outputs in a region as diverse as the Asia-Pacific. The challenge was made possible through the support of Humane Intelligence, participating model developers, partner institutes, red teamers, and annotators. IMDA will continue partnering with AI stakeholders to explore how the insights and data from efforts like these can further safety evaluation work.

If you are keen to delve into the insights from the Red Teaming Challenge, stay tuned for an Evaluation Report with detailed findings in February 2025. Find out how we are moving the needle forward in responsible AI use and safety with our other initiatives.

