Privacy Enhancing Technology Sandboxes

Experiment with digital solutions that safely extract value from data.

What are Privacy Enhancing Technologies?

Privacy Enhancing Technologies (PETs) enable businesses to obtain valuable insights from data while protecting personal data, preserving privacy, and safeguarding commercially sensitive information. PETs allow for increased B2B data collaboration, cross-border data flows, and data collection for AI development.

IMDA’s PET Sandbox: A safe space to trial PETs

As PETs are still in their infancy, there is much to learn about using these technologies in a real-world environment. To facilitate experimentation with PETs, IMDA’s PET Sandbox – Singapore’s first – provides opportunities for companies to work with trusted PET digital solution providers to develop use cases and pilot PETs.

The PET Sandbox will:

Matchmake use case owners to a panel of PET digital solution providers.

Provide grant support to user companies to scope and implement pilot projects.

Provide regulatory support to ensure that PETs are deployed in a compliant manner.

Unlock value from data with IMDA's Privacy Enhancing Technology Sandbox

New Archetype – Use of PETs for Generative AI Use Cases

Introduction

In recognition of the potential that PETs hold, the PETs Sandbox was introduced in 2022 to provide a testing ground for businesses to pilot their PET use cases, with technology, financial, and regulatory support from IMDA, across three archetypes: (a) identifying common customers in multiple datasets; (b) deriving additional features of common customers from multiple datasets; and (c) making more data available for AI. The Sandbox has already seen active industry participation, with some projects targeting the data protection challenges inherent in traditional AI.

IMDA is keen to extend similar support and exploration into the realm of generative AI. We acknowledge that the use of PETs in generative AI is nascent today. Nonetheless, our conversations with industry partners indicate that businesses recognise both the potential of generative AI and the personal data protection risks it poses throughout its lifecycle, and that they are interested in using PETs to address such risks.

The use of PETs can drive growth in generative AI use cases by unlocking more data and addressing data protection risks

Since November 2022, generative AI has surged in popularity, exemplified by the success of ChatGPT and the subsequent proliferation of similar applications. This technology has the potential to deliver significant value to the global economy over the next few years by driving innovation, improving efficiency, and creating new opportunities.

Central to this potential is the critical role of data. Generative AI models require large amounts of data for training to generate accurate and contextually relevant output. In contrast to traditional AI, training data for generative AI is more voluminous, diverse, and complex. This poses a greater challenge in curating and removing personal data, resulting in a higher risk that confidential data (i.e. personal or commercially sensitive data) may be included in the training dataset.

Further, when users interact with generative AI, both the inputs (prompts) and the corresponding outputs can contain confidential data. Confidential data entered as input prompts may be stored and inadvertently used to further train the generative AI models; consequently, generative AI may reproduce such confidential data in later interactions with users.

The use of PETs could help unlock more data for generative AI use cases and address confidential data-related risks, largely across the following areas:

  1. [Input] To “make available” more data:
    • E.g. The use of Synthetic Data (SD) techniques to create statistically similar data for training or testing purposes.
    • E.g. The use of Homomorphic Encryption (HE) to encrypt sensitive datasets and enable computation on them, so that confidential data can also be used in a privacy-preserving manner.
  2. [Output] To obfuscate or remove confidential data:
    • E.g. The use of Differential Privacy (DP) to add noise to outputs (reports or analyses) to lower the likelihood of re-identification (a minimal sketch of this mechanism follows the list).
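
To make the Differential Privacy point above concrete, the sketch below shows the Laplace mechanism, one standard way of adding calibrated noise to an aggregate before release. The dataset, query, and epsilon values are illustrative assumptions, not recommendations.

```python
# Minimal sketch of the Laplace mechanism behind Differential Privacy: add noise
# calibrated to the query's sensitivity so that any single individual's presence
# has only a bounded effect on the released figure. Values are illustrative only.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(records, predicate, epsilon=1.0):
    """Release a differentially private count of records matching a predicate."""
    true_count = sum(predicate(r) for r in records)
    sensitivity = 1  # adding or removing one person changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical records; in practice this would be the confidential dataset.
ages = [23, 37, 41, 58, 62, 29, 71]
print("DP count of people over 40:", dp_count(ages, lambda a: a > 40, epsilon=0.5))
```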

The use of PETs has been steadily growing in traditional AI use cases, but it is still nascent for Generative AI

PETs have been successfully used to protect the collection, sharing and use of confidential data in traditional AI1 use cases. Examples include:

  • Differential Privacy (DP): Apple uses DP when collecting user data to train machine learning models to power keyboard predictive text features.2
  • Federated Learning (FL): Using FL, Moorfields Eye Hospital NHS Foundation Trust deployed machine learning models for the diagnosis and treatment of common eye diseases.3
  • Fully Homomorphic Encryption (FHE): Banco Bradesco used FHE on financial data for machine learning, and proved similar levels of accuracy and privacy could be achieved.4
  • Synthetic Data (SD): American Express used synthetic data to train its AI models to improve detection of rare and uncommon frauds.5
  • Trusted Execution Environments (TEEs): The Weather Company (an IBM business) used AWS Clean Rooms to enable advertisers to analyse their data together with weather data and used predictive machine learning to identify engaged audiences at scale.6

However, examples of PETs in generative AI, while growing, are still explored primarily by larger technology companies:

  • Microsoft7, Meta8, and Anthropic9 have used synthetic data to train their respective generative AI models.
  • Apple recently launched Private Cloud Compute10, which uses secure enclaves to process user inferences for generative AI services.

As the use of PETs for generative AI is still nascent, expanding real-world implementations across various industries (e.g. finance, healthcare, supply chain) and different types of organisations (e.g. MNCs, governments, SMEs) is essential to build a more comprehensive understanding of PETs and their potential to enable privacy-preserving generative AI use cases.

Encouraging experimentation and feedback through a new Archetype for Generative AI

At IMDA, we adopt a “use case centric” approach through the PETs Sandbox, emphasizing close partnerships with industry to learn from the practical implementation of solutions. It is crucial to ensure that feedback and learnings are factored in, so that any policy or guidance11 developed is meaningful, grounded, and reflective of industry’s requirements. In working with industry on use cases, the Sandbox has validated the potential of PETs to support the use and sharing of data in a trusted and accountable manner across industries such as finance, healthcare and AdTech.

With greater industry interest in Gen AI applications, and the need to better understand and address data protection risks in such applications, the PETs Sandbox will be expanded to include a new archetype – “Data Use for Gen AI” – focusing on model and application 1) development and 2) use (see Annex A (102.94KB) for more information).

While not traditionally considered PETs, generative AI models themselves can be used to identify and flag personal data, which can then be removed or obfuscated. The use of such technologies would also be relevant, and we encourage use cases employing such solutions to come on board our Sandbox.

We invite industry to propose use cases for collaboration under this new archetype.

Case studies

Since the launch of the PET Sandbox, IMDA has received active participation from the industry. Read about the PET use cases implemented by the participating businesses: 

Ant International | Enhancing Customer Engagement with Privacy Preserving AI

Customer engagement and experience can be enhanced by offering relevant promotions, which can be predicted by a model trained on preference and behavioural data collected by different data partners. In this POC, Ant International and its partner explored training a more accurate prediction model using preference and behavioural data from their partners, without revealing sensitive customer or business information at any point in the process.

Learn how to enhance customer engagement with privacy preserving AI (1.14MB).
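
The published case study does not specify the exact PET stack used; as one illustration of how partners could jointly train a model without exchanging raw data, the sketch below shows plain federated averaging on synthetic stand-in data. All names and values are hypothetical.

```python
# Minimal sketch of federated averaging: each party improves a shared linear model
# on its own data and exchanges only model parameters, never raw records.
# Illustrative only; the actual PET used in the POC may differ.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party's local training: gradient descent on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

# Each partner holds its own (features, label) data that never leaves its premises.
true_w = np.array([1.0, -2.0, 0.5])        # hidden relationship, for the demo only
partners = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    partners.append((X, y))

weights = np.zeros(3)
for _ in range(10):
    # Only the updated weights are shared with the coordinator, not the data.
    updates = [local_update(weights, X, y) for X, y in partners]
    weights = np.mean(updates, axis=0)     # federated averaging step

print("Jointly trained weights:", weights)
```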

GPAI | Overcoming data barriers via Trustworthy Privacy Enhancing Technologies

GPAI and IMDA collaborated to demonstrate how privacy-enhancing technologies can be used to share data from past pandemics and improve societal resilience to future outbreaks. This report shares key findings from this project, highlighting the importance of circulating data via confidential and trustworthy channels.

Learn how trustworthy privacy-enhancing technologies can be used to overcome data barriers (1.41MB).

Grab | Enabling Data Availability through Automated Data Management

Faster, data-driven decision-making can be enabled by quicker access to data. In this POC, Grab transformed its manual data tagging and clearance process into one automated with LLM-based metadata tagging and automated data anonymisation, providing quick access to data without compromising personal data protection.

Learn how to enable data availability through automated data management (612.18KB).
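
Grab's POC used LLM-based metadata tagging; the rule-based sketch below only illustrates the downstream anonymisation step on detected identifiers, using hypothetical patterns and sample text.

```python
# Minimal sketch of automated data anonymisation: detect and mask direct
# identifiers before data is released for analysis. The actual POC used
# LLM-based tagging; this regex version only illustrates the masking step.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
    "NRIC":  re.compile(r"\b[STFG]\d{7}[A-Z]\b"),  # Singapore national ID format
}

def anonymise(text: str) -> str:
    """Replace detected identifiers with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

record = "Contact Tan at tan@example.com or +65 9123 4567, NRIC S1234567D."
print(anonymise(record))
# -> "Contact Tan at <EMAIL> or <PHONE>, NRIC <NRIC>."
```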

Healthcare Services Provider | Accessing more data through Trusted Execution Environment to generate new insights

Data sharing between the pharmaceutical company and its data partners has not been easy, as they have to abide by different data protection regulations across jurisdictions. Often, the data relates to individuals and may disclose product distribution and transaction data among ecosystem players. For the POC, the pharmaceutical company designed a Trusted Execution Environment (TEE) based solution that included safeguards to ensure the original data from data partners cannot be read, modified, or accessed in any form by the host of the environment. Through the POC, the company was able to access more data from its data partners and create new data models that benefit its ecosystem partners.

Find out how more data can be accessed through Trusted Execution Environment to generate new insights (280.69KB).

Mastercard | Preventing financial fraud across different jurisdictions with secure data collaborations

Mastercard, a global technology company in the payments industry, has, through its Cyber and Intelligence Solutions business line, been assessing the potential of frontier technologies, including Privacy Enhancing Technologies (PETs), to buttress its product offerings against financial crimes such as money laundering.

Mastercard developed a proof of concept (POC) in IMDA’s PET Sandbox program to investigate a product based on Fully Homomorphic Encryption (FHE), provided by a third-party supplier, for sharing financial crime intelligence across international borders – specifically between Singapore, the United States (“US”), India and the United Kingdom (“UK”) – while complying with prevailing regulations.

Learn how an AML solution built with Fully Homomorphic Encryption can comply with financial regulations (365.61KB).
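
Mastercard's POC used Fully Homomorphic Encryption supplied by a third party, whose API is not reproduced here. As a rough illustration of the principle of computing on encrypted values, the sketch below uses the open-source python-paillier (phe) library, which is only additively homomorphic; it conveys the idea, not the actual product.

```python
# Minimal sketch of computing on encrypted values with the `phe` (python-paillier)
# library. Paillier supports only addition and scalar multiplication on ciphertexts;
# the FHE scheme used in the POC supports richer computation.
from phe import paillier

# The data owner generates a keypair and keeps the private key.
public_key, private_key = paillier.generate_paillier_keypair()

# Hypothetical transaction risk scores are encrypted before being shared across borders.
scores = [0.12, 0.87, 0.45]
encrypted_scores = [public_key.encrypt(s) for s in scores]

# The receiving party can aggregate the encrypted values without ever seeing them.
encrypted_total = sum(encrypted_scores[1:], encrypted_scores[0])
encrypted_mean = encrypted_total * (1 / len(scores))  # scalar multiplication is allowed

# Only the data owner can decrypt the aggregate result.
print("Mean risk score:", private_key.decrypt(encrypted_mean))
```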

Meta | Digital advertising in a paradigm without 3rd party cookies

Tracking technologies, like 3rd party cookies, are presently a mainstay in the digital advertising ecosystem. Publishers, advertisers and adtech firms rely on the collection and sharing of user / device identifiers to analyse how consumers can be shown advertisements best aligned to their online interests or activities.

However, the ecosystem is preparing for a paradigm in which the collection of user / device identifiers that are linkable across apps / websites is no longer feasible, and trust in the ecosystem is low. A prominent forum for discussing solutions that measure attribution of digital ads without tracking technologies is the World Wide Web Consortium (W3C)'s Private Advertising Technology Community Group, or “PAT-CG”.

Meta and Mozilla, as members of PAT-CG, have proposed a solution – “Interoperable Private Attribution” or IPA. It uses a combination of multi-party computation (MPC), aggregation, differential privacy (DP) and write-only identifiers to enable attribution measurement. The solution aims to measure advertising outcomes based on impressions shown on publisher website(s)/app(s) and conversions occurring on an advertiser website/app.

Find out how Meta and its partners piloted ‘Interoperable Private Attribution’ (439.36KB), a PET-based solution to generate attribution reports without use of 3rd party cookies.
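
IPA's actual protocol is specified in the W3C PAT-CG proposal and is considerably more involved; the sketch below only illustrates, under simplifying assumptions, the building blocks named above: additive secret sharing, aggregation by helper parties, and differential privacy noise on the aggregate.

```python
# Minimal sketch of IPA-style building blocks: additive secret sharing (so no single
# party sees raw conversion values), aggregation, and DP noise on the aggregate.
# Illustrative only; not the IPA protocol itself.
import random

PRIME = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, n_parties=3):
    """Split a value into n additive shares that individually reveal nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Each user's conversion value (e.g. purchase amount in cents) is secret-shared.
conversions = [1200, 0, 350, 0, 4999]
shared = [share(v) for v in conversions]

# Each helper party sums only its own column of shares.
partial_sums = [sum(col) % PRIME for col in zip(*shared)]

# Recombining the partial sums yields the aggregate; adding Laplace-style noise
# releases a differentially private total instead of exact per-campaign figures.
aggregate = sum(partial_sums) % PRIME
noisy_aggregate = aggregate + round(random.expovariate(1 / 50) - random.expovariate(1 / 50))

print("Exact total:", aggregate, "| DP-noised total:", noisy_aggregate)
```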

Companies with PET use cases are invited to participate in the PET Sandbox. More information can be found in the invitation to participate in the PET Sandbox (202.63KB).

For insights into PETs and the PET Sandbox, please access the following reports.

IMDA-Google: PET x Privacy Sandbox

IMDA has partnered with Google to give companies access to Google’s Privacy Sandbox through IMDA’s PET Sandbox environment. Key features of the PET x Privacy Sandbox include:

Open to all Singapore-registered companies interested in learning about, and testing, PETs. Be a part of the privacy-first shift across web and mobile applications.

Companies will learn about protecting user privacy and be equipped with tools to do so while extracting data value.

Publishers and developers will learn privacy-preserving alternatives to access data on sites and apps for business needs.

Find out more about the PET x Privacy Sandbox here.

Contact us

For queries, please email data_innovation@imda.gov.sg.

Footnotes

1 Traditional AI refers to AI models that make predictions by leveraging insights derived from historical data. Typical traditional AI models include logistic regression, decision trees and conditional random fields. Other terms used to describe this include “discriminative AI”.

2 Apple Machine Learning Research, Learning with Privacy at Scale, Dec 2017

3 Open Data Institute, Federated Learning: An Introduction, Jan 2023

4 IBM Research, Top Brazilian Bank Pilots Privacy Encryption Quantum Computers Can’t Break, Jan 2020

5 Fortune, American Express is trying technology that makes deepfake videos look real, Sep 2020

6 AWS, AWS Clean Rooms ML

7 Microsoft, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

8 Meta Responsible AI, Our responsible approach to Meta AI and Meta Llama 3

9 Anthropic, The Claude 3 Model Family: Opus, Sonnet, Haiku

10 Apple Security Research, Private Cloud Compute: A new frontier for AI privacy in the cloud, Jun 2024

11 For example, the PDPC has published the “Proposed Guide on Synthetic Data Generation”, which will be fine-tuned iteratively with new inputs from industry based on PETs Sandbox use cases.

LAST UPDATED: 12 JUL 2024
