Podcast 34: TADSummit Innovators, Pieter Luitjens, PrivateAI

Pieter Luitjens is the co-founder and CTO at Private AI. He took part in the TADSummit panel session, AI and Video Applications; here’s the video of the panel.

As I got to understand PrivateAI I realized the importance of what they do. All the conversations recorded across programmable communications are now being fed into LLMs (Large Language Models), so identification of PII (Personally Identifiable Information) and its redaction are critical. And, critically, they are generally NOT being performed.

On April 28th 2021, the South Korean Personal Information Protection Commission (PIPC) imposed sanctions and a fine of KRW 103.3 million (USD 92,900) on ScatterLab, Inc., developer of the chatbot “Iruda,” for eight violations of the Personal Information Protection Act (PIPA). 

PIPC’s investigation found that ScatterLab used messages from KakaoTalk, a popular South Korean messaging app, collected by its apps “Text At” and “Science of Love” between February 2020 and January 2021 to develop and operate its AI chatbot “Iruda.”

Data exposed included 1,431 KakaoTalk messages revealing 22 names (excluding last names), 34 locations (excluding districts and neighborhoods), gender, and relationships (friends or romantic partners) of users. There have been many other embarrassing cases, and there’s lots of advice on what NOT to share with ChatGPT.

PrivateAI provides a solution enterprises can self-host to identify and then redact customer PII, not just for LLM training, but also across all data within the company’s data lake. PII includes the obvious data such as name, address, and account numbers, but also preferences, hobbies, and location data, known as quasi-PII.

Private AI then replaces the redacted data with dummy data, so the data is still useful for the LLM while privacy is preserved. They cover 50 languages across text, PDFs, images, and audio. ASR (Automatic Speech Recognition) companies extensively use their services. Competitors include AWS Comprehend, Microsoft Presidio, and Google DLP.
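
To make the identify, redact, and fill-with-dummy-data idea concrete, here is a toy Python sketch. It is NOT how Private AI (or any of the products named above) works: the two regular expressions, the `redact` function, and the dummy values are all hypothetical and chosen purely for illustration. Real PII detection across 50 languages and quasi-PII needs far more than pattern matching.

```python
import re

# Hypothetical, deliberately simple patterns for two PII types.
# Real systems typically use trained models, not just regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# Hypothetical dummy values used to fill in for redacted PII.
DUMMY = {"EMAIL": "user@example.com", "PHONE": "555-0100"}


def redact(text: str, synthetic: bool = False) -> str:
    """Replace each detected PII span with a [LABEL] marker,
    or with a dummy value when synthetic=True."""
    for label, pattern in PATTERNS.items():
        replacement = DUMMY[label] if synthetic else f"[{label}]"
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    message = "Call me on +1 416 555 0199 or mail jane@corp.io"
    print(redact(message))
    print(redact(message, synthetic=True))
```

The first call leaves labeled markers in the text, and the second swaps in dummy values so the sentence still reads naturally, which is the property that keeps the data usable for an LLM.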

Being on-premise is important to many businesses, especially in Healthcare, Insurance and Finance. Keeping customer data away from the cloud matters. Across programmable communications, e.g. UCaaS, CCaaS, collaboration, voice, messaging, and video, wherever communications are recorded and then fed into an LLM, redaction is critical.

We then discussed the relationship between privacy-preserving tools like PrivateAI and SSI (Self-Sovereign Identity). You can imagine a setting where permission is given to use your conversation data for training purposes but ONLY if PII is redacted. At TADHack Open in March, the challenge from STROLID (creator of vCon, the PDF for conversations) is focused on PII identification and redaction. This space is moving fast, and thanks to Pieter for opening our eyes to its importance.
