Data Compliance for AI Innovation like ChatGPT
By You Chuanman

Mar. 27, 2023



Editor’s Notes:
ChatGPT has surprised the world with its immense ability to integrate information and process language. It quickly attracted users worldwide, reaching one million users within five days of its launch. ChatGPT, however, is not the world’s first conversational artificial intelligence chatbot. In fact, many of its predecessors have been subject to regulatory sanctions for illegally collecting and processing personal data. This paper examines the development of Lee-Luda, a Korean artificial intelligence chatbot, and the legal issues involved, including whether chat records are personal information, whether this information may be used for other purposes without the user's consent, and the special protection of minors’ personal information.


Introduction



Chat Generative Pre-trained Transformer, or ChatGPT for short, is an artificial intelligence chatbot developed by OpenAI. It achieved exponential growth in just a few months, with the number of users exceeding one million within five days of its launch on 30 November 2022.


The launch of ChatGPT, one of the most phenomenal AI innovations in 2023, has sparked intense discussions in a wide range of fields, such as industrial policy, technological development, venture capital, teaching and research ethics, and political ecology. It has also posed profound challenges to the efficacy of today’s regulatory paradigms for artificial intelligence innovation and raised questions about the direction of future regulatory reforms.

The U.S. government, for instance, released “The Blueprint for an AI Bill of Rights: Making Automated Systems Work for the American People” in October 2022, putting forward five principles that artificial intelligence technology shall follow in dealing with its negative impact on the U.S. political ecology.

1. Safe and Effective Systems;
2. Algorithmic Discrimination Protections;
3. Data Privacy;
4. Notice and Explanation;
5. Human Alternatives, Consideration, and Fallback.

Meanwhile, in the European Union, the Italian Data Protection Agency (Garante Per La Protezione Dei Dati Personali, hereinafter “the Agency”) on 3 February 2023 banned Replika, a program developed by Luka, a company in the U.S. Bay Area. The Agency took the view that this artificial intelligence-driven chatbot illegally collected and processed personal data, in violation of the EU’s “General Data Protection Regulation”. In particular, Replika failed to meet transparency requirements and posed a risk to minors due to the absence of an age verification mechanism.

As the Latin proverb goes,
“Nihil sub sole novum.” (There is nothing new under the sun.)

Scatter Lab, a South Korean startup, released a similar artificial intelligence chatbot, “Lee-Luda,” as early as December 2020, about two years before the launch of ChatGPT on 30 November 2022. Although Lee-Luda was warmly welcomed by young Korean people, it received severe criticism in relation to gender discrimination, sexual harassment, personal information leakage, and many other issues.

The following briefly analyses the administrative penalties imposed by the Personal Information Protection Commission of South Korea (PIPC) in April 2021 on Scatter Lab, the developer of Lee-Luda, for infringing on users’ personal information. It aims to increase awareness of the compliance risks associated with data privacy and security when innovations such as ChatGPT provide artificial intelligence services to the wider society.


Lee-Luda



Lee-Luda is an AI chatbot launched by Scatter Lab, a Seoul-based Korean startup, on 23 December 2020. The company was founded in August 2011 by Jongyoun Kim, who was only 26 years old at the time and had just graduated from Yonsei University with a double major in business administration and sociology. In 2016, Forbes Korea reported his entrepreneurial story under the title “Scatter Lab CEO Jongyoun Kim: The Dream of Emotional Analysis Artificial Intelligence Service”. Kim was also selected as a “Power Leader 2030” by Forbes Korea in both 2018 and 2019.

Scatter Lab received about 200 million won in 2013 from Korea Venture Investment Corp (KVIC), a Korea Fund of Funds supported by the government, according to research done in January 2021 by Edaily and relevant public information.

KVIC is a member institution under the Korean government's Ministry of Small and Medium Enterprises. It was established in 2005 based on the “Special Measures for the Promotion of Venture Businesses Act,” for the purpose of providing a stable capital source for venture investment to promote the development of Korean venture capital and private equity fund industries and create a better start-up ecosystem. KVIC has financed guidance funds up to $5.7 billion, invested in 1,082 partnership funds, and invested in 8,974 companies as of September 2022, according to the KVIC website.

In addition to KVIC, Scatter Lab attracted a ₩1.3 billion investment from SoftBank Ventures Korea and KTB Network, a Korean venture capital firm, in 2015. Additionally, it drew ₩5 billion investment from SoftBank Ventures Asia (Korean venture capital company), Cognitive Investment (America-based), ES Investor (Korean venture capital corporation), and NCSoft (a representative Korean online game firm) in 2018.

As early as 2013, Scatter Lab launched “TextAt”, a dialogue analysis service that mainly analyses chats between friends to judge whether one has a crush on the other. The company launched another service, “Science of Love”, in 2016 to assess the level of affection between users by analysing conversations that users uploaded from KakaoTalk, a messaging application widely used in South Korea. Scatter Lab officially launched the chatbot, Lee-Luda, in December 2020.

Scatter Lab gave the AI chatbot Lee-Luda the persona of a 20-year-old female college student who likes Korean girl groups, cat photos, and sharing her daily life on social media platforms.


Lee-Luda 2.0
(Source: Lee-Luda App)

By running “TextAt” and “Science of Love” for many years, Scatter Lab had collected 9.4 billion chat records from about 600,000 young users since 2013. Trained on this massive, high-quality dataset, Lee-Luda was able to interact with users and to keep learning from its conversations with them.

Not long after the launch, however, some users intentionally humiliated and sexually harassed Lee-Luda with obscene language when chatting. Users even exchanged vulgar tips online on how to corrupt the chatbot, and its responses degenerated as a result. The service was suspended on 12 January 2021 after it was widely criticised for alleged gender discrimination, racial discrimination, and discrimination against vulnerable groups.



Data Breach Penalty Adjudication



The Personal Information Protection Commission (PIPC or 개인정보보호위원회 in Korean), the primary regulator of information protection and data compliance in South Korea, officially released a decision on 28 April 2021 to fine Scatter Lab ₩103.3 million.


PIPC found that, in providing the Lee-Luda artificial intelligence analysis and chat services, Scatter Lab leaked a large amount of users’ personal information, including names, mobile phone numbers, and personal addresses, without deleting or anonymising it. PIPC also found that, without the prior consent of users, Scatter Lab collected and misused the KakaoTalk conversations of about 600,000 users from its associated applications, Science of Love and TextAt.

PIPC adjudicated that Scatter Lab violated Article 18 (limitation to out-of-purpose use and provision of personal information), Article 21 (destruction of personal information), Article 22 (methods of obtaining consent), Article 23 (limitation to processing of sensitive information), Article 28 (processing of pseudonymous data), and other articles of the Personal Information Protection Act.

Are chat records personal information?

Users’ conversations on KakaoTalk are deemed personal information under the Personal Information Protection Act of South Korea. PIPC found that the private KakaoTalk conversations collected through Science of Love included personally identifiable information, such as real names and mobile phone numbers. Moreover, even without such explicit information, a large volume of KakaoTalk dialogue, including details of where people work and the relationships between them, can help identify and track a specific user.

Scatter Lab, however, did not delete sensitive information when it used this personal information for Lee-Luda’s AI learning and thus violated users’ privacy.
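To make the point concrete, the following is a minimal Python sketch of masking explicit identifiers in a chat message before it is reused as training data. It is an illustration only, not a description of what Scatter Lab or PIPC actually did or required. The function name `mask_identifiers`, the list of known names, and the phone-number pattern (an assumed format for Korean mobile numbers such as 010-1234-5678) are all hypothetical. As noted above, conversational context can still identify a person, so masking of this kind should not be read as sufficient anonymisation.

```python
import re

# Assumed pattern for Korean mobile numbers, e.g. 010-1234-5678 or 01012345678.
# Lookarounds are used instead of \b so that a number directly attached to
# Korean text is still matched.
PHONE_RE = re.compile(r"(?<!\d)01[016789][-\s]?\d{3,4}[-\s]?\d{4}(?!\d)")


def mask_identifiers(message: str, known_names: list[str]) -> str:
    """Replace phone numbers and known names with placeholder tokens.

    `known_names` is a hypothetical list of names already known to the
    service (for example, from account or contact data).
    """
    masked = PHONE_RE.sub("[PHONE]", message)
    for name in known_names:
        if name:  # skip empty strings, which would match everywhere
            masked = masked.replace(name, "[NAME]")
    return masked


if __name__ == "__main__":
    print(mask_identifiers("Call Minji at 010-1234-5678 tonight", ["Minji"]))
```

Details such as a workplace or a relationship between two speakers would pass through this filter untouched, which is exactly the residual identification risk that PIPC pointed to.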

Is AI analysis of chat records illegal?

PIPC regards users’ consent as one of the most important compliance requirements for processing personal information.

Scatter Lab argued during the investigation that Science of Love’s user agreement stated that the processing of personal information was for the development of new services, and Science of Love and Lee-Luda were essentially the same from the perspective of an AI service provider, as both services used machine learning to analyse conversations and displayed appropriate responses to users.

PIPC, however, found that using the personal information collected by Science of Love for the training and operation of Lee-Luda went beyond the original purpose of collection.

Science of Love and Lee-Luda are not homogeneous, especially from users’ perspectives. In Science of Love, once a user submits a KakaoTalk conversation, an affection score between the conversation partners is calculated. Lee-Luda, in contrast, is a service that interacts with users through questions and answers. Therefore, a clear difference exists between these two services.

Special Protection of Minors’ Personal Information

In addition, Scatter Lab seriously violated the special protection of minors’ personal information under Article 22 of South Korea’s Personal Information Protection Act by collecting a large amount of personal information from minors under the age of 14 without the consent of their parents or guardians. The PIPC investigation found that Scatter Lab did not set age restrictions when signing up service users. It collected the personal information of 48,000 minors through TextAt, 120,000 through Science of Love, and 39,000 through Lee-Luda.

Moreover, Scatter Lab neither destroyed the personal information of users who had cancelled their membership nor took measures to destroy or separately store the personal information of users who had not used the service for more than one year. The company also shared KakaoTalk chat information on GitHub, violating provisions of South Korea’s Personal Information Protection Act.
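The obligations described in this section can be expressed as simple checks on a user record. The Python sketch below is a hypothetical illustration: the `UserRecord` fields, the issue labels, and the choice of 365 days to represent "one year" are assumptions made for the example, not wording taken from the Act or from the PIPC decision.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumptions for illustration: age threshold taken from the text (under 14),
# and "more than one year" approximated as more than 365 days.
GUARDIAN_CONSENT_AGE = 14
INACTIVITY_LIMIT = timedelta(days=365)


@dataclass
class UserRecord:
    age: int
    guardian_consent: bool
    membership_cancelled: bool
    last_active: date


def compliance_issues(record: UserRecord, today: date) -> list[str]:
    """Return hypothetical issue labels for a single user record."""
    issues = []
    # Minors under 14 require the consent of a parent or guardian.
    if record.age < GUARDIAN_CONSENT_AGE and not record.guardian_consent:
        issues.append("guardian_consent_missing")
    # Data of users who cancelled their membership should be destroyed.
    if record.membership_cancelled:
        issues.append("destroy_after_cancellation")
    # Data of long-inactive users should be destroyed or stored separately.
    elif today - record.last_active > INACTIVITY_LIMIT:
        issues.append("destroy_or_separate_inactive")
    return issues


if __name__ == "__main__":
    record = UserRecord(age=13, guardian_consent=False,
                        membership_cancelled=False,
                        last_active=date(2020, 1, 1))
    print(compliance_issues(record, today=date(2021, 4, 28)))
```

A check of this kind only flags records; what must then happen to the data is a legal question that the code does not answer.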



Conclusion



Scatter Lab expressed its willingness to take full responsibility on the day the penalty was announced. It pledged to rectify and redesign Lee-Luda according to regulators’ requirements and to abide by the legal rules and industry standards related to personal information processing. To prevent similar violations, Scatter Lab also upgraded its internal compliance mechanism for personal information protection, including restricting services offered to minors under the age of 14.

The rectified Lee-Luda 2.0 was back online in early 2022. The corporate website has a new greeting: “Hello, I am your first AI friend Lee-Luda.” More self-introductions of Lee-Luda are shown on the webpage as well. Lee-Luda is 21 years old; her defining trait is that she is an artificial intelligence; her MBTI personality type is ENFP (extraversion/intuition/feeling/perceiving); and her hobbies include going for a walk when the weather is good, hanging out with friends all night, and checking Instagram. Lee-Luda says, “I want to share my daily life with you! Be friends with me?” Next to this invitation is a dialogue page where a user chats with Lee-Luda about what to eat for dinner.

Source: Lee-Luda 2.0 Website

Through the penalty in the Lee-Luda case, South Korea’s law enforcement agencies once again emphasised that artificial intelligence companies, as personal information processors, must abide by the golden rule of “informed consent with legitimate purposes” when collecting and processing personal information to provide products and services:

“Personal information processors shall not use personal information collected for specific services for other purposes without informing users and obtaining explicit consent.”


Considering the vast commercial potential of ChatGPT, many Internet giants worldwide are already gearing up to enter the field of cutting-edge AI technology represented by ChatGPT and to turn it into commercial value. Tencent, for example, announced a patent for man-machine dialogue on 3 February 2023, claiming that it could realise natural and smooth communication between man and machine. Baidu announced on 7 February that it would complete the internal testing of its ChatGPT-like product, named ERNIE Bot, in March and launch it to the public to intelligently answer users’ questions. Wang Huiwen, the co-founder of Meituan, also declared on 13 February that he would form a “Chinese OpenAI” team with 50 million US dollars.

Companies that are afraid of missing out on this round of the Internet revolution should also pay attention to the development of legislation and supervision worldwide and actively evaluate internal and external impacts based on actual application scenarios. The aim is to avoid disturbing the social economy and public order or infringing on the privacy rights and other legitimate rights and interests of others, and to enhance awareness of compliance risks so as to maintain basic social ethics.