Last March the Italian data protection authority (DPA) temporarily blocked OpenAI’s ChatGPT in Italy. The authority criticised OpenAI for not describing how it trains its algorithms and for not offering users the possibility of deleting or correcting inaccurate data. OpenAI has since made some modifications, and Italy has allowed use of the software again, but doubts remain.
OpenAI has stated the AI is trained using “information that is publicly available on the internet”. This means that the AI “crawls” the web to train the algorithm. Such activity is problematic under the European General Data Protection Regulation (GDPR), especially when one considers the potential for criminal misuse of the data collected.
For example, a malevolent actor could use ChatGPT to access a public blog post of mine, where I write about LGBTQ+ unions, to create a database of LGBTQ+ individuals that could then be sold to a reactionary government.
OpenAI also mentions another source: “information that we license from third parties”. It is not clear who such “third parties” are or how they collect data. The GDPR prohibits the use of unlawfully compiled databases.
A second issue concerns the rights of the data subject. The DPA demanded that ChatGPT allow the removal and rectification of data at the request of the data subject. The company has made some changes, but has declared that, for technical reasons, it cannot guarantee the rectification of data.
This issue is not unique to ChatGPT, as these technologies are increasingly being used in other areas, such as recruitment. If you were applying for a job, for example, a recruiter could rule you out of a position on the basis of inaccurate information about you held in an illegally created database.
These are some of the unanswered questions that demand a clear position from national governments and European institutions.