Ever since OpenAI’s ChatGPT chatbot burst into the limelight late last year, its popularity has grown by leaps and bounds. By the end of January 2023, according to a report from UBS, a bank, ChatGPT had garnered over 100 million monthly active users, reaching that milestone faster than any social media site and making it the fastest consumer internet service to do so.
Unsurprisingly, in lockstep with its growing popularity, controversies have also started dogging the company. For instance, in mid-January, Time magazine published a bombshell report about how OpenAI sub-contracted Kenyan workers earning less than US$2 per hour to label toxic content, like violence, sexual abuse and hate speech, so that ChatGPT could be trained to reduce its own toxicity. Some of them reported that they had been mentally scarred by descriptions of topics ranging from hate speech to violence to sexual perversion.
Of course, the report touched off a raucous online kerfuffle. As of this writing, however, the controversy has mostly fizzled out, like most internet-fuelled controversies. Moreover, as per the Time report, the sub-contractor that employed the Kenyan workers, San Francisco-based Sama, cancelled its contracts with OpenAI in February 2022, and, in January 2023, announced that it would cease all of its work on sensitive content.
Was this a storm in a teacup or does it deserve a little more examination? Did Time waste space and everyone’s time, or is OpenAI actually liable for a wrong worthy of the negative attention it got in the aftermath of the article? Perhaps a detached analysis of the situation can help answer these questions, for the story, and the popular reaction to it, suffered from a major context collapse.
In particular, three major aspects of the story were not distinguished from each other, thereby muddying the waters. The first is the most important: the kind of work OpenAI subcontracted to Sama has to be done by people right now, as the Time reporters acknowledged early in their report.
An AI model is only as good as the data it is trained on. For large models, like the one behind ChatGPT, which are trained on data scraped indiscriminately from all corners of the internet, it is guaranteed that toxic training data will end up colouring outputs, unless that toxic content is identified and labelled beforehand by humans who know what is toxic and can train the model on what to avoid.
Of course, this is horrible work, but I doubt the Time reporters intended for this to be the central point of their report. There has been some excellent reporting on similar stories about sub-contractors for Facebook and other companies that have to deal with sensitive content, and there will likely be many more. It is a necessary side-effect of dealing with human-generated content. At some level, a janitor has to take out the trash, and there is no predicting how filthy that trash will be.
Besides, people do toxic or dangerous work in many other fields of human endeavour all the time. Often, these jobs sit at the very foundations of our civilisation. Think of sewer maintenance crews, pathologists, high-voltage line technicians, police officers and oil rig workers. Our lives are backstopped by people breaking their backs (sometimes literally) to keep the barbarians away from the gates.
Additionally, it isn’t clear that there is an elegant solution to this problem in the case of AI, short of stopping the whole of AI technology development in its tracks until we can figure this out, which is unlikely to happen. Of course, sensible measures like proper recruitment and training, along with workplace resources like therapy, should be in place and easily accessible.
Here, we find possibly the first infraction for which Sama is liable. Time claimed that Sama management denied several employees the chance to see counsellors when they needed to do so. The company denied this, but it should have been investigated. Interestingly, however, this did not form the main thrust of either the article or the popular reaction to it.
The second aspect of the story is that OpenAI, an American company, sub-contracted this work to workers in a poorer country. Here we plunge into murkier ethical waters, yet, once again, there isn’t much that’s inherently controversial or new about this. Companies have been outsourcing low-value and relatively more labour-intensive work to poorer countries for as long as the global economy has been a thing.
Practically every sector of the modern economy works like this. And while some may decry it, once again, this work must be done, and because it tends to be relatively more labour-intensive, it provides employment to large numbers of people. Savvy governments, moreover, have demonstrated the ability to leverage low-value economic activity to slowly climb up the economic totem pole. Taiwan’s TSMC and China’s homegrown electronics giants are excellent examples of just this.
There is nothing inherently wrong about productive work being outsourced to areas with cheaper or better-skilled labourers. There might be specific examples of abuse, but that is hardly an indictment of the system. In the case of the OpenAI controversy, there is no evidence that OpenAI outsourced the work because it wanted to export harm to Kenya.
This leaves us with the third and final major aspect of the story, which is that the workers earned less than US$2 per hour for their work. This is, in the final analysis, the most scandalous element of this story. It is especially important given that the company benefiting from this labour is worth billions of dollars and has American employees who earn orders of magnitude more than the sub-contractors.
From a quick reading, one gets an almost visceral reaction to the inequality and exploitation on display. However, we should not be so quick to lambast OpenAI for this. For one, the Time report states that OpenAI paid Sama an hourly rate of US$12.50 for the work. Therefore, only Sama can explain the difference between this figure and what its workers earned. And it is likely that the bulk of that difference really did go to business costs, as per Sama’s statement to Time.
This aspect could also benefit from a little more context. While the pay these workers earned might seem like a pittance to Western readers, it is not necessarily so in Kenya. A monthly income of just under KES 40,000 (around US$300, which the lowest earner would take home after taxes) might not support a life of luxury, but it certainly can support an individual or even a young family. As someone who lives in Kenya, I’ve seen people get by on way less.
The most important question, therefore, is whether this kind of pay is commensurate with this kind of work in Kenya. The problem is that no one really knows what the right number is. I may sound callous for stating this, but perhaps only a fair market can solve this conundrum. Sama could probably have paid more, and there is something malodorous about the inequality on display here, but this story is in no way a straightforward case of villainy.
Which countries are moderating ChatGPT’s toxic content now? It’s not public information. We asked the bot itself and were told:
“OpenAI has not publicly disclosed the locations of the individuals or companies providing human annotations for its training data. It is possible that some of the workers involved in annotating and improving the training data for models like GPT-3 are based in Kenya, but this information has not been confirmed by OpenAI.”
Of course, this is not to say that the whole controversy was a nothingburger. Far be it from me, an idealistic youth from the African country at the centre of this story, to defend multi-billion-dollar American companies and their sub-contractors.
My point is that I don’t think there can ever be a fair wage for this kind of work.