Breaking the Black Box
How Machines Learn to Be Racist
Early computers were mostly just big calculators, helping us process large numbers. Now, however, computers are so powerful that they are learning how to make decisions on their own in the rapidly growing field of artificial intelligence.
But AI-enabled machines are only as smart as the knowledge they have been fed. Microsoft learned that lesson the hard way earlier this year when it released an AI Twitter bot called Tay that had been trained to talk like a Millennial teen. Within 24 hours, however, a horde of Twitter users had retrained Tay to be a racist Holocaust-denier, and Microsoft was forced to kill the bot.
This was not the first episode of an AI system learning the wrong lessons from its data inputs. Last year, Google’s automatic image recognition engine tagged a photo of two black people as “gorillas,” presumably because the machine had been trained on a database that didn’t include enough photos of either animals or people. The company apologized and said it would fix the problem.
To illustrate how sensitive AI systems are to their information diet, we built an AI engine that deduced synonyms from news articles published by different types of news organizations. We used word2vec, an algorithm created by Google. It is one of the neural nets that Google uses in its search engine, in its image recognition tool and to generate automatic email responses.
We trained the synonym picker by having it “read” hundreds of thousands of articles from six different categories of news outlets:
- Left: The Huffington Post and The Nation
- Right: The Daily Caller and Breitbart News
- Mainstream: The New York Times and The Washington Post
- Digital: The Daily Beast and Vox
- Tabloids: The New York Post and the New York Daily News
- ProPublica
Then we let the synonym picker guess which words appeared to have similar meanings, based on the knowledge it gained from each news database. The varied results generated by each category were striking.
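For readers who want to see the basic idea in code, here is a minimal sketch of how the same word can end up with different "synonyms" depending on the text a program reads. It is not our synonym picker and it is not word2vec, which learns its word vectors with a neural net. The sketch swaps in a much simpler stand-in: it counts which words appear near each other and compares those counts with cosine similarity. The two tiny corpora are invented toy sentences, not our news data, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a count of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_words(vectors, target, top_n=3):
    """Return the top_n words whose contexts look most like the target's."""
    if target not in vectors:
        return []
    scored = [
        (cosine(vectors[target], vec), other)
        for other, vec in vectors.items()
        if other != target
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]


# Invented toy corpora (assumption: stand-ins for two different news diets).
weather_corpus = [
    "the storm flooded the town",
    "the hurricane flooded the town",
]
politics_corpus = [
    "the storm of protest grew",
    "the wave of protest grew",
]

weather_vectors = cooccurrence_vectors(weather_corpus)
politics_vectors = cooccurrence_vectors(politics_corpus)

print(closest_words(weather_vectors, "storm", top_n=1))
print(closest_words(politics_vectors, "storm", top_n=1))
```

In this toy setup, the closest word to "storm" is "hurricane" in the first corpus and "wave" in the second, because the program only knows a word by the company it keeps. Real systems work with far more text and richer vectors, but they depend on their training data in the same way.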
Consider the synonyms generated for “BlackLivesMatter.” For the Left-trained AI, “hashtag” was the closest synonym; for the Right-trained AI, it was “AllLivesMatter.” For the AI trained with Digital news outlets, close synonyms were “Ferguson” and “Bernie.”
Or consider synonyms for “woman.” In the Tabloids, “victim” ranked high, while in ProPublica (admittedly trained on the smallest amount of data), “knifepoint” ranked as a close synonym. For “man,” the words “son,” “lover” and “gentleman” were ranked about as high on the list of synonyms by news outlets on the Left as “stabs,” “suspect” and “burglar” were by outlets on the Right.
And for “abortion,” the Left-trained AI chose “contraception” as a close synonym, while the Right-trained AI chose “parenthood” and “late-term.” The Mainstream-media-trained AI chose “clinics” among its top synonyms.
Try it for yourself here.
Check out our previous episodes, including our tool that shows you what Facebook knows about you.