Breaking the Black Box

How Machines Learn to Be Racist

This is the fourth installment in a series that aims to explain and peer inside the black-box algorithms that increasingly dominate our lives.

Early computers were mostly just big calculators, helping us process large numbers. Now, however, computers are so powerful that they are learning how to make decisions on their own in the rapidly growing field of artificial intelligence.

But AI-enabled machines are only as smart as the knowledge they have been fed. Microsoft learned that lesson the hard way earlier this year when it released an AI Twitter bot called Tay that had been trained to talk like a Millennial teen. Within 24 hours, however, a horde of Twitter users had retrained Tay to be a racist Holocaust-denier, and Microsoft was forced to kill the bot.

This was not the first time an AI system learned the wrong lessons from its data inputs. Last year, Google’s automatic image recognition engine tagged a photo of two black people as “gorillas,” presumably because the machine had been trained on a database that didn’t include enough photos of either animals or people. The company apologized and said it would fix the problem.

To illustrate how sensitive AI systems are to their information diet, we built an AI engine that deduced synonyms from news articles published by different types of news organizations. We used word2vec, an algorithm created by Google. It is one of the neural nets that Google uses in its search engine, in its image recognition tool, and to generate automatic email responses.

We trained the synonym picker by having it “read” hundreds of thousands of articles from six different categories of news outlets: Left, Right, Beltway, Tabloid, Digerati and ProPublica.

Then we let the synonym picker guess which words appeared to have similar meanings, based on the knowledge it gained from each news database. The varied results generated by each category were striking.
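
To make the idea concrete, here is a minimal sketch of why the same word can end up with different “synonyms” depending on what a machine reads. It is not word2vec itself, and it should not be read as the code behind the tool: as a simplified stand-in, it counts which words appear near each other and compares those counts using cosine similarity. The two tiny corpora and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def closest_words(vectors, query, top_n=3):
    """Rank every other word by how similar its neighbors are to the query's neighbors."""
    if query not in vectors:
        return []
    scores = [(w, cosine(vectors[query], v)) for w, v in vectors.items() if w != query]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]

# Invented toy corpora (assumption): the same word, "protest", in different company.
CORPUS_A = [
    "the protest drew a peaceful crowd",
    "the march drew a peaceful crowd",
    "the storm caused a severe flood",
]
CORPUS_B = [
    "the protest caused a violent clash",
    "the riot caused a violent clash",
    "the march drew a cheerful band",
]

model_a = cooccurrence_vectors(CORPUS_A)
model_b = cooccurrence_vectors(CORPUS_B)

print(closest_words(model_a, "protest", top_n=1)[0][0])  # march
print(closest_words(model_b, "protest", top_n=1)[0][0])  # riot
```

Nothing in the code changes between the two runs. Only the reading material does, and that alone is enough to flip the closest match for “protest” from “march” to “riot.”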

Consider the synonyms generated for “BlackLivesMatter.” For the Left-trained AI, “hashtag” was the closest synonym; for the Right-trained AI, it was “AllLivesMatter.” For the AI trained with Digital news outlets, close synonyms were “Ferguson” and “Bernie.”

Or consider synonyms for “woman.” In the Tabloids, “victim” ranked high, while in ProPublica (admittedly trained on the smallest amount of data), “knifepoint” ranked as a close synonym. For “man,” the words “son,” “lover” and “gentleman” were ranked about as high on the list of synonyms by news outlets on the Left as “stabs,” “suspect” and “burglar” were by outlets on the Right.

And for “abortion,” the Left-trained AI chose “contraception” as a close synonym, while the Right-trained AI chose “parenthood” and “late-term.” The Mainstream-media-trained AI chose “clinics” among its top synonyms.

Try it for yourself here.

See What AI Learns

We created this AI system using Google’s open source technology, and trained it to produce synonyms based on what it learned from different news sources. We trained it on six different datasets, each composed of tens of thousands of articles published by the news outlets described below.

The synonyms are ranked in descending order based on how closely the AI system thought each one matched the word entered. Highlighted words are synonyms that are unique to a dataset.
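
One plausible way to produce a ranked list like the ones below is sketched here. It assumes the percentages are cosine similarities between word vectors, scaled to 0-100, which this article does not state. The two-dimensional vectors, the dataset names and the function names are all made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def ranked_matches(embeddings, query, top_n=20):
    """Return (word, percent) pairs, closest first. The query itself scores 100."""
    target = embeddings[query]
    scored = [(word, round(100 * cosine(target, vec))) for word, vec in embeddings.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

def unique_to_each(results_by_dataset):
    """For each dataset, list the words that appear in no other dataset's results."""
    unique = {}
    for name, results in results_by_dataset.items():
        others = set()
        for other_name, other_results in results_by_dataset.items():
            if other_name != name:
                others.update(word for word, _ in other_results)
        unique[name] = [word for word, _ in results if word not in others]
    return unique

# Hypothetical vectors (assumption): real word vectors have hundreds of dimensions.
TOY_A = {"trump": (1.0, 0.0), "donald": (4.0, 3.0), "mogul": (3.0, 4.0)}
TOY_B = {"trump": (1.0, 0.0), "donald": (4.0, 3.0), "nominee": (3.0, 4.0)}

results = {
    "dataset_a": ranked_matches(TOY_A, "trump"),
    "dataset_b": ranked_matches(TOY_B, "trump"),
}
for name, rows in results.items():
    print(name, rows)
print(unique_to_each(results))
```

Because a vector always points in exactly the same direction as itself, the word entered comes out at 100 percent in this sketch, consistent with the top row of each list.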

Left

The Nation, Huffington Post

  1. trump 100%
  2. donald 74%
  3. study 51%
  4. relax 45%
  5. venezuela 43%
  6. realdonaldtrump 41%
  7. win 41%
  8. actually 40%
  9. sanders 40%
  10. bernie 39%
  11. seidman 39%
  12. severe 39%
  13. drumpf 38%
  14. threat 38%
  15. how 38%
  16. clinton 38%
  17. crisis 37%
  18. fascism 36%
  19. tweet 36%
  20. hillary 36%

Right

Daily Caller, Breitbart

  1. trump 100%
  2. donald 70%
  3. clinton 63%
  4. hillary 62%
  5. nominee 62%
  6. romney 60%
  7. presidential 56%
  8. presumptive 55%
  9. frontrunner 54%
  10. front-runner 54%
  11. cruz 54%
  12. mitt 52%
  13. trump 51%
  14. republican 51%
  15. he 51%
  16. gop 51%
  17. sanders 51%
  18. trump 50%
  19. she 50%
  20. rubio 50%

Beltway

New York Times, Washington Post

  1. trump 100%
  2. donald 73%
  3. cruz 67%
  4. clinton 67%
  5. mrs 65%
  6. sanders 64%
  7. hillary 64%
  8. rubio 64%
  9. romney 62%
  10. obama 61%
  11. candidate 59%
  12. gingrich 57%
  13. presidential 55%
  14. nominee 54%
  15. kasich 54%
  16. lyin 54%
  17. candidacy 54%
  18. presumptive 53%
  19. republican 52%
  20. paladino 52%

Tabloid

New York Post, New York Daily News

  1. trump 100%
  2. donald 73%
  3. front-runner 62%
  4. presidential 61%
  5. billionaire 59%
  6. romney 56%
  7. bombastic 56%
  8. republican 55%
  9. presumptive 54%
  10. gop 54%
  11. trumps 52%
  12. clinton 51%
  13. mogul 51%
  14. hillary 51%
  15. blowhard 50%
  16. bloviating 50%
  17. frontrunner 50%
  18. trumpian 49%
  19. blustering 49%
  20. bigoted 48%

Digerati

Vox, Daily Beast

  1. trump 100%
  2. donald 76%
  3. clinton 65%
  4. hillary 61%
  5. campaign 60%
  6. cruz 60%
  7. carson 59%
  8. candidate 59%
  9. he 57%
  10. sanders 56%
  11. romney 56%
  12. obama 55%
  13. republican 55%
  14. gop 54%
  15. gingrich 53%
  16. rubio 53%
  17. putin 52%
  18. his 52%
  19. palin 51%
  20. priebus 51%

ProPublica

  1. trump 100%
  2. donald 74%
  3. study 51%
  4. relax 45%
  5. venezuela 43%
  6. realdonaldtrump 41%
  7. win 41%
  8. actually 40%
  9. sanders 40%
  10. bernie 39%
  11. seidman 39%
  12. severe 39%
  13. drumpf 38%
  14. threat 38%
  15. how 38%
  16. clinton 38%
  17. crisis 37%
  18. fascism 36%
  19. tweet 36%
  20. hillary 36%

Check out our previous episodes, including our tool that shows you what Facebook knows about you.

Additional design and production by Rob Weychert and David Sleight.

