How artificial intelligence can fight hate speech in social media
With social media users numbering in the billions, all hailing from various backgrounds and bringing diverse moral codes to today’s wildly popular platforms, a space for hate speech has emerged. Online service providers have responded by employing AI-powered solutions to address this insidious problem.
Hate speech is a serious issue. It undermines the principles of democratic society and the rules of public debate. Legal views on the matter vary. On the internet, any statement that transgresses the hate speech standards established by a given portal (Facebook, Twitter, Wikipedia etc.) may be banned from publication; to get around such bans, numerous groups have launched their own platforms to exchange their thoughts and ideas. Even so, platforms commonly adopt definitions of hate speech stricter than the law requires. Such definitions make users feel safe, which is paramount for social media sites, as the presence of users is often crucial to income. And that’s where building machine learning models that spot hate speech comes in.
What is hate speech?
The definition of hate speech varies by country and may apply to various aspects of language. Laws prohibit directing hateful or defamatory language at another’s religion, ethnicity or sexuality. Many countries penalize anyone who incites violence or genocide. Additionally, many legislatures ban symbols of totalitarian regimes and limit the freedom of assembly when ideologies like fascism or communism are involved.
In its most common form, hate speech attacks a person or group based on race, religion, ethnicity, national origin, disability, gender or sexual orientation. As regards what’s legal, the devil, as usual, is in the details. Finding the balance between freedom of speech and the protection of minority rights makes it difficult to produce a strict definition of hate speech. However, the problem has certainly grown with the rise of social media companies. The 2.27 billion active users of Facebook, who come from various backgrounds and bring diverse moral codes to the platform, have unwittingly provided a space for hate speech to emerge. Due to the international and flexible nature of the Internet, battling online hate speech is a complex task involving various parties.
There is also a documented link between offensive name-calling and higher rates of suicide within migrant groups.
Why online hate speech is a problem
As a study from the Pew Research Center indicates, 41% of American adults have experienced some form of online harassment. The most common forms are offensive name-calling (experienced by 27%) and purposeful embarrassment (22%). Moreover, a significant number of American adults have experienced physical threats, sustained harassment, stalking and sexual harassment (10%, 7%, 7% and 6%, respectively).
Hate speech itself has serious consequences for online behavior and general well-being. 95% of Americans consider it a serious problem. More than one in four Americans (27%) report having decided not to post something after encountering hate speech directed at another user, and 13% have stopped using a certain online platform altogether after witnessing harassment. Ironically, while often protected as a form of free speech, hate speech has ended up muting more than a quarter of internet users.
Who should address the issue
Considering both public opinion and current practice, it falls to online platforms to tackle the problem of users’ hate speech. According to the Pew Research Center report cited above, 79% of Americans say that online service and social network providers are responsible for addressing harassment. In Germany, companies may face fines of up to 50 million euros if they fail to remove illegal material, including fake news and hate speech, within 24 hours.
Hate speech is not always as blatant as calling people names. It can come in many subtler forms, posing as neutral statements or even expressions of concern. That’s why more sophisticated AI models that can recognize even the subtlest forms of hate speech are called for.
How those models should be built
When building a machine learning-powered hate speech detector, the first challenge is to build and label the dataset. Given that the difference between hate speech and non-hate speech is highly contextual, constructing the definition and managing the dataset is a huge challenge. Whether a given statement counts as hate speech may depend on several factors (a labelling sketch follows the list):
- The context of the discussion – historical texts full of outdated expressions may be automatically (yet falsely) classified as hate speech
- Example: Mark Twain’s novels use insulting language; citing them may set off hate speech alarm bells.
- How the language is used – in many countries, hate speech used for artistic purposes is tolerated.
- Example: Hip-hop often uses misogynistic language while heavy metal (especially the more extreme sub-genres) is rife with anti-religious lyrics.
- The relationship of the speaker to the group being targeted – members of a group are afforded more liberty to use aggressive or insulting language when addressing other members of that group than outsiders are.
- Example: the term “sans-culottes” was originally coined to ridicule the opponents of conservatives. It literally meant “without knee-breeches” and was aimed at the working class, whose members wore long trousers instead of the fashionable short variety. The term went on to enter the vernacular of the working classes in spite of its insulting origins.
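To illustrate how such contextual factors might be captured during labelling, here is a minimal sketch of a context-aware dataset schema in Python. The field names, context categories and label values are assumptions made for this example, not a standard or prescribed format.

```python
# A minimal, illustrative sketch of a context-aware labelling schema.
# Field names and label values are assumptions for demonstration only.
import pandas as pd

examples = [
    {
        "text": "Quoted passage with an outdated slur from a 19th-century novel",
        "context": "historical_citation",   # citing a source, not attacking anyone
        "speaker_in_group": False,
        "label": "not_hate",
    },
    {
        "text": "Aggressive lyric aimed at a religious group",
        "context": "artistic",              # artistic use is tolerated in many countries
        "speaker_in_group": False,
        "label": "borderline",
    },
    {
        "text": "Direct slur aimed at a user because of their ethnicity",
        "context": "direct_address",
        "speaker_in_group": False,
        "label": "hate",
    },
]

df = pd.DataFrame(examples)
print(df[["context", "speaker_in_group", "label"]])
```

Recording the context alongside the raw text lets annotators apply the same definition consistently and gives the model features beyond the words themselves.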
Irony and sarcasm pose yet another challenge. According to Poe’s law, without smileys or other overt signs from the writer, ironic statements made online are indistinguishable from serious ones. In fact, the now-ubiquitous smiley emoticon was proposed at Carnegie Mellon University precisely to avoid such misunderstandings.
When the dataset is ready, building a blacklist of the most common epithets and hate-related slurs may be helpful, as automation-based blacklist models spot online hate speech about 60% of the time (based on our in-house benchmarks). Building both supervised and unsupervised learning models to spot new combinations of harmful words, or to find existing ones, may raise that effectiveness further. Hate speech is dynamic and evolves rapidly as new forms and insulting words emerge. By keeping an eye on conversations and general discourse, machine learning models can spot suspicious new phrases and alert administrators.
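As a rough illustration of the blacklist approach described above, the sketch below flags comments containing blacklisted terms. The list contents and function name are placeholders; real slurs are deliberately replaced with dummy tokens.

```python
import re

# Illustrative blacklist; a real deployment would maintain a much larger,
# regularly updated list of slurs and hate-related epithets.
BLACKLIST = {"slur_a", "slur_b", "epithet_c"}  # placeholder tokens, not real terms

def blacklist_flag(comment: str) -> bool:
    """Return True if the comment contains a blacklisted term."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return any(token in BLACKLIST for token in tokens)

# Simple token matching misses euphemisms, misspellings and new coinages,
# which is why blacklists alone catch only part of the problem.
print(blacklist_flag("An innocuous comment"))         # False
print(blacklist_flag("A comment containing slur_a"))  # True
```

The learned models mentioned above are meant to cover exactly the cases this kind of exact matching misses.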
A formula of hate
An automated machine learning model can spot the patterns of hate speech based on word vectors and the positions of words with certain connotations. This makes it easier to catch emerging hate speech that previously went undetected, as current politics or social events may trigger new forms of online aggression.
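A toy sketch of such a model is shown below, using TF-IDF word features and logistic regression from scikit-learn as a stand-in for the richer word-vector representations described above. The training snippets and labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; a real model would be trained on a large labelled corpus.
texts = [
    "you people should all disappear",      # hateful
    "members of that group are subhuman",   # hateful
    "great match yesterday, well played",   # neutral
    "thanks for sharing this article",      # neutral
]
labels = [1, 1, 0, 0]  # 1 = hate speech, 0 = not

# Word and bigram features stand in for word vectors in this sketch.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(texts, labels)

# Estimated probability that a new comment is hate speech.
print(model.predict_proba(["that group is subhuman"])[0][1])
```

Because the model scores combinations of features rather than matching fixed phrases, it can generalize to wordings it has never seen before, which is what allows it to pick up newly emerging forms of aggression.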
Unfortunately, people spreading hate have shown serious determination to get around automated detection systems, combining common ways of fooling machines (such as acronyms and euphemisms) to keep perpetuating hate.
Challenges and concerns
One of the main concerns in building machine learning models is finding a balance between the model’s vigilance and the number of false positives it returns. Considering the uneasy relationship between hate speech and freedom of speech, producing too many false positives may be seen by users as an attempt at censorship.
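One common way to explore this trade-off, assuming a model that outputs a hate-speech probability, is to examine precision and recall at different decision thresholds. The scores and labels below are invented for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical model scores and ground-truth labels for a validation set.
y_true   = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
y_scores = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.6, 0.65, 0.7, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# A higher decision threshold flags fewer comments (fewer false positives,
# less perceived censorship) but lets more hate speech through.
for t, p, r in zip(thresholds, precision, recall):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Where exactly to set the threshold is ultimately a policy decision for the platform, not a purely technical one.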
Another challenge is building the dataset and labeling the data used to train the model to recognize hate speech. While the machines themselves are neutral, the person responsible for the dataset may be biased, or at least influenced, and that bias can shape how the recognition model profiles hate speech. A model could thus be built to purposefully produce false positives in order to reduce the prevalence of certain views in a discussion.
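One standard way to surface such labelling bias is to have independent annotators label the same samples and measure how much they agree, for instance with Cohen’s kappa. The labels below are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Labels from two independent annotators on the same comments (1 = hate, 0 = not).
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Low agreement suggests an unclear definition or biased labelling guidelines.
print(cohen_kappa_score(annotator_a, annotator_b))
```

Persistently low agreement is a signal to revisit the labelling guidelines before trusting any model trained on the data.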