Global Data Science Forum

Using ML-Generated Text to Combat Hate Speech with Thoughtful Responses

By Michael Mansour posted Tue December 03, 2019 03:06 PM





Researchers at UC Santa Barbara created a system that responds to hate speech in online forums, such as Reddit, with comments meant to defuse the vitriol. The goal is a bot that can help moderate conversations automatically by pointing out why a post is hateful or unacceptable in the online community and how the poster should conduct themselves going forward. To build the training dataset, they used Amazon Mechanical Turk workers to identify hateful comments and to write thoughtful responses. No such dataset existed before this work, and it is now publicly available. It would be interesting to have actual forum moderators create this dataset instead of Mechanical Turk workers, which could yield a higher-quality dataset.
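
The post does not say how several workers' judgments on the same comment were combined into a single label. One common approach is a simple majority vote, and that is what this sketch assumes purely for illustration. The function name `aggregate_labels` and the tie-breaking rule (ties count as not hateful) are hypothetical choices, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, List


def aggregate_labels(worker_labels: Dict[str, List[bool]]) -> Dict[str, bool]:
    """Majority vote over crowd-worker judgments of whether a comment is hateful.

    Ties (including comments with no votes) are treated as not hateful,
    a hypothetical choice made for this sketch.
    """
    result = {}
    for comment_id, votes in worker_labels.items():
        counts = Counter(votes)  # counts of True / False votes
        result[comment_id] = counts[True] > counts[False]
    return result


labels = {"c1": [True, True, False], "c2": [False, True], "c3": [False, False, False]}
print(aggregate_labels(labels))  # {'c1': True, 'c2': False, 'c3': False}
```

Stricter schemes, such as requiring unanimous agreement or weighting workers by their past accuracy, are possible alternatives. The trade-off is fewer labels in exchange for higher precision.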

To train a system that could automatically generate effective responses, they divided the task into 4 parts:

  1. Identifying key hate words
  2. Categorizing the hate speech
  3. Generating a response that maintains a positive tone
  4. Suggesting an action to the hate-speech poster
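
To make the four-stage decomposition concrete, here is a deliberately simplified, rule-based sketch. It is not the researchers' method, since the paper uses learned models for generation. Instead, this toy relies on a tiny hypothetical lexicon (`HATE_LEXICON`), hypothetical categories, and fixed templates (`TEMPLATES`, `ACTIONS`), so that each stage fits in a few lines.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy lexicon mapping key words to hate-speech categories.
HATE_LEXICON = {
    "idiot": "personal attack",
    "moron": "personal attack",
    "vermin": "dehumanizing",
}

# Hypothetical stage-3 templates: one positive-toned reply per category.
TEMPLATES = {
    "personal attack": "Calling someone {words} doesn't move the discussion forward.",
    "dehumanizing": "Describing people as {words} is dehumanizing and isn't welcome here.",
}

# Hypothetical stage-4 suggested actions per category.
ACTIONS = {
    "personal attack": "Please keep the discussion focused on ideas, not people.",
    "dehumanizing": "Please edit or remove your comment.",
}


@dataclass
class Intervention:
    hate_words: List[str]
    category: Optional[str]
    response: Optional[str]
    action: Optional[str]


def intervene(post: str) -> Intervention:
    # Stage 1: identify key hate words via naive tokenization and lexicon lookup.
    tokens = [t.strip(".,!?").lower() for t in post.split()]
    hate_words = [t for t in tokens if t in HATE_LEXICON]
    if not hate_words:
        return Intervention([], None, None, None)

    # Stage 2: categorize by the most common category among matched words.
    category = Counter(HATE_LEXICON[w] for w in hate_words).most_common(1)[0][0]

    # Stage 3: "generate" a response by filling a template (a stand-in for a
    # learned generator); dict.fromkeys de-duplicates while keeping order.
    quoted = ", ".join(f'"{w}"' for w in dict.fromkeys(hate_words))
    response = TEMPLATES[category].format(words=quoted)

    # Stage 4: suggest an action to the poster.
    return Intervention(hate_words, category, response, ACTIONS[category])


result = intervene("You're an idiot.")
print(result.category)  # personal attack
print(result.response)  # Calling someone "idiot" doesn't move the discussion forward.
```

Stage 3 is where this toy differs most from the real system. The generation models the researchers compared would replace the template fill, which is too rigid to handle the wide variety of real hate-speech posts.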

For response generation, they evaluate sequence-to-sequence (Seq2Seq) models, variational autoencoders (VAEs), and reinforcement learning (RL). The quality of a response is judged both by human evaluators and by quantitative metrics. Unsurprisingly, the Seq2Seq and RL methods perform best. Overall, however, the human evaluators preferred human-written responses over machine-generated ones 70% of the time. Some of the main challenges stemmed from unbalanced vocabulary distributions in the dataset and from wide variance among the hate-speech posts.
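
The 70% figure can be read as a pairwise preference rate. In each comparison, an evaluator picks either the human-written or the machine-generated response. Here is a minimal sketch of that computation. It assumes each judgment is recorded as one of the hypothetical strings "human", "machine", or "tie", and that ties are excluded from the denominator. The paper's exact protocol may differ.

```python
from typing import List


def human_preference_rate(judgments: List[str]) -> float:
    """Fraction of decided pairwise comparisons won by the human-written response.

    Ties are dropped before computing the rate (an assumption for this sketch).
    """
    decided = [j for j in judgments if j in ("human", "machine")]
    if not decided:
        raise ValueError("no decided comparisons")
    return sum(j == "human" for j in decided) / len(decided)


print(human_preference_rate(["human"] * 7 + ["machine"] * 3))  # 0.7
```

Whether ties are dropped, split, or reported separately can noticeably change the headline number. For that reason, the protocol matters when comparing such figures across papers.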

The original paper is available on arXiv.


Defusing bad behavior and vitriol is difficult, even for a human. The results of the human evaluation suggest that this system would not be effective as an automated tool for combating hate speech. However, there is room to extend it with state-of-the-art NLP techniques, such as BERT-based models, which this implementation did not use.

Measuring that kind of real-world effectiveness was out of scope for the paper. Promising areas to explore include:

- extracting the relevant context from both the inappropriate comment and the larger discussion;
- letting a human operator tune the type of generated response; and
- measuring the impact of the generated response on the offender's subsequent behavior.
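
One way to approach the last idea is a naive before/after comparison of a single user's hateful-post rate around the time a generated response was posted. The `Post` record, its `is_hateful` label (assumed to come from a classifier or a human annotator), and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    timestamp: float  # seconds since epoch
    is_hateful: bool  # label assumed to come from a classifier or annotator


def hateful_rate(posts: List[Post]) -> Optional[float]:
    """Fraction of posts labeled hateful, or None if there are no posts."""
    if not posts:
        return None
    return sum(p.is_hateful for p in posts) / len(posts)


def before_after_change(posts: List[Post], intervention_time: float) -> Optional[float]:
    """Hateful-post rate after the intervention minus the rate before it.

    Posts made exactly at intervention_time are excluded from both windows.
    Returns None if either window is empty.
    """
    before = [p for p in posts if p.timestamp < intervention_time]
    after = [p for p in posts if p.timestamp > intervention_time]
    rate_before, rate_after = hateful_rate(before), hateful_rate(after)
    if rate_before is None or rate_after is None:
        return None
    return rate_after - rate_before


history = [Post(1, True), Post(2, True), Post(3, False),
           Post(5, False), Post(6, True), Post(7, False), Post(8, False)]
print(before_after_change(history, intervention_time=4))  # negative: fewer hateful posts afterward
```

A before/after difference alone does not show that the response caused the change. A fairer evaluation would also compare against similar users who received no response.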

This raises a larger point: ML should not always be treated as a silver bullet, but rather as a cognitive tool that makes humans more effective. For instance, a human forum moderator could use such a tool to scale their ability to detect and respond to hate speech. Dealing with vitriolic online content takes a toll on people, as seen with the Facebook content moderators who suffered psychological trauma from the job, and this underscores the value of this kind of tool.