Pitfalls and potholes of content moderation for chatbots, Elayne Ruane

Video and Slides

Outline: Pitfalls and potholes of content moderation for chatbots

Elayne Ruane, PhD researcher in QA of Conversational AI at LERO Centre

  • Chatbots can provide a fast and convenient experience to customers who need to solve a problem, complete a task, or get some information.
  • The promise of the speed and availability of a machine combined with the conversation and accessibility of a human is an attractive solution.
  • But what about when chatbots fail to live up to those expectations, the user gets frustrated, and the conversation gets heated?
  • Abusive messages from users towards chatbots are not uncommon and moderation efforts are fraught with unintended consequences.
  • This presentation will discuss approaches to content moderation for chatbots, common pitfalls, and some recommendations for handling abusive messages.

Presentation Review

You can ask Elayne any questions about this presentation in the comments section of this weblog, or contact Elayne directly with the info at the end of the presentation.

Moderation is more than a social media chat-room problem. It's an issue for any brand using a chatbot to engage with its customers and prospects; Google has a team dedicated to it within its AI Ethics group.

Chat can be voice or text, since voice is converted to text. With ASR (Automatic Speech Recognition), though, care needs to be taken: an innocent regional pronunciation could be transcribed as a profanity.

Moderation is highly application specific, as some words may be permissible in certain contexts, e.g. healthcare or education, that would not be expected in general customer service.

Elayne works on bots, especially in customer service; abusive messages come with the territory and are unavoidable, even abuse directed at the people creating the bot. It's born of customer frustration. A critical point Elayne makes is that the decisions a brand makes around moderation have a significant impact on how customers perceive the brand and on how well the user is protected.

What is offensive goes well beyond what profanity filters cover. It's ideas, and it is subject to regional and cultural variation. Elayne provides excellent examples of the impact of unsupervised training and of how simple filters have unintended consequences. The critical point Elayne makes is: do not create the bot in a vacuum; use a diverse team to help define what counts as offensive or abusive language for your specific application and brand.

This is just leading up to identifying abuse. Next comes how to deal with abuse: is it a non-response (silence), deflection, informing, reporting, or escalation to a live agent? Interestingly, bots in general, and bots with feminine voices in particular, receive substantially more abuse than live agents. Elayne provides lots of interesting examples of unintended consequences.

The cost of a false positive can be high for both the brand and the customer/user. Even if you choose to constrain the bot to respond only to specific intents with a limited vocabulary, the generated training data still needs work. And remember: chatbot interactions are public; they can and will be recorded, and may make their way onto social media.
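To make that false-positive risk concrete, here is a minimal sketch (my illustration, not code from the presentation) of a naive keyword filter; the word list and function name are hypothetical. It flags the "kill some background apps" sentence Elayne mentions in the comments below just as readily as a genuine threat:

```python
# Minimal sketch (illustration only): a naive keyword-based filter
# and why it produces false positives when context is ignored.

FLAGGED_TERMS = {"kill", "die"}  # hypothetical, deliberately tiny word list

def naive_flag(message: str) -> bool:
    """Flag a message if it contains any listed term, regardless of context."""
    words = message.lower().split()
    return any(word.strip(".,!?") in FLAGGED_TERMS for word in words)

print(naive_flag("I'll kill you"))  # True - genuine abuse
print(naive_flag("I need to kill some background apps to use your service"))  # True - false positive
```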

Elayne provides excellent advice on managing moderation:

  • Thoughtful design is important (legally, morally, commercially…)
  • Protecting the user is #1 – give users recourse and the benefit of the doubt
  • Your chatbot is just ones and zeros but your team are people!

Thank you Elayne for an insightful presentation on Chatbot Moderation. I hope you’ve raised the TADSummit community’s awareness of this critically important and under-discussed topic.

4 thoughts on “Pitfalls and potholes of content moderation for chatbots, Elayne Ruane”

  1. Thank you Elayne for such excellent insight into chatbot moderation.

    1) When we first talked about moderation my focus was on applications like Clubhouse, where I'd witnessed some nasty ideas being discussed. Your presentation helped me realize the importance of moderation across all chatbots. It's the public face of your brand. For a business that has not considered moderation so far, what are the first steps you'd recommend?

    2) In your conclusions you mentioned thoughtful design. Would you please give some specific examples?

    3) Until you mentioned it, I’d not considered the impact of the abusive messages on the chatbot team. How can this be mitigated?

    4) I really like the unintended consequences examples. Do you have some more?

  2. Hi Alan, thanks for your great questions!

    1) I think a great first step is taking stock of what is known. If there are conversation transcripts, review them (or a subset) for abusive or inappropriate messages to get a sense of the moderation needed. To determine the response strategy, I would recommend creating user profiles to understand the user group(s) and focus on characteristics that may increase the group’s vulnerability (e.g. minors). Once you understand your user group and you have a sense of how they’re speaking to the chatbot you’re in a great position to set up your moderation efforts.

    2) What I’m trying to get at here with thoughtful design is really interrogating a solution such as a conversation flow design and how it may impact different types of users and in different scenarios. In my experience, conversation design is the land of unintended consequences – there’s always going to be some users who reply in a way that wasn’t anticipated, or some conversational context that changes how the user interprets the chatbot’s responses. Too often, we take a one-size-fits-all approach and it’s important to acknowledge that our design decisions can have far reaching impact. Thoughtful design challenges us to investigate that impact before we roll out to end users. At a minimum, diverse perspectives on a team will help and I would also recommend validating design decisions with people external to the team.

    3) I think some ways to mitigate the mental toll this kind of moderation can take on members of a team are to automate as much as possible, not leave any one person to read through these messages in isolation, and lastly, to celebrate the wins! It's easy to overlook successful conversations because we want them to be the standard, but it's important to have some balance. Being aware of the potential harm means you're ready to spot when someone needs extra support or is overwhelmed.

    4) Often the unintended consequences of flagging specific terms are realized at the expense of the user. Going back to thoughtful design and having diverse perspectives on the team, hopefully you can catch these before they impact any individual or group. There are a lot of examples, but typically any kind of identity marker, or a word that can be used in multiple contexts, will be dicey. I think for each word in a profanity dictionary, ask whether it can be used in at least one valid context and go from there. In terms of specific examples, you can look broadly at different categories. For example, take words related to violence, such as the word "kill" in these two sentences: "I'll kill you" vs. "I need to kill some background apps to use your service".
