Inside the Black Box: Facebook, Content Moderation, and Machine Learning

Justin Stoner
Nov 6, 2020
Neil Potts

This past October, I had the pleasure of meeting Neil Potts (a public policy expert at Facebook) at the High Tech Law Institute talk held by Santa Clara University School of Law. The discussion is part of the ongoing “Artificial Intelligence for Social Impact and Equity” series moderated by Professor Colleen Chien and Irina Raicu, Director of the Internet Ethics Program at the Markkula Center for Applied Ethics, at Santa Clara University.

Mr. Potts is the current Public Policy Director of Trust and Safety at Facebook. A graduate of the Naval Academy, his past experience includes serving as an Intelligence Officer with the U.S. Marines and working as a public policy lawyer. At Facebook, he has an active role in developing policy positions, which gives him regular insight into the challenges and decision-making the social media giant faces, including policy decisions that have sparked recent headlines in the tech and political world.

A notable segment of Neil’s discussion centered on how Facebook handles the sheer volume of content posted daily. Facebook relies, in large part, on Artificial Intelligence (“AI”) to help meet the growing social media expectations of Facebook users, government regulators, and shareholders. To this end, Facebook relies heavily on its public policy experts to help guide what role the social media platform should play moving forward.

What sparked my interest in this topic is the promise machine learning holds in the realm of AI. In studying how AI is used in content moderation, I came to appreciate the current state of natural language processing technology and its capability to recognize context; in theory, this advanced form of machine learning could give Facebook a formidable ability to control its platform.

Of course, with natural language processing and the power to control content, several notable issues arise. Below are but a few issues to think about:

  1. Why is content moderation important in the first place?
  2. How does Facebook moderate its content?
  3. Why is machine learning useful for content moderation?
  4. What does the future of content moderation in social media look like?

The Importance of Content Moderation

Facebook is unprecedented in history: its ability to connect people, ideas and content is powerful. Couple this with its profit-making from users simply logging on and enduring the stream of advertising. But all this also comes with a cost. Remember Pizzagate? Facebook is now banning groups like QAnon that continue to use Facebook to further the misinformation connected to that conspiracy.

What happens on Facebook seems to have a significant effect on real world behavior, especially with regard to politics and hyperpolarization. For example, have you watched two news networks cover the same event, yet in a way that makes you think people are living in separate worlds? It’s like living in the Matrix.

Facebook has found itself proliferating this world of reality detachment and hyperpolarization, even as it increasingly focuses on content moderation. Content moderation is more important, and possibly more evident, now than ever because of increasing social media use, the significant volume of content filtering through social media platforms, and the dangers of illicit or unlawful material going online. For example, terrorist organizations should not have a presence on the site.

In any event, Facebook has taken public criticism for some of its content moderation practices. Some contend the platform does not filter enough harmful content while others hold it filters too much good content. The former suggests that despite Facebook’s efforts there is still a lot of violence, graphic material, and misinformation on the platform. The latter suggests that Facebook’s algorithms make too many mistakes and remove posts that are not harmful at all.

And much like a black box, we are unable to see how data filters through content moderation algorithms. Much remains open as to how an algorithm determines what is “harmful” content. In 2020, criticism of content moderation has reached new highs as we head into the U.S. Presidential Election. In a notable move, Facebook recently decided to ban political ads in the week leading up to November 3. This is new territory for Facebook.

Content Moderation: Three-Stage Process

To prevent the spread of abusive and harmful content, Facebook has devised a multi-stage process. Because there are over 100 billion new posts on Facebook each day, a 100% human content moderation process is impossible. Machines filter the content first so that only the most nuanced cases reach human moderators, who then decide whether or not to remove the content. It is not a perfect process, but with 13 different languages to support, errors are understandable.

Here is a step by step summary of the process:

1. Content is uploaded, usually in the form of text, image, video, or sound.

2. The pre-moderation stage uses a machine learning algorithm which is trained to remove harmful content.

3. Whatever harmful content was missed by the pre-moderation stage is looked at by human moderators and by the user community.

4. Content that is not removed is made visible to users.

A Flowchart of the Content Moderation Process
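
The steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Facebook's actual system: the `Post` type, the `toy_score` function (a stand-in for a trained model), and the two thresholds are all hypothetical. The idea it shows is that the machine removes what it is confident about, publishes what looks harmless, and sends the uncertain middle to human moderators.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Post:
    post_id: int
    text: str


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier: returns a harm score in [0, 1]."""
    lowered = text.lower()
    if "attack plan" in lowered:
        return 0.95
    if "attack" in lowered:
        return 0.6
    return 0.1


def pre_moderate(post: Post, score_fn: Callable[[str], float],
                 remove_at: float = 0.9, review_at: float = 0.5) -> str:
    """Decide what happens to a post. The thresholds are assumed values."""
    score = score_fn(post.text)
    if score >= remove_at:
        return "remove"   # confident enough to remove automatically
    if score >= review_at:
        return "review"   # nuanced case: hand over to a human moderator
    return "publish"      # made visible to users


def route(posts: List[Post], score_fn: Callable[[str], float]) -> Dict[str, List[int]]:
    """Sort post ids into the three outcome queues."""
    queues: Dict[str, List[int]] = {"remove": [], "review": [], "publish": []}
    for post in posts:
        queues[pre_moderate(post, score_fn)].append(post.post_id)
    return queues


posts = [
    Post(1, "Here is the attack plan"),
    Post(2, "Heart attack symptoms to watch for"),
    Post(3, "Nice photo"),
]
queues = route(posts, toy_score)
```

In this toy run, the post about heart attack symptoms lands in the review queue: the surface wording looks risky, but a human can see from context that it is harmless.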

A key thing to understand is that the more data the algorithm receives, the better it gets. Today’s algorithms are built to do one job really well. Programmers leverage this by taking the algorithm’s errors and using them to improve the algorithm. For algorithms, data is king.

What is Harmful Content?

Harmful content is usually defined by community standards. Generally, harmful content involves material that is graphic, sexual, insensitive or illegal in nature. A post may be harmful on its face, but in some cases a more careful contextual analysis is needed. For example, a conversation which on a surface level indicates harmful material might, upon a deeper contextual analysis, be harmless. But contextual analysis requires more computers to process larger amounts of data, which in turn demands increased processing power and costs money.

Below is a chart illustrating the common forms of harmful content and the formats in which they appear.

An Illustration of Different Types of Harmful Content and Media

How Machine Learning Identifies Harmful Content

Machine learning is the type of AI used in content moderation. A machine learning program applies a specific algorithm to data in order to achieve a desired outcome, and its effectiveness depends on the amount and variety of data it is given.

Below are the three most common methods used:

Hash Matching

Hash matching is a common method which removes image and audio content already known to be harmful. It’s useful because hash matching uses little computational power. Hash matching involves assigning a unique digital ‘fingerprint’ to material already detected as harmful. Then when new content is uploaded, it can be automatically removed during the pre-moderation stage if the computed hash matches a hash stored in the database of known harmful content.

However, this method has its drawbacks. The hash-matching algorithm does not work on content that has not been previously identified as harmful. Also, extreme alterations to images and audio can change the hash enough to decrease the odds of detection.

An Example of Technical Architecture for Detecting Child Abuse Material
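
To make the fingerprint idea concrete, here is a minimal sketch in Python. It uses a simple "average hash", one kind of perceptual hash that I picked for illustration; I am not claiming this is the hash Facebook uses. Images are represented as tiny grids of grayscale values, and the `max_distance` tolerance is an assumed setting.

```python
from typing import Iterable, List

Image = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Image) -> int:
    """Fingerprint an image: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_known_harmful(pixels: Image, known_hashes: Iterable[int],
                     max_distance: int = 1) -> bool:
    """Match if the upload's hash is within max_distance bits of any stored hash."""
    h = average_hash(pixels)
    return any(hamming(h, k) <= max_distance for k in known_hashes)


# Hypothetical database holding the hash of one previously flagged image.
flagged = [[10, 200], [220, 30]]
known_hashes = {average_hash(flagged)}

slightly_altered = [[12, 198], [225, 28]]   # small brightness changes
heavily_altered = [[245, 55], [35, 225]]    # inverted image

caught = is_known_harmful(slightly_altered, known_hashes)
missed = is_known_harmful(heavily_altered, known_hashes)
```

The lightly edited copy still matches the stored fingerprint, while the inverted copy does not, which mirrors the drawback described above: heavy alterations lower the odds of detection, and content never seen before has no stored hash to match at all.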

Keyword Filtering

Keyword filtering is a method that checks whether text contains any blacklisted words or phrases. The blacklisted words are stored in a database, creating a simple moderation strategy. Keyword filtering allows companies like Facebook to tune their content moderation efforts according to their policies. It is similar to hash matching in that words left off the blacklist will not be removed. But because the list is not exhaustive, and with hundreds of languages and dialects existing, it is likely that users can bypass this method by finding new harmful words that are not blacklisted. Also, words can intentionally be misspelled to circumvent the list. Therefore, this method requires constant attention to keep the list effective.

See below to find which comment was banned by keyword filtering.

An Example of How a Content Moderator Would Use Key Word Filtering
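
A keyword filter is simple enough to sketch directly. The Python below is a minimal illustration, assuming a made-up blacklist and a small character-substitution table of my own choosing; a real list would be far larger and maintained per policy and per language.

```python
import re
from typing import List, Set

# Hypothetical blacklist for illustration only.
BLACKLIST: Set[str] = {"scam", "badword"}

# Assumed substitutions that undo a few common character swaps.
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def tokens(text: str) -> List[str]:
    """Lowercase the text, undo character swaps, and split it into words."""
    normalized = text.lower().translate(SUBSTITUTIONS)
    return re.findall(r"[a-z]+", normalized)


def is_blocked(text: str, blacklist: Set[str] = BLACKLIST) -> bool:
    """True if any whole word in the text appears on the blacklist."""
    return any(tok in blacklist for tok in tokens(text))


blocked_plain = is_blocked("This offer is a scam")
blocked_swapped = is_blocked("This offer is a $c4m")   # caught after normalization
blocked_split = is_blocked("This offer is a sc.am")    # slips through
blocked_innocent = is_blocked("Great scampi recipe")   # whole-word match avoids a false hit
```

The `sc.am` case shows why the list needs constant attention: every new spelling trick has to be anticipated, either in the blacklist or in the normalization step.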

Natural Language Processing (“NLP”)

In contrast to the other two methods, NLP is a more advanced technique aimed at interpreting and understanding language within text. NLP is a developing area with the potential to help identify more instances of harmful content. NLP uses computers to process and interpret human language in written text. There are several content moderation techniques that use NLP:

Sentiment Analysis: Sentiment analysis is perhaps the best-known use of NLP. It breaks sections of text into labeled data that can then be classified. For example, it is possible to classify text indicating emotion as positive or negative through several techniques. One technique called “bag of words” (“BoW”) assigns a score to words and their variations, which enables text to be classified as positive or negative.

N-Grams: ‘N-grams’ is a related technique that breaks text into short sequences of n consecutive words or characters. Grouping text by the sequences it shares allows similar strings to be matched (e.g., words that are misspelled or that contain numbers).

Word Embedding: Another popular method is Word Embedding, where words or phrases in entire sets of languages are mapped to vectors of real numbers. One of the advantages of using word embedding models is that the AI can be trained unsupervised. In other words, this method does not require labeled data. Additionally, word embedding is an energy efficient process and can be trained on very large datasets. Word embedding techniques are sometimes used to categorize sentences prior to their input into deep learning NLP algorithms. Deep learning models are useful for understanding text and can also be used for sentiment analysis.
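
The three techniques above can each be shown with a toy example in Python. Everything here is a simplified sketch under my own assumptions: the word scores in `WORD_SCORES` are invented, the n-gram comparison uses Jaccard overlap as one possible similarity measure, and the embedding vectors passed to `cosine` would in practice come from a trained model rather than being typed in by hand.

```python
import math
from collections import Counter
from typing import List, Set

# Hypothetical sentiment lexicon: positive words score +1, negative words -1.
WORD_SCORES = {"love": 1, "great": 1, "hate": -1, "awful": -1}


def bow_sentiment(text: str) -> str:
    """Bag of words: ignore word order, add up the scores of the words present."""
    counts = Counter(text.lower().split())
    total = sum(WORD_SCORES.get(word, 0) * n for word, n in counts.items())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """All sequences of n consecutive characters in a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Share of n-grams two words have in common (Jaccard overlap)."""
    x, y = char_ngrams(a.lower(), n), char_ngrams(b.lower(), n)
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


label = bow_sentiment("I hate hate this page")
disguised = ngram_similarity("hatred", "hatr3d")   # misspelling with a number
unrelated = ngram_similarity("hatred", "banana")
```

Here the disguised spelling `hatr3d` still shares three of its character pairs with `hatred`, so it scores well above the unrelated word, which is how n-grams help catch misspellings that a plain keyword list would miss.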

Below is a spectrum of the common moderation tools and their relative level of sophistication.

A Continuum Illustrating the Relative Complexity of Machine Learning Methods

Where Do We Go From Here?

With content moderation making headlines, questions arise as to where we (and Facebook) head from here, and the fate of content moderation in the future. All indications point toward Facebook becoming more proactive. Facebook’s policies are getting more aggressive in trying to prevent problems on the platform. At the same time, Facebook finds itself in a precarious situation, receiving pressure from several forces, with government regulators and the public as notable examples. Others are boycotting Facebook outright. With every move, Facebook encounters scrutiny. Facebook may see content moderation as a way to grow, to develop better AI, and ultimately to turn a profit. But content moderation carries a danger that Facebook will enter (or maybe already has stepped into) a never-ending game of tug of war.


1. Cambridge Consultants. Use of AI in Online Content Moderation. The Altran Group, 2018.