What is human in the loop?
From our combined 10 years of experience working in AI
Dec 15, 2023
Back in 2021, my co-founder and I joined a series B startup to lead the AI team. Our job was to automate as much of the identity verification process (called KYC) as possible: reading passports, checking the face match with a selfie, detecting document fraud…
We initially thought we could focus on training and deploying machine learning models to reach close to full automation, but we were wrong. The number of human agents actually grew to several hundred, and we had to build a whole internal platform to safely deploy AI in production, with humans in the loop.
Three different teams collaborated to make it work:
AI: our team was in charge of an AI inference service, with endpoints like “read-documents”.
Review UI: another team was in charge of the human review UI. It typically worked as a real-time labeling interface, where agents would be connected and perform one job after the other.
Workflow: the final team was in charge of the “dispatch engine”, a workflow that orchestrated the mix of AI endpoints and human tasks.
We started Fragment to build the tool we wished we had internally, and to help every company automate real-world processes with a mix of AI and humans in the loop.
We will publish a series of blog posts around the topic of human in the loop, to share insights from our personal experiences and from our customers. This first post will explain the main topic: what is human in the loop?
What is “human in the loop” in AI and why should I use it?
“Human in the loop” can mean very different things in a machine learning setup. Let’s review typical situations:
Offline annotation. Use humans to annotate training data (see Karpathy’s blog post Software 2.0). Even ChatGPT relies on human workers to annotate its training set or tune the model, with RLHF.
User feedback. Take user feedback as a stream of training data (collaborative filtering in recommender systems, online training, closed-loop training processes like Snorkel, etc.).
User instructions to tweak predictions (chat-based assistant, prompt engineering, etc.). The user can tweak inputs to a model and observe the response in real time.
Online review. Send predictions to humans (fraud alerts to be reviewed, failed OCR to be corrected, etc.).
Here we’ll dive deep into situation 4, online review, which is typical of back-office task automation. In this case you have a semi-automated system with a mix of AI and humans: whenever the AI fails, a human agent helps complete the task or takes over completely.
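The routing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product implements it: `process_task`, `predict`, `ask_human` and the 0.9 confidence cutoff are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    label: str
    confidence: float  # model score in [0, 1] (assumed to be available)


@dataclass
class Result:
    label: str
    decided_by: str  # "ai" or "human"


def process_task(
    task: str,
    predict: Callable[[str], Prediction],
    ask_human: Callable[[str, Prediction], str],
    min_confidence: float = 0.9,  # hypothetical cutoff, to be tuned per use case
) -> Result:
    """Let the AI answer when it is confident, otherwise hand over to a human."""
    prediction = predict(task)
    if prediction.confidence >= min_confidence:
        return Result(prediction.label, "ai")
    # The AI is not confident enough: a human agent completes the task,
    # seeing the model's suggestion as a starting point.
    return Result(ask_human(task, prediction), "human")
```

In a real system `ask_human` would typically push the task to a review queue and wait for an agent, rather than return immediately.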
Most real-world AI products have humans in the loop. Imagine an automated production line with human supervisors checking that everything is going smoothly. The main reasons to combine AI and humans are:
High stakes: you want a human reviewer before taking a specific action, like banning someone for fraud (the EU AI Act goes in that direction)
High volumes: humans can’t review everything, so they rely on a pre-screening of alerts from the AI
Regulation: the EU AI Act will impose the use of human supervision. For resume screening or a legal decision, you want the key decisions made by a human, not an AI.
I’ll share two concrete examples.
Example 1: AI for security cameras
Suppose you’re selling AI for security cameras. These cameras record 24/7 and produce huge amounts of video data. It’s impossible for a human to accurately monitor all these video feeds, but a trained AI can process everything and trigger alerts.
Because you don’t want to miss any real alert (e.g. a fire starting), the threshold for an alert will be low and the model will output a lot of false alerts. So you need to have humans in the loop to dismiss the false positive alerts within a few seconds.
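To make the trade-off concrete, here is a small sketch that counts alerts at two thresholds. The scores and labels are made up for illustration, and `alert_stats` is a hypothetical helper, not part of any library.

```python
def alert_stats(scores, is_real, threshold):
    """Count alerts raised, real events caught, false alerts and misses."""
    raised = [real for score, real in zip(scores, is_real) if score >= threshold]
    caught = sum(raised)  # True counts as 1
    return {
        "alerts": len(raised),
        "caught": caught,
        "false_alerts": len(raised) - caught,
        "missed": sum(is_real) - caught,
    }


# Made-up model scores for 8 video clips, and whether each was a real event.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
is_real = [False, False, False, True, False, False, True, True]

high = alert_stats(scores, is_real, threshold=0.6)
low = alert_stats(scores, is_real, threshold=0.3)

print(high)  # {'alerts': 2, 'caught': 2, 'false_alerts': 0, 'missed': 1}
print(low)   # {'alerts': 5, 'caught': 3, 'false_alerts': 2, 'missed': 0}
```

With the lower threshold nothing real is missed, but two of the five alerts are false, and those are the ones a human reviewer has to dismiss.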
Example 2: AI for content moderation
Content moderation decisions on social networks require human judgement. But the volume of content produced every day is huge, so AI provides a first filter. Human reviewers in the loop check flagged posts before making a decision. Over time, the labels from these reviewers help improve the AI model, which improves the automation rate and reduces the need for human agents. However, human attackers will always find new types of harmful content that get through the automated filters, so we will always need human reviewers on the defense (the moderation team).
This was a brief explainer on human in the loop for AI. I believe this is a topic that will only grow in significance, and will help define what software means in the future, and how humans will be involved.
In the next post, we’ll dive further into how to implement human in the loop concretely in your company, and how Fragment can help.
For now, here are some great resources to dive further:
Karpathy’s blog post Software 2.0 on how the nature of software changes with ML and becomes non-deterministic (2017)
Blog post “How to Label 1M Data Points/Week” from Scale on how they built the humans in the loop for GPT-2 finetuning (2019)
Book Weapons of Math Destruction from Cathy O'Neil, on the dangers of blindly trusting data and AI with decisions (2016)