Adding human supervision to AI
A tutorial on how to implement humans in the loop.
Dec 22, 2023
At our previous company, AI was core to the product (identity verification), but we still needed human agents in the loop to handle edge cases and to maintain quality control and supervision.
In practice, three different software teams collaborated to make it work:
AI: our team built an AI inference service, with endpoints like “read-documents”.
Review UI: built the human review UI. It typically worked as a real-time labeling interface, where agents would stay connected and perform one task after another.
Workflow: built the core “dispatch engine”, a workflow that orchestrated the mix of AI endpoints and human tasks.
In this post, I’ll explain how it all worked together, starting with a simple example.
How to set up human supervision correctly
Let’s take a simple example where you want to classify cats and dogs. Here are the 5 steps you would take (cf. image above):
Send the photo to your backend
Call your AI model, which returns a prediction and a confidence score
If the confidence score is too low, you need humans in the loop
Send a task to one or multiple human agents for review
You get the final output, a mix of the AI prediction and the human reviews
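The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not production code: `model_predict` and `send_review_task` are hypothetical stubs standing in for the real inference service and the real review queue.

```python
def model_predict(photo: bytes) -> tuple[str, float]:
    # Stub standing in for the AI inference service (step 2)
    return ("cat", 0.62)

def send_review_task(photo: bytes) -> str:
    # Stub standing in for dispatching a review task to a human agent (step 4)
    return "cat"

CONFIDENCE_THRESHOLD = 0.9  # tuned per use case, see the discussion below

def classify_photo(photo: bytes) -> dict:
    # Step 2: the model returns a label and a calibrated confidence score
    label, confidence = model_predict(photo)

    # Step 3: high-confidence predictions are returned directly
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"label": label, "source": "ai", "confidence": confidence}

    # Step 4: low-confidence predictions go to human agents for review
    reviews = [send_review_task(photo) for _ in range(2)]

    # Step 5: the final output mixes the AI prediction and the human reviews
    votes = [label] + reviews
    final = max(set(votes), key=votes.count)  # simple majority vote
    return {"label": final, "source": "consensus", "confidence": confidence}
```

With the stubs above, a low-confidence prediction (0.62) triggers two human reviews and the final label comes out of the majority vote.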
There is a lot of hidden complexity in steps 3 and 4.
In step 3, the AI model should give a calibrated confidence score (more on model calibration here). For instance, for a security camera AI, a score of 0.6 means there is a 60% chance the alert is real. Picking the right confidence threshold means balancing false negatives (missed alerts) against false positives (false alarms). Any alert above the confidence threshold will go to a human for review.
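One way to pick that threshold is to sweep over a labeled validation set and take the highest threshold that keeps the rate of missed real alerts under a budget. A minimal sketch, assuming you have `(confidence, is_real_alert)` pairs (the function name and budget are illustrative):

```python
def pick_threshold(examples: list[tuple[float, bool]],
                   max_miss_rate: float = 0.01) -> float:
    """Return the highest threshold whose false-negative rate stays
    within budget. Alerts below the threshold are dropped, so a real
    alert below it is a miss; alerts above it go to human review."""
    real = [conf for conf, is_real in examples if is_real]
    best = 0.0
    # Try candidate thresholds from lowest to highest confidence
    for t in sorted({conf for conf, _ in examples}):
        missed = sum(1 for conf in real if conf < t)
        if missed / len(real) <= max_miss_rate:
            best = t  # still within the miss budget, keep raising
        else:
            break  # raising further would miss too many real alerts
    return best

# Validation set: 4 real alerts and 2 false alarms with their scores
examples = [(0.9, True), (0.8, True), (0.7, True), (0.3, True),
            (0.2, False), (0.5, False)]
threshold = pick_threshold(examples, max_miss_rate=0.25)  # → 0.7
```

Raising `max_miss_rate` lowers the review volume at the cost of more missed alerts; the right trade-off depends on the product.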
In step 4, you need to handle the human operations: reviewing incoming alerts, managing the different priority levels, assigning agents depending on their skills, and so on. Many products run 24/7, so you need to rotate teams, manage outsourced contractors, and hire and staff new agents to constantly absorb the load.
Creating the right task UI for human review
Some design principles for building an efficient task UI:
One-pagers. Each task should be a clear one-pager with all the information required to complete it. You want to minimize the number of open tabs and clicks for the agent, and display only the data required for the task. This is key to increasing both performance (agents can focus on the actual task) and quality.
Stateless. It should be possible to complete a task several times without any side effects. This allows you to compare results between different completions: for instance, you could send the task to two different agents, or to both an AI and a human, and derive the final result through a consensus algorithm. It also becomes possible to mix historical benchmark tasks into real production tasks.
JSON in, JSON out. To be stateless, the task UI should behave like an API: JSON in, JSON out. The input of a task should be enough to display and perform it, and the only result should be the JSON output, with no side effects on the database.
This will allow you to scale your human operations, send the same task to multiple agents for quality control, and progressively automate each of these tasks.
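The contract can be illustrated with a few lines of Python. The task shape and field names here are illustrative assumptions, not a real schema; the point is that completing a task is a pure function of its input payload.

```python
import json

# A task is fully described by its input JSON: nothing else is needed
# to display it or to perform it.
task_input = {
    "task_id": "task_123",
    "type": "classify_animal",
    "payload": {"image_url": "https://example.com/photo.jpg"},
}

def complete_task(task_json: str, answer: str) -> str:
    """JSON in, JSON out: the only result is the output JSON,
    with no side effects on any database."""
    task = json.loads(task_json)
    return json.dumps({"task_id": task["task_id"],
                       "output": {"label": answer}})

# The same task can go to two agents; their outputs are directly comparable,
# which is what makes consensus and benchmark tasks possible.
agent_a = complete_task(json.dumps(task_input), "cat")
agent_b = complete_task(json.dumps(task_input), "cat")
```

Because the function is stateless, identical inputs and answers always produce identical outputs, so disagreements between completions are meaningful signals rather than artifacts of hidden state.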
Fragment: a middle layer between AI and Ops
To make human in the loop work, you need two different teams to work hand in hand:
The AI team building the automation software
The Ops team running the human operations, managing outsourced agents, and so on
At Fragment, we believe these two teams can work more efficiently with the right software. We are building a middle layer on top of your AI, that adds human supervision with one API call.
Here are the steps from the previous example, now with Fragment:
Send the photo to your backend
Call the Fragment API to create the task
Fragment will call your AI inference service, which returns a prediction and a confidence score
Fragment optionally adds human supervision, depending on human review rules set up in the admin panel. All the ops management complexity is handled by Fragment
Get the final output from the Fragment API, a mix of AI and human results
The benefit for AI engineers is to have a simple API for human tasks. You can focus on building better models and improving automation, while the complexity of managing humans is hidden away.
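To make "one API call" concrete, here is a hypothetical sketch of what creating a task could look like from the AI engineer's side. The endpoint URL, request fields, and auth header are illustrative assumptions, not Fragment's actual API.

```python
import json
import urllib.request

def build_task_request(photo_url: str, api_key: str) -> urllib.request.Request:
    """Build the HTTP request that creates a task (step 2 above).
    All names below are placeholders, not a documented API."""
    body = json.dumps({
        "type": "classify_animal",
        "input": {"photo_url": photo_url},
    }).encode()
    return urllib.request.Request(
        "https://api.fragment.example/v1/tasks",  # placeholder URL
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_task_request("https://example.com/photo.jpg", "sk_test")
# Sending it (urllib.request.urlopen(req)) would return the final output:
# the AI prediction, optionally corrected by human reviews.
```

From the caller's perspective, routing between AI and humans, consensus, and ops management all sit behind that single request/response.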
For ops teams, who actually need to handle that complexity, Fragment is a task management system for the human agents. Agents install the Fragment browser extension, which adds an embedded player to help them navigate from task to task. Managers can monitor and improve the operations along three main dimensions:
Efficiency / cost: the handling time of each task
Quality: keep the quality high enough through review
Reactivity: improve response time to new tasks and maintain your SLAs
If this seems relevant to you, feel free to reach out!