Research on advanced data annotation tools
High-quality data is the cornerstone of AI. However, generating it requires substantial human intervention, which leads to inconsistency, human error, and high costs in time and money. It is also difficult to find tools that deliver data that is high quality, regulatory compliant, securely generated, scalable and fast to produce, affordable, flexible, consistent, representative of the target domain in a balanced way, and accurately annotated.
The objective of the HADA project is to design a set of data annotation tools for the most widely used AI data inputs: voice, text and image. This will give Sigma an advanced annotation tool framework that expands and speeds up its data annotation services and paves the way for the commercialization of annotation tools.
The project will research and address the stages of the machine learning lifecycle. The resulting solution will be a scientific annotation framework for Human-in-the-Loop Artificial Intelligence (HITL AI).
The tools under development support the entire data annotation process and include:
Active Learning: Research and implementation of hybrid unsupervised and semi-supervised models to reduce the need for large labeled data sets.
Data Anonymization: Application of anonymization to the algorithms used for data selection, annotation support and quality control. Automatic removal of distractors through AI modeling and data enhancement.
Decision Reduction: AI model that intelligently reduces the labeling options presented to the annotator, narrowing the task toward a binary classification problem.
Multiple Annotation: Intelligent data clustering algorithms that allow several samples to be annotated at once.
Automatic Error Detection: Automatic annotation error detection using unsupervised learning techniques.
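To make two of the tools above more concrete, the following is a minimal Python sketch of how Active Learning and Decision Reduction could fit together in a human-in-the-loop workflow. It is not the HADA implementation: it assumes a model that already outputs per-sample label probabilities, it uses entropy-based uncertainty sampling as a stand-in selection strategy (the project itself describes hybrid unsupervised and semi-supervised models), and the sample ids, labels, probabilities and function names are all hypothetical.

```python
import math

# Hypothetical model output: per-sample label probabilities (illustrative values only).
predictions = {
    "sample_a": {"positive": 0.98, "negative": 0.01, "neutral": 0.01},
    "sample_b": {"positive": 0.40, "negative": 0.35, "neutral": 0.25},
    "sample_c": {"positive": 0.70, "negative": 0.20, "neutral": 0.10},
}


def entropy(label_probs):
    """Shannon entropy (in nats) of a label distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in label_probs.values() if p > 0)


def select_for_annotation(predictions, budget):
    """Active learning (assumed strategy: uncertainty sampling).

    Returns the ids of the `budget` samples the model is least sure about,
    so that human effort goes where it is most informative.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


def reduce_options(label_probs, keep=2):
    """Decision reduction: show the annotator only the `keep` most likely labels.

    With keep=2 the annotator faces a binary choice.
    """
    ranked = sorted(label_probs, key=label_probs.get, reverse=True)
    return ranked[:keep]


# Pick the two most uncertain samples and offer two candidate labels for each.
to_annotate = select_for_annotation(predictions, budget=2)
options = {sample_id: reduce_options(predictions[sample_id]) for sample_id in to_annotate}

print(to_annotate)
print(options)
```

In this sketch the confidently predicted sample is skipped, and the annotator sees only the two most likely labels for each remaining sample. A real tool would also need a fallback for cases where the correct label is not among the options shown.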
The project started at the end of 2022 and is expected to be completed by mid-2024.
Project 2021/C005/00146323 is funded by the European Union's NextGenerationEU programme through the public business entity attached to the Ministry of Economic Affairs and Transformation.
The project will be developed entirely by Sigma Cognition, with the support of two specialized groups from the Polytechnic University of Madrid (UPM) and Universidad Carlos III.