HADA

Research on advanced data annotation tools

Challenge

High-quality data is the cornerstone of AI. However, generating that data requires extensive human intervention, which leads to inconsistency, human error, and high costs in time and money. It is also difficult to find tools that deliver data that is high quality, regulatory compliant, securely generated, scalable and fast to produce, affordable, flexible, consistent, accurately annotated, and a balanced representation of the target domain.

The objective of the HADA project is to design a set of data annotation tools for the most widely used AI input types: voice, text and image. This will give Sigma an advanced annotation tool framework that expands and speeds up its data annotation services and paves the way for the commercialization of annotation tools.

The project will research and address the following stages of the machine learning lifecycle:

  1. Data preparation and selection: Select the data most relevant to improving AI models and prepare it so that manual annotation is easier.
  2. Annotation: Develop technologies that simplify annotators' work, speeding up the process and improving quality.
  3. Quality control: Establish solutions that help detect and correct errors while increasing consistency between annotators.
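As a rough illustration of stages 1 and 3, the sketch below ranks unlabelled items by prediction uncertainty (one common way to pick data worth annotating) and measures consistency between two annotators with Cohen's kappa. The project description does not state which methods HADA uses, so both techniques, and the names `select_for_annotation` and `cohen_kappa`, are assumptions made only for this example.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_annotation(predictions, k):
    """Return the ids of the k items whose predictions are most uncertain.

    `predictions` maps a (hypothetical) item id to a list of class
    probabilities produced by some existing model.
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:k]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Illustrative data only; the ids and labels are made up.
predictions = {"utt-1": [0.98, 0.02], "utt-2": [0.55, 0.45], "utt-3": [0.70, 0.30]}
to_annotate = select_for_annotation(predictions, 2)  # ["utt-2", "utt-3"]

annotator_a = ["pos", "pos", "neg", "neg"]
annotator_b = ["pos", "neg", "neg", "neg"]
agreement = cohen_kappa(annotator_a, annotator_b)  # 0.5
```

A kappa near 1 indicates strong agreement, while a value near 0 means the annotators agree no more often than chance would predict, which could flag a batch for review.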

The resulting solution will be a scientific annotation framework for Human-in-the-Loop Artificial Intelligence (HITL AI).

Solution

HADA is an individual industrial research project, which supports Sigma’s continued offering of data annotation services.

Results

The project started at the end of 2022 and is expected to be completed by mid-2024.

Funding

The project 2021/C005/00146323 is funded by the European Union through NextGenerationEU, via the public business entity attached to the Ministry of Economic Affairs and Digital Transformation.

Partners

The project will be developed entirely by Sigma Cognition, with the support of two specialized groups from the Polytechnic University of Madrid (UPM) and Universidad Carlos III.