SigmaOnTopic

Development of a unified natural language semantic search tool on unstructured digital content

Challenge

The project centers on developing a unified natural language semantic search tool for digital content. The tool is intended to retrieve information effectively in any corporation, public body or other entity that handles large volumes of unstructured data in any source format (documents, audio, video, web pages, social networks, databases, etc.).

Sigma proposes to develop the SigmaOnTopic product, which will use the available digital repositories as its main source and will allow content to be retrieved through searches expressed in natural language. The searches will cover any type of document previously indexed by the product's built-in search engine. Two of the main novelties of the proposed solution stand out:

  • Linguistic preprocessing of queries will be incorporated so that users can write natural language in the search field. The system will translate the user’s request into the syntax of the search engine, setting the necessary filters or parameters according to the intents and entities detected.

  • Vector representations produced by language models trained with neural networks will be included in the index to make search more semantic. That is, the system will return documents and fragments that contain not only the terms of the query but also other expressions with the same meaning. The system will also be able to retrieve text contained in audio and video, after first reducing noise to improve intelligibility.
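
The two ideas above can be illustrated with a minimal Python sketch. It is not the SigmaOnTopic implementation: the rule-based parser, the filter names ("format", "year"), the stop-word list and the three-dimensional vectors are all assumptions made up for this example. A real system would use a trained language model to detect intents and entities and to produce the vectors. The sketch first turns a natural language request into search terms plus filters, then ranks the filtered documents by cosine similarity between a query vector and each document vector.

```python
import math
import re

# Hypothetical mapping from words in a query to a "format" filter value.
FORMAT_WORDS = {
    "document": "document", "documents": "document",
    "audio": "audio", "audios": "audio",
    "video": "video", "videos": "video",
}
# Words dropped from the free-text part of the query (illustrative list).
STOP_WORDS = {"about", "from", "in", "of", "the", "on", "a", "an",
              "find", "show", "me"}


def parse_query(text):
    """Split a natural language request into search terms and filters."""
    filters = {}
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in FORMAT_WORDS:
            filters["format"] = FORMAT_WORDS[token]
        elif re.fullmatch(r"(19|20)\d{2}", token):
            filters["year"] = int(token)  # four-digit year becomes a filter
        elif token not in STOP_WORDS:
            terms.append(token)
    return {"terms": terms, "filters": filters}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, index, filters):
    """Keep documents matching all filters, then sort by similarity."""
    hits = [
        (cosine(query_vec, doc["vector"]), doc["id"])
        for doc in index
        if all(doc.get(key) == value for key, value in filters.items())
    ]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)]


# Toy index: ids, metadata and vectors are invented for illustration.
INDEX = [
    {"id": "lecture-ml", "format": "video", "year": 2021,
     "vector": [0.9, 0.1, 0.0]},
    {"id": "lecture-ai", "format": "video", "year": 2021,
     "vector": [0.8, 0.3, 0.1]},
    {"id": "cooking", "format": "video", "year": 2021,
     "vector": [0.0, 0.1, 0.9]},
    {"id": "ml-report", "format": "document", "year": 2021,
     "vector": [0.9, 0.1, 0.0]},
]

parsed = parse_query("Show me videos about machine learning from 2021")
# In a real system the query vector would come from a language model;
# here it is a made-up vector standing in for "machine learning".
results = rank([1.0, 0.0, 0.0], INDEX, parsed["filters"])
print(parsed)
print(results)
```

In this toy run the request yields the terms "machine" and "learning" with a video filter and a year filter, so the report is excluded by its format. The lecture whose vector lies close to the query vector is ranked high even though its identifier does not contain the query words, which is the behaviour the second bullet describes.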

Solution

SigmaOnTopic is an individual industrial research project that will allow the Sigma Group to remain a reference in advanced document management capabilities, without changing the data infrastructures and repositories already in use.

Results

The execution phase of SigmaOnTopic started in mid-2022 and will run until May 2024.

The project is organized in four work packages:

  1. SigmaOnTopic laboratory prototype development
  2. UPCT adaptations and prototype integration
  3. Commissioning and evaluation pilot
  4. Management and dissemination of results

SigmaOnTopic is being integrated into the digital content platform created by the Polytechnical University of Cartagena (UPCT), which is used extensively in Spanish universities.

Funded by

The project is supported by the European Union's Next Generation EU funds through Red.es and is part of the Recovery, Transformation and Resilience Plan.

Partners

The project will be developed entirely by Sigma Cognition, with the support of the Polytechnical University of Cartagena, which will implement the pilot and evaluate the results.
