SigmaOnTopic
Development of a unified natural language semantic search tool on unstructured digital content
Challenge
The project is centered on the development of a unified natural language semantic search tool for digital content. The tool enables effective information retrieval for any corporation, public body or entity that handles large volumes of unstructured data in any source format (documents, audio, video, web pages, social networks, databases, etc.).
Sigma proposes to develop SigmaOnTopic, a product that will use the organization's existing digital repositories as its main source and will allow content to be retrieved through queries expressed in natural language. These queries will be run against any type of document previously indexed by the search engine built into the product. Two of the main novelties of the proposed solution stand out:
Linguistic preprocessing of queries, so that users can type natural language in the search field. The system will translate the user's request into the search engine's query syntax, setting the necessary filters or parameters according to the intents and entities it detects.
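The page does not say which search engine SigmaOnTopic builds on or how intents and entities are detected. As a minimal sketch under those unknowns, the Python below assumes an Elasticsearch-style `bool` query as the target syntax and uses hand-written rules in place of trained intent and entity models. The function `parse_query` and the fields `content`, `type` and `date` are hypothetical.

```python
import re

# Hypothetical vocabulary mapping words a user might type to a media-type filter value.
MEDIA_TYPES = {
    "video": "video", "videos": "video",
    "audio": "audio", "audios": "audio",
    "document": "document", "documents": "document",
}

# Filler words dropped from the free-text part of the query (illustrative only).
STOPWORDS = {"find", "show", "me", "about"}

# A four-digit year introduced by "in", "from" or "during", e.g. "in 2023".
YEAR_PATTERN = re.compile(r"\b(?:in|from|during)\s+(\d{4})\b", re.IGNORECASE)


def parse_query(text):
    """Translate a natural-language request into an Elasticsearch-style bool query.

    Entities are found with simple rules here; they stand in for the trained
    intent and entity detection described for SigmaOnTopic.
    """
    filters = []
    remaining = text

    # Entity: a year becomes a date range filter on a hypothetical 'date' field.
    match = YEAR_PATTERN.search(remaining)
    if match:
        year = match.group(1)
        filters.append({"range": {"date": {"gte": f"{year}-01-01", "lte": f"{year}-12-31"}}})
        remaining = remaining[:match.start()] + remaining[match.end():]

    # Entity: a media-type word becomes a term filter on a hypothetical 'type' field.
    keywords = []
    for word in remaining.split():
        key = word.lower().strip(".,?!")
        if key in MEDIA_TYPES:
            filters.append({"term": {"type": MEDIA_TYPES[key]}})
        elif key not in STOPWORDS:
            keywords.append(word)

    # The remaining words are matched as free text against a hypothetical 'content' field.
    return {"query": {"bool": {"must": [{"match": {"content": " ".join(keywords)}}],
                               "filter": filters}}}
```

For example, `parse_query("Find videos about energy efficiency in 2023")` keeps "energy efficiency" as free text and adds a video filter plus a date range covering 2023. Placing detected entities in the `filter` clause rather than `must` follows the Elasticsearch convention that filters restrict results without affecting relevance scores.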
Vector representations produced by language models trained with neural networks will be included in the index to enable semantic search. That is, the system will return documents and fragments that contain not only the query terms but also other expressions with the same meaning. The system will also be able to retrieve text contained in audio and video files, applying noise reduction beforehand to improve intelligibility.
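To illustrate why vector representations retrieve fragments that share meaning rather than exact terms, the sketch below ranks indexed fragments by cosine similarity to a query vector. The three-dimensional vectors are hand-made stand-ins for the embeddings a neural language model would produce, and `semantic_search` is a hypothetical helper, not part of SigmaOnTopic.

```python
import math

# Hypothetical toy index: fragment text -> embedding vector.
# Real embeddings come from a neural language model and have hundreds of dimensions.
INDEX = {
    "The automobile was repaired": [0.85, 0.15, 0.05],
    "Vehicle maintenance schedule": [0.80, 0.20, 0.10],
    "Annual rainfall statistics": [0.05, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, index, top_k=2):
    """Return the top_k fragments whose vectors are closest to the query vector."""
    scored = [(cosine(query_vec, vec), fragment) for fragment, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [fragment for _, fragment in scored[:top_k]]


# Hypothetical embedding of the query "car".
QUERY_CAR = [0.90, 0.10, 0.00]
print(semantic_search(QUERY_CAR, INDEX))
```

With the query vector for "car", the two vehicle-related fragments rank first even though neither contains the word "car", while the rainfall fragment scores close to zero. A production system would typically use a trained model and an approximate nearest-neighbour index rather than this linear scan.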
Solution
SigmaOnTopic is an individual industrial research project that will allow the Sigma Group to remain a reference provider of advanced document management capabilities, without requiring changes to the data infrastructure and repositories already in use.
Results
The execution phase of SigmaOnTopic started in mid-2022 and will run until May 2024.
The project is organized into four work packages:
SigmaOnTopic laboratory prototype development
UPCT adaptations and prototype integration
Pilot commissioning and evaluation
Management and dissemination of results
SigmaOnTopic is being integrated into the digital content platform created by the Polytechnical University of Cartagena (UPCT), which is used extensively in Spanish universities.
Funded by
The project is supported by the EU NextGenerationEU funds through Red.es and is part of Spain's Recovery, Transformation and Resilience Plan.
Partners
The project will be developed entirely by Sigma Cognition, with the support of the Polytechnical University of Cartagena (UPCT), which will implement the pilot and evaluate the results.
Project News and Events
9th Annual Seminar of the Master's Degree in Language Science and Hispanic Linguistics (30 June 2023)
An Artificial Intelligence semantic search engine locates information in text as well as in audio and video (Universidad Politécnica de Cartagena)