Bayesian and neural ranking approaches for supporting schema references in keyword queries over relational databases

Abstract

Relational Keyword Search (R-KwS) systems enable non-expert users to explore and retrieve information from relational databases without knowing schema details or query languages. These systems take the keywords from the input query, locate the elements of the target database that correspond to these keywords, and look for ways to “connect” these elements using information on referential integrity constraints, i.e., key/foreign key pairs. Although several such systems have been proposed in the literature, most of them only support queries whose keywords refer to the contents of the target database. Very few support queries in which keywords refer to elements of the database schema. In this work, we propose Lathe, a novel R-KwS system designed to support such queries. To this end, we first generalize the well-known concepts of Query Matches (QMs) and Candidate Joining Networks (CJNs) to handle keywords referring to schema elements, and we propose new algorithms to generate them. Then, we introduce an approach to automatically select the CJNs that are most likely to represent the user intent when issuing a keyword query. Our key contributions are: (i) a novel Bayesian QM ranking algorithm that prioritizes relevant QMs, avoiding the processing of less likely answers; (ii) an effective Bayesian CJN ranking algorithm that leverages the QM rankings to prioritize and evaluate relevant CJNs; (iii) an eager CJN evaluation strategy that discards spurious CJNs early; and (iv) a novel transformer-based neural approach for QM ranking and CJN ranking, leading to improved results on metrics such as recall and R@k. We present a comprehensive set of experiments performed with query sets and datasets previously used in experiments with state-of-the-art R-KwS systems and methods. Our results indicate that Lathe can handle a wider variety of keyword queries while remaining highly effective, even for large databases with intricate schemas.
Additionally, we developed PyLatheDB, a Python library for Relational Keyword Search that implements Lathe.
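
As an illustration of the general R-KwS pipeline described above (matching keywords against both schema elements and stored values, then connecting the matched tables through key/foreign key pairs), here is a minimal Python sketch. It is not Lathe's algorithm and does not use the PyLatheDB API. The toy schema, the rows, and the function names `match_keyword` and `join_path` are all hypothetical, and a plain breadth-first search stands in for CJN generation.

```python
from collections import deque

# Hypothetical toy schema: table name -> column names.
SCHEMA = {
    "movie": ["id", "title", "year"],
    "person": ["id", "name"],
    "casting": ["movie_id", "person_id", "role"],
}

# Hypothetical toy contents for each table.
ROWS = {
    "movie": [{"id": 1, "title": "Gladiator", "year": 2000}],
    "person": [{"id": 7, "name": "Russell Crowe"}],
    "casting": [{"movie_id": 1, "person_id": 7, "role": "Maximus"}],
}

# Key/foreign key pairs, treated as undirected edges between tables.
FOREIGN_KEYS = [("casting", "movie"), ("casting", "person")]


def match_keyword(keyword):
    """Return the set of (table, kind) matches, where kind is
    'schema' (table or column name) or 'value' (word in a stored value)."""
    kw = keyword.lower()
    matches = set()
    for table, columns in SCHEMA.items():
        if kw == table or kw in columns:
            matches.add((table, "schema"))
        for row in ROWS[table]:
            if any(kw in str(value).lower().split() for value in row.values()):
                matches.add((table, "value"))
    return matches


def join_path(start, goal):
    """Breadth-first search over the foreign key graph.
    Returns the shortest list of tables linking start to goal, or None."""
    graph = {table: set() for table in SCHEMA}
    for a, b in FOREIGN_KEYS:
        graph[a].add(b)
        graph[b].add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

For a query such as "title crowe", `match_keyword("title")` yields a schema match on `movie` and `match_keyword("crowe")` yields a value match on `person`; `join_path("movie", "person")` then connects the two tables through `casting`. A real system would additionally rank the competing matches and joining networks, which this sketch leaves out.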

Citation

MARTINS, Paulo Rodrigo Oliveira. Bayesian and neural ranking approaches for supporting schema references in keyword queries over relational databases. 2024. 112 leaves. Thesis (Doctorate in Informatics) - Universidade Federal do Amazonas, Manaus, 2024.

Creative Commons License

Except where otherwise noted, this item's license is described as Open Access