Modelos de Tópicos baseados em Autocodificadores Variacionais utilizando as distribuições Gumbel-Softmax e mistura de Normais-Logísticas
Publisher
Universidade Federal do Amazonas
Abstract
Probabilistic topic models are statistical models that identify topics in textual data. They are widely applied in Natural Language Processing tasks because they use unlabeled data effectively to capture latent relations. Analytical solutions for Bayesian inference in such models, however, are usually intractable, which hinders the design of highly expressive text models. In this scenario, Variational Auto-Encoders (VAEs), in which a neural inference network approximates the posterior distribution, have become a promising alternative for inferring the latent topic distributions of text documents. These models, however, also pose new challenges, such as the requirement of continuous, reparameterizable distributions, which may not fit the true latent topic distributions well. Moreover, inference networks are prone to a well-known problem called component collapsing, in which only a small number of topics are effectively retrieved. To overcome these problems, we propose two new text topic models. The first (GSDTM) is based on Gumbel-Softmax, a pseudo-categorical continuous distribution that can generate categorical-like samples. The second (LMDTM) adopts a mixture of Normal-Logistic distributions, which can fit well in scenarios where the data distribution is complex. We also study the impact of different modeling choices on the generated topics and observe a trade-off between topic coherence and generative model quality. Through experiments using two reference datasets, three quantitative metrics and one qualitative inspection, we show that GSDTM largely outperforms previous state-of-the-art baselines in most scenarios when considering average topic coherence and perplexity.
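To make the "categorical-like samples" of the Gumbel-Softmax distribution concrete, the sketch below draws one relaxed sample over a set of topics. It is a minimal illustration using only the Python standard library, not the implementation used in the dissertation: the function name `gumbel_softmax_sample`, the `temperature` parameter and the example topic probabilities are all assumptions introduced here.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng=random):
    """Draw one relaxed (continuous) sample over the categories in `logits`.

    Hypothetical helper for illustration only; `rng` is anything exposing
    a `random()` method that returns a float in [0, 1).
    """
    perturbed = []
    for logit in logits:
        # Clamp u away from 0 so that log(u) is always defined.
        u = max(rng.random(), 1e-12)
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        g = -math.log(-math.log(u))
        perturbed.append((logit + g) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(perturbed)
    exps = [math.exp(x - top) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed example: three topics with proportions 0.7, 0.2 and 0.1.
topic_logits = [math.log(p) for p in (0.7, 0.2, 0.1)]
rng = random.Random(0)
near_one_hot = gumbel_softmax_sample(topic_logits, temperature=0.1, rng=rng)
smoother = gumbel_softmax_sample(topic_logits, temperature=5.0, rng=rng)
```

Each returned vector is non-negative and sums to 1. As the temperature approaches zero the samples concentrate on a single topic, approaching a one-hot categorical draw, while larger temperatures give smoother vectors. Because the noise is separated from the logits, the sample stays differentiable with respect to them, which is the reparameterization property the abstract refers to.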
Citation
SILVEIRA, Denys Dionísio Bezerra. Modelos de Tópicos baseados em Autocodificadores Variacionais utilizando as distribuições Gumbel-Softmax e mistura de Normais-Logísticas. 2018. 115 f. Dissertação (Mestrado em Informática) - Universidade Federal do Amazonas, Manaus, 2018.
Creative Commons License
Except where otherwise noted, this item's license is described as Open Access

