Math Events

View Event Schedule
Event category: Domestic
Event type: Colloquium
Title: Dual Representation Topic Model
Date: 2016-10-13
Venue: Joint Lecture Room, Natural Science Building, KAIST
Venue (English): Room 1501, Building E6-1, KAIST
Link: https://mathsci.kaist.ac.kr/home/schedul/seminar/

In statistical methods for language and document modeling, there are two major perspectives: representation at the document level, and representation at the word level. At the document level, topic models such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), based on the word-document matrix, aim to discover topics whose dimensionality is much lower than the size of the vocabulary. At the word level, language models such as n-grams and neural word embeddings, based on the word co-occurrence matrix, aim to represent each word in a high-dimensional vector space. In this work, we develop the Dual Representation Topic Model (DRTM), a novel topic model that combines the advantages of the two approaches. DRTM models documents and words based on the locations of the individual words within documents, as well as the local contexts of the words. DRTM transforms each document into a network of words by generating an edge whenever two words in close proximity have high semantic similarity. It then infers a topic for each edge (a pair of words) rather than assigning topics to individual words as in traditional topic models. This enables the model to learn a better document representation by inferring the global topics while considering the local contexts of individual words.
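
The edge-generation step described in the abstract can be illustrated with a small sketch. The abstract does not specify how proximity or semantic similarity is measured, so everything here is an assumption made for illustration: a fixed token window stands in for "close proximity", cosine similarity between word vectors stands in for "semantic similarity", and the names `build_word_network`, `window`, and `threshold` are hypothetical rather than part of DRTM. The topic inference over edges is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_word_network(tokens, vectors, window=3, threshold=0.5):
    """Turn one document into a list of edges between nearby, similar words.

    Assumptions (not taken from the talk): proximity is a fixed token window,
    and similarity is cosine similarity between pre-trained word vectors.
    Each edge is (position_i, position_j, word_i, word_j).
    """
    edges = []
    for i, w in enumerate(tokens):
        # Only look ahead, so each pair of positions is considered once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            v = tokens[j]
            # Skip self-pairs and words without a vector.
            if w == v or w not in vectors or v not in vectors:
                continue
            if cosine(vectors[w], vectors[v]) >= threshold:
                edges.append((i, j, w, v))
    return edges


# Toy two-dimensional vectors, invented purely for this example.
toy_vectors = {
    "topic": [1.0, 0.0],
    "model": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
tokens = ["topic", "banana", "model"]
edges = build_word_network(tokens, toy_vectors, window=2, threshold=0.5)
print(edges)
```

In this toy document only "topic" and "model" are linked, since "banana" falls within the window of both but is not similar to either. A model in the spirit of the abstract would then assign a topic to each such edge instead of to each token.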