Conferences/Events
Math Events
Event schedule | |
---|---|
Event category | Domestic |
Event type | Colloquium |
Event title | Dual Representation Topic Model |
Event date | 2016-10-13 |
Venue | Common Lecture Room, Natural Sciences Building, KAIST |
Venue (English) | Room 1501, Building E6-1, KAIST |
Link | https://mathsci.kaist.ac.kr/home/schedul/seminar/ |
Description | In statistical methods for language and document modeling, there are two major perspectives: representation at the document level and representation at the word level. At the document level, topic models such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), based on the word-document matrix, aim to discover topics whose dimensionality is much lower than the size of the vocabulary. At the word level, language models such as n-grams and neural word embeddings, based on the word co-occurrence matrix, aim to represent each word in a high-dimensional vector space. In this work, we develop the Dual Representation Topic Model (DRTM), a novel topic model that combines the advantages of the two approaches. DRTM models documents and words based on the locations of the individual words within documents, as well as the local contexts of the words. DRTM transforms each document into a network of words by generating an edge whenever two words in close proximity have high semantic similarity. It then infers a topic for each edge (a pair of words) rather than assigning topics to individual words as in traditional topic models. This enables the model to learn a better document representation by inferring the global topics while considering the local contexts of individual words. |
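
The word-network step described in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name `build_word_network`, the sliding-window notion of proximity, the use of cosine similarity, the `window` and `threshold` parameters, and the toy 2-D vectors are all assumptions made for illustration. The per-edge topic inference that DRTM performs afterwards is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_word_network(tokens, vectors, window=3, threshold=0.8):
    """Return (word, word) edges for one document.

    Assumption: two words are "in close proximity" when they are at most
    `window` positions apart, and "semantically similar" when the cosine
    similarity of their vectors is at least `threshold`.
    """
    edges = []
    for i, left in enumerate(tokens):
        # Only look ahead, so each nearby pair is considered once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            right = tokens[j]
            if left not in vectors or right not in vectors:
                continue  # skip words without a vector
            if cosine(vectors[left], vectors[right]) >= threshold:
                edges.append((left, right))
    return edges


# Hypothetical hand-made vectors; a real setup would use trained embeddings.
vectors = {
    "topic": (1.0, 0.1),
    "model": (0.9, 0.2),
    "word": (0.1, 1.0),
    "vector": (0.2, 0.9),
}
tokens = ["topic", "model", "word", "vector"]
edges = build_word_network(tokens, vectors, window=2, threshold=0.9)
print(edges)
```

With these toy vectors, only the pairs ("topic", "model") and ("word", "vector") are both nearby and similar enough to form edges. In a model of the kind the abstract describes, each such edge, rather than each individual word, would then receive a topic assignment.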