컨텐츠 시작

학술대회/행사

초록검색

제출번호(No.) 0622
분류(Section) Contributed Talk
분과(Session) Probability / Stochastic Process / Statistics (SS-12)
영문제목
(Title(Eng.))
Probabilistic analysis of a central data structure in compression algorithms
저자(Author(s))
Kanal Hun2, Brigitte Vall\'ee1
University of Caen1, Royal university of Phnom Penh2
초록본문(Abstract) The digital search tree (dst) is a central structure which contains words, and it plays a central role in compression algorithms. It is thus important to obtain its probabilistic analysis. This analysis had been already studied for simple sources (memoryless and Markov chains sources). However, this structure is often used when the source which emits words is more ``realistic '' and more ``correlated'', and we are led to deal with general sources. The talk describes a precise analysis of the dst structure when it is built with $n$ words which are emitted from a general source $\mathcal S$. In particular, we prove that the mean path length of a dst is asymptotic to $ 1/h(\mathcal S) n \log n$, where $h(\mathcal S)$ is the entropy of the source, provided that the source is ``tame''. This result shows that the dst structure has the same performances as another dictionary structure, called the Trie, which had been analyzed in the case of general sources.

We use methods from ``analytic combinatorics'', described in the book of Flajolet and Sedgewick, and we mainly deal with generating functions. We first obtain a system of equations for these generating functions, which can be solved with the aid of Laplace and Mellin transform. We then obtain an exact expression for the mean path length, which involves the transition operator relative to the source. However, the asymptotics of the exact expression cannot be derived easily, and we use the Rice Formula to obtain the final result. The first steps are true for all the sources, but the last step (Rice Formula) can be only performed if the source possesses ``tameness'' properties. We also describe probabilistic properties of the source under which it is proven to be tame.
분류기호
(MSC number(s))
05D40
키워드(Keyword(s)) digital search tree, compression algorithms, probabilistic sources, generating functions
강연 형태
(Language of Session (Talk))
English