Parallel text index construction

Kurpicz, Florian

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Fischer, Johannes	-
dc.contributor.author	Kurpicz, Florian	-
dc.date.accessioned	2020-07-07T06:55:18Z	-
dc.date.available	2020-07-07T06:55:18Z	-
dc.date.issued	2020	-
dc.identifier.uri	http://hdl.handle.net/2003/39196	-
dc.identifier.uri	http://dx.doi.org/10.17877/DE290R-21114	-
dc.description.abstract	In this dissertation, we consider the parallel construction of text indices. Text indices provide additional information about texts that can speed up queries on those texts. One example is full-text indices, which are used for efficient phrase search, for instance to decide whether a phrase occurs in a text. This dissertation deals mainly, but not exclusively, with the parallel construction of text indices in shared and distributed memory. In the first part of the dissertation, we consider wavelet trees. These are compact indices that generalize rank and select queries from binary alphabets to alphabets of arbitrary size. In the second part of the dissertation, we consider the suffix array, the most thoroughly researched text index of all. The suffix array contains the starting positions of all suffixes of a text in lexicographic order, i.e., we want to sort all suffixes of a text. The suffix array is often augmented with the longest common prefix array (LCP array). The LCP array contains the lengths of the longest common prefixes of each pair of lexicographically consecutive suffixes. Finally, we use distributed suffix and LCP arrays to construct the distributed Patricia trie. This allows us to answer various phrase queries more efficiently than with the suffix array alone.	de
dc.description.abstract	The focus of this dissertation is the parallel construction of text indices. Text indices provide additional information about a text that allows queries to be answered faster. Full-text indices, for example, are used to efficiently answer phrase queries, i.e., whether and where a phrase occurs in a text. The research in this dissertation is focused on, but not limited to, parallel construction algorithms for text indices in both shared and distributed memory. In the first part, we look at wavelet trees: a compact index that generalizes rank and select queries from binary alphabets to alphabets of arbitrary size. In the second part of this dissertation, we consider the suffix array, one of the most researched text indices. The suffix array of a text contains the starting positions of the text's lexicographically sorted suffixes, i.e., constructing it amounts to sorting all suffixes of the text. Finally, we use the distributed suffix arrays (and LCP arrays) to compute distributed Patricia tries. This allows us to answer different phrase queries more efficiently than using only the suffix array.	en
dc.language.iso	en	de
dc.subject	Parallel algorithms	en
dc.subject	Shared memory	en
dc.subject	Distributed memory	en
dc.subject	Text index	en
dc.subject	Wavelet tree	en
dc.subject	Suffix array	en
dc.subject.ddc	004	-
dc.title	Parallel text index construction	en
dc.type	Text	de
dc.contributor.referee	Puglisi, Simon J.	-
dc.date.accepted	2020-05-26	-
dc.type.publicationtype	doctoralThesis	de
dc.subject.rswk	Paralleler Algorithmus	de
dc.subject.rswk	Gemeinsamer Speicher	de
dc.subject.rswk	Verteilter Speicher	de
dc.subject.rswk	Wavelet	de
dc.subject.rswk	Suffix Array	de
dcterms.accessRights	open access	-
eldorado.secondarypublication	false	de
Appears in Collections:	LS 11

Files in This Item:

File	Description	Size	Format
Dissertation_Florian_Kurpicz.pdf	DNB	2.59 MB	Adobe PDF

This item is protected by original copyright
