Efficient string algorithmics across alphabet realms

dc.contributor.advisorFischer, Johannes
dc.contributor.authorEllert, Jonas
dc.contributor.refereeGawrychowski, Paweł
dc.date.accepted2024-02-19
dc.date.accessioned2024-04-03T13:17:59Z
dc.date.available2024-04-03T13:17:59Z
dc.date.issued2024
dc.description.abstractStringology is a subfield of computer science dedicated to analyzing and processing sequences of symbols. It plays a crucial role in various applications, including lossless compression, information retrieval, natural language processing, and bioinformatics. Recent algorithms often assume that the strings to be processed are over polynomial integer alphabet, i.e., each symbol is an integer that is at most polynomial in the lengths of the strings. In contrast to that, the earlier days of stringology were shaped by the weaker comparison model, in which strings can only be accessed by mere equality comparisons of symbols, or (if the symbols are totally ordered) order comparisons of symbols. Nowadays, these flavors of the comparison model are respectively referred to as general unordered alphabet and general ordered alphabet. In this dissertation, we dive into the realm of both integer alphabets and general alphabets. We present new algorithms and lower bounds for classic problems, including Lempel-Ziv compression, computing the Lyndon array, and the detection of squares and runs. Our results show that, instead of only assuming the standard model of computation, it is important to also consider both weaker and stronger models. Particularly, we should not discard the older and weaker comparison-based models too quickly, as they are not only powerful theoretical tools, but also lead to fast and elegant practical solutions, even by today's standards.en
dc.identifier.urihttp://hdl.handle.net/2003/42419
dc.identifier.urihttp://dx.doi.org/10.17877/DE290R-24255
dc.language.isoende
dc.subjectString algorithmicsen
dc.subjectSquare-freenessen
dc.subjectRunsen
dc.subjectLyndon arrayen
dc.subjectLempel-Ziven
dc.subjectRightmost Lempel-Ziven
dc.subjectLZ77en
dc.subjectGeneral ordered alphabeten
dc.subjectGeneral unordered alphabeten
dc.subjectWord RAMen
dc.subjectWord-packingen
dc.subject.ddc004
dc.subject.rswkZeichenkettede
dc.subject.rswkWort <Informatik>de
dc.subject.rswkDatenkompressionde
dc.subject.rswkOrdnungsreduktionde
dc.subject.rswkEffizienter Algorithmusde
dc.titleEfficient string algorithmics across alphabet realmsen
dc.typeTextde
dc.type.publicationtypePhDThesisde
dcterms.accessRightsopen access
eldorado.secondarypublicationfalsede

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Dissertation_Ellert.pdf
Size:
2.99 MB
Format:
Adobe Portable Document Format
Description:
DNB
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
4.85 KB
Format:
Item-specific license agreed upon to submission
Description:

Collections