Tree comparison: enumeration and application to cheminformatics
Loading...
Date
2021
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Graphs are a well-known data structure used in many application domains that rely on relationships between individual entities. Examples are social networks, where the users may be in friendship with each other, road networks, where one-way or bidirectional roads connect crossings, and work package assignments, where workers are assigned to tasks. In chem- and bioinformatics, molecules are often represented as molecular graphs, where vertices represent atoms, and bonds between them are represented by edges connecting the vertices. Since there is an ever-increasing amount of data that can be treated as graphs, fast algorithms are needed to compare such graphs. A well-researched concept to compare two graphs is the maximum common subgraph. On the one hand, this allows finding substructures that are common to both input graphs. On the other hand, we can derive a similarity score from the maximum common subgraph. A practical application is rational drug design which involves molecular similarity searches.
In this thesis, we study the maximum common subgraph problem, which entails finding a largest graph, which is isomorphic to subgraphs of two input graphs. We focus on restrictions that allow polynomial-time algorithms with a low exponent. An example is the maximum common subtree of two input trees. We succeed in improving the previously best-known time bound. Additionally, we provide a lower time bound under certain assumptions. We study a generalization of the maximum common subtree problem, the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. This problem is motivated by the application to cheminformatics. First, the vast majority of drugs modeled as molecular graphs is outerplanar, and second, the blocks correspond to the ring structures and the bridges to atom chains or linkers. If we allow disconnected common subgraphs, the problem becomes NP-hard even for trees as input. We propose a second generalization of the maximum common subtree problem, which allows skipping vertices in the input trees while maintaining polynomial running time.
Since a maximum common subgraph is not unique in general, we investigate the problem to enumerate all maximum solutions. We do this for both the maximum common subtree problem and the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. An arising subproblem which we analyze is the enumeration of maximum weight matchings in bipartite graphs. We support a weight function between the vertices and edges for all proposed common subgraph methods in this thesis. Thus the objective is to compute a common subgraph of maximum weight. The weights may be integral or real-valued, including negative values. A special case of using such a weight function is computing common subgraph isomorphisms between labeled graphs, where labels between mapped vertices and edges must be equal. An experimental study evaluates the practical running times and the usefulness of our block-and-bridge preserving maximum common induced subgraph algorithm against state of the art algorithms.
Description
Table of contents
Keywords
Maximum common subtree, All-cavity maximum weight matching, Maximum similarity subgraph, Cheminformatics