Managing interlingual references - a type generic approach
Date
2011-09-21
Abstract
This thesis presents a framework that makes explicit the ubiquitous low-level
references between arbitrary constructs in source code written in arbitrary
programming languages. While the problems that arise from these implicit
interlingual references are well known to practitioners, there is no adequate
tool-based solution to date. The reason is that such a tool must be able to
analyze source code in many languages, and the choice of these languages
depends on the specific requirements of a project: the tool has to be
parametric in the languages themselves. The concept of datatype generic
programming, developed in the functional programming community in recent years,
builds on ideas from category theory, and there are working implementations,
especially in the Haskell community. This approach finally makes it possible to
write type-safe software engineering tools that can be reused for (i.e.
parametrized by) many languages. After the presentation of the underlying machinery and its
application to real-life software engineering, we define these implicit
interlingual references as links between specific subtrees in abstract syntax
trees of possibly different languages. The notion of consistency for such a
pair is then given by a function that maps two arbitrary subterms to a
Boolean value. Based on this definition, we develop a framework for
managing such references, i.e. we can define, check and adapt them in a type-safe
way. Finally, we perform a case study that demonstrates that our approach works for
real-life languages and projects. We highlight the contributions of this work
in the tension between theory and application: a recurring theme in
scientific software engineering is abstraction; we seek
solutions that are independent of application-specific context. But software
engineering is about engineering, thus there are real-life problems in
real-life applications that have to be solved. That means we have to identify a
practical problem, abstract from everything unnecessary, find a solution, and
bring that solution back into practice. This is quite a long way, and
the last step in particular is often overlooked. In our case, the practical problem
is well known among practitioners. At the same time, the abstract theories of
programming languages and their relations to the even more abstract realms of
algebra and category theory have long been well known to computer scientists and
mathematicians (the fixed point result of Lambek dates back to
1968). In this work, we start with the problem of inconsistencies between
artifacts of different kinds. Because the underlying references are
interlingual, we need a consistent formal framework to formulate the problem.
We express the underlying artifacts as terms typed by some algebraic
datatype. This is implemented in Haskell, which offers both algebraic datatypes
and a large number of parsers and general infrastructure. To reason about references
between terms of arbitrary algebraic datatypes, we need an accessible
specification of the signatures themselves. Formally, this specification of
specifications can be expressed using category theory: the notion of a functor
that specifies the structure of a datatype is central in this respect, and we
find the corresponding implementation in Haskell under the term "datatype generic
programming". We use this as the technical basis of the prototype. In summary,
the contribution of this thesis is not only the development of a framework that
solves a known problem in a rather complicated way (though we are not aware of
other, more promising solutions) but also an example of the complete path from a
practical problem to the deep theoretical formalization and back again to a
practical solution.
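The central ideas of the abstract (a functor that specifies the structure of a datatype, terms as its fixed point, and consistency as a function from two subterms to a Boolean value) can be illustrated with a small Haskell sketch. This is not the thesis's framework or API: the two toy languages (`SchemaF`, `CodeF`), the rule `columnRef`, and the helpers `subterms`, `size` and `linked` are hypothetical names chosen only for illustration.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
module Main where

import Data.Foldable (toList)

-- Fixed point of a signature functor f: the recursive term type it describes.
newtype Fix f = In { out :: f (Fix f) }

-- Generic fold (catamorphism), written once for every signature functor.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . out

-- Number of nodes in a term, for any signature functor.
size :: (Functor f, Foldable f) => Fix f -> Int
size = cata (\layer -> 1 + sum layer)

-- All subterms of a term (including the term itself), again language-generic.
subterms :: Foldable f => Fix f -> [Fix f]
subterms t = t : concatMap subterms (toList (out t))

-- Hypothetical language A: a toy database schema.
data SchemaF r = Table String [r] | Column String
  deriving (Functor, Foldable)

-- Hypothetical language B: a toy host language that names columns in strings.
data CodeF r = Call String [r] | StrLit String
  deriving (Functor, Foldable)

-- A notion of consistency maps two arbitrary subterms to a Boolean value.
type Consistency f g = Fix f -> Fix g -> Bool

-- Assumed rule: a column and a string literal are consistent if the names agree.
columnRef :: Consistency SchemaF CodeF
columnRef (In (Column c)) (In (StrLit s)) = c == s
columnRef _ _ = False

-- True if some pair of subterms, one from each tree, is consistent.
linked :: (Foldable f, Foldable g) => Consistency f g -> Fix f -> Fix g -> Bool
linked p a b = or [ p x y | x <- subterms a, y <- subterms b ]

schema :: Fix SchemaF
schema = In (Table "users" [In (Column "id"), In (Column "email")])

goodCode, badCode :: Fix CodeF
goodCode = In (Call "select" [In (StrLit "email")])
badCode  = In (Call "select" [In (StrLit "e-mail")])

main :: IO ()
main = do
  print (size schema)                       -- 3
  print (linked columnRef schema goodCode)  -- True
  print (linked columnRef schema badCode)   -- False
```

Because `subterms`, `size` and `linked` only require `Functor` and `Foldable` instances, they work unchanged for any language whose syntax is given as a signature functor; this is the sense in which such a tool is parametric in the languages themselves.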
Keywords
Datatype generic programming, Dependencies, Software projects, Static semantics, Syntax, Type safe