Hybrid code generation
Other titles
Synergies of Large Language Models and Language-driven Engineering
Abstract
Language-Driven Engineering (LDE) aims to provide the most suitable modeling language for every purpose and stakeholder. As a concept for Low-Code/No-Code (LC/NC) environments, LDE aims to create an easy-to-use development environment for everyone. Besides textual modeling languages, graphical modeling languages are particularly well suited for this purpose. They are often easy for humans to understand and easy to learn. Large Language Models (LLMs) are similarly easy to use. Natural language provides a universal interface that humans are accustomed to using every day. Consequently, LLMs have become ubiquitous in recent times. They are used in various fields, including text, image, and video generation. They are also highly popular for code generation. LLMs offer a high degree of flexibility, not only because of their natural language input, but also because of their output. They can produce arbitrary outputs, making them more versatile in code generation tasks than conventional code generators. However, they have downsides regarding controllability and reliability. For example, they are not deterministic and can produce different outputs when given identical inputs. Additionally, they are difficult to control reliably. Their output may not meet user expectations, and it may contain errors.
This uncertainty is particularly undesirable in environments like LDE, where code is generated from formalized models. Nevertheless, LDE environments may still want to benefit from LLMs’ flexibility for code generation. Furthermore, natural languages allow for the simplicity of expression that Domain-Specific Languages (DSLs) seek to provide. LC/NC approaches aim to provide the same code generation experience as LLMs. These approaches enable even non-expert users to easily articulate their needs and generate code.
Therefore, combining LDE and LLMs would enable great synergies. The flexibility of LLM code generation could be incorporated into LDE. At the same time, LDE could provide mechanisms to incorporate the control of conventional code generation from formalized models into LLM code generation.
This dissertation presents a hybrid code generation approach that combines LLM-supported and conventional code generation in the context of LDE. Specifically, it combines a two-step generation approach and an extension to Template-based Code Generation (TBCG) for LLMs. This two-step process leverages the flexibility of LLMs within LDE, while maintaining control. To achieve this, it intertwines DSLs and natural language into Domain-Specific Natural Languages (DSNLs). This enables synergies between conventional and LLM code generation. The extension to TBCG makes it easily usable as an LLM tool. Instead of letting the LLM output code without guidance, the extension constrains the LLM's code generation output. This makes LLM code generation more controllable.
The hybrid code generation approach takes advantage of the flexibility of LLMs. At the same time, it establishes three layers of control over the use of LLMs:
1. Contextualization: The presented approach generates contextualization for the LLM, so users can interact with it using a DSNL instead of natural language alone. DSNLs define the domain in which LLMs operate and enable referencing of other models in the LDE environment. DSNLs shift the responsibility of good prompting from users to LDE developers. This makes it easier to achieve good code generation results. Thus, this layer controls the input for LLMs.
2. Validation by Design: System-level validation is applied to observe whether the generated output meets user expectations. Exploiting control over code generation ensures that Active Automata Learning (AAL) can infer behavioral automata for validation purposes. This enables quality control of the output at the semantic level.
3. Output Constraints: Extending TBCG for easy use with LLMs guides them regarding their generation capabilities. Rather than allowing the LLMs to output arbitrary and potentially unwanted code, the TBCG tooling constrains their output options. These constraints control the LLMs during code generation.
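The third layer, output constraints, can be illustrated with a minimal sketch. It is not the dissertation's tooling: the template catalogue, the JSON tool-call shape, and the function name `render_tool_call` are all hypothetical, and the model response is a hard-coded stand-in. The sketch only shows the general idea that the LLM selects a template and supplies slot values, while the surrounding tooling decides what code is actually emitted.

```python
import json
from string import Template

# Hypothetical template catalogue (assumption): names and bodies are
# illustrative and not taken from the dissertation.
TEMPLATES = {
    "getter": Template("def get_${field}(self):\n    return self._${field}\n"),
    "setter": Template(
        "def set_${field}(self, value):\n    self._${field} = value\n"
    ),
}


def render_tool_call(raw_call: str) -> str:
    """Validate a JSON tool call and render it through a fixed template.

    The caller (for example an LLM) never writes code directly; it can
    only pick a known template and fill its slots with identifiers.
    """
    call = json.loads(raw_call)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("template")
    if name not in TEMPLATES:
        raise ValueError(f"unknown template: {name!r}")
    args = call.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    for key, value in args.items():
        # Slot values are restricted to plain identifiers, so arbitrary
        # code cannot be smuggled in through a slot.
        if not isinstance(value, str) or not value.isidentifier():
            raise ValueError(f"slot {key!r} is not a valid identifier")
    # substitute() raises KeyError if a required slot is missing.
    return TEMPLATES[name].substitute(args)


# Stand-in for a model response (assumption: the model emits JSON).
llm_output = '{"template": "getter", "args": {"field": "name"}}'
print(render_tool_call(llm_output))
```

Because every emitted line comes from a fixed template, the set of possible outputs is known in advance, which is the property that makes the generated code amenable to the kind of downstream validation described in the second layer.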
Keywords
Code generation, Model-driven Engineering, Domain-specific Languages, Large Language Models, AI-assisted software development
Keywords according to RSWK
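Codegenerierung (code generation), Modellgetriebene Entwicklung (model-driven development), Domänenspezifische Sprache (domain-specific language), Großes Sprachmodell (large language model), Generative KI (generative AI)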
