Modeling Approaches for Dose-Response
Data in Toxicology
Dissertation zur Erlangung des Doktorgrades Dr. rer. nat. der Fakultät Statistik der
Technischen Universität Dortmund
Vorgelegt von
Julia Christin Duda
Dortmund, Januar 2024
Amtierender Dekan:
Prof. Dr. Philipp Doebler
Gutachter:
Prof. Dr. Jörg Rahnenführer (Technische Universität Dortmund)
JProf. Dr. Kirsten Schorning (Technische Universität Dortmund)
Prof. Dr. Jan G. Hengstler (Leibnizinstitut für Arbeitsforschung an der Technischen Universität Dortmund)
Tag der Prüfung:
14. März 2024
Abstract
Dose-response modeling occurs in many application areas and has a rich research history.
An extensively studied application field is clinical studies, where dose-response model-
ing is used in Phase II studies to identify the dose closest to a pre-defined effect. Many
non-clinical, toxicological studies also aim at identifying a dose-response relationship.
However, for non-clinical or toxicological studies there are fewer regulations or guide-
lines. This leads to a gap between nowadays research advances in statistical modeling
and the use of these methods in practice in toxicology. In addition, toxicological dose-
response studies differ from clinical studies in various technical aspects. For example,
cells might be studied instead of human patients, and administered doses are constrained
due to laboratory, and technical reasons rather than ethical considerations. Therefore,
the transfer of clinical methodological knowledge into toxicological applications is
only possible to a limited extent and tailored methodologies are required that match the
specific data structure of toxicological studies.
This cumulative thesis is based upon four works that all present approaches for modeling
toxicological dose-response data. The first manuscript reveals the potential of applying
the Multiple Comparison Testing and Modeling (MCP-Mod) approach by Bretz et al.
(2005) developed for Phase II clinical studies on toxicological, gene-expression dose-
response data. In the second manuscript, a parametric, mechanistically motivated
model for toxicological dose-time-response data is developed. The third manuscript
is application-focused and explains the use of interaction effects when analyzing dose-
response gene expression in a two-factor setting. At last, a non-parametric Bayesian
dose-response modeling approach was developed that performs functional shrinkage for
non-linear function spaces. While the first three manuscripts are published, the fourth
work is attached in its current version.
iii

Acknowledgments
I would like to thank my supervisor, Jörg, for his unwavering support and optimism
throughout each phase of my dissertation. Thank you for always showing me another
perspective when I needed it, while also granting me a lot of freedom to explore and find
my ways.
I am grateful for the opportunity to work on my Ph.D. within the Research Training
Group 2624 Statistical Methods for High-Dimensional Data in Toxicology. It was a
great experience to work on so many interdisciplinary projects. Special thanks go out to
Prof. Hengstler, whose passion for his research is beyond scales and I am grateful for the
opportunity to collaborate with him - thank you! I would also like to thank all members
of the RTG and other colleagues, for being a great community to gain experiences, feel
supported, and share fun times throughout the last years.
A special thanks goes out to Matt, the co-author of my last work, who gave me the
opportunity to visit him at the National Institute of Environmental Health Sciences in
the United States to work on our research project. Thank you for pushing me to work
through challenges that I would have thought invincible before.
I would like to thank Ezekiel King, my roommate during my USA stay. Thank you for
teaching me to become a braver version of myself and to always cherish the little things
in life. I hope I made your life a little brighter in the short time we shared and that you
can rest peacefully now.
At last, I would like to thank my family and friends who greatly supported me throughout
all the ups and downs of this academic journey. Without your love and support, I would
not have made it through the hard times.
“Just keep swimming.”
– Dory, Finding Nemo.
v

List of Publications
This cumulative thesis is based on the following four manuscripts:
Article 1: Duda, J. C., Kappenberg, F., & Rahnenführer, J. (2022). Model selection character-
istics when using MCP-Mod for dose–response gene expression data. Biometrical
Journal, 64(5), 883–897. https://doi.org/10.1002/bimj.202000250
Contribution of the author:
The author of this thesis implemented the data analyses and the simulation studies,
added additional ideas and wrote the first draft of the manuscript. Discussions and
revising was done together with Franziska Kappenberg under the supervision of
Jörg Rahnenführer.
The reuse of this article in the thesis is granted under the terms of the Creative
Commons Attribution 4.0 International License.
Article 2: Duda, J. C., Hengstler, J. G., & Rahnenführer, J. (2022). td2pLL: an intuitive
time-dose-response model for cytotoxicity data with varying exposure durations.
Computational Toxicology, 43, Article 100234.
https://doi.org/10.1016/j.comtox.2022.100234
Contribution of the author:
The author of this thesis had the central idea for the suggested model, implemented
the software package, simulation study, and data application, and wrote the first
draft of the manuscript. Helpful comments by the co-authors were incorporated.
The reuse of this article in the thesis is granted under the terms of the Creative
Commons Attribution 4.0 International License.
vii
Article 3: Duda, J. C., Drenda, C., Kästel, H., Rahnenführer, J., & Kappenberg, F. (2023).
Benefit of using interaction effects for the analysis of high-dimensional time-
response or dose-response data for two-group comparisons. Scientific Reports,
13(1), 20804.
https://doi.org/10.1038/s41598-023-47057-0
Contribution of the author:
The author of this thesis implemented the analyses together with Carolin Drenda
and Hue Kästel, and wrote the manuscript. Comments from fruitful discussions
with Jörg Rahnenführer and Franziska Kappenberg were incorporated.
The reuse of this article in the thesis is granted under the terms of the Creative
Commons Attribution 4.0 International License.
Article 4: Duda, J. C. & Wheeler, M. Bayesian non-linear subspace shrinkage using horse-
shoe priors. Unpublished.
Contribution of the author:
The original idea for this manuscript was proposed by Matthew Wheeler. The
computational implementation, design, conduction, and analysis of the simulation
study and the search for and analysis of the application example as well as writing
the manuscript was done by the author of this thesis under the supervision of
Matthew Wheeler. The manuscript is attached in its current version.
Further publications:
1. Tug, T., Duda, J. C., Menssen, M., Bruce, S. W., Bringezu, F., Dammann, M.,
Frötschl, R., Harm, V., Ickstadt, K., Igl, B. W., Jarzombek, M., Kellner, R., Lott, J.,
Pfuhler, S., Plappert-Helbig, U., Rahnenführer, J., Schulz, M., Vaas, L., Vasquez,
M., Ziegler, V., Ziemann, C. (2024). In vivo alkaline comet assay: Statistical con-
siderations on historical negative and positive control data. Regulatory Toxicology
and Pharmacology, 105583. https://doi.org/10.1016/j.yrtph.2024.105583
2. Ghallab, A., González, D., Strängberg, E., Hofmann, U., Myllys, M., Hassan,
R., Hobloss, Z., Brackhagen, L., Begher-Tibbe, B., Duda, J. C., Drenda, C.,
Kappenberg, F., Reinders, J., Friebel, A., Vucur, M., Turajski, M., Seddek,
A., Abbas, T., Abdelmageed, N., Morad, S. A. F., Morad, W. , Hamdy, A.,
Albrecht, W., Kittana, N., Assali, M., Vartak, N., van Thriel, C., Sous, A.,
Nell, P., Villar-Fernandez, M., Cadenas, C., Genc, E., Marchan, R., Luedde,
T., Åkerblad, P., Mattsson, J., Marschall, H., Hoehme, S., Stirnimann, G., Schwab,
M., Boor, P., Amann, K., Schmitz, J., Bräsen, J. H., Rahnenführer, J., Edlund,
K., Karpen, S.J., Simbrunner, B., Reiberger, T., Mandorfer, M., Trauner, M.,
Dawson, P. A., Lindström, E., Hengstler, J. G. (2023). Inhibition of the Renal
Apical Sodium Dependent Bile Acid Transporter Prevents Cholemic Nephropa-
thy in Mice with Obstructive Cholestasis. Journal of Hepatology. Published.
https://doi.org/10.1016/j.jhep.2023.10.035
3. Kappenberg, F.,Duda, J. C., Schürmeyer, L., Gül, O., Brecklinghaus, T., Hengstler,
J. G., Schorning, K., & Rahnenführer, J. (2023). Guidance for statistical design and
analysis of toxicological dose–response experiments, based on a comprehensive
literature review. Archives of Toxicology. In press. https://doi.org/10.1007/s00204-
023-03561-w
4. Su, H., Haque, M., Becker, S., Edlund, K., Duda, J. C., Wang, Q., Reißing, J.,
Marschall, H., Candels, L. S., Mohamed, M., Sjöland, W., Liao, L., Drexler, S. A.,
Strowig, T., Rahnenführer, J., Hengstler, J. G., Hatting, M., Trautwein, C. (2023).
Long-term hypercaloric diet exacerbates metabolic liver disease in PNPLA3
I148M animals. Liver International. In press. https://doi.org/10.1111/liv.15587
5. Brecklinghaus, T., Albrecht, W., Duda, J. C., Kappenberg, F., Gründler, L.,
Edlund, K., Marchan, R., Ghallab, A., Cadenas, C., Rieck, A., Vartak, N., Tolosa,
L., Castell, J. V., Gardner, I., Halilbasic, E., Trauner, M., Ullrich, A., Zeigerer, A.,
Turgunbayer, Ö. D., Damm, G., Seehofer, D., Rahnenführer, J., Hengstler, J. G.
(2022). In vitro/in silico prediction of drug induced steatosis in relation to oral
doses and blood concentrations by the Nile Red assay. Toxicology Letters, 368,
33–46. https://doi.org/10.1016/j.toxlet.2022.08.006
6. Brecklinghaus, T., Albrecht, W., Kappenberg, F., Duda, J. C., Vartak, N., Edlund,
K., Marchan, R., Ghallab, A., Cadenas, C., Günther, G., Leist, M., Zhang, M.,
Gardner, I., Reinders, J., Russel, F. GM., Foster, A. J., Williams, D. P., Damle-
Vartak, A., Grandits, M., Ecker, G., Kittana, N., Rahnenführer, J., Hengstler, J. G.
(2022). The hepatocyte export carrier inhibition assay improves the separation of
hepatotoxic from non-hepatotoxic compounds. Chemico-Biological Interactions,
351, Article 109728. https://doi.org/10.1016/j.cbi.2021.109728
7. Brecklinghaus, T., Albrecht, W., Kappenberg, F., Duda, J. C., Zhang, M., Gardner,
I., Marchan, R., Ghallab, A., Turgunbayer, Ö. D., Rahnenführer, J., Hengstler, J.
G. (2022). Influence of bile acids on the cytotoxicity of chemicals in cultivated
human hepatocytes. Toxicology in Vitro, 81, Article 105344.
https://doi.org/10.1016/j.tiv.2022.105344
8. Ghallab, A., Myllys, M., Friebel, A., Duda, J. C., Edlund, K., Halibasic, E.,
Vucur, M., Hobloss, Z., Brackhagen, L., Begher-Tibbe, B., Hassan, R., Burke,
M., Genç, E., Frohwein, L. J., Hofmann, U., Holland, C. H., González Leiva, D.
F., Keller, M., Seddek, A., Abbas, T., Mohammed, E. S. I., Teufel, A., Itzel, T.,
Metzler, S., Marchan, R., Cadenas, C., Watzl, C., Nitsche, M. A., Kappenberg, K.,
Luedde, T., Longerich, T., Rahnenführer, J., Hoehme, S., Trauner, M., Hengstler,
J. G. (2021). Spatio-temporal multiscale analysis of Western diet-fed mice reveals
a translationally relevant sequence of events during NAFLD progression. Cells,
10(10), Article 2516. https://doi.org/10.3390/cells10102516
Contents
Abstract iii
Acknowledgments v
List of Publications vii
I Introduction 1
1 Motivation 3
2 Statistical Methods 9
2.1 Dose-response models . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 MCP-Mod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Gene expression data . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Functional shrinkage for linear subspaces . . . . . . . . . . . . . . . . 13
3 Summary of the Articles 15
3.1 Article 1: MCP-Mod on Gene-Expression Dose-Response Data . . . . . 15
3.2 Article 2: Time-Dose-Response Modeling . . . . . . . . . . . . . . . . 16
3.3 Article 3: Interaction Effects . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Article 4: Non-linear Shrinkage . . . . . . . . . . . . . . . . . . . . . 19
4 Discussion and Outlook 23
Bibliography 27
II Publications 33
xi

Part I
Introduction
1

1 Motivation
Dose-response analysis occurs in a variety of fields but is most intensely studied within
the pharmaceutical development context. When developing a new therapeutic compound,
the Phase I and II clinical trials specifically aim at finding a ’good’ dose (Ting et al.,
2017). Therefore, it is crucial to understand and statistically model the dose-response
relationship of the compound under investigation. The analysis of dose-response relation-
ships is also a central goal in related fields, such as pre-clinical studies or toxicological
research outside the pharmaceutical sector (Hothorn, 2016). The goal of modeling a
dose-response relationship in whichever application field appears identical from a mathe-
matical perspective, at least at first glance. But more specialized, tailored methodological
approaches to each field are required due to different experimental conditions. For exam-
ple, the challenge of small sample sizes in toxicological experiments calls for a different
methodological development. However, the methodological research states and practices
are more advanced in clinical research than in non-clinical and toxicological research for
historical, political, and economic reasons. The pharmaceutical industry faces complex
regulations, initiated by the Contergan scandal in the 1960s, while simultaneously being
an economically very profitable sector (Bauschke, 2010). This combination quickly
pushed methodological research for clinical studies towards new, complex methods that
save study duration time while remaining in line with regulatory standards.
For toxicological, non-clinical studies, there are many guidelines to standardize labora-
tory procedures of assays and study types, for example by the Organization of Economic
Co-Operation and Development (OECD) (Gordon, 2001). Laboratory testing results
related to chemical safety that adhere to OECD Testing Guidelines are accepted by
OECD countries. This harmonization effort reduces repeated testing for country-specific
regulations and creates trust and predictability in society and the international market.
Other authorities such as the European Food Safety Agency (EFSA) of the European
Union or the U.S. Environmental Protection Agency (EPA) also influence policy making
by providing scientific advice. Such guidelines are updated regularly to keep pace with
scientific advances. Especially within the risk assessment context, there were advances
in the recommendation for the statistical evaluation of dose-response data. For example,
in 2009 the Scientific Committee of the EFSA published a guidance document that
3
1 Motivation
promoted actual modeling of dose-response relationships rather than multiple testing
approaches between dose levels to determine the critical effect of a substance (EFSA,
2009).
Despite advances and efforts through guidelines and recommendations, their impact is
limited as they are not binding for all toxicological studies or experiments. International
guidelines apply for specific studies conducted by international companies and indepen-
dent risk assessors. The vast majority of research in toxicology is not required to be
guideline-coherent. This freedom empowers exploratory research but also reinforces the
gap between statistical methodological research and its application in practice.
The discrepancy between widely accepted methodological advances in risk assessment
and their practical use is quantified in recent a review by Kappenberg et al. (2023). From
3269 analyzed dose-response curves where some form of modeling was applied, for the
vast majority (>2,250) simple linear interpolation was used. Only for about 500 curves,
an actual parametric model was fitted.
In clinical research, dose-response modeling methods typically aim at dose-finding goals,
i.e. finding a clinically relevant dose (Bretz et al., 2008). Mathematically, the goal is
similar to finding toxicity levels, when analyzing dose-response data in risk assessment.
However, methodological research for clinical dose-finding studies can be considered
more prevalent. It started with the traditional rule-based, "up-and-down" 3+3 design
developed in 1946 (Dixon and Mood, 1946) which was introduced for clinical trials in
1989 (Storer, 1989). Model-based designs followed such as the continual reassessment
method (CRM) (O’Quigley and Shen, 1996), escalation with overdose control (EWOC)
(Babb et al., 1998; Rogatko et al., 2005) and extensions such as the time-to-event CRM
(TITE-CRM) (Cheung and Chappell, 2000). A hybrid approach that combines classical
multiple comparison procedures (MCP) testing approaches (Tamhane et al., 1996) with
modeling approaches (Pinheiro et al., 2006) is MCP-Mod (Bretz et al., 2005). Further
methods and reviews are compared and referenced by Ananthakrishnan et al. (2017).
Despite this active, fast-paced development, and agreement on the superiority of the
advanced methods, also in clinical trials there is a high reluctance to adopt new designs
in practice (Kurzrock et al., 2021). A review by Rogatko et al. (2007) revealed that
from 1,235 clinical Phase I studies published between 1991 and 2006, only 20 (1.6%)
used a statistical modeling method that was published during that time period. The vast
majority of the remaining studies used some form of the classical rule-based design.
The above considerations reveal how despite efforts both from academia and regulatory
bodies and a theoretically very advanced research state on dose-response modeling, the
application of said methods falls short in practice. Before linking how the work of this
4
thesis contributes to filling this gap, one additional development that is intertwined with
this topic is elucidated.
Another fast-developing field that effects dose-response modeling in toxicology is ge-
nomics. Starting with the discovery of the molecular structure of DNA by Watson
and Crick (Watson and Crick, 1953), sequencing technologies that transform genetical
information into machine-readable data were continuously improved. A breakthrough
was the first complete sequence of the human genome in 2001 (Venter et al., 2001).
This was achieved with the first sequencing technology, Sanger sequencing (Sanger and
Coulson, 1975). The sequencing technology was rapidly advances, bringing forward
microarray technology (Schena et al., 1995), next generation sequencing, which allowed
massive parallel sequencing (Margulies et al., 2005), and more recently single-cell se-
quencing (Aldridge and Teichmann, 2020), with its first experiment in 2008. Sanger and
next-generation sequencing can nowadays be considered as first and second generation
sequencing. In 2009, single-cell sequencing or third generation sequencing was first
introduced commercially by Pacific Biosciences (PacBio) as Single Molecule Real Time
(SMRT) Sequencing (Eid et al., 2009). Third generation sequencing differs from first
and second generation sequencing, as it allows much longer sequencing lengths up to
tens of kilobases of DNA base pairs of a single DNA molecule. It does not require DNA
amplification (Satam et al., 2023).
With next generation sequencing, the size of generated data grew rapidly. This called for
new methods and software and highly influenced the research field of bioinformatics.
Data generated by sequencing machines has to be pre-processed prior to the actual data
analysis (Gondane and Itkonen, 2023). After pre-processing the raw reads, they have to
be aligned or assembled using alignment algorithms such as STAR (Dobin et al., 2013),
TopHat (Trapnell et al., 2009) or Salmon (Patro et al., 2015). Subsequent to additional
normalization steps, the statistical differential gene-expression analysis follows. This
analysis aims at finding genes that are differentially expressed (have different activity
levels) between two groups, hence identifying genes that might cause the condition
under investigation. The most used softwares to identify differentially expressed genes
are DESeq2 (Love et al., 2014) and edgeR (Robinson et al., 2010).
Over the last two decades, the rapidly decreasing cost of genome sequencing allowed
for larger sample sizes. From $100,000,000 per human genome sequencing in 2001, the
costs dropped below $1,000 per genome in 2021 (Wetterstrand, 2021). Consequently,
not only two groups might be compared, but repeated measurements at different dose-
or time-points became financially feasible. Such experimental setups allowed for dose-
response modeling considerations in the field of RNA sequencing data.
5
1 Motivation
In summary, dose-response modeling is a central task for toxicological risk assessment
and faces various, interdisciplinary challenges. It is intensively studied in related fields
such as clinical trials, where the mathematical approach is similar, though contextual
differences lead to different focuses. Additionally, ceaseless technical developments in
genomics increase RNA sequencing data set sizes such that dose-response modeling is
starting to become relevant in bioinformatical analyses, too. Despite regulations and
guidelines, the transfer from state-of-the-art methodological dose-response modeling
research into practice is slow both for toxicological and clinical research. Overarching
use of general dose-response modeling approaches developed in the context of either
clinical or toxicological research is rare but promising, due to the mathematical similar
goal. Specific, contextual differences between toxicological and clinical studies certainly
prevent a general interchangeability of methods. For example, censored survival data in
clinical trials are typically not present in controlled toxicological studies with mice or
cells, or exposure times in clinical models typically refer to the time after an administra-
tion, rather than the time period of a continuous exposure duration. Such differences
do not allow indiscriminate methodological interchangeability, but uphold the need for
tailored method developments.
The works of this thesis contribute to the aforementioned challenges from various
perspectives. The first paper reveals the potential of using MCP-Mod - a dose-response
modeling approach developed in the clinical research context - on toxicological, in
vitro gene expression dose-response data (Duda et al., 2022b). MCP-Mod is a two-step
procedure designed to first detect a Proof-of-Concept (PoC) of a candidate drug and
then, conditioned on an established PoC, determine the dose of interest by considering
model uncertainty. These ideas map well to the goals of gene expression analysis, where
the detection of active genes is of interest, while the dose-response model of each gene
is uncertain a priori.
The second paper introduces a time-dose-response modeling approach that is tailored
to exposure-time dose-response modeling for cytotoxicity experiments (Duda et al.,
2022a). Motivated by Haber’s law (Haber, 1924), the model links a physiological idea to
a popular dose-response model, yielding a time-dose-response model. Equipped with an
R software package and a recommended two-step procedure, the approach is presented
with a practical focus, thereby minimizing the burden for potential users.
Promoting the optimal use of existing tools and methods in RNA Sequencing analyses
in practice is the aim of the third paper of this thesis. A common experimental scenario
in RNA Sequencing is the comparison of a factor with two (or more) levels between two
groups (Duda et al., 2023). Statistically, research questions for these scenarios naturally
translate into the analysis of an interaction effect. However, in practice, methodological
6
workarounds are often used. The paper explains and demonstrates in detail using an
RNA-Seq mouse data set, the benefits of directly modeling interaction effects compared
to the commonly used approach.
The last manuscript presents a new modeling approach that combines non-parametric
modeling with functional shrinkage into a parametric space. The approach shrinks a
non-parametric model into a pre-defined parametric function space if the data does not
suggest deviations from that space. If the data presents deviations from the assumed
function space, the approach is not limited to the assumed space and remains flexible.
We demonstrate the particular use of this general method for dose-response modeling,
where often the parametric Hill model is considered plausible, even though deviations
(e.g. due to downturn effects at large doses) cannot be ruled out in advance and must be
modeled flexibly. Further, the approach’s adaptive behavior is appealing for modeling
high throughput dose-response data, as it reduces the burden of correctly selecting a
fixed set of candidate models that is appropriate for thousands of genes.
The remainder of this thesis is structured as follows. In Chapter 2, a methodological
background is provided. It contains general concepts and methods that are used in
the works of this thesis but were developed before and are not part of the research
contribution of this thesis. Chapter 3 provides a summary of each of the four papers.
Close attention is paid to pointing out the innovative aspect of each work and how it
contributes to the respective research landscape. The thesis closes with a discussion in
Chapter 4. All full-length papers are attached thereafter.
7
1 Motivation
8
2 Statistical Methods
In this chapter, a general methodological background of existing methods that were used
or extended in the works of this thesis is presented. To allow intuitive notations, symbols
are mostly but not strictly used consistently across topics.
2.1 Dose-response models
For the remainder of this thesis, we use the terms dose and concentration interchange-
ably, despite its differences from a biological perspective. Mathematically, there is no
distinction, and for brevity, we will mostly refer to a dose.
Consider n ∈ N doses x = (x ⊤1, . . . , xn) with corresponding response values y =
(y1, . . . , yn)
⊤. Doses might be grouped into k <∑n distinct dose levels d1, . . . , dk with
n1, . . . , nk repeated response measurements and
k
i=1 ni = n.
Assume a parametric relationship between the true mean response and the dose level,
and additive, mean-zero, homoscedastic noise with variance σ2 > 0:
y = f(x) + ε, (2.1)
with ε = (ε1, . . . , εn)⊤ ∼ N (0, I 2nσ ). The parametric function f is unknown, and has
to be modeled. The following parametric dose-response models were considered in the
papers.
The Hill model (a.k.a. sigmoidal Emax model or four-parameter log-logistic model)
was developed by A. V. Hill in 1910 (Hill, 1910) to describe biochemical equilibrium
between oxygen availability and hemoglobin saturation. It is one of the most frequently
used dose-response models (Goutelle et al., 2008) and has a characteristic sigmoidal
shape. The Hill model is known under different names and parameterizations (Ritz,
2010). We use the parametrization as in the R package DoseFinding (Bornkamp
et al., 2010):
9
2 Statistical Methods
xh
fθ(x) = E0 + Emax (2.2)
EDh h50 + x
with parameters θ = (E0, E ⊤max, ED50, h) . The intercept or background response at
zero dose (x = 0) isE0. Emax is the asymptotic maximum andED50 is the dose yielding
half of the asymptotic maximum effect. The parameter h defines the steepness of the
curve at ED50. For the special case h = 1, the Hill model becomes the Emax model,
which has an asymptote only for large doses, but not at the zero dose level.
The Beta model is defined as
f δ1 δ2θ(x) = E0 + EmaxB(δ1, δ2)(x/scal) (1− x/scal) (2.3)
with B(δ1, δ2) = (δ + δ )(δ1+δ2)1 2 /(δδ1δδ21 2 ) capturing the shape of the density function
of a Beta distribution on [0, scal]. The Beta model can account for downturn effects
after an increase and is a comparatively flexible parametric dose-response model. While
the scaling parameter scal is a pre-defined hyperparameter that is by default set to 1.2
times the maximal dose, the parameters to be estimated are θ = (E0, E ⊤max, δ1, δ2) . The
linear parameters E0 and Emax can be interpreted as for the Hill model, while δ1 > 0
and δ2 > 0 are non-linear parameters that define the location and steepness of the mode.
Another simpler non-linear model is the Power model
fθ(x) = E0 + E
h
maxx (2.4)
with power coefficient h defining the steepness and potency in the dose-response context.
It is used in the U.S. EPA’s Benchmark Dose Software (BMDS) (Davis et al., 2011).
Further, simple linear regression models or quadratic models are candidate dose-response
models and are assumed familiar.
In the frequentistic setup, model fitting can be performed by ordinary least squares,
which coincides with a maximum likelihood approach for the normal error assumption,
or weighted least squares. In the Bayesian framework, the posterior distribution is
calculated analytically or approximated computationally. The specific computational
implementation for both frameworks varies. For examples and details, see Pinheiro
et al. (2014) for the MCP-Mod Software, Ritz et al. (2015) for the drc R package
(frequentistic), or Wheeler et al. (2023) and Davis et al. (2011) for the bayesian ToxicR
R package and BMDS software.
10
2.2 MCP-Mod
2.2 MCP-Mod
MCP-Mod was developed for Phase I/II clinical dose finding studies by Bretz et al.
(2005) to analyze dose-response data and thereby combine two classical approaches:
Multiple comparison procedures (MCP) and modeling (Mod). It is a dual approach to
address two goals simultaneously: Performing a proof-of-concept (PoC) step, and if the
PoC is established, model the dose-response relationship to derive the dose of interest.
With the PoC step or MCP step, it is analyzed if the true dose-response f is non-flat,
i.e. a dose-response signal exists. More precisely, one tests if f is within a set M of
considered dose-response functions, or not. Practitioners a-priori define reasonable,
standardized model shapes f 0 01 , . . . , fM by decomposing fm into a location, scale and
shape component:
f(x,θ) = θ0 + θ f
0
1 (x,θ
0).
A guesstimate approach translates interpretable statements on the expected response into
quantifiable shape parameters θ0 for each candidate model. Each model shape f 0m then
defines a mean response vector µm. The PoC test is constructed based on the hypotheses
Hm : c⊤0 mµm = 0 and H
m
1 : c
⊤
mµm ̸= 0 where the contrast vector cm is constructed to
maximize the power of the test under the assumption that µm is the true mean response.
Each contrast defines a test statistic
∑k
T = √ i=1 cmiȳim ∑ , (2.5)
S k 2i=1 cmi/ni
∑ ∑
where ȳi is the mean response at distinct dose level di and S2 =
k ni
i=1 j=1(yij −
ȳi)
2/(n−k)with yij being the jth measurement at dose di. Given the normal distribution
assumption in (2.1), (T1, . . . , Tm) follows a multivariate normal distribution. A PoC is
established if T = max(T1, . . . , Tm) is greater than the corresponding quantile qα, and
α is the pre-selected significance level.
This multivariate approach controls the family-wise error rate while remaining powerful
because correlations between similar model shapes are accounted for automatically. If a
PoC is established, a non-flat dose-response relationship is assumed. All models that
passed the PoC step are subject to modeling the dose-response curve in the modeling
(Mod) step. In the Mod-step either model selection using the Akaike Information
Criterion (AIC) or model averaging with weights based on the AIC is performed. MCP-
Mod is used in the first paper of this thesis, with a focus on model selection via the AIC
criterion.
11
2 Statistical Methods
2.3 Gene expression data
Gene expression data cover a wide variety of data types and technologies to generate
them, and algorithms for downstream analyses. A general historical and technical
overview was provided in Chapter 1. With a focus on the papers of this thesis, this
section provides more details on microarray data and RNA Sequencing data. For
elementary biological concepts on cells, proteins and DNA, see Brazma et al. (2001).
Microarrays quantify gene activities, by quantifying a single sample’s messenger RNA
(mRNA), a gene product assumed proportional to a gene’s activity, at thousands of genes
simultaneously. Small mRNA molecules - transcripts - are labeled with a fluorescent
dye and become targets. The transcripts are deposited over the microarray chip. On
the chip’s surface, there are small oligonucleotides (RNA strands) called probes which
the sample’s targets can bind to (hybridization), if the genetic sequence is similar
enough. In theory, only compatible probe targets remain on the ship after a washing
phase. Fluorescence-based intensity measurements are assumed proportional to the
corresponding gene activities (Sánchez and de Villa, 2008).
There are further technical details such as adjusting for background fluorescence that orig-
inates from non-specific hybridization, i.e. mRNA samples that bind to non-complementary
probes. After quality control assessment, background signal adjustment and normal-
ization to account for technical artifacts can be performed using the robust multi-array
averaging (RMA) measure (Irizarry et al., 2003). The final, pre-processed intensity
values of different samples are typically on a log2 scale and considered approximately
normally distributed.
An important conceptual difference between microarrays and next generation, RNA-
seqeuencing, is that microarrays have a pre-defined set of known target sequences, while
RNA-sequencing also allows de novo assembly of unknown genomes (Gondane and
Itkonen, 2023).
The bioinformatical assembling of the reads is not subject to the research of this thesis
and shall not be discussed here. Starting with the resulting raw read counts, normalization
and statistical modeling follow, as summarized in Chapter 1. In the papers of this thesis,
the R package DESeq2 (Love et al., 2014) is used to identify differentially expressed
genes (DEGs). The relevant modeling and normalization of DESeq2 is explained in
the following (cf. Love et al. (2014)). The read counts of gene i for sample j are
modeled using a negative binomial distribution, Kji ∼ NB(µij, αi), where µij is the
expected mean read count and αi a dispersion parameter. To account for different
gene lengths and other technical considerations, model µij = qijsj , where sj is the
12
2.4 Functional shrinkage for linear subspaces
size factor or technical read depth for sample j, and qij is the unknown, true DNA
fragment concentration of gene i for sample j. The size factors sj are estimated using
the median-of-ratios method (Anders and Huber, 2010). The fragment concentration
that represents the gene activit∑y is then modeled using a generalized linear model with
a logarithmic link: log2 q =
R
ij r=1 xjrβir, where xjr dummy-codes the experimental
conditions r = 1, . . . , R in a design matrix X = {x}jr and βij are the coefficients.
The selection of the model design and corresponding interpretation of the model fit for
typical gene expression experiments is the topic of the third paper of this thesis.
2.4 Functional shrinkage for linear subspaces
For the regression problem outlined in (2.1), Shin et al. (2020) developed a non-
parametric, Bayesian, flexible modeling approach that shrinks the posterior mean re-
sponse towards a pre-specified linear subspace Ω0 = {Φ0x|x ∈ Rn}. The approach is
adaptive because there is no shrinkage if the data suggests that the true mean response is
outside of Ω0. The mean response is modeled non-parametrically using B-splines (Carl,
2001):
∑k
f(x) = βmϕm(x),
m=1
where ϕm are B-spline bases evaluated at m = 1, . . . , k knots and design matrix Φ =
{ϕm(xi)}i,m. Shrinkage of the posterior mean response E(Φβ|y) into Ω0 is enforced by
the conditional prior specifications
( )
p(β|σ2, τ 2) ∝ (τ 2)−(k−d0)/2 1exp − β⊤Φ⊤(I − P )Φβ . (2.6)
2σ2τ 2
Φ0
and
(τ 2)b−1/2
p(τ) ∝ 1(0,∞)(τ), (2.7)
(1 + τ 2)(a+b)
where d0 = rank(Φ0), and PΦ0 = Φ0(Φ
⊤
0 Φ )
−1
0 Φ
⊤
0 is the orthogonal projection matrix
into the column space of Φ0. For example, shrinkage into a simple linear subspace can
be implemented using Φ0 = {1x}. The prior specification on τ , the shrinkage or scaling
parameter, is a horseshoe, standard half Cauchy prior for a = b = 1/2 (Carvalho et al.,
2010). It allows strong shrinkage of weak signals close to zero while leaving strong
signals unconstrained. In the model, this shrinks weak deviations of the response from
13
2 Statistical Methods
Ω0 such that the response is effectively in Ω0, while strong deviations of the response
from Ω0 remain unconstrained. For the hyperparameters a and b, Shin et al. (2020)
provides boundaries that guarantee optimally fast adaptive behavior (shrinkage or no
shrinkage) w.r.t. the sample size.
In the forth paper of this thesis, the approach by Shin et al. (2020) is extended to non-
linear function spaces. The relevance of this extension for toxicological applications is
highlighted as it is demonstrated on the non-linear Hill model.
14
3 Summary of the Articles
3.1 Article 1: Model selection characteristics when using
MCP-Mod for dose-response gene expression data
The first article contributes to the research area of dose-response modeling in a trans-
lational way. We identified MCP-Mod - a modeling method developed for clinical
dose-response analyses - as an adequate tool for dose-response modeling in a field where
MCP-Mod has not yet been used: gene expression data. Application of MCP-Mod in
gene-expression dose-response modeling is motivated by uncertainty considerations.
Gene-expression data are high-dimensional and it is therefore not feasible to carefully
choose appropriate modeling constraints for each gene. Also, differences in cell ex-
periments compared to human dosing data lead to further uncertainties in cell-based
dose-response data. If doses are applied to cells, there are fewer ethical constraints on
higher doses. This might lead to deviations from expected shapes. For example, down-
turn effects due to systematic toxicity at higher doses violate monotonicity assumptions.
Uncertainty regarding the final dose-response shape for gene expression data is hence
imminent, both due to the large number of genes that can not be viewed individually
as well as less constraints on dose selection. MCP-Mod addresses the issue of model
uncertainty by allowing a set of models to be considered, from which a single best model
is used or models are averaged for the final fit.
We apply and analyze the model selection workflow of MCP-Mod on a gene-expression
data set by Krug et al. (2013), where several doses of valproic acid (VPA) are applied
to embryonic stem cells. The six models linear, quadratic, sigmoidal Emax, Emax, and
beta model, are considered.
The aim of the first article was to analyze if any of the considered models are more
relevant than others, or might even be superfluous for the analysis. To address this
question, MCP-Mod was applied to the VPA data set, a simulation study was conducted
and a score that grades the model set’s performance was proposed.
The analyses lead to several conclusions that are relevant in practice and motivated
15
3 Summary of the Articles
the use of MCP-mod in other applied dose-response projects that face uncertainty (e.g.
Brecklinghaus et al. (2022)). The monotonicity assumption should not be stated for
analyzing dose-response gene-expression data. Models that were able to capture a
down-turn effect (beta and quadratic) cannot be excluded from the model set without
performance loss. The popular Emax model is a special case of the more general, and also
very popular sigmoidal Emax model. However, the Emax model can be excluded without
notable performance loss, especially if the sigmoidal Model is included in the model
set. The sigmoidal model is often selected if the response signal is clear. This result
aligns with the sigmoidal Emax model’s popularity in practice and its mechanistically
motivated origin. Often, gene expression data is noisy due to the many possible sources
of variability. For such genes, the linear model is useful for the detection of a general
trend, though the true dose-response shape likely is not linear and cannot be detected
accurately due to the noise.
To sum up, the first article laid an important groundwork for the cumulative work of this
thesis, which thereafter proceeded to focus on dose-response modeling, gene-expression
data, and uncertainty considerations.
3.2 Article 2: td2pLL: An intuitive time-dose-response model
for cytotoxicity data with varying exposure durations
The second article expands the research scope of dose-response modeling to time-dose-
response (TDR) modeling. TDR is often considered in the clinical context when the time
after intake of a dose is considered. However, this work focuses on TDR modeling in the
toxicological context where cells are continuously exposed to a dose of a compound for
a certain amount of time. Cells can be exposed continuously throughout a time period
in contrast to the momentary intake of a dose for humans, as cells are in a solution that
contains the administered concentration.
The contribution to the TDR research of this article is the proposition of a new time-
dose-response model tailored to cytotoxicity experiments and the development of an
R software package to facilitate its use. The proposed model is called time-dose 2-
parameter log-logistic (td2pLL) model. It is a two-dimensional, parametric approach
and relates the ED50 parameter to the exposure time:
ED50(t) = ∆t
−γ + C0, (3.1)
which leads to the TDR model
16
3.2 Article 2: Time-Dose-Response Modeling
xh
f(t, x) = 100− 100 . (3.2)
xh + ED h50(t)
This relationship is mechanistically motivated by a modification of Haber’s law, which
states that the toxic effect of a compound is the product of the concentration and the
exposure time (Haber, 1924). The added parameters ∆, γ and C0 are interpretable as the
magnitude of the exposure-time dependent effect, its potency, and a potential minimal
concentration toxicity threshold. The latter means that a minimal dose is required to
yield a toxicity effect for a hypothetical, infinitely long exposure duration. The lower
and upper asymptotes of 100 and 0 are considered fixed, as the model is tailored to
cytotoxicity experiments where initially all cells live and then continuously lose viability
with increasing dose.
The resulting model is more complex for practitioners than fitting a single dose-response
curve for each considered exposure time. However, the model has several advantages
if the experimental ranges of the dose and the exposure period allow to observe a
time-dependent change in the ED50. These include an increased precision in the ED50
estimate and the possibility to extrapolate ED50 for time periods not considered in the
experiment.
To decide whether the benefits justify the model’s complexity, a two-step pipeline was
suggested. In the first step, an ANOVA-based approach is used to verify if the ED50
depends on the exposure duration. If there is a dependency, the td2pLL model is used in
the second step. If not, this joint model is considered unnecessarily complicated, and
separate dose-response curves are fitted at each exposure duration.
The two-step approach using the suggested model is compared to other approaches
(always using the td2pLL model, always fitting separate time-doses curves per time
duration, ignoring the time data and fitting a single time-dose curve) in a simulation
study and applied on a real cytotoxicity data set by Gu et al. (2018).
The simulation results suggest that the two-step approach successfully decides between
using the td2pLL model or the simpler approach and leads to more precise ED50
estimates. The real data demonstration also indicates the benefit of the proposed model
and pipeline. However, the low number of different exposure points per compound used
in the experiments limits the reliability of these conclusions. Larger data examples are
required to provide a stronger empirical evaluation of the mechanistically derived ED50
dependency.
In summary, the suggested pipeline together with the provided software underlines the
application-oriented focus of this work. The article targets a broader audience, possibly
17
3 Summary of the Articles
with less profound statistical knowledge, to help bridge the gap between methodological
knowledge and its use in practice in toxicological risk assessment.
3.3 Article 3: Benefit of using interaction effects for the
analysis of high-dimensional time-response or
dose-response data for two-group comparisons
The third article is an application-oriented work and was motivated by the interdisci-
plinary projects together with the Leibniz Research Centre for Working Environments
and Human Factors (IfADo), e.g. by Su et al. (2023). It addresses how a statistically
well-known, potentially beneficial method has little use in practice for a very common
research situation in gene expression analyses. Precisely, we explain and illustrate the
use of interaction effects in gene expression analyses of experimental situations with
two factors with two levels each, or where one factor has more levels.
A typical experimental setup might compare the effect of a treatment (using Treatment
A or B) on two groups of genetically modified mice (Genotype 0 and 1). The biological
research question can then be formulated as: Does the treatment effect differ w.r.t. the
genotype? Statistically, this naturally translates into analyzing an interaction effect γ in
the model
yj = µ+ αgj + βcj + γgjcj + εj,
where µ is the mean in the reference group (genotype = 0, treatment = A), cj = 1 if
the treatment assigned to sample j is 1, otherwise cj = 0. Equivalently the genotype
of sample j is indicated by gj and εj is some error term. To test whether the treatment
effect differs between the genotypes, one considers the hypotheses H0 : γ = 0 and
H1 : γ ̸= 0.
The above approach is rather simple. However, while for many experimental situations
in practice, it would be appropriate to analyze them using the above approach, the reality
is different. Often a statistical detour is used that aims at analyzing interactions, but
evades the concept of (statistical) interaction effects in a model. Precisely, practitioners
might split the data set into samples of genotype A and samples of genotype B. Then,
separately for each genotype data set, the treatment effect is estimated. If the treatment
effect is significant (and possibly relevant) for one genotype data set, but not for the
other, a difference in the treatment effects w.r.t. the genotype is concluded. We denote
18
3.4 Article 4: Non-linear Shrinkage
this commonly found approach as ’Method I’ and the interaction effect-based approach
as ’Method II’.
In the article, we explain and demonstrate the differences and potential benefits of
Method II - modeling an interaction effect - using a gene expression data set where mice
are fed two different diets over many weeks. The diet was either a healthy standard diet
(SD) or an unhealthy, fatty western diet (WD) and we focused on the feeding periods of
either 3 weeks or 6 weeks. The effect of the WD over the SD is of interest. Biologically,
genes where the diet effect is different after 6 weeks of feeding compared to the diet
effect at the reference time of three weeks of feeding are of interest. Statistically, this
calls for modeling an interaction effect between diet type and feeding period. As we
dealt with RNA sequencing data, the expression levels are modeled using a Negative
Binomial distribution and a generalized linear model, as described in Section 2.3, and
implemented using the DESeq2 R package (Love et al., 2014). Method I and method II
were compared on the data set and results suggest that modeling an interaction effect
can lead to smaller but more specific gene sets.
The article provides practitioner-orientated explanations and real-data demonstrations on
using interaction effects tailored to gene expression analyses. Gene expression analyses
are a standard experimental procedure used by many laboratories with different levels of
statistical training available in-house. Therefore, the article contributes to bridging the
gap between available statistical research and software for gene expression analyses and
its optimal use in practice.
3.4 Article 4: Bayesian non-linear subspace shrinkage using
horseshoe priors
This article extends the work of Shin et al. (2020), who introduced shrinkage of non-
parametrically modeled response curves into parametric linear subspaces, towards shrink-
age into non-linear function spaces (cf. Section 2.4). In contrast to the previous articles,
this work is formulated within a Bayesian framework. While the work’s extension
towards non-linear function spaces is generally applicable for any non-linear parametric
model, its particular relevance for toxicological research is demonstrated by shrinkage
into the non-linear Hill model (cf. Section 2.1). The underlying motivation for this
work is that mechanistically motivated models as the Hill model are often non-linear and
always limited to being an approximation of a more complex, underlying physiological
mechanism. Therefore, deviations from a model that is overall plausible, cannot be
ruled out beforehand and a flexible modeling approach is needed that shrinks into the
19
3 Summary of the Articles
reasonable model, but is not fully constrained to the model space.
The methodological extension is based on conditional linearization of the non-linear
function space using a first-order Taylor expansion. Let hθ be the Hill model function (cf.
2.1) and ΩΘ0 = {hθ(x)|θ ∈ Θ, x ∈ R+0 } the corresponding function space. ΩΘ0 cannot
be represented by a matrix column space as for linear subspace shrinkage. Therefore,
consider a linear approximation of ΩΘ0 at θ0 for fixed x:
hθ(x) ≈ hθ0(x) + Ḟθ0(x)(θ − θ0) (3.3)
∣
with partial derivatives Ḟ = Ḟ (x) = ∂hθ(x)θ0 θ0 ∣
∣ ∈ Rn×s. Instead of P
∂θ Φ0
for the
θ=θ0
linear case, we propose to use Pθ = PḞ as projection into the column space of Ḟθ forθ
a given θ. Geometrically, this can be justified as Ḟθ(x) locally approximates hθ(x) by
spanning tangent planes at each xi ∈ x (Seber and Wild, 2003)[p. 130]. The resulting,
conditional shrinkage prior for β is
( )
p(β|σ2, τ 2,θ) ∝ 2 1(τ )−k/2 exp − β⊤Φ⊤(I − Pθ)Φβ . (3.4)
2σ2τ 2
Notably, the prior now depends on θ, which has to be updated and provided with prior
distributions as opposed to the linear shrinkage of Shin et al. (2020). The prior choices
for θ depend on the choice of hθ and available knowledge on the expected response.
Further, the conditional precision matrix of β has full rank due to the non-linearity of
hθ(x), leading to (τ 2)−k/2 instead of (τ 2)−(k−d0)/2 in the linear shrinkage case.
The proposed method can further be extended to shrinkage into combined function
spaces. Therefore, the corresponding Ḟθ of different hθ are combined horizontally to
form an overall derivative matrix, under the constraint that only one intercept column
remains. This straightforward extension to shrinkage into combined function spaces
allows for additional robustness and can properly capture uncertainty if more than one
parametric model is reasonable.
We demonstrate the method for shrinkage into the single Hill model space and the
combined space of the Hill model and the Power model. In an extensive simulation
study, these method implementations are compared to various alternative approaches,
namely B-splines, P-splines, a parametric fit with an additive horse-shoe shrinkage
spline, and common parametric fits (correctly or incorrectly specified). Additionally,
the method is applied to real data where age-dependent testosterone levels in men are
modeled.
20
3.4 Article 4: Non-linear Shrinkage
As a result, the non-parametric non-linear functional shrinkage approach performs
comparative to the oracle-condition parametric fit in case of correct function space
specification. If the wrong function space is specified, it correctly decides against
shrinkage into the function space and effectively becomes an unconstrained B-spline
fit as by construction. Shrinkage into combined function spaces leads to better fits due
to the increased robustness. In the context of dose-response modeling, this feature is
helpful for the common scenario of down-turn effects at higher doses. Here, the response
curve matches a Hill model up to a certain dose where a downturn starts, for example,
due to systematic toxicity. A limitation of the proposed method is that in the case of
function space misspecification, the missing shrinkage leads to too unconstrained fits
that are prone to overfitting. Approaches to overcome this and other limitations are
discussed in Section 4.
21
3 Summary of the Articles
22
4 Discussion and Outlook
For this thesis, four manuscripts with different forms of contributions to the dose-
response modeling research landscape in toxicology were presented. Dose-response
modeling is a large field with major applications in clinical trials, risk assessment, or
toxicology. Its methodological developments are additionally influenced by the fast
technological developments in high-throughput testing and genomics.
The first manuscript applied the dose-response modeling method MCP-Mod, originally
developed within the clinical context, for high-dimensional, toxicological dose-response
gene-expression data (Duda et al., 2022b). This application is innovative as the transfer
between dose-response modeling approaches between clinical applications and toxico-
logical applications is rare, despite large methodological overlaps of the two fields. Here,
the identification of the methodological transfer potential has an additional novelty aspect
as the transfer is not only from the clinical towards the general toxicological context
but to more specific, high-dimensional, toxicological gene expression data. The main
motivation to use MCP-Mod for gene-expression dose-response data is the method’s
ability to handle model uncertainty, which is crucial when modeling thousands of genes
simultaneously. As a result of this work, the necessity of considering model uncertainty
in gene-expression data was demonstrated. To quantify this, tailored measures and
evaluation procedures were additionally developed and a large simulation study was
performed.
The first paper’s scope was on model selection using MCP-Mod and further investiga-
tions on dose-response model averaging leave opportunities for future investigations.
Benchmark dose (BMD) estimation based on model averaging is a promising approach.
However, the inclusion of down-turn effect models might undermine the desirable
plateau at low doses and require further investigation. Further, multivariate approaches
to combine BMDs, potentially w.r.t. functional genetic groups, but within limits of
computational feasibility, are research opportunities based on this work. Related to this
task are the works of Kappenberg and Rahnenführer (2023) and the BMDExpress 2
software of the National Toxicology Program (Phillips et al., 2019). Kappenberg and
Rahnenführer (2023) uses an empirical Bayes framework to share Hill model parameter
23
4 Discussion and Outlook
information in functional genetical groups to improve the estimation of gene-wise and
gene-group-wise effective doses. The work is limited to the Hill model as a basis for
the parameter-sharing approach. BMDExpress 2 considers various models and genetic
pathway information to pool BMDs of genes within functional groups. The pooling is
limited to simple summary statistics, and challenges to incorporate fast, multivariate
approaches remain.
The second paper of this thesis proposed a mechanistically motivated two-dimensional
time-dose-response (TDR) model tailored to cytotoxicity experiments. Additionally, a
two-step guideline for parsimonious use of the model was provided and the methodology
is made available in a developed R-package (Duda et al., 2022a). The work is an
example that many toxicological applications benefit from tailored dose-response or TDR
method development, demonstrating the limitations of methodological interchangeability
between clinical and toxicological dose-response research. A limitation of the proposed
model is the small scale of the data used for the application, which limits statements
on empirical validation of the model. Larger, appropriate cytotoxicity data sets would
also promote the proposition of more parametric TDR models, opening doors for model
averaging considerations in the TDR modeling context.
The third paper of this thesis is application-focused and aims at promoting the optimal
use of available methods and software when analyzing RNA-sequencing data in dose-
response two-group comparisons. In particular, correct modeling of an interaction effect,
its implementation in the R-package DESeq2 (Love et al., 2014), thorough explanations,
and a demonstration of its methodological benefits on an RNA-sequencing mice data
set are provided. To build upon this work, extensive simulation studies and literature
search-based work that empirically assesses the potential of modeling interaction effects
on conducted studies offer a research outlook.
The last work of this thesis proposes a general Bayesian non-linear functional shrinkage
(NLFS) approach that can specifically be applied in the dose-response context. Based
upon a non-parametric model, the response adaptively shrinks towards a pre-defined,
non-linear function space (e.g. the Hill model), or a combination of function spaces.
NLFS is adaptive as shrinkage is not enforced if the data suggests that the function space
is misspecified, yielding a highly flexible approach that accounts for prior knowledge, but
also allows deviations. The method is an extension of Shin et al. (2020), who developed
this method for linear subspace shrinkage. The extension to non-linear function spaces
is based on a first-order Taylor expansion of the model space of interest. In an extensive
simulation study, NLFS was compared to other modeling approaches and applied to
a testosterone data set. To speed up computation, a tailored block Gibb’s sampler is
implemented, which uses Metropolis-Hastings and Slice Sampling.
24
The simulation results suggest that the proposed non-parametric method successfully
approximates a parametric fit if the function space is correctly specified. If the function
space is misspecified, the shrinkage quickly (for low sample sizes) vanishes, allowing
an unconstrained fit. We also demonstrated that shrinkage towards a combination of
approximated function spaces is straightforward and provides additional robustness
as further model uncertainty considerations can be met. An application example of
testosterone levels modeled for men of any age highlights the method’s strengths, as the
response is similar to the Hill model, while plausible, data-driven deviations from the
Hill model are present.
Several improvements and extensions of the proposed NLFS approach offer further
research directions. By construction, the model is a flexible non-parametric, uncon-
strained B-spline if the assumed function space is severely missspecified. This can lead
to undesirable overfitting. To overcome this limitation, the implementation of a smooth-
ness penalty as suggested by Wiemann and Kneib (2021) appears promising. Further,
the NLFS approach cannot handle a few (n < 10), distinct dose levels with repeated
measurements, which is common in practice. So far NLFS is implemented based on
uniformly drawn doses, which guarantees balanced shrinkage across the dose range and
circumvents rank-deficiency challenges. Adapting the approach and implementing an
additional grid that regulates the shrinkage independent of the doses is worth further
investigations. Within this extension, additional weights that allow for less shrinkage at
the end, e.g. to better account for down-turn effects, would meet further practical needs.
However, a grid approach requires additional hyperparameter considerations for the grid
size and density. To avoid these limitations, embedding the approach into a Gaussian
Process framework is another research perspective.
25
4 Discussion and Outlook
26
Bibliography
Aldridge, S. and Teichmann, S. A. (2020). Single cell transcriptomics comes of age.
Nature Communications, 11(1):4307.
Ananthakrishnan, R., Green, S., Chang, M., Doros, G., Massaro, J., and LaValley, M.
(2017). Systematic comparison of the statistical operating characteristics of various
phase i oncology designs. Contemporary clinical trials communications, 5:34–48.
Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count
data. Nature Precedings, pages 1–1.
Babb, J., Rogatko, A., and Zacks, S. (1998). Cancer phase i clinical trials: efficient dose
escalation with overdose control. Statistics in medicine, 17(10):1103–1120.
Bauschke, R. (2010). The Effectiveness of European Regulatory Governance: The Case
of Pharmaceutical Regulation. PhD thesis.
Bornkamp, B., Pinheiro, J., and Bretz, F. (2010). Dosefinding: planning and analyzing
dose finding experiments. R package version 0.4-1.
Brazma, A., Parkinson, H., Schlitt, T., and Shojatalab, M. (2001). A quick introduction
to elements of biology-cells, molecules, genes, functional genomics, microarrays.
European Bioinformatics Institute, Draft.
Brecklinghaus, T., Albrecht, W., Kappenberg, F., Duda, J., Vartak, N., Edlund, K.,
Marchan, R., Ghallab, A., Cadenas, C., Günther, G., et al. (2022). The hepatocyte
export carrier inhibition assay improves the separation of hepatotoxic from non-
hepatotoxic compounds. Chemico-biological interactions, 351:109728.
Bretz, F., Hsu, J., Pinheiro, J., and Liu, Y. (2008). Dose finding–a challenge in statistics.
Biometrical Journal: Journal of Mathematical Methods in Biosciences, 50(4):480–
504.
Bretz, F., Pinheiro, J. C., and Branson, M. (2005). Combining multiple comparisons and
modeling techniques in dose-response studies. Biometrics, 61(3):738–748.
27
Bibliography
Carl, D. (2001). A practical guide to splines. Springer.
Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). The horseshoe estimator for
sparse signals. Biometrika, 97(2):465–480.
Cheung, Y. K. and Chappell, R. (2000). Sequential designs for phase i clinical trials
with late-onset toxicities. Biometrics, 56(4):1177–1182.
Davis, J. A., Gift, J. S., and Zhao, Q. J. (2011). Introduction to benchmark dose methods
and us epa’s benchmark dose software (bmds) version 2.1. 1. Toxicology and applied
pharmacology, 254(2):181–191.
Dixon, W. J. and Mood, A. M. (1946). The statistical sign test. Journal of the American
Statistical Association, 41(236):557–566.
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P.,
Chaisson, M., and Gingeras, T. R. (2013). Star: ultrafast universal rna-seq aligner.
Bioinformatics, 29(1):15–21.
Duda, J., Hengstler, J. G., and Rahnenführer, J. (2022a). td2pll: An intuitive time-dose-
response model for cytotoxicity data with varying exposure durations. Computational
Toxicology, 23:100234.
Duda, J. C., Drenda, C., Kästel, H., Rahnenführer, J., and Kappenberg, F. (2023). Benefit
of using interaction effects for the analysis of high-dimensional time-response or
dose-response data for two-group comparisons. Scientific Reports, 13(1):20804.
Duda, J. C., Kappenberg, F., and Rahnenführer, J. (2022b). Model selection charac-
teristics when using mcp-mod for dose–response gene expression data. Biometrical
Journal, 64(5):883–897.
EFSA (2009). Guidance of the scientific committee on use of the benchmark dose
approach in risk assessment. EFSA Journal, 7(6):1150.
Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan,
P., Bettman, B., et al. (2009). Real-time dna sequencing from single polymerase
molecules. Science, 323(5910):133–138.
Gondane, A. and Itkonen, H. M. (2023). Revealing the history and mystery of rna-seq.
Current Issues in Molecular Biology, 45(3):1860–1874.
Gordon, K. (2001). The oecd guidelines and other corporate responsibility instruments:
a comparison.
28
Bibliography
Goutelle, S., Maurin, M., Rougier, F., Barbaut, X., Bourguignon, L., Ducher, M., and
Maire, P. (2008). The hill equation: a review of its capabilities in pharmacological
modelling. Fundamental & clinical pharmacology, 22(6):633–648.
Gu, X., Albrecht, W., Edlund, K., Kappenberg, F., Rahnenführer, J., Leist, M., Moritz,
W., Godoy, P., Cadenas, C., Marchan, R., et al. (2018). Relevance of the incubation
period in cytotoxicity testing with primary human hepatocytes. Archives of toxicology,
92:3505–3515.
Haber, F. (1924). Zur geschichte des gaskrieges. Fuenf vortraege aus den jahren
1920–1923, pages 76–92.
Hill, A. V. (1910). The possible effects of the aggregation of the molecules of hemoglobin
on its dissociation curves. The Journal of Physiology, 40:iv–vii.
Hothorn, L. A. (2016). Statistics in toxicology using R. CRC Press.
Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U.,
and Speed, T. P. (2003). Exploration, normalization, and summaries of high density
oligonucleotide array probe level data. Biostatistics, 4(2):249–264.
Kappenberg, F., Duda, J. C., Schürmeyer, L., Gül, O., Brecklinghaus, T., Hengstler,
J. G., Schorning, K., and Rahnenführer, J. (2023). Guidance for statistical design
and analysis of toxicological dose–response experiments, based on a comprehensive
literature review. Archives of Toxicology, 97(10):2741–2761.
Kappenberg, F. and Rahnenführer, J. (2023). Information sharing in high-dimensional
gene expression data for improved parameter estimation in concentration-response
modelling. Plos one, 18(10):e0293180.
Krug, A. K., Kolde, R., Gaspar, J. A., Rempel, E., Balmer, N. V., Meganathan, K., Vojnits,
K., Baquié, M., Waldmann, T., Ensenat-Waser, R., et al. (2013). Human embryonic
stem cell-derived test systems for developmental neurotoxicity: a transcriptomics
approach. Archives of toxicology, 87:123–143.
Kurzrock, R., Lin, C.-C., Wu, T.-C., Hobbs, B. P., Pestana, R. C., and Hong, D. S. (2021).
Moving beyond 3+ 3: the future of clinical trial design. American Society of Clinical
Oncology Educational Book, 41:e133–e144.
Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and
dispersion for rna-seq data with deseq2. Genome Biology, 15:550.
29
Bibliography
Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A.,
Berka, J., Braverman, M. S., Chen, Y.-J., Chen, Z., et al. (2005). Genome sequencing
in microfabricated high-density picolitre reactors. Nature, 437(7057):376–380.
O’Quigley, J. and Shen, L. Z. (1996). Continual reassessment method: a likelihood
approach. Biometrics, pages 673–684.
Patro, R., Duggal, G., and Kingsford, C. (2015). Salmon: accurate, versatile and ultrafast
quantification from rna-seq data using lightweight-alignment. BioRxiv, 10:021592.
Phillips, J. R., Svoboda, D. L., Tandon, A., Patel, S., Sedykh, A., Mav, D., Kuo, B., Yauk,
C. L., Yang, L., Thomas, R. S., et al. (2019). Bmdexpress 2: enhanced transcriptomic
dose-response analysis workflow. Bioinformatics, 35(10):1780–1782.
Pinheiro, J., Bornkamp, B., Glimm, E., and Bretz, F. (2014). Model-based dose finding
under model uncertainty using general parametric models. Statistics in medicine,
33(10):1646–1661.
Pinheiro, J. C., Bretz, F., and Branson, M. (2006). Analysis of dose–response stud-
ies—modeling approaches. In Dose finding in drug development, pages 146–171.
Springer.
Ritz, C. (2010). Toward a unified approach to dose–response modeling in ecotoxicology.
Environmental Toxicology and Chemistry, 29(1):220–229.
Ritz, C., Baty, F., Streibig, J. C., and Gerhard, D. (2015). Dose-response analysis using r.
PloS one, 10(12):e0146021.
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010). edger: a bioconductor pack-
age for differential expression analysis of digital gene expression data. bioinformatics,
26(1):139–140.
Rogatko, A., Babb, J. S., Tighiouart, M., Khuri, F. R., and Hudes, G. (2005). New
paradigm in dose-finding trials: patient-specific dosing and beyond phase i. Clinical
cancer research, 11(15):5342–5346.
Rogatko, A., Schoeneck, D., Jonas, W., Tighiouart, M., Khuri, F. R., and Porter, A.
(2007). Translation of innovative designs into phase i trials. Journal of Clinical
Oncology, 25(31):4982–4986.
Sánchez, A. and de Villa, M. (2008). A tutorial review of microarray data analysis.
Bioinformatics, 1(1):1–55.
30
Bibliography
Sanger, F. and Coulson, A. R. (1975). A rapid method for determining sequences
in dna by primed synthesis with dna polymerase. Journal of molecular biology,
94(3):441–448.
Satam, H., Joshi, K., Mangrolia, U., Waghoo, S., Zaidi, G., Rawool, S., Thakare,
R. P., Banday, S., Mishra, A. K., Das, G., et al. (2023). Next-generation sequencing
technology: Current trends and advancements. Biology, 12(7):997.
Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995). Quantitative moni-
toring of gene expression patterns with a complementary dna microarray. Science,
270(5235):467–470.
Seber, G. A. and Wild, C. J. (2003). Nonlinear regression. hoboken. New Jersey: John
Wiley & Sons, 62(63):1238.
Shin, M., Bhattacharya, A., and Johnson, V. E. (2020). Functional horseshoe priors for
subspace shrinkage. Journal of the American Statistical Association, 115(532):1784–
1797.
Storer, B. E. (1989). Design and analysis of phase I clinical trials. Biometrics, pages
925–937.
Su, H., Haque, M., Becker, S., Edlund, K., Duda, J., Wang, Q., Reißing, J., Marschall,
H.-U., Candels, L. S., Mohamed, M., et al. (2023). Long-term hypercaloric diet
exacerbates metabolic liver disease in pnpla3 i148m animals. Liver International.
Tamhane, A. C., Hochberg, Y., and Dunnett, C. W. (1996). Multiple test procedures for
dose finding. Biometrics, pages 21–37.
Ting, N., Chen, D.-G., Ho, S., Cappelleri, J. C., et al. (2017). Phase II clinical develop-
ment of new drugs. Springer.
Trapnell, C., Pachter, L., and Salzberg, S. L. (2009). Tophat: discovering splice junctions
with rna-seq. Bioinformatics, 25(9):1105–1111.
Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith,
H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001). The sequence of the
human genome. Science, 291(5507):1304–1351.
Watson, J. D. and Crick, F. H. (1953). Molecular structure of nucleic acids: a structure
for deoxyribose nucleic acid. Nature, 171(4356):737–738.
31
Bibliography
Wetterstrand, K. A. (Last Updated: November 1 2021). The cost of sequencing a human
genome. https://www.genome.gov/about-genomics/fact-sheets/
Sequencing-Human-Genome-cost. Accessed: January 5 2024.
Wheeler, M. W., Lim, S., House, J. S., Shockley, K. R., Bailer, A. J., Fostel, J., Yang, L.,
Talley, D., Raghuraman, A., Gift, J. S., et al. (2023). Toxicr: A computational plat-
form in r for computational toxicology and dose–response analyses. Computational
Toxicology, 25:100259.
Wiemann, P. and Kneib, T. (2021). Adaptive shrinkage of smooth functional effects
towards a predefined functional subspace. arXiv preprint arXiv:2101.05630.
32
Part II
Publications
33

Article 1
Received: 19 August 2020 Revised: 11 November 2021 Accepted: 12 January 2022
DOI: 10.1002/bimj.202000250
RESEARCH ARTICLE
Model selection characteristics when using MCP-Mod for
dose–response gene expression data
Julia C. Duda Franziska Kappenberg Jörg Rahnenführer
Department of Statistics, TU Dortmund
University, Dortmund, Germany Abstract
We extend the scope of application for MCP-Mod (Multiple Comparison Proce-
Correspondence
Julia C. Duda, Department of Statistics, dure andModeling) to in vitro gene expression data and assess its characteristics
TU Dortmund University, Vogelpothsweg regarding model selection for concentration gene expression curves. Precisely,
87, 44227 Dortmund, Germany. we apply MCP-Mod on single genes of a high-dimensional gene expression data
Email: duda@statistik.tu-dortmund.de
set, where human embryonic stem cells were exposed to eight concentration lev-
Funding information els of the compound valproic acid (VPA). As candidate models we consider the
Bundesministerium für Bildung und sigmoid 𝐸max (four-parameter log-logistic), linear, quadratic, 𝐸max , exponential,Forschung, Grant/Award Number:
LivSysTransfer (031L0119); Deutsche and beta model. Through simulations we investigate the impact of omitting one
Forschungsgemeinschaft, Grant/Award or more models from the candidate model set to uncover possibly superfluous
Number: RTG 2624 (427806116) models and to evaluate the precision and recall rates of selected models. Each
model is selected according to Akaike information criterion (AIC) for a consid-
erable number of genes. For less noisy cases the popular sigmoid 𝐸max model
This article has earned an open data
badge “Reproducible Research” for is frequently selected. For more noisy data, often simpler models like the linear
making publicly available the code model are selected, but mostly without relevant performance advantage com-
necessary to reproduce the reported pared to the second best model. Also, the commonly used standard 𝐸max model
results. The results reported in this article
could fully be reproduced. has an unexpected low performance.
KEYWORDS
dose–response curves, gene expression, MCP-mod, model selection, toxicology
1 INTRODUCTION
In drug development, two major steps are of interest when a new compound is examined. First, changes in the dose or
concentration of the compound are intended to cause changes in the response. Once this relation is established, the precise
modeling of the dose–response curve is the next goal. It aims at finding the target dose for the confirmatory Phase III trials.
If multiple comparison procedures (MCPs) are used for signal detection, this can lead to less flexibility as target dose
estimation is restricted to the tested dose levels. One major methodological advancement in this field is the Multiple
Comparison Procedure and Modeling (MCP-Mod) approach by Bretz et al. (2005). It combines MCP and a modeling
(Mod) step by proposing amultistage procedure.MCP-Mod received a positive qualification opinion and a “fit for purpose”
designation by theEMAandFDA in 2014 and 2016, respectively, as statisticalmethodology to analyze Phase II dose-finding
studies under model uncertainty (European Medicines Agency, 2015; Food and Drug Administration, 2016).
This work extends the usual scope of application of MCP-Mod from clinical Phase II to gene expression data. As a
practical example, human embryonic stem cells are analyzed (O’Quigley et al., 2017, Chap. 12.3). Valproic acid (VPA) is
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the
original work is properly cited.
© 2022 The Authors. Biometrical Journal published by Wiley-VCH GmbH.
Biometrical Journal. 2022;64:883–897. www.biometrical-journal.com 883
884 DUDA et al.
used for treating epilepsy but it is known to be embryo-toxic when taken in the first trimenon of pregnancy (Genton
et al., 2006). The MCP-Mod framework can help to gain insights on concentration–response relationships between the
concentration of VPA and gene activity.
Specifically, we are interested in two aspects of MCP-Mod when applied on concentration–response data: the detection
of genes where VPA has an effect on the dose–response curve (power) and model selection. We investigate these proper-
ties in analyses on real and on simulated data. Further, the model performance or goodness-of-fit of selected models is
evaluated to identify which models are suitable for gene expression dose–response data.
Model selection and model performance differ substantially in the underlying theory. In model selection a statistical
model from a set of candidate models has to be selected, given a data set. The aim is to select the model that represents
the true, unknownmodel fun𝑅ction best (Chap. 1 of Claeskens & Hjort, 2008; Schorning et al., 2016)). In addition to select-ing the best model among the candidates, we also aim at identifying necessary or dispensable models. Therefore, we usethe goodness-of-fit measure 2adj. We combine the three aspects power, model selection, and goodness-of-fit in a newly
proposed score that summarizes the suitability of a model set. This approach is applied on the VPA data set and on simu-
lated data.
In the context of clinical Phase II trials, model uncertainty for dose–response modeling is considered to increase pre-
cision in target dose estimation—Ting (2006), Wheeler and Bailer (2009), Bornkamp et al. (2011) among many others.
In Phase II trials, decisions on the model set can be based on expert knowledge and concentrate on a single compound
and dose–response relationship. For gene expression data, model selection must be considered for thousands of genes
simultaneously and it is not straightforward to find or use prior knowledge on the dose–response profile of each gene.
House et al. (2017) and Filer et al. (2016) propose experimental pipelines that include concentration–response modeling
and model selection for toxicological gene expression data. They consider a flat model, the sigmoid 𝐸max model with all
four parameters or with the lower asymptote fixed to zero, and a gain–loss model that is similar to the beta model con-
sidered here. However, detailed investigations on the necessity of model selection and on appropriateness of candidate
model sets for gene expression concentration–response data are lacking, which motivates our work.
This paper is structured as follows. The VPA data set is introduced in Section 2. The statistical methodology including
MCP-Mod and both established performancemeasures and a newly proposed one are presented in Section 3. Our analysis
procedures and results that are based on theVPAdata set are presented in Section 4. Different controlled simulation setups
and corresponding results follow in Section 5. Final conclusions are summarized in Section 6. Source code to reproduce the
results is available as Supporting Information on the journalsweb page (http://onlinelibrary.wiley.com/doi/xxx/suppinfo).
2 GENE EXPRESSION DATA SET
The data set was first presented in the study of Krug et al. (2013), whereVPA is applied, among others, to human embryonic
stem cells (hESC). VPA is widely used to treat different forms of epilepsy. However, it is linked to an increased incidence in
congenital abnormalities (Cotariu & Zaidman, 1991). Krug et al. (2013) state that identifying changes in the transcriptome
induced by toxic substances illustrates interesting mechanistic insights.
Gene expression levels of the hESCs aremeasured repeatedly for different concentrations, using theGeneChipRHuman
Genome U133 Plus 2.0. The=data are preprocessed with the Robust Multi-Chip Average algorithm by Irizarry et al. (2003),such that the expression data are on the logarithmic scale with base 2.Th𝑑e d=ata set cont𝑑ain=s G 5𝜇4,675 probe sets, which will be referred to as genes in the following, for simplicity. Forevery 7gene, expressio8n values c𝑛orresponding to the concentrations 𝑑1 = 0, 𝑑2 = 25, 𝑑3 = 150, 𝑑4 = 350, 𝑑5 = 450, 𝑑6 =550, 800, and 1000 M2 =V⋯PA=ar𝑛e8a=va3ilable. For the control𝑁lev=el2𝑑71, 𝑛1 = 6 replicates were measured. For allother concentrations there are replicates. There are measurements per gene. The replicates are
biological replicates since different cells were used for each experiment. Due to functional relationships between genes,
we cannot assume independence between the measurements from different genes. Further, six or three replicates per
concentration is small for statistical modeling approaches. These problems are addressed in Section 4.
3 MCP-MODMETHODOLOGY AND PERFORMANCEMEASURES
In this sec𝑆tion, the methodology is presented. First, the MCP-Mod approach is outlined. Then, the performance measuresprecision and recall for evaluating the model selection in MCP-Mod are explained. Additionally, the newly proposedmeasure  is presented.
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 885
TABL𝐵E 1 Dose–response models 𝑓(𝑑, 𝜽), their standardized versions 𝑓0(𝑑, 𝜽0), and the guesstimates for 𝜽0 for the analysis. For the betamodel is defined as 𝐵(𝛿1, 𝛿2) = (𝛿1 + 𝛿2)𝛿1+𝛿2∕(𝛿1𝛿1𝛿2𝛿2 ) and 𝐷 = 1200
M𝐸 odelmax 𝐸 𝐸
𝒇(𝒅+, 𝜽𝐸)𝐸0 + 𝐸max𝑑𝑑∕ℎ∕(𝐸(𝐸𝐷 + 𝑑) 𝑑
𝒇∕𝟎((𝒅,𝐷50ℎ + 𝑑ℎ) 𝑑ℎ∕𝐸(𝐸𝐷
𝜽𝟎)
𝐷50ℎ++𝑑)
𝜽𝟎
𝑑ℎ) 𝐸𝐸𝐷𝐷max 0 max 50 50 550 ∈ {100}Sigmoid
Exponential 𝐸𝐸0 ++ 𝐸1{exp(𝑑∕𝛿) − 1} exp(𝑑∕𝛿) − 1 𝛿 ∈
0{1=444.45505, }ℎ = 5.117
Linear 0
Quadratic 𝐸𝐸0 +
𝛿
+ 𝐸𝛽
𝑑
1𝑑 + 𝛽 𝑑2 𝑑𝑑 + (𝛽 ∕|𝛽 |)𝑑2 ∅𝛿 = 𝛽 ∕|𝛽 | = −0.001
Beta 0 max𝐵(𝛿21, 𝛿2)(𝑑∕𝐷)𝛿1 ⋅ (1 − 𝑑∕𝐷)𝛿2 (𝑑∕𝐷)𝛿21 (1 −1 𝑑∕𝐷)𝛿2 𝛿1 = 22 1, 𝛿2 = 1
3.1 MCP-Mod
The MCP-Mod approach was originally developed by Bretz et al. (2005) to model dose–response relationships in Phase II
clinical trials under model uncertainty. For details see also Xun and Bretz (2017) and Bornkamp et al. (2009).
The MCP-Mod methodology comprises two analysis steps. First, in the MCP-step, a statistically significant signal in
a gene is determined by an optimal-contrast test for a prespecified set of candidate models. If such a signal is found for
at least one model, a significant result of the multiple comparison procedure (signifMCP) is present for the gene. This
means that an effect of VPA on the gene activity is established. The second step, Mod, refers to the modeling. From the
set of mod𝑑els, for which a signifMCP has been established, one model fit is chosen and used as final fit for the data.𝑖A=lte1r,n…at,i𝑘vel1y, m𝑗o=de1l ,averaging can be performed.Denote as placebo concentration and 𝑑2 < … < 𝑑𝑘 as active concentrations with 𝑛𝑖 replicates. For concentrationand … , 𝑛𝑖 , 𝑁 = 𝑛1 +⋯+ 𝑛𝑘, the (preprocessed) expression values are modeled as
𝑦𝑖𝑗 = 𝜇(𝑑𝑖) + 𝜀𝑖𝑗, 𝜀𝑖𝑗 𝑖.∼𝑖.𝑑.  (0, 𝜎2), (1)
with homogeneous variance 𝜎2 𝑀> 0. The mean response E(𝑦𝑖𝑗) = 𝜇𝑖 = 𝑓(𝑑𝑖, 𝜽) at concentration 𝑑𝑖 is assumed to follow aconcentration–response model with parameter vector 𝜽 and 𝜀𝑖𝑗 as independent errors.For the MCP step, a set of candidate models needs to be prespecified. Models commonly used for dose–response
relationships are summarized in Table 1.
All models summarized in the first column o𝑓f(T𝑑a,b𝜽l)e=1 c𝜃an be r𝑓 (𝑑, 𝜽 ) 0 + 𝜃1𝑓
efor
0(𝑑m, 𝜽ulated as0), (2)
(see second column of Table 1), where 0 0 is the standardized version of a model. Introduction of the standardized
model shape concept is crucial for choosing optimal contrasts in the MCP step, as their choice is scale invariant.
It remains to determine initial guesses for the parameter 𝜽0. In practice, for a Phase II study, careful considerations
and prior knowledge on expected percentages of maximal effects at certain doses are translated into guesstimates for 𝜽0.
Here, the large number of genes makes individual, gene-dependent deci𝑓si0o(𝑑n,s𝜽o0n 𝜽0 difficult. We therefore use the sameguesstimates for all genes. Figure 1 displays the (rescaled) model shapes ) used for the analysis. The guesstimates
are listed in Table 1.
To the best of our knowledge there is little preliminary work on dose–response model selection in the context of gene
expression data (Filer et al., 2016; House et al., 2017). In toxicology, often monotone dose–response relationships are
assumed. Especially the 𝐸max model, a special case of the sigmoid 𝐸max model with ℎ = 1, was found to be appropri-
ate for the majority of dose–response relationships in a large meta-analysis of clinical dose–response studies (Thomas
et al., 2014). The inclusion of these two monotonic models in the candidate model set is therefore obligatory. The linear
model is added as a reference or baseline model. For genes where the true underlying model might be a sigmoid 𝐸max
model, but at the maximal considered dose, the turning point has not yet been reached, the exponential model might be
more suitable. The quadratic and the beta model are included as nonmonotone shapes. They are similar to the gain–loss
model used by Filer et al. (2016). There might be a nonmonotone relationship between concentration and gene activity,
for example, for metabolic genes. Such genes might be activated at lower VPA concentrations but successively deactivated
at increasing, highly toxic concentrations.
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
886 DUDA et al.
0 200 400 600 800 1000
betaMod emax exponential
1.0
0.8
0.6
0.4
0.2
0.0
linear quadratic sigEmax
1.0
0.8
0.6
0.4
0.2
0.0
0 200 400 600 800 1000 0 200 400 600 800 1000
Dose
F IGURE 1 Candidate model shapes used for the six concentration–response models
The specific guesstimates for 𝜽0 for each (m𝐸o𝐷del)are chosen such that a wide range of dose–response shapes is covered.Further, we consider that during the experime5n0tal design stage, the concentrations were chosen with the expectationth𝐸a𝐷t th≈e d4o5s0e with𝐸𝐷the h≈a8lf0m0 aximal effect is close to 450 and a plateau is reached at concentration 1000, whichtran5s0lates into the sec9𝐸o5𝐷nd guess that 95% of themaximal effect is reached at concentration 800.With these two assumptions( and 50𝑚 =),1t,h…e guesstimate 𝜽
0 for the sigmoid 𝐸max model can be calculated analytically. For the 𝐸max
model, a gu𝑡ess of an of 300 is,u𝑀sed. And for the exponential model, an 𝐸𝐷50 of 700 is assumed.Eac𝒄h 𝝁candidate s𝒄hap=e,(𝑐 , … , 𝑐 , defines a respective mean response vector 𝝁𝑚 = (𝜇𝑚1, … , 𝜇𝑚𝑘). For the MCP-step,a contr𝑚⊤ast𝑚 -test as f𝑚irst de𝑚sc1ribed𝑚by Abelson et al. (1963) is calculated. The test is constructed based on the linear con-trast where 𝑘)⊤ is chosen to maximize the power of the test for the assumed mean response 𝝁𝑚
(Bornkamp et al., 2009). This yields the hypotheses H𝑚0 ∶ 𝒄⊤𝑚𝝁𝑚 = 0 and H1𝑚 ∶ 𝒄𝑚⊤𝝁𝑚 ≠ 0.
The test statistics for the contrasts are given by
𝑇𝑚 = ∑𝑘𝑆√ 𝑖=∑𝑘1 𝑐𝑚𝑖=1 𝑐2𝑖
?̄?𝑖
?̄? 𝑚𝑖
∕𝑛𝑖 , 𝑚 = 1,… ,𝑀, (3)
whe(r𝑇e1, 𝑖…is, 𝑇th𝑚e)o⊤bservedmean at dose 𝑑𝑖 and 𝑆2 𝑡= ∑𝑘𝑖=1∑𝑛𝑗=𝑖 1(𝑦𝑖𝑗 − ?̄?𝑖)2∕(𝑁 − 𝑘) is themean squared error. Under H0 and(1), follows a central, multivariate -distribution.
A dose–response sig𝛼nal is established if 𝑇𝑇ma>x =𝑞 max(𝑇1, … , 𝑇𝑚) > 𝑞1=−𝛼{,𝑀w{h, …e𝑚0r,e,𝑀𝑞𝛼𝑚1}i}s th𝐿e equicoordinate 𝛼-quantile ofthe null distribution. This approach leads to𝑚multi1p−l𝛼e testing ad∅justm∗ent for1H H𝐿 with a strict control of the familywise error rate at level . The models with form∗th≠e set of significant models with estab-lished signifMCP. The modeling step is only executed if , that is, a dose–response signal is established for at least
one model.
During the Mod-step, either one fitted model of those that passed the MCP-step can be chosen for a final fit or all fitted
models that passed theMCP-step can be averaged. If a singlemodel is selected, criteria as the Akaike information criterion
(AIC) or the Bayesian information criterion (BIC) as well as the largest test statistic (maxT) can be used to pick a model.
For model averaging, standardized weights based on the AIC or BIC can be calculated for the models in, and the final
model is the resulting weighted average of each of the fitted models.
Calculations are done with the DoseFinding R package, version 0.9-17, and the statistical software R, version 4.0.2 (R
Core Team, 2020). For theℎnumerical estimation of the nonlinear parameters, we use the default boundary setting of theDoseFinding package. As the maximum concentration is 1000, this leads to boundaries for the 𝐸𝐷50 parameter of [1,1500] and [1/2, 10] for the parameter of the sigmoid 𝐸max model.
Model means
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 887
3.2 Measures
In this section, we briefly present for our context the definitions of the evaluation measures precision, recall, and 𝑅2adj.
Further, a new measure  is proposed s𝑀pe∈cifically=f{o𝑀r th, e…c,o𝑀nte}xt of using MCP-Mod with a fixed candidate set  onmany dose–response data sets.For a set of genes and a specific model ∗ 1 𝐿 , the precision is defined as the conditional probability
that a model is correct, given that it has been selected. Accordingly, the recall is defined as the conditional probability that
a model is selected, given that it is correct (Buckland & Gey, 1994). Formally, we denote
precision = ?̂?(Model is correct |Model is selected) = 𝑡𝑝 𝑡+𝑝𝑓𝑝 ,
recall = ?̂?(Model is selected |Model is correct) = 𝑡𝑝
𝑡𝑝 𝑓𝑝 𝑓𝑛 𝑡𝑝 + 𝑓𝑛
,
where , , and are the number of true positives, false positives, and false negatives. Precision and recall values are
in the interval [0,1], and a larger value corresponds to a better performance. They can only be evaluated in simulations
where the correct model is known.
For a model fit 𝑓(⋅, ?̂?) and data 𝑦𝑖𝑗 of a specific gene, 𝑖 = 1, … , 𝑘, 𝑗 = 1,… , 𝑛𝑖 , we use 𝑅2adj defined as
𝑅2 = 1 − (1 − 𝑁𝑅2−)(𝑁𝑝 − 1)adj , 𝑅2 = 1 − 𝑆𝑆𝑆𝑆𝐸𝑇 , (4)
where 𝑆𝑆𝐸 = ∑𝑘𝑖=1∑𝑛𝑗=𝑖 1(𝑦𝑖𝑗 − 𝑓(𝑑𝑖, ?̂?))2 and 𝑆𝑆𝑇 =𝑝 ∑𝑘𝑖=1∑𝑗𝑛=𝑖 1(𝑦𝑖𝑗 − ?̄?)2 is the sum of squared errors and total sum ofsquares, respectively. The number of parameters is and the total number of measurements is 𝑁 = ∑𝑘𝑖=1 𝑛𝑖 .We further propose a new measure, the suitability of model set score . It can be used in a descriptive manner when
MCP-Mod is applied to dose–response or concentration–response data of many genes. The score balances two desired
properties. First, the number of detected signa𝑅l2s (signifMCPs) is desired to be large. Additionally, the detected signals arealso desired to be clear, that is, to have a large adj value. The score balances the number of detected signals and the model
performance, that is, power and goodness-of-fit. It is defined as
 = 𝐺1 ∑𝐺 𝑔=1 𝟙{Gene 𝑔 has significant MCP after adjustment} ⋅ 𝑅2adj, (5)
where 𝐺 denotes the total number of genes and  the considered set of candidate models. For a given set , the score
summarizes the proportion of genes with significant MCP after adjustment, weighted by the goodness-of-fit of the respec-
tive genes. Adjustment means that the false discovery rate (FDR) is controlled using the Benjamini–Hochberg (BH) pro-
cedure (Benjamini & Hochberg, 1995).
In the context of MCP-Mo𝑝d, this means that for each gene, the sma𝑝llest 𝑝-value fr𝑝om the MCP tests of all candidatemodels is chosen. Consequently, each gene is represent𝑅ed by a single -value. These -values are adjusted with the BHprocedure. If a BH-adjusted -value is below 0.05 then th2 is results in a multiplicity-adjusted significant concentration–response signal. As performance measure, the value of adj of the winner model w.r.t. AIC for the corresponding gene is
used. In general, when comparing two values 1 and 2 , the larger v𝑅alue indicates a favorable choice of the candidateset, since both the number of detected signals and the model performance2are taken into account. For improved clarity, inaddition both the proportion of genes with detected signal and themean adj of the fit of the winnermodels corresponding
to those genes will be reported.
4 DATA-BASED ANALYSIS
In this section, setups and results of the data-based analyses are presented. In the following, they are referred to as Analysis
I and Analysis II. In Analysis I, MCP-Mod is applied on the real VPA data set and an overview on model selection results
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
888 DUDA et al.
TABLE 2 Overview of the analyses scenarios and their respective data generation details (Section 4) as well as for the simulation study of
Section 5
Part Data (generation)
Analysis I main Original VPA data
Analysis II LOMO O𝑛 ri=ginSimulation 𝜎1= 𝑞⋯
al VPA data
(0.5=)𝑛⋅ 8 ∈ {3, 5, 10}range (𝜎 = 𝑞null(0.5) for null-model)
6942 F IGURE 2 Distribution of winner counts per model
6000
4000 3739
3067 3090
2376
2000
979
0
betaMod emax exponential linear quadratic sigEmax
Winner model by AIC given signifMCP
is provided. An additional goal is to check if any model can be omitted from the candidate model set because it can be
easily substituted by another model. Analysis I is extended by Analysis II through leave-one-model-out (LOMO) analyses.
These include that the entire analysis of Analysis I is repeated several times, and each time one of the candidate modelsof is omitted. For an overview of the different analyses and the simulation, see Table 2.
4.1 Setup for Analysis I
In Analysis I, MCP-Mod is applied independently on each gene of the VPA data set. As we cannot assume only increasing
or decreasing effects, each gene is tested with two-sided contrast tests with significance level 𝛼 = 0.05.
We apply multiplicity adjustment between genes by controlling the FDR using the BH procedure as described in Sec-
tion 3.2. For each gene, if a dose–response signal is detected and hence at least one model passes the MCP-step, the AIC is
used as model selection criterion. For small sample sizes, Schorning et al. (2016) show that the AIC outperforms the BIC,
especially if the true underlying model is a complex one among the considered models. There are 𝑁 = 27 observations
per gene in the VPA data set. Thus we use the AIC to avoid too low selection counts of possibly correct, more complex
models. In our analysis we will see that even with the AIC the simple linear model is often selected.
4.2 Results for Analysis I
Of the 54,675 genes, when controlling the FDR, 20,193 (36.9%) genes pass the MCP-step, that is, their concentration–
response profile significantly differs from a flat profile. VPA has a significant effect on the activation (deactivation) of
these genes. For each gene one winner model is selected by AIC as a final fit (Figure 2). The linear model is selected most
often (34.4%), because the AIC penalizes more complex models. However, Figure 3 clearly shows that the linear model
performs compa𝐸rmaatxively poorly with respect to the 𝑅2adj.The popu𝑅l2ar model (cf. Thomas et al., 2014, among many others) wins the fewest times and when it does win, itsfit has low adj values (Figures 2, 3). Figure 4 shows the distribution of model winner counts w.r.t. AIC stratified by 𝑅2adj.
Less noisy genes are represented by the rightmost plot. Assuming (1), we refer to more (less) noisy genes as genes whose
Number of genes
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 889
F IGURE 3 Performance of winner models 1.00
0.75
0.50
0.25
betaMod emax exponential linear quadratic sigEmax
Winner model by AIC given signifMCP
[0, 0.25] (0.25, 0.5] (0.5, 0.75] (0.75, 1]
n = 526 n = 7821 3452 n = 6854 2346 n = 5086 1702
232
3000 2000 1500
200
1500
2000 1249 1000 981 912
1115 1141
108 694
100 1235 1000
78 1177 741
609
1000 500
44 632
771
485 500
22 262 18817
0 0 0 0
od ax
l
M nt
ia ea
r tic ax d x l l l
m n ra m M
o ma nt
ia r cea ati a
x od ax nti
a ar tic ax od ax tia ar tic ax
eta e n
e li ad
e a e a
o igE
n
u et
a e e lin rn m M
r m M
ad E ta em ne lin ad E ta em ne lin ad
r Em
b xp q s b xp
o qu si
g be o ig e o g
e e ex
p qu s b xpe q
u si
Winner model by AIC given signifMCP
F IGURE 4 Distribution of winner counts per model stratified by 𝑅2adj
underlyi𝑅n2g model has larger (smaller) s𝐸tamnadxard deviations 𝜎 in relation to the response range, which leads to smaller(larger) adj values. While the sigmoid and the beta model win most often for the least noisy stratum, the 𝐸max
model is chosen rarely, regardless of the strata. The nonmonotone beta and quadratic model are chosen considerably
often. For more noisy genes the linear model is preferred. For these genes, none of the models explains a lot of variance,
which favors the linearmodel in terms𝐸of AIC. Hence, if the linearmodel is selected by AIC, one should hesitate to assumea true linear concentration–resp𝐸o𝐷nse reTo ensure that the low number 5o0f m
laatxionship. Some example fits are visualiwinners is not only due to too strict𝐸zed in Figure 5.pmaarxameter constraints in the numericaloptimization, 𝐸w𝐷e visualize the p𝐸arameter estimates for genes where the model won (Appendix, Figure A1).There is no 𝑆ev𝑆i𝑇den50ce that the poor perfomr𝑅amx ance of the 𝐸max model is due to optimization constraints, but instead due tothe often low estimates. For an model, a low 𝐸𝐷50 translates to an early plateau, which can lead to an 𝑆𝑆𝐸close to the and therefore to a small 2adj.
It is also of interest if any of the models in the candidate se𝑅t is redundant such that it can be substituted by anothermodel. Removing such a model from the candidate set would2increase power as the number of hypotheses would bedecreased in the MCP-step of each gene. If for many genes the adj for the winner model and the second best model differ
substantially, th𝐸e winning model should be considered for future analyses. If the quadratic model is the winning modelwith a good fit, mmaanxy genes cannot be modeled well by the secon𝐸d best model (Figure 6).The sigmoid and the beta model pe𝑅rformances also differmbayx a considerable amount to the second best model’sperformance across the whole range of explained variance. The and the exponential model can mostly be replacedby othermodels without substantial loss in 2adj. This especially applies to genes with larger explained𝐸vmaraixance. The linearmodel can always be replaced with minimal loss in explained variance, as it is a special case of the model and the
quadratic model.
Number of genes
R2adj
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
890 DUDA et al.
Gene: 235456_at, Gene: 201152_s_at, Gene: 1554418_s_at, 
Model: betaMod Model: emax Model: exponential
7.5 5.0
8.0
7.0 4.5
4.0
6.5
7.5
3.5
6.0
3.0
0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000
Concentration Concentration Concentration
Gene: 204400_at, Gene: 203883_s_at, Gene: 217837_s_at, 
Model: linear Model: quadratic Model: sigEmax
9.8
10.0
8.5
9.4
9.6
9.0 8.0
9.2
8.6 7.5
8.8
0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000
Concentration Concentration Concentration
Gene: 1555004_a_at, Gene: 236937_at, Gene: 1558000_at, 
Model: emax Model: exponential Model: linear
4.1
4.2
3.9
4.0
4.0
3.7 3.9
3.8
3.5 3.8
3.6
3.3 3.7
3.4 3.6
0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000
Concentration Concentration Concentration
F IGURE 5 Example selection of nonnoisy (rows 1 and 2) and noisy (row 3) genes of the VPA data set with significant
concentration–response model, with added fit of the model selected as winner w.r.t. AIC. Gray dots represent single response values, and the
red dots indicate the mean responses per concentration. Each model has an 𝑅2adj of at least 0.75 (rows 1 and 2) or below 0.25 (row 3)
4.3 Setup for Analysis II
Analysis II offers further insights into possibly expendable models in the candidate set. The analysis setup is similar to
the one of Analysis I. Analysis I is redone several times, but each time one model from the candidate model set is omitted.
We refer to these as LOMO analyses.
4.4 Results for Analysis II
The number of FDR adjusted significant concentration–response relationships is similar to the main analysis where no
model is left out (Table 3). This finding is consistent with the results of Pinheiro et al. (2006). If the quadratic model is
omitted from the candidate model set, the total number of signifMCPs increases at the cost of reduced mean 𝑅2adj for the
remaining genes.This is due to the different, rarely appropriate shape of the quadraticmodel compared to all othermodels.Measured by the  score, it is proposed to drop the 𝐸max model from the candidate model set (indicated in bold). The
log2Expression log2Expression log2Expression
log2Expression log2Expression log2Expression
log2Expression log2Expression log2Expression
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 891
𝑅F2IGURE 6 Scatter plots stratified by winner model (by AIC) showing the 𝑅2adj value of the winner on the 𝑥-axis and the difference to theadj value of the second best model (by AIC) on the 𝑦-axis
TABLE 3 Total number of FDR adjusted significant genes in each LOMO analysis, number of gained and lost genes compared to the
case when no model is left out (Analysis I),  score, rate of FDR adjusted significant genes, and mean 𝑅2adj among significant genes. Largest score indicated in bold
Model 𝐸 Total Gained Lost  signifMCP rate mean 𝑹𝟐adjSigmoid max 20,122 61 132 0.2114 0.3680 0.5743
Quadratic 20,459 697 431 0.2119 0.3742 0.5664
Beta 20,221 150 122 0.2120 0.3698 0.5732
Exponential 20,164 529 558 0.2120 0.3688 0.5748
Linear 20,178 91 106 0.2127 0.3691 0.5763
𝐸None 20,193 0 0 0.2134 0.3693 0.5778max 20,349 330 174 0.2138 0.3722 0.5745
sigmoid 𝐸max model, which contains the 𝐸max model as a special case, decreases the score the most when removed from
the candidate set.
We are further interested by which model an originally selected model after its omission is typically substituted in the
modeling step (Figure 7). The beta model is selected more often, if the sigmoid 𝐸max model is removed and vice versa. If
the often selected linear model is omitted, the exponential model is most often replacing it.Two additional evaluations regarding the validity of the  score were conducted. First, a copy of the VPA data set
was generated and all 3067 genes where the exponential model won by AIC were removed and the LOMO analyses wererepeated. As expected, in this artificial scenario the  score suggests to drop the exponential model (Table 4, second
column from the left).
Second, Analysis I was repeated but with a single model as candidate model (Table 4). When a single candidate modelis used, the  score is always smaller than when only one or no model is omitted from the candidate model set and the
original VPA data ar𝐸e used. The lowest score of 0.0825 is obtained if the quadratic model is the only candidate model. Thisis because the quadramtiacxmodel shape passes the MCP-step for only 12.27% of the genes. When using only one candidatemodel, the sigmoid model has the largest score of 0.2036.
The absolute differences in scoresmight appear small butmust not bemisinterpreted as irrelevant. For the artificial
scenario where genes with the exponential model as winner model are removed, omitting the exponential model from thecandidate model set is considered reasonable by construction. Therefore, the corresponding difference in the  score of
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
892 DUDA et al.
8000
Left out model
6000
betaMod
emax
4000 exponential
linear
quadratic
2000 sigEmax
none
0
betaMod emax exponential linear quadratic sigEmax
AIC winner model (given signifMCP)
F IGURE 7 Absolute count of selections by AIC in the modeling step of each model that had a signifMCP in the MCP-step for each
LOMO scenario
TABLE 4  score, rate of significant genes, and mean 𝑅2adj among significant genes for two added analyses: The LOMO analyses on the
modified VPA data set where genes with exponential winner model are removed and Analysis I repeated but with only one model in the
candidate model set. Largest  score indicated in bold
LOMO analyses on VPA data set without Only model in candidate model set on
exponential winner genes original VPA data set
signif. mean signif. mean
Mod𝐸el  genes (%) 𝑹𝟐adj  genes (%) 𝑹𝟐adjsig. max 0.1789 0.3062 0.5844 0.2036 0.3771 0.5399
Beta 0.1795 0.3077 0.5834 0.2013 0.3647 0.5520
Quadratic 0.1795 0.3117 0.5758 0.0825 0.1227 0.6722
Linear 0.1803 0.3071 0.5869 0.1855 0.3768 0.4922
𝐸None 0.1810 0.3078 0.5882 - - -max 0.1813 0.3097 0.5854 0.1690 0.3235 0.5225
Exponential 0.1834 0.3173 0.5778 0.1669 0.3011 0.5544
0.0024 can be interpreted as relevant. Only using the sigmoid 𝐸max model compared to using the full candidate model set
differs by 0.0098, which can hence be viewed as a relevant difference such that it would not suffice to use a single model.The interpretation of the  score is not straightforward, which is discussed in Section 6.
5 SIMULATION-BASED ANALYSIS
The simulation gives insights on the effect of the number of replications per concentration level while standard deviation
of the noise is fixed (Table 2). Opposed to the data-based analysis, the correct model is known such that precision, recall,
and goodness-of-fit can be evaluated.
5.1 Setup
Concentration–response data sets are generated for 10,000 genes for each of the six considered models and for the null
case, as well as for three different numbers of replicates 𝑛𝑖 and a fixed standard deviation to range ratio (see Table 2).
Details on how the range and𝑛standard deviation are chosen are explained in the following. The null case means that aconstant model is used to gener𝑖ate the data. In order to have a realistic data generation process, it is based on the real VPAdata set. For each con𝑓sid=er𝑓e(d𝑑, 𝜽,)a∈data set structu=ra7lly similar to the VPA data set but with 70,000 genes, 10,000 for eachof the six nonflat models, and 10,000for the flat𝑞n(0u.l5l)model, is generated as follows.Consider a model where | | and for the null case, 𝑓 = 𝑓(𝑑, 𝜽) = 𝑓(𝑑, 𝑐) = 𝑐 > 0. The assumedratio of standard deviation to range denoted by is explained below. Define ∗(𝑓) as the set of all genes 𝑔 for which
Gene count
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 893
TABLE 5 Summary of the simulation results, stratified by correct model and chosen model, respectively. Signif. genes (%) is the rate of
FDR adjusted, detected dose–response signals among the 10,000 generated dose–response data for each model, with 𝑛𝑖 replicates at each dose
level. For the recall rate, th𝒏e model is the correct m𝑬 odel. For the precision rate, the model is the selected modelMeasure 𝒊 Beta 𝐦𝐚𝐱 Exponential Flat Linear Quadratic sig. 𝑬𝐦𝐚𝐱
3 0.989 0.982 0.979 0.037 0.988 0.988 0.997
Signif. genes (%) 5 1.000 1.000 1.000 0.042 1.000 1.000 1.000
10 1.000 1.000 1.000 0.045 1.000 1.000 1.000
3 0.447 0.496 0.561 0.963 0.705 0.543 0.471
Recall 5 0.567 0.552 0.657 0.958 0.750 0.638 0.567
10 0.712 0.616 0.754 0.955 0.770 0.723 0.717
3 0.582 0.658 0.682 0.925 0.416 0.547 0.510
Precision 5 0.627 0.746 0.745 0.999 0.529 0.565 0.586
10 0.676 0.790 0.822 1.000 0.713 0.614 0.689
model 𝑓 was the selected winner model by AIC in the VPA data set in Analysis I. Further, (𝑓) is a sample of 10,000
genes drawn with replacement𝑓f(r𝑔o)(m𝑑,?̂?∗)(𝑓). For a g𝑓ene 𝑔 ∈ (𝑓), the true und𝑔erlying concentration–response relationshipis assumed to be the mod𝑐el fit of model on the VPA d𝑔ata of gene . For the null model, the mean response ?̄?
(𝑔)
is used as true value for .
Given this true concentration–response relationship of gene , noise is added to generate a data set according to the
m𝑓(o𝑔)d(e𝑑l𝑖 ,e?̂?q)u+ati𝑒o𝑖𝑗n (𝑗1)=. F1o,r…co, 𝑛n𝑖centration levels 𝑑 ∈ {𝑑1, … , 𝑑8} used in the original experiment (see Section 2), generate 𝑦𝜎 𝑖
(𝑗𝑔) =
max (𝑓 (𝑑, , ?̂?))(−𝑓(𝑔),m𝑠)i.∶nT=he(a𝑓dde((𝑑d𝑓,n(?̂?𝑔o)))i)s⋅e𝑞v(a𝑠l)ues 𝑒𝑖𝑗𝑞a(r𝑠e) independently drawn from 𝜀 ∼  (0, (𝜎(𝑓
(𝑔), 𝑠))2). If 𝑓(𝑔) is not
the null case(𝑔,)then ran(g𝑔)e , where the range for a gene with model 𝑓 is calculated as range(𝑓(𝑔)) ∶=𝑑∈{𝑑1,…,𝑑8}𝑔 𝑑∈{𝑑1,…,𝑑8} . The term is the empirical 𝑠-quantile of the ratio 𝑆(𝑔) (𝑔)all genes with a detected signal and their respect𝑠iv=e 0fi.t5s in Analysis𝑞I(.0H.5e)re=, (0𝑆.3(𝑔2)2)22= ∑𝑘𝑖=1∑𝑛𝑗=𝑖 1(𝑦𝑖(𝑔)
∕−ra?̄?n(𝑔g)e)(2𝑓∕(𝑁)−ac𝑗 𝑖 𝑘ro)ssis
the estim𝜎(ated variance for each gene. Hence, for , we obtain , which is used for all nonflat modelsand all𝑞ge𝑓n((e𝑔0s).,5t𝑠o))=c=a0l𝑞c
(𝑔)
.1u9la0(t9𝑠e)𝜎(𝑓, 0.5) (Appendix, Figure A2). If 𝑓
(𝑔) is the null case, then range(𝑓(𝑔)) = 0. In this case,
we use null , which is the empirical 𝑠-quantile of 𝑆 calculated for nonsignificant genes 𝑔 of Analysis I. We
obtain null (Appendix, Figure A3).
Using an adapted standard deviation per gene and model is preferred over using a fixed standard deviation, because it
allows for comparability between different models and ranges with respect to goodness-of-fit (Kappenberg et al., 2021).
Finally, the generated data set is analyzed as the original VPA data set in Analysis I.
5.2 Results
Table 5 summarizes the results of the simulation w.r.t. signal detection (power), recall, and precision. For 𝑛𝑖 = 3, which
mimics the conditions in the real VPA data set, a signal is almost always detected if it is present. However, the recall
and precision rates for n𝐸onlinear and nonflat models for this scenario are below 0.69. If 𝑛𝑖 = 10, the rates of these modelselection errors are still lamrgaxe, even though the sample size𝑁 = 8 ⋅ 𝑛𝑖 = 80 is rather large in the context of toxicology. Forexample, if the sigmoid model is correct, for 31.1% of the generate𝑛d dose–response data another model is𝐸inmcaoxrrectlyselected. Due to the penalty term of the AIC used in model selection, complex correct models as the sigmoid or thebeta model have a comparatively large increase in recall rates when 𝑖 is increased. For comparatively noisy scenarios,
these models are rarely selected. The opposite holds for the least complex model, the linear model. It has a comparatively
very low precision rate (41.6%) and very high recall rate (70.5%) for 𝑛𝑖 = 3, but not for 𝑛𝑖 = 10. Precision values naturally
have more practical value, as they give insight on how confident one can be with the model selection. The precision rate
increases from 92.5%𝑛t𝑖o 99.9% for the flat model if 𝑛𝑖 increases from 3 to 5. For nonflat models, the precision rate does notexceed 82.2% at any .
In practice, often one is not mainly interested in selecting the true underlying model but to have a sufficiently good
fit. Figure 8 summarizes the relative loss in model fit by considering the log-ratio in root-mean-square error (RMSE)
between the winner and the true model, that is, between the actually selected model and the fitted model if the correct
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
894 DUDA et al.
ni3 ni5 ni10
4
Winner model
betaMod
2 emax
exponential
linear
quadratic
0 sigEmax
−2 7686 7534 8235 16958 9921 9251 9038 7399 8822 14196 11279 9675 10523 7794 9167 10804 11763 10400
F IGURE 8 Distributi𝑅on𝑀s𝑆o𝐸f log2(𝑅𝑀𝑆𝐸winner∕𝑅𝑀𝑆𝐸true) in the simulation. 𝑅𝑀𝑆𝐸true is the root mean squared error (RMSE) if thecorrect model is fitted and winner the RMSE of the selected winner model. For 108 genes, the log-ratio is greater than 5 and not displayed
model is selected. For both, the RMSE is calculated with respect to the true dose–response curve that is known in the
simulation. If the correct model is actually selected, this ratio is 0, because the true and selected model are the same. If
the ratio is greater than one, then the selected model differs from the correct model and has a worse RMSE. The ratio
of the RMSE values is independent of 𝑛𝑖 and of the range of the respective gene. It only captures the effect of the model
selection.
In general, the relative loss inRMSEdecrease𝑛s𝑖 w=it3h increasing𝑛𝑖 but is still present for𝑛𝑖 = 10, although𝑁 = 8 ⋅ 𝑛𝑖 = 80is already a large sample size in toxicology. For themedian of the log-ratio is 0.000𝑛0𝑖for all winnermodels, but 0.0543for the linear model. This demonstrates the low precision of the linear model for𝐸small . For small 𝑛𝑖 = 3, the penalty ofthe AIC is comparatively2large,=yi1e.l4d8i0ng too2many=se1l.e5c0t9ions of the simpler linearmmaxodel in cases where a more complexmodel m𝐸ight be required. If a more complex model such as the beta or sigmoid model is selected, the ratio’s upperquartile armealxargest with 0.464 and 0.594𝐸 , respectively. Hence, for 25% of the generated genes where thesigmoid mo𝑛del is smelaexcted, the selection is not correct and the RMSE is at least 50.9% larger than the RMSE of thecorrect model. For𝑖 the and for the exponential model, the upper quartiles of the ratio are closest to zero for each 𝑛𝑖 .𝐸With increasing , the penalty term of the AIC becomes comparatively weak. For 𝑛𝑖 = 10, this heavily affects the linearmmoadxel. It is selected less often and has log-ratios closely concentrated around zero. For the beta, quadratic, and sigmoidmodel, the upper quartile of the log-ratios remain comparatively2f0a.r23f5ro=m10.1.7I7f the beta model is selected, for 25% ofthe generated genes the selection is incorrect and the RMSE is at least times the RMSE if the correct model
was selected. Thus, not selecting the correct model results in a noteworthy relative loss in goodness-of-fit, even when
larger sample sizes are used in toxicology.
6 CONCLUSION
In this work, MCP-Mod was used as model selection approach for gene expression concentration–response data. For the
data set at hand, human embryonic stem cells were exposed to varying concentrations of VPA. For 54,675 probe sets or
g𝐸enes the expression is measured. The data set indicated that modeling gene expression concentration–response dataremqauxires the consideration of several models, that is, a candidate model set. Only considering the popular 𝐸max or sigmoidmodel might not be sufficient. Especially nonmonotone models like the quadratic model should also be taken into
account. When usingMCP-Mod, frequent selections of a linear model should not be misinterpreted as evidence for a true,
linear concentration–response relationship. A large noise-to-signal ratio, or, more precisely, a large standard deviation to
true response range ratio, favors the selection of the linearmodel. Also, there is typically no notable loss in goodness-of-fit,
when instead of the linear modelthe second be𝐸st model is use𝐸d.Using a newly proposed score, , it was obsermvaexd that the max model can be omitted from the candidate set despite itspopularity, as long as themore general sigmoid model is included in the candidate set. Further, the score discourages
to omit the linearmodel, even though it can be easily substitutedwith respect to goodness-of-fit. Including the linearmodel
in the candidate set aims to detect unclear concentration–response signals rather than modeling detected signals well. If
the linear model is omitted, one might fail to identify potentially interesting genes. Simulation studies based on the data
set indicate that the confidence in the correctness of the selected model, measured by the precision, is not very high.
log2RMSEwinner RMSEtrue
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 895
Even when increasing the sample size per concentration from 3 to 10, which is very large for this type of toxicological
experiments, the precision of nonflat models does not exceed 0.83. Thus, increasing the number of experiments does
not increase the precision in model selection proportionally. The relative loss in goodness-of-fit due to model selection
mistakes decreases withincreasing sample size, but remains notable even for 10 replicates per concentration.The newly proposed  score served as a help to summarize analysis and simulation results. For a given candidate
set, it considers the power, that is, number of detected genes, and the goodness-of-fit of genes with a detected signal
simultaneously. Despite its simple form, its interpretability is not straightforward, which allows for improvements. If one
does not want to consider both power and goodness-of-fit at the same time but, for example, focuses on optimizing power,
the score is not an adequate tool.
The data basis of this work is a single data set, which, despite its size and quality, is an obvious limitation. Similar anal-
yses on other gene expression concentration–response data would be valuable to confirm our results. Another promising
approach, which is not considered in this work, is model averaging. It would be interesting to analyze the influence of the
different approaches and parameters on target dose estimation.
ACKNOWLEDGMENTS
This work has received funding from LivSysTransfer (BMBF: Project Number 031L0119) and was partially supported by
the Research Training Group “Biostatistical Methods for High-Dimensional Data in Toxicology” (RTG 2624, Project P2)
funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation: Project Number 427806116). The
authors would like to thank the anonymous reviewers for their numerous helpful comments.
CONFL ICT OF INTEREST
The authors have declared no conflict of interest.
DATA AVAILAB IL ITY STATEMENT
The original data that support the findings of this study are openly available in ArrayExpress as stated by Krug et al. (2013)
(https://doi.org/10.1007/s00204-012-0967-3).
OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to
reproduce the reported results. The data is available in the Supporting Information section.
This article has earned an open data badge “Reproducible Research” for making publicly available the code necessary
to reproduce the reported results. The results reported in this article could fully be reproduced.
ORCID
JuliaC.Duda https://orcid.org/0000-0002-3894-0124
FranziskaKappenberg https://orcid.org/0000-0001-8066-5333
JörgRahnenführer https://orcid.org/0000-0002-8947-440X
REFERENCES
Abelson, R. P., & Tukey, J. W. (1963). Efficient utilization of non-numerical information in quantitative analysis general theory and the case of
simple order. The Annals of Mathematical Statistics, 34(4), 1347–1369.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the
Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Bornkamp, B., Bretz, F., Dette, H., & Pinheiro, J. (2011).. Response-adaptive dose-finding under model uncertainty. The Annals of Applied
Statistics, 5(2B), https://doi.org/10.1214/10-aoas445
Bornkamp, B., Pinheiro, J., &Bretz, F. (2009).MCPMod:AnRPackage for theDesign andAnalysis ofDose-Finding Studies. Journal of Statistical
Software, 29(7), https://doi.org/10.18637/jss.v029.i07
Bretz, F., Pinheiro, J. C., & Branson,M. (2005). Combiningmultiple comparisons andmodeling techniques in dose-response studies. Biometrics,
61(3), 738–748.
Buckland, M., & Gey, F. (1994). The relationship between recall and precision. Journal of the American Society for Information Science, 45(1),
12–19.
Claeskens, G., & Hjort, N. L. (2008).Model selection and model averaging. Cambridge Books.
Cotariu, D., Zaidman, J. L. (1991). Developmental toxicity of valproic acid. Life Sciences, 48(14), 1341–1350.
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
896 DUDA et al.
European Medicines Agency (2015). Qualification opinion of mcp-mod as an efficient statistical ethodology for model-based design and analysis
of phase II dose finding studies under model uncertainty. https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/draft-
qualification-opinion-mcp-mod-efficient-statistical-methodology-model-based-design-analysis_en.pdf.
Filer, D. L., Kothiya, P., Setzer, R. W., Judson, R. S., & Martin, M. T. (2016). tcpl: the ToxCast pipeline for high-throughput screening data.
Bioinformatics, btw680, https://doi.org/10.1093/bioinformatics/btw680
Food and Drug Administration. (2016). Determination letter. https://www.fda.gov/media/99313/download.
Genton, P., Semah, F., & Trinka, E. (2006). Valproic acid in epilepsy. Drug Safety, 29(1), 1–21.
House, J. S., Grimm, F. A., Jima, D. D., Zhou, Y.-H., Rusyn, I., & Wright, F. A. (2017). A pipeline for high-throughput concentration response
modeling of gene expression for toxicogenomics. Frontiers in Genetics, 8, 168.
Kappenberg, F., Grinberg, M., Jiang, X., Kopp-Schneider, A., Hengstler, J. G., & Rahnenführer, J. (2021). Comparison of observation-based and
model-based identification of alert concentrations from concentration–expression data. Bioinformatics, 37(14), 1990–1996.
Krug, A. K., Kolde, R., Gaspar, J. A., Rempel, E., Balmer, N. V., Meganathan, K., Vojnits, K., Baquié, M., Waldmann, T., Ensenat-Waser, R.,
Evans, R. M., Julien, S., Kortenkamp, A., Hescheler, J., Hothorn, L., Bremer, S., van Thriel, C., Krause, K.-H., Hengstler, J. G., Rahnenführer,
J., Leist, M., & Sachinidis, A. (2013). Human embryonic stem cell-derived test systems for developmental neurotoxicity: A transcriptomics
approach. Archives of Toxicology, 87(1), 123–143.
O’Quigley, J., Iasonos, A., & Bornkamp, B. (2017).Handbook of methods for designing, monitoring, and analyzing dose-finding trials. Handbooks
of Modern Statistical Methods, CRC Press.
Pinheiro, J., Bornkamp, B., & Bretz, F. (2006). Design and analysis of dose-finding studies combining multiple comparisons and modeling
procedures. Journal of Biopharmaceutical Statistics, 16(5), 639–656.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Schorning, K., Bornkamp, B., Bretz, F., &Dette, H. (2016). Model selection versusmodel averaging in dose finding studies. Statistics inMedicine,
35(22), 4021–4040.
Thomas, N., Sweeney, K., & Somayaji, V. (2014). Meta-analysis of clinical dose–response in a large drug development portfolio. Statistics in
Biopharmaceutical Research, 6(4), 302–317.
Ting, N. (2006). Dose finding in drug development. Springer Science & Business Media.
Wheeler, M. W., & Bailer, A. J. (2009). Comparing model averaging with other model selection strategies for benchmark dose estimation.
Environmental and Ecological Statistics, 16(1), 37–51.
Xun, X. & Bretz, F. (2017). TheMCP-Modmethodology: Practical considerations and the dose finding r package. In J. O’Quigley, A. Iasonos, & B.
Bornkamp (Eds.), Handbook of methods for designing, monitoring and analyzing dose-finding trials (pp. 205-227). Chapman and Hall/CRC.
SUPPORT ING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher’s website.
How to cite this article: Duda, J.C., Kappenberg, F., & Rahnenführer, J. (2022). Model selection characteristics
when using MCP-Mod for dose–response gene expression data. Biometrical Journal, 64, 883–897.
https://doi.org/10.1002/bimj.202000250
APPENDIX
300
150 100
75
100 200
50
50 100
25
0 0 0
0 500 1000 1500 −5 0 5 10 5 10
ED50estimate Emaxestimate E0estimate
F IGURE A1 Barplots of parameter estimates of the 𝐸max model for genes where it was selected as winner model by the AIC
Gene count
Gene count
Gene count
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
DUDA et al. 897
1500
1000
500
0
0.0 0.2 0.4 0.6 0.8
S / range
𝑞F(I0G.1)U=R0E.14A272 𝑞(0B.5a)rp=lo0t.o3f22𝑆2to rang𝑞e(0r.a9t)io=fo0r.5g2e4n5es with an FDR adjusted signifMCP in Analysis I. The vertical dotted lines are at, , and
10000
7500
5000
2500
0
0.0 0.2 0.4 0.6 0.8
S 
F IGUR
and 𝑞null(0E.9)A=30.28B9a8rplot of 𝑆 for nonsignificant genes in Analysis I. The vertical dotted lines are at 𝑞null(0.1) = 0.1236, 𝑞null(0.5) = 0.1909,
Gene count Gene count
 15214036, 2022, 5, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/bimj.202000250 by Technische Universitaet Dortmund, Wiley Online Library on [14/12/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

Article 2
Computational Toxicology 23 (2022) 100234
Contents lists available at ScienceDirect 
Computational Toxicology 
journal homepage: www.sciencedirect.com/journal/computational-toxicology 
td2pLL: An intuitive time-dose-response model for cytotoxicity data with 
varying exposure durations 
Julia Duda a,*, Jan G. Hengstler b, Jörg Rahnenführer a 
a Department of Statistics, TU Dortmund University, Vogelpothsweg 87, Dortmund 44227, Germany 
b Department of Toxicology, TU Dortmund University, Leibniz Research Centre for Working Environment and Human Factors, Ardeystr. 67, Dortmund 44139, Germany   
A R T I C L E  I N F O   A B S T R A C T   
Keywords: Statistical modeling approaches for dose-response or concentration-response analyses are often required in 
Exposure duration-concentration-response toxicological applications, especially for cytotoxicity assays. By fitting a concentration-response curve, one can 
Time-dose-response derive target concentrations, such as the EC50. In practice, concentration-response data for different exposure 
Dose-response durations might be available and the target concentration for each or some exposure duration(s) are of interest. 
Exposure-duration 
Cytotoxicity In this work, we propose a statistical modeling approach that improves the precision of the target concentration 
Statistical modelling estimation at a given exposure duration by extrapolating the concentration-response data from other exposure 
durations. The method further enables target concentration estimation at exposure durations that were not 
conducted in the experiment. For practitioners, the proposed model yields additional complexity compared to the 
simple approach of a single concentration-response curve for all exposure durations. It would only be used if it 
improves the estimation of the target concentration compared to the simple approach. We propose a two-step 
pipeline to decide between using the complex and the simple approach to result in a precise target concentra-
tion estimation. 
The methods were evaluated using a simulation study and a real data set. The models are accessible for 
practitioners through the R package td2pLL.   
cell cycle [3]. This usually leads to recommended exposure durations of 
1. Introduction 24, 48, or 72 h, depending on the test. In their guidance document, the 
ICCVAM also referred to Riddell et al. [4], who first addressed the 
In cytotoxicity assays, concentration-response curves help to un- exposure duration question for cytotoxicity tests. They found large dif-
derstand the functional relationship between exposure of cells in the ferences in the toxicity of compounds between 48 h and 72 h exposure 
culture medium and viability. To conduct an assay, in addition to the duration. Possible mechanistic explanations are that some substances 
concentration, the exposure duration of the cells should also be set. Gu damage cell membranes while others affect DNA replication or cell di-
et al. [1] analyzed the relevance of the exposure or incubation duration vision. The toxic effects of the former can be observed after a short time 
on cytotoxicity in primary human hepatocytes (PHH), in which exposure and the latter only after longer exposure durations, as the cell is only 
durations of 1 or 2 days are normally administered [2]. They showed a affected during certain phases of the cell cycle [5]. Consequently, if the 
clear influence of exposure duration on EC50 values of a set of 30 com- toxicological mechanism of a compound is unknown, one should 
pounds incubated for 1, 2 and 7 days by separately fitting concentration- consider various exposure durations. If toxicity can be measured for 
response curves for each compound and incubation duration. The se- short exposure durations, it is of further interest to investigate the 
lection of one or more exposure duration(s) is therefore relevant for exposure duration dependency. 
cytotoxicity assays as it affects the target concentration estimation. An exposure duration-concentration-response (ECR) model could 
However, there is no clear guideline for selecting exposure durations for facilitate answering questions such as: What is the hypothetical limit 
cytotoxicity tests. The guidance document of the Interagency Coordi- target concentration for infinitely long exposure durations? When does 
nating Committee on the Validation of Alternative Methods (ICCVAM), increasing the exposure duration cease to decrease the target concen-
for example, states that longer exposure durations tend to enhance the tration? The mechanistic motivation of the proposed ECR model helps to 
sensitivity of a test and propose to use exposure durations of at least one answer questions relating to dependency between the exposure duration 
* Corresponding author. 
E-mail addresses: duda@statistik.tu-dortmund.de (J. Duda), hengstler@ifado.de (J.G. Hengstler), rahnenfuehrer@statistik.tu-dortmund.de (J. Rahnenführer).  
https://doi.org/10.1016/j.comtox.2022.100234    
Available online 1 June 2022
2468-1113/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
Nomenclature DMSO Dimethyl sulfoxide 
ETOH Ethanol 
Abbreviations FAM Famotidine 
td2pLL time-dose two-parameter log–logistic GLC Glucose 
EC50 effective concentration of half maximal effect HYZ Hydroxyzine 
PHH primary human hepatocytes INAH Isoniazid 
ICCVAM Interagency Coordinating Committee on the Validation of KC Ketoconazole 
Alternative Methods LAB Labetalol 
ECR exposure duration-concentration-response LEV Levofloxacin 
4pLL four-parameter log–logistic MEL Melatonin 
2pLL two-parameter log–logistic MePA Methylparaben 
RSS residual sum of squares NAC Acetylcysteine 
AMAFC average mean absolute log2 fold-change NFT Nitrofurantoin 
APAP Acetaminophen NIM Nimesulide 
ASP Aspirin PHB Phenylbutazone 
BOS Bosentan PMZ Promethazine 
BRP Buspirone PPL Propanolol 
BUSF Busulfan RIF Rifampicin 
CBZ Carbamazepine TSN Triclosan 
CHL Chlorpheniramine maleate VITC Vitamin C 
CLON Clonidine VPA Valporic acid  
DFN Diclofenac 
and the target concentration. The hypothetical target concentration the specific ECR model, proposed here in Section 2.1. In addition to the 
EC50 for an infinitely long exposure duration is just one parameter of the mechanistic motivation, the application of the ECR model on a real ECR 
model. The parameter and the answer to the question are therefore cytotoxicity data set is presented in Section 2.4, which supports the 
available as soon as the model is fitted. ECR modeling could solve the model assumptions. The benefits of our model are described, for which 
issue of exposure-duration selection for concentration-dependent cyto- the assumptions and possible modifications are justified as far as the 
toxicity testing. limited size of the available data allowed. 
Fitting an ECR model has two main advantages over fitting separate To understand Section 2.4, a brief explanation of the concept of 
concentration-response curves for each exposure duration:  overfitting may be required for readers unfamiliar with this term. 
Generally, a model with many parameters can fit to a data set more 
1. The target concentration estimation is not restricted to the exposure closely than a model with fewer parameters. A more flexible model, 
durations conducted in the experiment. As the ECR model fits a however, is subject to a high risk of overfitting the data, i.e. it even fits to 
surface over the combinations of concentration and exposure dura- small deviations in the data that are due to noise. When overfitting is 
tion, it predicts a viability response for each possible combination strong, the resulting model would perform poorly on new data if the 
including those that are not conducted in the experiment. ECR model experiment is repeated. The overfitting effect is always present, but less 
fitting thus yields a target concentration estimation that is less sen- strong or negligible for large data sets and less parameters. For small 
sitive to the selection of exposure durations than fitting separate data sets and many parameters its effect is stronger. Hence, the number 
concentration-response curves at each exposure duration.  of parameters must be accounted for, i.e. the flexibility of a model, when 
2. A benefit of an ECR model is the increased statistical precision in comparing how closely a model fits the data as a measure of model 
target concentration estimation. If one fits a one-dimensional con- performance. Otherwise, a model with more parameters will automati-
centration-response curve for each exposure duration separately, cally fit the data more closely than a model with fewer parameters, but 
only the data that belong to the respective exposure duration are the seemingly better performance in the sense of a closer fit can be a 
used for each fit. When fitting an ECR model, only a single fit is result of overfitting. 
calculated using the data from all exposure durations. This increases In addition to the proposed ECR model, we propose a two-step 
the sample size, the precision in estimating the model parameters pipeline to automatically distinguish between experimental data for 
and, in turn, the target concentration estimation. which fitting an ECR model is beneficial and where it might be unnec-
essary complicated. It is possible that concentration-response data are 
These two benefits of the ECR model over separate concentration- available for different exposure durations, but all exposure durations are 
response curve fitting at each exposure duration underline the attrac- rather long so that the target concentration (such as the EC50) does not 
tiveness of the ECR modeling approach. More recent works highlighted differ between these exposure concentrations. In this case, it would be 
the need (Morin et al. 2018) and efforts (Focke et al. 2017; Serra et al. unnecessarily complicated and would not yield better target concen-
2020) to overcome suboptimal separate concentration-response curve tration estimation when fitting the proposed ECR model. For these cases, 
fitting when concentration-response relationships at different exposure it is preferred both in terms of a better target-concentration estimations 
durations are to be analyzed. These are discussed in Section 2.2. Two- and simplicity of the model to treat the data (after normalization with 
dimensional ECR modeling has key advantages in comparison to sepa- exposure duration-wise control) as if there is only one exposure dura-
rate concentration-response modeling when different exposure dura- tion. A simple concentration-response curve should be fitted using the 
tions are conducted in the experiment and current research seeks for data of all exposure durations. To account automatically for such a 
good ECR models. scenario, we additionally propose a statistical two-step procedure. In 
The validity of the model assumptions is crucial for the major step 1, it is decided objectively whether the EC50 differs between 
improvement of ECR modeling over separate concentration-response different exposure durations. If so, the ECR model is fitted in step 2. If 
fitting. We addressed this considering the mechanistic motivation of not, a single concentration-response curve is fitted in step 2. The pro-
2
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
posed two-step pipeline helps practitioners to fit an ECR model only on 
experimental data where it is beneficial and necessary. 100 Exp. dur.
To quantify and compare the potential benefit of fitting the proposed 
ECR model and the two-step pipeline, we conducted a simulation study t = 1
based on real cytotoxicity data of Gu et al. [1], which is presented in 75 t = 2
Section 2.5. We compared the new model and pipeline to fitting separate t = 3
concentration-response curves, as well as fitting a single concentration- t = 100
response curve to all exposure durations. The latter approach effectively 
ignores that there are different exposure durations. The simulation study 50
accounts for many scenarios, such as different exposure duration effects ∆ = 0.1
on the target concentration EC50 (no, small and large effect) and 
different levels of background noise (small, medium, large) in the data. 25
A common bottleneck for implementing sound theory into practical 
usage is that sophisticated statistical methods are often difficult to grasp 
for practitioners with less profound mathematical background or there is 
a lack of software availability. These drawbacks can prevent theoreti- 0 C0 = 0.01
cally well-established methods from being used in practice. Hence, 
simplicity and good interpretation of the proposed ECR model is an 0.001 0.010 0.100 1.000
advantage compared to other works in this field, as discussed in Section Concentration
2.1. To overcome problems with practical implementation and calcu-
lations of our proposed ECR model and methods, they were made Fig. 1. Concentration-response curves of a td2pLL model at different exposure 
available through an open source R software [6] package td2pLL via durations t with parameter Δ = 0.1. Black symbols on the dotted line show how 
GitHub (https://github.com/jcduda/td2pLL/). We therefore eased the the EC50(t) moves dependent on t. 
usage of our methods for practitioners both by making our model easy to 
interpret and by providing a software implementation. with respect to the raw mean response at the control, the viability is 100 
In this study, we addressed the question as to whether there is a [%] at the control, and for large enough concentrations, the viability 
benefit in using the proposed ECR model and the two-step pipeline when tends towards 0[%]. Mathematically, this is equivalent to setting E0 =
estimating target concentrations such as the EC50 for cytotoxicity data, 100 and Emax = −100 in Eq. (1), yielding: 
where different exposure durations are conducted in the experiment. ( ) xh
The methods have intrinsic advantages compared to fitting separate f x = 100 − 100 h . (2) xh + EC
concentration-response curves for each exposure duration and they 50
improve target concentration estimation, which was demonstrated in a Hence, for cytotoxicity one can reduce the 4pLL model to the 2pLL 
simulation study based on real data. model, as only two parameters, EC50 and h, remain to be estimated. In 
We note that in the mathematical literature, time-dose-response model cytotoxicity assays it can occur that the viability does not reach 0%. For 
is the general term and hence motivates the chosen name td2pLL (time- an initial development of an ECR model, we restrict ourselves to the 
dose two-parameter log–logistic) model. However, we used the biolog- reasonable assumption of 0% viability at large concentration. Examples 
ically correct term exposure duration-concentration-response (ECR) model of 2pLL curves are presented in Fig. 1. 
throughout this manuscript. The 2pLL concentration-response model depends only on the con-
centration. To incorporate exposure duration, the EC50 parameter is 
2. Materials and methods modelled to be dependent on the exposure duration, t, by setting EC50 =
EC50(t). This is intuitive because, e.g., for a longer exposure duration, 
2.1. Exposure duration-concentration-response model one expects a smaller concentration to yield 50% viability. How exactly 
the EC50(t) changes according to the exposure duration, t, needs to be 
In this section, we explain how the proposed ECR model is an formulated mathematically. We therefore applied a modified version of 
extension of a commonly used concentration-response curve. A detailed Haber’s law [10] following Miller et al. [11]. Originally, Haber postu-
explanation if given of how the parameters of the model can be inter- lated that a lethal effect of a compound is determined by multiplying the 
preted in terms of toxicological research interests. concentration of the compound and the exposure duration, i.e. effect  =
For cytotoxicity assays, the most popular model used for describing concentration ⋅ exposure duration. For exposure duration t = 1, we 
concentration-response curves is the sigmoidal four-parameter log–lo- denote the lethal effect for which half of the cells die, by Δ̃. By definition, 
gistic (4pLL) model ([7–9]): the corresponding concentration is the EC50: 
( )
xh
f x = E0 + Emax h , (1)  
Δ̃ = EC50⋅t.
xh + EC50 Miller et al. [11] further introduced a parameter C0 as a lower limit 
where x is the concentration and f(x) the viability in %. The Hill or slope for EC50(t). The parameter C0 can be interpreted as the EC50 at an infi-
parameter, h, represents the steepness of the curve, with a greater value nitely large exposure duration. This leads to the equation 
indicating a steeper curve. EC50 is the concentration where the half- Δ̃ = (EC − C )⋅t.
maximal effect is reached, i.e. half of Emax. For clarity, note that the 
50 0
other well-known parametrization ([7]) of the 4pLL model is Lastly, concentration and exposure duration might not change the 
⎛ ⎞ lethal effect equally when increased or decreased by one unit. To capture 
d − c d − c this, different exponents, α and β, were added in the model: 
f ⎜⎝x⎟⎠ = c + = c +1 ( )+ exp{b(log(x) − log(e))} 1 x b  + e Δ̃ = (EC50 − C α0) ⋅tβ.
where d is the upper asymptote, c is the lower asymptote, e is the EC Solving the equation for EC50 50 yields: 
and b is h. For viability assays, one can assume that after normalization 
3
Response
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
Fig. 2. Explanation on how to interpret the 
modeled influence of exposure duration on 
EC50 (red line) when fitting the proposed 
exposure duration-concentration-response 
(ECR) model with four ((a)-(d)) example 
parametrizations of the proposed ECR model, 
td2pLL. They have the same steepness 
parameter h = 2 and the same Δ parameter. 
Δ is the difference in EC50 values at exposure 
duration t = 1 and hypothetical exposure 
duration t→∞, where EC50(t→∞) = C0. 
Comparing the top row ((a), (b)) with the 
bottom row ((c), (d)), γ increases, which ex-
plains the stronger change of the EC50 at 
short exposure durations. In the right column 
((b), (d)), the EC50 does not appear to change 
much between different exposure durations 
compared to the left column ((a), (c)). This is 
because the concentration is visualized on a 
logarithmic scale and the fold-change be-
tween C0 +Δ = 0.2 (= EC50 at exposure 
duration t = 1) and C0 = 0.1 is only 0.2/
0.1 = 2, compared to 11 in (a) and (c).   
1/α
Δ̃ t−β/α C EC concentration. The parameters that define how the EC50 changes for ⏟̅⏞⏞̅⏟ + 0 = 50. different exposure durations are C0, γ, and Δ and can be interpreted as 
:=Δ
follows:  
Mathematically, the above parametrization does not lead to a unique 
solution when fitting the model to an appropriate data set. A re- • C0 is the limit EC50 for a hypothetical, infinitely long exposure 
parametrization circumvents this technical problem, yielding the final duration. For increasing exposure duration t, EC50(t) approaches C0 
dependency of the EC50 on the exposure duration: (if γ > 0, cf. Figs. 2 and 1).  
( ) • γ > 0 indicates when the EC50(t) changes most between different 
⇔ γ=β/αΔ⋅t−γ + C0 = EC50 t . exposure durations. A larger γ indicates a stronger initial change of 
Note that as soon as the parameters, Δ, γ and C are known/esti- the EC50 followed by quickly reaching a plateau in the EC50. This 0 
mated, the EC50(t) can be derived/estimated directly at any exposure means that for shorter exposure durations the EC50 differs markedly 
duration t of interest using the above formula. There is no restriction to and for larger exposure durations the EC50 does not change much as 
exposure durations conducted in the experiment. By plugging EC (t) it is already close to its limit C0. A smaller γ indicates a more 50
into the 2pLL model, the new td2pLL ECR model is obtained: balanced change of the EC50 between exposure durations. This 
( ) means that the EC50 differs considerably even for two comparatively 
xh long exposure durations. In other words, the EC50 reaches a plateau f t, x = 100 − 100 . (3)  
xh + (Δ⋅t−γ C h+ ) less quickly, i.e. at even longer exposure durations compared to 0
larger γ > 0. 
Examples of the model are visualized in Fig. 2. The interpretation of the • γ < 0 is the unintuitive case that the EC50 increases when the expo-
slope parameter, h, remains as in the standard concentration-response sure duration increases. This increase could be valid if the substance 
2pLL model: A larger h leads to a steeper curve alongside the 
4
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
is beneficial instead of toxic. Mathematically, the model contains no 
constraints on γ, but practically this case is typically irrelevant.  
• Δ is the difference in EC50 values at exposure duration t = 1 and Significant
exposure duration t = ∞, i.e. the limit EC50 value for infinitely large 
exposure durations, which is C . Δ hence indicates the range of the difference0
EC50 values for exposure durations t⩾1 (cf. Fig. 1). The EC50 at Step 1 between 50s 
exposure duration t = 1 is always Δ + C0. An example of the inter-
pretation of Δ and C is described below. Mathematically, EC t across exposure0 50( =
1) −EC50(t→∞) = (Δ + C0) −C0 = Δ. durations?
In the following we explain the interpretation of the model param-
eters C0 and Δ using examples (Figs. 2 and 1). When considering fold 
changes in the EC50 estimates between the exposure duration t = 1 and 
the hypothetical exposure duration, t→∞, both estimates of Δ and C0 
should be conducted simultaneously. For example, given the estimates 
C0 = 0.01 and Δ = 0.1, the fold change is 11. This can be derived Fit td2pLL Fit single 2pLL
directly from the model parameters by Step 2 (Account for effect of (Ignore effect of exposure
exposure duration) duration)
EC50(t = 1)/EC50(t→∞) = (C0 + Δ)/C0 = (0.01 + 0.1)/0.01 = 11.
However, if the estimated EC at an infinitely long exposure duration is Fig. 3. The proposed two-step pipeline decides in an objective, statistical 50 manner if the two-dimensional model should be used to model both concen-
C0 = 0.1 while still Δ = 0.1, then there is only a fold change of tration- and exposure duration-dependency, or if it suffices to use a one- 
(0.1 +0.1)/0.1 = 2 in the EC50 value between exposure durations t = 1 dimensional model that only includes the concentration. 
and t→∞. In summary, the proposed ECR model, td2pLL, has a clear 
derivation as it naturally extends the commonly used sigmoidal the experimental data, we propose a two-step pipeline. Step 1 involves 
concentration-response using Haber’s law [10]. deciding statistically if the exposure duration has an influence on the 
EC50. If it does, an ECR model is used for fitting in step 2. If not, a single, 
2.2. Comparison to other approaches one-dimensional 2pLL concentration-response curve is fitted on all data 
in step 2 and the information on exposure duration is ignored (Fig. 3). 
Efforts on modeling ECR data have been increasing lately. Focke For step 1, an ANOVA-based test can be used. Statistically, the hy-
et al. [12] used a similar approach to ours, but the exposure duration- pothesis that the EC50 is the same across all considered exposure dura-
response relationship was modeled with a sigmoidal function and the tions ti is tested against the hypothesis that it is different for any pair of 
concentration component was added according to Haber’s law. This values t1 and t2. Here, a typical signal plus noise model 
resulted in a different ECR model. There was no software implementa- ( )
tion provided by Focke et al. [12]. In the work of Schüttler et al. [13], an yijk = f ti, xj + εijk (4)  
ECR model for toxicogenomic data was proposed where the 
concentration-response relationship was a 4pLL model and the exposure with i = 1, …, k exposure duration levels, j = 1, …, l concentration 
duration was also incorporated using the EC50 parameter. However, the levels, nij replicates at exposure duration-concentration setting (ti, xj), 2
dependence of the EC50 parameter was modeled using a non-monotone and corresponding noise εijk ∼ N (0, σ ) is assumed. 
logarithmic Gaussian function. The authors justified this type of de- The two nested models Q0 and Q1 are then compared. Model Q0 is a 
pendency with its good empirical fit to toxicogenomic data. For cyto- regular 2pLL concentration-response model that is fitted to the data, 
toxicity data, however, a non-monotone exposure duration dependency where exposure duration is completely neglected. Model Q1 allows for 
is less realistic. A software implementation was provided by Schüttler each exposure duration an individual EC50 parameter. The remaining 
et al. [13] through an R-package. In the R-package TinderMIX intro- hill parameter h remains shared across exposure duration levels. Given 
duced by Serra et al. [14], ECR toxicogenomic data were modeled using the assumption of normally distributed errors, the (nested) ANOVA 
linear regression with polynomials with a maximal order of three. One statistic is 
obvious limitation is that these models might predict unrealistic (e.g. ( )/( )RSSQ − RSSQ η − ν
negative) response values for exposure duration-concentration settings F 0= /1 , (5)  RSS ν
that are not close to the exposure duration-concentration settings of the Q1
experiment. Furthermore, the model parameters are not easily inter- where η are the residual degrees of freedom for Q , ν are the residual 
pretable if quadratic terms or terms of a higher order are included. With 0degrees of freedom for Q , and RSS is the residual sum of squares. The 
td2pLL, we fill a scientific gap by providing a mechanistically motivated, 1test rejects the hypothesis that Q0 is the true model at significance level easily interpretable model for cytotoxicity data that is embedded in an α, if the observed value for F is greater than F(1 −α, η −ν, ν), the 1 −α 
R-package to ease its application for practitioners. quantile of an F-distribution with η −ν and ν degrees of freedom. A 
rejection of the null hypothesis that Q0 is the true model means that the 
2.3. Two-step pipeline for exposure duration-concentration-response data hypothesis that there is no exposure duration-dependency for the EC50 is 
rejected. 
Even when exposure duration-resolved and concentration-resolved Hence, rejecting the null hypothesis in step 1 leads to fitting the ECR 
toxicity data are available, they might not exhibit a clear exposure model in step 2. If the test does not reject the null hypothesis in step 1, 
duration-effect. One possible reason is that the experimentally chosen information on exposure duration is ignored and a single 2pLL 
exposure durations all lie within a range where the exposure duration- concentration-response model is fitted in step 2. The significance level α 
effect is already saturated and the EC50 no longer changes. For such must be chosen in advance. 
cases, it is unnecessarily complicated and not appropriate to use the The two-step pipeline provides an objective framework to make a 
proposed ECR model. Instead, ignoring the information on exposure decision regarding the data as to whether it is beneficial to account for 
duration and fitting a single concentration-response curve would be possible differences in target concentrations across different exposure 
considered reasonable. To automatically decide which case applies to 
5
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
Table 1 2.5. Simulation study - setup 
Overview of the simulation study setup that compares the precision of the EC50 
estimation between the proposed method (Two-Step) and other methods, based 2.5.1. Overview 
on various simulated scenarios that use real cytotoxicity data from Gu et al. [1].  To compare the performance of our method to other statistical 
Aim Comparing the performance of the proposed exposure analysis approaches, we used a simulation study based on the cytotox-
duration-concentration-response modeling approach (Two- icity data of Gu et al. [1]. The computations are performed with the 
step pipeline) for toxicity assays with respect to target td2pLL R package (version 1.0.0.). In such a simulation study, one as-
concentration estimation precision with other approaches.  
sumes different scenarios of ECR relationships as true. Given such a true 
scenario, data can be generated and the methods are applied on the 
Data Generating Based on data from Gu et al. [1] with models M ∈ {M0, M1 , generated data to estimate a target concentration for the assumed, true 
Mechanism M2} =: M where 
- M is the scenario where the EC does not change across ECR scenario. For reliable simulation results, for each scenario, the data 0 50 
exposure durations, generation and method application process was repeated many (nsim =
- M1 is the scenario where the EC50 changes moderately 1000) times. Performance of the methods was assessed by averaging 
across exposure durations, 
- M is the scenario where the EC changes a lot between precision of target concentration estimation, ÊC50, across the different 2 50 
exposure durations.  simulation scenarios. 
-Experimental design An overview of the main components of the simulation study using 
-k ∈ {3, 4} =: K different exposure durations the ADEMP summarization principle [16] (Aim, Data Generating 
-nobs ∈ {72, 216} =: N obs observations 
-(Normal) noise N N N N Mechanism, Estimands, Methods, Performance) is provided in Table 1. ∈ { 1, 2 , 3} = : N oise  In our simulation study, the general estimand was the target concen-
tration EC . To enhance the generalization of the simulation study re-
Estimands EC50 (Concentration causing a response that is 50 percent of 
50
the maximum achievable)  sults on real data, many factors of data generation were crossed with 
each other to generate a large, robust pool of simulation scenarios. 
To understand the presentation of the results in Section 3.2, impor-
Methods -Two-Step: Proposed two-step-procedure with ANOVA pre- 
test in step 1 and td2pLL ECR model fit or 2pLL tant factors of the data generation should be noted. To mimic the pos-
concentration-response model fit with ignored exposure sibility of different magnitudes of influence of the exposure duration on 
duration, respectively, in step 2. the target concentration in real data, in the simulation study we 
-Always td2pLL ECR model fit considered three levels of influence of the exposure duration on the 
-Always separate 2pLL concentration-response model fit 
per exposure duration target concentration EC50, namely no (M0), little (M1) and strong (M2) 
-Always single 2pLL concentration-response model fit influence. We further accounted for varying levels of background noise 
neglecting exposure duration and thus pooling with respect in real data by including little (N1), medium (N2) and strong (N3) 
to exposure durations  background noise scenarios in the simulated data. In the following, 
further detailed explanations on the data generating mechanism, 
Performance -AMAFC: Mean of the absolute log2 fold change of estimated methods, and performance measures are presented. 
and true EC50 across exposure durations and averaged over 
simulation replications. A small AMAFC is desirable.  2.5.2. Details 
To generate ECR data for the simulation study, the assumed true 
durations by fitting the proposed ECR model, or if it is sufficient to fit a model M, the experimental design, and the variability N of the added 
concentration-response curve. background noise must be specified. Three models representing 
different types of exposure duration dependency were used. These 
2.4. Data models were derived from models fitted on real data. A strong exposure 
duration dependency means that the EC50 differs a lot between different 
The data on which our simulation study is based come from Gu et al. exposure durations. For a strong (M2) and a weak (M1) exposure dura-
[1] where cytotoxicity testing was performed on primary human hepa- tion dependency, the td2pLL model was fitted to data from two com-
tocytes. These data are publicly available and the publication contains pounds tested by Gu et al. [1], namely chlorpheniramine (CHL) and 
details of the laboratory protocol. The data set contains concentration- ethanol, respectively. A model without exposure duration dependency 
response data of 30 compounds. For each compound, 3 biological rep- (M0) was derived from the fitted model M1 by setting γ = 0, which 
licates (donors) are available. For each donor, there are measurements removed the influence of the exposure duration on the viability response 
of 3 exposure durations (1, 2 and 7 days) and (including solvent) 6, 7, or such that the EC50 was the same across all exposure durations (Fig. 4). To 
8 concentrations (mostly 6). The concentrations are equidistant on a log- enhance comparability, the concentration ranges of the compounds 
√̅̅̅̅̅̅
scale with base 10. There are 4 or 8 measurements per compound, were rescaled to be between 0 and 1. 
donor, exposure duration and concentration. Note that the data was To cover more aspects of a real experiment in the simulation sce-
normalized: For each compound, the viability values per donor were narios, we varied the number of exposure durations in the generated 
divided by the respective donor-solvent response value and multiplied data. Given a true ECR model M from the set {M0, M1, M2}, the true 
by 100. This normalization implicitly assumes that compound, donor response values are calculated at either k = 3 or k = 4 different exposure 
and exposure duration only affect the response at concentration 0. Other durations as depicted in Fig. 5. For 3 exposure durations, the selected 
correlations cannot be accounted for with this typical normalization exposure duration-concentration points (ti, xj) agree with the ones 
procedure. More over, the ‘retting’ approach by Kappenberg et al. [15] selected in the real experiment of Gu et al. [1] for compound CHL. Note 
was applied as an additional pre-processing step, to better justify the that in Gu et al. an effect of the exposure duration was anticipated, such 
assumption on the asymptotes. Lastly, the concentration-range was that for the larger exposure durations of 7 (days), lower concentrations 
rescaled to [0,1] to guarantee comparison of the results across were selected. In the simulation, either nobs = 72 or nobs = 216 obser-
compounds. vations are generated with equal sample size for all exposure duration- 
concentration combinations (ti,xj). 
To understand the influence of noise, we added different levels of 
background noise to the assumed true mean responses. It was assumed 
that all data points at a specific exposure duration-concentration point 
6
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
Fig. 4. To increase the representative-
ness of the simulation study for real 
data, three different scenarios of effect 
(none, weak, large) of exposure duration 
on EC50 were included in the simulation. 
The weak (b, model M1) and large (c, 
model M2) effect are each a td2pLL 
model fit on real ECR data from Gu et al. 
[1] of the compounds ethanol and 
chlorpheniramine. The no-effect model 
M0 (a) was derived from the low-effect 
model M1 (b), with complete elimina-
tion of the exposure duration effect by 
setting γ = 0. The models M0, M1, and 
M2 served as assumed true exposure 
duration-concentration-response re-
lationships for the simulation study. 
Exact parameters of the models are: M0 
(h = 1.26, Δ = 0.07, γ = 0, C0 = 0.05), 
M1 (h = 1.26, Δ = 0.07, γ = 2.19, C0 =
0.05), M2 (h = 1.54,Δ = 0.06,γ = 2.05,
C0 = 0.02). Note that the controls could 
not be shown due to the logarithmic 
scale of the concentration axis.   
driven approach yields more realistic simulated data. For each com-
3 exposure durations 4 exposure durations pound, donor and exposure duration-concentration point (ti, xj), the 
7 12 12 12 12 12 12 9 9 9 9 9 9 standard deviation σi,j of the pre-processed (cf. Section 2.4) cytotoxicity 
measurements was calculated. Note that this procedure calculated a 
single standard deviation value from 3 to 4 measurements. A model was 
fitted to the standard deviations depending on the exposure duration, ti, 
4 9 9 9 9 9 9 and pseudo-log transformed concentrations, xi (see Figs. 5 and 6). The 
outcome (standard deviation) was modeled as a linear combination of 
2
2 12 12 12 12 12 12 9 9 9 9 9 9 intercept, concentration, concentration , exposure duration and the 
1 12 12 12 12 12 12 9 9 9 9 9 9 interaction between concentration
2 and exposure duration. This means 
that the concentration can have a quadratic effect and exposure duration 
a linear effect on the standard deviation. Only significant variables (p- 
Concentration value < 0.05) were retained in the model. The resulting model was used 
to calculate the standard deviation for the normally distributed noise ε ∼
Fig. 5. To increase the representativeness of the simulation study for real data, N (0, (σ 2i,j) ). From ε, realizations were drawn that were added to the 
the number of exposure durations was varied in the simulation. Experimental response values f (t ,x ). 
designs are shown with k 3 (left) and k 4 (right) different exposure dura- M i j= =
tions for the data generating mechanism in the simulation study. Only the cases We considered 3 scenarios with increasing noise levels. In N1 (low 
with nobs = 216 are depicted. For nobs = 72, the replicates at each exposure noise), the generated standard deviations were divided by 2, in N2 
duration-concentration point were reduced to 4 and 3, respectively. Note that a (medium noise) they were left unchanged, and in N3 (high noise), they 
pseudo-log transformation was used to display the control at concentration- were multiplied by 2. This approach of adding background in the 
level 0. simulation study closely mimicked real data background noise and 
therefore further increased the representativeness of the simulation 
(ti, xj) have the same, true mean response fM(ti, xj) of model M ∈ {M1,M1, study for real ECR data. 
M2}. Noise was added as realizations of independent, normally distrib- To compare the approaches considered in this work for fitting ECR 
uted, mean zero noise (cf. Eq. (4)). The noise standard deviation, σ, was data and predicting target concentrations, each approach was applied on 
chosen based on a linear model of the empirical standard deviations each ECR data generating scenario. All 3⋅2⋅2⋅3 = 36 combinations of 
observed in the real data of all compounds in Gu et al. [1]. This data- data generation in the space M × K × N obs × N oise were crossed with 
7
Exposure duration
0
1e−04
0.001
0.01
0.1
1
0
1e−04
0.001
0.01
0.1
1
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
ti = 1 ti = 2 ti = 7
60
40
20
0
Concentration xj
Fig. 6. The model used for determining exposure duration-concentration dependent average noise levels in the simulation study captured properties of real data 
background noise. Points are empirical standard deviations per compound, donor, concentration and exposure duration in the pre-processed data of Gu et al. [1]. The 
blue line shows a model fit to these standard errors depending on the exposure duration ti and the pseudo-log transformed concentration xj. The model fit was used to 
select exposure duration-concentration point dependent standard deviations of the noise that was added to the generated data in the simulation. Note that the model 
captures intuitive noise characteristics, such as a decrease in noise at the maximal concentration. 
each method for data fitting (Two-step, always td2pLL, always separate 
2pLL, always single 2pLL, cf. Table 1), which yielded 36⋅4 = 144 NAC
simulation scenarios with nsim = 1000 repetitions for each scenario in 
total. Note that fitting separate 2pLL curves for each exposure duration 
represented the most basic concentration-response modeling approach 
for accounting for an exposure duration effect on the target MEL VITC
concentration. 25
For each simulation scenario, the performance of a model fit and 
associated target concentration estimation was measured using the 
averaged mean absolute log2 fold change (AMAFC). For a fitted model DFN
and a fixed exposure duration, ti, ÊC50(ti) was the estimated EC50 value 
and EC50(ti) was the true value. The AMAFC was defined as the mean of DMSO KC
20 LAB TSNthe values for |log2(ÊD50(ti)/ED50(ti))|, where the mean was taken over 
the exposure durations, ti, and these means were averaged over all n
BOS
sim CBZ CHL NIM
simulation repetitions. A small AMAFC is desirable. VPA PPLNFT
In summary, the resulting simulation was based on real exposure GLC
duration-concentration response data of Gu et al. [1] and compared the ASP HYZ INAH
proposed method regarding estimation precision in target-concentration PHB APAP ETOH
estimation with several other approaches across many realistic 15 PMZ
scenarios. CLON LEV
BPR
3. Results 15 20 25
New td2pLL model: Deviation model fit to data
3.1. Real data application 
Fig. 7. The newly proposed td2pLL model for ECR data has valid assumptions 
To validate the assumptions of our newly proposed model, model fits based on its application to a small cytotoxicity data set, which allows for a more 
on real ECR data were analyzed. Validation of the assumptions of our precise target concentration estimation. The comparison of the td2pLL model 
new model is crucial as they are the basis for the benefits of the model, in fits with the classical separate 2pLL model fits per exposure duration for all 
terms of a potentially improved target concentration estimation for a compounds of Gu et al. [1] based on the deviation between the model fit and 
given exposure duration, by also exploiting data from other exposure the data, which is measured by the standard error (SE) of the residuals. A small 
durations. We therefore fitted our newly proposed model, the td2pLL SE is desirable, which may however be artificially decreased due to overfitting. 
model, to pre-processed ECR cytotoxicity data of Gu et al. [1]. We The points scatter around the diagonal line, which means that the td2pLL model 
compared the fit of the td2pLL model (approach 1) with separate 2pLL fits are comparable to those of the separate 2pLL fits. This indicates that the 
assumptions of the td2pLL model are valid. For compounds BUSF, MePA, RIF 
fits for each exposure duration (approach 2). This helped to investigate and FAM, computational problems in the separate 2pLL fitting occurred and the 
whether the td2pLL model has valid assumptions for application to data points are not shown (cf. Fig. A1). For those compounds, there was no 
cytotoxicity data, thus improving target concentration estimations. concentration-dependent decline in viability for some exposure durations. 
To properly compare the two approaches with respect to the validity 
of the td2pLL model assumptions, the statistical concept of overfitting is than approach 1, where the td2pLL was fitted only once to ECR data of 
crucial, as explained in the introduction. Approach 2 involved more all exposure durations and therefore used fewer parameters in total to fit 
parameters because for each of the three exposure durations, a new 2pLL all data of one compound. 
model with 2 parameters each was fitted. Also, for each fit only the data We reduced the overfitting effect in the comparison of the ap-
of the respective exposure duration were used. For approach one, only 4 proaches by calculating the standard error (SE) for both approaches for 
parameters were fitted. Hence, approach 2 was more prone to overfitting each compound. Note that this measure takes into account the numbers 
8
σ(ti, xj)
0
0.003
0.01
0.032
0.1
00..331168
1
0
0.001
0.003
Separate 2pLL models: Deviation model fit to data 0.01
0.032
0.1
00..331186
1
0
0.001
0.003
0.01
0.032
0.1
00..331168
1
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
PPL Fig. 8. The newly proposed td2pLL model 
leads to clear improvements in model fitting 
for some compounds (LAB, DFN) of [1] and 
less optimal fits for other compounds (PPL, 
90 90 GLC). The resulting concentration-response 
Time curves at exposure durations 1, 2 and 7 
60 60 (days - legend in upper left plot) are shown 
1 for separate 2pLL models (left column) and 
for the td2pLL model (right column). For 
30 2 30 the compounds PPL and GLC, separate fits 
7 might be more plausible, while for LAB and DFN the td2pLL model seems to stabilize the 
0 0 fit. Only concentration-wise response means 
are shown.   
04 01 32 1 16 6 04 1 32 60 e− .0 .00 .0 03 0.
1 .310 . 1 0 e− .0
0
.00 0.0
1 310 0.1 .31
6
1
3 0 0 0 0 3 0 0 0. 0
GLC
100 100
80 75
60 50
40
25
0 3.16 10 31.6 100 316 0 3.16 10 31.6 100 316
LAB
125 125
100 100
75 75
50 50
25 25
0 0
0 0.0032 0.01 0.0316 0.1 0.316 0 0.0032 0.01 0.0316 0.1 0.316
DFN
90 90
60 60
30 30
0 0
0 0.0158 0.05 0.158 0.5 1.58 0 0.0158 0.05 0.158 0.5 1.58
Concentration Concentration
9
Response Response Response Response
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
Simulated background noise level
Little (N1) Medium (N2) Large (N3)
1.00
0.75
0.50
0.25
0.00
1.00
0.75
0.50
0.25
0.00
1.00
0.75
0.50
0.25
0.00
LL L LL L L
2p LL pL tep 2p L
L pL tep 2p
L
d p 2 L
L LL p
s t
p
. 2 le −s s t
d
. 2
p p
le 
2 −s s t
d
. 2 le 
2 −s
te
ay ep ing w
o ay ep ng woi y ep g
o
Alw
S S T SAlw S
T
Alw
a S Sin T
w
Method
nobs 216 72
Fig. 9. The proposed methods, Two-step and always td2pLL, outperformed other methods regarding EC50 estimation across different simulation scenarios for 
exposure duration-concentration response cytotoxicity data. The three vertical panels indicate increased simulated background noise levels. The three horizontal 
panels correspond to an increasing effect of the exposure duration on the EC50. For M0 (upper horizontal panel), the EC50 is the same across all exposure durations. 
For M1 (middle horizontal panel), the EC50 decreases moderately for increasing exposure durations, and for M2 (lower horizontal panel) it decreases a lot for 
increasing exposure durations. Two-step is the proposed two-step pipeline with pre-test for exposure duration-dependency. Small values of a deviation between the 
estimated and the true EC50 are desirable. 
of parameters by dividing by the degrees of freedom, i.e. the number of how many exposure durations are investigated. Therefore the td2pLL 
observations minus the number of parameters. The residual SE measures model would not overfit the data. In summary, the assumptions of the 
how much the model fit deviates from the data and penalizes models new td2pLL model and thus its potential benefit in target concentration 
with more parameters. Without this penalty, approach 2 would have estimation precision are generally supported by the analysis on the small 
always fitted more closely to the data than approach 1 because of ECR data set of [1]. 
overfitting, even if the assumptions of the td2pLL model are true. If the For qualitative demonstration purposes, we have focused on four 
assumptions of the td2pLL model are true or close to the truth, we expect example compounds (PPL, DLC, LAB, DFN) where the td2pLL model 
the SE of the two approaches to be similar. The similarity of the SEs of appears to be either more or less adequate than separate 2pLL fits 
the two approaches across the compounds can be seen in Fig. 7, in which (Fig. 8). For PPL and GLC, separate 2pLL fits appear more appropriate. 
the SEs of the approaches are close to the diagonal line. If the assump- Especially for PPL, the assumption of a fixed steepness parameter, h, 
tions of the td2pLL model were markedly incorrect, the SE of approach 1 across all exposure durations seems questionable. For LAB and DFN, 
would be systematically larger and the points would systematically tend however, the td2pLL model fit seems to be more robust (less overfitting): 
to lie below the diagonal line. For LAB, the data at the exposure duration of 7 days were very noisy, 
The data are not optimal to validate the td2pLL model, as there are which made a separate fit for these data difficult. By fitting a td2pLL 
only 3 different exposure durations and often too low maximal con- model, the concentration-response curve fit at t = 7 was stabilized 
centrations were used (Fig. A1). through the (less noisy) concentration-response curves at the other 
For experiments with more exposure durations investigated, exposure durations. For DFN, the concentration-response curves for 
approach 2 would increasingly overfit the data, as with every added exposure durations of 2 and 7 days were similar. A td2pLL model fit 
exposure duration, a new 2pLL model is fitted. By contrast, for the practically pooled and therefore stabilized the curves of these two 
td2pLL model, the number of parameters remains the same, regardless of exposure durations by fitting a large γ, i.e. a strong initial change of the 
10
Estimated EC50 to true EC50 (Absolute log2 deviation)Simulated effect of exposure duration on EC50
None (M0) Small (M1) Large (M2)
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
EC50(t) and almost no change in the EC50(t) for larger exposure improved assessment of concentration and exposure-duration depen-
durations. dent cytotoxicity. For cytotoxicity experiments where concentration- 
In summary, the assumptions of the newly proposed td2pLL model response data are available for different exposure durations, the pro-
for ECR data are supported by the data set of Gu et al. [1] as far as the posed ECR model can improve the precision of target-concentration 
small size of the data set allows any validations. A possible modification estimation, such as the EC50, compared to fitting separate 
of the td2pLL model is a varying steepness of the concentration-response concentration-response curves for each exposure duration. This is 
curves across exposure durations. However, this would require more possible because the model implicitly uses the concentration-response 
parameters, which can easily lead to overfitting or computational issues. data from all exposure durations to calculate the target concentration 
In general, the application on the real cytotoxicity data of Gu et al. [1] at an exposure duration of interest. In addition, joint modeling allows 
supports that the newly proposed td2pLL model might lead to more the extrapolation of cytotoxicity responses to exposure duration- 
precise target concentration estimation for ECR data at an exposure concentration combinations that were not conducted in the experi-
duration of interest, by exploiting available concentration-response data ment. This is a main advantage over separately fitting concentration- 
of other exposure durations. response curves at each exposure duration, where cytotoxicity estima-
tion is restricted to the exposure durations used in the experiments. 
3.2. Results of the simulation study The potential benefit of the proposed ECR model over separate 
concentration-response curve fitting depends on the validity of the 
In this section we present the results of the simulation study model assumptions. We presented arguments both on a theoretical and 
described in Section 2.5. In particular, the precision of the target con- practical level, that support these model assumptions. From a theoretical 
centration (EC50) estimation for the proposed two-step pipeline that uses perspective, the model assumptions are mechanistically motivated as it 
the ECR model, td2pLL, was compared with other approaches for esti- extends the well-established, one-dimensional Hill model to a two- 
mating the EC50 from ECR data. The main finding was that the two-step dimensional ECR model using Haber’s law. For a practical investiga-
pipeline, as well as always fitting the new td2pLL model, outperformed tion, we checked the suitability of the model using a real cytotoxicity 
other approaches that ignore or overfit the potentially different EC50 data set of 30 compounds, with measurements for multiple exposure 
values at different exposure durations. Fig. 9 summarizes the simulation durations and concentrations for each compound. This practical appli-
results, comparing in each subgraph the four methods Two-step (new cation generally supported the model assumptions, as far as the limited 
two-step pipeline described in Section 2.3), Sep. 2pLL (separate 2pLL size of the data set allowed strong conclusions. The td2pLL model 
models for each exposure duration), Single 2pLL (Single 2pLL model seemed to be beneficial for the data for some compounds, while for 
independent of exposure duration) and td2pLL (new ECR model). The others, a more flexible model might be more suitable. 
subgraphs correspond to increasing noise (from left to right) and A detailed look suggested a possible extension of the td2pLL model, 
increasing exposure duration effect on the EC50 dependency (no, weak, namely a dependency on the (currently constant) slope parameter on the 
strong, from top to bottom). exposure duration. However, such model extensions might cause prob-
In general, the Two-step pipeline performed very well in all sce- lems as a more complex model is more difficult to fit, especially for 
narios, with comparatively small deviations between estimated and true typically small cytotoxicity data sets. Due to the small sample sizes and 
EC50 values. The method decides in a first step, if the EC50 changes number of different exposure durations of the analyzed data, further 
across exposure durations using a nested ANOVA approach. If so, a ECR cytotoxicity data are required to assess the general benefit of the 
td2pLL model is fitted that accounts for such an effect. If not, the effect of model for target-concentration estimation or if model modifications are 
exposure duration is ignored and a single 2pLL curve is fitted for all required. 
exposure durations. In general, exposure duration-related questions regarding the toxi-
On the one hand, ignoring the exposure duration (Single 2pLL) was cological mechanism of a compound can be analyzed with the td2pLL 
clearly worse, when an exposure duration-effect was present (in models model, such as: What is the hypothetical EC50 of an infinitely long 
M1 and M2, middle and bottom row subgraphs). Fitting separate 2pLL exposure duration? When does expanding the exposure duration have 
curves (Sep. 2pLL) lead to larger performance losses especially in the no further influence on the toxic effect? Due to its mechanistic moti-
case without an exposure duration effect (M0, top subgraphs). Ignoring vation, the model parameters can be easily interpreted to answer these 
the exposure duration and fitting a single concentration-response curve questions. In fact, one model parameter is the hypothetical limit EC50 for 
would better exploit the data structure. The performance loss was due to an infinitely long exposure duration. 
the much smaller effective sample size in the separate fits (Sep. 2pLL). We also proposed a two-step pipeline that first performs a statistical 
Furthermore, increasing the sample size (nobs) naturally improved test to decide whether a td2pLL model fit might be beneficial, or if it 
the performance (lower values for estimated to true EC50 deviance are suffices to ignore the exposure duration and fit a regular 2pLL 
observed for boxplots with red color compared to blue color). However, concentration-response curve. The two-step procedure and always 
the beneficial effect diminished if there was only little noise (N1) or if the fitting a td2pLL model achieved higher precision in target concentration 
method was based on severely incorrect assumptions (using Single 2pLL estimation than separately fitting regular 2pLL curves per exposure 
for scenario M2 with strong exposure duration effect). duration or just fitting a single 2pLL curve for all exposure durations, as 
Lastly, always fitting a td2pLL model (without ANOVA pre-step) demonstrated in a simulation study based on the data of Gu et al. [1]. We 
performed almost as well as the two-step pipeline that includes the recommend using the new two-step pipeline as it was successful in 
pre-test, with only little performance loss when there was no exposure deciding if a td2pLL model or just a simple 2pLL model should be fitted. 
duration effect (M0). Since using the two-step pipeline instead of always For practitioners, easy applicability of the proposed model and the two- 
fitting the tp2pLL model can lead to a simpler model with fewer pa- step pipeline was achieved by providing the new R-package td2pLL, 
rameters (Single 2pLL), this is preferred over the (more complicated) which is available at https://github.com/jcduda/td2pLL/. 
td2pLL model. This means that in some cases, the ANOVA-based two- In summary, the td2pLL model and the two-step pipeline are prom-
step pipeline avoids the fit of an unnecessarily complex model. In ising approaches to increase precision in target concentration estimation 
summary, the simulation study clearly promoted the use of the proposed for cytotoxicity experiments in which different exposure durations are 
two-step pipeline for target concentration estimation in ECR data. tested. The proposed methods outperformed the typical approach of 
separately fitting concentration-response curves for each exposure 
4. Discussion duration, as they can incorporate the data of all exposure durations for a 
single fit, which additionally allows cytotoxicity response estimations at 
In this work we proposed a new ECR model, named td2pLL, for an concentrations and exposure durations not conducted in the experiment. 
11
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
APAP ASP BOS BPR BUSF
200
120
1 150 140
150 2 150 100 120
7
100 80 100100 100
60 80
50 50 40 6050
20 40
0
0 0 200
0 0.1 1 10 0 0.1 1 10 0 0.01 1 0 0.001 0.1 1 0 0.01 0.1 1
Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg]
CBZ CHL CLON DFN DMSO
150 120 200
100 150 150
100 10080
100
60 100
50 40 50 50
50
20
0
0 0 0 0
0 0.01 0.1 1 0 0.001 0.1 1 0 0.1 10 0 0.01 0.1 1 0 10 100 1000
Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg]
ETOH FAM GLC HYZ INAH
150
150 120
140
120
100 100
100 100
100 80 80
60
50 60
50 40
50 40
20 20
0
0 0 0
0 1 10 100 1000 0 0.1 1 0 1 10 100 0 0.01 1 0 1 10 100
Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg]
KC LAB LEV MEL MePA
150 150
150 150
100 100100 100
100
50 50 50 50
50
0 0 0 0
0 0.001 0.01 0.1 0 0.001 0.01 0.1 0 0.1 1 10 0 0.1 1 0 0.001 0.01 0.1
Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg]
NAC NFT NIM PHB PMZ
200 150
250 150 150
200 150 100
100 100
150 100
100 50 5050 50
50
0 0 0 0 0
0 0.1 1 10 0 0.01 1 0 0.001 0.1 1 0 0.01 0.1 1 0 0.01 1
Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg]
PPL RIF TSN VITC VPA
140 200 150
120 120
100 150 100 150 100
80
100 80
60 10060
40 5050 40 50
20 20
0 0 0 0 0
0 0.01 1 0 0.01 0.1 0 0.001 0.01 0.1 0 0.01 1 10 0 0.1 1 10
Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg] Concentration [mg]
Fig. A1. Overview of separate 2pLL model fits by exposure duration (1, 2 or 7 days - legend in upper left plot) for all compounds of Gu et al. [1]. For compounds 
BUSF, FAM, MePA and RIF, computational problems occured as due to a high level of noise or a lack of a detectable concentration-dependent trend in viability, a flat 
concentration-response curve was fitted for some exposure durations. Only the concentration-wise response means are shown. 
Declarations Toxicology” (RTG 2624, Project P2) funded by the Deutsche For-
schungsgemeinschaft (DFG, German Research Foundation - Project 
Funding Number 427806116). 
This work has been supported (in part) by the Research Training 
Group ”Biostatistical Methods for High-Dimensional Data in 
12
Viability [%] Viability [%] Viability [%] Viability [%] Viability [%] Viability [%]
Viability [%] Viability [%] Viability [%] Viability [%] Viability [%] Viability [%]
Viability [%] Viability [%] Viability [%] Viability [%] Viability [%] Viability [%]
Viability [%] Viability [%] Viability [%] Viability [%] Viability [%] Viability [%]
Viability [%] Viability [%] Viability [%] Viability [%] Viability [%] Viability [%]
J. Duda et al.                                                                                                                                                                                                 C   o m   p  u t a  t i o  n  a l  T  o  x  i c o  l o  gy 23 (2022) 100234
Data availability References 
The data presented in this study are openly available in Archives of [1] X. Gu, W. Albrecht, K. Edlund, F. Kappenberg, J. Rahnenführer, M. Leist, 
Toxicology at 10.1007/s00204-018–2302-0. W. Moritz, P. Godoy, C. Cadenas, R. Marchan, Relevance of the incubation period in cytotoxicity testing with primary human hepatocytes, Arch. Toxicol. 92 (12) 
(2018) 3505–3515. 
Supplementary information [2] M.D. Arbo, S. Melega, R. Stöber, M. Schug, E. Rempel, J. Rahnenführer, P. Godoy, R. Reif, C. Cadenas, M. de Lourdes Bastos, Hepatotoxicity of piperazine designer 
drugs: up-regulation of key enzymes of cholesterol and lipid biosynthesis, Arch. 
To reproduce all figures and the simulation results, code is available Toxicol. 90 (12) (2016) 3045–3060. 
at https://github.com/jcduda/td2pLL-code. [3] ICCVAM: Guidance document on using in vitro data to estimate in vivo starting doses for acute toxicity. NIH Publication NO 01-4500 (2001). Available here 
[accessed 29 September 2021]. 
CRediT authorship contribution statement [4] R. Riddell, D. Panacer, S. Wilde, R. Clothier, M. Balls, The importance of exposure 
period and cell type in in vitro cytotoxicity tests, Altern. Lab. Anim. 14 (2) (1986) 
86–92. 
Julia Duda: Conceptualization, Methodology, Software, Validation, [5] NICEATM: In vitro cytotoxicity test methods for estimating acute oral systemic 
Formal analysis, Investigation, Data curation, Writing - original draft, toxicity: Background review document. NIH Publication No. 07-4518 (2006). 
Available here [accessed 29 September 2021]. 
Visualization. Jan G. Hengstler: Validation, Resources, Writing - re- [6] R Core Team: R: A Language and Environment for Statistical Computing. R 
view & editing, Supervision, Funding acquisition. Jörg Rahnenführer: Foundation for Statistical Computing, Vienna, Austria (2021). R Foundation for 
Conceptualization, Validation, Writing - review & editing, Supervision, Statistical Computing.https://www.R-project.org/. 
Project administration, Funding acquisition. [7] C. Ritz, Toward a unified approach to dose-response modeling in ecotoxicology, Environ. Toxicol. Chem. 29 (1) (2010) 220–229. 
[8] C. Ritz, F. Baty, J.C. Streibig, D. Gerhard, Dose-response analysis using r, PloS One 
10 (12) (2015) 0146021. 
Declaration of Competing Interest [9] F. Kappenberg, M. Grinberg, X. Jiang, A. Kopp-Schneider, J.G. Hengstler, J. Rahnenführer, Comparison of observation-based and model-based identification 
of alert concentrations from concentration–expression data, Bioinformatics (2021). 
The authors declare that they have no known competing financial [10] F. Haber, Zur Geschichte des Gaskrieges, in: Fünf Vorträge aus den Jahren 
interests or personal relationships that could have appeared to influence 1920–1923, Springer, Berlin, Heidelberg, 1924, pp. 76–92. [11] F.J. Miller, P.M. Schlosser, D.B. Janszen, Haber’s rule: a special case in a family of 
the work reported in this paper. curves relating concentration and duration of exposure to a fixed level of response 
for a given endpoint, Toxicology 149 (1) (2000) 21–34. 
[12] W.W. Focke, I. Van der Westhuizen, N. Musee, M.T. Loots, Kinetic interpretation of 
Acknowledgments log-logistic dose-time response curves, Sci. Rep. 7 (1) (2017) 1–11. 
[13] A. Schüttler, R. Altenburger, M. Ammar, M. Bader-Blukott, G. Jakobs, J. Knapp, 
The authors gratefully acknowledge the computing time provided on J. Krüger, K. Reiche, G.-M. Wu, W. Busch, Map and model—moving from 
observation to prediction in toxicogenomics, GigaScience 8 (6) (2019) 057. 
the Linux HPC cluster at Technical University Dortmund (LiDO3), [14] A. Serra, M. Fratello, G. Del Giudice, L.A. Saarimäki, M. Paci, A. Federico, D. Greco, 
partially funded in the course of the Large-Scale Equipment Initiative by Tindermix: Time-dose integrated modelling of toxicogenomics data, GigaScience 9 
the German Research Foundation (DFG) as project 271512359. (5) (2020) 055. 
[15] F. Kappenberg, T. Brecklinghaus, W. Albrecht, J. Blum, C. van der Wurp, M. Leist, 
J.G. Hengstler, J. Rahnenführer, Handling deviating control values in 
Appendix A. Figures  concentration-response curves, Arch. Toxicol. 94 (11) (2020) 3787–3798. 
[16] T.P. Morris, I.R. White, M.J. Crowther, Using simulation studies to evaluate 
statistical methods, Stat. Med. 38 (11) (2019) 2074–2102. 
13

Article 3
www.nature.com/scientificreports
OPEN Benefit of using interaction effects 
for the analysis of high‑dimensional 
time‑response or dose‑response 
data for two‑group comparisons
Julia C. Duda *, Carolin Drenda , Hue Kästel , Jörg Rahnenführer & Franziska Kappenberg 
High throughput RNA sequencing experiments are widely conducted and analyzed to identify 
differentially expressed genes (DEGs). The statistical models calculated for this task are often not 
clear to practitioners, and analyses may not be optimally tailored to the research hypothesis. Often, 
interaction effects (IEs) are the mathematical equivalent of the biological research question but are 
not considered for different reasons. We fill this gap by explaining and presenting the potential benefit 
of IEs in the search for DEGs using RNA‑Seq data of mice that receive different diets for different time 
periods. Using an IE model leads to a smaller, but likely more biologically informative set of DEGs 
compared to a common approach that avoids the calculation of IEs.
With the rapid developments in next-generation sequencing (NGS) technology in the last decades, analyses of 
gene expression data have become regular in many l aboratories1. A common goal is to identify differentially 
expressed genes (DEGs) that are responsible for the observable differences between, e.g., groups of individuals 
with different treatments or genotypes. Many software applications became available to optimally extract infor-
mation from the large amounts of experimental  data2. The mathematics behind these algorithms and models 
is often complicated, which can lead to suboptimal data analysis from practitioners and bioinformaticians. The 
interaction effect (IE) between two or more factors of interest is a methodological aspect that is often not con-
sidered in analyses where it could be beneficial. IEs are well-known in statistical modeling but are often not used 
in practice. Properly including and interpreting an IE in gene expression data analyses can be challenging, and 
the possibility of using an IE is often overlooked. An obvious reason for not using IEs in DEGs analyses might 
be the complexity of the statistical models and their correct computational implementation.
In the literature, there are many application examples similar to the one we will use throughout the manu-
script, where an IE was likely beneficial to find interesting DEGs, but not considered. For example,3 dealt with 
time-restricted feeding of mice to test whether it could prevent obesity. They used D ESeq24 and the design 
included several factors such as genotype, feeding group, and time. In this setting, combining different variables 
to explore the interaction between e.g. time and genotype could have led to other, potentially more interesting 
DEGs. In another e xample5 used four separate study groups to analyze the differences in heart failure in mice. 
They either received a standardized chow or a high-fat diet for 12 weeks, and either additionally received angio-
tensin II after 8 weeks or not. Here as well, analyzing the excluded interaction between diet and hormones could 
lead to additional interesting insights.
Examples with an IE included in the DEG analysis were provided b y6,7. Sloley et al.6 studied the exposure to 
high-frequency head impacts in mice. They use the DESeq2 package and their design contains an IE of the two 
factors treatment and injury. Similar methods are used i n7, in which mice were treated with acarbose at three 
independent study sites. Their model contains the variables treatment, sex, and the interaction between them.
In this work, we explain the use, interpretation, and potential benefit of using IEs in gene expression analysis 
to identify DEGs. The article equips practitioners with a less profound statistical background with the knowl-
edge to decide if the use of an IE helps answer their research question. We therefore aim at keeping the level of 
mathematical complexity low, to reach a wider range of potential users. Mathematical details can be found  in8,9. 
We illustrate, explain, and compare DEG analyses with and without IE using a gene expression data set f rom10, 
where mice were fed either an unhealthy or a healthy diet for 3 to 48 weeks.
Department of Statistics, TU Dortmund University, Vogelpothsweg 87, 44227 Dortmund, Germany. *email: duda@
statistik.tu-dortmund.de
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 1
Vol.:(0123456789)
www.nature.com/scientificreports/
The article is structured as follows. We first explain the IE from different perspectives. Then we conceptually 
compare the use of an IE with the common approach that avoids modeling of interaction w.r.t. the resulting 
DEGs. The two methods are applied to the data set at hand and the differences in the results are discussed and 
explained in detail.
Material and methods
Data
The data set was first presented b y10, where mice were fed with two different diets over the course of 48 weeks. 
One diet was the high-fat or ‘Western’ diet (WD) and the control was a standard diet (SD). The nine analysis time 
points within the 48 weeks were week 3, 6, 12, 18, 24, 30, 36, 42 and 48. In total 79 samples (mice) were used. The 
gene expression data from 35,727 genes were measured using RNA-seq. After removing the weeks with no data 
from mice in one of the two groups, 64 samples from the weeks 3, 6, 30, 36, 42, and 48 were left. To focus on the 
explanatory aim, analyses were mostly restricted to the data of weeks 3 and 6. The sample sizes in the remaining 
weeks are 7, 5, 5, 7, 3, 5 for SD and 5, 5, 5, 5, 4, 8 for WD. Further pre-processing is explained in “Implementation”.
Interaction effects explained
When two or more factors are of interest in an experiment, one should consider including IEs in the statistical 
model. Only using additive or main effects may not result in adequate modeling of the data. In Fig. 1, different 
effect scenarios are visualized using interaction plots for the case of two factors of interest, e.g. some group (0 = 
blue, 1 = red) and a compound with low and high concentration. In Fig. 1a, there is no interaction between the 
group and the concentration: The increase of the response from the low to the high concentration is the same for 
group 0 and group 1. At the same time, for a fixed concentration, the difference in the responses between group 
0 and group 1 is the same. One can describe the absence of an IE graphically, biologically, and mathematically.
• Graphically, an additive effect or the lack of an IE results in parallel lines between the two groups.
• Biologically, the effect of the concentration does not interact with the effect of the group, because it is always 
the same increase in response from low to high concentration, regardless of the group.
• Mathematically, considering two factors with two levels each, a classical linear model, or equivalently an 
ANOVA model, with only additive effects for the two factors and normal noise is appropriate to model the 
data. This formalizes to 
yj = µ+ α · gj + β · cj + εj (1)
 where j indicates the sample, gj indicates if the jth sample is in group 0 ( gj = 0 ) or in group 1 ( gj = 1 ), and 
cj indicates if the j-th sample is exposed to the low concentration ( cj = 0 ) or the high concentration (c j = 1).
  The mean difference in the responses for group 1 compared to group 0 is α and for increasing the concen-
tration from low to high, the mean difference is β.
  For example, if the j-th sample is in group 0 ( gj = 0 ) and exposed to the low concentration ( cj = 0 ), the 
expected response is µ+ 0 · α + 0 · β = µ . The intercept µ represents the mean response in the reference 
group (0) with the reference concentration (low).
Figure 1.  Schematic depiction of data scenarios without and with IE. (a) Group 0 (blue) and 1 (red) both have 
a positive effect for treatment high compared to low and a positive group effect, but no IE. (b) As in (a), but 
with an additional positive IE. (c) Negative IE between group and treatment. (d) No treatment effect for group 
0. The treatment effect for group 1 is entirely represented by the IE. (e) Both groups display a positive treatment 
effect and there is no group effect in the treatment category low, only in high, i.e. an IE is present. (f) Negative IE 
between group and treatment, but no line crossing as in (c).
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 2
Vol:.(1234567890)
www.nature.com/scientificreports/
The contrary case, the presence of a clear IE with a changed direction for the concentration effect, is depicted in 
Figure 1c. The crossing lines mean that the effect of a concentration increase is not additive (it is not the same 
within both groups). Instead, the concentration effect depends on the group, i.e. there is an interaction with 
the group effect. For group 0, an increase in the concentration leads to an increase in the response, whereas for 
group 1, an increase in the concentration leads to a decrease in the response. The additive model (1) can not 
capture this interaction as the model fit would force parallel lines into the effect plot. Mathematically, a model 
that accounts for the interaction between group and treatment is, therefore, an extension of the model in Eq. 
(1) by adding the IE γ:
yj = µ+ α · gj + β · cj + γ · gj · cj + εj . (2)
If the j-th sample is exposed to the higher concentration ( cj = 1 ) and is in group 1 ( gj = 1 ), then the mean 
response is µ+ α + β + γ . The interaction term γ · gj · cj allows the lines in the interaction plot to be non-
parallel. It is important to note that an IE does not necessarily visualize as a crossing of lines in an interaction 
plot, but simply non-parallel lines, such as in the examples shown in Fig. 1b, d, e, and 1f. We elucidate the use 
of IEs when analyzing real data in the context of biological research questions in “When do interaction effects 
capture the research question?”.
Interaction effects calculated with DESeq2
In this section, we explain the mathematical background of gene expression modeling with the popular R-package 
DESeq2 4. Details on statistical concepts presented here may not be relevant to readers who are more application-
oriented and can be ignored without risking comprehension of the remaining sections. However, to understand 
an IE in more depth, we encourage to understand the parameters in the model formula (4).
Consider the count matrix K, where Kij are the count reads of gene i for sample j, i ∈ {1, ..., n} , j ∈ {1, ...,m} . 
To model the count data, DESeq2 uses a generalized linear model with a negative binomial distribution 
Kij ∼ NB(µij , τi) with mean µij and gene-specific dispersion τi.
The mean of the observed counts µij = sjqij is modeled with the parameter qij , which is proportional to the 
expected true concentration of fragments for sample j and rescaled with a sample-specific size∑ factor sj . The 
parameter qij is modeled with a generalized linear model using the logarithmic link: log2(qij) = r βirxjr . In a 
factorial design, xjr ∈ {0, 1} indicates if the rth explanatory variable applies to sample j, such that for the ith gene, 
βir is the log2 FC for factor level r compared to the reference factor level.
For our application example (“Data” ), the model has one factor for the diet (two values) and one factor for 
the week (six values). A model with the parameters for the week and diet without interaction is fitted for each 
gene i, 1 ≤ i ≤ 35, 727 . In the following, we suppress the gene index i and consider the sample (mouse) index j. 
The model used in DEseq2 is then
∑6
log2(qj) = µ+ α · dj + βr · wjr , (3)
r=2
where µ (intercept) denotes the response at the reference (SD and week 3), and α is the WD (main) effect. The 
variable dj is binary with value 0 for the SD and value 1 for the WD. The parameters βr , r ∈ {2, ..., 6} , correspond 
to the week effects. The variable wjr is the indicator variable for the week, i.e. wj2 = 1 only for week 6.
Now, adding an IE, the model is
∑6 ∑6
log2(qj) = µ+ α · dj + βr · wjr + γr · dj · wjr . (4)
r=2 r=2
The parameter γ2 denotes the IE between the factor diet and the factor week, comparing week 6 to week 3. The 
parameter γ3 refers to the interaction between the diet and week, comparing week 30 to week 3, and so on. Due 
to the log2 transformation for the sample concentration qj , the parameters must all be interpreted accordingly. 
For example, an IE of γ2 = 3 means that the difference between the diet effect in week 3 and the diet effect in 
week 6 is 23 = 8 , or has a FC of 8.
When do interaction effects capture the research question?
In RNA-Seq experiments, often the case of two factors, e.g. treatment and genotype, are analyzed, and it is of 
interest whether the effect of the treatment differs between the genotypes (in certain genes). The research ques-
tion might be formulated as: Does the genotype affect the treatment effect? IEs capture such a research question 
and they should therefore be considered for the analysis.
In our application example, the two factors are diet and week, where diet is either a WD or a SD and week 
indicates the feeding duration. In this dataset measurements for different time points are available, and we focus 
on the two shortest durations, 3 weeks and 6 weeks, to explain the IE concept. The 3-week time point can be 
considered the reference level of the factor week. The research goal is to identify genes where activation/deac-
tivation from weeks 3 to 6 induced by the WD is different compared to the SD. Mathematically, this research 
question translates into identifying genes with an IE between diet and week. Consequently, the use of a model 
that includes an IE should be considered.
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 3
Vol.:(0123456789)
www.nature.com/scientificreports/
How do interaction effects capture the research question?
To explain how IEs capture the research question, we visualize the benefit of adding IEs to a linear model, using 
our example dataset. In Fig. 2, for the mice groups, for each combination of diet type and week, expression values 
and fitted means are plotted, exemplary for one selected gene. Once no IEs are included in the model (Fig. 2, 
left), and once IEs are included (Fig. 2, right).
Without IEs, the estimated effect differences between the diets, represented by arrows, are mathematically 
forced to be the same across all weeks (vertical lines have the same length).
Consequently, in week 3, the effect is markedly overestimated, as the arrow between SD and WD is larger than 
the pure difference in group means. In contrast, if an IE is used (Fig. 2, right), then the group means estimated 
by the model capture well that the diet effect varies across weeks. The mathematical formulas of the estimated 
effects represented by the arrows are explained in “Interaction effects calculated with DESeq2”.
Comparison of methods for estimating interaction effects
In this section, we compare the results obtained by fitting an interaction model between two factors (called 
Method II in the following) with a far more popular alternative, which we call Method I. The alternative approach 
avoids the direct modeling of an IE between two factors as follows: The data are split with respect to the second 
factor (e.g. week) into two groups G0 and G1 . Then for group G0 and G1 separately, a model comparing the groups 
with respect to the first factor (e.g. diet) is fitted. Finally, it is analyzed, if for one group, typically the reference 
group G0 , no significant effect is observed, and for the other group G1 , there is a significant effect present.
The differences between the two approaches are illustrated and discussed on the mouse dataset, where for 
Method I the groups G0 and G1 are defined by week 3 (as reference) and week 6 (or larger week numbers, respec-
tively). The models per week contain only one factor (diet) with two levels, SD and WD. Since separate models 
are fitted per week, the model-wise diet effect is allowed to vary across weeks.
When interpreting the results of the differential expression analysis, a consideration of both statistical sig-
nificance and biological relevance is necessary: A p-value smaller than the significance level, which constitutes a 
statistically significant result, does not necessarily mean that the mean effect level, given here by the log2-Fold 
Change ( log2FC), is of relevant size. On the other hand, a mean effect level larger than a pre-specified threshold, 
motivated by the biological context, does not always correspond to small p-values 11. Thus, to interpret a gene to 
be a differentially expressed gene (DEG), we always require two conditions to be fulfilled: The (FDR-adjusted) 
p-value is smaller than a significance level, and the log2 FC is larger than a pre-specified threshold.
For the mouse dataset and the separate models (Method I), only those genes that show a diet effect (both 
significant and relevant) in week 6, but not in the reference week 3, are considered DEGs. The motivation is 
that interesting genes show no effect at the reference time point, where the diet had too little time to cause a 
differential effect, but later (at 6 weeks) the diet causes such a difference. For the interaction model (Method II), 
not two models but only a single model is fitted. To detect DEGs, one simply checks if the estimated IE is both 
significant and relevant.
• Method I (Separate): Separately for each week: Fit a one-factor model (two-group comparison, see equation 
(1)).
  A gene is DEG if the diet effect is both significant and relevant in week 6, but not both in week 3.
• Method II (Interaction): Fit a two-factor model between week and diet (including week, diet, and interaction), 
see equation (2).
  A gene is DEG if the IE is both significant and relevant.
11 11
10 10
9 9
8 8
7 7
6 6
5 5
0 5 10 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50
weeks weeks
count observed mean diet SD WD
Figure 2.  Visualization of the fitted model without IE (left) and with IE (right) for the mice dataset, for the gene 
identifier ENSMUSG00000069170 (Adgrv1). The arrows represent the estimated log2FCs according to Eq. (3) 
for the left fit, and Eq. (4) for the right fit. For both fits, µ (green arrow) is the expected mean gene expression 
level for the reference values three weeks and SD, and α (vertical dark grey arrows) is the estimated FC between 
SD and WD at each week. Further, both models include the week effects βr (blue arrows). The right model 
additionally includes interaction effects (yellow, orange, and red arrows) that correspond to γr in formula (4).
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 4
Vol:.(1234567890)
log2(normalized count)
log2(normalized count)
www.nature.com/scientificreports/
To visualize the differences between the decision outcomes (gene is DEG or not) of Method I and II, Fig. 3 dis-
plays 7 cases using simulated data scenarios. The data are generated with constant residual variance, so that the 
decision is not influenced by differing variance values, but only by the estimated effect (arrow lengths).
• Case 1: Within both weeks, the estimated diet effect is not relevant (dotted green effect arrow). There is hence 
is no DEG by Method I. Since the effects are of similar size, the IE estimated by Method II (pink arrow) is 
not significant, and neither Method II classifies the gene as DEG.
• Case 2: In week 3, the effect is not relevant, in week 6 it is both significant and relevant. This leads to a sig-
nificant IE for Method II. Therefore, both Method I and Method II classify the gene as DEG.
• Case 3: The diet effect is significant in both weeks. Since it is significant in week 3, Method I does not classify 
the gene as DEG. However, the diet effect in the second week is much larger, such that the IE is significant, 
and Method II classifies the gene as DEG.
• Case 4: Similar to case 3, but the effect direction of the diet effect changes: In the first week, there is a positive 
effect, and in the second week a negative effect. Again, only Method II classifies the gene as DEG.
• Case 5: In week 3, the diet effect is just below the significance level, whereas in week 6 it is just above the 
significance level. Therefore, Method I labels the gene as DEG. For Method II, the IE is not significant as the 
diet effect does not differ much between the weeks. Method II does not label the gene as DEG.
• Case 6: Similar to case 4, but the effect in week 3 is not significant. Now both methods classify the gene as 
DEG.
• Case 7: The direction of the diet effect changes. It is positive in week 3 and negative in week 6. Within each 
week, the effect size is not significant, therefore Method I classifies the gene as not DEG. The overall change 
in the effect represented by the IE is significant. Therefore, Method II labels this gene as DEG.
Implementation
For all calculations, R 12, version 4.2.2, and the packages DESeq24, version 1.38.1, and topGO13, version 2.50.0, 
were used for determining DEGs and performing gene ontology enrichment analyses (GO EA), respectively. 
The entire code is shared on GitHub (https:// github. com/ jcduda/g ene_e xpre ssion_i nter action). We specify the 
models of Method I and II in DESeq2 using
• Method I: DESeqDataSet(gse, design = ∼ diet)
• Method II: DESeqDataSet(gse, design = ∼ diet + weeks + diet:weeks)
In the example, the code for Method I is applied twice for separate weeks, i.e. for two different data sets ‘gse’, while 
the code for Method II is applied only once. Note that a model based on ∼ diet + weeks results in the same 
parameter values for each week, making it unsuitable for comparison with Method I and Method II, see Fig. 2.
One notable preprocessing step was the filtering. Removing only genes with less than ten counts over all 
samples resulted in a peak of the estimated diet effect at 0.206 (Supplementary Fig. 1). However, removing genes 
with more than 50% of samples with 0 counts leads to reasonably estimated effects without artifactual spikes 
in the histogram (Supplementary Fig. 2). Further, we shrunk the estimated effects using approximate posterior 
estimation with the lfcShrink f unction14. Effects that are non-zero only due to noise are shrunk to zero, while 
large, reliable effects are not affected.
Results
We compare Method I (separate) and Method II (interaction) for the mouse dataset, w.r.t. classification of genes 
as DEG or not DEG, as described in “Comparison of methods for estimating interaction effects”. In the following 
list, we define the terms significant, relevant, and DEG in the context of the example study.
For Method I we call a gene
• significant, if false discovery rate (FDR) adjusted p-value < 0.05 (for a specific week X)
• relevant, if absolute log2 FC > log2(1.5) (for a specific week X)
• DEG for week X, if it is significant and relevant for week X
• DEG, if it is not DEG for week 3, but DEG for week 6
For Method II we call a gene
• significant, if FDR adjusted p-value < 0.05 (for the IE)
• relevant, if absolute log2 FC > log2(1.5) (for the IE)
• DEG, if it is significant and relevant (for the IE)
For Method I, up-regulated DEGs for week X have a positive diet effect in week X. For Method II, up-regulated 
DEGs have a positive IE. Down-regulated DEGs are defined accordingly.
Comparison of genes selected by Method I and Method II
We expect a relevant number of DEGs, since a biological effect of the diet (WD vs. SD) is reported b y10. Table 1 
shows the number of DEGs in week 3 and DEGs in week 6, according to Method I (simple comparison per 
week). There are more DEGs after 6 weeks of feeding compared to 3 weeks, both for up- or down-regulation. 
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 5
Vol.:(0123456789)
www.nature.com/scientificreports/
For up-regulated genes, 104 genes are DEGs only for week 3, 81 genes that are DEGs in both weeks, and 1,622 
genes that are DEGs only in week 6. Hence, for Method I, regarding up-regulation, one would focus on the 1622 
DEGs that are only identified for week 6 and not for week 3.
Table 2 presents a main finding of our study, a comparison of DEGs identified with Method I and Method II. 
One can see that Method I (separate) identifies more DEGs than Method II (interaction). However, the DEGs 
identified by Method II are not all contained in the DEGs identified by Method I. There are almost 200 genes 
only identified by Method II, both for up-regulation and for down-regulation.
Characterization of genes that are DEG only for Method I or only for Method II
To understand the benefits of the two methods, we characterize the genes that are only identified by one of the 
two approaches, respectively. After a mathematical characterization, we also investigate biological differences.
An insightful example is gene Sirt7 in Fig. 5, which is a typical case for being DEG by Method II, but not 
by Method I. From week 3 to week 6, there is an interaction between the factor week and diet (crossing of grey 
lines). The IE (large yellow arrow) is significant and relevant, making this gene DEG for Method II. However, 
for Method I the log2 FC of the diet effect in week 6 is not large enough to pass the threshold of log2(1.5) . Hence, 
Sirt7 is not identified as DEG by Method I, even though an important underlying diet effect dependent on the 
time seems reasonable. Such genes are overlooked by the popular Method I.
To better understand the differences between the two approaches, Fig. 6 shows regions of genes classified as 
DEG by both, none, or only one of the two methods, dependent on the main effect (diet) and the IE, as obtained 
by the interaction model (2) used by Method II.
Each dot represents a single gene. If there is no interaction (cf. Fig. 1a), the estimated IE is (close to) 0, such 
that the x- and y-value are identical and the gene is on the diagonal. For better illustration, the estimated effects 
are not shrunk and the decision rule depends on the log2 FC threshold only. In practice, log2 FC estimates should 
be subject to shrinkage and the classification into a DEG depends on both, log2 FC and adjusted p-value (Sup-
plementary Fig. 3 in the Appendix).
The genes can be divided into four groups according to the DEG classification of Method I and Method II. 
The numbers 1–7 assigned to regions match the simulated cases in Fig. 3 and a real gene expression pattern of a 
representative gene shown in Fig. 4. In the following, the gene expression patterns corresponding to the colored 
regions in Fig. 6 are explained.
• Orange: not DEG for both methods. Genes closer to the diagonal than log2(1.5), such that the IE is below this 
threshold and the gene is not DEG for Method II. Further, genes with absolute main effect above log2(1.5) 
are DEG for week 3 and thus not DEG for Method I.
• Green: DEG only for Method I. Genes with absolute main effect and IE less than log2(1.5), but overall effect 
in week 6 greater than log2(1.5). These genes are not DEG in week 3 by being slightly below the threshold 
but are DEG in week 6 by being slightly above the threshold. Hence, they are DEG for Method I, but the IE 
is small and the gene is not DEG for Method II.
• Purple: DEG for Method I and II. Genes with an estimated main effect (for week 3) below the log2 FC bounda-
ries, but the sum of main and IE (diet effect for week 6) is outside these boundaries. Hence, these genes are 
DEG for Method I. For Method II, they are DEG since the IE is large enough (points far from the diagonal 
line).
• Blue: DEG only for Method II. Genes that are not DEG for Method I since they are either DEG in week 3 
(main effect outside ± log2(1.5)) or have a main effect inside ± log2(1.5) (as gene 7) but are not DEG in week 
6, since the corresponding effect (main plus IE) is also within ± log2(1.5)).
We further looked at differences concerning the biological conclusions of the found DEGs. First, a qualitative, 
small literature research on the top 10 (lowest adj. p-value) upregulated DEGs found only by Method I or only by 
Method II, respectively, suggests that both methods find genes that are reasonably associated with liver disease 
induced by a fatty diet (Table 4; Supplementary Table 1). On a broader scale, a GO EA was performed on the 
DEGs found by Method I, Method II, and the combination of both DEG sets (Table 3; Supplementary Table 2). 
Despite the smaller number of DEGs identified by Method II, the biological interpretation based on the processes 
identified by GO EA is very similar and plausibly covers immune activation related to fatty liver disease. This 
suggests that the DEGs found by Method II are more specific in the sense that they include fewer non-relevant 
genes while yielding similar GO EA results.
Discussion
Using an IE model with 2 factors (Method II) instead of two separate models with one factor each (Method I) 
clearly changes the set of DEGs found in a gene expression analysis. The set of DEGs found with Method II is 
usually smaller. A theoretical reason for this is that statistical inference that aims at detecting IEs is less powerful 
in the sense that the sample size must be four times larger to have the same power for detecting an IE than to 
detect a main  effect15,16, p. 100f.
Further, a gene that just passed the thresholds for being DEG for the reference group, but just not for the 
other group, is DEG for Method I but usually not for Method II, and it is not a good candidate for a biologically 
meaningful statement. The resulting DEGs for Method II are smaller in number, but lead to equally reasonable 
biological findings based on enrichment analyses. A limitation of Method II is that a single model with two main 
factors and an IE can be more difficult to interpret correctly than two models with one factor each and no IE. 
Quantifying if the smaller set of DEGs found by Method II contains less irrelevant genes is difficult for several 
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 6
Vol:.(1234567890)
www.nature.com/scientificreports/
Figure 3.  Visualization of seven example scenarios with different main effects and IEs, leading to different 
decisions for Method I (left column) and Method II (right column). Dots represent data points (blue: SD, red: 
WD; left: 3 weeks, right: 6 weeks), arrows represent effects (black: reference mean, green: main effect of diet, 
purple: IE). Dotted arrows indicate non-relevance (absolute effect size below threshold), solid arrows represent 
relevant effects. Dotted arrows are only shown for the main effects of IEs. The label ’DEG’ below a scenario 
indicates if the respective method classifies a gene as DEG (green) or not DEG (red).
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 7
Vol.:(0123456789)
www.nature.com/scientificreports/
Figure 4.  Example genes that are, according to DEG decision cases 1–7, not always classified in the same 
way by Method I (left) and II (right). Note that the original data are the same per gene (row), but due to the 
differences between Method I and II, background normalizations yield slightly different data for each gene. For 
normalization, DESeq estimates the library sizes as the median of the ratios of observed  counts9. See caption of 
Figure 3 for an explanation of the arrows.
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 8
Vol:.(1234567890)
www.nature.com/scientificreports/
Week 3 only Overlap Week 6 only
Up 104 81 1,622
Down 81 93 726
Table 1.  Overview of DEGs for Method I, comparison of SD and WD.
Method I only Overlap Method II only
Up-regulated 914 695 167
Down-regulated 540 177 186
Table 2.  Comparison of DEGs identified with Method I and Method II. Note that 914 + 695 = 1609 does not 
equal 1622 in Table 1, because here we do not include genes that are downregulated in week 3, as otherwise 
they would not be DEG by Method I.
11.0
10.5
10.0
9.5
9.0
0 5 10 15 20 25 30 35 40 45 50
weeks
count observed mean diet SD WD
Figure 5.  Expression pattern for the gene Sirt7, which is for the comparison week 3 vs. week 6 DEG for Method 
II (interaction), but not by Method I (separate), since the effect size is too low for week 6. See caption of Fig. 2 
for detailed explanation of the arrows.
reasons. First, a literature search to determine if a gene is not reported within the context of liver disease is fruit-
less. Due to false positive results and extensive research in this area, almost any gene can be found as associated. 
Second, the data set at hand does not have a clean reference, because mice were already fed with HFD for three 
weeks in the reference group, instead of being fed for zero weeks. However, within the limits of this study, the 
conceptual reasoning and analyses of GO enrichment analyses suggest that gene sets identified by Method II are 
smaller but likely contain fewer irrelevant genes.
Conclusion
An IE might often be an adequate translation of a biological research question into a statistical concept. However, 
this relationship might remain unnoticed due to a lack of expertise or reluctance to deviate from routines. In 
this work, we offer an extensive explanation of IEs and why they might be scientifically relevant in the context 
of detecting differentially expressed genes (DEGs) in gene expression analysis.
We compare the IE-based approach (Method II) with a popular alternative approach (Method I) that avoids 
the calculation of IEs. While Method I detects more DEGs, many of them might not be scientifically relevant, 
whereas the smaller set of DEGs found with Method II can be interpreted as more specific by having fewer 
irrelevant genes. We encourage researchers to clarify for each project if an IE is the accurate mathematical 
representation of the formulated research question and to use this concept when appropriate. Further, if the 
research goal is to identify a smaller gene set containing less irrelevant genes (less false positives), we encourage 
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 9
Vol.:(0123456789)
log2(normalized count)
www.nature.com/scientificreports/
Figure 6.  Characterization of regions of genes that are identified as DEG only by Method I or by Method II, 
or by both or none of the methods. The x-axis shows the estimated main effect (diet), i.e. the estimated log2 FC 
from a SD to WD in the reference week 3, and on the y-axis the sum of this main effect and the IE, i.e. the 
overall effect between the two diets in week 6 in the interaction model, is plotted.
Method I Method II Method I or II
1 Immune system process ( −292.44× 10 ) Immune system process (3 −28.33× 10 ) Immune system process ( 2.27× −2910 )
2 Immune response ( −292.44× 10 ) Immune response (3 × −28.33 10 ) Immune response ( × −292.27 10 )
3 Defense response ( −292.44× 10 ) Cell activation (3 −28.33× 10 ) Defense response ( 2.27× −2910 )
4 Pos. reg. of immune system process ( × −292.44 10 ) Response to external stimulus (5 × −2810 ) Regulation of immune system process ( × −292.27 10 )
5 Regulation of immune system process ( × −292.44 10 ) Defense response ( × −286 10 ) Pos. reg. of immune system process ( × −292.27 10 )
6 Response to other organism ( × −292.44 10 ) Response to stimulus ( −271 .65× 10 ) Response to external stimulus ( × −292.27 10 )
7 Response to external biotic stimulus ( −292.44× 10 ) Leukocyte activation ( × −27) Response to biotic stimulus ( × −292.57 10 2.27 10 )
8 Response to biotic stimulus ( × −292.44 10 ) Regulation of immune system process ( −251 .2× 10 ) Response to other organism ( −292.27× 10 )
9 Response to external stimulus ( × −292.44 10 ) Response to external biotic stimulus ( × −25 −292.27 10 ) Response to external biotic stimulus ( 2.27× 10 )
10 Defense response to other organism ( −292.44× 10 ) Response to other organism ( −252.27× 10 ) Defense response to other organism ( −292.27× 10 )
11 Innate immune response ( × −292.44 10 ) Response to biotic stimulus ( −25 Biol. proc. involved in interspecies interaction btw organ-2.27× 10 ) isms ( −292.27× 10 )
12 Cell activation ( −29 −252.44× 10 ) Pos. reg. of immune system process ( 2.92× 10 ) Cell activation ( −292.27× 10 )
13 Biol. proc. involved in interspecies interaction btw Pos. regulation of multicellular organismal process Pos. regulation of multicellular organismal process organisms ( −29 −252.44× 10 ) ( 4.31× 10 ) ( × −292.27 10 )
14 Inflammatory response ( −29 Biol. proc. involved in interspecies interaction btw −292.44× 10 ) organisms ( −25) Inflammatory response ( 2.27× 10 )8.57× 10
15 Pos. reg. of response to external biotic stimulus ( −29) Pos. reg. of response to stimulus ( 
−22
2.93× 10 ) Innate immune response ( × −292.27 10 )
2.44× 10
Table 3.  Top 15 most significant GO groups found based on upregulated DEGs by Method I, Method II and 
combining the genes found by Method I and Method II. FDR-adjusted p-values are in parentheses.
to use Method II. However, if the research goal is rather exploratory and more false positives are acceptable, we 
suggest to use Method I.
Data availability
The analyzed data sets are publicly available at the SRA database with reference number PRJNA 953810.
Code availability
The code is available on GitHub (https:// github.c om/ jcduda/g ene_e xpre ssion_ intera ction).
Appendix
See Table 4.
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 10
Vol:.(1234567890)
www.nature.com/scientificreports/
DEG only in Gene Log2FC FDR-adj. p Literature
Acnat2 (ENSMUSG00000060317) 2.02 Considered a candidate for specific metabolic processes within the Type I acyl-CoA thioesterase/< 0.01 acyltransferase gene  family17
Tlr12 (ENSMUSG00000062545) 1.45 < 0.01 Signaling in chronic liver diseases via complex immune responses mediating h epatocyte18
Fgf21 (ENSMUSG00000030827) 2.39 < 0.01 Associated with development and progression of  NAFLD19
Tgfbi (ENSMUSG00000035493) 0.79 Overexpression in mice resulted in an increased incidence of spontaneous tumors and N,N-diethyl-< 0.01 nitrosamine (DEN)-induced liver tumor n odules20
Method I Ehhadh (ENSMUSG00000022853) 0.90 < 0.01 Associated with development of fatty liver disease in dairy  cows21
Hpgds (ENSMUSG00000029919) 2.41 < 0.01 Overexpression associated with adipogenesis and increased insulin  sensitivity22
Slc17a4 (ENSMUSG00000021336) 1.14 An intestinal organic anion exporter expressed predominantly in the pancreas, liver, colon, and < 0.01 small  intestine23
Lgmn (ENSMUSG00000021190) 0.76 < 0.01 Elevated expression of LGMN is reported in the tumor cells of l iver24
Slc7a8 (ENSMUSG00000022180) 1.26 < 0.01 Slc7a8 deletion is protective against diet-induced  obesity25
Pgm3 (ENSMUSG00000056131) 1.17 26< 0.01 Up-regulated in livers of high fat diet fed  mice
Mmp12 (ENSMUSG00000049723) 4.16 < 0.01 Mmp12 is a matrix metalloproteinase and associated with liver disease and i nflammation27
Cyp2c38 (ENSMUSG00000032808) 2.47 < 0.01 Cyp2c family up-regulated in NAFLD mouse  model28
Itgam (ENSMUSG00000030786) 2.92 < 0.01 Increase in Itgam (aka Cd11b) in liver X receptors knockout  mice29
Adgrg2 (ENSMUSG00000031298) 2.76 < 0.01 Found upregulated in cholestasis liver tissue compared to mildly damaged liver t issue30
Nap1l1 (ENSMUSG00000058799) 0.71 < 0.01 Tumor promoter in hepatocellular  carcinoma31
Method II Gstm3 (ENSMUSG00000004038) 2.40 < 0.01 Associated with acute-on-chronic hepatitis B liver  failure32
Myo1c (ENSMUSG00000017774) 0.61 < 0.01 Upregulation associated with human chronic liver  disease33
Rac2 (ENSMUSG00000025003) 1.89 < 0.01 Associated with  NAFLD34
Cyp2c39 (ENSMUSG00000025003) 2.65 < 0.01 Cyp2c family up-regulated in NAFLD mouse  model28
Gm3776 (ENSMUSG00000111709) 3.11 Gm3776, or glutathione S-transferase (GST) alpha 13, belongs to GST genes that are associated < 0.01 with liver  disease35
Table 4.  Top 10 most significant up-regulated genes found only by Method I or Method II, respectively.
Received: 14 March 2023; Accepted: 8 November 2023
References
 1. Murray, D., Doran, P., MacMathuna, P. & Moss, A. C. In silico gene expression analysis—An overview. Mol. Cancer 6, 1–10 (2007).
 2. Costa-Silva, J., Domingues, D. & Lopes, F. M. RNA-seq differential expression analysis: An extended review and a software tool. 
PloS one 12, e0190152 (2017).
 3. Chaix, A., Lin, T., Le, H. D., Chang, M. W. & Panda, S. Time-restricted feeding prevents obesity and metabolic syndrome in mice 
lacking a circadian clock. Cell Metab. 29, 303–319 (2019).
 4. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with deseq2. Genome 
Biol. 15, 550. https:// doi. org/ 10. 1186/ s13059-0 14- 0550-8 (2014).
 5. Withaar, C. et al. The effects of liraglutide and dapagliflozin on cardiac function and structure in a multi-hit mouse model of heart 
failure with preserved ejection fraction. Cardiovasc. Res. 117, 2108–2124 (2021).
 6. Sloley, S. S. et al. High-frequency head impact causes chronic synaptic adaptation and long-term cognitive impairment in mice. 
Nat. Commun. 12, 1–20 (2021).
 7. Smith, B. J. et al. Changes in the gut microbiome and fermentation products concurrent with enhanced longevity in acarbose-
treated mice. BMC Microbiol. 19, 1–16 (2019).
 8. Turner, J. R. & Thayer, J. Introduction to Analysis of Variance: Design, Analyis & Interpretation: Design, Analyis & Interpretation. 
(Sage, 2001).
 9. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Nat. Prec. 1–1 (2010).
 10. Ghallab, A. et al. Spatio-temporal multiscale analysis of western diet-fed mice reveals a translationally relevant sequence of events 
during NAFLD progression. Cells 10, 2516 (2021).
 11. Hothorn, L. A. Statistics in Toxicology Using R. (CRC Press, 2015).
 12. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing (2022).
 13. Alexa, A. & Rahnenfuhrer, J.topGO: Enrichment Analysis for Gene Ontology. R Package Version 2.50.0. (2022).
 14. Zhu, A., Ibrahim, J. G. & Love, M. I. Heavy-tailed prior distributions for sequence count data: Removing the noise and preserving 
large differences. Bioinformatics 35, 2084–2092 (2019).
 15. Leon, A. C. & Heo, M. Sample sizes required to detect interactions between two binary fixed-effects in a mixed-effects linear 
regression model. Comput. Stat. Data Anal. 53, 603–608 (2009).
 16. Fleiss, J. L. Design and Analysis of Clinical Experiments. (Wiley, 2011).
 17. Reilly, S.-J. et al. A peroxisomal acyltransferase in mouse identifies a novel pathway for taurine conjugation of fatty acids. FASEB 
J. 21, 99–107 (2007).
 18. Kiziltas, S. Toll-like receptors in pathophysiology of liver diseases. World J. Hepatol. 8, 1354 (2016).
 19. Tucker, B., Li, H., Long, X., Rye, K.-A. & Ong, K. L. Fibroblast growth factor 21 in non-alcoholic fatty liver disease. Metabolism 
101, 153994 (2019).
 20. Han, B. et al. The role of tgfbi ( βig-h3) in gastrointestinal tract tumorigenesis. Mol. Cancer 14, 1–12 (2015).
 21. Le-Tian, Z. et al. Protein acetylation in mitochondria plays critical functions in the pathogenesis of fatty liver disease. BMC Genom-
ics 21, 1–17 (2020).
 22. Fujitani, Y. et al. Pronounced adipogenesis and increased insulin sensitivity caused by overproduction of prostaglandin d2in vivo. 
FEBS J. 277, 1410–1419 (2010).
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 11
Vol.:(0123456789)
www.nature.com/scientificreports/
 23. Togawa, N., Miyaji, T., Izawa, S., Omote, H. & Moriyama, Y. A Na+-phosphate cotransporter homologue (slc17a4 protein) is an 
intestinal organic anion exporter. Am. J. Physiol.-Cell Physiol. 302, C1652–C1660 (2012).
 24. Reddy, B. D., Beeraka, N. M., Chitturi, C. & Madhunapantula, S. V. An overview of targeting legumain for inhibiting cancers. Curr. 
Pharmaceut. Des. 27, 3337–3348 (2021).
 25. Pitere, R. R., van Heerden, M. B., Pepper, M. S. & Ambele, M. A. Slc7a8 deletion is protective against diet-induced obesity and 
attenuates lipid accumulation in multiple organs. Biology 11, 311 (2022).
 26. Jang, J.-H. et al. Klhl3 deficiency in mice ameliorates obesity, insulin resistance, and nonalcoholic fatty liver disease by regulating 
energy expenditure. Exp. Mol. Med. 54, 1250–1261 (2022).
 27. Naim, A., Pan, Q. & Baig, M. S. Matrix metalloproteinases (MMPS) in liver diseases. J. Clin. Exp. Hepatol. 7, 367–372 (2017).
 28. Xiang, L. et al. Comparison of hepatic gene expression profiles between three mouse models of nonalcoholic fatty liver disease. 
Genes Dis. 9, 201–215 (2022).
 29. Endo-Umeda, K. et al. Liver x receptors regulate hepatic f4/80+ cd11b+ Kupffer cells/macrophages and innate immune responses 
in mice. Sci. Rep. 8, 1–14 (2018).
 30. Liu, X., Taylor, S. A., Celaj, S., Levitsky, J. & Green, R. M. Expression of unfolded protein response genes in post-transplantation 
liver biopsies. BMC Gastroenterol. 22, 380 (2022).
 31. Zhang, Y.-W. et al. Nap1l1 functions as a tumor promoter via recruiting hepatoma-derived growth factor/c-jun signal in hepatocel-
lular carcinoma. Front. Cell Dev. Biol. 9, 659680 (2021).
 32. Sun, F.-K. et al. High promoter methylation levels of glutathione-S-transferase m3 predict poor prognosis of acute-on-chronic 
hepatitis b liver failure. Hepatol. Res. 47, 566–573 (2017).
 33. Arif, E. et al. Targeting myosin 1c inhibits murine hepatic fibrogenesis. Am. J. Physiol.-Gastrointest. Liver Physiol. 320, G1044–G1053 
(2021).
 34. Zhu, J., Min, N., Gong, W., Chen, Y. & Li, X. Identification of hub genes and biological mechanisms associated with non-alcoholic 
fatty liver disease and triple-negative breast cancer. Life 13, 998 (2023).
 35. Prysyazhnyuk, V. et al. Glutathione-S-transferases genes-promising predictors of hepatic dysfunction. World J. Hepatol. 13, 620 
(2021).
Acknowledgements
This work has been supported (in part) by the Research Training Group “Biostatistical Methods for High-
Dimensional Data in Toxicology” (RTG 2624, Project P2) funded by the Deutsche Forschungsgemeinschaft 
(DFG, German Research Foundation-Project Number 427806116).
Author contributions
J.R. conceived the original idea. C.D. performed the calculations with the help of H.K. J.D. wrote the manuscript 
with input from all authors. J.R. and F.K. supervised the work, revised the manuscript, and proposed analyses 
in discussions with J.D. and C.D. All authors read and improved the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Competing interests 
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at https://d oi. org/ 
10. 1038/s 41598-0 23- 47057-0.
Correspondence and requests for materials should be addressed to J.C.D.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and 
institutional affiliations.
Open Access  This article is licensed under a Creative Commons Attribution 4.0 International 
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or 
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the 
Creative Commons licence, and indicate if changes were made. The images or other third party material in this 
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the 
material. If material is not included in the article’s Creative Commons licence and your intended use is not 
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. To view a copy of this licence, visit http://c reati vecom mons.o rg/l icens es/ by/4.0 /.
© The Author(s) 2023
Scientific Reports |        (2023) 13:20804  | https://doi.org/10.1038/s41598-023-47057-0 12
Vol:.(1234567890)

Article 4
Bayesian non-linear subspace shrinkage using horseshoe
priors
Julia Christin Duda duda@statistik.tu-dortmund.de
Department of Statistics, TU Dortmund University, Dortmund, Germany
Matthew Wheeler matt.wheeler@nih.gov
Biostatistics and Computational Biology Branch, National Institute of Environmental
Health Sciences, Research Triangle Park, Durham, North Carolina, U.S.A.
2024/06/05
Abstract
When modeling biological responses using Bayesian non-parametric regression, prior
information may be available on the shape of the response in the form of non-linear function
spaces that define the general shape of the response. To incorporate such information into
the analysis, we develop a non-linear functional shrinkage (NLFS) approach that uniformly
shrinks the non-parametric fitted function into a non-linear function space while allowing for
fits outside of this space when the data suggest alternative shapes. This approach extends
existing functional shrinkage approaches into linear subspaces to shrinkage into non-linear
function spaces using a Taylor series expansion and corresponding updating of non-linear
parameters. We demonstrate this general approach on the Hill model, a popular, biologically
motivated model, and show that shrinkage into combined function spaces, i.e., where one has
two or more non-linear functions a priori, is straightforward. We demonstrate this approach
through synthetic and real data. Computational details on the underlying MCMC sampling
are provided with data and analysis available in an online supplement.
1 1 Introduction
2 When modeling complex biological systems, mechanistic knowledge about the system under in-
3 vestigation is often available; however, including this information in a statistical model may be im-
4 possible due to the system’s complexity in relation to experimental and computational resources
5 [Mesarovic et al., 2004]. Often, simplified models are used in lieu of the true mechanistic model
6 [Šimon, 2005]. When using these simplified models, one expects them to describe the observed
1
1 INTRODUCTION
7 data correctly or be mildly misspecified, and in the case of misspecification, the model may still
8 be helpful in describing the response.
9 When modeling biological systems, an example of this situation is the use of the Hill model.
10 This model, which represents sigmoidal-shaped responses, is a simplification of the complex
11 biochemical process based upon chemical kinetics [Hill, 1910] and is used to model a wide variety
12 of biochemical processes [Goutelle et al., 2008]. Despite its widespread use, it may not always
13 represent the observed response. Non-monotone deviations of the Hill’s functional form may be
14 evident in the data. Additionally, other competing models may also be available, and the modeler
15 might like to include this information to inform the fitting process, too. We develop a framework
16 that allows one to define a subspace over one or more function spaces of interest for Bayesian
17 non-parametric regression.
18 From the Bayesian perspective, there is a rich literature on approaches incorporating prior
19 knowledge in non-parametric regression. Naively, one may center the non-parametric model on
20 the specified parametric function. When the parametric data-generating mechanism’s mean is
21 the known parametric model, ensuring that estimates do not contain artifactual deviations from
22 that model is difficult, implying that shrinkage to the prior model will not be uniform. Further, using
23 this method, there is no way to create a space based on multiple parametric functions. More
24 sophisticated approaches use shape constraints induced through the prior distribution, which
25 include monotonicity or limits to the number of extrema [Brezger and Steiner, 2008, Shively et al.,
26 2009, Meyer, 2008, Shively et al., 2011, Meyer et al., 2011, Gunn and Dunson, 2005, Köllmann
27 et al., 2014, Wheeler et al., 2017]. Though these approaches are often effective, they do not
28 directly incorporate parametric modeling information on the shape of the model; they force the
29 response to be in the constrained space by putting a prior mass of zero on all responses outside
30 of that space.
31 Alternatively, one may merge mechanistic prior knowledge into a model is through ordinary
32 differential equations (ODEs) within a Bayesian framework. Parametric Bayesian models include
33 pharmacokinetic/pharmacodynamic modeling, discussed by Lunn et al. [2002], and Huang et al.
34 [2006] present an HIV-modeling example using Bayesian hierarchical models with non-linear dif-
35 ferential equations. More flexible non-parametric approaches use differential equations to inform
36 stochastic processes with induced constraints [Golightly andWilkinson, 2011, Titsias et al., 2012].
37 While Alvarez et al. [2013] and Wheeler et al. [2014] proposed a Gaussian Process (GP) ap-
38 proach that incorporates mechanistic knowledge defined by differential equations. More recently,
39 Chen et al. [2022] incorporate mechanistic knowledge defined by linear or non-linear partial dif-
40 ferential equations (PDE) into a GP framework by selecting PDE points, i.e. pseudo covariate
2
2 MODEL
41 points through which the assumed PDE information is incorporated. Like the shape-constrained
42 approaches, these methods form a Basis expansion consistent with a subspace defined using
43 mechanistic knowledge. Thus, these priors imply that an estimated function is within the given
44 subspace, and they do not allow for deviations outside of this space.
45 We define a prior distribution over a non-linear subspace - such as the Hill model and power
46 model - that does not require a fitted function to be within that subspace. When the non-linear
47 subspace is correctly specified, shrinkage into it occurs; but, when the true model is outside
48 of the subspace, the approach is unconstrained. We build upon the work of Shin et al. [2020]
49 who introduced the functional horseshoe (fHS) prior for linear spaces. The fHS prior shrinks the
50 non-parametric fit towards a pre-specified, linear subspace. This approach is different from well-
51 known shrinkage approaches such as Ridge, Lasso or Horseshoe [Hoerl and Kennard, 1970,
52 Tibshirani, 1996, Carvalho et al., 2010], which shrink model coefficients in a non-parametric
53 regression towards the origin. The prior of Shin et al. [2020] has the appealing property that
54 the posterior shrinks into the pre-specified subspace f it is consistent with the observed data or,
55 alternatively, is left unconstrained otherwise. The shrinkage occurs at the minimax optimal rate.
56 In our extension, we use a Taylor expansion to locally linearize the response function, where
57 the derivatives depend on parameters of the non-linear model. The extension allows functional
58 shrinkage into a non-linear function space or adapts the function to be outside of the non-linear
59 space. The relevant non-linear function space is specified a priori using one or more parametric
60 models.
61 We present our shrinking approach in Section 2. Section 3 then illustrates the approach both
62 for the case of shrinkage into a single function space - shown for the Hill model - and into a
63 combined function space - shown for the Hill and the power models. We compare our method
64 against other parametric and non-parametric approaches in a simulation study in Section 4. We
65 apply our method to a real-world data example of total testosterone levels measured in 9943
66 males aged between 3 and 85 years in section 5. The computational back-bone of the approach
67 is MCMC sampling combining Gibbs-, Metropolis-Hastings- and Slice-sampling [Brooks et al.,
68 2011, Neal, 2003], detailed in the supplementary material.
69 2 Model
70 2.1 Spline Model
71 Consider the non-parametric regression problem
yi = g(xi) + εi, (1)
3
2.2 Bayesian Priors for a Non-linear Subspace 2 MODEL
72 with unknown mean function g : R → R. We observe y = (y1, . . . , y )′n corresponding to co-
73 variates iidx = (x1, . . . , x ′n) and wish to estimate g. Assuming εi ∼ N(0, σ2), it is common to
74 approximate g using a B-spline basis expansion [Carl, 2001], i.e.,
∑k
f(x ) = ϕji m(xi)βm, (2)
m=1
75 or f(x) = Φβ. Here, the B-spline basis ϕjk(x) are of order j defined on k
∗ internal knots, where
∗
76 k = k +j, β = (β1, . . . , β )
⊤
k denotes the vector of basis coefficients. We consider cubic splines
77 and omit the superscript j = 3. With a dense knot set, the spline approximation f can be made
78 to be arbitrarily close to any continuous g, allowing one to estimate a large space of functions to
79 arbitrary precision.
80 2.2 Bayesian Priors for a Non-linear Subspace
81 For many prior specifications, the expansion in (2) may not place high prior probability on biologi-
82 cally relevant responses. To define a biologically relevant model, we construct a prior distribution
83 that places significant prior mass on the function space defined by the non-linear model, e.g., the
84 space of Hill models, but does not put zero mass outside the function space.
85 To do this, assume knowledge about the shape of g through a twice differentiable function
86 hθ : R → R. The function hθ depends on parameter vector θ, and defines the function space
ΩΘ87 0 = {hθ|θ ∈ Θ} for all realizations Θ ⊆ Rs. If the true mean function g happens to be outside
88 ΩΘ, shrinkage towards ΩΘ0 0 is undesirable. Given a dense knot set, the spline f can approximate
89 hθ for any ϵ−ball. Consequently, the space of functions represented by the spline contains ΩΘ0 .
90 We define a prior for (2) that places prior mass on ΩΘ0 , but does not limit responses to be only in
Θ
91 Ω0 .
92 To define this prior, we consider Shin et al. [2020], who defined a projection prior that shrinks
93 into the linear column space defined by the matrix Φ ∈ Rn×d0 through
( )
p(β|σ2, τ2) ∝ 1(τ2)−(k−d0)/2 exp − β⊤Φ⊤(I − PΦ0)Φβ , (3)2σ2τ2
94 where d0 = rank(Φ0), Φ0 is constructed as a linear space of known covariates, and PΦ0 is the
95 orthogonal projection matrix into the column space of Φ0. The hyperparameter , τ, is given a
96 generalized horseshoe (HS) prior with hyperparameters a and b (cf. Shin et al. [2020]). When
97 a = b = 0.5 the prior is a half-Cauchy distribution, and one arrives at the HS prior [Carvalho
98 et al., 2010].
99 In (3), one constructs PΦ0 using the linear column space of Φ0. Given our space is non-linear,
100 there is no direct analogue to PΦ0 . As an approximation, we use a Taylor series approximation of
4
3 NON-LINEAR FUNCTIONAL SHRINKAGE FOR SINGLE OR COMBINED FUNCTION
SPACES
101 hθ0 , θ0 ∈ Θ. That is, we linearly approximate ΩΘ0 at any θ0 using a first-order Taylor expansion
hθ(x) ≈ hθ0(x) + Ḣθ0(x)(θ − θ0) (4)
∣
102 where Ḣ (x) = ∂hθ(x) ∣∣ ∈ Rn×sθ0 ∂θ is the Jacobian containing the partial derivatives of hθθ=θ0
103 evaluated at θ0. The column space of Ḣθ(x) approximates hθ(x) [Seber and Wild, 2003][p. 130]
104 and we use Ḣθ(x) to construct Pθ = PḢ . Thus, for any θ0, we project f(x) onto the space locallyθ
105 approximating hθ0 . When there are multiple function spaces to consider, the same approach
106 applies; in this case, operator PḢ defines the projection into a combined linear space, where Ḣθθ
107 represents the Jacobian across all assumed functions.
108 We place the prior
( )
p(β| 1σ2, τ2, θ) ∝ (τ2)−k/2 exp − β⊤Φ⊤(I − Pθ)Φβ (5)
2σ2τ2
109 over β to shrink realizations of (2) into ΩΘ0 . In (5), θ is given an appropriate prior to complete
110 the specification. This approach penalizes deviations of Φβ based upon the projection operator
111 (I−Pθ). As we shrink back to a planar approximation given a specific θ0, we require appropriately
112 specified priors for the non-linear parameters in Θ. As β is defined conditional on θ through the
113 linear projection operator Pθ, only priors for the non-linear parameters can be learned.
114 In this formulation, (τ2)−(k−d0)/2 in (3) becomes (τ2)−k/2 because we separately model the
115 intercept (cf. Section 3.1). This change yields proper priors as due to the non-linearity, no linear
116 basis of Φ is in (I − Pθ) and Φ⊤(I − Pθ)Φ has full rank.
117 3 Non-linear functional shrinkage for single or combined function
118 spaces
119 3.1 Single function spaces
120 As an example of non-linear functional shrinkage using a single function, we consider the Hill
121 model. This function is given by
xθ4
h(x, θ) = θ1 + θ2 , (6)
θ θ4 θ3 + x 4
122 where θ1 is the background response at x = 0, θ2 is the maximal change in the response, θ3 is
123 the dose where half of this change is reached and θ4 defines the steepness of the curve. The
124 Jacobian, Ḣθ, is
∣
∂h(x, θ) ∣ ( )∣
∣∣ = 1 x
θ4 −θ ∣4 , (7)
∂θ xθ4+θ θ
θ2 θ s(x, θ3, θ4) θ2 log(θ3/x)s(x, θ3, θ4) ∣
θ=θ 3
4 3 θ=θ
0 0
5
3 NON-LINEAR FUNCTIONAL SHRINKAGE FOR SINGLE OR COMBINED FUNCTION
3.2 Combined function spaces SPACES
125 with s(x, θ , θ ) = ((xθ −1)θ43 4 3 + 1)−1((θ −1 θ4 −13x ) + 1) . The derivative matrix does not depend
126 upon the linear parameters θ1, but it still depends on θ2. However, Pθ does not depend on θ1 and
127 θ2 (cf. Lemma 1 in the appendix), which gives a direct example of why we do not place a prior over
128 these linear quantities. Of the parameters in (7), parameter θ3 is of particular interest because
129 it represents the value of x that produces a response that is the average of the lower and upper
130 asymptote. Values of the covariate below θ3 correspond to values of the response less than 50%
131 of the maximal response. Further, θ4 corresponds to the steepness of the response and speed
132 of a chemical reaction in a biological substrate. As both quantities have direct interpretation,
133 informative priors can be developed for these quantities accordingly, which in turn informs the
134 subspace the model may shrink into.
135 To specify the hyperprior over (θ3, θ4), we assume x ∈ [0, 1], and let E[θ3] = 0.5, the mid-
136 point, and for θ4, we center it on 3, letting the parameter vary within a range that we have often
137 seen in bioassays. In our model, θ1 enters as the intercept, and θ2, the maximal response
138 change, implicitly enters the model through the β coefficients. Using the Hill model as a prior to
139 define (5), we complete the prior specification as
(y|β, σ2, θ ) ∼ N(θ +Φβ, σ21 1 In) (8)
θ1 ∼ N(0, 20), θ3 ∼ N+(0.5, 0.05) θ4 ∼ LN(0.95, 0.29) (9)
σ2 ∼ IG(0.001, 0.001), (10)
140 where N+(a, b) is a truncated normal distribution with mean a and variance b (before truncation),
141 LN(a, b) is a log-normal distribution with log-mean a and log-variance b, and IG(a, b) is an
142 inverse-gamma distribution with shape a and scale b. Note that θ4 ∼ LN(0.95, 0.29) results in
143 E[θ4] = 3 and V [θ4] = 3.
144 3.2 Combined function spaces
145 If one desires multiple functions to define in the function space because of uncertainty in the
146 function space, one can add multiple functions. Here, assume there are r ∈ {1, . . . , R} = R
function spaces (r) { (r)147 Ω0 = hθ |θ ∈ Θ(r)} of interest; we omit the index r on each θ for simplicity.
(r)
148 For each (r)Ω0 , calculate the Jacobian, Ḣθ , i.e.,
(R) (1) (R)
Ḣθ = (Ḣθ . . . Ḣθ ),
(R)
149 and use this to construct Pθ. The Jacobian, Ḣθ , must be full rank without linear bases other
150 than an intercept column for Equ. 5 to hold.
151 To illustrate the combined subspace shrinkage approach, we use the Hill and power models.
152 The latter function defined as as hθ(x) = θ1 + θ xθ32 , which has only one non-linear parameter,
6
4 SIMULATION STUDY
153 θ3, that requires a prior specification. We use θ3 ∼ N(0.5, 0.25), to center on a concave shape.
154 The partial derivatives of the power model are
∂h(x, θ) ( )
= 1 xθ
(1)
3 log(x)xθ3 = Ḣθ . (11)∂θ
(1) (2)
155 Prior to combining Ḣθ of the power model and Ḣθ of the Hill model (Eq. 7), we remove the
(1)
156 intercept from Ḣθ to obtain a full rank. Shrinkage into the combined subspaces is equivalent to
157 shrinkage into a single subspace.
158 4 Simulation Study
159 4.1 Setup
160 We perform a simulation study and evaluate the performance of the proposed approach against
161 other fitting strategies. Full details of the simulation design are summarized by the ADEMP
162 principle described in Morris et al. [2019] (Table S2). We generate data using three parametric
163 cases: the Hill model, the power model, and a misspecified model (the Hill model with downturn).
164 We look at exposure-response data as, for such data, chemical kinetics of exposure are often
165 approximated by the Hill model, but the results generalize to other domains.
166 For each data set, we draw x ∈ [0, 1] uniformly for n ∈ {50, 100, 200, 500} observations,
167 where 50 is a realistic assay size and larger n are chosen to study the large sample behavior.
168 Mean zero normal noise with variance σ2 = 0.005 and a larger noise σ2 = 0.05 is added. These
169 variances represent a 2-SD spread that is approximately 14% and 45% of the maximal response.
170 In total, 24 data generating scenarios are used, with nrep = 1000 simulations per scenario. For
171 each dataset, we apply the following methods:
172 4.1.1 Modeling Approaches
173 Non-linear functional shrinkage (NLFS)
174
175 Non-linear functional shrinkage is performed with shrinkage into the Hill space (NLFS(Hill)),
176 power space (NLFS(power)), or a combination (cf. Section 3.2) of both function spaces (NLFS(Hill+power)).
2
177 Two variations for the shrinkage parameter τ are considered. One uses a half Cauchy prior
178 (a = b = 0.5) and is implemented according to Makalic and Schmidt [2015]; the other, imple-
179 mented ourselves using slice sampling [Neal, 2003], uses a ω ∼ Beta(a, b) prior where a = 0.5
180 and b = exp(−k log(n)/2) as proposed by Shin et al. [2020] and k is the number of knots.
7
4.1 Setup 4 SIMULATION STUDY
181 Parametric Model (Param.)
182
183 We investigate the performance fitting of the Oracle model using Bayesian parametric regression
184 for the Hill model (Param.(Hill)) or the power model (Param.(power)) (priors in Table S2). Fitting
185 these models allows us to compare the performance of the Oracle NLFS to the Oracle parametric
186 model.
187 B-splines
188
189 Bayesian B-splines with a scaling parameter λ2 ∼ IG(0.001, 0.001) where the spline coefficients
190 are given by the prior β ∼ N(0, σ2λ2diag(k)). This model represents a basis approach without
191 smoothing and is used to compare the performance of the NLFS approach when the shrinkage
192 subspace is misspecified.
193 P-splines
194
195 Penalized Bayesian smoothing splines where β ∼ N(0, σ2τ2K−1) where K = R⊤R and R is
196 a second order penalty matrix and τ2 ∼ IG(1, 0.005), similar to the hyperparameter choices in
197 Lang and Brezger [2004]. This approach builds upon the B-spline approach, adding a smoothing
198 component, and typically performs better in practice than B-splines
199 Parametric Model + horseshoe B-spline
200
201 We also consider a model that includes the true parametric model plus an additional B-spline to
202 account for model misspecification, i.e., y = hθ(x) + Φβ + ε. When hθ(x) specifies the correct
203 model, one has β = 0; otherwise, β ̸= 0. To shrink the β coefficients to zero, we use a horseshoe
204 prior, i.e., β ∼ iidN(0, σ2τ2diag(λ 2 21 , . . . , λk )) where τ ∼ C+(0, 1) and λj ∼ C+(0, 1), cf. Makalic
205 and Schmidt [2015]. C+(0, 1) denotes a standard Half-Cauchy prior. As in the parametric model
206 case, hθ is either the parametric Hill (Param.(Hill)+B-spline) or power model (Param.(power)+B-
207 spline). This approach represents a direct competitor to the NLFS approach.
8
4.2 Results 4 SIMULATION STUDY
208 4.1.2 Further Considerations
209
210 For all simulations, we use k = 15 inner knots for the B-spline basis matrix. When MCMC
211 sampling, we took 10,000 draws from the posterior, discarding the first 2000 samples as burn-in.
212 Initial experiments indicated that this number of samples was adequate to estimate the posterior
213 distribution. For the spline-based approaches (NLFS, B-spline, P-spline), we place a vague prior
214 on the intercept term, θ1, defined in (8). Traceplots, of an NLFS fit with correctly and incorrectly
215 specified subspaces, are given in the supplement (supplemental Figures 4 and 5) and show
216 convergence.
217 4.2 Results
218 Figure 1 gives representative results of the simulation, where all results are provided in the sup-
219 plement. Unsurprisingly, when the Hill model is the truth (Figure 1a), we observe the largest
220 RMSE of 0.151 when fitting the misspecified parametric model (Param.(power)). Unlike the
221 misspecified parametric fits, when the function space is misspecified in the NLFS approach
222 (NLFS(power)), the RMSE is approximately one-third (0.046) that from fitting the misspecified
223 parametric model, indicating the NLFS approach adjusts to the data. In this scenario, the
224 NLFS(power) performance with mean RMSE of 0.046 was similar to that of the B-spline approach
225 with mean RMSE of 0.042.
226 When the correct function space is assumed for the NLFS prior, the RMSE drops to 0.019
227 (Figure 1a), as low as that of the 0racle parametric fit. This demonstrates the adaptive shrinkage
228 behavior of the NLFS approach in the case of correct subspace specification. Here, the NLFS
229 approach effectively shrank towards the correctly assumed space for sample sizes as low as
230 n = 50, and performed similarly to the oracle parametric model fit. The P-spline approach
231 receives no prior model or subspace specification but yields smooth splines. Consequently, its
232 performance was in between the approach with misspecification and correct specification.
233 When the correct function space is assumed, the NLFS approach tended to outperform the
234 parametric + horseshoe B-spline (PHBspline) approach. The PHBspline approach does not en-
235 force an equally smooth, global shrinkage of all β towards zero, especially when there are lever-
236 age points far from the observed mean.
237 When all assumed models or spaces are misspecified (Figure 1b), the NLFS approach was
238 outperformed by the PHBspline approach for the same model misspecification, i.e., NLFS(power)
239 was outperformed by param.(power) + horseshoe B-spline and NLFS(Hill) was outperformed by
240 param.(Hill) + horseshoe B-spline. However, the NLFS approach in general has an advantage
9
4.2 Results 4 SIMULATION STUDY
a) Truth: Hill (n=50)
0.100
0.030
0.010
0.003
Param. Param. NLFS P−Spline NLFS* Param.* Param. NLFS
(power) (power) (power) (Hill) (Hill) (Hill) (Hill+power)
+horseshoe +horseshoe
B−Spline B−spline
b) Truth: Hill + Downturn (n=50)
0.10
0.05
0.03
Param. Param. NLFS P−Spline NLFS Param. Param. NLFS
(power) (power) (power) (Hill) (Hill) (Hill) (Hill+power)
+horseshoe +horseshoe
B−Spline B−spline
Figure 1: Representative root mean square error (RMSE) results of the simulation for the scenar-
ios where the truth is Hill (a) or Hill + Downturn (b) and a medium noise level. Pane b represents
the situation where a deviation of an unknown shape is the truth. All simulation results can be
found in Tables S3 and S4. ∗Correct model used in the model fit.
10
RMSE RMSE
5 REAL DATA EXAMPLE
241 in misspecification scenarios due to its inherent flexibility to shrink toward combined function
242 spaces. The NLFS(Hill+power) outperformed all other approaches in this scenario with a mean
243 RMSE of 0.028. Only the P-spline approach came close, showing a slightly weaker performance
244 with a mean RMSE of 0.030.
245 The NLFS prior appropriately shrinks into the correct space, giving equivalent fits to the para-
246 metric Hill model (Figure 2a). Correspondingly, the P-Spline smoothing approach estimate shows
247 various artifactual bumps not evident in the NLFS approach. Even if the true model is the Hill
248 model, the naive PHBspline approach produces an artifactual bump in the asymptote region that
249 does not occur with the NLFS fit (Figure 2d). The NLFS approach and generic B-spline are
250 equivalent when the model is misspecified (i.e., fits a model outside of the assumed space) (Fig-
251 ure 2b). The NLFS(Hill+power) approach fits a model outside of the Hill space (Figure 2c) and
252 illustrates how shrinkage into combined subspaces can reduce misspecification errors involving
253 minor deviations.
254 5 Real Data Example
255 We applied the proposed method on a testosterone data set collected by Kelsey et al. [2014].
256 They modeled total testosterone (TT) concentration in male participants dependent on age, to
257 identify normal TT ranges at any age. TT levels are the result of highly complex physiological
258 processes, mechanistic models are not available. TT is expected to increase during puberty,
259 reach a maximum and possibly slowly decline with age, which is a sigmoidal assumption. Con-
260 sequently, we a priori assume the Hill based shape through the NLFS prior.
261 Kelsey et al. [2014] collected data from 13 studies on TT by age, yielding>10.000 data points;
262 they then fit 330 polynomial models and selected a single best parametric model based on the
2
263 best R with 5-fold cross-validation. Due to the large number of data points, spline smoothing
264 approaches tend to produce local artifacts that are biologically unreasonable. The NLFS ap-
265 proach offers an alternative to extensive model comparisons while simultaneously incorporating
266 knowledge on the curves shape.
267 For the analysis, we set
θ3 ∼ N(15, 4), (12)
θ4 ∼ LN(2.28, 0.05). (13)
268 Since testosterone levels increase during puberty, a priori we assume TT levels to reach half
269 maximal levels (e.g., θ3) at approximately 15 years with a standard deviation of 2 years. For the
270 steepness parameter, θ4, we set the log-mean to 2.28 and the log-variance to 0.05 implying that
11
5 REAL DATA EXAMPLE
a) Truth = Hill, n = 50 b) Truth = Hill, n = 50
NLFS(Hill) NLFS(power)
Param.(Hill) B−Spline
P−Spline Param.(power)
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Dose Dose
c) Truth = Hill + Downturn, n = 50 d) Truth = Hill, n = 50
NLFS(Hill+power) NLFS(Hill): Post. mean 
P−Spline NLFS(Hill): 95% CI
Truth Param.(Hill) + B−spline: Post. mean Param.(Hill) + B−spline: 95% CI
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Dose Dose
Figure 2: Posterior mean responses (a-d) and credible intervals (d) of representative simulation
runs with median noise level (σ2 = 0.005): (a) Oracle scenario. (b) misspecification scenario. (c)
deviations of unknown shape. Combined subspace shrinkage reduces misspecification errors.
(d) Comparison of credible intervals between NLFS and PHBspline in the Oracle scenario.
12
Posterior mean Posterior mean
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Response Posterior mean
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
5 REAL DATA EXAMPLE
NLFS(Hill)
Literature model
P−spline
0 20 40 60 80
Age [years]
Figure 3: Comparison of the NLFS approach to a P-spline fit and the best parametric model
selected by Kelsey et al. [2014]. For the P-spline and NLFS(Hill), the posterior mean is displayed.
The parametric model by Kelsey is log10(TT + 1) = (a+ cx+ ex
2 + gx3)/(1 + bx+ dx2 + fx2)
with a = 0.04655, b = −0.05311, c = 0.05123, −d = 0.00793, e = −0.01222, f = 0.00058 and
g = 0.00069. For display, the model is backtransformed using exp(log10(TT + 1) + σ̂
2/2) where
σ̂2 is the estimated residual variance.
271 the actual mean and variance to be 10 and 5, respectively. We choose these values as we expect
272 TT to increase rapidly upon the onset of puberty.
273 We fit our model to the slightly reduced data set of men of age 85 or younger (98.5% of
274 original data) because of extreme variability in the approximately 150 observations above 85. To
model the noise, we let σ2 ∼ C+275 (0, 1), to account for the large variance in the data.
276 The NLFS and parametric fit by Kelsey et al. [2014] are roughly sigmoidal (Figure 3). They
277 both show a peak around age 19, followed by a slight descent that eventually plateaus. The NLFS
278 approach notably predicts a larger mean testosterone level than the parametric model. The P-
279 spline fit has a less pronounced peak TT around age 20 and has artifactual bumps that oscillate
280 around the NLFS estimate. For each method, the observed RMSE values were 5.123 (NLFS),
281 5.154 (parametric), and 5.111 (P-spline) and therefore similar, but the three resulting model fits
282 are visibly distinguishable. Arguably, the NLFS approach more appropriately models the mean
283 than the P-spline due to the latter’s bumpiness. Further, NLFS estimates a higher mean TT level
284 than the parametric model, which suggests there may be some underestimation of the mean TT
285 when using a parametric approach.
13
Total Testosterone [nmol/L]
0 10 20 30 40 50
6 DISCUSSION
286 6 Discussion
287 The NLFS prior enables adaptive shrinkage into a pre-specified non-linear function space but
288 does not constrain the resulting function to be in that space if the data are not compatible with
289 that a priori function space. This approach can be applied to function spaces defined by any
290 twice differentiable function. Because such a setting is common, the NLFS approach balances
291 adhering to prior assumptions and accounting for model misspecification.
292 The NLFS approach can shrink into a combined function space, thereby providing robustness
293 against misspecification. This benefit is supported by simulation results, where NLFS combined
294 function space prior outperformed all other methods under model misspecification. Defining such
295 a prior is straightforward for the NLFS approach. Attempting to account for parametric misspec-
296 ification by including a spline that shrinks to zero if the model is correctly specified can give
297 artifactual features in the Oracle scenario. As a comparison, the NLFS prior gave slightly better
298 RMSE results under the Oracle scenario, and provided more realistic curve fits without the ar-
299 tifactual features. Though NLFS with a misspecified single subspace performed slightly worse,
300 adding subspaces in NLFS did not lead to a relevant performance loss compared to only including
301 the correct model but also robustified NLFS against misspecification.
302 When modeling the TT data in Kelsey et al. [2014], the NLFS yielded a plausible non-
303 parametric estimate that did not produce artifactual features. In this regard, NLFS provided
304 equally reasonable mean estimates as the parametric model, while not requiring a model selec-
305 tion procedure on over 300 models.
306 Simulation results empirically show that NLFS correctly decides to either shrink towards the
307 specified subspace, or remain unconstrained. Though we have not provided a theoretical proof,
308 our simulation results suggest that the optimal, theoretical shrinkage properties given by Shin
309 et al. [2020] approximately hold in the non-linear case. Because it performs similarly to an un-
310 smoothed spline estimate, adding a smoothness penalty similar to the one proposed by Wiemann
311 and Kneib [2021] for linear subspace shrinkage may be a promising extension.
312 The construction of NLFS assumes the independent variable to be continuously distributed,
313 with a unique covariate value for each observation. In some biological applications, data are
314 generated in a planned experimental setting, with multiple units treated at few distinct exposure
315 levels. For such experiments, the exposure is typically a dose or concentration. Dose-response
316 modeling is often performed in terms of a simple parametric Hill model fit, which can lead to
317 misspecification errors that could be prevented using NLFS. Tailoring NLFS to a such a data
318 structure is necessary. This can be done using a grid that defines the shrinkage locations, such
319 that the shrinkage is independent of the few experimentally selected doses. Precisely, Φθ would
14
6 DISCUSSION
320 be evaluated at a grid instead of the observed exposure levels. This avoids a lack of shrink-
321 age at basis functions that fall between exposure levels. This extension would yield more model
322 parameters related to the construction of the grid. Another extension using a fully specified Gaus-
323 sian processes is an alternative and would reduce hyperparameter choices on knot sequences
324 and shrinkage grids. Another extension is to account for heteroscedasticity. For non-parametric
325 Bayesian modeling, different methodologies can be applied, e.g. Dirichlet process priors. Other
326 computational challenges in the NLFS approach relate to the derivatives. For example, using the
327 Hill model, derivatives w.r.t. the non-linear parameters can be almost linearly dependent. Careful
328 prior selection or expanding the shrinkage onto additional subspaces might soften this challenge.
15
A TABLES
329 Acknowledgements
330 This manuscript was funded in part by the Research Training Group “Biostatistical Methods for
331 High-Dimensional Data in Toxicology” (RTG 2624) funded by the Deutsche Forschungsgemein-
332 schaft (DFG, German Research Foundation-Project Number 427806116) and intramural funds
333 at the NIEHS.
334 Supporting Information
335 The code and data underlying this article are available on GitLab at
336 https://gitlab.tu-dortmund.de/functional shrinkage/nonlinear shrinkage.
337 A Tables
Table 1: Overview of method settings used in the simulation study.
Algorithm Assume Shrinkage (Horseshoe prior)
1 NLFS Hill a = b = 0.5
2 NLFS Hill a = 0.5, b = exp(−k log(n)/2)
3 NLFS Hill & power a = b = 0.5
4 NLFS Hill & power a = 0.5, b = exp(−k log(n)/2)
5 NLFS power a = b = 0.5
6 NLFS power a = 0.5, b = exp(−k log(n)/2)
7 parametric Hill -
8 parametric power -
9 B-spline - -
10 P-spline - -
11 Parametric + B-spline Hill a = b = 0.5
12 Parametric + B-spline power a = b = 0.5
16
A TABLES
Table 2: Simulation setup summarized by the ADEMP principle.
Aim Comparing proposed approach against existing approaches
Data generation Dose-response models:
- Hill: h (x) = x
θ4
θ θ (θ4 θ 3 = 0.3, θ4 = 6)θ +x 43
- Power: h θ2θ(x) = θ1x , (θ2 = 0.5)
- Hill + Downturn: hθ(x) = hHillθ (x) + 1[0.6,∞)(x)(−1.5(x− 0.6)2) (θ3 = 0.3, θ4 = 6)
Doses: Unif∼ [0, 1]
Sample sizes: n ∈ {50, 100, 200, 500}
Added noise: ε ∼ N(0, σ2), σ2 ∈ {0.005, 0.05}
Estimand Mean of posterior dose-response function estimate f(x)
Methods Non-linear functional shrinkage (NLFS) (θ1 ∼ N(0, 1), σ2 ∼ IG(0.001, 0.001))
- assuming Hill (NLFS (Hill))
Priors: θ3 ∼ N+(0.5, 0.05), θ4 ∼ LN s.t. E(θ4) = 3, Var(θ4) = 3
- assuming power (NLFS (power))
Priors: θ3 ∼ N(0.5, 0.25)
- assuming Hill and power (NLFS (Hill+power))
Priors: As in NLFS(Hill) and NLFS(power)
Parametric Bayesian fit (Param.) (θ1 ∼ N(0, 1), log(σ2) ∼ N(−1.75, 1))
- assuming Hill (Param.(Hill))
Priors: θ3 ∼ N+(0.5, 0.05), θ4 ∼ LN s.t. E(θ4) = 3, Var(θ4) = 3
- assuming power (Param.(power))
Priors: θ ∼ N(0.5, 0.25)
B-spline
Priors: θ1 ∼ N(0, 1), σ2 ∼ IG(0.001, 0.001), λ2 ∼ IG(0.001, 0.001)
P-spline
Priors: θ1 ∼ N(0, 1), σ2 ∼ IG(0.001, 0.001), τ2 ∼ IG(1, 0.005)
Parametric + horseshoe B-spline
y = hθ(x) + Φ(β) + ε
Priors:
β ∼ N(0, σ2τ2diag(λ2 21, . . . , λk))
τ ∼ C+(0, 1), iidλ +j ∼ C (0, 1)
θ1 ∼ N(0, 1), θ2 ∼ N(1.5, 2) (Scaling)
- assuming Hill (Param.(Hill) + B-spline))
Prior: θ3 ∼ N+(0.5, 0.05), θ4 ∼ LN s.t. E(θ4) = 3, Var(θ4) = 3
- assuming power (Param.(power) + B-spline)
Prior: θ ∼ N(0.5, 0.25)
Performance RMSE between posterior mean E(f(x)|dis) and true g(x) evaluated at drawn doses x ∈
[0, 1]n
17
A TABLES
18
Table 3: Simulation results for σ2 = 0.005 (2σ = 14.1% of maximal effect) summarized by mean RMSE and corresponding standard deviation in
parenthesis. OS means own slice and refers to setting a and b for the shrinkage parameters as suggested in Shin et al. [2020] whereas HC means Half
Cauchy and refers to the standard horseshoe prior with a = b = 0.5.
n=50 n=100 n=200 n=500
Method Hill power Hill+down Hill power Hill+down Hill power Hill+down Hill power Hill+down
1 NLFS(Hill), OS 0.019 0.027 0.037 0.014 0.017 0.028 0.009 0.011 0.024 0.006 0.008 0.017
(0.007) (0.008) (0.008) (0.005) (0.005) (0.005) (0.003) (0.003) (0.002) (0.002) (0.002) (0.004)
2 NLFS(power), OS 0.046 0.018 0.053 0.031 0.013 0.031 0.023 0.009 0.022 0.015 0.006 0.015
(0.013) (0.006) (0.017) (0.005) (0.005) (0.005) (0.004) (0.003) (0.004) (0.002) (0.002) (0.002)
3 NLFS(Hill+power), OS 0.021 0.022 0.028 0.015 0.015 0.023 0.01 0.011 0.018 0.007 0.007 0.011
(0.007) (0.007) (0.006) (0.005) (0.005) (0.004) (0.003) (0.003) (0.003) (0.002) (0.002) (0.003)
4 NLFS(Hill), HC 0.019 0.022 0.033 0.014 0.015 0.027 0.01 0.011 0.024 0.006 0.008 0.015
(0.007) (0.007) (0.007) (0.005) (0.005) (0.004) (0.003) (0.003) (0.002) (0.002) (0.002) (0.004)
5 NLFS(power), HC 0.046 0.017 0.054 0.03 0.012 0.03 0.021 0.009 0.022 0.013 0.006 0.013
(0.015) (0.006) (0.019) (0.005) (0.005) (0.005) (0.004) (0.003) (0.004) (0.002) (0.002) (0.002)
6 NLFS(Hill&power), HC 0.021 0.021 0.028 0.015 0.015 0.023 0.01 0.011 0.018 0.007 0.007 0.011
(0.007) (0.007) (0.006) (0.005) (0.005) (0.004) (0.003) (0.003) (0.003) (0.002) (0.002) (0.002)
7 param(Hill) 0.019 0.023 0.052 0.013 0.019 0.05 0.009 0.016 0.05 0.006 0.012 0.049
(0.007) (0.005) (0.008) (0.005) (0.004) (0.005) (0.003) (0.002) (0.003) (0.002) (0.001) (0.002)
8 param(power) 0.151 0.015 0.18 0.152 0.011 0.18 0.154 0.008 0.181 0.154 0.005 0.182
(0.011) (0.006) (0.011) (0.008) (0.005) (0.008) (0.005) (0.003) (0.006) (0.003) (0.002) (0.004)
9 bspline 0.042 0.042 0.042 0.032 0.033 0.032 0.023 0.024 0.023 0.014 0.014 0.014
(0.007) (0.008) (0.007) (0.005) (0.005) (0.005) (0.004) (0.004) (0.004) (0.002) (0.002) (0.002)
10 pspline 0.03 0.026 0.03 0.022 0.019 0.022 0.016 0.015 0.016 0.011 0.01 0.011
(0.007) (0.006) (0.007) (0.005) (0.004) (0.005) (0.003) (0.003) (0.003) (0.002) (0.002) (0.002)
11 param(Hill)+bspline 0.021 0.022 0.031 0.015 0.017 0.024 0.011 0.014 0.018 0.007 0.01 0.012
(0.007) (0.006) (0.006) (0.005) (0.004) (0.005) (0.003) (0.003) (0.003) (0.002) (0.002) (0.002)
12 param(power)+bspline 0.038 0.019 0.04 0.029 0.013 0.03 0.021 0.009 0.022 0.013 0.006 0.014
(0.007) (0.007) (0.007) (0.005) (0.005) (0.005) (0.004) (0.003) (0.004) (0.002) (0.002) (0.002)
A TABLES
19
Table 4: Simulation results for σ2 = 0.05 (2σ = 44.7% of maximal effect) summarized by mean RMSE and corresponding standard deviation in parenthesis.
OS means own slice and refers to setting a and b for the shrinkage parameters as suggested in Shin et al. [2020] whereas HC means Half Cauchy and
refers to the standard horseshoe prior with a = b = 0.5.
n=50 n=100 n=200 n=500
Method Hill power Hill+down Hill power Hill+down Hill power Hill+down Hill power Hill+down
1 NLFS(Hill), OS 0.059 0.081 0.076 0.043 0.058 0.063 0.03 0.039 0.051 0.019 0.022 0.036
(0.021) (0.021) (0.017) (0.015) (0.015) (0.012) (0.01) (0.011) (0.009) (0.007) (0.007) (0.007)
2 NLFS(power), OS 0.137 0.053 0.155 0.107 0.039 0.101 0.066 0.027 0.067 0.042 0.017 0.043
(0.015) (0.019) (0.022) (0.023) (0.014) (0.024) (0.012) (0.011) (0.012) (0.007) (0.007) (0.007)
3 NLFS(Hill+power), OS 0.064 0.066 0.072 0.046 0.046 0.055 0.032 0.031 0.041 0.02 0.02 0.03
(0.021) (0.021) (0.019) (0.015) (0.015) (0.014) (0.01) (0.011) (0.01) (0.007) (0.007) (0.006)
4 NLFS(Hill), HC 0.057 0.067 0.073 0.042 0.049 0.059 0.03 0.034 0.046 0.02 0.021 0.033
(0.02) (0.02) (0.017) (0.015) (0.015) (0.013) (0.01) (0.011) (0.01) (0.007) (0.007) (0.006)
5 NLFS(power), HC 0.133 0.048 0.141 0.098 0.036 0.097 0.066 0.026 0.072 0.042 0.017 0.043
(0.016) (0.02) (0.026) (0.021) (0.015) (0.019) (0.013) (0.011) (0.014) (0.007) (0.007) (0.008)
6 NLFS(Hill+power), HC 0.061 0.06 0.068 0.045 0.045 0.053 0.032 0.032 0.041 0.02 0.02 0.03
(0.02) (0.019) (0.019) (0.015) (0.015) (0.014) (0.01) (0.011) (0.01) (0.007) (0.007) (0.006)
7 param(Hill) 0.063 0.054 0.082 0.043 0.042 0.066 0.03 0.033 0.058 0.019 0.024 0.053
(0.021) (0.018) (0.016) (0.016) (0.013) (0.011) (0.011) (0.009) (0.006) (0.007) (0.005) (0.003)
8 param(power) 0.161 0.043 0.188 0.157 0.032 0.184 0.156 0.024 0.183 0.155 0.016 0.183
(0.012) (0.019) (0.012) (0.008) (0.014) (0.009) (0.005) (0.01) (0.006) (0.003) (0.007) (0.004)
9 bspline 0.112 0.107 0.111 0.088 0.084 0.086 0.068 0.067 0.067 0.047 0.048 0.046
(0.02) (0.022) (0.02) (0.014) (0.014) (0.014) (0.01) (0.01) (0.01) (0.006) (0.006) (0.006)
10 pspline 0.081 0.063 0.08 0.061 0.048 0.06 0.045 0.037 0.045 0.03 0.026 0.03
(0.019) (0.017) (0.019) (0.014) (0.013) (0.013) (0.01) (0.009) (0.01) (0.006) (0.006) (0.006)
11 param(Hill)+bspline, HC 0.072 0.06 0.081 0.05 0.045 0.061 0.034 0.034 0.047 0.022 0.023 0.033
(0.021) (0.02) (0.019) (0.016) (0.014) (0.014) (0.011) (0.01) (0.009) (0.007) (0.006) (0.006)
12 param(power)+bspline, HC 0.097 0.053 0.103 0.076 0.039 0.081 0.058 0.028 0.061 0.04 0.019 0.042
(0.019) (0.02) (0.019) (0.014) (0.015) (0.014) (0.01) (0.01) (0.01) (0.006) (0.007) (0.006)
D COMPUTATION
338 B Figures
339 C Proofs
340 Lemma 1: Pθ does not depend on linear parameters.
341 Let h(x, θ) = θ1+θ2q(x, θ3) be a twice differentiable function with q non-linear in θ3. W.L.o.G.
342 assume that θ3 ∈ R. Then
 
1 0 0∂h(x, θ) ∂q(x, θ )  Ḣ 3 θ = = (1n q︸(x︷,︷θ3︸) θ2 ) = (︸1n ︷c︷1 c2∂θ ︸ ∂︷θ︷3 ︸ ︸
)0 1 0
:=c1 :=H ∈Rn×3
:=c 12 0 0 θ2︸ ︷︷ ︸
H2∈R3×3
343 and
⊤ ⊤
P = Ḣ (Ḣ Ḣ )−1θ θ θ θ Ḣθ
= H H (H H⊤1 2 2 1 H1H )
−1
2 H2H
⊤
1
= H −11H2H2 (H
⊤H )−1H−1 ⊤1 1 2 H2H1
= H1(H
⊤ −1 ⊤
1 H1) H1
344 and H1 does not depend on θ2.
345 D Computation
346 The code to reproduce results is available at
347 https://gitlab.tu-dortmund.de/functional shrinkage/nonlinear shrinkage.
348 The non-linear functional shrinkage (NLFS) approach for the Hill model is implemented using
349 a combination of Gibbs-, Metropolis-Hastings- and Slice sampling [Brooks et al., 2011, Neal,
350 2003]. We separately model the function intercept θ1.
351 Given the likelihood and priors
Y ∼ N(θ11n +Φβ, σ2In)
σ2 ∼ IG(aσ, bσ), θ1 ∼ N(µ 2θ1 , σθ )1
β ∼ N(0, σ2τ2(Φ⊤(I − P )Φ)−1θ )
θ3 ∼ N+(µθ3 , σ2θ ), θ4 ∼ LN(µθ4 , σ23 θ )4
ω = 1/(1 + τ2) ∼ Beta(aω, bω), aω = 0.5, bω = exp(− log(n)/2),
20
D COMPUTATION
α σ θ
θ ω β
Figure 4: Traceplots of the NLFS(Hill) example fit in Figure 2, pane (a), correct subspace speci-
fication. The first 2000 samples were discarded as burn-in. Due to the correct subspace speci-
fication, there is strong shrinkage (ω = (1 + τ2)−1 close to 1). The effective sample size (ESS)
was calculated based on the 10000 draws after discarding the first 2000 burn-in draws using the
coda R-package [Plummer et al., 2006].
21
D COMPUTATION
• • •
• •
Figure 5: Traceplots of the NLFS(power) example fit in Figure 2, pane (b), subspace misspecifi-
cation. The first 2000 samples were discarded as burn-in. Due to the subspace misspecification,
there is little shrinkage (ω = (1+ τ2)−1 close to 0). Further, power exponent θ3 is stuck at 0 for a
few thousand draws, and θ1, the intercept, seems highly correlated. Due to the misspecification
and effectively no shrinkage, single parameters are not well identifiable and the whole response
must be viewed to inspect convergence. The resulting response mixes well (bottom row). The
effective sample size (ESS) was calculated based on the 10000 draws after discarding the first
2000 burn-in draws using the coda R-package [Plummer et al., 2006].
22
D COMPUTATION
Algorithm 1 Non-linear functional shrinkage (NLFS)
1: Initialize: β(1), σ2(1), τ (1), ω(1), θ(1)
2: for i : 2 → B do
3: Calculate Ḣθ(i−1)
4: Sample β(i) ∼ p(β|·) ▷ Conjugate
5: Sample (i)θ1 ∼ p(θ1|·) ▷ Conjugate
6: Sample σ2(i) ∼ p(σ2|·) ▷ Conjugate
7: Sample ω(i) ∼ p(ω|·) ▷ Slice Sampler
8: Sample Non-linear θ(i) ∼ p(θ|·) ▷ MH Sampler
9: end for
10: return All samples
352 the sampler is summarized in Algorithm 1.
353 Update β
354 We update β using the full conditional posterior
β|σ2, τ2, θ, y ∼ N(µ′ ,Σ′β β) (14)
355 where Σ′β = σ
2(Φ⊤Φ+ τ2Φ⊤(In − P )Φ)−1θ and µ′β = σ−2Σ′ ′βΦ ỹ and ỹ = y − 1nθ1.
356 Update θ1
357
358 The intercept θ1 is updated using
θ |β, σ2, y ∼ N(µ′ 2 ′1 θ , σθ ) (15)1 1
359 where ′ ′σ2 = (σ2σ2θ θ )(nσ
2
θ + σ
2) and µ′θ = σ
−2σ2 ⊤θ 1n (y − Φβ) + µθ1/σ21 1 1 1 1 θ .1
360 Update σ2
361 The noise variance σ2 is updated by
σ2|β, τ2, θ ∼ IG(a′ , b′σ σ) (16)
362 where a′σ = (n+ k)/2 + a and b
′
σ σ = 0.5(RSS+ τ
−2β⊤(Φ⊤(In − Pθ)Φ)β) + bσ where RSS =
363 ||y − (θ 1 + Φβ)||21 n 2 is the residual sum of squares and ||.||2 is the Euclidean norm.
364 Update τ2
We update τ2365 using a slice sampler [Neal, 2003] considering the posterior log likelihood
g(τ) = log(p(τ |β, σ2, θ)) =(−k/2 + bω − 0.5) log(τ2) (17)
+ (−a 2ω − bω) log(1 + τ ) (18)
( )
− 1+ β⊤Φ⊤(In − P −22 θ)Φβ τ . (19)2σ
23
D COMPUTATION
366 Note that we use −k/2 and not −(k − d0)/2 as in Shin et al. [2020] where d0 is the rank of the
367 (linear) projection matrix. We omit −d0 as for the non-linear approach, there are no linear bases
368 in Ḣθ that are in Φ (because there is no intercept in Φ) and hence the prior covariance matrix of
369 β is of full rank k. For a current τ0, calculate v = g(τ0)). Uniformly draw z̃ ∼ U(0, exp(v)). For
370 z = log(z̃) define the slice Sz = {x : g(x) < g(z)} and sample the next τ1 uniformly from Sz.
371 For computational ease, we restrict τ2 to [0.001, 10].
372 The above sampling for the general ω ∼ Beta(aω, bω) prior was primarily featured and labelled
373 ’own slice’ (OC) in Table 3. We also considered a standard horseshoe (HS) prior (a = b = 0.5).
374 Details on its implementation are in Makalic and Schmidt [2015].
375 Update non-linear θ
376 One only has to update the non-linear parameters of θ, as Pθ only depends on the non-linear pa-
377 rameters. For the Hill model, the non-linear parameters are θ3 and θ4. We assume independence
378 and separately update θ3 and θ4 using a Metropolis-Hastings sampler [Brooks et al., 2011] and
379 explain the sampling for θ3.
380 Perform the three sampling steps
1. Draw a candidate (1)381 θ3 from a proposal distribution ppropusing
(0)
θ3
382 2. Calculate the hastings ratio
(1) (0) (1)
p(θ
HR = 3
|·)pprop(θ3 |θ3 ) .
(0) (1) (0)
p(θ3 |·)pprop(θ3 |θ3 )
383 3. Draw u ∼ Unif (1)[0, 1]. If HR > u, accept θ3 as new draw. Otherwise, reject and consider
(0)
384 θ3 as new draw.
385 For step (1), sample a new candidate (1)θ3 from a proposal distribution, e.g.
(0)
N 2+(θ3 , σprop)
386 where σ2prop might be calculated as, e.g. the empirical variance of the latest 100 draws of θ3, or
simply as σ2 = σ2387 prop θ . To sample x from a truncated normal distribution with positive support,3
388 x ∼ N+(µ, σ2), calculate l = P (X < 0) where X ∼ N(µ, σ2). Sample u ∼ Unif[l, 1] and
389 calculate x = qµ,σ2(u), the corresponding quantile.
390 For step (2), consider log(HR) for computational stability:
HR (1)|· − (0)|· (0)| (1) − (1) (0)log( ) = log(p(θ3 )) log(p(θ3 )) + log(pprop(θ3 θ3 )) log(pprop(θ3 |θ3 )).
391 For the fully conditional log posterior log(p(θ|·)), we can integrate out β to reduce the autocorre-
392 lation in the sampling. Since p(θ|·) ∝ p(y|θ, σ2, τ2)p(θ) and
y|θ, σ2, τ2 ∼ N(E(Φβ + ε),Cov(Φβ + ε)),
393 we use y|θ, σ2, τ2 ∼ N(0,Σy) with Σ = σ2τ2y Φ(Φ⊤(I − Pθ)Φ)−1Φ⊤ + σ2In.
24
REFERENCES REFERENCES
394 References
395 M. A. Alvarez, D. Luengo, and N. D. Lawrence. Linear latent force models using gaussian pro-
396 cesses. IEEE transactions on pattern analysis and machine intelligence, 35(11):2693–2705,
397 2013.
398 A. Brezger and W. J. Steiner. Monotonic regression based on bayesian p–splines. Journal
399 of Business & Economic Statistics, 26(1):90–104, 2008. doi: 10.1198/073500107000000223.
400 URL https://doi.org/10.1198/073500107000000223.
401 S. Brooks, A. Gelman, G. Jones, and X.-L. Meng. Handbook of markov chain monte carlo. CRC
402 press, 2011.
403 D. Carl. A practical guide to splines. Springer, 2001.
404 C. M. Carvalho, N. G. Polson, and J. G. Scott. The horseshoe estimator for sparse signals.
405 Biometrika, 97(2):465–480, 2010.
406 J. Chen, Z. Chen, C. Zhang, and C. Jeff Wu. Apik: Active physics-informed kriging model with
407 partial differential equations. SIAM/ASA Journal on Uncertainty Quantification, 10(1):481–506,
408 2022.
409 A. Golightly and D. J. Wilkinson. Bayesian parameter inference for stochastic biochemical network
410 models using particle markov chain monte carlo. Interface focus, 1(6):807–820, 2011.
411 S. Goutelle, M. Maurin, F. Rougier, X. Barbaut, L. Bourguignon, M. Ducher, and P. Maire. The
412 hill equation: a review of its capabilities in pharmacological modelling. Fundamental & clinical
413 pharmacology, 22(6):633–648, 2008.
414 L. H. Gunn and D. B. Dunson. A transformation approach for incorporating monotone or unimodal
415 constraints. Biostatistics, 6(3):434–449, 2005.
416 A. V. Hill. The possible effects of the aggregation of the molecules of hemoglobin on its dissocia-
417 tion curves. The Journal of Physiology, 40:iv–vii, 1910.
418 A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems.
419 Technometrics, 12(1):55–67, 1970.
420 Y. Huang, D. Liu, and H. Wu. Hierarchical bayesian methods for estimation of parameters in a
421 longitudinal hiv dynamic system. Biometrics, 62(2):413–423, 2006.
25
REFERENCES REFERENCES
422 T. W. Kelsey, L. Q. Li, R. T. Mitchell, A. Whelan, R. A. Anderson, and W. H. B. Wallace. A
423 validated age-related normative model for male total testosterone shows increasing variance
424 but no decline after age 40 years. PloS one, 9(10):e109346, 2014.
425 C. Köllmann, B. Bornkamp, and K. Ickstadt. Unimodal regression using bernstein–schoenberg
426 splines and penalties. Biometrics, 70(4):783–793, 2014.
427 S. Lang and A. Brezger. Bayesian p-splines. Journal of computational and graphical statistics,
428 13(1):183–212, 2004.
429 D. J. Lunn, N. Best, A. Thomas, J. Wakefield, and D. Spiegelhalter. Bayesian analysis of popula-
430 tion pk/pd models: general concepts and software. Journal of pharmacokinetics and pharma-
431 codynamics, 29:271–307, 2002.
432 E. Makalic and D. F. Schmidt. A simple sampler for the horseshoe estimator. IEEE Signal Pro-
433 cessing Letters, 23(1):179–182, 2015.
434 D. Mesarovic, Mihajlo, S. Sreenath, and J. Keene. Search for organising principles: understand-
435 ing in systems biology. Systems biology, 1(1):19–27, 2004.
436 M. C. Meyer. Inference using shape-restricted regression splines. The Annals of Applied
437 Statistics, 2(3):1013–1033, 2008. ISSN 19326157. URL http://www.jstor.org/stable/
438 30245118.
439 M. C. Meyer, A. J. Hackstadt, and J. A. Hoeting. Bayesian estimation and inference for gener-
440 alised partial linear models using shape-restricted splines. Journal of Nonparametric Statistics,
441 23(4):867–884, 2011.
442 T. P. Morris, I. R. White, and M. J. Crowther. Using simulation studies to evaluate statistical
443 methods. Statistics in medicine, 38(11):2074–2102, 2019.
444 R. M. Neal. Slice sampling. The annals of statistics, 31(3):705–767, 2003.
445 M. Plummer, N. Best, K. Cowles, and K. Vines. Coda: Convergence diagnosis and output
446 analysis for MCMC. R News, 6(1):7–11, 2006. URL https://journal.r-project.org/
447 archive/.
448 G. A. Seber and C. J. Wild. Nonlinear regression. hoboken. New Jersey: John Wiley & Sons, 62
449 (63):1238, 2003.
450 M. Shin, A. Bhattacharya, and V. E. Johnson. Functional horseshoe priors for subspace shrink-
451 age. Journal of the American Statistical Association, 115(532):1784–1797, 2020.
26
REFERENCES REFERENCES
452 T. S. Shively, T. W. Sager, and S. G. Walker. A bayesian approach to non-parametric monotone
453 function estimation. Journal of the Royal Statistical Society. Series B (Statistical Methodology),
454 71(1):159–175, 2009.
455 T. S. Shively, S. G. Walker, and P. Damien. Nonparametric function estimation subject to mono-
456 tonicity, convexity and other shape constraints. Journal of Econometrics, 161(2):166–181,
457 2011.
458 P. Šimon. Considerations on the single-step kinetics approximation. Journal of Thermal Analysis
459 and Calorimetry, 82(3):651–657, 2005.
460 R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical
461 Society: Series B (Methodological), 58(1):267–288, 1996.
462 M. K. Titsias, A. Honkela, N. D. Lawrence, and M. Rattray. Identifying targets of multiple co-
463 regulating transcription factors from expression time-series by bayesian model comparison.
464 BMC systems biology, 6:1–21, 2012.
465 M. W. Wheeler, D. B. Dunson, S. P. Pandalai, B. A. Baker, and A. H. Herring. Mechanistic
466 hierarchical gaussian processes. Journal of the American Statistical Association, 109(507):
467 894–904, 2014.
468 M. W. Wheeler, D. B. Dunson, and A. H. Herring. Bayesian local extremum splines. Biometrika,
469 104(4):939–952, 2017.
470 P. Wiemann and T. Kneib. Adaptive shrinkage of smooth functional effects towards a predefined
471 functional subspace. arXiv preprint arXiv:2101.05630, 2021.
27