No. 655        September 2022 
BILEVEL OPTIMIZATION OF THE  
KANTOROVICH PROBLEM AND  ITS 
QUADRATIC REGULARIZATION 
PART I: EXISTENCE RESULTS 
S. Hillbrecht, C. Meyer
ISSN: 2190-1767
BILEVEL OPTIMIZATION OF THE KANTOROVICH PROBLEM
AND ITS QUADRATIC REGULARIZATION
PART I: EXISTENCE RESULTS
SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
Abstract. This paper is concerned with an optimization problem governed
by the Kantorovich optimal transportation problem. This gives rise to a bilevel
optimization problem, which can be reformulated as a mathematical problem
with complementarity constraints in the space of regular Borel measures. Be-
cause of the non-smoothness induced by the complementarity relations, prob-
lems of this type are frequently regularized. Here we apply a quadratic regular-
ization of the Kantorovich problem. As the title indicates, this is the first part
in a series of three papers. It addresses the existence of optimal solutions to the
bilevel Kantorovich problem and its quadratic regularization, whereas part II
and III are dedicated to the convergence analysis for vanishing regularization.
1. Introduction
This paper is concerned with a bilevel optimization problem with the Kan-
torovich problem of optimal transport as the lower-level problem. The problem
under consideration takes the following form: inf J (π, µ1)π,µ1
(BK)  s.t. µ1 ∈M(Ω1){, ∫µ1 ≥ 0, ‖µ1‖M(Ω ) = ‖µ
d
 1 2
‖M}(Ω ,2) π ∈ arg min cd dϕ : ϕ ∈ Π(µ1, µd2), ϕ ≥ 0 .
Ω
Herein, cd is a given cost function measuring the transportation cost and µ
d
2 a
given marginal on a domain Ω2. The set Π(µ
d
1, µ2) denotes the set of feasible
transport plans, i.e., regular Borel measures that have µ d1 and µ2 as first and second
marginal, see (1.1) below. The lower level problem thus aims at minimizing the
transportation cost among all feasible transport plans associated with µ1 and µ
d
2 . It
is Kantorovich’s well-known generalization of the famous Monge problem, cf. [15].
We refer to [25, 26, 2, 22] for more details on the Kantorovich problem and its
application background. The bilevel optimization problem now consists of varying
the first marginal µ1 such that this marginal, together with an associated optimal
transport plan, minimizes a given objective J . The additional constraints on µ1 in
(BK) ensure that there is at least one optimal transport plan associated with µ1
Date: June 27, 2022.
2010 Mathematics Subject Classification. 49Q22, 90C08, 49J45.
Key words and phrases. Optimal transport, Kantorovich problem, bilevel optimization, qua-
dratic regularization.
This research was supported by the German Research Foundation (DFG) under grant num-
ber LO 1436/9-1 within the priority program Non-smooth and Complementarity-based Distributed
Parameter Systems: Simulation and Hierarchical Optimization (SPP 1962).
1
2 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
so that the feasible set of (BK) is non-empty. One possible application for such a
bilevel problem could be, for instance, the identification of the first marginal based
on measurements πd of the transport plan on a part D of the domain Ω1 × Ω2. In
this case, the upper level objective would be of the form J (π, µ1) = d(π, πd) (plus
potential regularization terms accounting for errors in the measurement), where
d denotes a suitable distance such as |π − πd|(D) or ‖π − π(d)‖W−1,p(D) for some
p > dim(D).
From a bilevel optimization point of view, the Kantorovich problem is challeng-
ing. First, for a given cost cd and marginals µ1 and µ
d
2 , the optimal transport
plan needs not to be unique (unless the cost function is strictly convex and at
least one of the marginals is absolutely continuous w.r.t. the Lebesgue measure, see
[22, Theorem 1.17]). Thus, in general, there is no single-valued solution mapping
µ1 7→ π associated with the lower level problem in (BK). This prevents us from
using the so-called implicit programming approach, where the lower level problem
is replaced by its solution operator and the (potentially limited) differentiability
properties of the latter are used to derive optimality conditions and optimization
algorithms for the bilevel problem, cf. e.g. [21, 4]. Alternatively, one could replace
the convex lower-level problem by its necessary and sufficient first-order optimality
conditions. These, however, contain a complementarity system in the space of reg-
ular Borel measures, which turns the bilevel problem into a mathematical program
with complementarity constraints (MPCC) in M(Ω1 × Ω2).
A common strategy to treat MPCCs is to regularize the complementarity con-
straints and the lower level problem. We only refer to [19, 23, 14, 16] in the finite
dimensional setting and to [3, 13, 10, 24, 27] for problems in function spaces. These
approaches are of theoretical as well as numerical interest. While a limit analysis for
vanishing regularization parameters yields stationarity conditions for the original
problem, the regularized problems can often be treated with standard algorithms
that, together with a path-following procedure for the regularization parameter,
can provide an efficient method for solving an MPCC. Here, we follow a similar ap-
proach and employ a quadratic regularization of the Kantorovich problem, which
was proposed and analyzed in [18]. This regularization has several advantages.
First, the regularized Kantorovich problem is strictly convex and thus uniquely
solvable, which is in particularly attractive from the viewpoint of bilevel optimiza-
tion as it allows the implicit programming approach to be applied. Moreover, the
regularity of the optimal transport plans is improved in a way that we are faced
with an MPCC in L2(Ω1 × Ω2) instead of an MPCC in the space of regular Borel
measures. Finally, as shown in [18], the quadratic regularization preserves essential
features of the original Kantorovich problem such as the sparsity of the optimal
transport plan as well as a dual problem that provides a substantial reduction of
the dimension. As a price for these desirable properties, the regularized problems
still contain a complementarity relation and are therefore not smooth, in contrast
to common MPCC regularization approaches. However, due to their particular
structure involving a complementarity system in L2(Ω1×Ω2), we expect that non-
smooth optimization algorithms for MPCCs in function space are applicable, see
e.g. [9, 5].
This work is the first part in a series of three papers. While the other two contri-
butions address the question of convergence of solutions of the regularized bilevel
problems for vanishing regularization parameter, this paper is concerned with the
BILEVEL KANTOROVICH PROBLEM, PART I 3
existence of solutions. For the bilevel Kantorovich problem (BK) itself, existence
of globally optimal solutions is rather straightforward to show, based on known
(weak) stability results for the Kantorivich problem. The situation changes, if one
turns to its regularized counterpart. As we will see by means of a counterexample,
the solution operator associated with the regularized Kantorovich problem is not
weakly continuous. Nevertheless, one can show its strong continuity, which allows
us to prove the existence of solution for the regularized bilevel problems.
The paper is organized as follows: After introducing some basic notation and
assumptions in the rest of this introduction, we collect some known results on the
Kantorovich problem and its quadratic regularization in Section 2. We then turn
to the existence of globally optimal solutions for (BK) in Section 3. The main part
of the paper is contained in Section 4, where we first verify that the regularized
solution operator is locally Hölder continuous and, based on that, show the existence
of optimal solutions for the regularized bilevel problems.
1.1. Notation and Standing Assumptions.
Domains. For d1, d2 ∈ N, let Ω1 ⊂ Rd1 and Ω d22 ⊂ R be compact and connected
sets with non-empty interior. We moreover suppose that their Cartesian product
Ω := Ω1×Ω2 coincides with the closure of its interior and has a Lipschitz boundary
in the sense of [8, Def. 1.2.2.1]. By B(Ω), we denote the respective Borel σ-algebra
on Ω and by λ the Lebesgue measure on B(Ω). For Ω1 and Ω2, B(Ωi) and λi,
i = 1, 2, are defined analogously so that λ = λ1 ⊗ λ2. Furthermore, we abbreviate
|Ω1| := λ1(Ω1), |Ω2| := λ2(Ω2), and |Ω| := λ(Ω).
Marginals. Let (X,B(X)) be a measurable space. Then, we denote by M(X) the
space of (signed) regular Borel measures on X equipped with the total variation
norm, i.e., ‖µ‖M(X) := |µ|(X). If µ1 ∈ M(Ω1) and µ2 ∈ M(Ω2), then the set of
transport plans between the marginals µ1 and µ2 is given by
(1.1) Π(µ1, µ2) := {π ∈M(Ω): P1#π = µ1 and P2#π = µ2},
where, for i = 1, 2,
Pi#π := π ◦ P−1i : B(Ωi)→ R,
is the pushforward measure of π via the projection Pi : Ω 3 (x1, x2) 7→ xi ∈ Ωi.
Note that Π(µ1, µ2) = ∅, if µ d1(Ω1) 6= µ2(Ω2). Throughout the paper, µ2 ∈M(Ω2)
is a fixed marginal satisfying µd2 ≥ 0 and, in order to ease notation, ‖µd2‖M(Ω ) = 1.2
The normalization condition is no restriction and can be ensured by re-scaling.
Given a measure space (X,A, µ), the Lebesgue space of p-th power absolutely
integrable functions is denoted by Lp(X,µ), p ∈ [1,∞). If X ⊂ Rn, n ∈ N, is a
Lebesgue measurable set and µ is the Lebesgue measure, we simply write Lp(X).
Cost Function. The cost function is assumed to satisfy c ∈W 1,pd (Ω), p > d1 + d2,
where, with a slight abuse of notation, W 1,p(Ω) denotes the Sobolev space on int(Ω).
Note that, due to the regularity of ∂Ω, W 1,p(Ω) is compactly embedded in C(Ω),
cf. e.g. [1, Theorem 6.3]. Thus, there exists a continuous representative of cd, which
we denote by the same symbol.
4 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
Bilevel Objective. The functional J : M(Ω)×M(Ω1)→ R is supposed to be lower
semicontinuous w.r.t. weak-∗ convergence. Let us give an application-driven exam-
ple for such a functional. Suppose that subsets D ∈ B(Ω) and D1 ∈ B(Ω1) and
measurements π ∈M(D) and µdd 1 ∈M(D1) are given. Then we set
J (π, µ d1) := |π − πd|(D) + ν |µ1 − µ1 |(D1),
where ν ≥ 0 is a weighting parameter. The goal of the bilevel optimization is then
to adjust µ1 and π such that the deviation between an optimal transport process
and (possibly inaccurate) measurements thereof on subdomains becomes minimal.
2. Preliminaries
Given marginals µ1 ∈ M(Ω1), µ2 ∈ M(Ω2) and a measurable cost function
c : Ω→ [−∞,∞], the Kantorovich problem∫ of optimal transport reads
 inf K(π) := c(x) dπ(x)(KP) Ω
s.t. π ∈ Π(µ1, µ2), π ≥ 0.
Lemma 2.1 ([26, Theorem 4.1]). If µ1, µ2 ≥ 0 and ‖µ1‖M(Ω1) = ‖µ2‖M(Ω ) and2
if c is lower semicontinuous and bounded from below, then there exists an optimal
solution of (KP).
Despite this existence result and its simple structure, the Kantorovich problem
provides some challenging aspects, especially from a numerical perspective. First of
all, its solution may be non-unique (although there are conditions which guarantee
uniqueness, see e.g. [22, Theorem 1.17]). More importantly, since π “lives” on the
Cartesian product of Ω1 and Ω2, the dimension of a discretized counterpart of (KP)
easily becomes so large that a numerical solution by means of standard LP-solvers
is no longer possible. Therefore, several penalization and relaxation methods have
been proposed, that in combination with dualization, allow a significant reduction
in the size of the problem. The most popular method is probably the entropic
regularization in combination with the well-known Sinkhorn algorithm, see e.g.
[7, 6].
In this paper, we rely on a different strategy, namely the quadratic regularization
that has been introduced in [18] and is as follows: Given a regularization parameter
γ > 0, two marginals µ ∈ L21 (Ω1), µ 22 ∈ L (Ω2), and a cost function c ∈ L2(Ω), we
consider ∫
 inf Kγ(πγ) := c(x)πγ(x) dλ(x) +
γ
2 ‖π
2
 γ
‖L2(Ω)
 Ωs.t. π∫ 2γ ∈ L (Ω), πγ ≥ 0 λ-a.e. in Ω,
(KPγ)  ∫ πγ(x1, x2) dλ2(x2) = µ1(x1) λ1-a.e in Ω1, Ω2 πγ(x1, x2) dλ1(x1) = µ2(x2) λ2-a.e in Ω2.
Ω1
Lemma 2.2 ([18, Lemma 2.1, Theorem 2.11]).
(i) Problem (KPγ) admits a unique solution if and only if µi ≥ 0 λi-a.e. in Ωi,
i = 1, 2, and ‖µ1‖L1(Ω ) = ‖µ2‖L11 (Ω2).
BILEVEL KANTOROVICH PROBLEM, PART I 5
(ii) If, in addition, there exist constants c > −∞ and δ > 0 such that c ≥ c
λ-a.e. in Ω and µi ≥ δ λi-a.e. in Ωi, i = 1, 2, then π ∈ L2γ (Ω) is a solution of
(KPγ) if and only if there exist functions α1 ∈ L2(Ω1) and α2 ∈ L2(Ω2) satisfying
1
(2.1a) ∫πγ − (α1 ⊕ α2 − c)+ = 0 λ-a.e. in Ω,γ
(2.1b) ∫ πγ(x1, x2) dλ2(x2) = µ1(x1) λ1-a.e. in Ω1,Ω2
(2.1c) πγ(x1, x2) dλ1(x1) = µ2(x2) λ2-a.e. in Ω1.
Ω1
Herein, (α1⊕α2)(x1, x2) := α1(x1) +α2(x2) λ-a.e. in Ω refers to the direct sum of
α1 ∈ L2(Ω ) and α ∈ L21 2 (Ω2), while, for given u ∈ L2(Ω), (u)+(x) := max{u(x); 0}
λ-a.e. in Ω denotes the pointwise maximum. It is clear that both the direct sum and
the pointwise maximum map L2(Ω1)×L2(Ω2) and L2(Ω), respectively, to L2(Ω) so
that (2.1a) is well defined.
(iii) Under the above assumptions, the functions αi ∈ L2(Ωi), i = 1, 2, from (ii)
solve thedual problem given by ∑2 ∫1 2 max Φγ(a1, a2) := − 2‖(a1 ⊕ a2 − c)+‖L2(Ω) + γ aiµi dλi(2.2)  i=1 Ωis.t. ai ∈ L2(Ωi), i = 1, 2,
and there is no duality gap, i.e., Φγ(α1, α2) = Kγ(πγ).
The above results directly address the aforementioned challenges. Besides the
uniqueness of the solution, the approach allows us to escape the curse of dimen-
sionality. If one inserts (2.1a) into (2.1b) and (2.1c), then a non-smooth system of
equations for the dual variables α1 and α2 arises. We are then dealing with a prob-
lem in L2(Ω 21) × L (Ω2) instead of L2(Ω1 × Ω2), which, after discretization, leads
to a substantial reduction of the number of unknowns. Moreover, due to the max-
operator in (2.1a), the sparsity pattern of the transport plans is better preserved
compared to the entropic regularization. Finally, the structure of (2.1) allows for
the application of a semi-smooth Newton method, see [18] for more details.
The convergence of (sub-)sequences of solutions of (KPγ) to solutions of (KP)
for γ ↘ 0 is addressed in [17]. To be more precise, it is shown that the objective of
(KPγ) Γ-converges to the objective of (KP) w.r.t. weak-∗ convergence in M(Ω) as
γ ↘ 0, provided that the original marginals in M(Ωi), i = 1, 2, are smoothed and
the smoothing parameter is properly coupled with γ, see [17, Theorem 4.2].
As outlined in the introduction, the goal of this and the companion papers is
to employ the quadratic regularization for a bilevel optimization problem with the
Kantorovich problem (KP) as the constraint. The motivation for this approach is as
follows: First, the uniqueness of solutions for (KPγ) allows us to define a solution
operator Sγ(c, µ1, µ2) 7→ πγ , which, in turn, enables us to employ the so-called
implicit programming approach. Secondly, we expect that Sγ (or a discretization
thereof) provides sufficient smoothness to use non-smooth optimization algorithms
for the solution of the regularized bilevel problem. Before we address the regularized
bilevel problem, we turn to the optimization of the original problem (KP) and show
existence of at least one optimal solution to the latter in the upcoming section.
6 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
3. Existence of Optimal Solutions of the Bilevel Kantorovich
Problem
Let us first recall the bilevel optimization of the Kantorovich problem from theintroduction: inf J (π, µ1)π,µ1
(BK)  s.t. µ1 ∈M(Ω1){, ∫µ1 ≥ 0, ‖µ1‖M(Ω ) = 1, 1 }π ∈ arg min cd dϕ : ϕ ∈ Π(µ1, µd2), ϕ ≥ 0 ,
Ω
where c ∈W 1,pd (Ω), for p > d1 +d2, and µd2 ∈M(Ω2) denote a fixed cost functional
and a fixed marginal, respectively, and J is a given objective, see Section 1.1. To
shorten notation, given c ∈ C(Ω) and µi ∈ M(Ωi), i = 1, 2, we abbreviate the set
of associated optimal transport plan{s∫by }
(3.1) S(c, µ1, µ2) := arg min cdϕ : ϕ ∈ Π(µ1, µ2), ϕ ≥ 0 .
Ω
The essential tool to establish the existence of solutions to (BK) is the following
stability result for the Kantorovich problem. Its proof is based on the concept of
c-cyclic monotonicity. For details, we refer to [26, Section 5].
Lemma 3.1 (Stability of the transport plan, [26, Theorem 5.20]). Let c ∈ C(Ω) be
given and assume that {µk1}k∈N ⊂M(Ω1) and {µk2}k∈N ⊂M(Ω2) are sequences that
satisfy µk, µk1 2 ≥ 0 and ‖µk1‖ kM(Ω ) = ‖µ2‖M(Ω ) for all k ∈ N and converge weakly-∗1 2
to µ1 ∈ M(Ω1) and µ2 ∈ M(Ω2). Let moreover {πk}k∈N be a sequence of optimal
transport plans associated with (µk1 , µ
k
2), i.e., πk ∈ S(c, µk1 , µk2). Then there is a
subsequence that converges weakly-∗ to an optimal transport plan π ∈ S(c, µ1, µ2).
The above lemma is just a special case of [26, Theorem 5.20], where the cost func-
tion need not to be fixed. We underline that the notion of “weak convergence” in
[26] (sometimes also called narrow convergence) coincides with weak-∗ convergence
in our case, since Ω1, Ω2, and Ω are compact.
Theorem 3.2. There exists at least one globally optimal solution to (BK).
Proof. Based on Lemma 3.1, the result easily follows from the direct method of the
calculus of variations.
First, thanks to Lemma 2.1, the feasible set of (BK), denoted by F , is nonempty.
Thus, there exists a minimizing sequence {(π kk, µ1)}k∈N so that
lim J (π kk, µ1) = inf J (π, µ1) ∈ R ∪ {−∞}.
k→∞ (π,µ1)∈F
The feasibility of the minimizing sequence implies
‖π ‖ = π (Ω × Ω ) = (P π )(Ω ) = µk(Ω ) = ‖µkk M(Ω) k 1 2 1# k 1 1 1 1‖M(Ω ) = 1 ∀ k ∈ N1
so that there is a subsequence, denoted by the same symbol to ease notation, that
converges weakly-∗ to a limit (π̄, µ̄ ) ∈M(Ω)×M(Ω ). Thus (µk, µd) ⇀∗1 1 1 2 (µ̄1, µd2) in
M(Ω1)×M(Ω2) and consequently, Lemma 3.1 implies π̄ ∈ S(c dd, µ̄1, µ2). In view of
the constraints of the Kantorovich problem, this also gives µ̄1 ≥ 0 and ‖µ̄1‖M(Ω1) =
1 and hence, (π̄, µ̄1) ∈ F , i.e., the weak limit is feasible. The optimality of the weak
limit follows from the presupposed weak-∗ lower semicontinuity of J . 
BILEVEL KANTOROVICH PROBLEM, PART I 7
Remark 3.3. Lemma 3.1 only requires the continuity of the cost function. For the
mere existence result, one could therefore relax the regularity assumption on the
cost function to cd ∈ C(Ω). The improved regularity of cd is however required for
the analysis of the regularized bilevel problem, and for this reason, we impose it as
standing assumption. Moreover, Lemma 3.1 also holds in Polish spaces and not only
in compact sets. One can therefore generalize the existence result of Theorem 3.2
to a much broader class of domains Ω1 and Ω2.
If one replaces (KP) by its necessary and sufficient optimality conditions, then
(BK) can equivalently be rewritten as inf J (π, µ )
1
 s.t. µ1 ∈M(Ω1), µ1 ≥ 0, ‖µ1‖M(Ω1) = 1,
π ∈M(Ω1 × Ω2), ϕ ∈ C(Ω1), ψ ∈ C(Ω2),
(BK) ⇐⇒  P π = µ , P
d
 1# 1 2#
π = µ2 ,
 π∫≥ 0, ϕ(x1) + ψ(x2) ≤ cd(x1, x2) ∀ (x1, x2) ∈ Ω1 × Ω2, (ϕ⊕ ψ − cd) dπ = 0.
Ω1×Ω2
We observe that the last two lines in the above reformulation of (BK) form a
complementarity system in M(Ω1 × Ω2) and C(Ω1 × Ω2), so that (BK) becomes
an MPCC in the space of regular Borel measures, as already mentioned in the
introduction. Even though several results for MPCCs are known, in particular
when the cone defining the complementarity constraints is polyhedric, which is the
case here, see [28, Example 4.12], problems of this type are typically smoothed or
regularized, and we will do just that in the next section.
4. Existence of Optimal Solutions of the Regularized Bilevel
Problem
The regularized Kantorovich problem (KPγ) clearly admits a solution only if the
marginals are functions in L2(Ω1) and L
2(Ω2). Therefore, one needs to regularize
the marginals, if the Kantorovich problem in (BK) is replaced by (KPγ). But even
if the marginals were functions in L2(Ω1) and L
2(Ω2), one needs to smooth them
considering the lack of (weak) continuity of the solution mapping associated with
(KPγ), see Example 4.2 below. What is more, in order to guarantee the existence
of the dual variables α1 and α2 from Lemma 2.2, the marginals need to be strictly
positive, see [18, Assumption 1]. We therefore introduce the convolution & constant
shifting operators
δ
(4.1) T δ δ 2 δi : M(Ωi) 3 µi 7→ ϕi ∗ µi + ∈ L (Ωi ), i = 1, 2,|Ωδi |
which turn the marginals into smooth and strictly positive functions on Ωδ1 and
Ωδ2. Herein, δ > 0 is a smoothing parameter, ϕ
δ ∞ di
i ∈ Cc (R ) denotes a standard
mollifier with ‖ϕδ‖ = 1 and support in B(0, δ) ⊂ Rdii L1(Rdi ) and Ωδi := Ωi+B(0, δ),
i = 1, 2. Moreover, we set Ωδ := Ω
δ δ
1×Ω2. When mollifying µi, we of course extend
it by zero, i.e., ∫
(ϕδi ∗ µi)(x) = ϕδi (x− y) dµi(y), x ∈ Ωδi .
Ωi
8 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
As a consequence, ϕδ ∗ µ ⇀∗i i µi in M(Ωi) as δ ↘ 0, provided that the support of
µi has positive distance to ∂Ωi, which will be useful for the convergence analysis in
the companion paper [11], where the smoothing parameter δ will be polynomially
coupled with the regularization parameter γ.
Owing to Lemma 2.2 there exists a unique solution π ∈ L2γ (Ωδ) to (KPγ) for
costs in L2(Ωδ) and marginals in L
2(Ωδi ), i = 1, 2, that satisfy the conditions in
Lemma 2.2(i). We denote the associated solution operator by
Sγ : L2(Ωδ)×{ M0 3 (c, µ1, µ 22)→7 πγ ∈ L (Ωδ),
M0(Ωδ) := (µ1, µ2) ∈ L2(Ωδ1)× L2(Ωδ2) : ‖µ1‖L1(Ωδ) = ‖µ2‖L1(Ωδ1 2), }
µi ≥ 0 λi-a.e. in Ωδi , i = 1, 2 .
To ease notation, we suppress the dependency of πγ and Sγ on δ. Furthermore,
we introduce the extension-by-zero operator E 2δ : C(Ω) → L (Ωδ), whose adjoint
E∗ : L2δ (Ωδ) → M(Ω) is the associated restriction operator. Now, we have every-
thing at hand to formulate the regularized bilevel problem: inf Jγ(πγ , µ 11, c) := J (πγ , µ1) + pγ ‖c− c pd‖π ,µ ,c W 1,p(Ω)γ 1
(BKδγ)  s.t. c ∈W 1,p(Ω(), µ1 ∈M(Ω1), µ)1 ≥ 0, ‖µ1‖M(Ω ) = 1,1
πγ = E∗δ Sγ E δ δ dδ c, T1 (µ1), T2 (µ2) .
As we will see in the proof of Theorem 4.7, there holds (T δ1 (µ δ d1), T2 (µ2)) ∈M0(Ωδ)
such that πγ is well defined, cf. Lemma 2.2(i). Compared to (BK), we not only
replace the Kantorovich problem as the lower-level problem with its regularized
counterpart, but also add the cost function c to the set of optimization variables.
This is motivated by the so-called reverse approximation property, which is essential
to show the convergence of minimizers of (BKδγ) towards solutions of the original
unregularized bilevel problem, see the companion paper [12], where this property
is elaborated for the finite dimensional counterparts of (BK) and (BKδγ). This
property requires a set of optimization variables that is sufficiently rich, as is also
required, e.g., in the optimization of perfect plasticity, see [20]. For this reason,
c is treated as an additional optimization variable. After all, the penalty term in
the upper-level objective Jγ will ensure that, in the limit, c equals the given cost
function cd, see [12].
Remark 4.1. Instead of regularizing w.r.t. the Lebesgue measure, one could also
apply a regularization w.r.t.∫the produc∫t measure of the marginals µ 1
⊗ µ2, i.e.,
inf cdπ + γ 22 π d(µ1 ⊗ µ2)
(K̃Pγ)  Ω Ω
s.t. π ∈ L2(Ω1 × Ω2, µ1 ⊗ µ2) ∩Π(µ1, µ2),
where, with a slight abuse of notation, we use the same symbol for the Borel
measure π and its density w.r.t. the product measure. Note that the constraint
π ∈ Π(µ1, µ2) does not imply that π is automatically absolutely continuous w.r.t.
the product measure, as the counterexample Ωi = [0, 1], µi = λ, i = 1, 2, and
π = (id, id)#λ shows. Hence, an additional regularization is also necessary in
this case. Nevertheless, a regularization w.r.t. to the product measure has several
advantages. For instance, the marginals need not to be smoothed and the positivity
assumption on µ1 and µ2 in [18, Assumption 1] becomes superfluous. However, in
BILEVEL KANTOROVICH PROBLEM, PART I 9
the bilevel context, this approach does not seem to be promising: µ1 is an upper-
level variable and therefore the space for π is no longer fixed but depends on the
optimization variable.
The rest of this paper is dedicated to the existence of solutions to (BKδγ). In this
context, the continuity properties of Sγ are of course essential and will be discussed
in the following.
4.1. Hölder Continuity of the Regularized Solution Operator. The key
tool in the existence proof for the unregularized bilevel problem in Theorem 3.2 has
been the stability of the Kantorovich problem w.r.t. (weak-∗) perturbations of the
marginals from Lemma 3.1. Unfortunately, such a weak continuity result does not
hold in case of the regularized Kantorovich problem, as we will demonstrate below
by means of a counterexample.
Example 4.2. Let Ω1 = Ω2 = [0, 1], γ = 1, and c(x1, x2) :=
1
4 |x
2
1−x2| . Moreover,
define f : R→{R, f(x) := sgn(sin(2πx)) and, for n ∈ N, set{
f(nx ) + 9 , 0 ≤ x ≤ 1 1n 1 1 ,α (x ) := 4 2 αn 0, 0 ≤ x2 ≤ ,1 1 2 (x2) := 2
f(nx1) +
5 1
4 , 2 < x1 ≤ 1, −
1
2 ,
1
2 < x2 ≤ 1.
In view of (2.1a), we(further set1 )
π (x , x ) := αnn 1 2  1 (x1) + α
n
2 (x2)− c(x1, x2)
γ
+
f(nx ) + 91 4 − 14 |x1 − x2|2, 0 ≤ x 11, x2 ≤ 2 ,f(nx ) + 5 − 1 |x − x |2, 1 < x ≤ 1, 0 ≤ x ≤ 1 ,=  1 4 4 1 2 2 1 2( 2f(nx ) + 7 − 11 4 4 |x1 − x2|2, ) 0 ≤ x 1 11 ≤ 2 , 2 < x2 ≤ 1,
f(nx1) +
3 − 14 4 |x1 − x2|
2 , 12 < x1, x2 ≤ 1,+
whose weak limit (w.r.t.weak convergence in L2(Ω)) is 9 − 1 |x 24 4 1 − x2| , 0 ≤ x , x 11 2 ≤ 2 , 5 1 4 − 4 |x1 − x |
2
2 ,
1 < x 1
 2 1
≤ 1, 0 ≤ x2 ≤ ,
π(x1, x2) = 2 7 − 14 4 |x1 − x |22 , 0 ≤ x 1 11 ≤ 2 , 2 < x2 ≤ 1,7 − 18 8 |x1 − x2|2, 12 < x1, x2 ≤ 1.
Because the system in (2.1) is necessary and sufficient for optimality, πn is the
solution of (KPγ) wit∫h the cost function defined abov∫e and marginals given by1 1
µn1 (x1) := π (x , x ) dx , µ
n
n 1 2 2 2 (x2) := πn(x1, x2) dx1.
0 0
Note that, for all n ∈ N, µn 1i ≥ /16 λi-a.e., i = 1, 2, such that Lemma 2.2(ii) is
indeed applicable. Clearly, the weak convergence of πn implies that µ
n
i converges
weakly to µi ∈ L2(Ωi), i = 1, 2, and the pointwise bound carries over to the weak
limit such that Lemma 2.2(ii) also holds for the limits µ1 and µ2. Accordingly,
if the weak limit π were a solution to the regularized Kantorovich problem, then
there would be dual sol(utions α1 ∈ L2(Ω1) and α 22 ∈) L (Ω2) so that
π(x1, x2) = α1(x1) + α (x
2
2 2)− c(x1, x2) λ-a.e. in [0, 1] .+
10 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
Because of π > 0, this is equivalent to 94 , 0 ≤ x1, x 12 ≤ 2 ,5 , 1 < x1 ≤ 1, 0 ≤ x2 ≤ 1 ,
(4.2) α 4 2 21(x1) + α2(x2) =  74 , 0 ≤ x ≤ 11 2 , 12 < x2 ≤ 1,7
8 +
1
8 |x
2 1
1 − x2| , 2 < x1, x2 ≤ 1,
λ-a.e. in [0, 1(]2. T)his, howeve(r, lea)ds to a contradiction: picking arbitrary Lebesgue
points x̂ ∈ 0, 12 2 and x̃2 ∈} 12 , 1 of α2,{(4.2) implies
9 − α (x̂ ), 0 ≤ x ≤ 1 74 2 2 1 2 4 − α2(x̃2), 0 ≤ x1 ≤
1
= α(x ) = 2
5 1
4 − α2(x̂2),
1
2 < x1 ≤ 1 [ 78 +]18 |x1 − x̃ 22| − α2(x̃2), 12 < x1 ≤ 1
λ1-a.e. in [0, 1]. While the conditions in 0,
1
2 ( imp]ly that α2(x̂2) − α (x̃ ) = 12 2 2 , it
must hold that α2(x̂2) − α2(x̃2) ≤ 3 < 18 2 on
1
2 , 1 , which yields the desired con-
tradiction. Therefore, the weak limit π cannot be the solution of (KPγ) associated
with the limits µ1 and µ2, giving in turn that Sγ is not weakly continuous.
We compensate the lack of weak continuity by means of the convolution operators
T δi , i = 1, 2, which turn weakly convergent into strongly convergent sequences. Still,
we need to prove that Sγ is continuous w.r.t. the strong topology of L2(Ωδ). This
is what we will show next. To this end, let γ > 0 and δ > 0 be arbitrary, but fixed
throughout this section. To simplify the notation, we slightly chance the notation in
comparison to the beginning of this section and the consecutive subsection. First,
we suppress the sub- and superscripts γ and δ in the rest of this section, except for
Sγ and πγ in order to underline the difference to the solution of the unregularized
problem (KP) and its solution set in (3.1). Moreover, given (µ1, µ2) ∈M0(Ω), and
c ∈ L2(Ω), we set π 2γ := Sγ(c, µ1, µ2) ∈ L (Ω) (i.e., without the restriction operator
E∗δ ).
We start with several auxiliary boundedness results. For this purpose, let us
consider arbitrary but fixed data c ∈ L2(Ω), µ ∈ L2i (Ωi), i = 1, 2, with
(4.3) ‖µ1‖L1(Ω ) = ‖µ1‖L1(Ω ) =: m.1 2
Moreover, we suppose that the assumptions of the second part of Lemma 2.2 are
satisfied, i.e., there are constants c > −∞ and δ > 0 such that
(4.4) c ≥ c λ-a.e. in Ω and µi ≥ δ λi-a.e. in Ωi, i = 1, 2.
Note that the latter condition is automatically fulfilled in context of the bilevel
problem (BKδγ) owing to the definitions of the operators T δi . Moreover, the penalty
term in the upper level objective ensures that the cost c is bounded in W 1,p(Ω) ↪→
C(Ω), which ensures the other condition thanks to the compactness of Ω.
Lemma 4.3. There exists a co(nstant C = C(γ,m) > 0 such tha)t
‖πγ‖L2(Ω) ≤ C ‖µ1‖L2(Ω ) ‖µ2‖L2(Ω ) + ‖c‖L2 .1 2 (Ω)
Proof. According to Lemma 2.2, there exist α1 ∈ L2(Ω1) and α ∈ L22 (Ω2) such
that the system in (2.1) is fulfilled. Multiplying (2.1b) and (2.1c) with α1 and α2,
respectively, integrating and∫adding the res∫ulting equations∫leads to
(4.5) γ‖π‖2L2(Ω) = µ1α1 dλ1 + µ2α2 dλ2 − πcdλ,
Ω1 Ω2 Ω
BILEVEL KANTOROVICH PROBLEM, PART I 11
where we also used (2.1a). Exploiting the normalization condition in (4.3), we
obtain ∫ ∫ ∫ ∫
1 1
µ1α1 dλ1 = µ1α1 µ2 dλ2 dλ1 = (µ1 ⊗ µ2)α1dλ
Ω m1 Ω1 Ω m2 Ω
and an analogous equation for µ2α2. Herein, (µ1 ⊗ µ2) := µ1(x1)µ2(x2) for λ-a.a.
(x1, x2) ∈ Ω refers to the direct product of µ1 and µ2. One easily verifies that
µ ⊗ µ ∈ L21 2 (Ω) ∫with ‖µ1 ⊗ µ2‖L2(Ω) = ‖µ1‖L2(Ω )‖µ2‖L2(Ω ). Therefore, (4.5)1 2allows us to estimate ∫ ∫
γ‖π‖2 1 1L2(Ω) ≤ (µ1 ⊗ µ2)(α1 ⊕ α2 − c)+dλ+ (µ1 ⊗ µ2)cdλ− πcdλm Ω m Ω Ω
≤ γ m−1 ‖π‖L2(Ω)‖µ1‖L2(Ω )‖µ1 2‖L2(Ω2)
+m−1 ‖µ1‖L2(Ω )‖µ2‖1 L2(Ω2)‖c‖L2(Ω) + ‖c‖L2(Ω)‖π‖L2(Ω).
and Young’s inequality then gives the result. 
The next two lemmas address the boundedness of the dual variables α1 and α2
from Lemma 2.2. We first observe that the dual variables cannot be bounded by
the data in any norm, since they are not unique: if (α1, α2) satisfies (2.1) for πγ ,
then, for any r ∈ R, (α1 + r, α2 − r) does so, too. In the rest of the paper, we
therefore pick a particular dual so∫lution, namely the one, where r is chosen suchthat
(4.6) α2 dλ2 = 0.
Ω2
We call a dual solution (α1, α2) satisfying (4.6) zero-mean dual solution. The above
considerations show that, under the assumptions of Lemma 2.2, there always exists
a zero-mean dual solution. We further underline that even the zero-mean dual so-
lution is in general not unique as the counterexample in [18, Section 2.3] illustrates.
Still, we can show that every zero-mean dual solution is bounded by the data c, µ1,
and µ2. We start with an L
1 bound in the following
Lemma 4.4. Every zero-mean dual(solution satisfies )
‖α1‖
2
L1(Ω ) + ‖α2‖L1(Ω ) ≤ C ‖µ1‖L2 ‖µ ‖ 2 + ‖c‖ 2 + 11 2 (Ω1) 2 L (Ω2) L (Ω)
with a constant C = C(γ,m, δ) > 0.
Proof. Lemma 2.2(iii) yields
‖c‖L2(Ω) ‖πγ‖L2(Ω) ≥ −Kγ(πγ)
= −Φγ(∫α1, α2) ∫
≥ − 1 1(µ1 ⊗ µ2)(α1 ⊕ α2 − c) dλ− (µ1 ⊗ µ2)cdλ,
m Ω m Ω
which, in co(mbination with (2.1a), leads)to
‖c‖L2(Ω) ‖∫πγ‖L2(Ω) + ‖µ1 ⊗ µ2‖L2(Ω) ∫
≥ − 1 ∫ (µ1 ⊗ µ2)(α1 ⊕ α2 −
1
∫c)+ dλ+ (µ1 ⊗ µ2)(α1 ⊕ α2 − c)− dλm Ω m Ω
γ δ2≥ − (µ1 ⊗ µ2)πγ dλ+ (α1 ⊕ α2 − c)− dλ,
m Ω m Ω
12 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
where δ > 0 is the constant from (4.4). Thus, thanks to Lemma 4.3, we arrive at
‖α1 ⊕ α2‖L1(Ω) ≤ ‖(α1 ⊕ α2 − c)+‖L1(Ω) + ‖(α1 ⊕ α2 − c)−‖L1(Ω) + ‖c‖L1(Ω)
≤ γ‖πγ(‖L1(Ω) + ‖c‖L1(Ω)
+ m2 ‖c‖L2δ (Ω)‖πγ‖L2(Ω)
( )+ 1 ‖c‖ γL2(Ω)‖µ1 ⊗ µ2‖L2m (Ω) + m‖)µ1 ⊗ µ2‖L2(Ω)‖πγ‖L2(Ω)
≤ C ‖µ1‖ ‖
2
L2(Ω ) µ2‖L2(Ω ) + ‖c‖L2(Ω) + 11 2
with a constant C > 0 depending on γ, m, and δ. To deduce a bound for α1 and
α2 individually from this, we emp∫loy (4.6) to obtain
‖α1 ⊕ α2‖L1(Ω) = sup φ(α1 ⊕ α2) dλ
φ∈L∞(Ω), Ω
‖φ‖∞≤1 ∫
≥ sup (φ1 ⊗ 1)(α1 ⊕ α2) dλ = |Ω2| ‖α1‖L1(Ω ).1
φ ∞1∈L (Ω1), Ω
‖φ1‖∞≤1
Finally, the L1-norm of α2 is estim(ated by )
‖α2‖ −1L1(Ω ) ≤ |Ω1| ‖α1 ⊕ α2‖L1(Ω) + |Ω2|‖α2 1‖L1(Ω ) ,1
which, together with the previous estimates, yields the claim. 
Lemma 4.5. There is a constant C(= C(γ,m, δ, c) > 0 such that )
‖ 6α1‖L2(Ω ) + ‖α2‖L2(Ω ) ≤ C ‖µ1 2 1‖L2(Ω )‖µ1 2‖L2(Ω + ‖c‖ 22) L (Ω) + 1
holds for every zero-mean dual solution.
Proof. We proceed in two steps: in a first step, we show the boundedness of the
positive parts of α1 and α2 (i); in a second step, we derive bounds for their negative
parts (ii).
Ad (i): Let us denote the bound from Lemma 4.4 by M > 0, i.e., in particular
‖α2‖L1(Ω ) ≤M , and define{, up to sets of zero Lebesg}ue measure, the subset2
∈ | | ≤ 2MΩ̃2 := x2 Ω2 : α2(x2) ⊂ Ω .| 2Ω2|
Then it follows b∫y construction tha∫t
M ≥ |α2(x2)|dλ2 ≥ |α2(x2)|dλ2 ≥
2M |Ω \ Ω̃ |,
Ω Ω \Ω̃ |Ω2|
2 2
2 2 2
which in turn implies
|Ω2| |Ω2|
(4.7) ≥ |Ω2 \ Ω̃2| = |Ω2| − |Ω̃2| =⇒ |Ω̃2| ≥ > 0.
2 2
Furthermore, we define, up to sets of zero Lebesgue measure, the sets
Ω+1 := {x1 ∈ Ω1 : α1(x1) ≥ 0}
as well as
Ω̃+ := {(x +1, x2) ∈ Ω1 × Ω̃2 : α1(x1) + α2(x2) ≥ 0}.
Then, by construction, it holds
≤ ≤ − ≤ 2M0 α1(x1) α2(x2) λ-a.e.-in (Ω+ × Ω̃ ) \ Ω̃+.|Ω2| 1
2
BILEVEL KANTOROVICH PROBLEM, PART I 13
Together with (4.7), this yields ∫ ∫
‖ ‖2 1(α1) 2+ L2(Ω ) = |α | dλ dλ1 |Ω̃2|(Ω̃∫ 1 12 Ω+1 ∫ 2 )
2
(4.8) ≤ 2∫ |α1| dλ+ |α1|
2 dλ
|Ω2| Ω̃+ (Ω+1 ×Ω̃2)\Ω̃+
≤ 2 | |2 8 |Ω1|α1 dλ+ M2.|Ω2| 2Ω̃+ |Ω2|
In order to estimate the first term∫ on the right hand side, we first observe that
‖(α 2 21 ⊕ α2)+‖L2(Ω) ≥ ∫ (α1 ⊕ α2) dλΩ̃+ ∫
≥ ∫ |α1|2dλ− 2 |α1| |α2|dλ(4.9) Ω̃+ Ω̃+
≥ |α 21| dλ− 2 ‖α1‖L1(Ω ‖α ‖ 11) 2 L (Ω2)
Ω̃+
≥ ‖α1‖2 2L2(Ω̃+ − 2M .)
The left hand side in the above equation is estimated by means of (2.1a) and
Lemma 4.3 as follows: (∫ ∫ )
‖(α1 ⊕ α2) ‖2 2 2+ L2(Ω) ≤ 2( (α1 ⊕ α2 − c)+dλ+Ω ) c dλΩ(4.10)
= 2 (γ2‖π ‖2 2γ L2(Ω) + ‖c‖L2(Ω) )
≤ C ‖µ ‖2 ‖µ ‖2 21 L2(Ω1) 2 L2(Ω + ‖c‖ 22) L (Ω) .
Inserting (4.9) and (4.10) into (4.8) and employing the definition of M as the bound
from Lemma 4.4 then yields ( )2
(4.11) ‖(α1)+‖L2(Ω ) ≤ C ‖µ1‖L2(Ω )‖µ2‖L2(Ω ) + ‖c‖L2(Ω) + 1 ,1 1 2
with a positive constant C depending on γ, m, and δ. Interchaninging the roles of
α1 and α2 implies the same bound for (α2)+.
Ad (ii): To show the bound for the negative part, we argue similarly to the
second part of the proof of [18, Theorem 2.11]. Given r ∈ R, we consider the set
(defined up to sets of zero Lebesgue measure)
Ω̂r2 := {x2 ∈ Ω2 : (α2)+(x2) > r + c} ⊂ Ω2.
Recall that c > −∞ is defined to be a lower bound of the cost function c, see (4.4).
For any r > −c, the Lebesg∫ue measure of this subset can be estimated by
| r| ≤ 1 ≤ 1 ‖ ‖ ≤ MΩ̂2 (α2)+dλ2 α2 L1(Ω ) ,r + c 2Ω̂r r + c r + c2
where M a∫gain denotes the bound fro∫m Lemma 4.4. For all r > −c, we thus obtain
(−r + α2 − c)+ dλ2 ≤ ∫ (−(r + c) + (α ) ) dλΩ2 Ω2 ( 2 + +) 2
= − (r + c) + (α2)+ dλ2
Ω̂r2
14 SEBASTIAN HILLBRECHT AND CHRISTIAN ME(YER ) 1
≤ | 1 M 2Ω̂r| 22 ‖(α2)+‖L2(Ω ) ≤ K,2 r + c
where K is short for the bound for ‖(α2)+‖L2(Ω ) from step (i), i.e., the right-hand2
side in (4.11). Therefore, if we set
M K2
(4.12) R := − c + 1 > −c,
γ2 δ2
then ∫
(−R+ α2 − c)+ dλ2 < γ δ.
Ω2
Now∫ assume that α1 ≤ −R λ1-a.e. on a set E ⊂ Ω1 of positive Lebesgue measure.Then ∫
(α1 ⊕ α2 − c)+ dλ2 ≤ (−R+ α2 − c)+ dλ2 < γ δ ≤ γ µ1 λ1-a.e. in E,
Ω2 Ω2
which contradicts (2.1b). Hence the definition of R in (4.12) along with the defini-
tion of M and K being the bounds from Lemma 4.4 and (4.11) yields the existence
of a constant C > 0 su(ch that )
(α1)− ≤ R ≤ C ‖µ1‖ ‖ ‖ ‖ ‖
6
L2(Ω ) µ2 L2(Ω ) + c L2(Ω) + 1 λ1 2 1-a.e. in Ω1,
i.e., we actually even obtain an L∞-bound for the negative part. Note that, since R
depends on γ, δ, and c, the same holds for the above constant. Again, the estimate
for (α2)− follows by means of reversed roles. Combining the results for the positive
and negative part finally proves the lemma. 
With the above boundedness results we are now in a position to prove the
(strong) continuity of Sγ . For this purpose, let us define the following sets:
(4.13) H(Ω) := L2(Ω)× L2(Ω1)× L2(Ω2),
(4.14) Cc(Ω) := {{c ∈ L2(Ω): c ≥ c λ-a.e. in Ω},
(4.15) Mmδ (Ω) := (µ1, µ2) ∈ L2(Ω1)× L2(Ω2) : ‖µ1‖L1(Ω ) = ‖µ1 2‖L1(Ω ) = m,2 }
µi ≥ δ λi-a.e. in Ωi, i = 1, 2 .
Proposition 4.6. Let γ, δ,m > 0 and c > −∞ be given. Then the solution operator
of the regularized Kantorovich problem (KPγ) is Hölder continuous with exponent
1/2 on bounded sets in the following sense: For all M > 0, there exists a constant
L > 0, depending on M , γ, δ, m, and c, such that
1
‖S /γ(cµ, µ1, µ2)− Sγ(cν , ν1, ν2)‖L2(Ω) ≤ L ‖(cµ, µ1, µ )− (c , ν , ν )‖ 22 ν 1 2 H(Ω)
for all (c mµ, µ1, µ2), (cν , ν1, ν2) ∈ Cc(Ω)×Mδ (Ω) with
‖(cµ, µ1, µ2)‖H(Ω), ‖(cν , ν1, ν2)‖H(Ω) ≤M.
Proof. The result is more or less a straight forward consequence of the previous
boundedness results. Let (cµ, µ1, µ2), (cν , ν1, ν2) ∈ (Cc(Ω)×Mmδ (Ω))∩BH(Ω)(0,M)
be given. We set πµ := Sγ(cµ, µ1, µ2) and πν := Sγ(cν , ν1, ν2) and denote the
associated zero-mean dual solutions by (αµ µ ν ν 2 2
∫ 1
, α2 ), (α1 , α2) ∈ L (Ω1)× L (Ω2). The
equality constraints in (KPγ) imply ∫
(4.16) (πµ − πν) dλ2 = µ1 − ν1, (πµ − πν) dλ1 = µ2 − ν2.
Ω2 Ω1
BILEVEL KANTOROVICH PROBLEM, PART I 15
Testing the first and second equation in (4.16) with αµ ν µ ν1−α1 and α2−α2 , respectively,
integratin∫g and then a(dding the resulting equations, we arr1 )ive at
(πµ −(∫πν) (α
µ
1 ⊕ α
µ
2 − cµ)−
1
(αν∫⊕ αν1 2 − cν) dλΩ γ γ
1
= ∫(µ1 − ν1)(α
µ
1 − αν
µ ν
1)dλ1 +) (µ2 − ν2)(α2 − α2)dλ2γ Ω1 Ω2
− (πµ − πν)(cµ − cν)dλ .
Ω
Using (2.1a), the inequality (a − b )2+ + ≤ (a+ − b+)(a − b) for all a, b ∈ R, and
Young’s inequality, we a(r∑rive at2 )
‖πµ − π 2ν‖L2(Ω) ≤ C ‖α
µ ν 2
i − αi ‖L2(Ω )‖µi − νi‖L2i (Ωi) + ‖cµ − cν‖L2(Ω) .
i=1
By Lemma 4.5, there holds
‖αµ ν µ ν 6i − αi ‖L2(Ω ) ≤ ‖αi ‖ 2i L (Ω ) + ‖αi ‖L2(Ω ) ≤ C(M + 1)i i
with a constant C depending on γ, δ, m, and c. Inserting this in the above inequality
then gives the result. 
Some words concerning the above results are in order. First, due to their non-
uniqueness, it is clear that one cannot expect an analogous result for the dual
variables, even if we restrict to the zero-mean dual solutions. Moreover, an inspec-
tion of the foregoing analysis reveals that the constant L from Proposition 4.6 tends
to infinity, if γ and/or δ converge to zero or if c approaches −∞. This is somewhat
to be expected, since one looses regularity as γ tends to zero and there is in general
no existence of an optimal transport plan, if there is no lower bound for the cost.
4.2. Existence of Optimal Solutions. Based on the Hölder continuity result,
we are now in the position to establish the existence of solutions to (BKδγ). For
this purpose, we return to the notation from the beginning of this section, i.e. in
particular, given c ∈ C(Ω) and µ1 ∈ M(Ω1) with µ1 ≥ 0 and ‖µ1‖M(Ω1) = 1, we
set πγ := E∗δ Sγ(Eδ c, T δ1 (µ1), T δ2 (µd2)) ∈M(Ω).
Theorem 4.7. Let γ > 0 and δ > 0 be fixed. There exists at least one globally
optimal solution to the regularized bilevel Kantorovich problem (BKδγ).
Proof. Based on Proposition 4.6, we can apply the direct method of the calculus of
variations. First, we note that, for all µ1 ∈M(Ω1), µ2 ∈M(Ω2) with ‖µ1‖M(Ω =1)
‖µ2‖M(Ω ) and µi ≥ 0, i = 1, 2, there holds ϕδi ∗ µi ≥ 0 λi-a.e. in Ωδi and thus2
‖T δ1 (µ1)‖L1(Ωδ) = ‖ϕδ1‖L1(Rd11 )‖µ1‖M(Ω1) + δ(4.17)
= ‖ϕδ2‖L1(Rd2 )‖µ2‖M(Ω ) + δ = ‖T δ2 (µ2)‖L1(Ωδ) =: m.2 2
Hence, (T δ1 (µ1), T δ2 (µd2)) ∈Mm δ δ d0 (Ωδ) so that Sγ(Eδ c, T1 (µ1), T2 (µ2)) is well defined
for every c ∈ C(Ω). Consequently, the feasible set of (BKδγ) is nonempty and thus
there is an infimal sequence {(πnγ , µn1 , cn)}n∈N ⊂M(Ω)×M(Ω )×W 1,p1 (Ω), i.e.,
(4.18) Jγ(πnγ , µn1 , cn)→ inf (BKδγ) ∈ R ∪ {−∞} as n→∞,
16 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
where inf (BKδγ) denotes the minimal value of (BK
δ
γ). By the constraint in (BK
δ
γ),
the sequence {µn1}n∈N is bounded in M(Ω1). Moreover, due to the constraints in
(KPγ), ∫
‖πn δ nγ ‖M(Ω) ≤ ‖Sγ(Eδ cn, T1 (µ1 ), T δ2 (µd2))‖ δ dL1(Ω = ϕ ∗ µ dλδ) 2 2 2 + δ = 1 + δ.
Ωδ2
Therefore, {µn1} and {πnγ } are both bounded in M(Ω1) and M(Ω), respectively, and
thus, the presupposed weak-∗ lower semicontinuity of J implies that
inf J (πn, µnγ 1 ) =: j > −∞.
n∈N
Consequently, for n ∈ N large enough,( )
‖cn − c pd‖W 1,p(Ω) ≤ pγ max{inf (BK
δ
γ) + 1, 0} − j
such that {c } is bounded in W 1,pn (Ω). Therefore, there is a subsequence, denoted
for simplicity by the same symbol, such that
(4.19) (πnγ , µ
n
1 ) ⇀
∗ (π̄, µ̄1) in M(Ω)×M(Ω1), cn ⇀ c̄ in W 1,p(Ω).
The set {µ1 ∈M(Ω1) : µ1 ≥ 0} is∫easily seen∫to be weakly-∗ closed. Hence µ̄1 ≥ 0and thus
1 = ‖µn‖ n1 M(Ω ) = dµ1 → dµ̄1 = ‖µ̄1 1‖M(Ω1).
Ω1 Ω1
Therefore, the weak limit µ̄1 satisfies the constraints in (BK
δ
γ). Moreover, since
p > d, the embedding W 1,p(Ω) ↪→ C(Ω) is compact, and thus cn → c̄ uniformly in
Ω. Since Ω is compact, there is a constant c > −∞ such that infx∈Ω cn(c) ≥ c for
all n ∈ N as well as infx∈Ω c̄(x) ≥ c. Hence, there holds Eδ c̄, Eδcn ∈ Cc(Ωδ) for all
n ∈ N and the uniform convergence implies
Eδcn → Eδ c̄ in L2(Ωδ).
Furthermore, the complete continuity of the convolution yields
T δ n1 (µ1 )→ T δ1 (µ̄1) in L2(Ωδ1).
Since µn1 , µ̄1 ≥ 0 and µd2 ≥ 0, we have ϕδ1 ∗ µn1 , ϕδ1 ∗ µ̄1 ≥ 0 λ1-a.e. in Ωδ1 and
ϕδ ∗ µd2 2 ≥ 0 λ2-a.e. in Ωδ2 and therefore thanks to the constant shift in (4.1)
(T δ1 (µn1 ), T δ2 (µd m δ2)) ∈Mδ′ (Ωδ) ∀n ∈ N, (T1 (µ̄ ), T δ(µd)) ∈Mm1 2 2 δ′ (Ωδ),
where δ′ := max{|Ωγ |, |Ωγ1 2 |}−1 δ. Therefore, we are allowed to apply the Hölder
continuity result from Proposition 4.6, which gives
Sγ(Eδ c , T δn 1 (µn1 ), T δ2 (µd2))→ S (E c̄, T δγ δ 1 (µ̄1), T δ2 (µd2)) in L2(Ωδ).
The continuity of the restriction operator E∗δ and the first convergence in (4.19)
then imply
π̄ = E∗δ Sγ(Eδ c̄, T δ1 (µ̄1), T δ2 (µd2))
such that the weak limit (π̄, µ̄1, c̄) is feasible for (BK
δ
γ). The mapping W
1,p(Ω) 3
c 7→ ‖c − c ‖pd W 1,p(Ω) is continuous and convex and therefore weakly lower semi-
continuous. Additionally, J was assumed to be weakly-∗ lower semicontinuous.
Altogether, the convergence in (4.19) yields
Jγ(π̄, µ̄1, c̄) ≤ lim inf Jγ(πn, µn1 , cn) = inf (BKδ→∞ γ),n
ensuring the optimality of the weak limit. 
BILEVEL KANTOROVICH PROBLEM, PART I 17
References
[1] Robert A. Adams and John J. F. Fournier. Sobolev spaces, volume 140 of Pure and Applied
Mathematics (Amsterdam). Elsevier/Academic Press, Amsterdam, second edition, 2003.
[2] Luigi Ambrosio and Nicola Gigli. A user’s guide to optimal transport. In Modelling and
optimisation of flows on networks, pages 1–155. Springer, 2013.
[3] V. Barbu. Analysis and Control of nonlinear infinite dimensional systems. Academic Press,
New York, 1993.
[4] Constantin Christof, Christian Clason, Christian Meyer, and Stephan Walther. Optimal con-
trol of a non-smooth semilinear elliptic equation. Math. Control Relat. Fields, 8(1):247–276,
2018.
[5] Constantin Christof, Juan Carlos De los Reyes, and Christian Meyer. A nonsmooth trust-
region method for locally Lipschitz functions with application to optimization problems con-
strained by variational inequalities. SIAM J. Optim., 30(3):2163–2196, 2020.
[6] Christian Clason, Dirk A. Lorenz, Hinrich Mahler, and Benedikt Wirth. Entropic regular-
ization of continuous optimal transport problems. J. Math. Anal. Appl., 494(1):Paper No.
124432, 22, 2021.
[7] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances
in neural information processing systems, pages 2292–2300, 2013.
[8] Pierre Grisvard. Elliptic Problems in Nonsmooth Domains. Classics in Applied Mathematics.
SIAM, Philadelphia, 1985.
[9] Lukas Hertlein and Michael Ulbrich. An inexact bundle algorithm for nonconvex nonsmooth
minimization in Hilbert space. SIAM J. Control Optim., 57(5):3137–3165, 2019.
[10] Roland Herzog, Christian Meyer, and Gerd Wachsmuth. C-stationarity for optimal control of
static plasticity with linear kinematic hardening. SIAM Journal on Control and Optimization,
50(5):3052–3082, 2012.
[11] Sebastian Hillbrecht, Paul Manns, and Christian Meyer. Bilevel optimization of the Kan-
torovich problem and its quadratic regularization, part II: Convergence results in infinite
dimensions. In preparation, 2022.
[12] Sebastian Hillbrecht and Christian Meyer. Bilevel optimization of the Kantorovich problem
and its quadratic regularization, part III: Reverse approximation in finite dimensions. In
preparation, 2022.
[13] M. Hintermüller and I. Kopacka. Mathematical programs with complementarity constraints
in function space: C-and strong stationarity and a path-following algorithm. SIAM Journal
on Optimization, 20(2):868–902, 2009.
[14] Tim Hoheisel, Christian Kanzow, and Alexandra Schwartz. Theoretical and numerical com-
parison of relaxation methods for mathematical programs with complementarity constraints.
Math. Program., 137(1-2, Ser. A):257–288, 2013.
[15] Leonid V. Kantorovič. On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS
(N.S.), 37:199–201, 1942.
[16] Christian Kanzow and Alexandra Schwartz. The price of inexactness: convergence properties
of relaxation methods for mathematical programs with complementarity constraints revisited.
Math. Oper. Res., 40(2):253–275, 2015.
[17] Dirk Lorenz and Hinrich Mahler. Orlicz space regularization of continuous optimal transport
problems. arXiv:2004.11574, 2021.
[18] Dirk A. Lorenz, Paul Manns, and Christian Meyer. Quadratically regularized optimal trans-
port. Appl. Math. Optim., 83(3):1919–1949, 2021.
[19] Zhi-Quan Luo, Jong-Shi Pang, and Daniel Ralph. Mathematical programs with equilibrium
constraints. Cambridge University Press, Cambridge, 1996.
[20] Christian Meyer and Stephan Walther. Optimal control of perfect plasticity Part II: Displace-
ment tracking. SIAM J. Control Optim., 59(4):2498–2523, 2021.
[21] F. Mignot. Contrôle dans les inéquations variationelles elliptiques. Journal of Functional
Analysis, 22(2):130–185, 1976.
[22] Filippo Santambrogio. Optimal transport for applied mathematicians, volume 87 of Progress
in Nonlinear Differential Equations and their Applications. Birkhäuser/Springer, Cham,
2015. Calculus of variations, PDEs, and modeling.
[23] Holger Scheel and Stefan Scholtes. Mathematical programs with complementarity constraints:
stationarity, optimality, and sensitivity. Math. Oper. Res., 25(1):1–22, 2000.
18 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER
[24] A. Schiela and D. Wachsmuth. Convergence analysis of smoothing methods for optimal control
of stationary variational inequalities with control constraints. ESAIM: M2AN, 47:771–787,
2013.
[25] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathe-
matics. American Mathematical Society, Providence, RI, 2003.
[26] Cédric Villani. Optimal transport. Old and new, volume 338 of Grundlehren der Mathematis-
chen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag,
Berlin, 2009.
[27] Gerd Wachsmuth. Towards M-stationarity for optimal control of the obstacle problem with
control constraints. SIAM Journal on Control and Optimization, 54(2):964–986, 2016.
[28] Gerd Wachsmuth. A guided tour of polyhedric sets. Journal of Convex Analysis, 26(1):153–
188, 2019.
Sebastian Hillbrecht, Technische Universität Dortmund, Fakultät für Mathematik,
Lehrstuhl X, Vogelpothsweg 87, 44227 Dortmund, Germany
Email address: sebastian.hillbrecht@tu-dortmund.de
Christian Meyer, Technische Universität Dortmund, Fakultät für Mathematik, Lehrstuhl
X, Vogelpothsweg 87, 44227 Dortmund, Germany
Email address: christian2.meyer@tu-dortmund.de