No. 655 September 2022 BILEVEL OPTIMIZATION OF THE KANTOROVICH PROBLEM AND ITS QUADRATIC REGULARIZATION PART I: EXISTENCE RESULTS S. Hillbrecht, C. Meyer ISSN: 2190-1767 BILEVEL OPTIMIZATION OF THE KANTOROVICH PROBLEM AND ITS QUADRATIC REGULARIZATION PART I: EXISTENCE RESULTS SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER Abstract. This paper is concerned with an optimization problem governed by the Kantorovich optimal transportation problem. This gives rise to a bilevel optimization problem, which can be reformulated as a mathematical problem with complementarity constraints in the space of regular Borel measures. Be- cause of the non-smoothness induced by the complementarity relations, prob- lems of this type are frequently regularized. Here we apply a quadratic regular- ization of the Kantorovich problem. As the title indicates, this is the first part in a series of three papers. It addresses the existence of optimal solutions to the bilevel Kantorovich problem and its quadratic regularization, whereas part II and III are dedicated to the convergence analysis for vanishing regularization. 1. Introduction This paper is concerned with a bilevel optimization problem with the Kan- torovich problem of optimal transport as the lower-level problem. The problem under consideration takes the following form: inf J (π, µ1)π,µ1 (BK)  s.t. µ1 ∈M(Ω1){, ∫µ1 ≥ 0, ‖µ1‖M(Ω ) = ‖µ d  1 2 ‖M}(Ω ,2) π ∈ arg min cd dϕ : ϕ ∈ Π(µ1, µd2), ϕ ≥ 0 . Ω Herein, cd is a given cost function measuring the transportation cost and µ d 2 a given marginal on a domain Ω2. The set Π(µ d 1, µ2) denotes the set of feasible transport plans, i.e., regular Borel measures that have µ d1 and µ2 as first and second marginal, see (1.1) below. The lower level problem thus aims at minimizing the transportation cost among all feasible transport plans associated with µ1 and µ d 2 . It is Kantorovich’s well-known generalization of the famous Monge problem, cf. [15]. We refer to [25, 26, 2, 22] for more details on the Kantorovich problem and its application background. The bilevel optimization problem now consists of varying the first marginal µ1 such that this marginal, together with an associated optimal transport plan, minimizes a given objective J . The additional constraints on µ1 in (BK) ensure that there is at least one optimal transport plan associated with µ1 Date: June 27, 2022. 2010 Mathematics Subject Classification. 49Q22, 90C08, 49J45. Key words and phrases. Optimal transport, Kantorovich problem, bilevel optimization, qua- dratic regularization. This research was supported by the German Research Foundation (DFG) under grant num- ber LO 1436/9-1 within the priority program Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization (SPP 1962). 1 2 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER so that the feasible set of (BK) is non-empty. One possible application for such a bilevel problem could be, for instance, the identification of the first marginal based on measurements πd of the transport plan on a part D of the domain Ω1 × Ω2. In this case, the upper level objective would be of the form J (π, µ1) = d(π, πd) (plus potential regularization terms accounting for errors in the measurement), where d denotes a suitable distance such as |π − πd|(D) or ‖π − π(d)‖W−1,p(D) for some p > dim(D). From a bilevel optimization point of view, the Kantorovich problem is challeng- ing. First, for a given cost cd and marginals µ1 and µ d 2 , the optimal transport plan needs not to be unique (unless the cost function is strictly convex and at least one of the marginals is absolutely continuous w.r.t. the Lebesgue measure, see [22, Theorem 1.17]). Thus, in general, there is no single-valued solution mapping µ1 7→ π associated with the lower level problem in (BK). This prevents us from using the so-called implicit programming approach, where the lower level problem is replaced by its solution operator and the (potentially limited) differentiability properties of the latter are used to derive optimality conditions and optimization algorithms for the bilevel problem, cf. e.g. [21, 4]. Alternatively, one could replace the convex lower-level problem by its necessary and sufficient first-order optimality conditions. These, however, contain a complementarity system in the space of reg- ular Borel measures, which turns the bilevel problem into a mathematical program with complementarity constraints (MPCC) in M(Ω1 × Ω2). A common strategy to treat MPCCs is to regularize the complementarity con- straints and the lower level problem. We only refer to [19, 23, 14, 16] in the finite dimensional setting and to [3, 13, 10, 24, 27] for problems in function spaces. These approaches are of theoretical as well as numerical interest. While a limit analysis for vanishing regularization parameters yields stationarity conditions for the original problem, the regularized problems can often be treated with standard algorithms that, together with a path-following procedure for the regularization parameter, can provide an efficient method for solving an MPCC. Here, we follow a similar ap- proach and employ a quadratic regularization of the Kantorovich problem, which was proposed and analyzed in [18]. This regularization has several advantages. First, the regularized Kantorovich problem is strictly convex and thus uniquely solvable, which is in particularly attractive from the viewpoint of bilevel optimiza- tion as it allows the implicit programming approach to be applied. Moreover, the regularity of the optimal transport plans is improved in a way that we are faced with an MPCC in L2(Ω1 × Ω2) instead of an MPCC in the space of regular Borel measures. Finally, as shown in [18], the quadratic regularization preserves essential features of the original Kantorovich problem such as the sparsity of the optimal transport plan as well as a dual problem that provides a substantial reduction of the dimension. As a price for these desirable properties, the regularized problems still contain a complementarity relation and are therefore not smooth, in contrast to common MPCC regularization approaches. However, due to their particular structure involving a complementarity system in L2(Ω1×Ω2), we expect that non- smooth optimization algorithms for MPCCs in function space are applicable, see e.g. [9, 5]. This work is the first part in a series of three papers. While the other two contri- butions address the question of convergence of solutions of the regularized bilevel problems for vanishing regularization parameter, this paper is concerned with the BILEVEL KANTOROVICH PROBLEM, PART I 3 existence of solutions. For the bilevel Kantorovich problem (BK) itself, existence of globally optimal solutions is rather straightforward to show, based on known (weak) stability results for the Kantorivich problem. The situation changes, if one turns to its regularized counterpart. As we will see by means of a counterexample, the solution operator associated with the regularized Kantorovich problem is not weakly continuous. Nevertheless, one can show its strong continuity, which allows us to prove the existence of solution for the regularized bilevel problems. The paper is organized as follows: After introducing some basic notation and assumptions in the rest of this introduction, we collect some known results on the Kantorovich problem and its quadratic regularization in Section 2. We then turn to the existence of globally optimal solutions for (BK) in Section 3. The main part of the paper is contained in Section 4, where we first verify that the regularized solution operator is locally Hölder continuous and, based on that, show the existence of optimal solutions for the regularized bilevel problems. 1.1. Notation and Standing Assumptions. Domains. For d1, d2 ∈ N, let Ω1 ⊂ Rd1 and Ω d22 ⊂ R be compact and connected sets with non-empty interior. We moreover suppose that their Cartesian product Ω := Ω1×Ω2 coincides with the closure of its interior and has a Lipschitz boundary in the sense of [8, Def. 1.2.2.1]. By B(Ω), we denote the respective Borel σ-algebra on Ω and by λ the Lebesgue measure on B(Ω). For Ω1 and Ω2, B(Ωi) and λi, i = 1, 2, are defined analogously so that λ = λ1 ⊗ λ2. Furthermore, we abbreviate |Ω1| := λ1(Ω1), |Ω2| := λ2(Ω2), and |Ω| := λ(Ω). Marginals. Let (X,B(X)) be a measurable space. Then, we denote by M(X) the space of (signed) regular Borel measures on X equipped with the total variation norm, i.e., ‖µ‖M(X) := |µ|(X). If µ1 ∈ M(Ω1) and µ2 ∈ M(Ω2), then the set of transport plans between the marginals µ1 and µ2 is given by (1.1) Π(µ1, µ2) := {π ∈M(Ω): P1#π = µ1 and P2#π = µ2}, where, for i = 1, 2, Pi#π := π ◦ P−1i : B(Ωi)→ R, is the pushforward measure of π via the projection Pi : Ω 3 (x1, x2) 7→ xi ∈ Ωi. Note that Π(µ1, µ2) = ∅, if µ d1(Ω1) 6= µ2(Ω2). Throughout the paper, µ2 ∈M(Ω2) is a fixed marginal satisfying µd2 ≥ 0 and, in order to ease notation, ‖µd2‖M(Ω ) = 1.2 The normalization condition is no restriction and can be ensured by re-scaling. Given a measure space (X,A, µ), the Lebesgue space of p-th power absolutely integrable functions is denoted by Lp(X,µ), p ∈ [1,∞). If X ⊂ Rn, n ∈ N, is a Lebesgue measurable set and µ is the Lebesgue measure, we simply write Lp(X). Cost Function. The cost function is assumed to satisfy c ∈W 1,pd (Ω), p > d1 + d2, where, with a slight abuse of notation, W 1,p(Ω) denotes the Sobolev space on int(Ω). Note that, due to the regularity of ∂Ω, W 1,p(Ω) is compactly embedded in C(Ω), cf. e.g. [1, Theorem 6.3]. Thus, there exists a continuous representative of cd, which we denote by the same symbol. 4 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER Bilevel Objective. The functional J : M(Ω)×M(Ω1)→ R is supposed to be lower semicontinuous w.r.t. weak-∗ convergence. Let us give an application-driven exam- ple for such a functional. Suppose that subsets D ∈ B(Ω) and D1 ∈ B(Ω1) and measurements π ∈M(D) and µdd 1 ∈M(D1) are given. Then we set J (π, µ d1) := |π − πd|(D) + ν |µ1 − µ1 |(D1), where ν ≥ 0 is a weighting parameter. The goal of the bilevel optimization is then to adjust µ1 and π such that the deviation between an optimal transport process and (possibly inaccurate) measurements thereof on subdomains becomes minimal. 2. Preliminaries Given marginals µ1 ∈ M(Ω1), µ2 ∈ M(Ω2) and a measurable cost function c : Ω→ [−∞,∞], the Kantorovich problem∫ of optimal transport reads  inf K(π) := c(x) dπ(x)(KP) Ω s.t. π ∈ Π(µ1, µ2), π ≥ 0. Lemma 2.1 ([26, Theorem 4.1]). If µ1, µ2 ≥ 0 and ‖µ1‖M(Ω1) = ‖µ2‖M(Ω ) and2 if c is lower semicontinuous and bounded from below, then there exists an optimal solution of (KP). Despite this existence result and its simple structure, the Kantorovich problem provides some challenging aspects, especially from a numerical perspective. First of all, its solution may be non-unique (although there are conditions which guarantee uniqueness, see e.g. [22, Theorem 1.17]). More importantly, since π “lives” on the Cartesian product of Ω1 and Ω2, the dimension of a discretized counterpart of (KP) easily becomes so large that a numerical solution by means of standard LP-solvers is no longer possible. Therefore, several penalization and relaxation methods have been proposed, that in combination with dualization, allow a significant reduction in the size of the problem. The most popular method is probably the entropic regularization in combination with the well-known Sinkhorn algorithm, see e.g. [7, 6]. In this paper, we rely on a different strategy, namely the quadratic regularization that has been introduced in [18] and is as follows: Given a regularization parameter γ > 0, two marginals µ ∈ L21 (Ω1), µ 22 ∈ L (Ω2), and a cost function c ∈ L2(Ω), we consider ∫  inf Kγ(πγ) := c(x)πγ(x) dλ(x) + γ 2 ‖π 2  γ ‖L2(Ω)  Ωs.t. π∫ 2γ ∈ L (Ω), πγ ≥ 0 λ-a.e. in Ω, (KPγ)  ∫ πγ(x1, x2) dλ2(x2) = µ1(x1) λ1-a.e in Ω1, Ω2 πγ(x1, x2) dλ1(x1) = µ2(x2) λ2-a.e in Ω2. Ω1 Lemma 2.2 ([18, Lemma 2.1, Theorem 2.11]). (i) Problem (KPγ) admits a unique solution if and only if µi ≥ 0 λi-a.e. in Ωi, i = 1, 2, and ‖µ1‖L1(Ω ) = ‖µ2‖L11 (Ω2). BILEVEL KANTOROVICH PROBLEM, PART I 5 (ii) If, in addition, there exist constants c > −∞ and δ > 0 such that c ≥ c λ-a.e. in Ω and µi ≥ δ λi-a.e. in Ωi, i = 1, 2, then π ∈ L2γ (Ω) is a solution of (KPγ) if and only if there exist functions α1 ∈ L2(Ω1) and α2 ∈ L2(Ω2) satisfying 1 (2.1a) ∫πγ − (α1 ⊕ α2 − c)+ = 0 λ-a.e. in Ω,γ (2.1b) ∫ πγ(x1, x2) dλ2(x2) = µ1(x1) λ1-a.e. in Ω1,Ω2 (2.1c) πγ(x1, x2) dλ1(x1) = µ2(x2) λ2-a.e. in Ω1. Ω1 Herein, (α1⊕α2)(x1, x2) := α1(x1) +α2(x2) λ-a.e. in Ω refers to the direct sum of α1 ∈ L2(Ω ) and α ∈ L21 2 (Ω2), while, for given u ∈ L2(Ω), (u)+(x) := max{u(x); 0} λ-a.e. in Ω denotes the pointwise maximum. It is clear that both the direct sum and the pointwise maximum map L2(Ω1)×L2(Ω2) and L2(Ω), respectively, to L2(Ω) so that (2.1a) is well defined. (iii) Under the above assumptions, the functions αi ∈ L2(Ωi), i = 1, 2, from (ii) solve thedual problem given by ∑2 ∫1 2 max Φγ(a1, a2) := − 2‖(a1 ⊕ a2 − c)+‖L2(Ω) + γ aiµi dλi(2.2)  i=1 Ωis.t. ai ∈ L2(Ωi), i = 1, 2, and there is no duality gap, i.e., Φγ(α1, α2) = Kγ(πγ). The above results directly address the aforementioned challenges. Besides the uniqueness of the solution, the approach allows us to escape the curse of dimen- sionality. If one inserts (2.1a) into (2.1b) and (2.1c), then a non-smooth system of equations for the dual variables α1 and α2 arises. We are then dealing with a prob- lem in L2(Ω 21) × L (Ω2) instead of L2(Ω1 × Ω2), which, after discretization, leads to a substantial reduction of the number of unknowns. Moreover, due to the max- operator in (2.1a), the sparsity pattern of the transport plans is better preserved compared to the entropic regularization. Finally, the structure of (2.1) allows for the application of a semi-smooth Newton method, see [18] for more details. The convergence of (sub-)sequences of solutions of (KPγ) to solutions of (KP) for γ ↘ 0 is addressed in [17]. To be more precise, it is shown that the objective of (KPγ) Γ-converges to the objective of (KP) w.r.t. weak-∗ convergence in M(Ω) as γ ↘ 0, provided that the original marginals in M(Ωi), i = 1, 2, are smoothed and the smoothing parameter is properly coupled with γ, see [17, Theorem 4.2]. As outlined in the introduction, the goal of this and the companion papers is to employ the quadratic regularization for a bilevel optimization problem with the Kantorovich problem (KP) as the constraint. The motivation for this approach is as follows: First, the uniqueness of solutions for (KPγ) allows us to define a solution operator Sγ(c, µ1, µ2) 7→ πγ , which, in turn, enables us to employ the so-called implicit programming approach. Secondly, we expect that Sγ (or a discretization thereof) provides sufficient smoothness to use non-smooth optimization algorithms for the solution of the regularized bilevel problem. Before we address the regularized bilevel problem, we turn to the optimization of the original problem (KP) and show existence of at least one optimal solution to the latter in the upcoming section. 6 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER 3. Existence of Optimal Solutions of the Bilevel Kantorovich Problem Let us first recall the bilevel optimization of the Kantorovich problem from theintroduction: inf J (π, µ1)π,µ1 (BK)  s.t. µ1 ∈M(Ω1){, ∫µ1 ≥ 0, ‖µ1‖M(Ω ) = 1, 1 }π ∈ arg min cd dϕ : ϕ ∈ Π(µ1, µd2), ϕ ≥ 0 , Ω where c ∈W 1,pd (Ω), for p > d1 +d2, and µd2 ∈M(Ω2) denote a fixed cost functional and a fixed marginal, respectively, and J is a given objective, see Section 1.1. To shorten notation, given c ∈ C(Ω) and µi ∈ M(Ωi), i = 1, 2, we abbreviate the set of associated optimal transport plan{s∫by } (3.1) S(c, µ1, µ2) := arg min cdϕ : ϕ ∈ Π(µ1, µ2), ϕ ≥ 0 . Ω The essential tool to establish the existence of solutions to (BK) is the following stability result for the Kantorovich problem. Its proof is based on the concept of c-cyclic monotonicity. For details, we refer to [26, Section 5]. Lemma 3.1 (Stability of the transport plan, [26, Theorem 5.20]). Let c ∈ C(Ω) be given and assume that {µk1}k∈N ⊂M(Ω1) and {µk2}k∈N ⊂M(Ω2) are sequences that satisfy µk, µk1 2 ≥ 0 and ‖µk1‖ kM(Ω ) = ‖µ2‖M(Ω ) for all k ∈ N and converge weakly-∗1 2 to µ1 ∈ M(Ω1) and µ2 ∈ M(Ω2). Let moreover {πk}k∈N be a sequence of optimal transport plans associated with (µk1 , µ k 2), i.e., πk ∈ S(c, µk1 , µk2). Then there is a subsequence that converges weakly-∗ to an optimal transport plan π ∈ S(c, µ1, µ2). The above lemma is just a special case of [26, Theorem 5.20], where the cost func- tion need not to be fixed. We underline that the notion of “weak convergence” in [26] (sometimes also called narrow convergence) coincides with weak-∗ convergence in our case, since Ω1, Ω2, and Ω are compact. Theorem 3.2. There exists at least one globally optimal solution to (BK). Proof. Based on Lemma 3.1, the result easily follows from the direct method of the calculus of variations. First, thanks to Lemma 2.1, the feasible set of (BK), denoted by F , is nonempty. Thus, there exists a minimizing sequence {(π kk, µ1)}k∈N so that lim J (π kk, µ1) = inf J (π, µ1) ∈ R ∪ {−∞}. k→∞ (π,µ1)∈F The feasibility of the minimizing sequence implies ‖π ‖ = π (Ω × Ω ) = (P π )(Ω ) = µk(Ω ) = ‖µkk M(Ω) k 1 2 1# k 1 1 1 1‖M(Ω ) = 1 ∀ k ∈ N1 so that there is a subsequence, denoted by the same symbol to ease notation, that converges weakly-∗ to a limit (π̄, µ̄ ) ∈M(Ω)×M(Ω ). Thus (µk, µd) ⇀∗1 1 1 2 (µ̄1, µd2) in M(Ω1)×M(Ω2) and consequently, Lemma 3.1 implies π̄ ∈ S(c dd, µ̄1, µ2). In view of the constraints of the Kantorovich problem, this also gives µ̄1 ≥ 0 and ‖µ̄1‖M(Ω1) = 1 and hence, (π̄, µ̄1) ∈ F , i.e., the weak limit is feasible. The optimality of the weak limit follows from the presupposed weak-∗ lower semicontinuity of J .  BILEVEL KANTOROVICH PROBLEM, PART I 7 Remark 3.3. Lemma 3.1 only requires the continuity of the cost function. For the mere existence result, one could therefore relax the regularity assumption on the cost function to cd ∈ C(Ω). The improved regularity of cd is however required for the analysis of the regularized bilevel problem, and for this reason, we impose it as standing assumption. Moreover, Lemma 3.1 also holds in Polish spaces and not only in compact sets. One can therefore generalize the existence result of Theorem 3.2 to a much broader class of domains Ω1 and Ω2. If one replaces (KP) by its necessary and sufficient optimality conditions, then (BK) can equivalently be rewritten as inf J (π, µ ) 1  s.t. µ1 ∈M(Ω1), µ1 ≥ 0, ‖µ1‖M(Ω1) = 1, π ∈M(Ω1 × Ω2), ϕ ∈ C(Ω1), ψ ∈ C(Ω2), (BK) ⇐⇒  P π = µ , P d  1# 1 2# π = µ2 ,  π∫≥ 0, ϕ(x1) + ψ(x2) ≤ cd(x1, x2) ∀ (x1, x2) ∈ Ω1 × Ω2, (ϕ⊕ ψ − cd) dπ = 0. Ω1×Ω2 We observe that the last two lines in the above reformulation of (BK) form a complementarity system in M(Ω1 × Ω2) and C(Ω1 × Ω2), so that (BK) becomes an MPCC in the space of regular Borel measures, as already mentioned in the introduction. Even though several results for MPCCs are known, in particular when the cone defining the complementarity constraints is polyhedric, which is the case here, see [28, Example 4.12], problems of this type are typically smoothed or regularized, and we will do just that in the next section. 4. Existence of Optimal Solutions of the Regularized Bilevel Problem The regularized Kantorovich problem (KPγ) clearly admits a solution only if the marginals are functions in L2(Ω1) and L 2(Ω2). Therefore, one needs to regularize the marginals, if the Kantorovich problem in (BK) is replaced by (KPγ). But even if the marginals were functions in L2(Ω1) and L 2(Ω2), one needs to smooth them considering the lack of (weak) continuity of the solution mapping associated with (KPγ), see Example 4.2 below. What is more, in order to guarantee the existence of the dual variables α1 and α2 from Lemma 2.2, the marginals need to be strictly positive, see [18, Assumption 1]. We therefore introduce the convolution & constant shifting operators δ (4.1) T δ δ 2 δi : M(Ωi) 3 µi 7→ ϕi ∗ µi + ∈ L (Ωi ), i = 1, 2,|Ωδi | which turn the marginals into smooth and strictly positive functions on Ωδ1 and Ωδ2. Herein, δ > 0 is a smoothing parameter, ϕ δ ∞ di i ∈ Cc (R ) denotes a standard mollifier with ‖ϕδ‖ = 1 and support in B(0, δ) ⊂ Rdii L1(Rdi ) and Ωδi := Ωi+B(0, δ), i = 1, 2. Moreover, we set Ωδ := Ω δ δ 1×Ω2. When mollifying µi, we of course extend it by zero, i.e., ∫ (ϕδi ∗ µi)(x) = ϕδi (x− y) dµi(y), x ∈ Ωδi . Ωi 8 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER As a consequence, ϕδ ∗ µ ⇀∗i i µi in M(Ωi) as δ ↘ 0, provided that the support of µi has positive distance to ∂Ωi, which will be useful for the convergence analysis in the companion paper [11], where the smoothing parameter δ will be polynomially coupled with the regularization parameter γ. Owing to Lemma 2.2 there exists a unique solution π ∈ L2γ (Ωδ) to (KPγ) for costs in L2(Ωδ) and marginals in L 2(Ωδi ), i = 1, 2, that satisfy the conditions in Lemma 2.2(i). We denote the associated solution operator by Sγ : L2(Ωδ)×{ M0 3 (c, µ1, µ 22)→7 πγ ∈ L (Ωδ), M0(Ωδ) := (µ1, µ2) ∈ L2(Ωδ1)× L2(Ωδ2) : ‖µ1‖L1(Ωδ) = ‖µ2‖L1(Ωδ1 2), } µi ≥ 0 λi-a.e. in Ωδi , i = 1, 2 . To ease notation, we suppress the dependency of πγ and Sγ on δ. Furthermore, we introduce the extension-by-zero operator E 2δ : C(Ω) → L (Ωδ), whose adjoint E∗ : L2δ (Ωδ) → M(Ω) is the associated restriction operator. Now, we have every- thing at hand to formulate the regularized bilevel problem: inf Jγ(πγ , µ 11, c) := J (πγ , µ1) + pγ ‖c− c pd‖π ,µ ,c W 1,p(Ω)γ 1 (BKδγ)  s.t. c ∈W 1,p(Ω(), µ1 ∈M(Ω1), µ)1 ≥ 0, ‖µ1‖M(Ω ) = 1,1 πγ = E∗δ Sγ E δ δ dδ c, T1 (µ1), T2 (µ2) . As we will see in the proof of Theorem 4.7, there holds (T δ1 (µ δ d1), T2 (µ2)) ∈M0(Ωδ) such that πγ is well defined, cf. Lemma 2.2(i). Compared to (BK), we not only replace the Kantorovich problem as the lower-level problem with its regularized counterpart, but also add the cost function c to the set of optimization variables. This is motivated by the so-called reverse approximation property, which is essential to show the convergence of minimizers of (BKδγ) towards solutions of the original unregularized bilevel problem, see the companion paper [12], where this property is elaborated for the finite dimensional counterparts of (BK) and (BKδγ). This property requires a set of optimization variables that is sufficiently rich, as is also required, e.g., in the optimization of perfect plasticity, see [20]. For this reason, c is treated as an additional optimization variable. After all, the penalty term in the upper-level objective Jγ will ensure that, in the limit, c equals the given cost function cd, see [12]. Remark 4.1. Instead of regularizing w.r.t. the Lebesgue measure, one could also apply a regularization w.r.t.∫the produc∫t measure of the marginals µ 1 ⊗ µ2, i.e., inf cdπ + γ 22 π d(µ1 ⊗ µ2) (K̃Pγ)  Ω Ω s.t. π ∈ L2(Ω1 × Ω2, µ1 ⊗ µ2) ∩Π(µ1, µ2), where, with a slight abuse of notation, we use the same symbol for the Borel measure π and its density w.r.t. the product measure. Note that the constraint π ∈ Π(µ1, µ2) does not imply that π is automatically absolutely continuous w.r.t. the product measure, as the counterexample Ωi = [0, 1], µi = λ, i = 1, 2, and π = (id, id)#λ shows. Hence, an additional regularization is also necessary in this case. Nevertheless, a regularization w.r.t. to the product measure has several advantages. For instance, the marginals need not to be smoothed and the positivity assumption on µ1 and µ2 in [18, Assumption 1] becomes superfluous. However, in BILEVEL KANTOROVICH PROBLEM, PART I 9 the bilevel context, this approach does not seem to be promising: µ1 is an upper- level variable and therefore the space for π is no longer fixed but depends on the optimization variable. The rest of this paper is dedicated to the existence of solutions to (BKδγ). In this context, the continuity properties of Sγ are of course essential and will be discussed in the following. 4.1. Hölder Continuity of the Regularized Solution Operator. The key tool in the existence proof for the unregularized bilevel problem in Theorem 3.2 has been the stability of the Kantorovich problem w.r.t. (weak-∗) perturbations of the marginals from Lemma 3.1. Unfortunately, such a weak continuity result does not hold in case of the regularized Kantorovich problem, as we will demonstrate below by means of a counterexample. Example 4.2. Let Ω1 = Ω2 = [0, 1], γ = 1, and c(x1, x2) := 1 4 |x 2 1−x2| . Moreover, define f : R→{R, f(x) := sgn(sin(2πx)) and, for n ∈ N, set{ f(nx ) + 9 , 0 ≤ x ≤ 1 1n 1 1 ,α (x ) := 4 2 αn 0, 0 ≤ x2 ≤ ,1 1 2 (x2) := 2 f(nx1) + 5 1 4 , 2 < x1 ≤ 1, − 1 2 , 1 2 < x2 ≤ 1. In view of (2.1a), we(further set1 ) π (x , x ) := αnn 1 2  1 (x1) + α n 2 (x2)− c(x1, x2) γ + f(nx ) + 91 4 − 14 |x1 − x2|2, 0 ≤ x 11, x2 ≤ 2 ,f(nx ) + 5 − 1 |x − x |2, 1 < x ≤ 1, 0 ≤ x ≤ 1 ,=  1 4 4 1 2 2 1 2( 2f(nx ) + 7 − 11 4 4 |x1 − x2|2, ) 0 ≤ x 1 11 ≤ 2 , 2 < x2 ≤ 1, f(nx1) + 3 − 14 4 |x1 − x2| 2 , 12 < x1, x2 ≤ 1,+ whose weak limit (w.r.t.weak convergence in L2(Ω)) is 9 − 1 |x 24 4 1 − x2| , 0 ≤ x , x 11 2 ≤ 2 , 5 1 4 − 4 |x1 − x | 2 2 , 1 < x 1  2 1 ≤ 1, 0 ≤ x2 ≤ , π(x1, x2) = 2 7 − 14 4 |x1 − x |22 , 0 ≤ x 1 11 ≤ 2 , 2 < x2 ≤ 1,7 − 18 8 |x1 − x2|2, 12 < x1, x2 ≤ 1. Because the system in (2.1) is necessary and sufficient for optimality, πn is the solution of (KPγ) wit∫h the cost function defined abov∫e and marginals given by1 1 µn1 (x1) := π (x , x ) dx , µ n n 1 2 2 2 (x2) := πn(x1, x2) dx1. 0 0 Note that, for all n ∈ N, µn 1i ≥ /16 λi-a.e., i = 1, 2, such that Lemma 2.2(ii) is indeed applicable. Clearly, the weak convergence of πn implies that µ n i converges weakly to µi ∈ L2(Ωi), i = 1, 2, and the pointwise bound carries over to the weak limit such that Lemma 2.2(ii) also holds for the limits µ1 and µ2. Accordingly, if the weak limit π were a solution to the regularized Kantorovich problem, then there would be dual sol(utions α1 ∈ L2(Ω1) and α 22 ∈) L (Ω2) so that π(x1, x2) = α1(x1) + α (x 2 2 2)− c(x1, x2) λ-a.e. in [0, 1] .+ 10 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER Because of π > 0, this is equivalent to 94 , 0 ≤ x1, x 12 ≤ 2 ,5 , 1 < x1 ≤ 1, 0 ≤ x2 ≤ 1 , (4.2) α 4 2 21(x1) + α2(x2) =  74 , 0 ≤ x ≤ 11 2 , 12 < x2 ≤ 1,7 8 + 1 8 |x 2 1 1 − x2| , 2 < x1, x2 ≤ 1, λ-a.e. in [0, 1(]2. T)his, howeve(r, lea)ds to a contradiction: picking arbitrary Lebesgue points x̂ ∈ 0, 12 2 and x̃2 ∈} 12 , 1 of α2,{(4.2) implies 9 − α (x̂ ), 0 ≤ x ≤ 1 74 2 2 1 2 4 − α2(x̃2), 0 ≤ x1 ≤ 1 = α(x ) = 2 5 1 4 − α2(x̂2), 1 2 < x1 ≤ 1 [ 78 +]18 |x1 − x̃ 22| − α2(x̃2), 12 < x1 ≤ 1 λ1-a.e. in [0, 1]. While the conditions in 0, 1 2 ( imp]ly that α2(x̂2) − α (x̃ ) = 12 2 2 , it must hold that α2(x̂2) − α2(x̃2) ≤ 3 < 18 2 on 1 2 , 1 , which yields the desired con- tradiction. Therefore, the weak limit π cannot be the solution of (KPγ) associated with the limits µ1 and µ2, giving in turn that Sγ is not weakly continuous. We compensate the lack of weak continuity by means of the convolution operators T δi , i = 1, 2, which turn weakly convergent into strongly convergent sequences. Still, we need to prove that Sγ is continuous w.r.t. the strong topology of L2(Ωδ). This is what we will show next. To this end, let γ > 0 and δ > 0 be arbitrary, but fixed throughout this section. To simplify the notation, we slightly chance the notation in comparison to the beginning of this section and the consecutive subsection. First, we suppress the sub- and superscripts γ and δ in the rest of this section, except for Sγ and πγ in order to underline the difference to the solution of the unregularized problem (KP) and its solution set in (3.1). Moreover, given (µ1, µ2) ∈M0(Ω), and c ∈ L2(Ω), we set π 2γ := Sγ(c, µ1, µ2) ∈ L (Ω) (i.e., without the restriction operator E∗δ ). We start with several auxiliary boundedness results. For this purpose, let us consider arbitrary but fixed data c ∈ L2(Ω), µ ∈ L2i (Ωi), i = 1, 2, with (4.3) ‖µ1‖L1(Ω ) = ‖µ1‖L1(Ω ) =: m.1 2 Moreover, we suppose that the assumptions of the second part of Lemma 2.2 are satisfied, i.e., there are constants c > −∞ and δ > 0 such that (4.4) c ≥ c λ-a.e. in Ω and µi ≥ δ λi-a.e. in Ωi, i = 1, 2. Note that the latter condition is automatically fulfilled in context of the bilevel problem (BKδγ) owing to the definitions of the operators T δi . Moreover, the penalty term in the upper level objective ensures that the cost c is bounded in W 1,p(Ω) ↪→ C(Ω), which ensures the other condition thanks to the compactness of Ω. Lemma 4.3. There exists a co(nstant C = C(γ,m) > 0 such tha)t ‖πγ‖L2(Ω) ≤ C ‖µ1‖L2(Ω ) ‖µ2‖L2(Ω ) + ‖c‖L2 .1 2 (Ω) Proof. According to Lemma 2.2, there exist α1 ∈ L2(Ω1) and α ∈ L22 (Ω2) such that the system in (2.1) is fulfilled. Multiplying (2.1b) and (2.1c) with α1 and α2, respectively, integrating and∫adding the res∫ulting equations∫leads to (4.5) γ‖π‖2L2(Ω) = µ1α1 dλ1 + µ2α2 dλ2 − πcdλ, Ω1 Ω2 Ω BILEVEL KANTOROVICH PROBLEM, PART I 11 where we also used (2.1a). Exploiting the normalization condition in (4.3), we obtain ∫ ∫ ∫ ∫ 1 1 µ1α1 dλ1 = µ1α1 µ2 dλ2 dλ1 = (µ1 ⊗ µ2)α1dλ Ω m1 Ω1 Ω m2 Ω and an analogous equation for µ2α2. Herein, (µ1 ⊗ µ2) := µ1(x1)µ2(x2) for λ-a.a. (x1, x2) ∈ Ω refers to the direct product of µ1 and µ2. One easily verifies that µ ⊗ µ ∈ L21 2 (Ω) ∫with ‖µ1 ⊗ µ2‖L2(Ω) = ‖µ1‖L2(Ω )‖µ2‖L2(Ω ). Therefore, (4.5)1 2allows us to estimate ∫ ∫ γ‖π‖2 1 1L2(Ω) ≤ (µ1 ⊗ µ2)(α1 ⊕ α2 − c)+dλ+ (µ1 ⊗ µ2)cdλ− πcdλm Ω m Ω Ω ≤ γ m−1 ‖π‖L2(Ω)‖µ1‖L2(Ω )‖µ1 2‖L2(Ω2) +m−1 ‖µ1‖L2(Ω )‖µ2‖1 L2(Ω2)‖c‖L2(Ω) + ‖c‖L2(Ω)‖π‖L2(Ω). and Young’s inequality then gives the result.  The next two lemmas address the boundedness of the dual variables α1 and α2 from Lemma 2.2. We first observe that the dual variables cannot be bounded by the data in any norm, since they are not unique: if (α1, α2) satisfies (2.1) for πγ , then, for any r ∈ R, (α1 + r, α2 − r) does so, too. In the rest of the paper, we therefore pick a particular dual so∫lution, namely the one, where r is chosen suchthat (4.6) α2 dλ2 = 0. Ω2 We call a dual solution (α1, α2) satisfying (4.6) zero-mean dual solution. The above considerations show that, under the assumptions of Lemma 2.2, there always exists a zero-mean dual solution. We further underline that even the zero-mean dual so- lution is in general not unique as the counterexample in [18, Section 2.3] illustrates. Still, we can show that every zero-mean dual solution is bounded by the data c, µ1, and µ2. We start with an L 1 bound in the following Lemma 4.4. Every zero-mean dual(solution satisfies ) ‖α1‖ 2 L1(Ω ) + ‖α2‖L1(Ω ) ≤ C ‖µ1‖L2 ‖µ ‖ 2 + ‖c‖ 2 + 11 2 (Ω1) 2 L (Ω2) L (Ω) with a constant C = C(γ,m, δ) > 0. Proof. Lemma 2.2(iii) yields ‖c‖L2(Ω) ‖πγ‖L2(Ω) ≥ −Kγ(πγ) = −Φγ(∫α1, α2) ∫ ≥ − 1 1(µ1 ⊗ µ2)(α1 ⊕ α2 − c) dλ− (µ1 ⊗ µ2)cdλ, m Ω m Ω which, in co(mbination with (2.1a), leads)to ‖c‖L2(Ω) ‖∫πγ‖L2(Ω) + ‖µ1 ⊗ µ2‖L2(Ω) ∫ ≥ − 1 ∫ (µ1 ⊗ µ2)(α1 ⊕ α2 − 1 ∫c)+ dλ+ (µ1 ⊗ µ2)(α1 ⊕ α2 − c)− dλm Ω m Ω γ δ2≥ − (µ1 ⊗ µ2)πγ dλ+ (α1 ⊕ α2 − c)− dλ, m Ω m Ω 12 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER where δ > 0 is the constant from (4.4). Thus, thanks to Lemma 4.3, we arrive at ‖α1 ⊕ α2‖L1(Ω) ≤ ‖(α1 ⊕ α2 − c)+‖L1(Ω) + ‖(α1 ⊕ α2 − c)−‖L1(Ω) + ‖c‖L1(Ω) ≤ γ‖πγ(‖L1(Ω) + ‖c‖L1(Ω) + m2 ‖c‖L2δ (Ω)‖πγ‖L2(Ω) ( )+ 1 ‖c‖ γL2(Ω)‖µ1 ⊗ µ2‖L2m (Ω) + m‖)µ1 ⊗ µ2‖L2(Ω)‖πγ‖L2(Ω) ≤ C ‖µ1‖ ‖ 2 L2(Ω ) µ2‖L2(Ω ) + ‖c‖L2(Ω) + 11 2 with a constant C > 0 depending on γ, m, and δ. To deduce a bound for α1 and α2 individually from this, we emp∫loy (4.6) to obtain ‖α1 ⊕ α2‖L1(Ω) = sup φ(α1 ⊕ α2) dλ φ∈L∞(Ω), Ω ‖φ‖∞≤1 ∫ ≥ sup (φ1 ⊗ 1)(α1 ⊕ α2) dλ = |Ω2| ‖α1‖L1(Ω ).1 φ ∞1∈L (Ω1), Ω ‖φ1‖∞≤1 Finally, the L1-norm of α2 is estim(ated by ) ‖α2‖ −1L1(Ω ) ≤ |Ω1| ‖α1 ⊕ α2‖L1(Ω) + |Ω2|‖α2 1‖L1(Ω ) ,1 which, together with the previous estimates, yields the claim.  Lemma 4.5. There is a constant C(= C(γ,m, δ, c) > 0 such that ) ‖ 6α1‖L2(Ω ) + ‖α2‖L2(Ω ) ≤ C ‖µ1 2 1‖L2(Ω )‖µ1 2‖L2(Ω + ‖c‖ 22) L (Ω) + 1 holds for every zero-mean dual solution. Proof. We proceed in two steps: in a first step, we show the boundedness of the positive parts of α1 and α2 (i); in a second step, we derive bounds for their negative parts (ii). Ad (i): Let us denote the bound from Lemma 4.4 by M > 0, i.e., in particular ‖α2‖L1(Ω ) ≤M , and define{, up to sets of zero Lebesg}ue measure, the subset2 ∈ | | ≤ 2MΩ̃2 := x2 Ω2 : α2(x2) ⊂ Ω .| 2Ω2| Then it follows b∫y construction tha∫t M ≥ |α2(x2)|dλ2 ≥ |α2(x2)|dλ2 ≥ 2M |Ω \ Ω̃ |, Ω Ω \Ω̃ |Ω2| 2 2 2 2 2 which in turn implies |Ω2| |Ω2| (4.7) ≥ |Ω2 \ Ω̃2| = |Ω2| − |Ω̃2| =⇒ |Ω̃2| ≥ > 0. 2 2 Furthermore, we define, up to sets of zero Lebesgue measure, the sets Ω+1 := {x1 ∈ Ω1 : α1(x1) ≥ 0} as well as Ω̃+ := {(x +1, x2) ∈ Ω1 × Ω̃2 : α1(x1) + α2(x2) ≥ 0}. Then, by construction, it holds ≤ ≤ − ≤ 2M0 α1(x1) α2(x2) λ-a.e.-in (Ω+ × Ω̃ ) \ Ω̃+.|Ω2| 1 2 BILEVEL KANTOROVICH PROBLEM, PART I 13 Together with (4.7), this yields ∫ ∫ ‖ ‖2 1(α1) 2+ L2(Ω ) = |α | dλ dλ1 |Ω̃2|(Ω̃∫ 1 12 Ω+1 ∫ 2 ) 2 (4.8) ≤ 2∫ |α1| dλ+ |α1| 2 dλ |Ω2| Ω̃+ (Ω+1 ×Ω̃2)\Ω̃+ ≤ 2 | |2 8 |Ω1|α1 dλ+ M2.|Ω2| 2Ω̃+ |Ω2| In order to estimate the first term∫ on the right hand side, we first observe that ‖(α 2 21 ⊕ α2)+‖L2(Ω) ≥ ∫ (α1 ⊕ α2) dλΩ̃+ ∫ ≥ ∫ |α1|2dλ− 2 |α1| |α2|dλ(4.9) Ω̃+ Ω̃+ ≥ |α 21| dλ− 2 ‖α1‖L1(Ω ‖α ‖ 11) 2 L (Ω2) Ω̃+ ≥ ‖α1‖2 2L2(Ω̃+ − 2M .) The left hand side in the above equation is estimated by means of (2.1a) and Lemma 4.3 as follows: (∫ ∫ ) ‖(α1 ⊕ α2) ‖2 2 2+ L2(Ω) ≤ 2( (α1 ⊕ α2 − c)+dλ+Ω ) c dλΩ(4.10) = 2 (γ2‖π ‖2 2γ L2(Ω) + ‖c‖L2(Ω) ) ≤ C ‖µ ‖2 ‖µ ‖2 21 L2(Ω1) 2 L2(Ω + ‖c‖ 22) L (Ω) . Inserting (4.9) and (4.10) into (4.8) and employing the definition of M as the bound from Lemma 4.4 then yields ( )2 (4.11) ‖(α1)+‖L2(Ω ) ≤ C ‖µ1‖L2(Ω )‖µ2‖L2(Ω ) + ‖c‖L2(Ω) + 1 ,1 1 2 with a positive constant C depending on γ, m, and δ. Interchaninging the roles of α1 and α2 implies the same bound for (α2)+. Ad (ii): To show the bound for the negative part, we argue similarly to the second part of the proof of [18, Theorem 2.11]. Given r ∈ R, we consider the set (defined up to sets of zero Lebesgue measure) Ω̂r2 := {x2 ∈ Ω2 : (α2)+(x2) > r + c} ⊂ Ω2. Recall that c > −∞ is defined to be a lower bound of the cost function c, see (4.4). For any r > −c, the Lebesg∫ue measure of this subset can be estimated by | r| ≤ 1 ≤ 1 ‖ ‖ ≤ MΩ̂2 (α2)+dλ2 α2 L1(Ω ) ,r + c 2Ω̂r r + c r + c2 where M a∫gain denotes the bound fro∫m Lemma 4.4. For all r > −c, we thus obtain (−r + α2 − c)+ dλ2 ≤ ∫ (−(r + c) + (α ) ) dλΩ2 Ω2 ( 2 + +) 2 = − (r + c) + (α2)+ dλ2 Ω̂r2 14 SEBASTIAN HILLBRECHT AND CHRISTIAN ME(YER ) 1 ≤ | 1 M 2Ω̂r| 22 ‖(α2)+‖L2(Ω ) ≤ K,2 r + c where K is short for the bound for ‖(α2)+‖L2(Ω ) from step (i), i.e., the right-hand2 side in (4.11). Therefore, if we set M K2 (4.12) R := − c + 1 > −c, γ2 δ2 then ∫ (−R+ α2 − c)+ dλ2 < γ δ. Ω2 Now∫ assume that α1 ≤ −R λ1-a.e. on a set E ⊂ Ω1 of positive Lebesgue measure.Then ∫ (α1 ⊕ α2 − c)+ dλ2 ≤ (−R+ α2 − c)+ dλ2 < γ δ ≤ γ µ1 λ1-a.e. in E, Ω2 Ω2 which contradicts (2.1b). Hence the definition of R in (4.12) along with the defini- tion of M and K being the bounds from Lemma 4.4 and (4.11) yields the existence of a constant C > 0 su(ch that ) (α1)− ≤ R ≤ C ‖µ1‖ ‖ ‖ ‖ ‖ 6 L2(Ω ) µ2 L2(Ω ) + c L2(Ω) + 1 λ1 2 1-a.e. in Ω1, i.e., we actually even obtain an L∞-bound for the negative part. Note that, since R depends on γ, δ, and c, the same holds for the above constant. Again, the estimate for (α2)− follows by means of reversed roles. Combining the results for the positive and negative part finally proves the lemma.  With the above boundedness results we are now in a position to prove the (strong) continuity of Sγ . For this purpose, let us define the following sets: (4.13) H(Ω) := L2(Ω)× L2(Ω1)× L2(Ω2), (4.14) Cc(Ω) := {{c ∈ L2(Ω): c ≥ c λ-a.e. in Ω}, (4.15) Mmδ (Ω) := (µ1, µ2) ∈ L2(Ω1)× L2(Ω2) : ‖µ1‖L1(Ω ) = ‖µ1 2‖L1(Ω ) = m,2 } µi ≥ δ λi-a.e. in Ωi, i = 1, 2 . Proposition 4.6. Let γ, δ,m > 0 and c > −∞ be given. Then the solution operator of the regularized Kantorovich problem (KPγ) is Hölder continuous with exponent 1/2 on bounded sets in the following sense: For all M > 0, there exists a constant L > 0, depending on M , γ, δ, m, and c, such that 1 ‖S /γ(cµ, µ1, µ2)− Sγ(cν , ν1, ν2)‖L2(Ω) ≤ L ‖(cµ, µ1, µ )− (c , ν , ν )‖ 22 ν 1 2 H(Ω) for all (c mµ, µ1, µ2), (cν , ν1, ν2) ∈ Cc(Ω)×Mδ (Ω) with ‖(cµ, µ1, µ2)‖H(Ω), ‖(cν , ν1, ν2)‖H(Ω) ≤M. Proof. The result is more or less a straight forward consequence of the previous boundedness results. Let (cµ, µ1, µ2), (cν , ν1, ν2) ∈ (Cc(Ω)×Mmδ (Ω))∩BH(Ω)(0,M) be given. We set πµ := Sγ(cµ, µ1, µ2) and πν := Sγ(cν , ν1, ν2) and denote the associated zero-mean dual solutions by (αµ µ ν ν 2 2 ∫ 1 , α2 ), (α1 , α2) ∈ L (Ω1)× L (Ω2). The equality constraints in (KPγ) imply ∫ (4.16) (πµ − πν) dλ2 = µ1 − ν1, (πµ − πν) dλ1 = µ2 − ν2. Ω2 Ω1 BILEVEL KANTOROVICH PROBLEM, PART I 15 Testing the first and second equation in (4.16) with αµ ν µ ν1−α1 and α2−α2 , respectively, integratin∫g and then a(dding the resulting equations, we arr1 )ive at (πµ −(∫πν) (α µ 1 ⊕ α µ 2 − cµ)− 1 (αν∫⊕ αν1 2 − cν) dλΩ γ γ 1 = ∫(µ1 − ν1)(α µ 1 − αν µ ν 1)dλ1 +) (µ2 − ν2)(α2 − α2)dλ2γ Ω1 Ω2 − (πµ − πν)(cµ − cν)dλ . Ω Using (2.1a), the inequality (a − b )2+ + ≤ (a+ − b+)(a − b) for all a, b ∈ R, and Young’s inequality, we a(r∑rive at2 ) ‖πµ − π 2ν‖L2(Ω) ≤ C ‖α µ ν 2 i − αi ‖L2(Ω )‖µi − νi‖L2i (Ωi) + ‖cµ − cν‖L2(Ω) . i=1 By Lemma 4.5, there holds ‖αµ ν µ ν 6i − αi ‖L2(Ω ) ≤ ‖αi ‖ 2i L (Ω ) + ‖αi ‖L2(Ω ) ≤ C(M + 1)i i with a constant C depending on γ, δ, m, and c. Inserting this in the above inequality then gives the result.  Some words concerning the above results are in order. First, due to their non- uniqueness, it is clear that one cannot expect an analogous result for the dual variables, even if we restrict to the zero-mean dual solutions. Moreover, an inspec- tion of the foregoing analysis reveals that the constant L from Proposition 4.6 tends to infinity, if γ and/or δ converge to zero or if c approaches −∞. This is somewhat to be expected, since one looses regularity as γ tends to zero and there is in general no existence of an optimal transport plan, if there is no lower bound for the cost. 4.2. Existence of Optimal Solutions. Based on the Hölder continuity result, we are now in the position to establish the existence of solutions to (BKδγ). For this purpose, we return to the notation from the beginning of this section, i.e. in particular, given c ∈ C(Ω) and µ1 ∈ M(Ω1) with µ1 ≥ 0 and ‖µ1‖M(Ω1) = 1, we set πγ := E∗δ Sγ(Eδ c, T δ1 (µ1), T δ2 (µd2)) ∈M(Ω). Theorem 4.7. Let γ > 0 and δ > 0 be fixed. There exists at least one globally optimal solution to the regularized bilevel Kantorovich problem (BKδγ). Proof. Based on Proposition 4.6, we can apply the direct method of the calculus of variations. First, we note that, for all µ1 ∈M(Ω1), µ2 ∈M(Ω2) with ‖µ1‖M(Ω =1) ‖µ2‖M(Ω ) and µi ≥ 0, i = 1, 2, there holds ϕδi ∗ µi ≥ 0 λi-a.e. in Ωδi and thus2 ‖T δ1 (µ1)‖L1(Ωδ) = ‖ϕδ1‖L1(Rd11 )‖µ1‖M(Ω1) + δ(4.17) = ‖ϕδ2‖L1(Rd2 )‖µ2‖M(Ω ) + δ = ‖T δ2 (µ2)‖L1(Ωδ) =: m.2 2 Hence, (T δ1 (µ1), T δ2 (µd2)) ∈Mm δ δ d0 (Ωδ) so that Sγ(Eδ c, T1 (µ1), T2 (µ2)) is well defined for every c ∈ C(Ω). Consequently, the feasible set of (BKδγ) is nonempty and thus there is an infimal sequence {(πnγ , µn1 , cn)}n∈N ⊂M(Ω)×M(Ω )×W 1,p1 (Ω), i.e., (4.18) Jγ(πnγ , µn1 , cn)→ inf (BKδγ) ∈ R ∪ {−∞} as n→∞, 16 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER where inf (BKδγ) denotes the minimal value of (BK δ γ). By the constraint in (BK δ γ), the sequence {µn1}n∈N is bounded in M(Ω1). Moreover, due to the constraints in (KPγ), ∫ ‖πn δ nγ ‖M(Ω) ≤ ‖Sγ(Eδ cn, T1 (µ1 ), T δ2 (µd2))‖ δ dL1(Ω = ϕ ∗ µ dλδ) 2 2 2 + δ = 1 + δ. Ωδ2 Therefore, {µn1} and {πnγ } are both bounded in M(Ω1) and M(Ω), respectively, and thus, the presupposed weak-∗ lower semicontinuity of J implies that inf J (πn, µnγ 1 ) =: j > −∞. n∈N Consequently, for n ∈ N large enough,( ) ‖cn − c pd‖W 1,p(Ω) ≤ pγ max{inf (BK δ γ) + 1, 0} − j such that {c } is bounded in W 1,pn (Ω). Therefore, there is a subsequence, denoted for simplicity by the same symbol, such that (4.19) (πnγ , µ n 1 ) ⇀ ∗ (π̄, µ̄1) in M(Ω)×M(Ω1), cn ⇀ c̄ in W 1,p(Ω). The set {µ1 ∈M(Ω1) : µ1 ≥ 0} is∫easily seen∫to be weakly-∗ closed. Hence µ̄1 ≥ 0and thus 1 = ‖µn‖ n1 M(Ω ) = dµ1 → dµ̄1 = ‖µ̄1 1‖M(Ω1). Ω1 Ω1 Therefore, the weak limit µ̄1 satisfies the constraints in (BK δ γ). Moreover, since p > d, the embedding W 1,p(Ω) ↪→ C(Ω) is compact, and thus cn → c̄ uniformly in Ω. Since Ω is compact, there is a constant c > −∞ such that infx∈Ω cn(c) ≥ c for all n ∈ N as well as infx∈Ω c̄(x) ≥ c. Hence, there holds Eδ c̄, Eδcn ∈ Cc(Ωδ) for all n ∈ N and the uniform convergence implies Eδcn → Eδ c̄ in L2(Ωδ). Furthermore, the complete continuity of the convolution yields T δ n1 (µ1 )→ T δ1 (µ̄1) in L2(Ωδ1). Since µn1 , µ̄1 ≥ 0 and µd2 ≥ 0, we have ϕδ1 ∗ µn1 , ϕδ1 ∗ µ̄1 ≥ 0 λ1-a.e. in Ωδ1 and ϕδ ∗ µd2 2 ≥ 0 λ2-a.e. in Ωδ2 and therefore thanks to the constant shift in (4.1) (T δ1 (µn1 ), T δ2 (µd m δ2)) ∈Mδ′ (Ωδ) ∀n ∈ N, (T1 (µ̄ ), T δ(µd)) ∈Mm1 2 2 δ′ (Ωδ), where δ′ := max{|Ωγ |, |Ωγ1 2 |}−1 δ. Therefore, we are allowed to apply the Hölder continuity result from Proposition 4.6, which gives Sγ(Eδ c , T δn 1 (µn1 ), T δ2 (µd2))→ S (E c̄, T δγ δ 1 (µ̄1), T δ2 (µd2)) in L2(Ωδ). The continuity of the restriction operator E∗δ and the first convergence in (4.19) then imply π̄ = E∗δ Sγ(Eδ c̄, T δ1 (µ̄1), T δ2 (µd2)) such that the weak limit (π̄, µ̄1, c̄) is feasible for (BK δ γ). The mapping W 1,p(Ω) 3 c 7→ ‖c − c ‖pd W 1,p(Ω) is continuous and convex and therefore weakly lower semi- continuous. Additionally, J was assumed to be weakly-∗ lower semicontinuous. Altogether, the convergence in (4.19) yields Jγ(π̄, µ̄1, c̄) ≤ lim inf Jγ(πn, µn1 , cn) = inf (BKδ→∞ γ),n ensuring the optimality of the weak limit.  BILEVEL KANTOROVICH PROBLEM, PART I 17 References [1] Robert A. Adams and John J. F. Fournier. Sobolev spaces, volume 140 of Pure and Applied Mathematics (Amsterdam). Elsevier/Academic Press, Amsterdam, second edition, 2003. [2] Luigi Ambrosio and Nicola Gigli. A user’s guide to optimal transport. In Modelling and optimisation of flows on networks, pages 1–155. Springer, 2013. [3] V. Barbu. Analysis and Control of nonlinear infinite dimensional systems. Academic Press, New York, 1993. [4] Constantin Christof, Christian Clason, Christian Meyer, and Stephan Walther. Optimal con- trol of a non-smooth semilinear elliptic equation. Math. Control Relat. Fields, 8(1):247–276, 2018. [5] Constantin Christof, Juan Carlos De los Reyes, and Christian Meyer. A nonsmooth trust- region method for locally Lipschitz functions with application to optimization problems con- strained by variational inequalities. SIAM J. Optim., 30(3):2163–2196, 2020. [6] Christian Clason, Dirk A. Lorenz, Hinrich Mahler, and Benedikt Wirth. Entropic regular- ization of continuous optimal transport problems. J. Math. Anal. Appl., 494(1):Paper No. 124432, 22, 2021. [7] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in neural information processing systems, pages 2292–2300, 2013. [8] Pierre Grisvard. Elliptic Problems in Nonsmooth Domains. Classics in Applied Mathematics. SIAM, Philadelphia, 1985. [9] Lukas Hertlein and Michael Ulbrich. An inexact bundle algorithm for nonconvex nonsmooth minimization in Hilbert space. SIAM J. Control Optim., 57(5):3137–3165, 2019. [10] Roland Herzog, Christian Meyer, and Gerd Wachsmuth. C-stationarity for optimal control of static plasticity with linear kinematic hardening. SIAM Journal on Control and Optimization, 50(5):3052–3082, 2012. [11] Sebastian Hillbrecht, Paul Manns, and Christian Meyer. Bilevel optimization of the Kan- torovich problem and its quadratic regularization, part II: Convergence results in infinite dimensions. In preparation, 2022. [12] Sebastian Hillbrecht and Christian Meyer. Bilevel optimization of the Kantorovich problem and its quadratic regularization, part III: Reverse approximation in finite dimensions. In preparation, 2022. [13] M. Hintermüller and I. Kopacka. Mathematical programs with complementarity constraints in function space: C-and strong stationarity and a path-following algorithm. SIAM Journal on Optimization, 20(2):868–902, 2009. [14] Tim Hoheisel, Christian Kanzow, and Alexandra Schwartz. Theoretical and numerical com- parison of relaxation methods for mathematical programs with complementarity constraints. Math. Program., 137(1-2, Ser. A):257–288, 2013. [15] Leonid V. Kantorovič. On the translocation of masses. C. R. (Doklady) Acad. Sci. URSS (N.S.), 37:199–201, 1942. [16] Christian Kanzow and Alexandra Schwartz. The price of inexactness: convergence properties of relaxation methods for mathematical programs with complementarity constraints revisited. Math. Oper. Res., 40(2):253–275, 2015. [17] Dirk Lorenz and Hinrich Mahler. Orlicz space regularization of continuous optimal transport problems. arXiv:2004.11574, 2021. [18] Dirk A. Lorenz, Paul Manns, and Christian Meyer. Quadratically regularized optimal trans- port. Appl. Math. Optim., 83(3):1919–1949, 2021. [19] Zhi-Quan Luo, Jong-Shi Pang, and Daniel Ralph. Mathematical programs with equilibrium constraints. Cambridge University Press, Cambridge, 1996. [20] Christian Meyer and Stephan Walther. Optimal control of perfect plasticity Part II: Displace- ment tracking. SIAM J. Control Optim., 59(4):2498–2523, 2021. [21] F. Mignot. Contrôle dans les inéquations variationelles elliptiques. Journal of Functional Analysis, 22(2):130–185, 1976. [22] Filippo Santambrogio. Optimal transport for applied mathematicians, volume 87 of Progress in Nonlinear Differential Equations and their Applications. Birkhäuser/Springer, Cham, 2015. Calculus of variations, PDEs, and modeling. [23] Holger Scheel and Stefan Scholtes. Mathematical programs with complementarity constraints: stationarity, optimality, and sensitivity. Math. Oper. Res., 25(1):1–22, 2000. 18 SEBASTIAN HILLBRECHT AND CHRISTIAN MEYER [24] A. Schiela and D. Wachsmuth. Convergence analysis of smoothing methods for optimal control of stationary variational inequalities with control constraints. ESAIM: M2AN, 47:771–787, 2013. [25] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathe- matics. American Mathematical Society, Providence, RI, 2003. [26] Cédric Villani. Optimal transport. Old and new, volume 338 of Grundlehren der Mathematis- chen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2009. [27] Gerd Wachsmuth. Towards M-stationarity for optimal control of the obstacle problem with control constraints. SIAM Journal on Control and Optimization, 54(2):964–986, 2016. [28] Gerd Wachsmuth. A guided tour of polyhedric sets. Journal of Convex Analysis, 26(1):153– 188, 2019. Sebastian Hillbrecht, Technische Universität Dortmund, Fakultät für Mathematik, Lehrstuhl X, Vogelpothsweg 87, 44227 Dortmund, Germany Email address: sebastian.hillbrecht@tu-dortmund.de Christian Meyer, Technische Universität Dortmund, Fakultät für Mathematik, Lehrstuhl X, Vogelpothsweg 87, 44227 Dortmund, Germany Email address: christian2.meyer@tu-dortmund.de