PARTITION FUNCTION ESTIMATION AND PHASE TRANSITIONS ON RANDOM SATISFIABILITY PROBLEMS

Dissertation submitted for the degree of Doctor of Natural Sciences at the Technische Universität Dortmund, Faculty of Computer Science (Fakultät für Informatik), by Arnab Chatterjee from Kharagpur, West Bengal, India

Dortmund 2025

Dean: Prof. Dr. Jens Teubner
Reviewers: Prof. Dr. Amin Coja-Oghlan (TU Dortmund), Prof. Dr. Dimitris Achlioptas (University of Athens)
Date of the oral examination: 25.11.2025

DEDICATION

To my parents and all young minds of West Bengal.

Acknowledgements

“So long, and thanks for all the fish.”

Looking back at my time as a master's student at IIIT Delhi: at the end of my second semester, when many of my friends chose their thesis topics in the booming areas of machine learning and artificial intelligence, I was fortunate to have Prof. Subhabrata Samajder as my master's thesis advisor, who introduced me to the world of “Random Graphs”. Many of the random ideas that struck me during my thesis took shape in various forms over the next several years and culminated in this thesis. I would say Prof. Samajder is one of the people responsible for turning my attention towards theoretical computer science. This thesis is my humble token of appreciation for his effort and sincerity.

After working in industry for a year and a half following my master's degree, I made one of the most important decisions of my life: pursuing a PhD in the field of random graphs. So the cruise resumed its journey, with a different commander now on deck. My PhD supervisor Prof. Amin Coja-Oghlan patiently let me explore the depths but gave the screw the crucial half-turn just when needed to prevent me from floating away. I can't thank him enough for his constant encouragement and his confidence in me. Every time I came up with a new idea, he patiently stretched my mind and taught me to think like a mathematician. Throughout the PhD, his guidance and support played a pivotal role in my decision to dedicate myself to random graphs and probabilistic combinatorics, and more specifically to random satisfiability problems.

I would also like to extend my sincere appreciation to my dissertation committee members, Dimitris Achlioptas, Jean Christoph Jung and Kevin Buchin, for taking the time to read my thesis, participate in my defense, and provide thoughtful feedback. I am also beholden to my coauthors from some of the world's most prestigious institutes: Prof. Catherine Greenhill (UNSW), Prof. Mihyun Kang (TU Graz), Prof. Noela Müller (TU Eindhoven) and Prof. Gregory B. Sorkin (LSE). Collaborating with them broadened my understanding of the field and advanced my learning trajectory.

Besides them, I am thankful to have worked closely with my colleagues at TU Dortmund. Kostas, Lena, Maurice, Olga, Pavel and Ulrike created a space that was both supportive and engaging, with conversations ranging from technical details to everyday casual banter. Special thanks go to Maurice, who helped me not only in finding accommodation but also as an interpreter in official matters during my early days in Germany. I also thank Haodong and Joon, with whom I spent a very good time in Leiden; our conversations often went late into the evening, reminding me that mathematics can be serious but also fun.

During my PhD I also had the valuable opportunity to spend a research visit at the University of California, Irvine (UCI). I owe my heartiest thanks to Prof. Asaf Ferber for hosting me and making me feel welcome in his research group. Besides him, I also thank his group members Marcelo, Mason and Xiaonan, with whom I shared my stay in California, which brought not just productive discussions but also many memorable days, from working on existing problems to exploring new ideas.

The PhD journey would not be complete without the people who provided strength outside the academic world.
My parents Soumen Chatterjee and Mamata Chatterjee are above and beyond all the thanks that I can ever gather. I dedicate this thesis to my lovely parents as well as to all the young minds of West Bengal. It is not possible to adequately express in words the encouragement and support they have given me throughout my graduate studies. I hope I have somewhat succeeded in meeting their expectations. I also owe heartiest thanks to my bhai (cousin) and masi (aunty), whose encouragement and motivation gave me the strength to keep going even when things seemed uncertain. My deepest appreciation goes to my fiancée, Susmita, who has been my constant companion and whose belief in me, love and care have carried me through the difficult times.

This thesis, which contains whatever research output I could possibly 'write' in words, is a joint fruit of the labor, persistence and confidence of a lot of people spread all around the globe. I take this opportunity to thank all those who helped me turn a possibility into a reality. I have surely missed more names than I remembered to mention above. But mentioned or unmentioned, my gratitude transcends the words I have used to express my heartiest thanks.

Arnab, California, September 2025.

Abstract

This thesis focuses on estimating partition functions and analyzing phase transitions in random satisfiability problems, with an emphasis on the random 2-SAT, random k-XORSAT and random k-SAT models. Partition functions capture the exponential growth of solution spaces and establish a bridge between combinatorics, probability, and statistical physics. Studying their asymptotics and fluctuations helps us to understand the mechanisms behind sharp phase transitions and the solution space geometry in random constraint satisfaction problems (CSPs).

Our first contribution establishes a central limit theorem (CLT) for the number of solutions (also called the 'partition function' in physics jargon) of random 2-SAT, the first CLT of this type for any random CSP.
It thereby gives a precise probabilistic characterization of the fluctuations of the logarithm of the number of satisfying assignments, which are of order $\sqrt{n}$, where n is the number of variables. In addition, we evaluate the formula for the variance of the number of solutions of random 2-SAT. The proof relies on the martingale central limit theorem, the Gibbs uniqueness property and local convergence to a Galton-Watson tree, combined with a coupling argument called the 'Aizenman-Sims-Starr scheme'.

The second part of the thesis investigates the performance of a statistical physics inspired message passing algorithm called 'Belief Propagation Guided Decimation' (BPGD) on the random k-XORSAT problem. Specifically, we derive an explicit threshold up to which the algorithm succeeds with a strictly positive probability. Additionally, we study a thought experiment called the 'decimation process', for which we determine several phase transitions, such as the (non)-reconstruction and condensation phase transitions, and their connection to BPGD (that is, in which regimes the two processes diverge or converge).

Finally, for random k-SAT, we revisit the Gibbs uniqueness threshold, improving the lower bound from the previous work of Montanari and Shah [83]. More specifically, we show that up to the Gibbs uniqueness threshold the number of satisfying assignments of random k-SAT is given by the physics-inspired 'replica symmetric solution'. Mathematically, we find an explicit expression for the logarithm of the number of solutions of random k-SAT in terms of the Bethe free entropy, a functional defined on probability measures on the unit interval. Moreover, in contrast to the Montanari-Shah bound, our lower bound is significant particularly for small k.

In a nutshell, this thesis advances the rigorous understanding of random satisfiability problems by combining algorithmic analysis, probabilistic combinatorics and tools from statistical physics.
The structural properties of random formulas, the effectiveness of different message passing algorithms, the universal principles governing fluctuations and correlation decay, and the mathematical foundation for phenomena predicted by spin glass theory together point toward new directions for future research on random satisfiability problems.

Contents

Acknowledgements
Abstract
1 Introduction
2 Models
2.1 Constraint Satisfaction Problems
2.1.1 Definitions
2.1.2 The SAT Problem
2.1.3 Why SAT?
2.1.4 Factor Graphs
2.2 Statistical Physics and CSPs
2.2.1 Boltzmann (Gibbs) probability distribution
2.2.2 Some statistical physics models
3 Message Passing Algorithms
3.1 Belief Propagation
3.1.1 BP messages
3.1.2 Computing marginals
3.1.3 Bethe-Free Entropy
3.2 Algorithms
3.2.1 Belief Propagation Guided Decimation
3.2.2 Decimation Process
3.2.3 Unit Clause Propagation
3.2.4 Pure Literal Pursuit
3.3 Warning Propagation
4 Phase Transitions in random CSPs
4.1 The Satisfiability Transition
4.2 Quenched and Annealed Techniques
4.3 Gibbs measure and Long range correlation
4.3.1 Gibbs measure on random CSPs
4.3.2 Correlation decay and Gibbs Uniqueness
4.3.3 Replica Symmetry
4.3.4 Clustering transition: Reconstruction Property
4.4 Different phases in random k-SAT and random k-XORSAT
5 A Central Limit Theorem for random 2-SAT solutions
5.1 Motivation and History
5.2 Main Result
5.3 Proof Strategy
5.3.1 Method of Moments fails
5.3.2 BP Approximation
5.3.3 Towards calculating variance
5.4 Establishing the Central Limit Theorem
6 Performance of BPGD on random k-XORSAT
6.1 Motivation and History
6.2 Problem Statement and Results
6.2.1 Analysis of BPGD
6.2.2 Phase Transition of Decimation process
6.3 Proof Strategy
7 On the Gibbs Uniqueness in random k-SAT
7.1 Motivation and History
7.2 Main Results
7.2.1 Limit in probability of log-partition function in random k-SAT
7.2.2 Lower bound on Gibbs uniqueness
7.3 Proof Strategy
7.3.1 Existence of fixed point
7.3.2 Interpolation method: matching upper bound
7.3.3 Aizenman-Sims-Starr: matching lower bound
7.3.4 Lower bound on Gibbs uniqueness threshold
8 The Last Chapter
8.1 Summary of the thesis
8.2 Future Directions
8.3 Contribution of the authors
A List of Papers

List of Figures

2.1 Factor graph representation of the SAT formula in Example 2.1.1
2.2 Left: A factor graph representation of the 3-SAT formula $F_{3\mathrm{SAT}} = (x_1 \vee \neg x_3 \vee x_4) \wedge (x_2 \vee \neg x_4 \vee \neg x_5) \wedge (\neg x_1 \vee x_5 \vee \neg x_6) \wedge (x_3 \vee \neg x_4 \vee x_6)$. Right: A factor graph representation of a random linear system of equations (2.1.5) over $\mathbb{F}_2$.
3.1 Left: Factor graph involved in computing $\nu^{(t+1)}_{x \to a}$, which is a function of all 'incoming messages' $\hat{\nu}^{(t)}_{b \to x}$ with $b \neq a$. Right: Factor graph involved in computing $\hat{\nu}^{(t)}_{a \to x}$, which is a function of all 'incoming messages' $\nu^{(t)}_{y \to a}$ with $y \neq x$.
3.2 Up: A local snapshot of the Warning Propagation update rules for the message $\nu_{F, a \to x, \ell}$ defined in (3.3.3). Down: Similarly, a local snapshot of the Warning Propagation update rules for the message $\nu_{F, x \to a, \ell}$ defined in (3.3.4).
4.1 Galton-Watson tree $T$ with the Gibbs uniqueness property
4.2 Phase diagram of k-SAT, adapted and modified from [66]. Left to right: Uniqueness, Clustering (Replica Symmetry), Clustering → Condensation (dynamic 1RSB), Condensation → Satisfiability (static 1RSB), UNSAT
4.3 Phase diagram of k-XORSAT, adapted and modified from [66]. Left to right: Clustering (Easy SAT), Satisfiability (Hard SAT), UNSAT
5.1 Numerical approximations to the function $\varphi(d)$ from (5.1.1) (red) and the variance $\eta(d)^2$ from (5.2.5) (green). The black dashed line is the first moment bound $d \mapsto \log(2) + \frac{d}{2}\log(3/4)$, whereas the purple dashed line is the second moment bound. (Figure 1, [23])
5.2 An illustration of the correlated GW-tree $T^{\otimes}$ (Figure 1, [23])
5.3 Marginal distribution on two correlated formulas for $d = 0.9$ and $M = 0.1m, 0.5m, 0.9m$ (Figure 2, [23])
6.1 Matrices $A$ and $A'$ corresponding to the Tanner graphs $G$ and $G'$
6.2 $\Phi_{d,k,\lambda}$ for $k = 3$ and $d = 2.4$, for $\lambda$ from 0 to 0.3 (maximum at $z = 0$) and from 0.4 to 0.9 (Figure 1, [22])
6.3 The phase diagrams for $k = 3, 4, 5$ with $d \in (d_{\min}, d_{\mathrm{SAT}})$ on the horizontal and $\theta$ on the vertical axis (Figure 3, [22])
7.1 Comparison of $B_{d,k}(\pi_{d,k})$ with known bounds for $\lim_{n \to \infty} \frac{1}{n} \log Z(\Phi)$ for $k = 3$ [21]
7.2 A graphical representation of the coupling technique (Aizenman-Sims-Starr scheme)

1 Introduction

“The theory of probability as mathematical discipline can and should be developed from axioms in exactly the same way as Geometry and Algebra.” – Andrey Kolmogorov

In theoretical computer science, profound insights often arise at the intersection of the discrete and probabilistic paradigms, formally referred to as probabilistic combinatorics. In this setting, random graphs serve as a basic framework in which edges are assigned at random according to a given probability distribution [48]. Many researchers investigate the circumstances under which important global properties emerge, namely connectivity, the emergence of a giant component and the threshold for the chromatic number [3, 5, 48, 63, 94]. These phase transitions, or discontinuities, not only reflect phenomena in statistical physics but also reveal the average-case complexity of algorithms [1]. Generalizing this perspective from graphs to higher dimensions naturally allows for a deep investigation of random constraint satisfaction problems (CSPs), in which constraints are a generalization of edges and involve assignments rather than colorings.
As a consequence, random CSPs form a unifying locus: they capture the combinatorial complexity of random graphs, require the precise algorithmic attention typical of computer science, and use probabilistic techniques honed in mathematics [9, 10]. This thesis capitalizes on this three-way relation in studying phase transitions in random CSPs: characterizing the number of satisfying assignments and the constraint density beyond which satisfiability disappears, analyzing the performance of a few physics-inspired algorithms on random CSPs, and detailing the structural mechanisms underlying the phase transitions [8].

A constraint satisfaction problem (CSP) consists of n discrete-valued variables taking values in a finite domain, together with m constraints; the task is to find a satisfying assignment. Each constraint enforces some requirement on a subset of the variables [99]. A solution of the CSP is an assignment of the variables that satisfies all the constraints simultaneously. CSPs have been studied extensively in computer science as well as in combinatorics and statistical mechanics [11, 61, 66, 68]. This interdisciplinary interest comes from a broad spectrum of applications, ranging from coding theory and communication engineering to computer architecture, operations research and artificial intelligence [17, 45, 46, 54]. Famous examples of CSPs are the k-satisfiability (k-SAT) problem and the graph q-coloring problem (q-COL). In k-SAT the variables are Boolean and each constraint is the disjunction (OR) of k literals (a literal being either a variable or its negation). In q-COL the variables are placed on the vertices of a graph, where they can take q possible colors, and each edge of the graph enforces the constraint that its two end vertices take different colors.

CSPs can be analyzed from several different perspectives.
One fundamental approach comes from computational complexity theory [51, 84, 91], which classifies CSPs according to their worst-case difficulty. In particular, it asks whether there exists an algorithm, running in time polynomial in the number n of variables and the number m of clauses, that decides the existence of a solution for every possible instance. The problem comes in several forms: besides the search variant, which asks for a satisfying assignment, the decision variant asks whether a solution exists at all [66]. Once a solution does exist, one might be interested in how many such solutions exist [100].

While general CSPs can be defined over deterministic structures, a complementary and insightful perspective arises when randomness is added to classical CSPs, revealing behaviors that are hard to observe in deterministically generated instances [73]. These are the random CSPs: models in which the structure, the assignment of variables to constraints, or the matrix of a system of linear equations over $\mathbb{F}_q$ is chosen according to some probability distribution. Over the years, significant developments have illuminated the behavior of random CSPs. They share a formal mathematical analogy with the models studied in the statistical mechanics of disordered systems, particularly mean-field spin glasses [16, 87, 88, 90], where the interactions induced by the constraints are of a frustrated nature and, because of the randomness in their construction, do not possess any underlying finite-dimensional structure. For instance, in the q-coloring problem on random graphs the variables can be treated as Potts spins, and the goal is to find the ground state of the antiferromagnetic Potts model [13, 39]. A straightforward observation is that when every edge is bichromatic in the ground state, the problem admits a satisfying assignment, since all constraints are satisfied simultaneously.
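The correspondence between proper colorings and zero-energy ground states can be illustrated on a toy instance. The following is a minimal sketch, assuming the energy of a coloring is simply the number of monochromatic edges; the function names `potts_energy` and `ground_state` are hypothetical and the brute-force search is only meant for very small graphs.

```python
from itertools import product

def potts_energy(edges, coloring):
    """Antiferromagnetic Potts energy: the number of monochromatic edges."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def ground_state(n, edges, q):
    """Brute-force minimum of the energy over all q**n colorings (tiny n only)."""
    best = min(product(range(q), repeat=n),
               key=lambda c: potts_energy(edges, c))
    return best, potts_energy(edges, best)

# A 5-cycle is an odd cycle: not 2-colorable, but 3-colorable.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
_, e2 = ground_state(5, cycle5, q=2)  # ground-state energy with 2 colors
_, e3 = ground_state(5, cycle5, q=3)  # ground-state energy with 3 colors
```

A ground-state energy of zero means that a proper coloring exists; a strictly positive value means that every coloring violates at least one edge constraint.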
For satisfiability problems, for instance random k-SAT (with k = 3 as the simplest version), each variable in a clause can be regarded as a spin in an Ising model, and the clauses can be thought of as interactions between the spins, with a satisfied clause corresponding to certain configurations of the spins [37, 40, 77, 78, 89]. Through this connection to statistical mechanics one can define an energy function that penalizes the unsatisfied clauses. As in the previous example, the goal is to find the ground state of the system, which corresponds to a satisfying assignment. This is analogous to finding a low-energy configuration in an Ising model.

A particularly interesting research topic is the regime in which both the number of variables (n) and the number of clauses (m) tend to infinity at a fixed ratio $\alpha = m/n$, called the constraint density. Random CSPs exhibit threshold phenomena in this regime: the probability of certain properties drops abruptly from 1 to 0 [74, 80] as a function of $\alpha$, which is treated as a control parameter. One such phase transition occurs at the satisfiability threshold, denoted $\alpha_{\mathrm{SAT}}$, which depends on the parameters k, d, q of the problem (where d is the average variable degree of a random satisfiability problem and q is the number of colors that can be assigned to a variable in the q-coloring problem on random (hyper)graphs). For $\alpha < \alpha_{\mathrm{SAT}}$, a random instance typically admits an assignment that satisfies all the constraints simultaneously, while a random instance is typically unsatisfiable for $\alpha > \alpha_{\mathrm{SAT}}$ (see footnote 1).

To establish phase transitions in different random CSPs, statistical physicists have contributed significantly by developing non-rigorous methods and have explained the combinatorial structure of CSPs by describing in detail the solution space geometry of the problem and its connection with the phase transitions [59, 66].
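The energy function and the role of the constraint density can be made concrete with a small experiment. The sketch below is illustrative only: it assumes one common convention for random k-SAT (each clause contains k distinct variables with independent uniformly random signs, and $m = \mathrm{round}(\alpha n)$), which may differ from the exact ensemble used later in the thesis, and all function names are hypothetical. Satisfiability is decided by brute force, so only tiny n is feasible.

```python
import random
from itertools import product

def random_ksat(n, m, k, rng):
    """m clauses on k distinct variables each; a literal is (variable, negated?)."""
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(n), k)]
            for _ in range(m)]

def energy(formula, sigma):
    """Number of clauses violated by the Boolean assignment sigma."""
    # The literal (v, neg) is true exactly when sigma[v] != neg.
    return sum(1 for clause in formula
               if not any(sigma[v] != neg for v, neg in clause))

def is_satisfiable(formula, n):
    """Brute force: does some assignment have energy zero?"""
    return any(energy(formula, s) == 0
               for s in product([False, True], repeat=n))

def sat_fraction(n, alpha, k, trials, seed=0):
    """Empirical fraction of satisfiable instances at density alpha = m/n."""
    rng = random.Random(seed)
    m = round(alpha * n)
    hits = sum(is_satisfiable(random_ksat(n, m, k, rng), n)
               for _ in range(trials))
    return hits / trials
```

Evaluating `sat_fraction(10, alpha, 3, 20)` for a range of `alpha` values gives a rough, finite-size impression of how the probability of satisfiability decreases with the density; at such small n the transition is heavily smoothed and should not be read as an estimate of $\alpha_{\mathrm{SAT}}$.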
Thanks to the similarity between random CSPs and spin glasses, the methods used, known in statistical physics jargon as the replica and cavity methods [15, 59, 65, 72, 79], were first developed in the context of the statistical mechanics of disordered systems. These non-rigorous techniques have provided predictions of $\alpha_{\mathrm{SAT}}$ in many models and have also uncovered many other phase transitions concerning the structure of the set of solutions in the satisfiable phase. Additionally, the cavity method has inspired new algorithms, particularly message passing algorithms as well as spectral algorithms, and has led to exciting predictions about the information-theoretic and computational nature of various inference problems. These algorithms exploit the detailed picture of the solution space, and some of the predictions have later been confirmed rigorously [1, 8, 41, 76].

So, on one side, statistical physicists are content with heuristic predictions or evidence for a given problem, whereas on the other side mathematicians and theoretical computer scientists try to verify these heuristic predictions by means of mathematical proofs. For example, in [32] the authors rigorously establish a formula for the mutual information in statistical inference problems induced by random graphs, inspired by the cavity method, a non-rigorous physics approach. In this thesis we likewise verify and confirm such a heuristic prediction, due to the physicists Ricci-Tersenghi and Semerjian [96], concerning a physics-inspired message passing algorithm called Belief Propagation Guided Decimation. In particular, we derive an explicit threshold up to which the algorithm succeeds with strictly positive probability $\Omega(1)$ and beyond which it fails to find a satisfying assignment with high probability [22]. Alongside, we analyze a thought experiment called the decimation process, for which we identify a (non)-reconstruction and a condensation phase transition.
Footnote 1: Although Friedgut [47] establishes the transition from SAT to UNSAT (Theorem 4.1.2, Chapter 4), the existence of the satisfiability threshold $\alpha_{\mathrm{SAT}}$ is still an open problem in many interesting cases, such as random 3-SAT, random k-NAE-SAT for small values of k (say random 3-NAE-SAT) and random graph 3-coloring. Many other CSP thresholds besides k-SAT and coloring have been predicted by physics heuristics but not proved rigorously.

One more interesting phase transition occurs in the satisfiable phase: the clustering (dynamic) phase transition, which takes place at a critical density denoted $\alpha_{\mathrm{clus}}$. As the name suggests, this transition marks a drastic change in the shape of the set of solutions, viewed as a subset of the whole configuration space. Below this threshold, the set of solutions is rather well-connected, that is, any solution can be reached from any other by reshuffling a limited number of variables, whereas above $\alpha_{\mathrm{clus}}$ the set of solutions splits into a large number of distinct clusters, which corresponds to a pure state decomposition of the uniform measure over the set of solutions [59]. Internally, the clusters are well-connected, but they are well-separated from each other. This phase transition also marks the emergence of certain long-range correlations among variables, which enable the solvability of an information-theoretic problem called tree reconstruction [85]. The static properties of the model, however, are not affected by the clustering transition; they are sensitive only to a further transition, the condensation phase transition at $\alpha_{\mathrm{cond}}$, which affects the number of dominant clusters [59]. In random k-XORSAT the (non)-reconstruction and clustering thresholds coincide (the phase diagram can be found in Chapter 4).
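Since a k-XORSAT formula is a system of linear equations over $\mathbb{F}_2$, its number of solutions is either zero or $2^{n - \mathrm{rank}}$, which can be read off by Gaussian elimination. The sketch below illustrates this standard linear-algebra fact; the bitmask encoding of equations and the function names are assumptions made for the example, not notation from the thesis.

```python
def gf2_rank_and_consistency(rows, rhs):
    """Gaussian elimination over F_2.

    rows: list of int bitmasks (bit i set <=> variable i occurs in the equation)
    rhs:  list of right-hand sides in {0, 1}
    Returns (rank, consistent).
    """
    pivots = {}        # leading bit -> (reduced row mask, reduced rhs bit)
    consistent = True
    for mask, b in zip(rows, rhs):
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, b)   # new pivot row
                break
            pmask, pb = pivots[top]
            mask ^= pmask                 # eliminate the leading variable
            b ^= pb
        else:
            # The row reduced to 0 = b, which is contradictory if b == 1.
            if b:
                consistent = False
    return len(pivots), consistent

def count_xorsat_solutions(n, rows, rhs):
    """Number of solutions: 2**(n - rank) if consistent, else 0."""
    rank, ok = gf2_rank_and_consistency(rows, rhs)
    return 2 ** (n - rank) if ok else 0

# x0 + x1 + x2 = 1 and x1 + x2 + x3 = 0 over F_2, on n = 4 variables.
rows = [0b0111, 0b1110]
n_solutions = count_xorsat_solutions(4, rows, [1, 0])
```

Adding the equation x0 + x3 = 0 to this system makes it inconsistent, because the two original equations sum to x0 + x3 = 1.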
Because of the structure of k-XORSAT (which can be translated into a linear system over $\mathbb{F}_2$), the solution space geometry is completely determined by the linear 2-core, which marks both (i) the onset of frozen variables, that is, shattering into clusters, and (ii) the onset of long-range correlations in the Galton-Watson (GW) tree model, which implies that reconstruction is possible. Further, in the context of the cavity method, the clustering threshold $\alpha_{\mathrm{clus}}$ can be defined as the appearance of a non-trivial solution of the one-step Replica Symmetry Breaking (1RSB) equations with Parisi breaking parameter X = 1 [69]. For more details, in particular about reconstruction and the systematic connection, one can refer to [67].

For some random CSPs it has been verified that, in the large-k limit, the clustering threshold $\alpha_{\mathrm{clus}}$ occurs at a much smaller constraint density than the satisfiability threshold $\alpha_{\mathrm{SAT}}$. For instance, the asymptotic expressions for the clustering and satisfiability thresholds of random k-SAT are $\alpha_{\mathrm{clus}} \sim \frac{2^k}{k}\left(\log k + \log\log k + 1 - \log 2\right)$ [81] and $\alpha_{\mathrm{SAT}} \sim 2^k \log 2$ [34] (more details can be found in Chapter 4).

Despite the detailed picture of the solution space geometry of random CSPs, it remains an interesting research question how algorithms behave when they try to find a solution in the satisfiable regime. More specifically, researchers would like to determine the algorithmic threshold, denoted $\alpha_{\mathrm{alg}}$, above which no algorithm is able to find a solution with high probability in polynomial time. Numerical simulations show that for small values of k one can design algorithms that work efficiently at clause-to-variable densities very close to the satisfiability threshold $\alpha_{\mathrm{SAT}}$, whereas this is not possible for large values of k. In this context, Coja-Oghlan [27] provides a polynomial time algorithm for random k-SAT up to constraint densities that coincide, at leading order, with the clustering threshold $\alpha_{\mathrm{clus}}$.
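The size of the gap between the two asymptotic expressions quoted above for random k-SAT can be seen by evaluating them numerically. This is only a sketch: the formulas are leading-order asymptotics for large k and are not accurate threshold values at small k, and the function names are hypothetical.

```python
import math

def alpha_clus_asymptotic(k):
    """Leading-order clustering density: (2^k / k)(log k + log log k + 1 - log 2)."""
    return (2 ** k / k) * (math.log(k) + math.log(math.log(k)) + 1 - math.log(2))

def alpha_sat_asymptotic(k):
    """Leading-order satisfiability density: 2^k log 2."""
    return 2 ** k * math.log(2)

def clus_to_sat_ratio(k):
    """Ratio of the two expressions; of order (log k) / k for large k."""
    return alpha_clus_asymptotic(k) / alpha_sat_asymptotic(k)

for k in (5, 10, 20, 40):
    print(k, round(clus_to_sat_ratio(k), 4))
```

The ratio tends to zero as k grows, so under these asymptotics the range of densities between the clustering and satisfiability thresholds takes up an increasing share of the satisfiable regime.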
Between the densities reached by such algorithms and $\alpha_{\mathrm{SAT}}$ there remains a broad range in which typical instances have a non-empty set of solutions but no known algorithm is able to find one efficiently. In some cases it has been proven that certain algorithms fail to find solutions [30, 50, 56]. Although it is hard to pin down the algorithmic threshold $\alpha_{\mathrm{alg}}$ precisely in terms of structural phase transitions, one can hypothesize that the clustering threshold is an upper bound on the algorithmic one, $\alpha_{\mathrm{alg}} \le \alpha_{\mathrm{clus}}$.

Research on structural phase transitions within the satisfiable regime, particularly on the emergence of the clustering threshold $\alpha_{\mathrm{clus}}$ in terms of long-range correlations, relies on the uniform distribution over all satisfying assignments. In this thesis we also prove that throughout the satisfiable phase the logarithm of the number of satisfying assignments of random 2-SAT exhibits fluctuations of order $\sqrt{n}$, where n is the number of variables.

Returning to the statistical physics perspective, one more prominent threshold arises from the decay of correlations under the Gibbs (or Boltzmann) measure on the solutions of the problem. It is called the “Gibbs uniqueness threshold” and denoted $d_{\mathrm{uniq}}$ [98]. This threshold is expressed in terms of the average variable degree d and comes into the picture when analyzing the infinite tree limit, that is, the Galton-Watson process approximation of the problem. By contrast, the other two thresholds (satisfiability and clustering) are defined with respect to the constraint density $\alpha$. In sparse random graph models these two parameters are closely related, although d concerns the local geometric structure of the factor graph, whereas $\alpha$ is treated as a control parameter of the random CSP ensemble.
For instance, in random k-SAT or random k-coloring, the constraint density $\alpha$ determines the expected average variable degree d, which relates the uniqueness threshold to the other structural transitions. From an algorithmic perspective in particular, the Gibbs uniqueness threshold plays an important role, since local algorithms such as Belief Propagation are effective in this regime. In this thesis we explicitly provide a lower bound on the Gibbs uniqueness threshold $d_{\mathrm{uniq}}$ that improves on prior work of Montanari and Shah [83]. In particular, we prove that for any $k \ge 3$ and for clause/variable ratios up to the uniqueness threshold of the corresponding Galton-Watson tree, the number of satisfying assignments of random k-SAT is given by the 'replica symmetric' formula predicted by physics methods in [77].

This manuscript is based on the three papers [21–23] produced during my PhD. The remaining chapters are organized as follows.

■ Chapter 2 introduces the definition and representation of constraint satisfaction problems. We also discuss one of the best-known CSPs, the satisfiability problem, and its importance in computer science. The chapter concludes with a brief discussion of several statistical physics quantities that will be used throughout the thesis.

■ Chapter 3 focuses on message passing algorithms, including Belief Propagation and Warning Propagation, along with some variants derived from them. We discuss the applications and characteristics of algorithms such as Belief Propagation Guided Decimation, the decimation process (a thought experiment inspired by statistical physics), and the purely combinatorial algorithm Unit Clause Propagation. We also present one of the most useful algorithms employed in this thesis for estimating the logarithm of the partition function of random k-SAT (further details in Chapter 7).
■ In Chapter 4 we examine the satisfiability transition and compare short-range and long-range correlations in random satisfiability problems, with a focus on random $k$-SAT and random $k$-XORSAT. We study the Gibbs measure on random CSPs and explore various phase transitions, including reconstruction, non-reconstruction, condensation, and Gibbs uniqueness, in the context of both random $k$-XORSAT and $k$-SAT. The chapter ends with a comparison of the different phases and the associated solution space geometry of random $k$-SAT versus random $k$-XORSAT.

■ Chapter 5 addresses the number of satisfying assignments in random 2-SAT. This chapter contains the first main result of the thesis, establishing a central limit theorem for the number of solutions of random 2-SAT formulas. This is the first CLT for any kind of random satisfiability problem.

■ Chapter 6 presents the second main result of the thesis, analyzing the performance of the Belief Propagation Guided Decimation (BPGD) algorithm on random $k$-XORSAT and comparing it with the statistical physics–inspired thought experiment known as the decimation process. We also identify different phase transitions of the decimation process, pinpointing the regimes of $d$ and $\theta$ where BPGD succeeds or fails.

■ In Chapter 7 we revisit the Gibbs uniqueness threshold for random $k$-SAT and improve the lower bound established by Montanari and Shah [83]. Towards the proof of this result we introduce the 'interpolation method' and the 'Aizenman-Sims-Starr scheme', which yield matching upper and lower bounds on the logarithm of the number of satisfying assignments of random $k$-SAT up to the Gibbs uniqueness threshold. We explicitly determine the number of satisfying assignments predicted by the statistical physics–inspired replica symmetric solution. The result is stated in terms of the Bethe free entropy $B_{d,k}$, a functional defined on probability measures $\pi \in \mathcal{P}(0,1)$.
■ The final chapter summarizes the results of the thesis, comparing them with existing work in the probabilistic combinatorics and statistical physics literature. We conclude by outlining several interesting open problems and possible directions for future research.

2 Models

“...random constraint satisfaction problems (CSPs) is the geometry of the space of satisfying or almost satisfying assignments ... for which a precise landscape of predictions has been made via statistical physics-based heuristics.” – Jun-Ting Hsieh et al.

2.1 Constraint Satisfaction Problems

2.1.1 Definitions

A constraint satisfaction problem ('CSP') consists of a set of $n$ variables $V = \{x_1, x_2, \dots, x_n\}$ that are subject to a set of constraints $C = \{a_1, a_2, \dots, a_m\}$ for some $m \in \mathbb{N}$. The variables $x_i$, $i \in \{1, \dots, n\}$, take their values in a finite set $\Omega$. When $|\Omega| = 2$, the variables are treated as Boolean variables: $\Omega = \{0,1\}$. In the statistical physics paradigm these are equivalent to spins $\sigma_i \in \{-1,1\}$ via the change of variables $\sigma_i = 2x_i - 1$. When $|\Omega| = q$ for an arbitrary integer $q$, the variables can be viewed as Potts spins or colors: $\Omega = \{1, 2, \dots, q\}$. Further, we call $\sigma_x = (x_1, x_2, \dots, x_n) \in \Omega^n$ a configuration of the variables, and for a subset $\omega \subseteq \{1, \dots, n\}$ of variables we call $\sigma_{x_\omega}$ its configuration. Each clause $a_j$ with $j \in \{1, \dots, m\}$ specifies a subset $\partial j \subseteq \{1, \dots, n\}$ of variables [99] and puts a constraint on the value of their configuration $\sigma_{x_{\partial j}}$. More specifically, the function $a_j : \Omega^{|\partial j|} \to \{0,1\}$ evaluates to 1 when the constraint is satisfied and to 0 otherwise.

There are several variations of CSPs. In the optimization version of the problem, the goal is to find an optimal configuration which minimizes the cost function. The cost function $E : \Omega^n \to \mathbb{R}_+$ is defined as the total number of constraints that are violated by a configuration $\sigma_x$:

\[
E(\sigma_x) = \sum_{j=1}^{m} \bigl(1 - a_j(\sigma_{x_{\partial j}})\bigr) \tag{2.1.1}
\]

The second type of problem is the decision version, where the aim is to find a configuration $\sigma_x^\star$ with cost $E(\sigma_x^\star) \le E_0$, where $E_0$ is a given threshold value for the cost function. When this threshold value is 0, the goal is to find a configuration which satisfies all the constraints simultaneously. Such a configuration is called a solution of the problem. In other words, a solution or satisfying assignment is a mapping $\sigma : V \to \Omega$ that satisfies every constraint. As long as the cost function is easy to evaluate, the decision problem is no harder than the optimization version: once we have an optimal configuration, we only need to compare its cost with the threshold value $E_0$.

The third variant of CSPs, the counting problem, asks for the number of satisfying assignments (or solutions) of a given instance. Generally speaking, this version is much harder than the previous two variants. In this thesis, we study this counting problem for random 2-SAT throughout the satisfiable regime. Moreover, we determine the number of satisfying assignments of random $k$-SAT formulas for clause-to-variable densities up to the Gibbs uniqueness threshold discussed in Chapter 4.

2.1.2 The SAT Problem

The satisfiability problem has a long and exciting history in probabilistic combinatorics as well as in computer science. Consider a Boolean formula $F$ consisting of $n$ Boolean variables and $m$ logical clauses $\{a_1, a_2, \dots, a_m\}$ over the set of literals. Each literal $l_i$ corresponding to a variable $x_i$ is either the variable $x_i$ itself or its negation $\neg x_i$.
Each clause of the SAT formula $F$ is a disjunction (logical OR, $\vee$) of literals and has the form

\[
a_j = l_{i(1)} \vee l_{i(2)} \vee l_{i(3)} \vee \dots \vee l_{i(|\partial j|)} \tag{2.1.2}
\]

where the literals are formed from the variables in $\partial j$. The variables in $\partial j$ can take $2^{|\partial j|}$ possible combinations of values, and exactly one of them violates the clause, namely the one in which all literals take the value 0. The satisfiability formula $F$ is the conjunction (logical AND, $\wedge$) of the clauses,

\[
F = a_1 \wedge a_2 \wedge a_3 \wedge \dots \wedge a_m \tag{2.1.3}
\]

and is also called a CNF formula (conjunctive normal form). Consequently, the formula $F$ is satisfied, i.e., evaluates to 1, if and only if all the clauses of $F$ evaluate to 1. Example 2.1.1 below provides a SAT formula that consists of nine variables $\{x_1, x_2, \dots, x_9\}$ and four clauses $\{a_1, \dots, a_4\}$. A satisfying assignment of this toy example is given by $\sigma(x_1) = 1$, $\sigma(x_2) = 1$, $\sigma(x_3) = 0$, $\sigma(x_4) = 0$, $\sigma(x_5) = 1$, $\sigma(x_6) = 1$, $\sigma(x_7) = 0$, $\sigma(x_8) = 1$, $\sigma(x_9) = 1$.

Example 2.1.1.

\[
F = \underbrace{(\neg x_1 \vee x_2 \vee x_3 \vee x_4)}_{a_1} \wedge \underbrace{(\neg x_4 \vee x_9)}_{a_2} \wedge \underbrace{(x_4 \vee \neg x_5 \vee x_6)}_{a_3} \wedge \underbrace{(x_1 \vee \neg x_6 \vee \neg x_7 \vee x_8)}_{a_4}
\]

[Figure 2.1: Factor graph representation of the SAT formula in Example 2.1.1.]

In the decision version of this problem, a satisfying assignment is a configuration $\sigma$ such that the formula $F$ evaluates to 1. Thus an instance of the SAT problem is defined by the number of variables $n$ and the number of clauses $m$, together with, for each clause $a_j$, the choice of the subset $\partial j$ and, for each variable in $\partial j$, the choice of the literal $l_i$ that appears in $a_j$.
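To make the definitions concrete, the following minimal Python sketch evaluates the formula of Example 2.1.1 on the assignment given above. The encoding of a clause as a list of signed integers ($+i$ for $x_i$, $-i$ for $\neg x_i$) and all function names are illustrative assumptions, not notation from the thesis.

```python
# Illustrative encoding (an assumption of this sketch): +i stands for x_i, -i for NOT x_i.
FORMULA = [
    [-1, 2, 3, 4],    # a1
    [-4, 9],          # a2
    [4, -5, 6],       # a3
    [1, -6, -7, 8],   # a4
]

# The assignment sigma stated in the text, as a map from variable index to 0/1.
SIGMA = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 1, 9: 1}


def literal_value(lit, assignment):
    """Value (0 or 1) of a literal under the assignment."""
    value = assignment[abs(lit)]
    return value if lit > 0 else 1 - value


def clause_value(clause, assignment):
    """A clause is a disjunction: it is 1 iff at least one literal is 1."""
    return int(any(literal_value(lit, assignment) == 1 for lit in clause))


def formula_value(formula, assignment):
    """A CNF formula is a conjunction: it is 1 iff every clause is 1."""
    return int(all(clause_value(c, assignment) == 1 for c in formula))


print(formula_value(FORMULA, SIGMA))
```

Flipping $x_2$ to 0 in this assignment sets every literal of $a_1$ to 0, which is the single violating combination for that clause, so the formula then evaluates to 0.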
Using the spins $\sigma_i \in \{-1,1\}$ obtained from the change of variables $\sigma_i = 2x_i - 1$, as is customary for the Ising model in statistical physics, the clauses can be written as follows:

\[
a_j(\sigma_{\partial j}) = 1 - \prod_{i \in \partial j} \mathbb{1}\bigl[\sigma_i = -\Upsilon^j_i\bigr] \tag{2.1.4}
\]

where

\[
\Upsilon^j_i = \begin{cases} 1 & \text{if the literal is } x_i, \\ -1 & \text{if the literal is } \neg x_i. \end{cases}
\]

2.1.3 Why SAT?

So far, we have discussed the basic definitions of constraint satisfaction problems and the satisfiability (SAT) problem, which is a member of the larger family of CSPs. The obvious question is why we study satisfiability problems in computer science. SAT plays a crucial role in theoretical computer science and occupies a prominent place among all NP-complete problems. This is because SAT is both simple enough to permit combinatorial reasoning and general enough to model many other problems in a natural way.

(i) SAT is simple to describe: In complexity theory, one of the most general NP-complete problems is "NDTM-ACCEPTANCE", which asks: given the description of a non-deterministic Turing machine $M$, an input string $x$ and a number of steps $t$, does $M$ accept $x$ within $t$ steps? This problem is more general than SAT, but it is not simple. It is general in the sense that the theorem "NDTM-ACCEPTANCE is NP-complete" is close to a triviality. By comparison, the Cook-Levin theorem, which implies that 3-SAT is NP-complete, is one of the fundamental results of the field, and it makes SAT far more amenable to combinatorial reasoning than non-deterministic Turing machines. One of the most interesting and deep research topics concerning SAT is random $k$-SAT: defining a probability distribution on $k$-SAT formulas is natural and gives rise to many interesting phenomena. By contrast, a probability distribution on the tuples $(M, x, t)$ yields instances of NDTM-ACCEPTANCE that are much less natural and less interesting than random SAT instances.
(ii) SAT is general: Besides the simplicity of its combinatorial structure, SAT is still general enough to model a wide range of problems in a natural fashion. Thus it can serve as a 'modeling language' for many problems. In complexity theory, any NP-complete problem can model any other problem in NP via a polynomial-time reduction. But some of these reductions are more or less straightforward, whereas others are not. Consider the problem "Hamiltonian Path", where we are given a graph $G$ and we know that some of the edges, say $e_1, e_2, e_3$, are part of the Hamiltonian path. Can we extend these edges to a complete Hamiltonian path? Formulating this problem as a SAT problem is quite straightforward, whereas the reduction from SAT to Hamiltonian Path requires the design of several clever gadgets.

2.1.4 Factor Graphs

The most prominent way to represent an instance of a CSP graphically is by a graph called the factor graph or Tanner graph. This is a bipartite graph $G = (V, C, E)$ with two types of nodes. The first type $V = \{x_1, x_2, \dots, x_n\}$ represents the variables (referred to as variable nodes) and the second type $C = \{a_1, a_2, \dots, a_m\}$ represents the clauses (referred to as check or factor nodes). $E$ is the set of edges connecting variable and check nodes. There is an edge between a variable node $x_i$ and a check node $a_j$ if the variable $x_i$ appears in the check $a_j$, either in its original form $x_i$ or as its negation $\neg x_i$. Furthermore, a weight function can be associated with each check node $a_j$. These weights are linked to a probability distribution called the Boltzmann (Gibbs) distribution, described later in this chapter. Figure 2.1 is the factor graph representation of the SAT formula $F$ given in Example 2.1.1.
In the figure, the variable nodes $V = \{x_1, x_2, \dots, x_9\}$ are represented by filled circles, whereas the check nodes $C = \{a_1, \dots, a_4\}$ are represented by empty squares. The set $E$ is represented by solid and dashed edges: a solid edge between a variable and a check node means that the variable appears in its original form in that check, whereas a dashed edge means that the variable appears negated in that check node. Equivalently, the factor graph $G$ can be viewed as a hypergraph where the variables are still represented as vertices but the clauses are now represented as hyperedges, each of which links a subset of vertices that may contain more than two elements.

[Figure 2.2: Left: A factor graph representation of the 3-SAT formula $F_{3\mathrm{SAT}} = (x_1 \vee \neg x_3 \vee x_4) \wedge (x_2 \vee \neg x_4 \vee \neg x_5) \wedge (\neg x_1 \vee x_5 \vee \neg x_6) \wedge (x_3 \vee \neg x_4 \vee x_6)$. Right: A factor graph representation of the random linear system of equations (2.1.5) over $\mathbb{F}_2$.]

In this thesis we are mainly interested in the factor graphs associated with the random $k$-SAT model and with random matrices over finite fields. Let us briefly describe the factor graph representation of these two models.

(i) random $k$-SAT: The most common example of a SAT problem is random $k$-SAT, where each clause contains exactly $k$ variables. As in other factor graph representations, the set $V$ represents the Boolean variables and the set $C$ represents the clauses, and there is an edge between a variable node and a check node if and only if that variable is present in that check. The variables are drawn as circles and the check/factor nodes as squares. Figure 2.2 (left) is a simple factor graph representation of a $k$-SAT formula with $k = 3$.

(ii) random linear equations over $\mathbb{F}_2$: Compared with random $k$-SAT, a random linear system of equations is easier to grasp.
Consider a linear system of equations $Ax = b$ over $\mathbb{F}_2$, where $A$ is an $m \times n$ matrix (in the square case $m = n$, each entry equals 1 independently with probability $d/n$, so that $d$ is the average variable degree). Given a random vector $b \in \{0,1\}^m$, our goal is to construct a factor graph corresponding to the system of linear equations. As for $k$-SAT, the set $V$ represents the variables, drawn as circles, and the check nodes $C$ represent the equations, drawn as squares. There is an edge between a variable and a check if and only if that variable appears in that equation. Here is a toy example of this kind:

\[
\underbrace{\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}}_{x}
=
\underbrace{\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}}_{b} \tag{2.1.5}
\]

Finding the set of all solutions $x$ of the system of linear equations $Ax = b$ over $\mathbb{F}_2$ is a well-known random CSP called random $k$-XORSAT when each equation contains exactly $k$ variables, in other words when the matrix $A$ has exactly $k$ ones in each row. So, compared to $k$-SAT, the disjunction $\vee$ (OR) inside a clause is replaced by XOR, denoted by $\oplus$:

\[
a_j = l_{i(1)} \oplus l_{i(2)} \oplus l_{i(3)} \oplus \dots \oplus l_{i(k)} \tag{2.1.6}
\]

Equivalently, the sum $\sum_{p=1}^{k} l_{i(p)}$ of the literals inside a clause is required to equal 1 modulo 2. One of the best known algorithms for solving such a linear system is Gaussian elimination, which runs in polynomial time $O(n^3)$ when the number of equations is $m = \Theta(n)$, where $n$ is the total number of variables. The $k$-XORSAT instance is random when the clauses (equations) are drawn independently and uniformly at random from all $2^k \binom{n}{k}$ possible XOR-clauses on the set of variables $V = \{x_1, x_2, \dots, x_n\}$. Due to this algebraic structure, algorithms tend to be easier to analyze on this model than on random $k$-SAT. Later in this thesis, we will examine the performance of a physics-inspired algorithm on the random $k$-XORSAT model.
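As an illustration of the Gaussian elimination step mentioned above, here is a minimal Python sketch that solves the toy system (2.1.5) over $\mathbb{F}_2$. It assumes the matrix is read row by row as a $3 \times 4$ system with rows $(1,0,1,0)$, $(0,1,1,1)$, $(1,1,0,1)$ and right-hand side $b = (1,0,1)$; the function name `solve_gf2` is hypothetical. A consistent system of rank $r$ in $n$ unknowns has $2^{n-r}$ solutions, since every solution differs from a particular one by an element of the kernel.

```python
def solve_gf2(A, b):
    """Gauss-Jordan elimination over F_2.

    Returns (rank, x), where x is one particular solution of A x = b
    (free variables set to 0), or (rank, None) if the system is inconsistent.
    """
    m, n = len(A), len(A[0])
    rows = [list(A[i]) + [b[i]] for i in range(m)]  # augmented matrix [A | b]
    pivots = []
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if rows[i][c] == 1), None)
        if pivot is None:
            continue  # no pivot in this column: x_c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(m):
            if i != r and rows[i][c] == 1:
                # Row addition over F_2 is a component-wise XOR.
                rows[i] = [u ^ v for u, v in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    # A row (0 ... 0 | 1) means the equations contradict each other.
    if any(all(v == 0 for v in row[:n]) and row[n] == 1 for row in rows):
        return r, None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = rows[i][n]
    return r, x


# Assumed reading of the toy system (2.1.5).
A = [[1, 0, 1, 0],
     [0, 1, 1, 1],
     [1, 1, 0, 1]]
b = [1, 0, 1]

rank, x = solve_gf2(A, b)
print("rank:", rank, "particular solution:", x)
if x is not None:
    print("number of solutions:", 2 ** (len(A[0]) - rank))
```

Under this reading the third equation is the sum of the first two, so the rank is 2 and the system has $2^{4-2} = 4$ solutions.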
Figure 2.2 (right) shows the factor graph representation of the linear system of equations over $\mathbb{F}_2$ given in equation (2.1.5).

2.2 Statistical Physics and CSPs

One of the most striking tasks of science is to deal with the ever growing variety of states of matter and their properties. This is where statistical physics comes into the picture: it aims to explain how complex behavior can emerge when a large number of identical elementary components interact with each other. It relies on two notable steps. On the one hand, it passes from the deterministic laws of physics to a probabilistic description. On the other hand, starting from this probabilistic description, it recovers determinism at the macroscopic level by the law of large numbers. In this section we discuss some basic notions of statistical physics and their connection with constraint satisfaction problems.

2.2.1 Boltzmann (Gibbs) probability distribution

According to equation (2.1.1), there is a cost function $E : \Omega^n \to \mathbb{R}_+$ which counts the total number of clauses violated by a given assignment $\sigma_x \in \Omega^n$ of the $n$ variables. In mathematical optimization one aims to minimize this cost function over the set of all possible configurations $\Omega^n$. Once the configuration space $\Omega^n$ and the cost function $E$ are fixed, the Boltzmann probability for the system to be found in the configuration $\sigma_x$ is given by

\[
\mu_\beta(\sigma_x) = \frac{1}{Z(\beta)} e^{-\beta E(\sigma_x)} \tag{2.2.1}
\]

where the normalization constant $Z(\beta)$ is known as the partition function in physics jargon and is equal to

\[
Z(\beta) = \sum_{\sigma_x \in \Omega^n} e^{-\beta E(\sigma_x)} \tag{2.2.2}
\]

The real parameter $T = 1/\beta$ is the temperature, and $\beta$ is referred to as the inverse temperature.
In the context of CSPs, to emphasize the role of the factor graph $G$ we introduce a weight function $\psi_{a_j} : \Omega^{\partial a_j} \to (0,\infty)$ for each constraint $a_j$. The Boltzmann distribution can then be rewritten as

\[
\mu(\sigma_x) = \frac{1}{Z_G} \prod_{j=1}^{m} \psi_{a_j}(\sigma_{x_{\partial j}}), \qquad \text{where} \qquad Z_G = \sum_{\sigma_x \in \Omega^n} \prod_{j=1}^{m} \psi_{a_j}(\sigma_{x_{\partial j}}) \tag{2.2.3}
\]

The distribution defined by (2.2.1)–(2.2.2) interpolates smoothly between several interesting situations. In the high-temperature limit ($\beta \to 0$) one recovers the uniform probability distribution, whereas in the low-temperature limit ($\beta \to \infty$) the distribution concentrates on the global minima of the cost function $E$. Specifically, a configuration $\sigma_{x_0} \in \Omega^n$ such that $E(\sigma_x) \ge E(\sigma_{x_0})$ for all $\sigma_x \in \Omega^n$ is called a ground state, $\Omega_0 \subseteq \Omega^n$ denotes the set of all ground states, and the corresponding energy $E_0 = E(\sigma_{x_0})$ is called the ground state energy. Therefore,

\[
\lim_{\beta \to 0} \mu_\beta(\sigma_x) = \frac{1}{|\Omega|^n} \qquad \text{and} \qquad \lim_{\beta \to \infty} \mu_\beta(\sigma_x) = \frac{1}{|\Omega_0|} \mathbb{I}(\sigma_x \in \Omega_0) \tag{2.2.4}
\]

In this setting the cost function $E$ is also called the Hamiltonian or the energy function, and it is defined as

\[
E_G(\sigma_x) = -\log \prod_{j=1}^{m} \psi_{a_j}(\sigma_{x_{\partial j}}) \tag{2.2.5}
\]

The most important thermodynamic potential in this regard is the free energy, which is defined as

\[
F_G(\beta) = -\frac{1}{\beta} \log Z(\beta) \tag{2.2.6}
\]

whereas in calculations it is often more convenient to use the free entropy given by

\[
\Phi_G(\beta) = -\beta F_G(\beta) = \log Z(\beta) \tag{2.2.7}
\]

2.2.2 Some statistical physics models

In mathematical physics a wide range of interesting phenomena occur when the number of variables $n$ tends to infinity. The preceding section shows that there is a direct correspondence between CSPs and statistical physics problems. As an example, consider spin glass models, which generalize the Ising model. Here the variables are spins $\sigma_j \in \{-1,+1\}$ and each coupling $J_a$ takes a value either in $\mathbb{R}$ or in $\{-1,+1\}$. The energy function is defined as

\[
E(\sigma_x) = -\sum_{a=1}^{m} J_a \prod_{j \in \partial a} \sigma_j \tag{2.2.8}
\]

When each interaction involves $p$ spins (a $p$-body interaction), one speaks of the $p$-spin model.
One of the best-known models in this regard is the Edwards-Anderson model, where $p = 2$. Moreover, the spin glass model can be formulated as a constraint satisfaction problem in the $\beta \to \infty$ limit, because the Boltzmann distribution (2.2.1) then concentrates on the minimizers of the energy function $E(\sigma_x)$. For a two-body interaction, a coupling is either ferromagnetic (where $J_a > 0$, so that the energy is lowered when $\prod_{j \in \partial a} \sigma_j = 1$) or antiferromagnetic (where $J_a < 0$, so that the energy is lowered when $\prod_{j \in \partial a} \sigma_j = -1$). The general idea behind the ferromagnetic regime is that when $\beta$ is large (the low temperature case), one of the two spin values begins to dominate the other and the system shows a positive or negative magnetization. On the other hand, when $\beta$ is small (the high temperature case), no magnetization is observed and the regime is called paramagnetic. The natural task is then to pinpoint the critical inverse temperature $\beta_{\mathrm{crit}}$ at which the phase transition occurs, i.e., where the system suddenly switches from paramagnetic to ferromagnetic.

Coming to the random $k$-SAT model, for each constraint $a_j$ with $j \in [m]$ and for some $\beta > 0$ we can write the weight function and the Hamiltonian as follows:

\[
\psi_{a_j}(\sigma_{x_{\partial j}}) = \exp\bigl(-\beta \cdot \mathbb{1}\{\sigma \not\models a_j\}\bigr), \qquad E(\sigma) = \beta \cdot \sum_{j=1}^{m} \mathbb{1}\{\sigma \not\models a_j\}
\]

Here the Hamiltonian counts the number of clauses violated by $\sigma$ (scaled by $\beta$), and each violated clause contributes a penalty factor $e^{-\beta}$ to the weight of $\sigma$, whereas satisfied clauses contribute a factor 1. As a result, the partition function $Z_G(\beta)$ approaches the number of satisfying assignments as the inverse temperature $\beta$ tends to infinity. Moreover, when $\beta = \infty$, every assignment that violates at least one clause receives weight zero. Hence the Gibbs distribution is the uniform distribution over the solution space and the partition function $Z_G(\beta)$ counts the number of solutions exactly. However, it is often easier to handle the case of finite $\beta$ first and to take the limit afterwards.
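The convergence of the partition function to the number of solutions can be checked by brute force on a small instance. The sketch below is an illustration only: it enumerates all $2^9$ assignments of the formula from Example 2.1.1, using a signed-integer clause encoding and function names that are assumptions of this sketch, and evaluates $Z(\beta) = \sum_\sigma e^{-\beta \cdot \#\{\text{violated clauses}\}}$ for several values of $\beta$.

```python
import itertools
import math

# Formula of Example 2.1.1; +i stands for x_i and -i for NOT x_i (illustrative encoding).
FORMULA = [[-1, 2, 3, 4], [-4, 9], [4, -5, 6], [1, -6, -7, 8]]
N_VARS = 9


def num_violated(formula, x):
    """Number of clauses violated by x, where x[i-1] in {0, 1} is the value of x_i."""
    def literal_true(lit):
        return x[abs(lit) - 1] == (1 if lit > 0 else 0)
    return sum(1 for clause in formula
               if not any(literal_true(lit) for lit in clause))


def partition_function(formula, n, beta):
    """Z(beta): each violated clause contributes a factor exp(-beta)."""
    return sum(math.exp(-beta * num_violated(formula, x))
               for x in itertools.product((0, 1), repeat=n))


def count_solutions(formula, n):
    """Number of assignments that violate no clause."""
    return sum(1 for x in itertools.product((0, 1), repeat=n)
               if num_violated(formula, x) == 0)


for beta in (0.0, 1.0, 5.0, 20.0):
    print(beta, partition_function(FORMULA, N_VARS, beta))
print("solutions:", count_solutions(FORMULA, N_VARS))
```

At $\beta = 0$ every assignment has weight 1, so $Z(0) = 2^9$. As $\beta$ grows, the contribution of the non-solutions decays exponentially and $Z(\beta)$ decreases towards the number of satisfying assignments. The enumeration is exponential in $n$ and is meant only as a sanity check on toy instances.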
Random $k$-XORSAT boils down to a random linear system of equations over $\mathbb{F}_2$. For the $i$-th equation of the homogeneous system given by an $n \times n$ matrix $A$, the weight is the indicator that the equation is satisfied and, in line with (2.1.1), the energy counts the violated equations:

\[
\psi_{a_i}(\sigma_x) = \mathbb{1}\Bigl\{\sum_{j=1}^{n} A_{ij}\sigma_{x_j} = 0\Bigr\}, \qquad E(\sigma) = \sum_{i=1}^{n} \mathbb{1}\Bigl\{\sum_{j=1}^{n} A_{ij}\sigma_{x_j} \neq 0\Bigr\}
\]

The partition function $Z_G$ can then be computed as the cardinality of the kernel of $A(G)$:

\[
|\ker A(G)| = Z_G = \sum_{\sigma_x \in \mathbb{F}_2^n} \prod_{i=1}^{n} \mathbb{1}\Bigl\{\sum_{j=1}^{n} A_{ij}\sigma_{x_j} = 0\Bigr\}
\]

Here the weight function is extended beyond the range $(0,\infty)$ of (2.2.3) by allowing the value zero when the equation is violated. The Gibbs distribution is nevertheless always well-defined, because the zero vector always belongs to the kernel of the matrix $A$ and therefore $Z_G = |\ker A| > 0$.

In view of the weight functions (sometimes called compatibility functions) defined in (2.2.3), the functions $\psi_{a_j}$ carry the dependence on the temperature into the factor graph $G$. To analyze the partition function one needs to look at the normalized limit of the free entropy $\Phi_G(\beta)$, referred to as the free entropy density $\phi_G$ and defined as

\[
\phi_G = \phi_G(\beta) = \lim_{n \to \infty} \frac{1}{n} \Phi_G(\beta) = \lim_{n \to \infty} \frac{1}{n} \log Z(\beta) \tag{2.2.9}
\]

A phase transition occurs at those values of $\beta$ where the free entropy density $\phi_G$ is non-analytic. Two common types are first and second order phase transitions. A first order phase transition occurs when the first derivative $\frac{\partial}{\partial \beta}\phi_G$ is discontinuous at some $\tilde{\beta}$. A second order phase transition occurs when the first derivative is continuous but the second derivative $\frac{\partial^2}{\partial \beta^2}\phi_G$ is discontinuous. Higher order phase transitions are defined analogously. The free entropy density $\phi_G$ is therefore one of the most important quantities for understanding a physical system and the changes in its behavior. A heuristic for computing the free entropy density $\phi_G$ is Belief Propagation, a statistical physics inspired message passing algorithm, which we describe in detail in the next chapter.
3 Message Passing Algorithms

“Message passing algorithms have proved surprisingly successful in solving hard constraint satisfaction problems on sparse random graphs.” – Andrea Montanari et al.

Consider the general problem of computing marginals of a graphical model with $n$ variables $V = \{x_1, x_2, \dots, x_n\}$ taking values in a finite set $\Omega$. One naive approach to computing such marginals is to sum over all configurations, which takes time $|\Omega|^n$. On tree factor graphs, however, marginals can be computed in time that grows linearly with $n$. This can be done by 'dynamic programming', which recursively sums over all variables, starting from the leaves and moving towards the root of the tree. Such a recursive procedure can be recast as a distributed 'message passing' algorithm. These algorithms operate on 'messages' associated with the edges of the factor graph and update the messages recursively based on local computations done at the vertices of the graph. In this chapter we discuss a few such message passing methods along with some algorithms built on them.

3.1 Belief Propagation

Belief Propagation (BP for short) is one of the best-known iterative message passing procedures for computing the marginals of a variable $x_i$ or of a subset of variables with respect to a measure $\mu$ (defined in (2.2.3)), as well as the partition function $Z$. Moreover, it provides an efficient way to sample a configuration $\sigma_x$ from $\mu$, and all these computations can be carried out in time polynomial in the number of variables $n$. It is straightforward to prove that BP computes such marginals exactly on trees. In practice BP is often effective on graphs with loops as well. The basic intuition behind this success is that BP, as a local message passing procedure, should be successful when the underlying model is locally tree-like.
Factor graph models of this type appear frequently in probabilistic combinatorics as well as in statistical physics. Despite these advantages, BP becomes ineffective when long-range correlations emerge, which in turn signal a phase transition. In later chapters we will see a few such applications.

3.1.1 BP messages

In 1962, R. G. Gallager [49] introduced BP messages for decoding low density parity check codes, and in 1988 J. Pearl [92] introduced them in the context of probabilistic inference. Let us define two types of messages $\nu^{(t)}_{x \to a}$ and $\hat{\nu}^{(t)}_{a \to x}$ associated with each edge $(x, a) \in E$ of the factor graph $G = (V, C, E)$ at step $t$. The message $\nu^{(t)}_{x \to a}$ goes from a variable node to a check node, whereas the message $\hat{\nu}^{(t)}_{a \to x}$ goes from a check node to a variable node. More specifically, $\nu^{(t)}_{x \to a}$ is the marginal of the variable $x$ when the check node $a$ is removed, whereas $\hat{\nu}^{(t)}_{a \to x}$ is the marginal of the variable $x$ when all the check nodes in $\partial x \setminus a$ have been discarded. The messages depend on the time parameter $t$, and for each $t \ge 0$ both messages are probability distributions over $\Omega$. Initially, both messages are the uniform distribution over $\Omega$, i.e., $\nu^{(0)}_{x \to a}(s) = \hat{\nu}^{(0)}_{a \to x}(s) = 1/|\Omega|$ for all $x \in V$, $a \in C$ and $s \in \Omega$. One can also initialize the messages by drawing them i.i.d. from a probability distribution $P$ on $\mathcal{P}(\Omega)$. The BP equations [60] for the set of messages $\{\nu^{(t)}_{x \to a}, \hat{\nu}^{(t)}_{a \to x}\}_{(x,a) \in E}$ with $t \ge 0$ are given by

\[
\nu^{(t+1)}_{x \to a}(s) = \frac{1}{Z_{x,a}} \prod_{b \in \partial x \setminus a} \hat{\nu}^{(t)}_{b \to x}(s) \tag{3.1.1}
\]

\[
\hat{\nu}^{(t)}_{a \to x}(s) = \frac{1}{\hat{Z}_{x,a}} \sum_{\sigma \in \Omega^{\partial a}} \mathbb{1}\{\sigma_x = s\}\, \psi_a(\sigma) \prod_{y \in \partial a \setminus x} \nu^{(t)}_{y \to a}(\sigma_y) \tag{3.1.2}
\]

where $Z_{x,a}$ and $\hat{Z}_{x,a}$ are the normalization constants of the messages. The equations (3.1.1)–(3.1.2) are referred to as the Belief Propagation (BP) equations, and one is particularly interested in their fixed points. All the messages are updated in parallel.
It is clear from the BP equations that if $\partial x \setminus a = \emptyset$, then the message in equation (3.1.1) is the uniform distribution over $\Omega$. Similarly, if $\partial a \setminus x = \emptyset$, then $\hat{\nu}^{(t)}_{a \to x}(s) \propto \psi_a(s)$. Figure 3.1 illustrates the BP update rules. Moreover, Algorithm 3.1 (see also [66]) provides the iterative procedure for finding a solution of the BP equations (3.1.1)–(3.1.2). The obvious question is under which conditions the messages converge to a limit $(\nu^*_{x \to a}, \hat{\nu}^*_{a \to x})$. By [66], on a tree of diameter $t_{\max}$ Algorithm 3.1 is guaranteed to find the fixed point messages of (3.1.1)–(3.1.2) exactly, independently of the choice of the initialization. Moreover, the algorithm does not specify any ordering of the edges $(x, a)$ for the update of the messages. Reshuffling the ordering, for instance by taking a random permutation of the edges before each update of the messages, is therefore allowed, since on factor graphs with loops BP is used as a heuristic without a guarantee of convergence.

[Figure 3.1: Left: Factor graph involved in computing $\nu^{(t+1)}_{x \to a}$, which is a function of all 'incoming messages' $\hat{\nu}^{(t)}_{b \to x}$ with $b \neq a$. Right: Factor graph involved in computing $\hat{\nu}^{(t)}_{a \to x}$, which is a function of all 'incoming messages' $\nu^{(t)}_{y \to a}$ with $y \neq x$.]

Algorithm 3.1: Belief Propagation algorithm [66]
Input: a factor graph $G = (V, C, E)$, the set of weight functions $\{\psi_a\}_{a \in C}$, a precision $\varepsilon$, a maximum number of iterations $t_{\max}$.
Output: a set of messages $\{\nu(\cdot), \hat{\nu}(\cdot)\}$, or the statement 'Not Converged' if the procedure fails.
    Initialization: for each edge $(x, a) \in E$, initialize $\nu_{x \to a}(\cdot)$ and $\hat{\nu}_{a \to x}(\cdot)$ as i.i.d. random variables with distribution $P$.
    for $t = 0, \dots, t_{\max}$ do
        compute the messages: first $\{\hat{\nu}^{(t)}_{a \to x}\}_{(x,a) \in E}$ using (3.1.2), then $\{\nu^{(t+1)}_{x \to a}\}_{(x,a) \in E}$ using (3.1.1)
        if $\delta$ (the maximum message change) $< \varepsilon$ then
            return the set of messages $\{\nu^{(t+1)}_{x \to a}, \hat{\nu}^{(t)}_{a \to x}\}$
    return 'Not Converged'
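The update rules (3.1.1)–(3.1.2) and the iteration of Algorithm 3.1 can be sketched in a few lines of Python. This is a minimal illustration under several assumptions rather than the implementation used in the thesis: the messages are initialized uniformly instead of at random, the convergence test looks only at the variable-to-check messages, the weights are the soft clause weights $\psi = e^{-\beta}$ for a violated clause and 1 otherwise, and the small tree factor graph with clauses $a = (x_1 \vee \neg x_2)$ and $b = (x_2 \vee x_3 \vee x_4)$ as well as all function names are hypothetical.

```python
import itertools
import math


def soft_clause(signs, beta):
    """Weight exp(-beta) if the clause is violated, 1 otherwise.

    signs[i] = +1 for a positive literal and -1 for a negated one.
    """
    def psi(values):
        violated = all(v == (0 if s > 0 else 1) for v, s in zip(values, signs))
        return math.exp(-beta) if violated else 1.0
    return psi


def normalize(msg):
    z = sum(msg.values())
    return {s: v / z for s, v in msg.items()}


def run_bp(factors, domain=(0, 1), eps=1e-12, t_max=100):
    """Parallel BP updates; returns (nu, nu_hat), or None if not converged."""
    edges = [(x, a) for a, (vs, _) in factors.items() for x in vs]
    nbrs = {}
    for x, a in edges:
        nbrs.setdefault(x, []).append(a)
    nu = {e: {s: 1.0 / len(domain) for s in domain} for e in edges}
    for _ in range(t_max):
        # Check-to-variable messages, equation (3.1.2).
        nu_hat = {}
        for x, a in edges:
            vs, psi = factors[a]
            msg = {s: 0.0 for s in domain}
            for sigma in itertools.product(domain, repeat=len(vs)):
                w = psi(sigma)
                for y, sy in zip(vs, sigma):
                    if y != x:
                        w *= nu[(y, a)][sy]
                msg[sigma[vs.index(x)]] += w
            nu_hat[(x, a)] = normalize(msg)
        # Variable-to-check messages, equation (3.1.1); an empty product is 1.
        new_nu = {}
        for x, a in edges:
            new_nu[(x, a)] = normalize({
                s: math.prod(nu_hat[(x, b)][s] for b in nbrs[x] if b != a)
                for s in domain})
        delta = max(abs(new_nu[e][s] - nu[e][s]) for e in edges for s in domain)
        nu = new_nu
        if delta < eps:
            return nu, nu_hat
    return None


def bp_marginal(x, factors, nu_hat, domain=(0, 1)):
    """Normalized product of all incoming check-to-variable messages at x."""
    adj = [a for a, (vs, _) in factors.items() if x in vs]
    return normalize({s: math.prod(nu_hat[(x, a)][s] for a in adj)
                      for s in domain})


# Hypothetical tree factor graph: a = (x1 OR NOT x2), b = (x2 OR x3 OR x4).
FACTORS = {
    "a": (("x1", "x2"), soft_clause((+1, -1), beta=1.0)),
    "b": (("x2", "x3", "x4"), soft_clause((+1, +1, +1), beta=1.0)),
}

result = run_bp(FACTORS)
if result is None:
    print("Not Converged")
else:
    _, nu_hat = result
    print(bp_marginal("x2", FACTORS, nu_hat))
```

Since this factor graph is a tree, the messages stabilize after a few rounds, and the normalized product of the incoming check-to-variable messages at $x_2$ agrees with the marginal obtained by summing over all $2^4$ configurations. On graphs with cycles the same code runs unchanged, but neither convergence nor exactness is guaranteed.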
3.1.2 Computing marginals

Our next goal is to compute the marginal $\mu_x(s) = \mu(x = s)$ of a variable $x \in V$. Given the solution of the BP equations from Algorithm 3.1, and using the Markov property on a finite tree factor graph, we can construct the marginal [66] of the variable $x$ as

\[
\mu_x(s) \propto \prod_{b \in \partial x} \Bigl( \sum_{\sigma \in \Omega^{\partial b}} \mathbb{1}\{\sigma_x = s\}\, \psi_b(\sigma) \prod_{y \in \partial b \setminus x} \nu_{y \to b}(\sigma_y) \Bigr) \tag{3.1.3}
\]

Then, using equation (3.1.2), we get

\[
\mu_x(s) = \frac{1}{Z_x} \prod_{a \in \partial x} \hat{\nu}_{a \to x}(s) \tag{3.1.4}
\]

where $Z_x$ is a normalization constant. So far we have seen how to compute the marginal of a variable from the BP messages, provided that the messages converge to a limit $(\nu^*_{x \to a}, \hat{\nu}^*_{a \to x})$. The next question is: if such limits exist, what is their significance? On tree factor graphs the marginals $\mu_x$ for all $x \in V$ are computed exactly from the BP fixed point. In general this is not true, because on factor graphs that contain short cycles (cycles of bounded length) there may be several fixed points, and the limit may depend on the initialization. However, BP provides a good approximation of the marginals, given a suitable initialization, if the factor graph does not contain too many short cycles. Furthermore, on tree factor graphs one can express the free entropy density $\Phi = \frac{1}{n} \log Z$ (where $Z$ is the partition function) in terms of the set of messages that solves the BP equations (3.1.1)–(3.1.2). The resulting expression is known as the Bethe free entropy (denoted by $\Phi^{\mathrm{Bethe}}$) in physics jargon. In the next subsection we examine this quantity in detail.

3.1.3 Bethe-Free Entropy

In 1935 the German-American physicist Hans Albrecht Eduard Bethe [14] first introduced the Bethe free entropy density for the ferromagnetic Ising model.
Using the decomposition property of the Gibbs distribution, we introduce the Bethe free entropy $\Phi^{\mathrm{Bethe}}$ in terms of the $2|E|$ BP messages $\{\nu_{x \to a}(\cdot), \hat{\nu}_{a \to x}(\cdot)\}$. It can be expressed as [66]

\[
\Phi^{\mathrm{Bethe}} = \frac{1}{n} \Bigl[ \sum_{x \in V} B^{(t)}_x + \sum_{a \in C} B^{(t)}_a - \sum_{(x,a) \in E} B^{(t)}_{ax} \Bigr] \tag{3.1.5}
\]

where

\[
B^{(t)}_x = \log \Bigl[ \sum_{s \in \Omega} \prod_{a \in \partial x} \hat{\nu}^{(t)}_{a \to x}(s) \Bigr], \qquad
B^{(t)}_a = \log \Bigl[ \sum_{\sigma \in \Omega^{\partial a}} \psi_a(\sigma) \prod_{x \in \partial a} \nu^{(t)}_{x \to a}(\sigma_x) \Bigr]
\]

and

\[
B^{(t)}_{ax} = \log \Bigl[ \sum_{s \in \Omega} \nu^{(t)}_{x \to a}(s) \cdot \hat{\nu}^{(t)}_{a \to x}(s) \Bigr]
\]

Roughly speaking, $B^{(t)}_x$ corresponds to the contribution of the variable nodes to the partition function, $B^{(t)}_a$ to the contribution of the check nodes and $B^{(t)}_{ax}$ to the contribution of the edges. The hope is that

\[
\lim_{n \to \infty} \frac{1}{n} \mathbb{E}[\log Z] = \lim_{t \to \infty} \lim_{n \to \infty} \Phi^{\mathrm{Bethe}} \tag{3.1.6}
\]

The limits $\nu^*_{x \to a}, \hat{\nu}^*_{a \to x}$, i.e., the fixed points of the BP equations (3.1.1)–(3.1.2) (if they exist), are expected to correspond to the stationary points of $\Phi^{\mathrm{Bethe}}$ [101]. Quantities such as the BP messages and the Bethe free entropy are model specific, i.e., they depend on the factor graph and on the set $\Omega$, and the question of convergence comes on top of this.

In this thesis, we find an explicit expression for the logarithm of the number of solutions of a random $k$-SAT formula $F = F_{d,k}(n)$ for every $d$ below the Gibbs uniqueness threshold (discussed in Chapter 4). The result is stated in terms of the Bethe free entropy $B_{d,k}$, a functional defined on probability measures $\pi \in \mathcal{P}(0,1)$. We discuss the details of this result in Chapter 7.

3.2 Algorithms

In this section we discuss a few algorithms and processes on random $k$-XORSAT and $k$-SAT instances that are used in this thesis and that build on Belief Propagation.

3.2.1 Belief Propagation Guided Decimation

In the early 2000s, physicists proposed a message passing algorithm called Belief Propagation Guided Decimation (BPGD), which according to computer experiments performs impressively on various random CSPs [72, 96].
BPGD sets its ambitions higher than merely finding a solution to the k-XORSAT instance $F$: the algorithm attempts to sample a solution uniformly at random. To this end BPGD assigns values to the variables $x_1, \ldots, x_n$ of $F$ one after the other. In order to assign the next variable, the algorithm attempts to compute the marginal probability that the variable is set to 'true' under a random solution to the k-XORSAT instance, given all previous assignments.

More precisely, suppose BPGD has assigned values to the variables $x_1, \ldots, x_t$ already. Write $\sigma_{\mathrm{BP}}(x_1), \ldots, \sigma_{\mathrm{BP}}(x_t) \in \{0,1\}$ for their values, with 1 representing 'true' and 0 'false'. Further, let $F_{\mathrm{BP},t}$ be the simplified formula obtained by substituting $\sigma_{\mathrm{BP}}(x_1), \ldots, \sigma_{\mathrm{BP}}(x_t)$ for $x_1, \ldots, x_t$. We drop any clauses from $F_{\mathrm{BP},t}$ that contain variables from $\{x_1, \ldots, x_t\}$ only, deeming any such clauses satisfied. Thus, $F_{\mathrm{BP},t}$ is a XORSAT formula with variables $x_{t+1}, \ldots, x_n$. Its clauses contain at least one and at most $k$ variables, as well as possibly a constant (the XOR of the values substituted in for $x_1, \ldots, x_t$).

Algorithm 3.2: The BPGD algorithm (Section 1.2, [22]).
Input: a random k-XORSAT formula $F$ with variables $x_1, \ldots, x_n$, conditioned on being satisfiable
Output: an assignment $\sigma_{\mathrm{BP}} : \{x_1, \ldots, x_n\} \to \{0,1\}$.
1 for $t = 0, \ldots, n-1$ do
2     compute the BP approximation $\mu_{F_{\mathrm{BP},t}}$;
3     set $\sigma_{\mathrm{BP}}(x_{t+1}) = 1$ with probability $\mu_{F_{\mathrm{BP},t}}$ and $\sigma_{\mathrm{BP}}(x_{t+1}) = 0$ with probability $1 - \mu_{F_{\mathrm{BP},t}}$;
4 return $\sigma_{\mathrm{BP}}$;

Let $\sigma_{F_{\mathrm{BP},t}}$ be a uniformly random solution of the XORSAT formula $F_{\mathrm{BP},t}$, assuming that $F_{\mathrm{BP},t}$ remains satisfiable. Then BPGD aims to compute the marginal probability $\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ that a random satisfying assignment of $F_{\mathrm{BP},t}$ sets $x_{t+1}$ to true. This is where Belief Propagation ('BP') comes in. An efficient message passing heuristic for computing precisely such marginals, BP returns an 'approximation' $\mu_{F_{\mathrm{BP},t}}$ of $\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$.
Having computed the BP 'approximation', BPGD proceeds to assign $x_{t+1}$ the value 'true' with probability $\mu_{F_{\mathrm{BP},t}}$, otherwise sets $x_{t+1}$ to 'false', then moves on to the next variable. The pseudocode is displayed as Algorithm 3.2.

Remark 3.2.1.
• If the BP approximations are exact, i.e., if $F_{\mathrm{BP},t}$ is satisfiable and $\mu_{F_{\mathrm{BP},t}} = \mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ for all $t$, then Bayes' formula shows that BPGD outputs a uniformly random solution of $F$. However, there is no universal guarantee that BP returns the correct marginals.
• Due to the algebraic structure of the XOR operation, BPGD is easier to analyze on random k-XORSAT. In fact, the marginal probabilities are guaranteed to be half-integral (see Fact 3.2.2 below), i.e.,

\[
\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}] \in \{0, 1/2, 1\}. \tag{3.2.1}
\]

Fact 3.2.2. The BP messages and marginals are half-integral in every iteration, i.e., for all $\ell \geq 0$ and $s \in \{0,1\}$ we have

\[
\mu_{F,x \to a,\ell}(s),\ \mu_{F,a \to x,\ell}(s),\ \mu_{F,x,\ell}(s) \in \{0, 1/2, 1\}. \tag{3.2.2}
\]

Furthermore, for all $\ell > 2 \sum_{a \in C(F)} |\partial a|$ we have $\mu_{F,x,\ell}(s) = \mu_{F,x,\ell+1}(s)$. (Since the total number of messages is bounded by $2 \sum_{a \in C(F)} |\partial a|$, the BP messages will have converged pointwise after this number of iterations.)

3.2.2 Decimation Process

In addition to the BPGD algorithm itself, the heuristic work [96] considers an idealized version of the algorithm, the decimation process. This is a thought experiment that highlights the conceptual reasons behind the success or failure of the BPGD algorithm. Just like BPGD, it assigns values to the variables one after the other, but instead of the BP 'approximations' the decimation process uses the actual marginals given its previous decisions. To be precise, suppose that the input formula $F$ is satisfiable and that the variables $x_1, \ldots, x_t$ have already been assigned values $\sigma_{\mathrm{DC}}(x_1), \ldots, \sigma_{\mathrm{DC}}(x_t)$ in the previous iterations. Obtain $F_{\mathrm{DC},t}$ by substituting the values $\sigma_{\mathrm{DC}}(x_1), \ldots, \sigma_{\mathrm{DC}}(x_t)$ for $x_1, \ldots, x_t$ and dropping any clauses that do not contain any of $x_{t+1}, \ldots, x_n$.
Thus, $F_{\mathrm{DC},t}$ is a XORSAT formula with variables $x_{t+1}, \ldots, x_n$. Let $\sigma_{F_{\mathrm{DC},t}}$ be a random satisfying assignment of $F_{\mathrm{DC},t}$. Then the decimation process sets $x_{t+1}$ according to the true marginal $\mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$, thus ultimately returning a uniformly random satisfying assignment of $F$. The pseudocode is displayed as Algorithm 3.3.

Algorithm 3.3: The decimation process (Section 1.4, [22]).
Input: a random k-XORSAT formula $F$, conditioned on being satisfiable
Output: an assignment $\sigma_{\mathrm{DC}} : \{x_1, \ldots, x_n\} \to \{0,1\}$.
1 for $t = 0, \ldots, n-1$ do
2     compute $\pi_{F_{\mathrm{DC},t}} = \mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$;
3     set $\sigma_{\mathrm{DC}}(x_{t+1}) = 1$ with probability $\pi_{F_{\mathrm{DC},t}}$ and $\sigma_{\mathrm{DC}}(x_{t+1}) = 0$ with probability $1 - \pi_{F_{\mathrm{DC},t}}$;
4 return $\sigma_{\mathrm{DC}}$;

Remark 3.2.3. If the 'BP approximations' are correct, the decimation process and BPGD are identical.

The key question is for which parameter regimes these two processes coincide and for which they diverge. In Chapter 6 we study in detail the connection between BPGD and the decimation process, their phase transitions, and the performance of BPGD. There we determine the exact regimes of success and failure probability, which verifies mathematically the heuristic work of Ricci-Tersenghi and Semerjian [96].

3.2.3 Unit Clause Propagation

In this thesis, we analyze two variants of Unit Clause Propagation, one for random 2-SAT [23] and another one for the random k-XORSAT model [22]. Employed by all modern SAT solvers as a subroutine, Unit Clause Propagation is a linear time algorithm that tracks the implications of partial assignments. Since the 2-SAT problem is in P, there is a polynomial time algorithm that solves it: a sequential assignment procedure that assigns one variable at a time and ends either when all the variables are assigned and the resulting assignment is satisfying, or when it has proven the formula UNSAT.
At each step, once a variable is assigned, the initial CNF formula can be simplified according to a reduction process. Suppose we set a variable $x_i$ to $x_i = 1$ (the case $x_i = 0$ is symmetric). Each clause containing the literal $x_i$ is satisfied by this assignment and can be removed from the formula. On the other hand, clauses containing the opposite literal $\neg x_i$ cannot be satisfied by this assignment; thus, the literal $\neg x_i$ is removed from those clauses, reducing their length by one. As a consequence, this reduction may produce a 0-clause: for instance, if in the original formula $F$ there was already a unit clause $c = \neg x_i$, then setting $x_i = 1$ immediately creates a contradiction, and $F|_{x_i = 1}$ is UNSAT. In that case, backtracking is required: one must undo the assignment $x_i = 1$ and instead try $x_i = 0$ in order to proceed.

During this process, whenever the simplified formula contains a unit clause, the corresponding variable is forced to take the unique value satisfying it. This assignment may generate new unit clauses, which in turn must also be satisfied, and so on. This cascading sequence of forced steps is known as Unit Clause Propagation (UCP). However, for worst-case 2-SAT instances, UCP alone does not guarantee success: it must be combined with backtracking to systematically explore assignments when contradictions arise.

Let us consider a 2-CNF formula $F$ along with a set $\mathcal{L}$ of literals. These literals are deemed to be 'true'. The algorithm then pursues direct logical implications, thereby identifying additional 'implied' literals that need to be true so that no clause gets violated. This procedure is outlined in Steps 1–2 of Algorithm 3.4; the outcome of Steps 1–2 is independent of the order in which literals/clauses are processed.

Input: A 2-CNF $\Phi$ along with a set $\mathcal{L}$ of literals deemed true.
1 while there exists a clause $a \equiv l \vee \neg l'$ with $l' \in \mathcal{L}$ and $l \notin \mathcal{L}$ do
2     add literal $l$ to $\mathcal{L}$;
3 For variables $x \in V(\Phi)$ such that $x \in \mathcal{L}$ or $\neg x \in \mathcal{L}$ let
\[
\sigma_x = \begin{cases} 1 & \text{if } x \in \mathcal{L} \text{ and } \neg x \notin \mathcal{L}, \\ -1 & \text{if } \neg x \in \mathcal{L} \text{ and } x \notin \mathcal{L}, \\ 0 & \text{otherwise.} \end{cases}
\]
  Let $\mathcal{C}$ be the set of all clauses $a$ such that $\sigma_x = 0$ for all $x \in \partial a$ and return $\mathcal{L}, \mathcal{C}, \sigma$;
Algorithm 3.4: Pessimistic Unit Clause Propagation ('PUC') (Section 2.4, [23]).

Clearly, trouble occurs if PUC ends up placing both a literal $l$ and its negation $\neg l$ into the set $\mathcal{L}$. Our 'pessimistic' Unit Clause variant makes no attempt at mitigating such contradictions. Instead, Step 3 just constructs a partial assignment in which all conflicting literals are set to the dummy value zero. In addition, PUC identifies the set $\mathcal{C}$ of conflict clauses that contain conflicted variables only.

Now consider a 2-CNF $F$ on a set of variables $V(F)$. For each possible literal $l \in \{x, \neg x : x \in V(F)\}$ we run PUC$(F, \mathcal{L} = \{l\})$. Let $\mathcal{C}(F, \{l\})$ be the set of conflict clauses returned by PUC. Obtain the pruned formula $\hat{F}$ from $F$ by removing all clauses in $\mathcal{C}(F) = \bigcup_l \mathcal{C}(F, \{l\})$. Then it is easy to verify the following fact:

Fact 3.2.4. For any 2-CNF $F$ the pruned 2-CNF $\hat{F}$ is satisfiable.

Remark 3.2.5. The pruned formula $\hat{F}$ could have far fewer clauses than the original formula $F$. Accordingly, even if $F$ is satisfiable the number $Z(\hat{F})$ of satisfying assignments of $\hat{F}$ could dramatically exceed $Z(F)$.

Since UCP returns all the assignments that were forced by the presence of unit clauses, there are three possible outputs of this process:
• The output is an assignment of all the variables in $V$. Then $F$ is SAT, and the assignment produced is a solution of $F$.
• If the simplified formula obtained does not contain unit clauses, then the output is a partial assignment, which can be extended to a complete satisfying assignment if and only if the input formula $F$ is SAT.
• The output is UNSAT, which implies that the input formula $F$ is UNSAT.

Coming to the analysis of UCP on random k-XORSAT: due to Fact 3.2.2, the BPGD algorithm effectively reduces to UCP, a purely combinatorial algorithm [66, 96]. It works exactly as discussed before, attempting to assign random values to as yet unassigned variables one after the other. After each such random assignment the algorithm pursues the 'obvious' implications of its decisions. Specifically, the algorithm substitutes its chosen truth values for all occurrences of the already assigned variables. If this leaves a 'unit clause', the algorithm assigns the variable of that clause so as to satisfy it. If two unit clauses impose opposing values on a variable, the algorithm declares that a conflict has occurred, sets the variable to false and continues; of course, in the event of a conflict the algorithm will ultimately fail to produce a satisfying assignment. The pseudocode for the algorithm is displayed in Algorithm 3.5.

1 Let $U = \emptyset$ and let $\sigma_{\mathrm{UC}} : U \to \{0,1\}$ be the empty assignment;
2 for $t = 0, \ldots, n-1$ do
3     if $x_{t+1} \notin U$ then
4         add $x_{t+1}$ to $U$;
5         choose $\sigma_{\mathrm{UC}}(x_{t+1}) \in \{0,1\}$ uniformly at random;
6         while $F[\sigma_{\mathrm{UC}}]$ contains a unit clause $a$ do
7             let $x$ be the variable in $a$;
8             let $s \in \{0,1\}$ be the truth value that $x$ needs to take to satisfy $a$;
9             if another unit clause $a'$ exists that requires $x$ be set to $1-s$ then
10                output 'conflict' and let $\sigma_{\mathrm{UC}}(x) = 0$;
11            else
12                add $x$ to $U$ and let $\sigma_{\mathrm{UC}}(x) = s$;
13 return $\sigma_{\mathrm{UC}}$;
Algorithm 3.5: The UCP algorithm for a random k-XORSAT instance $F$ (Section 6.1, [22]).

Let $F_{\mathrm{UC},t}$ denote the simplified random k-XORSAT formula obtained after the first $t$ iterations (in which the truth values chosen for $x_1, \ldots, x_t$ and any values implied by unit clauses have been substituted). We notice that the values assigned during Steps 6–12 are deterministic consequences of the choices in Step 5.
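The loop of Algorithm 3.5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation analysed in [22]: variables are numbered 0, ..., n-1, a clause is encoded as a pair (variables, right-hand side), and the names `ucp_xorsat` and `satisfies` are hypothetical.

```python
import random


def ucp_xorsat(n, clauses, rng):
    """Sketch of Algorithm 3.5 (assumed encoding, see lead-in).

    A clause (vars_, rhs) demands that the XOR of the listed variables equals rhs.
    Returns the assignment sigma_UC and a flag telling whether a conflict occurred.
    """
    sigma = {}
    conflict = False

    def unit_clauses():
        # Substitute the assigned values and collect the clauses that are
        # left with exactly one unassigned variable, as (variable, forced value).
        units = []
        for vars_, rhs in clauses:
            free = [v for v in vars_ if v not in sigma]
            if len(free) == 1:
                forced = rhs
                for v in vars_:
                    if v in sigma:
                        forced ^= sigma[v]
                units.append((free[0], forced))
        return units

    for x in range(n):
        if x in sigma:
            continue
        sigma[x] = rng.randrange(2)          # Step 5: free random choice
        while True:                          # Steps 6-12: forced assignments
            units = unit_clauses()
            if not units:
                break
            var, s = units[0]
            if any(v == var and t != s for v, t in units):
                conflict = True              # two unit clauses disagree on var
                sigma[var] = 0
            else:
                sigma[var] = s
    return sigma, conflict


def satisfies(clauses, sigma):
    """Check whether a complete assignment satisfies every XOR clause."""
    return all(sum(sigma[v] for v in vars_) % 2 == rhs for vars_, rhs in clauses)
```

Each pass of the inner loop assigns one previously unassigned variable, so the sketch terminates after at most n forced steps. The repeated rescan of all clauses is chosen for brevity and is not the linear time bookkeeping a real solver would use.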
The order in which unit clauses are processed in Steps 6–12 therefore does not affect the output of the algorithm. In Chapter 6 we study in detail how the UCP algorithm relates to, and performs in comparison with, the previously discussed BPGD algorithm on random k-XORSAT.

3.2.4 Pure Literal Pursuit

The third result of this thesis [21], a lower bound on the Gibbs uniqueness threshold of random k-SAT, relies on an algorithm called Pure Literal Pursuit ('PULP'). Its purpose is to trace the repercussions of setting a relatively small number of variables to specific truth values. This will allow us to compare the number of satisfying assignments that set a few chosen variables to specific values to the total number of satisfying assignments.

Suppose we are given a k-CNF $F$ and a set $\mathcal{L}$ of literals of $F$ that we deem to be set to 'true'. We would like to identify a superset $\bar{\mathcal{L}} \supseteq \mathcal{L}$ of literals ($\bar{\mathcal{L}}$ is a 'closure' of $\mathcal{L}$) with the following properties:

PULP1 every clause $a$ that contains a literal from $\neg\bar{\mathcal{L}} = \{\neg l : l \in \bar{\mathcal{L}}\}$ also contains a literal from $\bar{\mathcal{L}}$.
PULP2 there is no literal $l$ such that $l, \neg l \in \bar{\mathcal{L}}$.

It may be impossible to satisfy PULP1 and PULP2 simultaneously. In this case we ask PULP to report a 'contradiction'. But if PULP1–PULP2 can be satisfied, we aim to find a closure $\bar{\mathcal{L}}$ of as small a size $|\bar{\mathcal{L}}|$ as possible.

The combinatorial idea behind PULP1–PULP2 is as follows. Deeming the literals from the initial set $\mathcal{L}$ 'true', our goal is to reconcile this assumption with the formula $F$. To this end we enhance the set $\mathcal{L}$. Clearly, any clause that contains the negation $\neg l$ of a literal $l$ that we deem true also needs to contain another literal $l'$ that is set to true. This is what PULP1 asks. Furthermore, it would be contradictory to deem both $l$ and its negation $\neg l$ true; this is PULP2.
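To make the two conditions concrete, the following minimal Python sketch checks whether a candidate set of literals is a closure in the sense of PULP1–PULP2. The encoding is an assumption made for illustration (literals as non-zero integers, with -l denoting the negation of l, and clauses as tuples of literals), and the function names are hypothetical.

```python
def violates_pulp1(clauses, closure):
    """Return the clauses that contain a literal from the negated closure
    but no literal from the closure itself (the violations of PULP1)."""
    closure = set(closure)
    negated = {-l for l in closure}
    return [
        c for c in clauses
        if any(l in negated for l in c) and not any(l in closure for l in c)
    ]


def satisfies_pulp2(closure):
    """PULP2: no literal appears together with its negation."""
    closure = set(closure)
    return all(-l not in closure for l in closure)


def is_closure(clauses, initial, closure):
    """A candidate is a closure of `initial` if it contains `initial`
    and satisfies both PULP1 and PULP2."""
    return (
        set(initial) <= set(closure)
        and satisfies_pulp2(closure)
        and not violates_pulp1(clauses, closure)
    )
```

For the two clauses (¬x1 ∨ x2) and (¬x2 ∨ x3) and the initial set {x1}, deeming x1 true forces x2 and then x3, so in this example {x1, x2, x3} is a closure while {x1} and {x1, x2} are not.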
In order to identify a 'small' closure $\bar{\mathcal{L}}$ the PULP algorithm resorts to pure literal elimination. We call a variable $x$ pure in a CNF formula $F$ if $\mathrm{sign}(x, a) = \mathrm{sign}(x, b)$ for any two clauses $a, b \in \partial x$. Clearly, if our objective is to construct a satisfying assignment, we might as well set all pure variables $x$ to the value that satisfies all clauses $a \in \partial x$ and disregard these clauses henceforth. In light of this observation, pure literal elimination repeatedly removes all clauses that contain a pure variable. Naturally, every round of clause removals may create new pure variables, and thus more clauses may be ripe for removal in the next round. For a clause $a$ of the original formula $F$ let $h_a(F) \geq 1$ be the number of the round at which pure literal elimination removes $a$. If $a$ is never removed then we set $h_a(F) = \infty$.

The PULP algorithm invokes a slightly modified version of pure literal elimination to accommodate the initial set $\mathcal{L}$ of literals. Specifically, for a variable $x$ of a CNF $F$ and $s \in \{\pm 1\}$ let $F[x \mapsto s]$ be the CNF obtained by removing all clauses $a \in \partial x$ with $\mathrm{sign}(x, a) = s$ and removing the literal $-s \cdot x$ from all $a \in \partial x$ with $\mathrm{sign}(x, a) = -s$. The definition reflects that if we set $x$ to the value $s$, all clauses $a \in \partial^{s} x$ (those with $\mathrm{sign}(x, a) = s$) will be satisfied, while all $a \in \partial^{-s} x$ will have to be satisfied by one of their other constituent literals. Further, let

\[
h_x(s, F) = \begin{cases} 0 & \text{if } \partial^{-s}_F x = \emptyset, \\ \max\{ h_a(F[x \mapsto s]) : a \in \partial^{-s}_F x \} & \text{otherwise} \end{cases} \ \in [0, \infty]. \tag{3.2.3}
\]

We refer to $h_x(s, F)$ as the height of the literal $s \cdot x$ in $F$.

The PULP algorithm, displayed as Algorithm 3.6, harnesses the heights as follows. In its attempt to precipitate PULP1 and PULP2 the algorithm iteratively enhances the set $\mathcal{L}$ of literals deemed to be 'true'. For any clause $a$ that violates PULP1 and that contains a literal $l \notin \neg\mathcal{L}$ the algorithm adds one such literal $l$ of minimum height to $\mathcal{L}$. This choice is intended to keep the ultimate size of the closure small; one could say that PULP uses height as a proxy of 'size'.
If at any point the algorithm encounters a clause $a$ that consists of literals from $\neg\mathcal{L}$ only, the algorithm reports a contradiction and aborts.

Input: A k-CNF $F$ and a set $\mathcal{L}$ of literals.
1 Let $\bar{\mathcal{L}} = \mathcal{L}$;
2 while there is a clause $a$ that contains a literal from $\neg\bar{\mathcal{L}}$ but no literal from $\bar{\mathcal{L}}$ do
3     Pick such a clause $a$ that minimizes the distance from the set $\{|l| : l \in \mathcal{L}\}$ of variables underlying the initial set $\mathcal{L}$;
4     if $a$ consists of literals $l \in \neg\bar{\mathcal{L}}$ only then
5         return 'contradiction' and halt;
6     else
7         choose $x \in \partial a$ with $x, \neg x \notin \bar{\mathcal{L}}$ that minimizes $h_x(\mathrm{sign}(x, a), F)$ and add $\mathrm{sign}(x, a) \cdot x$ to $\bar{\mathcal{L}}$
8 return $\bar{\mathcal{L}}$
Algorithm 3.6: The PULP algorithm (Section 2.3, [21])

Remark 3.2.6. To break ties that may occur in the execution of Steps 3 and 7 of PULP we assume that the variables and clauses of $F$ are numbered so that Steps 3 and 7 can choose the clause/variable with the smallest number that satisfies the respective requirements. In due course we will run PULP on (finite subtrees of) the Galton-Watson tree $\mathbb{T}$. To number the variables and clauses of $\mathbb{T}$ we equip each of them with an independent Gaussian label. Since $\mathbb{T}$ comprises a countable number of clauses/variables, these labels will almost surely be pairwise distinct.

A brief analysis of this algorithm can be found in Chapter 7.

3.3 Warning Propagation

Warning Propagation is a purely combinatorial message passing algorithm from the same family as Belief Propagation. For some graph based matrix and constraint satisfaction models, the 'discrete' version of Belief Propagation (where the messages come from a finite alphabet $\Omega$ instead of being probability distributions) is known as Warning Propagation. It helps to trace the direct implications of recursive processes associated with graphs. For a graph $G$ let $\mathcal{M}(G)$ be the set of all vectors $(\omega_{u \to v})_{(u,v) \in V(G)^2 : \{u,v\} \in E(G)} \in \Omega^{2|E(G)|}$. As in BP, all messages are updated in parallel according to a fixed rule.
The update rule $\varphi$ is defined on multisets of messages. For $d \in \mathbb{N}$ let $\left(\!\binom{\Omega}{d}\!\right)$ be the set of all $d$-ary multisets with elements from $\Omega$. Then

\[
\varphi : \bigcup_{d \geq 0} \left(\!\binom{\Omega}{d}\!\right) \to \Omega \tag{3.3.1}
\]

takes any multiset of input messages and produces an output message. The corresponding Warning Propagation operator is defined as

\[
\mathrm{WP}_G : \mathcal{M}(G) \to \mathcal{M}(G), \qquad \omega = (\omega_{u \to v})_{uv} \mapsto \big( \varphi(\{\!\{ \omega_{w \to u} : uw \in E(G),\ w \neq v \}\!\}) \big)_{uv}.
\]

In words, the message from node $u$ to node $v$ is updated by applying the WP update rule to the multiset of messages that $u$ receives from all of its neighbors except $v$. In the factor graph setting, as in BP, WP associates two message sequences $(\omega_{F,x \to a}, \omega_{F,a \to x})$ with every adjacent clause/variable pair.

In this thesis we provide a detailed analysis of Warning Propagation when we analyze the performance of Belief Propagation Guided Decimation ('BPGD') on the random k-XORSAT model (more details can be found in Chapter 6). Due to the half-integrality of the BP messages and marginals in random k-XORSAT, BP is equivalent to Warning Propagation. The messages take one of three possible discrete values $\{\mathtt{f}, \mathtt{u}, \mathtt{n}\}$ ('frozen', 'uniform', 'null'). To trace the BP messages, the two values $\{\mathtt{n}, \mathtt{u}\}$ would suffice. However, the third value $\mathtt{f}$ will prove useful in order to compare the BP approximations (computed by the BPGD algorithm discussed in Section 3.2.1) with the actual marginals (computed by the decimation process discussed in Section 3.2.2). Although the BP messages are initially uniform, i.e.,

\[
\mu_{F,x \to a,0}(s) = \mu_{F,a \to x,0}(s) = 1/2 \qquad (s \in \{0,1\}),
\]

we launch WP from all-frozen start values:

\[
\omega_{F,x \to a,0} = \omega_{F,a \to x,0} = \mathtt{f} \qquad \text{for all } a, x. \tag{3.3.2}
\]

Subsequently the messages get updated according to the rules

\[
\omega_{F,a \to x,\ell+1} = \begin{cases} \mathtt{n} & \text{if } \omega_{F,y \to a,\ell} = \mathtt{n} \text{ for all } y \in \partial a \setminus \{x\}, \\ \mathtt{f} & \text{if } \omega_{F,y \to a,\ell} \neq \mathtt{u} \text{ for all } y \in \partial a \setminus \{x\} \text{ and } \omega_{F,y \to a,\ell} \neq \mathtt{n} \text{ for at least one } y \in \partial a \setminus \{x\}, \\ \mathtt{u} & \text{otherwise,} \end{cases} \tag{3.3.3}
\]

\[
\omega_{F,x \to a,\ell+1} = \begin{cases} \mathtt{n} & \text{if } \omega_{F,b \to x,\ell} = \mathtt{n} \text{ for at least one } b \in \partial x \setminus \{a\}, \\ \mathtt{f} & \text{if } \omega_{F,b \to x,\ell} \neq \mathtt{n} \text{ for all } b \in \partial x \setminus \{a\} \text{ and } \omega_{F,b \to x,\ell} = \mathtt{f} \text{ for at least one } b \in \partial x \setminus \{a\}, \\ \mathtt{u} & \text{otherwise.} \end{cases} \tag{3.3.4}
\]

[Figure 3.2: Up: A local snapshot of the Warning Propagation update rules for the message $\omega_{F,a \to x,\ell}$ defined in (3.3.3). Down: Similarly, a local snapshot of the Warning Propagation update rules for the message $\omega_{F,x \to a,\ell}$ defined in (3.3.4).]

In addition to the messages we also define the mark of a variable node $x$ by letting

\[
\omega_{F,x,\ell} = \begin{cases} \mathtt{n} & \text{if } \omega_{F,b \to x,\ell} = \mathtt{n} \text{ for at least one } b \in \partial x, \\ \mathtt{f} & \text{if } \omega_{F,b \to x,\ell} \neq \mathtt{n} \text{ for all } b \in \partial x \text{ and } \omega_{F,b \to x,\ell} = \mathtt{f} \text{ for at least one } b \in \partial x, \\ \mathtt{u} & \text{otherwise.} \end{cases} \tag{3.3.5}
\]

We conclude the chapter by establishing a relationship between BP and WP.

Fact 3.3.1. For all $\ell \geq 0$ and all $x, a$ we have

\[
\mu_{F,x \to a,\ell}(1) = 1/2 \iff \omega_{F,x \to a,\ell} \neq \mathtt{n}, \tag{3.3.6}
\]
\[
\mu_{F,a \to x,\ell}(1) = 1/2 \iff \omega_{F,a \to x,\ell} \neq \mathtt{n}, \tag{3.3.7}
\]
\[
\mu_{F,x,\ell}(1) = 1/2 \iff \omega_{F,x,\ell} \neq \mathtt{n}. \tag{3.3.8}
\]

In the next chapter we give a detailed overview of the phase transitions of different random satisfiability problems.

4 Phase Transitions in random CSPs

"Recent research indicates that many convex optimization problems with random constraints exhibit a phase transition as the number of constraints increases."
- Dennis Amelunxen et al.

In this chapter we provide a detailed analysis of the phase transitions occurring in random constraint satisfaction problems when the clause density $\alpha$ is varied. In particular, we take as examples two random CSP models, random k-SAT and random k-XORSAT, which are the main theme of this thesis, but many other random CSP ensembles share the same qualitative behaviour.

4.1 The Satisfiability Transition

Recall that the satisfiability threshold $\alpha_{\mathrm{sat}}(k)$ separates a phase $\alpha < \alpha_{\mathrm{sat}}(k)$ where random instances are SAT w.h.p. from a phase $\alpha > \alpha_{\mathrm{sat}}(k)$ where random instances are UNSAT w.h.p.
One of the best-known techniques for estimating the satisfiability threshold, discussed in [15, 59, 65, 72, 79], is the cavity method. However, the existence of the satisfiability transition is not yet proven for all values of $k$. The conjecture below summarizes the situation.

Conjecture 4.1.1. Let $F = F(k, \alpha)$ be a random CNF formula with $n$ variables and $m = \alpha n$ clauses, where $\alpha$ is the clause density, drawn from the random k-SAT ensemble. Then for any $k \geq 2$ there exists a constant $\alpha_{\mathrm{sat}}(k)$ such that for any $\varepsilon > 0$,

\[
\lim_{n \to \infty} \mathbb{P}[F(k, \alpha_{\mathrm{sat}}(k) - \varepsilon) \text{ is SAT}] = 1, \qquad \lim_{n \to \infty} \mathbb{P}[F(k, \alpha_{\mathrm{sat}}(k) + \varepsilon) \text{ is SAT}] = 0.
\]

This conjecture was proved for $k = 2$ with $\alpha_{\mathrm{sat}} = 1$ by Chvátal and Reed in 1992 [26] and Goerdt in 1996 [53]. Later, in 2015, Ding, Sly and Sun proved the satisfiability conjecture for large but finite $k$ and showed that the value of $\alpha_{\mathrm{sat}}(k)$ is given by the one-step replica symmetry breaking cavity method prediction. For general $k$, Friedgut [47] provided a partial result in 1999:

Theorem 4.1.2. For every $k \geq 2$, there exists a sequence $\alpha_k(n)$ such that for all $\varepsilon > 0$,

\[
\lim_{n \to \infty} \mathbb{P}[F(k, \alpha_k(n) - \varepsilon) \text{ is SAT}] = 1, \qquad \lim_{n \to \infty} \mathbb{P}[F(k, \alpha_k(n) + \varepsilon) \text{ is SAT}] = 0.
\]

The theorem shows that the transition from SAT to UNSAT takes place in a window smaller than any fixed $\varepsilon$ for large enough $n$. However, it remains to prove that the sequence $\alpha_k(n)$ converges to some value $\alpha_{\mathrm{sat}}(k)$ as $n \to \infty$, which would rule out possible oscillations. There are several methods to derive rigorous upper and lower bounds on this sequence $\alpha_k(n)$. The best-known way to derive an upper bound is to apply Markov's inequality (the first moment bound) to the number of satisfying assignments. Lower bounds are obtained from the number of solutions via Chebyshev's inequality (the second moment method), which is more delicate to implement than the first moment bound.
More precisely, define a function $U(F)$ on the set of instances such that

\[
U(F) \begin{cases} = 0 & \text{if } F \text{ is UNSAT}, \\ \geq 1 & \text{otherwise.} \end{cases}
\]

Then Markov's inequality gives

\[
\mathbb{P}[F \text{ is SAT}] \leq \mathbb{E}[U(F)].
\]

Since we do not know how to compute the expectation of $U(F) = \mathbf{1}[F \text{ is SAT}]$, the first choice is to use $U(F) = Z(F)$, the number of solutions of $F$. Any fixed assignment $\sigma$ satisfies a random clause with probability $1 - 2^{-k}$, so by linearity of expectation and because the clauses are generated independently and uniformly,

\[
\mathbb{E}[Z(F)] = 2^n (1 - 2^{-k})^m = \exp\big[ n (\log 2 + \alpha \log(1 - 2^{-k})) \big].
\]

As $n \to \infty$ we get

\[
\mathbb{E}[Z(F)] \to \begin{cases} 0 & \text{if } \alpha > \alpha_u(k), \\ +\infty & \text{if } \alpha < \alpha_u(k), \end{cases} \qquad \text{with } \alpha_u(k) = -\frac{\log 2}{\log(1 - 2^{-k})}.
\]

Because the number of satisfying assignments $Z(F)$ can take exponentially large values as $n \to \infty$, and its fluctuations can be exponentially large as well, one can expect that the upper bound $\alpha_u$ is not tight. In 1998, Kirousis et al. [57] defined the function $U(F)$ as the number of locally maximal satisfying assignments, which counts the solutions in a small subclass of all solutions. They thus obtained a tighter upper bound $\hat{\alpha}_u(k)$, which is the solution of the equation

\[
\alpha \log(1 - 2^{-k}) + \log\big( 2 - \exp(-k\alpha / (2^k - 1)) \big) = 0.
\]

Coming to the lower bound on $\alpha_k(n)$, in 2006 Achlioptas and Moore [4] first introduced a method for obtaining lower bounds via the second moment method. Applying the second moment method to the function $U(F)$ we get

\[
\mathbb{P}[F \text{ is SAT}] \geq \frac{\mathbb{E}[U(F)]^2}{\mathbb{E}[U(F)^2]}.
\]

It can be shown that applying this method to the number of solutions $Z(F)$ does not provide a useful bound, because the ratio $\mathbb{E}[Z(F)]^2 / \mathbb{E}[Z(F)^2]$ vanishes as $n \to \infty$ for any non-zero value of $\alpha$. Instead one can choose $U$ to be the size of a suitable subset of the set of solutions. Using this idea, Achlioptas and Moore [4] showed that for any $k \geq 3$,

\[
\lim_{n \to \infty} \mathbb{P}[F(k, \alpha) \text{ is SAT}] = 1 \qquad \text{if } \alpha \leq 2^{k-1} \log 2 - 2.
\]

Together with the upper bound, this lower bound yields the scaling $\alpha_{\mathrm{sat}}(k) = \Theta(2^k)$ for large $k$.
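The three bounds quoted above are easy to evaluate numerically. The following Python sketch is an illustration only; the function names are hypothetical, and the Kirousis et al. bound is obtained by plain bisection on the interval $[0.1, \alpha_u(k)]$, an assumption chosen to avoid the trivial root at $\alpha = 0$.

```python
import math


def first_moment_bound(k):
    """alpha_u(k) = -log 2 / log(1 - 2^{-k}), the first moment upper bound."""
    return -math.log(2) / math.log(1 - 2.0 ** (-k))


def kirousis_bound(k, tol=1e-12):
    """Non-trivial root of alpha*log(1-2^{-k}) + log(2 - exp(-k*alpha/(2^k-1))) = 0."""
    def f(alpha):
        return (alpha * math.log(1 - 2.0 ** (-k))
                + math.log(2 - math.exp(-k * alpha / (2 ** k - 1))))

    # f is positive just above 0 and negative at alpha_u(k), so bisection applies.
    lo, hi = 0.1, first_moment_bound(k)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def second_moment_lower_bound(k):
    """The lower bound 2^{k-1} log 2 - 2 quoted from Achlioptas and Moore."""
    return 2 ** (k - 1) * math.log(2) - 2


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(k, second_moment_lower_bound(k), kirousis_bound(k), first_moment_bound(k))
```

For $k = 3$ the first moment bound evaluates to roughly 5.19 and the refined bound to roughly 4.67, which illustrates how much the restriction to locally maximal assignments gains over the plain first moment.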
In the next section we briefly discuss the 'quenched' and 'annealed' techniques, which are very useful in the analysis of the satisfiability threshold of random CSPs.

4.2 Quenched and Annealed Techniques

Let us consider a graphical model $\mathcal{G}$ and its corresponding measure $\mu_{\mathcal{G}(F)}$. The support of the measure is the set of solutions of a given instance $F$:

\[
\mu_{\mathcal{G}(F)}(\sigma) \begin{cases} = 0 & \text{if } \sigma \text{ is not a solution}, \\ > 0 & \text{otherwise.} \end{cases}
\]

To handle assignments $\sigma$ that are not solutions, i.e., $\mu_{\mathcal{G}(F)}(\sigma) = 0$, we can introduce an inverse temperature parameter $\beta$ and consider the normalized free entropy (defined in (2.2.7)):

\[
\Phi_{\mathcal{G}} = \Phi(\mathcal{G}(F), \beta) = \frac{1}{n} \log Z(\mathcal{G}(F), \beta). \tag{4.2.1}
\]

When the instance is drawn at random (for instance from the random k-SAT ensemble), the measure $\mu_{\mathcal{G}(F)}$ becomes random. Determining the typical properties of the measure $\mu_{\mathcal{G}(F)}$ and of the normalized free entropy density $\Phi_{\mathcal{G}}$ are the two most important tasks in estimating the partition function, which counts the number of solutions of a random satisfiability problem. To this end, the quenched free entropy density is defined over the random ensemble of instances:

\[
\Phi_{\mathrm{que}}(\mathcal{G}) = \lim_{n \to \infty} \frac{1}{n} \mathbb{E}[\log Z(\mathcal{G}(F))]. \tag{4.2.2}
\]

Usually, this quantity cannot be evaluated exactly on random satisfiability problems, but it can be estimated using the non-rigorous cavity method, which we discuss later in this chapter. A natural upper bound on this quantity is provided by the normalized annealed free entropy density,

\[
\Phi_{\mathrm{ann}}(\mathcal{G}) = \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}[Z(\mathcal{G}(F))]. \tag{4.2.3}
\]

Indeed, Jensen's inequality applied to the number of solutions $Z(\mathcal{G}(F))$ yields

\[
\Phi_{\mathrm{ann}}(\mathcal{G}) \geq \Phi_{\mathrm{que}}(\mathcal{G}). \tag{4.2.4}
\]

Later in this chapter, when we talk about the replica trick, we will encounter the 'quenched free energy' again.
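The gap between the quenched and the annealed quantity can be observed on very small instances. The sketch below is a toy illustration under explicit assumptions: it uses the finite-temperature partition function $Z(F, \beta) = \sum_\sigma \exp(-\beta \cdot \#\text{violated clauses})$ so that $\log Z$ is always defined, computes it by brute force, and replaces the expectation by an empirical average over sampled formulas. All function names are hypothetical.

```python
import itertools
import math
import random


def random_ksat(n, m, k, rng):
    """m clauses, each over k distinct variables with uniformly random signs.
    A literal (v, sign) is true under sigma iff sigma[v] == sign."""
    return [
        [(v, rng.choice((True, False))) for v in rng.sample(range(n), k)]
        for _ in range(m)
    ]


def partition_function(n, formula, beta):
    """Brute-force Z(F, beta): sum over all 2^n assignments."""
    z = 0.0
    for sigma in itertools.product((False, True), repeat=n):
        violated = sum(
            all(sigma[v] != sign for v, sign in clause) for clause in formula
        )
        z += math.exp(-beta * violated)
    return z


def free_entropies(n, m, k, beta, samples, seed=0):
    """Empirical quenched (mean of log Z) and annealed (log of mean Z)
    free entropy densities, both normalized by n."""
    rng = random.Random(seed)
    zs = [partition_function(n, random_ksat(n, m, k, rng), beta)
          for _ in range(samples)]
    quenched = sum(math.log(z) for z in zs) / (samples * n)
    annealed = math.log(sum(zs) / samples) / n
    return quenched, annealed
```

Because the logarithm is concave, the empirical quenched value can never exceed the empirical annealed value, mirroring (4.2.4). The sizes that brute force can handle are of course far from the $n \to \infty$ limit in (4.2.2)–(4.2.3).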
In the second result of this thesis, we harness 'quenched' arguments, which were partly developed in prior work [12, 29] on the rank of random matrices over finite fields, to establish a precise connection between the decimation process and the performance of BPGD [22].

4.3 Gibbs measure and Long range correlation

The effectiveness of belief propagation relies on a basic assumption: when a check node is pruned from the factor graph, the adjacent variable nodes become weakly correlated with respect to the resulting distribution. But when the factor graph contains small loops, or when variables are correlated over long distances, this hypothesis may break down [66]. So, in factor graphs with a locally tree-like structure, long range correlations are responsible for the failure of BP. Thus a phase transition occurs with the emergence of such long range correlations, separating a 'weakly correlated' from a 'highly correlated' phase. The central tool behind the study of random CSPs such as random k-SAT and random k-XORSAT through the lens of statistical mechanics is the 'Gibbs measure', which encodes the uniform distribution over the set of all solutions. It thus provides insight into correlation decay, algorithmic tractability and various phase transitions [38]. This section develops the framework of Gibbs measures in sparse random structures [43] by introducing correlation decay and the Gibbs uniqueness property, followed by the landscape of phase transitions: replica symmetry, one-step replica symmetry breaking (the 1-RSB cavity method) and the (non-)reconstruction properties on trees.

4.3.1 Gibbs measure on random CSPs

Recall the Boltzmann (Gibbs) distribution from Chapter 2. Let $F$ be an instance of a random CSP with variable set $V = \{x_1, x_2, \ldots, x_n\}$ and constraint set $C = \{c_1, c_2, \ldots, c_m\}$. An assignment $\sigma \in \{0,1\}^n$ is a solution if all the constraints are satisfied.
The associated Gibbs measure at inverse temperature $\beta \geq 0$ is defined as

\[
\mu_\beta(\sigma) = \frac{1}{Z_\beta(F)} \exp(-\beta H(\sigma)),
\]

where $H(\sigma)$ is the number of constraints violated under the assignment $\sigma$ and the partition function $Z_\beta(F)$ is given by

\[
Z_\beta(F) = \sum_{\tau \in \{0,1\}^n} \exp(-\beta H(\tau)).
\]

Remark 4.3.1.
• At $\beta = \infty$, the Gibbs measure $\mu_\beta$ is the uniform distribution over all satisfying assignments.
• At finite $\beta$, $\mu_\beta$ interpolates between uniform randomness and a strong bias toward solutions.

Thus the Gibbs measure encodes the geometry of the solution space and serves as a basis for analyzing the correlations between variables.

4.3.2 Correlation decay and Gibbs Uniqueness

In statistical mechanics, physical systems that have only short range correlations should relax rapidly to their equilibrium distribution. The reason lies in the behaviour of the different degrees of freedom. If the degrees of freedom are independent, then the system relaxes on microscopic time scales (namely the relaxation time of a single particle, spin, etc.). If they are not independent but their correlations are short-ranged, they can be coarse-grained in such a way that they become nearly independent. For two variables $x_i, x_j \in V$, define their correlation under the Gibbs measure $\mu_\beta$ [38, 40] by

\[
\mathrm{Corr}_{\mu_\beta}(x_i, x_j) = \mathbb{E}_{\mu_\beta}[x_i x_j] - \mathbb{E}_{\mu_\beta}[x_i]\, \mathbb{E}_{\mu_\beta}[x_j].
\]

More generally, we need a measure of how much the joint distribution $\mu_{ij}(\cdot, \cdot)$ of $x_i$ and $x_j$ differs from the product $\mu_i(\cdot)\mu_j(\cdot)$ of their marginals. We therefore define the two-point correlation [66] by averaging the variation distance $\|\mu_{ij}(\cdot, \cdot) - \mu_i(\cdot)\mu_j(\cdot)\|$ over the vertices $i, j$:

\[
\mathrm{Corr}^{(2)} \equiv \frac{1}{n} \sum_{i,j \in V} \|\mu_{ij}(\cdot, \cdot) - \mu_i(\cdot)\mu_j(\cdot)\|.
\]

Remark 4.3.2.
• Correlation decay occurs if the correlations $\mathrm{Corr}_{\mu_\beta}(x_i, x_j)$ vanish with the distance between the two nodes $i, j \in V$ in the graph, i.e.,
\[
\mathrm{Corr}_{\mu_\beta}(x_i, x_j) \to 0 \quad \text{as } \mathrm{dist}(i, j) \to \infty.
\]
• Absence of correlation decay indicates long-range dependencies, which often signal clustering or broken symmetry.
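On a formula with a handful of variables, the Gibbs measure and the correlation $\mathrm{Corr}_{\mu_\beta}$ can be computed by direct enumeration. The following Python sketch is meant as an illustration only: clauses are encoded as tuples of non-zero integers (literal $+i$ for $x_i$, $-i$ for $\neg x_i$, with variables numbered from 1), and the function names are hypothetical.

```python
import itertools
import math


def energy(formula, sigma):
    """H(sigma): number of clauses in which no literal is satisfied."""
    def lit_true(l):
        return (sigma[abs(l) - 1] == 1) == (l > 0)
    return sum(not any(lit_true(l) for l in clause) for clause in formula)


def gibbs_measure(n, formula, beta):
    """Return {sigma: mu_beta(sigma)} over all assignments in {0,1}^n."""
    weights = {
        sigma: math.exp(-beta * energy(formula, sigma))
        for sigma in itertools.product((0, 1), repeat=n)
    }
    z = sum(weights.values())          # partition function Z_beta(F)
    return {sigma: w / z for sigma, w in weights.items()}


def correlation(mu, i, j):
    """E[x_i x_j] - E[x_i] E[x_j] under mu, with 0-based positions i, j."""
    e_ij = sum(p * s[i] * s[j] for s, p in mu.items())
    e_i = sum(p * s[i] for s, p in mu.items())
    e_j = sum(p * s[j] for s, p in mu.items())
    return e_ij - e_i * e_j
```

For the two clauses $(x_1 \vee \neg x_2)$ and $(\neg x_1 \vee x_2)$, which force $x_1 = x_2$, the measure is uniform on all four assignments at $\beta = 0$, while for large $\beta$ it concentrates on the two solutions and the correlation of $x_1$ and $x_2$ approaches $1/4$.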
Correlation decay and its absence are linked to the uniqueness of the Gibbs measure on the infinite tree limit (in the sense of local weak convergence) [21, 40].

Gibbs Uniqueness

We begin with the Galton-Watson tree $\mathbb{T} = \mathbb{T}_{d,k}$, which is generated by a two-type branching process. The two types are variable nodes and clause nodes. The process starts with a single root variable node $r$. The offspring of any variable node is a $\mathrm{Po}(d)$ number of clause nodes, while every clause node begets precisely $k - 1$ variable nodes. Additionally, independently for each clause node $a$ and every variable node $x$ that is either a child or the parent of $a$, a sign $\mathrm{sign}(x, a) \in \{\pm 1\}$ is chosen uniformly at random. The resulting random tree $\mathbb{T}$ models the local structure of the random k-CNF formula $F = F(n, d, k)$ in the sense of local weak convergence [9, 62].

For an integer $\ell \geq 0$ let $\mathbb{T}^{(\ell)}$ be the finite tree obtained by removing all variable and clause nodes at a distance greater than $2\ell$ from the root $r$. We identify the finite tree $\mathbb{T}^{(\ell)}$ with a Boolean formula whose variables/clauses are precisely the variable/clause nodes of $\mathbb{T}^{(\ell)}$. Let $S(\mathbb{T}^{(\ell)}) \neq \emptyset$ be the set of satisfying assignments of this formula and let $\boldsymbol{\tau}^{(\ell)} \in S(\mathbb{T}^{(\ell)})$ be a uniformly random satisfying assignment. Moreover, let $\partial^{2\ell} r$ be the set of variable nodes of $\mathbb{T}^{(\ell)}$ at distance precisely $2\ell$ from the root $r$. Then for given $d, k$ the tree $\mathbb{T} = \mathbb{T}_{d,k}$ has the Gibbs uniqueness property (see [59] for more details) if

\[
\lim_{\ell \to \infty} \mathbb{E}\Big[ \max_{\tau \in S(\mathbb{T}^{(\ell)})} \Big| \mathbb{P}\big[ \boldsymbol{\tau}^{(\ell)}(r) = 1 \mid \mathbb{T} \big] - \mathbb{P}\big[ \boldsymbol{\tau}^{(\ell)}(r) = 1 \mid \mathbb{T},\ \forall x \in \partial^{2\ell} r : \boldsymbol{\tau}^{(\ell)}(x) = \tau(x) \big] \Big| \Big] = 0. \tag{4.3.1}
\]

In words, in the limit of large $\ell$ the truth value $\boldsymbol{\tau}^{(\ell)}(r)$ of the root $r$ is asymptotically independent of the truth values $\{\boldsymbol{\tau}^{(\ell)}(x)\}_{x \in \partial^{2\ell} r}$ of the variables at distance $2\ell$ from $r$. In this thesis, we explicitly derive a lower bound on the Gibbs uniqueness threshold $d_{\mathrm{uniq}}(k)$ for any $k \geq 3$. The details are discussed in Chapter 7.

Remark 4.3.3.
• Uniqueness regime: correlations decay along the tree $\mathbb{T}$ and belief propagation converges to a unique fixed point.
• Non-Uniqueness regime: In the infinite tree $T$, suitably chosen boundary conditions can give rise to distinct extremal Gibbs measures, commonly referred to as pure states. In large finite random graphs, this phenomenon appears in the thermodynamic limit as a decomposition of the solution space into multiple well-separated clusters, which correspond to these pure states.
• In random $k$-SAT, the breakdown of Gibbs uniqueness (the point beyond which the Gibbs measure on the tree is no longer unique) occurs well before the clustering threshold (see Figure 4.2), whereas in random $k$-XORSAT the phase transition for the appearance of a linear number of frozen variables (variables whose assignments are determined by the problem structure) occurs simultaneously with the emergence of the 2-core.

Figure 4.1: Galton-Watson tree $T$ with the Gibbs uniqueness property (root $r$, boundary assignment $\sigma$ at distance $2\ell$)

4.3.3 Replica Symmetry

The replica symmetry condition originally comes from a method for studying the partition function $Z$. Estimating the partition function is hard because the sum runs over an exponential number of terms (for $k$-SAT there are $2^n$ summands). To capture the leading exponential order of the partition function, one therefore considers the quantity $\lim_{n\to\infty} \frac{1}{n} \log Z$. The average log-partition function (in other words, the 'quenched free energy' defined in Section 4.2) can then be obtained from the moments $\mathbb{E}[Z^\ell]$, computed for any fixed $\ell \in \mathbb{N}$ and continued to $\ell \to 0$:
\[ \lim_{n\to\infty} \frac{1}{n} \mathbb{E}[\log Z] = \lim_{n\to\infty} \lim_{\ell \to 0} \frac{1}{n\ell} \log \mathbb{E}[Z^\ell]. \]
This technique is known as the replica trick [36, 70, 71]. The most suitable model for seeing how it works is the random energy model (REM). Although this model does not describe any realistic physical system, it is a good example for studying the concepts of replica theory. In this model we have $2^n$ possible assignments $\sigma$, each of which has an energy $E(\sigma)$.
Therefore the $\ell$-th power of $Z$ is given by
\[ Z^\ell = \sum_{i_1, i_2, \dots, i_\ell = 1}^{2^n} \exp\left[ \beta\left( -E_{i_1} - \dots - E_{i_\ell} \right) \right], \]
where $\beta$ denotes the inverse temperature. This quantity can be regarded as the partition function of a new system whose configurations are the $\ell$-tuples $(i_1, \dots, i_\ell)$ with energies $E_{i_1, \dots, i_\ell} = E_{i_1} + \dots + E_{i_\ell}$. In other words, the new system is obtained by taking $\ell$ independent copies of the original model, each of which is referred to as a 'replica'. The above equation can be rewritten as
\[ Z^\ell = \sum_{i_1, \dots, i_\ell = 1}^{2^n} \prod_{j=1}^{2^n} \exp\left[ -\beta E_j \left( \sum_{a=1}^{\ell} \mathbf{1}(i_a = j) \right) \right]. \]
Taking the expectation and using that the energies $E_j$ are i.i.d. Gaussian, we get
\[ \mathbb{E}\left[ Z^\ell \right] = \sum_{i_1, \dots, i_\ell = 1}^{2^n} \exp\left[ \frac{\beta^2 n}{4} \sum_{a,b=1}^{\ell} \mathbf{1}(i_a = i_b) \right]. \tag{4.3.2} \]
Now we can express the indicators in (4.3.2) through the $\ell \times \ell$ overlap matrix $Q$ with entries $Q_{ab} = \mathbf{1}\{i_a = i_b\} \in \{0,1\}$. Then (4.3.2) becomes
\[ \mathbb{E}\left[ Z^\ell \right] = \sum_{Q} N_n(Q) \exp\left[ \frac{\beta^2 n}{4} \sum_{a,b=1}^{\ell} Q_{ab} \right], \]
where $N_n(Q)$ is the number of configurations $(i_1, \dots, i_\ell)$ whose overlap matrix is $Q$, and the sum runs over the symmetric matrices in $\{0,1\}^{\ell \times \ell}$. By the large deviation principle there is an entropy function $s(Q)$, depending only on the matrix $Q$, such that $N_n(Q) = \exp(n(s(Q) + o(1)))$. Taking logarithms we get
\[ \log \mathbb{E}\left[ Z^\ell \right] = n\left( \max_{Q} \gamma(Q) + o(1) \right), \qquad \gamma(Q) = \frac{\beta^2}{4} \sum_{a,b=1}^{\ell} Q_{ab} + s(Q). \tag{4.3.3} \]
Here $\gamma$ is symmetric under permutations of the replicas for $\ell > 1$: for any permutation $\pi$ of the set $[\ell]$ we have $\gamma(Q) = \gamma(Q^\pi)$, where $Q^\pi_{ab} = Q_{\pi(a)\pi(b)}$. This symmetry reflects the fact that the replicas are identical to begin with. It suggests looking for maximizers with $Q_{ab} = q_0 \in \{0,1\}$ for all $a \ne b$, which in physics is called replica symmetry (RS). Maximizing $\gamma$ over such matrices $Q$ yields two candidates: the matrix $Q_1$ whose entries are all equal to one, and the identity matrix $Q_0$.
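For the two replica-symmetric candidates, $\gamma$ from (4.3.3) can be evaluated in closed form. The sketch below assumes the entropies $s(Q_0) = \ell \log 2$ and $s(Q_1) = \log 2$, which follow from a counting argument not spelled out in the text (roughly $2^{n\ell}$ tuples of pairwise distinct indices, and exactly $2^n$ tuples with all indices equal). The function names are illustrative.

```python
import math

def gamma_identity(beta, ell):
    """gamma(Q0) for the identity overlap matrix.

    sum_{a,b} Q_ab = ell, and (assumption, see lead-in) s(Q0) = ell * log 2.
    """
    return beta ** 2 * ell / 4 + ell * math.log(2)

def gamma_all_ones(beta, ell):
    """gamma(Q1) for the all-ones overlap matrix.

    sum_{a,b} Q_ab = ell^2, and (assumption, see lead-in) s(Q1) = log 2.
    """
    return beta ** 2 * ell ** 2 / 4 + math.log(2)

def crossover_beta(ell):
    """Inverse temperature at which gamma(Q0) = gamma(Q1), for ell > 1."""
    # Equating the two expressions gives (ell - 1) log 2 = beta^2 ell (ell - 1) / 4.
    return 2 * math.sqrt(math.log(2) / ell)

if __name__ == "__main__":
    ell = 2
    bc = crossover_beta(ell)
    for beta in (0.5 * bc, bc, 2 * bc):
        print(beta, gamma_identity(beta, ell), gamma_all_ones(beta, ell))
```

Below the crossover the identity matrix gives the larger value of $\gamma$, above it the all-ones matrix does.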
From [66] there exists a precise threshold $\beta_c = 2\sqrt{(\log 2)/\ell}$: for $\beta \le \beta_c$ the global maximum is attained at $Q_0$ (the identity matrix), whereas for $\beta > \beta_c$ the maximum is attained at $Q_1$ (the all-ones matrix). Thus, heuristically, evaluating $\gamma$ at the identity matrix $Q_0$ gives the following prediction for $\beta \le \beta_c$:
\[ \lim_{n\to\infty} \frac{1}{n} \mathbb{E}[\log Z] = \lim_{n\to\infty} \lim_{\ell \to 0} \frac{1}{\ell n} \log \mathbb{E}\left[ Z^\ell \right] = \lim_{\ell \to 0} \frac{1}{\ell} \gamma(Q_0) = \frac{\beta^2}{4} + \log 2. \]
Moreover, difficulties arise for $\ell < 1$. The first is that the matrix $Q$ then has a negative number of independent entries, so that we would have to maximize over a negative number of variables. As a remedy, Giorgio Parisi transformed the problem into a minimization problem, a prescription referred to as the 'Parisi axioms'. Mathematically,
\[ \log \mathbb{E}\left[ Z^\ell \right] = n\left( \min_{Q} \gamma(Q) + o(1) \right). \]
For $\beta > \beta_c$ the optimization in (4.3.3) is dominated by matrices $Q$ that are not replica symmetric. In order to improve on the RS result, one can enlarge the subspace of matrices to be optimized over. Replica symmetry breaking, proposed by Parisi in the more complicated setting of mean-field spin glass theory, prescribes a recursive procedure for defining larger and larger spaces of matrices $Q$ in which one searches for saddle points. The first step of this procedure is called one-step replica symmetry breaking (1RSB) [66]. To illustrate, suppose that $\ell$ is a multiple of $x$ and group the $\ell$ replicas into $\ell/x$ groups of $x$ replicas each, with $q_0 \ne q_1$ and
\[ Q_{aa} = 1, \qquad Q_{ab} = \begin{cases} q_1 & \text{if } a \ne b \text{ are in the same group,} \\ q_0 & \text{if } a \text{ and } b \text{ are in different groups.} \end{cases} \]

In the random $k$-SAT model, the replica symmetry condition states that for any two variables $x_1, x_2$ and an assignment $\sigma$ sampled from the Boltzmann distribution $\mu$ we have
\[ \lim_{n\to\infty} \mathbb{E}\left[ \left| \mu(\sigma_{x_1} = \sigma_{x_2} = 1) - \mu(\sigma_{x_1} = 1) \cdot \mu(\sigma_{x_2} = 1) \right| \right] = 0. \]
If the replica symmetry condition approximately holds for random factor graphs, then the limiting normalized log-partition function is predicted by
\[ \lim_{n\to\infty} \frac{1}{n} \mathbb{E}\left[ \log Z \right] = \sup_{\pi \in \mathcal{P}^2(\Omega)} \mathcal{B}(\pi), \tag{4.3.4} \]
where $\mathcal{P}(\Omega)$ is the set of probability distributions on $\Omega$, $\pi$ is a probability measure on $\mathcal{P}(\Omega)$ (which for Boolean variables can be identified with the interval $[0,1]$), and $\mathcal{B} : \mathcal{P}^2(\Omega) \to \mathbb{R}$.
In the physics literature the quantity on the r.h.s. of (4.3.4) is referred to as the 'Bethe free entropy'. The third result of this thesis, on random $k$-SAT, provides a detailed evaluation of this quantity. For more details refer to Chapter 7 and [21]. In the next subsection we turn to the clustering transition, up to which replica symmetry holds for random $k$-SAT and beyond which the 1RSB picture applies, in which the solution space breaks into exponentially many solution clusters.

4.3.4 Clustering transition: Reconstruction Property

The clustering transition (also known as the reconstruction or dynamic transition) occurs within the satisfiable regime $\alpha < \alpha_{\mathrm{SAT}}(k)$. Below the clustering threshold the set of typical solutions is rather well connected, meaning that any solution can be reached from any other via intermediate solutions. Above the clustering threshold $\alpha_{\mathrm{clus}}$ the typical solutions break into exponentially many clusters or pure states, which are internally well connected but separated from one another by free-energy barriers. Here 'typical' is understood with respect to the measure $\mu = \mu_G(F)$ chosen to describe the set of solutions. Moreover, as introduced by Montanari and Semerjian in [82], the clustering transition can be interpreted as the birth of a point-to-set correlation under that probability measure $\mu_G(F)$. Given a variable node $v$ and a set of variables $V$, the point-to-set correlation function is defined for spin variables under the measure $\mu$ by
\[ \mathrm{Cor}(v, V) = \sum_{\sigma_V} \mathbb{P}_\mu(\sigma_V) \left( \sum_{\sigma_v} \mathbb{P}_\mu(\sigma_v \mid \sigma_V)\, \sigma_v \right)^2 - \left( \sum_{\sigma_v} \mathbb{P}_\mu(\sigma_v)\, \sigma_v \right)^2. \]
In the unclustered regime the point-to-set correlation function $\mathrm{Cor}(v, V)$ vanishes as the distance from the variable $v$ to the set $V$ in the interaction graph grows, whereas in the clustered regime it does not decay to zero.
Further, in [82] the authors justify the terminology 'dynamic transition' by showing that this correlation implies the divergence of the relaxation time of local stochastic processes that obey the detailed balance condition, such as Markov Chain Monte Carlo. In the cavity method for random CSPs, the clustering threshold corresponds to the appearance of a non-trivial solution of the one-step Replica Symmetry Breaking (1RSB) equations with Parisi parameter $x = 1$.

Let us briefly summarize. In the unclustered phase, the typical solutions belong to a single cluster. The thermodynamic properties of the measure $\mu$ are then well described by the Replica Symmetric cavity method, which in particular estimates $\Phi_{\mathrm{RS}}$, the quenched free entropy density defined in (4.2.2). In the regime described by the 1RSB cavity method, on the other hand, the solution set splits into an exponential number of disjoint clusters (also called 'pure states'). In the next section we will see the different phases of these transitions in the random $k$-SAT and $k$-XORSAT models.

In the second result of this thesis, we employ the concept of the (non-)reconstruction property in the context of computing the marginal probability of the root of the bipartite graph $G(F_{\mathrm{DC},t})$ associated with a random $k$-XORSAT formula $F_{\mathrm{DC},t}$ generated by the decimation process (defined in Chapter 3). Roughly speaking, non-reconstruction means that the marginal $\pi_{F_{\mathrm{DC},t}} = \mathbb{P}\left[ \sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t} \right]$ is determined by short-range rather than long-range effects. Since Belief Propagation is a local algorithm, one might expect that the (non-)reconstruction phase transition coincides with the threshold up to which BPGD succeeds (more details can be found in [22]). For a (variable or clause) vertex $v$ of $G(F_{\mathrm{DC},t})$ let $\partial v$ be the set of neighbors of $v$. More generally, for an integer $\ell \ge 1$ let $\partial^\ell v$ be the set of vertices of $G(F_{\mathrm{DC},t})$ at shortest path distance precisely $\ell$ from $v$.
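The sets $\partial^\ell v$ can be computed by a breadth-first search on the bipartite variable/clause graph. The following is a minimal sketch, assuming that a formula is given simply as a list of clauses, each a tuple of variable indices; the vertex labels `("var", i)` and `("clause", j)` and both function names are illustrative and not taken from the thesis.

```python
from collections import deque

def factor_graph(clauses):
    """Adjacency sets of the bipartite variable/clause graph of a formula."""
    adj = {}
    for a, variables in enumerate(clauses):
        for x in variables:
            adj.setdefault(("clause", a), set()).add(("var", x))
            adj.setdefault(("var", x), set()).add(("clause", a))
    return adj

def sphere(adj, v, ell):
    """Set of vertices at shortest-path distance exactly ell from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == ell:
            continue  # no need to explore beyond distance ell
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return {u for u, d in dist.items() if d == ell}

if __name__ == "__main__":
    # Hypothetical toy instance: two 3-variable equations sharing variable 2.
    adj = factor_graph([(0, 1, 2), (2, 3, 4)])
    print(sphere(adj, ("var", 0), 2))  # variables sharing a clause with variable 0
    print(sphere(adj, ("var", 0), 4))  # variables two clauses away
```

Since the graph is bipartite, the even spheres $\partial^{2\ell} x$ around a variable $x$ contain only variables, which is why the boundary conditions in the reconstruction property are placed at distance $2\ell$.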
Figure 4.1 illustrates the idea: in the non-reconstruction phase, the assignment of the variables at distance $2\ell$ from the root $r$ asymptotically does not affect the marginal of the root. Following [59], we say that $F_{\mathrm{DC},t}$ has the non-reconstruction property if
\[ \lim_{\ell\to\infty} \limsup_{n\to\infty} \mathbb{E}\left[ \left| \mathbb{P}\left[ \sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}, \{\sigma_{F_{\mathrm{DC},t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}} \right] - \mathbb{P}\left[ \sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t} \right] \right| \ \Big|\ F_{\mathrm{XOR}} \text{ satisfiable} \right] = 0. \tag{4.3.5} \]
Conversely, $F_{\mathrm{DC},t}$ has the reconstruction property if
\[ \liminf_{\ell\to\infty} \liminf_{n\to\infty} \mathbb{E}\left[ \left| \mathbb{P}\left[ \sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}, \{\sigma_{F_{\mathrm{DC},t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}} \right] - \mathbb{P}\left[ \sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t} \right] \right| \ \Big|\ F_{\mathrm{XOR}} \text{ satisfiable} \right] > 0. \tag{4.3.6} \]

4.4 Different phases in random k-SAT and random k-XORSAT

The cavity method predicts several phase transitions within the satisfiable phase that affect the structure of the set of solutions of typical instances. The last few subsections described the concepts of correlation decay and of the pure state decomposition at the clustering transition. For models on locally tree-like structures, including random $k$-SAT, we encounter three main phases, each of which can be studied using the appropriate cavity method.

Figure 4.2: Phase diagram of k-SAT adapted and modified from [66]. Left to Right: Uniqueness, Clustering (Replica Symmetry), Clustering → Condensation (dynamic 1RSB), Condensation → Satisfiability (static 1RSB), UNSAT

• Replica Symmetry: We further divide this phase into two regimes.
(i) $\alpha < \alpha_{\mathrm{uniq}}$: In this phase there exists no non-trivial decomposition into pure states. The system is in the replica symmetric phase with one big cluster of solutions.
(ii) $\alpha_{\mathrm{uniq}} < \alpha < \alpha_{\mathrm{clus}}$: Replica symmetry still holds in this regime, but the solutions form one big cluster together with exponentially small and scarce clusters.
• Dynamic 1RSB ($\alpha_{\mathrm{clus}} < \alpha < \alpha_{\mathrm{cond}}$): In this phase the Gibbs measure $\mu(\cdot)$ admits a non-trivial decomposition into an exponential number of pure states.
From the correlation point of view this phase is stable to small perturbations, but it is reconstructible. The solution space undergoes clustering, that is, it consists of exponentially many solution clusters of small size.
• Static 1RSB ($\alpha_{\mathrm{cond}} < \alpha < \alpha_{\mathrm{SAT}}$): This is the 'original' 1RSB phase, analogous to the low-temperature phase of the REM. This phase is not stable to small perturbations, and it is reconstructible, which implies that it has long-range correlations. The number of clusters that dominate the Gibbs measure in this regime is bounded.
• UNSAT: Beyond the satisfiability threshold $\alpha_{\mathrm{SAT}}$ there exists no solution, and the regime is referred to as the unsatisfiable phase.

Remark 4.4.1. In terms of statistical physics, the one-step replica symmetry breaking (1RSB) cavity method gives a full overview of the solution space geometry of random $k$-SAT. According to the physics method, the satisfiability threshold for random $k$-SAT turns out to be
\[ \alpha_{\mathrm{SAT}}(k) = 2^k \log 2 - \frac{1 + \log 2}{2} + o_k(1). \]
Further, for larger values of $k$ the satisfiability threshold has been studied rigorously [34, 41].

Figure 4.3: Phase diagram of k-XORSAT adapted and modified from [66]. Left to Right: Clustering (Easy SAT), Satisfiability (Hard SAT), UNSAT

k         3        4        5
α_clus    0.81847  0.77228  0.70178
α_SAT     0.91794  0.97677  0.99244

Table 4.1: The thresholds $\alpha_{\mathrm{clus}}$ and $\alpha_{\mathrm{SAT}}$ for various k [66]

In the case of random $k$-XORSAT, on the other hand, the whole regime $\alpha < \alpha_{\mathrm{SAT}}$ is the satisfiable phase. This means that the random linear system has solutions with high probability; more specifically, the number of solutions is given by
\[ Z = 2^{n(1-\alpha)}, \]
where $n$ is the number of variables of the random $k$-XORSAT formula. From Figure 4.3 it is clear that the threshold $\alpha_{\mathrm{clus}}$ separates two phases:
• $\alpha < \alpha_{\mathrm{clus}}$: This phase is referred to as 'Easy SAT'; there is a single cluster of solutions.
• $\alpha_{\mathrm{clus}} < \alpha < \alpha_{\mathrm{SAT}}$: This phase is referred to as 'Hard SAT'$^1$; the solutions of the linear system are grouped into well-separated clusters.
• $\alpha > \alpha_{\mathrm{SAT}}$: In this regime there exists no solution; it is referred to as the 'UNSAT' phase.

$^1$ The $k$-XORSAT problem can, of course, always be solved by Gaussian elimination in $O(n^3)$ time, where $n$ is the number of variables involved and $m = \Theta(n)$ is the number of equations.

Table 4.1 shows the thresholds for small values of $k$. For large $k$, $\alpha_{\mathrm{clus}} \approx \frac{\log k}{k}$ and $\alpha_{\mathrm{SAT}} \approx 1 - e^{-k} + O(e^{-2k})$ [66]. In the second result of this thesis, we analyze the performance of the BPGD algorithm in terms of thresholds for $\alpha$, or more precisely in terms of thresholds for the average variable degree $d$ (where $\alpha = d/k$) with constant $k \ge 3$.

Based on: The number of random 2-SAT solutions is asymptotically log-normal [23]
Arnab Chatterjee, Amin Coja-Oghlan, Noela Müller, Connor Riddlesden, Maurice Rolvien, Pavel Zakharov, Haodong Zhu
Proc. 28th RANDOM (2024)

5 A Central Limit Theorem for random 2-SAT solutions

"The central limit theorem is the most fundamental theory in modern statistics. ... With the central limit theorem, parametric tests have higher statistical power than non-parametric tests, which do not require probability distribution assumptions."
– Sang Gyu Kwak et al.

So far we have explored random constraint satisfaction models, the different message passing algorithms, and the way in which the phase transitions and the solution space geometry depend on the clause-to-variable density $\alpha$. The goal of this chapter is to provide deeper insight into the estimation of the partition function, i.e., the number of solutions (more precisely, the logarithm of the number of satisfying assignments), in random 2-SAT, the simplest random CSP model.

5.1 Motivation and History

The hunt for satisfiability thresholds has been a guiding theme of research into random constraint satisfaction problems [6, 24, 41].
Once the satisfiability threshold has been pinpointed, the next obvious question is to determine the distribution of the number of satisfying assignments within the satisfiable phase [59]. Indeed, the number of such solutions is intimately tied to phase transitions that affect the solution space geometry, which in turn impacts the computational nature of finding or sampling solutions [1, 18, 44]. Despite its importance, the problem of counting solutions in random CSPs remains difficult, with few general-purpose tools currently available. In those instances where precise, rigorous results have been obtained, such as for random NAE-SAT or XORSAT, the proofs commonly rely on the method of moments (e.g., [4, 42, 93, 95]). A necessary condition for the success of this approach is that the problem exhibits certain symmetries, which are absent in many interesting cases [6, 31]. Random 2-SAT, the simplest random constraint satisfaction problem lacking the aforementioned symmetry properties, is therefore an intriguing topic of study.

The number of satisfying assignments in random 2-SAT

Let $F_{\mathrm{2SAT}} = F_{n,m}$ be a random 2-CNF on $n$ Boolean variables $x_1, \dots, x_n$ with $m$ clauses, drawn independently and uniformly from all $4\binom{n}{2}$ possible 2-clauses. Further assume that $m \sim dn/2$ for a fixed real $d > 0$. Since the 1990s it has been known that $F_{\mathrm{2SAT}}$ is satisfiable w.h.p. if $d < 2$, and unsatisfiable w.h.p. if $d > 2$ [26, 53]. The first-order approximation to the number of satisfying assignments, by contrast, has been obtained only recently [2]. Calculating the number of satisfying assignments $Z(F_{\mathrm{2SAT}})$ exactly is a #P-hard task$^1$ [100]. Nonetheless, Monasson and Zecchina [77] put forward a conjecture on the exponential order of the number of satisfying assignments of random 2-CNFs using a physics-inspired technique. In 2021 Achlioptas et al.
[2] established the first-order behaviour of the logarithm of the number of satisfying assignments, in the form of a law of large numbers: there is a function $\varphi(d) > 0$ such that for all $d < 2$, i.e., throughout the entire satisfiable phase,
\[ \log Z(F_{\mathrm{2SAT}}) = n\varphi(d) + o(n) \quad \text{w.h.p.} \tag{5.1.1} \]
In this thesis we determine not only the leading order of $\log Z(F_{\mathrm{2SAT}})$ but also its fluctuations. Specifically, we show that the logarithm of the number of satisfying assignments converges to a Gaussian throughout the satisfiable regime, which is the first central limit theorem ('CLT') of this type for any random CSP.

$^1$ The class #P comprises the counting problems that ask for the number of solutions of a given NP decision problem. A problem is said to be #P-hard if it is at least as hard as any problem in the class #P, i.e., any problem in #P can be reduced to it in polynomial time.

5.2 Main Result

In this section we state the first result of this thesis [23]. Let $\mathcal{P}(\mathbb{R}^2)$ be the set of all (Borel) probability measures on $\mathbb{R}^2$. For $0 < d < 2$ and $0 \le t \le 1$ we define an operator
\[ \mathrm{logBP}^{\otimes}_{d,t} : \mathcal{P}(\mathbb{R}^2) \to \mathcal{P}(\mathbb{R}^2), \qquad \rho \mapsto \hat\rho = \mathrm{logBP}^{\otimes}_{d,t}(\rho), \tag{5.2.1} \]
as follows. Let $(\boldsymbol\xi_{\rho,i})_{i\ge1}$, $(\boldsymbol\xi'_{\rho,i})_{i\ge1}$, $(\boldsymbol\xi''_{\rho,i})_{i\ge1}$ with
\[ \boldsymbol\xi_{\rho,i} = \begin{pmatrix} \boldsymbol\xi_{\rho,i,1} \\ \boldsymbol\xi_{\rho,i,2} \end{pmatrix}, \qquad \boldsymbol\xi'_{\rho,i} = \begin{pmatrix} \boldsymbol\xi'_{\rho,i,1} \\ \boldsymbol\xi'_{\rho,i,2} \end{pmatrix}, \qquad \boldsymbol\xi''_{\rho,i} = \begin{pmatrix} \boldsymbol\xi''_{\rho,i,1} \\ \boldsymbol\xi''_{\rho,i,2} \end{pmatrix} \]
be random vectors with distribution $\rho$, let $\boldsymbol{d} \stackrel{d}{=} \mathrm{Po}(td)$ and $\boldsymbol{d}', \boldsymbol{d}'' \stackrel{d}{=} \mathrm{Po}((1-t)d)$, and let $\boldsymbol{s}_i, \boldsymbol{s}'_i, \boldsymbol{s}''_i, \boldsymbol{r}_i, \boldsymbol{r}'_i, \boldsymbol{r}''_i$ for $i \ge 1$ be uniformly random on $\{\pm 1\}$, all mutually independent. Then $\hat\rho$ is the distribution of the vector
\[ \begin{pmatrix} \sum_{i=1}^{\boldsymbol{d}} \boldsymbol{s}_i \log\left( \frac12 \left( 1 + \boldsymbol{r}_i \tanh(\boldsymbol\xi_{\rho,i,1}/2) \right) \right) + \sum_{i=1}^{\boldsymbol{d}'} \boldsymbol{s}'_i \log\left( \frac12 \left( 1 + \boldsymbol{r}'_i \tanh(\boldsymbol\xi'_{\rho,i,1}/2) \right) \right) \\[1ex] \sum_{i=1}^{\boldsymbol{d}} \boldsymbol{s}_i \log\left( \frac12 \left( 1 + \boldsymbol{r}_i \tanh(\boldsymbol\xi_{\rho,i,2}/2) \right) \right) + \sum_{i=1}^{\boldsymbol{d}''} \boldsymbol{s}''_i \log\left( \frac12 \left( 1 + \boldsymbol{r}''_i \tanh(\boldsymbol\xi''_{\rho,i,2}/2) \right) \right) \end{pmatrix} \in \mathbb{R}^2. \]

Figure 5.1: Numerical approximations to the function $\varphi(d)$ from (5.1.1) (red) and the variance $\eta(d)^2$ from (5.2.5) (green), plotted against $d \in [0,2]$. The black dashed line is the first moment bound $d \mapsto \log 2 + \frac{d}{2}\log(3/4)$, whereas the purple dashed line is the second moment bound. (Figure 1, [23])

In addition, define a function $\mathcal{B}^{\otimes}_{d,t} : \mathcal{P}(\mathbb{R}^2) \to (0, \infty]$ by letting
\[ \mathcal{B}^{\otimes}_{d,t}(\rho) = \mathbb{E}\left[ \prod_{h=1}^{2} \log\left( 1 - \frac14 \left( 1 + \boldsymbol{r}_1 \tanh(\boldsymbol\xi_{\rho,1,h}/2) \right)\left( 1 + \boldsymbol{r}_2 \tanh(\boldsymbol\xi_{\rho,2,h}/2) \right) \right) \right]. \tag{5.2.2} \]
The main theorem establishes a CLT for the logarithm of the number of solutions of $F_{\mathrm{2SAT}}$, with a standard deviation that is closely connected to the function $\mathcal{B}^{\otimes}_{d,t}$:

Theorem 5.2.1 (Theorem 1.1, [23]). For any $0 < d < 2$, $t \in [0,1]$ there exists a unique probability measure $\rho_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ such that
\[ \rho_{d,t} = \mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t}) \quad \text{and} \quad \int_{\mathbb{R}^2} \|\xi\|_2^2 \, \mathrm{d}\rho_{d,t}(\xi) < \infty. \tag{5.2.3} \]
Furthermore,
\[ \lim_{n\to\infty} \frac{\log Z(F_{\mathrm{2SAT}}) - \mathbb{E}[\log Z(F_{\mathrm{2SAT}}) \mid Z(F_{\mathrm{2SAT}}) > 0]}{\sqrt{m}} = \Gamma_{\eta(d)} \quad \text{in distribution, where} \tag{5.2.4} \]
\[ \eta(d)^2 = \int_0^1 \mathcal{B}^{\otimes}_{d,t}(\rho_{d,t}) \, \mathrm{d}t - \mathcal{B}^{\otimes}_{d,0}(\rho_{d,0}) \in (0, \infty). \tag{5.2.5} \]

Remark 5.2.2. Note that the conditioning on $Z(F_{\mathrm{2SAT}}) > 0$ is necessary in (5.2.4), because even for $d < 2$ the formula $F_{\mathrm{2SAT}}$ is unsatisfiable with probability $\Omega(n^{-1})$, in which case $\log Z(F_{\mathrm{2SAT}}) = -\infty$. Moreover, the $L^2$-bound in (5.2.3) ensures that the integral in (5.2.5) is well-defined. Finally, (5.2.4) implies that
\[ \mathbb{P}\left[ \log Z(F_{\mathrm{2SAT}}) - \mathbb{E}[\log Z(F_{\mathrm{2SAT}}) \mid Z(F_{\mathrm{2SAT}}) > 0] < z\sqrt{m} \right] \sim \mathbb{P}\left[ \Gamma_{\eta(d)} < z \right] \qquad (z \in \mathbb{R}). \tag{5.2.6} \]

So, the first result of this thesis [23] addresses two questions.
Question 1. How can the asymptotic normality in (5.2.4) be shown?
Question 2. How can the variance in (5.2.5) be calculated effectively?

Evaluating Standard Deviation. Towards answering the second question, we note that since the proof of the uniqueness of the stochastic fixed point $\rho_{d,t}$ from (5.2.3) is based on the contraction method, a fixed point iteration converges rapidly.
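Such a fixed point iteration can be sketched on a finite population of samples. The code below is a rough illustration, not the procedure used in [23]: it represents $\rho$ by a list of pairs and repeatedly replaces it by samples from $\mathrm{logBP}^{\otimes}_{d,t}$ applied to its empirical distribution. The function names, the population size, the number of rounds, the all-zero starting population and the clamp that guards against $\log 0$ in floating point are all assumptions made for this example.

```python
import math
import random

def poisson(lam, rng):
    """Sample Po(lam) by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= threshold:
            return k - 1

def log_term(r, xi):
    """log((1 + r * tanh(xi / 2)) / 2), clamped away from log(0).

    The clamp is a numerical safeguard added for this sketch only.
    """
    return math.log(max(0.5 * (1.0 + r * math.tanh(xi / 2.0)), 1e-300))

def sample_image(pop, d, t, rng):
    """One sample from logBP_{d,t}(rho), with rho the empirical law of pop."""
    x1 = x2 = 0.0
    # Shared terms: the same sign s, the same r and the same vector xi
    # enter both coordinates.
    for _ in range(poisson(t * d, rng)):
        s, r = rng.choice((-1, 1)), rng.choice((-1, 1))
        xi = rng.choice(pop)
        x1 += s * log_term(r, xi[0])
        x2 += s * log_term(r, xi[1])
    # Terms private to the first coordinate.
    for _ in range(poisson((1 - t) * d, rng)):
        s, r = rng.choice((-1, 1)), rng.choice((-1, 1))
        x1 += s * log_term(r, rng.choice(pop)[0])
    # Terms private to the second coordinate.
    for _ in range(poisson((1 - t) * d, rng)):
        s, r = rng.choice((-1, 1)), rng.choice((-1, 1))
        x2 += s * log_term(r, rng.choice(pop)[1])
    return (x1, x2)

def population_dynamics(d, t, size=2000, rounds=30, seed=0):
    """Iterate the operator on a population that starts at the point mass in (0, 0)."""
    rng = random.Random(seed)
    pop = [(0.0, 0.0)] * size
    for _ in range(rounds):
        pop = [sample_image(pop, d, t, rng) for _ in range(size)]
    return pop

if __name__ == "__main__":
    pop = population_dynamics(d=1.5, t=0.5)
    print(sum(x * y for x, y in pop) / len(pop))  # empirical mixed moment
```

For $t = 1$ only the shared terms occur, so with this starting population the two coordinates of every sample coincide; for $t = 0$ there are no shared terms and the two coordinates evolve independently.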
In effect, for any $d, t$ a discrete distribution that approximates $\rho_{d,t}$ arbitrarily well (in Wasserstein distance) can be computed via a randomized heuristic algorithm called population dynamics [66, Chapter 14]. Since $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})$ varies continuously in $d$ and $t$, $\eta(d)^2$ can thus be approximated within any desired accuracy; see Figure 5.1.

Study Fixed Point $\rho_{d,t}$. The distribution of each coordinate of the fixed point has been studied by Müller, Neininger and Zhu in [86]. It is shown that when $d \le 1$, the corresponding measure is purely discrete. After applying the transformation $\frac{1 + \tanh(\cdot/2)}{2}$, the support of the measure consists of rational numbers in $(0,1)$ for all $0 < d < 2$. Moreover, when $d \in (1,2)$, the measure acquires a continuous part.

5.3 Proof Strategy

The main hurdle towards the proof of Theorem 5.2.1 is to compute the variance of $\log Z(F_{\mathrm{2SAT}})$ given satisfiability. The key idea, inspired by spin glass theory [25] but new to random CSPs, is to count the joint number of satisfying assignments of two correlated random formulas. Once this is accomplished, Theorem 5.2.1 will follow from a general martingale central limit theorem. To set the stage, we first revisit the method of moments, the reasons why it fails on random 2-SAT, and the combinatorial interpretation of the law of large numbers (5.1.1).

5.3.1 Method of Moments fails

The default approach to estimating the number of solutions to a random CSP is the venerable second moment method [6]. Its core idea is to show that the second moment of the number of solutions is of the same order as the square of the expected number of solutions. If so, then the moment computation together with small subgraph conditioning yields the precise limiting distribution of the number of solutions [35, 97].
However, this approach works only if the logarithm of the number of solutions superconcentrates around the logarithm of the expected number of solutions, which is not the case in random 2-SAT. In fact, a straightforward calculation yields
\[ \frac{1}{n} \log \mathbb{E}[Z(F_{\mathrm{2SAT}})] \sim \log 2 + \frac{d}{2} \log(3/4). \tag{5.3.1} \]
The formula on the r.h.s. is displayed as the black dashed line in Figure 5.1. As can be verified analytically, this line strictly exceeds the function $\varphi(d)$ from (5.1.1) for any $0 < d < 2$. Consequently, (5.1.1) implies that
\[ \log Z(F_{\mathrm{2SAT}}) \le \log \mathbb{E}[Z(F_{\mathrm{2SAT}})] - \Omega(n) \quad \text{w.h.p.} \]
In other words, the expected number of solutions $\mathbb{E}[Z(F_{\mathrm{2SAT}})]$ overshoots the typical number of solutions by an exponential factor w.h.p. (for details, see the discussion in [4, 7]). Rather than relying on the method of moments, Monasson and Zecchina [77] proposed a physics-inspired approach for estimating $\log Z(F_{\mathrm{2SAT}})$ using the Belief Propagation algorithm, which is discussed in Chapter 3 of this thesis. This approach was established rigorously by Achlioptas et al. [2].

5.3.2 BP Approximation

The Belief Propagation messages introduced in Chapter 3 are updated iteratively by an operator
\[ \mathrm{BP} : (\nu_{x\to a}, \nu_{a\to x})_{a, x\in\partial a} \mapsto (\hat\nu_{x\to a}, \hat\nu_{a\to x})_{a, x\in\partial a} = \mathrm{BP}\left( (\nu_{x\to a}, \nu_{a\to x})_{a, x\in\partial a} \right), \tag{5.3.2} \]
where, for $\partial a = \{x, y\}$, the updated messages $\hat\nu_{a\to x}(\pm1)$ are defined by
\[ \hat\nu_{a\to x}(\mathrm{sign}(x,a)) = \frac{1}{1 + \nu_{y\to a}(\mathrm{sign}(y,a))}, \qquad \hat\nu_{a\to x}(-\mathrm{sign}(x,a)) = \frac{\nu_{y\to a}(\mathrm{sign}(y,a))}{1 + \nu_{y\to a}(\mathrm{sign}(y,a))}. \tag{5.3.3} \]
Moreover, for a variable $x$ and a clause $a \in \partial x$ we define$^2$
\[ \hat\nu_{x\to a}(s) = \frac{\prod_{b\in\partial x\setminus\{a\}} \nu_{b\to x}(s)}{\prod_{b\in\partial x\setminus\{a\}} \nu_{b\to x}(1) + \prod_{b\in\partial x\setminus\{a\}} \nu_{b\to x}(-1)} \qquad (s \in \{\pm1\}). \tag{5.3.4} \]
The purpose of BP is to heuristically 'approximate' the marginal probabilities that a random satisfying assignment $\sigma = \sigma_{F_{\mathrm{2SAT}}}$ of $F_{\mathrm{2SAT}}$ will set a certain variable to a specific truth value. The 'approximation' given by the set $(\nu_{x\to a}, \nu_{a\to x})_{a, x\in\partial a}$ of messages reads
\[ \nu_x(s) = \frac{\prod_{b\in\partial x} \nu_{b\to x}(s)}{\prod_{b\in\partial x} \nu_{b\to x}(1) + \prod_{b\in\partial x} \nu_{b\to x}(-1)} \qquad (s \in \{\pm1\}). \tag{5.3.5} \]
Equation (5.3.5) suggests that the BP operator should be iterated until a fixed point is reached, that is, until $\hat\nu_{x\to a} = \nu_{x\to a}$ and $\hat\nu_{a\to x} = \nu_{a\to x}$ hold for all $x, a$. We then evaluate (5.3.5) and substitute the resulting marginals into a general expression known as the Bethe free entropy, which yields the BP approximation of $\log Z(F_{\mathrm{2SAT}})$.

$^2$ For the sake of tidiness, if the denominator in (5.3.4) vanishes we simply let $\hat\nu_{x\to a}(\pm1) = \frac12$.

This BP approximation is accurate when the bipartite graph induced by the clause-variable incidences of the 2-CNF $F_{\mathrm{2SAT}}$ is acyclic, but it may produce inaccurate results in the presence of cycles.

5.3.3 Towards calculating variance

The proof of the formula (5.1.1) combines the Gibbs uniqueness property discussed in Chapter 4 and the local convergence to the Galton-Watson tree with a coupling argument called the 'Aizenman-Sims-Starr scheme' [2]. Unfortunately, it is not clear how the order of the standard deviation of $\log Z(F_{\mathrm{2SAT}})$ could be derived along these lines, because the rate of convergence in the Gibbs uniqueness property deteriorates as $d$ approaches the satisfiability threshold. To overcome this challenge, we develop a combinatorial interpretation of $\log^2 Z(F_{\mathrm{2SAT}})$ by constructing, for any given integers $M, M' \ge 0$, a correlated pair $(F_1(M, M'), F_2(M, M'))$ of formulas on the variable set $V_n = \{x_1, \dots, x_n\}$ as follows. Let $(a_i)_{i\ge1}$, $(a'_i)_{i\ge1}$, $(a''_i)_{i\ge1}$ be sequences of mutually independent uniformly random clauses on $V_n$. Then
\[ F_1(M, M') = a_1 \wedge \dots \wedge a_M \wedge a'_1 \wedge \dots \wedge a'_{M'} \quad \text{and} \tag{5.3.6} \]
\[ F_2(M, M') = a_1 \wedge \dots \wedge a_M \wedge a''_1 \wedge \dots \wedge a''_{M'}. \tag{5.3.7} \]
Thus, the two formulas share the clauses $a_1, \dots, a_M$. Additionally, each contains another $M'$ independent clauses. In particular, $F_1(m, 0)$ and $F_2(m, 0)$ are identical, while $F_1(0, m)$ and $F_2(0, m)$ are independent.
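The construction (5.3.6) and (5.3.7) can be sketched as follows. This is a minimal illustration under the assumption that a uniformly random 2-clause is drawn on two distinct variables with independent uniform signs, in line with the $4\binom{n}{2}$ possible clauses mentioned in Section 5.1; the clause encoding and the function names are hypothetical.

```python
import random

def random_clause(n, rng):
    """Uniformly random 2-clause: two distinct variables with random signs."""
    x, y = rng.sample(range(n), 2)
    return ((x, rng.choice((-1, 1))), (y, rng.choice((-1, 1))))

def correlated_pair(n, M, M_prime, seed=0):
    """Return (F1(M, M'), F2(M, M')) as lists of clauses on n variables."""
    rng = random.Random(seed)
    shared = [random_clause(n, rng) for _ in range(M)]          # a_1, ..., a_M
    private1 = [random_clause(n, rng) for _ in range(M_prime)]  # a'_1, ..., a'_{M'}
    private2 = [random_clause(n, rng) for _ in range(M_prime)]  # a''_1, ..., a''_{M'}
    return shared + private1, shared + private2

if __name__ == "__main__":
    f1, f2 = correlated_pair(n=10, M=3, M_prime=4)
    print(f1[:3] == f2[:3])   # the shared clauses coincide
    print(len(f1), len(f2))   # both formulas have M + M' clauses
```

Moving along $M = 0, 1, \dots, m$ with $M' = m - M$ interpolates between two independent formulas and two identical ones, which is the path that the telescoping argument below follows.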
To compute the variance, assume for the moment that $F_1(M, m-M)$ and $F_2(M, m-M)$ are satisfiable for all $M$. Then we can write the telescoping sum
\[ \log Z(F_1(m,0)) \cdot \log Z(F_2(m,0)) - \log Z(F_1(0,m)) \cdot \log Z(F_2(0,m)) = \sum_{M=1}^{m} \Big[ \log Z(F_1(M, m-M)) \cdot \log Z(F_2(M, m-M)) - \log Z(F_1(M-1, m-M+1)) \cdot \log Z(F_2(M-1, m-M+1)) \Big]. \tag{5.3.8} \]
Clearly, if we could take the expectation on the l.h.s. of (5.3.8), we would obtain precisely the variance of $\log Z(F_{\mathrm{2SAT}})$: since $F_1(m,0)$ and $F_2(m,0)$ are identical, the first product has expectation $\mathbb{E}[\log^2 Z(F_{\mathrm{2SAT}})]$, and since $F_1(0,m)$ and $F_2(0,m)$ are independent, the second has expectation $\mathbb{E}[\log Z(F_{\mathrm{2SAT}})]^2$. However, we cannot just take the expectation of (5.3.8), because some $F_h(M, m-M)$, $h = 1, 2$, may be unsatisfiable, potentially leading to occurrences of $\log 0$. To address this issue we replace $\log Z(F_{\mathrm{2SAT}})$ with a more tractable random variable that has the same limiting distribution and whose construction is based on 'Unit Clause Propagation', discussed in Chapter 3, Section 3.2.3. Thus we obtain a pruned formula $\hat F_{\mathrm{2SAT}}$ from the original 2-CNF $F_{\mathrm{2SAT}}$, and the following fact can be verified (for the proof see [23]).

Fact 5.3.1 (Fact 2.2, [23]). For any 2-CNF $F_{\mathrm{2SAT}}$, the pruned 2-CNF $\hat F_{\mathrm{2SAT}}$ is satisfiable.

Note that even if $F_{\mathrm{2SAT}}$ is satisfiable, the number $Z(\hat F_{\mathrm{2SAT}})$ of satisfying assignments of $\hat F_{\mathrm{2SAT}}$ could dramatically exceed $Z(F_{\mathrm{2SAT}})$, as the pruned formula $\hat F_{\mathrm{2SAT}}$ may have far fewer clauses than the original formula $F_{\mathrm{2SAT}}$. However, the following proposition shows that on a random formula the impact of pruning is modest.

Proposition 5.3.2 (Proposition 2.3, [23]). With probability $1 - o(n^{-1/2})$,
\[ \left| \log Z(\hat F_{\mathrm{2SAT}}) - \log Z(F_{\mathrm{2SAT}}) \right| \le n^{1/3}. \]

Figure 5.2: An illustration of the correlated GW-tree $T^{\otimes}$ (Figure 1, [23])

Since the error bound $n^{1/3}$ from Proposition 5.3.2 is of smaller order than the normalization $\sqrt{m}$ in (5.2.4), it suffices to establish a CLT for $\log Z(\hat F_{\mathrm{2SAT}})$, the logarithm of the number of satisfying assignments of the pruned formula. Revisiting the telescoping sum (5.3.8), we obtain Lemma 5.3.3 below, which expresses the variance as a sum of local changes.
For example, $F_1(M, m-M)$ is obtained from $F_1(M-1, m-M)$ by adding a single random clause, namely $a_M$. Moreover, only a few clauses are pruned from random formulas w.h.p. The variance $\mathrm{Var}\left( \log Z(\hat F_{\mathrm{2SAT}}) \right)$ can therefore be expanded as follows.

Lemma 5.3.3 (Lemma 2.4, [23]). Let
\[ \Delta(M) = \mathbb{E}\left[ \log\left( \frac{Z(\hat F_1(M, m-M))}{Z(\hat F_1(M-1, m-M))} \right) \cdot \log\left( \frac{Z(\hat F_2(M, m-M))}{Z(\hat F_2(M-1, m-M))} \right) \right], \tag{5.3.9} \]
\[ \Delta'(M) = \mathbb{E}\left[ \log\left( \frac{Z(\hat F_1(M-1, m-M+1))}{Z(\hat F_1(M-1, m-M))} \right) \cdot \log\left( \frac{Z(\hat F_2(M-1, m-M+1))}{Z(\hat F_2(M-1, m-M))} \right) \right]. \tag{5.3.10} \]
Then
\[ \mathrm{Var}\left[ \log Z(\hat F_{\mathrm{2SAT}}) \right] = \sum_{M=1}^{m} \left( \Delta(M) - \Delta'(M) \right). \]

For the analysis of the correlated formulas we need to evaluate the following expressions.

Proposition 5.3.4 (Proposition 2.5, [23]). Let $1 \le M \le m$. Then
\[ \frac{Z(\hat F_h(M, m-M))}{Z(\hat F_h(M-1, m-M))} = 1 - \prod_{y \in \partial a_M} \mathbb{P}\left[ \sigma_y \ne \mathrm{sign}(y, a_M) \mid \hat F_h(M-1, m-M), a_M \right] + o(1) \qquad (h = 1, 2), \]
\[ \frac{Z(\hat F_1(M-1, m-M+1))}{Z(\hat F_1(M-1, m-M))} = 1 - \prod_{y \in \partial a'_{m-M+1}} \mathbb{P}\left[ \sigma_y \ne \mathrm{sign}(y, a'_{m-M+1}) \mid \hat F_1(M-1, m-M), a'_{m-M+1} \right] + o(1), \]
\[ \frac{Z(\hat F_2(M-1, m-M+1))}{Z(\hat F_2(M-1, m-M))} = 1 - \prod_{y \in \partial a''_{m-M+1}} \mathbb{P}\left[ \sigma_y \ne \mathrm{sign}(y, a''_{m-M+1}) \mid \hat F_2(M-1, m-M), a''_{m-M+1} \right] + o(1). \]

We construct a Galton-Watson tree $T^{\otimes}$ that approximates the joint distribution of the local structure of the pair $(\hat F_1(M-1, m-M), \hat F_2(M-1, m-M))$. In Figure 5.2, shared variables and clauses are indicated in red, 1-distinct variables and clauses in green, and 2-distinct ones in blue (for more details refer to [23]). From $T^{\otimes}$ we extract a pair $(T_1, T_2)$ of correlated random trees. Specifically, $T_h$ is obtained from $T^{\otimes}$ by deleting all $(3-h)$-distinct variables and clauses. Hence, the parameter $t$ determines how 'similar' $T_1$ and $T_2$ are. If we generate a pair of random formulas and draw a uniformly random pair of satisfying assignments, the joint distribution of the values assigned to any of the $n$ variables can be visualized as a heatmap (shown in Figure 5.3): almost independent formulas on the left and highly correlated formulas on the right.
Figure 5.3: Marginal distribution on two correlated formulas for d = 0.9 and M = 0.1m, 0.5m, 0.9m (Figure 2, [23])

With a pair of correlated formulas in hand, the next step is to run BP on the random trees $(T_1, T_2)$ to find the joint distribution of the truth values $\sigma_{T_1^{(2\ell)}, o}$, $\sigma_{T_2^{(2\ell)}, o}$ assigned to the root $o$. Fortunately, due to the Markovian nature of the Galton-Watson tree $T^{\otimes}$, the bottom-up BP computation on a random tree can be expressed by a fixed point iteration on the space of probability distributions on $\mathbb{R}^2$. The operator $\mathrm{logBP}^{\otimes}_{d,t}$ expresses the updates of the log-likelihood ratios of the BP messages from (5.3.3) and (5.3.4). Thus the following statements hold.

Proposition 5.3.5 (Proposition 2.8, [23]). There exists a unique $\rho_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ that satisfies (5.2.3), and $\lim_{\ell\to\infty} \rho^{(\ell)}_{d,t} = \rho_{d,t}$ weakly.

Corollary 5.3.6 (Corollary 2.11, [23]). With $\eta(d)^2$ from (5.2.5) we have $\eta(d) > 0$ and $\mathrm{Var} \log Z(\hat F_{\mathrm{2SAT}}) \sim m\,\eta(d)^2$.

Since the proof of Proposition 5.3.5 is based on a contraction argument, for any $d, t$ the distribution $\rho_{d,t}$ can be approximated effectively within any given accuracy via a fixed point iteration. The contraction argument also shows that the functional $\mathcal{B}^{\otimes}_{d,t}$ takes a finite value at $\rho_{d,t}$ for any $d \in (0,2)$ and $t \in [0,1]$, which implies the finiteness of the variance.

Lemma 5.3.7 (Lemma 10.1, [23]). For any $d \in (0,2)$ and $t \in [0,1]$, $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t}) < \infty$. Moreover, for any $d \in (0,2)$, $\eta(d)^2 < \infty$.

Finally, along with Proposition 5.3.4, the BP arguments on correlated formulas give us the variance of $\log Z(\hat F_{\mathrm{2SAT}})$.

5.4 Establishing the Central Limit Theorem

Having calculated the variance, we set up a filtration $(\mathcal{F}_{n,M})_{0 \le M \le m_n}$ by letting $\mathcal{F}_{n,M}$ be the $\sigma$-algebra generated by $a_1, \dots, a_M$. The conditional expectations
\[ Z_{n,M} = m^{-1/2}\, \mathbb{E}\left[ \log Z(\hat F_{\mathrm{2SAT}}) \mid \mathcal{F}_{n,M} \right] \tag{5.4.1} \]
then form a Doob martingale. Let $X_{n,M} = Z_{n,M} - Z_{n,M-1}$ be the martingale differences.
Proposition 5.4.1 (Proposition 2.12, [23]). For all $0 < d < 2$ the martingale (5.4.1) satisfies
\[
\lim_{n \to \infty} \mathbb{E}\left[ \max_{1 \le M \le m} |X_{n,M}| \right] = 0 \quad \text{and} \tag{5.4.2}
\]
\[
\lim_{n \to \infty} \mathbb{E}\left| \eta(d)^2 - \sum_{M=1}^{m} X_{n,M}^2 \right| = 0. \tag{5.4.3}
\]

These conditions are verified in [23] with the help of the pruning argument. We conclude this chapter by deriving its main result from the following general martingale central limit theorem, which is a special case of [55, Theorem 3.2].

Theorem 5.4.2 ([55, Theorem 3.2]). Let $(Z_{n,i}, \mathcal{F}_{n,i})_{0 \le i \le m_n,\, n \ge 1}$ be a zero-mean, square-integrable martingale array with differences $X_{n,i} = Z_{n,i} - Z_{n,i-1}$ for $1 \le i \le m_n$. Assume that there exists a constant $\eta^2$ such that
\[
\lim_{n \to \infty} \max_{1 \le i \le m_n} |X_{n,i}| = 0 \quad \text{in probability,} \tag{5.4.4}
\]
\[
\lim_{n \to \infty} \sum_{i=1}^{m_n} X_{n,i}^2 = \eta^2 \quad \text{in probability,} \tag{5.4.5}
\]
\[
\mathbb{E}\left[ \max_{1 \le i \le m_n} X_{n,i}^2 \right] \text{ is bounded in } n. \tag{5.4.6}
\]
Then $Z_{n,m_n}$ converges in distribution to a Gaussian distribution with mean zero and variance $\eta^2$.

Observe that Proposition 5.4.1 directly implies the conditions of Theorem 5.4.2. Lastly, regarding the finiteness and positivity of the variance, Lemma 5.3.7 guarantees that $\eta(d) < \infty$, while Corollary 5.3.6 shows that $\eta(d) > 0$.

Based on: Belief Propagation Guided Decimation on Random k-XORSAT [22]
Arnab Chatterjee, Amin Coja-Oghlan, Mihyung Kang, Lena Krieg, Maurice Rolvien, Gregory Sorkin
Proc. 52nd ICALP (2025)

6 Performance of BPGD on random k-XORSAT

“BPGD enhances conventional BP by sequentially fixing (decimating) variable nodes based on their belief values and reducing the solution space.” – Masoumeh Alinia et al.

Having established in the previous chapter that a central limit theorem holds for the number of solutions of random 2-SAT formulas, we now shift our attention to another random satisfiability problem, namely random k-XORSAT, one of the simplest examples of a random constraint satisfaction problem exhibiting a sharp phase transition.
More precisely, we analyze the performance of Belief Propagation Guided Decimation (‘BPGD’), introduced in Chapter 3, thereby mathematically verifying the heuristic work of Ricci-Tersenghi and Semerjian [96]. In addition, we study a thought experiment called the ‘decimation process’ (also introduced in Chapter 3), for which we identify a (non-)reconstruction and a condensation phase transition. We begin the chapter with some motivation and background.

6.1 Motivation and History

Random k-XORSAT exhibits many features common to other intensely studied random CSPs, such as random k-SAT. At the same time, random k-XORSAT is mathematically more amenable than, say, random k-SAT, because a XORSAT instance translates into a linear system over $\mathbb{F}_2$, as the XOR operation is equivalent to addition modulo two. In addition, the algebraic nature of the problem induces strong symmetry properties that simplify its study [12].

Since the early 2000s, combinatorics as well as statistical physics have contributed intriguing ‘predictions’ on different random CSPs. In 2008 Ricci-Tersenghi and Semerjian [96] put forward a heuristic analysis of BPGD on both random k-SAT and k-XORSAT. Later, Coja-Oghlan and Pachon-Pinzon analyzed both the ‘decimation process’ and ‘BPGD’ rigorously on random k-SAT [28, 33]; because random k-SAT lacks inherent symmetry, they had to assume that the clause length k is sufficiently large. In a recent paper [102], Yung (2024) undertook a first step towards the rigorous analysis of BPGD on random k-XORSAT. However, Yung’s analysis turns out not to be tight. Specifically, apart from requiring spurious lower bounds on the clause length k, Yung’s results do not quite establish the precise connection between the decimation process and the performance of BPGD. One reason for this is that [102] relies on ‘annealed’ techniques, i.e., essentially moment computations.
Here we instead harness ‘quenched’ arguments to determine the success probability of BPGD and to make a precise connection between BPGD and the decimation process. The next section provides a brief overview of the main problems addressed in this work along with the results.

6.2 Problem Statement and Results

Let $F_{\mathrm{XOR}} = F(n, d, k)$ be a random k-XORSAT formula with variables $x_1, \ldots, x_n$ and $m$ random clauses of length $k$, where $m \stackrel{d}{=} \mathrm{Po}(dn/k)$. The $m$ clauses are drawn uniformly and independently out of the set of all $2^k \binom{n}{k}$ possibilities. Thus, $d > 0$ equals the average number of clauses in which a given variable $x_i$ appears. Moreover, every clause of $F_{\mathrm{XOR}}$ is an XOR of precisely $k$ distinct variables with $k \ge 3$, each of which may or may not come with a negation sign.

Formally, we are handed a number of independent random constraints (clauses) $c_i$ of the type
\[
c_i = y_{i1} \ \mathrm{XOR} \ \cdots \ \mathrm{XOR} \ y_{ik},
\]
where each $y_{ij}$ is either one of the $n$ available Boolean variables $x_1, \ldots, x_n$ or one of their negations $\neg x_1, \ldots, \neg x_n$. Since Boolean XOR boils down to addition over $\mathbb{F}_2$, this problem can be rephrased as the full rank problem for the random matrix $A$ with $q = 2$ and the row weight $k$ fixed to a deterministic value. Furthermore, the random negation patterns of the constraints amount to choosing a random right-hand side vector $y$ for which we are to solve $Ax = y$.

Let $A$ be the matrix representation of a random k-XORSAT formula $F_{\mathrm{XOR}}$. In addition to the matrix $A$, define $A' = A'(\theta)$ by adding $\theta n$ ($0 \le \theta \le 1$) new rows, each with exactly a single one, at the bottom of $A$. Equivalently, the Tanner graph $G'$ of $A'$ is obtained by adding $\mathrm{Po}(\lambda n)$ unary check nodes to the Tanner graph $G$ of $A$, where $\lambda = -\log(1-\theta)$. This process is called ‘pinning’; it mostly serves to remove ‘short linear relations’. Figure 6.1 gives a rough sketch of the matrices $A$ and $A'$ together with the Tanner graphs $G$ and $G'$. The second result of this thesis [22] addresses two questions.

Question 1.
How does the geometry of the solution space change when ‘pinning’ occurs?

Question 2. Establish a link between the performance of the BPGD algorithm and the phase transitions of the decimation process.

Figure 6.1: The matrices $A$ and $A'$ corresponding to the Tanner graphs $G$ and $G'$.

6.2.1 Analysis of BPGD

In order to state the main results we need to introduce a few threshold values. To this end, given $d, k$ and a real parameter $\lambda \ge 0$, consider the probability generating functions $D(z)$ and $K(z)$ of two random variables: $D$, the degree of a variable node, which is a Poisson random variable with mean $\lambda + d$, and $K$, the degree of a check node, which follows a two-point distribution on $\{1, k\}$ (there are two types of check nodes, clauses of degree $k$ and unit clauses). They are given by
\[
D(z) = \exp((\lambda + d)(z - 1)), \qquad K(z) = \frac{k\lambda}{k\lambda + d}\, z + \frac{d}{k\lambda + d}\, z^k. \tag{6.2.1}
\]

Definition 6.2.1. The Bethe free entropy $\Phi$ of the matrix $A'$ is defined by
\[
\Phi(z) = D\left( 1 - \frac{K'(z)}{K'(1)} \right) - \frac{D'(1)}{K'(1)} \left( 1 - K(z) - (1 - z) K'(z) \right).
\]
Also, consider the function
\[
\varphi(z) = 1 - D'\left( 1 - K'(z)/K'(1) \right) / D'(1).
\]

Remark 6.2.2.
• $\Phi'(z) = D'(1) K''(z) (\varphi(z) - z) / K'(1)$.
• Hence the stationary points of $\Phi$ coincide with the fixed points of $\varphi$, as verified in [22].

Substituting the specific distributions from (6.2.1), we get the following expressions for $\varphi(z)$ and $\Phi(z)$:
\[
\varphi_{d,k,\lambda} : [0,1] \to [0,1], \quad z \mapsto 1 - \exp\left( -\lambda - d z^{k-1} \right), \tag{6.2.2}
\]
\[
\Phi_{d,k,\lambda} : [0,1] \to \mathbb{R}, \quad z \mapsto \exp\left( -\lambda - d z^{k-1} \right) - \frac{d(k-1)}{k} z^k + d z^{k-1} - \frac{d}{k}. \tag{6.2.3}
\]
Let $\alpha_*(\lambda) = \alpha_*(d,k,\lambda) \in [0,1]$ be the smallest and $\alpha^*(\lambda) = \alpha^*(d,k,\lambda) \ge \alpha_*(d,k,\lambda)$ the largest fixed point of $\varphi_{d,k,\lambda}$ in $[0,1]$. Figure 6.2 visualizes $\Phi_{d,k,\lambda}(z)$ for different values of $\lambda$.

Figure 6.2: $\Phi_{d,k,\lambda}$ for $k = 3$ and $d = 2.4$, for $\lambda$ from 0 to 0.3 (maximum at $z = 0$) and from 0.4 to 0.9 (Figure 1, [22]). [Plot: $z \in [0,1]$ on the horizontal axis, $\Phi_{d,k,\lambda}$ on the vertical axis.]

In addition, we define a few threshold values of $d$:
\[
d_{\min}(k) = \left( \frac{k-1}{k-2} \right)^{k-2}, \tag{6.2.4}
\]
\[
d_{\mathrm{core}}(k) = \sup\left\{ d > 0 : \alpha^*(0) = 0 \right\}, \tag{6.2.5}
\]
\[
d_{\mathrm{SAT}}(k) = \sup\left\{ d > 0 : \Phi_{d,k,0}(\alpha^*(0)) \le \Phi_{d,k,0}(0) \right\}. \tag{6.2.6}
\]
Here $d_{\mathrm{SAT}}(k)$ is the random k-XORSAT satisfiability threshold [12, 42, 93], and $d_{\mathrm{core}}(k)$ equals the threshold for the emergence of a giant 2-core within the k-uniform hypergraph induced by $F_{\mathrm{XOR}}$ [12, 75]. A bit of calculus reveals that $0 < d_{\min}(k) < d_{\mathrm{core}}(k) < d_{\mathrm{SAT}}(k) < k$. We can now state the second result of this thesis [22].

Theorem 6.2.3 (Theorem 1.1, [22]). Let $k \ge 3$.
(i) If $d < d_{\min}(k)$, then
\[
\lim_{n \to \infty} \mathbb{P}\left[ \mathrm{BPGD}(F_{\mathrm{XOR}}) \text{ finds a satisfying assignment} \right] = \exp\left( -\frac{d^2 (k-1)^2}{4} \int_0^1 \frac{z^{2k-4}(1-z)}{1 - d(k-1) z^{k-2}(1-z)} \, dz \right).
\]
(ii) If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$, then
\[
\mathbb{P}\left[ \mathrm{BPGD}(F_{\mathrm{XOR}}) \text{ finds a satisfying assignment} \right] = o(1).
\]

The theorem determines the precise clause-to-variable densities where BPGD succeeds or fails, and it mathematically verifies the heuristic work of Ricci-Tersenghi and Semerjian [96]. To be precise, in the ‘successful’ regime BPGD does not actually succeed with high probability, but with an explicit probability strictly between zero and one. The most significant ingredient towards turning the heuristic arguments from [96] into a rigorous proof is a formula for the nullity of the check matrix of the XORSAT instance $F_{\mathrm{DC},t}$ from the decimation process introduced in Chapter 3. The following proposition relates the matrix $A_t = A_{F_{\mathrm{DC},t}}$ to the function $\Phi_{d,k,\lambda}$ for the given $d$ and $\lambda$.

Proposition 6.2.4 (Proposition 2.6, [22]). $\lim_{n \to \infty} \mathrm{nul}\, A_t = \Phi_{d,k,\lambda}(\alpha_{\max})$ in probability.

6.2.2 Phase Transition of Decimation process

In addition to the success probability of the BPGD algorithm, we also mathematically confirm the phase transitions predicted heuristically by Ricci-Tersenghi and Semerjian [96] and investigate how they relate to the performance of BPGD.
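The next two theorems identify the precise regimes of $d, t$ in which the different phase transitions of the decimation process occur. Before stating the theorems, let us introduce a few values of $\lambda$. (The corresponding values $\theta_*, \theta^*, \theta_{\mathrm{cond}}$ follow from $\lambda = -\log(1-\theta)$.) Here, as in [22], $z_* < z^*$ denote the two solutions in $(0,1)$ of $d(k-1) z^{k-2}(1-z) = 1$, which exist for $d > d_{\min}(k)$. Let
\[
\lambda_* = \lambda_*(d,k) = -\log(1 - z_*) - \frac{z_*}{(k-1)(1 - z_*)} > \lambda^*,
\]
where
\[
\lambda^* = \lambda^*(d,k) = \max\left\{ 0, \ -\log(1 - z^*) - \frac{z^*}{(k-1)(1 - z^*)} \right\} \ge 0.
\]
Additionally, let $\lambda_{\mathrm{cond}}(d,k)$ be the solution to the ODE
\[
\frac{\partial \lambda_{\mathrm{cond}}(d,k)}{\partial d} = -\frac{\alpha^*(\lambda_{\mathrm{cond}}(d,k))^k - \alpha_*(\lambda_{\mathrm{cond}}(d,k))^k}{k \left( \alpha^*(\lambda_{\mathrm{cond}}(d,k)) - \alpha_*(\lambda_{\mathrm{cond}}(d,k)) \right)}, \qquad \lambda_{\mathrm{cond}}(d_{\mathrm{SAT}}(k), k) = 0. \tag{6.2.7}
\]
To be precise, while $\theta_{\mathrm{cond}}$ matches the predictions of [96], the ODE formula (6.2.7) for the threshold, which is easy to evaluate numerically, does not appear in [96]. Instead of the ODE formulation, [96] define $\lambda_{\mathrm{cond}}$ as the (unique) $\lambda \ge 0$ such that $\Phi(\alpha_*) = \Phi(\alpha^*)$; we showed in [22] that the two definitions are equivalent.

Theorem 6.2.5 (Theorem 1.2, [22]). Let $k \ge 3$ and let $0 \le t = t(n) \le n$ be a sequence such that $\lim_{n \to \infty} t/n = \theta \in (0,1)$.
(i) If $d < d_{\min}(k)$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.
(ii) If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta < \theta^*$ or $\theta > \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.
(iii) If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta^* < \theta < \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the reconstruction property w.h.p.

Recall that $\mu_{F_{\mathrm{DC},t}}$ denotes the BP ‘approximation’ of the correct marginal $\pi_{F_{\mathrm{DC},t}}$ of variable $x_{t+1}$ in the formula $F_{\mathrm{DC},t}$ created by the decimation process.

Theorem 6.2.6 (Theorem 1.3, [22]). Let $k \ge 3$ and let $0 \le t = t(n) \le n$ be a sequence such that $\lim_{n \to \infty} t/n = \theta \in (0,1)$.
(i) If $0 < d < d_{\min}(k)$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.
(ii) If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta_*$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.
(iii) If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta_{\mathrm{cond}} < \theta < \theta_*$, then $\mathbb{E}\left| \mu_{F_{\mathrm{DC},t}} - \pi_{F_{\mathrm{DC},t}} \right| = \Omega(1)$.

The upshot of Theorems 6.2.5–6.2.6 is that the relation between the accuracy of BP and reconstruction is subtle.

Remark 6.2.7.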
As long as $d < d_{\min}$, non-reconstruction holds throughout and the BP approximations are correct. But if $d_{\min} < d < d_{\mathrm{SAT}}$ and $\theta^* < \theta < \theta_{\mathrm{cond}}$, then Theorem 6.2.5 (iii) shows that reconstruction occurs. Nonetheless, Theorem 6.2.6 (ii) demonstrates that the BP approximations remain valid in this regime. By contrast, for $\theta_{\mathrm{cond}} < \theta < \theta_*$ we have non-reconstruction by Theorem 6.2.5 (ii), but Theorem 6.2.6 (iii) shows that BP misses its mark with a non-vanishing probability. Finally, for $\theta > \theta_*$ everything is in order once again, as BP regains its footing and non-reconstruction holds. Unfortunately, BPGD is unlikely to reach this happy state because the algorithm is bound to make numerous mistakes at times $t/n \in (\theta_{\mathrm{cond}}, \theta_*)$.

Figure 6.3 illustrates Theorems 6.2.5–6.2.6 by displaying the phase diagram in terms of $d$ and $\theta \sim t/n$ for $k = 3, 4, 5$.

Figure 6.3 description:
• Hatched area: the regime $\theta < \theta^*$ or $\theta > \theta_{\mathrm{cond}}$, where non-reconstruction holds.
• Non-hatched area: the regime $\theta^* < \theta < \theta_{\mathrm{cond}}$, where we have reconstruction.
• Blue area: the regime $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta_*$, where BP is correct.
• Orange area: the regime $\theta_{\mathrm{cond}} < \theta < \theta_*$, where BP is inaccurate.

Figure 6.3: The phase diagrams for (a) $k = 3$, (b) $k = 4$, (c) $k = 5$, with $d \in (d_{\min}, d_{\mathrm{SAT}})$ on the horizontal and $\theta$ on the vertical axis (Figure 3, [22]). [Each panel marks $d_{\min}$, $d_{\mathrm{core}}$, $d_{\mathrm{SAT}}$ on the $d$-axis and shows the curves $\theta^*$, $\theta_{\mathrm{cond}}$, $\theta_*$.]

6.3 Proof Strategy

Thanks to the half-integrality of the messages established in Fact 3.2.2 (Chapter 3), BP is equivalent to Warning Propagation (WP) on random k-XORSAT. Theorems 6.2.5–6.2.6 rely on counting the null variables in the WP algorithm. Recall that $\omega_{F,x \to a}, \omega_{F,a \to x}, \omega_{F,x} \in \{\mathtt{f}, \mathtt{u}, \mathtt{n}\}$ denote the WP limits from Chapter 3.
Furthermore, let $V_{\mathtt{f},\ell}(F)$, $V_{\mathtt{u},\ell}(F)$, $V_{\mathtt{n},\ell}(F)$ be the sets of variables with the respective mark after $\ell \ge 0$ iterations, and let $V_{\mathtt{f}}(F)$, $V_{\mathtt{u}}(F)$, $V_{\mathtt{n}}(F)$ be the sets of variables where the limit $\omega_{F,x}$ takes the respective value. The following statement traces WP on the random formula $F_{\mathrm{DC},t}$ produced by the decimation process.

Proposition 6.3.1 (Proposition 2.5, [22]). Let $\varepsilon > 0$ and assume that $d > 0$, $t = t(n) \sim \theta n$ satisfy one of the following conditions:
(i) $d < d_{\min}$, or
(ii) $d > d_{\min}$ and $\theta \notin \{\theta_*, \theta^*\}$.
Then there exists $\ell_0 = \ell_0(d, \theta, \varepsilon) > 0$ such that for any fixed $\ell \ge \ell_0$, with $\lambda = -\log(1-\theta)$, w.h.p. we have
\[
\left| t + |V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n \right| < \varepsilon n, \qquad
\left| t + |V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})| - (\alpha^* - \alpha_*) n \right| < \varepsilon n, \qquad
\left| V_{\mathtt{n}}(F_{\mathrm{DC},t}) \triangle V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \right| < \varepsilon n.
\]

Given this proposition, in order to investigate the accuracy of BP it suffices to compare the number of variables marked $\mathtt{n}$ by WP with the true marginals. The following corollary summarizes the result.

Corollary 6.3.2 (Corollary 2.9, [22]). For any $d, \theta$ the following statements are true.
(i) If $d < d_{\min}$, or $d > d_{\min}$ and $\theta < \theta_{\mathrm{cond}}$, or $d > d_{\min}$ and $\theta > \theta_*$, then $|V_0(F_{\mathrm{DC},t}) \triangle V_{\mathtt{n}}(F_{\mathrm{DC},t})| = o(n)$ w.h.p.
(ii) If $d > d_{\min}$ and $\theta_{\mathrm{cond}} < \theta < \theta_*$, then $|V_0(F_{\mathrm{DC},t}) \triangle V_{\mathtt{n}}(F_{\mathrm{DC},t})| = \Omega(n)$ w.h.p.

Corollary 6.3.2 directly implies Theorem 6.2.6, which in turn implies Theorem 6.2.3 (ii). For the (non-)reconstruction thresholds in Theorem 6.2.5 we need to investigate the conditional marginals given the values of the variables at a certain distance from $x_{t+1}$, as in the (non-)reconstruction property defined in Chapter 4. This is where the extra value $\mathtt{f}$ from the construction of WP enters.

Corollary 6.3.3 (Corollary 2.10, [22]). Assume that $d > d_{\min}$ and let $\varepsilon > 0$.
(i) If $\theta < \theta_{\mathrm{cond}}$, then for any fixed $\ell$ we have $|V_{\mathtt{f},\ell}(F_{\mathrm{DC},t}) \cap V_{0,\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p. (for the definition of $V_{0,\ell}$ see footnote 1).
(ii) If $\theta > \theta_{\mathrm{cond}}$, then there exists $\ell_0 = \ell_0(d, \theta, \varepsilon)$ such that for any fixed $\ell > \ell_0$ we have $|(V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \cup V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})) \triangle V_{0,\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p.
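The fixed points $\alpha_*$ and $\alpha^*$ that appear in Proposition 6.3.1 can be explored numerically. The following is a minimal sketch, not taken from [22]: it assumes only the formulas (6.2.2)–(6.2.4) and uses the fact that $\varphi_{d,k,\lambda}$ is increasing on $[0,1]$, so that iterating from 0 approaches the smallest fixed point and iterating from 1 approaches the largest. The function names (`d_min`, `phi`, `bethe`, `alpha_lower`, `alpha_upper`) are illustrative choices, not notation from the thesis.

```python
import math


def d_min(k):
    """Threshold (6.2.4): ((k-1)/(k-2))^(k-2), for k >= 3."""
    return ((k - 1) / (k - 2)) ** (k - 2)


def phi(z, d, k, lam):
    """The map (6.2.2): z -> 1 - exp(-lam - d z^(k-1))."""
    return -math.expm1(-lam - d * z ** (k - 1))


def bethe(z, d, k, lam):
    """The function (6.2.3)."""
    return (math.exp(-lam - d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)


def _iterate(d, k, lam, start, tol=1e-13, max_iter=1_000_000):
    """Monotone fixed point iteration of phi from the given start value."""
    z = start
    for _ in range(max_iter):
        nxt = phi(z, d, k, lam)
        if abs(nxt - z) < tol:
            return nxt
        z = nxt
    return z


def alpha_lower(d, k, theta):
    """Approximates the smallest fixed point, with lam = -log(1 - theta)."""
    return _iterate(d, k, -math.log1p(-theta), 0.0)


def alpha_upper(d, k, theta):
    """Approximates the largest fixed point, with lam = -log(1 - theta)."""
    return _iterate(d, k, -math.log1p(-theta), 1.0)
```

For instance, with $k = 3$, $\theta = 0$ and $d = 2.6$ the two iterations end at different fixed points (0 and a value near 0.84), whereas for $d$ below $d_{\min}(3) = 2$ they agree. Convergence slows down near the thresholds where $\varphi$ becomes tangent to the identity, so the tolerance and iteration cap are pragmatic choices only.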
Comparing the number of actually frozen variables with the ones marked $\mathtt{f}$ by WP, we obtain Theorem 6.2.5.

Turning to the success probability of BPGD, Corollary 6.3.2 directly implies that the BP approximations of the marginals are mostly correct for $d < d_{\min}$ on the formula $F_{\mathrm{DC},t}$ obtained by the decimation process. The difficulty in analyzing BPGD lies in proving that the estimates of the algorithm are not just mostly correct, but correct up to a bounded expected number of discrepancies over the entire execution of the algorithm. To prove this we combine the method of differential equations with a precise analysis of the sources of the remaining bounded number of discrepancies, which stem from the presence of short (i.e., bounded-length) cycles in the graph $G(F)$; we call these ‘toxic cycles’ (more details of the proof can be found in [22]).

Again due to the half-integrality Fact 3.2.2 on random k-XORSAT, BPGD boils down to a purely combinatorial algorithm called ‘Unit Clause Propagation’ (UCP); pseudocode of the UCP algorithm can be found in Chapter 3. We conclude this chapter by stating the following proposition (details can be found in [22]).

Proposition 6.3.4 (Proposition 6.1, [22]). We have
\[
\mathbb{P}\left[ \text{BPGD outputs a satisfying assignment of } F_{\mathrm{XOR}} \right] = \mathbb{P}\left[ \text{UCP outputs a satisfying assignment of } F_{\mathrm{XOR}} \right].
\]

This proposition implies that the success probability of BPGD established in Theorem 6.2.3 equals that of the UCP algorithm. Thus, the second result of the thesis holds for all clause lengths $k \ge 3$ and determines, depending on the regime of $d$, whether the BPGD and UCP algorithms succeed or fail. This substantially improves over Yung’s result [102], which requires lower bounds on the clause length ($k \ge 9$ for UCP and $k \ge 13$ for BPGD).
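To make the reduction to unit clause propagation concrete, the following is a minimal sketch of a UCP-style decimation on a k-XORSAT instance, together with a numerical evaluation of the limit in Theorem 6.2.3 (i). It is a simplified variant written for illustration and is not the pseudocode from Chapter 3: it assumes that negation signs have been folded into the right-hand side of each clause, that the number of clauses $m$ is passed in directly instead of being Poisson, and that unit clauses are propagated exhaustively after each assignment. All function names (`random_xorsat`, `ucp`, `bpgd_success_limit`) are hypothetical.

```python
import math
import random


def random_xorsat(n, m, k, rng):
    """m random XOR clauses on n variables, each on k distinct variables.

    A clause (vars, b) encodes x_{v1} + ... + x_{vk} = b over F_2
    (assumption: negations are already folded into b).
    """
    return [(rng.sample(range(n), k), rng.randrange(2)) for _ in range(m)]


def ucp(n, clauses, rng):
    """Assign x_0, ..., x_{n-1} in order, propagating unit clauses.

    Returns a satisfying assignment (list of bits), or None as soon as a
    contradiction (empty clause with right-hand side 1) appears.
    """
    members = [set(vs) for vs, _ in clauses]  # unassigned variables per clause
    rhs = [b for _, b in clauses]             # current right-hand sides
    occ = [[] for _ in range(n)]              # clauses containing each variable
    for ci, vs in enumerate(members):
        for v in vs:
            occ[v].append(ci)
    sigma = [None] * n

    def propagate(stack):
        while stack:
            v, b = stack.pop()
            if sigma[v] is not None:
                if sigma[v] != b:
                    return False
                continue
            sigma[v] = b
            for ci in occ[v]:
                members[ci].discard(v)
                rhs[ci] ^= b
                if not members[ci] and rhs[ci]:
                    return False
                if len(members[ci]) == 1:
                    (w,) = members[ci]
                    stack.append((w, rhs[ci]))  # unit clause forces w
        return True

    start = []
    for ci, vs in enumerate(members):         # units present in the input
        if not vs and rhs[ci]:
            return None
        if len(vs) == 1:
            (w,) = vs
            start.append((w, rhs[ci]))
    if not propagate(start):
        return None
    for v in range(n):
        # forced variables are already set; free ones get a fair coin
        if sigma[v] is None and not propagate([(v, rng.randrange(2))]):
            return None
    return sigma


def bpgd_success_limit(d, k, steps=20000):
    """Right-hand side of Theorem 6.2.3 (i) for d < d_min(k), Simpson's rule."""
    def f(z):
        return (z ** (2 * k - 4) * (1 - z)
                / (1 - d * (k - 1) * z ** (k - 2) * (1 - z)))
    h = 1.0 / steps
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return math.exp(-(d * (k - 1)) ** 2 / 4 * total * h / 3)
```

For $k = 3$ and $d = 1$ the integral can be evaluated in closed form as $\pi/8 - 1/4$, so the limiting success probability is $\exp(1/4 - \pi/8)$. One could compare this value with the empirical success frequency of `ucp` on instances from `random_xorsat` with $m \approx dn/k$; deviations due to finite $n$ and the fixed clause count are to be expected.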
Footnote 1: $V_{0,\ell}(F)$ is the set of variables $x_i$ such that $\sigma_i = 0$ for all $\sigma \in \ker A_F$ for which $\sigma_h = 0$ for all variables $x_h \in \partial^{\ell} x_i$.

Based on: The random k-SAT Gibbs Uniqueness Threshold revisited [21]
Arnab Chatterjee, Amin Coja-Oghlan, Catherine Greenhill, Vincent Pfenninger, Maurice Rolvien, Pavel Zakharov, Konstantinos Zampetakis
arXiv:2506.01359 (2025)

7 On the Gibbs Uniqueness in random k-SAT

“· · · the theory of Gibbs measures, which provides a very effective and flexible way to define collections of “locally dependent” random variables.” – Amir Dembo et al.

In the previous chapter we investigated the random k-XORSAT problem, analyzing the performance of BPGD and its connection to the phase transitions of the decimation process. We now turn to random k-SAT, one of the most extensively studied random constraint satisfaction problems. Beyond its central role in computational complexity, it provides a natural framework for exploring fundamental phenomena such as the number of satisfying assignments, clustering, reconstruction and the Gibbs uniqueness phase transition. In this chapter our focus is to establish rigorous lower bounds on the number of satisfying assignments of random k-SAT up to the Gibbs uniqueness threshold, matching the prediction of the statistical-physics ‘replica symmetric solution’ [77, 78].

7.1 Motivation and History

Since the 1990s, pinpointing the satisfiability threshold of random k-SAT, defined as the largest clause-to-variable density up to which satisfying assignments exist [6], has been a guiding theme of research in the area of random CSPs. For every $k \ge 3$ the physics-inspired ‘cavity method’ yields an exact prediction of the satisfiability threshold [72], but for ‘small’ $k \ge 3$ a rigorous proof remains a hard nut to crack. In statistical physics, one of the most important tasks is to determine the exact number of satisfying assignments (known as the ‘partition function’ in physics jargon) and, from it, the satisfiability threshold. Three prior contributions towards proving the physics predictions stand out. Firstly, Panchenko and Talagrand [90] proved that the physics formula is a rigorous upper bound, using a proof technique called the ‘interpolation method’. Secondly, Achlioptas et al. [2] proved the physics formula in the case $k = 2$, which is conceptually easier than $k \ge 3$. Thirdly, Montanari and Shah [83] determined, for all $k \ge 3$, the number of ‘good’ assignments that satisfy all but $o(n)$ clauses. Our paper verifies the number of satisfying assignments predicted by the physics ‘replica symmetric solution’ for any $k \ge 3$ up to the Gibbs uniqueness threshold. Moreover, we derive a lower bound on the Gibbs uniqueness threshold that improves significantly over the work of Montanari and Shah [83] for small $k \ge 3$.

7.2 Main Results

In this section we state the third result of this thesis [21]. Let $F_{k\mathrm{SAT}} = F_{d,k}(n)$ be the random k-CNF on $n$ Boolean variables $x_1, \ldots, x_n$ with $m \stackrel{d}{=} \mathrm{Po}(dn/k)$ clauses $a_1, \ldots, a_m$. As in random 2-SAT (Chapter 5) and random k-XORSAT (Chapter 6), the clauses $a_i$ are drawn independently and uniformly from the set of all $2^k \binom{n}{k}$ possible clauses with $k$ distinct variables. The parameter $d$ equals the expected number of clauses in which a given variable appears. Also, let $S(F_{k\mathrm{SAT}})$ be the set of satisfying assignments of $F_{k\mathrm{SAT}}$ and let $Z(F_{k\mathrm{SAT}}) = |S(F_{k\mathrm{SAT}})|$, i.e., the number of satisfying assignments of the random k-SAT formula $F_{k\mathrm{SAT}}$. The aims of this chapter are twofold:

(i).
To study the logarithm of the number of satisfying assignments of random k-SAT (the ‘partition function’ of statistical physics, as defined in Chapter 2), i.e., the quantity $\frac{1}{n} \log Z(F_{k\mathrm{SAT}})$ as $n \to \infty$. The ‘replica symmetric solution’ predicts this quantity in terms of the ‘Bethe free entropy’, a functional defined on probability measures $\pi \in \mathcal{P}(0,1)$.

(ii). In the regime where the above quantity is well-defined, to rigorously derive a lower bound on the Gibbs uniqueness threshold for any $k \ge 3$, which for small values of $k$ improves significantly over the work in [83].

7.2.1 Limit in probability of log-partition function in random k-SAT

Along with the Gibbs uniqueness property defined in Chapter 4, as a final preparation we need to explain the ‘replica symmetric solution’ from [77, 78]. This prediction comes in terms of a fixed point problem on the space $\mathcal{P}(0,1)$ of probability measures on the open unit interval. Consider the Belief Propagation operator
\[
\mathrm{BP}_{d,k} : \mathcal{P}(0,1) \to \mathcal{P}(0,1), \qquad \pi \mapsto \hat{\pi} = \mathrm{BP}_{d,k}(\pi) \tag{7.2.1}
\]
defined as follows. Let $d^+, d^- \stackrel{d}{=} \mathrm{Po}(d/2)$ be Poisson variables with expectation $d/2$. Moreover, let $(\mu_{\pi,i,j})_{i,j \ge 1}$ be a sequence of i.i.d. random variables, each with distribution $\pi$. All these random variables are mutually independent. Further, let
\[
\mu_{\pi,i} = 1 - \prod_{j=1}^{k-1} \mu_{\pi,i,j} \quad \text{for } i \ge 1, \qquad \text{and} \qquad
\hat{\mu}_{\pi} = \frac{\prod_{i=1}^{d^-} \mu_{\pi,2i-1}}{\prod_{i=1}^{d^-} \mu_{\pi,2i-1} + \prod_{i=1}^{d^+} \mu_{\pi,2i}}. \tag{7.2.2}
\]
Then $\hat{\pi}$ is the distribution of $\hat{\mu}_{\pi}$. Furthermore, define the Bethe free entropy
\[
\mathcal{B}_{d,k}(\pi) = \mathbb{E}\left[ \log\left( \prod_{i=1}^{d^-} \mu_{\pi,2i} + \prod_{i=1}^{d^+} \mu_{\pi,2i-1} \right) - \frac{d(k-1)}{k} \log\left( 1 - \prod_{j=1}^{k} \mu_{\pi,1,j} \right) \right], \tag{7.2.3}
\]
provided that the expectation on the r.h.s. exists.

Theorem 7.2.1 (Theorem 1.1, [21]). Let $k \ge 3$ and assume that $0 < d < d_{\mathrm{uniq}}(k)$. Then the weak limit
\[
\pi_{d,k} = \lim_{\ell \to \infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2}) \in \mathcal{P}(0,1) \tag{7.2.4}
\]
exists and
\[
\lim_{n \to \infty} \frac{1}{n} \log Z(F_{k\mathrm{SAT}}) = \mathcal{B}_{d,k}(\pi_{d,k}) \quad \text{in probability,} \tag{7.2.5}
\]
where $\mathrm{BP}^{\ell}_{d,k}$ is the $\ell$-fold application of the operator $\mathrm{BP}_{d,k}$ and $\delta_{1/2} \in \mathcal{P}(0,1)$ is the atom at $1/2$.

Although the formula (7.2.5) is not explicit, the proof of Theorem 7.2.1 reveals that the convergence to the weak limit $\pi_{d,k}$ occurs rapidly. In the next subsection we introduce a few threshold values of $d$ and provide an improved lower bound on the Gibbs uniqueness threshold for any $k \ge 3$.

7.2.2 Lower bound on Gibbs uniqueness

Before we dive into the second result of this chapter, we first introduce a few known threshold values of $d$, as well as our own, that are related to the number of satisfying assignments of random k-SAT. The title of this chapter raises a natural question: how can we determine the Gibbs uniqueness threshold $d_{\mathrm{uniq}}$ in random k-SAT? The best known result for $d_{\mathrm{uniq}}$ concerns the case $k = 2$, where it coincides with the value $d_{\mathrm{SAT}}$ for random 2-SAT. The precise value of $d_{\mathrm{uniq}}$ is not known for $k \ge 3$, but Montanari and Shah [83] proved that it is upper bounded by the pure literal threshold $d_{\mathrm{pure}}$ defined in [19, 74]:
\[
d_{\mathrm{uniq}}(k) \le d_{\mathrm{pure}}(k) = \min_{z > 0} \frac{z}{(1 - \exp(-z/2))^{k-1}}. \tag{7.2.6}
\]

Figure 7.1: Comparison of $\mathcal{B}_{d,k}(\pi_{d,k})$ with known bounds for $\lim_{n \to \infty} \frac{1}{n} \log Z(F_{k\mathrm{SAT}})$ for $k = 3$ [21].

Complementing the upper bound (7.2.6), Montanari and Shah derived a lower bound $d_{\mathrm{MS}}(k)$:
\[
d_{\mathrm{MS}}(k) = \sup\left\{ d > 0 : d(k-1) \left( 1 - \exp(-d/2)/4 \right) \left( 1 - \exp(-d/2)/2 \right)^{k-2} < 1 \right\} \le d_{\mathrm{uniq}}(k). \tag{7.2.7}
\]
Unfortunately, this bound is not tight even for $k = 2$. Moreover, their result only yields the number of ‘good’ assignments that satisfy all but $o(n)$ clauses, rather than the number of actual satisfying assignments. In the following theorem we derive a new lower bound $d_{\mathrm{our}}$ on the Gibbs uniqueness threshold $d_{\mathrm{uniq}}$, which applies to the number of actual satisfying assignments of random k-SAT.

Theorem 7.2.2 (Theorem 1.2, [21]). For all $k \ge 3$ we have
\[
d_{\mathrm{uniq}}(k) \ge d_{\mathrm{our}}(k) := \sup\left\{ d > 0 : \frac{d(k-1)}{2} \left( 1 - \exp(-d/2)/2 \right)^{k-2} < 1 \right\}. \tag{7.2.8}
\]

An easy calculation reveals that $d_{\mathrm{MS}}(k) < d_{\mathrm{our}}(k)$ for every $k \ge 2$. Besides, the best prior rigorous bounds on the number of satisfying assignments beyond the giant component threshold $d_{\mathrm{giant}} = 1/(k-1)$ stem from the first and second moment methods. The first moment bound reads
\[
\frac{1}{n} \log Z(F_{k\mathrm{SAT}}) \le \log 2 + \frac{d}{k} \log(1 - 2^{-k}) + o(1) \quad \text{w.h.p.} \tag{7.2.9}
\]
Moreover, Achlioptas and Peres [7] perform a second moment argument on the number of satisfying assignments that enjoy a peculiar additional condition required to keep the second moment under control. They show that w.h.p.
\[
\frac{1}{n} \log Z(F_{k\mathrm{SAT}}) \ge (1 - d) \log 2 + \frac{d}{k} \log\left[ \left( \lambda^{1/2} + \lambda^{-1/2} \right)^k - \lambda^{-k/2} \right] + o(1), \tag{7.2.10}
\]
where
\[
(1 - \lambda)(1 + \lambda)^{k-1} = 1, \qquad \lambda > 0. \tag{7.2.11}
\]
Figure 7.1 illustrates the bounds (7.2.9)–(7.2.10) along with (7.2.5) for $k = 3$.

Figure 7.1 description:
• The red dotted line depicts the first moment upper bound (7.2.9).
• The green dotted line represents the lower bound provided by (7.2.10).
• The blue line displays a numerical approximation of $\mathcal{B}_{d,3}(\pi_{d,3})$. To obtain these values, we generated $10^6$ samples from $\pi \approx \mathrm{BP}^{25}_{d,3}(\delta_{1/2})$ and then evaluated the corresponding empirical average of the expression in (7.2.3).

Finally, combining Theorem 7.2.1 and Theorem 7.2.2 we obtain the following corollary.

Corollary 7.2.3 (Corollary 1.3, [21]). For $k \ge 3$ and $d < d_{\mathrm{our}}(k)$, the statement (7.2.5) holds, i.e.,
\[
\lim_{n \to \infty} \frac{1}{n} \log Z(F_{k\mathrm{SAT}}) = \mathcal{B}_{d,k}(\pi_{d,k}) \quad \text{in probability.}
\]

7.3 Proof Strategy

The proof of the two results of this chapter comprises several steps, which we discuss below.

7.3.1 Existence of fixed point

For every $d < d_{\mathrm{uniq}}$, the existence of the limit $\pi_{d,k}$ is an easy consequence of the Gibbs uniqueness property discussed in Chapter 4, and the limit $\pi_{d,k} = \lim_{\ell \to \infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ is a fixed point of the Belief Propagation operator from (7.2.1). The following proposition shows that the limit defined in (7.2.4) exists with respect to the $W_1$-metric.

Proposition 7.3.1 (Proposition 2.1, [21]).
The $W_1$-limit $\pi_{d,k} = \lim_{\ell \to \infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ exists and
\[
\mathbb{E}\left[ \log^2 \mu_{\pi_{d,k},1,1} \right] + \mathbb{E}\left| \log\left( \prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1} \right) \right| + \mathbb{E}\left| \log\left( 1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j} \right) \right| < \infty. \tag{7.3.1}
\]
In addition, $\mu_{\pi_{d,k},1,1}$ and $1 - \mu_{\pi_{d,k},1,1}$ are identically distributed.

Beyond the existence of the fixed point, we discuss below the upper and the lower bound on $\log Z(F_{k\mathrm{SAT}})$, obtained via the ‘interpolation method’ and the ‘Aizenman-Sims-Starr scheme’, respectively, towards the proof of our result in the regime $d < d_{\mathrm{uniq}}$.

7.3.2 Interpolation method: matching upper bound

In mathematical physics as well as in the study of random constraint satisfaction problems, several works [31, 32, 90] use the interpolation method to provide a matching upper bound on the normalized log-partition function as $n \to \infty$. The basic idea is to construct a family of random CSPs, parameterized by $t \in [0,1]$, which coincides with the random CSP $\hat{G}$ of interest at $t = 1$, while at $t = 0$ the CSP is so simple that we can calculate the partition function easily.

Setup. Define a family of interpolating CSPs $\{G_t\}_{t \in [0,1]}$:
• At $t = 0$: $G_0$ is a trivial/decoupled CSP, often consisting of independent clauses on single variables.
• At $t = 1$: $G_1 = \hat{G}$, the full random CSP model of interest. Each clause contains exactly $k$ variables.

Indeed, at $t = 0$ the logarithm of the partition function is asymptotically equal to $n \mathcal{B}_{d,k}$. To obtain the matching upper bound on $\mathbb{E}[\log Z(\hat{G})]$ one shows that the mean of the logarithm of the partition function is a monotonically decreasing function of $t$, so that its value at $t = 1$ is bounded by its value at $t = 0$. In our model, the interpolation method along with Proposition 7.3.1 easily implies that
\[
\limsup_{n \to \infty} \frac{1}{n} \mathbb{E}\left[ \log(Z(F_{k\mathrm{SAT}}) \vee 1) \right] \le \mathcal{B}_{d,k}(\pi_{d,k}).
\]
Ultimately, Theorem 7.2.1 is a direct consequence of the following corollary and proposition.

Corollary 7.3.2 (Corollary 2.2, [21]). If $d < d_{\mathrm{uniq}}(k)$ then w.h.p.
we have
\[
\frac{1}{n} \log Z(F_{k\mathrm{SAT}}) \le \mathcal{B}_{d,k}(\pi_{d,k}) + o(1).
\]

Proposition 7.3.3 (Proposition 2.3, [21]). If $d < d_{\mathrm{uniq}}(k)$ then
\[
\mathbb{E}\left[ \log(Z(F_{k\mathrm{SAT}}(n+1)) \vee 1) \right] - \mathbb{E}\left[ \log(Z(F_{k\mathrm{SAT}}(n)) \vee 1) \right] = \mathcal{B}_{d,k}(\pi_{d,k}) + o(1).
\]

In order to evaluate the expectation from Proposition 7.3.3 we harness a ‘soft’ version of the k-SAT problem where violated clauses are discouraged but not strictly forbidden. Define for a real $\beta > 0$
\[
Z_{\beta}(F_{k\mathrm{SAT}}) = \sum_{\sigma \in \{\pm 1\}^{V(F_{k\mathrm{SAT}})}} \ \prod_{a \in C(F_{k\mathrm{SAT}})} \exp\left( -\beta \mathbb{1}\{\sigma \not\models a\} \right). \tag{7.3.2}
\]
This definition of the partition function ensures that $Z_{\beta}(F_{k\mathrm{SAT}}) \ge Z(F_{k\mathrm{SAT}})$ for all $\beta > 0$. The interpolation argument then yields the following theorem.

Theorem 7.3.4 ([90, Theorem 1]). For any $k \ge 3$, any $\beta > 0$ and any probability measure $\pi$ on $[0,1]$ we have
\[
\frac{1}{n} \mathbb{E}\left[ \log Z_{\beta}(F_{k\mathrm{SAT}}) \right] \le \mathbb{E}\left[ \log\left( \prod_{i=1}^{d^-} \mu_{\beta,\pi,2i} + \prod_{i=1}^{d^+} \mu_{\beta,\pi,2i-1} \right) - \frac{d(k-1)}{k} \log\left( 1 - \left( 1 - e^{-\beta} \right) \prod_{j=1}^{k} \mu_{\pi,1,j} \right) \right], \tag{7.3.3}
\]
where
\[
\mu_{\beta,\pi,i} = 1 - (1 - \exp(-\beta)) \prod_{j=1}^{k-1} \mu_{\pi,i,j} \quad (\text{for } i \ge 1).
\]

For the measure $\pi = \pi_{d,k}$, the monotone convergence theorem yields an explicit expression for the r.h.s. of (7.3.3). Moreover, a routine application of the Azuma-Hoeffding inequality to this soft model with $\beta < \infty$ gives the following concentration bound.

Lemma 7.3.5 (Lemma 5.2, [21]). For any fixed $\beta > 0$ we have
\[
\mathbb{P}\left[ \left| \log Z_{\beta}(F_{k\mathrm{SAT}}) - \mathbb{E} \log Z_{\beta}(F_{k\mathrm{SAT}}) \right| > \sqrt{n} \log n \right] = o(1/n).
\]

The lemma holds because the clauses of the random formula $F_{k\mathrm{SAT}}$ are drawn independently, and adding or removing a single clause can alter the value of $\log Z_{\beta}(\cdot)$ by no more than $\pm\beta$. Finally, Corollary 7.3.2 follows directly from Theorem 7.3.4 and Lemma 7.3.5.

7.3.3 Aizenman-Sims-Starr: matching lower bound

The key step of this chapter is to establish a lower bound on $\log Z(F_{k\mathrm{SAT}})$ that matches the upper bound from Corollary 7.3.2. To this end, we couple the random k-CNF $F_{k\mathrm{SAT}}(n) = F_{d,k}(n)$ with $n$ variables with the random k-CNF $F_{k\mathrm{SAT}}(n+1) = F_{d,k}(n+1)$ with $n+1$ variables.
The most important part of the proof of Proposition 7.3.3 consists of the coupling argument described below, along with the necessary tail bound.

CPL1 Let $F'_{k\mathrm{SAT}}$ be a random k-CNF with variables $x_1, \ldots, x_n$ and $m' \stackrel{d}{=} \mathrm{Po}(d(n-k+1)/k)$ clauses.
CPL2 Obtain $F''_{k\mathrm{SAT}}$ from $F'_{k\mathrm{SAT}}$ by adding another $\Delta'' \stackrel{d}{=} \mathrm{Po}(d(k-1)/k)$ independent random clauses.
CPL3 Obtain $F'''_{k\mathrm{SAT}}$ from $F'_{k\mathrm{SAT}}$ by adding one new variable $x_{n+1}$ and $\Delta''' \stackrel{d}{=} \mathrm{Po}(d)$ independent random clauses that each contain $x_{n+1}$ and $k-1$ other variables from $\{x_1, \ldots, x_n\}$.

Figure 7.2 shows a graphical representation of the Aizenman-Sims-Starr scheme (the coupling technique used above). Based on the coupling we have the following fact and tail bound.

Fact 7.3.6 (Fact 2.4, [21]). For any $d > 0$ we have $Z(F_{k\mathrm{SAT}}(n)) \stackrel{d}{=} Z(F''_{k\mathrm{SAT}})$ and $Z(F_{k\mathrm{SAT}}(n+1)) \stackrel{d}{=} Z(F'''_{k\mathrm{SAT}})$.

Proposition 7.3.7 (Proposition 2.7, [21]). For $d < d_{\mathrm{uniq}}(k)$ we have
\[
\mathbb{E}\left[ \left| \log \frac{Z(F''_{k\mathrm{SAT}}) \vee 1}{Z(F'_{k\mathrm{SAT}}) \vee 1} \right|^{3/2} + \left| \log \frac{Z(F'''_{k\mathrm{SAT}}) \vee 1}{Z(F'_{k\mathrm{SAT}}) \vee 1} \right|^{3/2} \right] = O(1). \tag{7.3.4}
\]

Figure 7.2: A graphical representation of the coupling technique (Aizenman-Sims-Starr scheme). [The figure labels the added clauses in both branches of the coupling as ‘independent random clauses’.]

Finally, towards the proof of Theorem 7.2.1, the existence of the limit comes from Proposition 7.3.1 (for the detailed proof see [21]), and the value of the normalized log-partition function comes from the Aizenman-Sims-Starr lower bound along with the ‘PULP’ algorithm discussed in Chapter 3. Formally, the following equation (7.3.5) along with Corollary 7.3.2 implies the result:
\[
\frac{1}{n} \mathbb{E}\left[ \log(1 \vee Z(F_{k\mathrm{SAT}}(n))) \right] = \frac{1}{n} \sum_{N=0}^{n-1} \left( \mathbb{E}\left[ \log(1 \vee Z(F_{k\mathrm{SAT}}(N+1))) \right] - \mathbb{E}\left[ \log(1 \vee Z(F_{k\mathrm{SAT}}(N))) \right] \right) = \mathcal{B}_{d,k}(\pi_{d,k}) + o(1). \tag{7.3.5}
\]

7.3.4 Lower bound on Gibbs uniqueness threshold

Finally, we are left with the proof of the lower bound on the Gibbs uniqueness threshold stated in Theorem 7.2.2.
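For orientation, the thresholds (7.2.6)–(7.2.8) can be evaluated numerically. The sketch below is an illustration only and rests on two assumptions: it reads the formulas (7.2.7) and (7.2.8) literally, with the factors $1 - \exp(-d/2)/4$ and $1 - \exp(-d/2)/2$ and the prefactor $d(k-1)/2$ in (7.2.8), and it uses that the expressions inside the suprema are increasing in $d$, so that each supremum is the unique root found by bisection. The names `d_our`, `d_ms` and `d_pure` are hypothetical.

```python
import math


def _root_increasing(f, hi=1.0, tol=1e-12):
    """Root of an increasing function f with f(0) < 0, by bisection."""
    while f(hi) < 0:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def d_our(k):
    """Threshold (7.2.8), read as d(k-1)/2 * (1 - exp(-d/2)/2)^(k-2) < 1."""
    return _root_increasing(
        lambda d: d * (k - 1) / 2 * (1 - math.exp(-d / 2) / 2) ** (k - 2) - 1)


def d_ms(k):
    """Threshold (7.2.7), read literally as printed."""
    return _root_increasing(
        lambda d: d * (k - 1) * (1 - math.exp(-d / 2) / 4)
        * (1 - math.exp(-d / 2) / 2) ** (k - 2) - 1)


def d_pure(k, lo=1e-6, hi=50.0, iters=200):
    """Pure literal threshold (7.2.6) for k >= 3, by ternary search.

    The objective z / (1 - exp(-z/2))^(k-1) is unimodal on (0, inf)
    for k >= 3; the search interval (lo, hi) is a pragmatic assumption.
    """
    def g(z):
        return z / (-math.expm1(-z / 2)) ** (k - 1)
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return g((lo + hi) / 2)
```

Under this reading, `d_our(2)` evaluates to 2, the value of $d_{\mathrm{SAT}}$ for random 2-SAT mentioned in Section 7.2.2, while `d_ms(2)` stays strictly below it; for $k = 3$ the sketch places `d_our(3)` well below `d_pure(3)`.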
An obvious challenge associated with the establishing the Gibbs uniqueness property discussed in Chapter 4 is to estimate the marginal of the root variable given any possible boundary conditions at a distance 2ℓ from the root r . But using the help of [2] we may confine ourselves to just a single, explicit boundary configuration τ+ that satisfies P [ τ(ℓ)(r ) = 1 |T, ∀x ∈ ∂2ℓr :τ(ℓ)(x) =τ+(x) ] = max τ∈S(T(ℓ)) P [ τ(ℓ)(r ) = 1 |T, ∀x ∈ ∂2ℓr :τ(ℓ)(x) = τ(x) ] . (7.3.6) 69 CHAPTER 7. ON THE GIBBS UNIQUENESS IN RANDOM k-SAT and define for any variable w at distance 2q from x with parent clause a and grandparent variable u as τ+(w) = sign(w, a) · 1{sign(u, a) ̸=τ+(u)}− sign(w, a) · 1{sign(u, a) =τ+(u)} . (7.3.7) Then we can say for any integer ℓ > 0 the assignment τ+ satisfies (7.3.6). Hence the Theorem 7.2.2 reduces to the following statement. Proposition 7.3.8 (Proposition 2.13, [21]). For d < dour(k) we have that lim ℓ→∞ E [ P [ τ(ℓ)(r ) = 1 |T, ∀x ∈ ∂2ℓr :τ(ℓ)(x) =τ+(x) ] −P [ τ(ℓ)(r ) = 1 |T ]] = 0. (7.3.8) Although it seem delicate because the boundary condition τ+ depends on the tree T(ℓ). To over- come this problem, we generalize another technique from the work [2] on random 2-SAT to k ≥ 3 by introducing a quantity that allows us to prove (7.3.8) which behaves ‘Markovian’ as we pass up and down the tree. Finally, by combining the mechanism of coupling and contraction with the treatment of both pure and mixed literals (the detailed proof can be found in [21]), we conclude that Theorem 7.2.2 directly follows from the Proposition 7.3.8 together with the triangle inequality. 8 The Last Chapter “The important thing is not to stop questioning. Curiosity has its own reason for existing.” –Albert Einstein In this concluding chapter we take a break from formulating and answering questions and instead summarize the main ideas of this thesis and an evaluation of the author’s contribution to each paper is addressed. 
We also discuss some potential future directions and mention some relevant research areas that were beyond the scope of the previous chapters and of the papers (found in the appendix).

8.1 Summary of the thesis

So far, this thesis has addressed several problems in the probabilistic analysis of random combinatorial structures, more particularly random constraint satisfaction problems (CSPs). While Chapters 5, 6 and 7 each focused on a specific model, together they contribute to a unified understanding of how combinatorial structure, randomness and ideas from statistical physics interact in discrete probability. Below we give a brief overview of the results stated in the previous three chapters and a quick comparison with earlier results.

Random 2-SAT: Our first result concerns random 2-SAT, where we establish a central limit theorem for the logarithm of the number of solutions, so that the number of solutions exhibits log-normal fluctuations. This provides a precise distributional description of the size of the solution space and sharpens the probabilistic picture of random satisfiability. The proof analyzes the contraction property of the operator $\mathrm{logBP}^{\otimes}_{d,t}$, which ensures the existence of a unique fixed point $\rho_{d,t}$, and involves the analysis of Belief Propagation on a Galton-Watson tree, which connects this fixed point to the BP marginals. Finally, we derive our main result from the general martingale central limit theorem, which is a special case of ([55], Theorem 3.2).

Comparison with previous work:

Known results:
• In 1996, Goerdt [53] showed that d = 2 is the satisfiability threshold of random 2-SAT. In other words, for any ε > 0, the probability that a random 2-SAT formula is satisfiable tends to one if d < 2 − ε and tends to zero if d > 2 + ε as n → ∞. A natural follow-up question is to determine the number of satisfying assignments in the satisfiable regime, i.e., when d < 2.
• In 2021, Achlioptas et al.
[2] provided a first-order approximation: the normalized log-partition function, i.e., the logarithm of the number of solutions $Z$ of a random 2-SAT formula $F$ divided by the number $n$ of variables of $F$, converges in probability to a constant $\mu_d$ that does not depend on $n$. Mathematically,
\[ \frac{\log Z(F)}{n} \xrightarrow{p} \mu_d. \]

Our results: In this thesis, we say even more about the number of solutions of random 2-SAT. The logarithm of the number of random 2-SAT solutions exhibits fluctuations of order $\sqrt{n}$. More precisely,
\[ \frac{\log Z(F) - \mathbb{E}[\log Z(F)]}{\eta_d \sqrt{n}} \xrightarrow{d} N(0,1), \]
where $\eta_d \ge 0$ does not depend on $n$. Along with the asymptotic normality, we also show that the formula for the variance can be evaluated effectively.

Note: By contrast, for other random CSPs the typical fluctuations of the logarithm of the number of solutions are bounded throughout all or most of the satisfiable regime.

Random k-XORSAT: For the second result of this thesis, we analyzed the performance of Belief Propagation Guided Decimation (BPGD), an algorithm inspired by statistical physics, on random k-XORSAT. Our rigorous analysis verifies the heuristic work of Ricci-Tersenghi and Semerjian [96]. Specifically, we derive an explicit threshold up to which the BPGD algorithm succeeds with strictly positive probability Ω(1) and beyond which the algorithm fails with high probability. Due to the algebraic structure of k-XORSAT, BPGD is equivalent to the purely combinatorial algorithm called 'Unit Clause Propagation' (UCP), so the results for BPGD carry over to UCP for the same parameter values. In addition, we analyze the 'decimation process', for which we identify a (non)-reconstruction and a condensation phase transition.

Comparison with previous work:

Known results:
• In 2009, Ricci-Tersenghi and Semerjian [96] provided a heuristic analysis without giving a proof.
They stated the success probability of both BPGD and UCP without a rigorous proof:
\[ P_{\mathrm{succ}} = \exp\left(-\int_0^1 \frac{dt}{4(1-t)} \cdot \frac{f(t)^2}{1 - f(t)}\right) \quad\text{with}\quad f(t) = \alpha k(k-1)t^{k-2}(1-t), \]
where α is the clause-to-variable density.
• A recent paper by Yung [102] applies the 'Overlap Gap paradigm', which provides only one-sided bounds because it relies on moment computations (called the 'annealed' technique in physics jargon) and which therefore implies no positive result for the algorithm. For this reason, his lower bounds on the clause length, k ≥ 9 for UCP and k ≥ 13 for BPGD, are not tight. Moreover, according to his argument both BPGD and UCP fail for $d > d_{\mathrm{core}}$.

Our results: We mathematically verified the heuristic predictions of Ricci-Tersenghi and Semerjian [96] on the performance of BPGD by providing an explicit formula for the success probability of the algorithm for precise clause-to-variable densities. For k ≥ 3, we prove the following:
• If $d < d_{\min}(k)$, then
\[ \lim_{n \to \infty} P_{\mathrm{succ}} = \exp\left(-\frac{d^2(k-1)^2}{4}\int_0^1 \frac{z^{2k-4}(1-z)}{1 - d(k-1)z^{k-2}(1-z)}\,dz\right). \]
• If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$, then $P_{\mathrm{succ}} = o(1)$.
With $d = \alpha k$ the first formula coincides with the prediction of [96] stated above. Moreover, in contrast to Yung's result, our technique relies on a 'quenched' argument, which shows that for any k ≥ 3 both BPGD and UCP find a satisfying assignment with strictly positive probability for $d < d_{\min}$, where $d_{\min} < d_{\mathrm{core}}$. We also analyze the phase transitions of the decimation process, for which we pinpoint the (non)-reconstruction and condensation phase transitions in terms of d and θ for any k ≥ 3.

Random k-SAT: Turning to random k-SAT, we revisited the Gibbs uniqueness threshold. We prove that for any k ≥ 3 and for clause-to-variable densities up to the Gibbs uniqueness threshold, the number of satisfying assignments of random k-SAT is given by the physics-inspired 'replica symmetric solution'.
Our result sharpens the understanding of the onset of long-range correlations in random k-SAT by clarifying the transition of the Gibbs measure from the uniqueness to the non-uniqueness regime. Below we compare our result with previous results in this context.

Comparison with previous work:

Known results:
• In 2004, Panchenko and Talagrand [90] proved a rigorous upper bound matching the physics formula for the number of satisfying assignments, using a proof technique called the 'interpolation method'.
• Achlioptas et al. [2] proved the physics formula in the case k = 2, which is much simpler than the case k ≥ 3.
• The most important work in this regard, by Montanari and Shah in 2007 [83], provided a correct approximation of the number of 'good' assignments, i.e., assignments that satisfy all but o(n) clauses, for k ≥ 3. However, it seems difficult to estimate the gap between the number of such 'good' assignments and the number of actual satisfying assignments.

Our results: Our paper verifies that the number of satisfying assignments is correctly given by the 'replica symmetric solution' from physics [77, 78] for any k ≥ 3 up to the Gibbs uniqueness threshold. Moreover, we derive a lower bound on the Gibbs uniqueness threshold that improves significantly over the work of Montanari and Shah [83] for small k ≥ 3. Below we provide the values of $d_{\mathrm{giant}}$ (the giant component threshold of the hypergraph induced by the random k-CNF formula), $d_{\mathrm{MS}}$ (the Montanari-Shah bound), $d_{\mathrm{our}}$ (our bound), $d_{\mathrm{pure}}$ (the pure literal threshold) and $d_{\mathrm{SAT}}$ (the satisfiability threshold) for small values of k.
k         2        3        4        5
d_giant   1.0000   0.5000   0.3333   0.2500
d_MS      1.1625   0.8792   0.8695   0.9236
d_our     2.0000   1.3431   1.2451   1.2635
d_pure    2.0000   4.9108   6.1782   7.0178
d_SAT     2.0000   12.801   39.724   105.585

Taken together, the results in this thesis concern the estimation of the partition function (in other words, the number of satisfying assignments) of random formulas and their phase transitions, and provide a clear picture of the solution space geometry of different random satisfiability problems. The thesis thereby establishes a bridge between probabilistic combinatorics, random graphs and statistical physics. More specifically, it shows how tools such as local weak convergence, recursive distributional fixed point equations, message passing algorithms (such as Belief Propagation, Warning Propagation and UCP) and correlation decay methods can be leveraged to analyze long-range dependencies, phase transitions, solution space geometry and algorithmic thresholds in random structures. Several challenging and interesting open problems remain, which we discuss in the next section.

8.2 Future Directions

Paper 1 [23]: The number of random 2-SAT solutions is asymptotically log-normal
■ Investigate whether the present method of considering correlated instances can be extended to random optimization problems. Establishing central limit theorems for random optimization problems is also very interesting. Cao [20] provided a general framework based on the 'objective method' [9]. Unfortunately, the conditions of Cao's theorem tend to be unwieldy for MAX-CSP problems with hard constraints. Recent work of Kreačič [58] and Glasgow, Kwan, Sah, Sawhney [52] on the matching number therefore instead resorts to the use of stochastic differential equations.
■ Another interesting question is whether log-normal fluctuations also occur for the number of solutions in other models such as random Horn-SAT or planted CSPs.
■ Understand the precise behavior of the variance near the satisfiability threshold and determine for which families of random CSPs such fluctuations emerge.

Paper 2 [22]: Belief Propagation Guided Decimation on random k-XORSAT
■ Unlike in random k-XORSAT, due to the lack of inherent symmetry in random k-SAT, the BPGD algorithm provably fails to find satisfying assignments on random k-SAT instances even below the threshold where the set of satisfying assignments shatters into well-separated clusters [1, 59]. For this reason a more sophisticated message passing algorithm called 'Survey Propagation Guided Decimation' (SPGD) has been suggested in [72, 96]. In random k-XORSAT, BPGD and SPGD are equivalent, but the two algorithms are substantially different in random k-SAT. Investigating SPGD on random k-SAT is therefore one of the most interesting problems in this context, and one might hope that SPGD outperforms BPGD on random k-SAT and finds satisfying assignments up to the aforementioned shattering transition. A negative result to the effect that Survey Propagation Guided Decimation fails asymptotically beyond the shattering transition point for large enough k exists [56]. Yet a complete analysis of SPGD/BPGD on random k-SAT, comparable to the one carried out for random k-XORSAT in this thesis, remains an outstanding challenge.
■ Investigate the interplay between the decimation process and the solution space geometry, especially how the frozen variables emerge dynamically.
■ Apply the technique used in this paper to inference problems in graphical models (e.g., community detection, coding theory and planted problems).
■ Finally, one of the most interesting open problems concerns the performance of various types of algorithms, such as greedy, message passing or local search algorithms, that aim to find an assignment violating the least possible number of clauses.
A first step based on the heuristic 'dynamical cavity method' was recently undertaken by Maier, Behrens and Zdeborová [64].

Paper 3 [21]: The random k-SAT Gibbs Uniqueness threshold revisited
■ Pinpoint the exact location and nature of the Gibbs uniqueness threshold for general k ≥ 3 and its connection with statistical physics predictions.
■ Explore in detail the other phase transitions beyond Gibbs uniqueness, such as (non-)reconstruction, condensation and freezing, and their interactions. Physics predicts a sequence of phase transitions [59]: replica symmetric (non-reconstruction) → clustering/dynamic phase → condensation → freezing, where the reconstruction threshold coincides with the clustering threshold. Physics predictions for the reconstruction threshold in random k-SAT, obtained by the heuristic technique called the 'cavity method' (or replica symmetry), place it at $\alpha_{\mathrm{rec}} \approx \frac{2^k}{k}\log k$ for large k. Thus, as far as long-range correlations in the Gibbs measure are concerned, a rigorous mathematical proof of the exact reconstruction threshold remains a challenging problem.
■ Extend the results on Gibbs uniqueness to other random CSPs such as hypergraph coloring, independent sets in random hypergraphs and spin glasses, which is a promising and interesting research direction.

8.3 Contribution of the authors

In the previous chapters we gave an overview of the results and the proofs, along with the tools we used to obtain them. The full versions of the papers can be found in the appendix. We conclude the thesis with a list of the papers that form the backbone of the thesis and to which the author of this thesis contributed.

The first result of this thesis is from the paper titled "The number of random 2-SAT solutions is asymptotically log-normal" by Arnab Chatterjee, Amin Coja-Oghlan, Noela Müller, Connor Riddlesden, Maurice Rolvien, Pavel Zakharov, Haodong Zhu.
The extended version of the paper has been published in the journal 'Theory of Computing' (TOC) and the preliminary version appeared in 'Approximation, Randomization, and Combinatorial Optimization' (APPROX/RANDOM'24), Leibniz International Proceedings in Informatics (LIPIcs), volume 317, 39:1–39:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2024). This paper establishes a central limit theorem ('CLT') for the logarithm of the number of solutions of a random 2-SAT formula throughout the satisfiable regime. In addition, the paper shows that the variance can be calculated effectively. The problem was first raised when NM, CR and HZ visited TU Dortmund for a week and introduced the martingale central limit theorem from the book of Hall and Heyde [55]. AC, ACO, MR and PZ discussed with them how to apply the martingale CLT to the number of satisfying assignments of random 2-SAT and how to prove the finiteness of the variance. Later AC, ACO, MR and PZ worked on the variance calculation by constructing two correlated formulas and on how to obtain a pruned correlated formula using UCP. AC, ACO, MR and PZ jointly examined the effect of removing a single clause from the original formula on the number of solutions of the pruned formula. AC contributed to the construction of the Galton-Watson tree that is the local limit of the two correlated formulas. In addition, AC, NM and HZ were involved in proving the contraction property of $\mathrm{logBP}^{\otimes}_{d,t}$, which ensures the existence of a unique fixed point $\rho_{d,t}$, and in the analysis of Belief Propagation on a Galton-Watson tree, which connects this fixed point to the BP marginals and thus completes the proof of our main result.

The second result of this thesis is from the paper titled "Belief Propagation Guided Decimation on random k-XORSAT" by Arnab Chatterjee, Amin Coja-Oghlan, Mihyung Kang, Lena Krieg, Maurice Rolvien and Gregory Sorkin.
The extended version of the paper has been submitted to the journal 'Theory of Computing' (TOC) and the conference version appeared in the '52nd International Colloquium on Automata, Languages and Programming' (ICALP'25), Leibniz International Proceedings in Informatics (LIPIcs), volume 334, 47:1–47:21, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2025). This paper analyzes the performance of the BPGD algorithm on random k-XORSAT formulas and the phase transitions of the decimation process. The problem was first raised when GS visited TU Dortmund and ACO introduced it by pointing out that the paper by the two physicists Ricci-Tersenghi and Semerjian [96] provides only a heuristic derivation of the success probability of the algorithm. AC, MR and GS worked on the tight analysis of BPGD by analyzing the function $\Phi_{d,k}$ (the Bethe free entropy) and certain threshold values of d. AC carried out the calculus part, which leads to the proof of the behavior of the function with respect to the parameters (d, λ). Later, AC, ACO and LK jointly developed the lemmas concerning the presence of so-called toxic cycles in a sub-formula, which are the obstacles to the success of the algorithm.

The last result of this thesis is from the paper titled "The random k-SAT Gibbs uniqueness threshold revisited" by Arnab Chatterjee, Amin Coja-Oghlan, Catherine Greenhill, Vincent Pfenninger, Maurice Rolvien, Pavel Zakharov and Konstantinos Zampetakis. The paper has been submitted to the journal "Combinatorics, Probability and Computing" (CPC). The aim of this paper is to determine the number of actual satisfying assignments of a random k-SAT formula for clause-to-variable densities up to the Gibbs uniqueness threshold. The problem was first raised when CG and VP visited TU Dortmund and we came up with the problem of counting the number of actual satisfying assignments of a random k-SAT formula.
Then we all looked at the paper by Montanari and Shah [83], in which they showed that for k ≥ 3 and certain clause/variable densities the 'replica symmetric solution' from physics correctly approximates the number of 'good' assignments that satisfy all but o(n) clauses. Unfortunately, their bound is not tight even in the case k = 2. In this context AC, ACO, PZ and KZ jointly developed the pure literal pursuit ('PULP') algorithm, whose purpose is to trace the repercussions of setting a relatively small number of variables to specific truth values, which constitutes the main technical challenge in the proof of the main result. AC and KZ were involved in constructing the pure and mixed literal operator $\mathrm{LL}^{\star}_{k,d}$ and in proving the contraction of this operator with respect to a metric related to the $W_1$-Wasserstein distance, which is the main step in the proof of the second theorem of the paper.

nainaṁ chhindanti śhhastrāṇi nainaṁ dahati pāvakaḥ
na chainaṁ kledayantyāpo na śhhoṣhayati mārutaḥ
The atma (soul) cannot be shattered by weapons, it cannot be burnt by fires, it cannot be drenched by the waters and it cannot be rendered dry by the winds. – Srimat Bhagavad Gita (2.23)

Bibliography

[1] D. Achlioptas and A. Coja-Oghlan. Algorithmic barriers from phase transitions. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25–28, 2008, Philadelphia, PA, USA, IEEE Computer Society, pp. 793–802 (2008).
[2] D. Achlioptas, A. Coja-Oghlan, M. Hahn-Klimroth, J. Lee, N. Müller, M. Penschuck, G. Zhou: The number of satisfying assignments of random 2-SAT formulas. Random Structures and Algorithms 58, pp. 609–647, (2021).
[3] D. Achlioptas and E. Friedgut: A sharp threshold for k-colorability. Random Structures & Algorithms, 14, pp. 63–70, (1999).
[4] D. Achlioptas, C. Moore: Random k-SAT: two moments suffice to cross a sharp threshold. SIAM Journal on Computing 36, 3, pp. 740–762, (2006).
[5] D. Achlioptas and A.
Naor: The Two Possible Values of the Chromatic Number of a Random Graph. Annals of Mathematics, Second Series, Vol. 162, 3, pp. 1335–1351, (2005).
[6] D. Achlioptas, A. Naor and Y. Peres: Rigorous location of phase transitions in hard optimization problems. Nature 435, 7043, pp. 759–764, (2005).
[7] D. Achlioptas, Y. Peres: The threshold for random k-SAT is $2^k \log 2 - O(k)$. Journal of the AMS 17, pp. 947–973, (2004).
[8] D. Achlioptas and F. Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. In Proceedings of 38th STOC, 130–139, New York, NY, USA, (2006), ACM.
[9] D. Aldous, J. Steele: The objective method: probabilistic combinatorial optimization and local weak convergence. In: H. Kesten (ed.): Probability on Discrete Structures. Springer (2004).
[10] N. Alon and J. Spencer: The probabilistic method. Wiley, 2nd edition, (2000).
[11] S. Arora and B. Barak. Computational complexity: a modern approach. Cambridge University Press, (2009).
[12] P. Ayre, A. Coja-Oghlan, P. Gao, N. Müller: The satisfiability threshold for random linear equations. Combinatorica 40, pp. 179–235, (2020).
[13] A.B. Babaev, A.K. Murtazaev and F.A. Kassan-Ogly: Ground State of an Antiferromagnetic Three-State Potts Model on a Triangular Lattice with Competing Interactions. Journal of Experimental and Theoretical Physics, Vol. 127, pp. 323–327, (2018).
[14] H. A. Bethe. Statistical physics of superlattices. Proceedings of the Royal Society of London A, 150:552–575, (1935).
[15] G. Biroli, R. Monasson and M. Weigt. A variational description of the ground state structure in random satisfiability problems. The European Physics Journal B, 14:551, (2000).
[16] R. Biswas, W. Chen, A. Sen: On the replica symmetric solution in general diluted spin glasses. arXiv:2410.15599 (2024).
[17] S.C. Brailsford, C.N. Potts and B.M. Smith. Constraint satisfaction problems: Algorithms and applications. European journal of operational research, 119(3):557–581, (1999).
[18] G.
Bresler, B. Huang: The algorithmic phase transition of random k-sat for low degree polynomials. In 62nd Annual Symposium on Foundations of Computer Science, FOCS 2021, IEEE Computer Society, Los Alamitos, CA, pp. 298–309, (2022).
[19] A. Broder, A. Frieze, E. Upfal: On the satisfiability and maximum satisfiability of random 3-CNF formulas. Proc. 4th SODA, pp. 322–330, (1993).
[20] S. Cao: Central limit theorems for combinatorial optimization problems on sparse Erdős-Rényi graphs. Annals of Applied Probability 31, pp. 1687–1723, (2021).
[21] A. Chatterjee, A. Coja-Oghlan, C. Greenhill, V. Pfenninger, M. Rolvien, P. Zakharov, K. Zampetakis: The random k-SAT Gibbs Uniqueness Threshold revisited. arXiv preprint arXiv:2506.01359, (2025).
[22] A. Chatterjee, A. Coja-Oghlan, M. Kang, L. Krieg, M. Rolvien, G. Sorkin: Belief Propagation Guided Decimation on random k-XORSAT. In Proceedings of the 52nd ICALP, (2025).
[23] A. Chatterjee, A. Coja-Oghlan, N. Müller, C. Riddlesden, M. Rolvien, P. Zakharov, H. Zhu: The number of random 2-SAT solutions is asymptotically log-normal. In Proceedings of 28th RANDOM, 39, (2024).
[24] P. Cheeseman, B. Kanefsky, W. Taylor: Where the really hard problems are. In Proceedings of the IJCAI, pp. 331–337, (1991).
[25] W.-K. Chen, P. Dey, D. Panchenko: Fluctuations of the free energy in the mixed p-spin models with external field. Probability Theory and Related Fields 168, pp. 41–53, (2017).
[26] V. Chvátal, B.A. Reed: Mick gets some (the odds are on his side). In 33rd Annual Symposium on Foundations of Computer Science, Pittsburgh, Pennsylvania, USA, 24–27 October 1992, IEEE Computer Society, pp. 620–627.
[27] A. Coja-Oghlan. A better algorithm for random k-SAT. SIAM Journal on Computing, 39(7):2823–2864, (2010).
[28] A. Coja-Oghlan: Belief Propagation fails on random formulas. Journal of the ACM 63 (2017) #49.
[29] A. Coja-Oghlan, A. Ergür, P. Gao, S. Hetterich, M. Rolvien: The rank of sparse random matrices. Proc. 31st SODA, pp.
579–591, (2020).
[30] A. Coja-Oghlan, A. Haqshenas and S. Hetterich. Walksat stalls well below satisfiability. SIAM Journal on Discrete Mathematics, 31(2):1160–1173, (2017).
[31] A. Coja-Oghlan, T. Kapetanopoulos, N. Müller: The replica symmetric phase of random constraint satisfaction problems. Combinatorics, Probability and Computing 29, 3, pp. 346–422, (2020).
[32] A. Coja-Oghlan, F. Krzakala, W. Perkins, L. Zdeborová: Information-theoretic thresholds from the cavity method. Advances in Mathematics 333, pp. 694–795, (2018).
[33] A. Coja-Oghlan, A. Pachon-Pinzon: The decimation process in random k-SAT. SIAM Journal on Discrete Mathematics 26, pp. 1471–1509, (2012).
[34] A. Coja-Oghlan and K. Panagiotou. The asymptotic k-SAT threshold. Advances in Mathematics, 288, pp. 985–1068, (2016).
[35] A. Coja-Oghlan, N. Wormald: The number of satisfying assignments of random regular k-SAT formulas. Combinatorics, Probability and Computing 27, pp. 496–530, (2018).
[36] A. Crisanti, G. Paladin and H.-J.S.A. Vulpiani: Replica trick and fluctuations in disordered systems. Journal de Physique I, 2(7):1325–1332, (1992).
[37] A. Dembo, A. Montanari: Ising models on locally tree-like graphs. Annals of Applied Probability 20, pp. 565–592, (2010).
[38] A. Dembo, A. Montanari: Gibbs measures and phase transitions on sparse random graphs. Brazilian Journal of Probability and Statistics 24, pp. 137–211, (2010).
[39] A. Dembo, A. Montanari, A. Sly, N. Sun: The replica symmetric solution for Potts models on d-regular graphs. Communications in Mathematical Physics, 327, pp. 551–575, (2014).
[40] A. Dembo, A. Montanari, N. Sun: Factor models on locally tree-like graphs. Annals of Probability 41, pp. 4162–4213, (2013).
[41] J. Ding, A. Sly and N. Sun. Proof of the satisfiability conjecture for large k. Annals of Mathematics 196, pp. 1–388, (2022).
[42] O. Dubois, J. Mandler: The 3-XORSAT threshold. In 43rd Annual IEEE Symposium on Foundations of Computer Science, FOCS, pp. 769–778, (2002).
[43] A. Dylan: PhD Thesis: Gibbs Measures on Sparse Random Graphs. Princeton University, Computer Science Dept. Engineering Quadrangle Princeton, NJ, United States, (2022).
[44] C. Efthymiou: On sampling symmetric Gibbs distributions on sparse random graphs and hypergraphs. In 49th EATCS International Conference on Automata, Languages, and Programming, vol. 229 of LIPIcs. Leibniz Int. Proc. Inform. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, (2022), page Art. No. 57, 16.
[45] J. Feigenbaum. The use of coding theory in computational complexity. In Proceedings of Symposium in Applied Mathematics, volume 50, pp. 207–233, (1995).
[46] E.C. Freuder and A.K. Mackworth. Constraint satisfaction: An emerging paradigm. In Foundations of Artificial Intelligence, volume 2, 13–27. Elsevier, (2006).
[47] E. Friedgut. Sharp thresholds of graph properties, and the k-SAT problem. Journal of the AMS, 12:1017–1054, (1999).
[48] A. Frieze and M. Karoński: Introduction to Random Graphs. Cambridge University Press, (2015).
[49] R. G. Gallager: Low-density parity check codes. IEEE Transaction on Information Theory, 8:21–28, (1962).
[50] D. Gamarnik and M. Sudan. Performance of sequential local algorithms for the random NAE-k-SAT problem. SIAM Journal on Computing, 46(2):590–619, (2017).
[51] M.R. Garey and D.S. Johnson. Computers and intractability: A guide to the theory of NP-completeness. Freeman, San Francisco, (1979).
[52] M. Glasgow, M. Kwan, A. Sah, M. Sawhney: A central limit theorem for the matching number of a sparse random graph. arXiv:2402.05851 (2024).
[53] A. Goerdt: A threshold for unsatisfiability. Journal of Computer and System Sciences, 53, pp. 469–486, (1996).
[54] J. Gu and R. Sosic. A parallel architecture for constraint satisfaction. In International conference on industrial and engineering applications of artificial intelligence and expert systems, pp. 229–237, (1991).
[55] P. Hall, C. Heyde: Martingale limit theory and its applications.
Academic Press (1980).
[56] S. Hetterich: Analysing survey propagation guided decimation on random formulas. arXiv preprint arXiv:1602.08519, (2016).
[57] L.M. Kirousis, E. Kranakis, D. Krizanc and Y.C. Stamatiou: Approximating the unsatisfiability threshold of random formulas. Random Structures & Algorithms, 12(3):253–269, (1998).
[58] E. Kreačič: Some problems related to the Karp-Sipser algorithm on random graphs. Ph.D. thesis, University of Oxford, (2017).
[59] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian and L. Zdeborová: Gibbs states and the set of solutions of random constraint satisfaction problems. In Proceedings of the National Academy of Sciences 104 (25), pp. 10318–10323, (2007).
[60] F.R. Kschischang, B. Frey and H. A. Loeliger: Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, (2001).
[61] V. Kumar. Algorithms for constraint-satisfaction problems: A survey. AI magazine, 13(1):32–32, (1992).
[62] L. Lovász: Large networks and graph limits. AMS (2012).
[63] T. Łuczak and J.C. Wierman: The chromatic number of random graphs at the double-jump threshold. Combinatorica, Volume 9, pages 39–49, (1989).
[64] A. Maier, F. Behrens, L. Zdeborová: Dynamical cavity method for hypergraphs and its application to quenches in the k-XOR-SAT problem. arXiv:2412.14794 (2024).
[65] S. Mertens, M. Mézard and R. Zecchina. Threshold values of random k-SAT from the cavity method. Random Structures and Algorithms, 28(3):340–373, (2006).
[66] M. Mézard and A. Montanari: Information, physics and computation. Oxford University Press, (2009).
[67] M. Mézard and A. Montanari. Reconstruction on trees and spin glass transition. Journal of Statistical Physics, 124:1317–1350, (2006).
[68] M. Mézard and T. Mora. Constraint satisfaction problems and neural networks: A statistical physics perspective. Journal of Physiology-Paris, 103(1-2):107–113, (2009).
[69] M. Mézard and G. Parisi.
The Bethe lattice spin glass revisited. The European Physics Journal B, 20:217, (2001).
[70] M. Mézard, G. Parisi, N. Sourlas, G. Toulouse, and M. Virasoro: Replica symmetry breaking and the nature of the spin glass phase. Journal de Physique, 45(5), pp. 843–854, (1984).
[71] M. Mézard, G. Parisi and M. Virasoro: Spin glass theory and beyond: An introduction to the replica method and its applications, World Scientific Publishing Company, Vol. 9, (1987).
[72] M. Mézard, G. Parisi and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297:812–815, (2002).
[73] D. Mitchell, B. Selman and H. Levesque. Hard and easy distributions of SAT problems. In Proceedings of the 10th National Conference on Artificial Intelligence, pp. 459–465, (1992).
[74] M. Molloy. Models for random constraint satisfaction problems. SIAM Journal on Computing, 32(4):935–949, (2003).
[75] M. Molloy: Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms 27, pp. 124–135, (2005).
[76] M. Molloy. The freezing threshold for k-colorings of a random graph. In Proceedings of the 44th symposium on Theory of Computing, 921. ACM, (2012).
[77] R. Monasson, R. Zecchina. The entropy of the k-satisfiability problem. Physics Review Letter, 76, 3881, (1996).
[78] R. Monasson, R. Zecchina: Statistical mechanics of the random K-SAT model. Phys. Rev. E 56, pp. 1357–1370, (1997).
[79] R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky. 2+p-sat: Relation of typical-case complexity to the nature of the phase transition. Random Structures and Algorithms, 15:414, (1999).
[80] A. Montanari, R. Restrepo and P. Tetali. Reconstruction and clustering in random constraint satisfaction problems. SIAM Journal on Discrete Mathematics, 25(2):771–808, (2011).
[81] A. Montanari, F. Ricci-Tersenghi, G. Semerjian. Cluster of solutions and replica symmetry breaking in random k-satisfiability. Journal of Statistical Mechanics, P04004, (2008).
[82] A. Montanari and G. Semerjian: Rigorous inequalities between length and time scales in glassy systems. Journal of Statistical Physics, 125:23, (2006).
[83] A. Montanari and D. Shah: Counting good truth assignments of random k-SAT formulae. In Proceedings of the 18th SODA, pp. 1255–1264, (2007).
[84] C. Moore and S. Mertens: The Nature of Computation. Oxford University Press, (2011).
[85] E. Mossel and Y. Peres: Information flow on trees. Annals of Applied Probability, 13(3):817–844, (2003).
[86] N. Müller, R. Neininger, H. Zhu: Random 2-SAT: The set of atoms of the limiting empirical marginal distribution. arXiv:2410.17749 [math.PR], (2024).
[87] D. Panchenko: The Sherrington-Kirkpatrick model. Springer (2013).
[88] D. Panchenko: Spin glass models from the point of view of spin distributions. Annals of Probability 41, pp. 1315–1361, (2013).
[89] D. Panchenko: On the replica symmetric solution of the K-sat model. Electron. J. Probab. 19 (2014) #67.
[90] D. Panchenko, M. Talagrand: Bounds for diluted mean-fields spin glass models. Probab. Theory Relat. Fields 130, pp. 319–336, (2004).
[91] C.H. Papadimitriou: Computational complexity. Addison-Wesley, (1994).
[92] J. Pearl: Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, (1988).
[93] B. Pittel, G.B. Sorkin: The satisfiability threshold for k-XORSAT. Combinatorics, Probability and Computing 25, 2, pp. 236–268, (2016).
[94] B. Pittel, J. Spencer and N. Wormald: Sudden emergence of a giant k-core in a random graph. Journal of Combinatorial Theory, Series B, Vol. 67, (1996).
[95] F. Rassmann: On the number of solutions in random graph k-coloring. Combinatorics, Probability and Computing 28, 1, pp. 130–158, (2019).
[96] F. Ricci-Tersenghi, G. Semerjian: On the cavity method for decimated random constraint satisfaction problems and the analysis of belief propagation guided decimation algorithms.
Journal of Statistical Mechanics, P09001, (2009).
[97] R. Robinson, N. Wormald: Almost all regular graphs are Hamiltonian. Random Structures and Algorithms 5, pp. 363–374, (1994).
[98] A. Sly: Computational transition at the uniqueness threshold. Proc. 51st FOCS, pp. 287–296, (2010).
[99] E.P.K. Tsang: Foundations of constraint satisfaction. Academic Press, (1993).
[100] L.G. Valiant: The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, (1979).
[101] J.S. Yedidia, W.T. Freeman and Y. Weiss: Constructing free energy approximations and generalized belief propagation algorithms. Technical report TR-2002–35, Mitsubishi Electrical Research Laboratories, (2002).
[102] K. Yung: Limits of sequential local algorithms on the random k-XORSAT problem. Proc. 51st ICALP (2024) #123.

A List of Papers

THE NUMBER OF RANDOM 2-SAT SOLUTIONS IS ASYMPTOTICALLY LOG-NORMAL

ARNAB CHATTERJEE, AMIN COJA-OGHLAN, NOËLA MÜLLER, CONNOR RIDDLESDEN, MAURICE ROLVIEN, PAVEL ZAKHAROV, HAODONG ZHU

ABSTRACT. We prove that throughout the satisfiable phase, the logarithm of the number of satisfying assignments of a random 2-SAT formula satisfies a central limit theorem. This implies that the log of the number of satisfying assignments exhibits fluctuations of order √n, with n the number of variables. The formula for the variance can be evaluated effectively. By contrast, for numerous other random constraint satisfaction problems the typical fluctuations of the logarithm of the number of solutions are bounded throughout all or most of the satisfiable regime.

MSC: 05C80, 60C05, 68Q87

1. INTRODUCTION

1.1. Background and motivation. The quest for satisfiability thresholds has been a guiding theme of research into random constraint satisfaction problems [7, 17, 25].
But once the satisfiability threshold has been pinpointed, a question of no less consequence is to determine the distribution of the number of satisfying assignments within the satisfiable phase [35]. Indeed, the number of solutions is intimately tied to phase transitions that affect the geometry of the solution space, which in turn impacts the computational nature of finding or sampling solutions [4, 18, 29]. However, few tools are currently available to count solutions of random problems. Where precise rigorous results exist (such as in random NAESAT or XORSAT), the proofs typically rely on the method of moments (e.g., [6, 27, 43, 44]). Yet a necessary condition for the success of this approach is that the problem in question exhibits certain symmetries, which are absent in many interesting cases [7, 21]. The aim of the present paper is to take a closer look at the number of satisfying assignments in random 2-SAT, the simplest random CSP that lacks said symmetry properties.

While the random 2-SAT satisfiability threshold has been known since the 1990s [20, 32], a first-order approximation to the number of satisfying assignments has been obtained only recently [5]. This timeline reflects the computational complexity of the respective questions. As is well known, deciding the satisfiability of a 2-CNF reduces to directed reachability, solvable in polynomial time [10]. By contrast, calculating the number of satisfying assignments Z(Φ) of a 2-CNF Φ is a #P-hard task [48]. Nonetheless, Monasson and Zecchina [38] put forward a delicate physics-inspired conjecture as to the exponential order of the number of satisfying assignments of random 2-CNFs. Achlioptas et al. [5] recently proved this conjecture. Their theorem provides a first-order, law-of-large-numbers approximation of the logarithm of the number of satisfying assignments. The present paper contributes a much more precise result, namely a central limit theorem.
We show that throughout the satisfiable phase the logarithm of the number of satisfying assignments, suitably shifted and scaled, converges to a Gaussian. This is the first central limit theorem of this type for any random CSP.

Let $\Phi = \Phi_{n,m}$ be a random 2-CNF on n Boolean variables $x_1, \ldots, x_n$ with m clauses, drawn independently and uniformly from all $4\binom{n}{2}$ possible 2-clauses. Suppose that m ∼ dn/2 for a fixed real d > 0. Thus, d gauges the average number of clauses in which a variable $x_i$ appears. The value d = 2 marks the satisfiability threshold; hence, Φ is satisfiable with high probability (‘w.h.p.’) if d < 2, and unsatisfiable w.h.p. if d > 2 [20, 32]. Achlioptas et al. [5] determined a function φ(d) > 0 such that for all d < 2, i.e., throughout the entire satisfiable phase, we have
\[ Z(\Phi) = \exp\bigl(n\varphi(d) + o(n)\bigr) \quad \text{w.h.p.}, \tag{1.1} \]
thereby determining the leading exponential order of Z(Φ). However, (1.1) fails to identify the limiting distribution of Z(Φ). To be precise, since (1.1) shows that Z(Φ) scales exponentially, we expect this random variable to exhibit multiplicative fluctuations. Therefore, the appropriate goal is to find the limiting distribution of the logarithm of this random variable, i.e., of log Z(Φ). Indeed, physics intuition suggests that log Z(Φ) should be asymptotically Gaussian [36]. The main result of the present paper confirms this hunch. Specifically, letting $\Gamma_{\eta(d)}$ be a Gaussian with mean 0 and standard deviation η(d) > 0, we prove that for all 0 < d < 2, log Z(Φ) satisfies
\[ \mathbb{P}\bigl[\log Z(\Phi) - \mathbb{E}[\log Z(\Phi) \mid Z(\Phi) > 0] < z\sqrt{m}\bigr] \sim \mathbb{P}\bigl[\Gamma_{\eta(d)} < z\bigr] \qquad (z \in \mathbb{R}). \tag{1.2} \]

arXiv:2405.03302v2 [cs.DM] 20 Sep 2024

[Figure 1, left panel: plot over 0 ≤ d ≤ 2 with vertical axes ‘Variance η(d)²’ and ‘Expectation φ(d)’ and the legend entry ‘First moment bound’.]

FIGURE 1. Left: Numerical approximations to the function φ(d) from (1.1) (red) and the variance η(d)² from (1.7) (green).
The black dashed line is the first moment bound d ↦ log(2) + (d/2) log(3/4). Right: An illustration of the tree $T^{\otimes}$ from Section 2.6.

The order Θ(√n) of fluctuations confirmed by (1.2) sets random 2-SAT apart from a large family of other random constraint satisfaction problems. For example, for random graph q-colouring with q ≥ 3 colours the log of the number of q-colourings superconcentrates, i.e., merely has bounded fluctuations throughout most of the regime where the random graph is q-colourable [12].¹ The same is true of random NAESAT, XORSAT and the symmetric perceptron [1, 11, 21, 43]. In each of these cases, certain fundamental symmetry properties (e.g., that the set of q-colourings remains invariant under permutations of the colours) enable the computation of the number of solutions via the method of moments. Random 2-SAT lacks the respective symmetry (as the set of satisfying assignments is not generally invariant under swapping ‘true’ and ‘false’), and accordingly (1.2) establishes that the number of solutions fails to superconcentrate (for more details see [21]).

1.2. The main result. The formula for the standard deviation η(d) from (1.2) comes in terms of a fixed point equation on a space of probability measures. Thus, let $\mathcal{P}(\mathbb{R}^2)$ be the set of all (Borel) probability measures on $\mathbb{R}^2$. For 0 < d < 2 and 0 ≤ t ≤ 1 we define an operator
\[ \mathrm{logBP}^{\otimes}_{d,t} : \mathcal{P}(\mathbb{R}^2) \to \mathcal{P}(\mathbb{R}^2), \qquad \rho \mapsto \hat\rho = \mathrm{logBP}^{\otimes}_{d,t}(\rho), \tag{1.3} \]
as follows. Let
\[ (\xi_{\rho,i})_{i\ge 1},\ (\xi'_{\rho,i})_{i\ge 1},\ (\xi''_{\rho,i})_{i\ge 1}, \qquad \xi_{\rho,i} = \begin{pmatrix} \xi_{\rho,i,1} \\ \xi_{\rho,i,2} \end{pmatrix},\quad \xi'_{\rho,i} = \begin{pmatrix} \xi'_{\rho,i,1} \\ \xi'_{\rho,i,2} \end{pmatrix},\quad \xi''_{\rho,i} = \begin{pmatrix} \xi''_{\rho,i,1} \\ \xi''_{\rho,i,2} \end{pmatrix} \]
be random vectors with distribution ρ, let $\boldsymbol{d} \overset{\mathrm{dist}}{=} \mathrm{Po}(td)$, $\boldsymbol{d}', \boldsymbol{d}'' \overset{\mathrm{dist}}{=} \mathrm{Po}((1-t)d)$ and let $s_i, s'_i, s''_i, r_i, r'_i, r''_i$ for i ≥ 1 be uniformly random on {±1}, all mutually independent.
Then $\hat\rho$ is the distribution of the vector
\[ \begin{pmatrix} \sum_{i=1}^{\boldsymbol{d}} s_i \log\bigl(\tfrac12\bigl(1 + r_i \tanh(\xi_{\rho,i,1}/2)\bigr)\bigr) + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\bigl(\tfrac12\bigl(1 + r'_i \tanh(\xi'_{\rho,i,1}/2)\bigr)\bigr) \\ \sum_{i=1}^{\boldsymbol{d}} s_i \log\bigl(\tfrac12\bigl(1 + r_i \tanh(\xi_{\rho,i,2}/2)\bigr)\bigr) + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\bigl(\tfrac12\bigl(1 + r''_i \tanh(\xi''_{\rho,i,2}/2)\bigr)\bigr) \end{pmatrix} \in \mathbb{R}^2 , \]
where the upper summation limits $\boldsymbol{d}, \boldsymbol{d}', \boldsymbol{d}''$ are the Poisson variables introduced above. In addition, define a function $B^{\otimes}_{d,t} : \mathcal{P}(\mathbb{R}^2) \to (0,\infty]$ by letting
\[ B^{\otimes}_{d,t}(\rho) = \mathbb{E}\Bigl[\prod_{h=1}^{2} \log\Bigl(1 - \tfrac14 \bigl(1 + r_1 \tanh(\xi_{\rho,1,h}/2)\bigr)\bigl(1 + r_2 \tanh(\xi_{\rho,2,h}/2)\bigr)\Bigr)\Bigr]. \tag{1.4} \]

¹ Formally, up to the so-called condensation threshold, which precedes the q-colourability threshold by a small additive constant, the logarithm of the number of q-colourings minus its expectation converges in distribution to a random variable with bounded moments [12, 13, 21].

Theorem 1.1. For any 0 < d < 2, t ∈ [0,1] there exists a unique probability measure $\rho_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ such that
\[ \rho_{d,t} = \mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t}) \quad \text{and} \quad \int_{\mathbb{R}^2} \|\xi\|_2^2 \, \mathrm{d}\rho_{d,t}(\xi) < \infty. \tag{1.5} \]
Furthermore,
\[ \lim_{n\to\infty} \frac{\log Z(\Phi) - \mathbb{E}[\log Z(\Phi) \mid Z(\Phi) > 0]}{\sqrt{m}} = \Gamma_{\eta(d)} \quad \text{in distribution, where} \tag{1.6} \]
\[ \eta(d)^2 = \int_0^1 B^{\otimes}_{d,t}(\rho_{d,t}) \, \mathrm{d}t - B^{\otimes}_{d,0}(\rho_{d,0}) \in (0,\infty). \tag{1.7} \]

The conditioning on Z(Φ) > 0 is necessary in (1.6), because even for d < 2 the formula Φ is unsatisfiable with probability $\Omega(n^{-1})$, in which case log Z(Φ) = −∞. Moreover, the L²-bound from (1.5) ensures that the integral (1.7) is well-defined. Finally, (1.6) implies (1.2).

How can the formula (1.7) be evaluated? Because the proof of the uniqueness of the stochastic fixed point $\rho_{d,t}$ from (1.5) is based on the contraction method, a fixed point iteration will converge rapidly. In effect, for any d, t a discrete distribution that approximates $\rho_{d,t}$ arbitrarily well (in Wasserstein distance) can be computed via a randomised algorithm called population dynamics [36, Chapter 14]. Since $B^{\otimes}_{d,t}(\rho_{d,t})$ varies continuously in d and t, η(d)² can thus be approximated within any desired accuracy, see Figure 1.

2. PROOF STRATEGY

The main challenge towards the proof of Theorem 1.1 is to get a handle on the variance of log Z(Φ) given satisfiability.
The key idea, inspired by spin glass theory [19] but novel to random constraint satisfaction, is to count the joint number of satisfying assignments of two correlated random formulas. Once this is accomplished Theorem 1.1 will follow from the careful application of a general martingale central limit theorem. To get acclimatised we first revisit the method of moments, the reasons it fails on random 2-SAT and the combinatorial interpretation of the law of large numbers (1.1).

2.1. The method of moments fails. The default approach to estimating the number of solutions to a random CSP is the venerable second moment method [7]. Its thrust is to show that the second moment of the number of solutions is of the same order as the square of the expected number of solutions. If so then the moment computation together with small subgraph conditioning yields the precise limiting distribution of the number of solutions [24, 45]. However, this approach works only if the log of the number of solutions superconcentrates around the log of the expected number of solutions. This necessary condition is not satisfied in random 2-SAT. In fact, a straightforward calculation yields
\[ \frac{1}{n} \log \mathbb{E}[Z(\Phi)] \sim \log 2 + \frac{d}{2} \log(3/4). \tag{2.1} \]
The formula on the r.h.s. is displayed as the black dashed line in Figure 1. As can be verified analytically, this line strictly exceeds the function φ(d) from (1.1) for any 0 < d < 2. Consequently, (1.1) implies that log Z(Φ) ≤ log E[Z(Φ)] − Ω(n) w.h.p. In other words, the expected number of solutions E[Z(Φ)] overshoots the typical number of solutions by an exponential factor w.h.p.; cf. the discussion in [6, 8].

2.2. Belief Propagation. Instead of the method of moments, the prescription of the physics-based work of Monasson and Zecchina [38] is to estimate log Z(Φ) by way of the Belief Propagation (BP) message passing algorithm. This approach was vindicated rigorously by Achlioptas et al. [5].
As we will reuse certain elements of that analysis we dwell on BP briefly. For a clause a of a 2-CNF Φ let $\partial a = \partial_{\Phi} a$ be the set of variables that a contains. Moreover, for x ∈ ∂a let $\mathrm{sign}_{\Phi}(x, a) = \mathrm{sign}(x, a) \in \{\pm 1\}$ be the sign with which x appears in a. Analogously, let $\partial x = \partial_{\Phi} x$ be the set of clauses in which variable x appears. BP introduces ‘messages’ between clauses a and the variables x ∈ ∂a. More precisely, each such clause-variable pair a, x comes with two messages $\mu_{x\to a}, \mu_{a\to x}$. The messages are probability distributions on ‘true’ and ‘false’, which we represent by ±1. Thus, $\mu_{x\to a}(\pm 1), \mu_{a\to x}(\pm 1) \ge 0$ and $\mu_{x\to a}(1) + \mu_{x\to a}(-1) = \mu_{a\to x}(1) + \mu_{a\to x}(-1) = 1$. The messages get updated iteratively by an operator
\[ \mathrm{BP} : (\mu_{x\to a}, \mu_{a\to x})_{a,\, x\in\partial a} \mapsto (\hat\mu_{x\to a}, \hat\mu_{a\to x})_{a,\, x\in\partial a} = \mathrm{BP}\bigl((\mu_{x\to a}, \mu_{a\to x})_{a,\, x\in\partial a}\bigr). \tag{2.2} \]
For a clause a with adjacent variables ∂a = {x, y} the updated messages $\hat\mu_{a\to x}(\pm 1)$ are defined by
\[ \hat\mu_{a\to x}(\mathrm{sign}(x,a)) = \frac{1}{1 + \mu_{y\to a}(\mathrm{sign}(y,a))}, \qquad \hat\mu_{a\to x}(-\mathrm{sign}(x,a)) = \frac{\mu_{y\to a}(\mathrm{sign}(y,a))}{1 + \mu_{y\to a}(\mathrm{sign}(y,a))}. \tag{2.3} \]
Moreover, for a variable x and a clause a ∈ ∂x we define²
\[ \hat\mu_{x\to a}(s) = \frac{\prod_{b\in\partial x\setminus\{a\}} \mu_{b\to x}(s)}{\prod_{b\in\partial x\setminus\{a\}} \mu_{b\to x}(1) + \prod_{b\in\partial x\setminus\{a\}} \mu_{b\to x}(-1)} \qquad (s \in \{\pm 1\}). \tag{2.4} \]
The purpose of BP is to heuristically ‘approximate’ the marginal probabilities that a random satisfying assignment $\sigma = \sigma_{\Phi}$ of Φ will set a certain variable to a specific truth value. The ‘approximation’ given by the set $(\mu_{x\to a}, \mu_{a\to x})_{a,\, x\in\partial a}$ of messages reads
\[ \mu_x(s) = \frac{\prod_{b\in\partial x} \mu_{b\to x}(s)}{\prod_{b\in\partial x} \mu_{b\to x}(1) + \prod_{b\in\partial x} \mu_{b\to x}(-1)} \qquad (s \in \{\pm 1\}). \tag{2.5} \]
The BP ‘ansatz’ now asks that we iterate the BP operator until an (approximate) fixed point is reached, i.e., ideally until $\hat\mu_{a\to x} = \mu_{a\to x}$ and $\hat\mu_{x\to a} = \mu_{x\to a}$ for all a, x. Then we evaluate the BP marginals (2.5) and plug them into a generic formula called the Bethe free entropy, which yields the BP ‘approximation’ of log Z(Φ); an excellent exposition can be found in [36].
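The updates (2.3)–(2.4) and the marginal formula (2.5) can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: literals are encoded as nonzero integers (−3 stands for ¬x3), every clause contains two distinct variables, all messages start out uniform, and the number of rounds is an arbitrary cap rather than a convergence test. The helper names `bp_marginals` and `exact_marginals` are hypothetical; the brute-force routine is included solely for comparison on tiny satisfiable formulas.

```python
from itertools import product

def bp_marginals(n, clauses, iterations=50):
    """Iterate the BP updates (2.3)-(2.4) and return {x: BP estimate of P[x is true]}.

    clauses: list of pairs of nonzero ints over variables 1..n (-3 means NOT x3).
    """
    uniform = {1: 0.5, -1: 0.5}
    sign = {(a, abs(l)): (1 if l > 0 else -1)
            for a, c in enumerate(clauses) for l in c}
    var_to_cl = {key: dict(uniform) for key in sign}   # messages mu_{x -> a}
    cl_to_var = {key: dict(uniform) for key in sign}   # messages mu_{a -> x}
    nbrs = {x: [a for a, c in enumerate(clauses) if x in (abs(c[0]), abs(c[1]))]
            for x in range(1, n + 1)}

    def normalise(w):
        z = w[1] + w[-1]
        # vanishing denominator: fall back to the uniform message
        return dict(uniform) if z == 0 else {1: w[1] / z, -1: w[-1] / z}

    def product_of_messages(x, skip=None):
        # normalised product of clause-to-variable messages, as in (2.4) and (2.5)
        w = {1: 1.0, -1: 1.0}
        for b in nbrs[x]:
            if b != skip:
                for s in (1, -1):
                    w[s] *= cl_to_var[(b, x)][s]
        return normalise(w)

    for _ in range(iterations):
        new_cl = {}
        for a, c in enumerate(clauses):
            x, y = abs(c[0]), abs(c[1])
            for u, v in ((x, y), (y, x)):
                m = var_to_cl[(a, v)][sign[(a, v)]]        # mu_{v -> a}(sign(v, a))
                new_cl[(a, u)] = {sign[(a, u)]: 1 / (1 + m),       # update (2.3)
                                  -sign[(a, u)]: m / (1 + m)}
        # update (2.4), computed from the old clause-to-variable messages
        new_var = {(a, x): product_of_messages(x, skip=a) for (a, x) in sign}
        cl_to_var, var_to_cl = new_cl, new_var

    return {x: product_of_messages(x)[1] for x in range(1, n + 1)}   # formula (2.5)

def exact_marginals(n, clauses):
    """Brute-force marginals of a uniformly random satisfying assignment (tiny n only)."""
    sols = [s for s in product((1, -1), repeat=n)
            if all(any(s[abs(l) - 1] == (1 if l > 0 else -1) for l in c)
                   for c in clauses)]
    return {x: sum(s[x - 1] == 1 for s in sols) / len(sols) for x in range(1, n + 1)}
```

On an acyclic formula such as (x1 ∨ x2) ∧ (¬x2 ∨ x3), calling `bp_marginals(3, [(1, 2), (-2, 3)])` should agree with `exact_marginals` up to rounding; on formulas with cycles the two may differ.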
The BP recipe provably yields the correct result if the bipartite graph induced by the clause-variable incidences of the 2-CNF Φ is acyclic, but may be totally off otherwise. Of course, for 1 < d < 2 the bipartite graph associated with the random formula Φ contains cycles in abundance. Nonetheless, (1.1) confirms that the BP formula provides a valid approximation to within o(n). The proof is based on two observations. First, that the local structure of the clause-variable incidence graph can be described by a Galton-Watson tree. Second, that the Galton-Watson tree enjoys a spatial mixing property called Gibbs uniqueness. Since the proof of Theorem 1.1 also harnesses Gibbs uniqueness, let us elaborate.

To mimic the local structure of Φ consider a multitype Galton-Watson tree T whose types are variable nodes and clause nodes of four sub-types (s, s′) with s, s′ ∈ {±1}. The root o is a variable node. The offspring of any variable node is a Po(d/4) number of clause nodes of each of the four sub-types. Finally, the offspring of a clause node is a single variable node. The clause type (s, s′) indicates that s is the sign with which the parent variable appears in the clause, while s′ determines the sign of the child variable. Thus, the Galton-Watson tree T can be viewed as a (possibly infinite) 2-CNF. For an integer ℓ ≥ 0 let $T^{(2\ell)}$ be the finite tree/2-CNF obtained by deleting all variables and clauses at a distance larger than 2ℓ from the root. The tree T approximates Φ locally in the sense that for any fixed ℓ and any given variable $x_i$ the distribution of the depth-2ℓ neighbourhood of $x_i$ in Φ converges to $T^{(2\ell)}$ as n → ∞ (in the sense of local weak convergence). Moreover, Gibbs uniqueness posits that under a random satisfying assignment σ of the tree-CNF $T^{(2\ell)}$ the truth value $\sigma_o$ of the root decouples from the values $\sigma_{T,y}$ of the variables $y \in \partial^{2\ell} o$ at distance precisely 2ℓ from o, for large enough ℓ.
Formally, with $S(T^{(2\ell)})$ the set of satisfying assignments of the 2-CNF $T^{(2\ell)}$, the following is true.

Proposition 2.1 ([5, Proposition 2.2]). We have
\[ \lim_{\ell\to\infty} \mathbb{E}\Bigl[\max_{\tau\in S(T^{(2\ell)})} \Bigl| \mathbb{P}\bigl[\sigma_o = 1 \mid T^{(2\ell)},\ \sigma_{\partial^{2\ell} o} = \tau_{\partial^{2\ell} o}\bigr] - \mathbb{P}\bigl[\sigma_o = 1 \mid T^{(2\ell)}\bigr] \Bigr|\Bigr] = 0. \tag{2.6} \]

2.3. Approaching the variance. The proof of the formula (1.1) combines the Gibbs uniqueness property and the local convergence to the Galton-Watson tree with a coupling argument called the ‘Aizenman-Sims-Starr scheme’ [5]. Unfortunately, this combination does not seem precise enough to get a handle on the limiting distribution of log Z(Φ) by a long shot. Actually, it is anything but clear how even the order of the standard deviation of log Z(Φ) could be derived along these lines. One specific problem is that the rate of convergence of (2.6) diminishes as d approaches the satisfiability threshold.

To tackle this challenge we devise a combinatorial interpretation of $\log^2 Z(\Phi)$. A key idea, which we borrow from spin glass theory [19], is to set up a family of correlated random formulas. Specifically, given integers M, M′ ≥ 0 we construct a correlated pair $(\Phi_1(M, M'), \Phi_2(M, M'))$ of formulas on the variable set $V_n = \{x_1, \ldots, x_n\}$ as follows. Let $(a_i)_{i\ge 1}, (a'_i)_{i\ge 1}, (a''_i)_{i\ge 1}$ be sequences of mutually independent uniformly random clauses on $V_n$. Then
\[ \Phi_1(M, M') = a_1 \wedge \cdots \wedge a_M \wedge a'_1 \wedge \cdots \wedge a'_{M'}, \qquad \Phi_2(M, M') = a_1 \wedge \cdots \wedge a_M \wedge a''_1 \wedge \cdots \wedge a''_{M'}. \tag{2.7} \]

² For the sake of tidiness, if the denominator in (2.4) vanishes we simply let $\hat\mu_{x\to a}(\pm 1) = \tfrac12$.

Thus, the two formulas share clauses $a_1, \ldots, a_M$. Additionally, each contains another M′ independent clauses. In particular, $\Phi_1(m,0), \Phi_2(m,0)$ are identical, while $\Phi_1(0,m), \Phi_2(0,m)$ are independent.
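The construction (2.7) is easy to simulate. The following Python sketch draws a correlated pair and, for very small n, counts satisfying assignments by brute force. The integer encoding of literals (−3 for ¬x3) and the function names `random_clause`, `correlated_pair` and `count_solutions` are assumptions made for this illustration only.

```python
import random
from itertools import product

def random_clause(n, rng):
    """Uniformly random 2-clause on two distinct variables among x1..xn."""
    x, y = rng.sample(range(1, n + 1), 2)
    return (x * rng.choice((1, -1)), y * rng.choice((1, -1)))

def correlated_pair(n, M, M_prime, rng):
    """Return (Phi_1(M, M'), Phi_2(M, M')) as in (2.7): M shared clauses plus
    M' further independent clauses for each of the two formulas."""
    shared = [random_clause(n, rng) for _ in range(M)]
    only_1 = [random_clause(n, rng) for _ in range(M_prime)]
    only_2 = [random_clause(n, rng) for _ in range(M_prime)]
    return shared + only_1, shared + only_2

def count_solutions(n, clauses):
    """Brute-force Z(Phi); exponential in n, intended for toy sizes only."""
    return sum(
        all(any(s[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        for s in product((True, False), repeat=n)
    )
```

For instance, `correlated_pair(12, 6, 4, random.Random(1))` returns two formulas with ten clauses each whose first six clauses coincide; with M′ = 0 the two formulas are identical, and with M = 0 they are independent.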
Interpolating between these extreme cases offers a promising avenue for computing the variance: given that $\Phi_1(M, m-M)$ and $\Phi_2(M, m-M)$ are satisfiable for all M, we can write a telescoping sum
\[ \begin{aligned} &\log Z(\Phi_1(m,0)) \cdot \log Z(\Phi_2(m,0)) - \log Z(\Phi_1(0,m)) \cdot \log Z(\Phi_2(0,m)) \\ &\quad = \sum_{M=1}^{m} \Bigl[ \log Z(\Phi_1(M, m-M)) \cdot \log Z(\Phi_2(M, m-M)) \\ &\qquad\qquad - \log Z(\Phi_1(M-1, m-M+1)) \cdot \log Z(\Phi_2(M-1, m-M+1)) \Bigr]. \end{aligned} \tag{2.8} \]
If we could take the expectation on the l.h.s. of (2.8), we would precisely obtain the variance of log Z(Φ). Moreover, each summand on the r.h.s. amounts to a ‘local’ change of swapping a shared clause for a pair of independent clauses. Yet we cannot just take the expectation of (2.8), because some $\Phi_h(M, m-M)$ may be unsatisfiable. To remedy this, we will replace log Z(Φ) by a tamer random variable with the same limiting distribution. Its construction is based on the Unit Clause Propagation algorithm.

2.4. Unit Clause Propagation. Employed by all modern SAT solvers as a sub-routine, Unit Clause Propagation is a linear time algorithm that tracks the implications of partial assignments. The algorithm receives as input a 2-CNF Φ along with a set L of literals. These literals are deemed to be ‘true’. The algorithm then pursues direct logical implications, thereby identifying additional ‘implied’ literals that need to be true so that no clause gets violated. This procedure is outlined in Steps 1–2 of Algorithm 1; the outcome of Steps 1–2 is independent of the order in which literals/clauses are processed.

Input: A 2-CNF Φ along with a set L of literals deemed true.
1 while there exists a clause a ≡ l ∨ ¬l′ with l′ ∈ L and l ∉ L do
2     add literal l to L;
3 For variables x ∈ V(Φ) such that x ∈ L or ¬x ∈ L let
      σ_x = 1 if x ∈ L and ¬x ∉ L,
      σ_x = −1 if ¬x ∈ L and x ∉ L,
      σ_x = 0 otherwise.
  Let C be the set of all clauses a such that σ_x = 0 for all x ∈ ∂a and return L, C, σ;
Algorithm 1: Pessimistic Unit Clause Propagation (‘PUC’).
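Algorithm 1 translates almost line by line into Python. The sketch below is one possible reading of it, under assumptions that are not in the paper: literals are nonzero integers (−3 for ¬x3), clauses are pairs of literals, and the names `puc` and `prune` are hypothetical. The function `prune` follows the description of the pruned formula in Section 2.4 by running PUC from every single literal and discarding all conflict clauses found; it makes 2n calls and is meant for small examples, not as an efficient implementation.

```python
def puc(clauses, seed_literals):
    """Pessimistic Unit Clause Propagation: returns (L, C, sigma) as in Algorithm 1."""
    L = set(seed_literals)
    # occ[l] lists, for every clause containing literal l, the other literal of that clause
    occ = {}
    for p, q in clauses:
        occ.setdefault(p, []).append(q)
        occ.setdefault(q, []).append(p)
    queue = list(L)
    while queue:                       # Steps 1-2: follow direct implications
        true_lit = queue.pop()
        # a clause containing the negation of a true literal forces its other literal
        for forced in occ.get(-true_lit, []):
            if forced not in L:
                L.add(forced)
                queue.append(forced)
    sigma = {}                         # Step 3: partial assignment, 0 marks a conflict
    for lit in L:
        x = abs(lit)
        if x in L and -x in L:
            sigma[x] = 0
        else:
            sigma[x] = 1 if x in L else -1
    # conflict clauses: every variable of the clause is conflicted
    C = [c for c in clauses if all(sigma.get(abs(l)) == 0 for l in c)]
    return L, C, sigma

def prune(n, clauses):
    """Remove every clause that is a conflict clause for some single-literal seed."""
    conflict = set()
    for x in range(1, n + 1):
        for lit in (x, -x):
            conflict.update(puc(clauses, {lit})[1])
    return [c for c in clauses if c not in conflict]
```

As a small example, on the clauses (¬x1 ∨ x2), (¬x2 ∨ x3), (¬x2 ∨ ¬x3) the seed {x1} forces x2 and then both x3 and ¬x3; the contradiction propagates back, so all three variables end up conflicted and all three clauses are reported as conflict clauses.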
Clearly, trouble brews if PUC ends up placing both a literal l and its negation ¬l into the set L. Our ‘pessimistic’ Unit Clause variant makes no attempt at mitigating such contradictions. Instead, Step 3 just constructs a partial assignment where all conflicting literals are set to a dummy value zero. Additionally, PUC identifies the set C of conflict clauses that contain conflicted variables only.

Now consider a 2-CNF Φ on a set of variables V(Φ). For each possible literal l ∈ {x, ¬x : x ∈ V(Φ)} we run PUC(Φ, L = {l}). Let C(Φ, {l}) be the set of conflict clauses returned by PUC. Obtain the pruned formula $\hat\Phi$ from Φ by removing all clauses in $C(\Phi) = \bigcup_{l} C(\Phi, \{l\})$. Then it is easy to verify the following.³

Fact 2.2. For any 2-CNF Φ the pruned 2-CNF $\hat\Phi$ is satisfiable.

Generally, the pruned formula $\hat\Phi$ could have far fewer clauses than the original formula Φ. Accordingly, even if Φ is satisfiable the number $Z(\hat\Phi)$ of satisfying assignments of $\hat\Phi$ could dramatically exceed Z(Φ). However, the following proposition shows that on a random formula, the impact of pruning is modest.

Proposition 2.3. With probability $1 - o(n^{-1/2})$ we have $|\log Z(\hat\Phi) - \log Z(\Phi)| \le n^{1/3}$.

³ See Section 4.2 for a detailed proof.

2.5. Variance redux. The error bound from Proposition 2.3 is tight enough so that towards the proof of Theorem 1.1 it suffices to establish a central limit theorem for $\log Z(\hat\Phi)$, i.e., the log of the number of satisfying assignments of the pruned formula. Once again the pivotal task to this end is to compute the variance of $\log Z(\hat\Phi)$. Revisiting the telescoping sum (2.8), we obtain the following expression. Recalling (2.7), we write $\hat\Phi_h(M, M') = \widehat{\Phi_h(M, M')}$ for the formula obtained by pruning $\Phi_h(M, M')$.

Lemma 2.4. Let
\[ \Delta(M) = \mathbb{E}\Bigl[ \log\Bigl( \frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))} \Bigr) \cdot \log\Bigl( \frac{Z(\hat\Phi_2(M, m-M))}{Z(\hat\Phi_2(M-1, m-M))} \Bigr) \Bigr], \tag{2.9} \]
\[ \Delta'(M) = \mathbb{E}\Bigl[ \log\Bigl( \frac{Z(\hat\Phi_1(M-1, m-M+1))}{Z(\hat\Phi_1(M-1, m-M))} \Bigr) \cdot \log\Bigl( \frac{Z(\hat\Phi_2(M-1, m-M+1))}{Z(\hat\Phi_2(M-1, m-M))} \Bigr) \Bigr]. \tag{2.10} \]
Then
\[ \mathrm{Var}\bigl[\log Z(\hat\Phi)\bigr] = \sum_{M=1}^{m} \bigl( \Delta(M) - \Delta'(M) \bigr). \]

Lemma 2.4 expresses the variance as a sum of local changes. For example, $\Phi_1(M, m-M)$ is obtained from $\Phi_1(M-1, m-M)$ by adding a single random clause, namely $a_M$. Thus, Δ(M) equals the expected change upon addition of a single shared clause, modulo the effect of pruning. But fortunately, on random formulas only a few clauses get pruned w.h.p. In effect, we can express the impact of these random changes neatly in terms of random satisfying assignments of the ‘small’ formulas $\hat\Phi_h(M-1, m-M)$ that appear in (2.9)–(2.10). Specifically, the quotients in (2.9)–(2.10) boil down to the probabilities that random satisfying assignments of the ‘small’ formulas survive the extra clause that gets added to obtain the 2-CNFs in the respective numerators. Thus, with $\sigma = (\sigma_y)_{y\in V_n}$ denoting a random satisfying assignment of $\hat\Phi_h(M-1, m-M)$, we obtain the following.

Proposition 2.5. Let 1 ≤ M ≤ m. W.h.p. we have
\[ \frac{Z(\hat\Phi_h(M, m-M))}{Z(\hat\Phi_h(M-1, m-M))} = 1 - \prod_{y\in\partial a_M} \mathbb{P}\bigl[\sigma_y \ne \mathrm{sign}(y, a_M) \mid \hat\Phi_h(M-1, m-M),\, a_M\bigr] + o(1) \qquad (h = 1, 2), \]
\[ \frac{Z(\hat\Phi_1(M-1, m-M+1))}{Z(\hat\Phi_1(M-1, m-M))} = 1 - \prod_{y\in\partial a'_{m-M+1}} \mathbb{P}\bigl[\sigma_y \ne \mathrm{sign}(y, a'_{m-M+1}) \mid \hat\Phi_1(M-1, m-M),\, a'_{m-M+1}\bigr] + o(1), \]
\[ \frac{Z(\hat\Phi_2(M-1, m-M+1))}{Z(\hat\Phi_2(M-1, m-M))} = 1 - \prod_{y\in\partial a''_{m-M+1}} \mathbb{P}\bigl[\sigma_y \ne \mathrm{sign}(y, a''_{m-M+1}) \mid \hat\Phi_2(M-1, m-M),\, a''_{m-M+1}\bigr] + o(1). \]

2.6. Local convergence in probability. To evaluate the expressions from Proposition 2.5 we need to get a grip on the joint distribution of the truth values of y under random satisfying assignments of the two correlated formulas $\hat\Phi_h(M-1, m-M)$. To this end we will devise a Galton-Watson tree $T^{\otimes}$ that mimics the joint distribution of the local structure of $(\hat\Phi_1(M-1, m-M), \hat\Phi_2(M-1, m-M))$. Subsequently, we will establish Gibbs uniqueness for this Galton-Watson tree to compute the expressions from Proposition 2.5.
The Galton-Watson tree T from Section 2.2 that describes the local topology of the ‘plain’ random formula Φ had one type of variable nodes and four types (±1, ±1) of clause nodes. To approach the correlated pair $(\hat\Phi_1(M, m-M-1), \hat\Phi_2(M, m-M-1))$ we need a Galton-Watson process with three types of variable nodes and a full dozen types of clause nodes. Specifically, there are shared, 1-distinct and 2-distinct variable nodes. The root o of $T^{\otimes}$ is a shared variable node. The clause node types are (s, s′)-shared, (s, s′) 1-distinct and (s, s′) 2-distinct for s, s′ ∈ {±1}. In addition to d ∈ (0,2) the offspring distributions of $T^{\otimes} = T^{\otimes}_{d,t}$ involve a second parameter t ∈ [0,1]:
• A shared variable spawns Po(dt/4) shared clauses of type (s, s′) as well as Po(d(1−t)/4) 1-distinct clauses of type (s, s′) and Po(d(1−t)/4) 2-distinct clauses of type (s, s′) for any s, s′ ∈ {±1}.
• An h-distinct variable begets Po(d/4) h-distinct clauses of type (s, s′) for any s, s′ ∈ {±1} (h = 1, 2).
• A shared clause has precisely one shared variable as its offspring.
• An h-distinct clause spawns a single h-distinct variable (h = 1, 2).

Figure 1 provides an illustration of the tree $T^{\otimes}$. Shared variables/clauses are indicated in red, 1-distinct variables/clauses in green and 2-distinct ones in blue.

From $T^{\otimes}$ we extract a pair $(T_1, T_2)$ of correlated random trees. Specifically, $T_h$ is obtained from $T^{\otimes}$ by deleting all (3−h)-distinct variables and clauses. Hence, the parameter t determines how ‘similar’ $T_1, T_2$ are. Specifically, if t = 1 then no {1,2}-distinct clauses exist and thus $T_1, T_2$ are identical. By contrast, if t = 0 then $T_1, T_2$ are independent copies of the tree T from Section 2.2. For an integer ℓ ≥ 0 obtain $T^{\otimes,(2\ell)}, T_1^{(2\ell)}, T_2^{(2\ell)}$ from $T^{\otimes}, T_1, T_2$ by omitting all nodes at a distance greater than 2ℓ from the root o. As in Section 2.2, we can interpret these trees as 2-CNFs, with the type (s, s′) of a clause indicating the signs of its parent and child variables.
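The offspring rules above can be turned into a small sampler. The following Python sketch generates the truncated tree with ℓ variable generations below the root and extracts the two projections. The nested-dictionary representation, the labels "shared", 1 and 2 for the node kinds, the function names, and the use of Knuth's multiplication method for Poisson sampling are all choices made for this illustration, not taken from the paper.

```python
import math
import random

def poisson(lam, rng):
    """Poisson sample via Knuth's multiplication method (adequate for small means)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_variable(kind, depth, d, t, rng):
    """Sample a variable node of the given kind ('shared', 1 or 2) together with
    `depth` further generations of variables below it."""
    node = {"kind": kind, "clauses": []}
    if depth == 0:
        return node
    if kind == "shared":
        rates = {"shared": d * t / 4, 1: d * (1 - t) / 4, 2: d * (1 - t) / 4}
    else:
        rates = {kind: d / 4}
    for clause_kind, lam in rates.items():
        for s in (1, -1):                 # sign of the parent variable in the clause
            for s_child in (1, -1):       # sign of the child variable in the clause
                for _ in range(poisson(lam, rng)):
                    # the single child variable has the same kind as its clause
                    child = sample_variable(clause_kind, depth - 1, d, t, rng)
                    node["clauses"].append(
                        {"kind": clause_kind, "signs": (s, s_child), "child": child})
    return node

def sample_tree(depth, d, t, rng):
    """The root is a shared variable node."""
    return sample_variable("shared", depth, d, t, rng)

def project(node, h):
    """Delete all (3-h)-distinct clauses and variables, which yields T_h."""
    kept = [c for c in node["clauses"] if c["kind"] in ("shared", h)]
    return {"kind": node["kind"],
            "clauses": [dict(c, child=project(c["child"], h)) for c in kept]}
```

With t = 1 the two projections `project(tree, 1)` and `project(tree, 2)` coincide, while with t = 0 the root has no shared clauses, in line with the two extreme cases described above.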
We say that two possible outcomes T, T′ of $T^{\otimes,(2\ell)}$ are isomorphic if there is a tree isomorphism that preserves the root o as well as all types. Further, a variable $x \in V_n$ is called a 2ℓ-instance of T in $(\hat\Phi_1(M, M'), \hat\Phi_2(M, M'))$ if there exist isomorphisms $\iota_h$ of the 2-CNFs $T_h$ obtained from T by deleting all (3−h)-distinct variables/clauses to the depth-2ℓ neighbourhoods $\partial^{\le 2\ell}_{\hat\Phi_h(M,M')} x$ of x in $\hat\Phi_h(M, M')$ such that
• the root gets mapped to x, i.e., $\iota_1(o) = \iota_2(o) = x$,
• for any shared variable y of $T_1, T_2$ the image variables coincide, i.e., $\iota_1(y) = \iota_2(y)$,
• for any shared clause a of $T_1, T_2$ the image $\iota_1(a) = \iota_2(a) \in \{a_1, \ldots, a_M\}$ is a shared clause,
• for any 1-distinct clause a whose parent in $T_1$ is a shared variable, $\iota_1(a) \in \{a'_1, \ldots, a'_{M'}\}$, and
• for any 2-distinct clause a whose parent in $T_2$ is a shared variable, $\iota_2(a) \in \{a''_1, \ldots, a''_{M'}\}$.

Let $N^{(2\ell)}(T, (\hat\Phi_1(M, M'), \hat\Phi_2(M, M')))$ be the number of 2ℓ-instances of T in $(\hat\Phi_1(M, M'), \hat\Phi_2(M, M'))$. The following proposition confirms that $T^{\otimes}$ models the local structure of $(\hat\Phi_1(M, M'), \hat\Phi_2(M, M'))$ faithfully.

Proposition 2.6. Let ℓ > 0 be a fixed integer, let t ∈ [0,1] and suppose that M ∼ tdn/2 and M′ ∼ (1−t)dn/2. Then w.h.p. for all possible outcomes T of $T^{\otimes,(2\ell)}$ we have
\[ N^{(2\ell)}\bigl(T, (\hat\Phi_1(M, M'), \hat\Phi_2(M, M'))\bigr) \sim n\, \mathbb{P}\bigl[ T^{\otimes,(2\ell)} \cong T \bigr]. \]

2.7. Correlated Belief Propagation. Now that we have a branching process description of our pair of correlated formulas the next step is to run BP on the random trees $(T_1, T_2)$ to find the joint distribution of the truth values $\sigma_{T_1^{(2\ell)},o}, \sigma_{T_2^{(2\ell)},o}$ assigned to the root. Hence, let
\[ \mu^{(2\ell)} = \Bigl( \mathbb{P}\bigl[\sigma_{T_1^{(2\ell)},o} = 1 \mid T^{\otimes}\bigr],\ \mathbb{P}\bigl[\sigma_{T_2^{(2\ell)},o} = 1 \mid T^{\otimes}\bigr] \Bigr) \in (0,1)^2. \tag{2.11} \]
Since BP is exact on trees, we could calculate these marginals by iterating (2.2)–(2.4) for 2ℓ steps, starting from all-uniform messages.
But our objective is not merely to calculate the marginals of a specific pair of trees, but the distribution of the vector (2.11) for a random $T^{\otimes}$. Fortunately, due to the Markovian nature of the Galton-Watson tree $T^{\otimes}$, the bottom-up BP computation on a random tree can be expressed by a fixed point iteration on the space of probability distributions on $\mathbb{R}^2$. The appropriate operator is the $\mathrm{logBP}^{\otimes}_{d,t}$-operator from (1.3). To be precise, that operator expresses the updates of the log-likelihood ratios of the BP messages from (2.3)–(2.4). Thus, let
\[ t : (z_1, z_2) \in \mathbb{R}^2 \mapsto \bigl( (1 + \tanh(z_1/2))/2,\ (1 + \tanh(z_2/2))/2 \bigr) \in (0,1)^2 \]
be the function that maps log-likelihood ratios back to probabilities. Furthermore, for a probability measure $\rho \in \mathcal{P}(\mathbb{R}^2)$ let t(ρ) be the pushforward probability measure on $(0,1)^2$.⁴

Proposition 2.7. Let $\rho^{(0)}_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ be the atom at the origin and let $\rho^{(\ell)}_{d,t} = \mathrm{logBP}^{\otimes}_{d,t}(\rho^{(\ell-1)}_{d,t})$. Then $\mu^{(2\ell)}$ has distribution $t(\rho^{(\ell)}_{d,t})$.

We employ the contraction method to show that the sequence $(\rho^{(\ell)}_{d,t})_{\ell\ge 1}$ of measures converges.

Proposition 2.8. There exists a unique $\rho_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ that satisfies (1.5) and $\lim_{\ell\to\infty} \rho^{(\ell)}_{d,t} = \rho_{d,t}$ weakly.

Furthermore, the Gibbs uniqueness property (2.6) extends to $T_1$ and $T_2$.

Corollary 2.9. For all t ∈ [0,1] and h = 1, 2 we have
\[ \lim_{\ell\to\infty} \mathbb{E}\Bigl[ \max_{\tau\in S(T_h^{(2\ell)})} \Bigl| \mathbb{P}\bigl[\sigma_{T_h^{(2\ell)},o} = 1 \mid T^{\otimes},\ \sigma_{T_h^{(2\ell)},\partial^{2\ell} o} = \tau_{\partial^{2\ell} o}\bigr] - \mathbb{P}\bigl[\sigma_{T_h^{(2\ell)},o} = 1 \mid T^{\otimes}\bigr] \Bigr| \Bigr] = 0. \tag{2.12} \]

⁴ That is, for a measurable $A \subseteq (0,1)^2$ we have $t(\rho)(A) = \rho(t^{-1}(A))$.

FIGURE 2. The distributions $t(\rho_{d,t})$ for d = 1.9 and t = 0.1, 0.5, 0.9.

Combining Propositions 2.7 and 2.8 and Corollary 2.9, we are now in a position to pinpoint the joint marginals of $\hat\Phi_1(M, M'), \hat\Phi_2(M, M')$. Formally, let
\[ \pi_{\hat\Phi_1(M,M'),\hat\Phi_2(M,M')} = \frac{1}{n} \sum_{i=1}^{n} \delta_{\bigl( \mathbb{P}[\sigma_{\hat\Phi_1(M,M'),x_i} = 1 \mid \hat\Phi_1(M,M')],\ \mathbb{P}[\sigma_{\hat\Phi_2(M,M'),x_i} = 1 \mid \hat\Phi_2(M,M')] \bigr)} \in \mathcal{P}([0,1]^2) \]
be the empirical distribution of the joint marginals of $\hat\Phi_1(M, M')$ and $\hat\Phi_2(M, M')$, which we need to know to evaluate the expressions from Proposition 2.5.
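The iteration from Proposition 2.7 lends itself to a population dynamics simulation in the spirit of [36, Chapter 14]. The Python sketch below represents a measure on ℝ² by a finite list of points, starts from the atom at the origin and resamples the list with the logBP⊗ operator from (1.3). It is a heuristic illustration only: the population size, the number of rounds, the function names and the Poisson sampler are arbitrary choices, and no accuracy guarantee is claimed. The code uses the identity log(½(1 + r tanh(ξ/2))) = −log(1 + exp(−rξ)), valid for r ∈ {±1}, to avoid taking the logarithm of zero.

```python
import math
import random

def poisson(lam, rng):
    """Poisson sample via Knuth's multiplication method (adequate for small means)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def log_sigmoid(x):
    """Numerically stable log((1 + tanh(x/2)) / 2) = -log(1 + exp(-x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def logbp_sample(pop, d, t, rng):
    """Draw one point from logBP applied to the empirical measure of `pop`."""
    z1 = z2 = 0.0
    for _ in range(poisson(t * d, rng)):          # shared terms enter both coordinates
        xi, s, r = rng.choice(pop), rng.choice((1, -1)), rng.choice((1, -1))
        z1 += s * log_sigmoid(r * xi[0])
        z2 += s * log_sigmoid(r * xi[1])
    for _ in range(poisson((1 - t) * d, rng)):    # terms for the first coordinate only
        xi, s, r = rng.choice(pop), rng.choice((1, -1)), rng.choice((1, -1))
        z1 += s * log_sigmoid(r * xi[0])
    for _ in range(poisson((1 - t) * d, rng)):    # terms for the second coordinate only
        xi, s, r = rng.choice(pop), rng.choice((1, -1)), rng.choice((1, -1))
        z2 += s * log_sigmoid(r * xi[1])
    return (z1, z2)

def population_dynamics(d, t, size, rounds, rng):
    """Approximate rho^(rounds)_{d,t} by a population of `size` points."""
    pop = [(0.0, 0.0)] * size                     # atom at the origin
    for _ in range(rounds):
        pop = [logbp_sample(pop, d, t, rng) for _ in range(size)]
    return pop

def to_probabilities(z):
    """The map t: log-likelihood ratios back to probabilities."""
    return ((1 + math.tanh(z[0] / 2)) / 2, (1 + math.tanh(z[1] / 2)) / 2)
```

Applying `to_probabilities` to every point of `population_dynamics(1.9, 0.5, 10000, 30, random.Random(0))` gives a scatter of joint marginals of the kind shown in Figure 2; for t = 1 all points lie on the diagonal because only shared terms occur.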
Furthermore, denote by $W_1(\,\cdot\,, \,\cdot\,)$ the Wasserstein L¹-distance of two probability measures on $[0,1]^2$.

Corollary 2.10. For any t ∈ [0,1] and any M ∼ tdn/2, M′ ∼ (1−t)dn/2 we have
\[ \mathbb{E}\Bigl[ W_1\bigl( \pi_{\hat\Phi_1(M,M'),\hat\Phi_2(M,M')},\ t(\rho_{d,t}) \bigr) \Bigr] = o(1). \]

Finally, combining Proposition 2.5 with Corollary 2.10, we obtain the variance of $\log Z(\hat\Phi)$.

Corollary 2.11. With η(d)² from (1.7) we have η(d) > 0 and $\mathrm{Var} \log Z(\hat\Phi) \sim m\, \eta(d)^2$.

Because the proof of Proposition 2.8 is based on a contraction argument, for any d, t the distribution $\rho_{d,t}$ can be approximated effectively within any given accuracy via a fixed point iteration. Figure 2 displays approximations to $t(\rho_{d,t})$ for different values of t and shows how correlations between the two coordinates of the random vector increase with t (brighter diagonal).

2.8. The central limit theorem. With the variance computation done, we have now overcome the greatest hurdle en route to Theorem 1.1. Indeed, to obtain the desired asymptotic normality we just need to combine the techniques from the variance computation with a generic martingale central limit theorem. To this end we set up a filtration $(\mathcal{F}_{n,M})_{0\le M\le m_n}$ by letting $\mathcal{F}_{n,M}$ be the σ-algebra generated by $a_1, \ldots, a_M$. Hence, conditioning on $\mathcal{F}_{n,M}$ amounts to conditioning on $a_1, \ldots, a_M$, while averaging on the remaining clauses $a_{M+1}, \ldots, a_m$. The conditional expectations
\[ Z_{n,M} = m^{-1/2}\, \mathbb{E}\bigl[ \log Z(\hat\Phi) \mid \mathcal{F}_{n,M} \bigr] \tag{2.13} \]
then form a Doob martingale. Let $X_{n,M} = Z_{n,M} - Z_{n,M-1}$ be the martingale differences.

Proposition 2.12. For all 0 < d < 2 the martingale (2.13) satisfies
\[ \lim_{n\to\infty} \mathbb{E}\Bigl[ \max_{1\le M\le m} |X_{n,M}| \Bigr] = 0 \quad \text{and} \quad \lim_{n\to\infty} \mathbb{E}\Bigl| \eta(d)^2 - \sum_{M=1}^{m} X_{n,M}^2 \Bigr| = 0. \tag{2.14} \]

Thanks to pruning, the first condition from (2.14) is easily checked. Furthermore, the steps that we pursued towards the proof of Corollary 2.11, i.e., the variance calculation, also imply the second condition without further ado.
Finally, as (2.14) demonstrates that the martingale differences are small and that the variance process converges to a deterministic limit, Theorem 1.1 follows from the general martingale central limit theorem from [28].

3. DISCUSSION

The hunt for satisfiability thresholds of random constraint satisfaction problems was launched by the experimental work of Cheeseman, Kanefsky and Taylor [17]. The 2-SAT threshold was the first one to be caught [20, 32]. Subsequent successes include the 1-in-k-SAT threshold [3] and the k-XORSAT threshold [27, 43]. Furthermore, Friedgut [30] proved the existence of non-uniform (i.e., n-dependent) satisfiability thresholds in considerable generality. The plot thickened when physicists employed a compelling but non-rigorous technique called the cavity method to ‘predict’ the exact satisfiability thresholds of many further problems, including the k-SAT problem for k ≥ 3 [37]. A line of rigorous work [6, 8, 23] culminated in the verification of this physics prediction for large k [25].

Even though the satisfiability threshold of random 2-SAT was determined already in the 1990s, the problem continued to receive considerable attention. For example, Bollobás, Borgs, Chayes, Kim and Wilson [15] investigated the scaling window around the satisfiability threshold, a point on which a recent contribution by Dovgal, de Panafieu and Ravelomanana elaborates [26]. Abbe and Montanari [2] made the first substantial step towards the study of the number of satisfying assignments by showing that $\frac1n\log Z(\boldsymbol\Phi)$ converges in probability to a deterministic limit $\phi(d)$ for Lebesgue-almost all $d\in(0,2)$. However, their techniques do not reveal the value $\phi(d)$. Moreover, Montanari and Shah [39] obtain a ‘law-of-large-numbers’ estimate of the number of assignments that satisfy all but $o(n)$ clauses for $d<1.16$. Finally, the aforementioned article of Achlioptas et al. [5] verifies the prediction from [38] as to the number of satisfying assignments for all $d<2$.
The main result of the present paper refines these results considerably by establishing a central limit theorem.

For random k-CNFs with k ≥ 3 an upper bound on the number of satisfying assignments can be obtained via the interpolation method from mathematical physics [42]. This bound matches the predictions of the cavity method [36]. However, no matching lower bound is currently known. The precise physics prediction called the ‘replica symmetric solution’ has only been verified for ‘soft’ versions of random k-SAT where unsatisfied clauses are penalised but not strictly forbidden, and for clause-to-variable ratios well below the satisfiability threshold [39, 41, 47].

Random CSPs such as random k-XORSAT or random k-NAESAT that exhibit stronger symmetry properties than random k-SAT tend to be amenable to the method of moments [6].$^5$ Therefore, more is known about their number of solutions. For example, due to the inherent connection to linear algebra, the number of satisfying assignments of random k-XORSAT formulas is known to concentrate on a single value right up to the satisfiability threshold [11, 27, 43]. Furthermore, in random k-NAESAT, random graph colouring and several related problems the logarithm of the number of solutions superconcentrates, i.e., has only bounded fluctuations for constraint densities up to the so-called condensation threshold, a phase transition that shortly precedes the satisfiability threshold [12, 21, 44]. The same is true of random k-SAT instances with regular literal degrees [24]. A further example is the symmetric perceptron [1], where the number of solutions superconcentrates but the limiting distribution is a log-normal with bounded variance. Going beyond the condensation transition, Sly, Sun and Zhang [46] proved that the number of satisfying assignments of random regular k-NAESAT formulas matches the ‘1-step replica symmetry breaking’ prediction from physics.
Apart from the superconcentration results for symmetric problems from [12, 21, 24, 44], the limiting distribution of the logarithm of the number of solutions has not been known in any random constraint satisfaction problem. In particular, Theorem 1.1 is the first central limit theorem for this quantity in any random CSP. We expect that the technique developed in the present work, particularly the use of two correlated random instances in combination with spatial mixing, can be extended to other problems.

The present use of correlated instances is inspired by the work of Chen, Dey and Panchenko [19] on the p-spin model from mathematical physics, a generalisation of the famous Sherrington-Kirkpatrick model. That said, on a technical level the present use of correlated instances is quite different from the approach of [19]. Specifically, while here we construct correlated 2-CNFs that share a specific fraction of their clauses and employ a martingale central limit theorem, Chen, Dey and Panchenko combine a continuous interpolation of two mixed p-spin Hamiltonians with Stein’s method.

A further line of work deals with central limit theorems for random optimisation problems. Cao [16] provided a general framework based on the ‘objective method’ [9]. Unfortunately, the conditions of Cao’s theorem tend to be unwieldy for MAX CSP problems with hard constraints. Recent work of Kreačič [34] and Glasgow, Kwan, Sah and Sawhney [31] on the matching number therefore instead resorts to the use of stochastic differential equations. A promising question for future work is whether the present method of considering correlated instances extends to random optimisation problems.

Organisation. In the rest of the paper we carry out the strategy from Section 2 in detail. After some preliminaries in Section 4, we prove Proposition 2.3 in Section 5. Subsequently Section 6 deals with the proof of Proposition 2.6. The proof of Proposition 2.5 follows in Section 7.
Moreover, Section 8 contains the proof of Proposition 2.8. Further, in Section 9 we prove Proposition 2.7 and Section 11 contains the proofs of Proposition 2.12 and Corollary 2.11. Finally, in Section 12 we complete the proof of Theorem 1.1.

$^5$ Formally, by ‘symmetry’ we mean that the empirical distribution of the marginals of random solutions converges to an atom; cf. [22].

4. PRELIMINARIES AND NOTATION

4.1. Boolean formulas. A 2-SAT formula or 2-CNF $\Phi$ consists of a finite set $V(\Phi)$ of propositional variables and another set $F(\Phi)$ of clauses. Unless specified otherwise, we assume that each clause contains two distinct variables. For a clause $a\in F(\Phi)$ we denote by $\partial a=\partial_\Phi a$ the set of variables that appear in clause $a$. Similarly, for a variable $x\in V(\Phi)$ let $\partial x=\partial_\Phi x$ signify the set of clauses in which $x$ appears. Thus, the formula $\Phi$ induces a bipartite graph on variables and clauses, the so-called incidence graph of $\Phi$. Further, the shortest path metric on the incidence graph induces a metric on the variables and clauses of $\Phi$. Accordingly, for a variable or clause $u$ let $\partial^{\ell}u=\partial^{\ell}_\Phi u$ be the set of all nodes at distance precisely $\ell$ from $u$. Moreover, let $\partial^{\le\ell}u=\partial^{\le\ell}_\Phi u$ be the sub-formula of $\Phi$ obtained by deleting all clauses and variables at a distance greater than $\ell$ from $u$. In other words, $\partial^{\le\ell}u$ is the depth-$\ell$ neighbourhood of $u$.

We encode the Boolean values ‘true’ and ‘false’ as $\pm1$. Accordingly, let $S(\Phi)\subseteq\{\pm1\}^{V(\Phi)}$ be the set of satisfying assignments of $\Phi$ and let $Z(\Phi)=|S(\Phi)|$. Further, $\mathrm{sign}(x,a)=\mathrm{sign}_\Phi(x,a)\in\{\pm1\}$ denotes the sign with which variable $x$ appears in clause $a$, i.e., $\mathrm{sign}(x,a)=1$ if $x$ appears in $a$ positively and $\mathrm{sign}(x,a)=-1$ if $a$ contains the negation $\neg x$. Finally, for a literal $l\in\{x,\neg x\}$ we let $|l|=x$ denote the underlying Boolean variable. Assuming $S(\Phi)\neq\emptyset$ let
\[
\mu_\Phi(\sigma)=\mathbb 1\{\sigma\in S(\Phi)\}/Z(\Phi)\qquad(\sigma\in\{\pm1\}^{V(\Phi)}) \tag{4.1}
\]
be the uniform distribution on $S(\Phi)$.
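The definitions of $S(\Phi)$, $Z(\Phi)$ and $\mu_\Phi$ can be made concrete on tiny formulas by brute-force enumeration. The following is a minimal sketch and not part of the paper's argument: the encoding of a literal as a signed integer (`3` for $x_3$, `-3` for $\neg x_3$) and the function names are assumptions made for illustration, and the enumeration takes time $2^n$, so it is only meant for toy examples.

```python
from itertools import product


def satisfies(assignment, clauses):
    """Check whether an assignment (dict: variable index -> +1/-1) satisfies all clauses.

    A clause is a tuple of signed integers; literal `lit` is true
    when the variable |lit| takes the value sign(lit).
    """
    return all(
        any(assignment[abs(lit)] == (1 if lit > 0 else -1) for lit in clause)
        for clause in clauses
    )


def solutions(n, clauses):
    """Enumerate S(Phi) for a formula on variables 1..n by brute force."""
    result = []
    for values in product((1, -1), repeat=n):
        assignment = dict(zip(range(1, n + 1), values))
        if satisfies(assignment, clauses):
            result.append(assignment)
    return result


def marginal(n, clauses, x):
    """P[sigma_x = +1] under the uniform distribution on S(Phi)."""
    sols = solutions(n, clauses)
    if not sols:
        raise ValueError("formula is unsatisfiable, so mu_Phi is undefined")
    return sum(1 for s in sols if s[x] == 1) / len(sols)


# Toy example: Phi = (x1 or x2) and (not x1 or x2).
toy_clauses = [(1, 2), (-1, 2)]
toy_Z = len(solutions(2, toy_clauses))
```

In the toy example every satisfying assignment sets $x_2$ to true while $x_1$ is free, so `toy_Z` is 2 and the marginals of $x_1$ and $x_2$ are 1/2 and 1.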
We write $\boldsymbol\sigma=\boldsymbol\sigma_\Phi=(\boldsymbol\sigma_{\Phi,x})_{x\in V(\Phi)}\in\{\pm1\}^{V(\Phi)}$ for a sample from $\mu_\Phi$, i.e., a uniformly random satisfying assignment of $\Phi$.

In contrast to k-SAT for k ≥ 3, the 2-SAT problem can be solved in polynomial time. This is because a 2-SAT instance is unsatisfiable if and only if it contains a peculiar sub-formula called a bicycle. To be precise, let $\Phi$ be a CNF with clauses of length one or two. A bicycle of $\Phi$ is an alternating sequence $l_0,a_1,l_1,a_2,\ldots,a_k,l_k$ of literals $l_0,\ldots,l_k$ and clauses $a_1,\ldots,a_k\in F(\Phi)$ such that
BIC1: $l_0=l_k$,
BIC2: $l_i=\neg l_0$ for some $0<i<k$, and
BIC3: $a_i\equiv\neg l_{i-1}\vee l_i\equiv l_{i-1}\to l_i$.
(Observe that a clause $a$ comprising only a single literal $l$ is logically equivalent to $l\vee l\equiv\neg l\to l$.) Hence, the bicycle consists of clauses that are logically equivalent to a chain of implications $l_0\to\neg l_0\to l_0$.

Fact 4.1 ([10]). A CNF $\Phi$ with clauses of lengths one or two is unsatisfiable iff $\Phi$ contains a bicycle.

4.2. Unit Clause Propagation. The PUC algorithm (Algorithm 1) takes as input a CNF $\Phi$ along with an initial set $\mathcal L_0$ of literals. PUC outputs a set $\mathcal L=\mathcal L(\Phi,\mathcal L_0)\supseteq\mathcal L_0$ of literals. Let $\mathcal V(\Phi,\mathcal L_0)=\{|l|:l\in\mathcal L(\Phi,\mathcal L_0)\}$ be the set of underlying variables. In addition to $\mathcal L(\Phi,\mathcal L_0)$, PUC also outputs a partial assignment $\sigma=\sigma_{\Phi,\mathcal L_0}:\mathcal V(\Phi,\mathcal L_0)\to\{0,\pm1\}$ that sets each $x\in\mathcal V(\Phi,\mathcal L_0)$ either to a truth value $\pm1$ or to the dummy value $0$. Let $\mathcal V_0(\Phi,\mathcal L_0)=\{x\in\mathcal V(\Phi,\mathcal L_0):\sigma_{\Phi,\mathcal L_0,x}=0\}$ be the set of variables that receive the dummy value. Finally, the algorithm identifies a set $\mathcal C(\Phi,\mathcal L_0)$ of conflict clauses, i.e., clauses $a$ such that $\partial a\subseteq\mathcal V_0(\Phi,\mathcal L_0)$.

We make a note of a few basic facts about PUC. These remarks apply to any CNF $\Phi$ with clauses of length at most two. To get started, we say that a literal $l'$ is implication-reachable from another literal $l$ if there exists an alternating sequence $l=l_0,a_1,l_1,\ldots,a_k,l_k=l'$ of literals $l_i$ and clauses $a_i$ of $\Phi$ such that $a_i\equiv\neg l_{i-1}\vee l_i\equiv l_{i-1}\to l_i$ for all $1\le i\le k$. We call this sequence an implication chain from $l$ to $l'$.
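Implication-reachability and the bicycle criterion of Fact 4.1 can be illustrated by a breadth-first search on the implication graph. The sketch below is not the paper's Algorithm 1 (it computes neither the partial assignment nor the conflict clauses); it only computes the set of literals reachable via implication chains. The signed-integer encoding of literals and all function names are assumptions made for illustration.

```python
from collections import defaultdict, deque


def implication_graph(clauses):
    """Build the implication graph of a CNF with clauses of length one or two.

    A clause (l or l') yields the implications not-l -> l' and not-l' -> l;
    a unit clause l is treated as (l or l), i.e. not-l -> l.
    """
    graph = defaultdict(set)
    for clause in clauses:
        a, b = (clause[0], clause[0]) if len(clause) == 1 else clause
        graph[-a].add(b)
        graph[-b].add(a)
    return graph


def reachable(clauses, start_literals):
    """Return all literals that are implication-reachable from the start set."""
    graph = implication_graph(clauses)
    seen = set(start_literals)
    queue = deque(seen)
    while queue:
        lit = queue.popleft()
        for nxt in graph.get(lit, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_bicycle(clauses):
    """Bicycle test: some literal l reaches not-l and not-l reaches l."""
    variables = {abs(lit) for clause in clauses for lit in clause}
    return any(
        -x in reachable(clauses, {x}) and x in reachable(clauses, {-x})
        for x in variables
    )
```

For instance, for the clauses $\neg x_1\vee x_2$ and $\neg x_2\vee x_3$ the literals reachable from $x_1$ are $x_1,x_2,x_3$. Recomputing the search for every variable keeps the sketch short; a linear-time check would instead use strongly connected components of the implication graph.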
Observe that a unit clause (clause of length one) comprising a single literal $l$ is equivalent to the implication $\neg l\to l\equiv l\vee l$. Furthermore, if $l'$ is implication-reachable from $l$, then $\neg l$ is implication-reachable from $\neg l'$. Indeed, if $l=l_0,a_1,l_1,\ldots,a_k,l_k=l'$ is an implication chain from $l$ to $l'$, then its contraposition
\[
\neg l'=\neg l_k,a_k,\neg l_{k-1},\ldots,\neg l_1,a_1,\neg l
\]
is an implication chain from $\neg l'$ to $\neg l$.

Lemma 4.2. Let $\Phi$ be a CNF with clauses of length at most two and let $\mathcal L_0$ be a set of literals of $\Phi$. Then $\mathcal L(\Phi,\mathcal L_0)$ is the set of all literals $l'$ that are implication-reachable from a literal $l\in\mathcal L_0$.

Proof. This is an easy induction on the length of the shortest implication chain from $l$ to $l'$. □

An immediate consequence of Lemma 4.2 is that the order in which PUC proceeds is irrelevant. Finally, for the sake of completeness, we carry out the proof of Fact 2.2.

Proof of Fact 2.2. Fix some order $l_1,\ldots,l_\nu$ of the literals $\{x,\neg x:x\in V(\Phi)\}$. Let $\sigma_i$ be the assignments produced by PUC on input $(\Phi,\{l_i\})$. We construct an assignment $\sigma:V(\Phi)\to\{0,\pm1\}$ by proceeding as follows for $i=1,\ldots,\nu$. For each variable x such that {x,¬x}∩L (Φ, li ) \ ⋃ 1≤h 0. We tacitly assume that $0<d<2$, i.e., that we are in the satisfiable regime.

In the following sections we will need estimates of the sizes $|\mathcal V(\boldsymbol\Phi,\mathcal L)|$, $|\mathcal C(\boldsymbol\Phi,\mathcal L)|$ of the sets produced by PUC on the random formula $\boldsymbol\Phi$ for singletons $\mathcal L$. Thus, suppose we start the PUC algorithm from an initial literal $\mathcal L=\{l\}$. Since the ensuing chain of implications traced by PUC is stochastically dominated by a sub-critical branching process (for $d<2$), we obtain the following bound.

Lemma 4.3 ([5, Claim 6.8]). For any literal $l$ and every $t>8/(2-d)$ we have
\[
\mathbb P\left[|\mathcal V(\boldsymbol\Phi,\{l\})|>t\right]\le(2+o(1))\exp(-dt/40).
\]

Corollary 4.4. With probability $1-o(n^{-2})$ we have
\[
\max_{1\le i\le n}|\mathcal V(\boldsymbol\Phi,\{x_i\})|+|\mathcal V(\boldsymbol\Phi,\{\neg x_i\})|\le\log^2n.
\]

Proof. This is an immediate consequence of Lemma 4.3.
□

Finally, the following statement estimates the probability that a random formula is unsatisfiable.

Lemma 4.5. We have $\mathbb P[\boldsymbol\Phi\text{ is unsatisfiable}]\le n^{o(1)-1}$.

Proof. This follows from Fact 4.1 and [5, Claim 6.9]. □

Recall the Galton-Watson tree $\boldsymbol T$ from Section 2.2. The following lemma shows that $\boldsymbol T$ mimics the local structure of the ‘plain’ random formula with $n'$ variables and $m'$ independent random 2-clauses. Also recall that $\partial^{\le2\ell}_\Phi x$ denotes the sub-formula of $\Phi$ comprising all clauses and variables at distance at most $2\ell$ from $x$.

Lemma 4.6 ([5, p. 15]). Let $\ell\ge0$ be an integer and let $T$ be a possible outcome of $\boldsymbol T^{(2\ell)}$. Let $\boldsymbol\Phi_0$ be a random 2-CNF with $n'\sim n$ variables and $m'\sim dn/2$ clauses. Then w.h.p. the number $N^{(2\ell)}(T,\boldsymbol\Phi_0)$ of variables $x_i$ of $\boldsymbol\Phi_0$ such that $\partial^{\le2\ell}_{\boldsymbol\Phi_0}x_i\cong T$ satisfies
\[
N^{(2\ell)}(T,\boldsymbol\Phi_0)=n\,\mathbb P\left[\boldsymbol T^{(2\ell)}\cong T\right]+o(n).
\]

As a final preparation we need an upper bound on the maximum variable degree.

Lemma 4.7. With probability $1-o(n^{-10})$ the degree of any variable node $x_i$, $i=1,\ldots,n$, in $\boldsymbol\Phi$ is bounded by $\log^2n$.

Proof. The number of clauses that contain a given variable $x_i$ has distribution $\mathrm{Bin}(m,2/n)$. Therefore, the assertion follows from the Chernoff bound. □

Corollary 4.8. With probability $1-o(n^{-10})$ the degree of any variable node $x_i$, $i=1,\ldots,n$, in $(\boldsymbol\Phi_1,\boldsymbol\Phi_2)$ is bounded by $\log^2n$.

Proof. Since $\boldsymbol\Phi_1,\boldsymbol\Phi_2$ separately are distributed as $\boldsymbol\Phi$, the assertion follows from Lemma 4.7 and the union bound. □

4.4. Convergence of probability measures. For a measurable subset $\Omega$ of Euclidean space $\mathbb R^k$ we let $\mathcal P(\Omega)$ denote the space of all probability distributions on $\Omega$ equipped with the Borel $\sigma$-algebra. Moreover, for $p\ge1$ we define $\mathcal W_p(\Omega)$ to be the set of all $\mu\in\mathcal P(\Omega)$ such that $\int_\Omega\|x\|_2^p\,\mathrm d\mu(x)<\infty$. We equip $\mathcal W_p(\Omega)$ with the Wasserstein metric
\[
W_p(\mu,\mu')=\inf_{X,X'}\mathbb E\left[\|X-X'\|_2^p\right]^{1/p}\qquad(\mu,\mu'\in\mathcal W_p(\Omega)), \tag{4.2}
\]
where the infimum is taken over all pairs of random variables $X,X'$ that are defined on some common probability space such that $X$ has distribution $\mu$ and $X'$ has distribution $\mu'$.
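For intuition about (4.2), the one-dimensional case admits an explicit formula: for two empirical measures on the real line with the same number of equally weighted atoms, an optimal coupling pairs the sorted samples. The sketch below covers only this special case and is not the two-dimensional setting used in the paper; the function name and its interface are assumptions made for illustration.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p distance between two empirical measures on the real line.

    Both measures put mass 1/len(xs) on each sample; in one dimension the
    monotone coupling that pairs sorted samples attains the infimum for p >= 1.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs_sorted)
    return cost ** (1.0 / p)
```

For example, shifting every atom of a sample by one unit gives distance one for every $p\ge1$, and reordering a sample does not change the measure, so the distance is zero.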
The infimum in (4.2) is attained for all $\mu,\mu'$ [14]; random vectors $X,X'$ that attain it are called optimal couplings. The spaces $(\mathcal W_p(\Omega),W_p)$ are complete metric spaces [14]. Finally, convergence in $(\mathcal W_p(\Omega),W_p)$ implies weak convergence of the corresponding probability measures.

For a measure $\rho\in\mathcal P(\Omega)$ and a measurable function $f:\Omega\to\Omega'$ from $\Omega$ to another probability space $\Omega'$ we denote by $f(\rho)$ the pushforward measure of $\rho$. Thus, $f(\rho)$ is the measure that assigns mass $\rho(f^{-1}(A))$ to measurable $A\subseteq\Omega'$. Throughout the paper we let
\[
t:\mathbb R^2\to(0,1)^2,\quad t\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\frac{1+\tanh(x_1/2)}{2}\\[2pt]\frac{1+\tanh(x_2/2)}{2}\end{pmatrix},\qquad
l:(0,1)^2\to\mathbb R^2,\quad l\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\log\frac{x_1}{1-x_1}\\[2pt]\log\frac{x_2}{1-x_2}\end{pmatrix}.
\]

5. PROOF OF PROPOSITION 2.3

In this section we estimate the difference between the number of satisfying assignments of the pruned random formula $\hat{\boldsymbol\Phi}$ and the original formula $\boldsymbol\Phi$. We begin with a basic observation about the Unit Clause Propagation algorithm, and then estimate the number of clauses that the pruning process removes. Apart from proving Proposition 2.3, the considerations in this section also pave the way for the proof of the variance formula in Section 11.

5.1. Tracing Unit Clause Propagation. For a 2-CNF $\Phi$ and a set of literals $\mathcal L_0$ consider a set of conflict clauses $\mathcal C=\mathcal C(\Phi,\mathcal L_0)$ that PUC produces along with a set $\mathcal V=\mathcal V(\Phi,\mathcal L_0)$ of conflict variables. Let $\Phi-\mathcal C$ be the formula obtained from $\Phi$ by deleting the clauses from $\mathcal C$. Clearly $Z(\Phi)\le Z(\Phi-\mathcal C)$. Conversely, the following lemma puts a bound on how much bigger $Z(\Phi-\mathcal C)$ may be.

Lemma 5.1. Assume that $\Phi$ is a satisfiable 2-CNF. For any set $\mathcal L_0$ of literals we have
\[
Z(\Phi-\mathcal C(\Phi,\mathcal L_0))\le2^{|\mathcal V(\Phi,\mathcal L_0)|\cdot\mathbb 1\{\mathcal C(\Phi,\mathcal L_0)\neq\emptyset\}}\,Z(\Phi). \tag{5.1}
\]

Towards the proof of Lemma 5.1 let $\mathcal L=\mathcal L(\Phi,\mathcal L_0)$ be the final set of literals that PUC produces. Moreover, let $\sigma:\mathcal V\to\{0,\pm1\}$ be the function that PUC outputs and let $\mathcal V_0=\{x\in\mathcal V:\sigma_x=0\}$.
Further, let $\Phi_0$ be a CNF with variable set $\mathcal V$ that contains the following clauses:
(i) any clause $a\in F(\Phi)$ with $\partial a\subseteq\mathcal V$,
(ii) a unit clause $l$ for every literal $l$ with $|l|\in\mathcal V$ such that $\Phi$ contains a clause $a\equiv l\vee l'$ with $|l'|\notin\mathcal V$.
Thus, $\Phi_0$ contains clauses of length one or two.

Claim 5.2. The formula $\Phi_0$ possesses a satisfying assignment $\tau$ such that $\tau_x=\sigma_x$ for all $x\in\mathcal V\setminus\mathcal V_0$.

Proof. Obtain a formula $\Phi_1$ by adding to $\Phi_0$ a unit clause $x$ for every variable $x\in\mathcal V$ with $\sigma_x=1$ and a unit clause $\neg x$ for every $x\in\mathcal V$ with $\sigma_x=-1$. Then we just need to show that $\Phi_1$ is satisfiable.

Assume otherwise. Then by Fact 4.1 $\Phi_1$ contains a bicycle $l_0,a_1,l_1,a_2,\ldots,a_k,l_k$. This bicycle is logically equivalent to an implication chain
\[
l_0\to l_1\to\cdots\to\neg l_0\to\cdots\to l_{k-1}\to l_k=l_0. \tag{5.2}
\]
The contraposition of this chain reads
\[
\neg l_0=\neg l_k\to\neg l_{k-1}\to\cdots\to l_0\to\cdots\to\neg l_1\to\neg l_0. \tag{5.3}
\]
Since $\Phi$ is satisfiable, Fact 4.1 shows that the bicycle (5.2) cannot be contained in $\Phi$. Therefore, the bicycle contains a unit clause $l_i\in F(\Phi_1)\setminus F(\Phi)$ for some $1\le i\le k$. Hence, the constructions of $\Phi_0$ and $\Phi_1$ ensure that $l_i\in\mathcal L(\Phi,\mathcal L_0)$. Indeed, letting $U$ be the set of all literals $l_i$ that appear in (5.2) as unit clauses, we obtain $U\subseteq\mathcal L(\Phi,\mathcal L_0)$.

We claim that in fact $l_0,\ldots,l_k\in\mathcal L(\Phi,\mathcal L_0)$. To see this, pick any $0\le j\le k$ such that $l_j$ does not appear as a unit clause in $\Phi_1$. Define $l_{-i}=l_{k-i}$ for $0\le i<k$ and let $1-k\le i<j$ be the largest index such that $l_i\in U$. Then $\Phi$ contains the implication chain $l_i\to\cdots\to l_j$. Therefore, Lemma 4.2 implies that $l_j\in\mathcal L(\Phi,\mathcal L_0)$. Analogously, considering the contraposition (5.3), we conclude that the negations of the literals $l_0,\ldots,l_k$ belong to $\mathcal L(\Phi,\mathcal L_0)$. In summary,
\[
l_0,\neg l_0,\ldots,l_k,\neg l_k\in\mathcal L(\Phi,\mathcal L_0). \tag{5.4}
\]
But (5.4) implies that $|l_0|,\ldots,|l_k|\in\mathcal V_0$. Consequently, none of these literals belongs to a unit clause $u\in F(\Phi_1)\setminus F(\Phi_0)$. Furthermore, none of the literals $l_i,\neg l_i$ belongs to a unit clause $a\in F(\Phi_0)\setminus F(\Phi)$.
This is because if $\Phi$ contains a clause $l_i\vee l'$ or $\neg l_i\vee l'$ and $l_i,\neg l_i\in\mathcal L(\Phi,\mathcal L_0)$, then PUC added $l'$ to $\mathcal L(\Phi,\mathcal L_0)$ as well. Thus, we conclude that the bicycle (5.2) consists of clauses of $\Phi$ only. But by Fact 4.1 this contradicts the fact that $\Phi$ is satisfiable. □

Proof of Lemma 5.1. Clearly, if $\mathcal C(\Phi,\mathcal L_0)=\emptyset$, the statement is true. Hence, assume that $\mathcal C(\Phi,\mathcal L_0)\neq\emptyset$ and let $\tau$ be a satisfying assignment of $\Phi_0$ from Claim 5.2. Consider a satisfying assignment $\chi$ of $\Phi-\mathcal C$ and let $\chi':V(\Phi)\setminus\mathcal V\to\{\pm1\}$ be the restriction of $\chi$ to $V(\Phi)\setminus\mathcal V$. We extend $\chi'$ to a satisfying assignment $\chi''$ of $\Phi$ by letting
\[
\chi''_x=\mathbb 1\{x\in\mathcal V\}\,\tau_x+\mathbb 1\{x\notin\mathcal V\}\,\chi'_x.
\]
Clearly, $\chi''$ satisfies all clauses $a$ such that $\partial a\cap\mathcal V=\emptyset$, because all these clauses are contained in $\Phi-\mathcal C$. Moreover, $\chi''$ satisfies all $a$ such that $\partial a\subseteq\mathcal V$, because these clauses belong to $\Phi_0$. Further, if $a=l\vee l'$ is a clause such that $|l|\in\mathcal V$ but $|l'|\notin\mathcal V$, then $\tau_{|l|}=\sigma_{|l|}$. Since $|l'|\notin\mathcal V$, this means that $\sigma_{|l|}=\mathrm{sign}(|l|,a)$, as otherwise PUC would have added $l'$ to $\mathcal L$. Therefore, $\chi''_{|l|}=\tau_{|l|}=\sigma_{|l|}$ satisfies $a$. Since the map $\chi\mapsto\chi'$ only discards the values of the variables in $\mathcal V$, we obtain the bound (5.1). □

5.2. Cycles in random formulas. To prove Proposition 2.3 we need a good estimate of the total number of clauses that will be removed from $\boldsymbol\Phi$ to obtain $\hat{\boldsymbol\Phi}$. This estimate is provided by the following lemma.

Lemma 5.3. Fix any $\delta>0$. With probability $1-o(n^{-1})$ the number of literals $l$ such that $\mathcal C(\boldsymbol\Phi,\{l\})\neq\emptyset$ is smaller than $n^{\delta}$.

Proof. Let $N$ be the number of literals $l$ such that $\mathcal C(\boldsymbol\Phi,\{l\})\neq\emptyset$. We are going to show that for any fixed (i.e., n-independent) $\ell\ge1$ and large enough $n$ we have
\[
\mathbb E\left[\prod_{i=1}^{\ell}(N-i+1)\right]\le\left(\ell\log^3n\right)^{\ell}. \tag{5.5}
\]
Provided that $\ell\ge2/\delta$ and $n$ is sufficiently large, Markov’s inequality then shows that
\[
\mathbb P\left[N\ge n^{\delta}\right]\le\mathbb P\left[\prod_{i=1}^{\ell}(N-i+1)\ge(n^{\delta}/2)^{\ell}\right]\le\left(\frac{2\ell\log^3n}{n^{\delta}}\right)^{\ell}=o(n^{-1}),
\]
which implies the assertion.

Thus, we are left to prove (5.5).
By symmetry it suffices to bound the probability of the event
\[
\mathcal E=\bigcap_{i=1}^{\ell}\{\mathcal C(\boldsymbol\Phi,\{x_i\})\neq\emptyset\}
\]
that PUC will produce at least one conflict clause from each of the literals $x_1,\ldots,x_\ell$; then
\[
\mathbb E\left[\prod_{i=1}^{\ell}(N-i+1)\right]\le(2n)^{\ell}\,\mathbb P[\mathcal E]. \tag{5.6}
\]
In order to estimate the probability of $\mathcal E$ we are going to launch PUC from the initial set $\mathcal L=\{x_1,\ldots,x_\ell\}$. While the order in which the literals and clauses are processed does not affect the ultimate outcome of PUC, for the present analysis we assume that PUC processes the literals one at a time, each time pursuing all the clauses $l\vee\neg l'$ that contain the negation of a specific $l'$. We also presume that the literals are processed in the same order as they get inserted into the set $\mathcal L$. In other words, PUC proceeds in breadth-first-search order. Let $\mathcal H_t$ be the history of the execution of PUC up to and including the point where the first $t$ literals and their adjacent clauses have been explored. Formally, $\mathcal H_t$ is the $\sigma$-algebra generated by these first $t$ literals that get added to $\mathcal L$ and their adjacent clauses.

Lemma 4.3 implies that with probability $1-o(n^{-1})$ the set $\mathcal L$ returned by PUC has size at most $L=\ell\log^2n$. Let $\mathcal E_t$ be the event that at time $t$ we explored a clause that contains two variables from $\mathcal L$ and $|\mathcal L|\le L$. Moreover, let $S=\sum_t\mathbb 1\{\mathcal E_t\}$. Let $0<t_1<\ldots<t_\ell\le L$ be distinct time steps. Then
\[
\mathbb P[\mathcal E]\le\mathbb P[S\ge\ell]\le\sum_{t_1,\ldots,t_\ell}\mathbb P\left[\bigcap_{i=1}^{\ell}\mathcal E_{t_i}\right]=\sum_{t_1,\ldots,t_\ell}\prod_{i=1}^{\ell}\mathbb P\left[\mathcal E_{t_i}\,\Big|\,\bigcap_{j=1}^{i-1}\mathcal E_{t_j}\right]. \tag{5.7}
\]
To bound the r.h.s. of (5.7) we will estimate the probability of $\mathcal E_{t+1}$ given the history $\mathcal H_t$ of the process up to time $t$, showing that for all $t\ge0$,
\[
\mathbb P[\mathcal E_{t+1}\mid\mathcal H_t]\le\frac Ln. \tag{5.8}
\]
In fact, the probability that in step $t+1$ we will run into an already discovered variable is bounded by the probability that the literal explored during that step shares a clause with an already explored variable, which is bounded by $Lm/n\le L/n$; for if more than $L$ literals have already been explored, the event $\mathcal E_{t+1}$ does not occur by definition.
Finally, because the event $\mathcal E_t$ is $\mathcal H_t$-measurable, (5.8) implies
\[
\prod_{i=1}^{\ell}\mathbb P\left[\mathcal E_{t_i}\,\Big|\,\bigcap_{j=1}^{i-1}\mathcal E_{t_j}\right]\le\left(\frac Ln\right)^{\ell}.
\]
Thus (5.7) gives
\[
\mathbb P[\mathcal E]\le\binom L\ell\frac{L^{\ell}}{n^{\ell}}\le\left(\frac{\mathrm eL^2}{\ell n}\right)^{\ell},
\]
which together with (5.6) implies (5.5). □

Proof of Proposition 2.3. Lemma 4.5 shows that $\mathbb P[Z(\boldsymbol\Phi)=0]=o(n^{-1/2})$. Thus, we may condition on the event that $\boldsymbol\Phi$ is satisfiable. Furthermore, Lemma 5.1 shows that given that $\boldsymbol\Phi$ is satisfiable we have
\[
\log Z(\hat{\boldsymbol\Phi})-\log Z(\boldsymbol\Phi)\le\sum_{i=1}^{n}\mathbb 1\{\mathcal C(\boldsymbol\Phi,\{x_i\})\neq\emptyset\}\,|\mathcal V(\boldsymbol\Phi,\{x_i\})|+\mathbb 1\{\mathcal C(\boldsymbol\Phi,\{\neg x_i\})\neq\emptyset\}\,|\mathcal V(\boldsymbol\Phi,\{\neg x_i\})|. \tag{5.9}
\]
Finally, Corollary 4.4 and Lemma 5.3 (applied with $\delta<1/3$) imply that with probability $1-o(n^{-1/2})$,
\[
\begin{aligned}
&\sum_{i=1}^{n}\mathbb 1\{\mathcal C(\boldsymbol\Phi,\{x_i\})\neq\emptyset\}\,|\mathcal V(\boldsymbol\Phi,\{x_i\})|+\mathbb 1\{\mathcal C(\boldsymbol\Phi,\{\neg x_i\})\neq\emptyset\}\,|\mathcal V(\boldsymbol\Phi,\{\neg x_i\})|\\
&\qquad\le\bigl(|\{x\in V_n:\mathcal C(\boldsymbol\Phi,\{x\})\neq\emptyset\}|+|\{x\in V_n:\mathcal C(\boldsymbol\Phi,\{\neg x\})\neq\emptyset\}|\bigr)\max_{1\le i\le n}|\mathcal V(\boldsymbol\Phi,\{x_i\})|+|\mathcal V(\boldsymbol\Phi,\{\neg x_i\})|=o(n^{1/3}).
\end{aligned} \tag{5.10}
\]
Thus, the assertion follows from (5.9)–(5.10). □

6. PROOF OF PROPOSITION 2.6

The proof of Proposition 2.6 is based on a combination of a coupling and a second moment argument. As a first step we observe that we do not need to worry about trees of very high maximum degree.

Lemma 6.1. For any $\varepsilon>0$, $\ell\ge0$ there exists $L>0$ such that for all $t\in[0,1]$ with probability at least $1-\varepsilon$ the tree $\boldsymbol T^{\otimes,(2\ell)}$ has maximum degree less than $L$.

Proof. The construction of the tree $\boldsymbol T^{\otimes,(2\ell)}$ in Section 2.6 ensures that every variable node has a Poisson number of clauses as offspring. The mean of this Poisson variable is always bounded by $2d$. Hence, Bennett’s inequality shows that for any $L>2d$ the probability that a specific variable has more than $L$ offspring is bounded by $\exp(-L^2/(4d+L))$. Thus, choosing $L$ sufficiently large so that $\varepsilon>L^{2\ell}\exp(-L^2/(4d+L))$ and applying the union bound, we obtain the assertion (combined with the chain rule starting from the root). □

Thus, in the following we confine ourselves to trees $T$ with a maximum degree bounded by a large enough number $L$.
First we are going to count the number of copies of such trees T in (Φ1(M , M ′),Φ2(M , M ′)) via the method of moments. The following lemma estimates the first moment. Lemma 6.2. For any fixed integers L,ℓ, any possible outcome T of T ⊗, (2ℓ) of maximum degree at most L and any M ∼ tdn/2, M ′ ∼ (1− t )dn/2 we have E[N (2ℓ)(T, (Φ1(M , M ′),Φ2(M , M ′)))] ∼ nP [ T ⊗, (2ℓ) ∼= T ] . 14 Proof. We proceed by induction on ℓ. In the case ℓ= 0 the tree T consists of nothing but the root, so that there is nothing to show. Hence, let ℓ≥ 1. Let λ0,s1,s2 be the number of shared children of the root o of T where o appears with sign s1 ∈ {−1,+1} and the other variable appears with sign s2 ∈ {−1,+1}. Also let λh,s1,s2 be the number of h-distinct children of o (h = 1,2), where o appears with sign s1 ∈ {−1,+1} and the other variable appears with sign s2 ∈ {−1,+1}. Consider the event E that variable x1 is a 2ℓ-instance of T . Further, consider the event R that x1 occurs in preciselyλ0,s1,s2 clauses among a1, . . . , aM , where the sign of x1 is s1 and the sign of the other variable is s2, precisely inλ1,s1,s2 clauses among a ′ 1, . . . , a ′ M ′ , where the sign of x1 is s1 and the sign of the other variable is s2 and precisely in λ2,s1,s2 clauses among a ′′ 1 , . . . , a ′′ M ′ , where the sign of x1 is s1 and the sign of the other variable is s2. Since M ∼ d tn/2 and λh,±1,±1 ≤ L for h ∈ {0,1,2} we have P [R] ∼ ∏ s1,s2∈{±1} P [ Bin(M , (2n)−1) =λ0,s1,s2 ] P [ Bin(m −M , (2n)−1) =λ1,s1,s2 ] P [ Bin(m −M , (2n)−1) =λ2,s1,s2 ] ∼ ∏ s1,s2∈{±1} P [ Po(d t/4) =λ0,s1,s2 ] P [ Po(d(1− t )/4) =λ1,s1,s2 ] P [ Po(d(1− t )/4) =λ2,s1,s2 ] . (6.1) Letλh =λh,−1,−1+λh,−1,+1+λh,+1,−1+λh,+1,+1 for h ∈ {0,1,2}. GivenR let (v 0,i )1≤i≤λ0 be the second variables (other than x1) contained in neighbours of x1 among a1, . . . , aM . Analogously, let (v 1,i )1≤i≤λh be the second variables contained in neighbours of x1 among a ′ 1, . . . 
, a ′ M and (v 2,i )1≤i≤λh be the second variables contained in neighbours of x1 among a ′′ 1 , . . . , a ′′ M . By Φ− h , h = 1,2 define a random formula obtained from Φh(M , M ′) by deleting x1 and its adjacent clauses. Let F be the event that the distance between any two of v 0,1, . . . , v 2,λ2 in both Φ− 1 and Φ− 2 is at least 2ℓ. A routine union bound argument shows that P [F] = 1−o(1). (6.2) Further, let T0,i be the sub-tree obtained from T comprising the i -th shared grandchild of o and its descendants. Consider the event H0 that R and F occur and v 0,i is a (2ℓ− 2)-instance of T0,i in (Φ− 1 ,Φ− 2 ) for any i = 1, . . . ,λ0. Since the depth and the maximum degree of T are bounded, by induction we obtain P [ v 0,i is a (2ℓ−2)-instance of T0,i in (Φ− 1 ,Φ− 2 ) ]=P [ T ⊗, (2ℓ−2) ∼= T0,i ] +o(1). for i = 1, . . . ,λ0. Thus P [H0|F∩R] =P [ λ0⋂ i=1 v 0,i is a (2ℓ−2)-instance of T0,i in (Φ− 1 ,Φ− 2 ) ] = λ0∏ i=0 P [ T ⊗, (2ℓ−2) ∼= T0,i ] +o(1). (6.3) Analogously, let Th,i be the sub-tree of T pending on the i -th h-distinct grandchild of the root. Consider the events Hh that F and R occur and that the depth (2ℓ−2)-neighbourhood of v h,i is isomorphic to Th,i in Φ− h for any i = 1, . . . ,λh , h = 1,2. Since v 1,i and v 2, j are chosen independently for all i and j , using the same embedding process as above in combination with Lemma 4.6 we obtain P [H1|F∩R∩H0] = λ1∏ i=1 P [ T (2ℓ−2) ∼= T1,i ] +o(1) (6.4) P [H2|F∩R∩H0 ∩H1] = λ2∏ i=1 P [ T (2ℓ−2) ∼= T2,i ] +o(1). (6.5) Finally, combining (6.1)–(6.5) we obtain P [E] ∼P [R] λ0∏ i=1 P [ T ⊗, (2ℓ−2) ∼= T0,i ] λ1∏ i=1 P [ T (2ℓ−2) ∼= T1,i ] λ2∏ i=1 P [ T (2ℓ−2) ∼= T2,i ] ∼P [ T ⊗, (2ℓ−2) ∼= T ] . As E[N (2ℓ)(T, (Φ1(M , M ′),Φ2(M , M ′)))] = nP [E] the assertion follows from the linearity of expectation. □ We also need an estimate of the second moment of N (2ℓ)(T, (Φ1(M , M ′),Φ2(M , M ′))). Lemma 6.3. 
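For any fixed integers $L,\ell$ and any possible outcome $T$ of $\boldsymbol T^{\otimes,(2\ell)}$ of maximum degree at most $L$ and any $M\sim tdn/2$, $M'\sim(1-t)dn/2$ we have
\[
\mathbb E\left[N^{(2\ell)}(T,(\boldsymbol\Phi_1(M,M'),\boldsymbol\Phi_2(M,M')))^2\right]\sim n^2\,\mathbb P\left[\boldsymbol T^{\otimes,(2\ell)}\cong T\right]^2.
\]

Proof. Consider the event $\mathcal E_{ij}$ that both variables $x_i$ and $x_j$ are $2\ell$-instances of $T$ for $i,j=1,\ldots,n$. Now we can rewrite the second moment as follows:
\[
\mathbb E\left[N^{(2\ell)}(T,(\boldsymbol\Phi_1(M,M'),\boldsymbol\Phi_2(M,M')))^2\right]=\sum_{i,j=1}^{n}\mathbb P[\mathcal E_{ij}]=n\,\mathbb P[\mathcal E_{11}]+n(n-1)\,\mathbb P[\mathcal E_{12}]. \tag{6.6}
\]
From Lemma 6.2 we know that $\mathbb P[\mathcal E_{11}]=\mathbb P[\boldsymbol T^{\otimes,(2\ell)}\cong T]+o(1)$, so we only need to estimate $\mathbb P[\mathcal E_{12}]$. Let $\mathcal F$ be the event that the distance between $x_1$ and $x_2$ is at least $2\ell$. A routine union bound argument shows that $\mathbb P[\mathcal F]=1-o(1)$. Consequently,
\[
\begin{aligned}
\mathbb P[\mathcal E_{12}]&=\mathbb P[\mathcal E_{12}\mid\mathcal F]+o(1)\\
&=\mathbb P[x_1\text{ is a }2\ell\text{-instance of }T\mid\mathcal F]\cdot\mathbb P[x_2\text{ is a }2\ell\text{-instance of }T\mid\mathcal F]+o(1)\\
&=\mathbb P\left[\boldsymbol T^{\otimes,(2\ell)}\cong T\right]^2+o(1).
\end{aligned} \tag{6.7}
\]
Thus the assertion follows from (6.6) and (6.7). □

Proof of Proposition 2.6. From Lemmas 6.1–6.3 in combination with Chebyshev’s inequality it follows that for any $\ell\ge0$ and any $T$, w.h.p.
\[
N^{(2\ell)}(T,(\boldsymbol\Phi_1(M,M'),\boldsymbol\Phi_2(M,M')))\sim n\,\mathbb P\left[\boldsymbol T^{\otimes,(2\ell)}\cong T\right]. \tag{6.8}
\]
We need to extend this to the pruned formulas $(\hat{\boldsymbol\Phi}_1(M,M'),\hat{\boldsymbol\Phi}_2(M,M'))$. Let $N^{(2\ell),+}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2))$ be the number of variable nodes $x$ such that $x$ is a $2\ell$-instance of $T$ in $(\hat{\boldsymbol\Phi}_1,\hat{\boldsymbol\Phi}_2)$ but not in $(\boldsymbol\Phi_1,\boldsymbol\Phi_2)$. Similarly, let $N^{(2\ell),-}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2))$ be the number of variable nodes $x$ that are $2\ell$-instances of $T$ in $(\boldsymbol\Phi_1,\boldsymbol\Phi_2)$ but not in $(\hat{\boldsymbol\Phi}_1,\hat{\boldsymbol\Phi}_2)$. Then
\[
N^{(2\ell)}(T,(\hat{\boldsymbol\Phi}_1,\hat{\boldsymbol\Phi}_2))=N^{(2\ell)}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2))+N^{(2\ell),+}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2))-N^{(2\ell),-}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2)). \tag{6.9}
\]
Note that neither $N^{(2\ell),+}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2))$ nor $N^{(2\ell),-}(T,(\boldsymbol\Phi_1,\boldsymbol\Phi_2))$ exceeds the number of variable nodes $x$ whose depth-$2\ell$ neighbourhood in $(\boldsymbol\Phi_1,\boldsymbol\Phi_2)$ contains at least one clause from $\bigcup_{l\in\{x_i,\neg x_i:\,1\le i\le n\}}\mathcal C(\boldsymbol\Phi,\{l\})$. Moreover, Corollary 4.4 and Lemma 5.3 show that w.h.p.
\[
\left|\bigcup_{l\in\{x_i,\neg x_i:\,1\le i\le n\}}\mathcal C(\boldsymbol\Phi,\{l\})\right|\le n^{0.1}. \tag{6.10}
\]
It follows from Corollary 4.8 that w.h.p.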
the 2ℓ-depth neighbourhood of each vertex consists of no more then log4ℓ+4 n vertices. Combining this fact with (6.10) we conclude that N (2ℓ),+(T, (Φ1,Φ2)) ≤ n0.1 log4ℓ+4 n, N (2ℓ),−(T, (Φ1,Φ2)) ≤ n0.1 log4ℓ+4 n (6.11) w.h.p. Finally, the assertion follows from (6.8), (6.9) and (6.11). □ 7. PROOF OF PROPOSITION 2.5 We will deal with Z (Φ̂h (M ,m−M)) Z (Φ̂h (M−1,m−M)) in detail; the arguments for the other two quotients are similar. Lemma 7.1. Let h ∈ {1,2}. W.h.p. Φ̂h(M ,m −M) is obtained from Φ̂h(M −1,m −M) by adding a clause aM . Proof. Let l , l ′ be the constituent literals of aM , i.e., aM = l ∨ l ′. Moreover, let Q be the event that Φ̂h(M ,m −M) does not result from Φ̂h(M−1,m−M) by adding clause aM . Thus, on the event Q the additional clause aM triggers the pruning of clauses that do not get pruned from Φ̂(M −1,m −M) (including potentially aM itself). We are going to construct events E,E′ whose probabilities are easy to estimate such that Q⊆E∪E′. (7.1) To this end, for a literal l let Ll =L (Φh(M −1,m−M), {l }) be the final set of literals that PUC(Φh(M −1,m−M), {l }) produces. Call l a trigger of ¬l if ¬l ∈Ll . Further, let E be the event that there exists a trigger l of ¬l such that E1: C (Φ(M −1,m −M), {l })∪C (Φ(M −1,m −M), {l , l ′}) ̸= ;, or E2: ¬l ′ ∈⋃ λ∈{l ,l ′} Lλ. 16 Define E′ analogously with the roles of l , l ′ swapped. We claim that these events E,E′ satisfy (7.1). To see this, assume that neither E nor E′ occurs. We claim that then C (Φh(M −1,m −M), {l }) =C (Φh(M ,m −M), {l }) (7.2) for all literals l ; if so, then clearly Q does not occur either. Thus, assume that (7.2) is false and that l is a literal such that C (Φh(M −1,m −M), {l }) ̸=C (Φh(M ,m −M), {l }). (7.3) Then l must be a trigger of ¬l or of ¬l ′; for otherwise the presence of the extra clause aM has no impact on the set of conflict clauses. Hence, suppose that l is a trigger of ¬l . Then the presence of clause aM in Φh(M ,m −M) causes PUC to add l ′ to L (Φh(M ,m −M), {l }). 
Since the event E does not occur, neither does E1 and we conclude that C (Φh(M−1,m−M), {l }) =C (Φh(M−1,m−M), {l , l ′}) =;. Hence, none of the clauses a ∈ F (Φh(M−1,m−M)) is a conflict clause and thus (7.3) implies that {aM } =C (Φh(M ,m −M), {l }) \C (Φh(M −1,m −M), {l }). But this is not possible either. For if aM ∈C (Φh(M ,m−M), {l }), then Lemma 4.2 shows that one of l , l , l ′ is a trigger of ¬l ′, and thus E2 occurs. Thus, we obtain (7.2). To complete the proof we are going to show that P [E] ,P [ E′]= o(1). (7.4) Indeed, Lemma 5.3 shows that the number of literals l such that C (Φh(M −1,m −M), {l }) ̸= ; can be bounded by n0.1 w.h.p. Furthermore, Corollary 4.4 shows that |V (Φh(M −1,m −M), {l })| ≤ log2 n w.h.p. for all l . Hence, w.h.p. the total number of literalsλ that have a trigger l such that C (Φh(M−1,m−M), {l }) ̸= ; is bounded by O(n0.1 log2 n). Consequently, the probability that the random literal ¬l possesses such a trigger is bounded by O(n−0.9 log2 n). Moreover, since l ′ is a random literal as well, Lemma 5.3 shows that P [ C (Φ(M −1,m −M), {l ′}) =;]= 1−O(n−0.9). Additionally, w.h.p. for any trigger l of ¬l we have V (Φ(M −1,m −M), {l })∩V (Φ(M −1,m −M), {l ′}) =;, because l , l ′ are drawn independenly ofΦ(M −1,m −M). Similarly, P [¬l ′ ∈L (Φ(M −1,m −M), {l }) ]= o(1) and P [¬l ′ ∈L (Φ(M −1,m −M), {l ′}) ]= o(1). Combining these estimates, we conclude that P [E] = o(1). By symmetry, the same estimate holds for E′. Thus, we obtain (7.4). Finally, the assertion follows from (7.1) and (7.4). □ Corollary 7.2. Let h ∈ {1,2}. W.h.p. we have Z (Φ̂h(M ,m −M)) Z (Φ̂h(M −1,m −M)) =µΦ̂h (M−1,m−M) ({σ |= aM }) . Proof. From Lemma 7.1 we know that w.h.p. Z (Φ̂h(M ,m −M)) = Z (Φ̂(M −1,m −M)+aM ). (7.5) Assuming that (7.5) is correct, Z (Φ̂h(M ,m−M)) equals the number of satisfying assignments of Φ̂h(M −1,m−M) that also happen to satisfy aM . 
□

Additionally, we need the following asymptotic independence property, known as 'replica symmetry' in physics parlance.

Lemma 7.3. Let $h\in\{1,2\}$. For all $s,s'\in\{\pm1\}$ we have
\[ \frac{1}{n^2}\sum_{i,j=1}^{n}\mathbb{E}\left|\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_i}=s,\sigma_{x_j}=s'\})-\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_i}=s\})\,\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_j}=s'\})\right|=o(1). \]

Proof. We adapt an argument from [40] to the present setting. By exchangeability it suffices to prove that
\[ \mathbb{E}\left|\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1}=s,\sigma_{x_2}=s'\})-\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1}=s\})\,\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_2}=s'\})\right|=o(1). \]
The proof rests on the Gibbs uniqueness property. Indeed, Proposition 2.6 shows that for any fixed $\ell$ the depth-$2\ell$ neighbourhood $\partial^{\le 2\ell}x_i$ of $x_i$ in $\hat\Phi_h(M-1,m-M)$ is within total variation distance $o(1)$ of the Galton-Watson tree $T_h^{(2\ell)}$. Furthermore, the distribution of $T_h^{(2\ell)}$ by itself is identical to the distribution of the Galton-Watson tree $T^{(2\ell)}$. Additionally, Proposition 2.1 shows that $T^{(2\ell)}$ enjoys the Gibbs uniqueness property (2.6). Consequently, taking $\ell=\ell(n)\to\infty$ sufficiently slowly as $n\to\infty$, we see that w.h.p.
\[ \sum_{s\in\{\pm1\}}\max_{\kappa\in S(\hat\Phi_h(M-1,m-M))}\left|\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1}=s\mid\sigma_{\partial^{2\ell}x_1}=\kappa_{\partial^{2\ell}x_1}\})-\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1}=s\})\right|=o(1). \tag{7.6} \]
Furthermore, providing $\ell=\ell(n)\to\infty$ slowly enough, the distance between $x_1,x_2$ exceeds $4\ell$ w.h.p. In this case, (7.6) gives
\begin{align*}
\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_1}=s,\sigma_{x_2}=s'\})
&=\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_1}=s\mid\sigma_{x_2}=s'\})\cdot\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_2}=s'\})\\
&=\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_2}=s'\})\cdot\sum_{\kappa\in\{\pm1\}^{\partial^{2\ell}x_1}}\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_1}=s\mid\sigma_{\partial^{2\ell}x_1}=\kappa,\sigma_{x_2}=s'\})\cdot\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{\partial^{2\ell}x_1}=\kappa\mid\sigma_{x_2}=s'\})\\
&=\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_2}=s'\})\cdot\sum_{\kappa\in\{\pm1\}^{\partial^{2\ell}x_1}}\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{x_1}=s\mid\sigma_{\partial^{2\ell}x_1}=\kappa\})\cdot\mu_{\hat\Phi_h(M,m-M)}(\{\sigma_{\partial^{2\ell}x_1}=\kappa\mid\sigma_{x_2}=s'\})\\
&=\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1}=s\})\,\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_2}=s'\})\cdot(1+o(1)), \tag{7.7}
\end{align*}
as claimed. □

Proof of Proposition 2.5. The proposition follows from Corollary 7.2 and Lemma 7.3. □

8. PROOF OF PROPOSITION 2.8

In this section, we prove Proposition 2.8 via a contraction argument. For this, recall the operator $\mathrm{logBP}^{\otimes}_{d,t}$ from (1.3).
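For notational convenience we let
\[
V=\begin{pmatrix}
\sum_{i=1}^{\boldsymbol{d}} s_i\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\right)+\sum_{i=1}^{\boldsymbol{d}'} s'_i\log\left(\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{2}\right)\\[4pt]
\sum_{i=1}^{\boldsymbol{d}} s_i\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{2}\right)+\sum_{i=1}^{\boldsymbol{d}''} s''_i\log\left(\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{2}\right)
\end{pmatrix}.
\]
The main step towards Proposition 2.8 is the following lemma:

Lemma 8.1. $\mathrm{logBP}^{\otimes}_{d,t}$ is a contraction on the space $(\mathcal{W}_2(\mathbb{R}^2),W_2)$ for all $0<d<2$ and $0\le t\le 1$.

Indeed, it immediately follows from Lemma 8.1 and Banach's fixed point theorem that for every $d\in(0,2)$ and $t\in[0,1]$, there is a unique $\rho_{d,t}\in\mathcal{W}_2(\mathbb{R}^2)$ with $\rho_{d,t}=\mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t})$, and that for any $\rho\in\mathcal{W}_2(\mathbb{R}^2)$ and $\ell\to\infty$, the $\ell$-fold application of $\mathrm{logBP}^{\otimes}_{d,t}$ to $\rho$ converges to $\rho_{d,t}$ in Wasserstein distance. We prove Lemma 8.1 in the following subsection, and conclude the section with the proof of Proposition 2.8.

8.1. Proof of Lemma 8.1.

We first check that the operator $\mathrm{logBP}^{\otimes}_{d,t}$ is well-defined in the sense that it maps the space $(\mathcal{W}_2(\mathbb{R}^2),W_2)$ to itself.

Claim 8.2. The operator $\mathrm{logBP}^{\otimes}_{d,t}$ maps the space $(\mathcal{W}_2(\mathbb{R}^2),W_2)$ to itself.

Proof. Let $\rho\in(\mathcal{W}_2(\mathbb{R}^2),W_2)$ and $V$ be a random vector with distribution $\mathrm{logBP}^{\otimes}_{d,t}(\rho)$. By the definition of $\mathrm{logBP}^{\otimes}_{d,t}$,
\begin{align*}
\mathbb{E}\left[\|V\|_2^2\right]=\mathbb{E}\Bigg[&\left(\sum_{i=1}^{\boldsymbol{d}} s_i\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\right)+\sum_{i=1}^{\boldsymbol{d}'} s'_i\log\left(\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{2}\right)\right)^2\\
&+\left(\sum_{i=1}^{\boldsymbol{d}} s_i\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{2}\right)+\sum_{i=1}^{\boldsymbol{d}''} s''_i\log\left(\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{2}\right)\right)^2\Bigg]. \tag{8.1}
\end{align*}
By the independence of the random variables $(s_i)_{i\ge1}$, $(s'_i)_{i\ge1}$ and $(s''_i)_{i\ge1}$ from everything else, all cross-terms in the evaluation of the squares in (8.1) vanish, e.g. for $i\neq j$,
\begin{align*}
&\mathbb{E}\left[s_is_j\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\right)\log\left(\frac{1+r_j\tanh(\xi_{\rho,j,1}/2)}{2}\right)\right]\\
&\qquad=\mathbb{E}[s_i]\,\mathbb{E}\left[s_j\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\right)\log\left(\frac{1+r_j\tanh(\xi_{\rho,j,1}/2)}{2}\right)\right]=0.
\end{align*}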
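As a consequence, (8.1) in combination with the independence of the Poisson random variables gives that
\begin{align*}
\mathbb{E}\left[\|V\|_2^2\right]=\mathbb{E}\Bigg[&td\log^2\left(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{2}\right)+(1-t)d\log^2\left(\frac{1+r'_1\tanh(\xi'_{\rho,1,1}/2)}{2}\right)\\
&+td\log^2\left(\frac{1+r_1\tanh(\xi_{\rho,1,2}/2)}{2}\right)+(1-t)d\log^2\left(\frac{1+r''_1\tanh(\xi''_{\rho,1,2}/2)}{2}\right)\Bigg]. \tag{8.2}
\end{align*}
Finally, conditioning on the value of $r_1$ and an application of the fundamental theorem of calculus, followed by the Cauchy-Schwarz inequality, give
\begin{align*}
\mathbb{E}\left[\log^2\left(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{2}\right)\right]
&=\frac12\,\mathbb{E}\left[\log^2\left(\frac{1+\tanh(\xi_{\rho,1,1}/2)}{2}\right)+\log^2\left(\frac{1-\tanh(\xi_{\rho,1,1}/2)}{2}\right)\right]\\
&=\frac12\,\mathbb{E}\left[\left(\int_0^{\xi_{\rho,1,1}}\frac{1-\tanh(x/2)}{2}\,dx-\log 2\right)^2+\left(\int_0^{\xi_{\rho,1,1}}\frac{1+\tanh(x/2)}{2}\,dx+\log 2\right)^2\right]\\
&\le 2\,\mathbb{E}\left[\log^2 2+\xi_{\rho,1,1}^2\right].
\end{align*}
Analogous bounds can be derived for the remaining three terms in (8.2). Therefore, for any vector $\xi\in\mathbb{R}^2$ with distribution $\rho$,
\[ \mathbb{E}\left[\|V\|_2^2\right]\le\mathbb{E}\left[2td\left(\xi_{\rho,1,1}^2+\xi_{\rho,1,2}^2\right)+2(1-t)d\left(\xi_{\rho,1,1}'^{\,2}+\xi_{\rho,1,2}''^{\,2}\right)\right]+4d\log^2 2=2d\,\mathbb{E}\left[\|\xi\|_2^2\right]+4d\log^2 2<\infty. \]
□

Proof of Lemma 8.1. Let $\rho,\nu\in\mathcal{W}_2(\mathbb{R}^2)$ be arbitrary. To show contraction, consider three independent sequences of optimally coupled pairs $(\xi_{\rho,i},\xi_{\nu,i})_{i\ge1}$, $(\xi'_{\rho,i},\xi'_{\nu,i})_{i\ge1}$ and $(\xi''_{\rho,i},\xi''_{\nu,i})_{i\ge1}$ such that for each $\zeta=\xi,\xi',\xi''$, the $\zeta_{\rho,i}=(\zeta_{\rho,i,1},\zeta_{\rho,i,2})\in\mathbb{R}^2$ have distribution $\rho$, the $\zeta_{\nu,i}=(\zeta_{\nu,i,1},\zeta_{\nu,i,2})\in\mathbb{R}^2$ have distribution $\nu$ and
\[ W_2(\rho,\nu)=\mathbb{E}\left[\|\zeta_{\rho,i}-\zeta_{\nu,i}\|_2^2\right]^{1/2}. \tag{8.3} \]
Let $\boldsymbol{d}\sim\mathrm{Po}(td)$ and $\boldsymbol{d}',\boldsymbol{d}''\sim\mathrm{Po}((1-t)d)$ all be independent. Moreover, let $(s_i)_{i\ge1}$, $(r_i)_{i\ge1}$, $(s'_i)_{i\ge1}$, $(r'_i)_{i\ge1}$, $(s''_i)_{i\ge1}$ and $(r''_i)_{i\ge1}$ be independent sequences of i.i.d. Rademacher random variables with parameter $1/2$; all random variables that are not explicitly coupled are assumed to be independent.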
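Then with $\hat\rho=\mathrm{logBP}^{\otimes}_{d,t}(\rho)$ and $\hat\nu=\mathrm{logBP}^{\otimes}_{d,t}(\nu)$ we obtain
\begin{align*}
W_2(\hat\rho,\hat\nu)^2\le{}&\mathbb{E}\left[\left(\sum_{i=1}^{\boldsymbol{d}} s_i\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{1+r_i\tanh(\xi_{\nu,i,1}/2)}\right)+\sum_{i=1}^{\boldsymbol{d}'} s'_i\log\left(\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{1+r'_i\tanh(\xi'_{\nu,i,1}/2)}\right)\right)^2\right]\\
&+\mathbb{E}\left[\left(\sum_{i=1}^{\boldsymbol{d}} s_i\log\left(\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{1+r_i\tanh(\xi_{\nu,i,2}/2)}\right)+\sum_{i=1}^{\boldsymbol{d}''} s''_i\log\left(\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{1+r''_i\tanh(\xi''_{\nu,i,2}/2)}\right)\right)^2\right].
\end{align*}
Analogous to the derivation of (8.2), by the independence of the random signs, the expectations of the cross-terms cancel. Combined with the independence of the Poisson random variables, we conclude that
\begin{align*}
W_2(\hat\rho,\hat\nu)^2\le{}&td\,\mathbb{E}\left[\log^2\left(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{1+r_1\tanh(\xi_{\nu,1,1}/2)}\right)\right]+(1-t)d\,\mathbb{E}\left[\log^2\left(\frac{1+r'_1\tanh(\xi'_{\rho,1,1}/2)}{1+r'_1\tanh(\xi'_{\nu,1,1}/2)}\right)\right]\\
&+td\,\mathbb{E}\left[\log^2\left(\frac{1+r_1\tanh(\xi_{\rho,1,2}/2)}{1+r_1\tanh(\xi_{\nu,1,2}/2)}\right)\right]+(1-t)d\,\mathbb{E}\left[\log^2\left(\frac{1+r''_1\tanh(\xi''_{\rho,1,2}/2)}{1+r''_1\tanh(\xi''_{\nu,1,2}/2)}\right)\right]. \tag{8.4}
\end{align*}
Moreover, conditioning on the value of $r_1$ and an application of the fundamental theorem of calculus yield
\begin{align*}
\log^2\frac{1+\tanh(\xi_{\rho,1,1}/2)}{1+\tanh(\xi_{\nu,1,1}/2)}&=\left[\int_{\xi_{\nu,1,1}}^{\xi_{\rho,1,1}}\frac{\partial\log(1+\tanh(z/2))}{\partial z}\,dz\right]^2=\left[\int_{\xi_{\rho,1,1}\wedge\xi_{\nu,1,1}}^{\xi_{\rho,1,1}\vee\xi_{\nu,1,1}}\frac{1-\tanh(z/2)}{2}\,dz\right]^2, \tag{8.5}\\
\log^2\frac{1-\tanh(\xi_{\rho,1,1}/2)}{1-\tanh(\xi_{\nu,1,1}/2)}&=\left[\int_{\xi_{\nu,1,1}}^{\xi_{\rho,1,1}}\frac{\partial\log(1-\tanh(z/2))}{\partial z}\,dz\right]^2=\left[\int_{\xi_{\rho,1,1}\wedge\xi_{\nu,1,1}}^{\xi_{\rho,1,1}\vee\xi_{\nu,1,1}}\frac{1+\tanh(z/2)}{2}\,dz\right]^2. \tag{8.6}
\end{align*}
Combining (8.5) and (8.6) and applying the Cauchy-Schwarz inequality, we obtain
\[ \mathbb{E}\left[\log^2\left(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{1+r_1\tanh(\xi_{\nu,1,1}/2)}\right)\right]\le\frac12\,\mathbb{E}\left[\left(\xi_{\rho,1,1}-\xi_{\nu,1,1}\right)^2\right]. \tag{8.7} \]
An identical argument can be made for $\xi_{\rho,1,2},\xi_{\nu,1,2}$, for $\xi'_{\rho,1,1},\xi'_{\nu,1,1}$ and for $\xi''_{\rho,1,2},\xi''_{\nu,1,2}$. Finally, (8.3), (8.4) and (8.7) yield
\begin{align*}
W_2(\hat\rho,\hat\nu)^2&\le\frac{td}{2}\,\mathbb{E}\left[\left(\xi_{\rho,1,1}-\xi_{\nu,1,1}\right)^2+\left(\xi_{\rho,1,2}-\xi_{\nu,1,2}\right)^2\right]+\frac{(1-t)d}{2}\,\mathbb{E}\left[\left(\xi'_{\rho,1,1}-\xi'_{\nu,1,1}\right)^2+\left(\xi''_{\rho,1,2}-\xi''_{\nu,1,2}\right)^2\right]\\
&=\frac{td}{2}\,W_2(\rho,\nu)^2+\frac{(1-t)d}{2}\,W_2(\rho,\nu)^2=\frac d2\,W_2(\rho,\nu)^2, \tag{8.8}
\end{align*}
which implies contraction because $d<2$. □

8.2. Proof of Proposition 2.8.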
The uniqueness of $\rho_{d,t}\in\mathcal{W}_2(\mathbb{R}^2)$ with $\mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t})=\rho_{d,t}$ (which yields (1.5)) follows from Lemma 8.1 and the Banach fixed point theorem. As the Dirac measure in zero is an element of $\mathcal{W}_2(\mathbb{R}^2)$, Lemma 8.1 also implies the weak convergence of $(\rho^{(\ell)}_{d,t})_{\ell\ge0}$ to $\rho_{d,t}$.

9. PROOF OF PROPOSITION 2.7

As a first step toward the proof of Proposition 2.7 we are going to introduce an operator on probability distributions on the unit square that resembles the Belief Propagation update equations (2.3)–(2.4). We will see that this operator is closely related to the operator from (1.3). Specifically, (1.3) is the log-likelihood version of the new operator. Subsequently, we will show that the Belief Propagation operator correctly implements marginal computations on the Galton-Watson trees $(T_1,T_2)$.

9.1. Density evolution.

Recall that $\mathcal{P}((0,1)^2)$ is the space of all Borel probability measures on the unit square $(0,1)^2$. We define an operator
\[ \mathrm{BP}^{\otimes}_{d,t}:\mathcal{P}((0,1)^2)\to\mathcal{P}((0,1)^2),\qquad\pi\mapsto\hat\pi=\mathrm{BP}^{\otimes}_{d,t}(\pi) \tag{9.1} \]
as follows. For $s\in\{\pm1\}$ let
\[ (\mu_{\pi,s,i,1},\mu_{\pi,s,i,2})_{i\ge1},\qquad(\mu'_{\pi,s,i,1},\mu'_{\pi,s,i,2})_{i\ge1},\qquad(\mu''_{\pi,s,i,1},\mu''_{\pi,s,i,2})_{i\ge1} \]
be three sequences of random vectors with distribution $\pi$. Further, let $(d_s,d'_s,d''_s)_{s\in\{\pm1\}}$ be Poisson variables with $\mathbb{E}[d_s]=td/2$ and $\mathbb{E}[d'_s]=\mathbb{E}[d''_s]=(1-t)d/2$. Finally, let $((r_{s,i},r'_{s,i},r''_{s,i}))_{s\in\{\pm1\},i\ge1}$ be uniformly distributed on $\{\pm1\}^3$. All of these random variables are mutually independent. Then $\hat\pi\in\mathcal{P}((0,1)^2)$ is the distribution of the random vector
\begin{align*}
\Bigg(&\frac{2^{-d_{-1}-d'_{-1}}\prod_{i=1}^{d_{-1}}\left(1+r_{-1,i}(2\mu_{\pi,-1,i,1}-1)\right)\prod_{i=1}^{d'_{-1}}\left(1+r'_{-1,i}(2\mu'_{\pi,-1,i,1}-1)\right)}{\sum_{s\in\{\pm1\}}2^{-d_s-d'_s}\prod_{i=1}^{d_s}\left(1+r_{s,i}(2\mu_{\pi,s,i,1}-1)\right)\prod_{i=1}^{d'_s}\left(1+r'_{s,i}(2\mu'_{\pi,s,i,1}-1)\right)},\\
&\frac{2^{-d_{-1}-d''_{-1}}\prod_{i=1}^{d_{-1}}\left(1+r_{-1,i}(2\mu_{\pi,-1,i,2}-1)\right)\prod_{i=1}^{d''_{-1}}\left(1+r''_{-1,i}(2\mu''_{\pi,-1,i,2}-1)\right)}{\sum_{s\in\{\pm1\}}2^{-d_s-d''_s}\prod_{i=1}^{d_s}\left(1+r_{s,i}(2\mu_{\pi,s,i,2}-1)\right)\prod_{i=1}^{d''_s}\left(1+r''_{s,i}(2\mu''_{\pi,s,i,2}-1)\right)}\Bigg)\in(0,1)^2. \tag{9.2}
\end{align*}
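To make the definition (9.2) concrete, here is a hedged Python sketch of one way to simulate the operator by population dynamics: the measure $\pi$ is replaced by the empirical distribution of a finite list of points, and one output sample is drawn according to (9.2). The finite-population approximation, the Poisson sampler and all function names (`poisson`, `bp_sample`, `density_evolution`) are assumptions made for illustration; they are not part of the text.

```python
import math
import random


def poisson(rng, mean):
    """Sample Po(mean) by Knuth's multiplication method (adequate for small means)."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def bp_sample(population, d, t, rng):
    """Draw one sample of the vector (9.2), with pi the empirical law of population."""
    # weight[s][h]: product of the factors (1 + r(2 mu - 1)) / 2 for sign s, coordinate h
    weight = {-1: [1.0, 1.0], 1: [1.0, 1.0]}
    for s in (-1, 1):
        # Shared factors: one sign r and one vector mu feed both coordinates.
        for _ in range(poisson(rng, t * d / 2)):
            mu, r = rng.choice(population), rng.choice((-1, 1))
            for h in (0, 1):
                weight[s][h] *= (1 + r * (2 * mu[h] - 1)) / 2
        # Distinct factors: independent counts, signs and samples per coordinate.
        for h in (0, 1):
            for _ in range(poisson(rng, (1 - t) * d / 2)):
                mu, r = rng.choice(population), rng.choice((-1, 1))
                weight[s][h] *= (1 + r * (2 * mu[h] - 1)) / 2
    return tuple(weight[-1][h] / (weight[-1][h] + weight[1][h]) for h in (0, 1))


def density_evolution(d, t, size=500, rounds=5, seed=0):
    """Iterate the empirical operator, starting from the atom at (1/2, 1/2)."""
    rng = random.Random(seed)
    population = [(0.5, 0.5)] * size
    for _ in range(rounds):
        population = [bp_sample(population, d, t, rng) for _ in range(size)]
    return population
```

For instance, `density_evolution(1.5, 1.0)` uses only shared factors, so the two coordinates of every sample coincide, whereas for $t=0$ the two coordinates are built from independent draws. The resulting population is only a Monte Carlo approximation of the iterated measure, not the measure itself.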
Let $u^{\otimes}\in\mathcal{P}((0,1)^2)$ denote the atom on the centre $(\frac12,\frac12)$ of the unit square. We write $\mathrm{BP}^{\otimes(\ell)}_{d,t}$ for the $\ell$-fold application of the operator $\mathrm{BP}^{\otimes}_{d,t}$. We are going to perform a fixed point iteration using the operator $\mathrm{BP}^{\otimes}_{d,t}$, starting from $u^{\otimes}$. This fixed point iteration is known as density evolution in physics jargon [36]. Let $\pi^{(\ell)}_{d,t}=\mathrm{BP}^{\otimes(\ell)}_{d,t}(u^{\otimes})$.

Lemma 9.1. Let $d\in(0,2)$, $t\in[0,1]$ and set $\pi_{d,t}=\mathfrak{t}(\rho_{d,t})$, where $\rho_{d,t}$ is the unique fixed point of $\mathrm{logBP}^{\otimes}_{d,t}$ from Proposition 2.8. Then $\pi_{d,t}$ is a fixed point of $\mathrm{BP}^{\otimes}_{d,t}$, and
\[ \pi_{d,t}=\lim_{\ell\to\infty}\pi^{(\ell)}_{d,t}. \]

Proof. For $s\in\{\pm1\}$ let
\[ (\xi_{\rho,s,i,1},\xi_{\rho,s,i,2})_{i\ge1},\qquad(\xi'_{\rho,s,i,1},\xi'_{\rho,s,i,2})_{i\ge1},\qquad(\xi''_{\rho,s,i,1},\xi''_{\rho,s,i,2})_{i\ge1} \tag{9.3} \]
be three sequences of random vectors with distribution $\rho_{d,t}$. Further, let $(d_s,d'_s,d''_s)_{s\in\{\pm1\}}$ be Poisson variables with $\mathbb{E}[d_s]=td/2$ and $\mathbb{E}[d'_s]=\mathbb{E}[d''_s]=(1-t)d/2$ and let $(\boldsymbol{d},\boldsymbol{d}',\boldsymbol{d}'')$ be Poisson variables with $\mathbb{E}[\boldsymbol{d}]=td$ and $\mathbb{E}[\boldsymbol{d}']=\mathbb{E}[\boldsymbol{d}'']=(1-t)d$. Finally, let $((r_{s,i},r'_{s,i},r''_{s,i}))_{s\in\{\pm1\},i\ge1}$, $((s_i,s'_i,s''_i))_{i\ge1}$ and $((r_i,r'_i,r''_i))_{i\ge1}$ all be uniformly distributed on $\{\pm1\}^3$. All of these random variables are mutually independent. Throughout the proof, we write $\mathfrak{l}=(\mathfrak{l}_1,\mathfrak{l}_2)$ and $\mathfrak{t}=(\mathfrak{t}_1,\mathfrak{t}_2)$ for
\[ \mathfrak{l}_h:(0,1)\to\mathbb{R},\quad\mathfrak{l}_h(x)=\log\frac{x}{1-x},\qquad\text{and}\qquad\mathfrak{t}_h:\mathbb{R}\to(0,1),\quad\mathfrak{t}_h(x)=\frac{1+\tanh(x/2)}{2}, \]
where $h\in\{1,2\}$.
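Then, since $\mathfrak{t}(\xi_{\rho,s,1,1},\xi_{\rho,s,1,2})=(\mathfrak{t}_1(\xi_{\rho,s,1,1}),\mathfrak{t}_2(\xi_{\rho,s,1,2}))$ has distribution $\mathfrak{t}(\rho_{d,t})$, using the definitions of $\mathfrak{l}$ and $\mathfrak{t}$,
\begin{align*}
\mathrm{BP}^{\otimes}_{d,t}(\mathfrak{t}(\rho_{d,t}))
&\stackrel{\mathrm{dist}}{=}\Bigg(\frac{2^{-d_{-1}-d'_{-1}}\prod_{i=1}^{d_{-1}}\left(1+r_{-1,i}(2\mathfrak{t}_1(\xi_{\rho,-1,i,1})-1)\right)\prod_{i=1}^{d'_{-1}}\left(1+r'_{-1,i}(2\mathfrak{t}_1(\xi'_{\rho,-1,i,1})-1)\right)}{\sum_{s\in\{\pm1\}}2^{-d_s-d'_s}\prod_{i=1}^{d_s}\left(1+r_{s,i}(2\mathfrak{t}_1(\xi_{\rho,s,i,1})-1)\right)\prod_{i=1}^{d'_s}\left(1+r'_{s,i}(2\mathfrak{t}_1(\xi'_{\rho,s,i,1})-1)\right)},\\
&\qquad\ \ \frac{2^{-d_{-1}-d''_{-1}}\prod_{i=1}^{d_{-1}}\left(1+r_{-1,i}(2\mathfrak{t}_1(\xi_{\rho,-1,i,2})-1)\right)\prod_{i=1}^{d''_{-1}}\left(1+r''_{-1,i}(2\mathfrak{t}_1(\xi''_{\rho,-1,i,2})-1)\right)}{\sum_{s\in\{\pm1\}}2^{-d_s-d''_s}\prod_{i=1}^{d_s}\left(1+r_{s,i}(2\mathfrak{t}_1(\xi_{\rho,s,i,2})-1)\right)\prod_{i=1}^{d''_s}\left(1+r''_{s,i}(2\mathfrak{t}_1(\xi''_{\rho,s,i,2})-1)\right)}\Bigg)\\
&\stackrel{\mathrm{dist}}{=}\mathfrak{t}\begin{pmatrix}
\sum_{i=1}^{\boldsymbol{d}}s_i\log\frac{1+r_i(2\mathfrak{t}_1(\xi_{\rho,i,1})-1)}{2}+\sum_{i=1}^{\boldsymbol{d}'}s'_i\log\frac{1+r'_i(2\mathfrak{t}_1(\xi'_{\rho,i,1})-1)}{2}\\[4pt]
\sum_{i=1}^{\boldsymbol{d}}s_i\log\frac{1+r_i(2\mathfrak{t}_1(\xi_{\rho,i,2})-1)}{2}+\sum_{i=1}^{\boldsymbol{d}''}s''_i\log\frac{1+r''_i(2\mathfrak{t}_1(\xi''_{\rho,i,2})-1)}{2}
\end{pmatrix}\\
&\stackrel{\mathrm{dist}}{=}\mathfrak{t}\begin{pmatrix}
\sum_{i=1}^{\boldsymbol{d}}s_i\log\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}+\sum_{i=1}^{\boldsymbol{d}'}s'_i\log\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{2}\\[4pt]
\sum_{i=1}^{\boldsymbol{d}}s_i\log\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{2}+\sum_{i=1}^{\boldsymbol{d}''}s''_i\log\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{2}
\end{pmatrix}.
\end{align*}
Since $\rho_{d,t}$ is a fixed point of $\mathrm{logBP}^{\otimes}_{d,t}$, the last argument vector of $\mathfrak{t}$ has distribution $\rho_{d,t}$, and we get that $\mathrm{BP}^{\otimes}_{d,t}(\mathfrak{t}(\rho_{d,t}))=\mathfrak{t}(\rho_{d,t})$. So $\mathfrak{t}(\rho_{d,t})$ is a fixed point of $\mathrm{BP}^{\otimes}_{d,t}$.

Next, let $n^{\otimes}\in\mathcal{P}(\mathbb{R}^2)$ denote the atom in $(0,0)$. Then since $\mathfrak{t}(0,0)=(\frac12,\frac12)$, we have $u^{\otimes}=\mathfrak{t}(n^{\otimes})$. By a computation analogous to the first part of the proof, one can show inductively that for all $\ell\ge1$,
\[ \pi^{(\ell)}_{d,t}=\mathrm{BP}^{\otimes(\ell)}_{d,t}(\mathfrak{t}(n^{\otimes}))=\mathfrak{t}\left(\mathrm{logBP}^{\otimes(\ell)}_{d,t}(n^{\otimes})\right). \]
As $\rho_{d,t}=\lim_{\ell\to\infty}\mathrm{logBP}^{\otimes(\ell)}_{d,t}(n^{\otimes})$, the second part of the claim now follows from the continuous mapping theorem. □

9.2. Belief Propagation on the Galton-Watson tree.

The proof of Proposition 2.7 relies on the fact that Belief Propagation is 'exact' on trees. The following fact, which is a direct consequence of [36, Theorem 14.1], furnishes the precise statement that we will use.

Fact 9.2. Assume that the bipartite graph associated with the 2-CNF $\Phi$ is a (finite) tree.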
Let $z\in V(\Phi)$ be a variable and let $\ell\ge1$ be an integer such that no variable or clause of $\Phi$ has distance greater than $2\ell$ from $z$. Let
\[ \mu^{(0)}_{x\to a}(s)=\mu^{(0)}_{a\to x}(s)=\frac12\qquad\text{for all }x\in V(\Phi),\ a\in\partial x,\ s\in\{\pm1\}. \]
Furthermore, obtain the messages $(\mu^{(i+1)}_{x\to a}(s),\mu^{(i+1)}_{a\to x}(s))_{x,a,s}$ by applying the BP operator (2.2) to $(\mu^{(i)}_{x\to a}(s),\mu^{(i)}_{a\to x}(s))_{x,a,s}$. Then for all $i\ge2\ell$ we have
\[ \mu_\Phi(\{\sigma_z=s\})=\frac{\prod_{a\in\partial z}\mu^{(i)}_{a\to z}(s)}{\prod_{a\in\partial z}\mu^{(i)}_{a\to z}(1)+\prod_{a\in\partial z}\mu^{(i)}_{a\to z}(-1)}. \tag{9.4} \]

As a preparation toward the proof of Proposition 2.7 we establish the following 'univariate' variant of the proposition.

Lemma 9.3. Let $h=1,2$. Let $\pi^{(\ell)}_{d,t,h}$ be the distribution of the $h$-th component of a random vector with distribution $\pi^{(\ell)}_{d,t}$. Then $\mu_{T_h^{(2\ell)}}(\{\sigma_o=1\})$ has distribution $\pi^{(\ell)}_{d,t,h}$.

Proof. We proceed by induction on $\ell$. For $\ell=0$ there is nothing to show because $\mu_{T_h^{(0)}}(\{\sigma_o=1\})=\frac12$ and $\pi^{(0)}_{d,t,h}$ is the atom on $1/2$. To go from $\ell-1$ to $\ell\ge1$ we exploit the fact that $T_h$ by itself has the same distribution as the 'plain' Galton-Watson tree $T$ from Section 2.2, after all distinctions between different types of clauses and variables are dropped. In effect, the tree $T_{h,x}$ pending on any grandchild $x\in\partial^2o$ of the root has the same distribution as $T$ itself, and these trees are mutually independent for all $x\in\partial^2o$. Consequently, by induction we know that the marginal $\mu_{T_{h,x}^{(2(\ell-1))}}(\{\sigma_x=1\})$ of $x$ in $T_{h,x}^{(2(\ell-1))}$ has distribution $\pi^{(\ell-1)}_{d,t,h}$.

Now let $a_x\in\partial x\cap\partial o$ be the clause that links $x$ and $o$. Fact 9.2 implies that the marginals $\mu_{T_{h,x}^{(2(\ell-1))}}(\{\sigma_x=s\})$ coincide with the messages $\mu^{(2(\ell-1))}_{T^{(2\ell)},x\to a_x}(s)$. Indeed, the marginal formula (9.4) for the tree $T_{h,x}^{(2(\ell-1))}$ coincides with the message update formula (2.4), because clause $a_x$ is not part of $T_{h,x}$. Furthermore, because the trees pending on the different grandchildren of the root $o$ are mutually independent, the incoming messages $(\mu^{(2(\ell-1))}_{T^{(2\ell)}_{h,x},x\to a_x}(\pm1))_{x\in\partial^2o}$ are mutually independent.
Moreover, upon substituting in the update equation (2.3), the quotient from (9.4) can be rewritten as
\[ \mu_{T_h^{(2\ell)}}(\{\sigma_o=1\})=\frac{\prod_{x\in\partial^2o}\left(\mathbf{1}\{\mathrm{sign}(o,a_x)=1\}+\mathbf{1}\{\mathrm{sign}(o,a_x)=-1\}\,\mu^{(2(\ell-1))}_{T^{(2\ell)}_{h,x},x\to a_x}(\mathrm{sign}(x,a_x))\right)}{\sum_{s\in\{\pm1\}}\prod_{x\in\partial^2o}\left(\mathbf{1}\{\mathrm{sign}(o,a_x)=s\}+\mathbf{1}\{\mathrm{sign}(o,a_x)=-s\}\,\mu^{(2(\ell-1))}_{T^{(2\ell)}_{h,x},x\to a_x}(\mathrm{sign}(x,a_x))\right)}. \tag{9.5} \]
Also recall that $o$ has $\mathrm{Po}(d)$ children. Hence, comparing (9.5) with the $h$-component of (9.2), we conclude that $\mu_{T_h^{(2\ell)}}(\{\sigma_o=1\})$ has distribution $\pi^{(\ell)}_{d,t,h}$. □

Lemma 9.4. Let $t\in[0,1]$. Then $\mu^{(2\ell)}$ has distribution $\pi^{(\ell)}_{d,t}$.

Proof. As in the proof of Lemma 9.3 we proceed by induction on $\ell$. For $\ell=0$ there is nothing to show. To go from $\ell-1$ to $\ell\ge1$ we reuse the rewritten update equation (9.5). In each of the correlated trees $T_1,T_2$ the root $o$ has $\mathrm{Po}((1-t)d)$ $\{1,2\}$-distinct grandchildren. By construction, the trees pending on these grandchildren are mutually independent copies of the tree $T$. Hence, the same consideration as in the proof of Lemma 9.3 shows that the messages that the $\{1,2\}$-distinct grandchildren pass up are independent with distribution $\pi^{(\ell-1)}_{d,t,h}$. The same is true of the messages $\mu'_{\pi^{(\ell-1)}_{d,t},\pm1,i,h}$, $\mu''_{\pi^{(\ell-1)}_{d,t},\pm1,i,h}$ from (9.2). Consequently, the contribution of the $\{1,2\}$-distinct children in (9.2) matches the corresponding contribution to the update equation (9.5).

With respect to the shared grandchildren, we apply induction as in the proof of Lemma 9.3. Indeed, the trees pending on the shared grandchildren $x\in\partial^2_{T^{\otimes}}o$ have the same distribution as the original tree $T^{\otimes}$ and are mutually independent. Therefore, by the induction hypothesis, the pair of messages $(\mu^{(2(\ell-1))}_{T_h^{(2\ell-2)},x\to a_x}(1))_{h=1,2}$ that a shared grandchild $x$ sends towards the root has distribution $\pi^{(\ell-1)}_{d,t}$. Finally, since $o$ has $\mathrm{Po}(dt)$ shared grandchildren, matching the expressions (9.5) and (9.2) completes the proof. □

Proof of Proposition 2.7. The assertion follows from Lemmas 9.1 and 9.4. □

10.
PROOF THAT THE VARIANCE IS FINITE

The main goal of this section is to show that both the evaluation of the functional $\mathcal{B}^{\otimes}_{d,t}$ on $\rho_{d,t}$ and the integration to obtain $\eta(d)^2$ yield finite values for any $d\in(0,2)$ and $t\in[0,1]$.

Lemma 10.1. For any $d\in(0,2)$ and $t\in[0,1]$, $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})<\infty$. Moreover, for any $d\in(0,2)$, $\eta(d)^2<\infty$.

10.1. Proof of Lemma 10.1.

Let $\rho^{(\ell)}_{d,t}\in\mathcal{W}_2(\mathbb{R}^2)$ be the result of $\ell$ iterations of $\mathrm{logBP}^{\otimes}_{d,t}$ launched from $n^{\otimes}$, the atom at $(0,0)$. In the proof of Lemma 10.1, the following properties of the fixed point $\pi_{d,t}$ will be used:

Claim 10.2. Let $\pi_{d,t}=\mathfrak{t}(\rho_{d,t})$ and let $\mu_{\pi_{d,t}}=(\mu_{\pi_{d,t},1},\mu_{\pi_{d,t},2})$ be a random vector with distribution $\pi_{d,t}$. Then
\[ \mu_{\pi_{d,t},1}\stackrel{\mathrm{dist}}{=}\mu_{\pi_{d,t},2}\qquad\text{and}\qquad\mu_{\pi_{d,t},1}\stackrel{\mathrm{dist}}{=}1-\mu_{\pi_{d,t},1}. \]

Proof. Recall the definition of $\mathrm{BP}^{\otimes}_{d,t}$ from (9.2). The first claim then follows from the following limiting argument. By Lemma 9.1, $\pi_{d,t}=\lim_{\ell\to\infty}\mathrm{BP}^{\otimes(\ell)}_{d,t}(u^{\otimes})$, where $u^{\otimes}$ is the Dirac measure on $(1/2,1/2)$. As the marginal distributions of the initial distribution $u^{\otimes}$ are identical, inspection of the update rule (9.2) yields that the marginal distributions of $\mathrm{BP}^{\otimes(1)}_{d,t}(u^{\otimes})$ are identical as well. Analogously, it is immediate from (9.2) that any $\pi$ with two identical marginals will be mapped to a measure $\mathrm{BP}^{\otimes}_{d,t}(\pi)$ with two identical marginals, so that the marginal distributions of $\mathrm{BP}^{\otimes(\ell)}_{d,t}(u^{\otimes})$ are identical for any $\ell\ge0$. Hence, also in the limit, $\mu_{\pi_{d,t},1}\stackrel{\mathrm{dist}}{=}\mu_{\pi_{d,t},2}$.
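On the other hand, Lemma 9.1 also implies that $\pi_{d,t}=\mathrm{BP}^{\otimes}_{d,t}(\pi_{d,t})$, so that the distribution of $\mu_{\pi_{d,t},1}$ is the same as the distribution of
\[ \frac{2^{-d_{-1}-d'_{-1}}\prod_{i=1}^{d_{-1}}\left(1+r_{-1,i}(2\mu_{\pi_{d,t},-1,i,1}-1)\right)\prod_{i=1}^{d'_{-1}}\left(1+r'_{-1,i}(2\mu'_{\pi_{d,t},-1,i,1}-1)\right)}{\sum_{s\in\{\pm1\}}2^{-d_{-s}-d'_{-s}}\prod_{i=1}^{d_{-s}}\left(1+r_{-s,i}(2\mu_{\pi_{d,t},-s,i,1}-1)\right)\prod_{i=1}^{d'_{-s}}\left(1+r'_{-s,i}(2\mu'_{\pi_{d,t},-s,i,1}-1)\right)} \tag{10.1} \]
while the distribution of $1-\mu_{\pi_{d,t},1}$ is the same as the distribution of
\[ \frac{2^{-d_{1}-d'_{1}}\prod_{i=1}^{d_{1}}\left(1+r_{1,i}(2\mu_{\pi_{d,t},1,i,1}-1)\right)\prod_{i=1}^{d'_{1}}\left(1+r'_{1,i}(2\mu'_{\pi_{d,t},1,i,1}-1)\right)}{\sum_{s\in\{\pm1\}}2^{-d_{-s}-d'_{-s}}\prod_{i=1}^{d_{-s}}\left(1+r_{-s,i}(2\mu_{\pi_{d,t},-s,i,1}-1)\right)\prod_{i=1}^{d'_{-s}}\left(1+r'_{-s,i}(2\mu'_{\pi_{d,t},-s,i,1}-1)\right)}. \tag{10.2} \]
This immediately shows that (10.1) and (10.2) have the same distribution. As a consequence, the second claim holds as well. □

Proof of Lemma 10.1. Recall that $\pi_{d,t}=\mathfrak{t}(\rho_{d,t})$. Let
\[ \mu_{\pi_{d,t},1}=(\mu_{\pi_{d,t},1,1},\mu_{\pi_{d,t},1,2})=\mathfrak{t}(\xi_{\rho_{d,t},1,1},\xi_{\rho_{d,t},1,2})\qquad\text{and}\qquad\mu_{\pi_{d,t},2}=(\mu_{\pi_{d,t},2,1},\mu_{\pi_{d,t},2,2})=\mathfrak{t}(\xi_{\rho_{d,t},2,1},\xi_{\rho_{d,t},2,2}). \]
Then they are independent random vectors with distribution $\pi_{d,t}$. Let $r_1,r_2$ be independent Rademacher random variables with parameter $1/2$, independent of $\mu_{\pi_{d,t},1}$ and $\mu_{\pi_{d,t},2}$. Conditioning on the values of $r_1$ and $r_2$ yields the upper bound
\begin{align*}
|\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})|\le{}&\frac14\,\mathbb{E}\left[\left|\log\left(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\right)\log\left(1-\mu_{\pi_{d,t},1,2}\mu_{\pi_{d,t},2,2}\right)\right|\right]\\
&+\frac14\,\mathbb{E}\left[\left|\log\left(1-\left(1-\mu_{\pi_{d,t},1,1}\right)\mu_{\pi_{d,t},2,1}\right)\log\left(1-\left(1-\mu_{\pi_{d,t},1,2}\right)\mu_{\pi_{d,t},2,2}\right)\right|\right]\\
&+\frac14\,\mathbb{E}\left[\left|\log\left(1-\mu_{\pi_{d,t},1,1}\left(1-\mu_{\pi_{d,t},2,1}\right)\right)\log\left(1-\mu_{\pi_{d,t},1,2}\left(1-\mu_{\pi_{d,t},2,2}\right)\right)\right|\right]\\
&+\frac14\,\mathbb{E}\left[\left|\log\left(1-\left(1-\mu_{\pi_{d,t},1,1}\right)\left(1-\mu_{\pi_{d,t},2,1}\right)\right)\log\left(1-\left(1-\mu_{\pi_{d,t},1,2}\right)\left(1-\mu_{\pi_{d,t},2,2}\right)\right)\right|\right].
\end{align*}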
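The Cauchy-Schwarz inequality further gives that
\begin{align*}
|\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})|\le{}&\frac14\,\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\right)\right]^{1/2}\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,2}\mu_{\pi_{d,t},2,2}\right)\right]^{1/2}\\
&+\frac14\,\mathbb{E}\left[\log^2\left(1-\left(1-\mu_{\pi_{d,t},1,1}\right)\mu_{\pi_{d,t},2,1}\right)\right]^{1/2}\mathbb{E}\left[\log^2\left(1-\left(1-\mu_{\pi_{d,t},1,2}\right)\mu_{\pi_{d,t},2,2}\right)\right]^{1/2}\\
&+\frac14\,\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,1}\left(1-\mu_{\pi_{d,t},2,1}\right)\right)\right]^{1/2}\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,2}\left(1-\mu_{\pi_{d,t},2,2}\right)\right)\right]^{1/2}\\
&+\frac14\,\mathbb{E}\left[\log^2\left(1-\left(1-\mu_{\pi_{d,t},1,1}\right)\left(1-\mu_{\pi_{d,t},2,1}\right)\right)\right]^{1/2}\mathbb{E}\left[\log^2\left(1-\left(1-\mu_{\pi_{d,t},1,2}\right)\left(1-\mu_{\pi_{d,t},2,2}\right)\right)\right]^{1/2}.
\end{align*}
As $\mu_{\pi_{d,t},1,1}\stackrel{\mathrm{dist}}{=}\mu_{\pi_{d,t},1,2}$ and $\mu_{\pi_{d,t},1,1}\stackrel{\mathrm{dist}}{=}1-\mu_{\pi_{d,t},1,1}$ thanks to Claim 10.2, we further get
\[ |\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})|\le\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\right)\right]. \]
Next, recalling (9.3),
\begin{align*}
\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\right)\right]
&\le\mathbb{E}\left[\log^2\left(1-\mu_{\pi_{d,t},1,1}\right)\right]\\
&\le\mathbb{E}\left[\mathbf{1}\left\{\mu_{\pi_{d,t},1,1}\le\tfrac12\right\}\left|\log^2\left(1-\mu_{\pi_{d,t},1,1}\right)\right|\right]+\mathbb{E}\left[\mathbf{1}\left\{\mu_{\pi_{d,t},1,1}>\tfrac12\right\}\left|\log^2\left(1-\mu_{\pi_{d,t},1,1}\right)\right|\right]\\
&\le\mathbb{E}\left[\mathbf{1}\left\{\mu_{\pi_{d,t},1,1}\le\tfrac12\right\}\log^2 2\right]+\mathbb{E}\left[\mathbf{1}\left\{\mu_{\pi_{d,t},1,1}>\tfrac12\right\}\left(\log\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}-\log\mu_{\pi_{d,t},1,1}\right)^2\right]\\
&\le\log^2 2+\mathbb{E}\left[\log^2\left(\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}\right)\right]-2\,\mathbb{E}\left[\mathbf{1}\left\{\mu_{\pi_{d,t},1,1}>\tfrac12\right\}\log\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}\,\log\mu_{\pi_{d,t},1,1}\right]\\
&\le\log^2 2+\mathbb{E}\left[\log^2\left(\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}\right)\right]=\log^2 2+\mathbb{E}\left[\xi^2_{\rho_{d,t},1,1}\right].
\end{align*}
However, $\mathbb{E}[\xi^2_{\rho_{d,t},1,1}]<\infty$, since $\rho_{d,t}\in\mathcal{W}_2(\mathbb{R}^2)$. Thus, $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})<\infty$.

Moreover, it is easy to see that the distribution of the marginal sample $\xi_{\rho_{d,t},1,1}$ does not depend on $t\in[0,1]$. Call this distribution $\rho_d$ and let $\xi_{\rho_d}$ be a sample from $\rho_d$. Then the previous upper bound yields
\[ \eta(d)^2\le\int_0^1|\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})|\,dt+|\mathcal{B}^{\otimes}_{d}(\rho_{d,0})|\le\int_0^1\left(\log^2 2+\mathbb{E}\left[\xi^2_{\rho_d}\right]\right)dt+\log^2 2+\mathbb{E}\left[\xi^2_{\rho_d}\right]\le2\left(\log^2 2+\mathbb{E}\left[\xi^2_{\rho_d}\right]\right)<\infty. \]
□

11. PROOF OF PROPOSITION 2.12

We combine the results from the previous sections in order to analyse the variance process. As a first step we derive a rough upper bound on the potential change in the number of satisfying assignments upon insertion of a single clause (Lemmas 11.1 and 11.6).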
Subsequently we derive a combinatorial formula for the squared martingale difference (Lemma 11.9), which easily implies Lemma 2.4. The combinatorial formula puts us in a position to obtain an $L^2$-bound on the squared martingale difference (Lemma 11.10). With these ingredients in place, we complete the proof of Proposition 2.12 and of Theorem 1.1 in Section 11.5.

11.1. A pessimistic estimate.

Let $\Phi,\Psi$ be two 2-CNFs on the same set of variables such that $\Psi$ is obtained from $\Phi$ by adding a single clause $e$. We are going to need a baseline estimate of the difference $|\log Z(\hat\Phi)-\log Z(\hat\Psi)|$. The principal difficulty here is to assess the impact of the additional clause on the pruning operation. The issue is that the extra clause may also cause additional pruning. Indeed, while clearly $F(\hat\Psi)\setminus\{e\}\subseteq F(\hat\Phi)$, i.e., any clause $a\neq e$ that survives pruning on $\Psi$ also remains present in $\hat\Phi$, the pruned formula $\hat\Psi$ may ironically end up having strictly fewer clauses than $\hat\Phi$.

To get a handle on the potential repercussions of pruning, let $\{v,v'\}=\partial e$ be the variables that appear in clause $e$. Let $\mathcal{N}(\Phi,v)$ be the set of all literals $l$ such that $\{v,\neg v\}\cap\mathcal{L}(\Phi,\{l\})\neq\emptyset$. Thus, $\mathtt{PUC}$ may reach $v$ or $\neg v$ once $l$ is deemed true. Observe that $v\in\mathcal{N}(\Phi,v)$ and $\neg v\in\mathcal{N}(\Phi,v)$. Define $\mathcal{N}(\Phi,v')$ analogously. Further, let
\[ \mathcal{N}(\Phi,e)=\bigcup_{l\in\mathcal{N}(\Phi,v)\cup\mathcal{N}(\Phi,v')}\mathcal{L}(\Phi,\{l\}). \tag{11.1} \]
Thus, $\mathcal{N}(\Phi,e)$ contains all literals that $\mathtt{PUC}$ can reach by tracing the implications of a literal from $\mathcal{N}(\Phi,v)\cup\mathcal{N}(\Phi,v')$. The definition of the sets $\mathcal{N}(\Phi,v)$, $\mathcal{N}(\Phi,v')$ ensures that
\[ v,\neg v\in\mathcal{N}(\Phi,v),\qquad v',\neg v'\in\mathcal{N}(\Phi,v'). \tag{11.2} \]

Lemma 11.1. Let $\Phi$ be a 2-CNF formula. Suppose that $\Psi$ is obtained from $\Phi$ by adding a single clause $e$. Then
\[ \left|\log(Z(\hat\Phi))-\log(Z(\hat\Psi))\right|\le|\mathcal{N}(\Phi,e)|\log 2. \tag{11.3} \]

As a first step towards the proof of Lemma 11.1 we observe that (11.1) can be rewritten as follows.

Claim 11.2. We have
\[ \mathcal{N}(\Phi,e)=\bigcup_{l\in\mathcal{N}(\Phi,v)\cup\mathcal{N}(\Phi,v')}\mathcal{L}(\Psi,\{l\}). \tag{11.4} \]

Proof. Clearly $\mathcal{L}(\Psi,\{l\})\supseteq\mathcal{L}(\Phi,\{l\})$ for every literal $l$.
Hence, we just need to show that
\[ \mathcal{L}(\Psi,\{l\})\subseteq\mathcal{N}(\Phi,e)\qquad\text{for all }l\in\mathcal{N}(\Phi,v)\cup\mathcal{N}(\Phi,v'). \tag{11.5} \]
Thus, let $l\in\mathcal{N}(\Phi,v)\cup\mathcal{N}(\Phi,v')$ and let $l'\in\mathcal{L}(\Psi,\{l\})$. Then Lemma 4.2 shows that there exists an implication chain
\[ l=l_0,a_1,l_1,\ldots,a_k,l_k=l' \tag{11.6} \]
comprising literals $l_i$ and clauses $a_i\in F(\Psi)$ such that $a_i\equiv l_{i-1}\to l_i$ for all $1\le i\le k$. If $a_i\neq e$ for all $i$, then the chain (11.6) is contained in $\Phi$ and thus $l'\in\mathcal{L}(\Phi,\{l\})\subseteq\mathcal{N}(\Phi,e)$. Otherwise let $1\le j\le k$ be the largest index such that $a_j=e$. Then $l_j$ is one of the constituent literals of $a_j$ and thus $l_j\in\{v,\neg v,v',\neg v'\}$. Furthermore, the implication chain $l_j,a_{j+1},l_{j+1},\ldots,a_k,l_k=l'$ from $l_j$ to $l'$ is contained in $\Phi$. Therefore, (11.2) shows in combination with Lemma 4.2 and (11.1) that $l'\in\mathcal{L}(\Phi,\{l_j\})\subseteq\mathcal{N}(\Phi,e)$. □

We proceed to show that $\mathcal{N}(\Phi,e)$ contains the variables of all clauses $a\in F(\Phi)$ on which the pruning processes run on $\Phi,\Psi$ differ.

Claim 11.3. For any clause $a\in F(\hat\Phi)\setminus F(\hat\Psi)$ we have $\partial a\subseteq\mathcal{N}(\Phi,e)$.

Proof. Consider a clause $a\in F(\Phi)$ that was removed by pruning applied to $\Psi$ but not by pruning applied to $\Phi$. Let $w,w'$ be the constituent literals of $a$, i.e., $a\equiv w\vee w'$. Then $\mathtt{PUC}(\Psi,\{l\})$ added $a$ to the set $\mathcal{C}(\Psi,\{l\})$ of conflict clauses for some literal $l$. Hence,
\[ w,\neg w,w',\neg w'\in\mathcal{L}(\Psi,\{l\}). \tag{11.7} \]
Consequently, Lemma 4.2 shows that for each literal $k\in\{\neg w,\neg w'\}$, $\mathtt{PUC}(\Psi,\{l\})$ traverses an implication chain $l_{0,k}=l,a_{0,k},l_{1,k},a_{1,k},\ldots,l_{j_k,k}=k$ of literals $l_{i,k}\in\mathcal{L}(\Psi,\{l\})$ and clauses $a_{i,k}\equiv\neg l_{i,k}\vee l_{i+1,k}\equiv l_{i,k}\to l_{i+1,k}$ for $0\le i<j_k$. Because $a\notin\mathcal{C}(\Phi,l)$, at least one of these two sequences includes the clause $e$ and thus at least one of $v,\neg v$ and one of $v',\neg v'$. Hence, $l\in\mathcal{N}(\Phi,v)\cup\mathcal{N}(\Phi,v')$. Therefore, combining (11.4) and (11.7), we conclude that $\partial a=\{|w|,|w'|\}\subseteq\mathcal{N}(\Phi,e)$. □

Let $\tilde\Phi$ be the formula obtained from $\hat\Phi$ by removing all variables $x\in V(\hat\Phi)$ such that $\{x,\neg x\}\cap\mathcal{N}(\Phi,e)\neq\emptyset$ along with their adjacent clauses.

Claim 11.4.
For any $\tilde\sigma\in S(\tilde\Phi)$ there exists $\sigma\in S(\hat\Psi)$ such that $\sigma_x=\tilde\sigma_x$ for all $x\in V(\tilde\Phi)$.

Proof. Let $\check\Psi$ be a CNF with variables
\[ V(\check\Psi)=V(\hat\Psi)\setminus V(\tilde\Phi)=\{x\in V(\hat\Phi):\{x,\neg x\}\cap\mathcal{N}(\Phi,e)\neq\emptyset\}. \]
The clauses of $\check\Psi$ include all $a\in F(\hat\Psi)$ such that $\partial a\subseteq V(\check\Psi)$. Additionally, for every clause $a\in F(\hat\Psi)$ that contains exactly one literal $l$ with $|l|\in V(\check\Psi)$ we include the literal $l$ as a unit clause into $\check\Psi$. In light of Claim 11.3, to prove the assertion it suffices to show that $\check\Psi$ is satisfiable. For then we could extend any $\sigma\in S(\tilde\Phi)$ to a satisfying assignment of $\hat\Psi$ by simply setting the variables $x\in V(\hat\Psi)\setminus V(\tilde\Phi)$ in accordance with a satisfying assignment of $\check\Psi$.

As in the proof of Fact 2.2, to construct a satisfying assignment of $\check\Psi$ we fix an order $l_1,\ldots,l_k$ of the literals $\mathcal{N}(\Phi,e)$. Let $\sigma_i$ be the assignment that $\mathtt{PUC}$ outputs on input $\Psi,\{l_i\}$. Further, define a $\{0,\pm1\}$-valued assignment $(\sigma_x)_{x\in V(\check\Psi)}$ by letting $\sigma_x=\sigma_{i,x}$ for the least index $i$ such that $\{x,\neg x\}\cap\mathcal{L}(\Psi,\{l_i\})\neq\emptyset$. We claim that
\[ \forall a\in F(\check\Psi)\ \exists x\in\partial_{\check\Psi}a:\ \sigma_x=\mathrm{sign}(x,a); \tag{11.8} \]
thus, we can turn $\sigma$ into a satisfying assignment of $\check\Psi$ by assigning those variables $y$ with $\sigma_y=0$ arbitrarily. To verify (11.8), we consider two cases separately.

Case 1: $|\partial_{\check\Psi}a|=2$. Then $a\in F(\Psi)$. Let $\partial a=\{x,x'\}$ and let $i$ be the smallest index such that $\mathcal{L}(\Psi,\{l_i\})\cap\{x,\neg x,x',\neg x'\}\neq\emptyset$. Also let $l,l'$ be the constituent literals of $a$ such that $|l|=x$ and $|l'|=x'$. Suppose that $l\in\mathcal{L}(\Psi,\{l_i\})$. If $\neg l\notin\mathcal{L}(\Psi,\{l_i\})$, then $\sigma_x=\mathrm{sign}(x,a)$ by construction. Hence, assume that $l,\neg l\in\mathcal{L}(\Psi,\{l_i\})$. Then the construction in Steps 1–2 of $\mathtt{PUC}$ ensures that $l'\in\mathcal{L}(\Psi,\{l_i\})$ as well. Moreover, if $\neg l'\notin\mathcal{L}(\Psi,\{l_i\})$, then $\sigma_{x'}=\mathrm{sign}(x',a)$. Finally, the case $l,l',\neg l,\neg l'\in\mathcal{L}(\Psi,\{l_i\})$ cannot occur because otherwise $a$ would have been pruned, i.e., $a\notin F(\hat\Psi)$.

Case 2: $|\partial_{\check\Psi}a|=1$. There exist a clause $b\in F(\hat\Psi)$ and literals $l,l'$ with $|l'|\notin V(\check\Psi)$ such that $b=l\vee l'$ and $a=l$.
Let $i$ be the least index such that $\{l,\neg l\}\cap\mathcal{L}(\Psi,\{l_i\})\neq\emptyset$. If $\neg l\in\mathcal{L}(\Psi,\{l_i\})$, then $\mathtt{PUC}(\Psi,\{l_i\})$ would have added $l'$ to the set $\mathcal{L}(\Psi,\{l_i\})$ as well and thus $|l'|\in V(\check\Psi)$. But $|l'|\notin V(\check\Psi)$. Hence, $\{l,\neg l\}\cap\mathcal{L}(\Psi,\{l_i\})=\{l\}$ and thus $\sigma_{|l|}=\sigma_{i,|l|}=\mathrm{sign}_\Psi(|l|,b)=\mathrm{sign}_{\check\Psi}(|l|,a)$.

Thus, in either case $\sigma$ satisfies clause $a$. □

In perfect analogy to the above let $\tilde\Psi$ be the formula obtained from $\hat\Psi$ by removing all variables $x\in V(\hat\Psi)$ such that $\{x,\neg x\}\cap\mathcal{N}(\Phi,e)\neq\emptyset$, along with their adjacent clauses.

Claim 11.5. For any $\tilde\sigma\in S(\tilde\Psi)$ there exists $\sigma\in S(\hat\Phi)$ such that $\sigma_x=\tilde\sigma_x$ for all $x\in V(\tilde\Psi)$.

Proof. Let $\check\Phi$ be a CNF with variables $V(\check\Phi)=V(\hat\Phi)\setminus V(\tilde\Psi)$. Include in $\check\Phi$ all $a\in F(\hat\Phi)$ with $\partial a\subseteq V(\check\Phi)$. Moreover, for every $a\in F(\hat\Phi)$ that contains exactly one literal $l$ with $|l|\in V(\check\Psi)$ add $l$ as a clause to $\check\Phi$. As in the proof of Claim 11.4 it suffices to construct a satisfying assignment of $\check\Phi$. Due to (11.1) the same argument as in the proof of Claim 11.4 extends. □

Proof of Lemma 11.1. We use Claim 11.4 to prove that $Z(\hat\Phi)\le2^{|\mathcal{N}(\Phi,e)|}Z(\hat\Psi)$; similar reasoning based on Claim 11.5 yields the reverse bound. To show the desired bound, split a satisfying assignment $\sigma\in S(\hat\Phi)$ up into two parts
\[ \tilde\sigma=(\sigma_x)_{\{x,\neg x\}\cap\mathcal{N}(\Phi,e)=\emptyset},\qquad\check\sigma=(\sigma_x)_{\{x,\neg x\}\cap\mathcal{N}(\Phi,e)\neq\emptyset}. \]
Claim 11.4 shows that the number of possible first parts $\tilde\sigma$ for $\sigma\in S(\hat\Phi)$ is bounded by $Z(\hat\Psi)$, because every $\tilde\sigma$ extends to a satisfying assignment of $\hat\Psi$. Moreover, the total number of possible second parts is bounded by $2^{|\mathcal{N}(\Phi,e)|}$. □

11.2. A tail bound.

As a next step we are going to derive a bound on the r.h.s. of (11.3) on random formulas. More specifically, obtain the formula $\Phi'$ from $\Phi$ by deleting the last clause $a_m$. Let $N'=|\mathcal{N}(\Phi',a_m)|$.

Lemma 11.6. There exists $c=c(d)>0$ such that for all $t>c$ we have $\mathbb{P}[N'>t^2]\le c\exp(-t/c)$.

As a first step we are going to estimate the size of the set $\mathcal{N}(\Phi',x_1)$ that contains all literals $l$ such that $\mathcal{L}(\Phi',l)\cap\{x_1,\neg x_1\}\neq\emptyset$.

Claim 11.7.
There exists $c_1=c_1(d)>0$ such that for all $t>c_1$ we have $\mathbb{P}[|\mathcal{N}(\Phi',x_1)|>t]\le c_1\exp(-t/c_1)$.

Proof. We use a classical branching process argument. Let $\mathcal{R}$ be the set of literals $l$ such that $x_1\in\mathcal{L}(\Phi',l)$. By symmetry it suffices to bound $|\mathcal{R}|$. For every $l\in\mathcal{R}$ there exists an alternating sequence $l=l_0,a_1,l_1,a_2,\ldots,l_k=x_1$ of literals and clauses such that $a_i\equiv\neg l_{i-1}\vee l_i$. Flipping the negations along this sequence yields a reverse sequence $l'_0=\neg x_1=\neg l_k$, $a'_1=a_k$, $l'_1=\neg l_{k-1},\ldots,l'_k=\neg l$ such that $a'_i\equiv\neg l'_{i-1}\vee l'_i$. Hence, $\mathcal{R}$ is precisely the set of literals $l$ that are reachable from $x_1$ via such an alternating sequence $l'_0,a'_1,\ldots,l'_k$. Furthermore, for any literal $l$ the expected number of clauses $a_i$ such that $a_i\equiv l\vee l'$ for some other literal $l'$ equals $m/2n\sim d/2$. Therefore, $|\mathcal{R}|$ is stochastically dominated by the progeny of a branching process with offspring $\mathrm{Po}(d/2)$. Standard branching process tail bounds therefore yield the desired bound on $|\mathcal{R}|$. □

Claim 11.8. There exists $c_2=c_2(d)>0$ such that for all $t>c_2$ and for every literal $l\neq x_1$ we have
\[ \mathbb{P}\left[|\mathcal{L}(\Phi',l)|>t\mid x_1\in\mathcal{L}(\Phi',l)\right]\le c_2\exp(-t/c_2). \]

Proof. We combine a branching process argument with Bayes' formula. Specifically, because the formula $\Phi'$ is random, the set $\mathcal{L}(\Phi',l)\setminus\{l\}$ is random given its size. Hence, for an integer $\ell$ we have
\[ \mathbb{P}\left[x_1\in\mathcal{L}(\Phi',l)\mid|\mathcal{L}(\Phi',l)|=\ell\right]=\frac{\ell-1}{2n-1}. \tag{11.9} \]
Furthermore, the size $|\mathcal{L}(\Phi',l)|$ is stochastically dominated by the progeny of a branching process with offspring $\mathrm{Po}(d/2)$. Therefore, there exists $c'_2=c'_2(d)>0$ such that for all $t>c'_2$ we have
\[ \mathbb{P}\left[|\mathcal{L}(\Phi',l)|>t\right]\le c'_2\exp(-t/c'_2). \tag{11.10} \]
Moreover, for any $d>0$ there exists $c''_2=c''_2(d)>0$ such that $\mathbb{P}\left[x_1\in\mathcal{L}(\Phi',l)\right]\ge c''_2/n.$
(11.11)

Hence, combining (11.9)–(11.11) with Bayes' rule, we obtain for $\ell>c'_2$,
\[ \mathbb{P}\left[|\mathcal{L}(\Phi',l)|=\ell\mid x_1\in\mathcal{L}(\Phi',l)\right]\le\frac{\mathbb{P}\left[x_1\in\mathcal{L}(\Phi',l)\mid|\mathcal{L}(\Phi',l)|=\ell\right]\mathbb{P}\left[|\mathcal{L}(\Phi',l)|=\ell\right]}{\mathbb{P}\left[x_1\in\mathcal{L}(\Phi',l)\right]}\le\frac{c'_2}{c''_2}\,\ell\exp(-\ell/c'_2), \]
which implies the assertion. □

Proof of Lemma 11.6. Let $\mathcal{R}(\ell)$ be the event that there exists $l\in\mathcal{N}(\Phi',x_1)\setminus\{x_1\}$ such that $|\mathcal{L}(\Phi',l)|>\ell$. Claim 11.7 implies that there exists $c_3=c_3(d)>0$ such that
\[ \mathbb{P}\left[x_1\in\mathcal{L}(\Phi',l)\right]\le c_3/n. \tag{11.12} \]
Hence, by Claim 11.8, (11.12) and the union bound,
\[ \mathbb{P}[\mathcal{R}(\ell)]\le\sum_{l\neq x_1}\mathbb{P}\left[x_1\in\mathcal{L}(\Phi',l),\ |\mathcal{L}(\Phi',l)|>\ell\right]\le2c_2c_3\exp(-\ell/c_2). \tag{11.13} \]
Furthermore, Claim 11.7 shows that
\[ \mathbb{P}\left[|\mathcal{N}(\Phi',x_1)\setminus\{x_1\}|>\ell\right]\le c_1\exp(-\ell/c_1). \tag{11.14} \]
Combining (11.13) and (11.14), we obtain
\[ \mathbb{P}\left[\sum_{l\in\mathcal{N}(\Phi',x_1)}|\mathcal{L}(\Phi',l)|>\ell^2\right]\le\mathbb{P}[\mathcal{R}(\ell)]+\mathbb{P}\left[|\mathcal{N}(\Phi',x_1)\setminus\{x_1\}|>\ell\right]\le c_1\exp(-\ell/c_1)+2c_2c_3\exp(-\ell/c_2). \tag{11.15} \]
By symmetry the same bound holds with $x_1$ replaced by $\neg x_1$. Therefore, the assertion follows from (11.15) and the union bound. □

11.3. The squared martingale difference.

We derive a combinatorial formula for the squared martingale differences $X_i^2$. Let
\begin{align*}
\Delta(M)&=\log\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log\left(\frac{Z(\hat\Phi_2(M,m-M))}{Z(\hat\Phi_2(M-1,m-M))}\right),\\
\Delta'(M)&=\log\left(\frac{Z(\hat\Phi_1(M-1,m-M+1))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log\left(\frac{Z(\hat\Phi_2(M-1,m-M+1))}{Z(\hat\Phi_2(M-1,m-M))}\right),\\
\Delta''(M)&=\log\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log\left(\frac{Z(\hat\Phi_2(M-1,m-M+1))}{Z(\hat\Phi_2(M-1,m-M))}\right).
\end{align*}

Lemma 11.9. We have $mX_M^2=\mathbb{E}\left[\Delta(M)+\Delta'(M)-2\Delta''(M)\mid\mathcal{F}_M\right]$.

Proof. This follows from a direct computation. □

Proof of Lemma 2.4. Lemma 2.4 is an immediate consequence of Lemma 11.9. □

11.4. An $L^2$-bound.

The following $L^2$-bound will enable us to deal with error terms.

Lemma 11.10. Uniformly for all $1\le M\le m$ we have $\mathbb{E}\left[\Delta(M)^2+\Delta'(M)^2+\Delta''(M)^2\right]=O(1)$.

Proof. We will bound $\mathbb{E}[\Delta(M)^2]$; the bounds on the other two terms follow analogously.
Invoking the Cauchy-Schwarz inequality, we obtain
\[ \mathbb{E}[\Delta(M)^2] = \mathbb{E}\Big[\log^2\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))} \cdot \log^2\frac{Z(\hat\Phi_2(M, m-M))}{Z(\hat\Phi_2(M-1, m-M))}\Big] \le \mathbb{E}\Big[\log^4\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))}\Big]^{1/2} \mathbb{E}\Big[\log^4\frac{Z(\hat\Phi_2(M, m-M))}{Z(\hat\Phi_2(M-1, m-M))}\Big]^{1/2} = \mathbb{E}\Big[\log^4\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))}\Big]. \tag{11.16} \]
Furthermore, the random formula $\Phi_1(M, m-M)$ is obtained from $\Phi_1(M-1, m-M)$ by adding a single random clause $a_M$, which is independent of $\Phi_1(M-1, m-M)$. Therefore, Lemma 11.1 implies that
\[ \Big|\log\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))}\Big| \le |\mathcal{N}(\Phi_1(M-1, m-M), a_M)| \log 2. \tag{11.17} \]
Moreover, since $|\mathcal{N}(\Phi_1(M-1, m-M), a_M)|$ has the same distribution as the random variable $N'$ from Lemma 11.6, we obtain
\[ \mathbb{E}\big[|\mathcal{N}(\Phi_1(M-1, m-M), a_M)|^4\big] = O(1) \tag{11.18} \]
uniformly for all $M$. Finally, the assertion follows from (11.16)–(11.18). □

To facilitate the following steps we introduce truncated versions of $\Delta(M), \Delta'(M), \Delta''(M)$: for $B > 0$ and $x > 0$ let
\[ \Lambda_B(x) = \begin{cases} -B & \text{if } \log(x) < -B, \\ B & \text{if } \log(x) > B, \\ \log(x) & \text{otherwise.} \end{cases} \]
Further, let
\[ \Delta_B(M) = \Lambda_B\Big(\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))}\Big) \cdot \Lambda_B\Big(\frac{Z(\hat\Phi_2(M, m-M))}{Z(\hat\Phi_2(M-1, m-M))}\Big), \]
\[ \Delta'_B(M) = \Lambda_B\Big(\frac{Z(\hat\Phi_1(M-1, m-M+1))}{Z(\hat\Phi_1(M-1, m-M))}\Big) \cdot \Lambda_B\Big(\frac{Z(\hat\Phi_2(M-1, m-M+1))}{Z(\hat\Phi_2(M-1, m-M))}\Big), \]
\[ \Delta''_B(M) = \Lambda_B\Big(\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))}\Big) \cdot \Lambda_B\Big(\frac{Z(\hat\Phi_2(M-1, m-M+1))}{Z(\hat\Phi_2(M-1, m-M))}\Big). \]
Combining Lemma 11.10 with the Cauchy-Schwarz inequality, we obtain the following.

Corollary 11.11. For any $\varepsilon > 0$ there exists $B > 0$ such that for all $1 \le M \le m$ we have
\[ \mathbb{E}|\Delta(M) - \Delta_B(M)| + \mathbb{E}|\Delta'(M) - \Delta'_B(M)| + \mathbb{E}|\Delta''(M) - \Delta''_B(M)| < \varepsilon. \]

11.5. The variance process. In light of Lemma 11.9, to prove Proposition 2.12 we need to show that
\[ \frac{1}{m} \sum_{M=1}^{m} \mathbb{E}[\Delta(M) + \Delta'(M) - 2\Delta''(M) \mid \mathcal{F}_M] \to \eta(d)^2 \quad \text{in probability.} \]
To this end we divide the above sum up into batches $\bar\Sigma(L, L') = \Sigma(L, L') + \Sigma'(L, L') - 2\Sigma''(L, L')$, where
\[ \Sigma(L, L') = \frac{1}{L' - L} \sum_{M=L}^{L'-1} \mathbb{E}[\Delta(M) \mid \mathcal{F}_M], \quad \Sigma'(L, L') = \frac{1}{L' - L} \sum_{M=L}^{L'-1} \mathbb{E}[\Delta'(M) \mid \mathcal{F}_M], \quad \Sigma''(L, L') = \frac{1}{L' - L} \sum_{M=L}^{L'-1} \mathbb{E}[\Delta''(M) \mid \mathcal{F}_M]. \]
Then for any sequence $1 = L_0 < \cdots < L_k = m$ we have
\[ \frac{1}{m} \sum_{M=1}^{m} \mathbb{E}[\Delta(M) + \Delta'(M) - 2\Delta''(M) \mid \mathcal{F}_M] = \sum_{i=1}^{k} \frac{L_i - L_{i-1}}{m}\, \bar\Sigma(L_{i-1}, L_i). \]
The following lemma is the centerpiece of the proof.

Lemma 11.12. For any $\varepsilon > 0$ there exists $\omega > 0$ such that uniformly for all $1 \le L < L' \le m$ with $\omega \le L' - L \le 2\omega$ we have
\[ \mathbb{E}\big|\Sigma(L, L') - \mathcal{B}^{\otimes}_{d,t}(\pi_{d,t})\big| + \mathbb{E}\big|\Sigma'(L, L') - \mathcal{B}^{\otimes}_{d,0}(\pi_{d,0})\big| + \mathbb{E}\big|\Sigma''(L, L') - \mathcal{B}^{\otimes}_{d,0}(\pi_{d,0})\big| < \varepsilon + o(1), \]
where $t = L/m$.

We will carry out the details for the first term $\mathbb{E}|\Sigma(L, L') - \mathcal{B}^{\otimes}_{d,t}(\pi_{d,t})|$, which is the most delicate; similar but slightly simpler steps yield the other two estimates. We begin by replacing $\Delta(M)$ by its truncated version $\Delta_B(M)$. Accordingly, let
\[ \Sigma_B(L, L') = \frac{1}{L' - L} \sum_{M=L}^{L'-1} \mathbb{E}[\Delta_B(M) \mid \mathcal{F}_M], \quad \Sigma'_B(L, L') = \frac{1}{L' - L} \sum_{M=L}^{L'-1} \mathbb{E}[\Delta'_B(M) \mid \mathcal{F}_M], \quad \Sigma''_B(L, L') = \frac{1}{L' - L} \sum_{M=L}^{L'-1} \mathbb{E}[\Delta''_B(M) \mid \mathcal{F}_M]. \]

Claim 11.13. For any $\varepsilon > 0$ there exists $B_0 > 0$ such that for all $B > B_0$ and all $L, L' > 0$ we have $\mathbb{E}|\Sigma(L, L') - \Sigma_B(L, L')| < \varepsilon + o(1)$.

Proof. This is an immediate consequence of Corollary 11.11. □

We proceed to relate the change in the pruned partition function to the marginal distribution of the truth values of the variables of the additional clause $a_M$.

Claim 11.14. Let $B > 0$. W.h.p. we have
\[ \Lambda_B\Big(\frac{Z(\hat\Phi_h(M, m-M))}{Z(\hat\Phi_h(M-1, m-M))}\Big) = \Lambda_B\Big(1 - \prod_{y \in \partial a_M} \mu_{\hat\Phi_h(M-1, m-M)}\big(\sigma_y \ne \mathrm{sign}(y, a_M)\big)\Big) + o(1) \qquad (h = 1, 2). \]

Proof. Since the function $\Lambda_B$ is bounded and continuous, this follows from Proposition 2.5. □

A combinatorial interpretation of $\Sigma(L, L')$ is that the sum gauges the cumulative effect of adding a total of $L' - L$ ‘shared’ clauses, one after the other. Claim 11.14 expresses the effect of adding a shared clause in terms of the marginals of the formula $\hat\Phi_h(M-1, m-M)$. So long as the total number $L' - L$ of clauses added is not too large, we may expect that this marginal distribution does not shift all too much as we add clauses one by one. This is what the following claim verifies.

Claim 11.15. Let $t = M/m$. If $L' - L = O(1)$, then w.h.p.
we have $\sum_{M=L}^{L'-1} W_1\big(\pi_{\hat\Phi_1(M-1, m-M), \hat\Phi_2(M-1, m-M)}, \pi_{d,t}\big) = o(1)$.

Proof. This follows from Corollary 2.10. □

As a next step we truncate the functional $\mathcal{B}^{\otimes}_{d,t}$ from (1.4). Hence, for $B > 0$ let
\[ \mathcal{B}^{\otimes}_{B,d,t}(\pi) = \mathbb{E}\Big[\Lambda_B\big(1 - (\mathbb{1}\{r_{-1,1} = -1\} + r_{-1,1}\,\mu_{\pi,-1,1,1})(\mathbb{1}\{r_{-1,2} = -1\} + r_{-1,2}\,\mu_{\pi,-1,2,1})\big) \cdot \Lambda_B\big(1 - (\mathbb{1}\{r_{-1,1} = -1\} + r_{-1,1}\,\mu_{\pi,-1,1,2})(\mathbb{1}\{r_{-1,2} = -1\} + r_{-1,2}\,\mu_{\pi,-1,2,2})\big)\Big]. \tag{11.19} \]

Claim 11.16. For any $\varepsilon > 0$ there exists $B_0 > 0$ such that for all $B > B_0$ and all $t \in [0, 1]$ we have $\big|\mathcal{B}^{\otimes}_{d,t}(\pi_{d,t}) - \mathcal{B}^{\otimes}_{B,d,t}(\pi_{d,t})\big| < \varepsilon$.

Proof. Since $\mathcal{B}^{\otimes}_{B,d,t}(\pi_{d,t}) \uparrow \mathcal{B}^{\otimes}_{d,t}(\pi_{d,t})$ as $B \to \infty$, this follows from Proposition 2.8. □

Proof of Lemma 11.12. Lemma 10.1 ensures that $\mathcal{B}^{\otimes}_{d,t}(\pi_{d,t}) < \infty$ for all $t$. Moreover, Claims 11.13 and 11.16 imply that we just need to show that for large $B > 0$,
\[ \mathbb{E}\big|\Sigma_B(L, L') - \mathcal{B}^{\otimes}_{B,d,t}(\pi_{d,t})\big| < \varepsilon. \tag{11.20} \]
Let t = L/m and let (S_M)_{L≤M […] m^{1/3}] ≤ O(m^{−4/3}). (11.24)
Finally, (11.22) follows from (11.23), (11.24) and the union bound. □

As a final preparation towards the proof of Corollary 2.11 we need a lower bound on the variance of $\log Z(\hat\Phi)$.

Lemma 11.17. We have $\mathrm{Var}(\log Z(\hat\Phi)) = \Omega(n)$.

Proof. Let $\mathcal{C}$ be the set of isolated sub-formulas of $\hat\Phi$ with precisely two clauses and three variables that are acyclic and whose unique variable of degree two appears with the same sign in both its adjacent clauses. Moreover, let $\mathcal{C}'$ be the set of isolated sub-formulas of $\hat\Phi$ with precisely two clauses and three variables that are acyclic such that the unique variable of degree two appears with two different signs in its adjacent clauses. Then $\mathbb{E}|\mathcal{C}| = \mathbb{E}|\mathcal{C}'| = \Omega(n)$ and w.h.p. we have
\[ \mathrm{Var}\big(|\mathcal{C}| \,\big|\, |\mathcal{C}| + |\mathcal{C}'|\big) = \mathrm{Var}\big(|\mathcal{C}'| \,\big|\, |\mathcal{C}| + |\mathcal{C}'|\big) = \Omega(n). \tag{11.25} \]
Additionally, for each sub-formula $C \in \mathcal{C}$ we have $Z(C) = 5$, while for $C' \in \mathcal{C}'$ we have $Z(C') = 4$. Since, with the sum ranging over the connected components $C$ of $\hat\Phi$, we have $\log Z(\hat\Phi) = \sum_C \log Z(C)$, the assertion follows from (11.25). □

Proof of Corollary 2.11.
The corollary is an immediate consequence of Lemma 2.4, Lemma 10.1, Corollary 11.11, Lemma 11.12 and Lemma 11.17. □

12. PROOF OF THEOREM 1.1

We derive Theorem 1.1 from the following general martingale central limit theorem, which is a special case of [33, Theorem 3.2] (see also the subsequent remark there).

Theorem 12.1 ([33, Theorem 3.2]). Let $(Z_{n,i}, \mathcal{F}_{n,i})_{0 \le i \le m_n,\, n \ge 1}$ be a zero-mean, square-integrable martingale array with differences $X_{n,i} = Z_{n,i} - Z_{n,i-1}$ for $1 \le i \le m_n$. Assume that there exists a constant $\eta^2$ such that
\[ \lim_{n \to \infty} \max_{1 \le i \le m_n} |X_{n,i}| = 0 \quad \text{in probability,} \tag{12.1} \]
\[ \lim_{n \to \infty} \sum_{i=1}^{m_n} X_{n,i}^2 = \eta^2 \quad \text{in probability,} \tag{12.2} \]
\[ \mathbb{E}\Big[\max_{1 \le i \le m_n} X_{n,i}^2\Big] \quad \text{is bounded in } n. \tag{12.3} \]
Then $Z_{n,m_n}$ converges in distribution to a Gaussian distribution with mean zero and variance $\eta^2$.

Proof of Theorem 1.1. We apply Theorem 12.1 to the filtration $(\mathcal{F}_{n,M})_{0 \le M \le m_n}$ from Section 2.8 and to the Doob martingale $(Z_{n,M} - \mathbb{E}[Z_{n,M}])_M$ from (2.13). This is zero-mean by construction and square-integrable, as $\log Z(\hat\Phi)$ is non-negative and bounded above by $n$. Let $X_{n,M} = Z_{n,M} - Z_{n,M-1}$ be the martingale differences. Proposition 2.12 immediately implies conditions (12.1)–(12.2) of Theorem 12.1 since $L^1$-convergence implies convergence in probability. Condition (12.3) also follows from Proposition 2.12 by observing that
\[ \mathbb{E}\Big[\max_{1 \le M \le m_n} X_{n,M}^2\Big] \le \mathbb{E}\Big[\sum_{M=1}^{m_n} X_{n,M}^2\Big] \le \mathbb{E}\Big|\sum_{M=1}^{m_n} X_{n,M}^2 - \eta(d)^2\Big| + \eta(d)^2. \]
Furthermore, Lemma 10.1 guarantees that $\eta(d) < \infty$, while Corollary 2.11 shows that $\eta(d) > 0$. Thus, the assertion follows from Theorem 12.1. □

Acknowledgement. Amin Coja-Oghlan’s research is supported by DFG CO 646/3, DFG CO 646/5 and DFG CO 646/6. Pavel Zakharov’s research is supported by DFG CO 646/6. Haodong Zhu’s research is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 945045, and by the NWO Gravitation project NETWORKS under grant no. 024.002.003.
Noela Müller’s research is supported by the NWO Gravitation project NETWORKS under grant no. 024.002.003. We thank Nicola Kistler for helpful discussions, and particularly for bringing [19] to our attention.

REFERENCES

[1] E. Abbe, S. Li, A. Sly: Proof of the contiguity conjecture and lognormal limit for the symmetric perceptron. Proc. 62nd FOCS (2022) 327–338.
[2] E. Abbe, A. Montanari: On the concentration of the number of solutions of random satisfiability formulas. Random Structures and Algorithms 45 (2014) 362–382.
[3] D. Achlioptas, A. Chtcherba, G. Istrate, C. Moore: The phase transition in 1-in-k SAT and NAE 3-SAT. Proc. 12th SODA (2001) 721–722.
[4] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
[5] D. Achlioptas, A. Coja-Oghlan, M. Hahn-Klimroth, J. Lee, N. Müller, M. Penschuck, G. Zhou: The number of satisfying assignments of random 2-SAT formulas. Random Structures and Algorithms 58 (2021) 609–647.
[6] D. Achlioptas, C. Moore: Random k-SAT: two moments suffice to cross a sharp threshold. SIAM Journal on Computing 36 (2006) 740–762.
[7] D. Achlioptas, A. Naor, Y. Peres: Rigorous location of phase transitions in hard optimization problems. Nature 435 (2005) 759–764.
[8] D. Achlioptas, Y. Peres: The threshold for random k-SAT is $2^k \ln 2 - O(k)$. Journal of the AMS 17 (2004) 947–973.
[9] D. Aldous, J. Steele: The objective method: probabilistic combinatorial optimization and local weak convergence. In: H. Kesten (ed.): Probability on Discrete Structures. Springer (2004).
[10] B. Aspvall, M.F. Plass, R.E. Tarjan: A linear-time algorithm for testing the truth of certain quantified boolean formulas. Information Processing Letters 8 (1979) 121–123.
[11] P. Ayre, A. Coja-Oghlan, P. Gao, N. Müller: The satisfiability threshold for random linear equations. Combinatorica 40 (2020) 179–235.
[12] V. Bapst, A. Coja-Oghlan, C. Efthymiou: Planting colourings silently.
Combinatorics, Probability and Computing 26 (2017) 338–366.
[13] V. Bapst, A. Coja-Oghlan, S. Hetterich, F. Rassmann, D. Vilenchik: The condensation phase transition in random graph coloring. Communications in Mathematical Physics 341 (2016) 543–606.
[14] P. J. Bickel, D. A. Freedman: Some asymptotic theory for the bootstrap. Annals of Statistics 9 (1981) 1196–1217.
[15] B. Bollobás, C. Borgs, J. Chayes, J. Kim, D. Wilson: The scaling window of the 2-SAT transition. Random Structures and Algorithms 18 (2001) 201–256.
[16] S. Cao: Central limit theorems for combinatorial optimization problems on sparse Erdős-Rényi graphs. Annals of Applied Probability 31 (2021) 1687–1723.
[17] P. Cheeseman, B. Kanefsky, W. Taylor: Where the really hard problems are. Proc. IJCAI (1991) 331–337.
[18] G. Bresler, B. Huang: The algorithmic phase transition of random k-SAT for low degree polynomials. Proc. 62nd FOCS (2021) 298–309.
[19] W.-K. Chen, P. Dey, D. Panchenko: Fluctuations of the free energy in the mixed p-spin models with external field. Probability Theory and Related Fields 168 (2017) 41–53.
[20] V. Chvátal, B. Reed: Mick gets some (the odds are on his side). Proc. 33rd FOCS (1992) 620–627.
[21] A. Coja-Oghlan, T. Kapetanopoulos, N. Müller: The replica symmetric phase of random constraint satisfaction problems. Combinatorics, Probability and Computing 29 (2020) 346–422.
[22] A. Coja-Oghlan, F. Krzakala, W. Perkins, L. Zdeborová: Information-theoretic thresholds from the cavity method. Advances in Mathematics 333 (2018) 694–795.
[23] A. Coja-Oghlan, K. Panagiotou: The asymptotic k-SAT threshold. Advances in Mathematics 288 (2016) 985–1068.
[24] A. Coja-Oghlan, N. Wormald: The number of satisfying assignments of random regular k-SAT formulas. Combinatorics, Probability and Computing 27 (2018) 496–530.
[25] J. Ding, A. Sly, N. Sun: Proof of the satisfiability conjecture for large k. Annals of Mathematics 196 (2022) 1–388.
[26] S. Dovgal, É. de Panafieu, V.
Ravelomanana: Exact enumeration of satisfiable 2-SAT formulae. arXiv:2108.08067 (2021).
[27] O. Dubois, J. Mandler: The 3-XORSAT threshold. Proc. 43rd FOCS (2002) 769–778.
[28] G. Eagleson: Martingale convergence to mixtures of infinitely divisible laws. Annals of Probability 3 (1975) 557–562.
[29] C. Efthymiou: On sampling symmetric Gibbs distributions on sparse random graphs and hypergraphs. Proc. 49th ICALP (2022) #57.
[30] E. Friedgut: Sharp thresholds of graph properties, and the k-SAT problem. Journal of the AMS 12 (1999) 1017–1054.
[31] M. Glasgow, M. Kwan, A. Sah, M. Sawhney: A central limit theorem for the matching number of a sparse random graph. arXiv:2402.05851 (2024).
[32] A. Goerdt: A threshold for unsatisfiability. J. Comput. Syst. Sci. 53 (1996) 469–486.
[33] P. Hall, C. Heyde: Martingale limit theory and its applications. Academic Press (1980).
[34] E. Kreačič: Some problems related to the Karp-Sipser algorithm on random graphs. Ph.D. thesis, University of Oxford, 2017.
[35] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. National Academy of Sciences 104 (2007) 10318–10323.
[36] M. Mézard, A. Montanari: Information, physics and computation. Oxford University Press (2009).
[37] M. Mézard, G. Parisi, R. Zecchina: Analytic and algorithmic solution of random satisfiability problems. Science 297 (2002) 812–815.
[38] R. Monasson, R. Zecchina: The entropy of the k-satisfiability problem. Phys. Rev. Lett. 76 (1996) 3881.
[39] A. Montanari, D. Shah: Counting good truth assignments of random k-SAT formulae. Proc. 18th SODA (2007) 1255–1264.
[40] E. Mossel, J. Neeman, A. Sly: Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields (2014) 1–31.
[41] D. Panchenko: On the replica symmetric solution of the K-sat model. Electron. J. Probab. 19 (2014) #67.
[42] D. Panchenko, M.
Talagrand: Bounds for diluted mean-fields spin glass models. Probab. Theory Relat. Fields 130 (2004) 319–336.
[43] B. Pittel, G. Sorkin: The satisfiability threshold for k-XORSAT. Combinatorics, Probability and Computing 25 (2016) 236–268.
[44] F. Rassmann: On the number of solutions in random graph k-colouring. Combinatorics, Probability and Computing 28 (2019) 130–158.
[45] R. Robinson, N. Wormald: Almost all regular graphs are Hamiltonian. Random Structures and Algorithms 5 (1994) 363–374.
[46] A. Sly, N. Sun, Y. Zhang: The number of solutions for random regular NAE-SAT. Probability Theory and Related Fields 182 (2022) 1–109.
[47] M. Talagrand: The high temperature case for the random K-sat problem. Probab. Theory Related Fields 119 (2001) 187–212.
[48] L. Valiant: The complexity of enumeration and reliability problems. SIAM Journal on Computing 8 (1979) 410–421.

ARNAB CHATTERJEE, arnab.chatterjee@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
AMIN COJA-OGHLAN, amin.coja-oghlan@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
NOELA MÜLLER, n.s.muller@tue.nl, EINDHOVEN UNIVERSITY OF TECHNOLOGY, DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE, METAFORUM MF 4.084, 5600 MB EINDHOVEN, THE NETHERLANDS.
CONNOR RIDDLESDEN, c.d.riddlesden@tue.nl, EINDHOVEN UNIVERSITY OF TECHNOLOGY, DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE, METAFORUM MF 4.084, 5600 MB EINDHOVEN, THE NETHERLANDS.
MAURICE ROLVIEN, maurice.rolvien@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
PAVEL ZAKHAROV, pavel.zakharov@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
HAODONG ZHU, h.zhu1@tue.nl, EINDHOVEN UNIVERSITY OF TECHNOLOGY, DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE, 5600 MB EINDHOVEN, THE NETHERLANDS.
BELIEF PROPAGATION GUIDED DECIMATION ON RANDOM k-XORSAT

ARNAB CHATTERJEE, AMIN COJA-OGHLAN, MIHYUN KANG, LENA KRIEG, MAURICE ROLVIEN, GREGORY B. SORKIN

ABSTRACT. We analyse the performance of Belief Propagation Guided Decimation, a physics-inspired message passing algorithm, on the random k-XORSAT problem. Specifically, we derive an explicit threshold up to which the algorithm succeeds with a strictly positive probability Ω(1) that we compute explicitly, but beyond which the algorithm with high probability fails to find a satisfying assignment. In addition, we analyse a thought experiment called the decimation process for which we identify a (non-)reconstruction and a condensation phase transition. The main results of the present work confirm physics predictions from [Ricci-Tersenghi and Semerjian: J. Stat. Mech. 2009] that link the phase transitions of the decimation process with the performance of the algorithm, and improve over partial results from a recent article [Yung: Proc. ICALP 2024].

MSC: 60B20, 68W20

1. INTRODUCTION AND RESULTS

1.1. Background and motivation. The random k-XORSAT problem shares many characteristics of other intensely studied random constraint satisfaction problems (‘CSPs’) such as random k-SAT. For instance, random k-XORSAT possesses a sharp satisfiability threshold preceded by a reconstruction or ‘shattering’ phase transition that affects the geometry of the set of solutions [2, 12, 17, 24]. As in random k-SAT, these transitions appear to significantly impact the performance of certain classes of algorithms [7, 16]. At the same time, random k-XORSAT is more amenable to mathematical analysis than, say, random k-SAT. This is because the XOR operation is equivalent to addition modulo two, which is why a k-XORSAT instance translates into a linear system over $\mathbb{F}_2$. In effect, k-XORSAT can be solved in polynomial time by means of Gaussian elimination.
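To make the reduction to linear algebra concrete, the following minimal Python sketch solves a XORSAT instance by Gaussian elimination over $\mathbb{F}_2$. The function names `clause_to_equation` and `solve_xorsat`, the encoding of a literal as a pair (index, negated), and the convention of setting free variables to 0 are assumptions made purely for illustration; they are not taken from the paper.

```python
def clause_to_equation(literals):
    """Translate an XOR clause into a linear equation over F_2.

    `literals` is a list of (index, negated) pairs (hypothetical encoding).
    The clause holds when the XOR of its literal values is 1, i.e. when the
    XOR of the underlying variables equals 1 XOR (number of negations mod 2).
    """
    variables = [index for index, _ in literals]
    rhs = 1
    for _, negated in literals:
        rhs ^= int(negated)
    return variables, rhs


def solve_xorsat(n, equations):
    """Return one solution as a list of n bits, or None if none exists.

    `equations` is a list of (variables, rhs) pairs, each meaning that the
    XOR of the listed variables equals rhs. Rows are stored as bitmasks.
    """
    pivots = {}  # leading variable -> (row bitmask, right-hand side)
    for variables, rhs in equations:
        mask = 0
        for v in variables:
            mask ^= 1 << v
        # Reduce the new row against the pivots found so far.
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, rhs)
                break
            pivot_mask, pivot_rhs = pivots[top]
            mask ^= pivot_mask
            rhs ^= pivot_rhs
        else:
            if rhs:  # the row collapsed to 0 = 1
                return None
    # Back substitution: every non-leading variable of a pivot row has a
    # smaller index, so processing pivots in increasing order suffices.
    x = [0] * n  # free variables default to 0
    for top in sorted(pivots):
        mask, value = pivots[top]
        rest = mask ^ (1 << top)
        while rest:
            q = rest.bit_length() - 1
            value ^= x[q]
            rest ^= 1 << q
        x[top] = value
    return x
```

For instance, `solve_xorsat(3, [([0, 1], 1), ([1, 2], 1), ([0, 2], 0)])` returns a satisfying assignment, whereas changing the last right-hand side to 1 makes the system inconsistent and yields `None`.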
In addition, the algebraic nature of the problem induces strong symmetry properties that simplify its study [3].

Because of its similarities with other random CSPs combined with said relative amenability, random k-XORSAT provides an instructive benchmark. This was noticed not only in combinatorics, but also in the statistical physics community, which has been contributing intriguing ‘predictions’ on random CSPs since the early 2000s [19, 22]. Among other things, physicists have proposed a message passing algorithm called Belief Propagation Guided Decimation (‘BPGD’) that, according to computer experiments, performs impressively on various random CSPs [21]. Furthermore, Ricci-Tersenghi and Semerjian [25] put forward a heuristic analysis of BPGD on random k-SAT and k-XORSAT. Their heuristic analysis proceeds by way of a thought experiment based on an idealized version of the algorithm. We call this thought experiment the decimation process. Based on physics methods Ricci-Tersenghi and Semerjian surmise that the decimation process undergoes two phase transitions, specifically a reconstruction and a condensation transition. A key prediction of Ricci-Tersenghi and Semerjian is that these phase transitions are directly linked to the performance of the BPGD algorithm. Due to the linear algebra-induced symmetry properties, in the case of random k-XORSAT all of these conjectures come as elegant analytical expressions.

The aim of this paper is to verify the predictions from [25] on random k-XORSAT mathematically. Specifically, our aim is to rigorously analyse the BPGD algorithm on random k-XORSAT, and to establish the link between its performance and the phase transitions of the decimation process.

A first step towards a rigorous analysis of BPGD on random k-XORSAT was undertaken in a recent contribution by Yung [27]. However, Yung’s analysis turns out not to be tight.
Specifically, apart from requiring spurious lower bounds on the clause length k, Yung’s results do not quite establish the precise connection between the decimation process and the performance of BPGD. One reason for this is that [27] relies on ‘annealed’ techniques, i.e., essentially moment computations. Here we instead harness ‘quenched’ arguments that were partly developed in prior work on the rank of random matrices over finite fields [3, 8].

Throughout we let $k \ge 3$ and $n \ge k$ be integers and $d > 0$ a positive real. Let $m \overset{\mathrm{dist}}{=} \mathrm{Po}(dn/k)$ and let $F = F(n, d, k)$ be a random k-XORSAT formula with variables $x_1, \ldots, x_n$ and $m$ random clauses of length $k$. To be precise, every clause of $F$ is an XOR of precisely $k$ distinct variables, each of which may or may not come with a negation sign. The $m$ clauses are drawn uniformly and independently out of the set of all $2^k \binom{n}{k}$ possibilities. Thus, $d$ equals the average number of clauses that a given variable $x_i$ appears in. An event $\mathcal{E}$ occurs with high probability (‘w.h.p.’) if $\lim_{n \to \infty} \mathbb{P}[F \in \mathcal{E}] = 1$. We always keep $d, k$ fixed as $n \to \infty$.

Amin Coja-Oghlan is supported by DFG CO 646/3, DFG CO 646/5 and DFG CO 646/6. This research was funded in part by the Austrian Science Fund (FWF) 10.55776/I6502.

arXiv:2501.17657v1 [math.CO] 29 Jan 2025

1.2. Belief Propagation Guided Decimation. The first result vindicates the predictions from [25] concerning the success probability of the BPGD algorithm. BPGD sets its ambitions higher than merely finding a solution to the k-XORSAT instance $F$: the algorithm attempts to sample a solution uniformly at random. To this end BPGD assigns values to the variables $x_1, \ldots, x_n$ of $F$ one after the other. In order to assign the next variable the algorithm attempts to compute the marginal probability that the variable is set to ‘true’ under a random solution to the k-XORSAT instance, given all previous assignments. More precisely, suppose BPGD has assigned values to the variables $x_1, \ldots$
, $x_t$ already. Write $\sigma_{\mathrm{BP}}(x_1), \ldots, \sigma_{\mathrm{BP}}(x_t) \in \{0, 1\}$ for their values, with 1 representing ‘true’ and 0 ‘false’. Further, let $F_{\mathrm{BP},t}$ be the simplified formula obtained by substituting $\sigma_{\mathrm{BP}}(x_1), \ldots, \sigma_{\mathrm{BP}}(x_t)$ for $x_1, \ldots, x_t$. We drop any clauses from $F_{\mathrm{BP},t}$ that contain variables from $\{x_1, \ldots, x_t\}$ only, deeming any such clauses satisfied. Thus, $F_{\mathrm{BP},t}$ is a XORSAT formula with variables $x_{t+1}, \ldots, x_n$. Its clauses contain at least one and at most $k$ variables, as well as possibly a constant (the XOR of the values substituted in for $x_1, \ldots, x_t$).

Let $\sigma_{F_{\mathrm{BP},t}}$ be a uniformly random solution of the XORSAT formula $F_{\mathrm{BP},t}$, assuming that $F_{\mathrm{BP},t}$ remains satisfiable. Then BPGD aims to compute the marginal probability $\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ that a random satisfying assignment of $F_{\mathrm{BP},t}$ sets $x_{t+1}$ to true. This is where Belief Propagation (‘BP’) comes in. An efficient message passing heuristic for computing precisely such marginals, BP returns an ‘approximation’ $\mu_{F_{\mathrm{BP},t}}$ of $\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$. We will recap the mechanics of BP in Section 2.2 (the value $\mu_{F_{\mathrm{BP},t}}$ is defined precisely in (2.11)). Having computed the BP ‘approximation’, BPGD proceeds to assign $x_{t+1}$ the value ‘true’ with probability $\mu_{F_{\mathrm{BP},t}}$, otherwise sets $x_{t+1}$ to ‘false’, then moves on to the next variable. The pseudocode is displayed as Algorithm 1.

Algorithm 1: The BPGD algorithm.
    Data: a random k-XORSAT formula $F$ with variables $x_1, \ldots, x_n$, conditioned on being satisfiable
    for $t = 0, \ldots, n-1$ do
        compute the BP approximation $\mu_{F_{\mathrm{BP},t}}$;
        set $\sigma_{\mathrm{BP}}(x_{t+1}) = 1$ with probability $\mu_{F_{\mathrm{BP},t}}$ and $\sigma_{\mathrm{BP}}(x_{t+1}) = 0$ with probability $1 - \mu_{F_{\mathrm{BP},t}}$;
    return $\sigma_{\mathrm{BP}}$;

Let us pause for a few remarks. First, if the BP approximations are exact, i.e., if $F_{\mathrm{BP},t}$ is satisfiable and $\mu_{F_{\mathrm{BP},t}} = \mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ for all $t$, then Bayes’ formula shows that BPGD outputs a uniformly random solution of $F$. However, there is no universal guarantee that BP returns the correct marginals.
Accordingly, the crux of analysing BPGD is precisely to figure out whether this is the case. Indeed, the heuristic work of [25] ties the accuracy of BP to a phase transition of the decimation process thought experiment, to be reviewed momentarily.

Second, the strategy behind the BPGD algorithm, particularly the message passing heuristic for ‘approximating’ the marginals, generalizes well beyond k-XORSAT. For instance, the approach applies to k-SAT verbatim. That said, due to the algebraic nature of the XOR operation, BPGD is far easier to analyse on k-XORSAT. In fact, in XORSAT the marginal probabilities are guaranteed to be half-integral as seen in Fact 2.3, i.e.,
\[ \mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}] \in \{0, 1/2, 1\}. \tag{1.1} \]
As a consequence, on XORSAT the BPGD algorithm effectively reduces to a purely combinatorial algorithm called Unit Clause Propagation [19, 25] as per Proposition 6.1, a fact that we will exploit extensively (see Section 6).

1.3. A tight analysis of BPGD. In order to state the main results we need to introduce a few threshold values. To this end, given $d, k$ and an additional real parameter $\lambda \ge 0$, consider the functions$^1$
\[ \phi_{d,k,\lambda} : [0, 1] \to [0, 1], \quad z \mapsto 1 - \exp(-\lambda - d z^{k-1}), \tag{1.2} \]
\[ \Phi_{d,k,\lambda} : [0, 1] \to \mathbb{R}, \quad z \mapsto \exp(-\lambda - d z^{k-1}) - \frac{d(k-1)}{k} z^k + d z^{k-1} - \frac{d}{k}. \tag{1.3} \]
Let $\alpha_*(\lambda) = \alpha_*(d, k, \lambda) \in [0, 1]$ be the smallest and $\alpha^*(\lambda) = \alpha^*(d, k, \lambda) \ge \alpha_*(d, k, \lambda)$ the largest fixed point of $\phi_{d,k,\lambda}$ in $[0, 1]$. Figure 1 visualizes $\Phi(z)$ for different values of θ. Further, define
\[ d_{\min}(k) = \Big(\frac{k-1}{k-2}\Big)^{k-2}, \quad d_{\mathrm{core}}(k) = \sup\{d > 0 : \alpha^*(0) = 0\}, \quad d_{\mathrm{sat}}(k) = \sup\{d > 0 : \Phi_{d,k,0}(\alpha^*(0)) \le \Phi_{d,k,0}(0)\}. \tag{1.4} \]
The value $d_{\mathrm{sat}}(k)$ is the random k-XORSAT satisfiability threshold [3, 12, 24]. Thus, for $d < d_{\mathrm{sat}}(k)$ the random k-XORSAT formula $F$ possesses satisfying assignments w.h.p., while $F$ is unsatisfiable for $d > d_{\mathrm{sat}}(k)$ w.h.p. Furthermore, $d_{\mathrm{core}}(k)$ equals the threshold for the emergence of a giant 2-core within the k-uniform hypergraph induced by $F$ [3, 23].
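The three thresholds in (1.4) are easy to evaluate numerically. The Python sketch below is only an illustration under stated assumptions: the helper names (`largest_fixed_point`, `bisect_sup`, and so on) are ours, the largest fixed point $\alpha^*(0)$ is approximated by iterating $\phi_{d,k,0}$ from $z = 1$ (legitimate because the map is non-decreasing on $[0,1]$), and the bisection presumes that each defining condition holds for all $d$ below the threshold and fails above it.

```python
import math


def phi(z, d, k, lam=0.0):
    """The map (1.2)."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1))


def Phi(z, d, k, lam=0.0):
    """The function (1.3)."""
    return (math.exp(-lam - d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)


def largest_fixed_point(d, k, lam=0.0, iters=10000):
    # phi is non-decreasing, so the iterates started at z = 1 decrease
    # monotonically towards the largest fixed point alpha^*(lam).
    z = 1.0
    for _ in range(iters):
        z = phi(z, d, k, lam)
    return z


def d_min(k):
    return ((k - 1) / (k - 2)) ** (k - 2)


def bisect_sup(holds, lo, hi, tol=1e-5):
    """Approximate sup{d in [lo, hi] : holds(d)} (assumes monotonicity)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if holds(mid):
            lo = mid
        else:
            hi = mid
    return lo


def d_core(k):
    # alpha^*(0) = 0 is tested up to a small numerical tolerance.
    return bisect_sup(lambda d: largest_fixed_point(d, k) < 1e-6, 0.0, float(k))


def d_sat(k):
    def holds(d):
        alpha = largest_fixed_point(d, k)
        return Phi(alpha, d, k) <= Phi(0.0, d, k)
    return bisect_sup(holds, 0.0, float(k))
```

For k = 3 this sketch gives $d_{\min} = 2$, $d_{\mathrm{core}} \approx 2.455$ and $d_{\mathrm{sat}} \approx 2.754$. The value returned by `d_core` is the giant 2-core threshold just mentioned.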
This implies that for $d < d_{\mathrm{core}}(k)$ the set of solutions of $F$ is contiguous in a certain well-defined way, while for $d_{\mathrm{core}}(k) < d < d_{\mathrm{sat}}(k)$ the set of solutions shatters into an exponential number of well-separated clusters [16, 19]. Moreover, a simple linear time algorithm is known to find a solution w.h.p. for $d < d_{\mathrm{core}}(k)$ [16]. The relevance of $d_{\min}(k)$ will emerge momentarily. A bit of calculus reveals that
\[ 0 < d_{\min}(k) < d_{\mathrm{core}}(k) < d_{\mathrm{sat}}(k) < k. \tag{1.5} \]
The following theorem determines the precise clause-to-variable densities where BPGD succeeds/fails. To be precise, in the ‘successful’ regime BPGD does not actually succeed with high probability, but with an explicit probability strictly between zero and one, which is displayed in Figure 2 for $k = 3, 4, 5$.

FIGURE 1. $\Phi_{d,k,\lambda}$ as a function of $z$ for $k = 3$ and $d = 2.4$, for λ from 0 to 0.3 (maximum at z = 0) and from 0.4 to 0.9.

FIGURE 2. Success probability of BPGD for $0 < d < d_{\min}(k)$ and various $k$ (curves for k = 3, k = 4, k = 5; horizontal axis $d$).

Theorem 1.1. Let $k \ge 3$.
(i) If $d < d_{\min}(k)$, then
\[ \lim_{n \to \infty} \mathbb{P}[\mathrm{BPGD}(F) \text{ finds a satisfying assignment}] = \exp\Big(-\frac{d^2 (k-1)^2}{4} \int_0^1 \frac{z^{2k-4}(1-z)}{1 - d(k-1) z^{k-2}(1-z)}\, \mathrm{d}z\Big). \tag{1.6} \]
(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, then $\mathbb{P}[\mathrm{BPGD}(F) \text{ finds a satisfying assignment}] = o(1)$.

Theorem 1.1 vindicates the predictions from Ricci-Tersenghi and Semerjian [25, Section 4] as to the performance of BPGD, and improves over the results from Yung [27]. Specifically, Theorem 1.1 (i) verifies the formula for the success probability from [25, Eq. (38)]. Combinatorially, the formula (1.6) results from the possible presence of bounded length cycles (so called toxic cycles) that may cause the algorithm to run into contradictions. By contrast, Yung has no positive result on the performance of BPGD. Moreover, Yung’s negative results [27, Theorems 2–3] only apply to $k \ge 9$ and to $d > d_{\mathrm{core}}(k)$, while Theorem 1.1 (ii) covers all $k \ge 3$ and kicks in at the correct threshold $d_{\min}(k) < d_{\mathrm{core}}(k)$ predicted in [25].

$^1$ The function $\Phi_{d,k,\lambda}$ is known in physics parlance as the “Bethe free entropy” [8, 19]. The stationary points of $\Phi_{d,k,\lambda}$ coincide with the fixed points of $\phi_{d,k,\lambda}$, as we will verify in Section 2.1.

1.4. The decimation process. In addition to the BPGD algorithm itself, the heuristic work [25] considers an idealised version of the algorithm, the decimation process. This thought experiment highlights the conceptual reasons behind the success/failure of BPGD. Just like BPGD, the decimation process assigns values to variables one after the other for good. But instead of the BP ‘approximations’ the decimation process uses the actual marginals given its previous decisions. To be precise, suppose that the input formula $F$ is satisfiable and that variables $x_1, \ldots, x_t$ have already been assigned values $\sigma_{\mathrm{DC}}(x_1), \ldots, \sigma_{\mathrm{DC}}(x_t)$ in the previous iterations. Obtain $F_{\mathrm{DC},t}$ by substituting the values $\sigma_{\mathrm{DC}}(x_1), \ldots, \sigma_{\mathrm{DC}}(x_t)$ for $x_1, \ldots, x_t$ and dropping any clauses that do not contain any of $x_{t+1}, \ldots, x_n$. Thus, $F_{\mathrm{DC},t}$ is a XORSAT formula with variables $x_{t+1}, \ldots, x_n$. Let $\sigma_{F_{\mathrm{DC},t}}$ be a random satisfying assignment of $F_{\mathrm{DC},t}$. Then the decimation process sets $x_{t+1}$ according to the true marginal $\mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$, thus ultimately returning a uniformly random satisfying assignment of $F$.

Algorithm 2: The decimation process.
    Data: a random k-XORSAT formula $F$, conditioned on being satisfiable
    for $t = 0, \ldots, n-1$ do
        compute $\pi_{F_{\mathrm{DC},t}} = \mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$;
        set $\sigma_{\mathrm{DC}}(x_{t+1}) = 1$ with probability $\pi_{F_{\mathrm{DC},t}}$ and $\sigma_{\mathrm{DC}}(x_{t+1}) = 0$ with probability $1 - \pi_{F_{\mathrm{DC},t}}$;
    return $\sigma_{\mathrm{DC}}$;

Clearly, if indeed the BP ‘approximations’ are correct, then the decimation process and BPGD are identical. Thus, a key question is for what parameter regimes the two processes coincide or diverge, respectively.
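The contrast between the two procedures can be made concrete on a toy instance. The Python sketch below is an illustration under assumptions of our own, not the paper's implementation: equations are (variables, right-hand side) pairs over $\mathbb{F}_2$, earlier assignments are recorded by appending unit equations rather than by substituting values, the exact marginal is obtained from two consistency checks by Gaussian elimination, and `ucp_marginal` uses unit clause propagation as a stand-in for the BP ‘approximation’. All function names are hypothetical.

```python
import random


def is_consistent(equations):
    """Decide solvability of a linear system over F_2 by elimination."""
    pivots = {}  # leading variable -> (row bitmask, right-hand side)
    for variables, rhs in equations:
        mask = 0
        for v in variables:
            mask ^= 1 << v
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, rhs)
                break
            pivot_mask, pivot_rhs = pivots[top]
            mask ^= pivot_mask
            rhs ^= pivot_rhs
        else:
            if rhs:  # the row collapsed to 0 = 1
                return False
    return True


def exact_marginal(equations, v):
    """True marginal of x_v under a uniform solution: 0, 1/2 or 1."""
    can0 = is_consistent(equations + [([v], 0)])
    can1 = is_consistent(equations + [([v], 1)])
    if can0 and can1:
        return 0.5  # the solution set is affine, so x_v splits it evenly
    if not (can0 or can1):
        raise ValueError("formula is unsatisfiable")
    return 1.0 if can1 else 0.0


def ucp_marginal(equations, v):
    """Stand-in for BP: values implied by unit clause propagation only."""
    value = {}
    changed = True
    while changed:
        changed = False
        for variables, rhs in equations:
            unknown = [u for u in variables if u not in value]
            if len(unknown) == 1:
                for u in variables:
                    if u in value:
                        rhs ^= value[u]
                value[unknown[0]] = rhs
                changed = True
    return float(value[v]) if v in value else 0.5


def decimate(n, equations, marginal, rng):
    """Assign x_0, ..., x_{n-1} in order; report whether F is satisfied."""
    eqs = list(equations)
    sigma = []
    for t in range(n):
        p = marginal(eqs, t)
        sigma.append(1 if rng.random() < p else 0)
        eqs.append(([t], sigma[-1]))  # condition on the value just chosen
    ok = all(sum(sigma[v] for v in vs) % 2 == r for vs, r in equations)
    return sigma, ok
```

On the instance `[([0, 2, 3], 0), ([1, 2, 3], 1)]` with `n = 4`, the two equations add up to $x_0 \oplus x_1 = 1$. The exact oracle detects this and `decimate` always returns a solution, whereas the unit-clause oracle flips a fair coin for $x_1$ and ends in a contradiction about half the time. For which parameters the two oracles agree on random instances is the question posed above.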
As it turns out, this question is best answered by parametrising not only in terms of the average variable degree $d$, but also in terms of the ‘time’ parameter $t$ of the decimation process.

1.5. Phase transitions of the decimation process. Ricci-Tersenghi and Semerjian heuristically identify several phase transitions in terms of $d$ and $t$ that the decimation process undergoes. We will confirm these predictions mathematically and investigate how they relate to the performance of BPGD.

The first set of relevant phase transitions concerns the so-called non-reconstruction property. Roughly speaking, non-reconstruction means that the marginal $\pi_{F_{\mathrm{DC},t}} = \mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$ is determined by short-range rather than long-range effects. Since Belief Propagation is essentially a local algorithm, one might expect that the (non-)reconstruction phase transition coincides with the threshold up to which BPGD succeeds; cf. the discussions in [5, 17].

To define (non-)reconstruction precisely, we associate a bipartite graph $G(F_{\mathrm{DC},t})$ with the formula $F_{\mathrm{DC},t}$. The vertices of this graph are the variables and clauses of $F_{\mathrm{DC},t}$. Each variable is adjacent to the clauses in which it appears. For a (variable or clause) vertex $v$ of $G(F_{\mathrm{DC},t})$ let $\partial v$ be the set of neighbours of $v$. More generally, for an integer $\ell \ge 1$ let $\partial^\ell v$ be the set of vertices of $G(F_{\mathrm{DC},t})$ at shortest path distance precisely $\ell$ from $v$. Following [17], we say that $F_{\mathrm{DC},t}$ has the non-reconstruction property if
\[ \lim_{\ell \to \infty} \limsup_{n \to \infty} \mathbb{E}\Big[\Big|\mathbb{P}\big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\big|\, F_{\mathrm{DC},t}, \{\sigma_{F_{\mathrm{DC},t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}}\big] - \mathbb{P}\big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\big|\, F_{\mathrm{DC},t}\big]\Big| \,\Big|\, F \text{ satisfiable}\Big] = 0. \tag{1.7} \]
Conversely, $F_{\mathrm{DC},t}$ has the reconstruction property if
\[ \liminf_{\ell \to \infty} \liminf_{n \to \infty} \mathbb{E}\Big[\Big|\mathbb{P}\big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\big|\, F_{\mathrm{DC},t}, \{\sigma_{F_{\mathrm{DC},t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}}\big] - \mathbb{P}\big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\big|\, F_{\mathrm{DC},t}\big]\Big| \,\Big|\, F \text{ satisfiable}\Big] > 0. \]
(1.8)

To parse (1.7), notice that in the left probability term we condition on both the outcome $F_{\mathrm{DC},t}$ of the first $t$ steps of the decimation process and on the values $\sigma_{F_{\mathrm{DC},t}}(y)$ that the random solution $\sigma_{F_{\mathrm{DC},t}}$ assigns to the variables $y$ at distance exactly $2\ell$ from $x_{t+1}$. By contrast, in the right probability term we only condition on $F_{\mathrm{DC},t}$. Thus, the second probability term matches the probability $\pi_{F_{\mathrm{DC},t}}$ from the decimation process. Hence, (1.7) compares the probability that a random solution sets $x_{t+1}$ to one given the values $\sigma_{F_{\mathrm{DC},t}}(y)$ of all variables $y$ at distance $2\ell$ from $x_{t+1}$ with the plain marginal probability that $x_{t+1}$ is set to one. What (1.7) asks is that these two probabilities be asymptotically equal in the limit of large $\ell$, with high probability over the choice of $F$ and the prior steps of the decimation process. Thus, so long as non-reconstruction holds, ‘long-range effects’, meaning anything beyond distance $2\ell$ for large enough but fixed $\ell$, are negligible.

Confirming the predictions from [25], the following theorem identifies the precise regimes of $d, t$ where (non-)reconstruction holds. To state the theorem, we need to know that for $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ the polynomial $d(k-1) z^{k-2}(1-z) - 1$ has precisely two roots $0 < z_* = z_*(d, k) < z^* = z^*(d, k) < 1$; we are going to prove this as part of Proposition 2.2 below. Let
\[ \lambda^* = \lambda^*(d, k) = -\log(1 - z_*) - \frac{z_*}{(k-1)(1 - z_*)} > \lambda_* = \lambda_*(d, k) = \max\Big\{0, -\log(1 - z^*) - \frac{z^*}{(k-1)(1 - z^*)}\Big\} \ge 0, \tag{1.9} \]
\[ \theta^* = \theta^*(d, k) = 1 - \exp(-\lambda^*) > \theta_* = \theta_*(d, k) = 1 - \exp(-\lambda_*). \tag{1.10} \]
Additionally, let $\lambda_{\mathrm{cond}}(d, k)$ be the solution to the ODE
\[ \frac{\partial \lambda_{\mathrm{cond}}(d, k)}{\partial d} = -\frac{\alpha^*(\lambda_{\mathrm{cond}}(d, k))^k - \alpha_*(\lambda_{\mathrm{cond}}(d, k))^k}{k\big(\alpha^*(\lambda_{\mathrm{cond}}(d, k)) - \alpha_*(\lambda_{\mathrm{cond}}(d, k))\big)}, \qquad \lambda_{\mathrm{cond}}(d_{\mathrm{sat}}(k), k) = 0 \tag{1.11} \]
on the interval $(d_{\min}, d_{\mathrm{sat}}]$ and set $\theta_{\mathrm{cond}} = \theta_{\mathrm{cond}}(d, k) = 1 - \exp(-\lambda_{\mathrm{cond}}(d, k))$. Note that $\theta_* < \theta_{\mathrm{cond}} < \theta^*$.

Theorem 1.2. Let $k \ge 3$ and let $0 \le t = t(n) \le n$ be a sequence such that $\lim_{n \to \infty} t/n = \theta \in (0, 1)$.
(i) If $d < d_{\min}(k)$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.
(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta < \theta_*$ or $\theta > \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.

(iii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta_* < \theta < \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the reconstruction property w.h.p.

Theorem 1.2 shows that $d_{\min}(k)$ marks the precise threshold of $d$ up to which the decimation process $F_{\mathrm{DC},t}$ exhibits non-reconstruction for all $0 \le t \le n$ w.h.p. By contrast, for $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ there is a regime of $t$ where reconstruction occurs. In fact, as Proposition 2.2 shows, for $d > d_{\mathrm{core}}(k)$ we have $\theta_* = 0$ and thus reconstruction holds even at $t = 0$, i.e., for the original, undecimated random formula $F$. Prior to the contribution [25], it had been suggested that this precise scenario (reconstruction on the original problem instance) is the stone on which BPGD stumbles [5]. In fact, Yung’s negative result kicks in at this precise threshold $d_{\mathrm{core}}(k)$. However, Theorems 1.1 and 1.2 show that matters are more subtle. Specifically, for $d_{\min}(k) < d < d_{\mathrm{core}}(k)$ reconstruction, even though absent in the initial formula $F$, occurs at a later ‘time’ $t > 0$ as decimation proceeds, which suffices to trip BPGD up. Also, remarkably, Theorem 1.2 shows that non-reconstruction is not ‘monotone’. The property holds for $\theta < \theta_*$ and then again for $\theta > \theta_{\mathrm{cond}}$, but not on the interval $(\theta_*, \theta_{\mathrm{cond}})$, as visualised in Figure 3.

But there is one more surprise. Namely, Theorem 1.2 (ii) might suggest that for $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ Belief Propagation manages to compute the correct marginals for $t/n \sim \theta > \theta_{\mathrm{cond}}$, as non-reconstruction kicks back in. But remarkably, this is not quite true. Despite the fact that non-reconstruction holds, BPGD goes astray because the algorithm starts its message passing process from a mistaken, oblivious initialisation. As a consequence, for $t/n \sim \theta \in (\theta_{\mathrm{cond}}, \theta^*)$ the BP ‘approximations’ remain prone to error. To be precise, the following result identifies the precise ‘times’ where BP succeeds/fails.
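For orientation, the thresholds $\theta_*$ and $\theta^*$ from (1.9)–(1.10) can be evaluated numerically for given $d$, $k$. The following Python sketch is an illustration rather than part of the analysis: it locates the two roots of $d(k-1)z^{k-2}(1-z)-1$ by bisection on either side of $z=(k-2)/(k-1)$ and plugs them into (1.9)–(1.10). The function names, the dictionary keys and the closed form $((k-1)/(k-2))^{k-2}$ used as a guard for $d_{\min}(k)$ are assumptions of this sketch; the closed form is read off from the requirement that the polynomial possess two distinct roots in $(0,1)$. The threshold $\theta_{\mathrm{cond}}$ is not computed here because it requires solving the ODE (1.11).

```python
import math


def d_min(k):
    # Assumption: d_min(k) is the smallest d for which
    # d(k-1) z^(k-2) (1-z) - 1 has a root in (0,1).
    return ((k - 1) / (k - 2)) ** (k - 2)


def _bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def thresholds(d, k):
    """Evaluate z_*, z^*, lambda_*, lambda^*, theta_*, theta^* as in (1.9)-(1.10)."""
    if k < 3 or d <= d_min(k):
        raise ValueError("need k >= 3 and d > d_min(k)")

    def poly(z):
        return d * (k - 1) * z ** (k - 2) * (1 - z) - 1

    def ell(z):
        return -math.log(1 - z) - z / ((k - 1) * (1 - z))

    z_peak = (k - 2) / (k - 1)            # maximiser of z^(k-2) (1-z)
    z_lower = _bisect(poly, 0.0, z_peak)  # smaller root z_*
    z_upper = _bisect(poly, z_peak, 1.0)  # larger root z^*
    lam_upper = ell(z_lower)              # lambda^* is attained at z_*
    lam_lower = max(0.0, ell(z_upper))    # lambda_* is attained at z^*
    return {
        "z_lower": z_lower,
        "z_upper": z_upper,
        "lambda_lower": lam_lower,
        "lambda_upper": lam_upper,
        "theta_lower": 1 - math.exp(-lam_lower),
        "theta_upper": 1 - math.exp(-lam_upper),
    }
```

For instance, `thresholds(2.4, 3)` gives $\theta_* \approx 0.0275$ and $\theta^* \approx 0.1312$, while for $d = 2.6$ and $k = 3$ the lower threshold evaluates to zero.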
To state the result let $\mu_{F_{\mathrm{DC},t}}$ denote the BP ‘approximation’ of the true marginal $\pi_{F_{\mathrm{DC},t}}$ of variable $x_{t+1}$ in the formula $F_{\mathrm{DC},t}$ created by the decimation process (see Section 2.2 for a reminder of the definition). Also recall that $\pi_{F_{\mathrm{DC},t}}$ denotes the correct marginal as used by the decimation process.

Theorem 1.3. Let $k \ge 3$ and let $0 \le t = t(n) \le n$ be a sequence such that $\lim_{n\to\infty} t/n = \theta \in (0,1)$.

(i) If $0 < d < d_{\min}(k)$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.

(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta^*$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.

(iii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta_{\mathrm{cond}} < \theta < \theta^*$, then $E\big|\mu_{F_{\mathrm{DC},t}} - \pi_{F_{\mathrm{DC},t}}\big| = \Omega(1)$.

The upshot of Theorems 1.2–1.3 is that the relation between the accuracy of BP and reconstruction is subtle. Everything goes well so long as $d < d_{\min}$, as non-reconstruction holds throughout and the BP approximations are correct. But if $d_{\min} < d < d_{\mathrm{sat}}$ and $\theta_* < \theta < \theta_{\mathrm{cond}}$, then Theorem 1.2 (iii) shows that reconstruction occurs. Nonetheless, Theorem 1.3 (ii) demonstrates that the BP approximations remain valid in this regime. By contrast, for $\theta_{\mathrm{cond}} < \theta < \theta^*$ we have non-reconstruction by Theorem 1.2 (ii), but Theorem 1.3 (iii) shows that BP misses its mark with a non-vanishing probability. Finally, for $\theta > \theta^*$ everything is in order once again as BP regains its footing and non-reconstruction holds. Unfortunately BPGD is unlikely to reach this happy state because the algorithm is bound to make numerous mistakes at times $t/n \in (\theta_{\mathrm{cond}}, \theta^*)$.

[Figure 3: three panels, (A) $k = 3$, (B) $k = 4$, (C) $k = 5$; horizontal axis $d$ with $d_{\min}$, $d_{\mathrm{core}}$, $d_{\mathrm{sat}}$ marked; vertical axis $\theta$ with the curves $\theta_*$, $\theta_{\mathrm{cond}}$, $\theta^*$.]

FIGURE 3. The phase diagrams for $k = 3, 4, 5$ with $d \in (d_{\min}, d_{\mathrm{sat}})$ on the horizontal and $\theta$ on the vertical axis. The hatched area displays the regimes $\theta < \theta_*$ and $\theta > \theta_{\mathrm{cond}}$ where non-reconstruction holds. In the non-hatched area, where $\theta_* < \theta < \theta_{\mathrm{cond}}$, we have reconstruction.
Similarly, the blue area displays $\theta < \theta_{\mathrm{cond}}$ and $\theta > \theta^*$, where BP is correct, whereas in the orange area BP is inaccurate.

Theorems 1.2 and 1.3 confirm the predictions from [25, Section 4]. To be precise, while $\theta_{\mathrm{cond}}$ matches the predictions of Ricci-Tersenghi and Semerjian, the ODE formula (1.11) for the threshold, which is easy to evaluate numerically, does not appear in [25]. Instead of the ODE formulation, Ricci-Tersenghi and Semerjian define $\lambda_{\mathrm{cond}}$ as the (unique) $\lambda \ge 0$ such that $\Phi_{d,k,\lambda}(\alpha_*) = \Phi_{d,k,\lambda}(\alpha^*)$; Proposition 2.2 below shows that both are equivalent. Illustrating Theorems 1.2–1.3, Figure 3 displays the phase diagram in terms of $d$ and $\theta \sim t/n$ for $k = 3, 4, 5$.

2. OVERVIEW

This section provides an overview of the proofs of Theorems 1.1–1.3. In the final paragraph we conclude with a discussion of further related work. We assume throughout that $k \ge 3$ is an integer and that $0 < d < d_{\mathrm{sat}}(k)$. Moreover, $t = t(n)$ denotes an integer sequence $0 \le t(n) \le n$ such that $\lim_{n\to\infty} t(n)/n = \theta \in (0,1)$.

2.1. Fixed points and thresholds. The first item on our agenda is to study the functions $\phi_{d,k,\lambda}$, $\Phi_{d,k,\lambda}$ from (1.2)–(1.3). Specifically, we are concerned with the maxima of $\Phi_{d,k,\lambda}$ and the fixed points of $\phi_{d,k,\lambda}$, the combinatorial relevance of which will emerge as we analyse BPGD and the decimation process. We begin by observing that the fixed points of $\phi_{d,k,\lambda}$ are precisely the stationary points of $\Phi_{d,k,\lambda}$.

Fact 2.1. For any $d > 0$, $\lambda \ge 0$ the stationary points $z \in (0,1)$ of $\Phi_{d,k,\lambda}$ coincide with the fixed points of $\phi_{d,k,\lambda}$ in $(0,1)$. Furthermore, for a fixed point $z \in (0,1)$ of $\phi_{d,k,\lambda}$ we have
\[
\Phi''_{d,k,\lambda}(z)
\begin{cases}
< 0 & \text{if } \phi'_{d,k,\lambda}(z) < 1,\\
= 0 & \text{if } \phi'_{d,k,\lambda}(z) = 1,\\
> 0 & \text{if } \phi'_{d,k,\lambda}(z) > 1.
\end{cases} \tag{2.1}
\]

Proof. Differentiating $\Phi_{d,k,\lambda}$, we obtain
\[
\Phi'_{d,k,\lambda}(z) = d(k-1)z^{k-2}\big(\phi_{d,k,\lambda}(z) - z\big). \tag{2.2}
\]
Hence, a point $z \in (0,1)$ is a fixed point of $\phi_{d,k,\lambda}$ iff $\Phi'_{d,k,\lambda}(z) = 0$.
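Differentiating (2.2) once more, we obtain
\[
\Phi''_{d,k,\lambda}(z) = d(k-1)z^{k-3}\Big[(k-2)\big(\phi_{d,k,\lambda}(z) - z\big) + z\big(\phi'_{d,k,\lambda}(z) - 1\big)\Big]. \tag{2.3}
\]
Clearly, if $\phi_{d,k,\lambda}(z) = z$, then (2.3) simplifies to $\Phi''_{d,k,\lambda}(z) = d(k-1)z^{k-2}(\phi'_{d,k,\lambda}(z) - 1)$, whence (2.1) follows. □

We recall that $0 \le \alpha_* = \alpha_*(d,k,\lambda) \le \alpha^* = \alpha^*(d,k,\lambda) \le 1$ are the smallest and the largest fixed point of $\phi_{d,k,\lambda}$ in $[0,1]$, respectively. Fact 2.1 shows that $\Phi_{d,k,\lambda}$ attains its global maximum in $[0,1]$ at $\alpha_*$ or $\alpha^*$. Let $\alpha_{\max} = \alpha_{\max}(d,k,\lambda) \in \{\alpha_*, \alpha^*\}$ be the maximiser of $\Phi_{d,k,\lambda}$; if $\Phi_{d,k,\lambda}(\alpha_*) = \Phi_{d,k,\lambda}(\alpha^*)$, set $\alpha_{\max} = \alpha_*$. The following proposition characterises the fixed points of $\phi_{d,k,\lambda}$ and the maximiser $\alpha_{\max}$.

[Figure 4: two panels plotted against $\theta$; (A) $\alpha_{\max}$, (B) $\Phi(\alpha_{\max})$.]

FIGURE 4. $\alpha_{\max}$ and $\Phi(\alpha_{\max})$ for $d = 2.4$ and $k = 3$ from $\theta_*$ to $\theta^*$.

Proposition 2.2.

(i) If $d < d_{\min}(k)$, then for all $\lambda > 0$ we have $\alpha_*(d,k,\lambda) = \alpha^*(d,k,\lambda)$, the function $\lambda \in (0,\infty) \mapsto \alpha_*(d,k,\lambda) \in (0,1)$ is analytic, and $\alpha_*(d,k,\lambda)$ is the unique stable fixed point of $\phi_{d,k,\lambda}$.

(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, then the polynomial $d(k-1)z^{k-2}(1-z)-1$ has precisely two roots $0 < z_* < z^* < 1$, the numbers $\lambda_*, \lambda^*$ from (1.9) satisfy $0 \le \lambda_* < \lambda^*$ and the following is true.

(a) If $\lambda < \lambda_*$ or $\lambda > \lambda^*$, then $\alpha_*(d,k,\lambda) = \alpha^*(d,k,\lambda) \in (0,1)$ is the unique stable fixed point of $\phi_{d,k,\lambda}$.

(b) If $\lambda_* < \lambda < \lambda^*$, then $0 < \alpha_*(d,k,\lambda) < \alpha^*(d,k,\lambda) < 1$ are the only stable fixed points of $\phi_{d,k,\lambda}$.

(c) The functions $\lambda \in (0,\lambda^*) \mapsto \alpha_*(d,k,\lambda)$ and $\lambda \in (\lambda_*,\infty) \mapsto \alpha^*(d,k,\lambda)$ are analytic.

(d) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, then the solution $\lambda_{\mathrm{cond}}$ of (1.11) satisfies $\lambda_* < \lambda_{\mathrm{cond}} = \lambda_{\mathrm{cond}}(d) < \lambda^*$ and
\[
\alpha_{\max}(d,k,\lambda) =
\begin{cases}
\alpha_*(d,k,\lambda) & \text{if } \lambda < \lambda_{\mathrm{cond}},\\
\alpha^*(d,k,\lambda) & \text{if } \lambda > \lambda_{\mathrm{cond}}.
\end{cases}
\]
Furthermore, $\Phi_{d,k,\lambda}(\alpha_*(d,k,\lambda)) \ne \Phi_{d,k,\lambda}(\alpha^*(d,k,\lambda))$ unless $\lambda = \lambda_{\mathrm{cond}}$. Thus, the function $\lambda \mapsto \alpha_{\max}(d,k,\lambda)$ is analytic on $(0,\lambda_{\mathrm{cond}})$ and on $(\lambda_{\mathrm{cond}},\infty)$, but discontinuous at $\lambda = \lambda_{\mathrm{cond}}$.

2.2. Belief Propagation.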
Having done our analytic homework, we proceed to recall how Belief Propagation computes the ‘approximations’ $\mu_{F_{\mathrm{BP},t}}$ that the BPGD algorithm relies upon. We will see that due to the inherent symmetries of XORSAT the Belief Propagation computations simplify and boil down to a simpler message passing process called Warning Propagation. Subsequently we will explain the connection between Warning Propagation and the fixed points $\alpha_*, \alpha^*$ of $\phi_{d,k,\lambda}$.

It is probably easiest to explain BP on a general XORSAT instance $F$ with a set $V(F)$ of variables and a set $C(F)$ of clauses of lengths between one and $k$. As in Section 1.5 we consider the graph $G(F)$ induced by $F$, with vertex set $V(F) \cup C(F)$ and an edge $xa$ between $x \in V(F)$ and $a \in C(F)$ iff $a$ contains $x$. Let $\partial v = \partial_F v$ be the set of neighbours of $v \in V(F) \cup C(F)$. Additionally, given an assignment $\tau \in \{0,1\}^{\partial a}$ of the variables that appear in $a$, we write $\tau \models a$ iff $\tau$ satisfies $a$.

With each clause/variable pair $x, a$ such that $x \in \partial a$ Belief Propagation associates two sequences of ‘messages’ $(\mu_{F,x\to a,\ell})_{\ell \ge 0}$, $(\mu_{F,a\to x,\ell})_{\ell \ge 0}$ directed from $x$ to $a$ and from $a$ to $x$, respectively. These messages are probability distributions on $\{0,1\}$, i.e.,
\[
\mu_{F,x\to a,\ell} = \big(\mu_{F,x\to a,\ell}(0), \mu_{F,x\to a,\ell}(1)\big), \quad \mu_{F,a\to x,\ell} = \big(\mu_{F,a\to x,\ell}(0), \mu_{F,a\to x,\ell}(1)\big) \in [0,1]^2 \quad\text{and} \tag{2.4}
\]
\[
\mu_{F,x\to a,\ell}(0) + \mu_{F,x\to a,\ell}(1) = \mu_{F,a\to x,\ell}(0) + \mu_{F,a\to x,\ell}(1) = 1. \tag{2.5}
\]
The initial messages are uniform, i.e.,
\[
\mu_{F,x\to a,0}(s) = \mu_{F,a\to x,0}(s) = 1/2 \qquad (s \in \{0,1\}). \tag{2.6}
\]
Further, the messages at step $\ell+1$ are obtained from the messages at step $\ell$ via the Belief Propagation equations
\[
\mu_{F,a\to x,\ell+1}(s) \propto \sum_{\tau \in \{0,1\}^{\partial a}} \mathbf{1}\{\tau_x = s,\ \tau \models a\} \prod_{y \in \partial a \setminus \{x\}} \mu_{F,y\to a,\ell}(\tau_y), \tag{2.7}
\]
\[
\mu_{F,x\to a,\ell+1}(s) \propto \prod_{b \in \partial x \setminus \{a\}} \mu_{F,b\to x,\ell}(s). \tag{2.8}
\]
In (2.7)–(2.8) the $\propto$-symbol represents the normalisation required to ensure that the updated messages satisfy (2.5). In the case of (2.8) such a normalisation may be impossible because the expressions on the r.h.s. could vanish for both $s = 0$ and $s = 1$.
In this event we agree that
\[
\mu_{F,x\to a,\ell+1}(s) =
\begin{cases}
\mu_{F,x\to a,\ell}(s) & \text{if } \mu_{F,x\to a,\ell}(s) \ne 1/2,\\
\mathbf{1}\{s = 0\} & \text{otherwise}
\end{cases}
\qquad (s \in \{0,1\});
\]
in other words, we retain the message from the previous iteration unless its value was $1/2$, in which case we set $\mu_{F,x\to a,\ell+1}(0) = 1$. The same convention applies to $\mu_{F,a\to x,\ell+1}(s)$. Further, at any time $\ell$ the BP messages render a heuristic ‘approximation’ of the marginal probability that a random solution to the formula $F$ sets a variable $x$ to $s \in \{0,1\}$:
\[
\mu_{F,x,\ell}(s) \propto \prod_{b \in \partial x} \mu_{F,b\to x,\ell}(s). \tag{2.9}
\]
We set $\mu_{F,x,\ell}(0) = 1 - \mu_{F,x,\ell}(1) = 1$ if the normalisation in (2.9) fails, i.e., if $\sum_{s \in \{0,1\}} \prod_{b \in \partial x} \mu_{F,b\to x,\ell}(s) = 0$.

Fact 2.3. The BP messages and marginals are half-integral for all $\ell$, i.e., for all $\ell \ge 0$ and $s \in \{0,1\}$ we have
\[
\mu_{F,x\to a,\ell}(s),\ \mu_{F,a\to x,\ell}(s),\ \mu_{F,x,\ell}(s) \in \{0, 1/2, 1\}. \tag{2.10}
\]
Furthermore, for all $\ell > 2\sum_{a \in C(F)} |\partial a|$ we have $\mu_{F,x,\ell}(s) = \mu_{F,x,\ell+1}(s)$.

Proof. The half-integrality (2.10) follows from a straightforward induction on $\ell$. Furthermore, another induction on $\ell$ and inspection of (2.7)–(2.8) shows that for any $x, a, \ell$ such that $\mu_{F,x\to a,\ell}(1) \ne 1/2$ we have $\mu_{F,x\to a,\ell+1}(s) = \mu_{F,x\to a,\ell}(s)$ for $s \in \{0,1\}$. A similar statement holds for $\mu_{F,a\to x,\ell+1}(s)$. In particular, the number of messages that take the value $1/2$ is monotonically decreasing in $\ell$. Since the total number of messages is bounded by $2\sum_{a \in C(F)} |\partial a|$, we conclude that the messages will have converged pointwise after this number of iterations. □

Finally, in light of Fact 2.3 it makes sense to define the approximations for BPGD by letting
\[
\mu_{F_{\mathrm{BP},t}} = \lim_{\ell\to\infty} \mu_{F_{\mathrm{BP},t},x_{t+1},\ell}(1), \qquad \mu_{F_{\mathrm{DC},t}} = \lim_{\ell\to\infty} \mu_{F_{\mathrm{DC},t},x_{t+1},\ell}(1). \tag{2.11}
\]

2.3. Warning Propagation. Thanks to the half-integrality (2.10) of the messages, Belief Propagation is equivalent to a purely combinatorial message passing procedure called Warning Propagation (‘WP’) [19]. Like BP, WP associates two message sequences $(\omega_{F,x\to a,\ell}, \omega_{F,a\to x,\ell})_{\ell \ge 0}$ with every adjacent clause/variable pair.
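As a rough preview of the procedure that the remainder of this section specifies, the following Python sketch runs WP with the three message values `"f"`, `"u"`, `"n"` (frozen, uniform, null), the all-frozen start (2.12) and the parallel update rules (2.13)–(2.15) stated below. It is meant as an illustration only. The encoding of a formula as a list of tuples of variable indices, the function names and the stopping rule (iterate until nothing changes, with a generous cap) are assumptions of the sketch, and each variable is assumed to occur at most once per clause.

```python
def _combine(incoming):
    """Shared case distinction of (2.14) and (2.15) for a variable node."""
    if any(m == "n" for m in incoming):
        return "n"
    if any(m == "f" for m in incoming):
        return "f"
    return "u"


def warning_propagation(clauses, n_vars, max_rounds=None):
    """Run WP with parallel updates and return the list of variable marks.

    clauses: list of tuples of variable indices; the right-hand sides of
    the XOR constraints play no role for WP.
    """
    occ = [[] for _ in range(n_vars)]        # occ[x] = clauses containing x
    for a, clause in enumerate(clauses):
        for x in clause:
            occ[x].append(a)

    pairs = [(x, a) for a, clause in enumerate(clauses) for x in clause]
    var_to_cl = {p: "f" for p in pairs}      # omega_{x->a}, start (2.12)
    cl_to_var = {p: "f" for p in pairs}      # omega_{a->x}, start (2.12)
    if max_rounds is None:
        max_rounds = 4 * len(pairs) + 2      # hypothetical safety cap

    for _ in range(max_rounds):
        new_cl_to_var, new_var_to_cl = {}, {}
        for x, a in pairs:
            # clause -> variable, rule (2.13)
            inc = [var_to_cl[(y, a)] for y in clauses[a] if y != x]
            if all(m == "n" for m in inc):
                new_cl_to_var[(x, a)] = "n"
            elif all(m != "u" for m in inc):
                new_cl_to_var[(x, a)] = "f"
            else:
                new_cl_to_var[(x, a)] = "u"
            # variable -> clause, rule (2.14)
            inc = [cl_to_var[(x, b)] for b in occ[x] if b != a]
            new_var_to_cl[(x, a)] = _combine(inc)
        if new_cl_to_var == cl_to_var and new_var_to_cl == var_to_cl:
            break                            # messages have stabilised
        cl_to_var, var_to_cl = new_cl_to_var, new_var_to_cl

    # marks of the variable nodes, rule (2.15)
    return [_combine([cl_to_var[(x, a)] for a in occ[x]]) for x in range(n_vars)]
```

On the formula with clauses `[(0,), (0, 1), (1, 2, 3)]` the unit clause forces a chain of null marks on variables 0 and 1, whereas variables 2 and 3 end up uniform; on the triangle `[(0, 1), (1, 2), (0, 2)]` all messages remain frozen.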
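The messages take one of three possible discrete values $\{\mathtt{f}, \mathtt{u}, \mathtt{n}\}$ (‘frozen’, ‘uniform’, ‘null’). To trace the BP messages from Section 2.2 actually only the two values $\{\mathtt{n}, \mathtt{u}\}$ would be necessary. However, the third value $\mathtt{f}$ will prove useful in order to compare the BP approximations with the actual marginals. Perhaps unexpectedly given the all-uniform initialisation (2.6), we launch WP from all-frozen start values:
\[
\omega_{F,x\to a,0} = \omega_{F,a\to x,0} = \mathtt{f} \qquad \text{for all } a, x. \tag{2.12}
\]
Subsequently the messages get updated according to the rules
\[
\omega_{F,a\to x,\ell+1} =
\begin{cases}
\mathtt{n} & \text{if } \omega_{F,y\to a,\ell} = \mathtt{n} \text{ for all } y \in \partial a \setminus \{x\},\\
\mathtt{f} & \text{if } \omega_{F,y\to a,\ell} \ne \mathtt{u} \text{ for all } y \in \partial a \setminus \{x\} \text{ and } \omega_{F,y\to a,\ell} \ne \mathtt{n} \text{ for at least one } y \in \partial a \setminus \{x\},\\
\mathtt{u} & \text{otherwise},
\end{cases} \tag{2.13}
\]
\[
\omega_{F,x\to a,\ell+1} =
\begin{cases}
\mathtt{n} & \text{if } \omega_{F,b\to x,\ell} = \mathtt{n} \text{ for at least one } b \in \partial x \setminus \{a\},\\
\mathtt{f} & \text{if } \omega_{F,b\to x,\ell} \ne \mathtt{n} \text{ for all } b \in \partial x \setminus \{a\} \text{ and } \omega_{F,b\to x,\ell} = \mathtt{f} \text{ for at least one } b \in \partial x \setminus \{a\},\\
\mathtt{u} & \text{otherwise}.
\end{cases} \tag{2.14}
\]
In addition to the messages we also define the mark of variable node $x$ by letting
\[
\omega_{F,x,\ell} =
\begin{cases}
\mathtt{n} & \text{if } \omega_{F,b\to x,\ell} = \mathtt{n} \text{ for at least one } b \in \partial x,\\
\mathtt{f} & \text{if } \omega_{F,b\to x,\ell} \ne \mathtt{n} \text{ for all } b \in \partial x \text{ and } \omega_{F,b\to x,\ell} = \mathtt{f} \text{ for at least one } b \in \partial x,\\
\mathtt{u} & \text{otherwise}.
\end{cases} \tag{2.15}
\]
The following statement summarises the relationship between BP and WP.

Fact 2.4. For all $\ell \ge 0$ and all $x, a$ we have
\[
\mu_{F,x\to a,\ell}(1) = 1/2 \;\Leftrightarrow\; \omega_{F,x\to a,\ell} \ne \mathtt{n}, \tag{2.16}
\]
\[
\mu_{F,a\to x,\ell}(1) = 1/2 \;\Leftrightarrow\; \omega_{F,a\to x,\ell} \ne \mathtt{n}, \tag{2.17}
\]
\[
\mu_{F,x,\ell}(1) = 1/2 \;\Leftrightarrow\; \omega_{F,x,\ell} \ne \mathtt{n}. \tag{2.18}
\]
Moreover, for all $\ell > 2|C(F)|$ we have $\omega_{F,x\to a,\ell} = \omega_{F,x\to a,\ell+1}$ and $\omega_{F,a\to x,\ell} = \omega_{F,a\to x,\ell+1}$.

Proof. The fact that $\omega_{F,x\to a,\ell} = \omega_{F,x\to a,\ell+1}$ and $\omega_{F,a\to x,\ell} = \omega_{F,a\to x,\ell+1}$ for all $\ell > 2|C(F)|$ follows from the observation that the number of $\mathtt{f}$-messages is monotonically decreasing, while the number of $\mathtt{n}$-messages is monotonically increasing. The equations (2.16)–(2.18) follow by induction on $\ell$. Initially all the messages are uniform in BP, i.e., $\mu_{F,x\to a,0}(1) = \mu_{F,a\to x,0}(1) = 1/2$. By contrast, in WP we start with all-frozen values for both variables and clauses as given by (2.12). Then, in view of (2.13), (2.14) and (2.15), the equivalences (2.16)–(2.18) hold for $\ell = 0$.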
For $\ell = 1$, the BP messages and marginals are obtained from the messages at the initial step. If the BP message obtained from (2.7) is uniform, then the WP rule (2.13) ensures that $\omega_{F,a\to x,1} \ne \mathtt{n}$ because $\omega_{F,y\to a,0} = \mathtt{f}$. The same argument works the other way round: if the WP message at step $\ell = 1$ is not null, then the BP message from (2.7) equals $1/2$ after normalisation. So for $\ell = 1$, (2.16) holds true. Now assume that (2.16) is true at step $\ell$. If the BP message at step $\ell+1$, obtained from the messages at step $\ell$ via (2.7), equals $1/2$, then the WP message satisfies $\omega_{F,a\to x,\ell+1} \ne \mathtt{n}$ because $\omega_{F,y\to a,\ell} \ne \mathtt{n}$ for at least one $y \in \partial a \setminus \{x\}$. Conversely, if $\omega_{F,a\to x,\ell+1} \ne \mathtt{n}$, then this message is either ‘uniform’ or ‘frozen’. If at least one incoming message is uniform, then $\mu_{F,a\to x,\ell+1}(1) = 1/2$; and if the incoming messages are all frozen, then the situation is the same as under the initialisation (2.12) of WP, which likewise corresponds to $\mu_{F,a\to x,\ell+1}(1) = 1/2$. So at step $\ell+1$, (2.16) holds true. We conclude that (2.16) holds true for every $\ell$. Similarly, by induction on $\ell$ we conclude that (2.17)–(2.18) also hold true for every $\ell$. □

Fact 2.4 implies that the WP messages and marks ‘converge’ in the limit of large $\ell$, in the sense that eventually they do not change any more. Let $\omega_{F,x\to a}, \omega_{F,a\to x}, \omega_{F,x} \in \{\mathtt{f}, \mathtt{u}, \mathtt{n}\}$ be these limits. Furthermore, let $V_{\mathtt{f},\ell}(F)$, $V_{\mathtt{u},\ell}(F)$, $V_{\mathtt{n},\ell}(F)$ be the sets of variables with the respective mark after $\ell \ge 0$ iterations. Also let $V_{\mathtt{f}}(F)$, $V_{\mathtt{u}}(F)$, $V_{\mathtt{n}}(F)$ be the sets of variables where the limit $\omega_{F,x}$ takes the respective value. The following statement traces WP on the random formula $F_{\mathrm{DC},t}$ produced by the decimation process.

Proposition 2.5. Let $\varepsilon > 0$ and assume that $d > 0$, $t = t(n) \sim \theta n$ satisfy one of the following conditions:

(i) $d < d_{\min}$, or

(ii) $d > d_{\min}$ and $\theta \notin \{\theta_*, \theta^*\}$.

Then there exists $\ell_0 = \ell_0(d,\theta,\varepsilon) > 0$ such that for any fixed $\ell \ge \ell_0$ with $\lambda = -\log(1-\theta)$ w.h.p.
we have
\[
\big| t + |V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n \big| < \varepsilon n, \quad \big| t + |V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})| - (\alpha^* - \alpha_*) n \big| < \varepsilon n, \quad \big| V_{\mathtt{n}}(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \big| < \varepsilon n. \tag{2.19}
\]

2.4. The check matrix. Since the XOR operation is equivalent to addition modulo two, a XORSAT formula $F$ with variables $x_1, \ldots, x_n$ and clauses $a_1, \ldots, a_m$ translates into a linear system over $\mathbb{F}_2$, as follows. Let $A_F$ be the $m \times n$-matrix over $\mathbb{F}_2$ whose $(i,j)$-entry equals one iff variable $x_j$ appears in clause $a_i$. Adopting coding parlance, we refer to $A_F$ as the check matrix of $F$. Furthermore, let $y_F \in \mathbb{F}_2^m$ be the vector whose $i$-th entry is one plus the sum of any constant term and the number of negation signs of clause $a_i$ mod two. Then the solutions $\sigma \in \mathbb{F}_2^n$ of the linear system $A_F \sigma = y_F$ are precisely the satisfying assignments of $F$. The algebraic properties of $A_F$ therefore have a direct impact on the satisfiability of $F$. For example, if $A_F$ has rank $m$, we may conclude immediately that $F$ is satisfiable. Furthermore, the set of solutions of $F$ is an affine subspace of $\mathbb{F}_2^n$ (if non-empty). In effect, if $F$ is satisfiable, then the number of satisfying assignments equals the size of the kernel of $A_F$. Hence the nullity $\operatorname{nul} A_F = \dim \ker A_F$ of the check matrix is a key quantity.

Indeed, the single most significant ingredient towards turning the heuristic arguments from [25] into rigorous proofs is a formula for the nullity of the check matrix of the XORSAT instance $F_{\mathrm{DC},t}$ from the decimation process. To unclutter the notation set $A_t = A_{F_{\mathrm{DC},t}}$. We derive the following proposition from a recent general result about the nullity of random matrices over finite fields [8, Theorem 1.1]. The proposition clarifies the semantics of the function $\Phi_{d,k,\lambda}$ and its maximiser $\alpha_{\max}$. In physics jargon $\Phi_{d,k,\lambda}$ is known as the Bethe free entropy.

Proposition 2.6. Let $d > 0$ and $\lambda = -\log(1-\theta)$. Then
\[
\lim_{n\to\infty} \frac{\operatorname{nul} A_t}{n} = \Phi_{d,k,\lambda}(\alpha_{\max}) \qquad \text{in probability.}
\]

2.5. Null variables.
Proposition 2.6 enables us to derive crucial information about the set of satisfying assignments of $F_{\mathrm{DC},t}$. Specifically, for any XORSAT instance $F$ with variables $x_1, \ldots, x_n$ let $V_0(F)$ be the set of variables $x_i$ such that $\sigma_i = 0$ for all $\sigma \in \ker A_F$. We call the variables $x_i \in V_0(F)$ null variables. Since the set of solutions of $F$, if non-empty, is a translation of $\ker A_F$, any two solutions $\sigma, \sigma'$ of $F$ set the variables in $V_0(F)$ to exactly the same values. The following proposition shows that WP identifies certain variables as null.

Proposition 2.7. W.h.p. the following two statements are true for any fixed integer $\ell > 0$.

(i) We have $V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \subseteq V_0(F_{\mathrm{DC},t})$.

(ii) We have $|V_{\mathtt{u},\ell}(F_{\mathrm{DC},t}) \cap V_0(F_{\mathrm{DC},t})| = o(n)$.

Propositions 2.6 and 2.7 enable us to calculate the number of null variables of $F_{\mathrm{DC},t}$, so long as we remain clear of the point $\theta_{\mathrm{cond}}$ where $\alpha_{\max}$ is discontinuous.

Proposition 2.8. If $\theta \ne \theta_{\mathrm{cond}}$ then $|V_0(F_{\mathrm{DC},t})| = \alpha_{\max} n + o(n)$ w.h.p.

Let us briefly summarise what we have learned thus far. First, because all Belief Propagation messages are half-integral, BP reduces to WP. Second, Proposition 2.5 shows that the fixed points $\alpha_*, \alpha^*$ of $\phi_{d,k,\lambda}$ determine the number of variables marked $\mathtt{n}$ or $\mathtt{f}$ by WP. Third, the function $\Phi_{d,k,\lambda}$ and its maximiser $\alpha_{\max}$ govern the nullity of the check matrix and thereby the number of null variables of $F_{\mathrm{DC},t}$. Clearly, the null variables $x_i$ are precisely the ones whose actual marginals $P[\sigma_{F_{\mathrm{DC},t}}(x_i) = s \mid F_{\mathrm{DC},t}]$ are not uniform. As a next step, we investigate whether BP/WP identify these variables correctly.

In light of Proposition 2.5, in order to investigate the accuracy of BP it suffices to compare the numbers of variables marked $\mathtt{n}$ by WP with the true marginals. The following corollary summarises the result.

Corollary 2.9. For any $d$, $\theta$ the following statements are true.

(i) If $d < d_{\min}$, or $d > d_{\min}$ and $\theta < \theta_{\mathrm{cond}}$, or $d > d_{\min}$ and $\theta > \theta^*$, then $|V_0(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n}}(F_{\mathrm{DC},t})| = o(n)$ w.h.p.
(ii) If $d > d_{\min}$ and $\theta_{\mathrm{cond}} < \theta < \theta^*$, then $|V_0(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n}}(F_{\mathrm{DC},t})| = \Omega(n)$ w.h.p.

Thus, so long as $d < d_{\min}$ or $d > d_{\min}$ and $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta^*$, the BP/WP approximations are mostly correct. By contrast, if $d > d_{\min}$ and $\theta_{\mathrm{cond}} < \theta < \theta^*$, the BP/WP approximations are significantly at variance with the true marginals w.h.p. Specifically, w.h.p. BP deems $\Omega(n)$ frozen variables unfrozen, thereby setting itself up for failure. Indeed, Corollary 2.9 easily implies Theorem 1.3, which in turn implies Theorem 1.1 (ii) without much ado.

In addition, to settle the (non-)reconstruction thresholds set out in Theorem 1.2 we need to investigate the conditional marginals given the values of variables at a certain distance from $x_{t+1}$ as in (1.7). This is where the extra value $\mathtt{f}$ from the construction of WP enters. Indeed, for a XORSAT instance $F$ with variables $x_1, \ldots, x_n$ and an integer $\ell$ let $V_{0,\ell}(F)$ be the set of variables $x_i$ such that $\sigma_i = 0$ for all $\sigma \in \ker A_F$ for which $\sigma_h = 0$ for all variables $x_h \in \partial^{\ell} x_i$.

Corollary 2.10. Assume that $d > d_{\min}$ and let $\varepsilon > 0$.

(i) If $\theta < \theta_{\mathrm{cond}}$, then for any fixed $\ell$ we have $|V_{\mathtt{f},\ell}(F_{\mathrm{DC},t}) \cap V_{0,\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p.

(ii) If $\theta > \theta_{\mathrm{cond}}$, then there exists $\ell_0 = \ell_0(d,\theta,\varepsilon)$ such that for any fixed $\ell > \ell_0$ we have $|(V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \cup V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})) \,\triangle\, V_{0,\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p.

Comparing the number of actually frozen variables with the ones marked $\mathtt{f}$ by WP, we obtain Theorem 1.2.

2.6. Proving BPGD successful. We are left to prove Theorem 1.1. First, we need to compute the (strictly positive) success probability of BPGD for $d < d_{\min}$. At this point, the fact that BPGD has a fair chance of succeeding for $d < d_{\min}$ should not come as a surprise. Indeed, Corollary 2.9 implies that the BP approximations of the marginals are mostly correct for $d < d_{\min}$, at least on the formula $F_{\mathrm{DC},t}$ created by the decimation process. Furthermore, so long as the marginals are correct, the decimation process $F_{\mathrm{DC},t}$ and the execution of the BPGD algorithm $F_{\mathrm{BP},t}$ move in lockstep.
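To make the ‘true marginals’ that BPGD is compared against concrete, the following Python sketch computes them on a small XORSAT instance by Gaussian elimination over $\mathbb{F}_2$ and then runs a decimation pass. It is an illustration under stated assumptions, not the construction analysed in this paper: an equation is encoded as a pair `(mask, rhs)` where bit $j$ of `mask` indicates that $x_j$ occurs, all function names are hypothetical, and the decimation step is assumed to set the next variable to one with probability equal to its exact marginal, which for XORSAT is always $0$, $1/2$ or $1$.

```python
import random


def is_consistent(equations):
    """Gaussian elimination over F_2 on (mask, rhs) pairs; True iff solvable."""
    basis = {}                                # leading bit -> reduced row
    for mask, rhs in equations:
        while mask:
            top = mask.bit_length() - 1
            if top not in basis:
                basis[top] = (mask, rhs)
                break
            b_mask, b_rhs = basis[top]
            mask ^= b_mask
            rhs ^= b_rhs
        else:
            if rhs:                           # row reduced to 0 = 1
                return False
    return True


def marginal_one(equations, j):
    """Probability that a uniformly random solution sets x_j to one."""
    can0 = is_consistent(equations + [(1 << j, 0)])
    can1 = is_consistent(equations + [(1 << j, 1)])
    if not (can0 or can1):
        raise ValueError("system is unsatisfiable")
    if can0 and can1:
        return 0.5                            # solution set is an affine subspace
    return 1.0 if can1 else 0.0


def null_variables(equations, n):
    """Indices j with sigma_j = 0 for every sigma in the kernel of the check matrix."""
    homogeneous = [(mask, 0) for mask, _ in equations]
    return [j for j in range(n)
            if not is_consistent(homogeneous + [(1 << j, 1)])]


def decimate(equations, n, rng):
    """Fix x_0, ..., x_{n-1} one at a time according to the exact marginals."""
    eqs = list(equations)
    sigma = []
    for j in range(n):
        value = 1 if rng.random() < marginal_one(eqs, j) else 0
        sigma.append(value)
        eqs.append((1 << j, value))           # substitute the chosen value
    return sigma


def satisfies(equations, sigma):
    """Check that the assignment sigma satisfies every equation."""
    return all(
        sum(sigma[j] for j in range(len(sigma)) if mask >> j & 1) % 2 == rhs
        for mask, rhs in equations
    )
```

For the system $x_0 \oplus x_1 = 1$, $x_1 \oplus x_2 = 0$ all three marginals equal $1/2$ and no variable is null; adding the unit clause $x_2 = 1$ makes every variable null, and `decimate` then reproduces the unique solution.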
The sole difficulty in analysing BPGD lies in proving that the estimates of the algorithm are not just mostly correct, but correct up to only a bounded expected number of discrepancies over the entire execution of the algorithm. To prove this fact we combine the method of differential equations with a subtle analysis of the sources of the remaining bounded number of discrepancies. These discrepancies result from the presence of short (i.e., bounded-length) cycles in the graph $G(F)$. Finally, the proof of the second (negative) part of Theorem 1.1 follows by coupling the execution of BPGD with the decimation process, and invoking Theorem 1.3. The details of both arguments can be found in Section 6.

2.7. Discussion. The thrust of the present work is to verify the predictions from [25] on the BPGD algorithm and the decimation process rigorously. Concerning the decimation process, the main gap in the deliberations of Ricci-Tersenghi and Semerjian [25] that we needed to plug is the proof of Proposition 2.8 on the actual number of null variables in the decimation process. The proof of Proposition 2.8, in turn, hinges on the formula for the nullity from Proposition 2.6, whereas Ricci-Tersenghi and Semerjian state the (as it turns out, correct) formulas for the nullity and the number of null variables based on purely heuristic arguments.

Regarding the analysis of the BPGD algorithm, Ricci-Tersenghi and Semerjian state that they rely on the heuristic techniques from the insightful article [11] to predict the formula (1.6), but do not provide any further details; the article [11] principally employs heuristic arguments involving generating functions. By contrast, the method that we use to prove (1.6) is a bit more similar to that of Frieze and Suen [13] for the analysis of a variant of the unit clause algorithm on random $k$-SAT instances, for which they also obtain the asymptotic success probability.
Yet by comparison to the argument of Frieze and Suen, we pursue a more combinatorially explicit approach that demonstrates that certain small sub-formulas that we call ‘toxic cycles’ are responsible for the failure of BPGD. Specifically, the proof of (1.6) combines the method of differential equations with Poissonisation. Finally, the proof of Theorem 1.1 (ii) is an easy afterthought of the analysis of the decimation process.

Yung’s work [27] on random $k$-XORSAT is motivated by the ‘overlap gap paradigm’ [14], the basic idea behind which is to show that a peculiar clustered geometry of the set of solutions is an obstacle to certain types of algorithms. Specifically, Yung only considers the Unit Clause Propagation algorithm and (a truncated version of) BPGD. Following the path beaten in [20], Yung performs moment computations to establish the overlap gap property. However, moment computations (also called ‘annealed computations’ in physics jargon) only provide one-sided bounds. As a consequence, Yung’s results require spurious lower bounds on the clause length $k$ ($k \ge 9$ for Unit Clause and $k \ge 13$ for BPGD). By contrast, the present proof strategy pivots on the number of null variables rather than overlaps, and Proposition 2.8 provides the precise ‘quenched’ count of null variables. A further improvement over [27] is that the present analysis pinpoints the precise threshold up to which BPGD (as well as Unit Clause) succeeds for any $k \ge 3$. Specifically, Yung proves that these algorithms fail for $d > d_{\mathrm{core}}$, while Theorem 1.1 shows that failure occurs already for $d > d_{\min}$ with $d_{\min} < d_{\mathrm{core}}$. Conversely, Theorem 1.1 shows that the algorithms succeed with a non-vanishing probability for $d < d_{\min}$. Thus, Theorem 1.1 identifies the correct threshold for the success of BPGD, as well as the correct combinatorial phenomenon that determines this threshold, namely the onset of reconstruction in the decimation process (Theorems 1.2 and 1.3).
The BPGD algorithm as detailed in Section 2.2 applies to a wide variety of problems beyond random $k$-XORSAT. Of course, the single most prominent example is random $k$-SAT. Lacking the symmetries of XORSAT, random $k$-SAT does not allow for the simplification to discrete messages; in particular, the BP messages are not generally half-integral. In effect, BP and WP are no longer equivalent. In addition to random $k$-XORSAT, the article [25] also provides a heuristic study of BPGD on random $k$-SAT. But once again due to the lack of half-integrality, the formulas for the phase transitions no longer come as elegant finite-dimensional expressions. Instead, they now come as infinite-dimensional variational problems. Furthermore, the absence of half-integrality also entails that the present proof strategy does not extend to $k$-SAT.

The lack of inherent symmetry in random $k$-SAT can partly be compensated by assuming that the clause length $k$ is sufficiently large (viz. larger than some usually unspecified constant $k_0$). Under this assumption the random $k$-SAT versions of both the decimation process and the BPGD algorithm have been analysed rigorously [6, 10]. The results are in qualitative agreement with the predictions from [25]. In particular, the BPGD algorithm provably fails to find satisfying assignments on random $k$-SAT instances even below the threshold where the set of satisfying assignments shatters into well-separated clusters [1, 17]. Furthermore, on random $k$-SAT a more sophisticated message passing algorithm called Survey Propagation Guided Decimation has been suggested [21, 25]. While on random XORSAT Survey Propagation and Belief Propagation are equivalent, the two algorithms are substantially different on random $k$-SAT. One might therefore hope that Survey Propagation Guided Decimation outperforms BPGD on random $k$-SAT and finds satisfying assignments up to the aforementioned shattering transition.
A negative result to the effect that Survey Propagation Guided Decimation fails asymptotically beyond the shattering transition point for large enough $k$ exists [15]. Yet a complete analysis of Belief/Survey Propagation Guided Decimation on random $k$-SAT for any $k \ge 3$ in analogy to the results obtained here for random $k$-XORSAT remains an outstanding challenge.

Finally, returning to random $k$-XORSAT, a question for future work may be to investigate the performance of various types of algorithms such as greedy, message passing or local search that aim to find an assignment that violates the least possible number of clauses. Of course, this question is relevant even for $d > d_{\mathrm{sat}}(k)$. A first step based on the heuristic ‘dynamical cavity method’ was recently undertaken by Maier, Behrens and Zdeborová [18].

2.8. Preliminaries and notation. Throughout we assume that $k \ge 3$ and $0 < d < d_{\min}$ and $\theta \in (0,1)$ are fixed independently of $n$. We always let $t = t(n) \in \{0, 1, \ldots, n\}$ be an integer sequence such that $\lim_{n\to\infty} t/n = \theta$. Unless specified otherwise we tacitly assume that $n$ is sufficiently large for our various estimates to hold. Asymptotic notation such as $O(\,\cdot\,)$ refers to the limit of large $n$ by default, with $k, d, \theta$ fixed. We continue to denote by $\alpha_* = \alpha_*(\lambda) = \alpha_*(d,k,\lambda)$ and $\alpha^* = \alpha^*(\lambda) = \alpha^*(d,k,\lambda)$ the smallest/largest fixed points of $\phi_{d,k,\lambda}$ in $[0,1]$ and by $\lambda_* = \lambda_*(d,k)$, $\lambda^* = \lambda^*(d,k)$, $\theta_* = \theta_*(d,k)$, $\theta^* = \theta^*(d,k)$ the quantities defined in (1.9)–(1.10).

For a formula $F$ and a partial assignment $\sigma : U \to \{0,1\}$ with $U \subseteq V(F)$ let $F[\sigma]$ be the simplified formula obtained by substituting constants for the variables in $U$. The length of a clause of $F[\sigma]$ is defined as the number of variables from $V(F) \setminus U$ that the clause contains.

The following fact establishes the correctness of BP on formulas represented by acyclic graphs $G(F)$.

Fact 2.11 ([19, Chapter 14]). For a XORSAT formula $F$ with an acyclic bipartite graph $G(F)$ the BP marginals as defined in (2.9) are exact, i.e.
\[
\lim_{\ell\to\infty} \mu_{F,x,\ell}(1) = P[\sigma_F(x) = 1].
\]

2.9. Organisation. The rest of the paper is organised as follows. Section 3 contains the proof of Proposition 2.2. Subsequently in Section 4 we investigate Warning Propagation to prove Propositions 2.5 and 2.7. Furthermore, Section 5 deals with the study of the check matrix; here we prove Propositions 2.6 and 2.8 as well as Corollaries 2.9 and 2.10. Additionally, with all these preparations completed we put all the pieces together to complete the proofs of Theorems 1.2 and 1.3 in Section 5.5. Finally, Section 6 contains the proof of Theorem 1.1.

3. PROOF OF PROPOSITION 2.2

Even though a few steps are mildly intricate, the proof of Proposition 2.2 mostly consists of ‘routine calculus’. As a convenient shorthand we introduce
\[
\zeta_\lambda(z) = \zeta_{d,k,\lambda}(z) = \phi_{d,k,\lambda}(z) - z = 1 - \exp\big(-\lambda - d z^{k-1}\big) - z.
\]
Its derivatives read
\[
\zeta'_\lambda(z) = d(k-1) z^{k-2} \exp(-\lambda - d z^{k-1}) - 1 \quad\text{and} \tag{3.1}
\]
\[
\zeta''_\lambda(z) = d(k-1) z^{k-3} \exp(-\lambda - d z^{k-1}) \big[(k-2) - d(k-1) z^{k-1}\big]. \tag{3.2}
\]
Also let
\[
z_0 = z_0(d,k) = \Big(\frac{k-2}{d(k-1)}\Big)^{\frac{1}{k-1}}. \tag{3.3}
\]
We begin by investigating the zeros of $\zeta_\lambda$, which are obviously identical with the fixed points of $\phi_{d,k,\lambda}$.

Lemma 3.1. Assume that $\lambda > 0$.

(i) The function $\zeta_\lambda$ has either one or three zeros in $z \in [0,1]$, possibly including multiple zeros. If $\zeta_\lambda$ has three zeros, then at least one lies in the interval $[0, z_0]$ and at least one lies in the interval $[z_0, 1]$.

(ii) Also, $\zeta_\lambda$ has at most two stationary points, a minimum and a maximum, and if it has both, the minimum occurs left of the maximum.

(iii) If $\zeta_\lambda$ has a unique zero, then $\alpha_*$ is a stable fixed point of $\phi_{d,k,\lambda}$ and $\sup_{z \in [0,1]} \phi'_{d,k,\lambda}(z) < 1$.

(iv) If $\zeta_\lambda$ has three zeros but no double zero, then $\alpha_*, \alpha^*$ are stable fixed points of $\phi_{d,k,\lambda}$. Additionally, $\phi_{d,k,\lambda}$ possesses an unstable fixed point $\alpha_u \in (\alpha_*, \alpha^*)$. Furthermore, there exists $\varepsilon = \varepsilon(d,\lambda) > 0$ such that
\[
\sup_{z \in [0, \alpha_* + \varepsilon]} \phi'_{d,k,\lambda}(z) < 1, \qquad \sup_{z \in [\alpha^* - \varepsilon, 1]} \phi'_{d,k,\lambda}(z) < 1.
\]

Proof.
Since $\zeta_\lambda(0) > 0$ and $\zeta_\lambda(1) < 0$, the number of zeros must be odd, so towards (i) it suffices to show that there cannot be more than three zeros. Indeed, by Rolle’s theorem, between any two zeros of $\zeta_\lambda$ there is a zero of $\zeta'_\lambda$. So, if $\zeta_\lambda$ had four or more zeros then $\zeta'_\lambda$ would have at least three zeros in $(0,1]$, and in turn $\zeta''_\lambda$ would have at least two. From (3.2) it is clear that $\zeta''_\lambda$ has only two zeros, at $z = 0$ (outside the relevant range) and at the inflection point where $k-2 = d(k-1)z^{k-1}$, namely for $z = z_0$. So, $\zeta'_\lambda$ has at most two zeros, thus $\zeta_\lambda$ has at most three zeros, therefore either one or three.

The second assertion follows from $\zeta''_\lambda(z_0) = 0$ and the fact that, by inspection of (3.2), $\zeta''_\lambda(z)$ is decreasing in $z$, so a local minimum of $\zeta_\lambda$ at $z_1$ implies $\zeta''_\lambda(z_1) > 0$ and thus $z_1 < z_0$, and symmetrically a local maximum at $z_2$ implies that $z_2 > z_0$.

Moving on to (iii), we observe that $\zeta_\lambda(\alpha_*) = 0$. Furthermore, since $\zeta_\lambda(0) > 0$ while $\zeta_\lambda(1) < 0$, we conclude that $\zeta'_\lambda(\alpha_*) < 0$, which implies that $0 < \phi'_{d,k,\lambda}(\alpha_*) < 1$. Hence, $\alpha_*$ is a stable fixed point.

With respect to (iv), if $\zeta_\lambda$ has three zeros, then $\alpha_* < \alpha^*$ are the smallest and the largest zero, respectively. Since we assume that $\zeta_\lambda$ does not have a double zero, the same reasoning as under (iii) shows that $\zeta'_\lambda(\alpha_*) < 0$ and thus $0 < \phi'_{d,k,\lambda}(\alpha_*) < 1$. Further, if $\zeta_\lambda$ has three zeros, then by Rolle’s theorem and (ii) the function has a local minimum followed by a local maximum, which is followed by the zero $\alpha^*$. Hence, $\zeta'_\lambda(\alpha^*) < 0$, and thus $0 < \phi'_{d,k,\lambda}(\alpha^*) < 1$. □

The following statement implies that $\phi_{d,k,\lambda}$ has only a single fixed point if $d < d_{\min}$.

Lemma 3.2. Let $\lambda > 0$. If $d < d_{\min}$, then $\zeta_\lambda$ has a unique zero and is strictly decreasing.

Proof. Suppose that $z$ is a zero of $\zeta_\lambda$. Then $\exp(-\lambda - d z^{k-1}) = 1 - z$ and thus
\[
\phi'_{d,k,\lambda}(z) = d(k-1) z^{k-2} \exp(-\lambda - d z^{k-1}) = d(k-1)\big(z^{k-2} - z^{k-1}\big). \tag{3.4}
\]
The expression on the r.h.s. is positive for $z \in (0,1)$ and zero at $z \in \{0,1\}$.
Moreover, its derivative works out to be
\[ \frac{\partial}{\partial z}\,d(k-1)\left(z^{k-2}-z^{k-1}\right)=d(k-1)z^{k-3}\left(k-2-(k-1)z\right). \]
Thus, the expression on the r.h.s. of (3.4) takes its maximum value of $d\left((k-2)/(k-1)\right)^{k-2}$ at $z^\dagger=(k-2)/(k-1)$. Since $d<d_{\min}$, (3.4) implies that $\phi'_{d,k,\lambda}(z)<1$ and thus $\zeta'_\lambda(z)<0$. Consequently, the function $\phi_{d,k,\lambda}$ only has stable fixed points and thus has only a single fixed point by Lemma 3.1. □

Proceeding to average degrees $d>d_{\min}$, we verify that the values $\lambda_*,\lambda^*$ from Section 1.5 are well defined and satisfy the inequality (1.9).

Lemma 3.3. If $d>d_{\min}$, then the polynomial $d(k-1)z^{k-2}(1-z)-1$ has precisely two roots $0<z_*<z^*<1$ and the values $\lambda_*,\lambda^*$ defined in (1.9) satisfy $\lambda^*>\lambda_*$. Furthermore, $d_{\mathrm{core}}>d_{\min}$ and $\lambda_*=0$ iff $d\ge d_{\mathrm{core}}$.

Proof. Let $z^\dagger=(k-2)/(k-1)$. The polynomial $z^{k-2}(1-z)$ is non-negative on $[0,1]$, strictly increasing on $[0,z^\dagger]$ and strictly decreasing on $[z^\dagger,1]$. Hence, at $z^\dagger$ the polynomial attains its maximum value of
\[ \max_{0\le z\le 1}z^{k-2}(1-z)=\frac{(k-2)^{k-2}}{(k-1)^{k-1}}. \tag{3.5} \]
If $d>d_{\min}$, the equation
\[ z^{k-2}(1-z)=\frac{1}{d(k-1)} \tag{3.6} \]
therefore has two distinct solutions $0<z_*<z^\dagger<z^*<1$. Letting
\[ l(z)=-\log(1-z)-\frac{z}{(1-z)(k-1)}, \]
we obtain $\lambda^*=l(z_*)$ and $\lambda_*=\max\{l(z^*),0\}$. The function $l(z)$ is positive and monotonically increasing on $(0,z^\dagger)$, and monotonically decreasing on $(z^\dagger,1)$. Indeed, the derivative works out to be
\[ l'(z)=\frac{k-2-(k-1)z}{(k-1)(1-z)^2}, \tag{3.7} \]
which is positive for small $z>0$ and has its unique zero at $z^\dagger$. Since $z_*<z^\dagger$, we conclude that $\lambda^*>0$. Further, [8, Theorem 1.2] shows that at $d=d_{\mathrm{core}}$ we have $l(z^*)=0$. Since $z^*$ is an increasing function of $d$ while $l(z)$ is strictly decreasing in $z>z^\dagger$, we conclude that $l(z^*)<0$ for $d>d_{\mathrm{core}}$, $l(z^*)=0$ for $d=d_{\mathrm{core}}$ and $l(z^*)=\lambda_*>0$ for $d_{\min}<d<d_{\mathrm{core}}$. Thus, we are left to verify that $\lambda^*>\lambda_*$, which amounts to showing that $l(z^*)<l(z_*)$.
Rearranging (3.6) into $d=1/\left((k-1)(1-z_*)z_*^{k-2}\right)$ and $d=1/\left((k-1)(1-z^*)(z^*)^{k-2}\right)$ and applying the inverse function theorem, we obtain
\[ \frac{\partial z_*}{\partial d}=-\frac{(k-1)(1-z_*)^2z_*^{k-1}}{k-2-(k-1)z_*},\qquad \frac{\partial z^*}{\partial d}=-\frac{(k-1)(1-z^*)^2(z^*)^{k-1}}{k-2-(k-1)z^*}. \tag{3.8} \]
Combining (3.7) and (3.8) with the chain rule, we arrive at
\[ \frac{\partial}{\partial d}l(z_*)=-z_*^{k-1},\qquad \frac{\partial}{\partial d}l(z^*)=-(z^*)^{k-1}. \tag{3.9} \]
At $d=d_{\min}$ the two roots coincide, $z_*=z^*=z^\dagger$, so that $l(z_*)=l(z^*)$ there. Since $z^*>z_*$ for all $d>d_{\min}$, integrating (3.9) on $d$ shows that $l(z^*)<l(z_*)$ and hence $\lambda^*>\lambda_*$, thereby completing the proof. □

We are ready to identify the zeros of $\zeta_\lambda$ for $d>d_{\min}$, depending on the regime of $\lambda$.

Lemma 3.4. Let $\lambda>0$ and assume that $d>d_{\min}$.
(i) If $\lambda<\lambda_*$, then $\zeta_\lambda$ has a unique zero.
(ii) If $\lambda_*<\lambda<\lambda^*$, then $\zeta_\lambda$ has three distinct zeros.
(iii) If $\lambda>\lambda^*$, then $\zeta_\lambda$ has a unique zero.

Proof. Assume that $d>d_{\min}$. For fixed $k$ and $d$, the function $\zeta_\lambda$ varies continuously with $\lambda$, so there are contiguous regimes of $\lambda$ where it has one zero and regimes where it has three zeros, and these regimes are divided by critical values of $\lambda$ at which two of the three zeros merge into a double zero. In this case, the slope at the double zero is also 0. (By Rolle's theorem, the slope is 0 somewhere between the two zeros, and this is the limiting case.) Thus, the separation between the regimes with one and three zeros occurs at values of $\lambda$ such that $\zeta_\lambda(z)=\zeta'_\lambda(z)=0$. Recalling the definition of $\zeta_\lambda$ and the derivative $\zeta'_\lambda$ from (3.1), we obtain
\[ 1-z=\exp\left(-\lambda-dz^{k-1}\right)\quad\text{and}\quad d(k-1)z^{k-2}=\frac{1}{\exp\left(-\lambda-dz^{k-1}\right)}. \tag{3.10} \]
Substituting the left equation for the exponential in the right equation, we conclude that (3.10) holds only if $z$ is a solution to (3.6). Further, substituting the two solutions $0<z_*<z^\dagger=(k-2)/(k-1)<z^*$ into either one of the equations from (3.10) and solving for $\lambda$, we obtain
\[ \lambda^*=-\log(1-z_*)-\frac{z_*}{(1-z_*)(k-1)},\qquad \lambda_\star=-\log(1-z^*)-\frac{z^*}{(1-z^*)(k-1)}. \]
Observe that $\lambda_*=\max\{\lambda_\star,0\}$.

Suppose $0<\lambda<\lambda^*$. Since $\zeta_{\lambda^*}(z_*)=0$, the function $\lambda\mapsto\zeta_\lambda(z_*)$ is strictly increasing and $\zeta_\lambda(0)>0$, we conclude that $\zeta_\lambda$ has a zero in the interval $(0,z_*)$.
Similarly, if $\lambda>\lambda_*$, then the function $\zeta_\lambda$ has a zero in the interval $(z^*,1)$. Hence, (ii) is an immediate consequence of Lemma 3.1.

Now assume that $0<\lambda<\lambda_*$. Since $\lambda^*>\lambda_*$ by Lemma 3.3, Lemma 3.1 implies that $\zeta_{\lambda_*}$ has precisely three zeros. The largest one, $\alpha^*=z^*$, satisfies $\alpha^*>z^\dagger>z_0$ and is a double zero and simultaneously a local maximum of $\zeta_{\lambda_*}$. Since $\alpha^*$ is a double zero and a local maximum, the smallest zero $\alpha_*$ satisfies $\alpha_*<z_0$ by Rolle's theorem. Hence, $\zeta'_{\lambda_*}(z)<0$ for all $0<z<\alpha_*$. Since the function $\lambda\mapsto\zeta_\lambda(z)$ is strictly increasing for all $z\in(0,1)$, Lemma 3.1(i) implies that for $\lambda<\lambda_*$ only a single zero remains, which is smaller than $z_0$.

Finally, suppose that $\lambda>\lambda^*>\lambda_*$. Lemma 3.1(i) implies that $\zeta_{\lambda^*}$ has precisely three zeros, with a double zero occurring at $z_*$ and another zero at $\alpha^*(\lambda^*)>z^\dagger>z_0$. By Lemma 3.1 and the choice of $z_*,\lambda^*$, the double zero at $z_*$ is a local minimum. Therefore, $\zeta'_{\lambda^*}(z)<0$ for all $z>\alpha^*$. Since the function $\lambda\mapsto\zeta_\lambda(z)$ is strictly increasing for all $z\in(0,1)$, we conclude that $\zeta_\lambda(z)>0$ for all $\lambda>\lambda^*$ and $z\in[0,z_0]$. Hence, by Lemma 3.1(i) for $\lambda>\lambda^*$ only a single zero remains, which lies in the interval $[z_0,1]$. □

Combining Lemmas 3.3 and 3.4, we can now verify the analytic properties of the functions $\lambda\mapsto\alpha_*$ and $\lambda\mapsto\alpha^*$.

Lemma 3.5. Let $0<d<d_{\mathrm{sat}}$ and $\lambda>0$.
(i) If $d<d_{\min}$, then the function $\lambda\in(0,\infty)\mapsto\alpha_*=\alpha^*$ is analytic with derivative
\[ \frac{\partial\alpha_*}{\partial\lambda}=\frac{1-\alpha_*}{1-d(k-1)\alpha_*^{k-2}(1-\alpha_*)}<1. \tag{3.11} \]
(ii) If $d>d_{\min}$, then $\lambda\in(0,\lambda^*)\mapsto\alpha_*$ is analytic with derivative (3.11).
(iii) If $d>d_{\min}$, then $\lambda\in(\lambda_*,\infty)\mapsto\alpha^*$ is analytic with derivative
\[ \frac{\partial\alpha^*}{\partial\lambda}=\frac{1-\alpha^*}{1-d(k-1)(\alpha^*)^{k-2}(1-\alpha^*)}. \]

Proof. Assume that $d>d_{\min}$ and $\lambda\in(0,\lambda^*)$. We know from the proof of Lemma 3.4 that $z_*$ is a double root and local minimum of $\zeta_{\lambda^*}$. Furthermore, $z_*<z_0$ and the function $\lambda\mapsto\zeta_\lambda(z)$ is strictly increasing in $\lambda$. Hence, Lemma 3.1 implies that for any $0<\lambda<\lambda^*$ the function $\zeta_\lambda$ has a unique zero in $(0,z_*)$.
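Similarly, if $d<d_{\min}$ then Lemma 3.2 shows that $\zeta_\lambda$ has a unique zero at $\alpha_*$. Therefore, the implicit function theorem implies that in cases (i) and (ii) the function $\lambda\mapsto\alpha_*$ is continuously differentiable. Thus, we are left to work out $\partial\alpha_*(d,k,\lambda)/\partial\lambda$. Consider the function
\[ \mathcal{A}:\begin{pmatrix}z\\ \lambda\end{pmatrix}\mapsto\begin{pmatrix}\zeta_\lambda(z)\\ \lambda\end{pmatrix}, \]
which is one-to-one in an open neighbourhood of $\alpha_*$. The Jacobi matrix reads
\[ D\mathcal{A}=\begin{pmatrix}\partial\phi_{d,k,\lambda}/\partial\alpha-1 & \partial\phi_{d,k,\lambda}/\partial\lambda\\ 0 & 1\end{pmatrix}. \]
Furthermore,
\[ \frac{\partial\phi_{d,k,\lambda}}{\partial\alpha}\Big|_{\alpha=\alpha_*}=d(k-1)\alpha^{k-2}\exp\left(-\lambda-d\alpha^{k-1}\right)\Big|_{\alpha=\alpha_*}=d(k-1)\alpha_*^{k-2}(1-\alpha_*), \]
\[ \frac{\partial\phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha=\alpha_*}=\exp\left(-\lambda-d\alpha^{k-1}\right)\Big|_{\alpha=\alpha_*}=1-\alpha_*. \]
Hence, by the inverse function theorem the derivative of $\mathcal{A}^{-1}$ reads
\[ (D\mathcal{A})^{-1}=\begin{pmatrix}\left[\partial\phi_{d,k,\lambda}/\partial\alpha-1\right]^{-1} & -\left[\partial\phi_{d,k,\lambda}/\partial\lambda\right]/\left[\partial\phi_{d,k,\lambda}/\partial\alpha-1\right]\\ 0 & 1\end{pmatrix}, \]
and thus
\[ \frac{\partial\alpha_*}{\partial\lambda}=-\frac{\partial\phi_{d,k,\lambda}/\partial\lambda}{\partial\phi_{d,k,\lambda}/\partial\alpha-1}=\frac{1-\alpha_*}{1-d(k-1)\alpha_*^{k-2}(1-\alpha_*)}. \]
Thus, we obtain (i) and (ii). A similar argument applies to $\lambda\in(\lambda_*,\infty)\mapsto\alpha^*$ in the case $d>d_{\min}$ and yields (iii). □

As a final preparation towards the proof of Proposition 2.2 we investigate the solution $\lambda_{\mathrm{cond}}$ to the differential equation (1.11); notice that Lemma 3.5 shows that this ODE does indeed possess a unique solution on $(d_{\min},d_{\mathrm{sat}}]$.

Lemma 3.6. For any $0<d<d_{\mathrm{sat}}$ we have $0<\lambda_{\mathrm{cond}}<\lambda^*$. Furthermore, for all $0<\lambda<\lambda_{\mathrm{cond}}$ we have $\Phi_{d,k,\lambda}(\alpha_*)>\Phi_{d,k,\lambda}(\alpha^*)$, while $\Phi_{d,k,\lambda}(\alpha_*)<\Phi_{d,k,\lambda}(\alpha^*)$ for $\lambda^*>\lambda>\lambda_{\mathrm{cond}}$.

Proof. For $d<d_{\mathrm{sat}}$ define
\[ \lambda^*_{\mathrm{cond}}=\inf\left\{\lambda\ge0:\Phi_{d,k,\lambda}(\alpha^*)>\Phi_{d,k,\lambda}(\alpha_*)\right\}. \tag{3.12} \]
For any $d<d_{\mathrm{sat}}$ we have $\Phi_{d,k,0}(0)>\Phi_{d,k,0}(z)$ for all $0<z\le1$; this follows from the characterisation of the $k$-XORSAT threshold from [3, Theorem 1.1]. Hence, $\lambda^*_{\mathrm{cond}}>0$ for all $d<d_{\mathrm{sat}}$. Further, the function $\zeta_{\lambda^*}$ has a double zero and a local minimum at $\alpha_*=z_*$. Since the sign of $\zeta_{\lambda^*}(z)$ matches the sign of $\Phi'_{d,k,\lambda^*}(z)$, this means that $\Phi_{d,k,\lambda^*}(\alpha^*)>\Phi_{d,k,\lambda^*}(\alpha_*)$. Hence, there exists $\varepsilon>0$ such that for $0<\lambda^*-\varepsilon<\lambda<\lambda^*$ we have $\Phi_{d,k,\lambda}(\alpha^*)>\Phi_{d,k,\lambda}(\alpha_*)$. Therefore,
\[ 0<\lambda^*_{\mathrm{cond}}<\lambda^*. \tag{3.13} \]
As a next step we show that
\[ \Phi_{d,k,\lambda}(\alpha_*)<\Phi_{d,k,\lambda}(\alpha^*)\quad\text{for }\lambda^*>\lambda>\lambda^*_{\mathrm{cond}}. \]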
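(3.14)

To this end, we compute the derivatives of $\Phi_{d,k,\lambda}(\alpha_*)$, $\Phi_{d,k,\lambda}(\alpha^*)$ with respect to $0<\lambda<\lambda^*$. Since $\alpha_*,\alpha^*$ are stationary points of $\Phi_{d,k,\lambda}$, the chain rule yields
\[ \frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_*)=\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*}+\frac{\partial\Phi_{d,k,\lambda}}{\partial\alpha}\Big|_{\alpha_*}\frac{\partial\alpha_*}{\partial\lambda}=\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*}=-\exp\left(-\lambda-d\alpha_*^{k-1}\right)=\alpha_*-1, \tag{3.15} \]
\[ \frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha^*)=\alpha^*-1. \tag{3.16} \]
Since $\alpha_*<\alpha^*$ for all $\lambda_*<\lambda<\lambda^*$, (3.14) follows from (3.15)–(3.16).

Finally, we verify that $\lambda^*_{\mathrm{cond}}$ equals the solution $\lambda_{\mathrm{cond}}$ to the differential equation (1.11). Recalling the definition (3.12), we see that it suffices to check that $\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*)=\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)$ for all $d_{\min}<d<d_{\mathrm{sat}}$. To this end, we notice that by definition of $d_{\mathrm{sat}}$ we have $\Phi_{d_{\mathrm{sat}},k,0}(0)=\Phi_{d_{\mathrm{sat}},k,0}(\alpha^*)$, in line with the initial condition $\lambda_{\mathrm{cond}}(d_{\mathrm{sat}})=0$. Additionally, we claim that $\lambda_{\mathrm{cond}}=\lambda_{\mathrm{cond}}(d)$ satisfies
\[ \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*)}{\partial d}=\frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)}{\partial d}. \]
Indeed, using the chain rule and the fact that $\alpha_*,\alpha^*$ are stationary points, with $\lambda=\lambda_{\mathrm{cond}}(d)$ we obtain
\[ \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*)}{\partial d}=\frac{\partial\Phi_{d,k,\lambda}}{\partial d}\Big|_{\alpha_*,\lambda_{\mathrm{cond}}}+\frac{\partial\Phi_{d,k,\lambda}}{\partial\alpha}\Big|_{\alpha_*,\lambda_{\mathrm{cond}}}\frac{\partial\alpha_*}{\partial d}+\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*,\lambda_{\mathrm{cond}}}\frac{\partial\lambda_{\mathrm{cond}}}{\partial d}=\alpha_*^{k-1}+(\alpha_*-1)\frac{\partial\lambda_{\mathrm{cond}}}{\partial d}. \]
Analogously,
\[ \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)}{\partial d}=(\alpha^*)^{k-1}+(\alpha^*-1)\frac{\partial\lambda_{\mathrm{cond}}}{\partial d}. \]
Hence, the solution $\lambda_{\mathrm{cond}}$ to (1.11) satisfies $\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*)=\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)$, and thus $\lambda_{\mathrm{cond}}=\lambda^*_{\mathrm{cond}}$. Therefore, the assertion follows from (3.13) and (3.14). □

Proof of Proposition 2.2. The first assertion is an immediate consequence of Lemmas 3.2 and 3.5. Moreover, the second assertion follows from Lemmas 3.3, 3.4 and 3.5. Finally, the last assertion follows from Lemma 3.6. □

4. WARNING PROPAGATION AND LOCAL WEAK CONVERGENCE

In this section we prove Propositions 2.5 and 2.7. The proofs rely on the concept of local weak convergence. Specifically, we are going to set up a Galton-Watson process that mimics the local topology of the graph $G(F_{\mathrm{DC},t})$ up to any fixed depth $\ell$. Subsequently we will analyse WP on the Galton-Watson tree and argue that the result extends to $G(F_{\mathrm{DC},t})$.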
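The fixed points $\alpha_*,\alpha^*$ of $\phi_{d,k,\lambda}$ from Section 3, which govern the WP analysis in this section, can be approximated numerically. The following Python sketch is an illustration only and is not part of the proofs: it iterates $\phi_{d,k,\lambda}$ from $0$ and from $1$ and evaluates the two thresholds via the function $l(z)$ from the proof of Lemma 3.3. All function names are hypothetical, $k\ge3$ is assumed, and the threshold routine presumes $d>d_{\min}$ so that (3.6) has two roots.

```python
import math

def phi(z, d, k, lam):
    """phi_{d,k,lambda}(z) = 1 - exp(-lambda - d z^(k-1))."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1))

def iterate_phi(z_start, d, k, lam, tol=1e-13, max_iter=1_000_000):
    """Iterate z <- phi(z) from z_start until successive values differ by < tol."""
    z = z_start
    for _ in range(max_iter):
        z_next = phi(z, d, k, lam)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def extreme_fixed_points(d, k, lam):
    """Approximate the smallest and the largest fixed point of phi on [0, 1].

    phi is increasing with phi(0) > 0 and phi(1) < 1 for lam > 0, so the
    iteration from 0 increases towards the smallest fixed point and the
    iteration from 1 decreases towards the largest one.
    """
    return iterate_phi(0.0, d, k, lam), iterate_phi(1.0, d, k, lam)

def bisect(f, lo, hi, steps=200):
    """Plain bisection; assumes f changes sign exactly once on [lo, hi]."""
    f_lo_positive = f(lo) > 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lambda_thresholds(d, k):
    """Return (lambda_lower, lambda_upper) from the two roots of (3.6).

    Assumption: d > d_min, so that (3.6) has one root on each side of
    z_dagger = (k-2)/(k-1).
    """
    z_dagger = (k - 2) / (k - 1)
    g = lambda z: z ** (k - 2) * (1.0 - z) - 1.0 / (d * (k - 1))
    z_low = bisect(g, 0.0, z_dagger)
    z_high = bisect(g, z_dagger, 1.0)
    l = lambda z: -math.log(1.0 - z) - z / ((1.0 - z) * (k - 1))
    return max(l(z_high), 0.0), l(z_low)
```

For instance, with $k=3$ and $d=2.6$ the two iterations should return clearly separated values for $\lambda$ between the two thresholds and agree for $\lambda$ above the upper threshold, in line with Lemma 3.4. Very close to a threshold the iteration slows down near the emerging double zero, so the stopping rule `tol` should not be trusted there.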
4.1. Local weak convergence. The construction of the Galton-Watson process $T=T(d,k,t)$ is straightforward. The process has two types called variable nodes and check nodes. The process starts with a single variable node $v_0$. Furthermore, each variable node begets a $\mathrm{Po}(d)$ number of check nodes as offspring, while the offspring of a check node is a $\mathrm{Bin}(k-1,1-t/n)$ number of variable nodes. Let $T$ be the Galton-Watson tree rooted at $v_0$ that this process generates; $T$ may be infinite. Hence, for an integer $\ell$ obtain $T^{(\ell)}$ from $T$ by deleting all variable/check nodes at distance greater than $2\ell$ from $v_0$. Thus, $T^{(\ell)}$ is a finite random tree rooted at $v_0$.

For any graphs $T,T'$ rooted at $v,v'$, respectively, we write $T\cong T'$ if there is a graph isomorphism $\iota:T\to T'$ such that $\iota(v)=v'$. Furthermore, for a vertex $v$ of $G(F_{\mathrm{DC},t})$ and an integer $\ell$ we let $\partial^{\le\ell}_{F_{\mathrm{DC},t}}v$ be the subgraph obtained from $G(F_{\mathrm{DC},t})$ by deleting all vertices at distance greater than $2\ell$ from $v$, rooted at $v$. Finally, for a rooted graph $g$ and an integer $\ell$ we let $N^{(\ell)}_t(g)$ be the number of vertices $v$ of $G(F_{\mathrm{DC},t})$ such that $\partial^{\le\ell}_{F_{\mathrm{DC},t}}v\cong g$.

Lemma 4.1. For any rooted tree $g$ we have
\[ \mathbb{E}\left|N^{(\ell)}_t(g)-(n-t)\,\mathbb{P}\left[T^{(\ell)}\cong g\right]\right|=o(n). \tag{4.1} \]

Proof. The proof is based on a routine second moment argument; that is, we claim that
\[ \mathbb{E}\left[N^{(\ell)}_t(g)\right]=(n-t)\,\mathbb{P}\left[T^{(\ell)}\cong g\right]+o(n),\qquad \mathbb{E}\left[N^{(\ell)}_t(g)^2\right]=(n-t)^2\,\mathbb{P}\left[T^{(\ell)}\cong g\right]^2+o(n^2). \tag{4.2} \]
Combining (4.2) with the Markov and Chebyshev inequalities then yields the assertion.

We prove (4.2) and thereby (4.1) by induction on $\ell$. Recall that $F_{\mathrm{DC},t}$ is a XORSAT instance with variables $x_{t+1},\ldots,x_n$. Let us begin with the estimate of the first moment. Due to the linearity of expectation, it suffices to show that
\[ \mathbb{P}\left[\partial^{\le\ell}(F_{\mathrm{DC},t},x_{t+1})\cong g\right]=\mathbb{P}\left[T^{(\ell)}\cong g\right]+o(1). \tag{4.3} \]
For $\ell=0$ there is nothing to show. Hence, suppose that (4.1) is true with $\ell$ replaced by $\ell-1$. Furthermore, let $\Delta$ be the degree of the root $r$ of $g$ and let $1\le\kappa_1\le\ldots\le\kappa_\Delta\le k$ be the degrees of the children of the root; thus, we order the children of $r$ so that their degrees are increasing. For an integer $1\le i\le k$ let $K_i$ be the number of indices $j\in[\Delta]$ such that $\kappa_j=i$. Further, let $(g_{i,j})_{1\le i\le\Delta,1\le j\le\kappa_i}$ be the trees pending on the grandchildren of the root. In addition, let $\boldsymbol{\Delta}$ be the degree of $x_{t+1}$ in $G(F_{\mathrm{DC},t})$ and let $1\le\boldsymbol{\kappa}_1\le\ldots\le\boldsymbol{\kappa}_{\boldsymbol{\Delta}}\le k$ be the degrees of the neighbours of $x_{t+1}$. Then $\partial^{\le\ell}(F_{\mathrm{DC},t},x_{t+1})\cong g$ is possible only if $\boldsymbol{\Delta}=\Delta$ and $\boldsymbol{\kappa}_i=\kappa_i$ for all $1\le i\le\Delta$. Since the clauses of the random formula $F$ are drawn uniformly and independently and $G(F_{\mathrm{DC},t})$ is obtained from $G(F)$ by deleting the variable nodes $x_1,\ldots,x_t$ along with any ensuing isolated check nodes, we conclude that the event $\mathcal{D}=\{\boldsymbol{\Delta}=\Delta,\ \boldsymbol{\kappa}_i=\kappa_i\text{ for all }1\le i\le\Delta\}$ has probability
\[ \mathbb{P}[\mathcal{D}]=\mathbb{P}\left[\mathrm{Po}(d)=\Delta\right]\binom{\Delta}{K_1,\ldots,K_k}\prod_{i=1}^{k}\mathbb{P}\left[\mathrm{Bin}(k-1,1-t/n)=i-1\right]^{K_i}+o(1). \tag{4.4} \]
Further, let $\mathcal{G}=\{g_{i,j}:1\le i\le\Delta,\ 1\le j\le\kappa_i\}$ and let $\mathcal{E}$ be the event that $N^{(\ell-1)}_t(\gamma)=(n-t)\,\mathbb{P}\left[T^{(\ell-1)}\cong\gamma\right]+o(n)$ for all $\gamma\in\mathcal{G}$. Then by induction we have
\[ \mathbb{P}[\mathcal{E}\mid\mathcal{D}]=1-o(1). \tag{4.5} \]
Now, obtain $G^-(F_{\mathrm{DC},t})$ from $G(F_{\mathrm{DC},t})$ by deleting $x_{t+1}$ along with its adjacent check nodes. Let $N^{(\ell-1),-}_t(g_{i,j})$ be the number of vertices $v$ of $G^-(F_{\mathrm{DC},t})$ such that $\partial^{\le\ell-1}(G^-(F_{\mathrm{DC},t}),v)\cong g_{i,j}$. Moreover, let $\mathcal{E}^-$ be the event that $N^{(\ell-1),-}_t(g_{i,j})=(n-t)\,\mathbb{P}\left[T^{(\ell-1)}\cong g_{i,j}\right]+o(n)$ for all $i,j$. Since $x_{t+1}$ has degree $\Delta=O(1)$ given $\mathcal{D}$ and all adjacent check nodes have degree at most $k$, (4.5) implies that
\[ \mathbb{P}[\mathcal{E}^-\mid\mathcal{D}]=1-o(1). \tag{4.6} \]
Finally, since $F_{\mathrm{DC},t}$ is uniformly random, given $\mathcal{D}$ the checks $a$ of $F_{\mathrm{DC},t}$ adjacent to $x_{t+1}$ simply choose their other neighbours uniformly at random from the variable nodes $x_{t+2},\ldots,x_n$ of $G^-(F_{\mathrm{DC},t})$. Therefore, (4.4) and (4.6) imply that
\[ \mathbb{P}\left[\partial^{\le\ell}(F_{\mathrm{DC},t},x_{t+1})\cong g\right]=\mathbb{P}\left[T^{(\ell)}\cong g\right]+o(1), \]
thereby proving (4.3) and thus the first part of (4.2).
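To make the branching process from Section 4.1 concrete, the following Python sketch samples how many variable nodes $T(d,k,t)$ has at distances $0,2,\ldots,2\ell$ from the root, with $\theta$ standing in for $t/n$. It is a simulation aid under these assumptions and plays no role in the proof; the helper names are hypothetical, and the Poisson sampler is Knuth's elementary method, which is only suitable for small means.

```python
import math
import random

def sample_poisson(rng, mean):
    """Knuth's multiplication method for a Poisson variate (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_binomial(rng, trials, p):
    """Sum of independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(trials))

def generation_sizes(rng, d, k, theta, depth):
    """Numbers of variable nodes at distances 0, 2, ..., 2*depth from the root.

    Each variable node begets Po(d) check nodes, and each check node begets
    Bin(k-1, 1-theta) variable nodes, with theta playing the role of t/n.
    """
    sizes = [1]
    for _ in range(depth):
        children = 0
        for _ in range(sizes[-1]):
            for _ in range(sample_poisson(rng, d)):
                children += sample_binomial(rng, k - 1, 1.0 - theta)
        sizes.append(children)
    return sizes
```

Under these assumptions the expected number of variable nodes at distance $2$ from the root is $d(k-1)(1-\theta)$, which gives a quick sanity check for the sampler, e.g. by averaging `generation_sizes(random.Random(0), 1.5, 3, 0.2, 1)[1]` over many runs.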
The proof of the second part of (4.2) (the estimate of the second moment) proceeds along similar lines, except that we need to explore the depth-$2\ell$ neighbourhoods of two variable nodes of $F_{\mathrm{DC},t}$ simultaneously. Specifically, the proof of the second moment bound comes down to showing that
\[ \mathbb{P}\left[\partial^{\le\ell}(F_{\mathrm{DC},t},x_{t+1})\cong g,\ \partial^{\le\ell}(F_{\mathrm{DC},t},x_{t+2})\cong g\right]=\mathbb{P}\left[T^{(\ell)}\cong g\right]^2+o(1). \tag{4.7} \]
Exploiting that the variable nodes $x_{t+1},x_{t+2}$ are at distance greater than $4\ell$ w.h.p., we conduct a similar induction as above to verify (4.7) and thus (4.2). □

4.2. Proof of Proposition 2.5. To prove Proposition 2.5 we estimate the sizes $|V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})|$, $|V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})|$ separately. Recall that $\theta\sim t/n$.

Lemma 4.2. Let $\varepsilon>0$ and assume that one of the following conditions is satisfied:
(i) $d<d_{\min}$, or
(ii) $d>d_{\min}$ and $|\theta^*-\theta|>\varepsilon$.
Then there exists $\ell_0=\ell_0(d,\varepsilon)>0$ such that for any fixed $\ell\ge\ell_0$ with $\lambda=-\log(1-\theta)$ w.h.p. we have
\[ \left|t+|V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})|-\alpha_*n\right|<\varepsilon n. \]

Proof. In light of Lemma 4.1 it suffices to investigate WP on the random tree $T^{(\ell)}$ for large enough $\ell$. Specifically, let $p^{(\ell)}$ be the probability that WP marks the root of $T^{(\ell)}$ as $\mathtt{n}$. In formulas, recalling (2.15), this means that
\[ p^{(\ell)}=\mathbb{P}\left[\omega_{T^{(\ell)},r,\ell}=\mathtt{n}\right]\text{ for }\ell\ge1,\quad\text{and}\quad p^{(0)}=0. \tag{4.8} \]
Let $\Delta$ be the degree of the root $r$ of $T^{(\ell)}$ and let $\kappa_1,\ldots,\kappa_\Delta$ be the degrees of the children of $r$. Since the sub-trees of $T^{(\ell)}$ pending on the grandchildren of $r$ are independent copies of $T^{(\ell-1)}$, the WP update rules (2.13)–(2.14) yield the recurrence
\[ p^{(\ell)}=1-\mathbb{E}\left[\prod_{i=1}^{\Delta}\left(1-\prod_{j=1}^{\kappa_i-1}p^{(\ell-1)}\right)\right]\qquad(\ell>0). \tag{4.9} \]
By the construction of $T$ the degree $\Delta$ of $r$ has distribution $\mathrm{Po}(d)$. Furthermore, each child of $r$ has $\mathrm{Bin}(k-1,1-t/n)$ children; thus, $\kappa_i-1\stackrel{\mathrm{dist}}{=}\mathrm{Bin}(k-1,1-t/n)$. Consequently, (4.9) yields
\[ p^{(\ell)}=1-\exp(-d)\sum_{\Delta=0}^{\infty}\frac{d^\Delta}{\Delta!}\left(1-\sum_{\kappa=0}^{k-1}\binom{k-1}{\kappa}\exp(-\lambda\kappa)\left(1-\exp(-\lambda)\right)^{k-1-\kappa}\left(p^{(\ell-1)}\right)^{\kappa}\right)^{\Delta}=1-\exp\left(-d\left(1-\exp(-\lambda)\left(1-p^{(\ell-1)}\right)\right)^{k-1}\right). \tag{4.10} \]
Letting $z^{(\ell)}=1-\exp(-\lambda)(1-p^{(\ell)})$ and recalling the definition (1.2) of $\phi_{d,k,\lambda}$, we see that (4.10) amounts to
\[ z^{(\ell)}=\phi_{d,k,\lambda}\left(z^{(\ell-1)}\right). \]
(4.11)

Moreover, Lemma 3.1 (iii)–(iv), Lemma 3.2 and Lemma 3.4 show that if (i) or (ii) above hold, then $\phi_{d,k,\lambda}$ is a contraction on $[0,\alpha_*]$. Therefore, (4.11) shows that $\lim_{\ell\to\infty}p^{(\ell)}=\frac{\alpha_*-\theta}{1-\theta}$. Thus, the assertion follows from Lemma 4.1. □

Lemma 4.3. Let $\varepsilon>0$ and assume that $d>0$, $t=t(n)$ are such that one of the following conditions is satisfied:
(i) $d<d_{\min}$, or
(ii) $d>d_{\min}$, $|\theta_*-\theta|>\varepsilon$ and $|\theta^*-\theta|>\varepsilon$.
Then there exists $\ell_0=\ell_0(d,\varepsilon)>0$ such that for any fixed $\ell\ge\ell_0$ with $\lambda=-\log(1-\theta)$ w.h.p. we have
\[ \left||V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})|-(\alpha^*-\alpha_*)n\right|<\varepsilon n. \]

Proof. Once again it suffices to trace WP on $T^{(\ell)}$ for large $\ell$. As in the proof of Lemma 4.2, let
\[ p^{(\ell)}=\mathbb{P}\left[\omega_{T^{(\ell)},r,\ell}\ne\mathtt{u}\right]\text{ for }\ell\ge1,\quad\text{and}\quad p^{(0)}=1. \tag{4.12} \]
Then with $\Delta$ the degree of $r$ and $\kappa_1,\ldots,\kappa_\Delta$ the degrees of the children of $r$, the WP update rules (2.13)–(2.14) translate into
\[ p^{(\ell)}=1-\mathbb{E}\left[\prod_{i=1}^{\Delta}\left(1-\prod_{j=1}^{\kappa_i-1}p^{(\ell-1)}\right)\right]\qquad(\ell>0). \tag{4.13} \]
Thus, the recurrence is identical to (4.9), but this time with the initial condition $p^{(0)}=1$. Hence, letting $z^{(\ell)}=1-\exp(-\lambda)(1-p^{(\ell)})$, so that $z^{(0)}=1$, and retracing the steps towards (4.11), we obtain
\[ z^{(\ell)}=\phi_{d,k,\lambda}\left(z^{(\ell-1)}\right). \tag{4.14} \]
Invoking Lemmas 3.1, 3.2 and 3.4, we conclude that (i) or (ii) ensure that $\phi_{d,k,\lambda}$ contracts on $[\alpha^*,1]$. Consequently, (4.14) implies that $\lim_{\ell\to\infty}p^{(\ell)}=\frac{\alpha^*-\theta}{1-\theta}$. Thus, the assertion follows from Lemmas 4.1 and 4.2. □

Finally, we compare the set $V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})$ obtained after a (large but) bounded number of iterations with the ultimate set $V_{\mathtt{n}}(F_{\mathrm{DC},t})$ obtained upon convergence of WP. The proof of the following lemma is an adaptation of the argument from [23] for cores of random hypergraphs.

Lemma 4.4. Assume that $\theta\in(0,1)\setminus\{\theta_*,\theta^*\}$. Then for any $\varepsilon>0$ there exists $\ell_0=\ell_0(d,\varepsilon,\theta)$ such that for all $\ell>\ell_0$ we have
\[ |V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})\,\triangle\,V_{\mathtt{n}}(F_{\mathrm{DC},t})|<\varepsilon n\quad\text{w.h.p.} \]

Proof. In place of the WP message passing process from Section 2.3 we consider the following simpler peeling process, which reproduces the same set $V_{\mathtt{n}}(F_{\mathrm{DC},t})$.
Let $G_0=G(F_{\mathrm{DC},t})$ be the bipartite graph induced by $F_{\mathrm{DC},t}$. For $h\ge0$ obtain $G_{h+1}$ from $G_h$ by performing the following peeling operation.

Remove all check nodes of degree one along with their variable node neighbours. (4.15)

Clearly, this process will reach a fixed point (i.e., $G_{h+1}=G_h$) after at most $m$ iterations. Moreover, a straightforward induction on $\ell$ shows that $V(G_0)\setminus V(G_\ell)=V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})$ and thus $V(G_0)\setminus V(G_m)=V_{\mathtt{n}}(F_{\mathrm{DC},t})$. Hence, it suffices to prove that for large enough $\ell=\ell(d,\varepsilon,\theta)$ we have
\[ |V(G_\ell)\,\triangle\,V(G_m)|<\varepsilon n\quad\text{w.h.p.} \tag{4.16} \]
Towards the proof of (4.16) let $d_h=(d_h(u))_{u\in V(G_h)\cup C(G_h)}$ be the degree sequence of $G_h$. By the principle of deferred decisions $G_h$ is uniformly random given $d_h$. Further, let
\[ \Delta_h(j)=\left|\{x\in V(G_h):d_h(x)=j\}\right|,\qquad \Delta'_h(j)=\left|\{a\in C(G_h):d_h(a)=j\}\right| \]
be the number of variable/check nodes of degree $j\ge0$. Pick $\delta=\delta(d,\varepsilon,\theta)$, $\delta'=\delta'(d,\delta,\theta)$, $\delta''=\delta''(d,\delta',\theta)$ small enough and $\ell\ge\ell_0(d,\delta'',\theta)$ large enough. Then Lemma 4.2 implies that w.h.p.
\[ |V(G_\ell)\setminus V(G_{\ell+1})|<\delta''n. \tag{4.17} \]
Furthermore, we claim that
\[ \sum_{j\ge0}\left|\frac{\Delta_\ell(j)}{|V(G_\ell)|}-\mathbb{P}\left[\mathrm{Po}\left(d(1-\alpha_*^{k-1})\right)=j\right]\right|<\delta',\qquad \sum_{j\ge2}\left|\frac{\Delta'_\ell(j)}{|C(G_\ell)|}-\frac{\mathbb{P}\left[\mathrm{Bin}(k,1-\alpha_*)=j\right]}{\mathbb{P}\left[\mathrm{Bin}(k,1-\alpha_*)\ge2\right]}\right|<\delta'. \tag{4.18} \]
Indeed, Lemma 4.1 shows that we just need to study WP on the random tree $T^{(\ell)}$, as in the proof of Lemma 4.2. Thus, let $\Delta$ be the degree of the root variable and let $\kappa_1,\ldots,\kappa_\Delta$ be the degrees of the children of the root. Since the sub-trees pending on the grandchildren of the root are independent copies of $T^{(\ell-1)}$, Lemma 4.2 shows that the probability that any one of the $\Delta$ children does not send an $\mathtt{n}$-message to $r$ falls into the interval $(1-\alpha_*^{k-1}-\delta'',1-\alpha_*^{k-1}+\delta'')$, provided that $\ell$ is large enough. Since $\Delta\stackrel{\mathrm{dist}}{=}\mathrm{Po}(d)$, the first part of (4.18) follows from Poisson thinning.

Similarly, to obtain the second part of (4.18) consider a clause $a$ that is a child of the root $r$ of $T^{(\ell)}$.
Then by the same token as in the previous paragraph the number of children of a that do not send a n-message after ℓ iterations of WP lies in the interval (1−αk−1 ∗ −δ′′,1−αk−1 ∗ +δ′′). Furthermore, the number of children a′ ̸= a of r has distribution Po(∆). Hence, the probability that the WP-message from r to a equals n comes to α∗±δ′′, and this event is independent of the messages that the children of a send to a. Finally, the probability that one of the messages that a receives after ℓ iterations of WP differs from the message received after ℓ−1 iterations is smaller than δ′′ for large enough ℓ. Since the peeling process removes any checks a with at least k−1 incoming n-messages, we obtain (4.18). To complete the proof we are going to deduce from (4.17)–(4.18) that the peeling process (4.15) will remove no more than εn/2 further nodes from Gℓ before it stops. Following [23], we consider a slowed-down version of the process where no longer all checks of degree one get removed simultaneously, but rather one-at-a-time. Let (Gℓ[ν])ν≥0 be the sequences of graphs produced by this modified process, with Gℓ[0] =Gℓ and Gℓ[ν+1] =Gℓ[ν] if all checks of Gℓ[ν] have degree at least two. Further, let Uℓ[ν] be the number of unary checks of Gℓ[ν]. Let D be the event that the bounds (4.17)–(4.18) hold. Then it suffices to prove that on the event {Uℓ[ν] > 0}∩D we have E [Uℓ[ν+1]−Uℓ[ν] |Uℓ[ν]] < 0 for all 0 ≤ ν≤ εn/2. (4.19) Invoking the principle of deferred decisions, in order to verify (4.19) we compute the expected number of new degree one checks produced by the removal of a single random variable node x . Due to (4.18), for ν ≤ εn/2 the expected number of neighbours a of x of degree precisely two is bounded by dP [Bin(k −1,1−α∗) = 1]+δ= d(k −1)(1−α∗)αk−2 ∗ +δ=φ′ d ,k,λ(α∗)+δ< 1, provided that δ> 0 is chosen sufficiently small. Hence, we obtain (4.19). □ Proof of Proposition 2.5. The proposition is an immediate consequence of Lemmas 4.2–4.4. □ 4.3. Proof of Proposition 2.7. 
We deal with the two claims separately. Towards the first claim we establish the following stronger, deterministic statement.

Lemma 4.5. For any XORSAT instance $F$ with variables $V_n=\{x_1,\ldots,x_n\}$ and any integer $\ell\ge0$ we have $V_{\mathtt{n},\ell}(F)\subseteq V_0(F)$.

Proof. We proceed by induction on $\ell$. For $\ell\le1$ there is nothing to show because $V_{\mathtt{n},\ell}(F)=\emptyset$ by construction. Hence, assume that $\ell>1$ and that $V_{\mathtt{n},\ell'}(F)\subseteq V_0(F)$ for all $\ell'<\ell$. If $x\in V_{\mathtt{n},\ell}(F)$, then (2.15) shows that there exists a check node $b\in\partial x$ such that $\omega_{F,b\to x,\ell}=\mathtt{n}$. Furthermore, (2.13) shows that if $\omega_{F,b\to x,\ell}=\mathtt{n}$, then for all $y\in\partial b\setminus\{x\}$ we have $\omega_{F,y\to b,\ell-1}=\mathtt{n}$. Additionally, (2.14) shows that if $\omega_{F,y\to b,\ell-1}=\mathtt{n}$, then there exists $a\in\partial y\setminus\{b\}$ such that $\omega_{F,a\to y,\ell-2}=\mathtt{n}$. Hence, (2.15) ensures that $\omega_{F,y,\ell-2}=\mathtt{n}$ and thus
\[ y\in V_0(F)\quad\text{for all }y\in\partial b\setminus\{x\} \tag{4.20} \]
by induction. Now suppose that $\partial b=\{x_{j_1},\ldots,x_{j_h}\}$ with pairwise distinct indices $1\le j_1,\ldots,j_h\le n$ such that $x=x_{j_1}$. Consider $\sigma\in\ker A(F)$. Then (4.20) implies that $\sigma_{j_2}=\cdots=\sigma_{j_h}=0$. Consequently, $\sigma_{j_1}=0$ and thus $x\in V_0(F)$. □

The following lemma deals with the variables that WP marks $\mathtt{u}$.

Lemma 4.6. For any fixed $\ell\ge0$ we have $|V_{\mathtt{u},\ell}(F_{\mathrm{DC},t})\cap V_0(F_{\mathrm{DC},t})|=o(n)$ w.h.p.

Proof. We are going to show that $\mathbb{E}|V_{\mathtt{u},\ell}(F_{\mathrm{DC},t})\cap V_0(F_{\mathrm{DC},t})|=o(n)$. To this end, because the distribution of $F_{\mathrm{DC},t}$ is invariant under permutations of the variables $x_{t+1},\ldots,x_n$, it suffices to show that
\[ \mathbb{P}\left[x_n\in V_{\mathtt{u},\ell}(F_{\mathrm{DC},t})\cap V_0(F_{\mathrm{DC},t})\right]=o(1). \tag{4.21} \]
Indeed, let $\mathcal{A}$ be the event that the depth-$2\ell$ neighbourhood $\partial^{\le\ell}x_n$ of $x_n$ in $F_{\mathrm{DC},t}$ is acyclic. Since Lemma 4.1 shows that $\mathbb{P}[\mathcal{A}]=1-o(1)$, towards (4.21) it suffices to prove that on the event $\mathcal{A}$ we have
\[ x_n\notin V_{\mathtt{u},\ell}(F_{\mathrm{DC},t})\cap V_0(F_{\mathrm{DC},t}). \tag{4.22} \]
But (4.22) follows from the well known fact that BP is exact on acyclic factor graphs (see Fact 2.11). □

Proof of Proposition 2.7. The proposition is an immediate consequence of Lemmas 4.5 and 4.6. □

5. ANALYSIS OF THE CHECK MATRIX

In this section we prove Propositions 2.6 and 2.8.
Proposition 2.6 is an easy consequence of [8, Theorem 1.1]. Furthermore, Proposition 2.8 follows from Proposition 2.6 by interpolating on the parameter $\lambda$; a related argument was recently used in [9] to show that certain random combinatorial matrices have full rank w.h.p. In addition, we prove Corollaries 2.9 and 2.10 and subsequently complete the proofs of Theorems 1.2–1.3.

5.1. Proof of Proposition 2.6. We use a general result [8, Theorem 1.1] about the rank of sparse random matrices from a fairly universal class of distributions. The definition of this general random matrix goes as follows. Let $\boldsymbol{d},\boldsymbol{k}\ge0$ be integer-valued random variables such that $0<\mathbb{E}[\boldsymbol{d}^3]+\mathbb{E}[\boldsymbol{k}^3]<\infty$. Moreover, let $(\boldsymbol{d}_i,\boldsymbol{k}_i)_{i\ge1}$ be families of mutually independent random variables such that $\boldsymbol{d}_i\stackrel{\mathrm{dist}}{=}\boldsymbol{d}$ and $\boldsymbol{k}_i\stackrel{\mathrm{dist}}{=}\boldsymbol{k}$. Let $\bar{d}=\mathbb{E}[\boldsymbol{d}]$ and $\bar{k}=\mathbb{E}[\boldsymbol{k}]$ and for an integer $n>0$ let $\boldsymbol{m}=\boldsymbol{m}_n\stackrel{\mathrm{dist}}{=}\mathrm{Po}(\bar{d}n/\bar{k})$. The sequence $(\boldsymbol{m}_n)_n$ is independent of $(\boldsymbol{d}_i,\boldsymbol{k}_i)_{i\ge1}$. Further, let $\mathcal{S}_n$ be the event that
\[ \sum_{i=1}^{n}\boldsymbol{d}_i=\sum_{i=1}^{\boldsymbol{m}_n}\boldsymbol{k}_i. \tag{5.1} \]
It is a known fact that $\mathbb{P}[\mathcal{S}_n]=\Omega(n^{-1/2})$ [8, Proposition 1.10]. Given that $\mathcal{S}_n$ occurs, create a simple random bipartite graph $\boldsymbol{G}_n$ with a set $V_n=\{x_1,\ldots,x_n\}$ of variable nodes and a set $C_n=\{c_1,\ldots,c_{\boldsymbol{m}_n}\}$ of check nodes uniformly at random subject to the condition that $x_j$ has degree $\boldsymbol{d}_j$ and $c_i$ has degree $\boldsymbol{k}_i$ for all $1\le j\le n$ and $1\le i\le\boldsymbol{m}_n$. Finally, let $\boldsymbol{A}_n$ be the biadjacency matrix of $\boldsymbol{G}_n$. Thus, $\boldsymbol{A}_n$ has size $\boldsymbol{m}_n\times n$ and its $(i,j)$-entry equals 1 iff $x_j$ and $c_i$ are adjacent in $\boldsymbol{G}_n$.

Theorem 5.1 (special case of [8, Theorem 1.1]). Let $D(z)=\sum_{h=0}^{\infty}\mathbb{P}[\boldsymbol{d}=h]z^h$ and $K(z)=\sum_{h=0}^{\infty}\mathbb{P}[\boldsymbol{k}=h]z^h$ be the probability generating functions of $\boldsymbol{d},\boldsymbol{k}$, respectively. Furthermore, let
\[ \mathcal{F}:[0,1]\to\mathbb{R},\qquad z\mapsto D\left(1-K'(z)/K'(1)\right)-\frac{D'(1)}{K'(1)}\left(1-K(z)-(1-z)K'(z)\right). \tag{5.2} \]
Then
\[ \lim_{n\to\infty}\frac{1}{n}\,\mathrm{nul}\,\boldsymbol{A}_n=\max_{z\in[0,1]}\mathcal{F}(z)\quad\text{in probability.} \]

We now derive Proposition 2.6 from Theorem 5.1 by identifying suitable distributions $\boldsymbol{d},\boldsymbol{k}$ such that $\boldsymbol{A}_n$ resembles $A_t$.

Proof of Proposition 2.6.
Recall that $0\le t=t(n)\le n$ satisfies $t=\theta n+o(n)$ for a fixed $0\le\theta\le1$. We continue to set $\lambda=-\log(1-\theta)$. We are going to construct several random matrices that can be coupled such that their nullities differ by no more than $o(n)$ w.h.p. The first of these random matrices is the matrix $A_t$ from Proposition 2.6, and the last is the matrix $\boldsymbol{A}_n$ from Theorem 5.1, with suitably chosen $\boldsymbol{d},\boldsymbol{k}$.

For a start, consider the check matrix $A'=A_0$ of the original, 'undecimated' $k$-XORSAT formula $F=F_{\mathrm{DC},0}$. Obtain $A'_t$ from $A'$ by adding $t$ new rows to $A'$. Each of these rows contains precisely a single non-zero entry. The positions of the non-zero entries are chosen uniformly without replacement. Thus, the extra $t$ rows have the effect of fixing $t$ uniformly random coordinates to zero. Since the distribution of the random matrix $A'$ is invariant under column permutations, we conclude that
\[ \mathrm{nul}\,A_t\stackrel{\mathrm{dist}}{=}\mathrm{nul}\,A'_t. \tag{5.3} \]
Further, let $A[\lambda]$ be the matrix obtained from $A'$ by adding a random number $\boldsymbol{l}=\mathrm{Po}(\lambda n)$ of rows. Each of these rows contains a single non-zero entry, which is placed in a uniformly random position. The extra rows are chosen mutually independently (thus, 'with replacement') and independently of $A'$. By Poisson thinning, for any column index $j\in[n]$ the probability that one of the new $\boldsymbol{l}$ rows has a non-zero entry in the $j$th column equals $1-\exp(-\lambda)=\theta$, and the total number of such indices $j$ has distribution $\mathrm{Bin}(n,\theta)$. Since $\mathbb{P}\left[|\mathrm{Bin}(n,\theta)-\theta n|\le\sqrt{n}\log n\right]\ge1-1/n$ by the Chernoff bound and $t=\theta n+o(n)$, we can couple $A'_t$ and $A[\lambda]$ such that
\[ \mathrm{nul}\,A'_t=\mathrm{nul}\,A[\lambda]+o(n)\quad\text{w.h.p.} \tag{5.4} \]
Finally, let $A'[\lambda]$ be the matrix obtained as follows. Let $\boldsymbol{d},\boldsymbol{k}$ have probability generating functions
\[ D(z)=\exp\left((\lambda+d)(z-1)\right),\qquad K(z)=\frac{dz^k+k\lambda z}{d+k\lambda}. \tag{5.5} \]
In other words, $\boldsymbol{d}$ has distribution $\mathrm{Po}(d+\lambda)$ while $\boldsymbol{k}$ equals one with probability $k\lambda/(d+k\lambda)$ and equals $k$ with probability $d/(d+k\lambda)$. The definition (5.5) readily yields
\[ \bar{d}=D'(1)=\lambda+d,\qquad \bar{k}=K'(1)=\frac{k(d+\lambda)}{d+k\lambda}. \]
(5.6) Hence, the number m=mn dist= Po(nd̄/k̄) of rows of A=An can be written as a sum of independent random variables m=m′+m′′ with distributions m′ = Po(dn/k), m′′ = Po(λn). (5.7) The first summand m′ prescribes the number of rows of A with k non-zero entries, while m′′ details the number of rows with a single non-zero entry. Consequently, (5.7) shows that the numbers of rows with k or with just a single non-zero entry have the same distributions in both A and A[λ]. We are left to argue that in A the positions of the non-zero entries in the different rows are nearly independent and uniform. To see this, let (hi , j )1≤i≥m,1≤ j≤k be a family of mutually independent and uniform random variables with values in [n] = {1, . . . ,n}. Moreover, let X be the number of indices 1 ≤ i ≤m′ such that there exist 1 ≤ j1 < j2 ≤ k such that hi , j1 = hi , j2 ; in other words, hi ,1, . . . ,hi ,k fail to be pairwise distinct. A routine calculation shows that E[X ] =O(1). (5.8) Now, let us think of (hi , j )1≤i≤m′,1≤ j≤k and (hi ,1)m′p n logn ]= o(n−10), P [|nul A [λ]−E[nul A [λ]]| >p n logn ]= o(n−10). (5.13) Proof. We combine the Azuma–Hoeffding inequality with the simple observation that the nullity satisfies a Lips- chitz condition. Specifically, adding or removing a single row to a matrix changes the nullity by at most one. We apply this observation to the matrix A′ t from the proof of Proposition 2.6, which consists of m + t independent random rows. Indeed, Azuma-Hoeffding implies together with the Lipschitz property that P [|A′ t −E[A′ t | m]| > u | m ]≤ 2exp ( − u2 2(m + t ) ) for any u > 0. (5.14) Furthermore, Bennett’s concentration inequality for Poisson variables shows that P [|m −dn/k| >p n log2/3 n ]= o(n−10). (5.15) Combining (5.14)–(5.15) with the Lipschitz property and setting u =p n log2/3 n, we obtain the first part of (5.13). 
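Similar reasoning applies to the second matrix $A[\lambda]$; for given $\boldsymbol{l}$ and $m$ the Lipschitz property yields
\[ \mathbb{P}\left[\left|\mathrm{nul}\,A[\lambda]-\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\mid\boldsymbol{l},m\right]\right|>u\mid\boldsymbol{l},m\right]\le2\exp\left(-\frac{u^2}{2(\boldsymbol{l}+m)}\right)\quad\text{for any }u>0. \tag{5.16} \]
Moreover, in analogy to (5.15) we have
\[ \mathbb{P}\left[|\boldsymbol{l}-\lambda n|>\sqrt{n}\log^{2/3}n\right]=o(n^{-10}). \tag{5.17} \]
Thus, (5.15)–(5.17) and Azuma–Hoeffding imply the second part of (5.13). □

We are going to estimate $|V_0(F_{\mathrm{DC},t})|$ by way of estimating changes of $\mathrm{nul}\,A[\lambda]$ as $\lambda$ varies. Since $\mathrm{nul}\,A[\lambda]/n$ converges to $\Phi_{d,k,\lambda}(\alpha_{\max})$ by Proposition 2.6, we thus need to estimate the derivative $\frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_{\max})$.

Lemma 5.3. Let $d>0$ and assume that
(i) $d<d_{\min}$, or
(ii) $d>d_{\min}$ and $\lambda\in(0,\infty)\setminus\{\lambda_{\mathrm{cond}}\}$.
Then
\[ \frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_{\max})=\alpha_{\max}-1. \tag{5.18} \]

Proof. The seeming difficulty is that $\alpha_{\max}=\alpha_{\max}(\lambda)$ varies with $\lambda$. Yet Proposition 2.2 (iii) ensures that the function $\lambda\mapsto\alpha_{\max}$ is continuously differentiable for $\lambda\ne\lambda_{\mathrm{cond}}$. Moreover, Fact 2.1 shows that $\alpha_{\max}$ is a local maximum of $\Phi_{d,k,\lambda}$. Hence, applying the chain rule we obtain
\[ \frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_{\max})=\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\lambda,\alpha_{\max}}+\frac{\partial\Phi_{d,k,\lambda}}{\partial\alpha}\Big|_{\lambda,\alpha_{\max}}\frac{\partial\alpha_{\max}}{\partial\lambda}=\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\lambda,\alpha_{\max}}=-\exp\left(-\lambda-d\alpha_{\max}^{k-1}\right). \tag{5.19} \]
In fact, since Fact 2.1 shows that $\alpha_{\max}$ is a fixed point of $\phi_{d,k,\lambda}$, the r.h.s. of (5.19) simplifies to (5.18). □

Complementing the analytic formula (5.18), we now derive a combinatorial interpretation of the derivative of the nullity. For a matrix $A$ of size $m\times n$ let $V_0(A)$ be the set of all indices $i\in[n]$ such that $\sigma_i=0$ for all $\sigma\in\ker A$.

Lemma 5.4. For any $d,\lambda>0$ we have
\[ \frac{1}{n}\frac{\partial}{\partial\lambda}\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\right]=\frac{\mathbb{E}|V_0(A[\lambda])|}{n}-1. \]

Proof. Recall that $A[\lambda]$ is obtained from $A(F)$ by adding $\boldsymbol{m}''\stackrel{\mathrm{dist}}{=}\mathrm{Po}(\lambda n)$ stochastically independent rows with a single non-zero entry in a uniformly random position. Consequently,
\[ \begin{aligned} \frac{1}{n}\frac{\partial}{\partial\lambda}\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\right] &=\frac{1}{n}\frac{\partial}{\partial\lambda}\sum_{\ell=0}^{\infty}\mathbb{P}\left[\boldsymbol{m}''=\ell\right]\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\mid\boldsymbol{m}''=\ell\right] =\frac{1}{n}\sum_{\ell=0}^{\infty}\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\mid\boldsymbol{m}''=\ell\right]\frac{\partial}{\partial\lambda}\frac{(\lambda n)^\ell}{\ell!}\exp(-\lambda n)\\ &=\sum_{\ell=0}^{\infty}\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\mid\boldsymbol{m}''=\ell\right]\left(\mathbb{1}\{\ell\ge1\}\frac{(\lambda n)^{\ell-1}}{(\ell-1)!}-\frac{(\lambda n)^\ell}{\ell!}\right)\exp(-\lambda n)\\ &=\sum_{\ell=0}^{\infty}\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\mid\boldsymbol{m}''=\ell\right]\left(\mathbb{P}\left[\boldsymbol{m}''=\ell-1\right]-\mathbb{P}\left[\boldsymbol{m}''=\ell\right]\right). \end{aligned} \tag{5.20} \]
Hence, obtain $A[\lambda]^+$ from $A[\lambda]$ by adding one more row with a single non-zero entry in a uniformly random position $\boldsymbol{j}^+\in[n]$. Then $\mathrm{nul}(A[\lambda]^+)-\mathrm{nul}(A[\lambda])=-\mathbb{1}\{\boldsymbol{j}^+\notin V_0(A[\lambda])\}$. Hence, (5.20) yields
\[ \frac{1}{n}\frac{\partial}{\partial\lambda}\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\right]=\mathbb{E}\left[\mathrm{nul}(A[\lambda]^+)-\mathrm{nul}(A[\lambda])\right]=\mathbb{P}\left[\boldsymbol{j}^+\in V_0(A[\lambda])\right]-1=\frac{\mathbb{E}|V_0(A[\lambda])|}{n}-1, \]
as claimed. □

With these preparations in place we can now derive the desired formulas for $|V_0(A_t)|$. We treat the cases $\alpha_{\max}=\alpha_*$ and $\alpha_{\max}=\alpha^*$ separately.

Lemma 5.5. Assume that $d,\lambda>0$ satisfy
\[ \Phi_{d,k,\lambda}(\alpha_*)>\Phi_{d,k,\lambda}(\alpha)\quad\text{for all }\alpha\in[0,1]\setminus\{\alpha_*\}. \tag{5.21} \]
Then $|V_0(A[\lambda])|=\alpha_*n+o(n)$ w.h.p.

Proof. Proposition 2.5 and Lemma 4.5 yield the lower bound
\[ |V_0(A[\lambda])|\ge\alpha_*n+o(n)\quad\text{w.h.p.} \tag{5.22} \]
To derive the matching upper bound, fix $\varepsilon>0$ and assume that the event $\mathcal{E}=\{|V_0(A[\lambda])|>(\alpha_*+\varepsilon)n\}$ has probability $\mathbb{P}[\mathcal{E}]>\varepsilon$. Then by Proposition 2.2 (iii) there exists $\lambda'>\lambda$ such that $\alpha_{\max}(\lambda'')=\alpha_*(\lambda'')$ and $\alpha_*(\lambda'')<\alpha_*(\lambda)+\varepsilon^2/2$ for all $\lambda''\in[\lambda,\lambda']$. Hence, Lemma 5.3 yields
\[ \Phi_{d,k,\lambda'}(\alpha_{\max}(\lambda'))-\Phi_{d,k,\lambda}(\alpha_{\max}(\lambda))=\int_{\lambda}^{\lambda'}\left(\alpha_*(\lambda'')-1\right)\mathrm{d}\lambda''\le(\lambda'-\lambda)\left(\alpha_*(\lambda)+\varepsilon^2/2-1\right). \tag{5.23} \]
Combining (5.23) with Proposition 2.6 and Lemma 5.2, we obtain
\[ n^{-1}\left[\mathbb{E}\left[\mathrm{nul}\,A[\lambda']\right]-\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\right]\right]\le(\lambda'-\lambda)\left(\alpha_*(\lambda)+\varepsilon^2/2-1+o(1)\right). \tag{5.24} \]
On the other hand, since adding checks can only increase the number of frozen variables, Lemma 5.4 shows that
\[ n^{-1}\left[\mathbb{E}\left[\mathrm{nul}\,A[\lambda']\right]-\mathbb{E}\left[\mathrm{nul}\,A[\lambda]\right]\right]\ge(\lambda'-\lambda)\left(\alpha_*(\lambda)+\mathbb{P}[\mathcal{E}]\varepsilon-1+o(1)\right)\ge(\lambda'-\lambda)\left(\alpha_*(\lambda)+\varepsilon^2-1+o(1)\right). \tag{5.25} \]
Finally, since (5.24) and (5.25) contradict each other, we have refuted the assumption $\mathbb{P}[\mathcal{E}]>\varepsilon$. □

Lemma 5.6. Assume that $d,\lambda>0$ are such that
\[ \Phi_{d,k,\lambda}(\alpha^*)>\Phi_{d,k,\lambda}(\alpha)\quad\text{for all }\alpha\in[0,1]\setminus\{\alpha^*\}. \tag{5.26} \]
Then $|V_0(A[\lambda])|=\alpha^*n+o(n)$ w.h.p.

Proof. We use a similar strategy as in the proof of Lemma 5.5. Hence, assume that $d,\lambda>0$ satisfy (5.26). Combining Proposition 2.5 and Lemma 4.6, we see that $|V_0(A[\lambda])|\le\alpha^*n+o(n)$ w.h.p. Now choose a small enough $\varepsilon>0$ and assume that $\mathcal{E}=\{|V_0(A[\lambda])|<(\alpha^*-\varepsilon)n\}$ occurs with probability $\mathbb{P}[\mathcal{E}]>\varepsilon$.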
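Then Proposition 2.2 shows that there exists $\lambda' < \lambda$ such that $\alpha_{\max}(\lambda'') = \alpha^*(\lambda'')$ and $\alpha^*(\lambda'') > \alpha^*(\lambda) - \varepsilon^2/2$ for all $\lambda'' \in [\lambda', \lambda]$. Hence, Lemma 5.3 yields
$$\Phi_{d,k,\lambda}(\alpha_{\max}(\lambda)) - \Phi_{d,k,\lambda'}(\alpha_{\max}(\lambda')) = \int_{\lambda'}^{\lambda} (\alpha^*(\lambda'') - 1)\,d\lambda'' \ge (\lambda - \lambda')\big(\alpha^*(\lambda) - \varepsilon^2/2 - 1\big). \tag{5.27}$$
But once again because adding checks can only increase the number of frozen variables, Lemma 5.4 yields
$$n^{-1}\big[\mathbb{E}[\mathrm{nul}\,A[\lambda]] - \mathbb{E}[\mathrm{nul}\,A[\lambda']]\big] \le (\lambda - \lambda')\big(\alpha^*(\lambda) - \mathbb{P}[\mathcal{E}]\varepsilon - 1 + o(1)\big) \le (\lambda - \lambda')\big(\alpha^*(\lambda) - \varepsilon^2 - 1 + o(1)\big). \tag{5.28}$$
However, Proposition 2.6 and Lemma 5.3 show that (5.27)–(5.28) are in contradiction. □

Proof of Proposition 2.8. Since $\alpha_{\max} \in \{\alpha_*, \alpha^*\}$, the assertion is an immediate consequence of Lemmas 5.5–5.6. □

5.3. Proof of Corollary 2.9. There are four cases to consider separately. Let $\varepsilon > 0$.

Case 1: $d < d_{\min}$: As Proposition 2.2 (i) shows, in this case we have $\alpha_* = \alpha^*$ for all $\lambda > 0$; thus, the function $\varphi_{d,k,\lambda}$ has only the single fixed point $\alpha_*$, which is stable. Furthermore, Proposition 2.5 shows that $\big||V_{\mathrm{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n\big| < \varepsilon n/2$ for large enough $\ell$ w.h.p. Moreover, Proposition 2.8 yields $|V_0(F_{\mathrm{DC},t})| = \alpha_* n + o(n)$ w.h.p. Therefore, Proposition 2.7 implies that $|V_0(F_{\mathrm{DC},t}) \triangle V_{\mathrm{n},\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p. for large enough $\ell$. Since $V_{\mathrm{n},\ell}(F_{\mathrm{DC},t}) \subseteq V_0(F_{\mathrm{DC},t})$ w.h.p. and $|V_{\mathrm{n},\ell}(F_{\mathrm{DC},t}) \triangle V_{\mathrm{n}}(F_{\mathrm{DC},t})| < \varepsilon n$ by (2.19), the assertion follows.

Case 2: $d_{\min} < d < d_{\mathrm{sat}}$ and $\theta > \theta^*$: A similar argument as under Case 1 applies. Indeed, Proposition 2.2 (ii) shows that $\alpha_* = \alpha^*$ is the unique and stable fixed point of $\varphi_{d,k,\lambda}$. Since $\big||V_{\mathrm{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n\big| < \varepsilon n/2$ for large $\ell$ w.h.p. by Proposition 2.5 and $|V_0(F_{\mathrm{DC},t})| = \alpha_* n + o(n)$ w.h.p. by Proposition 2.8, Proposition 2.7 yields $|V_0(F_{\mathrm{DC},t}) \triangle V_{\mathrm{n},\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p. Therefore, (2.19) implies the assertion.

Case 3: $d_{\min} < d < d_{\mathrm{sat}}$ and $\theta < \theta_{\mathrm{cond}}$: Proposition 2.2 (ii) shows that $\alpha_* < \alpha^*$ in this case. Moreover, Proposition 2.5 yields $\big||V_{\mathrm{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n\big| < \varepsilon n/2$ for large $\ell$ w.h.p., while Proposition 2.8 and Proposition 2.2 (iii) imply that $|V_0(F_{\mathrm{DC},t})| = \alpha_* n + o(n)$ w.h.p.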
Thus, the same steps as in Cases 1–2 complete the proof. Case 4: dmin < d < dsat and θcond < θ < θ∗: Once again Proposition 2.2 (ii) shows thatα∗ <α∗, Proposition 2.5 yields ||Vn,ℓ(F DC,t )|−α∗n| < εn/2 for large ℓ w.h.p., and Proposition 2.8 and Proposition 2.2 (iii) show that |V0(F DC,t )| =α∗n +o(n) w.h.p. Since |Vn,ℓ(F DC,t )| ⊆ V0(F DC,t ) w.h.p., the assertion follows from (2.19) and the fact that α∗ <α∗. 5.4. Proof of Corollary 2.10. Assume first that θ < θcond. Then Corollary 2.9 shows that |V0(F DC,t )△Vn(F DC,t ) = o(n) for large enough ℓ. Since Vn(F DC,t )∩Vf(F DC,t ) =; by construction, the first assertion follows. Now suppose θ > θcond. Then Proposition 2.5 yields ||Vf,ℓ(F DC,t )|−α∗n| < εn/2 for large ℓ w.h.p., while Propo- sition 2.8 and Proposition 2.2 (iii) show that |V0(F DC,t )| =α∗n+o(n) w.h.p. Additionally, Proposition 2.5 shows that |Vu,ℓ(F DC,t )∩V0(F DC,t )| < εn for large ℓ, which implies the assertion. 5.5. Proofs of Theorems 1.2 and 1.3. We begin with the following observation. Lemma 5.7. Let σ ∈ ker(F DC,t ) be uniformly random. For any ℓ> 0 w.h.p. we have P [ σxt+1 = 0 | F DC,t ,σ∂2ℓxt+1 ] = 1 2 ( 1+ 1{xt+1 ∈Vf,ℓ(F DC,t )∪Vn,ℓ(F DC,t )} ) , (5.29) πF DC,t =P [ σxt+1 = 0 | F DC,t ]= 1 2 ( 1+ 1{xt+1 ∈V0(F DC,t )} ) . (5.30) Proof. Notice that for d < dsat the random XORSAT instance F is satisfiable w.h.p.; therefore, so is F DC,t . We begin with the proof of (5.30). The first equality πF DC,t =P [ σxt+1 = 0 | F DC,t ] follows from the fact that the set of solutions of F DC,t is an affine translation of ker(A(F DC,t )). Moreover, the second equality sign follows from the well known fact that the marginal P [ σxt+1 = 0 | F DC,t ] is equal to 1/2 or to 1. Moving on to (5.29), we recall from Lemma 4.1 that the depth-2ℓ neighbourhood ∂≤ℓxt+1 of xt+1 in F DC,t is acyclic w.h.p. 
Furthermore, we can think of P [ σxt+1 = 0 | F DC,t ,σ∂≤ℓxt+1 ] as the marginal probability that xt+1 re- ceives the value zero under a random vector from the kernel of the check matrix of ∂≤ℓxt+1, subject to imposing the values σ∂≤ℓxt+1 upon the variable at distance exactly 2ℓ from xt+1. Let F (ℓ) DC,t signify the XORSAT instance thus obtained. Then we conclude that P [ σxt+1 = 0 | F DC,t ,σ∂≤ℓxt+1 ] = 1 iff xt+1 ∈ V0(F (ℓ) DC,t ). Furthermore, because BP is exact on acyclic factor graphs, we have xt+1 ∈ V0(F (ℓ) DC,t ) iff xt+1 ∈ VV0,ℓ(F DC,t )∪Vn,ℓ(F DC,t ). Thus, we obtain (5.29). □ Proof of Theorem 1.2. We begin with claim (i) concerning d < dmin. As Proposition 2.2 (i) shows, in this case we have α∗ = α∗. Furthermore, Proposition 2.5 shows that ||Vn,ℓ(F DC,t )|−α∗n| < εn and |VV0,ℓ(F DC,t )| < εn for large enough ℓ w.h.p. Moreover, Proposition 2.8 yields |V0(F DC,t )| = α∗n + o(n) w.h.p. Therefore, Proposition 2.7 im- plies that |V0(F DC,t )△Vn,ℓ| < εn w.h.p. for large enough ℓ. Hence, Lemma 5.7 shows that the non-reconstruction property (1.7) holds w.h.p. Similarly, towards the proof of (ii) assume that dmin < d < dsat and θ < θ∗. Then Proposition 2.2 (ii) shows that α∗ = α∗ is the unique (stable) fixed point of φd ,k,λ. Therefore, the argument from the previous paragraph shows that (1.7) holds w.h.p.Further, suppose that dmin < d < dsat and θ > θcond. Then Corollary 2.10 (ii) shows that |(Vn,ℓ(F DC,t )∪Vf,ℓ(F DC,t ))△V0,ℓ(F DC,t )| < εn w.h.p. Therefore, Lemma 5.7 implies non-reconstruction property, and thus the proof of (ii) is complete. 24 Finally, suppose that dmin < d < dsat and θ∗ < θ < θcond. Then Proposition 2.5 shows that ||Vn,ℓ(F DC,t )−α∗n| < εn and |VV0,n|−(α∗−α∗)n| < εn for large enough ℓw.h.p. Moreover, Corollary 2.10 shows that |Vf,n∩V0(F DC,t )| < εn w.h.p. Consequently, Lemma 5.7 demonstrates that the reconstruction condition (1.8) holds w.h.p. □ Proof of Theorem 1.3. 
Part (i) regarding the case $d < d_{\min}$ is an immediate consequence of Fact 2.4 (the equivalence of WP and BP), Corollary 2.9 (i) and Lemma 5.7. The same is true of part (ii) concerning $d_{\min} < d < d_{\mathrm{sat}}$ and $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta^*$. Furthermore, (iii) follows from Corollary 2.9 (ii) and Lemma 5.7. □

6. BELIEF PROPAGATION GUIDED DECIMATION

In this section we prove Theorem 1.1. We begin by arguing that BPGD is actually equivalent to the simple combinatorial Unit Clause Propagation algorithm. Then we prove the 'positive' part, i.e., the formula (1.6) for the success probability for $d < d_{\min}$. Subsequently we prove the second part of the theorem concerning $d_{\min} < d < d_{\mathrm{sat}}$.

6.1. Unit Clause Propagation redux. The simple-minded Unit Clause Propagation algorithm attempts to assign random values to as yet unassigned variables one after the other. After each such random assignment the algorithm pursues the 'obvious' implications of its decisions. Specifically, the algorithm substitutes its chosen truth values for all occurrences of the already assigned variables. If this leaves a clause with only a single unassigned variable, a so-called 'unit clause', the algorithm assigns that variable so as to satisfy the unit clause. If two unit clauses impose opposing values on a variable, the algorithm declares that a conflict has occurred, sets the variable to false and continues; of course, in the event of a conflict the algorithm will ultimately fail to produce a satisfying assignment. The pseudocode for the algorithm is displayed in Algorithm 3.

 1  Let $U = \emptyset$ and let $\sigma_{\mathrm{UC}} : U \to \{0,1\}$ be the empty assignment;
 2  for $t = 0, \ldots, n-1$ do
 3      if $x_{t+1} \notin U$ then
 4          add $x_{t+1}$ to $U$;
 5          choose $\sigma_{\mathrm{UC}}(x_{t+1}) \in \{0,1\}$ uniformly at random;
 6          while $F[\sigma_{\mathrm{UC}}]$ contains a unit clause $a$ do
 7              let $x$ be the variable in $a$;
 8              let $s \in \{0,1\}$ be the truth value that $x$ needs to take to satisfy $a$;
 9              if another unit clause $a'$ exists that requires $x$ be set to $1-s$ then
10                  output 'conflict' and let $\sigma_{\mathrm{UC}}(x) = 0$;
11              else
12                  add $x$ to $U$ and let $\sigma_{\mathrm{UC}}(x) = s$;
13  return $\sigma_{\mathrm{UC}}$;

Algorithm 3: The UCP algorithm.

Let $F_{\mathrm{UC},t}$ denote the simplified formula obtained after the first $t$ iterations (in which the truth values chosen for $x_1, \ldots, x_t$ and any values implied by unit clauses have been substituted). We notice that the values assigned during Steps 6–12 are deterministic consequences of the choices in Step 5. In particular, the order in which unit clauses are processed in Steps 6–12 does not affect the output of the algorithm.

Proposition 6.1. We have
$$\mathbb{P}[\text{BPGD outputs a satisfying assignment of } F] = \mathbb{P}[\text{UCP outputs a satisfying assignment of } F].$$

Proof. We employ the following coupling. Let $\tau \in \{0,1\}^n$ be a uniformly random vector. The BPGD algorithm sets $\sigma_{\mathrm{BP}}(x_{t+1}) = \tau_{t+1}$ if $\mu_{F_{\mathrm{BP},t}} = 1/2$. Analogously, UCP sets $\sigma_{\mathrm{UC}}(x_{t+1}) = \tau_{t+1}$ in Step 5 (if $x_{t+1} \notin U$). Hence, because (1.1) guarantees that the BP marginals $\mu_{F_{\mathrm{BP},t}}$ are half-integral, the coupling ensures that the "free steps" of the two algorithms pick the same truth values. We now proceed by induction on $0 \le t \le n$ to prove the following two statements.

UCP1: unless UCP encountered a conflict before time $t$ we have $\sigma_{\mathrm{BP}}(x_i) = \sigma_{\mathrm{UC}}(x_i)$ for $i = 1, \ldots, t$.
UCP2: if $t < n$ and there has been no conflict before time $t$ we have $\mu_{F_{\mathrm{BP},t+1}} = 1/2$ iff $x_{t+1} \notin U$.

For $t = 0$ both of these statements are clearly correct because $\mu_{F_{\mathrm{BP},0}} = 1/2$ and $x_1 \notin U$. Now assume that UCP1–UCP2 hold at time $t-1$ and that no conflict has occurred yet. Then we already know that $\sigma_{\mathrm{BP}}(x_i) = \sigma_{\mathrm{UC}}(x_i)$ for $i = 1, \ldots, t-1$. Furthermore, since UCP2 is correct at time $t-1$ we have $\mu_{F_{\mathrm{BP},t}} = 1/2$ iff $x_t \notin U$.
Consequently, if $x_t \notin U$ then $\sigma_{\mathrm{BP}}(x_t) = \sigma_{\mathrm{UC}}(x_t)$. Hence, suppose that $x_t \in U$ and thus $\mu_{F_{\mathrm{BP},t}} \in \{0,1\}$. Then given $\sigma_{\mathrm{BP}}(x_1) = \sigma_{\mathrm{UC}}(x_1), \ldots, \sigma_{\mathrm{BP}}(x_{t-1}) = \sigma_{\mathrm{UC}}(x_{t-1})$ the value $\sigma_{\mathrm{UC}}(x_t)$ is implied by unit clause propagation. But a glimpse at the BP update rules (2.7)–(2.8) shows that these encompass the unit clause rule. Specifically, if $x$ is the only remaining variable in clause $a$, then (2.7) ensures that the message from $a$ to $x$ gives probability one to the value that satisfies clause $a$. Therefore, the definition (2.9) of the BP marginal demonstrates that $\mu_{F_{\mathrm{BP},t}} = \sigma_{\mathrm{UC}}(x_t)$ and thus $\sigma_{\mathrm{BP}}(x_t) = \sigma_{\mathrm{UC}}(x_t)$. Thus, UCP1 continues to hold for $t$.

Similar reasoning yields UCP2. Indeed, revisiting (2.7), we see that the BP message that clause $a$ sends to variable $x$ equals $1/2$ unless $a$ is a unit clause. In effect, (2.9) shows that the BP marginal $\mu_{F_{\mathrm{BP},t+1}}$ is equal to $1/2$ unless the value of $x_{t+1}$ is implied by the unit clause rule. This completes the induction.

To complete the proof assume that UCP manages to find a satisfying assignment. Then UCP1 applied to $t = n$ demonstrates that BPGD outputs the very same satisfying assignment. Conversely, if UCP encounters a conflict at some time $t$, then UCP1 shows that BPGD chose the same assignment up to time $t$. Therefore, it is not possible to extend the partial assignment $\sigma_{\mathrm{BP}}(x_1), \ldots, \sigma_{\mathrm{BP}}(x_t)$ to a satisfying assignment of $F$ and thus BPGD will ultimately fail to output a satisfying assignment. □

In light of Proposition 6.1 we are left to study the success probability of UCP. The following two subsections deal with this task for $d < d_{\min}$ and $d > d_{\min}$, respectively.

6.2. The success probability of UCP for $d < d_{\min}$. We continue to denote by $F_{\mathrm{UC},t}$ the sub-formula obtained after the first $t$ iterations of UCP. Let $V(t) \subseteq \{x_{t+1}, \ldots, x_n\}$ be the set of variables of $F_{\mathrm{UC},t}$. Thus, $V(t)$ contains those variables among $x_{t+1}, \ldots, x_n$ whose values are not implied by the assignment of $x_1, \ldots, x_t$ via unit clauses.
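To make the process that produces $F_{\mathrm{UC},t}$ and $V(t)$ concrete, the following is a minimal Python sketch of unit clause propagation on a XORSAT formula. It is an illustration under stated assumptions rather than the algorithm as analysed here: a clause is encoded as a pair (list of distinct variable indices, right-hand side bit) meaning that the XOR of the listed variables must equal the bit, every clause has at least two variables, and the function name `ucp` is hypothetical. On a conflict the sketch keeps the value implied by the first unit clause instead of resetting the variable to 0 as in Step 10; either way the run is counted as failed.

```python
import random
from collections import deque


def ucp(n, clauses, rng=None):
    """Unit clause propagation on XOR clauses (vars, rhs): XOR of vars == rhs.

    Returns the assignment and the number of violated clauses (conflicts).
    Assumes every clause contains at least two distinct variables.
    """
    rng = rng or random.Random(0)
    sigma = [None] * n                          # None = still unassigned
    remaining = [len(vs) for vs, _ in clauses]  # unassigned variables per clause
    parity = [rhs for _, rhs in clauses]        # residual right-hand side
    occurs = [[] for _ in range(n)]
    for c, (vs, _) in enumerate(clauses):
        for v in vs:
            occurs[v].append(c)
    conflicts = 0
    queue = deque()                             # clauses that became unit

    def assign(x, value):
        nonlocal conflicts
        sigma[x] = value
        for c in occurs[x]:
            remaining[c] -= 1
            parity[c] ^= value
            if remaining[c] == 1:
                queue.append(c)
            elif remaining[c] == 0 and parity[c] == 1:
                conflicts += 1                  # clause fully assigned but violated

    for x in range(n):                          # x plays the role of x_{t+1}
        if sigma[x] is not None:
            continue                            # value already implied earlier
        assign(x, rng.randrange(2))             # free step (Step 5)
        while queue:                            # forced steps (Steps 6-12)
            c = queue.popleft()
            if remaining[c] != 1:
                continue                        # clause was completed meanwhile
            y = next(v for v in clauses[c][0] if sigma[v] is None)
            assign(y, parity[c])                # the value that satisfies c
    return sigma, conflicts
```

For instance, on the two clauses `([0, 1], 1)` and `([1, 2], 0)` a single free step on variable 0 forces the other two variables, whereas the pair `([0, 1], 1)`, `([0, 1], 0)` is contradictory and always produces a conflict.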
Also let $C(t)$ be the set of clauses of $F_{\mathrm{UC},t}$; these clauses contain variables from $V(t)$ only, and each clause contains at least two variables. Let $\bar V(t) = V_n \setminus V(t)$ be the set of assigned variables. Thus, after its first $t$ iterations UCP has constructed an assignment $\sigma_{\mathrm{UC}} : \bar V(t) \to \{0,1\}$. Moreover, let $V'(t+1) = V(t) \setminus V(t+1)$ be the set of variables that receive values in the course of iteration $t+1$ for $0 \le t < n$. Additionally, let $C'(t+1)$ be the set of clauses of $F_{\mathrm{UC},t}$ that consist of variables from $V'(t+1)$ only. Finally, let $F'_{\mathrm{UC},t+1}$ be the formula comprising the variables $V'(t+1)$ and the clauses $C'(t+1)$.

To characterise the distribution of $F_{\mathrm{UC},t}$ let $n(t) = |V(t)|$ and let $m_\ell(t)$ be the number of clauses of length $\ell$, i.e., clauses that contain precisely $\ell$ variables from $V(t)$. Observe that $m_1(t) = 0$ because unit clauses get eliminated. Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by $n(t)$ and $(m_\ell(t))_{2 \le \ell \le k}$.

Fact 6.2. The XORSAT formula $F_{\mathrm{UC},t}$ is uniformly random given $\mathcal{F}_t$. In other words, the variables that appear in each clause are uniformly random and independent, as are their signs.

Proof. This follows from the principle of deferred decisions. □

We proceed to estimate the random variables $n(t), m_\ell(t)$. Let $\alpha(t) = |\bar V(t)|/n$ so that $n(t) = n(1 - \alpha(t))$. Let $\lambda = \lambda(\theta) = -\log(1-\theta)$ with $\theta \sim t/n$ and recall that $\alpha_* = \alpha_*(d,k,\lambda)$ denotes the smallest fixed point of $\varphi_{d,k,\lambda}$. The proof of the following proposition can be found in Section 6.2.1.

Proposition 6.3. Suppose that $d < d_{\min}(k)$. There exists a function $\delta = \delta(n) = o(1)$ such that for all $0 \le t < n$ and all $2 \le \ell \le k$ we have
$$\mathbb{P}\big[|\alpha(t) - \alpha_*| > \delta\big] = O(n^{-2}), \qquad \mathbb{P}\left[\left|m_\ell(t) - \frac{dn}{k}\binom{k}{\ell}(1-\alpha_*)^{\ell}\alpha_*^{k-\ell}\right| > \delta n\right] = O(n^{-2}). \tag{6.1}$$

Proposition 6.3 paves the way for the actual computation of the success probability of UCP. Let $R_t$ be the event that a conflict occurs in iteration $t$. The following proposition gives us the correct value of $\mathbb{P}[R_t \mid \mathcal{F}_t]$ w.h.p.
Since Ft is a random variable the value for the probability P [Rt |Ft ] is random as well. Proposition 6.4. Fix ε> 0, let 0 ≤ t < (1−ε)n and define fn(t ) = d(k −1)(1−α∗)αk−2 ∗ . (6.2) 26 Then with probability 1−o(1/n) we have P [Rt |Ft ] = fn(t )2 4(n − t )(1− fn(t ))2 +o(1/n). The proof of Proposition 6.4 can be found in Section 6.2.2. Moreover, in Section 6.2.3 we prove the following. Proposition 6.5. Fix ε> 0 and ℓ≥ 1. For any 0 ≤ t1 < ·· · < tℓ < (1−ε)n we have P [ ℓ⋂ i=1 Rti ] ∼ ℓ∏ i=1 fn(ti )2 4(n − ti )(1− fn(ti ))2 . (6.3) Finally, the following statement deals with the εn final steps of the algorithm. Proposition 6.6. For any δ> 0 there exists ε> 0 such that P [⋃ (1−ε)n 0, fix a small enough ε = ε(δ) > 0 and let R = ∑n−1 t=0 1{Rt } be the total number of times at which conflicts occur. Proposition 6.1 shows that the probability that BPGD succeeds equals P [R = 0]. In order to calculateP [R = 0], let Rε = ∑ 0≤t≤(1−ε)n 1{Rt } be the number of failures before time (1−ε)n. Proposition 6.5 shows that for any fixed ℓ≥ 1 we have E [ ℓ∏ i=1 (Rε− i +1) ] = ℓ! ∑ 0≤t1<··· Rε] < δ. (6.7) Thus, the assertion follows from (6.5)–(6.7) upon taking the limit δ→ 0. □ 6.2.1. Proof of Proposition 6.3. The proof of Proposition 6.3 is based on the method of differential equations. Specifically, based on Fact 6.2 we derive a system of ODEs that track the random variables α(t ),m2(t ), . . . ,mk (t ). We will then identify the unique solution to this system. As a first step we work out the conditional expectations of α(t +1),m2(t +1), . . . ,mk (t +1) given Ft . Lemma 6.7. If 2m2(t )/n(t ) < 1−Ω(1) and n(t ) =Ω(n), then E [n(t )−n(t +1) |Ft ] = n(t )2 (n − t )(n(t )−2m2(t )) +o(1), (6.8) E [mℓ(t +1)−mℓ(t ) |Ft ] = n(t )2 (n − t )(n(t )−2m2(t )) · (ℓ+1)mℓ+1(t )−ℓmℓ(t ) n(t ) +o(1) (2 ≤ ℓ< k), (6.9) E [mk (t +1) |Ft ] =− n(t )2 (n − t )(n(t )−2m2(t )) · kmk (t ) n(t ) +o(1). (6.10) 27 Proof. 
Going from time $t$ to time $t+1$ involves the express assignment of variable $x_{t+1}$, unless it had already been assigned a value due to previous decisions, and the subsequent pursuit of unit clause implications. The probability given $\mathcal{F}_t$ that $x_{t+1}$ was set in a previous iteration equals
$$q_{t+1} = 1 - \frac{n(t)}{n-t}. \tag{6.11}$$
Indeed, the first $t$ iterations assigned values to a total of $n - n(t)$ variables, including $x_1, \ldots, x_t$, and Fact 6.2 shows that the identities of the assigned variables among $x_{t+1}, \ldots, x_n$ are random.

Let $Q_{t+1}$ be the event that $x_{t+1}$ was not assigned previously. Given $Q_{t+1}$ we need to pursue unit clause implications. To this end, recall the bipartite graph representation $G(F_{\mathrm{UC},t})$ of the formula $F_{\mathrm{UC},t}$. Let $G_2(F_{\mathrm{UC},t})$ be the subgraph of $G(F_{\mathrm{UC},t})$ obtained by removing all clauses of length greater than two. Then Fact 6.2 shows that $G_2(F_{\mathrm{UC},t})$ is a uniformly random bipartite graph with $n(t)$ nodes on one side and $m_2(t)$ nodes of degree two on the other side. Furthermore, the number of variables whose values are implied by unit clause propagation is lower bounded by the number of variable nodes in the component of $x_{t+1}$ in $G_2(F_{\mathrm{UC},t})$. The expected size of this component can be computed as the expected progeny of a branching process with offspring distribution $\mathrm{Po}(2m_2(t)/n(t))$. As is well known, under the assumption $2m_2(t)/n(t) < 1 - \Omega(1)$, i.e., when the branching process is sub-critical, the expected progeny comes to $(1 - 2m_2(t)/n(t))^{-1}$. Hence, we obtain
$$\mathbb{E}\big[n(\alpha(t+1) - \alpha(t)) \mid \mathcal{F}_t\big] \ge \frac{1 - q_{t+1}}{1 - 2m_2(t)/n(t)}. \tag{6.12}$$
Strictly speaking, (6.12) only gives a lower bound on $\mathbb{E}[n(\alpha(t+1) - \alpha(t)) \mid \mathcal{F}_t]$ because additional unit clause implications could arise from clauses of length greater than two. However, for this to happen a clause would have to contain at least two variables that are set in iteration $t+1$ (i.e., either $x_{t+1}$ itself or a variable whose value is implied due to unit clause propagation).
But since $2m_2(t)/n(t) < 1 - \Omega(1)$, the expected number of such implications is bounded, and thus the expected number of longer clauses that turn into unit clauses is of order $O(1/n)$. Consequently, the lower bound (6.12) is tight up to an $O(1/n)$ error term, whence we obtain (6.8).

Moving on to (6.9)–(6.10) we notice that for $2 \le \ell < k$ there are two ways in which the number of clauses of length $\ell$ can change from iteration $t$ to iteration $t+1$. First, it could be that clauses of length $\ell$ contain one variable that gets a value assigned. Any such clauses shorten to length $\ell - 1$ (if $\ell > 2$) or become unit clauses and subsequently disappear ($\ell = 2$). In light of Fact 6.2, the probability that a given clause of length $\ell$ suffers this fate comes to $\ell(n(t) - n(t+1))/n(t) + o(1)$. Conversely, if $\ell < k$ additional clauses of length $\ell$ may result from the shortening of clauses of length $\ell + 1$. Analogously to the previous computation, the probability that a given clause of length $\ell + 1$ shortens to length $\ell$ comes to $(\ell+1)(n(t) - n(t+1))/n(t) + o(1)$. Of course, there could also be clauses that contain more than one variable that receives a value during iteration $t+1$. However, the probability of this event is of order $O(1/n^2)$. Hence, (6.8) implies (6.9) and (6.10). □

Lemma 6.7 puts us in a position to derive a system of ODEs to track the random variables $n(t), m_2(t), \ldots, m_k(t)$. Specifically, we obtain the following.

Corollary 6.8. Let $\mathfrak{n}, \mathfrak{m}_2, \ldots, \mathfrak{m}_k : [0,1] \to \mathbb{R}$ be continuously differentiable functions such that
$$\mathfrak{n}(0) = 1, \qquad \mathfrak{m}_k(0) = \frac{d}{k}, \tag{6.13}$$
$$\frac{\partial \mathfrak{n}}{\partial\theta} = -\frac{\mathfrak{n}^2}{(1-\theta)(\mathfrak{n} - 2\mathfrak{m}_2)}, \tag{6.14}$$
$$\frac{\partial \mathfrak{m}_\ell}{\partial\theta} = \frac{\mathfrak{n}\big((\ell+1)\mathfrak{m}_{\ell+1} - \ell\,\mathfrak{m}_\ell\big)}{(1-\theta)(\mathfrak{n} - 2\mathfrak{m}_2)} \quad (2 \le \ell < k), \qquad \frac{\partial \mathfrak{m}_k}{\partial\theta} = -\frac{k\,\mathfrak{n}\,\mathfrak{m}_k}{(1-\theta)(\mathfrak{n} - 2\mathfrak{m}_2)}. \tag{6.15}$$
Assume, furthermore, that
$$\sup_{\theta \in [0,1]} 2\mathfrak{m}_2(\theta)/\mathfrak{n}(\theta) < 1. \tag{6.16}$$
Then with probability $1 - o(n^{-2})$ for all $0 \le t \le n$ we have
$$n(t)/n = \mathfrak{n}(t/n) + o(1), \qquad m_\ell(t)/n = \mathfrak{m}_\ell(t/n) + o(1) \quad (2 \le \ell \le k).$$

Proof. This follows from Lemma 6.7 in combination with [26, Theorem 2]. □

As a next step we construct an explicit solution to the system (6.13)–(6.15).
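Before turning to the explicit solution, the system (6.13)–(6.15) can be sanity-checked numerically. The sketch below integrates the ODEs with a classical Runge–Kutta scheme and compares the result with the closed form (6.17). Several ingredients are assumptions made for illustration only: the map is taken to be $\varphi_{d,k,\lambda}(\alpha) = 1 - \exp(-\lambda - d\alpha^{k-1})$, as read off from (5.18)–(5.19); the initial values $\mathfrak{m}_\ell(0) = 0$ for $2 \le \ell < k$ are not stated in (6.13) but are what (6.17) gives at $\theta = 0$; the function names are hypothetical; and the parameters $d = 1$, $k = 3$ are chosen only because then $d(k-1)(1-\alpha)\alpha^{k-2} \le 1/2$ for every $\alpha$, so that (6.16) holds trivially.

```python
import math


def alpha_star(d, k, lam, iters=500):
    """Smallest fixed point of a -> 1 - exp(-lam - d*a**(k-1)), by iteration from 0."""
    a = 0.0
    for _ in range(iters):
        a = 1.0 - math.exp(-lam - d * a ** (k - 1))
    return a


def rhs(theta, y, k):
    """Right-hand sides of (6.14)-(6.15) for the state y = [n, m_2, ..., m_k]."""
    n = y[0]
    m = {l: y[l - 1] for l in range(2, k + 1)}
    c = n / ((1.0 - theta) * (n - 2.0 * m[2]))  # common prefactor
    out = [-c * n]
    for l in range(2, k):
        out.append(c * ((l + 1) * m[l + 1] - l * m[l]))
    out.append(-c * k * m[k])
    return out


def integrate(d, k, theta_end, steps=2000):
    """Classical RK4 from theta = 0 with n(0) = 1, m_k(0) = d/k, other m_l(0) = 0."""
    y = [1.0] + [0.0] * (k - 2) + [d / k]
    h = theta_end / steps
    theta = 0.0
    for _ in range(steps):
        k1 = rhs(theta, y, k)
        k2 = rhs(theta + h / 2, [a + h / 2 * b for a, b in zip(y, k1)], k)
        k3 = rhs(theta + h / 2, [a + h / 2 * b for a, b in zip(y, k2)], k)
        k4 = rhs(theta + h, [a + h * b for a, b in zip(y, k3)], k)
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        theta += h
    return y


def closed_form(d, k, theta):
    """The candidate solution (6.17) evaluated at theta."""
    a = alpha_star(d, k, -math.log(1.0 - theta))
    return [1.0 - a] + [d / k * math.comb(k, l) * (1 - a) ** l * a ** (k - l)
                        for l in range(2, k + 1)]
```

Comparing `integrate(1.0, 3, 0.5)` with `closed_form(1.0, 3, 0.5)` componentwise gives a quick numerical cross-check of the computation carried out analytically in Lemma 6.9.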
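Lemma 6.9. If $d < d_{\min}$, then the functions
$$\mathfrak{n}^*(\theta) = 1 - \alpha_*(\lambda(\theta)), \qquad \mathfrak{m}^*_\ell(\theta) = \frac{d}{k}\binom{k}{\ell}\big(1 - \alpha_*(\lambda(\theta))\big)^{\ell}\alpha_*(\lambda(\theta))^{k-\ell} \tag{6.17}$$
satisfy (6.13)–(6.16).

Proof. The initial condition (6.13) is satisfied because $\alpha_*(\lambda(0)) = 0$. Furthermore, (3.11) shows that
$$\frac{\partial \mathfrak{n}^*}{\partial\theta} = -\frac{\partial\alpha_*}{\partial\lambda}\cdot\frac{\partial\lambda}{\partial\theta} = -\frac{1-\alpha_*}{1 - d(k-1)\alpha_*^{k-2}(1-\alpha_*)}\cdot\frac{1}{1-\theta} = -\frac{\mathfrak{n}^*}{(1-\theta)(1 - 2\mathfrak{m}^*_2/\mathfrak{n}^*)}. \tag{6.18}$$
Hence, (6.14) is satisfied. Furthermore, (6.18) implies that for $2 \le \ell < k$ we have
$$\begin{aligned}
\frac{\partial \mathfrak{m}^*_\ell}{\partial\theta} &= \frac{d}{k}\cdot\frac{\partial\lambda}{\partial\theta}\cdot\frac{\partial\alpha_*}{\partial\lambda}\cdot\binom{k}{\ell}\Big[(k-\ell)\alpha_*^{k-\ell-1}(1-\alpha_*)^{\ell} - \ell\,\alpha_*^{k-\ell}(1-\alpha_*)^{\ell-1}\Big] \\
&= \frac{\mathfrak{n}^*}{(1-\theta)(1 - 2\mathfrak{m}^*_2/\mathfrak{n}^*)}\cdot\frac{d}{k(1-\alpha_*)}\cdot\binom{k}{\ell}\Big[(k-\ell)(1-\alpha_*)^{\ell+1}\alpha_*^{k-\ell-1} - \ell\,\alpha_*^{k-\ell}(1-\alpha_*)^{\ell}\Big] \\
&= \frac{\mathfrak{n}^*}{(1-\theta)(\mathfrak{n}^* - 2\mathfrak{m}^*_2)}\cdot\big[(\ell+1)\mathfrak{m}^*_{\ell+1} - \ell\,\mathfrak{m}^*_\ell\big],
\end{aligned}$$
where the last step uses $(k-\ell)\binom{k}{\ell} = (\ell+1)\binom{k}{\ell+1}$. This is the first part of (6.15). An analogous computation yields the second part of (6.15). Finally, (6.16) follows from (3.11). □

Proof of Proposition 6.3. The proposition is an immediate consequence of Corollary 6.8 and Lemma 6.9. □

6.2.2. Proof of Proposition 6.4. Recall that $F'_{\mathrm{UC},t+1}$ is the XORSAT formula that contains the variables $V'(t+1)$ that get assigned during iteration $t+1$ and the clauses $C'(t+1)$ of $F_{\mathrm{UC},t}$ that contain variables from $V'(t+1)$ only. Also recall that $G(F'_{\mathrm{UC},t+1})$ signifies the graph representation of this XORSAT formula. Unless $V'(t+1) = \emptyset$, the graph $G(F'_{\mathrm{UC},t+1})$ is connected.

Lemma 6.10. Fix $\varepsilon > 0$ and let $0 \le t \le (1-\varepsilon)n$. With probability $1 - o(1/n)$ the graph $G(F'_{\mathrm{UC},t+1})$ satisfies $|E(G(F'_{\mathrm{UC},t+1}))| \le |V(G(F'_{\mathrm{UC},t+1}))|$.

Proof. We recall from the proof of Lemma 6.7 that iteration $t+1$ of UCP can be described by a branching process on the random graph $G(F_{\mathrm{UC},t})$. Given that $x_{t+1}$ is still unassigned, the offspring distribution of the branching process has mean $2m_2(t)/n(t)$. Moreover, Proposition 6.3 shows that with probability $1 - O(n^{-2})$ we have $2m_2(t)/n(t) \sim d(k-1)(1-\alpha_*)\alpha_*^{k-2} < 1$ (as $d < d_{\min}$). Hence, the branching process is sub-critical. As a consequence, with probability $1 - O(n^{-2})$ we have $\mathbb{P}\big[|V(G(F'_{\mathrm{UC},t+1}))| \ge \log^2 n\big] = O(n^{-2}).$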
(6.19) Each step of the branching process corresponds to pursuing the unit clause implications of assigning a truth value to a single variable $x$. A cycle in $G(F'_{\mathrm{UC},t+1})$ can only ensue if a clause that contains $x$ also contains a variable that has already been set previously during iteration $t+1$. In light of (6.19), with probability $1 - O(n^{-2})$ there are no more than $\log^2 n$ such variables. Hence, the probability that the assignment of $x$ closes a cycle is of order $O(\log^2 n/n)$. Additionally, by the principle of deferred decisions the probability that two different clauses processed by unit clause propagation both close cycles is of order $O(\log^4 n/n^2)$. Finally, since by (6.19) we may assume that the total number of clauses does not exceed $O(\log^2 n)$, we conclude that
$$\mathbb{P}\big[|E(G(F'_{\mathrm{UC},t+1}))| > |V(G(F'_{\mathrm{UC},t+1}))|\big] = O(\log^6 n/n^2) = o(1/n),$$
as desired. □

Thus, with probability $1 - o(1/n)$ the graph $G(F'_{\mathrm{UC},t+1})$ contains at most one cycle. While it is easy to check that no conflict occurs in iteration $t+1$ if $G(F'_{\mathrm{UC},t+1})$ is acyclic, in the case that $G(F'_{\mathrm{UC},t+1})$ contains a single cycle there is a chance of a conflict. The following definition describes the type of cycle that poses an obstacle.

Definition 6.11. For a XORSAT formula $F$ we call a sequence of variables and clauses $\mathcal{C} = (v_1, c_1, \ldots, v_\ell, c_\ell, v_{\ell+1} = v_1)$ a toxic cycle of length $\ell$ if

TOX1: $c_i$ contains the variables $v_i, v_{i+1}$ only, and
TOX2: the total number of negations in $c_1, \ldots, c_\ell$ is odd iff $\ell$ is even.

Lemma 6.12. (i) If $F'_{\mathrm{UC},t+1}$ contains a toxic cycle, then a conflict occurs in iteration $t+1$. (ii) If $F'_{\mathrm{UC},t+1}$ contains no toxic cycle and $|E(G(F'_{\mathrm{UC},t+1}))| \le |V(G(F'_{\mathrm{UC},t+1}))|$, then no conflict occurs in iteration $t+1$.

Proof. Towards (i) we show that $F'_{\mathrm{UC},t+1}$ is not satisfiable if there is a toxic cycle $\mathcal{C} = (v_1, c_1, \ldots, c_\ell, v_{\ell+1} = v_1)$; then UCP will, of course, run into a contradiction. To see that $F'_{\mathrm{UC},t+1}$ is unsatisfiable, we transform each of the clauses $c_1, \ldots, c_\ell$ into a linear equation $c_i \equiv (v_i + v_{i+1} = y_i)$ over $\mathbb{F}_2$. Here $y_i \in \mathbb{F}_2$ equals 1 iff $c_i$ contains an even number of negations. Since every variable of the cycle appears in exactly two of these equations, adding them up yields $\sum_{i=1}^{\ell} y_i = 0$ in $\mathbb{F}_2$. This condition is violated if $\mathcal{C}$ is toxic.

Let us move on to (ii). Assume for contradiction that there exists a formula $F$ without a toxic cycle such that $|E(G(F))| \le |V(G(F))|$ and such that given $F'_{\mathrm{UC},t+1} = F$, UCP may run into a conflict. Consider such a formula $F$ that minimises $|V(F)| + |C(F)|$. Since UCP succeeds on acyclic $F$, we have $|V(G(F))| = |E(G(F))|$. Thus, $G(F)$ contains a single cycle $\mathcal{C} = (v_1, c_1, \ldots, v_\ell, c_\ell, v_{\ell+1} = v_1)$. Apart from the cycle, $F$ contains (possibly empty) acyclic formulas $F'_1, \ldots, F'_\ell$ attached to $v_1, \ldots, v_\ell$ and $F''_1, \ldots, F''_\ell$ attached to $c_1, \ldots, c_\ell$. The formulas $F'_1, F''_1, \ldots, F'_\ell, F''_\ell$ are mutually disjoint and do not contain unit clauses.

We claim that $F'_1, \ldots, F'_\ell$ are empty because $|V(F)| + |C(F)|$ is minimum. This is because given any truth assignment of $v_1, \ldots, v_\ell$, UCP will find a satisfying assignment of the acyclic formulas $F'_1, \ldots, F'_\ell$. Further, assume that one of the formulas $F''_1, \ldots, F''_\ell$ is non-empty; say, $F''_1$ is non-empty. If the start variable that UCP assigns were to belong to $F''_1$, then $c_1$, containing $v_1$ and $v_2$, would not shrink to a unit clause, and thus UCP would not assign values to these variables. Hence, UCP starts by assigning a truth value to one of the variables $v_1, \ldots, v_\ell$; say, UCP starts with $v_1$. We claim that then UCP does not run into a conflict. Indeed, the clauses $c_2, \ldots, c_\ell$ may force UCP to assign truth values to $v_2, \ldots, v_\ell$, but no conflict can ensue because UCP will ultimately satisfy $c_1$ by assigning appropriate truth values to the variables of $F''_1$.

Thus, we may finally assume that all of $F'_1, F''_1, \ldots, F'_\ell, F''_\ell$ are empty. In other words, $F$ consists of the cycle $\mathcal{C}$ only. Since $\mathcal{C}$ is not toxic, TOX2 does not hold.
Consequently, UCP will construct an assignment that satisfies all clauses $c_1, \ldots, c_\ell$. This final contradiction implies (ii). □

Corollary 6.13. Fix $\varepsilon > 0$ and let $0 \le t \le (1-\varepsilon)n$. Then
$$\mathbb{P}[R_{t+1}] = \mathbb{P}\big[F'_{\mathrm{UC},t+1} \text{ contains a toxic cycle}\big] + o(1/n).$$

Proof. This is an immediate consequence of Lemma 6.10 and Lemma 6.12. □

Thus, we are left to calculate the probability that $F'_{\mathrm{UC},t+1}$ contains a toxic cycle. To this end, we estimate the number of toxic cycles in the 'big' formula $F_{\mathrm{UC},t}$. Let $T_t(\ell)$ be the number of toxic cycles of length $\ell$ in $F_{\mathrm{UC},t}$.

Lemma 6.14. Fix $\varepsilon > 0$ and let $1 \le t \le (1-\varepsilon)n$.
(i) For any fixed $\ell$, with probability $1 - O(n^{-2})$ we have $\mathbb{E}[T_t(\ell) \mid \mathcal{F}_t] = \beta_\ell + o(1)$, where
$$\beta_\ell = \frac{1}{4\ell}\big(d(k-1)(1-\alpha_*)\alpha_*^{k-2}\big)^{\ell} = \frac{1}{4\ell}\big(f_n(t)\big)^{\ell}.$$
(ii) For any $1 \le \ell \le n$, with probability $1 - O(n^{-2})$ we have $\mathbb{E}[T_t(\ell) \mid \mathcal{F}_t] \le \beta_\ell \exp(\varepsilon\ell)$.

Proof. In light of Fact 6.2, the calculation of the expected number of toxic cycles is straightforward. Indeed, we just need to pick sequences of $\ell$ distinct variables and clauses, place the variables into the clauses in a cyclic fashion, and multiply by the probability that the clauses contain no other variables and that the parity of the signs of the clauses works out as per TOX2. Of course, in this way we overcount toxic cycles $2\ell$ times (due to the choice of the starting point and the orientation). Hence, we obtain
$$\mathbb{E}[T_t(\ell) \mid \mathcal{F}_t] = \frac{(n)_\ell (m)_\ell}{4\ell\, n^{2\ell}}\,(k(k-1))^{\ell}\,(1-\alpha(t))^{\ell}\,\alpha(t)^{\ell(k-2)}. \tag{6.20}$$
Thus, (i) follows from (6.20) and Proposition 6.3. Further, (6.20) demonstrates that
$$\mathbb{E}[T_t(\ell) \mid \mathcal{F}_t] \le \frac{1}{4\ell}\big(d(k-1)(1-\alpha(t))\alpha(t)^{k-2}\big)^{\ell}. \tag{6.21}$$
Finally, combining (6.21) with Proposition 6.3, we obtain (ii). □

Proof of Proposition 6.4. In light of Corollary 6.13 we just need to calculate the probability that $F'_{\mathrm{UC},t+1}$ contains a toxic cycle. Clearly, if during iteration $t+1$ UCP encounters a variable of $F_{\mathrm{UC},t}$ that lies on a toxic cycle, UCP will proceed to add the entire toxic cycle to $F'_{\mathrm{UC},t+1}$ (and run into a contradiction).
Furthermore, Lemma 6.14 shows that with probability 1−O(n−2) given Ft the probability that a random variable of F UC,t belongs to a toxic cycle comes to β̄= ∑ ℓ≥2 ℓβℓ+o(1) = ∑ ℓ≥2 1 4 ( fn(t ) )ℓ = fn(t )2 4(1− fn(t )) +o(1) =O(1). (6.22) We now use (6.22) to calculate the desired probability of encountering a toxic cycle. To this end we recall from the proof of Lemma 6.7 that the (t +1)-st iteration of UCP corresponds to a branching process with expected off- spring fn(t ), unless the root variable xt+1 has already been assigned. Due to (6.11) and Proposition 6.3, with proba- bility 1−O(n−2) the conditional probability of this latter event equals (nα∗−t )/(n−t )+o(1). Further, given that the root variable has not been assigned previously, the expected progeny of the branching process, i.e., the expected number of variables in F ′ UC,t+1, equals 1/(1− fn(t ))+o(1). Since with probability 1−O(n−2) given Ft there remain n(t ) = (1−α∗+o(1))n unassigned variables in total, (6.22) implies that with probability 1−o(1/n), P [Rt+1 |Ft ] ∼ β̄ (1−α∗)n · 1−α∗ 1− t/n · 1 1− fn(t ) = fn(t )2 4(1− fn(t ))2(n − t ) +o(1/n), as claimed. □ 6.2.3. Proof of Proposition 6.5. We combine Fact 6.2 with the tower rule. Specifically, let 0 ≤ t1 < ·· · < th < (1−ε)n be distinct time indices. Then repeated application of the tower rule gives P [ h⋂ i=1 Rti ] = E [ h∏ i=1 1 { Rti } ] = E [ E [ h∏ i=1 1 { Rti } |Fti−1 ]] = E [( h−1∏ i=1 1 { Rti } ) P [ Rth |Fth−1 ] ] = ·· · = E [ h∏ i=1 P [ Rti |Fti−1 ] ] . (6.23) Furthermore, Proposition 6.4 shows that with probability 1−o(1/n), P [ Rti |Fti−1 ]= fn(ti )2 4(n − ti )(1− fn(ti ))2 +o(1/n) for all 1 ≤ i ≤ h. (6.24) Combining (6.23)–(6.24) completes the proof. 6.2.4. Proof of Proposition 6.6. Given δ> 0 pick ε> 0 small enough and let t = ⌈(1−ε)n⌉. We are going to show that the graph G(F UC,t ) is acyclic with probability at least 1−δ. 
Since all clauses of $F_{\mathrm{UC},t}$ contain at least two variables, UCP will find a satisfying assignment if $G(F_{\mathrm{UC},t})$ is acyclic. To show that $G(F_{\mathrm{UC},t})$ is acyclic, we observe that $\alpha_* \ge t/n$. Hence, $\alpha_*$ approaches one as $t/n \to 1$. Further, Fact 6.2 shows that $G(F_{\mathrm{UC},t})$ is uniformly random given the degree distribution (6.1) of the clause nodes. Indeed, the expression (6.1) shows that with probability $1 - O(n^{-2})$ the expected size of the second neighbourhood of a given variable node is asymptotically equal to
$$\gamma = \gamma(\varepsilon) = \frac{1}{(1-\alpha_*)n}\cdot\frac{dn}{k}\sum_{\ell=2}^{k}\ell\binom{k}{\ell}(1-\alpha_*)^{\ell}\alpha_*^{k-\ell} = d\big(1 - \alpha_*^{k-1}\big).$$
Hence, as $\lim_{\varepsilon\to 0}\gamma = 0$, the average degree of the random graph $G(F_{\mathrm{UC},t})$ tends to zero as $\varepsilon \to 0$. Therefore, for small enough $\varepsilon > 0$ the random graph $G(F_{\mathrm{UC},t})$ is acyclic with probability greater than $1 - \delta$.

6.3. Failure of UCP for $d_{\min} < d < d_{\mathrm{sat}}$. In this section we assume that $d_{\min} < d < d_{\mathrm{sat}}$. As in Section 6.2 we are going to trace UCP via the method of differential equations. In particular, we keep the notation from Section 6.2. Thus, $n(t)$ signifies the number of unassigned variables after $t$ iterations, and $m_\ell(t)$ denotes the number of clauses that contain precisely $2 \le \ell \le k$ unassigned variables. Moreover, $F_{\mathrm{UC},t}$ is the formula comprising these variables and clauses. The following statement is the analogue of Proposition 6.3 for $d_{\min} < d < d_{\mathrm{sat}}$. Its proof relies on similar arguments as the proof of Proposition 6.3.

Proposition 6.15. Suppose that $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, fix $\varepsilon, \delta > 0$ and let $0 < t < (1-\varepsilon)\theta^* n$. Then (6.1) holds with probability $1 - O(n^{-2})$.

Proof. The formulas (6.8)–(6.10) for the conditional expected changes $n(t+1) - n(t)$, $m_\ell(t+1) - m_\ell(t)$ continue to hold for $d_{\min} < d < d_{\mathrm{sat}}$, so long as we assume that $2m_2(t)/n(t) < 1 - \Omega(1)$ and $n(t) = \Omega(n)$. Indeed, the proof of Lemma 6.7 only hinges on these assumptions on $n(t), m_2(t)$, irrespective of $d$. Hence, if $\mathfrak{n}, \mathfrak{m}_2, \ldots, \mathfrak{m}_k : [0, \theta^* - \delta] \to \mathbb{R}$ are functions that satisfy the conditions (6.13)–(6.15) and that satisfy
$$\sup_{\theta \in [0, \theta^* - \delta]} 2\mathfrak{m}_2(\theta)/\mathfrak{n}(\theta) < 1, \tag{6.25}$$
then [26, Theorem 2] implies that for all $0 \le t < (1-\delta)\theta^* n$ we have
$$n(t)/n = \mathfrak{n}(t/n) + o(1), \qquad m_\ell(t)/n = \mathfrak{m}_\ell(t/n) + o(1) \quad (2 \le \ell \le k).$$
Finally, we claim that the functions $\mathfrak{n}^* : [0, \theta^* - \delta] \to \mathbb{R}$, $\mathfrak{m}^*_\ell : [0, \theta^* - \delta] \to \mathbb{R}$ defined by (6.17) satisfy (6.13)–(6.15) and (6.25). In fact, the same manipulations as in the proof of Lemma 6.9 yield (6.13)–(6.15). Additionally, (6.25) follows from Lemma 3.5 (ii) and Proposition 2.2 (ii), which shows that $\alpha_*$ is a stable fixed point and therefore $2\mathfrak{m}^*_2(\theta)/\mathfrak{n}^*(\theta) = d(k-1)(1-\alpha_*)\alpha_*^{k-2} < 1$ for $0 \le \theta \le \theta^* - \delta$. Thus, we obtain (6.1) for $0 \le \theta < \theta^*$. □

Proof of Theorem 1.1 (ii). Let $u_1, \ldots, u_n \in \{0,1\}$ be uniformly distributed, mutually independent and independent of all other randomness. We couple the execution of the decimation process and of the UCP algorithm on a random formula $F$ as follows. At every time $t$ where $\pi_{F_{\mathrm{DC},t}} = 1/2$, the decimation process sets $\sigma_{\mathrm{DC}}(x_{t+1}) = u_{t+1}$. Similarly, whenever UCP executes Step 5 we set $\sigma_{\mathrm{UC}}(x_{t+1}) = u_{t+1}$. Let $\Delta$ be the first time $0 \le t < n$ such that $\sigma_{\mathrm{DC}}(x_{t+1}) \ne \sigma_{\mathrm{UC}}(x_{t+1})$; if $\sigma_{\mathrm{DC}}(x_{t+1}) = \sigma_{\mathrm{UC}}(x_{t+1})$ for all $t$, we set $\Delta = n$.

We claim that UCP encounters a conflict if $\Delta < n$. To see this, assume that $0 \le t < n$ satisfies $\sigma_{\mathrm{DC}}(x_{t+1}) \ne \sigma_{\mathrm{UC}}(x_{t+1})$ but $\sigma_{\mathrm{DC}}(x_{s+1}) = \sigma_{\mathrm{UC}}(x_{s+1})$ for all $0 \le s < t$ and that UCP did not encounter a conflict at any time $s \le t$. Then $\pi_{F_{\mathrm{DC},t}} \in \{0,1\}$ but Step 5 of UCP sets $\sigma_{\mathrm{UC}}(x_{t+1}) = u_{t+1} \ne \sigma_{\mathrm{DC}}(x_{t+1})$. Consequently, $F$ possesses no satisfying assignment $\sigma$ such that $\sigma_{\mathrm{UC}}(x_i) = \sigma(x_i)$ for $1 \le i \le t+1$, and thus UCP will ultimately encounter a conflict.

To complete the proof we claim that $\mathbb{P}[\Delta < n] = 1 - o(1)$. To verify this consider a time $t$ with $(1+\varepsilon)\theta_{\mathrm{cond}} n < t < (1-\varepsilon)\theta^* n$. Then Proposition 2.2 and Proposition 2.8 show that $|V_0(F_{\mathrm{DC},t})| = \alpha^* n + o(n)$ w.h.p., while Proposition 6.15 shows that $\alpha(t) = \alpha_* + o(1)$ w.h.p.
In particular, even if $\Delta \ge (1+\varepsilon)\theta_{\mathrm{cond}} n$, the probability that $\pi_{F_{\mathrm{DC},t}} \in \{0,1\}$ while UCP assigns $x_{t+1}$ randomly is $\Omega(1)$. Therefore, $\Delta < \theta_* n$ w.h.p. □

REFERENCES
[1] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
[2] D. Achlioptas, M. Molloy: The solution space geometry of random linear equations. Random Structures and Algorithms 46 (2015) 197–231.
[3] P. Ayre, A. Coja-Oghlan, P. Gao, N. Müller: The satisfiability threshold for random linear equations. Combinatorica 40 (2020) 179–235.
[4] B. Bollobás: Random graphs. Cambridge University Press (2001).
[5] A. Braunstein, M. Mézard, R. Zecchina: Survey propagation: An algorithm for satisfiability. Random Structures and Algorithms 27 (2005) 201–226.
[6] A. Coja-Oghlan: Belief Propagation fails on random formulas. Journal of the ACM 63 (2017) #49.
[7] A. Coja-Oghlan: A better algorithm for random k-SAT. SIAM J. Computing 39 (2010) 2823–2864.
[8] A. Coja-Oghlan, A. Ergür, P. Gao, S. Hetterich, M. Rolvien: The rank of sparse random matrices. Proc. 31st SODA (2020) 579–591.
[9] A. Coja-Oghlan, P. Gao, M. Hahn-Klimroth, J. Lee, N. Müller, M. Rolvien: The full rank condition for sparse random matrices. Combinatorics, Probability and Computing 33 (2024) 643–707.
[10] A. Coja-Oghlan, A. Pachon-Pinzon: The decimation process in random k-SAT. SIAM Journal on Discrete Mathematics 26 (2012) 1471–1509.
[11] C. Deroulers, R. Monasson: Criticality and universality in the unit-propagation search rule. Eur. Phys. J. B 49 (2006) 339–369.
[12] O. Dubois, J. Mandler: The 3-XORSAT threshold. Proc. 43rd FOCS (2002) 769–778.
[13] A. Frieze, S. Suen: Analysis of two simple heuristics on a random instance of k-SAT. J. Algorithms 20 (1996) 312–355.
[14] D. Gamarnik: The overlap gap property: a topological barrier to optimizing over random structures. PNAS 118 (2021) e2108492118.
[15] S. Hetterich: Analysing survey propagation guided decimation on random formulas. Proc. 43rd ICALP (2016) #65.
[16] M. Ibrahimi, Y. Kanoria, M. Kraning, A. Montanari: The set of solutions of random XORSAT formulae. Annals of Applied Probability 25 (2015) 2743–2808.
[17] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. National Academy of Sciences 104 (2007) 10318–10323.
[18] A. Maier, F. Behrens, L. Zdeborová: Dynamical cavity method for hypergraphs and its application to quenches in the k-XOR-SAT problem. arXiv:2412.14794 (2024).
[19] M. Mézard, A. Montanari: Information, physics and computation. Oxford University Press (2009).
[20] M. Mézard, T. Mora, R. Zecchina: Clustering of solutions in the random satisfiability problem. Phys. Rev. Lett. 94 (2005) 197205.
[21] M. Mézard, G. Parisi, R. Zecchina: Analytic and algorithmic solution of random satisfiability problems. Science 297 (2002) 812–815.
[22] M. Mézard, F. Ricci-Tersenghi, R. Zecchina: Two solutions to diluted p-spin models and XORSAT problems. Journal of Statistical Physics 111 (2003) 505–533.
[23] M. Molloy: Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms 27 (2005) 124–135.
[24] B. Pittel, G. Sorkin: The satisfiability threshold for k-XORSAT. Combinatorics, Probability and Computing 25 (2016) 236–268.
[25] F. Ricci-Tersenghi, G. Semerjian: On the cavity method for decimated random constraint satisfaction problems and the analysis of belief propagation guided decimation algorithms. J. Stat. Mech. (2009) P09001.
[26] N. Wormald: Differential equations for random processes and random graphs. Ann. Appl. Probab. 5 (1995) 1217–1235.
[27] K. Yung: Limits of sequential local algorithms on the random k-XORSAT problem. Proc. 51st ICALP (2024) #123.

ARNAB CHATTERJEE, arnab.chatterjee@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
AMIN COJA-OGHLAN, amin.coja-oghlan@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
MIHYUN KANG, kang@math.tugraz.at, TU GRAZ, INSTITUTE OF DISCRETE MATHEMATICS, STEYRERGASSE 30, 8010 GRAZ, AUSTRIA.
LENA KRIEG, lena.krieg@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
MAURICE ROLVIEN, maurice.rolvien@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.
GREGORY B. SORKIN, g.b.sorkin@lse.ac.uk, THE LONDON SCHOOL OF ECONOMICS AND POLITICAL SCIENCE, DEPARTMENT OF MATHEMATICS, COLUMBIA HOUSE, HOUGHTON ST, LONDON WC2A 2AE, UNITED KINGDOM

THE RANDOM k-SAT GIBBS UNIQUENESS THRESHOLD REVISITED

ARNAB CHATTERJEE, AMIN COJA-OGHLAN, CATHERINE GREENHILL, VINCENT PFENNINGER, MAURICE ROLVIEN, PAVEL ZAKHAROV, KOSTAS ZAMPETAKIS

ABSTRACT. We prove that for any $k \ge 3$ for clause/variable ratios up to the Gibbs uniqueness threshold of the corresponding Galton-Watson tree, the number of satisfying assignments of random k-SAT formulas is given by the ‘replica symmetric solution’ predicted by physics methods [Monasson, Zecchina: Phys. Rev. Lett. 76 (1996)]. Furthermore, while the Gibbs uniqueness threshold is still not known precisely for any $k \ge 3$, we derive new lower bounds on this threshold that improve over prior work [Montanari and Shah: SODA (2007)]. The improvement is significant particularly for small $k$.

MSc: 68Q87, 60C05, 68R07

1. INTRODUCTION

1.1. Background and motivation.

Going back to experimental work from the 1990s, the most prominent question concerning random k-SAT has been to pinpoint the satisfiability threshold, defined as the largest density $m/n$ of clauses $m$ to variables $n$ up to which satisfying assignments likely exist [6, 18]. Currently, the satisfiability threshold is known precisely in the case of $k = 2$ [21, 40] and for $k \ge k_0$ with $k_0$ an undetermined (large) constant [33].
The latter result confirms ‘predictions’ based on an analytic but non-rigorous physics technique called the ‘cavity method’. Indeed, the cavity method predicts the satisfiability threshold for every $k \ge 3$ [51], but random k-SAT for ‘small’ $k \ge 3$ appears to be a particularly hard nut to crack. Additionally, according to the cavity method several phase transitions precede the satisfiability threshold and are expected to impact, among other things, the performance of algorithms [47]. One of these phase transitions, the Gibbs uniqueness transition, pertains to a spatial mixing property that also plays a pivotal role in the computational complexity of counting and sampling [61].

From a statistical physics viewpoint, the satisfiability threshold is only the second most important quantity associated with random k-SAT. The first place firmly belongs to the typical number of satisfying assignments, known as the partition function in physics parlance [50]. All the other predictions, including the location of the satisfiability threshold, ultimately derive from the formula for the number of satisfying assignments or closely related variables [49].

Yet there has been little progress on confirming the physics formula for the number of satisfying assignments rigorously. Three prior contributions stand out. First, a proof technique called the ‘interpolation method’ turns the physics prediction into a rigorous upper bound [35, 41, 60].¹ Second, in the case $k = 2$, conceptually much simpler than $k \ge 3$, the physics formula has been proved correct [3]. Third, Montanari and Shah [56] proved that also for $k \ge 3$ for certain clause/variable densities the ‘replica symmetric solution’ from physics correctly approximates the number of ‘good’ assignments that satisfy all but $o(n)$ clauses. However, it seems difficult to estimate the gap between the number of such ‘good’ assignments and the number of actual satisfying assignments.
A rigorous method to this effect would likely imply the existence of uniform satisfiability thresholds for all $k \ge 3$, thereby resolving a long-standing conundrum [12, 36]. The proof of Montanari and Shah is based on the aforementioned Gibbs uniqueness property.

The aim of the present paper is to determine the number of actual satisfying assignments of random k-SAT formulas for clause/variable densities up to the Gibbs uniqueness threshold. Specifically, we verify that the ‘replica symmetric solution’ from [54, 55] yields the correct answer for any $k \ge 3$ right up to the Gibbs uniqueness threshold, even though the precise value of this threshold is not currently known. Additionally, we derive a new lower bound on the Gibbs uniqueness threshold. The improvement is particularly significant for ‘small’ $k \ge 3$. Combining these two results, we obtain the first rigorous formula for the number of satisfying assignments of random k-SAT formulas for a non-trivial regime of clause/variable densities. Crucially, the result covers meaningful clause/variable densities even for small $k \ge 3$.

¹ Strictly speaking, the contributions [35, 41, 60] deal with the ‘random k-SAT model at positive temperature’, see Section 2.6. In Corollary 2.2 below we combine the interpolation method with a concentration argument to bound the number of actual satisfying assignments.

arXiv:2506.01359v2 [cs.DM] 18 Nov 2025

1.2. Results.

Let $\Phi = \Phi_{d,k}(n)$ be the random k-CNF on $n$ Boolean variables $x_1, \ldots, x_n$ with $m = m_n \sim \mathrm{Po}(dn/k)$ clauses $a_1, \ldots, a_m$. The clauses $a_i$ are drawn independently and uniformly from the set of all $2^k \binom{n}{k}$ possible clauses with $k$ distinct variables. Hence, the parameter $d$ prescribes the expected number of clauses in which a given variable appears. Let $S(\Phi)$ be the set of satisfying assignments of $\Phi$ and let $Z(\Phi) = |S(\Phi)|$. We encode the Boolean values ‘true’ and ‘false’ by $+1$ and $-1$, respectively.
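The model just defined can be made concrete with a minimal Python sketch. It samples a formula with $\mathrm{Po}(dn/k)$ uniform clauses on $k$ distinct variables and counts $Z(\Phi)$ by exhaustive enumeration. The function names `poisson`, `random_kcnf` and `count_satisfying` are illustrative assumptions, not notation from the paper, and variables are indexed from 0 for convenience.

```python
import itertools
import math
import random


def poisson(lam, rng):
    """Sample Po(lam) by Knuth's multiplication method (adequate for moderate lam)."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def random_kcnf(n, d, k, rng):
    """Draw m ~ Po(d*n/k) clauses, each uniform among the 2^k * C(n, k) clauses
    on k distinct variables. A clause is a tuple of (variable, sign) pairs."""
    m = poisson(d * n / k, rng)
    return [
        tuple((x, rng.choice((1, -1))) for x in sorted(rng.sample(range(n), k)))
        for _ in range(m)
    ]


def count_satisfying(n, clauses):
    """Z(Phi) by brute force; the literal (x, s) is true iff sigma[x] == s,
    with +1 encoding 'true' and -1 encoding 'false'."""
    return sum(
        all(any(sigma[x] == s for x, s in clause) for clause in clauses)
        for sigma in itertools.product((1, -1), repeat=n)
    )


if __name__ == "__main__":
    rng = random.Random(1)
    n, d, k = 12, 1.2, 3
    phi = random_kcnf(n, d, k, rng)
    z = count_satisfying(n, phi)
    print(len(phi), z, math.log(max(z, 1)) / n)
```

Exhaustive enumeration takes $2^n$ steps, so this is only meant for toy sizes; it illustrates the quantity $n^{-1}\log Z(\Phi)$ rather than providing a way to estimate its limit.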
Since right up to the satisfiability threshold $Z(\Phi)$ is of order $\exp(\Theta(n))$ w.h.p. for trivial reasons², our objective is to study the random variable $n^{-1}\log Z(\Phi)$ as $n \to \infty$.

1.2.1. The number of satisfying assignments up to the Gibbs uniqueness threshold.

The first main result vindicates the ‘replica symmetric solution’ for values of $d$ up to the Gibbs uniqueness threshold of the Galton-Watson tree that mimics the local topology of $\Phi$. Let us define these concepts precisely.

We begin with the Galton-Watson tree $T = T_{d,k}$, which is generated by a two-type branching process. The two types are variable nodes and clause nodes. The process starts with a single root variable node $x$. The offspring of any variable node is a $\mathrm{Po}(d)$ number of clause nodes, while every clause node begets precisely $k-1$ variable nodes. Additionally, independently for each clause node $a$ and every variable node $x$ that is either a child or the parent of $a$, a sign, denoted $\mathrm{sign}(x,a) \in \{\pm 1\}$, is chosen uniformly at random. The resulting random tree $T$ models the local structure of the random formula $\Phi$ in the sense of local weak convergence [9, 48].³

Next, we define the Gibbs uniqueness property on the tree $T$. For an integer $\ell \ge 0$ let $T^{(\ell)}$ be the finite tree obtained by removing all variable and clause nodes at a distance greater than $2\ell$ from the root $x$. We identify the finite tree $T^{(\ell)}$ with a Boolean formula whose variables/clauses are precisely the variable/clause nodes of $T^{(\ell)}$. Let $S(T^{(\ell)}) \ne \emptyset$ be the set of satisfying assignments of this formula and let $\tau^{(\ell)} \in S(T^{(\ell)})$ be a uniformly random satisfying assignment. Moreover, let $\partial^{2\ell}x$ be the set of variable nodes of $T^{(\ell)}$ at distance precisely $2\ell$ from the root $x$. Then for given $d, k$ the tree $T = T_{d,k}$ has the Gibbs uniqueness property if
\[ \lim_{\ell\to\infty} E\Big[ \max_{\tau \in S(T^{(\ell)})} \Big| P\big[\tau^{(\ell)}(x) = 1 \mid T\big] - P\big[\tau^{(\ell)}(x) = 1 \mid T,\ \forall y \in \partial^{2\ell}x : \tau^{(\ell)}(y) = \tau(y)\big] \Big| \Big] = 0 \quad \text{(see [47])}. \tag{1.1} \]
In words, in the limit of large $\ell$ the truth value $\tau^{(\ell)}(x)$ of the root $x$ is asymptotically independent of the truth values $\{\tau^{(\ell)}(y)\}_{y \in \partial^{2\ell}x}$ of the variables at distance $2\ell$ from $x$. In light of the above, for any $k \ge 2$ we further define $d_{\mathrm{uniq}}(k)$ as
\[ d_{\mathrm{uniq}}(k) = \inf\{d > 0 : \text{condition (1.1) fails to hold for } d, k\}. \tag{1.2} \]
It is easy to see that $d_{\mathrm{uniq}}(k)$ is strictly positive and finite for any $k \ge 2$. Indeed, in Theorem 1.2 we will derive explicit lower bounds on $d_{\mathrm{uniq}}(k)$. However, the exact value of $d_{\mathrm{uniq}}(k)$ is not currently known for any $k \ge 3$.

As a final preparation we need to spell out the ‘replica symmetric solution’ from [54]. This prediction comes in terms of a distributional fixed point problem, i.e., a fixed point problem on the space $\mathcal{P}(0,1)$ of probability measures on the open unit interval. Specifically, consider the Belief Propagation operator
\[ \mathrm{BP}_{d,k} : \mathcal{P}(0,1) \to \mathcal{P}(0,1), \qquad \pi \mapsto \hat\pi = \mathrm{BP}_{d,k}(\pi) \tag{1.3} \]
defined as follows. Let $d^+, d^- \sim \mathrm{Po}(d/2)$ be Poisson variables with expectation $d/2$. Moreover, let $(\mu_{\pi,i,j})_{i,j \ge 1}$ be a sequence of i.i.d. random variables, each following distribution $\pi$. All these random variables are mutually independent. Further, let
\[ \mu_{\pi,i} = 1 - \prod_{j=1}^{k-1} \mu_{\pi,i,j} \quad \text{for } i \ge 1, \qquad \text{and} \qquad \hat\mu_\pi = \frac{\prod_{i=1}^{d^-} \mu_{\pi,2i-1}}{\prod_{i=1}^{d^-} \mu_{\pi,2i-1} + \prod_{i=1}^{d^+} \mu_{\pi,2i}}. \tag{1.4} \]
Then $\hat\pi$ is the distribution of $\hat\mu_\pi$. Furthermore, for a probability measure $\pi \in \mathcal{P}(0,1)$ define the Bethe free entropy⁴
\[ \mathcal{B}_{d,k}(\pi) = E\Big[ \log\Big( \prod_{i=1}^{d^-} \mu_{\pi,2i-1} + \prod_{i=1}^{d^+} \mu_{\pi,2i} \Big) - \frac{d(k-1)}{k} \log\Big( 1 - \prod_{j=1}^{k} \mu_{\pi,1,j} \Big) \Big], \tag{1.5} \]
provided that the expectation on the r.h.s. exists. Finally, let $\delta_{1/2} \in \mathcal{P}(0,1)$ be the atom at $1/2$ and let us write $\mathrm{BP}^{\ell}_{d,k}$ for the $\ell$-fold application of the operator $\mathrm{BP}_{d,k}$.

² For example, w.h.p. there are $\Omega(n)$ variables that do not appear in any clause.
³ Corollary 3.7 below provides a precise statement to this effect.
⁴ Throughout the paper log refers to the natural logarithm.

FIGURE 1. Comparison of $\mathcal{B}_{d,k}(\pi_{d,k})$ with known bounds for $\lim_{n\to\infty} \frac{1}{n}\log Z(\Phi)$ for $k = 3$.
The red dotted line depicts the first moment upper bound (1.13), while the green dotted line represents the lower bound provided by (1.14). The blue line displays a numerical approximation of $\mathcal{B}_{d,3}(\pi_{d,3})$. To obtain our values, we generated $10^6$ samples from $\pi \approx \mathrm{BP}^{25}_{d,3}(\delta_{1/2})$ and then evaluated the corresponding empirical average of the expression in (1.5).

Theorem 1.1. Let $k \ge 3$ and assume that $0 < d < d_{\mathrm{uniq}}(k)$. Then the weak limit
\[ \pi_{d,k} = \lim_{\ell\to\infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2}) \in \mathcal{P}(0,1) \tag{1.6} \]
exists and
\[ \lim_{n\to\infty} \frac{1}{n} \log Z(\Phi) = \mathcal{B}_{d,k}(\pi_{d,k}) \quad \text{in probability.} \tag{1.7} \]

The formula (1.7) matches the prediction from [54] precisely. Of course, part of the assertion of Theorem 1.1 is that the Bethe free entropy $\mathcal{B}_{d,k}(\pi_{d,k})$ is well defined. Admittedly, the formula (1.7) is not ‘explicit’. But the proof of Theorem 1.1 evinces that the convergence (1.6) occurs rapidly. Therefore, a randomised algorithm called ‘population dynamics’ [50] can be used to approximate (1.7) within any desired numerical accuracy.

1.2.2. An improved lower bound on Gibbs uniqueness.

The obvious next task is to determine the Gibbs uniqueness threshold $d_{\mathrm{uniq}}(k)$. Currently, its value is known precisely only in the case $k = 2$, where $d_{\mathrm{uniq}}(2) = 2$ coincides with the random 2-SAT satisfiability threshold [3, 21, 40]. Furthermore, Montanari and Shah [56] proved that the pure literal threshold⁵ $d_{\mathrm{pure}}(k)$ upper bounds $d_{\mathrm{uniq}}(k)$ for all $k \ge 2$.⁶ The value of $d_{\mathrm{pure}}(k)$ admits a neat formula [16, 53]:
\[ d_{\mathrm{uniq}}(k) \le d_{\mathrm{pure}}(k) = \min_{z > 0} \frac{z}{(1-\exp(-z/2))^{k-1}}. \tag{1.8} \]
Complementing the upper bound (1.8), Montanari and Shah derived a lower bound $d_{\mathrm{MS}}(k)$:
\[ d_{\mathrm{MS}}(k) = \sup\Big\{ d > 0 : d(k-1)\big(1 - \exp(-d/2)/4\big)\big(1 - \exp(-d/2)/2\big)^{k-2} < 1 \Big\} \le d_{\mathrm{uniq}}(k). \tag{1.9} \]

⁵ This marks the threshold up to which the pure literal algorithm, which repeatedly assigns the preferred value to all variables appearing with a single sign, produces a satisfying assignment w.h.p.
⁶ To be precise, Montanari and Shah established an upper bound on the Gibbs uniqueness threshold that turns out to coincide with the pure literal threshold, albeit without pointing out this identity.

Unfortunately, the bound (1.9) is not tight even in the case $k = 2$, where $d_{\mathrm{uniq}}(2) = 2$ while $d_{\mathrm{MS}}(2) \approx 1.16$. That said, the lower and upper bounds $d_{\mathrm{MS}}(k)$ and $d_{\mathrm{pure}}(k)$ match asymptotically in the limit of large $k$, as
\[ d_{\mathrm{MS}}(k), d_{\mathrm{pure}}(k) = (2 + o_k(1)) \log k, \tag{1.10} \]
with $o_k(1)$ hiding a term that vanishes as $k \to \infty$. The following theorem yields an improved lower bound $d_{\mathrm{con}}(k)$ on $d_{\mathrm{uniq}}(k)$.

Theorem 1.2. For all $k \ge 3$ we have
\[ d_{\mathrm{uniq}}(k) \ge d_{\mathrm{con}}(k) := \sup\Big\{ d > 0 : \frac{d(k-1)}{2}\big(1 - \exp(-d/2)/2\big)^{k-2} < 1 \Big\}. \tag{1.11} \]

An easy calculation reveals that
\[ d_{\mathrm{MS}}(k) < d_{\mathrm{con}}(k) \quad \text{for every } k \ge 2. \tag{1.12} \]
Moreover, it is satisfactory that the formula (1.11) reproduces the correct (previously known) threshold $d_{\mathrm{uniq}}(2) = d_{\mathrm{con}}(2) = d_{\mathrm{pure}}(2) = 2$. That said, we have no reason to believe that (1.11) is tight for any $k \ge 3$.

    k          2        3        4        5
    d_giant    1.0000   0.5000   0.3333   0.2500
    d_MS       1.1625   0.8792   0.8695   0.9236
    d_con      2.0000   1.3431   1.2451   1.2635
    d_pure     2.0000   4.9108   6.1782   7.0178
    d_sat      2.0000   12.801   39.724   105.585

TABLE 1. The values of $d_{\mathrm{MS}}(k)$, $d_{\mathrm{con}}(k)$, and $d_{\mathrm{pure}}(k)$ for $2 \le k \le 5$. Additionally, $d_{\mathrm{giant}}(k) = 1/(k-1)$ marks the giant component threshold of the hypergraph induced by the random k-CNF formula. Moreover, $d_{\mathrm{sat}}(k)$ is the satisfiability threshold according to physics predictions [49]. It is not hard to show that $d_{\mathrm{giant}}(k) \le d_{\mathrm{MS}}(k) \le d_{\mathrm{con}}(k) \le d_{\mathrm{uniq}}(k) \le d_{\mathrm{pure}}(k) \le d_{\mathrm{sat}}(k)$ for all $k \ge 2$.

Combining Theorems 1.1 and 1.2, we obtain the following.

Corollary 1.3. Let $k \ge 3$. If $d < d_{\mathrm{con}}(k)$ then (1.7) holds.

Corollary 1.3 constitutes the first rigorous result to determine the precise asymptotic value of $\log Z(\Phi)$ for a non-trivial regime of $d$ for any $k \ge 3$.
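The rows $d_{\mathrm{MS}}$, $d_{\mathrm{con}}$ and $d_{\mathrm{pure}}$ of Table 1 can be recomputed from (1.8), (1.9) and (1.11) with a few lines of Python. The sketch below is an illustration under stated assumptions rather than the authors' code: it relies on the left-hand sides in (1.9) and (1.11) being increasing in $d$, so that the supremum is the unique root found by bisection, and on the objective in (1.8) being unimodal for $k \ge 3$, so that a ternary search applies. The helper names are hypothetical.

```python
import math


def _sup_root(f, hi=64.0, iters=200):
    """sup{d > 0 : f(d) < 1} for an increasing f with f(hi) > 1, via bisection."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return lo


def d_ms(k):
    """Montanari-Shah lower bound, equation (1.9)."""
    return _sup_root(
        lambda d: d * (k - 1)
        * (1 - math.exp(-d / 2) / 4)
        * (1 - math.exp(-d / 2) / 2) ** (k - 2)
    )


def d_con(k):
    """Lower bound of Theorem 1.2, equation (1.11)."""
    return _sup_root(
        lambda d: d * (k - 1) / 2 * (1 - math.exp(-d / 2) / 2) ** (k - 2)
    )


def d_pure(k, lo=1e-6, hi=100.0, iters=200):
    """Pure literal threshold, equation (1.8), via ternary search (k >= 3)."""
    def g(z):
        # -expm1(-z/2) equals 1 - exp(-z/2) but is accurate for small z.
        return z / (-math.expm1(-z / 2)) ** (k - 1)

    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return g(0.5 * (lo + hi))


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(k, round(d_ms(k), 4), round(d_con(k), 4), round(d_pure(k), 4))
```

For $k = 2$ the minimum in (1.8) is only approached as $z \to 0$, which is why the ternary search is stated for $k \ge 3$.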
To elaborate, the formula (1.7) is trivially true for $d < 1/(k-1)$ because for such $d$ the k-uniform hypergraph induced by the clauses of $\Phi$ has no giant component and Belief Propagation is exact on acyclic graphical models [50]. But Corollary 1.3 applies to $d$ well beyond this threshold, as displayed in Table 1. In particular, in contrast to much of the prior work on random k-SAT, Corollary 1.3 applies to a non-trivial regime of $d$ even for ‘small’ $k \ge 3$.

Although Table 1 contains the values $d_{\mathrm{MS}}(k)$ from [56] for comparison, we emphasise that Montanari and Shah’s result only yields the number of ‘good’ assignments satisfying all but $o(n)$ clauses, rather than of actual satisfying assignments. In fact, the best prior rigorous bounds on the number of satisfying assignments for $d > 1/(k-1)$ derive from the first and the second moment methods. Specifically, the folklore first moment bound reads
\[ \frac{1}{n} \log Z(\Phi) \le \log 2 + \frac{d}{k} \log(1 - 2^{-k}) + o(1) \quad \text{w.h.p.} \tag{1.13} \]
Furthermore, Achlioptas and Peres [7] perform a second moment argument on the number of balanced satisfying assignments, i.e., satisfying assignments that enjoy a peculiar additional condition required to keep the second moment under control. They show that w.h.p.
\[ \frac{1}{n} \log Z(\Phi) \ge (1-d)\log 2 + \frac{d}{k} \log\Big[ \big(\lambda^{1/2} + \lambda^{-1/2}\big)^k - \lambda^{-k/2} \Big] + o(1), \quad \text{where } (1-\lambda)(1+\lambda)^{k-1} = 1,\ \lambda > 0. \tag{1.14} \]
Figure 1 illustrates the bounds (1.13)–(1.14) along with (1.7) for $k = 3$. As the figure shows, the correct value (1.7) is quite close to the first moment bound. That said, the first moment bound strictly exceeds $\mathcal{B}_{d,k}(\pi_{d,k})$ for all $d > 0$, $k \ge 3$ [24]. On the other hand, Figure 1 demonstrates that the ‘balanced second moment bound’ (1.14) significantly undershoots $\mathcal{B}_{d,3}(\pi_{d,3})$. Recall that Figure 1 is on a logarithmic scale; thus, even small differences translate into exponentially large errors.

1.3. Preliminaries and notation.

Let $\Phi$ be a Boolean expression in conjunctive normal form such that no clause contains the same variable twice.
We write $V(\Phi)$ for the set of Boolean variables of $\Phi$ and $F(\Phi)$ for the set of clauses. The formula $\Phi$ gives rise to a bipartite graph $G(\Phi)$ on the vertex set $V(\Phi) \cup F(\Phi)$ in which a variable $x$ and a clause $a$ are adjacent iff variable $x$ appears in clause $a$ (either positively or negatively). Let $E(\Phi)$ denote the edge set of the graph $G(\Phi)$. Furthermore, for a vertex $v \in V(\Phi) \cup F(\Phi)$ let $\partial_\Phi v$ be the set of neighbours of $v$; where the reference to $\Phi$ is self-evident, we just write $\partial v$. The graph $G(\Phi)$ induces a metric on $V(\Phi) \cup F(\Phi)$ by letting $\mathrm{dist}_\Phi(v,w)$ equal the length of the shortest path from $v$ to $w$. For a vertex $v$ and an integer $\ell \ge 0$ let $\partial^\ell_\Phi v = \partial^\ell v$ be the set of all vertices $w$ at distance precisely $\ell$ from $v$.

For a clause $a$ and a variable $x \in \partial a$ we define $\mathrm{sign}_\Phi(x,a) = 1$ if $a$ contains $x$ as a positive literal, and $\mathrm{sign}_\Phi(x,a) = -1$ if $a$ contains the negation $\neg x$. (This is unambiguous because clause $a$ is not allowed to contain both $x$ and $\neg x$.) For a variable $x \in V(\Phi)$ and $s \in \{\pm 1\}$ we let $\partial^s_\Phi x = \partial^s x$ be the set of clauses $a \in \partial_\Phi x$ such that $\mathrm{sign}_\Phi(x,a) = s$. Where convenient we use the shorthand $\partial^{\pm} x = \partial^{\pm 1} x$. We say that a variable $x$ is pure in $\Phi$ if $\mathrm{sign}_\Phi(x,a) = \mathrm{sign}_\Phi(x,b)$ for all $a, b \in \partial x$. More specifically, say that $x$ is a pure literal of $\Phi$ if $\partial^- x = \emptyset$. Similarly, $\neg x$ is called a pure literal if $\partial^+ x = \emptyset$. A variable or literal that fails to be pure is called mixed. For a literal $l \in \{x, \neg x : x \in V(\Phi)\}$ we let $|l|$ denote the underlying variable; thus, $|x| = |\neg x| = x$ for $x \in V(\Phi)$. Moreover, we define $\mathrm{sign}(x) = 1$ and $\mathrm{sign}(\neg x) = -1$. Further, for a literal $l$ we define $1 \cdot l = l$ and $(-1) \cdot l = \neg l$.

If $\Phi$ is satisfiable, then $\sigma_\Phi = (\sigma_\Phi(x))_{x \in V(\Phi)}$ denotes a uniformly random satisfying assignment of $\Phi$. Where the reference to $\Phi$ is obvious we just write $\sigma$.

Let $\mu, \nu$ be two probability measures on $\mathbb{R}^h$, let $q \ge 1$ and assume that $\int_{\mathbb{R}^h} \|x\|_q^q \, d\mu(x), \int_{\mathbb{R}^h} \|x\|_q^q \, d\nu(x) < \infty$.
We recall that the $L^q$-Wasserstein distance of $\mu, \nu$ is defined as
\[ W_q(\mu,\nu) = \inf_{(\xi,\zeta)} E\big[ \|\xi - \zeta\|_q^q \big]^{1/q}, \]
where the infimum is taken over all pairs $(\xi,\zeta)$ of random variables defined on the same probability space $\Omega$ such that $\xi$ has distribution $\mu$ and $\zeta$ has distribution $\nu$. If $X, Y$ are random variables with distributions $\mu, \nu$, it is convenient to use the shorthand $W_q(X,Y) = W_q(\mu,\nu)$, provided that $E[\|X\|_q^q], E[\|Y\|_q^q] < \infty$.

For two random variables $X, Y$ we write $X \sim Y$ if $X, Y$ are identically distributed. Moreover, for a probability distribution $\mu$ and a random variable $X$ we write $X \sim \mu$ if $X$ has distribution $\mu$. We will make repeated use of the following tail bound for Poisson variables.

Lemma 1.4 (Bennett’s inequality [14, Theorem 2.9]). Suppose that $X \sim \mathrm{Po}(\lambda)$ with $\lambda > 0$ and let $\phi(x) = (1+x)\log(1+x) - x$ for $x > -1$. Then
\[ P[X \ge \lambda + t] \le \exp(-\lambda\phi(t/\lambda)) \quad \text{for any } t > 0, \]
\[ P[X \le \lambda - t] \le \exp(-\lambda\phi(-t/\lambda)) \quad \text{for any } 0 < t < \lambda. \]

For reals $a, b$ we write $a \vee b = \max\{a,b\}$, $a \wedge b = \min\{a,b\}$. Unless specified otherwise asymptotic notation $o(\cdot)$, $O(\cdot)$, etc. is understood to refer to the limit $n \to \infty$. The symbol $\tilde{O}(\cdot)$ is understood to swallow $\mathrm{polylog}(n)$ terms. Throughout we tacitly assume that $n$ is sufficiently large so that the various estimates are valid. We use the conventions $\log 0 = -\infty$ and $\log\infty = \infty$. Finally, throughout the paper we assume that $k \ge 3$ is a fixed integer.

2. OVERVIEW

In this section we survey the proofs of the main results. Subsequently, we discuss further related work. The proof details are deferred to the remaining sections; see Section 2.7 for pointers. We assume throughout that $k \ge 3$.

2.1. Existence of the fixed point and upper bound.

As a first step towards the proof of Theorem 1.1 we prove that the limit (1.6) exists for $d < d_{\mathrm{uniq}}(k)$. More precisely, we will establish the following statement.

Proposition 2.1.
For every $k \ge 3$ and every $d < d_{\mathrm{uniq}}(k)$, the $W_1$-limit $\pi_{d,k} = \lim_{\ell\to\infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ exists and
\[ E\big[ \log^2 \mu_{\pi_{d,k},1,1} \big] + E\Big| \log\Big( \prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1} \Big) \Big| + E\Big| \log\Big( 1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j} \Big) \Big| < \infty. \tag{2.1} \]
In addition, $\mu_{\pi_{d,k},1,1}$ and $1 - \mu_{\pi_{d,k},1,1}$ are identically distributed.

The existence of the limit $\pi_{d,k}$ is an easy consequence of the Gibbs uniqueness property. As an aside, the limit $\pi_{d,k} = \lim_{\ell\to\infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ is a fixed point of the Belief Propagation operator, i.e.,
\[ \pi_{d,k} = \mathrm{BP}_{d,k}(\pi_{d,k}). \tag{2.2} \]
The proof of the bound (2.1) is a bit more subtle and requires a few preparations, but we will come to that. The upshot of (2.1) is that the Bethe free entropy $\mathcal{B}_{d,k}(\pi_{d,k})$ is well defined.

With the fixed point $\pi_{d,k}$ in hand we can bring to bear the ‘interpolation method’ to upper bound the likely value of $\log Z(\Phi)$.

Corollary 2.2. If $d < d_{\mathrm{uniq}}(k)$ then w.h.p. we have $\frac{1}{n} \log Z(\Phi) \le \mathcal{B}_{d,k}(\pi_{d,k}) + o(1)$.

The interpolation method is a mainstay of the study of disordered systems in mathematical physics and has also been used to investigate random constraint satisfaction problems. In particular, the variant of the interpolation method from [60] (in combination with Proposition 2.1) easily implies that
\[ \limsup_{n\to\infty} \frac{1}{n} E\big[ \log(Z(\Phi) \vee 1) \big] \le \mathcal{B}_{d,k}(\pi_{d,k}); \]
taking the logarithm of $Z(\Phi) \vee 1$ ensures that the expectation is well defined, as it is possible (albeit unlikely for $d < d_{\mathrm{uniq}}(k)$) that $\Phi$ is unsatisfiable. The added value of Corollary 2.2 is that we obtain a bound that holds with high probability, rather than just a bound on the expectation. The interpolation method was used in [3] in a similar fashion to prove a ‘with high probability’ bound on the number of satisfying assignments of random 2-CNFs. The proof of Corollary 2.2 is an adaptation of that argument to $k \ge 3$.

2.2. A matching lower bound.

The key step towards Theorem 1.1 is to establish a lower bound on $\log Z(\Phi)$ that matches the upper bound from Corollary 2.2.
To accomplish this task we employ a coupling argument known as the ‘Aizenman-Sims-Starr scheme’ in mathematical physics. Its original version was intended to estimate the partition function of the Sherrington-Kirkpatrick model, a spin glass model [8]. But the technique has since been employed in probabilistic combinatorics (e.g., [24, 25, 65]). By comparison to the mathematical physics context, the crucial difference is that here our objective is to count actual satisfying assignments where every single clause imposes a hard constraint, whereas in spin glass theory constraints are soft. The same issue occurred in previous work on the random 2-SAT problem [3]. However, in that case a relatively simple percolation argument was sufficient to deal with the ensuing complications. As we will see, for $k \ge 3$ considerably more care is needed.

But first things first. The basic idea behind the Aizenman-Sims-Starr argument is to perform a kind of induction. Translated to random k-SAT this means that we couple the random k-CNF $\Phi_{d,k}(n)$ with $n$ variables with the random k-CNF $\Phi_{d,k}(n+1)$ with $n+1$ variables. Recall that $\Phi_{d,k}(n)$ comprises $m_n \sim \mathrm{Po}(dn/k)$ independent random clauses. Ultimately Theorem 1.1 is going to be a consequence of Corollary 2.2 and the following statement.

Proposition 2.3. If $d < d_{\mathrm{uniq}}(k)$ then
\[ E\big[ \log(Z(\Phi_{d,k}(n+1)) \vee 1) \big] - E\big[ \log(Z(\Phi_{d,k}(n)) \vee 1) \big] = \mathcal{B}_{d,k}(\pi_{d,k}) + o(1). \]

Once again we work with $Z(\Phi_{d,k}(n)) \vee 1$ and $Z(\Phi_{d,k}(n+1)) \vee 1$ to ensure that the expectations are well defined. To prove Proposition 2.3 we couple the random formulas $\Phi_{d,k}(n+1)$ and $\Phi_{d,k}(n)$ as follows.

CPL1: Let $\Phi'$ be a random k-CNF with variables $x_1, \ldots, x_n$ and $m' \sim \mathrm{Po}(d(n-k+1)/k)$ clauses.
CPL2: Obtain $\Phi''$ from $\Phi'$ by adding another $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$ independent random clauses.
CPL3: Obtain $\Phi'''$ from $\Phi'$ by adding one new variable $x_{n+1}$ and $\Delta''' \sim \mathrm{Po}(d)$ independent random clauses that each contain $x_{n+1}$ and $k-1$ other variables from $\{x_1, \ldots, x_n\}$.
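The coupling CPL1–CPL3 can be sketched in Python as follows. This is an illustration only: the names `coupled_formulas`, `random_clause` and `poisson` are assumptions made for the example, variables are indexed $0, \ldots, n-1$ with index $n$ playing the role of $x_{n+1}$, and the Poisson sampler is a simple textbook method suited to moderate means.

```python
import math
import random


def poisson(lam, rng):
    """Sample Po(lam) by Knuth's multiplication method (adequate for moderate lam)."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def random_clause(variables, k, rng, forced=None):
    """Uniform clause with random signs on k distinct variables.
    If `forced` is given, the clause contains it plus k-1 variables from `variables`."""
    chosen = rng.sample(variables, k if forced is None else k - 1)
    if forced is not None:
        chosen.append(forced)
    return tuple((x, rng.choice((1, -1))) for x in sorted(chosen))


def coupled_formulas(n, d, k, rng):
    """Return (Phi', Phi'', Phi''') as lists of clauses sharing the base formula Phi'."""
    old = list(range(n))
    # CPL1: base formula with Po(d(n-k+1)/k) clauses on the n old variables.
    base = [random_clause(old, k, rng) for _ in range(poisson(d * (n - k + 1) / k, rng))]
    # CPL2: add Po(d(k-1)/k) further clauses on the old variables.
    phi2 = base + [random_clause(old, k, rng) for _ in range(poisson(d * (k - 1) / k, rng))]
    # CPL3: add the new variable (index n) with Po(d) clauses that each contain it.
    phi3 = base + [random_clause(old, k, rng, forced=n) for _ in range(poisson(d, rng))]
    return base, phi2, phi3


if __name__ == "__main__":
    base, phi2, phi3 = coupled_formulas(n=12, d=1.2, k=3, rng=random.Random(0))
    print(len(base), len(phi2), len(phi3))
```

Since independent Poisson counts add, the number of clauses of `phi2` is distributed as $\mathrm{Po}(d(n-k+1)/k + d(k-1)/k) = \mathrm{Po}(dn/k)$, in line with the discussion of $\Phi''$.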
Observe that $\Phi''$ ultimately has variables $x_1, \ldots, x_n$ and a total of $m_n \sim \mathrm{Po}(dn/k)$ random clauses. Thus, $\Phi''$ is identical to the random formula $\Phi_{d,k}(n)$. Similarly, $\Phi'''$ has the same distribution as $\Phi_{d,k}(n+1)$. Consequently, we obtain the following.

Fact 2.4. For any $d > 0$ we have $Z(\Phi_{d,k}(n)) \sim Z(\Phi'')$ and $Z(\Phi_{d,k}(n+1)) \sim Z(\Phi''')$.

The coupling CPL1–CPL3 reduces the proof of Proposition 2.3 to getting a handle on the differences $\log(Z(\Phi'') \vee 1) - \log(Z(\Phi') \vee 1)$ and $\log(Z(\Phi''') \vee 1) - \log(Z(\Phi') \vee 1)$. More precisely, recalling (1.4)–(1.5), we see that Proposition 2.3 is a consequence of the following two statements.

Proposition 2.5. If $d < d_{\mathrm{uniq}}(k)$ then
\[ E\Big[ \log \frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1} \Big] = \frac{d(k-1)}{k} E\Big[ \log\Big( 1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j} \Big) \Big] + o(1). \tag{2.3} \]

Proposition 2.6. If $d < d_{\mathrm{uniq}}(k)$ then
\[ E\Big[ \log \frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1} \Big] = E\Big[ \log\Big( \prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1} \Big) \Big] + o(1). \tag{2.4} \]

To prove Propositions 2.5–2.6 we effectively need to trace the impact that local changes have on the number of satisfying assignments. Indeed, under the coupling CPL1–CPL3, the formula $\Phi''$ is obtained from the ‘base formula’ $\Phi'$ by adding just a bounded expected number of random clauses. Thus, if we imagine that, as both the first moment upper bound (1.13) and the balanced second moment lower bound (1.14) suggest, each additional random clause typically reduces the number of satisfying assignments by a constant factor, then the quantity $|\log(Z(\Phi'')/Z(\Phi'))|$ should be bounded with probability close to one. Similar reasoning applies to $\Phi'''$.

Yet while with high probability the local changes that turn $\Phi'$ into $\Phi''$ or $\Phi'''$ are indeed benign, because we are dealing with hard constraints there is a non-negligible probability that $\log(Z(\Phi'')/Z(\Phi'))$ and $\log(Z(\Phi''')/Z(\Phi'))$ could be large. Indeed, a single extra clause might wipe out all satisfying assignments of $\Phi'$, in which case
\[ \log \frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1} = -\log Z(\Phi') = -\Omega(n). \]
Hence, we need to argue that such drastic changes are sufficiently rare.
The following statement furnishes the necessary tail bound.

Proposition 2.7. For $d < d_{\mathrm{uniq}}(k)$ we have
\[ E\Big[ \Big| \log \frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1} \Big|^{3/2} + \Big| \log \frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1} \Big|^{3/2} \Big] = O(1). \tag{2.5} \]

2.3. Pure literal pursuit.

The proof of Proposition 2.7 constitutes the main technical challenge towards the proof of Theorem 1.1. The linchpin of the proof is an algorithm that we call Pure Literal Pursuit (‘PULP’). Its purpose is to trace the repercussions of setting a relatively small number of variables to specific truth values. More precisely, PULP will allow us to compare the number of satisfying assignments that set a few chosen variables to specific values to the total number of satisfying assignments.

To this end PULP attempts to solve the following optimisation task. Suppose we are given a k-CNF $\Phi$ and a set $\mathcal{L}$ of literals of $\Phi$ that we deem to be set to ‘true’. We would like to identify a superset $\bar{\mathcal{L}} \supseteq \mathcal{L}$ of literals with the following properties; think of $\bar{\mathcal{L}}$ as a ‘closure’ of $\mathcal{L}$.

PULP1: every clause $a$ of $\Phi$ that contains a literal from $\neg\bar{\mathcal{L}} = \{\neg l : l \in \bar{\mathcal{L}}\}$ also contains a literal from $\bar{\mathcal{L}}$.
PULP2: there is no literal $l$ such that $l, \neg l \in \bar{\mathcal{L}}$.

Of course, it may be impossible to satisfy PULP1 and PULP2 simultaneously. In this case we ask PULP to report a ‘contradiction’. But if PULP1–PULP2 can be satisfied, we aim to find a closure $\bar{\mathcal{L}}$ of as small size $|\bar{\mathcal{L}}|$ as possible.

The combinatorial idea behind PULP1–PULP2 is as follows. Deeming the literals from the initial set $\mathcal{L}$ ‘true’, our goal is to reconcile this assumption with the formula $\Phi$. To this end we enhance the set $\mathcal{L}$. Clearly, any clause that contains the negation $\neg l$ of a literal $l$ that we deem true also needs to contain another literal $l'$ that is set to true. This is what PULP1 asks. Furthermore, it would be contradictory to deem both $l$ and its negation $\neg l$ true; this is PULP2.
The size of the closure $\bar{\mathcal{L}}$ yields a bound on the reduction in the number of satisfying assignments if we indeed insist on all literals $l \in \mathcal{L}$ being set to true. Formally, let $S(\Phi,\mathcal{L})$ be the set of all satisfying assignments $\sigma \in S(\Phi)$ under which all literals $l \in \mathcal{L}$ evaluate to ‘true’. Also set $Z(\Phi,\mathcal{L}) = |S(\Phi,\mathcal{L})|$.

Lemma 2.8. For any $\Phi, \mathcal{L}$ and any $\bar{\mathcal{L}} \supseteq \mathcal{L}$ that satisfies PULP1–PULP2 we have $Z(\Phi) \le 2^{|\bar{\mathcal{L}}|} Z(\Phi,\mathcal{L})$.

In order to identify a ‘small’ closure $\bar{\mathcal{L}}$ the PULP algorithm resorts to pure literal elimination, a simple trick commonplace to satisfiability algorithms. A variable $x$ is pure in a CNF formula $\Phi$ if $\mathrm{sign}(x,a) = \mathrm{sign}(x,b)$ for any two clauses $a, b \in \partial x$. Clearly, if our objective is to construct a satisfying assignment, we might as well set all pure variables $x$ to the value that satisfies all clauses $a \in \partial x$ and disregard these clauses henceforth. In light of this observation, pure literal elimination repeatedly removes all clauses that contain a pure variable. Naturally, every round of clause removals may create new pure variables, and thus more clauses may be ripe for removal in the next round. For a clause $a$ of the original formula $\Phi$ let $h_a(\Phi) \ge 1$ be the number of the round at which pure literal elimination removes $a$. If $a$ is never removed then we set $h_a(\Phi) = \infty$.

The PULP algorithm invokes a slightly modified version of pure literal elimination to accommodate the initial set $\mathcal{L}$ of literals. Specifically, for a variable $x$ of a CNF $\Phi$ and $s \in \{\pm 1\}$ let $\Phi[x \mapsto s]$ be the CNF obtained by removing all clauses $a \in \partial x$ with $\mathrm{sign}(x,a) = s$ and removing the literal $-s \cdot x$ from all $a \in \partial x$ with $\mathrm{sign}(x,a) = -s$. The definition reflects that if we set $x$ to value $s$, all $a \in \partial^s x$ will be satisfied, while all $a \in \partial^{-s} x$ will have to be satisfied by one of their other constituent literals. Further, let
\[ h_x(s,\Phi) = \begin{cases} 0 & \text{if } \partial^{-s}_\Phi x = \emptyset, \\ \max\big\{ h_a(\Phi[x \mapsto s]) : a \in \partial^{-s}_\Phi x \big\} & \text{otherwise} \end{cases} \quad \in [0,\infty]. \tag{2.6} \]
We refer to $h_x(s,\Phi)$ as the height of literal $s \cdot x$ in $\Phi$.
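The removal rounds $h_a(\Phi)$ and the heights (2.6) can be computed directly from their definitions. The following Python sketch is a literal, unoptimised transcription under a few assumptions of ours: a formula is a dictionary mapping clause names to sets of `(variable, sign)` pairs, and the function names `elimination_rounds`, `assign` and `height` are hypothetical.

```python
import math


def elimination_rounds(clauses):
    """Return {clause: round at which pure literal elimination removes it},
    with math.inf for clauses that are never removed."""
    alive = dict(clauses)
    rounds = {a: math.inf for a in clauses}
    r = 0
    while alive:
        r += 1
        # Collect the signs with which each variable occurs in the remaining clauses.
        signs = {}
        for lits in alive.values():
            for x, s in lits:
                signs.setdefault(x, set()).add(s)
        pure = {x for x, ss in signs.items() if len(ss) == 1}
        removed = [a for a, lits in alive.items() if any(x in pure for x, _ in lits)]
        if not removed:
            break  # every remaining variable is mixed
        for a in removed:
            rounds[a] = r
            del alive[a]
    return rounds


def assign(clauses, x, s):
    """Phi[x -> s]: drop clauses satisfied by x = s, delete the literal -s*x elsewhere."""
    return {
        a: {lit for lit in lits if lit != (x, -s)}
        for a, lits in clauses.items()
        if (x, s) not in lits
    }


def height(clauses, x, s):
    """Height h_x(s, Phi) of the literal s*x as in (2.6)."""
    hit = [a for a, lits in clauses.items() if (x, -s) in lits]
    if not hit:
        return 0
    rounds = elimination_rounds(assign(clauses, x, s))
    return max(rounds[a] for a in hit)


if __name__ == "__main__":
    phi = {
        "c1": {(1, 1), (5, 1), (6, 1)},
        "c2": {(1, -1), (2, 1), (3, 1)},
        "c3": {(2, -1), (3, -1), (4, 1)},
    }
    print(elimination_rounds(phi), height(phi, 1, 1), height(phi, 1, -1))
```

In the small example, `c1` and `c3` contain pure variables and disappear in the first round, after which `c2` becomes removable in the second round.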
The PULP algorithm, displayed as Algorithm 1, harnesses the heights as follows. In its attempt to precipitate PULP1 and PULP2 the algorithm iteratively enhances the set $\mathcal{L}$ of literals deemed to be ‘true’. For any clause $a$ that violates PULP1 and that contains a literal $l\notin\neg\mathcal{L}$ the algorithm adds one such literal $l$ of minimum height to $\mathcal{L}$. This choice is intended to keep the ultimate size of the closure small; one could say that PULP uses height as a proxy of ‘size’. If at any point the algorithm encounters a clause $a$ that consists of literals from $\neg\mathcal{L}$ only, the algorithm reports a contradiction and aborts.

Input: A $k$-CNF $\Phi$ and a set $\mathcal{L}$ of literals.
1  Let $\bar{\mathcal{L}}=\mathcal{L}$;
2  while there is a clause $a$ that contains a literal from $\neg\bar{\mathcal{L}}$ but no literal from $\bar{\mathcal{L}}$ do
3      Pick such a clause $a$ that minimises the distance from the set $\{|l| : l\in\mathcal{L}\}$ of variables underlying the initial set $\mathcal{L}$;
4      if $a$ consists of literals $l\in\neg\bar{\mathcal{L}}$ only then
5          return ‘contradiction’ and halt;
6      else
7          choose $x\in\partial a$ with $x,\neg x\notin\bar{\mathcal{L}}$ that minimises $h_x(\mathrm{sign}(x,a),\Phi)$ and add $\mathrm{sign}(x,a)\cdot x$ to $\bar{\mathcal{L}}$;
8  return $\bar{\mathcal{L}}$
Algorithm 1: The PULP algorithm

Remark 2.9. To break ties that may occur in the execution of Steps 3 and 7 of PULP we assume that the variables and clauses of $\Phi$ are numbered so that Steps 3 and 7 can choose the clause/variable with the smallest number that satisfies the respective requirements. In due course we will run PULP on (finite subtrees of) the Galton-Watson tree $T$. To number the variables and clauses of $T$ we equip each of them with an independent Gaussian label. Since $T$ comprises a countable number of clauses/variables, these labels will almost surely be pairwise distinct.

From here on we write $\bar{\mathcal{L}}$ for the set of literals returned by PULP if the algorithm does not encounter a contradiction; in the event of a contradiction we let $\bar{\mathcal{L}}=\{x,\neg x : x\in V(\Phi)\}$ be the set of all literals. Where the reference to the formula $\Phi$ is not entirely obvious, we write $\bar{\mathcal{L}}_{\Phi}$.
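Algorithm 1 can be prototyped in a few lines. The following Python sketch is a simplified illustration under stated assumptions, not the implementation analysed in the text: literals are integers (`+v`, `-v`), the initial set is assumed to contain no complementary pair, Step 3 is simplified to "take the violating clause with the smallest index" (the distance rule is omitted), ties in Step 7 go to the smallest variable number, and a contradiction is reported as `None`. All function names are hypothetical.

```python
import math


def elimination_rounds(clauses):
    """h_a for each clause index a: the round at which pure literal
    elimination removes the clause, or math.inf if it never does."""
    remaining = set(range(len(clauses)))
    h = [math.inf] * len(clauses)
    rnd = 0
    while remaining:
        rnd += 1
        signs = {}  # variable -> set of signs among remaining clauses
        for a in remaining:
            for lit in clauses[a]:
                signs.setdefault(abs(lit), set()).add(lit > 0)
        pure = {v for v, s in signs.items() if len(s) == 1}
        removed = {a for a in remaining
                   if any(abs(lit) in pure for lit in clauses[a])}
        if not removed:
            break
        for a in removed:
            h[a] = rnd
        remaining -= removed
    return h


def height(clauses, x, s):
    """Height of literal s*x as in (2.6), computed on Phi[x -> s]."""
    lit = s * x
    reduced, watched = [], []
    for clause in clauses:
        if lit in clause:
            continue  # clause satisfied by x = s, hence removed
        if -lit in clause:
            watched.append(len(reduced))
            reduced.append(tuple(l for l in clause if l != -lit))
        else:
            reduced.append(clause)
    if not watched:
        return 0
    h = elimination_rounds(reduced)
    return max(h[a] for a in watched)


def pulp(clauses, initial):
    """Simplified PULP: returns a closure, or None on contradiction."""
    closure = set(initial)
    while True:
        violating = [c for c in clauses
                     if any(-l in closure for l in c)
                     and not any(l in closure for l in c)]
        if not violating:
            return closure
        clause = violating[0]  # simplification of Step 3
        free = [l for l in clause if l not in closure and -l not in closure]
        if not free:
            return None  # all literals lie in the negated closure
        best = min(free, key=lambda l: (height(clauses, abs(l),
                                               1 if l > 0 else -1), abs(l)))
        closure.add(best)
```

Heights are evaluated on the original formula in every iteration, as in Step 7; recomputing them each time is wasteful but keeps the sketch short.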
The analysis of PULP on the random formula $\Phi'$ furnishes the following bound on $|\bar{\mathcal{L}}|$ in terms of the size of the initial set $\mathcal{L}$. This bound is the key ingredient towards the proof of Proposition 2.7.

Lemma 2.10. There exists $C=C(d,k)>0$ such that the following is true. Let $\mathcal{L}$ be a set of literals of $\Phi'$ such that $1\le|\mathcal{L}|\le\log^2 n$ and such that $\{x_i,\neg x_i\}\not\subseteq\mathcal{L}$ for all $1\le i\le n$. Then $\mathbb{E}[|\bar{\mathcal{L}}|^{3/2}]\le C|\mathcal{L}|^{3/2}$.

The proof of Lemma 2.10 is one of the main technical challenges of the present work. The difficulty stems from the stochastic dependencies that are inherent to the PULP algorithm. Specifically, in order to decide which literals to add to the set $\mathcal{L}$, PULP requires knowledge of the heights $h_x(\pm1,\Phi')$. But these heights depend on the other variables $y\in\partial a\setminus\{x\}$, the clauses that these variables $y$ appear in, etc. Furthermore, in its subsequent iterations the algorithm is apt to revisit some of these variables and clauses at a point when their heights have already been revealed. These repetitions rule out an analysis of PULP by way of routine techniques such as the principle of deferred decisions or the differential equations method. The reason why we manage to cope with these complicated dependencies at all is that, remarkably, the heights $h_x(\pm1,\Phi')$ have only a tiny upper tail. More precisely, as we will see, the tails of these random variables decay at a doubly exponential rate.

Proposition 2.7 follows from the analysis of PULP. The basic idea is to apply the algorithm to an initial set $\mathcal{L}$ of literals that contains one literal from each of the extra clauses that are present in $\Phi''$ or $\Phi'''$ but not in $\Phi'$. With a bit of care the bounds from Lemmas 2.8 and 2.10 then imply (2.5). Finally, the analysis of PULP that leads up to the proof of Lemma 2.10 also implies the necessary tail bounds to verify the bounds from (2.1).
Specifically, the proof of Lemma 2.10 proceeds by way of analysing PULP on the Galton-Watson tree Td ,k , and the bounds (2.1) come out as a byproduct of that analysis. 2.4. Completing the Aizenman-Sims-Starr scheme. To obtain Propositions 2.5–2.6 we combine Proposition 2.7 with an analysis of the quotients Z (Φ′′)/Z (Φ′) and Z (Φ′′′)/Z (Φ′) on a likely ‘good’ event. On this good event the empirical distribution of the marginal probabilities (P[σΦ′ (xi ) = 1 |Φ′])1≤i≤n of the different variables xi receiving the value ‘true’ under a random satisfying assignment is ‘close’ to the limiting distribution πd ,k from Proposi- tion 2.1. Additionally, on the good event the joint distribution of the truth values assigned to a moderate number of variables is well approximated by a product measure. Of course, to make this precise we need to investigate the empirical distribution π′ n = 1 n n∑ i=1 δP[σΦ′ (xi )=1|Φ′] ∈P (0,1) (2.7) of the marginals (P[σΦ′ (xi ) = 1 |Φ′])1≤i≤n . Proposition 2.11. Assume that d < duniq(k). Then E [ W1(π′ n ,πd ,k ) ]= o(1) and for any ℓ=O(1) we have ∑ σ∈{±1}ℓ E ∣∣∣∣∣P [∀1 ≤ i ≤ ℓ :σΦ′ (xi ) =σi |Φ′]− ℓ∏ i=1 P [ σΦ′ (xi ) =σi |Φ′] ∣∣∣∣∣= o(1). The proof of Proposition 2.11 hinges on the Gibbs uniqueness property and the convergence of the local topol- ogy of the random formula Φ′ to the Galton-Watson tree Td ,k . Together with careful coupling arguments Propo- sitions 2.7–2.11 imply Propositions 2.5–2.6. Moreover, in combination with Fact 2.4 these two propositions yield Proposition 2.3. We complete this paragraph by showing how Theorem 1.1 follows from Corollary 2.2 and Propo- sition 2.3. Proof of Theorem 1.1. The existence of the limit (1.6) follows from Proposition 2.1. With respect to (1.7), we apply Proposition 2.3 to obtain 1 n E [ log(1∨Z (Φd ,k (n))) ]= 1 n n−1∑ N=0 ( E [ log(1∨Z (Φd ,k (N +1)) ]−E[ log(1∨Z (Φd ,k (N )) ]) =Bd ,k (πd ,k )+o(1) . (2.8) Since, conversely, Corollary 2.2 shows that 1 n log Z (Φ) ≤Bd ,k (πd ,k )+o(1) w.h.p. 
and since log Z (Φ) ≤ n log2 deter- ministically, the assertion follows from (2.8). □ 2.5. Lower-bounding the uniqueness threshold. The proof of Theorem 1.2 combines three ingredients. From the work [3] on random 2-SAT we borrow the idea of constructing an explicit extremal boundary configuration. In effect, in order to prove Gibbs uniqueness we just have to consider one single boundary configuration, instead of an enormous number of possible configurations τ that grows quickly with the height ℓ as in the original defini- tion (1.1). Second, from the work [56] of Montanari and Shah we borrow the idea of expressly considering the effect of pure literals. As it turns out, without explicit consideration of pure literals it seems difficult to even recover the correct asymptotic order (1.8) of the Gibbs uniqueness threshold. Third, and most importantly, the improvement over the bound from [56] stems from a new subtle coupling argument that we will explain in due course. 2.5.1. The extremal boundary condition. An obvious challenge associated with establishing the Gibbs uniqueness property (1.1) seems to be that we need to estimate the marginal of the root variable given any possible boundary condition, i.e., given any assignment of the variables at distance 2ℓ from x. As we expect to see (d(k−1))ℓ variables at distance 2ℓ from x, we thus face a doubly exponential number 2(d(k−1))ℓ of possible boundary conditions. But fortunately, following [3] we may confine ourselves to just a single, explicit boundary configurationτ+ that satisfies P [ τ(ℓ)(r ) = 1 |T, ∀x ∈ ∂2ℓr :τ(ℓ)(x) =τ+(x) ] = max τ∈S(T(ℓ)) P [ τ(ℓ)(r ) = 1 |T, ∀x ∈ ∂2ℓr :τ(ℓ)(x) = τ(x) ] . (2.9) Due to the inherent symmetry of the distribution of T with respect to the signs of the clauses, towards the proof of (1.1) it is sufficient to show that the difference (2.9) vanishes as ℓ→∞. 9 The extremal boundary condition can be constructed explicitly. 
Specifically, givenT(ℓ) we construct a satisfying assignment τ+ ∈ S(T(ℓ)) by working our way down the tree T(ℓ). We begin by setting τ+(x) = 1. Now suppose that for q ≥ 1, the values of the variables at distance 2(q −1) from x have been already determined. Let w be a variable at distance 2q from x with parent clause a and grandparent variable u. Then we define τ+(w) = sign(w, a) · 1{sign(u, a) ̸=τ+(u)}− sign(w, a) · 1{sign(u, a) =τ+(u)} . (2.10) The idea behind (2.10) is for τ+(w) to “nudge” u towards τ+(u) by making sure that w satisfies clause a if setting u to τ+(u) does not, and conversely making sure that w fails to satisfy clause a if setting u to τ+(u) does. A simple induction on ℓ shows that τ+ is a satisfying assignment for which (2.9) holds. Lemma 2.12. For any integer ℓ≥ 0 the assignment τ+ defined via (2.10) satisfies (2.9). Hence, proving Theorem 1.2 reduces to establishing the following. Proposition 2.13. For d < dcon(k) we have that lim ℓ→∞ E [ P [ τ(ℓ)(x) = 1 |T, ∀x ∈ ∂2ℓx :τ(ℓ)(x) =τ+(x) ] −P [ τ(ℓ)(x) = 1 |T ]] = 0. (2.11) The proof of Proposition 2.13 may seem delicate because the boundary condition τ+ depends on the tree T(ℓ). To sidestep this problem, we generalise another idea from the work [3] on random 2-SAT to k ≥ 3 by introducing a quantity that allows us to prove (2.13) but that behaves ‘Markovian’ as we pass up and down the tree. Specifically, for a variable x of T(ℓ) let T(ℓ) x be the sub-formula of T(ℓ) comprising x and its progeny. Moreover, for a satisfying assignment τ ∈ S(T(ℓ)) let S(T(ℓ) x ,τ) = { χ ∈ S(T(ℓ) x ) : ∀y ∈V (T(ℓ) x )∩∂2ℓ T x :χy = τy } , Z (T(ℓ) x ,τ) = ∣∣∣S(T(ℓ) x ,τ) ∣∣∣ . In words, S(T(ℓ) x ,τ) contains all satisfying assignments ofT(ℓ) x that comply with the boundary condition induced by τ. 
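The top-down construction (2.10) of the extremal boundary condition is straightforward to mimic in code. The sketch below is purely illustrative; the tree layout (`children[u]` is a list of child clauses of variable `u`, each given as the sign of `u` in the clause plus a list of `(w, sign_w)` pairs for its child variables) and the name `extremal_boundary` are assumptions made here.

```python
def extremal_boundary(children, root):
    """Compute tau^+ on a finite tree via the recurrence (2.10).

    children: dict mapping a variable u to a list of child clauses,
              each clause being (sign_u, [(w, sign_w), ...]) with signs
              in {+1, -1}. Variables without an entry are leaves.
    """
    tau = {root: 1}  # the root is set to +1
    stack = [root]
    while stack:
        u = stack.pop()
        for sign_u, kids in children.get(u, []):
            for w, sign_w in kids:
                if sign_u != tau[u]:
                    # u does not satisfy the clause: w is set to satisfy it
                    tau[w] = sign_w
                else:
                    # u already satisfies the clause: w is set against it
                    tau[w] = -sign_w
                stack.append(w)
    return tau


# Root 0 with one positive and one negative child clause.
tree = {0: [(+1, [(1, +1), (2, -1)]), (-1, [(3, +1)])]}
tau_plus = extremal_boundary(tree, 0)
```

In this toy tree the first clause is satisfied by the root, so its children are set against their signs, while the second clause is satisfied only through variable 3.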
Additionally, for t =±1 let S(T(ℓ) x ,τ, t ) = { χ ∈ S(T(ℓ) x ,τ) :χx = t } , Z (T(ℓ) x ,τ, t ) = ∣∣∣S(T(ℓ) x ,τ, t ) ∣∣∣ be the set and number of satisfying assignments of T(ℓ) x that agree with τ on the boundary and assign value t to x. Finally, let η(ℓ) x = log Z (T(ℓ) x ,τ+,τ+(x)) Z (T(ℓ) x ,τ+,−τ+(x)) ∈R∪ {±∞} (2.12) be the log-likelihood ratio that gauges how likely a random satisfying assignment τ of T(ℓ) x subject to the τ+- boundary condition is to set x to its designated value τ+(x) from (2.10). In terms of (2.12), the proof of Propo- sition 2.13 comes down to showing that for d < dcon(k), lim ℓ→∞ η(ℓ) x = log ( µπd ,k ,1,1 1−µπd ,k ,1,1 ) in distribution. (2.13) For a start, the following lemma bounds the tails of η(ℓ) x for large enough ℓ and x reasonably close to the root variable x. Lemma 2.14. For every 0 < d < dcon(k) there exist c = c(d ,k) and a sequence (εt )t with limt→∞ εt = 0 such that for any t > 0, ℓ> ct c we have P [ max x∈∂2tx ∣∣∣η(ℓ) x ∣∣∣≤ 2t c ] > 1−εt . (2.14) The proof of Lemma 2.14 rests on combinatorial arguments reminiscent of the analysis of PULP. A key feature of the definition (2.12) is that the random variables η(ℓ) x exhibit a ‘reverse Markovian’ behaviour. This is because η(ℓ) x depends only on τ+(x) and the part T(ℓ) x of the tree pending on x. Furthermore, because the distribution of the random treeT(ℓ) x is symmetric with respect to sign flips, even the dependence on the valueτ+(x) can be eliminated. All we need to keep in mind is that the values τ+(y) for y ∈ V (T(ℓ) x ) are constructed from the value τ+(x) in accordance with the recurrence (2.10). Thus, by flipping all signs in the tree T(ℓ) x if necessary, we 10 could assume without loss that τ+(x) = 1 without changing the distribution of η(ℓ) x with respect to the random- ness of T(ℓ) x . 
As a consequence, it is possible to set up a recurrence that expresses the log-likelihood ratios η(ℓ) x of variables x at distance q from x in terms of the η(ℓ) y for y at distance q +2 from x. Due to the recursive nature of the random tree T, it suffices to set up this recurrence for the root x of the tree. In other words, to prove (2.13) we just need a recurrence that expresses the distribution of the random variable η(ℓ+1) x in terms of the law of η(ℓ) x for ℓ≥ 0. A bit of reflection (see Claim 7.1), reveals that the corresponding distributional operator LL+ d ,k : P ((−∞,∞]) →P ((−∞,∞]) , ρ 7→ ρ̂ = LL+ d ,k (ρ) has the following shape. For a distribution ρ ∈ P ((−∞,∞]) let (ηρ,i , j )i , j≥1 be a family of random variables with distribution ρ. Moreover, let (si )i≥1 be a sequence of uniformly random ±1-valued random variables and let d ∼ Po(d). All of these random variables are mutually independent. Additionally, for q ≥ 0 and z1, . . . , zq ∈ R∪ {±∞} define Γ (z1, . . . , zq ) = q∏ i=1 1+ tanh(zi /2) 2 . (2.15) Then ρ̂ = LL+ d ,k (ρ) is the distribution of the random variable − d∑ i=1 si · log ( 1−Γ ( si · ( ηρ,i ,1, . . . ,ηρ,i ,k−1 ))) . (2.16) Ultimately we will derive (2.13), and thereby Proposition 2.13, from Lemma 2.14 and a contraction argument. However, this is not quite as straightforward as one might be inclined to expect. Indeed, at first glance, a natural approach to proving (2.13) from Lemma 2.14 seems to be to show that LL+ d ,k is a contraction, say, with respect to the W1-metric. This is indeed carried out in [3] for k = 2, where it is shown that LL+ d ,2 contracts for all 0 < d < 2, i.e., right up to the random 2-SAT satisfiability threshold. However, for k ≥ 3 we can only show that LL+ d ,k contracts for d < 2/(k −1), a value well below dcon(k) and short of the correct asymptotic order (1.10). 2.5.2. Pure and mixed literals. To cover a larger range of d we borrow from [56] the idea of expressly taking into account pure literals. 
To elaborate, while LL+ d ,k describes how the law ofη(ℓ+1) x results from that ofη(ℓ) x , the operator fails to take into account that x itself as well as some of the grandchildren of x in T may be pure literals. However, the pure literal property has a marked effect on the log-likelihood ratios. For if, say, x only appears positively, then a simple double counting argument shows that η(ℓ) x ≥ 0 for all ℓ. By extension, pure literals among the grandchildren of x have a ‘dampening’ effect and may thus improve the range of d for which we can establish contraction. For a variable node x ofT, let us denote byTx the subtree ofT rooted at x and containing its progeny. Leveraging the above observation, we classify a variable x ofT as , ⊕, ⊖, or#, depending on whether x appears both positively and negatively inTx , only positively, only negatively, or whether x has no children at all, respectively. Furthermore, instead of just tracing the law of η(ℓ) x for ℓ≥ 0, we study the four separate conditional distributions given the type ,⊕,⊖ or # of x. Of course, the distribution of η(ℓ) x given type # (i.e., x has no children) is just the atom at zero for all ℓ. To describe the evolution of the other distributions we introduce the operator LL⋆d ,k : P (−∞,∞]×P (0,+∞]×P (−∞,0] →P (−∞,∞]×P (0,+∞]×P (−∞,0] , with (ρ ,ρ⊕,ρ⊖) 7→ (ρ̂ , ρ̂⊕, ρ̂⊖) = LL⋆d ,k (ρ ,ρ⊕,ρ⊖) (2.17) defined as follows. Let d⋆ +,d⋆ + ′,d⋆ −,d⋆ − ′ be Poisson variables with parameter d/2, conditioned on being positive. Moreover, let r 1 = ( r ,1,r ⊕,1,r ⊖,1,r #,1 ) , r 2 = ( r ,2,r ⊕,2,r ⊖,2,r #,2 ) , . . . be multinomial variables with k −1 trials and probabilities p = (1−e−d/2)2, p⊕ = p⊖ = e−d/2(1−e−d/2), p# = e−d . (2.18) 11 For i , j ≥ 1 let η ,i , j , η⊕,i , j , η⊖,i , j be random variables with law ρ , ρ⊕, ρ⊖, respectively. All of the aforementioned random variables are mutually independent. 
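As a quick consistency check on (2.18), note that the four type probabilities sum to one. The computation below assumes, in line with the Poisson parameters $d/2$ used above, that a variable has independent $\mathrm{Po}(d/2)$ numbers of positive and of negative child clauses, so that each sign is absent with probability $q=e^{-d/2}$; the label $p_{\mathrm{mix}}$ is a placeholder used here for the first probability in (2.18), the one belonging to variables that appear with both signs.

```latex
\begin{align*}
p_{\mathrm{mix}} + p_{\oplus} + p_{\ominus} + p_{\#}
  &= (1-q)^2 + 2q(1-q) + q^2 \\
  &= \bigl((1-q)+q\bigr)^2 = 1, \qquad q = e^{-d/2}.
\end{align*}
```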
Further, for a sign ε ∈ {±1} and a vector r = (r ,r⊕,r⊖,r#) of non- negative integers with r + r⊕+ r⊖+ r# = k −1 and i ≥ 0, 1 ≤ j ≤ 4 we let ΞΞΞi , j (ε,r ) = 1− 1 2r# ·Γ ( ε ( η ,4i+ j ,1, . . . ,η ,4i+ j ,r )) Γ ( ε ( η⊕,4i+ j ,1, . . . ,η⊕,4i+ j ,r⊕ )) Γ ( ε ( η⊖,4i+ j ,1, . . . ,η⊖,4i+ j ,r⊖ )) . (2.19) The r.h.s. of (2.19) amounts to rewriting the argument of the logarithm in (2.16) when the number of variables of each type is distributed according to r . Finally, let ΞΞΞi , j =ΞΞΞi , j ( (−1) j+1,r 4i+ j ) . (2.20) Then the operator (2.17) maps ρ ,ρ⊕,ρ⊖ to the distributions ρ̂ , ρ̂⊕, ρ̂⊖ of the random variables ρ̂ ∼− d⋆ +∑ i=1 logΞΞΞi ,1 + d⋆ −∑ i=1 logΞΞΞi ,2 , ρ̂⊕ ∼− d⋆ + ′∑ i=1 logΞΞΞi ,3 , ρ̂⊖ ∼+ d⋆ − ′∑ i=1 logΞΞΞi ,4 . (2.21) 2.5.3. Coupling and contraction. While Montanari and Shah [56] do not write their proof of the lower bound dMS(k) ≤ duniq(k) in the language of distributional recurrences, translating their argument to the current formal- ism evinces two key differences by comparison to the approach that we are going to take. First, Montanari and Shah establish contraction with respect to messages from clauses to variables, instead of messages from variables to clauses as considered here. While this change of perspective may seem innocuous at first, working with respect to variables provides us with greater control over how the change in log-likelihood ratios propagates. In particular, working with variable- to-clause messages and taking into account the four variable types ,⊕,⊖,# allows us to optimise the metric with respect to which we establish contraction. Hence, for t > 0 we endow the space P (−∞,∞]×P (0,+∞]×P (−∞,0] with the metric distt (( ρ ,ρ⊕,ρ⊖ ) , ( ρ′ ,ρ′ ⊕,ρ′ ⊖ ))= ( 1−e−t/2) ·W1 ( ρ ,ρ′ )+e−t/2 ·W1 ( ρ⊕,ρ′ ⊕ )+e−t/2 ·W1 ( ρ⊖,ρ′ ⊖ ) . (2.22) The following proposition summarises the main step towards the proof of Theorem 1.2. Proposition 2.15. 
For every d < dcon(k), the operator LL⋆d ,k is a contraction with respect to the metric distd . The second key difference between [56] and the present approach will emerge in the proof of Proposition 2.15 itself. As we are about to see, leveraging the four variable types enables us to carry out a sharper bound on the derivative of our operator LL⋆d ,k . This comes in the form of a subtle combinatorial coupling between variable types among clauses with opposite signs. To explain this, we recall that LL⋆d ,k describes how the laws of the log- likelihood ratios ρ ,ρ⊕, and ρ⊖, evolve given the corresponding laws of the variables in one generation below. Recall also that we are always considering the positive boundary condition, i.e., the one maximising the value of each log-likelihood ratio. Let us write ρ = (ρ ,ρ⊕,ρ⊖), ρ′ = (ρ′ ,ρ′ ⊕,ρ′ ⊖), and ρ̂, ρ̂′ for their corresponding images under the operator LL⋆d ,k . We wish to establish that distd (ρ̂, ρ̂′) < c ·distd (ρ,ρ′), for some constant c = c(d ,k) < 1. We call a clause a positive if it contains its parent variable as a direct literal; otherwise, we call a negative. The change between the output distributions ρ̂, ρ̂′ describing the log-likelihood law of, say, variable x, comes from two sources: the positive and the negative children of x. Observe that there is no obvious symmetry between the two, as we have imposed the positive boundary condition, and therefore, the influence of positive clauses is typically more pronounced. In turn, the change caused by each clause can be further attributed to that of the k −1 grandchildren variables it features. To be more precise, let us consider the contribution of a single positive clause a. Let us write r = (r ,r ⊕,r ⊖,r #) for the type-distribution of the children variables of a, where r follows the law described in (2.18). 
Consider also an arbitrary enumeration of the variables of each type t ∈ { ,⊕,⊖}, and write D t i (z,r ;+1) for the magnitude of the partial derivative of the message clause a sends to x, with respect to the message clause a receives from its i -th variable of type t . Then, the expected contribution of clause a to the distance dist(ρ̂, ρ̂′) is bounded in terms of D t i (z,r ;+)’s as follows E [ r ∑ i=1 ∣∣∣∣∣ ∫ η′ ,i η ,i D i (wi ,r ;+1)dwi ∣∣∣∣∣+ r⊕∑ j=1 ∣∣∣∣∣ ∫ η′⊕, j η⊕, j D⊕ j (y j ,r ;+1)dy j ∣∣∣∣∣+ r⊖∑ ℓ=1 ∣∣∣∣∣ ∫ η′⊖,ℓ η⊖,ℓ D⊖ ℓ (zℓ,r ;+1)dzℓ ∣∣∣∣∣ ] , (2.23) 12 FIGURE 2. Example of a coupling between derivative terms in (2.24)–(2.25). For vector r and type t ∈ { ,⊕,⊖}, we pair the term D t (z,r ;+1) in (2.24) with the term D t (z,pt (r );−1) in (2.25). whereηt ,i ,η′ t ,i follow the law ofρt ,ρ′ t , respectively. Expanding the expectation with respect to the type-distribution r , and writing P (r ) =P[r = r ], for the probability of a vector r = (r ,r⊕,r⊖,r#), we rewrite (2.23) as ∑ r P (r ) ( r ·E ∣∣∣∣∣ ∫ η′ ,1 η ,1 D 1 (z,r ;+1)dz ∣∣∣∣∣+ r⊕ ·E ∣∣∣∣∣ ∫ η′⊕,1 η⊕,1 D⊕ 1 (z,r ;+1)dz ∣∣∣∣∣+ r⊖ ·E ∣∣∣∣∣ ∫ η′⊖,1 η⊖,1 D⊖ 1 (z,r ;+1)dz ∣∣∣∣∣ ) . (2.24) The expected contribution of a negative clause is given by an expression similar to (2.24), albeit in terms of D t 1(z,r ;−), i.e., the partial derivative of the message a → x, with respect to the message from a variable of type t to clause a. Specifically, the expected contribution of a negative clause reads: ∑ r ′ P (r ′) ( r ′ ·E ∣∣∣∣∣ ∫ η′ ,1 η ,1 D 1 (z,r ′;−1)dz ∣∣∣∣∣+ r ′ ⊕ ·E ∣∣∣∣∣ ∫ η′⊕,1 η⊕,1 D⊕ 1 (z,r ′;−1)dz ∣∣∣∣∣+ r ′ ⊖ ·E ∣∣∣∣∣ ∫ η′⊖,1 η⊖,1 D⊖ 1 (z,r ′;−1)dz ∣∣∣∣∣ ) . (2.25) It is not hard to see that pure literals have a ‘dampening’ effect on each partial derivative D t 1(z,r ;±). Consider a clause a whose children variables are distributed among the different types according to r . 
Then each derivative in (2.24)–(2.25) can be bounded in terms of the number of pure literals featured in a, excluding the variable of type t with respect to which the derivative is taken. Notice that the operator LL⋆_{d,k} effectively incorporates the positive boundary condition by imposing the sign of each variable with respect to its parent clause a to be + if a is positive, and − if a is negative. With that in mind, we see that if a is a positive clause, the total number of pure literals it contains is just r⊖ + r#. On the other hand, if a is negative, then the total number of pure literals it contains is r⊕ + r#. Bounding each derivative D^t_i(z, r; ±1) in (2.24)–(2.25) separately, and invoking the mean value theorem, yields an upper bound on the contraction constant c.

However, we can do better by partitioning the derivatives in (2.24)–(2.25) into groups and optimising them jointly. Indeed, a careful examination of the expression (2.19) reveals that, for example, any sum of the form D⊕(z, (∗,∗,r⊖,r#); +1) + D⊕(z, (∗,r⊖+1,∗,r#); −1) can be explicitly maximised, and the resulting maximum is smaller than the sum of maxima of the parts. At first sight, this seems to be of little use, if any, because in order to implement such a coupling between the terms of (2.24)–(2.25) we should also match their coefficients, that is, the quantity P(r) · r⊕ must remain invariant under the coupling. Somewhat unexpectedly, it turns out that the coupling r ↦ r′ with r′ = (r , r⊖ + 1, r⊕ − 1, r#) enjoys both features. Similar coupling strategies (depicted in Figure 2) facilitate the maximisation of ⊕,⊖-terms. The full proof of Proposition 2.15 can be found in Section 7.

We conclude the section by explaining how Theorem 1.2 follows from the above.

Proof of Theorem 1.2. From the triangle inequality and Lemma 2.12, it is immediate to obtain Theorem 1.2 from Proposition 2.13. □

2.6. Discussion.
The location of the random 2-SAT satisfiability threshold was pinpointed already in the 1990s [21, 40] essentially because the threshold coincides with the giant component phase transition of a directed random graph whose edges correspond to the clauses. This argument also implies that both the pure literal algorithm and another efficient algorithm called unit clause propagation find satisfying assignments up to the satisfiability threshold w.h.p. By contrast, in the case of random $k$-SAT with $k\ge 3$ the satisfiability threshold is known only for $k$ exceeding an undetermined (but large) constant $k_0$ [33]. The proof is based on a sophisticated, physics-inspired second moment argument that significantly extends ideas from earlier work [5, 7, 27]. Asymptotically in the limit of large $k$ the satisfiability threshold reads
$$d_{\mathrm{sat}}(k)=k\left[2^k\log 2-\frac{1+\log 2}{2}\right]+\varepsilon_k,\qquad\text{where }\lim_{k\to\infty}\varepsilon_k=0. \tag{2.26}$$
Even though [5, 7, 33, 27] rely on the second moment method, they do not yield asymptotically tight estimates of the number of satisfying assignments for any regime of $d$. This is because the second moment method is applied not to the number of satisfying assignments, but to another, exponentially smaller random variable.

The assumption that $k$ exceeds a large constant is used critically in [27, 33] to ensure certain concentration and expansion properties. For $3\le k<k_0$ even the existence of a uniform satisfiability threshold remains an open problem, although a sharp threshold sequence that may vary with $n$ is known to exist [36]. That said, an upper bound on the satisfiability threshold (sequence) that matches the so-called ‘1-step replica symmetry breaking’ prediction from statistical physics can be verified using the interpolation method from mathematical physics [35, 49, 51, 60]. However, the currently known lower bounds for small $k$ (say, $k=3,4,5$) fall short of this upper bound [7, 42, 46]. For example, in the case $k=3$ the best current lower bound is $d_{\mathrm{sat}}(3)\ge 10.56$, while $d_{\mathrm{sat}}(3)\approx 12.801$ according to physics predictions [49, 51].

Thus, the satisfiability of random formulas continues to pose a substantial challenge for ‘small’ $3\le k<k_0$. In light of this, a particularly satisfactory aspect of the present results is that they apply and are meaningful for all $k\ge 3$. In fact, comparing the asymptotic bounds (1.10) and (2.26), we see that the Gibbs uniqueness threshold $d_{\mathrm{uniq}}(k)$ is much smaller than $d_{\mathrm{sat}}(k)$ for large $k$. Thus, Theorems 1.1 and 1.2 cover larger shares of the satisfiable regime of $d$ for smaller values of $k$; cf. Table 1.

The best current lower bounds on the satisfiability thresholds for $k\ge 4$ are non-constructive. With respect to the algorithmic problem of finding a satisfying assignment of a random $k$-CNF the best current results for ‘small’ $k$ are based on simple combinatorial algorithms, analysed via the method of differential equations [37, 42, 46]. Asymptotically for large $k$ the best known efficient algorithm [22] succeeds up to
$$d_{\mathrm{alg}}(k)=(1-\varepsilon_k)\,2^k\log k\qquad\text{where }\lim_{k\to\infty}\varepsilon_k=0, \tag{2.27}$$
about a factor of $\log(k)/k$ below (2.26). There is evidence that certain types of algorithms do not succeed for much larger values of $d$, at least for large enough $k$ [2, 15, 23, 44].

Apart from the task of finding a satisfying assignment, an important line of work deals with the problem of counting and sampling satisfying assignments of random $k$-CNFs for large $k$ [19, 20, 43]. The best current result [20] covers the regime $d\le 2^k/k^c$ for an undetermined (large enough) constant $c>0$. Since for large $k$ the bound $2^k/k^c$ significantly exceeds the pure literal threshold (1.10), it might be an interesting question whether ideas from [19, 20, 43] can be used to verify the replica symmetric solution (1.7) for $d$ beyond the Gibbs uniqueness threshold for large $k$.
Most of the prior work on the rigorous verification of the replica symmetric solution focuses on a soft version of random $k$-SAT, the so-called random $k$-SAT model at inverse temperature $\beta>0$ [57]. The partition function $Z_\beta(\Phi)$ of this model, the key quantity of interest, is defined as follows. For a clause $a$ of the random formula $\Phi$ and a truth assignment $\sigma$ write $\sigma\models a$ if $\sigma$ satisfies clause $a$. Then
$$Z_\beta(\Phi)=\sum_{\sigma\in\{\pm1\}^n}\exp\left(-\beta\sum_{i=1}^m\mathbb{1}\{\sigma\not\models a_i\}\right). \tag{2.28}$$
Thus, each assignment $\sigma$ contributes a summand equal to $\exp(-\beta)$ raised to the power of the number of clauses that $\sigma$ fails to satisfy. In effect,
$$Z(\Phi)=\lim_{\beta\to\infty}Z_\beta(\Phi). \tag{2.29}$$

A line of prior work [13, 59, 62] deals with the derivation of the ‘thermodynamic limit’
$$\lim_{n\to\infty}\frac1n\,\mathbb{E}\left[\log Z_\beta(\Phi)\right] \tag{2.30}$$
for small $d$ and/or small $\beta$. Specifically, these works verify that (2.30) is given by the replica symmetric solution at inverse temperature $\beta$ from [54, 55] under the assumption
$$d(k-1)\min\left\{1,\,6\beta\exp(4\beta)\right\}<1. \tag{2.31}$$
We observe that for large $\beta$ the bound (2.31) holds only up to the giant component threshold $d=1/(k-1)$, where the replica symmetric solution trivially follows from the fact that Belief Propagation is exact on acyclic graphical models [50, Theorem 4.1]. That said, a technique called the interpolation method shows that the replica symmetric solution yields an upper bound on (2.30) for all $d,\beta>0$ [35, 60]. In particular, we will combine the interpolation method with a concentration argument in order to prove Corollary 2.2.

According to physics predictions the ‘replica symmetric solution’ from [54, 55] yields the correct value of both $\lim_{n\to\infty}n^{-1}\log Z_\beta(\Phi)$ for all $\beta>0$ and of $\lim_{n\to\infty}n^{-1}\log Z(\Phi)$ for all $d$ up to a threshold $d_{\mathrm{rsb}}(k)$ close to but strictly below the satisfiability threshold $d_{\mathrm{sat}}(k)$ for all $k\ge 3$ [47].
The threshold drsb(k) is known as the ‘1-step replica symmetry breaking phase transition’ in physics jargon; its asymptotic value is predicted as drsb(k) = k [ 2k log2−2log2 ] +εk , where lim k→∞ εk = 0. (2.32) Indeed, the interpolation method can be used to verify that the replica symmetric solution ceases to be correct for drsb(k)+εk < d < dsat(k) with εk → 0. Conversely, the replica symmetric solution is known to be correct for all d and β > 0 where a certain correlation decay condition is satisfied [26], provided that k is large enough. Physics methods predict that this condition holds for all β> 0 and all d < drsb(k) [47]. The aforementioned work of Montanari and Shah [56] also deals with the soft variant of random k-SAT (2.28), but allows for an inverse temperature β = β(n) that tends to infinity slowly as n →∞. Specifically, considering a small power β = nδ enables Montanari and Shah to estimate the number of assignments that satisfy all but o(n) clauses. The proof combines an interpolation on 0 ≤ β ≤ nδ with a contraction argument that improves over the previous contraction estimates from [13, 59, 62]. Instead of the interpolation on β, in order to prove Theorem 1.1 we use the Aizenman-Sims-Starr scheme. Because we count actual satisfying assignments, this requires the care- ful combinatorial analysis of tail events, which is where the PULP algorithm and its analysis come in. Additionally, towards the proof of Theorem 1.2 we devise an improved version of the contraction argument from [56]. Following Montanari and Shah, we also take advantage of the impact of pure literals on the Belief Propagation operator. But we develop an improved coupling scheme that yields a better range of d for which contraction occurs. Addition- ally, once again because we deal with actual satisfying assignments, the proof of the Gibbs uniqueness property involves the analysis of the PULP algorithm on a Galton-Watson tree in order to cope with unlikely events. 
By comparison to the ‘soft’ random $k$-SAT model (2.28), few prior contributions deal with the actual number $Z(\Phi)$ of satisfying assignments. A result of Abbe and Montanari [1] implies that a deterministic limit (in probability)
$$\lim_{n\to\infty}\frac1n\log Z(\Phi) \tag{2.33}$$
exists for Lebesgue-almost all $0<d<d_{\mathrm{pure}}(k)$ for all $k\ge 2$. However, the proof, which is based on the interpolation method, does not reveal the value of (2.33). In fact, prior to the present work the limit (2.33) was known only in two cases. First, in the trivial regime $d<1/(k-1)$ below the giant component threshold. Second, in the case $k=2$ for $0<d<d_{\mathrm{sat}}(2)=2$ [3]. In both cases the limit (2.33) coincides with the replica symmetric solution from [54]. Beyond the convergence in probability, $\log Z(\Phi)$ is known to satisfy a central limit theorem in the case $k=2$ [17].

To compute the limit (2.33) in the case $k=2$ the contribution [3] employs the Aizenman-Sims-Starr scheme. The couplings that we use towards the proofs of Propositions 2.5–2.6 generalise the argument from [3] to $k\ge 3$. The main technical novelty lies in the way that moderately unlikely events are treated. Specifically, in the case $k=2$ the simple Unit Clause propagation algorithm, which essentially boils down to directed reachability, was sufficient to derive a tail bound similar to (and actually stronger than) (2.5). By contrast, since in the case $k\ge 3$ the clauses “branch out”, the analysis of tail events and, accordingly, the derivation of (2.5) is far more delicate. The core of this derivation is the detailed analysis of the PULP algorithm right up to $d_{\mathrm{pure}}(k)$.

Finally, by contrast to random $k$-SAT the validity of the replica symmetric solution is known for the optimal parameter range for several other random constraint satisfaction problems that enjoy certain symmetry properties. Examples include random graph colouring or random $k$-NAESAT [11, 25].
Due to the symmetry property⁷ the replica symmetric solution simply coincides with the first moment of the number of solutions. In effect, in many symmetric problems it is even possible to precisely determine the limiting distribution of the number of solutions, which superconcentrates on the first moment [24]. By contrast, in random $k$-SAT the first moment overshoots the typical number of satisfying assignments by an exponential factor [5], which is why random $k$-SAT is so much more delicate than symmetric problems. That said, there is a regular variant of random $k$-SAT (where every variable appears an equal number of times positively and negatively) where symmetry and superconcentration are recovered [29].

2.7. Organisation. In the remaining sections we work our way through the proofs of Theorems 1.1 and 1.2. Specifically, in Section 3 we analyse the PULP algorithm introduced in Section 2.3, proving Lemmas 2.8–2.10, which facilitate many of the subsequent results.

In Section 4 we establish Proposition 2.1, verifying that the quantities appearing in Theorem 1.1 are well-defined. The proof of Corollary 2.2 follows in Section 5. Section 6 is devoted to the Aizenman-Sims-Starr scheme, and in particular the proof of Proposition 2.3. There we also complete the proof of Theorem 1.1.

Our final Section 7 deals with the remaining proofs toward establishing Theorem 1.2. We begin by proving Lemma 2.14, showing that the log-likelihood ratios of the random Galton-Watson formula close to the root are bounded w.h.p. This enables us to compare the output distribution of the non-random operator introduced in Section 2.5 with that of actual ratios on the random tree. We then proceed with the proof of Proposition 2.15, and conclude with the proof of (2.13), completing the proof of Theorem 1.2.

3. ANALYSIS OF PULP

This section is concerned with the analysis of PULP from Section 2.3. In particular, we prove Lemma 2.10. But let us get the proof of Lemma 2.8 out of the way first.
3.1. Proof of Lemma 2.8. Suppose that $\bar{\mathcal L} \supseteq \mathcal L$ satisfies PULP1–PULP2. Let $U = \{|l| : l \in \bar{\mathcal L}\}$ be the set of variables underlying the literals $\bar{\mathcal L}$. Moreover, let $\chi : U \to \{\pm1\}$ be the truth assignment under which all literals $l \in \bar{\mathcal L}$ evaluate to ‘true’. Due to PULP2, the assignment $\chi$ is well defined. Moreover, since $\mathcal L \subseteq \bar{\mathcal L}$, under $\chi$ all literals $l \in \mathcal L$ evaluate to ‘true’. Hence, for a satisfying assignment $\sigma \in S(\Phi)$ define an assignment $\sigma'$ by letting
$$\sigma'(x) = \mathbb 1\{x \in U\}\,\chi(x) + \mathbb 1\{x \notin U\}\,\sigma(x).$$
Because $\bar{\mathcal L}$ satisfies condition PULP1, we have $\sigma' \in S(\Phi,\mathcal L)$. Finally, because for a satisfying assignment $\tau' \in S(\Phi,\mathcal L)$ there are no more than $2^{|U|} = 2^{|\bar{\mathcal L}|}$ satisfying assignments $\tau \in S(\Phi)$ such that $\tau(x) = \tau'(x)$ for all $x \notin U$, we obtain the desired bound $Z(\Phi) \le 2^{|\bar{\mathcal L}|} Z(\Phi,\mathcal L)$.

3.2. Turning a tree to PULP. While the ultimate goal of this section is to study the PULP algorithm on the random formula $\Phi'$ to prove Lemma 2.10, a necessary preparation is to investigate the algorithm on the random Galton-Watson tree $\mathbf T = \mathbf T_{d,k}$. Of course, since $\mathbf T$ may be infinite we should formally confine ourselves to the finite trees $\mathbf T^{(\ell)}$ truncated at the $2\ell$-th level from the root $\mathfrak x$. Hence, recalling (2.6), we aim to estimate the height $h_{\mathfrak x}(s,\mathbf T^{(\ell)})$ for finite $\ell$. That said, since these random variables are monotonically increasing in $\ell$, it makes sense to define
$$h_{\mathfrak x}(s,\mathbf T) = \lim_{\ell\to\infty} h_{\mathfrak x}(s,\mathbf T^{(\ell)}) \in [0,\infty]. \tag{3.1}$$
We point out that for $d < d_{\mathrm{pure}}(k)$ the tails of $h_{\mathfrak x}(s,\mathbf T)$ decay at a doubly exponential rate.

Lemma 3.1. For any $d < d_{\mathrm{pure}}(k)$ there exist $c_1 = c_1(d,k)$, $c_2 = c_2(d,k) > 0$ such that
$$\mathbb P\left[h_{\mathfrak x}(\pm1,\mathbf T) \ge h\right] \le c_1\cdot\exp\left(-\exp(c_2\cdot h)\right) \qquad\text{for every } h \ge 1.$$

Proof. By symmetry it suffices to consider $h_{\mathfrak x}(1,\mathbf T)$. Thus, let
$$p_{h,\ell} = \mathbb P\left[h_{\mathfrak x}(1,\mathbf T^{(\ell)}) \ge h\right].$$
All variables at distance $2\ell$ from $\mathfrak x$ are leaves and therefore pure in the tree $\mathbf T^{(\ell)}$. Consequently, pure literal elimination removes all clauses of $\mathbf T^{(\ell)}$ within at most $\ell$ rounds. Hence, $p_{h,\ell} = 0$ for $h > \ell$.
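The observation that leaves are pure and that a truncated tree is therefore cleared within at most $\ell$ rounds can be illustrated with a minimal sketch of generic parallel pure literal elimination. This is not the PULP algorithm from Section 2.3; the function name `pure_literal_rounds`, the signed-integer clause encoding and the example tree are hypothetical choices made only for this illustration.

```python
def pure_literal_rounds(clauses):
    """Run parallel pure literal elimination.

    A clause is a tuple of non-zero ints: +v stands for the literal x_v,
    -v for its negation. Returns (number of rounds, clauses left over).
    """
    remaining = [tuple(c) for c in clauses]
    rounds = 0
    while remaining:
        literals = {lit for clause in remaining for lit in clause}
        # A literal is pure if its negation does not occur in any remaining clause.
        pure = {lit for lit in literals if -lit not in literals}
        if not pure:
            break  # stuck: every remaining variable occurs with both signs
        # Setting all pure literals to 'true' removes every clause containing one.
        remaining = [c for c in remaining if not pure.intersection(c)]
        rounds += 1
    return rounds, remaining


# Hypothetical tree with k = 3 truncated at level 4 (i.e. l = 2), root x_1.
tree = [(1, 2, 3), (-1, 4, 5), (-2, 6, 7), (-3, 8, 9)]
# A formula without any pure literal: elimination removes nothing.
stuck = [(1, 2), (-1, -2), (1, -2), (-1, 2)]

print(pure_literal_rounds(tree))
print(pure_literal_rounds(stuck))
```

In the example tree the first round removes the three clauses that contain a leaf, and the clause at the root only disappears in the second round, in line with the bound of at most $\ell = 2$ rounds.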
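Furthermore, we claim that
$$p_{h,\ell} = \phi_{d,k}(p_{h-1,\ell-1}), \qquad\text{where}\quad \phi_{d,k}(z) = 1-\exp\left(-\frac d2 z^{k-1}\right) \qquad (1 \le h \le \ell). \tag{3.2}$$

⁷ A formal definition of ‘symmetry’ could be that the uniform distribution on spins is a fixed point of the Belief Propagation operator on the respective Galton-Watson tree.

Indeed, if $h_{\mathfrak x}(1,\mathbf T^{(\ell)}) \ge h \ge 1$ then by (2.6) there exists a clause $a \in \partial_{\mathbf T}\mathfrak x$ with $\mathrm{sign}(\mathfrak x,a) = -1$ such that $h_a(\mathbf T^{(\ell)}[\mathfrak x \mapsto 1]) \ge h-1$. In other words, pure literal elimination on the sub-tree $\mathbf T^{(\ell)}[a]$ of $\mathbf T^{(\ell)}$ rooted at clause $a$ and with variable $\mathfrak x$ removed takes at least $h-1$ rounds to remove clause $a$. Consequently, pure literal elimination on $\mathbf T^{(\ell)}[a]$ takes at least $h-1$ rounds to remove one of the variables $x \in \partial_{\mathbf T}a \setminus \{\mathfrak x\}$. In other words, the sub-tree $\mathbf T^{(\ell)}[x]$ comprising $x$ and its successors satisfies
$$h_x(\mathrm{sign}(x,a),\mathbf T^{(\ell)}[x]) \ge h-1 \qquad\text{for every } x \in \partial_{\mathbf T}a \setminus \{\mathfrak x\}. \tag{3.3}$$
But since $\mathbf T$ is a Galton-Watson tree, the sub-tree $\mathbf T^{(\ell)}[x]$ has the same distribution as the random tree $\mathbf T^{(\ell-1)}$. Hence, (3.3) implies that for every $a \in \partial_{\mathbf T}\mathfrak x$ with $\mathrm{sign}(\mathfrak x,a) = -1$,
$$\mathbb P\left[h_a(\mathbf T^{(\ell)}[\mathfrak x \mapsto 1]) \ge h-1 \mid \mathbf T^{(1)}\right] = p_{h-1,\ell-1}^{k-1}. \tag{3.4}$$
Finally, the construction of $\mathbf T$ ensures that the number of $a \in \partial_{\mathbf T}\mathfrak x$ with $\mathrm{sign}(\mathfrak x,a) = -1$ has distribution $\mathrm{Po}(d/2)$. Therefore, (3.4) shows that
$$p_{h,\ell} = \sum_{i=0}^{\infty}\mathbb P\left[\mathrm{Po}(d/2) = i\right]\left(1-\left(1-p_{h-1,\ell-1}^{k-1}\right)^i\right) = 1-\exp\left(-\frac d2 p_{h-1,\ell-1}^{k-1}\right),$$
which completes the proof of (3.2).

Since the sequences $(p_{h,\ell})_{\ell}$ are non-decreasing, the limits $p_h = \lim_{\ell\to\infty}p_{h,\ell}$ exist. Moreover, (3.2) shows that
$$p_h = \phi_{d,k}(p_{h-1}) \qquad (h \ge 1). \tag{3.5}$$
Hence, recalling the definition (1.8) of $d_{\mathrm{pure}}(k)$, we find
$$\left(\frac{p_{h+1}}{p_h}\right)^{k-1} = \left(\frac{\phi_{d,k}(p_h)}{p_h}\right)^{k-1} = d\cdot\frac{\left(1-\exp\left(-d\,p_h^{k-1}/2\right)\right)^{k-1}}{d\,p_h^{k-1}} \le \frac{d}{d_{\mathrm{pure}}(k)} < 1.$$
Consequently,
$$\lim_{h\to\infty}p_h = 0. \tag{3.6}$$
To complete the proof we expand $\phi_{d,k}(z)$ around $z = 0$:
$$\phi_{d,k}(z) = \frac d2 z^{k-1} + O(z^{2k-2}) \qquad\text{as } z \to 0. \tag{3.7}$$
Thus, the function $\phi_{d,k}(z)$ is well approximated by a $(k-1)$-th power. Since $k \ge 3$, combining (3.5)–(3.7), we conclude that for sufficiently large $h$ we have $p_h \le (d/2+1)\,p_{h-1}^{k-1}$.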
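The recursion (3.5) is easy to explore numerically. The following minimal sketch iterates $p_h = \phi_{d,k}(p_{h-1})$ starting from $p_0 = 1$ (an assumption, reflecting that a height is always at least $0$). The helper names `phi` and `iterate_p` are hypothetical, and the values $d = 4$ and $d = 6$ for $k = 3$ are illustrative choices assumed to lie below and above $d_{\mathrm{pure}}(3)$, respectively.

```python
import math


def phi(z, d, k):
    """phi_{d,k}(z) = 1 - exp(-(d/2) * z**(k-1)), evaluated stably via expm1."""
    return -math.expm1(-0.5 * d * z ** (k - 1))


def iterate_p(d, k, steps, p0=1.0):
    """Return [p_0, ..., p_steps] with p_h = phi_{d,k}(p_{h-1})."""
    ps = [p0]
    for _ in range(steps):
        ps.append(phi(ps[-1], d, k))
    return ps


# Illustrative parameters (assumed below / above the pure literal threshold for k = 3).
below = iterate_p(d=4.0, k=3, steps=15)
above = iterate_p(d=6.0, k=3, steps=200)

for h, p in enumerate(below):
    print(h, p)
print("value after 200 steps for d = 6:", above[-1])
```

For the smaller value of $d$ the sequence first decreases slowly and then collapses very quickly once $p_h$ is small, which is the behaviour captured by the $(k-1)$-th power approximation (3.7); for the larger value the iteration settles at a positive fixed point instead.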
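Consequently, $p_h \le c_1\cdot\exp(-\exp(c_2\cdot h))$ for suitable $c_1 = c_1(d,k)$ and $c_2 = c_2(d,k)$. □

We remind ourselves that $\overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T^{(\ell)}}$ signifies the output of PULP run on the formula $\mathbf T^{(\ell)}$ with initial literal set $\{\pm1\cdot\mathfrak x\}$. We extend the definition of the closure to the (possibly infinite) tree $\mathbf T$ by letting
$$\overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T} = \bigcap_{\ell_0\ge1}\ \bigcup_{\ell\ge\ell_0}\overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T^{(\ell)}}.$$
This definition ensures that if the height $h_{\mathfrak x}(\pm1,\mathbf T)$ from (3.1) is finite, then $\overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T} = \overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T^{(\ell)}}$ for all $\ell \ge h_{\mathfrak x}(\pm1,\mathbf T)$. In order to estimate the size of this set, we combine Lemma 3.1 with a crude bound on the total number of variable nodes of the Galton-Watson tree $\mathbf T^{(\ell)}$. Recall that $V(\mathbf T^{(\ell)})$ signifies the set of variable nodes of $\mathbf T^{(\ell)}$.

Lemma 3.2. Let $d > 0$. For any $\ell \ge 1$ and any $t > 100(1+d(k-1))^2$ we have $\mathbb P\left[|V(\mathbf T^{(\ell)})| > t^{\ell}\right] \le \ell\exp(-t^{\ell/2}/4)$.

Proof. Let $N_\ell = |V(\mathbf T^{(\ell)})|$ for brevity, set $g = 10(1+d(k-1))$ and notice that $t > g^2$. The construction of the Galton-Watson tree $\mathbf T$ ensures that $N_0 = 1$ and that for $\ell \ge 1$ given $N_{\ell-1}$ we have $N_\ell \sim (k-1)\cdot\mathrm{Po}(dN_{\ell-1})$. Therefore, Bennett's inequality shows that
$$\mathbb P\left[N_h > g^{h-\ell}t^{\ell} \mid N_{h-1} \le g^{h-1-\ell}t^{\ell}\right] \le \exp\left(-\frac{t^{\ell}}{4g^{\ell-h}}\right) \le \exp(-t^{\ell/2}/4) \qquad (1 \le h \le \ell). \tag{3.8}$$
Furthermore, if $N_\ell > t^{\ell}$ then there exists $1 \le h \le \ell$ such that $N_h > g^{h-\ell}t^{\ell}$ while $N_{h-1} \le g^{h-1-\ell}t^{\ell}$. Hence, combining (3.8) with the union bound completes the proof. □

Corollary 3.3. For any $d < d_{\mathrm{pure}}(k)$ there exists $c_3 = c_3(d,k) > 0$ such that
$$\mathbb P\left[|\overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T}| > t\right] \le c_3\exp(-t^{1/c_3}) \qquad\text{for all } t > 0.$$

Proof. By symmetry it suffices to consider $N = |\overline{\{\mathfrak x\}}_{\mathbf T}|$. Since Lemma 3.1 shows that $\mathbb P[h_{\mathfrak x}(1,\mathbf T) < \infty] = 1$, we may assume from now on that indeed $h_{\mathfrak x}(1,\mathbf T) < \infty$. Moreover, picking $c_3 = c_3(d,k) > 0$ large enough, we may assume that $t > t_0$ for a large $t_0 = t_0(d,k)$. Let $N_\ell = |V(\mathbf T^{(\ell)})|$, $p_h = \mathbb P[h_{\mathfrak x}(1,\mathbf T) = h]$ and $g = 10(1+d(k-1))$. It is an immediate consequence of the way that PULP proceeds that for all $l \in \overline{\{\mathfrak x\}}_{\mathbf T}$ we have $|l| \in V(\mathbf T^{(h_{\mathfrak x}(1,\mathbf T))})$. Hence, $N \le N_{h_{\mathfrak x}(1,\mathbf T)}$.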
Therefore, by the law of total probability,
$$\mathbb P[N > t] \le \sum_{h\ge0}S_h, \qquad\text{where}\quad S_h = \mathbb P\left[h_{\mathfrak x}(1,\mathbf T) = h\right]\mathbb P\left[N_h > t \mid h_{\mathfrak x}(1,\mathbf T) = h\right]. \tag{3.9}$$
Depending on the value of $t$ in relation to $h$, we use either Lemma 3.1 or Lemma 3.2 to bound $S_h$.

Case 1: $t_0 < t \le g^{2h}$. Lemma 3.1 shows that for certain $c_1, c_2 > 0$ we have
$$S_h \le \mathbb P\left[h_{\mathfrak x}(1,\mathbf T) = h\right] \le c_1\exp(-\exp(c_2h)) \le c_1 2^{-h}\exp(-t^{1/c_3}), \tag{3.10}$$
provided $c_3$ is chosen large enough.

Case 2: $t > g^{2h}$. We apply Lemma 3.2 to obtain
$$S_h \le \mathbb P[N_h > t] \le h\exp(-\sqrt t/4) \le h2^{-h}\exp(-t^{1/3}), \tag{3.11}$$
provided that $t > t_0$ is sufficiently large.

Combining the bounds (3.9)–(3.11) completes the proof. □

Corollary 3.4. For any $d < d_{\mathrm{pure}}(k)$ we have $\mathbb E\left[|\overline{\{\pm1\cdot\mathfrak x\}}_{\mathbf T}|^2\right] < \infty$.

Proof. This is an immediate consequence of Corollary 3.3. □

3.3. Proof of Lemma 2.10. Because the distribution of $\Phi'$ is invariant under variable permutations and inversions, we may assume the initial set $\mathcal L$ of literals passed to PULP is just $\mathcal L = \{x_1,\ldots,x_L\}$ for an integer $L = \tilde O(1)$. For an integer $\ell \ge 1$ let $\varphi'_{\ell,L}$ be the sub-formula of $\Phi'$ comprising all clauses and variables at distance at most $2\ell$ from $\mathcal L$. We recall that this formula has a bipartite graph representation $G(\varphi'_{\ell,L})$ with variable nodes $V(\varphi'_{\ell,L})$, clause nodes $F(\varphi'_{\ell,L})$ and edges $E(\varphi'_{\ell,L})$. The excess of $\varphi'_{\ell,L}$ is defined as
$$X_{\ell,L} = |E(\varphi'_{\ell,L})| - |V(\varphi'_{\ell,L})| - |F(\varphi'_{\ell,L})|.$$
Thus, $X_{\ell,L} = -L$ iff $G(\varphi'_{\ell,L})$ consists of $L$ acyclic components.

Lemma 3.5. Let $d > 0$, $c > 0$ and assume that $L \le \log^c n$ and $\ell \le c\log\log n$. Then
$$\mathbb P\left[X_{\ell,L} > -L\right] = \tilde O(n^{-1}), \qquad \mathbb P\left[X_{\ell,L} > 1-L\right] = \tilde O(n^{-2}). \tag{3.12}$$
Furthermore, there exists $c_4 = c_4(c,d,k) > 0$ such that
$$\mathbb P\left[|V(\varphi'_{\ell,L})| + |F(\varphi'_{\ell,L})| > \log^{c_4}n\right] = O(n^{-2}). \tag{3.13}$$

Proof. We study breadth first search (‘BFS’) on the graph $G(\Phi')$ from the start vertices $\mathcal L$ by means of a routine deferred decisions argument. Throughout the execution of BFS each variable node is in one of three possible states: unexplored, active, or finished. Towards the proof of (3.13) we study a ‘parallel’ version of BFS.
More precisely, let $\mathcal A_0 = \mathcal L$ be the set of initially active variables, let $\mathcal U_0 = \{x_1,\ldots,x_n\} \setminus \mathcal L$ comprise the initially unexplored variables and let $\mathcal F_0 = \emptyset$. Further, for $t \ge 0$ define $\mathcal A_{t+1}, \mathcal U_{t+1}, \mathcal F_{t+1}$ as follows. If $\mathcal A_t = \emptyset$ then the process has stopped and we let $\mathcal A_{t+1} = \mathcal A_t = \emptyset$, $\mathcal U_{t+1} = \mathcal U_t$, $\mathcal F_{t+1} = \mathcal F_t$. Otherwise let $\mathcal A_{t+1}$ be the set of all variable nodes $y \in \mathcal U_t$ such that there exist an active variable node $x \in \mathcal A_t$ and a clause $a$ that contains $x$ and $y$; in symbols, $x, y \in \partial_{\Phi'}a$. Further, let $\mathcal F_{t+1} = \mathcal F_t \cup \mathcal A_t$ and $\mathcal U_{t+1} = \mathcal U_t \setminus \mathcal A_{t+1}$. The BFS exploration occurs ‘in parallel’ in the sense that all active vertices activate their previously unexplored second neighbours simultaneously.

Let $\mathfrak F_t$ be the $\sigma$-algebra generated by the first $t$ rounds of parallel exploring. Then the distribution of $|\mathcal A_{t+1}|$ given $\mathfrak F_t$ is stochastically dominated by a random variable with distribution $(k-1)\mathrm{Po}(d|\mathcal A_t|)$. This is because by the construction of the formula $\Phi'$ the total number of clauses containing a given variable node has distribution $\mathrm{Po}(d(1-(k-1)/n))$. Hence, for any $u > 0$ we have
$$\mathbb P\left[|\mathcal A_{t+1}| > u \mid \mathfrak F_t\right] \le \mathbb P\left[(k-1)\mathrm{Po}(d|\mathcal A_t|) > u\right]. \tag{3.14}$$
To complete the proof of (3.13) we mimic the argument from the proof of Lemma 3.2. Thus, let $u = \log^{c_4-3}n$ for a large enough $c_4 = c_4(c,d,k)$ and set $g = 10(1+d(k-1))$. Since $\ell \le c\log\log n$, the bound (3.14) and Bennett's inequality show that
$$\mathbb P\left[|\mathcal A_{t+1}| > g^{t+1-\ell}u \mid |\mathcal A_t| \le g^{t-\ell}u\right] \le \exp\left(-\frac{u}{4g^{\ell-t+1}}\right) \le \exp(-\sqrt u/4) = O(n^{-3}) \qquad (0 \le t < \ell).$$
Hence, taking a union bound on $0 \le t < \ell$ and observing that $V(\varphi'_{\ell,L}) \subseteq \mathcal A_0 \cup\cdots\cup \mathcal A_\ell$, we obtain
$$\mathbb P\left[|V(\varphi'_{\ell,L})| > u\log n\right] = \tilde O(n^{-3}). \tag{3.15}$$
Finally, another application of Bennett's inequality demonstrates that with probability $1-O(n^{-2})$ no variable of $\Phi'$ appears in more than $\log n$ clauses. Thus, $|F(\varphi'_{\ell,L})| \le |V(\varphi'_{\ell,L})|\log n$. Hence, (3.15) implies (3.13).

We are left to establish (3.12). The way we set up the BFS process implies that there are only two ways in which excess edges can come about.
First, there may be clauses $a$ with $\partial_{\Phi'}a \subseteq \mathcal A_t \cup \mathcal A_{t+1}$ such that $|\partial_{\Phi'}a \cap \mathcal A_t| \ge 2$. Given that $|\mathcal A_t \cup \mathcal A_{t+1}| \le \log^{c_4}n$, the number of such $a$ with $|\partial_{\Phi'}a \cap \mathcal A_t| = 2$ has distribution $\mathrm{Po}(\tilde O(1/n))$, and the number of $a$ with $|\partial_{\Phi'}a \cap \mathcal A_t| > 2$ has distribution $\mathrm{Po}(\tilde O(1/n^2))$. The second possibility is that for a variable $x \in \mathcal A_{t+1}$ there exist clauses $a, b \in \partial_{\Phi'}x$ with $\partial_{\Phi'}a, \partial_{\Phi'}b \subseteq \mathcal A_t \cup \mathcal A_{t+1}$. Once again the number of such clauses has distribution $\mathrm{Po}(\tilde O(1/n))$ given $|\mathcal A_t \cup \mathcal A_{t+1}| \le \log^{c_4}n$. Furthermore, excess inducing clauses occur independently at different rounds $t$ of the BFS process. Thus, (3.12) follows from (3.13). □

We proceed to derive bounds on $|\bar{\mathcal L}| = |\bar{\mathcal L}_{\Phi'}|$ depending on the value of the excess. To deal with the case of excess $-L$, let $\Lambda = \Theta(\log\log n)$ and let $(\mathbf T[i])_{i\ge1}$ be a sequence of independent copies of the random tree $\mathbf T$. In the case that the excess $X_{\Lambda,L}$ equals $-L$, the bound on $|\bar{\mathcal L}|$ follows from the fact that the Galton-Watson tree $\mathbf T$ captures the local structure of the graph $G(\Phi')$ in combination with the bound from Corollary 3.3. More precisely, the following is true.

Lemma 3.6. For any $0 < d < d_{\mathrm{uniq}}(k)$ and $c > 0$ there exists $\zeta = \zeta(c,d,k) > 0$ such that with $\Lambda = \lceil\zeta\log\log n\rceil$ uniformly for all $1 \le L \le \log^c n$ and all $u > 0$ we have
$$\mathbb P\left[\mathbb 1\{X_{\Lambda,L} = -L\}\,|\bar{\mathcal L}| > u\right] \le \mathbb P\left[\sum_{i=1}^{L}|\overline{\{\mathfrak x\}}_{\mathbf T[i]}| > u\right] + O(n^{-2}).$$

Proof. We begin by coupling the random formula $\varphi'_{\ell,1}$ with the Galton-Watson tree $\mathbf T^{(\ell)}[1]$ for $0 \le \ell \le \Lambda$. The coupling operates in accordance with the iterations of the BFS process from the proof of Lemma 3.5. Under the coupling some of the variable and clause nodes of $\varphi'_{\ell,1}$ and of the tree $\mathbf T^{(\ell)}[1]$ are identical, but both $\mathbf T^{(\ell)}[1]$ and $\varphi'_{\ell,1}$ may contain additional clauses or variables. These additional clauses/variables result from excess edges of $G(\varphi'_{\ell,1})$, i.e., edges that close cycles or merge different components in the course of the BFS process. For $\ell = 0$ we just identify the start variable $x_1$ with the root $\mathfrak x$ of the Galton-Watson tree $\mathbf T[1]$.
Going from $\ell$ to $\ell+1$, we remember the sets $\mathcal A_\ell, \mathcal A_{\ell+1}$ from the proof of Lemma 3.5. For each variable $x \in \mathcal A_\ell$ let $\mathcal C_x$ be the set of clauses $a \in \partial_{\Phi'}x$ such that $|\partial_{\Phi'}a \cap \mathcal A_{\ell+1}| = k-1$ and also such that none of the variables $y \in \mathcal A_{\ell+1} \cap \partial_{\Phi'}a$ appear in another clause $b \ne a$ with $\partial_{\Phi'}b \subseteq \mathcal A_\ell \cup \mathcal A_{\ell+1}$. In other words, $\mathcal C_x$ contains all clauses $a \in \partial_{\Phi'}x$ that do not induce excess edges. Let $d_x = |\mathcal C_x|$ be the number of such clauses. As we pointed out in the proof of Lemma 3.5, $d_x$ is stochastically dominated by a $\mathrm{Po}(d)$ variable. Hence, there is a random variable $d'_x$ such that $d_x + d'_x \sim \mathrm{Po}(d)$. For any variable $x \in \mathcal A_\ell$ that is also a variable node of $\mathbf T^{(\ell)}[1]$ we add all clauses $a \in \mathcal C_x$ and the $k-1$ variables $y \in \partial_{\Phi'}a \cap \mathcal A_{\ell+1}$ to $\mathbf T^{(\ell+1)}[1]$. Additionally, $\mathbf T^{(\ell+1)}[1]$ contains $d'_x$ independent random clauses that contain $x$ and $k-1$ new variable nodes without a counterpart in $\varphi'_{\ell+1,1}$. Finally, to complete $\mathbf T^{(\ell+1)}[1]$ every variable $y$ of $\mathbf T^{(\ell)}[1]$ at distance precisely $2\ell$ from the root $\mathfrak x$ such that $y \notin V(\varphi'_{\ell,1})$ independently begets $\mathrm{Po}(d)$ offspring clause nodes, each containing $k-1$ new variable nodes that do not belong to $V(\varphi'_{\ell+1,1})$. The coupling ensures that $\varphi'_{\Lambda,1}$ is a sub-formula of $\mathbf T^{(\Lambda)}[1]$ unless $X_{\Lambda,1} > -L$.

The extension of this coupling to $\mathcal L = \{x_1,\ldots,x_L\}$ is straightforward. We simply perform BFS exploration from the start variables $x_1,\ldots,x_L$ one after the other. Given that $X_{\Lambda,L} = -L$, we thus couple the sub-formula of $\Phi'$ explored from each $x_i$ with $\mathbf T^{(\Lambda)}[i]$ for $1 \le i \le L$ such that $\varphi'_{\Lambda,L}$ is contained in the union of $\mathbf T^{(\Lambda)}[1],\ldots,\mathbf T^{(\Lambda)}[L]$. Finally, we obtain independent copies $\mathbf T[1],\ldots,\mathbf T[L]$ of the (possibly infinite) tree $\mathbf T$ by continuing the Galton-Watson processes $\mathbf T^{(\Lambda)}[i]$ independently for depths $\ell > \Lambda$.

The remaining task is to compare $|\bar{\mathcal L}|$ with $\sum_{i=1}^{L}|\overline{\{\mathfrak x\}}_{\mathbf T[i]}|$. If $X_{\Lambda,L} = -L$ and if $h_{\mathfrak x}(1,\mathbf T[i]) < \Lambda$ for all $1 \le i \le L$, then the coupling ensures that all clauses and variables of $\varphi'_{\Lambda,L}$ are contained in the disjoint union of the trees $\mathbf T[1],\ldots,\mathbf T[L]$, and thus $|\bar{\mathcal L}| \le \sum_{i=1}^{L}|\overline{\{\mathfrak x\}}_{\mathbf T[i]}|$.
Therefore, for any $u > 0$ we have
$$\mathbb P\left[\mathbb 1\left\{X_{\Lambda,L} = -L,\ \max_{1\le i\le L}h_{\mathfrak x}(1,\mathbf T[i]) < \Lambda\right\}|\bar{\mathcal L}| > u\right] \le \mathbb P\left[\sum_{i=1}^{L}|\overline{\{\mathfrak x\}}_{\mathbf T[i]}| > u\right]. \tag{3.16}$$
Furthermore, since $\Lambda \ge \zeta\log\log n$ for a large $\zeta > 0$, Lemma 3.1 ensures that
$$\mathbb P\left[\max_{1\le i\le L}h_{\mathfrak x}(1,\mathbf T[i]) \ge \Lambda\right] \le O(n^{-2}). \tag{3.17}$$
Combining (3.16) and (3.17) completes the proof. □

For later reference we make a note of the following immediate consequence of the coupling from the proof of Lemma 3.6. For two rooted Boolean formulas $\varphi, \varphi'$ we write $\varphi \cong \varphi'$ if there is an isomorphism of $\varphi$ and $\varphi'$ that preserves the root variable. We consider the random formula $\varphi'_{\ell,1}$ rooted at $x_1$.

Corollary 3.7. For every $\ell \ge 0$ and any fixed tree $T$ we have $\left|\mathbb P\left[\mathbf T^{(\ell)} \cong T\right] - \mathbb P\left[\varphi'_{\ell,1} \cong T\right]\right| = o(1)$.

From here on, we set $\Lambda = \lceil c_5\log\log n\rceil$ for a large enough $c_5 = c_5(d,k) > 0$. We obtain the following bound on the second moment of $|\bar{\mathcal L}|$ on the event that the excess equals $-L$.

Corollary 3.8. For any $0 < d < d_{\mathrm{uniq}}(k)$ and any $1 \le L \le \log^2 n$ we have $\mathbb E\left[\mathbb 1\{X_{\Lambda,L} = -L\}\cdot|\bar{\mathcal L}|^2\right] = O(1)$.

Proof. Since $|\bar{\mathcal L}| \le 2n$ deterministically, this is an immediate consequence of Corollary 3.3 and Lemma 3.6. □

As a next step we deal with the case that the excess equals $1-L$. More precisely, with $c_6 = c_6(d,k) \gg c_5$ a large enough constant let $\Lambda^+ = \lceil c_6\log\log n\rceil$. We are going to bound $|\bar{\mathcal L}|$ on the event that $X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L$. The proof combines the bound on the probability of this event from Lemma 3.5 with a crude bound on $|\bar{\mathcal L}|$. To elaborate, since Lemma 3.5 shows that the event $X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L$ has probability $\tilde O(n^{-1})$, we can essentially get away with simply bounding $|\bar{\mathcal L}|$ by the total number of variables within a $2\Lambda^+$ radius around the start variables $\mathcal L$. Indeed, as Lemma 3.5 shows, this number of variables is very likely polylogarithmic in $n$. Working out the details, we obtain the following.

Lemma 3.9. Let $0 < d < d_{\mathrm{uniq}}(k)$ and let $1 \le L \le \log^2 n$. Then $\mathbb E\left[\mathbb 1\{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}\,|\bar{\mathcal L}|^{3/2}\right] = o(1)$.

Proof.
Let $V^+ = V(\varphi'_{\Lambda,L}) \setminus V(\varphi'_{\Lambda-1,L})$ and obtain $\psi^-$ from $\Phi'$ by deleting all variables from $V(\varphi'_{\Lambda-1,L})$ and all clauses from $F(\varphi'_{\Lambda,L})$. Further, let $\Lambda^- = \Lambda^+ - \Lambda$ and let $\psi^+$ be the sub-formula of $\psi^-$ comprising all clauses and variables of $\psi^-$ with distance at most $2\Lambda^-$ from $V^+$ (see Figure 3 below). If $X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L$ then
$$|V(\varphi'_{\Lambda,L})| + |F(\varphi'_{\Lambda,L})| - |E(\varphi'_{\Lambda,L})| = L-1, \tag{3.18}$$
$$|V(\psi^+)| + |F(\psi^+)| - |E(\psi^+)| = |V^+|. \tag{3.19}$$
Moreover, Lemma 3.5 shows that for suitable $c'_5 = c'_5(d,k,c_5)$, $c'_6 = c'_6(d,k,c_6) > 0$ we have
$$\mathbb P\left[|V(\varphi'_{\Lambda,L})| + |F(\varphi'_{\Lambda,L})| > \log^{c'_5}n\right] = O(n^{-2}), \tag{3.20}$$
$$\mathbb P\left[|V(\varphi'_{\Lambda^+,L})| + |F(\varphi'_{\Lambda^+,L})| > \log^{c'_6}n\right] = O(n^{-2}). \tag{3.21}$$
Let $\hat{\mathcal L} \subseteq \bar{\mathcal L}$ be the set of literals $l$ that were added to the output set $\bar{\mathcal L}$ by Step 7 of PULP by way of clauses $a \in F(\varphi'_{\Lambda,L})$, i.e., at distance less than $2\Lambda$ from the initial set $\mathcal L$. Let $V^- = \{|l| : l \in \hat{\mathcal L}\} \cap V^+$ be the set of all variables at distance $2\Lambda$ from $\mathcal L$ in $\Phi'$ that underlie a literal from $\hat{\mathcal L}$.

If $X_{\Lambda,L} = 1-L$, the variables and clauses at distance at most $2\Lambda$ from $\mathcal L$ do not cause PULP to run into a contradiction, because each clause contains $k \ge 3$ literals. Therefore, there does not exist a variable $x$ such that both $x$ and $\neg x$ belong to $\hat{\mathcal L}$. Hence, because the signs of $\Phi'$ are uniformly random and PULP proceeds in a BFS order, we may assume without loss of generality that $\hat{\mathcal L}$ contains positive literals only. Thus,
$$V^- = \hat{\mathcal L} \cap V^+ \subseteq V^+. \tag{3.22}$$

FIGURE 3. A sketch depicting the subformulas $\psi^+$, $\psi^-$, $\varphi'_{\Lambda,L}$, and $\varphi'_{\Lambda^+,L}$ of $\Phi'$ constructed above.

We now apply Lemma 3.6 to the random formula $\psi^-$. Specifically, let $\bar{\mathcal L}^+$ be the output of PULP on the formula $\psi^-$ with the start set $\mathcal L^+$ comprised by the positive literals of $V^+$. Further, let $\mathcal E$ be the event that $|V(\varphi'_{\Lambda,L})| + |F(\varphi'_{\Lambda,L})| \le \log^{c'_5}n$. Let $X^+ = |E(\psi^+)| - |V(\psi^+)| - |F(\psi^+)|$ be the excess of $\psi^+$.
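Since $0 < d < d_{\mathrm{uniq}}(k)$, given $\mathcal E$ the formula $\psi^-$ has the same distribution as a random $k$-CNF with $n^- = n - O(\log^c n)$ variables and $m^- \sim \mathrm{Po}(d^-n^-/k)$ random clauses, with $0 < d^- = d + o(1) < d_{\mathrm{uniq}}(k)$. Hence, assuming that $c_6 = c_6(d,k) > c'_5$ is sufficiently large, Lemma 3.6 shows that
$$\mathbb P\left[\mathbb 1\{\mathcal E \cap \{X^+ = -|V^+|\}\}\cdot|\bar{\mathcal L}^+| > u\right] \le \mathbb P\left[\sum_{1\le i\le\log^{c'_5}n}|\overline{\{\mathfrak x\}}_{\mathbf T[i]}| > u\right] + O(n^{-2}) \qquad (u > 0). \tag{3.23}$$
Combining (3.23) with Corollary 3.3, we conclude that for a large enough $c_7 = c_7(d,k) > c'_6$,
$$\mathbb P\left[\mathbb 1\{\mathcal E \cap \{X^+ = -|V^+|\}\}\cdot|\bar{\mathcal L}^+| > \log^{c_7}n\right] = O(n^{-2}). \tag{3.24}$$
In light of the above, we see that
$$\begin{aligned}
\mathbb E\left[\mathbb 1\{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}\,|\bar{\mathcal L}|^{3/2}\right]
&\le \mathbb E\left[\mathbb 1\{\mathcal E \cap \{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}\}\,|\bar{\mathcal L}|^{3/2}\right] + (2n)^{3/2}(1-\mathbb P[\mathcal E]) && [\text{since } |\bar{\mathcal L}| \le 2n]\\
&\le \mathbb E\left[\mathbb 1\{\mathcal E \cap \{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}\}\,(|\bar{\mathcal L}^+| + \log^{c'_5}n)^{3/2}\right] + o(1) && [\text{from (3.20)}]\\
&\le \mathbb E\left[\mathbb 1\{\mathcal E \cap \{X^+ = -|V^+|\}\}\,(|\bar{\mathcal L}^+| + \log^{c'_5}n)^{3/2}\right] + o(1) && [\text{from (3.18), (3.19)}]\\
&\le \mathbb P\left[\mathbb 1\{\mathcal E \cap \{X^+ = -|V^+|\}\}\cdot|\bar{\mathcal L}^+| > \log^{c_7}n\right](2n)^{3/2} + \mathbb P\left[X_{\Lambda,L} = 1-L\right](2\log^{c_7}n)^{3/2} + o(1) && [\text{total probability}]\\
&= o(1) && [\text{from (3.24), Lemma 3.5}],
\end{aligned}$$
completing the proof. □

Proof of Lemma 2.10. The assertion is immediate from Corollary 3.8, Lemma 3.9, Lemma 3.5 and the deterministic bound $|\bar{\mathcal L}| \le 2n$. □

4. PROOF OF PROPOSITION 2.1

Let $\pi^{(\ell)}_{d,k} = \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ be the distribution obtained after $\ell$ iterations of $\mathrm{BP}_{d,k}(\cdot)$, with the convention $\pi^{(0)}_{d,k} = \delta_{1/2}$. We recall that $(\mu_{\pi,i,j})_{i,j\ge1}$ signify independent random variables with distribution $\pi$.

Fact 4.1. For all $\ell \ge 0$ the random variables $\mu_{\pi^{(\ell)}_{d,k},1,1}$ and $1-\mu_{\pi^{(\ell)}_{d,k},1,1}$ are identically distributed.

Proof. This is an immediate consequence of the fact that the random variables $d^+, d^-$ from the definition (1.3)–(1.4) are identically distributed. □

While the following is a direct consequence of the fact that Belief Propagation is ‘exact on trees’ (see [50, Chapter 14] for precise statements), we carry out a detailed proof for the sake of completeness. Following the conventions from Section 1.2.1, we continue to denote by $\tau^{(\ell)}$ a random satisfying assignment of the $k$-CNF $\mathbf T^{(\ell)} = \mathbf T^{(\ell)}_{d,k}$.

Fact 4.2.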
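For all $\ell \ge 0$, $d > 0$ we have $\mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T\right] \sim \pi^{(\ell)}_{d,k}$.

Proof. We proceed by induction on $\ell$. As $\pi^{(0)}_{d,k} = \delta_{1/2}$, for $\ell = 0$ there is nothing to show. To go from $\ell-1$ to $\ell \ge 1$, for a clause $a \in \partial_{\mathbf T}\mathfrak x$ and a variable $y \in \partial_{\mathbf T}a \setminus \{\mathfrak x\}$ let $\mathbf T_{y\to a}$ be the component of the forest $\mathbf T - a$ obtained by removing clause $a$ that contains variable $y$. We consider $y$ the root of $\mathbf T_{y\to a}$. Further, obtain $\mathbf T^{(\ell-1)}_{y\to a}$ from $\mathbf T_{y\to a}$ by deleting all clauses and variables at a distance greater than $2(\ell-1)$ from $y$. Additionally, for $s \in \{\pm1\}$ let
$$Z^{(\ell)}(s) = \left|\left\{\sigma \in S(\mathbf T^{(\ell)}) : \sigma(\mathfrak x) = s\right\}\right|, \qquad Z^{(\ell-1)}_{y\to a}(s) = \left|\left\{\sigma \in S(\mathbf T^{(\ell-1)}_{y\to a}) : \sigma(y) = s\right\}\right|. \tag{4.1}$$
In words, $Z^{(\ell)}(s)$ is the number of satisfying assignments of $\mathbf T^{(\ell)}$ that set the root $\mathfrak x$ to $s$, and $Z^{(\ell-1)}_{y\to a}(s)$ is the corresponding quantity for the sub-tree $\mathbf T^{(\ell-1)}_{y\to a}$. Clearly, setting $\mathfrak x$ to $s \in \{\pm1\}$ immediately satisfies all clauses $a \in \partial^{s}_{\mathbf T}\mathfrak x$. By contrast, once $\mathfrak x$ is assigned the value $s$ each clause $a \in \partial^{-s}_{\mathbf T}\mathfrak x$ needs to be satisfied by setting some other variable $y \in \partial_{\mathbf T}a \setminus \{\mathfrak x\}$ to the value $\mathrm{sign}(y,a)$. Hence,
$$Z^{(\ell)}(s) = \left[\prod_{a\in\partial^{s}_{\mathbf T}\mathfrak x}\ \prod_{y\in\partial_{\mathbf T}a\setminus\{\mathfrak x\}}\ \sum_{t\in\{\pm1\}}Z^{(\ell-1)}_{y\to a}(t)\right]\cdot\left[\prod_{a\in\partial^{-s}_{\mathbf T}\mathfrak x}\left(\prod_{y\in\partial_{\mathbf T}a\setminus\{\mathfrak x\}}\ \sum_{t\in\{\pm1\}}Z^{(\ell-1)}_{y\to a}(t) - \prod_{y\in\partial_{\mathbf T}a\setminus\{\mathfrak x\}}Z^{(\ell-1)}_{y\to a}(-\mathrm{sign}(y,a))\right)\right]. \tag{4.2}$$
Furthermore, the definition of the Galton-Watson tree $\mathbf T$ ensures that the sub-trees $\mathbf T^{(\ell-1)}_{y\to a}$ are independent copies of $\mathbf T^{(\ell-1)}$. Hence, by induction we have
$$\frac{Z^{(\ell-1)}_{y\to a}(1)}{\sum_{s\in\{\pm1\}}Z^{(\ell-1)}_{y\to a}(s)} \sim \pi^{(\ell-1)}_{d,k} \qquad\text{for all } a \in \partial_{\mathbf T}\mathfrak x,\ y \in \partial_{\mathbf T}a \setminus \{\mathfrak x\}, \tag{4.3}$$
and the random variables $Z^{(\ell-1)}_{y\to a}(1)/\sum_{s\in\{\pm1\}}Z^{(\ell-1)}_{y\to a}(s)$ are mutually independent. Combining (4.2)–(4.3) with Fact 4.1, we finally obtain
$$\mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T\right] = \frac{Z^{(\ell)}(1)}{\sum_{s\in\{\pm1\}}Z^{(\ell)}(s)} \sim \frac{\prod_{i=1}^{d^-}\left[1-\prod_{j=1}^{k-1}\mu_{\pi^{(\ell-1)}_{d,k},2i-1,j}\right]}{\prod_{i=1}^{d^-}\left[1-\prod_{j=1}^{k-1}\mu_{\pi^{(\ell-1)}_{d,k},2i-1,j}\right] + \prod_{i=1}^{d^+}\left[1-\prod_{j=1}^{k-1}\mu_{\pi^{(\ell-1)}_{d,k},2i,j}\right]} \sim \pi^{(\ell)}_{d,k},$$
thereby completing the induction.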
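The counting recursion (4.2) used in the proof of Fact 4.2 can be checked on a small deterministic tree. The sketch below evaluates the recursion for a general root value $s$ and compares it with exhaustive enumeration. The nested-list encoding of a tree formula, the function names `count`, `flatten` and `brute_force`, and the example tree are hypothetical choices for this illustration only.

```python
from itertools import product

# A variable node is the list of clauses hanging below it. Each clause is a pair
# (sign of the parent variable in the clause, [(sign of child, child node), ...]).


def count(node, s):
    """Satisfying assignments of the sub-tree below `node` that set its root to s."""
    total = 1
    for parent_sign, children in node:
        all_ways = 1  # assignments of the children's sub-trees, ignoring this clause
        bad_ways = 1  # those in which every child literal of this clause is false
        for child_sign, child in children:
            all_ways *= count(child, 1) + count(child, -1)
            bad_ways *= count(child, -child_sign)
        # If the root literal is true the clause is satisfied outright;
        # otherwise at least one child literal has to be true.
        total *= all_ways if parent_sign == s else all_ways - bad_ways
    return total


def flatten(node):
    """Turn the nested tree into clauses over variables 0 (the root), 1, 2, ..."""
    clauses, next_id = [], [1]

    def walk(n, var):
        for parent_sign, children in n:
            clause = [(var, parent_sign)]
            for child_sign, child in children:
                child_var = next_id[0]
                next_id[0] += 1
                clause.append((child_var, child_sign))
                walk(child, child_var)
            clauses.append(clause)

    walk(node, 0)
    return clauses, next_id[0]


def brute_force(node, s):
    """Same count as `count`, obtained by enumerating all assignments."""
    clauses, n = flatten(node)
    return sum(
        sigma[0] == s and all(any(sigma[v] == sign for v, sign in c) for c in clauses)
        for sigma in product((1, -1), repeat=n)
    )


leaf = []
tree = [
    (+1, [(+1, [(-1, [(+1, leaf), (-1, leaf)])]), (-1, leaf)]),
    (-1, [(+1, leaf), (+1, [(+1, [(-1, leaf), (+1, leaf)])])]),
]
marginal = count(tree, 1) / (count(tree, 1) + count(tree, -1))
print("root marginal:", marginal)
```

The ratio `marginal` is the analogue of $Z^{(\ell)}(1)/\sum_{s}Z^{(\ell)}(s)$ for the fixed example tree; on the random tree this ratio is the quantity whose distribution Fact 4.2 identifies.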
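□

Combining the combinatorial interpretation of the distributions $\pi^{(\ell)}_{d,k}$ with the Gibbs uniqueness property, we proceed to show that the sequence $(\pi^{(\ell)}_{d,k})_{\ell}$ converges in the weak topology. To this end, it suffices to show that the sequence is Cauchy with respect to the Wasserstein $W_1$ metric.

Lemma 4.3. If $d < d_{\mathrm{uniq}}(k)$ then $(\pi^{(\ell)}_{d,k})_{\ell\ge0}$ is a $W_1$-Cauchy sequence.

Proof. If $d < d_{\mathrm{uniq}}(k)$ then the random tree $\mathbf T = \mathbf T_{d,k}$ enjoys the Gibbs uniqueness property; hence, (1.1) is satisfied. Consequently, given $0 < \varepsilon < 1$ we can choose $\ell_0 = \ell_0(d,k,\varepsilon) > 0$ large enough so that the event
$$\mathcal U_{\varepsilon,\ell} = \left\{\max_{\tau\in S(\mathbf T^{(\ell)})}\left|\mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T\right] - \mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T,\ \forall x \in \partial^{2\ell}\mathfrak x : \tau^{(\ell)}(x) = \tau(x)\right]\right| > \varepsilon\right\}$$
has probability
$$\mathbb P\left[\mathcal U_{\varepsilon,\ell}\right] < \varepsilon \qquad\text{for all } \ell \ge \ell_0. \tag{4.4}$$
Now suppose that $\ell_0 \le \ell < \ell'$. Let $\tau^{(\ell)}, \tau^{(\ell')}$ be independent uniformly random satisfying assignments of $\mathbf T^{(\ell)}$ and $\mathbf T^{(\ell')}$, respectively. We claim that
$$\mathbb P\left[\left|\mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T\right] - \mathbb P\left[\tau^{(\ell')}(\mathfrak x) = 1 \mid \mathbf T\right]\right| > \varepsilon\right] < \varepsilon. \tag{4.5}$$
To see this, let $\tau^{(\ell,\ell')} = (\tau^{(\ell')}(x))_{x\in\partial^{2\ell}_{\mathbf T}\mathfrak x}$ comprise the truth values that $\tau^{(\ell')}$ assigns to the variables at distance exactly $2\ell$ from $\mathfrak x$. Then
$$\mathbb P\left[\tau^{(\ell')}(\mathfrak x) = 1 \mid \mathbf T\right] = \mathbb E\left[\mathbb P\left[\tau^{(\ell')}(\mathfrak x) = 1 \mid \mathbf T, \tau^{(\ell,\ell')}\right] \mid \mathbf T\right] = \mathbb E\left[\mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T, \tau^{(\ell,\ell')},\ \forall x \in \partial^{2\ell}\mathfrak x : \tau^{(\ell)}(x) = \tau^{(\ell,\ell')}_x\right] \mid \mathbf T\right].$$
Hence, for every $T \notin \mathcal U_{\varepsilon,\ell}$ we have
$$\left|\mathbb P\left[\tau^{(\ell')}(\mathfrak x) = 1 \mid \mathbf T = T\right] - \mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T = T\right]\right| \le \varepsilon. \tag{4.6}$$
Thus, (4.5) follows from (4.4) and (4.6). Finally, since Fact 4.2 demonstrates that $\mathbb P\left[\tau^{(\ell)}(\mathfrak x) = 1 \mid \mathbf T\right] \sim \pi^{(\ell)}_{d,k}$ and $\mathbb P\left[\tau^{(\ell')}(\mathfrak x) = 1 \mid \mathbf T\right] \sim \pi^{(\ell')}_{d,k}$, (4.5) shows that $W_1\left(\pi^{(\ell)}_{d,k},\pi^{(\ell')}_{d,k}\right) < 2\varepsilon$ for all $\ell_0 \le \ell < \ell'$. Hence, the sequence $(\pi^{(\ell)}_{d,k})_{\ell}$ is Cauchy. □

We are left to bound the lower tail of the limiting distribution $\pi_{d,k} = \lim_{\ell\to\infty}\pi^{(\ell)}_{d,k}$.

Lemma 4.4. If $d < d_{\mathrm{uniq}}(k)$ then $\mathbb E\log^2\mu_{\pi_{d,k},1,1} < \infty$.

Proof. We are going to bound $\mathbb E\log^2\mu_{\pi^{(\ell)}_{d,k},1,1}$ and subsequently invoke the monotone convergence theorem to complete the proof.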
First, we note that for all $\ell \ge 0$ we have
$$\begin{aligned}
\mathbb E\log^2\mu_{\pi^{(\ell)}_{d,k},1,1} &= \mathbb E\log^2\frac{Z(\mathbf T^{(\ell)},\{\mathfrak x\})}{Z(\mathbf T^{(\ell)})} && [\text{by Fact 4.2}]\\
&\le \mathbb E\left[|\overline{\{\mathfrak x\}}_{\mathbf T}|^2\right] && [\text{by Lemma 2.8}]\\
&< \infty && [\text{by Corollary 3.4}].
\end{aligned}$$
Since $\pi_{d,k}$ is the weak limit of $(\pi^{(\ell)}_{d,k})_{\ell}$, we conclude that for any $N \in \mathbb N$,
$$\mathbb E\left[N \wedge \log^2\mu_{\pi_{d,k},1,1}\right] = \lim_{\ell\to\infty}\mathbb E\left[N \wedge \log^2\mu_{\pi^{(\ell)}_{d,k},1,1}\right] \le \mathbb E\left[|\overline{\{\mathfrak x\}}_{\mathbf T}|^2\right] < \infty. \tag{4.7}$$
Finally, applying the monotone convergence theorem to the limit $N \to \infty$, we see that the uniform bound (4.7) implies the assertion. □

Proof of Proposition 2.1. In light of Fact 4.1 and Lemmas 4.3 and 4.4, it only remains to show that
$$\mathbb E\left|\log\left(\prod_{i=1}^{d^-}\mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+}\mu_{\pi_{d,k},2i-1}\right)\right| < \infty \qquad\text{and}\qquad \mathbb E\left|\log\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right| < \infty.$$
Recall the definition of $\mu_{\pi_{d,k},i}$ from (1.4). Using Fact 4.1 and Lemma 4.4, we obtain
$$\mathbb E\left|\log\left(\prod_{i=1}^{d^-}\mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+}\mu_{\pi_{d,k},2i-1}\right)\right| \le \log(2) + \mathbb E\left|\log\prod_{i=1}^{d^-}\mu_{\pi_{d,k},2i}\right| \le \log(2) + \frac d2\,\mathbb E\left|\log\mu_{\pi_{d,k},1}\right| \le \log(2) + \frac d2\sqrt{\mathbb E\left|\log^2\mu_{\pi_{d,k},1,1}\right|} < \infty,$$
yielding the first inequality. Similarly, invoking Fact 4.1 and Lemma 4.4 for the second l.h.s. above gives
$$\mathbb E\left|\log\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right| \le \mathbb E\left|\log\left(1-\mu_{\pi_{d,k},1,1}\right)\right| = \mathbb E\left|\log\left(\mu_{\pi_{d,k},1,1}\right)\right| \le \sqrt{\mathbb E\left|\log^2\mu_{\pi_{d,k},1,1}\right|} < \infty,$$
thereby completing the proof. □

5. PROOF OF COROLLARY 2.2

In order to turn the estimate of the expectation of $\log(1 \vee Z(\Phi))$ provided by Proposition 2.3 into a ‘with high probability’ statement, we harness a ‘soft’ version of the $k$-SAT problem where violated clauses are discouraged but not strictly forbidden. To be precise, for a $k$-CNF $\Phi$ and a real $\beta > 0$ define
$$Z_\beta(\Phi) = \sum_{\sigma\in\{\pm1\}^{V(\Phi)}}\ \prod_{a\in F(\Phi)}\exp\left(-\beta\,\mathbb 1\{\sigma \not\models a\}\right). \tag{5.1}$$
Thus, each satisfying assignment contributes one to the sum on the r.h.s. of (5.1), while the contribution of assignments that violate a number $M$ of clauses equals $\exp(-\beta M)$. The value $Z_\beta(\Phi)$, called the partition function of the random $k$-SAT model at inverse temperature $\beta$, has received a considerable amount of attention in the mathematical physics literature (see, e.g., [57]).
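For a very small formula the definition (5.1) can be evaluated by brute force, which makes the relation between $Z_\beta(\Phi)$ and the number $Z(\Phi)$ of satisfying assignments concrete. The following is a minimal sketch under stated assumptions: the signed-integer clause encoding, the function names `violated`, `z_beta` and `z_hard`, and the example formula are hypothetical, and exhaustive enumeration is only feasible for a handful of variables.

```python
import math
from itertools import product


def violated(clauses, sigma):
    """Number of clauses without a true literal; literal +v / -v refers to sigma[v-1]."""
    return sum(
        all(sigma[abs(lit) - 1] != (1 if lit > 0 else -1) for lit in clause)
        for clause in clauses
    )


def z_beta(clauses, n, beta):
    """Brute-force evaluation of (5.1): sum over sigma of exp(-beta * #violated clauses)."""
    return sum(
        math.exp(-beta * violated(clauses, sigma))
        for sigma in product((1, -1), repeat=n)
    )


def z_hard(clauses, n):
    """Number of satisfying assignments Z(Phi)."""
    return sum(violated(clauses, sigma) == 0 for sigma in product((1, -1), repeat=n))


# Hypothetical 3-CNF on three variables; each clause rules out exactly one assignment.
example = [(1, 2, 3), (-1, 2, -3), (1, -2, 3)]
extra_clause = (-1, -2, -3)

for beta in (0.5, 1.0, 5.0, 50.0):
    soft = z_beta(example, 3, beta)
    shift = math.log(z_beta(example + [extra_clause], 3, beta)) - math.log(soft)
    print(beta, soft, z_hard(example, 3), shift)
```

In this example $Z_\beta$ decreases towards $Z$ as $\beta$ grows, and adding one clause lowers $\log Z_\beta$ by at most $\beta$, since every summand of (5.1) is multiplied by a factor between $e^{-\beta}$ and $1$.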
Crucially, by means of an interpolation argument [35, 41] it is possible to prove the following.

Theorem 5.1 ([60, Theorem 1]). For any $k \ge 3$, any $\beta > 0$ and any probability measure $\pi$ on $[0,1]$ we have
$$\frac1n\,\mathbb E\left[\log Z_\beta(\Phi)\right] \le \mathbb E\left[\log\left(\prod_{i=1}^{d^-}\mu_{\beta,\pi,2i} + \prod_{i=1}^{d^+}\mu_{\beta,\pi,2i-1}\right) - \frac{d(k-1)}{k}\log\left(1-\left(1-e^{-\beta}\right)\prod_{j=1}^{k}\mu_{\pi,1,j}\right)\right], \tag{5.2}$$
where
$$\mu_{\beta,\pi,i} = 1-(1-\exp(-\beta))\prod_{j=1}^{k-1}\mu_{\pi,i,j} \qquad\text{for } i \ge 1.$$

We emphasise that the bound (5.2) holds for any $n \ge k$ without an error term. We also notice that by the monotone convergence theorem for the measure $\pi = \pi_{d,k}$ from Theorem 1.1 we have
$$\begin{aligned}
&\lim_{\beta\to\infty}\mathbb E\left[\log\left(\prod_{i=1}^{d^-}\mu_{\beta,\pi_{d,k},2i} + \prod_{i=1}^{d^+}\mu_{\beta,\pi_{d,k},2i-1}\right) - \frac{d(k-1)}{k}\log\left(1-\left(1-e^{-\beta}\right)\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right]\\
&\qquad= \mathbb E\left[\log\left(\prod_{i=1}^{d^-}\mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+}\mu_{\pi_{d,k},2i-1}\right) - \frac{d(k-1)}{k}\log\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right] = \mathcal B_{d,k}(\pi_{d,k}).
\end{aligned} \tag{5.3}$$
The reason why we proceed by way of the ‘soft’ model with $\beta < \infty$ is that for this model a routine application of Azuma-Hoeffding implies the following concentration bound.

Lemma 5.2. For any fixed $\beta > 0$ we have $\mathbb P\left[\left|\log Z_\beta(\Phi) - \mathbb E\log Z_\beta(\Phi)\right| > \sqrt n\log n\right] = o(1/n)$.

Proof. The clauses of the random formula $\Phi$ are drawn independently, and adding or removing a single clause can alter the value of $\log Z_\beta(\,\cdot\,)$ by no more than $\pm\beta$. □

Proof of Corollary 2.2. We argue by contradiction. Assume that there exists an $\varepsilon > 0$ such that for infinitely many $n \ge 1$ we have
$$\mathbb P\left[\frac1n\log Z(\Phi) > \mathcal B_{d,k}(\pi_{d,k}) + \varepsilon\right] > \varepsilon. \tag{5.4}$$
Moreover, by (5.3) we can find a $\beta_0 > 0$ such that for every $\beta \ge \beta_0$ we have
$$\left|\mathbb E\left[\log\left(\prod_{i=1}^{d^-}\mu_{\beta,\pi_{d,k},2i} + \prod_{i=1}^{d^+}\mu_{\beta,\pi_{d,k},2i-1}\right) - \frac{d(k-1)}{k}\log\left(1-\left(1-e^{-\beta}\right)\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right] - \mathcal B_{d,k}(\pi_{d,k})\right| < \varepsilon/3. \tag{5.5}$$
Invoking Lemma 5.2 for $\beta = \beta_0$ and sufficiently large $n$ gives
$$\mathbb P\left[\frac1n\log Z_{\beta_0}(\Phi) > \frac1n\,\mathbb E\log Z_{\beta_0}(\Phi) + \varepsilon/3\right] \le \varepsilon/3. \tag{5.6}$$
The definition (5.1) of the partition function ensures that $Z_\beta(\Phi) \ge Z(\Phi)$ for all $\beta > 0$.
Therefore, combining (5.4)–(5.6), and Theorem 5.1 we see that for large enough $n$ the following holds with probability at least $1-\frac23\varepsilon$:
$$\frac1n\log Z(\Phi) \le \frac1n\log Z_{\beta_0}(\Phi) \le \frac1n\,\mathbb E\log Z_{\beta_0}(\Phi) + \frac\varepsilon3 \le \mathcal B_{d,k}(\pi_{d,k}) + \frac23\varepsilon,$$
contradicting our assumption, and thus completing the proof. □

6. PROOF OF PROPOSITION 2.3

In this section we prove Propositions 2.5 and 2.6, which, in light of Fact 2.4, imply Proposition 2.3. Both proofs follow a similar structure and make use of Propositions 2.7 and 2.11, which we therefore prove first.

6.1. Proof of Proposition 2.7. We show that both terms of (2.5) have finite expectation. Let us begin with the first one.

Lemma 6.1. If $d < d_{\mathrm{uniq}}(k)$ then $\mathbb E\left[\left|\log\frac{Z(\Phi'')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] = O(1)$.

Proof. Since $\Phi''$ is obtained from $\Phi'$ by adding clauses, we have
$$0 \le Z(\Phi'') \le Z(\Phi') \le 2^n. \tag{6.1}$$
Hence,
$$\log\frac{Z(\Phi'')\vee1}{Z(\Phi')\vee1} = 0 \qquad\text{if } Z(\Phi') = 0. \tag{6.2}$$
Therefore, we may assume from now on that $\Phi'$ is satisfiable. The number $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$ of new clauses is a Poisson variable with bounded mean. Therefore, Bennett's inequality shows that $\mathbb P\left[\Delta'' > \log n\right] = O(n^{-2})$. Since (6.1) shows that $|\log((Z(\Phi'')\vee1)/(Z(\Phi')\vee1))|^{3/2} \le n^{3/2}$, we conclude that
$$\mathbb E\left[\mathbb 1\{\Delta'' > \log n\}\cdot\left|\log\frac{Z(\Phi'')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] = o(1). \tag{6.3}$$
Further, let $c_1,\ldots,c_{\Delta''}$ be the new clauses added by CPL2. Let $x_{1,1},\ldots,x_{1,k},\ldots,x_{\Delta'',1},\ldots,x_{\Delta'',k}$ be their constituent variables and let $\mathcal X = \{x_{1,1},\ldots,x_{1,k},\ldots,x_{\Delta'',1},\ldots,x_{\Delta'',k}\}$. Since the clauses $c_1,\ldots,c_{\Delta''}$ are chosen uniformly and independently, a routine balls-into-bins consideration shows that
$$\mathbb P\left[|\mathcal X| \le k(\Delta''-1) \mid \Delta'' \le \log n\right] = \tilde O(n^{-2}). \tag{6.4}$$
Now, consider the ‘good’ event
$$\mathcal G = \left\{Z(\Phi') > 0,\ \Delta'' \le \log n,\ |\mathcal X| > k(\Delta''-1)\right\}.$$
Combining (6.1)–(6.4), we see that
$$\mathbb E\left[(1-\mathbb 1\mathcal G)\cdot\left|\log\frac{Z(\Phi'')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] = o(1). \tag{6.5}$$
Hence, we are left to bound $\mathbb E[\mathbb 1\mathcal G\cdot|\log((Z(\Phi'')\vee1)/(Z(\Phi')\vee1))|^{3/2}]$. If $\mathcal G$ occurs and thus $|\mathcal X| > k(\Delta''-1)$, then there exists a set of literals
$$\mathcal L \subseteq \left\{x_{1,1},\neg x_{1,1},\ldots,x_{1,k},\neg x_{1,k},\ldots,x_{\Delta'',1},\neg x_{\Delta'',1},\ldots,x_{\Delta'',k},\neg x_{\Delta'',k}\right\}$$
such that
• every clause $c_i$ contains a literal from $\mathcal L$ ($1 \le i \le \Delta''$), and
• there does not exist $x \in \mathcal X$ such that $x \in \mathcal L$ and $\neg x \in \mathcal L$.
Moreover, on $\mathcal G$ we have $|\mathcal L| \le |\mathcal X| \le k\log n$. Let $\bar{\mathcal L} = \bar{\mathcal L}_{\Phi'}$ be the output of PULP on $(\Phi',\mathcal L)$. Then Lemma 2.8 shows that
$$\mathbb E\left[\mathbb 1\mathcal G\cdot\left|\log\frac{Z(\Phi'')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] \le \mathbb E\left[\mathbb 1\mathcal G\cdot\left|\bar{\mathcal L}\right|^{3/2}\right]. \tag{6.6}$$
Furthermore, since by CPL2 the new clauses $c_1,\ldots,c_{\Delta''}$ are chosen independently of the formula $\Phi'$, Lemma 2.10 implies that there exists $C = C(d,k) > 0$ such that
$$\mathbb E\left[\mathbb 1\mathcal G\cdot\left|\bar{\mathcal L}\right|^{3/2} \mid \Delta''\right] \le C\cdot(\Delta'')^{3/2}. \tag{6.7}$$
Combining (6.6)–(6.7) and recalling that $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$, we obtain
$$\mathbb E\left[\mathbb 1\mathcal G\cdot\left|\log\frac{Z(\Phi'')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] = O(1). \tag{6.8}$$
Finally, the assertion follows from (6.5) and (6.8). □

We move on to the second term of (2.5).

Lemma 6.2. If $d < d_{\mathrm{uniq}}(k)$ then $\mathbb E\left[\left|\log\frac{Z(\Phi''')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] = O(1)$.

Proof. We proceed similarly as in the proof of Lemma 6.1. The construction in CPL3 ensures that $\Phi'''$ contains one additional variable $x_{n+1}$ and $\Delta''' \sim \mathrm{Po}(d)$ new clauses $b_1,\ldots,b_{\Delta'''}$ that each contain $x_{n+1}$ and $k-1$ other variables. Let $x_{1,1},\ldots,x_{1,k-1},\ldots,x_{\Delta''',1},\ldots,x_{\Delta''',k-1} \in \{x_1,\ldots,x_n\}$ be the variables among $x_1,\ldots,x_n$ that appear in $b_1,\ldots,b_{\Delta'''}$ and let $\mathcal X = \{x_{1,1},\ldots,x_{\Delta''',k-1}\}$. Then
$$0 \le Z(\Phi''') \le 2Z(\Phi') \le 2^{n+1}. \tag{6.9}$$
Hence, if $\Phi'$ is unsatisfiable, then so is $\Phi'''$ and thus
$$\log\frac{Z(\Phi''')\vee1}{Z(\Phi')\vee1} = 0 \qquad\text{if } Z(\Phi') = 0. \tag{6.10}$$
Furthermore, since $\Delta''' \sim \mathrm{Po}(d)$, Bennett's inequality shows that $\mathbb P\left[\Delta''' > \log n\right] = O(n^{-2})$. Therefore, (6.9) shows that
$$\mathbb E\left[\mathbb 1\{\Delta''' > \log n\}\cdot\left|\log\frac{Z(\Phi''')\vee1}{Z(\Phi')\vee1}\right|^{3/2}\right] = o(1). \tag{6.11}$$
Moreover, since the $k-1$ variables among $x_1,\ldots,x_n$ that appear in the clauses $b_1,\ldots,b_{\Delta'''}$ are chosen uniformly and independently, a simple balls-into-bins argument shows that
$$\mathbb P\left[|\mathcal X| \le (k-1)(\Delta'''-1) \mid \Delta''' \le \log n\right] = \tilde O(n^{-2}). \tag{6.12}$$
Hence, consider the event
$$\mathcal G = \left\{Z(\Phi') > 0,\ \Delta''' \le \log n,\ |\mathcal X| > (k-1)(\Delta'''-1)\right\}.$$
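Combining (6.9)–(6.12), we obtain
\[
\mathbb{E}\left[(1-\mathbb{1}_{\mathcal{G}})\cdot\left|\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\right|^{3/2}\right]=o(1). \tag{6.13}
\]
Furthermore, if the event $\mathcal{G}$ occurs, then there exists a set $\mathcal{L}\subseteq\{x,\neg x:x\in\mathcal{X}\}$ of literals such that each clause $b_i$, $1\le i\le\Delta'''$, contains a literal $l\in\mathcal{L}$ and such that $\{x,\neg x\}\not\subseteq\mathcal{L}$ for all $x\in\mathcal{X}$. Hence, with $\bar{\mathcal{L}}=\bar{\mathcal{L}}_{\Phi'}$ the output of PULP on $(\Phi',\mathcal{L})$, Lemma 2.8 shows that
\[
\mathbb{E}\left[\mathbb{1}_{\mathcal{G}}\cdot\left|\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\right|^{3/2}\right]\le\mathbb{E}\left[\mathbb{1}_{\mathcal{G}}\cdot|\bar{\mathcal{L}}|^{3/2}\right]. \tag{6.14}
\]
Furthermore, since the clauses $b_1,\dots,b_{\Delta'''}$ are drawn independently of $\Phi'$, Lemma 2.10 shows that there exists $C=C(d,k)>0$ such that
\[
\mathbb{E}\left[\mathbb{1}_{\mathcal{G}}\cdot|\bar{\mathcal{L}}|^{3/2}\mid\Delta'''\right]\le C\cdot(\Delta''')^{3/2}. \tag{6.15}
\]
Finally, since $\Delta'''\sim\mathrm{Po}(d)$, the assertion follows from (6.13), (6.14) and (6.15). □

Proof of Proposition 2.7. The proposition follows immediately from Lemmas 6.1–6.2. □

6.2. Proof of Proposition 2.11. Let $\pi^{(\ell)}_{d,k}=\mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ be the result of an $\ell$-fold application of the operator $\mathrm{BP}_{d,k}$ from (1.3) to the point mass at $1/2$. Also recall from (2.7) that $\pi'_n$ denotes the empirical distribution of the marginals $(\mathbb{P}[\sigma_{\Phi'}(x_i)=1\mid\Phi'])_{1\le i\le n}$.

Lemma 6.3. Suppose that $d<d_{\mathrm{uniq}}(k)$. For any $\varepsilon>0$ there exists $\ell_0=\ell_0(d,k,\varepsilon)>0$ such that for all $\ell\ge\ell_0$ we have
\[
\mathbb{E}\left[W_1(\pi'_n,\pi^{(\ell)}_{d,k})\mid Z(\Phi')>0\right]<\varepsilon+o(1).
\]

Proof. Assume that $\ell\ge\ell_0$ for a large enough $\ell_0=\ell_0(d,k,\varepsilon)>0$. Since $d<d_{\mathrm{uniq}}(k)$ and since $\boldsymbol{T}=\boldsymbol{T}_{d,k}$ is a Galton-Watson tree in which every variable node has $\mathrm{Po}(d)$ clause nodes as offspring and the offspring of every clause node consists of $k-1$ variable nodes, there exists a set $\mathcal{T}_\ell$ of trees, with $|\mathcal{T}_\ell|=O(1)$, such that the following hold:
T0: for every $T\in\mathcal{T}_\ell$ we have $\mathbb{P}[\boldsymbol{T}^{(\ell)}=T]>0$.
T1: $\mathbb{P}[\boldsymbol{T}^{(\ell)}\in\mathcal{T}_\ell]>1-\varepsilon$.
T2: given $\boldsymbol{T}^{(\ell)}\in\mathcal{T}_\ell$ we have
\[
\max_{\tau\in S(\boldsymbol{T}^{(\ell)})}\left|\mathbb{P}\left[\boldsymbol{\tau}^{(\ell)}(\mathfrak{x})=1\mid\boldsymbol{T}^{(\ell)}\right]-\mathbb{P}\left[\boldsymbol{\tau}^{(\ell)}(\mathfrak{x})=1\mid\boldsymbol{T}^{(\ell)},\ \forall x\in\partial^{2\ell}\mathfrak{x}:\boldsymbol{\tau}^{(\ell)}(x)=\tau(x)\right]\right|<\varepsilon.
\]
Here $\mathfrak{x}$ denotes the root of the tree. For a variable node $x_i$ of $\Phi'$ obtain $\varphi'_\ell(x_i)$ from $\Phi'$ by deleting all variables and clauses at distance greater than $2\ell$ from $x_i$. We regard $x_i$ as the root of $\varphi'_\ell(x_i)$.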
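The depth-$2\ell$ neighbourhood $\varphi'_\ell(x_i)$ just described can be computed by a breadth-first search on the bipartite variable/clause graph. The following Python sketch is purely illustrative: the clause encoding (tuples of signed integers, as in the DIMACS convention) and the function name `neighbourhood` are assumptions made here and are not notation from the text.

```python
from collections import deque


def neighbourhood(clauses, root, ell):
    """Variables and clause indices within distance 2*ell of `root`.

    clauses: list of tuples of nonzero ints; literal v or -v refers to variable v.
    Distances are measured in the bipartite variable/clause graph, so a variable
    reached after h clause-hops is at distance 2*h and its clauses at 2*h + 1.
    """
    # Map each variable to the indices of the clauses in which it occurs.
    occurrences = {}
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occurrences.setdefault(abs(lit), []).append(idx)

    hops = {root: 0}          # variable -> number of clause-hops from the root
    kept_clauses = set()
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if hops[v] == ell:
            continue          # clauses below v would be at distance 2*ell + 1
        for idx in occurrences.get(v, []):
            kept_clauses.add(idx)
            for lit in clauses[idx]:
                u = abs(lit)
                if u not in hops:
                    hops[u] = hops[v] + 1
                    queue.append(u)
    return set(hops), kept_clauses


if __name__ == "__main__":
    example = [(1, -2, 3), (2, 4, -5), (5, 6, 7), (7, 8, 9)]
    print(neighbourhood(example, root=1, ell=2))
```

The sketch only extracts the neighbourhood; deciding whether it is a tree isomorphic to a given $T$ would require an additional rooted-isomorphism test that is not shown.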
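Moreover, for a tree $T\in\mathcal{T}_\ell$ let $V_T$ be the set of variable nodes $x_i$, $1\le i\le n$, such that $\varphi'_\ell(x_i)\cong T$; thus, there is an isomorphism of the CNFs $T$ and $\varphi'_\ell(x_i)$ that maps the root $\mathfrak{x}$ of $T$ to $x_i$. Consider the event
\[
\mathfrak{T}_\ell=\left\{\sum_{T\in\mathcal{T}_\ell}\left|\mathbb{P}\left[\boldsymbol{T}^{(\ell)}\cong T\right]-|V_T|/n\right|<\varepsilon\right\}. \tag{6.16}
\]
Then Corollary 3.7 implies that
\[
\mathbb{P}[\mathfrak{T}_\ell]=1-o(1)\quad\text{for every }\ell\ge 0. \tag{6.17}
\]
We now claim that
\[
\left|\mathbb{P}\left[\boldsymbol{\tau}^{(\ell)}(\mathfrak{x})=1\mid\boldsymbol{T}^{(\ell)}=T\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_i)=1\mid\Phi'\right]\right|<\varepsilon\quad\text{for all } T\in\mathcal{T}_\ell,\ x_i\in V_T. \tag{6.18}
\]
To see this, let $\mathcal{S}_\ell(\Phi',x_i)$ be the set of all assignments $\sigma\in\{\pm1\}^{\partial^{2\ell}x_i}$ of the variables at distance $2\ell$ from $x_i$ in $\Phi'$ such that there exists a satisfying assignment $\sigma'\in S(\Phi')$ with $\sigma'(y)=\sigma(y)$ for all $y\in\partial^{2\ell}x_i$. Then the law of total probability shows that
\[
\mathbb{P}\left[\sigma_{\Phi'}(x_i)=1\mid\Phi'\right]=\sum_{\sigma\in\mathcal{S}_\ell(\Phi',x_i)}\mathbb{P}\left[\sigma_{\Phi'}(x_i)=1\mid\Phi',\ \forall y\in\partial^{2\ell}x_i:\sigma_{\Phi'}(y)=\sigma(y)\right]\mathbb{P}\left[\forall y\in\partial^{2\ell}x_i:\sigma_{\Phi'}(y)=\sigma(y)\mid\Phi'\right]. \tag{6.19}
\]
Further, since for $T\in\mathcal{T}_\ell$ and $x_i\in V_T$ we have $\varphi'_\ell(x_i)\cong T$, condition T2 implies that
\[
\left|\mathbb{P}\left[\sigma_{\Phi'}(x_i)=1\mid\Phi',\ \forall y\in\partial^{2\ell}x_i:\sigma_{\Phi'}(y)=\sigma(y)\right]-\mathbb{P}\left[\boldsymbol{\tau}^{(\ell)}(\mathfrak{x})=1\mid\boldsymbol{T}^{(\ell)}=T\right]\right|<\varepsilon. \tag{6.20}
\]
Combining (6.19) and (6.20), we obtain (6.18).

To complete the proof, we recall from Fact 4.2 that $\pi^{(\ell)}_{d,k}$ is precisely the distribution of $\mathbb{P}[\boldsymbol{\tau}^{(\ell)}(\mathfrak{x})=1\mid\boldsymbol{T}^{(\ell)}]$. Therefore, coupling the formulas $\boldsymbol{T}^{(\ell)},\Phi'$ on the event $\mathfrak{T}_\ell$ we have
\begin{align*}
W_1(\pi'_n,\pi^{(\ell)}_{d,k})&\le\mathbb{P}\left[\boldsymbol{T}^{(\ell)}\notin\mathcal{T}_\ell\right]+\frac{1}{n}\sum_{T\in\mathcal{T}_\ell}\sum_{x_i\in V_T}\left|\mathbb{P}\left[\boldsymbol{\tau}^{(\ell)}(\mathfrak{x})=1\mid\boldsymbol{T}^{(\ell)}=T\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_i)=1\mid\Phi'\right]\right|+\varepsilon&&\text{[by (6.16)]}\\
&\le 3\varepsilon&&\text{[by T1 and (6.18)].}
\end{align*}
Combining this bound with (6.17) completes the proof. □

Proof of Proposition 2.11. The first assertion follows from Proposition 2.1, Lemma 6.3 and the fact that, since $0<d<d_{\mathrm{uniq}}(k)<d_{\mathrm{sat}}(k)$, we have $\mathbb{P}[Z(\Phi')>0]=1-o(1)$. The second follows from a routine argument, which we present below for the case $\ell=2$; the extension to any finite $\ell$ is standard (see [65, Proposition 2.5]). Let $t=\Theta(\log\log n)$ and recall the definitions of $\varphi'_t(x_i)$, $\mathcal{T}_t$ and $\mathcal{S}_t(\Phi',x_i)$ from the proof of Lemma 6.3. Consider the event
\[
\mathcal{D}=\{\varphi'_t(x_1),\varphi'_t(x_2)\text{ are disjoint tree formulas}\}.
\]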
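From Lemma 3.5, we have that $\mathbb{P}[\mathcal{D}]=1-o(1)$. On the event $\mathcal{D}$, Lemma 6.3 implies that for every $\sigma_1,\sigma_2\in\{\pm1\}$ and $\tau_1\in\mathcal{S}_t(\Phi',x_1)$, $\tau_2\in\mathcal{S}_t(\Phi',x_2)$ we have
\[
\left|\mathbb{P}\left[\sigma_{\Phi'}(x_i)=\sigma_i\mid\Phi',\ \forall y\in\partial^{2t}x_i:\sigma_{\Phi'}(y)=\tau_i(y)\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_i)=\sigma_i\mid\Phi'\right]\right|=o(1)\quad\text{for } i=1,2. \tag{6.21}
\]
Therefore, from the law of total probability and the triangle inequality we see that for every $\sigma_1,\sigma_2\in\{\pm1\}$
\begin{align*}
&\left|\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1,\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi'\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi'\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi'\right]\right|\\
&\quad\le\left|\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1,\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi'\right]-\mathbb{E}_{\tau_1,\tau_2}\left[\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1,\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi',\tau_1,\tau_2\right]\right]\right|\\
&\qquad+\left|\mathbb{E}_{\tau_1,\tau_2}\left[\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi',\tau_1\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi',\tau_2\right]\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi'\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi'\right]\right|\\
&\quad=\left|\mathbb{E}_{\tau_1,\tau_2}\left[\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi',\tau_1\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi',\tau_2\right]\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi'\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi'\right]\right|\\
&\quad\le\mathbb{E}_{\tau_1}\left|\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi',\tau_1\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_1)=\sigma_1\mid\Phi'\right]\right|+\mathbb{E}_{\tau_2}\left|\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi',\tau_2\right]-\mathbb{P}\left[\sigma_{\Phi'}(x_2)=\sigma_2\mid\Phi'\right]\right|\\
&\quad=o(1)\qquad\text{[by (6.21)].}
\end{align*}
Summing over the four sign combinations of $\sigma_1,\sigma_2$ gives the desired result. □

6.3. Proof of Proposition 2.5. As in the proof of Lemma 6.1 let $c_1,\dots,c_{\Delta''}$ be the new clauses added by CPL2 and let $x_{1,1},\dots,x_{1,k},\dots,x_{\Delta'',1},\dots,x_{\Delta'',k}$ be their constituent variables. Let $\mathcal{X}=\{x_{1,1},\dots,x_{1,k},\dots,x_{\Delta'',1},\dots,x_{\Delta'',k}\}$. For $\varepsilon>0$ and $z\in\mathbb{R}$ define $\lambda_\varepsilon(z)=\log(z\vee\varepsilon)$. Finally, let $(s_i)_{i\ge 0}$ be a sequence of uniformly random $\pm1$-valued random variables, mutually independent and independent of all other randomness.

Lemma 6.4. Assume that $d<d_{\mathrm{uniq}}(k)$. There exists $B=B(d,k)>0$ such that for all $0<\varepsilon<1$ we have
\[
\limsup_{n\to\infty}\mathbb{E}\left[\left(\sum_{i=1}^{\Delta''}\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right)\right)^{2}\;\middle|\;Z(\Phi')>0\right]\le B.
\]

Proof. Given $Z(\Phi')>0$ we have
\[
0\ge\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{1,j})\ne\operatorname{sign}(x_{1,j},c_1)\mid\Phi'\right]\right)\ge\lambda_\varepsilon\left(1-\mathbb{P}\left[\sigma(x_{1,1})\ne\operatorname{sign}(x_{1,1},c_1)\mid\Phi'\right]\right).
\]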
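(6.22) Recalling that $\Delta''\sim\mathrm{Po}(d(k-1)/k)$, we combine (6.22) with Cauchy-Schwarz to obtain $B'=B'(d,k)>0$ such that
\[
\mathbb{E}\left[\left(\sum_{i=1}^{\Delta''}\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right)\right)^{2}\;\middle|\;Z(\Phi')>0\right]\le B'\cdot\mathbb{E}\left[\lambda_\varepsilon\left(1-\mathbb{P}\left[\sigma(x_{1,1})\ne\operatorname{sign}(x_{1,1},c_1)\mid\Phi'\right]\right)^{2}\;\middle|\;Z(\Phi')>0\right]. \tag{6.23}
\]
Further, since the function $\lambda_\varepsilon$ is bounded and continuous for every $\varepsilon>0$ and since $\operatorname{sign}(x_{1,1},c_1)$ is chosen independently of $\Phi'$, Proposition 2.11 shows that for any $\varepsilon>0$,
\[
\mathbb{E}\left[\lambda_\varepsilon\left(1-\mathbb{P}\left[\sigma(x_{1,1})\ne\operatorname{sign}(x_{1,1},c_1)\mid\Phi'\right]\right)^{2}\;\middle|\;Z(\Phi')>0\right]=\mathbb{E}\left[\lambda_\varepsilon\left(\mu_{\pi_{d,k},1,1}\right)^{2}\right]+o(1)\le\mathbb{E}\left[\log^{2}\left(\mu_{\pi_{d,k},1,1}\right)\right]+o(1). \tag{6.24}
\]
Since Proposition 2.1 shows that $\mathbb{E}\left[\log^{2}(\mu_{\pi_{d,k},1,1})\right]=O(1)$, the assertion follows from (6.23) and (6.24). □

Lemma 6.5. Assume that $d<d_{\mathrm{uniq}}(k)$. For any $\delta>0$ there exists $\varepsilon_0>0$ such that for all $\varepsilon_0>\varepsilon>0$ we have
\[
\limsup_{n\to\infty}\left|\mathbb{E}\left[\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\right]-\frac{d(k-1)}{k}\,\mathbb{E}\left[\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_j)=s_j\mid\Phi'\right]\right)\;\middle|\;Z(\Phi')>0\right]\right|<\delta.
\]

Proof. We choose small enough $\xi=\xi(d,k,\delta)>\zeta=\zeta(\xi)>\eta=\eta(\zeta)>\varepsilon_0=\varepsilon_0(\eta)>0$, let $0<\varepsilon<\varepsilon_0$ and assume that $n\ge n_0(\varepsilon)$ is large enough. Also let $\gamma=\gamma(n)=o(1)$ be a sequence that tends to zero sufficiently slowly. Additionally, let $\mathcal{E}$ be the event that all of the following conditions occur.
E1: $Z(\Phi')>0$.
E2: $\Delta''\le\zeta^{-1}$.
E3: $|\mathcal{X}|=k\Delta''$.
E4: $\max_{x\in\mathcal{X},\,s\in\{\pm1\}}\mathbb{P}[\sigma(x)=s\mid\Phi']\le 1-\eta$.
E5: $\sum_{\tau\in\{\pm1\}^{\mathcal{X}}}\left|\mathbb{P}[\forall x\in\mathcal{X}:\sigma(x)=\tau(x)\mid\Phi']-\prod_{x\in\mathcal{X}}\mathbb{P}[\sigma(x)=\tau(x)\mid\Phi']\right|<\gamma$.
We claim that
\[
\mathbb{P}[\mathcal{E}]\ge 1-2\xi+o(1). \tag{6.25}
\]
Indeed, since $0<d<d_{\mathrm{uniq}}(k)<d_{\mathrm{sat}}(k)$, we have $\mathbb{P}[Z(\Phi')>0]=1-o(1)$. Moreover, since $\Delta''\sim\mathrm{Po}(d(k-1)/k)$, Markov's inequality shows that $\mathbb{P}[\Delta''>\zeta^{-1}]\le\zeta d<\xi$. Further, since the new clauses $c_1,\dots,c_{\Delta''}$ are chosen independently, we have $\mathbb{P}\left[|\mathcal{X}|=k\Delta''\mid\Delta''\le\zeta^{-1}\right]=1-O(1/n)$. Moreover, by Proposition 2.11 the joint distribution of the assignments on $\mathcal{X}$ is approximately a product measure. The tails of the limiting distribution of the latter are controlled by (2.1).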
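Therefore, for small enough $\eta$ we have
\[
\mathbb{P}\left[\max_{x\in\mathcal{X},\,s\in\{\pm1\}}\mathbb{P}[\sigma(x)=s\mid\Phi']\le 1-\eta\;\middle|\;\Delta''\le\zeta^{-1},\ Z(\Phi')>0\right]\ge 1-\xi.
\]
Similarly, Proposition 2.11 shows together with Markov's inequality that
\[
\mathbb{P}\left[\text{E5 occurs}\mid\Delta''\le\zeta^{-1},\ Z(\Phi')>0\right]=1-o(1),
\]
provided that $\gamma\to 0$ sufficiently slowly. Thus, we obtain (6.25). Furthermore, (6.25) implies together with Proposition 2.7 and Hölder's inequality that
\[
\mathbb{E}\left|(1-\mathbb{1}_{\mathcal{E}})\cdot\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\right|\le\delta/3+o(1), \tag{6.26}
\]
provided that $\xi=\xi(d,k,\delta)>0$ is small enough. Analogously, (6.25), Lemma 6.4 and Cauchy-Schwarz yield
\[
\mathbb{E}\left|(1-\mathbb{1}_{\mathcal{E}})\,\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_j)=s_j\mid\Phi'\right]\right)\right|\le\delta/3+o(1). \tag{6.27}
\]
Thus, we confine ourselves to the event $\mathcal{E}$, on which we have $Z(\Phi'),Z(\Phi'')>0$ due to E1, E3, E4 and E5. Hence,
\begin{align*}
\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}=\log\frac{Z(\Phi'')}{Z(\Phi')}&=\log\sum_{\tau\in\{\pm1\}^{\mathcal{X}}}\mathbb{1}\{\tau\models c_1,\dots,c_{\Delta''}\}\,\mathbb{P}\left[\forall x\in\mathcal{X}:\sigma(x)=\tau(x)\mid\Phi'\right]\\
&=\log\sum_{\tau\in\{\pm1\}^{\mathcal{X}}}\mathbb{1}\{\tau\models c_1,\dots,c_{\Delta''}\}\prod_{x\in\mathcal{X}}\mathbb{P}\left[\sigma(x)=\tau(x)\mid\Phi'\right]+o(1)&&\text{[by E4, E5]}\\
&=\sum_{i=1}^{\Delta''}\log\left[1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right]+o(1)&&\text{[by E3].}\tag{6.28}
\end{align*}
Further, E4 ensures that for any $1\le i\le\Delta''$,
\[
\left|\log\left[1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right]-\lambda_\varepsilon\left[1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right]\right|<\xi. \tag{6.29}
\]
Thus, combining (6.28) and (6.29), we obtain
\[
\mathbb{E}\left|\mathbb{1}_{\mathcal{E}}\left(\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}-\sum_{i=1}^{\Delta''}\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right)\right)\right|<\delta/3+o(1). \tag{6.30}
\]
Further, combining (6.26) and (6.30) with Lemma 6.4, we obtain
\[
\left|\mathbb{E}\left[\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\right]-\mathbb{E}\left[\sum_{i=1}^{\Delta''}\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right)\;\middle|\;Z(\Phi')>0\right]\right|<\delta+o(1). \tag{6.31}
\]
Finally, since the clauses $c_1,\dots,c_{\Delta''}$ are drawn uniformly and independently and since the distribution of $\Phi'$ is invariant under permutation of the variable nodes, we find
\[
\mathbb{E}\left[\sum_{i=1}^{\Delta''}\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},c_i)\mid\Phi'\right]\right)\;\middle|\;Z(\Phi')>0\right]=\frac{d(k-1)}{k}\,\mathbb{E}\left[\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mathbb{P}\left[\sigma(x_j)=s_j\mid\Phi'\right]\right)\;\middle|\;Z(\Phi')>0\right].
\]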
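(6.32) Combining (6.31) and (6.32) completes the proof. □

Proof of Proposition 2.5. Proposition 2.11 shows together with Lemma 6.5 that
\[
\mathbb{E}\left[\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\right]=\frac{d(k-1)}{k}\,\mathbb{E}\left[\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right]+o_\varepsilon(1), \tag{6.33}
\]
with $o_\varepsilon(1)$ hiding a term that vanishes in the limit $\varepsilon\to 0$. Furthermore, in light of (2.1) the monotone convergence theorem yields
\[
\mathbb{E}\left[\log\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right]=\lim_{\varepsilon\to 0}\mathbb{E}\left[\lambda_\varepsilon\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right]. \tag{6.34}
\]
The assertion follows from (6.33) and (6.34). □

6.4. Proof of Proposition 2.6. We adapt the steps from Section 6.3 to the coupling of $\Phi',\Phi'''$. Recall that the latter is obtained by adding to $\Phi'$ a single variable $x_{n+1}$ along with $\Delta'''$ clauses $b_1,\dots,b_{\Delta'''}$ that each contain $x_{n+1}$ and $k-1$ other variables. Thus, let $x_{1,1},\dots,x_{1,k-1},\dots,x_{\Delta''',1},\dots,x_{\Delta''',k-1}\in\{x_1,\dots,x_n\}$ be the variables other than $x_{n+1}$ that appear in $b_1,\dots,b_{\Delta'''}$ and let $\mathcal{X}=\{x_{1,1},\dots,x_{\Delta''',k-1}\}$ be the set comprising all these variables.

Lemma 6.6. Assume that $0<d<d_{\mathrm{uniq}}(k)$. There exists $B=B(d,k)>0$ such that for all $0<\varepsilon<1$ we have
\[
\limsup_{n\to\infty}\mathbb{E}\left[\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)^{2}\;\middle|\;Z(\Phi')>0\right]\le B.
\]

Proof. Given that $\Phi'$ is satisfiable, and noticing that $\lambda_\varepsilon$ is increasing and $\varepsilon\in(0,1)$, we see that
\begin{align*}
0\wedge\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)
&\ge\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)\\
&\ge\lambda_\varepsilon\left(\prod_{i=1}^{\Delta'''}\left(1-\mathbb{P}\left[\sigma(x_{i,1})\ne\operatorname{sign}(x_{i,1},b_i)\mid\Phi'\right]\right)\right)\\
&=\lambda_\varepsilon\left(\prod_{i=1}^{\Delta'''}\mathbb{P}\left[\sigma(x_{i,1})=\operatorname{sign}(x_{i,1},b_i)\mid\Phi'\right]\right)\\
&\ge\sum_{i=1}^{\Delta'''}\lambda_\varepsilon\left(\mathbb{P}\left[\sigma(x_{i,1})=\operatorname{sign}(x_{i,1},b_i)\mid\Phi'\right]\right).\tag{6.35}
\end{align*}
We also notice that
\[
0\vee\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)<1.
\]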
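(6.36) In light of the above, we now bound
\begin{align*}
&\mathbb{E}\left[\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)^{2}\;\middle|\;Z(\Phi')>0\right]\\
&\quad\le\mathbb{E}\left[1+\left(\sum_{i=1}^{\Delta'''}\lambda_\varepsilon\left(\mathbb{P}\left[\sigma(x_{i,1})=\operatorname{sign}(x_{i,1},b_i)\mid\Phi'\right]\right)\right)^{2}\;\middle|\;Z(\Phi')>0\right]&&\text{[from (6.35), (6.36)]}\\
&\quad\le d(d+1)\,\mathbb{E}\left[1+\lambda_\varepsilon\left(\mathbb{P}\left[\sigma(x_{1,1})=\operatorname{sign}(x_{1,1},b_1)\mid\Phi'\right]\right)^{2}\;\middle|\;Z(\Phi')>0\right]&&[\Delta'''\sim\mathrm{Po}(d)]\\
&\quad\le d(d+1)\left(1+\mathbb{E}\left[\lambda_\varepsilon\left(\mathbb{P}\left[\sigma(x_{1,1})=\operatorname{sign}(x_{1,1},b_1)\mid\Phi'\right]\right)^{2}\;\middle|\;Z(\Phi')>0\right]\right).\tag{6.37}
\end{align*}
Further, Proposition 2.11 implies that for any $\varepsilon>0$,
\[
\mathbb{E}\left[\lambda_\varepsilon\left(\mathbb{P}\left[\sigma(x_{1,1})=\operatorname{sign}(x_{1,1},b_1)\mid\Phi'\right]\right)^{2}\;\middle|\;Z(\Phi')>0\right]=\mathbb{E}\left[\lambda_\varepsilon\left(\mu_{\pi_{d,k},1,1}\right)^{2}\right]+o(1)\le\mathbb{E}\left[\log^{2}\mu_{\pi_{d,k},1,1}\right]+o(1). \tag{6.38}
\]
Finally, the assertion follows from (6.37) and (6.38). □

Lemma 6.7. Assume that $0<d<d_{\mathrm{uniq}}(k)$. For any $\delta>0$ there exists $\varepsilon_0>0$ such that for all $\varepsilon_0>\varepsilon>0$ we have
\[
\limsup_{n\to\infty}\left|\mathbb{E}\left[\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\right]-\mathbb{E}\left[\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)\;\middle|\;Z(\Phi')>0\right]\right|<\delta.
\]

Proof. Choose small enough $\xi=\xi(d,k,\delta)>\zeta=\zeta(\xi)>\eta=\eta(\zeta)>\varepsilon_0=\varepsilon_0(\eta)>0$, let $0<\varepsilon<\varepsilon_0$, suppose that $n>n_0(\varepsilon)$ is sufficiently large and let $0<\gamma=\gamma(n)=o(1)$ be a sequence that converges to zero slowly. Let $\mathcal{E}$ be the event that the following conditions occur.
E1: $Z(\Phi')>0$.
E2: $\Delta'''\le\zeta^{-1}$.
E3: $|\mathcal{X}|=(k-1)\Delta'''$.
E4: $\max_{x\in\mathcal{X},\,s\in\{\pm1\}}\mathbb{P}[\sigma(x)=s\mid\Phi']\le 1-\eta$.
E5: $\sum_{\tau\in\{\pm1\}^{\mathcal{X}}}\left|\mathbb{P}[\forall x\in\mathcal{X}:\sigma(x)=\tau(x)\mid\Phi']-\prod_{x\in\mathcal{X}}\mathbb{P}[\sigma(x)=\tau(x)\mid\Phi']\right|<\gamma$.
As in the proof of Lemma 6.5 we find that
\[
\mathbb{P}[\mathcal{E}]\ge 1-2\xi+o(1). \tag{6.39}
\]
Let
\[
L_\varepsilon=\lambda_\varepsilon\left(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right)
\]
for brevity. Combining Proposition 2.7, Lemma 6.6 and (6.39) and using Hölder's inequality, we obtain
\[
\mathbb{E}\left|(1-\mathbb{1}_{\mathcal{E}})\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\right|\le\delta/3+o(1),\qquad\mathbb{E}\left[\left|(1-\mathbb{1}_{\mathcal{E}})L_\varepsilon\right|\;\middle|\;Z(\Phi')>0\right]\le\delta/3+o(1). \tag{6.40}
\]
Hence, we are left to compare $\mathbb{E}\left|\mathbb{1}_{\mathcal{E}}\cdot\log\frac{Z(\Phi''')}{Z(\Phi')}\right|$ and $\mathbb{E}\left[\left|\mathbb{1}_{\mathcal{E}}\cdot L_\varepsilon\right|\mid Z(\Phi')>0\right]$. On the event $\mathcal{E}$ we have $Z(\Phi'),Z(\Phi''')>0$.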
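Consequently,
\begin{align*}
\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}=\log\frac{Z(\Phi''')}{Z(\Phi')}&=\log\sum_{\tau\in\{\pm1\}^{\mathcal{X}\cup\{x_{n+1}\}}}\mathbb{1}\{\tau\models b_1,\dots,b_{\Delta'''}\}\,\mathbb{P}\left[\forall x\in\mathcal{X}:\sigma(x)=\tau(x)\mid\Phi'\right]\\
&=\log\sum_{\tau\in\{\pm1\}^{\mathcal{X}\cup\{x_{n+1}\}}}\mathbb{1}\{\tau\models b_1,\dots,b_{\Delta'''}\}\prod_{x\in\mathcal{X}}\mathbb{P}\left[\sigma(x)=\tau(x)\mid\Phi'\right]+o(1)&&\text{[by E4, E5]}\\
&=\log\left[\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right]+o(1)&&\text{[by E3].}\tag{6.41}
\end{align*}
Now, E4 guarantees that
\[
\log\left[\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\left(1-\mathbb{1}\{\operatorname{sign}(x_{n+1},b_i)\ne s\}\prod_{j=1}^{k-1}\mathbb{P}\left[\sigma(x_{i,j})\ne\operatorname{sign}(x_{i,j},b_i)\mid\Phi'\right]\right)\right]=L_\varepsilon. \tag{6.42}
\]
Therefore, we combine (6.41) and (6.42) to obtain
\[
\mathbb{E}\left|\mathbb{1}_{\mathcal{E}}\left(\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}-L_\varepsilon\right)\right|<\delta/3+o(1). \tag{6.43}
\]
Finally, the assertion follows from (6.40) and (6.43). □

Proof of Proposition 2.6. Following similar steps as in the proof of Proposition 2.5, we see that the assertion follows from Lemma 6.7, Proposition 2.1, Proposition 2.11, and the dominated convergence theorem. □

Proof of Proposition 2.3. Immediate from Fact 2.4, Proposition 2.5 and Proposition 2.6. □

7. PROOF OF PROPOSITION 2.15

7.1. Proof of Lemma 2.12. The proof is by induction on the height of the tree. The following claim summarises the main step of the induction.

Claim 7.1. For all $\ell\ge 0$, all variables $x$ of $\boldsymbol{T}^{(\ell)}$ and all satisfying assignments $\tau\in S(\boldsymbol{T}^{(\ell)})$ we have
\[
\frac{Z(\boldsymbol{T}^{(\ell)}_x,\tau,\tau^+(x))}{Z(\boldsymbol{T}^{(\ell)}_x,\tau)}\le\frac{Z(\boldsymbol{T}^{(\ell)}_x,\tau^+,\tau^+(x))}{Z(\boldsymbol{T}^{(\ell)}_x,\tau^+)}. \tag{7.1}
\]

Proof. For boundary variables $x\in\partial^{2\ell}\mathfrak{x}$, where $\mathfrak{x}$ denotes the root, there is nothing to show because the r.h.s. of (7.1) equals one. Hence, consider a variable $x\in\partial^{2q}\mathfrak{x}$ for some $q<\ell$. If $Z(\boldsymbol{T}^{(\ell)}_x,\tau,\tau^+(x))=0$, then (7.1) is trivially satisfied. Hence, assume that $Z(\boldsymbol{T}^{(\ell)}_x,\tau,\tau^+(x))>0$. Let $a^+_1,\dots,a^+_g$ be the children (clauses) of $x$ with $\operatorname{sign}(x,a^+_i)=\tau^+(x)$. Also let $y_{11},\dots,y_{1(k-1)},\dots,y_{g1},\dots,y_{g(k-1)}$ be the children (variables) of $a^+_1,\dots,a^+_g$. Similarly, let $a^-_1,\dots,a^-_h$ be the children of $x$ with $\operatorname{sign}(x,a^-_i)=-\tau^+(x)$ and let $z_{11},\dots,z_{1(k-1)},\dots,z_{h1},\dots,z_{h(k-1)}$ be their children.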
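We claim that for all $\tau\in S(\boldsymbol{T}^{(\ell)})$,
\begin{align*}
Z(\boldsymbol{T}^{(\ell)}_x,\tau,\tau^+(x))&=\left(\prod_{i=1}^{g}\prod_{t=1}^{k-1}Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau)\right)\cdot\prod_{j=1}^{h}\left(\prod_{t=1}^{k-1}Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau)-\prod_{t=1}^{k-1}Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau,-\tau^+(z_{jt}))\right),\tag{7.2}\\
Z(\boldsymbol{T}^{(\ell)}_x,\tau,-\tau^+(x))&=\prod_{i=1}^{g}\left(\prod_{t=1}^{k-1}Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau)-\prod_{t=1}^{k-1}Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau,\tau^+(y_{it}))\right)\cdot\left(\prod_{j=1}^{h}\prod_{t=1}^{k-1}Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau)\right).\tag{7.3}
\end{align*}
Indeed, setting $x$ to $\tau^+(x)$ satisfies $a^+_1,\dots,a^+_g$; hence, arbitrary satisfying assignments of the sub-trees $\boldsymbol{T}^{(\ell)}_{y_{it}}$ can be combined, which explains the first product in (7.2). By contrast, upon assigning $x$ the value $\tau^+(x)$ we need to ensure that each of the clauses $a^-_1,\dots,a^-_h$ is satisfied by at least one variable other than $x$. This explains the second factor of (7.2). A similar argument yields (7.3). Dividing (7.3) by (7.2) and invoking the induction hypothesis (for $q+1$), we obtain
\begin{align*}
\frac{Z(\boldsymbol{T}^{(\ell)}_x,\tau,-\tau^+(x))}{Z(\boldsymbol{T}^{(\ell)}_x,\tau,\tau^+(x))}
&=\prod_{i=1}^{g}\left(1-\prod_{t=1}^{k-1}\frac{Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau,\tau^+(y_{it}))}{Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau)}\right)\cdot\prod_{j=1}^{h}\left(1-\prod_{t=1}^{k-1}\frac{Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau,-\tau^+(z_{jt}))}{Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau)}\right)^{-1}\\
&\ge\prod_{i=1}^{g}\left(1-\prod_{t=1}^{k-1}\frac{Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau^+,\tau^+(y_{it}))}{Z(\boldsymbol{T}^{(\ell)}_{y_{it}},\tau^+)}\right)\cdot\prod_{j=1}^{h}\left(1-\prod_{t=1}^{k-1}\frac{Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau^+,-\tau^+(z_{jt}))}{Z(\boldsymbol{T}^{(\ell)}_{z_{jt}},\tau^+)}\right)^{-1}\\
&=\frac{Z(\boldsymbol{T}^{(\ell)}_x,\tau^+,-\tau^+(x))}{Z(\boldsymbol{T}^{(\ell)}_x,\tau^+,\tau^+(x))},
\end{align*}
completing the induction. □

Proof of Lemma 2.12. Applying Claim 7.1 to $x=\mathfrak{x}$ completes the proof of Lemma 2.12. □

7.2. Proof of Lemma 2.14. We employ the PULP algorithm introduced in Section 2.3 and its analysis on the random tree from Section 3.2. Recall that given an initial set of literals $\mathcal{L}$, PULP returns a superset $\bar{\mathcal{L}}$ with the property that the partial assignment obtained by setting all literals of $\bar{\mathcal{L}}$ to true leaves no clause with only unsatisfying literals. Let us write $\bar{\mathcal{L}}=\bar{\mathcal{L}}_{x,s}$ for the set returned by the PULP algorithm when initialized with the literal set $\mathcal{L}=\{s\cdot x\}$.

Claim 7.2. Let $0\le t<\ell$ and assume that $x\in\partial^{2t}_{\boldsymbol{T}}\mathfrak{x}$, $s\in\{\pm1\}$, satisfy $|\bar{\mathcal{L}}_{x,s}|<\ell-t$. Then for all $\tau\in S(\boldsymbol{T}^{(\ell)})$
\[
Z(\boldsymbol{T}^{(\ell)}_x,\tau)\le 2^{|\bar{\mathcal{L}}_{x,s}|}\cdot Z(\boldsymbol{T}^{(\ell)}_x,\tau,s). \tag{7.4}
\]

Proof.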
Notice that under our assumption on the size of L̄x,s , the assignment τ does not clash with the one imposed by PULP. The assertion therefore follows immediately from the same argument as in the proof of Lemma 2.8. □ Claim 7.3. We have limt→∞P [|∂2t T x| > (200d · (k −1))t ]= 0. Proof. This is an immediate consequence of Lemma 3.2. □ Proof of Lemma 2.14. Assume that ℓ> ct c for a large enough c = c(d ,k) > 0 and that t > t0 = t0(d ,k) is sufficiently large. Then Corollary 3.3 shows that P [|L̄x,±1| ≥ t c]≤ exp(−t 2) . (7.5) Combining Claim 7.3 with (7.5) and using the union bound, we obtain a sequence εt → 0 such that P [∀x ∈ ∂2t T x : |L̄x,±1| < t c]≥ 1−εt . (7.6) If x ∈ ∂2t T x satisfies |L̄x,±1| < t c and ℓ> ct c , then Claim 7.2 yields that for all x ∈ ∂2t x ∣∣∣η(ℓ) x ∣∣∣≤ log Z (T(ℓ) x ,σ+) Z (T(ℓ) x ,σ+,+1) + log Z (T(ℓ) x ,σ+) Z (T(ℓ) x ,σ+,−1) ≤ |L̄x,+1|+ |L̄x,−1| ≤ 2t c . (7.7) The result now follows from (7.6) and (7.7). □ 7.3. Proof of Proposition 2.15. We focus on the operator LL⋆d ,k introduced in Section 2.5. Let ρ = ( ρ ,ρ⊕,ρ⊖ ) , and ρ′ = ( ρ′ ,ρ′ ⊕,ρ′ ⊖ ) be two arbitrary triplets in P (−∞,∞] ×P (0,+∞] ×P (−∞,0], and write ρ̂ = ( ρ̂ ρ̂⊕, ρ̂⊖ ) and ρ̂′ = ( ρ̂′ ρ̂ ′ ⊕, ρ̂′ ⊖ ) for the images LL⋆d ,k (ρ) and LL⋆d ,k (ρ′), respectively. We wish to bound distd ( ρ̂, ρ̂′) in terms of distd ( ρ,ρ′). To this end, we begin with bounding the W1-distance separately for each of the coordinates (ρ̂⊕, ρ̂′ ⊕), (ρ̂⊖, ρ̂′ ⊖) and (ρ̂ , ρ̂′ ). Observe that it is sufficient to consider only W1(ρ̂⊕, ρ̂′ ⊕) and W1(ρ̂⊖, ρ̂′ ⊖), as the triangle inequality implies that W1(ρ̂ , ρ̂′ ) ≤W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖). To spell out our bounds, we need to introduce some additional notation. Recall that for i , j ≥ 1 the random variables η ,i , j , η⊕,i , j , η⊖,i , j follow the law of ρ , ρ⊕, ρ⊖, respectively. Similarly, let η′ ,i , j , η′ ⊕,i , j , η′ ⊖,i , j be random variables with law ρ′ ,ρ′ ⊕ and ρ′ ⊖, respectively. 
We denote with η∧ ,i , j the random variable η ,i , j ∧η′ ,i , j , and with η∨ ,i , j the random variable η ,i , j ∨η′ ,i , j . Similarly, we write η∧ ⊕,i , j = η⊕,i , j ∧η′ ⊕,i , j and η∨ ⊕,i , j = η⊕,i , j ∨η′ ⊕,i , j , and also write η∧ ⊖,i , j =η⊖,i , j ∧η′ ⊖,i , j and η∨ ⊖,i , j =η⊖,i , j ,η′ ⊖,i , j . 33 Moreover, for a sign ε ∈ {±1} and a vector r = (r ,r⊕,r⊖,r#) of non-negative integers with r +r⊕+r⊖+r# = k −1 and 1 ≤ i ≤ r , 1 ≤ j ≤ r⊕, 1 ≤ ℓ≤ r⊖, we let D i (z,r ;ε) = ∣∣∣∣ ∂ ∂z log ( 1− 1 2r# Γ ( ε(η ,1,1, . . . ,η ,1,i−1, z,η′ ,1,i+1, . . .η′ ,1,r ) ) Γ ( ε(η′ ⊕,1,1, . . . ,η′ ⊕,1,r⊕ ) ) Γ ( ε(η′ ⊖,1,1, . . . ,η′ ⊖,1,r⊖ ) ))∣∣∣∣ . Analogously, we define D⊕ j (z,r ;ε) = ∣∣∣∣ ∂ ∂z log ( 1− 1 2r# Γ ( ε(η ,1,1, . . .η ,1,r ) ) Γ ( ε(η⊕,1,1, . . . ,η⊕,1, j−1, z,η′ ⊕,1, j+1, . . . ,η′ ⊕,1,r⊕ ) ) Γ ( ε(η′ ⊖,1,1, . . .η′ ⊖,1,r⊖ ) ))∣∣∣∣ , D⊖ ℓ (z,r ;ε) = ∣∣∣∣ ∂ ∂z log ( 1− 1 2r# Γ ( ε(η ,1,1, . . .η ,1,r ) ) Γ ( ε(η⊕,1,1, . . . ,η⊕,1,r⊕ ) ) Γ ( ε(η⊖,1,1, . . . ,η⊖,1,ℓ−1, z,η′ ⊖,1,ℓ+1, . . . ,η′ ⊕,1,r⊖ ) ))∣∣∣∣ . With the above notation in place, we are now ready to bound W1(ρ̂⊕, ρ̂′ ⊕). For each of the pairs of distributions (ρ ,ρ′ ), (ρ⊕,ρ′ ⊕), and (ρ⊖,ρ′ ⊖), fix an arbitrary coupling among its coordinates. Lemma 7.4. W1(ρ̂⊕, ρ̂′ ⊕) is upper bounded by d/2 1−e− d 2 ·E [r ,1∑ i=1 ∫ η∨ ,1,i η∧ ,1,i D i (wi ,r 1;+1)dwi + r⊕,1∑ j=1 ∫ η∨⊕,1, j η∧⊕,1, j D⊕ j (y j ,r 1;+1)dy j + r⊖,1∑ ℓ=1 ∫ η∨⊖,1,ℓ η∧⊖,1,ℓ D⊖ ℓ (zℓ,r 1;+1)dzℓ ] . (7.8) Proof. Let us writeΞΞΞ′i , j (ε,r ) for the expression in the r.h.s. of (2.19) where distribution ρ′ is used instead of ρ, i.., ΞΞΞ′i , j (ε,r ) = 1− 1 2r# Γ ( ε ( η′ ,4i+ j ,1, . . . ,η′ ,4i+ j ,r )) Γ ( ε ( η′ ⊕,4i+ j ,1, . . . ,η′ ⊕,4i+ j ,r⊕ )) Γ ( ε ( η′ ⊖,4i+ j ,1, . . . ,η′ ⊖,4i+ j ,r⊖ )) . 
(7.9) By identically coupling the number of clauses and the types of the children variables of each clause in ρ̂⊕, ρ̂′ ⊕, we see that by the definition of the W1 norm, W1(ρ̂⊕, ρ̂′ ⊕) ≤ E [∣∣∣∣∣− d⋆ +∑ i=1 log ΞΞΞi ,3(+1,r 4i+3) ΞΞΞ′i ,3(+1,r 4i+3) ∣∣∣∣∣ ] . Applying Wald’s lemma, we further obtain W1(ρ̂⊕, ρ̂′ ⊕) ≤ d/2 1−e−d/2 ·E [∣∣∣∣∣log ΞΞΞ1,3(+1,r 7) ΞΞΞ′1,3(+1,r 7) ∣∣∣∣∣ ] = d/2 1−e−d/2 ·E [∣∣∣∣∣log ΞΞΞ0,1(+1,r 1) ΞΞΞ′0,1(+1,r 1) ∣∣∣∣∣ ] . (7.10) Let us now focus on the expectation in the r.h.s. of (7.10). Recalling the definition of ΞΞΞ in (2.19), and the definition of ΞΞΞ′ in (7.9), we expand log ΞΞΞ0,1(+1,r 1) ΞΞΞ′0,1(+1,r 1) = log 1−2−r# ·Γ ( η ,1,1, . . . ,η ,1,r ,1 ) ·Γ ( η⊕,1,1, . . . ,η⊕,1,r⊕,1 ) ·Γ ( η⊖,1,1, . . . ,η⊖,1,r⊖,1 ) 1−2−r# ·Γ ( η′ ,1,1, . . . ,η′ ,1,r ,1 ) ·Γ ( η′ ⊕,1,1, . . . ,η′ ⊕,1,r⊕,1 ) ·Γ ( η′ ⊖,1,1, . . . ,η′ ⊖,1,r⊖,1 ) . (7.11) Telescoping over the arguments of the functions Γ in the r.h.s of (7.11), invoking the fundamental theorem of calculus for each term, and applying the triangle inequality we further obtain ∣∣∣∣∣log ΞΞΞ0,1(+1,r 1) ΞΞΞ′0,1(+1,r 1) ∣∣∣∣∣≤ r ,1∑ i=1 ∣∣∣∣∣ ∫ η ,1,i η′ ,1,i D i (wi ,r 1;+1)dwi ∣∣∣∣∣+ r⊕,1∑ j=1 ∣∣∣∣∣ ∫ η⊕,1, j η′⊕,1, j D⊕ j (y j ,r 1;+1)dy j ∣∣∣∣∣+ r⊖,1∑ ℓ=1 ∣∣∣∣∣ ∫ η⊖,1,ℓ η′⊖,1,ℓ D⊖ ℓ (zℓ,r 1;+1)dzℓ ∣∣∣∣∣ . Plugging the above into (7.10) gives the result. □ Following the same steps as above, but replacing ‘+1’ with ‘−1’, yields the corresponding bound for W1(ρ̂⊖, ρ̂′ ⊖). Lemma 7.5. W1(ρ̂⊖, ρ̂′ ⊖) is upper bounded by d/2 1−e− d 2 ·E [r ,1∑ i=1 ∫ η∨ ,1,i η∧ ,1,i D i (wi ,r 1;−1)dwi + r⊕,1∑ j=1 ∫ η∨⊕,1, j η∧⊕,1, j D⊕ j (y j ,r 1;−1)dy j + r⊖,1∑ ℓ=1 ∫ η∨⊖,1,ℓ η∧⊖,1,ℓ D⊖ ℓ (zℓ,r 1;−1)dzℓ ] . (7.12) 34 Exploiting the signs of the variables with types ⊕ and ⊖, we obtain the following bounds for each of the D- functions. For λ ∈ (0,1], we define the real functionψλ : [0,1] →R as ψλ (w) = λ ·w 1−λ ·w · (1−w) . (7.13) It is easy to check thatψλ′ (w) ≤ψλ(w), for every λ′ ≤λ. Claim 7.6. 
For every r = ( r ,r⊕,r⊖,r# ) , and i ∈ [r ] we have that D i (wi ,r ;+1) ≤ψ2−r#−r⊖ ( 1+ tanh(wi /2) 2 ) and D i (wi ,r ;−1) ≤ψ2−r#−r⊕ ( 1− tanh(wi /2) 2 ) . (7.14) Similarly, we also have that for j ∈ [r⊕], D⊕ j ( y j ,r ;+1 )≤ψ2−r#−r⊖ ( 1+ tanh(y j /2) 2 ) and D⊕ j ( y j ,r ;−1 )≤ψ2−r#−(r⊕−1) ( 1− tanh(y j /2) 2 ) , (7.15) and for ℓ ∈ [r⊖] D⊖ ℓ (zℓ,r ;+1) ≤ψ2−r#−(r⊖−1) ( 1+ tanh(zℓ/2) 2 ) and D⊖ ℓ (zℓ,r ;−1) ≤ψ2−r#−r⊕ ( 1− tanh(zℓ/2) 2 ) . (7.16) Proof. We only prove the first inequality of (7.14) as the rest of them follow in a similar manner. A straightforward calculation shows that for z ∈Rq ,ε ∈ {±1}, and i ∈ [q] we have ∂ ∂zi Γ (ε · z) = ε · 1− tanh(ε · zi /2) 2 ·Γ (ε · z) . (7.17) Writing K = 2−r#Γ ( η ,1,1, . . . ,η ,1,i−1,η′ ,1,i+1, . . .η′ ,1,r ) Γ ( η′ ⊕,1,1, . . . ,η′ ⊕,1,r⊕ ) Γ ( η′ ⊖,1,1, . . . ,η′ ⊖,1,r⊖ ) , applying the chain rule, and using (7.17), we see that D i (wi ,r ;+1) =ψK ( 1+ tanh(wi /2) 2 ) . (7.18) Using the fact that ρ′ ⊖ is supported in (−∞,0], and that Γ ≤ 1, we obtain K ≤ 2−r#−r⊖ . The monotonicity ofψλ with respect to the parameter λ concludes the proof. □ Using Claim 7.6, and maximising each of the functionsψλ appearing in (7.14)–(7.16), we can recover the bounds of [50]. To obtain sharper bounds, a natural idea is to optimise groups of summands, instead of optimising each D-summand of W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) in isolation. In particular, it is tempting to pair terms of the form D(·,−1) with corresponding terms of the form D(·,+1), as Lemma 7.7 suggests. Lemma 7.7. Let φλ : [0,1] → R to be the function φλ(w) = ψλ(w)+ψλ(1− w). For every λ ∈ (0,1], we have that φλ(w) ≤φλ(1/2) = λ/2 1−λ/2 , for all w ∈ [0,1]. Proof. For λ= 1, we have thatψλ(w) = w implyingφλ(w) = 1, and thus, the result holds trivially. Let now λ ∈ (0,1). Differentiatingψλ gives ψ′ λ(w) = λ2w2 −2λw +λ λ2w2 −2λw +1 = 1− 1−λ (1−λw)2 . Therefore, φ′ λ(w) = 1− 1−λ (1−λw)2 − ( 1− 1−λ (1−λ(1−w))2 ) = 1−λ (1−λ(1−w))2 − 1−λ (1−λw)2 . 
It is straightforward to check that the above expression has only one root at w = 1/2, being non-negative for w ∈ [0,1/2), and non-positive for w ∈ (1/2,1]. Therefore, φλ(1/2) = λ/2 1−λ/2 is the maximum value of φλ. □ However, directly applying Lemma 7.7 to W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) seems hopeless, since in the bounds supplied by Claim 7.6, the parameters of the functions ψ bounding D(·,r,+1)-terms in W1 ( ρ̂⊕, ρ̂′ ⊕ ) are quite different from the parameters of the functionsψ bounding the corresponding D(·,r,−1)-terms in W1 ( ρ̂⊖, ρ̂′ ⊖ ) . The following lemma reveals, a somewhat unexpected, symmetry between W1(ρ̂⊕, ρ̂′ ⊕) and W1(ρ̂⊖, ρ̂′ ⊖), that facilitates our pairing strategy. 35 Some additional notation is in order. We denote with R(k) for the set of all vectors r = (r ,r⊕,r⊖,r#) of non- negative integer entries which sum to k −1. For every r ∈R(k) we use the shorthand P (r ) = (k −1)! r !r⊕!r⊖!r#! ·pr pr⊕ ⊕ pr⊖ ⊖ pr# # , where p , p⊕, p⊖, p# are the probabilities defined in (2.18). Finally, we define E = ∑ r∈R(k) r ≥1 P (r ) · r ·E [∫ η∨ ,1,1 η∧ ,1,1 φ2−r#−r⊖ ( 1+ tanh(w/2) 2 ) dw ] , (7.19) E⊕ = ∑ r∈R(k) r⊕≥1 P (r ) · r⊕ ·E [∫ η∨⊕,1,1 η∧⊕,1,1 φ2−r#−r⊖ ( 1+ tanh(y/2) 2 ) dy ] , (7.20) E⊖ = ∑ r∈R(k) r⊖≥1 P (r ) · r⊖ ·E [∫ η∨⊖,1,1 η∧⊖,1,1 φ2−r#−r⊕ ( 1+ tanh(z/2) 2 ) dz ] . (7.21) Lemma 7.8. We have that W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) ≤ d/2 1−e−d/2 ( E +E⊕+E⊖ ) . (7.22) Proof. Expanding the expectation in (7.8) with respect to r = ( r ,r ⊕,r ⊖,r # ) , and using the shorthand E± (r ) = E [∫ η∨ ,1,1 η∧ ,1,1 D 1 (w,r ;±1)dw ] , E± ⊕(r ) = E [∫ η∨⊕,1,1 η∧⊕,1,1 D⊕ 1 ( y,r ;±1 ) dy ] , E± ⊖(r ) = E [∫ η∨⊖,1,1 η∧⊖,1,1 D⊖ 1 (z,r ;±1)dz ] , we see that W1(ρ̂⊕, ρ̂′ ⊕) ≤ d/2 1−e−d/2 ( ∑ r∈R(k) P (r ) · r ·E+ (r )+ ∑ r∈R(k) P (r ) · r⊕ ·E+ ⊕(r )+ ∑ r∈R(k) P (r ) · r⊖ ·E+ ⊖(r ) ) = d/2 1−e−d/2   ∑ r∈R(k) r ≥1 P (r ) · r ·E+ (r )+ ∑ r∈R(k) r⊕≥1 P (r ) · r⊕ ·E+ ⊕(r )+ ∑ r∈R(k) r⊖≥1 P (r ) · r⊖ ·E+ ⊖(r )   . 
(7.23) In a similar manner, we derive W1(ρ̂⊖, ρ̂′ ⊖) ≤ d/2 1−e−d/2   ∑ r∈R(k) r ≥1 P (r ) · r ·E− (r )+ ∑ r∈R(k) r⊕≥1 P (r ) · r⊕ ·E− ⊕(r )+ ∑ r∈R(k) r⊖≥1 P (r ) · r⊖ ·E− ⊖(r )   . (7.24) Let us now consider the bound on the sum W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) obtained by summing (7.23), (7.24). We next group each of the three sums in (7.23) with the corresponding sum in (7.24), carefully pairing their terms. Specifically, for the –sums we match the term of ∑ r P (r ) · r ·E+ (r ) corresponding to r = (r ,r⊕,r⊖,r#) with the term of ∑ r ′ P (r ′) · r ′ ·E− (r ′) that corresponds to r ′ = (r ,r⊖,r⊕,r#). Since r 7→ r ′ is a bijection of R(k)∩ {r : r ≥ 1}, and r ′ = r , and P (r ′) = P (r ) we see that ∑ r∈R(k) r ≥1 P (r ) · r ( E+ (r )+E− (r ) )= ∑ r∈R(k) r ≥1 P (r ) · r ( E+ (r )+E− (r ′) ) . (7.25) Invoking the bounds (7.14) of Claim 7.6, and recalling the definitions ofφ, E , we upper bound the r.h.s. of (7.25) by ∑ r∈R(k) r ≥1 P (r )r ( E [∫ η∨ ,1,1 η∧ ,1,1 ψ2−r#−r⊖ ( 1+ tanh(w/2) 2 ) dw ] +E [∫ η∨ ,1,1 η∧ ,1,1 ψ2−r#−r⊖ ( 1− tanh(w/2) 2 ) dw ]) = E . (7.26) The matchings between the terms for the ⊕,⊖–sums of (7.23), (7.24) are more delicate. In particular, for the ⊕–sum it turns out that we can pull off the same trick as above by pairing the term of ∑ r P (r ) · r⊕ ·E+ ⊕(r ) corresponding to 36 the vector r = (r ,r⊕,r⊖,r#) with the term of ∑ r ′′ P (r ′′) ·r ′′ ⊕ ·E− ⊕(r ′′) that corresponds to r ′′ = (r ,r⊖+1,r⊕−1,r#). To see this, note that the mapping r 7→ r ′′ is a bijection of R(k)∩ {r : r⊕ ≥ 1}, leaving the quantity P (r ) · r⊕ invariant as P (r ) · r⊕ = (k −1)! r !(r⊕−1)!r⊖!r#! ·pr pr⊕ ⊕ pr⊖ ⊖ pr# # = (k −1)! r !(r⊖+1)!(r⊕−1)!r#! ·pr pr⊕ ⊕ pr⊖ ⊖ pr# # · (r⊖+1) = P (r ′′) · r ′′ ⊕ . Invoking the bounds (7.15) of Claim 7.6, recalling the definitions of φ, E⊕, and arguing as above, we obtain ∑ r∈R(k) r⊕≥1 P (r ) · r⊕ ( E+ ⊕(r )+E− ⊕(r ′′) )≤ E⊕ . 
(7.27) Similarly, using the mapping r 7→ r ′′′, with r ′′′= (r ,r⊖−1,r⊕+1,r#), and following the same steps as above, we get ∑ r∈R(k) r⊖≥1 P (r ) · r⊖ ( E+ ⊖(r )+E− ⊖(r ′′′) )≤ E⊖ . (7.28) Summing (7.26)–(7.28) concludes the proof. □ In light of the above, we are now ready to finish the proof of Proposition 2.15. Proof of Proposition 2.15. Applying Lemma 7.7 on the function φ in the r.h.s. of (7.19) gives φ2−r#−r⊖ ( 1+ tanh(w/2) 2 ) ≤ 2−r⊖−r#−1 1−2−r⊖−r#−1 ≤ ( 1 2 )r⊖+r# . (7.29) Plugging the above into (7.19) and applying the binomial theorem, further yields E ≤ (k −1) ·p ( 1− e− d 2 2 )k−2 E [|η ,1,1 −η′ ,1,1| ] . (7.30) Working in a similar manner, we obtain E⊕ ≤ (k −1) ·p⊕ ( 1− e− d 2 2 )k−2 E [|η⊕,1,1 −η′ ⊕,1,1| ] , and E⊖ ≤ (k −1) ·p⊖ ( 1− e− d 2 2 )k−2 E [|η⊖,1,1 −η′ ⊖,1,1| ] . (7.31) Finally, plugging the bounds (7.30), and (7.31) into (7.22) we see that W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) is upper bounded by d · (k −1) 2 · ( 1− e− d 2 2 )k−2[( 1−e− d 2 ) E [|η ,1,1 −η′ ,1,1| ]+e− d 2 E [|η⊕,1,1 −η′ ⊕,1,1| ]+e− d 2 E [|η⊖,1,1 −η′ ⊖,1,1| ]] . (7.32) Recall that we established (7.32) assuming an arbitrary coupling between the coordinates of each pair of distribu- tions (ρ ,ρ′ ), (ρ⊕,ρ′ ⊕), and (ρ⊖,ρ′ ⊖). Therefore, the definition of W1 norm and (7.32), imply the first inequality below, while (7.33) follows by the definition (2.22) of distd W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) ≤ d(k −1) 2 ( 1− e− d 2 2 )k−2 [( 1−e− d 2 ) W1(ρ ,ρ′ )+e− d 2 W1(ρ⊕,ρ′ ⊕)+e− d 2 W1(ρ⊖,ρ′ ⊖) ] ≤ d(k −1) 2 ( 1− e− d 2 2 )k−2 distd (ρ,ρ′) . (7.33) Moreover, as per the triangle inequality we see that W1(ρ̂ , ρ̂′ ) ≤W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) ≤ d(k −1) 2 ( 1− e− d 2 2 )k−2 distd (ρ,ρ′) . (7.34) Plugging the bounds (7.33) and (7.34) into the expression of distd ( ρ̂, ρ̂′) yields distd ( ρ̂, ρ̂′)= (1−e−d/2) ·W1(ρ̂ , ρ̂′ )+e−d/2 ( W1(ρ̂⊕, ρ̂′ ⊕)+W1(ρ̂⊖, ρ̂′ ⊖) )≤ d(k −1) 2 ( 1− e− d 2 2 )k−2 distd (ρ,ρ′) . 
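The contraction factor in the last display, $\frac{d(k-1)}{2}\bigl(1-\frac{e^{-d/2}}{2}\bigr)^{k-2}$, is increasing in $d$ for every fixed $k\ge 2$, so the value of $d$ at which it reaches one can be located by bisection. The sketch below is a numerical illustration only. The helper names are hypothetical, and whether the computed value coincides with $d_{\mathrm{con}}(k)$ depends on the definition given earlier in the text, which is not restated here.

```python
import math


def contraction_factor(d: float, k: int) -> float:
    """Evaluate d(k-1)/2 * (1 - exp(-d/2)/2)**(k-2), the factor in the final bound."""
    return d * (k - 1) / 2 * (1 - math.exp(-d / 2) / 2) ** (k - 2)


def unit_crossing(k: int, iterations: int = 200) -> float:
    """Hypothetical helper: the d > 0 at which the factor equals one (assumes k >= 2)."""
    lo, hi = 0.0, 1.0
    # Grow the bracket until the factor is at least one; it is unbounded in d.
    while contraction_factor(hi, k) < 1.0:
        hi *= 2.0
    # Plain bisection, valid because the factor is increasing in d.
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if contraction_factor(mid, k) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        d_star = unit_crossing(k)
        print(k, d_star, contraction_factor(d_star, k))
```

For $k=2$ the exponent $k-2$ vanishes and the factor reduces to $d/2$, so the crossing sits at $d=2$; for larger $k$ the sketch simply reports where the displayed bound stops being smaller than one.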
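Recalling the definition of $d_{\mathrm{con}}$, we see that for $d<d_{\mathrm{con}}(k)$ the operator $\mathrm{LL}^{\star}_{d,k}$ contracts with respect to the metric $\mathrm{dist}_d$, as desired. □

7.4. Proof of Proposition 2.13. To get a handle on the $\eta^{(\ell)}_x$ from (2.12), we show that these quantities can be calculated by propagating the extremal boundary condition $\sigma^+$ bottom-up toward the root of the tree. Specifically, we consider the operator
\[
\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}:(-\infty,\infty]^{V(\boldsymbol{T}^{(\ell)})}\to(-\infty,\infty]^{V(\boldsymbol{T}^{(\ell)})},\qquad\eta\mapsto\hat{\eta}=\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}(\eta),
\]
defined as follows. For all $x\in\partial^{2\ell}\mathfrak{x}$, with $\mathfrak{x}$ the root, we set $\hat{\eta}_x=\infty$. Moreover, for a variable $x\in\partial^{2q}\mathfrak{x}$ with $q<\ell$ having children clauses $a_1,\dots,a_t$ and grandchildren variables $y_{1,1},\dots,y_{1,(k-1)},\dots,y_{t,1},\dots,y_{t,(k-1)}$ we define
\[
\hat{\eta}_x=-\sum_{i=1}^{t}\tau^+(x)\operatorname{sign}(x,a_i)\cdot\log\left(1-\Gamma\left(\tau^+(x)\operatorname{sign}(x,a_i)\cdot(\eta_{y_{i,1}},\dots,\eta_{y_{i,(k-1)}})\right)\right). \tag{7.35}
\]
It may not be apparent that the sum above is well-defined, as $-\infty$ summands might occur. The following lemma rules out this possibility and shows that the $\ell$-fold iteration $\Lambda^{+\,(\ell)}_{\boldsymbol{T}^{(\ell)}}$, started from the all-$(+\infty)$ vector, yields $\eta^{(\ell)}=(\eta^{(\ell)}_x)_{x\in V(\boldsymbol{T}^{(\ell)})}$.

Lemma 7.9. The operator $\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}$ is well-defined and $\Lambda^{+\,(t)}_{\boldsymbol{T}^{(\ell)}}(+\infty,\dots,+\infty)=\eta^{(\ell)}$ for every $t\ge\ell$.

Proof. To show that $\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}$ is well defined we verify that, in the notation of (7.35), $\hat{\eta}_x\in(-\infty,\infty]$ for all $x$. Indeed, in the expression on the r.h.s. of (7.35) a $\pm\infty$ summand can arise only from variables $y_{i,j}$ with $\eta_{y_{i,j}}=\infty$. But the definition of $\tau^+$ ensures that such $y_{i,j}$ either render a zero summand if $\tau^+(x)\operatorname{sign}(x,a_i)=-1$, or a $+\infty$ summand if $\tau^+(x)\operatorname{sign}(x,a_i)=1$. Thus, the sum is well-defined and $\hat{\eta}_x\in(-\infty,\infty]$.

Further, to verify the identity $\eta^{(\ell)}=\Lambda^{+\,(\ell)}_{\boldsymbol{T}^{(\ell)}}(\infty,\dots,\infty)$, consider a variable $x$ of $\boldsymbol{T}^{(\ell)}$. Let $a^+_1,\dots,a^+_g$ be the children (clauses) of $x$ with $\operatorname{sign}(x,a^+_i)=\tau^+(x)$. Also let $y_{11},\dots,y_{1(k-1)},\dots,y_{g1},\dots,y_{g(k-1)}$ be the children of $a^+_1,\dots,a^+_g$. Similarly, let $a^-_1,\dots,a^-_h$ be the children of $x$ with $\operatorname{sign}(x,a^-_j)=-\tau^+(x)$ and let $z_{11},\dots,z_{1(k-1)},\dots,z_{h1},\dots,z_{h(k-1)}$ be their children. Then (7.2) and (7.3) yield
\begin{align*}
\eta^{(\ell)}_x&=-\sum_{i=1}^{g}\log\left(1-\prod_{q=1}^{k-1}\frac{Z(\boldsymbol{T}^{(\ell)}_{y_{iq}},\tau^+,\tau^+(y_{iq}))}{Z(\boldsymbol{T}^{(\ell)}_{y_{iq}},\tau^+)}\right)+\sum_{j=1}^{h}\log\left(1-\prod_{q=1}^{k-1}\frac{Z(\boldsymbol{T}^{(\ell)}_{z_{jq}},\tau^+,-\tau^+(z_{jq}))}{Z(\boldsymbol{T}^{(\ell)}_{z_{jq}},\tau^+)}\right)\\
&=-\sum_{i=1}^{g}\log\left(1-\Gamma\left(\operatorname{sign}(x,a^+_i)\tau^+(x)\cdot\left(\eta^{(\ell)}_{y_{i1}},\dots,\eta^{(\ell)}_{y_{i(k-1)}}\right)\right)\right)+\sum_{j=1}^{h}\log\left(1-\Gamma\left(\operatorname{sign}(x,a^-_j)\tau^+(x)\cdot\left(\eta^{(\ell)}_{z_{j1}},\dots,\eta^{(\ell)}_{z_{j(k-1)}}\right)\right)\right).
\end{align*}
The assertion follows because $\operatorname{sign}(x,a^+_i)\tau^+(x)=1$ and $\operatorname{sign}(x,a^-_j)\tau^+(x)=-1$. □

The next aim is to approximate the $\ell$-fold iteration of $\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}$, and more specifically the distribution of $\eta^{(\ell)}_x$, using a non-random operator. To this end, we need to cope with the $\pm\infty$-entries of the vector $\eta^{(\ell)}$. This is addressed by Lemma 2.14, proven in Section 7.2, which provides a bound on $\eta^{(\ell)}_x$ for variables $x$ near the root of the tree. In the following we continue to write $c$ and $(\varepsilon_t)_t$ for the number and the sequence supplied by Lemma 2.14. Guided by Lemma 2.14 we consider the vector $\eta^{(\ell)}_{\wedge t}$ of truncated log-likelihood ratios
\[
\left(\eta^{(\ell)}_{\wedge t}\right)_x=\begin{cases}-2t^c&\text{if } x\in\partial^{2t}\mathfrak{x}\text{ and }\eta^{(\ell)}_x<-2t^c,\\ 2t^c&\text{if } x\in\partial^{2t}\mathfrak{x}\text{ and }\eta^{(\ell)}_x>2t^c,\\ \eta^{(\ell)}_x&\text{otherwise.}\end{cases} \tag{7.36}
\]
Further, let $\eta^{(\ell,t)}$ be the result of $t$ iterations of $\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}(\,\cdot\,)$ starting from $\eta^{(\ell)}_{\wedge t}$. The following corollary is a direct consequence of Lemma 7.9 and Lemma 2.14.

Corollary 7.10. For any $\ell>ct^c$ we have $\mathbb{P}\left[\eta^{(\ell,t)}_x\ne\eta^{(\ell)}_x\right]<\varepsilon_t$.

Proof. Due to Lemma 2.14, the truncation in (7.36) is inconsequential with probability at least $1-\varepsilon_t$, in which case
\[
\eta^{(\ell,t)}=\Lambda^{+\,(t)}_{\boldsymbol{T}^{(\ell)}}\left(\eta^{(\ell)}_{\wedge t}\right)=\Lambda^{+\,(t)}_{\boldsymbol{T}^{(\ell)}}\left(\eta^{(\ell)}\right)=\Lambda^{+\,(\ell+t)}_{\boldsymbol{T}^{(\ell)}}(+\infty,\dots,+\infty)=\eta^{(\ell)},
\]
where the last equality follows from Lemma 7.9. □

Recall the non-random operator $\mathrm{LL}^{\star}_{d,k}$ from (2.17), which mimics $\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}$. To make the connection between the random operator $\Lambda^{+}_{\boldsymbol{T}^{(\ell)}}$ and $\mathrm{LL}^{\star}_{d,k}$ precise, we introduce the following concepts.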
Given a tree formula $T$ we write $V_{\ }(T)$ for the set of variables $x$ of $T$ that appear both as positive and negative literals in the sub-tree $T_x$ comprising $x$ and its progeny. We define $V_\#(T)$, $V_\oplus(T)$, and $V_\ominus(T)$ similarly. Note that the above sets constitute a partition of $V(T)$. We use $\mathrm{tp} : V(T) \to \{\ , \oplus, \ominus, \#\}$ to indicate the part each vertex belongs to. We denote with $T^{(\ell)}_{\ }$ the random Galton-Watson formula $T^{(\ell)}$ conditioned on the root satisfying $\mathrm{tp}(x) = \ $. We define $T^{(\ell)}_\oplus$ and $T^{(\ell)}_\ominus$ analogously. Degenerately, we also write $T^{(\ell)}_\#$ for the formula consisting of a single variable $x$.

Let us denote with $\hat\eta^{(\ell,t)}_{\ }$ the distribution of $(\eta^{(\ell)}_{\wedge t})_x$ in $T^{(\ell)}_{\ }$. Moreover, let $\bar\eta^{(\ell-t)}_{\ }$ be the distribution of
\[
\eta^{(\ell-t)}_x \cdot \mathbb{1}\left\{ |\eta^{(\ell-t)}_x| \le 2t^c \right\} + 2t^c \cdot \mathbb{1}\left\{ \eta^{(\ell-t)}_x > 2t^c \right\} - 2t^c \cdot \mathbb{1}\left\{ \eta^{(\ell-t)}_x < -2t^c \right\} ,
\]
i.e., the truncation of $\eta^{(\ell-t)}_x$ in $T^{(\ell)}_{\ }$. Analogously we define the distributions $\hat\eta^{(\ell,t)}_\oplus$, $\hat\eta^{(\ell,t)}_\ominus$, and $\bar\eta^{(\ell-t)}_\oplus$, $\bar\eta^{(\ell-t)}_\ominus$. Notice that, degenerately, $\hat\eta^{(\ell,t)}_\# = \bar\eta^{(\ell-t)}_\# = \delta_0$.

Lemma 7.11. For $\ell > ct^c$ we have that
\[
\left( \hat\eta^{(\ell,t)}_{\ }, \hat\eta^{(\ell,t)}_\oplus, \hat\eta^{(\ell,t)}_\ominus \right) = \mathrm{LL}^\star_{d,k}\left( \bar\eta^{(\ell-t)}_{\ }, \bar\eta^{(\ell-t)}_\oplus, \bar\eta^{(\ell-t)}_\ominus \right) .
\]

Proof. We use induction on $t$. Specifically, let $\nu = (\nu_{\ }, \nu_\oplus, \nu_\ominus)$ be any triplet in $\mathcal{P}(-\infty,\infty] \times \mathcal{P}[0,+\infty] \times \mathcal{P}[-\infty,0]$, and let $\nu^{(t)} = \mathrm{LL}^{\star(t)}_{d,k}(\nu)$ be the outcome of the $t$-fold application of $\mathrm{LL}^\star_{d,k}$. Moreover, let $(\eta_x)_{x \in V(T^{(t)})}$ be a vector of independent samples with $\eta_x \sim \nu_{\mathrm{tp}(x)}$. We claim that the root value $\eta^{(t)}_x$ produced by the random operator $\Lambda^{+(t)}_{T^{(t)}}$ has distribution $\nu^{(t)}_{\mathrm{tp}(x)}$.

Indeed, for $t = 1$ the claim follows readily from the definitions. For the inductive step, we notice that the $t$-fold application of $\mathrm{LL}^\star_{d,k}$ is obtained by applying $\mathrm{LL}^\star_{d,k}$ to the $(t-1)$-fold application. Per the induction hypothesis
\[
\left( \Lambda^{+(t-1)}_{T^{(t-1)}} (\eta_x)_x \right)_x \sim \nu^{(t-1)}_{\mathrm{tp}(x)} . \tag{7.37}
\]
Applying $\mathrm{LL}^\star_{d,k}$ to $\nu^{(t-1)}$ implies the result, as the first layer of $T^{(t)}$ is independent of the subtrees rooted at the grandchildren $\partial^2 x$ of the root, which are distributed as i.i.d. copies of $T^{(t-1)}$.
The lemma follows from applying the above identity to $\nu = \left( \bar\eta^{(\ell-t)}_{\ }, \bar\eta^{(\ell-t)}_\oplus, \bar\eta^{(\ell-t)}_\ominus \right)$. □

Refining the definition of the $\mathrm{BP}_{d,k}$ operator in (1.3), we write $\mathrm{BP}^{\ }_{d,k}$ for the operator obtained from $\mathrm{BP}_{d,k}$ upon conditioning on $d_+, d_- \ge 1$. Similarly $\mathrm{BP}^\oplus_{d,k}$ and $\mathrm{BP}^\ominus_{d,k}$ are obtained from $\mathrm{BP}_{d,k}$ upon conditioning on $d_+ \ge 1, d_- = 0$, and $d_+ = 0, d_- \ge 1$, respectively. We define
\[
\pi^{\ }_{d,k} = \mathrm{BP}^{\ }_{d,k}\left( \pi_{d,k} \right) , \qquad \pi^\oplus_{d,k} = \mathrm{BP}^\oplus_{d,k}\left( \pi_{d,k} \right) , \qquad \pi^\ominus_{d,k} = \mathrm{BP}^\ominus_{d,k}\left( \pi_{d,k} \right) .
\]
Let us write $\gamma, \gamma^{-1}$ for the continuous and mutually inverse real functions
\[
\gamma : \mathbb{R} \to (0,1), \quad z \mapsto (1 + \tanh(z/2))/2 , \qquad \gamma^{-1} : (0,1) \to \mathbb{R}, \quad p \mapsto \log(p/(1-p)) . \tag{7.38}
\]
Let $\rho^{\ }_{d,k} = \gamma^{-1}(\pi^{\ }_{d,k})$, and define $\rho^\oplus_{d,k}, \rho^\ominus_{d,k}$ similarly.

Claim 7.12. The vector $\left( \rho^{\ }_{d,k}, \rho^\oplus_{d,k}, \rho^\ominus_{d,k} \right)$ is a fixed point of the operator $\mathrm{LL}^\star_{d,k}$.

Proof. Let $\rho_{d,k} = \gamma^{-1}\left( \pi_{d,k} \right)$. First, we claim that
\[
\mathrm{LL}^\star_{d,k}\left( \rho_{d,k}, \rho_{d,k}, \rho_{d,k} \right) = \left( \rho^{\ }_{d,k}, \rho^\oplus_{d,k}, \rho^\ominus_{d,k} \right) . \tag{7.39}
\]
Indeed, since all input distributions are the same, by Proposition 2.1, the two summands in the left term of (2.21) corresponding to $d^\star_+$ and $d^\star_-$ are identically distributed, and also identically distributed to the sums that appear in the other two terms. Therefore, (7.39) follows directly from the definitions of $\mathrm{BP}^{\ }_{d,k}$, $\mathrm{BP}^\oplus_{d,k}$, and $\mathrm{BP}^\ominus_{d,k}$. The claim now follows from (7.39), the definition of the operator $\mathrm{LL}^\star_{d,k}$, and the law of total probability. □

Let $\rho^{(\ell)}$ be the distribution of the log-likelihood ratio $\eta^{(\ell)}_x$.

Corollary 7.13. For $d < d_{\mathrm{con}}(k)$ the sequence $\left( \gamma\left( \rho^{(\ell)} \right) \right)_\ell$ converges weakly to $\pi_{d,k}$.

Proof. The result follows by combining Corollary 7.10, Lemma 7.11, Proposition 2.15, Claim 7.12, and applying the continuous mapping theorem and the law of total probability. □

Proof of Proposition 2.13. Recall that we write $\Lambda^{+(\ell)}_{T^{(\ell)}}$ for the $\ell$-fold iteration of the operator $\Lambda^+_T$. Let us write $\theta^{(\ell)}_x = \left( \Lambda^{+(\ell)}_{T^{(\ell)}}(0, \ldots, 0) \right)_x$.
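The maps $\gamma$ and $\gamma^{-1}$ from (7.38) are the logistic function and the logit, which translate between log-likelihood ratios and probabilities. The short Python sketch below, with the hypothetical names `gamma`, `gamma_inv` and `push_forward`, illustrates numerically that the two maps are mutually inverse and that negating a log-likelihood ratio complements the probability. It is only an illustration of (7.38) and plays no role in the proof.

```python
import math


def gamma(z: float) -> float:
    """gamma(z) = (1 + tanh(z/2)) / 2: log-likelihood ratio -> probability in (0, 1)."""
    return (1.0 + math.tanh(z / 2.0)) / 2.0


def gamma_inv(p: float) -> float:
    """gamma^{-1}(p) = log(p / (1 - p)) for p in (0, 1)."""
    return math.log(p / (1.0 - p))


def push_forward(samples):
    """Apply gamma to each sample: an empirical stand-in for the image gamma(rho)
    of a distribution rho (hypothetical helper, not from the text)."""
    return [gamma(z) for z in samples]


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.0, 1.25, 4.0):
        # gamma_inv(gamma(z)) should reproduce z up to rounding error
        print(z, gamma(z), gamma_inv(gamma(z)))
```

In this picture a log-likelihood ratio of 0, as used in the all-zero starting vector above, corresponds to the unbiased probability 1/2.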
Using arguments similar to Fact 4.2, we can show that $\theta^{(\ell)}_x$ is nothing but the distribution of the random variable $\gamma^{-1}\left( \mathbb{P}\left[ \tau^{(\ell)}(x) = 1 \mid T \right] \right)$. Therefore,
\[
\mathbb{P}\left[ \tau^{(\ell)}(x) = 1 \mid T \right] \sim \gamma(\theta^{(\ell)}_x) , \quad\text{and}\quad \mathbb{P}\left[ \tau^{(\ell)}(x) = 1 \mid \forall y \in \partial^{2\ell} x : \tau^{(\ell)}(y) = \tau^+(y), T \right] \sim \gamma(\eta^{(\ell)}_x) .
\]
Due to Lemma 2.12, $0 \le \gamma(\theta^{(\ell)}_x) \le \gamma(\eta^{(\ell)}_x) \le 1$. Moreover, from Lemma 7.11, Proposition 2.15, and Claim 7.12, we see that for $d < d_{\mathrm{con}}(k)$ the sequence $\left( \gamma(\theta^{(\ell)}_x) \right)_\ell$ converges weakly to $\pi_{d,k}$. Finally, Corollary 7.13 implies that $\left( \gamma(\eta^{(\ell)}_x) \right)_\ell$ also converges weakly to $\pi_{d,k}$, and thus,
\[
\lim_{\ell \to \infty} \mathbb{E}\left| \gamma(\theta^{(\ell)}_x) - \gamma(\eta^{(\ell)}_x) \right| = \lim_{\ell \to \infty} \left| \mathbb{E}\left[ \gamma(\theta^{(\ell)}_x) \right] - \mathbb{E}\left[ \gamma(\eta^{(\ell)}_x) \right] \right| = 0 ,
\]
implying the assertion. □

ACKNOWLEDGEMENTS

We would like to thank the anonymous referees for thoroughly reviewing our paper and for suggesting valuable corrections and improvements. Amin Coja-Oghlan is supported by DFG CO 646/3, DFG CO 646/5 and DFG CO 646/6. Catherine Greenhill is supported by ARC DP250101611. Vincent Pfenninger is supported by the Austrian Science Fund (FWF) [10.55776 / 16502]. Pavel Zakharov is supported by DFG CO 646/6. Kostas Zampetakis is supported by DFG CO 646/5. For open access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
ARNAB CHATTERJEE, arnab.chatterjee@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

AMIN COJA-OGHLAN, amin.coja-oghlan@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

CATHERINE GREENHILL, c.greenhill@unsw.edu.au, SCHOOL OF MATHEMATICS AND STATISTICS, UNSW SYDNEY, NSW 2052, AUSTRALIA.

VINCENT PFENNINGER, pfenninger@math.tu-graz.at, TU GRAZ, INSTITUTE OF DISCRETE MATHEMATICS, STEYRERGASSE 30, 8010 GRAZ, AUSTRIA.

MAURICE ROLVIEN, maurice.rolvien@uni-hamburg.de, UNIVERSITY OF HAMBURG, FACULTY OF MATHEMATICS, INFORMATICS AND NATURAL SCIENCES, DEPARTMENT OF INFORMATICS, VOGT-KÖLLN-STR. 30, 22527 HAMBURG, GERMANY.

PAVEL ZAKHAROV, pavel.zakharov@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

KOSTAS ZAMPETAKIS, konstantinos.zampetakis@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.