PARTITION FUNCTION ESTIMATION
AND PHASE TRANSITIONS ON RANDOM

SATISFIABILITY PROBLEMS

Dissertation

submitted for the degree of

Doctor of Natural Sciences

at Technische Universität Dortmund,
Faculty of Computer Science

by

Arnab Chatterjee
from Kharagpur, West Bengal, India

Dortmund

2025


Dean:
Prof. Dr. Jens Teubner

Reviewers:
Prof. Dr. Amin Coja-Oghlan
TU Dortmund

Prof. Dr. Dimitris Achlioptas
University of Athens

Date of the oral examination: 25.11.2025


DEDICATION

To my parents

and

all young minds of West Bengal.


Acknowledgements

“So long, and thanks for all the fish.”

Looking back at my time as a master's student at IIIT Delhi: at the end of my second semester, when many of my friends chose their thesis topics in the booming areas of machine learning and artificial intelligence, I was fortunate to have Prof. Subhabrata Samajder as my master's thesis advisor, who introduced me to the world of “Random Graphs”. Many of the random ideas that struck me during my thesis took shape in various forms over the next several years and culminated in this thesis. I would say Prof. Samajder is one of the people responsible for turning my attention towards theoretical computer science. This thesis is my humble token of appreciation for his effort and sincerity.

After a year and a half of working in industry following my master's degree, I made one of the most important decisions of my life: pursuing a PhD in the field of random graphs. So the cruise resumed its journey, with a different commander now on deck. My PhD supervisor, Prof. Amin Coja-Oghlan, patiently let me explore the depths but gave the screw the crucial half-turn just when needed to prevent me from floating away. I can't thank him enough for his constant encouragement and his confidence in me. Every time I came up with a new idea, he was patient with me, stretching my mind and raising me up to think like a mathematician. Throughout the PhD, his guidance and support played a pivotal role in my decision to dedicate myself to random graphs and probabilistic combinatorics, and more specifically to random satisfiability problems.

I would also like to extend my sincere appreciation to my dissertation committee members, Dimitris Achlioptas, Jean Christoph Jung and Kevin Buchin, for taking the time to read my thesis, participate in my defense, and provide thoughtful feedback.

I am also beholden to my coauthors from prestigious institutions around the world: Prof. Catherine Greenhill (UNSW), Prof. Mihyun Kang (TU Graz), Prof. Noela Müller (TU Eindhoven) and Prof. Gregory B. Sorkin (LSE). Collaborating with them broadened my understanding of the field and advanced my learning trajectory. Besides them, I am thankful to have worked closely with my colleagues at TU Dortmund. Kostas, Lena, Maurice, Olga, Pavel and Ulrike created a space that was both supportive and engaging, with conversations ranging from technical details to everyday casual banter. Special thanks go to Maurice, who helped me not only in finding accommodation but also as an interpreter in some official matters during my early days in Germany.

I also thank Haodong and Joon, with whom I had a very good time in Leiden. Our conversations often went late into the evenings, reminding me that mathematics can be serious but also fun.

During my PhD I also had the valuable opportunity to spend a research visit at the University of California, Irvine (UCI). I owe my heartiest thanks to Prof. Asaf Ferber for hosting me and making me feel welcome in his research group. Besides him, I also thank his group members Marcelo, Mason and Xiaonan, with whom I shared my California stay, which brought not just productive discussions but also many memorable days, from working on existing problems to exploring new ideas.



The whole PhD journey would not be complete without the people who provide strength outside the academic world. My parents, Soumen Chatterjee and Mamata Chatterjee, are above and beyond all the thanks that I can ever gather. I am dedicating this thesis to my lovely parents as well as to all the young minds of West Bengal. It is not possible to adequately express in words the encouragement and support they have given me throughout my studies. I hope I have somewhat succeeded in meeting their expectations. I also owe heartiest thanks to my bhai (cousin) and masi (aunt), whose encouragement and motivation gave me the strength to keep going even when things seemed uncertain. My deepest appreciation goes to my fiancée, Susmita, who has been my constant companion and whose belief in me, love and care have carried me through the difficult times.

This thesis, which contains whatever of my research output I could possibly ‘write’ in words, is the joint fruit of the labor, persistence and confidence of a lot of people spread all around the globe. I take this opportunity to thank all those who helped me turn a possibility into a reality. I have surely missed more names than I remembered to mention above. But mentioned or unmentioned, my gratitude transcends the words I used to express my heartiest thanks.

Arnab,

California, September 2025.



Abstract

This thesis is devoted to estimating partition functions and analyzing phase transitions in random satisfiability problems, with a focus on the random 2-SAT, random k-XORSAT and random k-SAT models. Partition functions capture the exponential growth of solution spaces and build a bridge between combinatorics, probability and statistical physics. Studying their asymptotics and fluctuations helps us understand the mechanisms behind sharp phase transitions and the solution space geometry of random constraint satisfaction problems.

Our first contribution establishes a central limit theorem for the number of solutions (called the ‘partition function’ in physics jargon) of random 2-SAT, the first CLT of this type for any random CSP. It thereby provides a precise probabilistic characterization of the fluctuations of the logarithm of the number of satisfying assignments, which are of order $\sqrt{n}$, with n the number of variables. In addition, we effectively evaluate the formula for the variance of the number of random 2-SAT solutions. The proof relies on the martingale central limit theorem along with the Gibbs uniqueness property and local convergence to a Galton-Watson tree, combined with a coupling argument called the ‘Aizenman-Sims-Starr scheme’.

The second part of the thesis investigates the performance of a statistical-physics-inspired message passing algorithm called ‘Belief Propagation Guided Decimation’ (BPGD) on the random k-XORSAT problem. Specifically, we derive an explicit threshold up to which the algorithm succeeds with a strictly positive probability between 0 and 1. Additionally, we study a thought experiment called the ‘decimation process’, for which we determine several phase transitions, such as the (non-)reconstruction and condensation phase transitions, and their connection to BPGD (that is, in which regimes the two processes diverge or converge).

Finally, for random k-SAT, we revisit the Gibbs uniqueness threshold, improving the lower bound from previous work by Montanari and Shah [83]. More specifically, we show that the number of satisfying assignments of random k-SAT is given by the physics-inspired ‘replica symmetry solution’ up to the Gibbs uniqueness threshold. Mathematically, we find an explicit expression for the logarithm of the number of solutions of random k-SAT in terms of the Bethe free entropy, a function defined for probability measures on the unit interval. Moreover, in contrast to the Montanari-Shah bound, our lower bound is significant particularly for small k.

In a nutshell, this thesis advances the rigorous understanding of random satisfiability problems by combining algorithmic analysis, probabilistic combinatorics and tools from statistical physics. The results shed light on the structural properties of random formulas, on the effectiveness of different message passing algorithms and on the universal principles governing fluctuations and correlation decay. They also provide a mathematical foundation for phenomena predicted by spin glass theory and point toward new directions for future research on random satisfiability problems.



Contents

Acknowledgements

Abstract

1 Introduction

2 Models
    2.1 Constraint Satisfaction Problems
        2.1.1 Definitions
        2.1.2 The SAT Problem
        2.1.3 Why SAT?
        2.1.4 Factor Graphs
    2.2 Statistical Physics and CSPs
        2.2.1 Boltzmann (Gibbs) probability distribution
        2.2.2 Some statistical physics models

3 Message Passing Algorithms
    3.1 Belief Propagation
        3.1.1 BP messages
        3.1.2 Computing marginals
        3.1.3 Bethe-Free Entropy
    3.2 Algorithms
        3.2.1 Belief Propagation Guided Decimation
        3.2.2 Decimation Process
        3.2.3 Unit Clause Propagation
        3.2.4 Pure Literal Pursuit
    3.3 Warning Propagation

4 Phase Transitions in random CSPs
    4.1 The Satisfiability Transition
    4.2 Quenched and Annealed Techniques
    4.3 Gibbs measure and Long range correlation
        4.3.1 Gibbs measure on random CSPs
        4.3.2 Correlation decay and Gibbs Uniqueness
        4.3.3 Replica Symmetry
        4.3.4 Clustering transition: Reconstruction Property
    4.4 Different phases in random k-SAT and random k-XORSAT

5 A Central Limit Theorem for random 2-SAT solutions
    5.1 Motivation and History
    5.2 Main Result
    5.3 Proof Strategy
        5.3.1 Method of Moments fails
        5.3.2 BP Approximation
        5.3.3 Towards calculating variance
    5.4 Establishing the Central Limit Theorem

6 Performance of BPGD on random k-XORSAT
    6.1 Motivation and History
    6.2 Problem Statement and Results
        6.2.1 Analysis of BPGD
        6.2.2 Phase Transition of Decimation process
    6.3 Proof Strategy

7 On the Gibbs Uniqueness in random k-SAT
    7.1 Motivation and History
    7.2 Main Results
        7.2.1 Limit in probability of log-partition function in random k-SAT
        7.2.2 Lower bound on Gibbs uniqueness
    7.3 Proof Strategy
        7.3.1 Existence of fixed point
        7.3.2 Interpolation method: matching upper bound
        7.3.3 Aizenman-Sims-Starr: matching lower bound
        7.3.4 Lower bound on Gibbs uniqueness threshold

8 The Last Chapter
    8.1 Summary of the thesis
    8.2 Future Directions
    8.3 Contribution of the authors

A List of Papers



List of Figures

2.1 Factor graph representation of the SAT formula in Example 2.1.1.

2.2 Left: A factor graph representation of the 3-SAT formula $F_{3\mathrm{SAT}} = (x_1 \vee \neg x_3 \vee x_4) \wedge (x_2 \vee \neg x_4 \vee \neg x_5) \wedge (\neg x_1 \vee x_5 \vee \neg x_6) \wedge (x_3 \vee \neg x_4 \vee x_6)$. Right: A factor graph representation of a random linear system of equations (2.1.5) over $\mathbb{F}_2$.

3.1 Left: Factor graph involved in computing $\nu^{(t+1)}_{x \to a}$, which is a function of all ‘incoming messages’ $\hat{\nu}^{(t)}_{b \to x}$ with $b \neq a$. Right: Factor graph involved in computing $\hat{\nu}^{(t)}_{a \to x}$, which is a function of all ‘incoming messages’ $\nu^{(t)}_{y \to a}$ with $y \neq x$.

3.2 Top: A local snapshot of the Warning Propagation update rules for the message $\nu_{F,a \to x,\ell}$ defined in (3.3.3). Bottom: Similarly, a local snapshot of the Warning Propagation update rules for the message $\nu_{F,x \to a,\ell}$ defined in (3.3.4).

4.1 Galton-Watson tree T with the Gibbs uniqueness property.

4.2 Phase diagram of k-SAT, adapted and modified from [66]. Left to right: Uniqueness, Clustering (Replica Symmetry), Clustering → Condensation (dynamic 1RSB), Condensation → Satisfiability (static 1RSB), UNSAT.

4.3 Phase diagram of k-XORSAT, adapted and modified from [66]. Left to right: Clustering (Easy SAT), Satisfiability (Hard SAT), UNSAT.

5.1 Numerical approximations to the function $\varphi(d)$ from (5.1.1) (red) and the variance $\eta(d)^2$ from (5.2.5) (green). The black dashed line is the first moment bound $d \mapsto \log(2) + \frac{d}{2}\log(3/4)$, whereas the purple dashed line is the second moment bound. (Figure 1, [23])

5.2 An illustration of the correlated GW-tree $T^{\otimes}$. (Figure 1, [23])

5.3 Marginal distribution on two correlated formulas for d = 0.9 and M = 0.1m, 0.5m, 0.9m. (Figure 2, [23])

6.1 Matrices A and A′ corresponding to the Tanner graphs G and G′.

6.2 $\Phi_{d,k,\lambda}$ for k = 3 and d = 2.4, for λ from 0 to 0.3 (maximum at z = 0) and from 0.4 to 0.9. (Figure 1, [22])

6.3 The phase diagrams for k = 3, 4, 5 with $d \in (d_{\min}, d_{\mathrm{SAT}})$ on the horizontal axis and θ on the vertical axis. (Figure 3, [22])

7.1 Comparison of $B_{d,k}(\pi_{d,k})$ with known bounds for $\lim_{n\to\infty} \frac{1}{n}\log Z(\Phi)$ for k = 3. [21]

7.2 A graphical representation of the coupling technique (Aizenman-Sims-Starr scheme).


1 Introduction

“The theory of probability as mathematical discipline can and

should be developed from axioms in exactly the same way as

Geometry and Algebra.”

– Andrey Kolmogorov

In theoretical computer science, profound insights often arise at the intersection of the discrete and the probabilistic paradigms, a field formally referred to as probabilistic combinatorics. In this setting, random graphs serve as a basic framework in which edges are placed at random according to a given probability distribution [48]. Many researchers have investigated the circumstances under which important global properties emerge, such as connectivity, the emergence of a giant component and the threshold for the chromatic number [3, 5, 48, 63, 94]. These phase transitions, or discontinuities, not only reflect phenomena in statistical physics but also shed light on the average-case complexity of algorithms [1]. Generalizing this perspective from graphs to higher dimensions naturally leads to the study of random constraint satisfaction problems (CSPs), in which constraints generalize edges and involve assignments rather than colorings. As a consequence, random CSPs form a unifying locus: they capture the combinatorial complexity of random graphs, demand the precise algorithmic attention typical of computer science and draw on probabilistic techniques honed in mathematics [9, 10]. This thesis capitalizes on this three-way relation to study phase transitions in random CSPs: it characterizes the number of satisfying assignments and the constraint densities beyond which satisfiability disappears, analyzes the performance of a few physics-inspired algorithms on random CSPs and details the structural mechanisms underlying the phase transitions [8].

A constraint satisfaction problem (CSP) consists of n discrete-valued variables taking values in a finite domain together with m constraints, and asks for an assignment of the variables that satisfies them. Each constraint enforces some requirement on a subset of the variables [99]. A solution of the CSP is an assignment of the variables that satisfies all the constraints simultaneously. CSPs have been studied extensively in computer science as well as in combinatorics and statistical mechanics [11, 61, 66, 68]. This interdisciplinary interest stems from a broad spectrum of applications ranging from coding theory and communication engineering to computer architecture, operations research and artificial intelligence [17, 45, 46, 54]. Famous examples of CSPs are the k-satisfiability (k-SAT) problem and the graph q-coloring problem (q-COL). In k-SAT the variables are Boolean and each constraint is the disjunction (OR) of k literals (a literal is either a variable or its negation). In q-COL the variables are placed on the vertices of a graph, each variable can take one of q possible colors, and each edge of the graph enforces the constraint that its two end vertices take different colors.
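To make the k-SAT definition concrete, the following minimal Python sketch encodes a formula as a list of clauses and counts its satisfying assignments by exhaustive enumeration. The encoding (signed integers for literals, in the style of DIMACS files) and the helper names `satisfies` and `count_solutions` are illustrative assumptions, not notation from this thesis. The example formula is the 3-SAT formula from the caption of Figure 2.2.

```python
from itertools import product

# Assumed encoding: a clause is a tuple of nonzero integers,
# +i stands for the literal x_i and -i for its negation.

def satisfies(assignment, formula):
    """Return True if every clause contains at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def count_solutions(n, formula):
    """Count satisfying assignments of x_1..x_n by brute force (2^n steps)."""
    count = 0
    for bits in product([False, True], repeat=n):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        count += satisfies(assignment, formula)
    return count

# 3-SAT formula from the caption of Figure 2.2 (n = 6 variables, m = 4 clauses)
F_3SAT = [(1, -3, 4), (2, -4, -5), (-1, 5, -6), (3, -4, 6)]

if __name__ == "__main__":
    print(count_solutions(6, F_3SAT))
```

For this small formula, 34 of the 64 assignments are satisfying. Exhaustive enumeration takes exponential time, so it only serves to illustrate the quantity (the number of solutions) whose logarithm is studied in later chapters.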

CSPs can be analyzed from several different perspectives. One fundamental approach comes from computational complexity theory [51, 84, 91], which classifies CSPs based on their worst-case difficulty. In particular, it asks whether there exists an efficient algorithm, running in time polynomial in the number n of variables and the number m of clauses, that determines the existence of a solution for every possible instance. The problem comes in several forms: besides the search variant, which tries to find a satisfying assignment, the decision variant asks whether a solution exists at all [66]. Once a solution is known to exist, one may be interested in how many solutions there are [100]. While general CSPs can be defined over deterministic structures, a complementary and insightful perspective arises when randomness is added on top of the classical CSPs, revealing behaviors that are hard to observe in deterministically generated instances [73]. This leads to random CSPs: models in which the structure, the assignments of variables and constraints, and the matrix corresponding to the linear equations over $\mathbb{F}_q$ are chosen according to some probability distribution.

Over the years, significant progress has been made in illuminating the behavior of random CSPs. They share a formal mathematical analogy with the models studied in the statistical mechanics of disordered systems, particularly mean-field spin glasses [16, 87, 88, 90], where the interactions induced by the constraints are of a vexing nature and, due to the randomness in their construction, possess no underlying finite-dimensional structure. For instance, in the q-coloring problem on random graphs, the variables can be treated as Potts spins, and the goal is to find the ground state of the anti-ferromagnetic Potts model [13, 39]. A straightforward observation is that when all edges are bi-colored in the ground state, the problem admits a satisfying assignment, as all the constraints are satisfied simultaneously. In the satisfiability problem, for instance random k-SAT (say with k = 3), each variable in a clause can be viewed as a spin in an Ising model and the clauses as interactions between the spins, with a satisfied clause corresponding to certain configurations of the spins [37, 40, 77, 78, 89]. Through this connection to statistical mechanics, one can define an energy function that penalizes unsatisfied clauses. As in the previous example, the goal is to find the ground state of this system, which corresponds to a satisfying assignment. Analogously, one can look for low-energy configurations of an Ising model. A particularly interesting research topic is the regime where both the number of variables (n) and the number of clauses (m) tend to infinity at a fixed ratio α = m/n, called the constraint density. Random CSPs exhibit threshold phenomena in this regime: the probability of certain properties drops abruptly from 1 to 0 [74, 80] as a function of α, which is treated as a control parameter. One of these phase transitions occurs at the satisfiability threshold, denoted α_SAT, which depends on the parameters k, d, q of the problem (where d refers to the average variable degree of a random satisfiability problem and q to the number of colors that can be assigned to a variable in the q-coloring problem on a random (hyper)graph). For α < α_SAT, a random instance typically admits an assignment that satisfies all the constraints simultaneously, while a random instance is typically unsatisfiable for α > α_SAT.¹
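The energy function and the constraint density α = m/n described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the number of clauses is fixed to m = round(αn), each clause picks k distinct variables uniformly at random and negates each one independently with probability 1/2, and the function names `random_ksat` and `energy` are hypothetical. The precise random model used in later chapters may differ (for example, in how the number of clauses is drawn).

```python
import random

def random_ksat(n, alpha, k=3, seed=None):
    """Sample a random k-SAT formula with m = round(alpha * n) clauses.

    Assumed model: each clause picks k distinct variables uniformly at
    random and negates each of them independently with probability 1/2.
    Literals are signed integers (+i for x_i, -i for its negation).
    """
    rng = random.Random(seed)
    m = round(alpha * n)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula

def energy(assignment, formula):
    """Number of unsatisfied clauses; it is zero exactly for solutions."""
    return sum(
        not any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

if __name__ == "__main__":
    n, alpha = 20, 2.0
    formula = random_ksat(n, alpha, k=3, seed=0)
    all_true = {i: True for i in range(1, n + 1)}
    print(len(formula), energy(all_true, formula))
```

A ground state is an assignment of minimum energy, and the instance is satisfiable exactly when the minimum energy is zero.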

To establish phase transitions in different random CSPs, statistical physicists have contributed significantly by developing non-rigorous methods that explain the combinatorial structure of CSPs, describing in detail the solution space geometry of the problem and its connection with the phase transitions [59, 66]. Thanks to the similarity between random CSPs and spin glasses, they could apply methods first developed in the context of the statistical mechanics of disordered systems, namely the replica symmetry and cavity methods [15, 59, 65, 72, 79], in statistical physics jargon. These non-rigorous techniques have provided predictions of α_SAT in many models and have uncovered many other phase transitions concerning the structure of the set of solutions in the satisfiable phase. Additionally, the cavity method has inspired new algorithms, particularly message passing algorithms as well as spectral algorithms, and has led to exciting predictions about the information-theoretic and computational nature of different inference problems. These algorithms exploit the detailed picture of the solution space, and some of the predictions have later been confirmed rigorously [1, 8, 41, 76]. So, on one side, statistical physicists are content with heuristic predictions or evidence for a given problem, whereas on the other side mathematicians and theoretical computer scientists try to verify these heuristic predictions by means of mathematical proofs. For example, in [32] the authors rigorously establish a formula for the mutual information in statistical inference problems induced by random graphs, inspired by the cavity method, a non-rigorous physics approach.

In this thesis we likewise verify a heuristic prediction of the two physicists Ricci-Tersenghi and Semerjian [96] for a physics-inspired message passing algorithm called Belief Propagation Guided Decimation. In particular, we derive an explicit threshold up to which the algorithm succeeds with strictly positive probability Ω(1) and beyond which it fails to find a satisfying assignment with high probability [22]. Alongside, we analyze a thought experiment called the decimation process, for which we identify a (non-)reconstruction and a condensation phase transition. One more interesting phase transition occurs in the satisfiable phase, namely the clustering (dynamic) phase transition, at a critical density denoted α_clus. As the name suggests, this transition marks a drastic change in the shape of the set of solutions, viewed as a subset of the whole configuration space. Below this threshold, the set of solutions is rather well-connected, that is, any solution can be reached from any other by reshuffling a limited number of variables, whereas above α_clus the set of solutions splits into a large number of distinct clusters, which corresponds to a pure decomposition of the uniform measure over the set of solutions [59]. Internally, the clusters are well-connected, but they are well-separated from each other. This phase transition also marks the emergence of certain long-range correlations among variables, which make an information-theoretic problem called tree reconstruction solvable [85]. However, the static properties of the model are not affected by this clustering transition. They are sensitive only to a further transition, the condensation phase transition at α_cond, which affects the number of dominant clusters [59]. In random k-XORSAT the (non-)reconstruction and clustering thresholds coincide (the phase diagram can be found in Chapter 4). Because of the structure of k-XORSAT (which can be translated into a linear system over $\mathbb{F}_2$), the solution space geometry is completely determined by the linear 2-core, which marks both (i) the onset of frozen variables, that is, shattering into clusters, and (ii) the onset of long-range correlations on the GW tree model, which makes reconstruction possible. Furthermore, in the context of the cavity method, the clustering threshold α_clus can be defined as the appearance of a non-trivial solution of the one-step Replica Symmetry Breaking (1RSB) equations with Parisi breaking parameter X = 1 [69]. For more details, in particular about reconstruction and its systematic connection, we refer to [67]. Experimentally, it has been verified for some random CSPs that the clustering threshold α_clus occurs at a much smaller constraint density than the satisfiability threshold α_SAT in the large-k limit. For instance, the asymptotic bounds for the clustering and satisfiability thresholds of random k-SAT are $\alpha_{\mathrm{clus}} \sim \frac{2^k}{k}(\log k + \log\log k + 1 - \log 2)$ [81] and $\alpha_{\mathrm{SAT}} \sim 2^k \log 2$ [34] (more details can be found in Chapter 4).

¹ Although Friedgut [47] establishes the transition from SAT to UNSAT (Theorem 4.1.2, Chapter 4), the existence of the satisfiability threshold α_SAT is still an open problem for many interesting models, such as random 3-SAT, random k-NAE-SAT (for small values of k, say random 3-NAE-SAT) and random graph 3-coloring. Many other general CSP thresholds besides k-SAT and coloring have been predicted by physics heuristics but not proved rigorously.

Despite this detailed picture of the solution space geometry of random CSPs, it remains an interesting research question to understand how algorithms behave when they try to find a solution in the satisfiable regime. More specifically, researchers would like to determine the algorithmic threshold α_alg above which no algorithm is able to find a solution with high probability in polynomial time. Numerical simulations show that for small values of k one can design algorithms that work efficiently at clause-to-variable densities very close to the satisfiability threshold α_SAT, whereas large values of k cannot be treated numerically. In this context, Coja-Oghlan [27] provides a polynomial-time algorithm for random k-SAT that works up to constraint densities coinciding at leading order with the clustering threshold α_clus. This leaves a broad range of densities in which typical instances have a non-empty set of solutions but no known algorithm finds solutions efficiently. In some cases it has been proven that specific algorithms fail to find solutions [30, 50, 56]. Even though it is hard to predict the algorithmic threshold α_alg precisely in terms of structural phase transitions, one can hypothesize that the clustering threshold is an upper bound on the algorithmic one, α_alg ≤ α_clus. Research on structural phase transitions within the satisfiable regime, particularly on the emergence of the clustering threshold α_clus in terms of long-range correlations, relies on the uniform distribution over all satisfying assignments. In this thesis we also prove that throughout the satisfiable phase the logarithm of the number of satisfying assignments of a random 2-SAT formula exhibits fluctuations of order $\sqrt{n}$, where n is the number of variables.

Returning to the statistical physics perspective, one more prominent threshold arises from the decay of correlations under the Gibbs (or Boltzmann) measure on the solutions of the problem: the “Gibbs uniqueness threshold”, denoted d_uniq [98]. This threshold is expressed in terms of the average variable degree d and comes into the picture when analyzing the infinite tree limit, that is, the Galton-Watson process approximation of the problem. By contrast, the other two thresholds (satisfiability and clustering) are defined with respect to the constraint density α. In sparse random graph models these two parameters are closely related, although d describes the local geometric structure of the factor graph, whereas α serves as a control parameter of the random CSP ensemble. For instance, in random k-SAT or random k-coloring, the constraint density α determines the expected average variable degree d, which relates the uniqueness threshold to the other structural transitions. From an algorithmic perspective, the Gibbs uniqueness threshold plays an important role, as local algorithms such as Belief Propagation are effective in this regime. In this thesis we provide an explicit lower bound on the Gibbs uniqueness threshold d_uniq that improves over prior work of Montanari and Shah [83]. In particular, we prove that for any k ≥ 3 and for clause/variable ratios up to the uniqueness threshold of the corresponding Galton-Watson tree, the number of satisfying assignments of random k-SAT is given by the ‘replica symmetry’ solution predicted by physics methods in [77].

This manuscript is based on the three papers [21–23] that have been produced during my PhD. The

remaining chapters are organized as follows.

■ Chapter 2 introduces the definition and representation of constraint satisfaction problems.
We also discuss one of the best-known CSPs, the satisfiability problem, and its importance in
computer science. The chapter concludes with a brief discussion of several statistical physics
quantities that will be used throughout the thesis.

■ Chapter 3 focuses on message passing algorithms, including Belief Propagation and Warning
Propagation, along with some variants derived from them. We discuss the applications and
characteristics of algorithms such as Belief Propagation Guided Decimation, the Decimation
Process (a statistical physics–inspired thought process), and the purely combinatorial
Unit Clause Propagation algorithm. We also present one of the most useful algorithms employed
in this thesis for estimating the logarithm of the partition function of random k-SAT (further
details in Chapter 7).

■ In Chapter 4 we examine the satisfiability transition and compare short-range and long-range

correlations in random satisfiability problems, with a focus on random k-SAT and random

k-XORSAT. We study the Gibbs measure on random CSPs and explore various phase transitions,

including reconstruction, non-reconstruction, condensation, and Gibbs uniqueness, in the

context of both random k-XORSAT and k-SAT. The chapter ends with a comparison of the

different phases and the associated solution space geometry of random k-SAT versus random

k-XORSAT.



■ Chapter 5 addresses the number of satisfying assignments in random 2-SAT. This chapter
contains the first main result of the thesis, establishing a central limit theorem for the number
of solutions of random 2-SAT formulas, the first CLT for any kind of random satisfiability
problem.

■ Chapter 6 presents the second main result of the thesis, analyzing the performance of the
Belief Propagation Guided Decimation algorithm on random k-XORSAT and comparing it with
the statistical physics–inspired thought process known as the decimation process. We also identify
the different phase transitions of the decimation process, pinpointing the regimes of d and θ where
BPGD succeeds or fails.

■ In Chapter 7 we revisit the Gibbs uniqueness threshold for random k-SAT and improve the lower
bound established by Montanari and Shah [83]. Towards the proof we introduce the
’interpolation method’ and the ’Aizenman-Sims-Starr scheme’ for proving matching upper and
lower bounds on the logarithm of the number of satisfying assignments of random k-SAT up to the
Gibbs uniqueness threshold. We explicitly determine the number of satisfying assignments predicted
by the statistical physics–inspired replica symmetric solution. The result is stated in terms of the
Bethe free entropy B_{d,k}, a function defined for probability measures π ∈ P(0,1).

■ The final chapter summarizes the results of the thesis, comparing them with existing work in the

probabilistic combinatorics as well as statistical physics literature. We conclude by outlining

several interesting open problems and possible directions for future research.


2
Models

“...random constraint satisfaction problems (CSPs) is the geometry of

the space of satisfying or almost satisfying assignments ... for which

a precise landscape of predictions has been made via statistical

physics-based heuristics.”

– Jun-Ting Hsieh et al.

2.1 Constraint Satisfaction Problems

2.1.1 Definitions

A constraint satisfaction problem (’CSP’) consists of a set of n variables V = {x1, x2, · · · , xn}
subject to a set of constraints C = {a1, a2, · · · , am} for some m ∈ N. The variables
xi, i ∈ {1, · · · , n}, take values in a finite set Ω. When |Ω| = 2, the variables are
Boolean: Ω = {0,1}. In statistical physics these are equivalent to spins σi ∈ {−1,1}
via the change of variable σi = 2xi − 1. When Ω has an arbitrary integer size q, the variables
can be viewed as Potts spins or colors: Ω = {1,2, · · · , q}. Further, we call σx = (x1, x2, · · · , xn) ∈ Ω^n a
configuration of the variables, and for a subset ω ⊆ {1, · · · , n} of variables we write σxω for its
restriction to ω. Each constraint (or clause) aj with j ∈ {1, · · · , m} involves a subset ∂j ⊆ {1, · · · , n}
of variables [99] and constrains the value of their configuration σx∂j. More specifically, the function
aj : Ω^{|∂j|} → {0,1} evaluates to 1 when the constraint is satisfied and to 0 otherwise.

There are several variants of CSPs. In the optimization version of the problem, the goal is to find
an optimal configuration, that is, one that minimizes the cost function. The cost function
E : Ω^n → R+ counts the total number of constraints that are violated by a configuration σx:

\[ E(\sigma_x) = \sum_{j=1}^{m} \bigl(1 - a_j(\sigma_{x_{\partial j}})\bigr) \tag{2.1.1} \]

The second variant is the decision problem, where the aim is to find a configuration
σ⋆x with cost E(σ⋆x) ≤ E0, where E0 is a given threshold value for the cost function. When
E0 = 0, the goal is to find a configuration that satisfies all the constraints simultaneously.
Such a configuration is called a solution of the problem. In other words, a solution, or
satisfying assignment, is a mapping σ : V → Ω that satisfies every constraint.
As long as the cost function is easy to evaluate, the decision problem is no harder
than the optimization version: once we have an optimal configuration, we only need to
compare its cost with the threshold value E0. The third variant, the counting problem,
asks for the number of satisfying assignments (solutions) of a given instance. Generally
speaking, this version is much harder than the previous two. In this thesis, we study the
counting problem on random 2-SAT, where we count the number of satisfying assignments
throughout the satisfiable regime. Moreover, we determine the number of satisfying
assignments of random k-SAT formulas for clause-to-variable densities up to the Gibbs
uniqueness threshold discussed in Chapter 4.
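
To make the three variants concrete, the following minimal sketch evaluates them by brute force on a tiny instance. The instance `clauses`, the signed-integer encoding of literals and all function names are assumptions made for illustration; they are not taken from the thesis, and exhaustive enumeration is only feasible for very small n.

```python
from itertools import product

# Hypothetical toy instance (an assumption for illustration):
# +i encodes the literal x_i, -i encodes its negation.
clauses = [[1, 2], [-1, 3], [-2, -3]]
n = 3

def clause_value(clause, sigma):
    """a_j(sigma) in {0, 1}: 1 if at least one literal of the clause is true."""
    return int(any(sigma[abs(l) - 1] == (1 if l > 0 else 0) for l in clause))

def cost(sigma):
    """E(sigma) as in (2.1.1): the number of violated clauses."""
    return sum(1 - clause_value(c, sigma) for c in clauses)

configs = list(product([0, 1], repeat=n))

# Optimization: a configuration of minimum cost.
best = min(configs, key=cost)

# Decision: is there a configuration with cost at most E0?
E0 = 0
decision = cost(best) <= E0

# Counting: the number of satisfying assignments.
num_solutions = sum(cost(s) == 0 for s in configs)
```

The decision answer is read off from the optimum, which mirrors the remark above that deciding is no harder than optimizing, while counting has to inspect the whole configuration space in this naive approach.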

2.1.2 The SAT Problem

The satisfiability problem has a long and exciting history in probabilistic combinatorics as well as in
computer science. Consider a Boolean formula F consisting of n Boolean variables and m clauses
{a1, a2, · · · , am} over the set of literals. Each literal li corresponds to a variable xi and is either
xi itself or its negation ¬xi. Each clause of the SAT formula F is a disjunction (logical OR, ∨) of
literals of the form

\[ a_j = l_{i(1)} \vee l_{i(2)} \vee l_{i(3)} \vee \cdots \vee l_{i(|\partial j|)} \tag{2.1.2} \]

where the literals are formed from the variables in ∂j. The variables in ∂j can take
2^{|∂j|} possible combinations of values, exactly one of which violates the clause, namely the one in
which all the literals take the value 0. The satisfiability formula F is then the conjunction
(logical AND, ∧) of the clauses,

\[ F = a_1 \wedge a_2 \wedge a_3 \wedge \cdots \wedge a_m \tag{2.1.3} \]

and is also called a CNF (conjunctive normal form) formula. The formula F is satisfied, i.e.,
evaluates to 1, if and only if every clause in F evaluates to 1. Example 2.1.1 below gives
a SAT formula with nine variables {x1, x2, · · · , x9} and four clauses {a1, · · · , a4}. A satisfying
assignment of this toy example is σ(x1) = 1, σ(x2) = 1, σ(x3) = 0, σ(x4) = 0, σ(x5) = 1,
σ(x6) = 1, σ(x7) = 0, σ(x8) = 1, σ(x9) = 1.

Example 2.1.1.
\[ F = \underbrace{(\neg x_1 \vee x_2 \vee x_3 \vee x_4)}_{a_1} \wedge \underbrace{(\neg x_4 \vee x_9)}_{a_2} \wedge \underbrace{(x_4 \vee \neg x_5 \vee x_6)}_{a_3} \wedge \underbrace{(x_1 \vee \neg x_6 \vee \neg x_7 \vee x_8)}_{a_4} \]

Figure 2.1: Factor graph representation of the SAT formula in Example 2.1.1.

In the decision version of this problem, a satisfying assignment is a configuration σ for which
the formula F evaluates to 1. An instance of the SAT problem is thus specified by the number of
variables n and clauses m, together with, for each clause aj, the choice of the subset ∂j and, for
each variable in ∂j, the choice of the literal (xi or ¬xi) that appears in aj. Using the spins
σi ∈ {−1,1} with the change of variable σi = 2xi − 1, familiar from the Ising model in
statistical physics, the clauses can be written as follows:

\[ a_j(\sigma_{\partial j}) = 1 - \prod_{i \in \partial j} \mathbb{1}\bigl[\sigma_i = -\Upsilon_i^j\bigr] \tag{2.1.4} \]

where

\[ \Upsilon_i^j = \begin{cases} 1 & \text{if the literal is } x_i, \\ -1 & \text{if the literal is } \neg x_i. \end{cases} \]
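
As a sanity check of the spin representation (2.1.4), the sketch below evaluates the four clauses of Example 2.1.1 on the satisfying assignment stated in the text. The signed-integer encoding of literals and the helper name `clause_value` are assumptions introduced here for illustration.

```python
# Example 2.1.1 with literals as signed integers (an encoding assumed here):
# +i stands for x_i and -i for its negation.
clauses = [[-1, 2, 3, 4], [-4, 9], [4, -5, 6], [1, -6, -7, 8]]

# The satisfying assignment from the text, as Boolean values x_i in {0, 1}.
x = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 1, 9: 1}

# Change of variable sigma_i = 2 x_i - 1, giving spins in {-1, +1}.
spin = {i: 2 * v - 1 for i, v in x.items()}

def clause_value(clause, spin):
    """Evaluate (2.1.4): 1 minus the product of indicators [sigma_i = -Upsilon_i^j]."""
    all_literals_false = 1
    for lit in clause:
        upsilon = 1 if lit > 0 else -1
        all_literals_false *= int(spin[abs(lit)] == -upsilon)
    return 1 - all_literals_false

values = [clause_value(c, spin) for c in clauses]
```

The product of indicators equals 1 only when every literal of the clause is false, so a clause evaluates to 0 on exactly one of its 2^{|∂j|} local configurations.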

2.1.3 Why SAT ?

So far, we have discussed the basic definitions of constraint satisfaction problems and of the
satisfiability (SAT) problem, which is a member of the larger family of CSPs. A natural question is
why satisfiability problems are studied in computer science. SAT plays a crucial role in
theoretical computer science and holds a prominent place among the NP-complete problems.
This is because SAT is both simple enough to admit combinatorial reasoning and general
enough to model a wide range of other problems in a natural way.



(i) SAT is simple to describe: In complexity theory, one of the most general NP-complete
problems is “NDTM-ACCEPTANCE”: given the description of a non-deterministic Turing
machine M, an input string x and a number of steps t, does M accept x within t steps?
This problem is more general than SAT, but it is not simple. It is general in the sense that
the theorem “NDTM-ACCEPTANCE is NP-complete” is close to a triviality, whereas the
Cook-Levin theorem, which establishes that SAT (and, via a standard reduction, 3-SAT) is
NP-complete, is one of the fundamental results of the field. SAT is far more amenable to
combinatorial reasoning than non-deterministic Turing machines. Random k-SAT, one of the
most interesting and deepest research topics around SAT, is defined by a natural probability
distribution on k-SAT formulas and exhibits many interesting phenomena. By contrast,
a probability distribution on the tuples (M, x, t) yields instances of NDTM-ACCEPTANCE
that are much less natural and less interesting than random SAT instances.

(ii) SAT is general: Besides the simplicity of its combinatorial structure, SAT is still general
enough to model a wide range of problems in a natural fashion. It can thus serve as a
’modeling language’ for many problems. In complexity theory, any NP-complete problem can
model any other problem in NP via a polynomial-time reduction. Some of these reductions are
more or less straightforward, whereas others are not. Consider the problem “Hamiltonian Path”,
where we are given a graph G and we know that some of the edges, say e1, e2, e3, are part of
the Hamiltonian path. Can we extend these edges to a complete Hamiltonian path? Formulating
this problem as a SAT problem is quite straightforward, whereas the reduction from SAT to
Hamiltonian Path requires the design of several clever gadgets.

2.1.4 Factor Graphs

The most prominent graphical representation of an instance of a CSP is a graph called the
factor graph or Tanner graph. This is a bipartite graph G = (V, C, E) with two types of nodes:
the first type, V = {x1, x2, · · · , xn}, represents the variables (referred to as variable nodes)
and the second type, C = {a1, a2, · · · , am}, represents the clauses (referred to as check or
factor nodes). E is the set of edges connecting variable and check nodes. There is an edge
between a variable node xi and a check node aj if the variable xi appears in the check aj,
either unnegated (xi) or negated (¬xi). Furthermore, a weight function can be associated with
each check node aj; these weights are linked to a probability distribution called the
Boltzmann (Gibbs) distribution, described later in this chapter.
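
The construction can be sketched in a few lines. The snippet below builds the bipartite factor graph of Example 2.1.1 as plain dictionaries; the data layout (an edge dictionary with a sign label, plus neighbourhood lists) and all names are assumptions chosen for illustration, not a representation used in the thesis.

```python
# Clauses of Example 2.1.1 with literals as signed integers (assumed encoding):
# +i stands for x_i and -i for its negation.
clauses = {
    "a1": [-1, 2, 3, 4],
    "a2": [-4, 9],
    "a3": [4, -5, 6],
    "a4": [1, -6, -7, 8],
}

# Edge set E: one edge per (variable, check) incidence.
# The label is +1 if the variable appears unnegated and -1 if it appears negated.
edges = {
    (f"x{abs(lit)}", a): (1 if lit > 0 else -1)
    for a, lits in clauses.items()
    for lit in lits
}

# Neighbourhoods of variable nodes and of check nodes.
var_nbrs = {}
for (x, a) in edges:
    var_nbrs.setdefault(x, []).append(a)
check_nbrs = {a: [f"x{abs(lit)}" for lit in lits] for a, lits in clauses.items()}
```

The sign label plays the role of the solid and dashed edges in a drawing of the factor graph: it records whether the variable occurs unnegated or negated in the check.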

Figure 2.1 shows the factor graph representation of the SAT formula F given in Example 2.1.1.
In the figure, the variable nodes V = {x1, x2, · · · , x9} are drawn as filled circles and the check
nodes C = {a1, · · · , a4} as empty squares. The edge set E is drawn with solid and dashed edges:
a solid edge between a variable and a check node means that the variable appears unnegated in
that check, whereas a dashed edge means that the variable appears negated. Equivalently, the
factor graph G can be viewed as a hypergraph in which the variables are still the vertices but the
clauses are now hyperedges, each linking a subset of vertices that may contain more than two
elements. In this thesis we are mainly interested in the factor graphs associated with the random
k-SAT model and with random matrices over finite fields. Let us briefly describe the factor graph
representation for these two models.

Figure 2.2: Left: A factor graph representation of the 3-SAT formula
F_3SAT = (x1 ∨ ¬x3 ∨ x4) ∧ (x2 ∨ ¬x4 ∨ ¬x5) ∧ (¬x1 ∨ x5 ∨ ¬x6) ∧ (x3 ∨ ¬x4 ∨ x6). Right: A factor graph
representation of the linear system of equations (2.1.5) over F2.

(i) random k-SAT: The most common example of a SAT problem is random k-SAT, where each clause
contains exactly k variables. As in other factor graph representations, the set V represents
the Boolean variables and the set C represents the clauses, and there is an edge between a
variable node and a check node if and only if that variable appears in that check. The
variables are drawn as circles and the check/factor nodes as squares. Figure 2.2
(left) is a simple factor graph representation of a k-SAT formula with k = 3.

(ii) random linear equations over F2: Random linear systems of equations are easier to grasp
than random k-SAT. Consider a linear system of equations Ax = b over F2, where A
is an n × n matrix with each entry equal to 1 with probability d/n, where d is the average
variable degree. Given a random vector b ∈ {0,1}^n, our goal is to construct the factor graph
corresponding to the system of linear equations. As for k-SAT, the set V represents the
variables, drawn as circles, and the check nodes C represent the equations, drawn as squares.
There is an edge between a variable and a check if and only if that variable appears
in the corresponding equation. Consider the following toy example with three equations in
four variables:

\[ \underbrace{\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}}_{x} = \underbrace{\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}}_{b} \tag{2.1.5} \]

Finding the set of all solutions x of a linear system Ax = b over F2 in which each equation
contains exactly k variables is a well-known random CSP called random k-XORSAT. In other
words, the matrix A has exactly k ones in each row. Compared to k-SAT, the disjunction
∨ (OR) inside a clause is replaced by XOR, denoted by ⊕:

\[ a_j = l_{i(1)} \oplus l_{i(2)} \oplus l_{i(3)} \oplus \cdots \oplus l_{i(k)} \tag{2.1.6} \]

Equivalently, the sum \sum_{p=1}^{k} l_{i(p)} of the literals inside a clause equals 1
modulo 2. Such a linear system can be solved in polynomial time O(n^3) by Gaussian
elimination when the number of equations is m = Θ(n), where n is the total number of
variables. The k-XORSAT instance is random when the clauses (equations) are drawn
independently and uniformly at random from all 2^k \binom{n}{k} possible XOR-clauses on the
set of variables V = {x1, x2, · · · , xn}. Owing to its algebraic structure, algorithms are often
easier to analyze on this model. Later in this thesis, we study the performance of a
physics-inspired algorithm on the random k-XORSAT model. Figure 2.2 (right) shows the
factor graph representation of the linear system of equations over F2 given in equation (2.1.5).
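
The Gaussian elimination step mentioned above can be sketched as follows for the toy system (2.1.5). The function name `solve_count_f2` and the list-of-lists matrix layout are assumptions for illustration; the sketch only reports the rank of A and the number of solutions, which is 2^(n − rank) for a consistent system and 0 otherwise.

```python
def solve_count_f2(A, b):
    """Return (rank of A, number of solutions of A x = b) over F2."""
    rows = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    n = len(A[0])
    rank = 0
    for col in range(n):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] == 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate the column from every other row; addition over F2 is XOR.
        for r in range(len(rows)):
            if r != rank and rows[r][col] == 1:
                rows[r] = [u ^ v for u, v in zip(rows[r], rows[rank])]
        rank += 1
    # A row of the form 0 ... 0 | 1 means the system is inconsistent.
    if any(not any(row[:n]) and row[n] == 1 for row in rows):
        return rank, 0
    return rank, 2 ** (n - rank)

# Toy system (2.1.5).
A = [[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1]]
b = [1, 0, 1]
rank, count = solve_count_f2(A, b)
```

In the toy system the third equation is the sum of the first two, so the rank is 2 and the four variables leave two free coordinates.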

2.2 Statistical Physics and CSPs

One of the most striking tasks of science is to account for the ever-growing variety of states of
matter and their properties. This is where statistical physics enters: it aims to explain
how complex behavior can emerge when a large number of identical elementary components
interact with each other. It relies on two notable steps: on the one hand it passes from the
deterministic laws of physics to a probabilistic description, and on the other hand it starts from
this probabilistic description and recovers determinism at the macroscopic level via the law of
large numbers. In this section we discuss some basic notions of statistical physics and their
connection with constraint satisfaction problems.

2.2.1 Boltzmann (Gibbs) probability distribution

From equation (2.1.1), there is a cost function E : Ω^n → R+ which counts the total number of
clauses violated by a given assignment σx ∈ Ω^n of the n variables. In mathematical optimization
one aims to minimize this cost function over the set of all possible configurations
Ω^n. Once the configuration space Ω^n and the cost function E are fixed, the Boltzmann probability
distribution for the system to be found in the configuration σx is given by

\[ \mu_\beta(\sigma_x) = \frac{1}{Z(\beta)} e^{-\beta E(\sigma_x)} \tag{2.2.1} \]

where the normalization constant Z(β) is known as the partition function in physics jargon and
is equal to

\[ Z(\beta) = \sum_{\sigma_x \in \Omega^n} e^{-\beta E(\sigma_x)} \tag{2.2.2} \]

The real parameter T = 1/β is the temperature, and β is referred to as the inverse temperature.
In the context of CSPs, to emphasize the factor graph G we introduce a weight function
ψ_{a_j} : Ω^{∂j} → (0,∞) for each constraint aj, and the Boltzmann distribution can be rewritten as

\[ \mu(\sigma_x) = \frac{1}{Z_G} \prod_{j=1}^{m} \psi_{a_j}(\sigma_{x_{\partial j}}), \quad \text{where } Z_G = \sum_{\sigma_x \in \Omega^n} \prod_{j=1}^{m} \psi_{a_j}(\sigma_{x_{\partial j}}) \tag{2.2.3} \]

Equations (2.2.1)–(2.2.2) interpolate smoothly between several interesting situations.
In the high-temperature limit (β → 0) one recovers the uniform probability distribution, whereas in
the low-temperature limit (β → ∞) the distribution concentrates on the global minima of the energy.
Specifically, a configuration σx0 ∈ Ω^n such that E(σx) ≥ E(σx0) for every σx ∈ Ω^n is called
a ground state; Ω0 denotes the set of all ground states and the corresponding energy E0 = E(σx0) is
called the ground state energy. Therefore,

\[ \lim_{\beta \to 0} \mu_\beta(\sigma_x) = \frac{1}{|\Omega|^n} \quad \text{and} \quad \lim_{\beta \to \infty} \mu_\beta(\sigma_x) = \frac{1}{|\Omega_0|} \mathbb{I}(\sigma_x \in \Omega_0) \tag{2.2.4} \]

In this setting the cost function E is also called the Hamiltonian or the energy function and is
defined as

\[ E_G(\sigma_x) = -\log \prod_{j=1}^{m} \psi_{a_j}(\sigma_{x_{\partial j}}) \tag{2.2.5} \]

The most important thermodynamic potential in this regard is the free energy, which is defined as

\[ F_G(\beta) = -\frac{1}{\beta} \log Z(\beta) \tag{2.2.6} \]

whereas in calculations it is often more convenient to use the free entropy, given by

\[ \Phi_G(\beta) = -\beta F_G(\beta) = \log Z(\beta) \tag{2.2.7} \]
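
The two limits in (2.2.4) can be observed numerically on a small example. The sketch below computes Z(β), the Boltzmann distribution and the free entropy by exhaustive enumeration; the toy instance and the value β = 50 standing in for "large β" are assumptions chosen for illustration.

```python
import math
from itertools import product

# Hypothetical toy CNF instance (+i: x_i, -i: negation of x_i).
clauses = [[1, 2], [-1, 3], [-2, -3]]
n = 3
configs = list(product([0, 1], repeat=n))

def energy(sigma):
    """Cost function (2.1.1): the number of clauses violated by sigma."""
    return sum(not any(sigma[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

def boltzmann(beta):
    """Return Z(beta) and the list of probabilities mu_beta(sigma), as in (2.2.1)-(2.2.2)."""
    weights = [math.exp(-beta * energy(s)) for s in configs]
    Z = sum(weights)
    return Z, [w / Z for w in weights]

Z_hot, mu_hot = boltzmann(0.0)      # high temperature: uniform over all 2^n configurations
Z_cold, mu_cold = boltzmann(50.0)   # low temperature: mass concentrates on ground states
free_entropy = math.log(Z_cold)     # Phi = log Z, as in (2.2.7)
```

For this instance the ground states are the two satisfying assignments, so at large β the partition function is close to 2 and each ground state carries probability close to 1/2.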

2.2.2 Some statistical physics models

In mathematical physics a wide range of interesting phenomena occur in the limit n → ∞ of the
number of variables. The previous section makes clear that there is a direct correspondence between
CSPs and statistical physics models. As an example, consider spin glass models, which generalize
the Ising model: the variables are spins σj ∈ {−1,+1} and the couplings J_a take values either in R
or in {−1,+1}. The energy function is defined as

\[ E(\sigma_x) = -\sum_{a=1}^{m} J_a \prod_{j \in \partial a} \sigma_j \tag{2.2.8} \]

When each interaction involves p spins, one speaks of a p-body interaction and of the p-spin model.
One of the best-known models of this kind is the Edwards-Anderson model, where p = 2. Moreover,
a spin glass model can be cast as a constraint satisfaction problem in the β → ∞ limit, since the
Boltzmann distribution (2.2.1) then concentrates on the minimizers of the energy function E(σx).
For a two-body interaction, the coupling is either ferromagnetic (J_a > 0, favoring
\prod_{j \in \partial a} σj = 1) or antiferromagnetic (J_a < 0, favoring \prod_{j \in \partial a} σj = −1).
The general picture in the ferromagnetic case is that when β is large (low temperature), one of the
two spin values begins to dominate and the system shows a positive or negative magnetization.
When β is small (high temperature), no magnetization is observed and the regime is called
paramagnetic. The natural task is then to pinpoint the critical inverse temperature βcrit at which the
phase transition occurs, i.e., where the system switches from paramagnetic to ferromagnetic.

Coming to the random k-SAT model, for each constraint aj with j ∈ [m] and some β > 0 we can
write the weight function and the Hamiltonian as follows:

\[ \psi_{a_j}(\sigma_{x_{\partial j}}) = \exp\bigl(-\beta \cdot \mathbb{1}\{\sigma \not\models a_j\}\bigr), \qquad E(\sigma) = \beta \cdot \sum_{j=1}^{m} \mathbb{1}\{\sigma \not\models a_j\} \]

Here the Hamiltonian counts the number of unsatisfied clauses (scaled by β): each unsatisfied
clause incurs a penalty factor exp(−β), while satisfied clauses have weight 1. As a result, the
partition function ZG(β) approximates the number of satisfying assignments as the inverse
temperature β tends to infinity. Moreover, when β = ∞, the Gibbs distribution is the uniform
distribution over the solution space, since every assignment that violates a clause receives weight
zero, and therefore the partition function ZG(β) counts the number of solutions exactly. However,
it is easier to handle the case of finite β and take the limit afterwards.
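
The approximation statement can be made quantitative with a short calculation. The symbols N_t and Z(F) below are notation introduced here for illustration and do not appear in the original text: N_t is the number of assignments violating exactly t clauses, and Z(F) = N_0 is the number of satisfying assignments.

```latex
% N_t := number of assignments in \{0,1\}^n that violate exactly t clauses (notation assumed here),
% Z(F) := N_0 is the number of satisfying assignments.
\[
  Z_G(\beta)
  = \sum_{\sigma \in \{0,1\}^n} \prod_{j=1}^{m} e^{-\beta \mathbb{1}\{\sigma \not\models a_j\}}
  = \sum_{t=0}^{m} N_t \, e^{-\beta t},
\]
\[
  Z(F) \;\le\; Z_G(\beta) \;\le\; Z(F) + 2^n e^{-\beta}
  \quad\Longrightarrow\quad
  \lim_{\beta \to \infty} Z_G(\beta) = Z(F).
\]
```

The upper bound uses that every term with t ≥ 1 is at most e^{−β} and that the N_t sum to 2^n, so for fixed n the error vanishes as β grows.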

Coming to random k-XORSAT, which boils down to a random linear system of equations
over F2, the weight of the i-th clause/equation a_i and the Hamiltonian are given by

\[ \psi_{a_i}(\sigma_x) = \mathbb{1}\Bigl\{ \sum_{j=1}^{n} A_{ij} \sigma_{x_j} = 0 \Bigr\}, \qquad E(\sigma) = \sum_{i=1}^{n} \mathbb{1}\Bigl\{ \sum_{j=1}^{n} A_{ij} \sigma_{x_j} \neq 0 \Bigr\} \]

where the sums inside the indicators are taken over F2, so that E counts the violated equations.
The partition function ZG can then be computed as the cardinality of the kernel of A(G):

\[ |\ker A(G)| = Z_G = \sum_{\sigma_x \in \mathbb{F}_2^n} \prod_{i=1}^{n} \mathbb{1}\Bigl\{ \sum_{j=1}^{n} A_{ij} \sigma_{x_j} = 0 \Bigr\} \]

Here the weight function is allowed to take the value zero when the equation is unsatisfied,
going beyond the strictly positive weights of (2.2.3), but the Gibbs distribution is still
well-defined because the zero vector always belongs to the kernel of the matrix A and
therefore ZG = |ker A| > 0.

The weight functions ψ_{a_j} defined in (2.2.3) (sometimes called compatibility functions) carry
the dependence on the temperature into the factor graph G. To analyze the partition
function one then looks at the normalized limit of the free entropy ΦG(β), referred to as the
free entropy density φG and defined as

\[ \phi_G = \phi_G(\beta) = \lim_{n \to \infty} \frac{1}{n} \Phi_G(\beta) = \lim_{n \to \infty} \frac{1}{n} \log Z(\beta) \tag{2.2.9} \]

A phase transition occurs at the points where the free entropy density φG is non-analytic.
Two common types of phase transition are the first and second order phase transitions. A first
order phase transition occurs when the first derivative \partial \phi_G / \partial \beta is
discontinuous at some β̃, and a second order phase transition occurs when the second derivative
\partial^2 \phi_G / \partial \beta^2 is discontinuous. Higher order phase transitions are defined
analogously. The free entropy density φG is therefore one of the most important quantities for
understanding a physical system and its changes of behavior. A heuristic for computing the free
entropy density φG is Belief Propagation, a statistical physics inspired message passing algorithm,
which we describe in detail in the next chapter.


3
Message Passing Algorithms

“Message passing algorithms have proved surprisingly successful in

solving hard constraint satisfaction problems on sparse random

graphs.”

– Andrea Montanari et al.

Consider the general problem of computing marginals of a graphical model with n variables
V = {x1, x2, · · · , xn} taking values in a finite set Ω. A naive approach is to sum over all
configurations, with time complexity |Ω|^n. On tree factor graphs, however, marginals can be
computed in time that grows linearly with n. This is done through ’dynamic programming’, which
recursively sums over all variables, starting from the leaves and moving towards the root of the
tree. Such a recursive procedure can be recast as a distributed ’message passing’ algorithm.
These algorithms operate on ’messages’ associated with the edges of the factor graph and update
the messages recursively based on local computations at the vertices of the graph. In this chapter
we discuss a few such message passing methods along with a few algorithms associated with them.

3.1 Belief Propagation

Belief Propagation (BP for short) is one of the best-known iterative message passing procedures
for computing marginals of a variable x or of any subset of variables with respect to a measure µ
(defined in (2.2.3)), as well as the partition function Z. Moreover, it provides an efficient way
to sample a configuration σx from µ, and all these computations run in time polynomial in the
number of variables n. It is straightforward to prove that BP computes such marginals exactly on
trees. In practice BP is often effective on graphs with loops as well. The basic intuition behind
this success is that BP, as a local message passing procedure, should work well when the underlying
model is locally tree-like. Factor graph models of this type appear frequently in probabilistic
combinatorics as well as in statistical physics. Despite these advantages, BP becomes ineffective
once long-range correlations emerge, which in turn leads to a phase transition. We will see several
such applications in later chapters.

3.1.1 BP messages

In 1962, R. G. Gallager [49] introduced BP messages for decoding low-density parity-check codes,
and in 1988 J. Pearl [92] introduced them in the context of probabilistic inference.
Let us define two types of messages, ν^{(t)}_{x→a} and ν̂^{(t)}_{a→x}, associated with each edge
(x, a) ∈ E of the factor graph G = (V, C, E) at step t. The message ν^{(t)}_{x→a} goes from a variable
node to a check node, whereas the message ν̂^{(t)}_{a→x} goes from a check node to a variable node.
More specifically, ν^{(t)}_{x→a} is the marginal of the variable x when the check node a is removed,
whereas ν̂^{(t)}_{a→x} is the marginal of the variable x when all the check nodes in ∂x \ a have been
removed. For each t ≥ 0 both messages are probability distributions over Ω. Initially, both messages
are the uniform distribution over Ω, i.e., ν^{(0)}_{x→a}(s) = ν̂^{(0)}_{a→x}(s) = 1/|Ω| for all x ∈ V,
a ∈ C and s ∈ Ω. One can also initialize the messages by drawing them i.i.d. from a probability
distribution P on P(Ω). The BP equations [60] for the set of messages
{ν^{(t)}_{x→a}, ν̂^{(t)}_{a→x}}_{(x,a)∈E} with t ≥ 0 are given by

\[ \nu^{(t+1)}_{x \to a}(s) = \frac{1}{Z_{x,a}} \prod_{b \in \partial x \setminus a} \hat{\nu}^{(t)}_{b \to x}(s) \tag{3.1.1} \]

\[ \hat{\nu}^{(t)}_{a \to x}(s) = \frac{1}{\hat{Z}_{x,a}} \sum_{\sigma \in \Omega^{\partial a}} \mathbb{1}\{\sigma_x = s\} \, \psi_a(\sigma) \prod_{y \in \partial a \setminus x} \nu^{(t)}_{y \to a}(\sigma_y) \tag{3.1.2} \]

where Z_{x,a} and Ẑ_{x,a} are the normalization constants of the messages. Equations (3.1.1)–(3.1.2)
are referred to as the Belief Propagation (BP) equations; dropping the time index t yields the
corresponding fixed point equations. All the messages are updated in parallel. It is clear from the
BP equations that if ∂x \ a = ∅ then ν^{(t+1)}_{x→a} is the uniform distribution over Ω. Similarly,
if ∂a \ x = ∅ then ν̂^{(t)}_{a→x}(s) ∝ ψa(s). Figure 3.1 illustrates the BP update rules, and
Algorithm 3.1 (see also [66]) gives the iterative procedure for finding a solution of the BP
equations (3.1.1)–(3.1.2).

A natural question is under which conditions the messages converge to a limit (ν∗_{x→a}, ν̂∗_{a→x}).
From [66], on a tree of diameter tmax Algorithm 3.1 is guaranteed to find the fixed point messages
exactly, independently of the initialization, using the update rules (3.1.1)–(3.1.2). Moreover, the
algorithm does not specify any ordering of the edges (x, a) for the update of the messages.
Reshuffling the ordering, for instance by taking a random permutation of the edges before each
round of updates, is therefore allowed. On factor graphs with loops BP is used as a heuristic
without guarantee of convergence.

Figure 3.1: Left: Portion of the factor graph involved in computing ν^{(t+1)}_{x→a}, which is a
function of all ’incoming messages’ ν̂^{(t)}_{b→x} with b ≠ a. Right: Portion of the factor graph
involved in computing ν̂^{(t)}_{a→x}, which is a function of all ’incoming messages’ ν^{(t)}_{y→a}
with y ≠ x.

Algorithm 3.1: Belief Propagation algorithm [66]

Input: a factor graph G = (V, C, E), set of weight functions {ψa}a∈C, precision ε,
       maximum number of iterations tmax.
Output: a set of messages {ν(·), ν̂(·)}, or “Not Converged” on failure.

Initialization: for each edge (x, a) ∈ E, initialize νx→a(·) and ν̂a→x(·) as i.i.d. random
    variables with distribution P.
for t = 0, · · · , tmax do
    compute the messages: first {ν̂^{(t)}_{a→x}}_{(x,a)∈E} using (3.1.2),
        then {ν^{(t+1)}_{x→a}}_{(x,a)∈E} using (3.1.1)
    if δ (the maximum message change) < ε then
        return the set of messages {ν^{(t+1)}_{x→a}, ν̂^{(t)}_{a→x}}
return “Not Converged”
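
The update rules (3.1.1)–(3.1.2), together with the marginal formula (3.1.4) derived in the next subsection, can be sketched for Boolean variables as follows. Everything specific in this sketch is an assumption made for illustration: the tree-structured toy instance, the soft clause weights exp(−β·1{violated}) with β = 1, the uniform initialization and all function names. It is not the implementation used in the thesis.

```python
import math
from itertools import product

clauses = [[1, 2], [-2, 3], [3, -4]]   # hypothetical instance whose factor graph is a tree
n, beta = 4, 1.0

def psi(a, assign):
    """Weight of clause a; assign maps a variable index to a value in {0, 1}."""
    sat = any(assign[abs(l)] == (l > 0) for l in clauses[a])
    return 1.0 if sat else math.exp(-beta)

edges = [(abs(l), a) for a, c in enumerate(clauses) for l in c]
var_nbrs = {x: [a for (y, a) in edges if y == x] for x in range(1, n + 1)}
chk_nbrs = {a: [abs(l) for l in c] for a, c in enumerate(clauses)}

def normalize(v):
    z = sum(v)
    return [p / z for p in v]

def check_to_var(a, x, nu):
    """Right-hand side of (3.1.2) for the message from check a to variable x."""
    out = [0.0, 0.0]
    others = [y for y in chk_nbrs[a] if y != x]
    for vals in product([0, 1], repeat=len(others)):
        assign = dict(zip(others, vals))
        w = math.prod(nu[(y, a)][assign[y]] for y in others)
        for s in (0, 1):
            assign[x] = s
            out[s] += psi(a, assign) * w
    return normalize(out)

def run_bp(t_max=50, eps=1e-12):
    nu = {e: [0.5, 0.5] for e in edges}   # uniform initialization
    for _ in range(t_max):
        nu_hat = {(x, a): check_to_var(a, x, nu) for (x, a) in edges}
        # Variable-to-check update (3.1.1): product over the other checks of x.
        new_nu = {(x, a): normalize([math.prod(nu_hat[(x, b)][s]
                                               for b in var_nbrs[x] if b != a)
                                     for s in (0, 1)])
                  for (x, a) in edges}
        delta = max(abs(new_nu[e][s] - nu[e][s]) for e in edges for s in (0, 1))
        nu = new_nu
        if delta < eps:
            return nu, nu_hat
    raise RuntimeError("Not Converged")

def bp_marginal(x, nu_hat):
    """Marginal of x from the incoming check-to-variable messages, as in (3.1.4)."""
    return normalize([math.prod(nu_hat[(x, a)][s] for a in var_nbrs[x]) for s in (0, 1)])

def exact_marginal(x):
    """Brute-force marginal of x under the Gibbs measure (2.2.3), for comparison."""
    m = [0.0, 0.0]
    for vals in product([0, 1], repeat=n):
        assign = dict(zip(range(1, n + 1), vals))
        m[assign[x]] += math.prod(psi(a, assign) for a in range(len(clauses)))
    return normalize(m)

nu, nu_hat = run_bp()
```

Since the factor graph of this instance is a tree, the BP marginals should agree with the brute-force marginals up to floating-point error; on graphs with cycles no such agreement is guaranteed.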

3.1.2 Computing marginals

Our next goal is to compute the marginal µ(x) of a variable x ∈ V. Given the solution of the
BP equations from Algorithm 3.1, and using the Markov property on a finite tree factor
graph, we can construct the marginal [66] of the variable x as

\[ \mu(x = s) \propto \prod_{b \in \partial x} \Bigl( \sum_{\sigma \in \Omega^{\partial b}} \mathbb{1}\{\sigma_x = s\} \, \psi_b(\sigma) \prod_{y \in \partial b \setminus x} \nu_{y \to b}(\sigma_y) \Bigr) \tag{3.1.3} \]

Then using equation (3.1.2) we get

\[ \mu(x = s) = \frac{1}{Z_x} \prod_{a \in \partial x} \hat{\nu}_{a \to x}(s) \tag{3.1.4} \]

where Zx is a normalization constant. So far we have seen how to compute the marginal of a variable
from the BP equations, assuming that the BP messages converge to a limit (ν∗_{x→a}, ν̂∗_{a→x}). The
next question is: if such limits exist, what is their significance? On a tree factor graph the
marginals µ(x) for all x ∈ V are computed exactly from the BP fixed point. In general this is no
longer true: on factor graphs that contain short cycles (cycles of bounded length) there may be
several limits, depending on the initialization. However, BP provides a good approximation of the
marginals, given a suitable initialization, if the factor graph does not contain too many short
cycles. Furthermore, for tree factor graphs one can express the free entropy density
Φ = (1/n) log Z (where Z is the partition function) in terms of the set of messages that solves the
BP equations (3.1.1)–(3.1.2); the resulting expression is known as the Bethe free entropy (denoted
Φ^Bethe) in physics jargon. We discuss this quantity in detail in the next subsection.

3.1.3 Bethe-Free Entropy

In 1935 the German-American physicist Hans Albrecht Eduard Bethe [14] first introduced the Bethe free entropy density for the ferromagnetic Ising model. Using the decomposition property of the Gibbs distribution, the Bethe free entropy ΦBethe can be expressed in terms of the 2|E| BP messages {νx→a(·), ν̂a→x(·)} as [66]

\[
\Phi_{\mathrm{Bethe}} = \frac{1}{n}\Bigg[\sum_{x\in V} B^{(t)}_x + \sum_{a\in C} B^{(t)}_a - \sum_{(xa)\in E} B^{(t)}_{ax}\Bigg] \tag{3.1.5}
\]

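where

\[
B^{(t)}_x = \log\Bigg[\sum_{s\in\Omega}\prod_{a\in\partial x}\hat{\nu}^{(t)}_{a\to x}(s)\Bigg],
\qquad
B^{(t)}_a = \log\Bigg[\sum_{\sigma\in\Omega^{\partial a}}\psi_a(\sigma)\prod_{x\in\partial a}\nu^{(t)}_{x\to a}(\sigma_x)\Bigg]
\]

and

\[
B^{(t)}_{ax} = \log\Bigg[\sum_{s\in\Omega}\nu^{(t)}_{x\to a}(s)\cdot\hat{\nu}^{(t)}_{a\to x}(s)\Bigg].
\]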

Roughly speaking, $B^{(t)}_x$ corresponds to the contribution of the variable nodes to the partition function, $B^{(t)}_a$ to the contribution of the check nodes and $B^{(t)}_{ax}$ to the contribution of the edges. The hope is that

\[
\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\log Z] = \lim_{t\to\infty}\lim_{n\to\infty}\Phi_{\mathrm{Bethe}}. \tag{3.1.6}
\]



The limits ν∗x→a, ν∗a→x, i.e., the fixed points of the BP equations (3.1.1)–(3.1.2) (if they exist), are expected to correspond to the stationary points of ΦBethe [101]. Quantities such as the BP messages and the Bethe free entropy are model specific: they depend on the factor graph and on the set Ω, and the question of convergence has to be addressed on top of this.
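The following sketch illustrates the Bethe free entropy on a small tree factor graph, where it coincides with (1/n) log Z once BP has converged. It uses the standard form with one term per variable node, one per check node and one per edge. All names (`bethe_free_entropy`, `brute_force_free_entropy`, the `(variables, psi)` factor format) are assumptions made for this example, variables are binary, each factor is assumed to contain distinct variables, and the messages are assumed never to become identically zero.

```python
import itertools
import math


def bethe_free_entropy(n_vars, factors, iters=20):
    """Run parallel BP from uniform messages and return the Bethe free entropy density.

    `factors` is a list of (vars, psi): `vars` is a tuple of variable indices and
    `psi` maps a tuple of 0/1 values (ordered like `vars`) to a weight >= 0.
    """
    omega = (0, 1)
    edges = [(a, x) for a, (vs, _) in enumerate(factors) for x in vs]
    nbrs = {x: [a for a, y in edges if y == x] for x in range(n_vars)}
    nu = {(x, a): [0.5, 0.5] for a, x in edges}      # variable -> clause
    nu_hat = {(a, x): [0.5, 0.5] for a, x in edges}  # clause -> variable

    def normalize(m):
        z = sum(m)
        return [v / z for v in m]

    for _ in range(iters):
        new_hat = {}
        for a, (vs, psi) in enumerate(factors):
            for x in vs:
                msg = [0.0, 0.0]
                for sigma in itertools.product(omega, repeat=len(vs)):
                    w = psi(sigma)
                    for y, s in zip(vs, sigma):
                        if y != x:
                            w *= nu[(y, a)][s]
                    msg[sigma[vs.index(x)]] += w
                new_hat[(a, x)] = normalize(msg)
        new_nu = {}
        for a, x in edges:
            msg = [1.0, 1.0]
            for b in nbrs[x]:
                if b != a:
                    for s in omega:
                        msg[s] *= nu_hat[(b, x)][s]
            new_nu[(x, a)] = normalize(msg)
        nu, nu_hat = new_nu, new_hat

    phi = 0.0
    for a, (vs, psi) in enumerate(factors):          # check-node terms
        total = 0.0
        for sigma in itertools.product(omega, repeat=len(vs)):
            w = psi(sigma)
            for y, s in zip(vs, sigma):
                w *= nu[(y, a)][s]
            total += w
        phi += math.log(total)
    for x in range(n_vars):                          # variable-node terms
        total = 0.0
        for s in omega:
            w = 1.0
            for b in nbrs[x]:
                w *= nu_hat[(b, x)][s]
            total += w
        phi += math.log(total)
    for a, x in edges:                               # edge terms
        phi -= math.log(sum(nu[(x, a)][s] * nu_hat[(a, x)][s] for s in omega))
    return phi / n_vars


def brute_force_free_entropy(n_vars, factors):
    """(1/n) log Z by enumerating all 2^n assignments."""
    z = 0.0
    for sigma in itertools.product((0, 1), repeat=n_vars):
        w = 1.0
        for vs, psi in factors:
            w *= psi(tuple(sigma[y] for y in vs))
        z += w
    return math.log(z) / n_vars


# A path x0 - a - x1 - b - x2 with two XOR constraints (a tree, Z = 2).
chain = [((0, 1), lambda s: 1.0 if s[0] ^ s[1] == 1 else 0.0),
         ((1, 2), lambda s: 1.0 if s[0] ^ s[1] == 0 else 0.0)]
print(bethe_free_entropy(3, chain), brute_force_free_entropy(3, chain))
```

On factor graphs with cycles the two quantities generally differ, which is why the brute-force comparison is only meaningful as a sanity check on trees.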

In this thesis, we find an explicit expression for the logarithm of the number of solutions of a random k-SAT formula $F = F_{d,k}(n)$ for every d below the Gibbs uniqueness threshold (discussed in Chapter 4). The result is stated in terms of the Bethe free entropy $B_{d,k}$, a function defined on probability measures π ∈ P(0,1). We discuss the details of this result in Chapter 7.

3.2 Algorithms

In this section we discuss a few algorithms and processes on random k-XORSAT/k-SAT instances that we use in this thesis in combination with Belief Propagation.

3.2.1 Belief Propagation Guided Decimation

In the early 2000s, physicists proposed a message passing algorithm called Belief Propagation Guided Decimation (BPGD) which, according to computer experiments, performs impressively on various random CSPs [72, 96].

BPGD sets its ambitions higher than merely finding a solution to the k-XORSAT instance F : the

algorithm attempts to sample a solution uniformly at random. To this end BPGD assigns values to

the variables x1, . . . , xn of F one after the other. In order to assign the next variable the algorithm

attempts to compute the marginal probability that the variable is set to ‘true’ under a random solution

to the k-XORSAT instance, given all previous assignments. More precisely, suppose BPGD has assigned

values to the variables x1, . . . , xt already. Write σBP(x1), . . . ,σBP(xt ) ∈ {0,1} for their values, with 1

representing ‘true’ and 0 ‘false’. Further, let F BP,t be the simplified formula obtained by substituting

σBP(x1), . . . ,σBP(xt ) for x1, . . . , xt . We drop any clauses from F BP,t that contain variables from {x1, . . . , xt }

only, deeming any such clauses satisfied. Thus, F BP,t is a XORSAT formula with variables xt+1, . . . , xn .

Its clauses contain at least one and at most k variables, as well as possibly a constant (the XOR of the

values substituted in for x1, . . . , xt ).

Input: a random k-XORSAT formula F with variables x1, . . . , xn, conditioned on being satisfiable
Output: an assignment σBP : {x1, . . . , xn} → {0,1}.

1 for t = 0, . . . , n−1 do
2     compute the BP approximation $\mu_{F_{\mathrm{BP},t}}$;
3     set σBP(xt+1) = 1 with probability $\mu_{F_{\mathrm{BP},t}}$ and σBP(xt+1) = 0 with probability $1-\mu_{F_{\mathrm{BP},t}}$;
4 return σBP;

Algorithm 3.2: The BPGD algorithm (Section 1.2, [22]).



Let $\sigma_{F_{\mathrm{BP},t}}$ be a uniformly random solution of the XORSAT formula $F_{\mathrm{BP},t}$, assuming that $F_{\mathrm{BP},t}$ remains satisfiable. Then BPGD aims to compute the marginal probability $\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ that a random satisfying assignment of $F_{\mathrm{BP},t}$ sets $x_{t+1}$ to true. This is where Belief Propagation (‘BP’) comes in. An efficient message passing heuristic for computing precisely such marginals, BP returns an ‘approximation’ $\mu_{F_{\mathrm{BP},t}}$ of $\mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$. Having computed the BP ‘approximation’, BPGD proceeds to assign $x_{t+1}$ the value ‘true’ with probability $\mu_{F_{\mathrm{BP},t}}$, otherwise sets $x_{t+1}$ to ‘false’, then moves on to the next variable. The pseudocode is displayed as Algorithm 3.2.

Remark 3.2.1.

• If the BP approximations are exact, i.e., if $F_{\mathrm{BP},t}$ is satisfiable and $\mu_{F_{\mathrm{BP},t}} = \mathbb{P}[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ for all t, then Bayes’ formula shows that BPGD outputs a uniformly random solution of F. However, there is no universal guarantee that BP returns the correct marginals.

• Due to the algebraic structure of the XOR operation, BPGD is easier to analyze on random k-XORSAT. In fact, the marginal probabilities are guaranteed to be half-integral, as stated in Fact 3.2.2 below, i.e.,

\[
\mathbb{P}\big[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}\big] \in \{0, 1/2, 1\}. \tag{3.2.1}
\]

Fact 3.2.2. The BP messages and marginals are half-integral, i.e., for all ℓ ≥ 0 and s ∈ {0,1} we have

\[
\mu_{F,x\to a,\ell}(s),\ \mu_{F,a\to x,\ell}(s),\ \mu_{F,x,\ell}(s) \in \{0, 1/2, 1\}. \tag{3.2.2}
\]

Furthermore, for all $\ell > 2\sum_{a\in C(F)}|\partial a|$ we have $\mu_{F,x,\ell}(s) = \mu_{F,x,\ell+1}(s)$. (Since the total number of messages is bounded by $2\sum_{a\in C(F)}|\partial a|$, the BP messages will have converged pointwise after this number of iterations.)

3.2.2 Decimation Process

In addition to the BPGD algorithm itself, the heuristic work [96] considers an idealized version of the

algorithm, the decimation process. This is a thought experiment that highlights the conceptual reasons

behind the success/failure of BPGD algorithm. Just like BPGD, it also assigns values to variables one

after the other but instead of the BP ‘approximations’, the decimation process uses the actual marginals

given its previous decisions. To be precise, suppose that the input formula F is satisfiable and that

variables x1, . . . , xt have already been assigned values σDC(x1), . . . ,σDC(xt ) in the previous iterations.

Obtain F DC,t by substituting the values σDC(x1), . . . ,σDC(xt ) for x1, . . . , xt and dropping any clauses

that do not contain any of xt+1, . . . , xn . Thus, F DC,t is a XORSAT formula with variables xt+1, . . . , xn .

Let $\sigma_{F_{\mathrm{DC},t}}$ be a random satisfying assignment of $F_{\mathrm{DC},t}$. Then the decimation process sets $x_{t+1}$ according to the true marginal $\mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$, thus ultimately returning a uniformly random satisfying assignment of F. The pseudocode is displayed as Algorithm 3.3.



Input: a random k-XORSAT formula F, conditioned on being satisfiable
Output: an assignment σDC : {x1, . . . , xn} → {0,1}.

1 for t = 0, . . . , n−1 do
2     compute $\pi_{F_{\mathrm{DC},t}} = \mathbb{P}[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$;
3     set σDC(xt+1) = 1 with probability $\pi_{F_{\mathrm{DC},t}}$ and σDC(xt+1) = 0 with probability $1-\pi_{F_{\mathrm{DC},t}}$;
4 return σDC;

Algorithm 3.3: The decimation process (Section 1.4, [22]).
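For very small instances the decimation process can be simulated directly by enumerating all solutions. The sketch below is only a thought-experiment aid under stated assumptions: the names `solutions` and `decimation_process`, the clause format `(variables, parity)` and the brute-force computation of the marginal are choices made for this illustration, and instead of simplifying the formula it keeps the original clauses and conditions on the values fixed so far, which yields the same marginals.

```python
import itertools
import random


def solutions(n, clauses, fixed):
    """All assignments in {0,1}^n that satisfy every XOR clause and agree with `fixed`.

    A clause (vs, b) demands that the XOR of the variables in `vs` equals b.
    """
    result = []
    for sigma in itertools.product((0, 1), repeat=n):
        if any(sigma[i] != v for i, v in fixed.items()):
            continue
        if all(sum(sigma[i] for i in vs) % 2 == b for vs, b in clauses):
            result.append(sigma)
    return result


def decimation_process(n, clauses, rng=None):
    """Assign x_1, ..., x_n one after the other according to the true marginals."""
    rng = rng or random.Random()
    fixed = {}
    for t in range(n):
        sols = solutions(n, clauses, fixed)          # non-empty if the input is satisfiable
        pi = sum(s[t] for s in sols) / len(sols)     # true marginal of the next variable
        fixed[t] = 1 if rng.random() < pi else 0
    return [fixed[t] for t in range(n)]


# A satisfiable 3-XORSAT instance on 4 variables (0-indexed).
formula = [((0, 1, 2), 1), ((1, 2, 3), 0)]
print(decimation_process(4, formula, random.Random(1)))
```

Because a value is only ever chosen when a positive fraction of the remaining solutions takes it, the output always satisfies a satisfiable input formula.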

Remark 3.2.3. If the ‘BP approximations’ are correct, the decimation process and BPGD are identical. The key question is therefore for which parameter regimes the two processes coincide or diverge.

In Chapter 6 we will examine in detail the connection between BPGD and the decimation process, their phase transitions, and the performance of BPGD. There we determine the exact parameter regimes in which BPGD succeeds or fails, which verifies mathematically the heuristic work of Ricci-Tersenghi and Semerjian [96].

3.2.3 Unit Clause Propagation

In this thesis, we analyze two variants of Unit Clause Propagation, one for random 2-SAT [23] and

another one for random k-XORSAT model [22].

Employed by all modern SAT solvers as a subroutine, Unit Clause Propagation is a linear-time algorithm that tracks the implications of a partial assignment. Recall that the 2-SAT problem is in P. The polynomial-time algorithm that solves it is a sequential assignment procedure: at each step one variable is assigned a value, and the procedure ends either when all variables are assigned and the resulting assignment satisfies the formula, or when the formula has been proven UNSAT.

At each step, once a variable is assigned, the initial CNF-formula can be simplified according to a

reduction process. Suppose we set a variable i to xi = 1 (the case xi = 0 is symmetric). Each clause

containing the literal xi is satisfied by this assignment and can be removed from the formula. On the

other hand, clauses containing the opposite literal ¬xi cannot be satisfied by this assignment; thus,

the literal ¬xi is removed from those clauses, reducing their length by one. As a consequence, this

reduction may produce a 0-clause: for instance, if in the original formula F there was already a unit

clause c =¬xi , then setting xi = 1 immediately creates a contradiction, and F |xi = 1 is UNSAT. In that

case, backtracking is required: one must undo the assignment xi = 1 and instead try xi = 0 in order to

proceed.

During this process, whenever the simplified formula contains a unit clause, that variable is forced

to take the unique value satisfying it. This assignment may generate new unit clauses, which in turn

must also be satisfied, and so on. This cascading sequence of forced steps is known as Unit Clause

Propagation (UCP). However, for worst-case 2-SAT instances, UCP alone does not guarantee success: it

must be combined with backtracking to systematically explore assignments when contradictions arise.



Let’s consider a 2-CNF formula F along with a set L of literals. These literals are deemed to be

‘true’. The algorithm then pursues direct logical implications, thereby identifying additional ‘implied’

literals that need to be true so that no clause gets violated. This procedure is outlined in Steps 1–2

of Algorithm 3.4; the outcome of Steps 1–2 is independent of the order in which literals/clauses are

processed.

Input: A 2-CNF Φ along with a set L of literals deemed true.

1 while there exists a clause a ≡ l ∨ ¬l′ with l′ ∈ L and l ∉ L do
2     add literal l to L;
3 For variables x ∈ V(Φ) such that x ∈ L or ¬x ∈ L let
      σx = 1 if x ∈ L and ¬x ∉ L,
      σx = −1 if ¬x ∈ L and x ∉ L,
      σx = 0 otherwise.
  Let C be the set of all clauses a such that σx = 0 for all x ∈ ∂a and return L, C, σ;

Algorithm 3.4: Pessimistic Unit Clause Propagation (‘PUC’) (Section 2.4, [23]).

Clearly, trouble occurs if PUC ends up placing both a literal l and its negation ¬l into the set L .

Our ‘pessimistic’ Unit Clause variant makes no attempt at mitigating such contradictions. Instead,

Step 3 just constructs a partial assignment where all conflicting literals are set to a dummy value zero.

In addition to this, PUC identifies the set C of conflict clauses that contain conflicted variables only.
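The three steps of PUC can be sketched in a few lines. In this illustration literals are encoded as non-zero integers (+v for a variable, −v for its negation) and a 2-clause is a pair of literals; this encoding and the names `puc` and `pruned_formula` are assumptions of the sketch, not notation from [23]. Variables that never enter L receive no value and are therefore not treated as conflicted.

```python
def puc(clauses, initial):
    """Pessimistic Unit Clause Propagation on a 2-CNF.

    Returns (L, conflict_clauses, sigma) where sigma maps each touched variable
    to 1, -1, or the dummy value 0 if both of its literals ended up in L.
    """
    L = set(initial)
    changed = True
    while changed:                       # Steps 1-2: pursue direct implications
        changed = False
        for u, v in clauses:
            for lit, other in ((u, v), (v, u)):
                # `other` is false once its negation is deemed true, forcing `lit`
                if -other in L and lit not in L:
                    L.add(lit)
                    changed = True
    sigma = {}
    for lit in L:                        # Step 3: partial assignment with dummy value 0
        x = abs(lit)
        if x in L and -x not in L:
            sigma[x] = 1
        elif -x in L and x not in L:
            sigma[x] = -1
        else:
            sigma[x] = 0
    conflict = [c for c in clauses if all(sigma.get(abs(l)) == 0 for l in c)]
    return L, conflict, sigma


def pruned_formula(clauses):
    """Remove every clause that is a conflict clause for some start literal."""
    variables = {abs(l) for c in clauses for l in c}
    bad = set()
    for x in variables:
        for lit in (x, -x):
            bad.update(puc(clauses, {lit})[1])
    return [c for c in clauses if c not in bad]


# An unsatisfiable 2-CNF: all four sign patterns on the variables 1 and 2.
unsat = [(1, 2), (1, -2), (-1, 2), (-1, -2)]
print(puc(unsat, {1}))
print(pruned_formula(unsat))
```

In this example every clause is a conflict clause, so pruning leaves the empty formula.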

Now consider a 2-CNF F on a set of variables V(F). For each possible literal l ∈ {x, ¬x : x ∈ V(F)} we run PUC(F, L = {l}). Let C(F, {l}) be the set of conflict clauses returned by PUC. Obtain the pruned formula F̂ from F by removing all clauses in $C(F) = \bigcup_{l} C(F, \{l\})$. Then it is easy to verify the following fact:

Fact 3.2.4. For any 2-CNF F the pruned 2-CNF F̂ is satisfiable.

Remark 3.2.5. The pruned formula F̂ could have far fewer clauses than the original formula F . Accord-

ingly, even if F is satisfiable the number Z (F̂ ) of satisfying assignments of F̂ could dramatically exceed

Z (F ).

As UCP returns all the assignments that were forced by the presence of unit clauses, there are three possible outputs of this process:

• The output is an assignment of all the variables in V. Then F is SAT, which in turn implies that the assignment produced is a solution of F.

• The simplified formula obtained does not contain unit clauses. Then the output is a partial assignment which can be extended to a complete satisfying assignment if and only if the input formula F is SAT.

• The output is UNSAT, which in turn implies that the input formula F is UNSAT.



Turning to the analysis of UCP on random k-XORSAT: due to Fact 3.2.2, the BPGD algorithm effectively reduces to UCP, a purely combinatorial algorithm [66, 96]. It works exactly as discussed before, attempting to assign random values to as yet unassigned variables one after the other. After each such random assignment the algorithm pursues the ‘obvious’ implications of its decisions. Specifically, the algorithm substitutes its chosen truth values for all occurrences of the already assigned variables. If this leaves a ‘unit clause’, the algorithm assigns that variable so as to satisfy the unit clause. If a conflict occurs because two unit clauses impose opposing values on a variable, the algorithm declares that a conflict has occurred, sets the variable to false and continues; of course, in the event of a conflict the algorithm will ultimately fail to produce a satisfying assignment. The pseudocode for the algorithm is displayed in Algorithm 3.5.

1 Let U = ∅ and let σUC : U → {0,1} be the empty assignment;
2 for t = 0, . . . , n−1 do
3     if xt+1 ∉ U then
4         add xt+1 to U;
5         choose σUC(xt+1) ∈ {0,1} uniformly at random;
6     while F[σUC] contains a unit clause a do
7         let x be the variable in a;
8         let s ∈ {0,1} be the truth value that x needs to take to satisfy a;
9         if another unit clause a′ exists that requires x be set to 1−s then
10            output ‘conflict’ and let σUC(x) = 0;
11        else
12            add x to U and let σUC(x) = s;
13 return σUC;

Algorithm 3.5: The UCP algorithm for random k-XORSAT instance F (Section 6.1, [22]).
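A direct, unoptimized Python rendering of the UCP procedure may help to see how the forced assignments cascade. The sketch makes several assumptions that go beyond the pseudocode: clauses are encoded as `(variables, parity)` pairs over 0-indexed variables, unit clauses are found by rescanning the whole formula (so the running time is not linear), and a conflicted variable is also marked as assigned so that the loop terminates. The function name `ucp_xorsat` is hypothetical.

```python
import random


def ucp_xorsat(n, clauses, rng=None):
    """Unit Clause Propagation on an XORSAT formula.

    A clause (vs, b) demands that the XOR of the variables in `vs` equals b.
    Returns the assignment as a dict and a flag telling whether a conflict occurred.
    """
    rng = rng or random.Random()
    sigma = {}
    conflict = False

    def unit_clauses():
        """List (variable, forced value) for every clause with one unassigned variable."""
        units = []
        for vs, b in clauses:
            free = [v for v in vs if v not in sigma]
            if len(free) == 1:
                s = b
                for v in vs:
                    if v in sigma:
                        s ^= sigma[v]
                units.append((free[0], s))
        return units

    for x in range(n):
        if x not in sigma:
            sigma[x] = rng.randrange(2)      # free step: uniformly random value
        while True:
            units = unit_clauses()
            if not units:
                break
            v, s = units[0]
            if any(u == v and r != s for u, r in units):
                conflict = True              # two unit clauses disagree on v
                sigma[v] = 0
            else:
                sigma[v] = s                 # forced step
    return sigma, conflict


formula = [((0, 1), 1), ((1, 2), 0)]
print(ucp_xorsat(3, formula, random.Random(0)))
```

In the example only the first variable is chosen at random; the other two values are forced by unit clauses.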

Let $F_{\mathrm{UC},t}$ denote the simplified random k-XORSAT formula obtained after the first t iterations (in which the truth values chosen for x1, . . . , xt and any values implied by unit clauses have been substituted). We notice that the values assigned during Steps 6–12 are deterministic consequences of the choices in Step 5. In particular, the order in which unit clauses are processed in Steps 6–12 does not affect the output of the algorithm. In Chapter 6 we will study in detail how the performance of the UCP algorithm relates to that of the previously discussed BPGD algorithm on random k-XORSAT.

3.2.4 Pure Literal Pursuit

The third result of this thesis [21], a lower bound on the Gibbs uniqueness threshold of random k-SAT, relies on an algorithm called Pure Literal Pursuit (‘PULP’). Its purpose is to trace the repercussions of setting a relatively small number of variables to specific truth values. This will allow us to compare the number of satisfying assignments that set a few chosen variables to specific values to the total number of satisfying assignments. Suppose we are given a k-CNF F and a set L of literals of F that we deem to be set to ‘true’. We would like to identify a superset L̄ ⊇ L of literals (L̄ is a ‘closure’ of L) with the following properties:



PULP1 every clause a that contains a literal from ¬L̄ = {¬l : l ∈ L̄ } also contains a literal from L̄ .

PULP2 there is no literal l such that l ,¬l ∈ L̄ .

It may be impossible to satisfy PULP1 and PULP2 simultaneously. In this case we ask PULP to report a

‘contradiction’. But if PULP1–PULP2 can be satisfied, we aim to find a closure L̄ of as small size |L̄ | as

possible.

The combinatorial idea behind PULP1–PULP2 is as follows. Deeming the literals from the initial

set L ‘true’, our goal is to reconcile this assumption with the formula F . To this end we enhance the

set L . Clearly, any clause that contains the negation ¬l of a literal l that we deem true also needs

to contain another literal l ′ that is set to true. This is what PULP1 asks. Furthermore, it would be

contradictory to deem both l and its negation ¬l true; this is PULP2.

In order to identify a ‘small’ closure L̄ the PULP algorithm resorts to pure literal elimination. We say that a variable x is pure in a CNF formula F if sign(x, a) = sign(x, b) for any two clauses a, b ∈ ∂x. Clearly, if our objective is to construct a satisfying assignment, we might as well set all pure variables x to the value that satisfies all clauses a ∈ ∂x and disregard these clauses henceforth. In light of this observation, pure literal elimination repeatedly removes all clauses that contain a pure variable. Naturally, every round of clause removals may create new pure variables, and thus more clauses may be ripe for removal in the next round. For a clause a of the original formula F let ha(F) ≥ 1 be the number of the round at which pure literal elimination removes a. If a is never removed then we set ha(F) = ∞.
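The round numbers ha(F) can be computed by simulating pure literal elimination round by round. The following sketch assumes a signed-integer encoding of literals (+v, −v) and clauses given as tuples; the function name `pure_literal_heights` is hypothetical.

```python
import math


def pure_literal_heights(clauses):
    """Return, for each clause, the round in which pure literal elimination removes it.

    Clauses that are never removed receive the value math.inf.
    """
    height = [math.inf] * len(clauses)
    alive = set(range(len(clauses)))
    rnd = 0
    while alive:
        rnd += 1
        signs = {}
        for i in alive:                      # collect the signs of each variable
            for lit in clauses[i]:
                signs.setdefault(abs(lit), set()).add(lit > 0)
        pure = {x for x, s in signs.items() if len(s) == 1}
        removed = {i for i in alive if any(abs(l) in pure for l in clauses[i])}
        if not removed:
            break                            # no pure variable left: the rest stays forever
        for i in removed:
            height[i] = rnd
        alive -= removed
    return height


# Variable 3 is pure at the start; variable 1 becomes pure after the first round.
print(pure_literal_heights([(1, 3), (-1, 2), (-1, -2)]))
```

In the example the first clause disappears in round 1 and thereby turns variable 1 into a pure variable, so the other two clauses are removed in round 2.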

The PULP algorithm invokes a slightly modified version of pure literal elimination to accommodate the initial set L of literals. Specifically, for a variable x of a CNF F and s ∈ {±1} let F[x ↦ s] be the CNF obtained by removing all clauses a ∈ ∂x with sign(x, a) = s and removing the literal −s · x from all a ∈ ∂x with sign(x, a) = −s. The definition reflects that if we set x to the value s, all $a \in \partial^{s}x$ will be satisfied, while all $a \in \partial^{-s}x$ will have to be satisfied by one of their other constituent literals. Further, let

\[
h_x(s,F) =
\begin{cases}
0 & \text{if } \partial^{-s}_F x = \emptyset,\\
\max\big\{h_a(F[x\mapsto s]) : a\in\partial^{-s}_F x\big\} & \text{otherwise}
\end{cases}
\ \in [0,\infty]. \tag{3.2.3}
\]

We refer to hx(s, F) as the height of literal s · x in F.

The PULP algorithm, displayed as Algorithm 3.6, harnesses the heights as follows. In its attempt to

precipitate PULP1 and PULP2 the algorithm iteratively enhances the set L of literals deemed to be

‘true’. For any clause a that violates PULP1 and that contains a literal l ̸∈ ¬L the algorithm adds one

such literal l of minimum height to L . This choice is intended to keep the ultimate size of the closure

small; one could say that PULP uses height as a proxy of ‘size’. If at any point the algorithm encounters

a clause a that consists of literals from ¬L only, the algorithm reports a contradiction and aborts.

Remark 3.2.6. To break ties that may occur in the execution of Steps 3 and 7 of PULP we assume that the variables and clauses of F are numbered so that Steps 3 and 7 can choose the clause/variable with the smallest number that satisfies the respective requirements. In due course we will run PULP on (finite subtrees of) the Galton-Watson tree T. To number the variables and clauses of T we equip each of them with an independent Gaussian label. Since T comprises a countable number of clauses/variables, these labels will almost surely be pairwise distinct.

Input: A k-CNF F and a set L of literals.

1 Let L̄ = L;
2 while there is a clause a that contains a literal from ¬L̄ but no literal from L̄ do
3     Pick such a clause a that minimizes the distance from the initial set L = {|l| : l ∈ L};
4     if a consists of literals l ∈ ¬L̄ only then
5         return ‘contradiction’ and halt;
6     else
7         choose x ∈ ∂a with x, ¬x ∉ L̄ that minimizes hx(sign(x, a), F) and add sign(x, a) · x to L̄
8 return L̄

Algorithm 3.6: The PULP algorithm (Section 2.3, [21])

A brief analysis of this algorithm can be found in Chapter 7.

3.3 Warning Propagation

Warning Propagation is a purely combinatorial message passing algorithm from the same family as Belief Propagation. For some graph-based matrix and constraint satisfaction models, the ‘discrete’ version of Belief Propagation (where the messages come from a finite alphabet Ω instead of being probability distributions) is called Warning Propagation; it helps to trace the direct implications of recursive processes associated with graphs. For a graph G let M(G) be the set of all vectors $(\omega_{u\to v})_{(u,v)\in V(G)^2:\{u,v\}\in E(G)} \in \Omega^{2|E(G)|}$. As in BP, all messages are updated in parallel according to a fixed rule. For d ∈ N let $\binom{\Omega}{d}$ denote the set of all d-ary multisets with elements from Ω. The update rule is a map

\[
\varphi : \bigcup_{d\ge 0}\binom{\Omega}{d} \to \Omega \tag{3.3.1}
\]

which takes any multiset of input messages and produces an output message. The corresponding Warning Propagation operator is defined as

\[
\mathrm{WP}_G : M(G)\to M(G),\qquad
\omega = (\omega_{u\to v})_{uv} \mapsto \Big(\varphi\big(\{\omega_{w\to u} : uw\in E(G),\ w\ne v\}\big)\Big)_{uv}.
\]

That is, the message from node u to node v is updated by applying the WP update rule to the multiset of messages that u receives from all of its neighbors except v. In the factor graph setting, WP, like BP, associates two message sequences (ωF,x→a, ωF,a→x) with every adjacent clause/variable pair. In this thesis we provide a detailed analysis of Warning Propagation when we analyze the performance of Belief Propagation Guided Decimation (‘BPGD’) on the random k-XORSAT model (more details can be found in Chapter 6).

Due to the half-integrality of the BP messages and marginals in random k-XORSAT, BP is equivalent to Warning Propagation. The messages take one of three possible discrete values {f, u, n} (‘frozen’, ‘uniform’, ‘null’). To trace the BP messages, only the two values {n, u} would be necessary. However, the third value f will prove useful in order to compare the BP approximations (computed by the BPGD algorithm discussed in Section 3.2.1) with the actual marginals (computed by the decimation process discussed in Section 3.2.2). Although the BP messages are initially uniform, i.e.,

\[
\mu_{F,x\to a,0}(s) = \mu_{F,a\to x,0}(s) = 1/2 \qquad (s\in\{0,1\}),
\]

we launch WP from all-frozen start values:

\[
\omega_{F,x\to a,0} = \omega_{F,a\to x,0} = \mathrm{f} \qquad \text{for all } a, x. \tag{3.3.2}
\]

Subsequently the messages get updated according to the rules:

Figure 3.2: Up: A local snapshot of Warning Propagation update rules for message νF,a→x,ℓ defined in (3.3.3). Down: Similarly, a local snapshot of Warning Propagation update rules for message νF,x→a,ℓ defined in (3.3.4)



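\[
\omega_{F,a\to x,\ell+1} =
\begin{cases}
\mathrm{n} & \text{if } \omega_{F,y\to a,\ell} = \mathrm{n} \text{ for all } y\in\partial a\setminus\{x\},\\
\mathrm{f} & \text{if } \omega_{F,y\to a,\ell} \ne \mathrm{u} \text{ for all } y\in\partial a\setminus\{x\} \text{ and } \omega_{F,y\to a,\ell}\ne \mathrm{n} \text{ for at least one } y\in\partial a\setminus\{x\},\\
\mathrm{u} & \text{otherwise,}
\end{cases}
\tag{3.3.3}
\]

\[
\omega_{F,x\to a,\ell+1} =
\begin{cases}
\mathrm{n} & \text{if } \omega_{F,b\to x,\ell} = \mathrm{n} \text{ for at least one } b\in\partial x\setminus\{a\},\\
\mathrm{f} & \text{if } \omega_{F,b\to x,\ell} \ne \mathrm{n} \text{ for all } b\in\partial x\setminus\{a\} \text{ and } \omega_{F,b\to x,\ell} = \mathrm{f} \text{ for at least one } b\in\partial x\setminus\{a\},\\
\mathrm{u} & \text{otherwise.}
\end{cases}
\tag{3.3.4}
\]

In addition to the messages we also define the mark of variable node x by letting

\[
\omega_{F,x,\ell} =
\begin{cases}
\mathrm{n} & \text{if } \omega_{F,b\to x,\ell} = \mathrm{n} \text{ for at least one } b\in\partial x,\\
\mathrm{f} & \text{if } \omega_{F,b\to x,\ell} \ne \mathrm{n} \text{ for all } b\in\partial x \text{ and } \omega_{F,b\to x,\ell} = \mathrm{f} \text{ for at least one } b\in\partial x,\\
\mathrm{u} & \text{otherwise.}
\end{cases}
\tag{3.3.5}
\]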
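The case distinctions (3.3.3)–(3.3.5) translate directly into three small functions. In the sketch below the messages are the one-character strings 'n', 'f' and 'u'; the function names are assumptions made for this illustration. An empty list of incoming messages is handled by the usual conventions for quantifiers over the empty set.

```python
def clause_to_variable(incoming):
    """Rule (3.3.3): `incoming` lists the messages from all y in the clause other than x."""
    if all(m == 'n' for m in incoming):
        return 'n'
    if all(m != 'u' for m in incoming):
        return 'f'      # no 'u' present, and at least one message differs from 'n'
    return 'u'


def variable_to_clause(incoming):
    """Rule (3.3.4): `incoming` lists the messages from all clauses b at x other than a."""
    if any(m == 'n' for m in incoming):
        return 'n'
    if any(m == 'f' for m in incoming):
        return 'f'
    return 'u'


def mark(incoming):
    """Rule (3.3.5): same case distinction as (3.3.4), but over all clauses b at x."""
    return variable_to_clause(incoming)


print(clause_to_variable(['n', 'f']), variable_to_clause(['f', 'u']), mark(['u', 'u']))
```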

We conclude the chapter by establishing a relationship between BP and WP.

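Fact 3.3.1. For all ℓ ≥ 0 and all x, a we have

\[
\nu_{x\to a,\ell}(1) = 1/2 \ \Leftrightarrow\ \omega_{F,x\to a,\ell} \ne \mathrm{n}, \tag{3.3.6}
\]

\[
\nu_{a\to x,\ell}(1) = 1/2 \ \Leftrightarrow\ \omega_{F,a\to x,\ell} \ne \mathrm{n}, \tag{3.3.7}
\]

\[
\nu_{x,\ell}(1) = 1/2 \ \Leftrightarrow\ \omega_{F,x,\ell} \ne \mathrm{n}. \tag{3.3.8}
\]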

In the next chapter we will have a detailed overview of the phase transitions of different random

satisfiability problems.


4
Phase Transitions in random CSPs

“Recent research indicates that many convex optimization problems

with random constraints exhibit a phase transition as the number of

constraints increases..”

–Dennis Amelunxen et al.

In this chapter we provide a detailed account of the phase transitions that occur in random constraint satisfaction problems as the clause density α is varied. We focus on two random CSP models, random k-SAT and random k-XORSAT, which are the main theme of this thesis; many other random CSP ensembles share the same qualitative behavior.

4.1 The Satisfiability Transition

Recall that the satisfiability threshold αsat(k) separates a phase α < αsat(k), where random instances are SAT w.h.p., from a phase α > αsat(k), where random instances are UNSAT w.h.p. One of the best-known techniques for estimating the satisfiability threshold, discussed in [15, 59, 65, 72, 79], is the cavity method. However, the existence of the satisfiability transition has not yet been proven for all values of k. The following conjecture summarizes the situation.

Conjecture 4.1.1. Let F = F(k, α) be a random CNF formula with n variables and m = αn clauses, where α is the clause density, drawn from the random k-SAT ensemble. Then for any k ≥ 2 there exists a constant αsat(k) such that for any ε > 0,

\[
\lim_{n\to\infty}\mathbb{P}[F(k,\alpha_{\mathrm{sat}}(k)-\varepsilon) \text{ is SAT}] = 1,
\qquad
\lim_{n\to\infty}\mathbb{P}[F(k,\alpha_{\mathrm{sat}}(k)+\varepsilon) \text{ is SAT}] = 0.
\]

This conjecture was proved for k = 2, with αsat = 1, by Chvátal and Reed in 1992 [26] and by Goerdt in 1996 [53]. In 2015, Ding, Sly and Sun proved the satisfiability conjecture for all sufficiently large but finite k and showed that the value of αsat(k) is given by the one-step replica symmetry breaking cavity method prediction. For general k, Friedgut [47] established the following partial result in 1999:

Theorem 4.1.2. For every k ≥ 2, there exists a sequence αk(n) such that for all ε > 0,

\[
\lim_{n\to\infty}\mathbb{P}[F(k,\alpha_k(n)-\varepsilon) \text{ is SAT}] = 1,
\qquad
\lim_{n\to\infty}\mathbb{P}[F(k,\alpha_k(n)+\varepsilon) \text{ is SAT}] = 0.
\]

The above theorem shows that the transition from SAT to UNSAT takes place in a window smaller than any fixed ε for large enough n. However, it remains to prove that the sequence αk(n) converges to some value αsat(k) as n → ∞, which would rule out possible oscillations. There are several methods for deriving rigorous upper and lower bounds on the sequence αk(n). The best-known way to derive an upper bound is to apply Markov’s inequality to the number of satisfying assignments (the first moment bound), whereas lower bounds are obtained by applying Chebyshev’s inequality to the number of solutions (the second moment method), which is more delicate to implement than the first moment bound.

More precisely, define a function U(F) on the set of instances such that

\[
U(F)
\begin{cases}
= 0, & \text{if } F \text{ is UNSAT},\\
\ge 1, & \text{otherwise.}
\end{cases}
\]

Then Markov’s inequality yields

\[
\mathbb{P}[F \text{ is SAT}] \le \mathbb{E}[U(F)].
\]

As we do not know how to compute the expectation of U(F) = 1[F is SAT], the first choice is to use U(F) = Z(F), the number of solutions of F. Since a fixed assignment σ satisfies each of the m independent, uniformly generated clauses with probability $1-2^{-k}$, linearity of expectation gives

\[
\mathbb{E}[Z(F)] = 2^n(1-2^{-k})^m = \exp\big[n(\log 2 + \alpha\log(1-2^{-k}))\big].
\]

As n → ∞ we get

\[
\mathbb{E}[Z(F)] \to
\begin{cases}
0, & \text{if } \alpha > \alpha_u(k),\\
+\infty, & \text{if } \alpha < \alpha_u(k),
\end{cases}
\]

with $\alpha_u(k) = -\frac{\log 2}{\log(1-2^{-k})}$. Because the number of satisfying assignments Z(F) can take exponentially large values as n → ∞, and its fluctuations can be exponentially large as well, one expects that the upper bound αu is not tight. In 1998, Kirousis et al. [57] defined U(F) as the number of locally maximal satisfying assignments, which counts only a small subclass of the solutions. They thus obtained a tighter upper bound α̂u(k), namely the positive solution of the equation

\[
\alpha\log(1-2^{-k}) + \log\big(2-\exp(-k\alpha/(2^k-1))\big) = 0.
\]
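To get a feeling for the numbers, the two upper bounds can be evaluated numerically. The sketch below assumes the forms $\alpha_u(k) = -\log 2/\log(1-2^{-k})$ for the first moment bound and $\alpha\log(1-2^{-k}) + \log(2-\exp(-k\alpha/(2^k-1))) = 0$ for the improved bound, and solves the latter by bisection on its positive root. The function names and the bisection bracket are choices made for this illustration.

```python
import math


def first_moment_bound(k):
    """Clause density above which the expected number of solutions tends to zero."""
    return -math.log(2) / math.log(1 - 2 ** (-k))


def improved_bound(k, tol=1e-12):
    """Positive root of g(alpha) = alpha*log(1-2^-k) + log(2 - exp(-k*alpha/(2^k-1)))."""
    def g(alpha):
        return (alpha * math.log(1 - 2 ** (-k))
                + math.log(2 - math.exp(-k * alpha / (2 ** k - 1))))

    # g vanishes at 0, is positive just above 0 and negative at the first moment bound,
    # so the positive root lies in between.
    lo, hi = 1e-6, first_moment_bound(k)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for k in (3, 4, 5):
    print(k, first_moment_bound(k), improved_bound(k))
```

For k = 3 the first moment bound is roughly 5.19, and the improved bound is visibly smaller.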

Turning to lower bounds on αk(n), in 2006 Achlioptas and Moore [4] first introduced a method for obtaining a lower bound via the second moment method. Applying the second moment method to the function U(F) we get

\[
\mathbb{P}[F \text{ is SAT}] \ge \frac{\mathbb{E}[U(F)]^2}{\mathbb{E}[U(F)^2]}.
\]

It can be shown that applying this method to the number of solutions Z(F) does not provide a useful bound, because the ratio $\mathbb{E}[Z(F)]^2/\mathbb{E}[Z(F)^2]$ vanishes for any non-zero value of α. Instead one can choose U to be the size of a suitable subset of the set of solutions. Using this, Achlioptas and Moore [4] showed that for any k ≥ 3,

\[
\lim_{n\to\infty}\mathbb{P}[F(k,\alpha) \text{ is SAT}] = 1 \quad \text{if } \alpha \le 2^{k-1}\log 2 - 2.
\]

Together with the upper bound, this lower bound yields the scaling $\alpha_{\mathrm{sat}}(k) = \Theta(2^k)$. In the next section we briefly discuss the ‘quenched’ and ‘annealed’ techniques, which are very useful in the analysis of the satisfiability threshold of random CSPs.

4.2 Quenched and Annealed Techniques

Let us consider a graphical model G and its corresponding measure µG(F). The support of the measure is the set of solutions of a given instance F:

\[
\mu_{G(F)}(\sigma)
\begin{cases}
= 0, & \text{if } \sigma \text{ is not a solution},\\
> 0, & \text{otherwise.}
\end{cases}
\]

To account for assignments σ that are not solutions, i.e., µG(F)(σ) = 0, we can introduce the inverse temperature parameter β and the normalized free entropy (defined in (2.2.7)) as follows:

\[
\Phi_G = \Phi(G(F),\beta) = \frac{1}{n}\log Z(G(F),\beta). \tag{4.2.1}
\]

Now when the instance is drawn at random (for instance from the random k-SAT ensemble), the measure µG(F) becomes random. Determining the typical properties of the measure µG(F) and of the normalized free entropy density ΦG is the key step in estimating the partition function, which counts the number of solutions of a random satisfiability problem. To this end, the quenched free entropy density is defined over the random ensemble of instances:

\[
\Phi_{\mathrm{que}}(G) = \lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\log Z(G(F))]. \tag{4.2.2}
\]

Usually, this quantity cannot be evaluated exactly on random satisfiability problems, but it can be estimated using the non-rigorous cavity method, which we will discuss later in this chapter. A natural upper bound on this quantity is provided by the normalized annealed free entropy density

\[
\Phi_{\mathrm{ann}}(G) = \lim_{n\to\infty}\frac{1}{n}\log\mathbb{E}[Z(G(F))]. \tag{4.2.3}
\]

Indeed, Jensen’s inequality applied to the number of solutions Z(G(F)) yields

\[
\Phi_{\mathrm{ann}}(G) \ge \Phi_{\mathrm{que}}(G). \tag{4.2.4}
\]
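The gap between the annealed and the quenched quantity can be observed on tiny instances. The sketch below is a finite-size illustration only, not an estimate of the limits (4.2.2)–(4.2.3): it works at a finite inverse temperature β so that the partition function is always positive, replaces the expectations by averages over a few sampled formulas, and computes each partition function by brute force. All names and parameter values are assumptions made for this example.

```python
import itertools
import math
import random


def random_ksat(n, m, k, rng):
    """m clauses; each picks k distinct variables and independent uniform signs.

    A clause is a list of (variable index, sign); sign True means a positive literal.
    """
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(n), k)]
            for _ in range(m)]


def partition_function(n, formula, beta):
    """Sum of exp(-beta * #violated clauses) over all 2^n assignments."""
    z = 0.0
    for sigma in itertools.product((False, True), repeat=n):
        violated = sum(1 for clause in formula
                       if not any(sigma[v] == sign for v, sign in clause))
        z += math.exp(-beta * violated)
    return z


def annealed_and_quenched(n, m, k, beta, samples, seed=0):
    """Empirical (1/n) log mean(Z) and (1/n) mean(log Z) over sampled formulas."""
    rng = random.Random(seed)
    zs = [partition_function(n, random_ksat(n, m, k, rng), beta)
          for _ in range(samples)]
    annealed = math.log(sum(zs) / samples) / n
    quenched = sum(math.log(z) for z in zs) / samples / n
    return annealed, quenched


print(annealed_and_quenched(n=8, m=30, k=3, beta=2.0, samples=20))
```

By Jensen’s inequality the first number is never smaller than the second, whatever sample is drawn.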

Later in this chapter, when we discuss the replica symmetry trick, we will return to the ‘quenched free energy’. In the second result of this thesis, we harness ‘quenched’ arguments, partly developed in prior work [12, 29] on the rank of random matrices over finite fields, to establish a precise connection between the decimation process and the performance of BPGD [22].

4.3 Gibbs measure and Long range correlation

The effectiveness of Belief Propagation relies on the basic assumption that the variable nodes adjacent to a check node become weakly correlated with respect to the resulting distribution when that check node is pruned from the factor graph. But when the factor graph contains short loops, or when variables are correlated over long distances, this hypothesis may break down [66]. Hence, in factor graphs with a locally tree-like structure, long-range correlations are responsible for the failure of BP. A phase transition thus occurs with the emergence of such long-range correlations, separating a ‘weakly correlated’ from a ‘highly correlated’ phase. The central tool for studying random CSPs such as random k-SAT and random k-XORSAT through the lens of statistical mechanics is the ‘Gibbs measure’, which encodes the uniform distribution over the set of solutions. It thus provides insight into correlation decay, algorithmic tractability and various phase transitions [38]. This section develops the framework of Gibbs measures on sparse random structures [43] by introducing correlation decay and the Gibbs uniqueness property, followed by the landscape of phase transitions: replica symmetry, one-step replica symmetry breaking (1-RSB cavity method) and the reconstruction/non-reconstruction properties on trees.

4.3.1 Gibbs measure on random CSPs

Recall the Boltzmann (Gibbs) distribution from Chapter 2. Let F be an instance of a random CSP with variable set V = {x1, x2, · · · , xn} and constraint set C = {c1, c2, · · · , cm}. An assignment σ ∈ {0,1}^n is a solution (satisfying assignment) if all the constraints are satisfied. The associated Gibbs measure at inverse temperature β ≥ 0 is defined as

\[
\mu_\beta(\sigma) = \frac{1}{Z_\beta(F)}\exp(-\beta H(\sigma)),
\]

where H(σ) is the number of constraints violated by the assignment σ, and the partition function Zβ(F) is given by

\[
Z_\beta(F) = \sum_{\tau\in\{0,1\}^n}\exp(-\beta H(\tau)).
\]

Remark 4.3.1.

• At β=∞, the Gibbs measure µβ(F ) is the uniform distribution over all satisfying assignments.

• At finite β, µβ(F ) interpolates between uniform randomness and strong bias toward solutions.

Thus the Gibbs measure encodes the solution space geometry and serves as a basis for analyzing

the correlations between variables.

4.3.2 Correlation decay and Gibbs Uniqueness

In statistical mechanics, physical systems that have only short-range correlations should relax rapidly to their equilibrium distribution. The reason lies in how the degrees of freedom interact. If the degrees of freedom are independent, then the system relaxes on microscopic time scales (namely the relaxation time of a single particle, spin, etc.), whereas if they are not independent but their correlations are short-ranged, they can be coarse-grained in such a way that they become nearly independent. For two variables xi, xj ∈ V, define their correlation under the Gibbs measure µβ [38, 40] by

\[
\mathrm{Corr}_{\mu_\beta}(x_i,x_j) = \mathbb{E}_{\mu_\beta}[x_i x_j] - \mathbb{E}_{\mu_\beta}[x_i]\,\mathbb{E}_{\mu_\beta}[x_j].
\]

More generally, we need a measure of how much the joint distribution µij(·, ·) of xi and xj differs from the product µi(·)µj(·) of their marginals. This leads to the two-point correlation [66], defined by averaging the variation distance ||µij(·, ·) − µi(·)µj(·)|| over the vertices i, j:

\[
\mathrm{Corr}^{(2)} \equiv \frac{1}{n}\sum_{i,j\in V}\big\|\mu_{ij}(\cdot,\cdot)-\mu_i(\cdot)\mu_j(\cdot)\big\|.
\]

Remark 4.3.2.

• Correlation decay occurs if correlations Corrµβ(xi , x j ) vanish with the distance between two nodes

i , j ∈V in the graph, i.e.,

Corrµβ(xi , x j ) → 0 as dist(i , j ) →∞



• The absence of correlation decay indicates long-range dependencies, which often signal clustering or symmetry breaking.

These phenomena are linked to the uniqueness of the Gibbs measure on the infinite tree that arises as the local weak limit [21, 40].
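As a small illustration of the correlation Corr(x_i, x_j) and its decay with distance, the sketch below evaluates it exactly under the uniform measure on the solutions of a short chain of clauses. This is a toy example under stated assumptions, not a random CSP: the chain formula `CHAIN`, the literal encoding `(i, b)` and the helper names are hypothetical.

```python
import itertools

def solutions(clauses, n):
    """All 0/1 assignments satisfying every clause; literal (i, b) means x_i = b."""
    return [s for s in itertools.product((0, 1), repeat=n)
            if all(any(s[i] == b for i, b in c) for c in clauses)]

def covariance(sols, i, j):
    """Corr(x_i, x_j) = E[x_i x_j] - E[x_i] E[x_j] under the uniform measure on sols."""
    m = len(sols)
    e_i = sum(s[i] for s in sols) / m
    e_j = sum(s[j] for s in sols) / m
    e_ij = sum(s[i] * s[j] for s in sols) / m
    return e_ij - e_i * e_j

# Hypothetical chain (x0 OR x1) AND (x1 OR x2) AND (x2 OR x3).
CHAIN = [((t, 1), (t + 1, 1)) for t in range(3)]
CHAIN_SOLS = solutions(CHAIN, 4)
# Correlation between x0 and the variables at graph distance 1, 2, 3 along the chain.
CHAIN_CORR = [covariance(CHAIN_SOLS, 0, j) for j in (1, 2, 3)]
```

On this chain the correlations alternate in sign and shrink in absolute value as the distance from x0 grows, which is the qualitative behaviour described in Remark 4.3.2.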

Gibbs Uniqueness

We begin with the Galton-Watson tree T = T_{d,k}, which is generated by a two-type branching process. The two types are variable nodes and clause nodes. The process starts with a single root variable node r. The offspring of any variable node is a Po(d) number of clause nodes, while every clause node begets precisely k − 1 variable nodes. Additionally, independently for each clause node a and every variable node x that is either a child or the parent of a, a sign sign(x, a) ∈ {±1} is chosen uniformly at random. The resulting random tree T models the local structure of the random k-CNF formula F = F(n, d, k) in the sense of local weak convergence [9, 62].
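The branching process just described can be sketched in a few lines. The code below is an illustrative assumption rather than the thesis's implementation: it truncates the tree at a given depth (variable nodes up to distance 2·depth from the root), numbers variables consecutively with the root as 0, and samples Po(d) with Knuth's multiplication method so that only the standard library is needed.

```python
import math
import random

def poisson(rng, lam):
    """Sample Po(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_tree(d, k, depth, rng):
    """Sample the truncated tree with variable nodes up to distance 2*depth from the root.

    Returns (clauses, n_vars). Each clause is a list of k (variable, sign) pairs,
    the first entry being the parent variable; variable 0 is the root r.
    """
    clauses, n_vars = [], 1
    frontier = [0]
    for _level in range(depth):
        next_frontier = []
        for x in frontier:
            for _child in range(poisson(rng, d)):      # Po(d) clause children
                children = list(range(n_vars, n_vars + k - 1))
                n_vars += k - 1                         # exactly k-1 variable children
                # Independent uniform signs for the parent and all children.
                clause = [(y, rng.choice((-1, 1))) for y in [x] + children]
                clauses.append(clause)
                next_frontier.extend(children)
        frontier = next_frontier
    return clauses, n_vars
```

For example, `sample_tree(1.5, 3, 3, random.Random(1))` returns one sample of the depth-3 truncation for d = 1.5 and k = 3; the seed is arbitrary.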

For an integer ℓ ≥ 0 let T^(ℓ) be the finite tree obtained by removing all variable and clause nodes at a distance greater than 2ℓ from the root r. We identify the finite tree T^(ℓ) with a Boolean formula whose variables/clauses are precisely the variable/clause nodes of T^(ℓ). Let S(T^(ℓ)) ≠ ∅ be the set of satisfying assignments of this formula and let τ^(ℓ) ∈ S(T^(ℓ)) be a uniformly random satisfying assignment. Moreover, let ∂^{2ℓ} r be the set of variable nodes of T^(ℓ) at distance precisely 2ℓ from the root r. Then for given d, k the tree T = T_{d,k} has the Gibbs uniqueness property (see [59] for more details) if

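\[ \lim_{\ell\to\infty} E\Bigl[ \max_{\tau \in S(T^{(\ell)})} \Bigl| P\bigl[\boldsymbol{\tau}^{(\ell)}(r) = 1 \mid T\bigr] - P\bigl[\boldsymbol{\tau}^{(\ell)}(r) = 1 \mid T,\ \forall x \in \partial^{2\ell} r : \boldsymbol{\tau}^{(\ell)}(x) = \tau(x)\bigr] \Bigr| \Bigr] = 0. \tag{4.3.1} \]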

In words, in the limit of large ℓ the truth value τ^(ℓ)(r) of the root r is asymptotically independent of the truth values {τ^(ℓ)(x)}_{x ∈ ∂^{2ℓ} r} of the variables at distance 2ℓ from r. In this thesis, we explicitly derive a lower bound on the Gibbs uniqueness threshold d_uniq(k) for any k ≥ 3. The details are discussed in Chapter 7.

Remark 4.3.3.

• Uniqueness regime: correlations decay along the tree T and Belief Propagation converges to a unique fixed point.

• Non-Uniqueness regime: In the infinite tree T, suitably chosen boundary conditions can give

rise to distinct extremal Gibbs measures, commonly referred to as pure states. On the other

hand, in large finite random graphs, this phenomenon appears in the thermodynamic limit as a

decomposition of the solution space into multiple well-separated clusters, which correspond to

these pure states.

• In random k-SAT, the breakdown of Gibbs uniqueness (the point beyond which the boundary condition continues to influence the root marginal) occurs well before the clustering threshold (see Figure 4.2), whereas in random k-XORSAT the phase transition for the appearance of a linear number of frozen variables (variables whose assignments are determined by the problem structure) occurs simultaneously with the emergence of the 2-core.

[Figure: tree rooted at r with a 0/1 boundary condition σ imposed on the variables at distance 2ℓ.]

Figure 4.1: Galton-Watson tree T with Gibbs Uniqueness property

4.3.3 Replica Symmetry

The notion of replica symmetry originates from a method for studying the partition function Z. Estimating the partition function is hard because the sum runs over exponentially many terms (for k-SAT, say, there are 2^n summands). To capture the leading exponential order of the partition function, one therefore considers the quantity lim_{n→∞} (1/n) log Z. The average log-partition function (in other words, the 'quenched free energy' defined in Section 4.2) can then be expressed through the moments E[Z^ℓ], which are computed for fixed ℓ ∈ N and subsequently continued to ℓ → 0:

\[ \lim_{n\to\infty} \frac{1}{n} E[\log Z] = \lim_{n\to\infty} \lim_{\ell\to 0} \frac{1}{n\ell} \log E[Z^\ell]. \]

This technique is known as the replica trick [36, 70, 71]. The most suitable model for seeing how it works is the random energy model (REM). Although this model does not describe any realistic physical system, it is a good example for studying the concepts of replica theory. In this model there are 2^n possible configurations σ, each of which has an energy E(σ). Therefore the ℓ-th power of Z is given by

\[ Z^\ell = \sum_{i_1, i_2, \dots, i_\ell = 1}^{2^n} \exp\bigl[\beta\,(-E_{i_1} - \dots - E_{i_\ell})\bigr], \]

where β denotes the inverse temperature. This quantity can be viewed as the partition function of a new system whose configurations are the ℓ-tuples (i_1, ..., i_ℓ) with energies E_{i_1,...,i_ℓ} = E_{i_1} + ... + E_{i_ℓ}. In other words, the new system is obtained by taking ℓ independent copies of the original model, each of which is called a 'replica'. The above equation can then be written as

\[ Z^\ell = \sum_{i_1, \dots, i_\ell = 1}^{2^n} \prod_{j=1}^{2^n} \exp\Bigl[ -\beta E_j \Bigl( \sum_{a=1}^{\ell} \mathbf{1}(i_a = j) \Bigr) \Bigr]. \]

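Taking the expectation, using linearity of expectation and the fact that the energies E_j are i.i.d. Gaussians (centred with variance n/2, as in the standard REM), we get

\[ E\bigl[Z^\ell\bigr] = \sum_{i_1, \dots, i_\ell = 1}^{2^n} \exp\Bigl[ \frac{\beta^2 n}{4} \sum_{a,b=1}^{\ell} \mathbf{1}(i_a = i_b) \Bigr]. \tag{4.3.2} \]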
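For completeness, the Gaussian averaging step behind (4.3.2) can be spelled out as follows. This is a sketch under the assumption, standard for the REM but not stated explicitly above, that each E_j is a centred Gaussian with variance n/2; the shorthand k_j for the number of replicas sitting at configuration j is introduced only for this derivation.

```latex
% Assumption: E_j ~ N(0, n/2) i.i.d.; k_j := \sum_{a=1}^{\ell} \mathbf{1}(i_a = j).
\begin{align*}
E\bigl[e^{-\beta k_j E_j}\bigr]
  &= \exp\Bigl(\tfrac{1}{2}\,\beta^2 k_j^2 \cdot \tfrac{n}{2}\Bigr)
   = \exp\Bigl(\tfrac{\beta^2 n}{4}\, k_j^2\Bigr), \\
\prod_{j=1}^{2^n} E\bigl[e^{-\beta k_j E_j}\bigr]
  &= \exp\Bigl(\tfrac{\beta^2 n}{4} \sum_{j=1}^{2^n} k_j^2\Bigr), \\
\sum_{j=1}^{2^n} k_j^2
  &= \sum_{a,b=1}^{\ell} \sum_{j=1}^{2^n} \mathbf{1}(i_a = j)\,\mathbf{1}(i_b = j)
   = \sum_{a,b=1}^{\ell} \mathbf{1}(i_a = i_b).
\end{align*}
```

The first line is the moment generating function of a Gaussian, the second uses independence of the E_j, and the third rewrites the exponent in terms of replica pairs.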

Now we can express the indicator in equation (4.3.2) through the ℓ×ℓ overlap matrix Q with entries Q_ab = 1{i_a = i_b} ∈ {0,1}. Then we can rewrite the above equation as follows:

\[ E\bigl[Z^\ell\bigr] = \sum_{Q} N_n(Q) \exp\Bigl[ \frac{\beta^2 n}{4} \sum_{a,b=1}^{\ell} Q_{ab} \Bigr], \]

where N_n(Q) is the number of configurations (i_1, ..., i_ℓ) whose overlap matrix is Q, and the sum over Q runs over the symmetric matrices in {0,1}^{ℓ×ℓ}. By a large deviation principle with an entropy function s(Q) that depends only on the matrix Q, we obtain N_n(Q) = exp(n(s(Q) + o(1))) for the REM. Taking the logarithm of the above equation, we get

\[ \log E\bigl[Z^\ell\bigr] = n\bigl(\max_{Q} \gamma(Q) + o(1)\bigr), \qquad \gamma(Q) = \frac{\beta^2}{4} \sum_{a,b=1}^{\ell} Q_{ab} + s(Q). \tag{4.3.3} \]
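The regrouping of the sum in (4.3.2) by overlap matrices can be checked numerically for very small n and ℓ. The sketch below is illustrative only: the function names are hypothetical, and the enumeration over all (2^n)^ℓ index tuples is feasible only for toy sizes.

```python
import itertools
import math
from collections import Counter

def overlap_matrix(indices):
    """Q_ab = 1{i_a = i_b} for a tuple of replica indices, as a hashable tuple of rows."""
    return tuple(tuple(int(ia == ib) for ib in indices) for ia in indices)

def moment_by_overlaps(n, ell, beta):
    """E[Z^ell] for the REM via sum_Q N_n(Q) * exp(beta^2 * n / 4 * sum_ab Q_ab)."""
    # counts[Q] plays the role of N_n(Q): the number of index tuples with overlap matrix Q.
    counts = Counter(overlap_matrix(idx)
                     for idx in itertools.product(range(2 ** n), repeat=ell))
    total = sum(cnt * math.exp(beta ** 2 * n / 4 * sum(map(sum, q)))
                for q, cnt in counts.items())
    return total, counts
```

For ℓ = 2 only two overlap matrices occur: the all-ones matrix (both replicas in the same configuration, 2^n tuples) and the identity (distinct configurations, 2^n(2^n − 1) tuples).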

Here, γ is symmetric under permutations of the replicas for ℓ > 1: for any permutation π of the set [ℓ] = {1, ..., ℓ} we have γ(Q) = γ(Q^π), where Q^π has entries Q^π_ab = Q_{π(a)π(b)}. This symmetry reflects the fact that the replicas are identical to begin with. It suggests looking for a maximizer that is itself invariant under permutations, i.e., Q_ab = q_0 ∈ {0,1} for all a ≠ b; in physics this is called replica symmetry (RS). Maximizing γ over such matrices Q yields two candidates: the matrix Q_1 whose entries are all equal to one, and the identity matrix Q_0. According to [66] there exists a precise threshold β_c = 2√((log 2)/ℓ): for β ≤ β_c the global maximum is attained at Q_0 (the identity matrix), whereas for β > β_c it is attained at Q_1 (the all-ones matrix). Thus, heuristically, evaluating γ at the identity matrix Q_0, for which the sum of the entries equals ℓ and s(Q_0) = ℓ log 2, we obtain the following prediction for β ≤ β_c:

\[ \lim_{n\to\infty} \frac{1}{n} E[\log Z] = \lim_{n\to\infty} \lim_{\ell\to 0} \frac{1}{\ell n} \log E\bigl[Z^\ell\bigr] = \lim_{\ell\to 0} \frac{1}{\ell}\,\gamma(Q_0) = \frac{\beta^2}{4} + \log 2. \]

Moreover, a difficulty arises for ℓ < 1: the number ℓ(ℓ−1)/2 of independent off-diagonal entries of Q becomes negative, so that we would need to maximize over a negative number of variables. To remedy this problem, Giorgio Parisi transformed it into a minimization problem, a prescription referred to as the 'Parisi axioms'. Mathematically,

\[ \log E\bigl[Z^\ell\bigr] = n\bigl(\min_{Q} \gamma(Q) + o(1)\bigr). \]

For β > β_c the optimization in (4.3.3) is dominated by matrices Q that are not replica symmetric. In order to improve on the RS result, one can enlarge the subspace of matrices to be optimized over. Replica symmetry breaking, proposed by Parisi in the more complicated setting of mean-field spin glass theory, prescribes a recursive procedure for defining larger and larger spaces of matrices Q in which one searches for saddle points. The first step of this procedure is called one-step replica symmetry breaking (1RSB) [66]. To illustrate, suppose that ℓ is a multiple of x and that we divide the ℓ replicas into ℓ/x groups of x replicas each, such that with q_0 ≠ q_1,

\[ \begin{cases} Q_{aa} = 1, & \\ Q_{ab} = q_1, & \text{if } a \neq b \text{ are in the same group,} \\ Q_{ab} = q_0, & \text{if } a \text{ and } b \text{ are in different groups.} \end{cases} \]
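The block structure of the 1RSB overlap matrix is easy to make concrete. The following minimal sketch builds it as a list of rows; the function name is hypothetical, and replicas are assumed to be numbered 0, ..., ℓ−1 with consecutive blocks of x replicas forming the groups.

```python
def one_rsb_matrix(ell, x, q0, q1):
    """ell x ell 1RSB overlap matrix with ell/x groups of x replicas each."""
    if ell % x != 0:
        raise ValueError("ell must be a multiple of x")
    # Replica a belongs to group a // x; the diagonal is always 1.
    return [[1 if a == b else (q1 if a // x == b // x else q0)
             for b in range(ell)] for a in range(ell)]
```

With x = 1 and q0 = 0 this reduces to the identity matrix Q_0, and with x = ℓ and q1 = 1 to the all-ones matrix Q_1, so the two replica symmetric candidates appear as special cases.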

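In the random k-SAT model, the replica symmetry condition states that for two variables x_1 and x_2 and an assignment σ sampled from the Boltzmann distribution µ we have

\[ \lim_{n\to\infty} E\Bigl[ \bigl| \mu(\sigma_{x_1} = \sigma_{x_2} = 1) - \mu(\sigma_{x_1} = 1) \cdot \mu(\sigma_{x_2} = 1) \bigr| \Bigr] = 0. \]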

If the replica symmetry condition holds approximately for random factor graphs, then the limiting normalized log-partition function is predicted by

\[ \lim_{n\to\infty} \frac{1}{n} E[\log Z] = \sup_{\pi \in P^2(\Omega)} B(\pi), \tag{4.3.4} \]

where P(Ω) denotes the set of probability distributions on Ω, P^2(Ω) the set of probability measures on P(Ω) (for Ω = {0,1} we can identify P(Ω) with the interval [0,1], so that π is a probability measure on [0,1]), and B : P^2(Ω) → R. In physics parlance the quantity on the r.h.s. of (4.3.4) is referred to as the 'Bethe free entropy'. The third result of this thesis, on random k-SAT, provides a detailed evaluation of this quantity. For more details refer to Chapter 7 and [21].

In the next subsection we discuss the clustering transition, up to which replica symmetry holds for random k-SAT and beyond which 1RSB applies, meaning that the solution space breaks into exponentially many solution clusters.

4.3.4 Clustering transition: Reconstruction Property

Within the satisfiable regime α < α_SAT(k), the clustering transition (also known as the reconstruction or dynamic transition) occurs. Below the clustering threshold α_clus the set of typical solutions is rather well connected, meaning that any solution can be reached from any other via intermediate solutions. Above the clustering threshold α_clus, however, the typical solutions break into exponentially many clusters or pure states, which are internally well connected but separated from each other by free-energy barriers. Here 'typical' is understood with respect to the measure µ = µ_{G(F)} chosen to describe the set of solutions. Moreover, as introduced by Montanari and Semerjian in [82], the clustering transition can be interpreted as the birth of the point-to-set correlation function under the probability measure µ_{G(F)}. Given a variable node v and a set of variables V, the point-to-set correlation function is defined for spin variables under the measure µ by

\[ \mathrm{Cor}(v, V) = \sum_{\sigma_V} P_\mu(\sigma_V) \Bigl( \sum_{\sigma_v} P_\mu(\sigma_v \mid \sigma_V)\,\sigma_v \Bigr)^2 - \Bigl( \sum_{\sigma_v} P_\mu(\sigma_v)\,\sigma_v \Bigr)^2. \]

In the unclustered regime the point-to-set correlation function Cor(v, V) vanishes as the distance from the variable v to the set V in the interaction graph grows, whereas in the clustered regime it does not decay to zero. Further, the authors of [82] justify the term dynamic transition by showing that this correlation implies the divergence of the relaxation time of local stochastic processes that obey the detailed balance condition, such as 'Markov Chain Monte Carlo'. In the cavity method for random CSPs, the clustering threshold corresponds to the appearance of a non-trivial solution of the one-step Replica Symmetry Breaking (1RSB) equations with Parisi parameter x = 1.

Let us briefly summarize the above. In the unclustered phase, the typical solutions belong to a single cluster. Thus the thermodynamic properties of the measure µ are well described by the Replica Symmetric cavity method, which in particular yields an estimate Φ_RS of the quenched free entropy density defined in (4.2.2). In the regime described by the 1-RSB cavity method, on the other hand, the solution set splits into an exponential number of disjoint clusters (also called 'pure states'). In the next section we will see the different phases of these transitions in the random k-SAT and k-XORSAT models.

In the second result of this thesis, we employ the concept of the (non-)reconstruction property in the context of computing the marginal probability of the root of a bipartite graph G(F_{DC,t}) associated with a random k-XORSAT formula F_{DC,t} generated by the decimation process (defined in Chapter 3). Roughly speaking, non-reconstruction means that the marginal π_{F_{DC,t}} = P[σ_{F_{DC,t}}(x_{t+1}) = 1 | F_{DC,t}] is determined by short-range rather than long-range effects. Since Belief Propagation is a local algorithm, one might expect that the (non-)reconstruction phase transition coincides with the threshold up to which BPGD succeeds (more details can be found in [22]).

For a (variable or clause) vertex v of G(F_{DC,t}) let ∂v be the set of neighbors of v. More generally, for an integer ℓ ≥ 1 let ∂^ℓ v be the set of vertices of G(F_{DC,t}) at shortest path distance precisely ℓ from v. As Figure 4.1 illustrates, in the non-reconstruction phase the assignment of the variables at distance 2ℓ from the root r does not affect the marginal of r in the limit of large ℓ. Following [59], we say that F_{DC,t} has the non-reconstruction property if

\[ \lim_{\ell\to\infty} \limsup_{n\to\infty} E\Bigl[ \Bigl| P\bigl[\sigma_{F_{DC,t}}(x_{t+1}) = 1 \mid F_{DC,t},\ \{\sigma_{F_{DC,t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}}\bigr] - P\bigl[\sigma_{F_{DC,t}}(x_{t+1}) = 1 \mid F_{DC,t}\bigr] \Bigr| \,\Big|\, F_{XOR} \text{ sat.} \Bigr] = 0. \tag{4.3.5} \]



Conversely, F DC,t has the reconstruction property if

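\[ \liminf_{\ell\to\infty} \liminf_{n\to\infty} E\Bigl[ \Bigl| P\bigl[\sigma_{F_{DC,t}}(x_{t+1}) = 1 \mid F_{DC,t},\ \{\sigma_{F_{DC,t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}}\bigr] - P\bigl[\sigma_{F_{DC,t}}(x_{t+1}) = 1 \mid F_{DC,t}\bigr] \Bigr| \,\Big|\, F_{XOR} \text{ sat.} \Bigr] > 0. \tag{4.3.6} \]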

4.4 Different phases in random k-SAT and random k-XORSAT

The cavity method predicts several phase transitions within the satisfiable phase that affect the structure of the set of solutions of typical instances. The last few subsections described the concepts of correlation decay and pure state decomposition in the clustering transition. For models on locally tree-like structures (including random k-SAT) we encounter three main phases, each of which can be studied using an appropriate version of the cavity method.

Figure 4.2: Phase diagram of k-SAT adapted and modified from [66]. Left to right: Uniqueness, Clustering (Replica Symmetry), Clustering → Condensation (dynamic 1RSB), Condensation → Satisfiability (static 1RSB), UNSAT

• Replica Symmetry: We further divide this phase into two regimes.

(i) α < α_uniq: In this regime there exists no non-trivial decomposition into pure states. The system is in the replica symmetric phase with one big cluster of solutions.

(ii) α_uniq < α < α_clus: Replica symmetry still holds in this regime, but the solutions form one big cluster accompanied by exponentially small, rare clusters.

• Dynamic 1RSB (α_clus < α < α_cond): In this phase the Gibbs measure µ(·) admits a non-trivial decomposition into an exponential number of pure states. From the correlation point of view this phase is stable to small perturbations, but it is reconstructible. The solution space undergoes clustering, that is, it splits into exponentially many small solution clusters.

• Static 1RSB (α_cond < α < α_SAT): This is the 'original' 1RSB phase, analogous to the low-temperature phase of the REM. This phase is not stable to small perturbations, and it is reconstructible, which implies that it has long-range correlations. The number of clusters that dominate the Gibbs measure is bounded in this regime.

• UNSAT: Beyond the satisfiability threshold α_SAT there exists no solution; this regime is referred to as the unsatisfiable phase.

Remark 4.4.1. In terms of statistical physics, the one-step replica symmetry breaking (1RSB) cavity method gives a full overview of the solution space geometry of random k-SAT. According to the physics method, the satisfiability threshold for random k-SAT turns out to be

\[ \alpha_{\mathrm{sat}}(k) = 2^k \log 2 - \frac{1 + \log 2}{2} + o_k(1). \]

Furthermore, for large values of k the satisfiability threshold has been established rigorously [34, 41].
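To get a feeling for the size of the threshold in Remark 4.4.1, the explicit part of the formula can be evaluated directly. The sketch below drops the o_k(1) term, which is an assumption that is only reasonable for large k (for small k the error term is not negligible); the function name is hypothetical and log denotes the natural logarithm.

```python
import math

def alpha_sat_leading(k):
    """Explicit terms 2^k ln 2 - (1 + ln 2)/2 of the predicted k-SAT threshold."""
    # The o_k(1) correction is deliberately omitted here.
    return 2 ** k * math.log(2) - (1 + math.log(2)) / 2
```

Since the first term doubles with every increment of k, the threshold grows roughly like 2^k log 2, while the constant correction (1 + log 2)/2 ≈ 0.847 stays fixed.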

Figure 4.3: Phase diagram of k-XORSAT adapted and modified from [66]. Left to right: Clustering (Easy SAT), Satisfiability (Hard SAT), UNSAT

k        3         4         5
αclus    0.81847   0.77228   0.70178
αSAT     0.91794   0.97677   0.99244

Table 4.1: The thresholds αclus and αSAT for various k [66]

On the other hand, in the case of random k-XORSAT, the whole regime α < α_SAT is the satisfiable phase. This means that the random linear system has solutions with high probability; more specifically, the number of solutions is Z = 2^{n(1−α)} = 2^{n−m}, where n is the number of variables and m = αn the number of equations of the random k-XORSAT formula. As Figure 4.3 shows, the thresholds α_clus and α_SAT separate three phases:

• α < α_clus: This phase is referred to as 'Easy SAT'; there is a single cluster of solutions.

• α_clus < α < α_SAT: This phase is referred to as 'Hard SAT'¹; the solutions of the linear system are grouped into well-separated clusters.

• α > α_SAT: In this regime there exists no solution; it is referred to as the 'UNSAT' phase.

¹ The k-XORSAT problem can, of course, always be solved by Gaussian elimination in O(n^3) time, where n is the number of variables involved and m = Θ(n) is the number of equations.

Table 4.1 shows the thresholds for small values of k. For large k, α_clus ≈ (log k)/k and α_SAT ≈ 1 − e^{−k} + O(e^{−2k}) [66]. In the second result of this thesis, we analyze the performance of the BPGD algorithm relative to these thresholds, expressed in terms of the average variable degree d (where α = d/k) for constant k ≥ 3.
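The values of α_clus in Table 4.1 can be reproduced numerically. The sketch below rests on an assumption that is not spelled out in this section: it uses the classical formula for the emergence of the 2-core in a random k-uniform hypergraph, namely the minimum over x > 0 of x / (k (1 − e^{−x})^{k−1}), which matches the statement above that frozen variables appear together with the 2-core. The grid search, its parameters and the function name are illustrative choices.

```python
import math

def alpha_clus_xorsat(k, grid=20000, x_max=10.0):
    """Minimise x / (k * (1 - exp(-x))**(k - 1)) over a grid of x in (0, x_max]."""
    best = math.inf
    for step in range(1, grid + 1):
        x = x_max * step / grid
        best = min(best, x / (k * (1.0 - math.exp(-x)) ** (k - 1)))
    return best
```

Evaluating `alpha_clus_xorsat(k)` for k = 3, 4, 5 recovers the first row of Table 4.1 up to rounding; the corresponding values of α_SAT require a separate computation that is not sketched here.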


Based on:

The number of random 2-SAT solutions is asymptotically log-normal [23]

Arnab Chatterjee, Amin Coja-Oghlan, Noela Müller, Connor Riddlesden,

Maurice Rolvien, Pavel Zakharov, Haodong Zhu

Proc. 28th RANDOM (2024)

5 A Central Limit Theorem for random 2-SAT solutions

“The central limit theorem is the most fundamental theory in modern statistics. ... With the central limit theorem, parametric tests have higher statistical power than non-parametric tests, which do not require probability distribution assumptions.”

– Sang Gyu Kwak et al.

So far we have explored random constraint satisfaction models, various message passing algorithms, and the way in which the phase transitions and the solution space geometry depend on the clause-to-variable density α. The goal of this chapter is to provide deeper insight into the estimation of the partition function, i.e., the number of solutions (more precisely, the logarithm of the number of satisfying assignments), in random 2-SAT, the simplest random CSP model.

5.1 Motivation and History

The hunt for satisfiability thresholds has been a guiding theme of research into random constraint satisfaction problems [6, 24, 41]. Once the satisfiability threshold has been pinpointed, the next natural question is to determine the distribution of the number of satisfying assignments within the satisfiable phase [59]. Indeed, the number of such solutions is intimately tied to phase transitions that affect the solution space geometry, which in turn impacts the computational nature of finding or sampling solutions [1, 18, 44]. Despite its importance, the problem of counting solutions in



45 CHAPTER 5. A CLT FOR RANDOM 2-SAT SOLUTIONS

random CSPs remains difficult, with few general-purpose tools currently available. In those instances

where precise, rigorous results have been obtained, such as for random NAE-SAT or XORSAT, the

proofs commonly rely on the method of moments (e.g., [4, 42, 93, 95]). A necessary condition for the

success of this approach is that the problem exhibits certain symmetries which are absent in many

interesting cases [6, 31]. Random 2-SAT, the simplest random constraint satisfaction problem lacking

the aforementioned symmetry properties, is therefore an intriguing topic of study.

The number of satisfying assignments in random 2-SAT

Let F_2SAT = F_{n,m} be a random 2-CNF on n Boolean variables x_1, ..., x_n with m clauses, drawn independently and uniformly from all 4·C(n,2) possible 2-clauses. Further assume that m ∼ dn/2 for a fixed real d > 0. Since the 1990s it has been known that F_2SAT is satisfiable w.h.p. if d < 2 and unsatisfiable w.h.p. if d > 2 [26, 53]. By contrast, the first-order approximation to the number of satisfying assignments was obtained only recently [2]. Moreover, computing the number Z(F_2SAT) of satisfying assignments is a #P-hard task¹ [100]. Nonetheless, Monasson and Zecchina [77] put forward a conjecture on the exponential order of the number of satisfying assignments of random 2-CNFs using physics-inspired techniques. In 2021 Achlioptas et al. [2] determined the logarithm of the number of satisfying assignments to first order, in the sense of a law of large numbers: they introduced a function φ(d) > 0 such that for all d < 2, i.e., throughout the entire satisfiable phase,

\[ \log Z(F_{2SAT}) = n\,\varphi(d) + o(n) \quad \text{w.h.p.} \tag{5.1.1} \]

In this thesis we determine not only the leading order of log Z(F_2SAT) but also its fluctuations. More precisely, we show that the logarithm of the number of satisfying assignments, suitably centred and rescaled, converges to a Gaussian throughout the satisfiable regime. This is the first central limit theorem ('CLT') of this type for any random CSP.

5.2 Main Result.

In this section we state our first result of this thesis [23].

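Let P(R^2) be the set of all (Borel) probability measures on R^2. For 0 < d < 2 and 0 ≤ t ≤ 1 we define an operator

\[ \mathrm{logBP}^{\otimes}_{d,t} : P(\mathbb{R}^2) \to P(\mathbb{R}^2), \qquad \rho \mapsto \hat\rho = \mathrm{logBP}^{\otimes}_{d,t}(\rho), \tag{5.2.1} \]

as follows. Let

\[ (\xi_{\rho,i})_{i\ge 1},\ (\xi'_{\rho,i})_{i\ge 1},\ (\xi''_{\rho,i})_{i\ge 1}, \qquad \xi_{\rho,i} = \begin{pmatrix} \xi_{\rho,i,1} \\ \xi_{\rho,i,2} \end{pmatrix},\quad \xi'_{\rho,i} = \begin{pmatrix} \xi'_{\rho,i,1} \\ \xi'_{\rho,i,2} \end{pmatrix},\quad \xi''_{\rho,i} = \begin{pmatrix} \xi''_{\rho,i,1} \\ \xi''_{\rho,i,2} \end{pmatrix} \]

be random vectors with distribution ρ, let

\[ \boldsymbol{d} \overset{d}{=} \mathrm{Po}(td), \qquad \boldsymbol{d}', \boldsymbol{d}'' \overset{d}{=} \mathrm{Po}((1-t)d), \]

and let s_i, s'_i, s''_i, r_i, r'_i, r''_i for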

¹ The class #P comprises the counting problems that ask for the number of solutions of a given NP decision problem. A problem is said to be #P-hard if it is at least as hard as any problem in the class #P, that is, if every problem in #P can be reduced to it in polynomial time.



[Plot: horizontal axis d from 0.00 to 2.00; one vertical axis 'Variance' from 0.00 to 0.08 with curves 'Variance' and 'Second moment bound'; one vertical axis 'Expectation' from 0.40 to 0.70 with curves 'Expectation' and 'First moment bound'.]

Figure 5.1: Numerical approximations to the function φ(d) from (5.1.1) (red) and the variance η(d)^2 from (5.2.5) (green). The black dashed line is the first moment bound d ↦ log(2) + (d/2) log(3/4), whereas the purple dashed line is the second moment bound. (Figure 1, [23])

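i ≥ 1 be uniformly random on {±1}, all mutually independent. Then ρ̂ is the distribution of the vector

\[ \begin{pmatrix} \sum_{i=1}^{\boldsymbol{d}} s_i \log\bigl(\tfrac12\bigl(1 + r_i \tanh(\xi_{\rho,i,1}/2)\bigr)\bigr) + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\bigl(\tfrac12\bigl(1 + r'_i \tanh(\xi'_{\rho,i,1}/2)\bigr)\bigr) \\[4pt] \sum_{i=1}^{\boldsymbol{d}} s_i \log\bigl(\tfrac12\bigl(1 + r_i \tanh(\xi_{\rho,i,2}/2)\bigr)\bigr) + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\bigl(\tfrac12\bigl(1 + r''_i \tanh(\xi''_{\rho,i,2}/2)\bigr)\bigr) \end{pmatrix} \in \mathbb{R}^2. \]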

In addition, define a function B^⊗_{d,t} : P(R^2) → (0, ∞] by letting

\[ B^{\otimes}_{d,t}(\rho) = E\Bigl[ \prod_{h=1}^{2} \log\Bigl( 1 - \frac14 \bigl(1 + r_1 \tanh(\xi_{\rho,1,h}/2)\bigr)\bigl(1 + r_2 \tanh(\xi_{\rho,2,h}/2)\bigr) \Bigr) \Bigr]. \tag{5.2.2} \]

The main theorem establishes a CLT for the logarithm of the number of solutions of F_2SAT, with a standard deviation that is closely connected to the aforementioned function B^⊗_{d,t}:

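Theorem 5.2.1 (Theorem 1.1, [23]). For any 0 < d < 2 and t ∈ [0,1] there exists a unique probability measure ρ_{d,t} ∈ P(R^2) such that

\[ \rho_{d,t} = \mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t}) \quad\text{and}\quad \int_{\mathbb{R}^2} \|\xi\|_2^2 \, d\rho_{d,t}(\xi) < \infty. \tag{5.2.3} \]

Furthermore,

\[ \lim_{n\to\infty} \frac{\log Z(F_{2SAT}) - E[\log Z(F_{2SAT}) \mid Z(F_{2SAT}) > 0]}{\sqrt{m}} = \Gamma_{\eta(d)} \quad \text{in distribution, where} \tag{5.2.4} \]

\[ \eta(d)^2 = \int_0^1 B^{\otimes}_{d,t}(\rho_{d,t})\,dt - B^{\otimes}_{d,0}(\rho_{d,0}) \in (0,\infty). \tag{5.2.5} \]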



Remark 5.2.2. Note that the conditioning on Z(F_2SAT) > 0 is necessary in (5.2.4), because even for d < 2 the formula F_2SAT is unsatisfiable with probability Ω(n^{-1}), in which case log Z(F_2SAT) = −∞. Moreover, the L^2-bound in (5.2.3) ensures that the integral in (5.2.5) is well-defined. Finally, (5.2.4) implies that

\[ P\bigl[ \log Z(F_{2SAT}) - E[\log Z(F_{2SAT}) \mid Z(F_{2SAT}) > 0] < z\sqrt{m} \bigr] \sim P\bigl[ \Gamma_{\eta(d)} < z \bigr] \qquad (z \in \mathbb{R}). \tag{5.2.6} \]

So, the first result of this thesis [23] addresses two questions.

Question 1. How to show the asymptotic normality in (5.2.4) ?

Question 2. How to calculate the variance of the formula effectively?

Evaluating Standard Deviation.

Towards answering the second question, recall that the proof of the uniqueness of the stochastic fixed point ρ_{d,t} from (5.2.3) is based on the contraction method; hence a fixed point iteration converges rapidly. In effect, for any d, t a discrete distribution that approximates ρ_{d,t} arbitrarily well (in Wasserstein distance) can be computed via a randomized heuristic algorithm called population dynamics [66, Chapter 14]. Since B^⊗_{d,t}(ρ_{d,t}) varies continuously in d and t, η(d)^2 can thus be approximated to any desired accuracy; see Figure 5.1.
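A population dynamics iteration for the operator logBP^⊗_{d,t} from (5.2.1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Figure 5.1: the population size, the number of rounds, the seed, the all-zero initial population and all function names are hypothetical choices. The sketch uses the identity log((1 + r tanh(x/2))/2) = −log(1 + e^{−rx}) for numerical stability, and it only approximates ρ_{d,t}; estimating η(d)^2 would additionally require evaluating (5.2.2) and integrating over t.

```python
import math
import random

def poisson(rng, lam):
    """Sample Po(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def log_half_one_plus(r, x):
    """Stable evaluation of log((1 + r*tanh(x/2)) / 2) = -log(1 + exp(-r*x))."""
    z = -r * x
    return -(max(z, 0.0) + math.log1p(math.exp(-abs(z))))

def log_bp_sample(pop, d, t, rng):
    """Draw one vector from logBP_{d,t}(rho), rho being the empirical law of pop."""
    shared = [0.0, 0.0]
    for _ in range(poisson(rng, t * d)):           # terms shared by both coordinates
        xi, s, r = rng.choice(pop), rng.choice((-1, 1)), rng.choice((-1, 1))
        shared[0] += s * log_half_one_plus(r, xi[0])
        shared[1] += s * log_half_one_plus(r, xi[1])
    out = []
    for h in (0, 1):                               # independent private terms per coordinate
        private = 0.0
        for _ in range(poisson(rng, (1 - t) * d)):
            xi, s, r = rng.choice(pop), rng.choice((-1, 1)), rng.choice((-1, 1))
            private += s * log_half_one_plus(r, xi[h])
        out.append(shared[h] + private)
    return tuple(out)

def population_dynamics(d, t, size=500, rounds=20, seed=0):
    """Iterate the operator on a population of `size` vectors, starting from all zeros."""
    rng = random.Random(seed)
    pop = [(0.0, 0.0)] * size
    for _ in range(rounds):
        pop = [log_bp_sample(pop, d, t, rng) for _ in range(size)]
    return pop
```

For t = 1 all terms are shared, so the two coordinates of every vector coincide, whereas for t = 0 they evolve independently; intermediate values of t interpolate between these two extremes.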

Study Fixed Point ρd ,t .

The distribution of each coordinate of the fixed point has been studied by Müller, Neininger and Zhu in [86]. It is shown that when d ≤ 1, the corresponding measure is purely discrete. After applying the transformation x ↦ (1 + tanh(x/2))/2, the support of the measure consists of rational numbers in (0,1) for all 0 < d < 2. Moreover, when d ∈ (1,2), the measure acquires a continuous part.

5.3 Proof Strategy

The main hurdle towards the proof of Theorem 5.2.1 is to compute the variance of log Z(F_2SAT) given satisfiability. The key idea, inspired by spin glass theory [25] but new to random CSPs, is to count the joint number of satisfying assignments of two correlated random formulas. Once this is accomplished, Theorem 5.2.1 follows from a general martingale central limit theorem. To set the stage, we first revisit the method of moments, the reasons it fails on random 2-SAT, and the combinatorial interpretation of the law of large numbers (5.1.1).

5.3.1 Method of Moments fails.

The default approach to estimating the number of solutions to a random CSP is the venerable second

moment method [6]. Its core idea is to show that the second moment of the number of solutions

is of the same order as the square of the expected number of solutions. If so then the moment



computation together with small subgraph conditioning yields the precise limiting distribution of the

number of solutions [35, 97]. However, this approach works only if the log of the number of solutions

superconcentrates around the log of the expected number of solutions which does not hold in random

2-SAT. In fact, a straightforward calculation yields that,

\[ \frac{1}{n} \log E[Z(F_{2SAT})] \sim \log 2 + \frac{d}{2} \log(3/4). \tag{5.3.1} \]

The formula on the r.h.s. is displayed as the black dashed line in Figure 5.1. As can be verified

analytically, this line strictly exceeds the function φ(d) from (5.1.1) for any 0 < d < 2. Consequently,

(5.1.1) implies that log Z (F 2SAT) ≤ logE[Z (F 2SAT)]−Ω(n) w.h.p. In other words, the expected number

of solutions E[Z (F 2SAT)] overshoots the typical number of solutions by an exponential factor w.h.p. (for

details, see the discussion in [4, 7]).
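The first moment behind (5.3.1) is E[Z(F_2SAT)] = 2^n (3/4)^m, because each of the 2^n assignments satisfies exactly three quarters of all possible 2-clauses. The brute-force sketch below confirms this for tiny n and m by enumerating every formula; the literal encoding `(i, b)` and the function names are illustrative assumptions, and the enumeration is only feasible for toy sizes.

```python
import itertools

def all_clauses(n):
    """The 4 * C(n, 2) clauses on n variables; a literal (i, b) is satisfied when x_i = b."""
    return [((i, bi), (j, bj))
            for i, j in itertools.combinations(range(n), 2)
            for bi in (0, 1) for bj in (0, 1)]

def count_solutions(formula, n):
    """Z(F): number of 0/1 assignments satisfying every clause of the formula."""
    return sum(all(any(s[i] == b for i, b in c) for c in formula)
               for s in itertools.product((0, 1), repeat=n))

def exact_first_moment(n, m):
    """E[Z] for m clauses drawn independently and uniformly, by full enumeration."""
    clauses = all_clauses(n)
    total = sum(count_solutions(f, n) for f in itertools.product(clauses, repeat=m))
    return total / len(clauses) ** m
```

For n = 3 and m = 2 the enumeration returns 4.5 = 2^3 · (3/4)^2; taking logarithms and dividing by n with m ∼ dn/2 gives the right-hand side of (5.3.1).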

Rather than relying on the method of moments, Monasson and Zecchina in [77] proposed a

physics-inspired approach for estimating log Z (F 2SAT) using the Belief Propagation algorithm which is

discussed in Chapter 3 of this thesis. This approach was established rigorously by Achlioptas et al. [2].

5.3.2 BP Approximation.

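The Belief Propagation messages introduced in Chapter 3 are updated iteratively by an operator

\[ \mathrm{BP} : (\nu_{x\to a}, \nu_{a\to x})_{a,\,x\in\partial a} \mapsto (\hat\nu_{x\to a}, \hat\nu_{a\to x})_{a,\,x\in\partial a} = \mathrm{BP}\bigl((\nu_{x\to a}, \nu_{a\to x})_{a,\,x\in\partial a}\bigr), \tag{5.3.2} \]

where, for ∂a = {x, y}, the updated messages ν̂_{a→x}(±1) are defined by

\[ \hat\nu_{a\to x}(\mathrm{sign}(x,a)) = \frac{1}{1 + \nu_{y\to a}(\mathrm{sign}(y,a))}, \qquad \hat\nu_{a\to x}(-\mathrm{sign}(x,a)) = \frac{\nu_{y\to a}(\mathrm{sign}(y,a))}{1 + \nu_{y\to a}(\mathrm{sign}(y,a))}. \tag{5.3.3} \]

Moreover, for a variable x and a clause a ∈ ∂x we define²

\[ \hat\nu_{x\to a}(s) = \frac{\prod_{b\in\partial x\setminus\{a\}} \nu_{b\to x}(s)}{\prod_{b\in\partial x\setminus\{a\}} \nu_{b\to x}(1) + \prod_{b\in\partial x\setminus\{a\}} \nu_{b\to x}(-1)} \qquad (s \in \{\pm 1\}). \tag{5.3.4} \]

The purpose of BP is to heuristically 'approximate' the marginal probabilities that a random satisfying assignment σ = σ_{F_2SAT} of F_2SAT will set a certain variable to a specific truth value. The 'approximation' given by the set (ν_{x→a}, ν_{a→x})_{a, x∈∂a} of messages reads

\[ \nu_x(s) = \frac{\prod_{b\in\partial x} \nu_{b\to x}(s)}{\prod_{b\in\partial x} \nu_{b\to x}(1) + \prod_{b\in\partial x} \nu_{b\to x}(-1)} \qquad (s \in \{\pm 1\}). \tag{5.3.5} \]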

Equation (5.3.5) suggests that the BP operator should be iterated until a fixed point is reached; that is,

until ν̂x→a = νx→a and ν̂a→x = νa→x hold for all x, a. We then evaluate (5.3.5) and substitute them into

a general expression known as Bethe free entropy, which yields the BP approximation of log Z (F 2SAT).

² For the sake of tidiness, if the above denominator vanishes we simply let ν̂_{x→a}(±1) = 1/2.



This BP approximation is accurate when the bipartite graph induced by the clause-variable incidences

of the 2-CNF F 2SAT is acyclic, but it may produce inaccurate results in the presence of cycles.
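The updates (5.3.3) and (5.3.4) and the marginal estimate (5.3.5) can be tried out on a small acyclic formula, where BP is expected to reproduce the true marginals. The sketch below makes several assumptions that are not taken from the thesis: a clause is encoded as a pair of `(variable, sign)` tuples with sign(x, a) ∈ {±1} the truth value of x that satisfies a, messages start out uniform, a fixed number of synchronous rounds is run, and the function names are hypothetical. The brute-force routine `exact_marginals` serves only as a cross-check.

```python
import itertools

def bp_marginals(clauses, n, rounds=50):
    """Iterate (5.3.3)-(5.3.4) from uniform messages, then evaluate (5.3.5)."""
    nbrs = {x: [a for a, c in enumerate(clauses) if x in dict(c)] for x in range(n)}
    var_to_cl = {(x, a): {1: 0.5, -1: 0.5} for x in nbrs for a in nbrs[x]}
    cl_to_var = {(a, x): {1: 0.5, -1: 0.5} for x in nbrs for a in nbrs[x]}

    def normalised_product(x, skip):
        """Product of clause-to-variable messages into x, omitting clause `skip`."""
        w = {1: 1.0, -1: 1.0}
        for b in nbrs[x]:
            if b != skip:
                for s in (1, -1):
                    w[s] *= cl_to_var[(b, x)][s]
        z = w[1] + w[-1]
        # Convention of the footnote: uniform message if the denominator vanishes.
        return {s: (w[s] / z if z > 0 else 0.5) for s in (1, -1)}

    for _ in range(rounds):
        new_cl = {}
        for a, ((x, sx), (y, sy)) in enumerate(clauses):
            for (u, su), (v, sv) in (((x, sx), (y, sy)), ((y, sy), (x, sx))):
                q = var_to_cl[(v, a)][sv]            # nu_{v->a}(sign(v, a))
                new_cl[(a, u)] = {su: 1 / (1 + q), -su: q / (1 + q)}   # (5.3.3)
        cl_to_var = new_cl
        var_to_cl = {(x, a): normalised_product(x, a) for (x, a) in var_to_cl}  # (5.3.4)
    return [normalised_product(x, None) for x in range(n)]                      # (5.3.5)

def exact_marginals(clauses, n):
    """Marginals of a uniformly random satisfying assignment, by enumeration."""
    sols = [s for s in itertools.product((1, -1), repeat=n)
            if all(s[x] == sx or s[y] == sy for (x, sx), (y, sy) in clauses)]
    return [{v: sum(s[x] == v for s in sols) / len(sols) for v in (1, -1)}
            for x in range(n)]

# Hypothetical tree formula: (x0 OR x1) AND (NOT x1 OR x2) AND (x1 OR NOT x3).
TREE = [((0, 1), (1, 1)), ((1, -1), (2, 1)), ((1, 1), (3, -1))]
```

On `TREE` the two routines agree up to floating-point error; on formulas with cycles the BP output is only a heuristic estimate, as noted above.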

5.3.3 Towards calculating variance.

The proof of the formula (5.1.1) combines the Gibbs uniqueness property discussed earlier and the local convergence to the Galton-Watson tree with a coupling argument called the 'Aizenman-Sims-Starr scheme' [2]. Unfortunately, it is not clear how the order of the standard deviation of log Z(F_2SAT) could be derived along these lines, because the rate of convergence in the Gibbs uniqueness property deteriorates as d approaches the satisfiability threshold. To overcome this challenge, we develop a combinatorial interpretation of log^2 Z(F_2SAT) by constructing, for any given integers M, M' ≥ 0, a correlated pair (F_1(M, M'), F_2(M, M')) of formulas on the variable set V_n = {x_1, ..., x_n} as follows. Let (a_i)_{i≥1}, (a'_i)_{i≥1}, (a''_i)_{i≥1} be sequences of mutually independent uniformly random clauses on V_n. Then

\[ F_1(M, M') = a_1 \wedge \dots \wedge a_M \wedge a'_1 \wedge \dots \wedge a'_{M'} \quad\text{and} \tag{5.3.6} \]

\[ F_2(M, M') = a_1 \wedge \dots \wedge a_M \wedge a''_1 \wedge \dots \wedge a''_{M'}. \tag{5.3.7} \]

Thus, the two formulas share the clauses a_1, ..., a_M. Additionally, each contains another M' independent clauses. In particular, F_1(m, 0) and F_2(m, 0) are identical, while F_1(0, m) and F_2(0, m) are independent. To compute the variance, given that F_1(M, m−M) and F_2(M, m−M) are satisfiable for all M, we can write a telescoping sum

\[ \begin{aligned} &\log Z(F_1(m,0)) \cdot \log Z(F_2(m,0)) - \log Z(F_1(0,m)) \cdot \log Z(F_2(0,m)) \\ &\quad= \sum_{M=1}^{m} \Bigl[ \log Z(F_1(M, m-M)) \cdot \log Z(F_2(M, m-M)) \\ &\qquad\qquad - \log Z(F_1(M-1, m-M+1)) \cdot \log Z(F_2(M-1, m-M+1)) \Bigr]. \end{aligned} \tag{5.3.8} \]

Clearly, if we could take the expectation of the l.h.s. of (5.3.8), we would obtain precisely the variance of log Z(F_2SAT). However, we cannot simply take the expectation of (5.3.8), because some F_h(M, m−M) (h = 1, 2) may be unsatisfiable, potentially leading to occurrences of log 0. To address this issue we replace log Z(F_2SAT) by a more tractable random variable with the same limiting distribution, whose construction is based on 'Unit Clause Propagation' discussed in Chapter 3, Section 3.2.3. We thus obtain a pruned formula F̂_2SAT from the original 2-CNF F_2SAT, and the following fact can be verified (for more details of the proof see [23]).

Fact 5.3.1 (Fact 2.2, [23]). For any 2-CNF F 2SAT, the pruned 2-CNF F̂ 2SAT is satisfiable.

Note that, even if F_2SAT is satisfiable, the number Z(F̂_2SAT) of satisfying assignments of F̂_2SAT could dramatically exceed Z(F_2SAT), as the pruned formula F̂_2SAT may in general have far fewer clauses than the original formula F_2SAT. However, the following proposition shows that on a random formula the impact of pruning is modest.



Proposition 5.3.2 (Proposition 2.3, [23]). $|\log Z(\hat F_{\mathrm{2SAT}}) - \log Z(F_{\mathrm{2SAT}})| \le n^{1/3}$ with probability $1 - o(n^{-1/2})$.

Figure 5.2: An illustration of the correlated GW-tree T ⊗ (Figure 1, [23])

Since the error bound $n^{1/3}$ from Proposition 5.3.2 is of smaller order than $\sqrt{m}$, the scale of the fluctuations (see Corollary 5.3.6), it suffices to establish a CLT for $\log Z(\hat F_{\mathrm{2SAT}})$, the logarithm of the number of satisfying assignments of the pruned formula. Revisiting the telescoping sum (5.3.8), we obtain Lemma 5.3.3, which expresses the variance $\mathrm{Var}(\log Z(\hat F_{\mathrm{2SAT}}))$ as a sum of local changes. For example, $F_1(M, m-M)$ is obtained from $F_1(M-1, m-M)$ by adding a single random clause, namely $a_M$, and w.h.p. only a few clauses are pruned from random formulas.

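Lemma 5.3.3 (Lemma 2.4, [23]). Let

\[ \Delta(M) = \mathbb{E}\left[ \log\left( \frac{Z(\hat F_1(M, m-M))}{Z(\hat F_1(M-1, m-M))} \right) \cdot \log\left( \frac{Z(\hat F_2(M, m-M))}{Z(\hat F_2(M-1, m-M))} \right) \right], \tag{5.3.9} \]

\[ \Delta'(M) = \mathbb{E}\left[ \log\left( \frac{Z(\hat F_1(M-1, m-M+1))}{Z(\hat F_1(M-1, m-M))} \right) \cdot \log\left( \frac{Z(\hat F_2(M-1, m-M+1))}{Z(\hat F_2(M-1, m-M))} \right) \right]. \tag{5.3.10} \]

Then

\[ \mathrm{Var}\left[ \log Z(\hat F_{\mathrm{2SAT}}) \right] = \sum_{M=1}^{m} \big( \Delta(M) - \Delta'(M) \big). \]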

To analyse the correlated formulas we need to evaluate the following quotients.

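Proposition 5.3.4 (Proposition 2.5, [23]). Let $1 \le M \le m$. Then

\[ \frac{Z(\hat F_h(M, m-M))}{Z(\hat F_h(M-1, m-M))} = 1 - \prod_{y \in \partial a_M} \mathbb{P}\left[ \sigma_y \ne \mathrm{sign}(y, a_M) \mid \hat F_h(M-1, m-M), a_M \right] + o(1) \quad (h = 1, 2), \]

\[ \frac{Z(\hat F_1(M-1, m-M+1))}{Z(\hat F_1(M-1, m-M))} = 1 - \prod_{y \in \partial a'_{m-M+1}} \mathbb{P}\left[ \sigma_y \ne \mathrm{sign}(y, a'_{m-M+1}) \mid \hat F_1(M-1, m-M), a'_{m-M+1} \right] + o(1), \]

\[ \frac{Z(\hat F_2(M-1, m-M+1))}{Z(\hat F_2(M-1, m-M))} = 1 - \prod_{y \in \partial a''_{m-M+1}} \mathbb{P}\left[ \sigma_y \ne \mathrm{sign}(y, a''_{m-M+1}) \mid \hat F_2(M-1, m-M), a''_{m-M+1} \right] + o(1). \]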
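To see where the expressions in Proposition 5.3.4 come from, note that adding a clause a to a satisfiable formula F removes exactly those satisfying assignments that violate every literal of a. Hence Z(F ∧ a)/Z(F) = 1 − P[σ violates a] for a uniformly random satisfying assignment σ of F, and the product over y ∈ ∂a replaces this joint probability by a product of marginals. The sketch below checks the exact identity by brute force. The formula, the added clause and all names are hypothetical choices for illustration; in this toy example the two variables of the new clause are independent under the uniform measure, so the product form is exact rather than approximate.

```python
import itertools

def satisfying_assignments(n, clauses):
    """All assignments in {False, True}^n satisfying every clause (brute force)."""
    return [s for s in itertools.product([False, True], repeat=n)
            if all(any(s[v] == sign for v, sign in clause) for clause in clauses)]

n = 6
# A small satisfiable 2-CNF; a literal (v, sign) is satisfied when s[v] == sign.
F = [((0, True), (1, False)), ((1, True), (2, True)), ((3, False), (4, True))]
new_clause = ((0, False), (5, True))   # variable 5 does not occur in F

sols = satisfying_assignments(n, F)
z_before = len(sols)
z_after = len(satisfying_assignments(n, F + [new_clause]))

# Probability that a uniform satisfying assignment violates both literals jointly.
joint_violation = sum(all(s[v] != sign for v, sign in new_clause) for s in sols) / z_before

# Product of the single-variable violation probabilities (the form in the proposition).
product_of_marginals = 1.0
for v, sign in new_clause:
    product_of_marginals *= sum(s[v] != sign for s in sols) / z_before

exact_ratio = z_after / z_before
```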



We construct a Galton-Watson tree $T^{\otimes}$ that approximates the joint distribution of the local structure of the pair $(\hat F_1(M-1, m-M), \hat F_2(M-1, m-M))$.

In Figure 5.2, shared variables/clauses are indicated in red, 1-distinct variables/clauses in green and 2-distinct ones in blue (see [23] for details). From $T^{\otimes}$ we extract a pair $(T_1, T_2)$ of correlated random trees. Specifically, $T_h$ is obtained from $T^{\otimes}$ by deleting all $(3-h)$-distinct variables and clauses. Hence, the parameter $t$ determines how 'similar' $T_1, T_2$ are. Figure 5.3 illustrates this effect: for a pair of random formulas and a uniformly random pair of satisfying assignments, the heatmaps display the joint distribution over the $n$ coordinates, with almost independent formulas on the left and highly correlated formulas on the right.

Figure 5.3: Marginal distribution on two correlated formulas for d = 0.9 and M = 0.1m,0.5m,0.9m
(Figure 2, [23])

With a pair of correlated formulas in hand, the next step is to run BP on the random trees $(T_1, T_2)$ to find the joint distribution of the truth values $\sigma_{T_1^{(2\ell)}, o}, \sigma_{T_2^{(2\ell)}, o}$ assigned to the root $o$. Fortunately, due to the Markovian nature of the Galton-Watson tree $T^{\otimes}$, the bottom-up BP computation on a random tree can be expressed by a fixed point iteration on the space of probability distributions on $\mathbb{R}^2$. The appropriate operator $\mathrm{logBP}^{\otimes}_{d,t}$ expresses the updates of the log-likelihood ratios of the BP messages from (5.3.3)–(5.3.4). The following statements hold:

Proposition 5.3.5 (Proposition 2.8, [23]). There exists a unique $\rho_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ that satisfies (5.2.3), and $\lim_{\ell\to\infty} \rho^{(\ell)}_{d,t} = \rho_{d,t}$ weakly.

Corollary 5.3.6 (Corollary 2.11, [23]). With $\eta(d)^2$ from (5.2.5) we have $\eta(d) > 0$ and $\mathrm{Var} \log Z(\hat F_{\mathrm{2SAT}}) \sim m\,\eta(d)^2$.

The proof of Proposition 5.3.5 is based on a contraction argument. As a consequence, for any $d, t$ the distribution $\rho_{d,t}$ can be approximated within any given accuracy via a fixed point iteration. The contraction argument also shows that the functional $\mathcal{B}^{\otimes}_{d,t}$ evaluated at $\rho_{d,t}$ yields a finite value for any $d \in (0,2)$ and $t \in [0,1]$, which implies the finiteness of the variance.

Lemma 5.3.7 (Lemma 10.1, [23]). For any $d \in (0,2)$ and $t \in [0,1]$, $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t}) < \infty$. Moreover, for any $d \in (0,2)$, $\eta(d)^2 < \infty$.



Finally, along with Proposition 5.3.4, the BP arguments on correlated formulas give us the variance

of log Z (F̂ 2SAT).

5.4 Establishing the Central Limit Theorem

Having calculated the variance, we set up a filtration $(\mathcal{F}_{n,M})_{0 \le M \le m_n}$ by letting $\mathcal{F}_{n,M}$ be the σ-algebra generated by $a_1, \dots, a_M$. The conditional expectations

\[ Z_{n,M} = m^{-1/2}\, \mathbb{E}\left[ \log Z(\hat F_{\mathrm{2SAT}}) \mid \mathcal{F}_{n,M} \right] \tag{5.4.1} \]

then form a Doob martingale. Let $X_{n,M} = Z_{n,M} - Z_{n,M-1}$ be the martingale differences.

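Proposition 5.4.1 (Proposition 2.12, [23]). For all $0 < d < 2$ the martingale (5.4.1) satisfies

\[ \lim_{n\to\infty} \mathbb{E}\left[ \max_{1 \le M \le m} |X_{n,M}| \right] = 0 \quad\text{and} \tag{5.4.2} \]

\[ \lim_{n\to\infty} \mathbb{E}\left| \eta(d)^2 - \sum_{M=1}^{m} X_{n,M}^2 \right| = 0. \tag{5.4.3} \]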

These conditions can be checked with the help of the pruning argument.

We conclude this chapter by deriving its main result from the following general martingale central limit theorem, which is a special case of [55, Theorem 3.2].

Theorem 5.4.2 ([55, Theorem 3.2]). Let $(Z_{n,i}, \mathcal{F}_{n,i})_{0 \le i \le m_n,\, n \ge 1}$ be a zero-mean, square-integrable martingale array with differences $X_{n,i} = Z_{n,i} - Z_{n,i-1}$ for $1 \le i \le m_n$. Assume that there exists a constant $\eta^2$ such that

\[ \lim_{n\to\infty} \max_{1 \le i \le m_n} |X_{n,i}| = 0 \quad\text{in probability,} \tag{5.4.4} \]

\[ \lim_{n\to\infty} \sum_{i=1}^{m_n} X_{n,i}^2 = \eta^2 \quad\text{in probability,} \tag{5.4.5} \]

\[ \mathbb{E}\left[ \max_{1 \le i \le m_n} X_{n,i}^2 \right] \quad\text{is bounded in } n. \tag{5.4.6} \]

Then $Z_{n,m_n}$ converges in distribution to a Gaussian distribution with mean zero and variance $\eta^2$.

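Observe that Proposition 5.4.1 implies the conditions of Theorem 5.4.2. Indeed, convergence in $L^1$ implies convergence in probability, so (5.4.2) yields (5.4.4) and (5.4.3) yields (5.4.5), while (5.4.6) follows from (5.4.3) because $\max_i X_{n,i}^2 \le \sum_i X_{n,i}^2$. Lastly, the variance is finite and positive: Lemma 5.3.7 guarantees that $\eta(d)^2 < \infty$, while Corollary 5.3.6 shows that $\eta(d) > 0$.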


Based on:

Belief Propagation Guided Decimation on Random k-XORSAT [22]

Arnab Chatterjee, Amin Coja-Oghlan, Mihyun Kang, Lena Krieg,

Maurice Rolvien, Gregory Sorkin

Proc. 52nd ICALP (2025)

6 Performance of BPGD on random k-XORSAT

“BPGD enhances conventional BP by sequentially fixing (decimating) variable nodes based on their belief values and reducing the solution space.”

–Masoumeh Alinia et al.

Having established in the previous chapter that a central limit theorem holds for the number of solutions of random 2-SAT formulas, we now shift our attention to another random satisfiability problem, namely random k-XORSAT, one of the simplest examples of a random constraint satisfaction problem exhibiting a sharp phase transition. More precisely, we analyze the performance of Belief Propagation Guided Decimation ('BPGD'), introduced in Chapter 3, and thereby rigorously verify the heuristic work of Ricci-Tersenghi and Semerjian [96]. In addition, we study a thought experiment called the 'decimation process' (also introduced in Chapter 3), for which we identify a (non-)reconstruction and a condensation phase transition. We begin the chapter with some motivation and background for this work.

6.1 Motivation and History

Random k-XORSAT exhibits many features common to other intensely studied random CSPs, such as random k-SAT. At the same time, random k-XORSAT is mathematically more tractable than, say, random k-SAT, because a XORSAT instance translates into a linear system over $\mathbb{F}_2$, as the XOR operation is equivalent to addition modulo two. In addition, the algebraic nature of the problem induces strong symmetry properties that simplify its study [12]. Since the early 2000s, combinatorics as well as statistical physics have contributed intriguing 'predictions' on various random CSPs. In 2008 Ricci-Tersenghi and Semerjian [96] put forward a heuristic analysis of BPGD on both random k-SAT and k-XORSAT. Later Coja-Oghlan and Pachon-Pinzon analyzed both the 'decimation process' and 'BPGD' rigorously on random k-SAT [28, 33], assuming that the clause length k is sufficiently large because random k-SAT lacks the inherent symmetry of k-XORSAT. In a recent paper [102], Yung (2024) undertook a first step towards the rigorous analysis of BPGD on random k-XORSAT. However, Yung's analysis turns out not to be tight. Specifically, apart from requiring spurious lower bounds on the clause length k, Yung's results do not quite establish the precise connection between the decimation process and the performance of BPGD. One reason for this is that [102] relies on 'annealed' techniques, i.e., essentially moment computations. Here we instead harness 'quenched' arguments to determine the success probability of BPGD and to make a precise connection between BPGD and the decimation process. The next section provides a brief overview of the main problems addressed in this work along with the results.

6.2 Problem Statement and Results.

Let $F_{\mathrm{XOR}} = F(n, d, k)$ be a random k-XORSAT formula with variables $x_1, \dots, x_n$ and $m$ random clauses of length $k$, where $m \overset{d}{=} \mathrm{Po}(dn/k)$. The $m$ clauses are drawn uniformly and independently from the set of all $2^k \binom{n}{k}$ possibilities. Thus, $d > 0$ equals the average number of clauses in which a given variable $x_i$ appears. Moreover, every clause of $F_{\mathrm{XOR}}$ is an XOR of precisely $k \ge 3$ distinct variables, each of which may or may not come with a negation sign. In other words, we are handed a number of independent random constraints (clauses) $c_i$ of the type

\[ c_i = y_{i1} \ \mathrm{XOR} \ \cdots \ \mathrm{XOR} \ y_{ik}, \]

where each $y_{ij}$ is either one of the $n$ available Boolean variables $x_1, \dots, x_n$ or one of their negations $\neg x_1, \dots, \neg x_n$. Since Boolean XOR boils down to addition over $\mathbb{F}_2$, this problem can be rephrased as the full rank problem for the random matrix $A$ with $q = 2$ and the clause length $k$ fixed to a deterministic value. Furthermore, the random negation patterns of the constraints amount to choosing a random right-hand side vector $y$ for which we are to solve $Ax = y$.
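The reduction to linear algebra can be made concrete as follows. The Python sketch below encodes each XOR clause as a row of A together with a right-hand side bit, assuming that a clause is satisfied when the XOR of its literals is true, and counts solutions via Gaussian elimination over F_2: a consistent system has exactly 2^(n − rk A) solutions. The function names and the bitmask encoding are choices made for this illustration and are not taken from [22].

```python
import random

def random_xorsat(n, m, k, rng):
    """m random XOR clauses on n variables as (bitmask, rhs) rows.

    A clause y_1 XOR ... XOR y_k with s negated literals is satisfied iff
    x_{i_1} + ... + x_{i_k} = 1 + s (mod 2), since NOT x = 1 + x over F_2.
    """
    rows = []
    for _ in range(m):
        variables = rng.sample(range(n), k)
        negations = sum(rng.getrandbits(1) for _ in variables)
        mask = sum(1 << v for v in variables)
        rows.append((mask, (1 + negations) % 2))
    return rows

def rank_and_consistency(rows):
    """Gaussian elimination over F_2; returns (rank of A, whether Ax = y is solvable)."""
    pivots = {}            # leading bit position -> (mask, rhs)
    consistent = True
    for mask, rhs in rows:
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, rhs)
                break
            pivot_mask, pivot_rhs = pivots[top]
            mask ^= pivot_mask
            rhs ^= pivot_rhs
        else:
            # The row reduced to 0 = rhs; rhs = 1 is a contradiction.
            if rhs:
                consistent = False
    return len(pivots), consistent

def count_solutions(n, rows):
    """Z(F) = 2^(n - rank) if the linear system is consistent, and 0 otherwise."""
    rank, consistent = rank_and_consistency(rows)
    return 2 ** (n - rank) if consistent else 0
```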

Let $A$ be the matrix representation of a random k-XORSAT formula $F_{\mathrm{XOR}}$. In addition to the matrix $A$, define $A' = A'(\theta)$ by adding $\theta n$ ($0 \le \theta \le 1$) new rows, each containing exactly a single one, at the bottom of $A$. Equivalently, the Tanner graph $G'$ of $A'$ is obtained by adding $\mathrm{Po}(\lambda n)$ unary check nodes to the Tanner graph $G$ of $A$, where $\lambda = -\log(1-\theta)$. This operation is called 'pinning'; it mostly helps to remove 'short linear relations'. Figure 6.1 gives a rough sketch of the matrices $A$ and $A'$ together with their Tanner graphs $G$ and $G'$. The second result of this thesis [22] addresses two questions.

Question 1. How does the geometry of the solution space change under pinning?



Figure 6.1: The matrices A and A′ and the corresponding Tanner graphs G and G′

Question 2. How is the performance of the BPGD algorithm linked to the phase transitions of the decimation process?

6.2.1 Analysis of BPGD

In order to state the main results we need to introduce a few threshold values. To this end, given $d$, $k$ and a real parameter $\lambda \ge 0$, consider the probability generating functions $D$ and $K$ of the degree distributions of the Tanner graph $G'$. The degree of a variable node is a Poisson random variable with mean $\lambda + d$. The degree of a check node follows a two-point distribution on $\{1, k\}$, because there are two types of check nodes: those of degree $k$ and the unary checks (unit clauses). The generating functions are given by

\[ D(z) = \exp((\lambda + d)(z - 1)), \qquad K(z) = \frac{k\lambda}{k\lambda + d}\, z + \frac{d}{k\lambda + d}\, z^k. \tag{6.2.1} \]

Definition 6.2.1. The Bethe free entropy $\Phi$ of the matrix $A'$ is defined by

\[ \Phi(z) = D\left( 1 - \frac{K'(z)}{K'(1)} \right) - \frac{D'(1)}{K'(1)} \big( 1 - K(z) - (1 - z) K'(z) \big). \]

Also, consider the function

\[ \varphi(z) = 1 - \frac{D'\big( 1 - K'(z)/K'(1) \big)}{D'(1)}. \]

Remark 6.2.2.

• $\Phi'(z) = D'(1)\, K''(z)\, (\varphi(z) - z) / K'(1)$.

• So, the stationary points of $\Phi$ coincide with the fixed points of $\varphi$, as verified in [22].



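Substituting the specific distributions from (6.2.1), we get the following expressions for $\varphi(z)$ and $\Phi(z)$:

\[ \varphi_{d,k,\lambda} : [0,1] \to [0,1], \quad z \mapsto 1 - \exp\left( -\lambda - d z^{k-1} \right), \tag{6.2.2} \]

\[ \Phi_{d,k,\lambda} : [0,1] \to \mathbb{R}, \quad z \mapsto \exp\left( -\lambda - d z^{k-1} \right) - \frac{d(k-1)}{k} z^k + d z^{k-1} - \frac{d}{k}. \tag{6.2.3} \]

Let $\alpha_*(\lambda) = \alpha_*(d,k,\lambda) \in [0,1]$ be the smallest and $\alpha^*(\lambda) = \alpha^*(d,k,\lambda) \ge \alpha_*(d,k,\lambda)$ the largest fixed point of $\varphi_{d,k,\lambda}$ in $[0,1]$. Figure 6.2 visualizes $\Phi_{d,k,\lambda}(z)$ for different values of $\lambda$.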

Figure 6.2: Φ_{d,k,λ} for k = 3 and d = 2.4, for λ from 0 to 0.3 (maximum at z = 0) and from 0.4 to 0.9 (Figure 1, [22]). Horizontal axis: z; vertical axis: Φ_{d,k,λ}(z).
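As a sanity check on (6.2.2)–(6.2.3), the following Python sketch evaluates Definition 6.2.1 directly from the generating functions (6.2.1) and compares the result with the closed form. It also approximates the smallest and largest fixed points by iterating φ from 0 and from 1, which works because φ is increasing on [0,1]. The function names, the iteration cap and the tolerance are assumptions of this illustration.

```python
import math

def phi(z, d, k, lam):
    """phi_{d,k,lam}(z) from (6.2.2)."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1))

def Phi(z, d, k, lam):
    """Phi_{d,k,lam}(z) from (6.2.3)."""
    return (math.exp(-lam - d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)

def Phi_generic(z, d, k, lam):
    """Definition 6.2.1 evaluated with D and K from (6.2.1); should agree with Phi."""
    w = k * lam + d
    D = lambda x: math.exp((lam + d) * (x - 1))
    K = lambda x: (k * lam * x + d * x ** k) / w
    dK = lambda x: (k * lam + d * k * x ** (k - 1)) / w    # K'(x)
    dD_at_1 = lam + d                                       # D'(1)
    return D(1 - dK(z) / dK(1)) - dD_at_1 / dK(1) * (1 - K(z) - (1 - z) * dK(z))

def fixed_point(z, d, k, lam, max_iter=100000, tol=1e-15):
    """Iterate z -> phi(z); since phi is increasing the iterates move monotonically."""
    for _ in range(max_iter):
        nxt = phi(z, d, k, lam)
        if abs(nxt - z) < tol:
            return nxt
        z = nxt
    return z

def alpha_lower(d, k, lam):
    """Smallest fixed point: iterate upwards from z = 0."""
    return fixed_point(0.0, d, k, lam)

def alpha_upper(d, k, lam):
    """Largest fixed point: iterate downwards from z = 1."""
    return fixed_point(1.0, d, k, lam)
```

For instance, with k = 3, d = 2.6 and λ = 0 the lower iteration stays at 0 while the upper one settles at a strictly positive fixed point.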

In addition, define a few threshold values of $d$:

\[ d_{\min}(k) = \left( \frac{k-1}{k-2} \right)^{k-2}, \tag{6.2.4} \]

\[ d_{\mathrm{core}}(k) = \sup\left\{ d > 0 : \alpha^*(0) = 0 \right\}, \tag{6.2.5} \]

\[ d_{\mathrm{SAT}}(k) = \sup\left\{ d > 0 : \Phi_{d,k,0}(\alpha^*(0)) \le \Phi_{d,k,0}(0) \right\}. \tag{6.2.6} \]

Here $d_{\mathrm{SAT}}(k)$ is the random k-XORSAT satisfiability threshold [12, 42, 93], and $d_{\mathrm{core}}(k)$ equals the threshold for the emergence of a giant 2-core within the k-uniform hypergraph induced by $F_{\mathrm{XOR}}$ [12, 75]. A bit of calculus reveals that

0 < dmin(k) < dcore(k) < dSAT(k) < k.
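The thresholds (6.2.4)–(6.2.6) are straightforward to approximate numerically. The sketch below is one possible approach, assuming bisection over d and fixed point iteration from z = 1 to obtain the largest fixed point at λ = 0. The helper names, the cut-off 1e-6 for deciding whether that fixed point is zero, and the iteration limits are assumptions of this example, and the result for d_core is only accurate to a few digits because the iteration converges slowly near that threshold.

```python
import math

def largest_fixed_point(d, k, max_iter=5000):
    """Largest fixed point of z -> 1 - exp(-d z^(k-1)) (lambda = 0), iterating down from 1."""
    z = 1.0
    for _ in range(max_iter):
        nxt = 1.0 - math.exp(-d * z ** (k - 1))
        if abs(nxt - z) < 1e-15:
            return nxt
        z = nxt
    return z

def Phi0(z, d, k):
    """Phi_{d,k,0}(z) from (6.2.3)."""
    return (math.exp(-d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)

def d_min(k):
    """Closed form (6.2.4)."""
    return ((k - 1) / (k - 2)) ** (k - 2)

def d_core(k, steps=30):
    """Bisection for (6.2.5): the largest fixed point is 0 at lo and positive at hi."""
    lo, hi = 0.0, float(k)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if largest_fixed_point(mid, k) > 1e-6:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def d_sat(k, steps=30):
    """Bisection for (6.2.6) on the sign of Phi(alpha^*(0)) - Phi(0)."""
    lo, hi = d_core(k) + 1e-3, float(k)
    for _ in range(steps):
        mid = (lo + hi) / 2
        alpha = largest_fixed_point(mid, k)
        if Phi0(alpha, mid, k) <= Phi0(0.0, mid, k):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For k = 3 this gives d_min = 2 and values of approximately 2.455 and 2.754 for d_core and d_SAT, respectively.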

Now we state our second result of this thesis [22].

Theorem 6.2.3 (Theorem 1.1, [22]). Let $k \ge 3$.

(i). If $d < d_{\min}(k)$, then

\[ \lim_{n\to\infty} \mathbb{P}\left[ \mathrm{BPGD}(F_{\mathrm{XOR}}) \text{ finds a satisfying assignment} \right] = \exp\left( -\frac{d^2 (k-1)^2}{4} \int_0^1 \frac{z^{2k-4}(1-z)}{1 - d(k-1) z^{k-2} (1-z)} \, dz \right). \]

(ii). If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$, then

\[ \mathbb{P}\left[ \mathrm{BPGD}(F_{\mathrm{XOR}}) \text{ finds a satisfying assignment} \right] = o(1). \]
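The limit in Theorem 6.2.3 (i) can be evaluated by simple numerical quadrature. The following sketch uses the midpoint rule; the function names and the number of quadrature steps are assumptions of this example. The integrand is well defined precisely because d < d_min(k) keeps the denominator 1 − d(k−1)z^(k−2)(1−z) positive on [0,1], as the maximum of z^(k−2)(1−z) is attained at z = (k−2)/(k−1).

```python
import math

def d_min(k):
    """Threshold (6.2.4)."""
    return ((k - 1) / (k - 2)) ** (k - 2)

def bpgd_success_probability(d, k, steps=20000):
    """Right-hand side of Theorem 6.2.3 (i), integral approximated by the midpoint rule."""
    if not 0 < d < d_min(k):
        raise ValueError("the formula requires 0 < d < d_min(k)")
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        numerator = z ** (2 * k - 4) * (1 - z)
        denominator = 1 - d * (k - 1) * z ** (k - 2) * (1 - z)
        total += numerator / denominator
    integral = total * h
    return math.exp(-d * d * (k - 1) ** 2 / 4 * integral)
```

Evaluating `bpgd_success_probability(d, 3)` for increasing d in (0, 2) shows the limiting success probability decreasing from values near one, in line with the statement that it lies strictly between zero and one.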

The above theorem determines the precise densities at which BPGD succeeds or fails, and thereby rigorously verifies the heuristic work of Ricci-Tersenghi and Semerjian [96]. To be precise, in the 'successful' regime BPGD does not actually succeed with high probability, but with an explicit probability strictly between zero and one. The most significant ingredient towards turning the heuristic arguments from [96] into a rigorous proof is a formula for the nullity of the check matrix of the XORSAT instance $F_{\mathrm{DC},t}$ from the decimation process introduced in Chapter 3. The following proposition relates the matrix $A_t = A_{F_{\mathrm{DC},t}}$ to the function $\Phi_{d,k,\lambda}$ for the given $d$ and $\lambda$.

Proposition 6.2.4 (Proposition 2.6, [22]).

\[ \lim_{n\to\infty} \frac{1}{n} \operatorname{nul} A_t = \Phi_{d,k,\lambda}(\alpha_{\max}) \quad\text{in probability.} \]

6.2.2 Phase Transition of Decimation process

In addition to the success probability of the BPGD algorithm, we also rigorously confirm the phase transitions predicted heuristically by Ricci-Tersenghi and Semerjian [96] and investigate how they relate to the performance of BPGD. The next two theorems identify the precise regimes of $d, t$ in which the different phase transitions of the decimation process occur. Before stating the theorems let us introduce a few values of $\lambda$; the corresponding values of $\theta$ follow from $\lambda = -\log(1-\theta)$. Let

\[ \lambda^* = \lambda^*(d,k) = -\log(1 - z_*) - \frac{z_*}{(k-1)(1 - z_*)} > \lambda_*, \]

where

\[ \lambda_* = \lambda_*(d,k) = \max\left\{ 0, \ -\log(1 - z^*) - \frac{z^*}{(k-1)(1 - z^*)} \right\} \ge 0. \]

Additionally, let $\lambda_{\mathrm{cond}}(d,k)$ be the solution to the ODE

\[ \frac{\partial \lambda_{\mathrm{cond}}(d,k)}{\partial d} = -\frac{\alpha^*(\lambda_{\mathrm{cond}}(d,k))^k - \alpha_*(\lambda_{\mathrm{cond}}(d,k))^k}{k \big( \alpha^*(\lambda_{\mathrm{cond}}(d,k)) - \alpha_*(\lambda_{\mathrm{cond}}(d,k)) \big)}, \qquad \lambda_{\mathrm{cond}}(d_{\mathrm{SAT}}(k), k) = 0. \tag{6.2.7} \]

To be precise, while $\theta_{\mathrm{cond}}$ matches the predictions of [96], the ODE formula (6.2.7) for the threshold, which is easy to evaluate numerically, does not appear in [96]. Instead of the ODE formulation, [96] defines $\lambda_{\mathrm{cond}}$ as the (unique) $\lambda \ge 0$ such that $\Phi_{d,k,\lambda}(\alpha_*) = \Phi_{d,k,\lambda}(\alpha^*)$; we showed in [22] that both definitions are equivalent.
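The three thresholds in θ can be approximated numerically along the following lines. The sketch makes two assumptions that are not spelled out in this section: that the two values of z entering the definitions of the λ thresholds are the roots in (0,1) of d(k−1)z^(k−2)(1−z) = 1 (the tangency condition φ′(z) = 1 at a fixed point of φ), with the smaller root giving the larger value of λ, and that λ_cond is located by bisection on the difference of Φ at the largest and smallest fixed points rather than by integrating the ODE (6.2.7). All function names are hypothetical, and the code presupposes d_min(k) < d < d_SAT(k).

```python
import math

def phi(z, d, k, lam):
    return 1.0 - math.exp(-lam - d * z ** (k - 1))

def Phi(z, d, k, lam):
    return (math.exp(-lam - d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)

def fixed_point(z, d, k, lam, max_iter=20000):
    """Iterate phi from z; from 0 this finds the smallest, from 1 the largest fixed point."""
    for _ in range(max_iter):
        nxt = phi(z, d, k, lam)
        if abs(nxt - z) < 1e-15:
            return nxt
        z = nxt
    return z

def tangency_roots(d, k, steps=200):
    """Assumed definition: the two roots of d(k-1) z^(k-2) (1-z) = 1 in (0, 1)."""
    h = lambda z: d * (k - 1) * z ** (k - 2) * (1 - z) - 1
    peak = (k - 2) / (k - 1)              # maximiser of z^(k-2) (1 - z)

    def bisect(outside, inside):          # h(outside) < 0 < h(inside)
        for _ in range(steps):
            mid = (outside + inside) / 2
            if h(mid) < 0:
                outside = mid
            else:
                inside = mid
        return (outside + inside) / 2

    return bisect(0.0, peak), bisect(1.0, peak)

def lam_of(z, k):
    return -math.log(1 - z) - z / ((k - 1) * (1 - z))

def theta_thresholds(d, k, steps=60):
    """Returns approximations of (theta_lower, theta_cond, theta_upper)."""
    z_small, z_large = tangency_roots(d, k)
    lam_upper = lam_of(z_small, k)
    lam_lower = max(0.0, lam_of(z_large, k))
    lo, hi = lam_lower, lam_upper
    for _ in range(steps):
        mid = (lo + hi) / 2
        gap = (Phi(fixed_point(1.0, d, k, mid), d, k, mid)
               - Phi(fixed_point(0.0, d, k, mid), d, k, mid))
        if gap < 0:
            lo = mid
        else:
            hi = mid
    lam_cond = (lo + hi) / 2
    to_theta = lambda lam: 1 - math.exp(-lam)   # inverts lambda = -log(1 - theta)
    return to_theta(lam_lower), to_theta(lam_cond), to_theta(lam_upper)
```

The bisection relies on the observation that the difference of Φ at the two fixed points is increasing in λ, since the derivative of Φ with respect to λ at a fixed point z equals −(1 − z).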

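Theorem 6.2.5 (Theorem 1.2, [22]). Let $k \ge 3$ and let $0 \le t = t(n) \le n$ be a sequence such that $\lim_{n\to\infty} t/n = \theta \in (0,1)$.

(i). If $d < d_{\min}(k)$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.

(ii). If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta < \theta_*$ or $\theta > \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.

(iii). If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta_* < \theta < \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the reconstruction property w.h.p.

Recall that $\mu_{F_{\mathrm{DC},t}}$ denotes the BP 'approximation' of the correct marginal $\pi_{F_{\mathrm{DC},t}}$ of the variable $x_{t+1}$ in the formula $F_{\mathrm{DC},t}$ created by the decimation process.

Theorem 6.2.6 (Theorem 1.3, [22]). Let $k \ge 3$ and let $0 \le t = t(n) \le n$ be a sequence such that $\lim_{n\to\infty} t/n = \theta \in (0,1)$.

(i). If $0 < d < d_{\min}(k)$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.

(ii). If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta^*$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.

(iii). If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$ and $\theta_{\mathrm{cond}} < \theta < \theta^*$, then $\mathbb{E}\left| \mu_{F_{\mathrm{DC},t}} - \pi_{F_{\mathrm{DC},t}} \right| = \Omega(1)$.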

The upshot of Theorems 6.2.5–6.2.6 is that the relation between the accuracy of BP and reconstruction is subtle.

Remark 6.2.7. As long as $d < d_{\min}$, non-reconstruction holds throughout and the BP approximations are correct. But if $d_{\min} < d < d_{\mathrm{SAT}}$ and $\theta_* < \theta < \theta_{\mathrm{cond}}$, then Theorem 6.2.5 (iii) shows that reconstruction occurs. Nonetheless, Theorem 6.2.6 (ii) demonstrates that the BP approximations remain valid in this regime. By contrast, for $\theta_{\mathrm{cond}} < \theta < \theta^*$ we have non-reconstruction by Theorem 6.2.5 (ii), but Theorem 6.2.6 (iii) shows that BP misses its mark with a non-vanishing probability. Finally, for $\theta > \theta^*$ everything is in order once again, as BP regains its footing and non-reconstruction holds. Unfortunately, BPGD is unlikely to reach this happy state because the algorithm is bound to make numerous mistakes at times $t/n \in (\theta_{\mathrm{cond}}, \theta^*)$.

Figure 6.3 illustrates Theorems 6.2.5–6.2.6 by displaying the phase diagram in terms of $d$ and $\theta \sim t/n$ for $k = 3, 4, 5$.

Figure 6.3 description:

• Hatched area: the regimes $\theta < \theta_*$ and $\theta > \theta_{\mathrm{cond}}$, where non-reconstruction holds.

• Non-hatched area: the regime $\theta_* < \theta < \theta_{\mathrm{cond}}$, where we have reconstruction.

• Blue area: the regimes $\theta < \theta_{\mathrm{cond}}$ and $\theta > \theta^*$, where BP is correct.

• Orange area: the regime $\theta_{\mathrm{cond}} < \theta < \theta^*$, where BP is inaccurate.



Figure 6.3: The phase diagrams for k = 3,4,5 with d ∈ (dmin,dSAT) on the horizontal and θ on the vertical axis (Figure 3, [22]). Panels: (a) k = 3, (b) k = 4, (c) k = 5. In each panel the values dmin, dcore and dSAT are marked on the d-axis, and the three curves are labelled θ_*, θ_cond and θ^*.

6.3 Proof Strategy.

Thanks to the half-integrality of the messages established in Chapter 3, Fact 3.2.2, BP is equivalent to Warning Propagation (WP) on random k-XORSAT. Theorems 6.2.5–6.2.6 rely on counting the null variables of the WP algorithm. Recall the WP limits $\omega_{F,x\to a}, \omega_{F,a\to x}, \omega_{F,x} \in \{\mathtt{f}, \mathtt{u}, \mathtt{n}\}$ from Chapter 3. Furthermore, let $V_{\mathtt{f},\ell}(F)$, $V_{\mathtt{u},\ell}(F)$, $V_{\mathtt{n},\ell}(F)$ be the sets of variables with the respective mark after $\ell \ge 0$ iterations, and let $V_{\mathtt{f}}(F)$, $V_{\mathtt{u}}(F)$, $V_{\mathtt{n}}(F)$ be the sets of variables where the limit $\omega_{F,x}$ takes the respective value.

The following statement traces WP on the random formula F DC,t produced by the decimation

process.

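Proposition 6.3.1 (Proposition 2.5, [22]). Let $\varepsilon > 0$ and assume that $d > 0$, $t = t(n) \sim \theta n$ satisfy one of the following conditions:

(i). $d < d_{\min}$, or

(ii). $d > d_{\min}$ and $\theta \notin \{\theta_*, \theta^*\}$.

Then there exists $\ell_0 = \ell_0(d, \theta, \varepsilon) > 0$ such that for any fixed $\ell \ge \ell_0$, with $\lambda = -\log(1-\theta)$, w.h.p. we have

\[ \big| t + |V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n \big| < \varepsilon n, \]

\[ \big| t + |V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})| - (\alpha^* - \alpha_*) n \big| < \varepsilon n, \]

\[ \big| V_{\mathtt{n}}(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \big| < \varepsilon n. \]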

In light of the above proposition, in order to investigate the accuracy of BP it suffices to compare the number of variables marked $\mathtt{n}$ by WP with the true marginals. The following corollary summarizes the result.

Corollary 6.3.2 (Corollary 2.9, [22]). For any $d$, $\theta$ the following statements are true.

(i). If $d < d_{\min}$, or $d > d_{\min}$ and $\theta < \theta_{\mathrm{cond}}$, or $d > d_{\min}$ and $\theta > \theta^*$, then $|V_0(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n}}(F_{\mathrm{DC},t})| = o(n)$ w.h.p.

(ii). If $d > d_{\min}$ and $\theta_{\mathrm{cond}} < \theta < \theta^*$, then $|V_0(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n}}(F_{\mathrm{DC},t})| = \Omega(n)$ w.h.p.



Corollary 6.3.2 directly implies Theorem 6.2.6, which in turn implies Theorem 6.2.3 (ii). For the (non-)reconstruction thresholds in Theorem 6.2.5 we need to investigate the conditional marginals given the values of the variables at a certain distance from $x_{t+1}$, as in the (non-)reconstruction property defined in Chapter 4. This is where the extra value $\mathtt{f}$ from the construction of WP enters.

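Corollary 6.3.3 (Corollary 2.10, [22]). Assume that $d > d_{\min}$ and let $\varepsilon > 0$.

(i). If $\theta < \theta_{\mathrm{cond}}$, then for any fixed $\ell$ we have $|V_{\mathtt{f},\ell}(F_{\mathrm{DC},t}) \cap V_{0,\ell}(F_{\mathrm{DC},t})| < \varepsilon n$ w.h.p. (see footnote 1 for the definition of $V_{0,\ell}$).

(ii). If $\theta > \theta_{\mathrm{cond}}$, then there exists $\ell_0 = \ell_0(d, \theta, \varepsilon)$ such that for any fixed $\ell > \ell_0$ we have

\[ \big| \big( V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \cup V_{\mathtt{f},\ell}(F_{\mathrm{DC},t}) \big) \,\triangle\, V_{0,\ell}(F_{\mathrm{DC},t}) \big| < \varepsilon n \quad\text{w.h.p.} \]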

Comparing the number of actually frozen variables with the ones marked $\mathtt{f}$ by WP, we obtain Theorem 6.2.5.

Coming to the proof of the success probability of BPGD, Corollary 6.3.2 directly implies that for $d < d_{\min}$ the BP approximations of the marginals are mostly correct on the formula $F_{\mathrm{DC},t}$ obtained by the decimation process. The difficulty in analyzing BPGD lies in proving that the estimates of the algorithm are not just mostly correct, but correct up to only a bounded expected number of discrepancies over the entire execution of the algorithm. To prove this fact we combine the method of differential equations with a precise analysis of the sources of the remaining bounded number of discrepancies, which stem from short (i.e., bounded-length) cycles in the graph $G(F)$ that we call 'toxic cycles' (see [22] for the details of the proof).

Again due to the half-integrality (Fact 3.2.2) on random k-XORSAT, BPGD boils down to the purely combinatorial algorithm called 'Unit Clause Propagation' (UCP), whose pseudocode can be found in Chapter 3. We conclude this chapter by stating the following proposition (see [22] for details).

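Proposition 6.3.4 (Proposition 6.1, [22]). We have

\[ \mathbb{P}\left[ \mathrm{BPGD} \text{ outputs a satisfying assignment of } F_{\mathrm{XOR}} \right] = \mathbb{P}\left[ \mathrm{UCP} \text{ outputs a satisfying assignment of } F_{\mathrm{XOR}} \right]. \]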
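For intuition, here is a minimal Python sketch of a decimation procedure driven by unit clause propagation on a XORSAT instance. It is not the pseudocode from Chapter 3: the clause encoding (a set of variables plus a right-hand side bit), the processing order x_1, ..., x_n and the function names are assumptions of this illustration. A variable that is forced by a unit clause receives the forced value, any other variable receives a fair coin flip, and the procedure reports failure (`None`) as soon as a contradiction appears.

```python
import random

def random_xor_clauses(n, m, k, rng):
    """m random XOR clauses; (variables, rhs) encodes 'XOR of the variables == rhs'."""
    clauses = []
    for _ in range(m):
        variables = frozenset(rng.sample(range(n), k))
        negations = sum(rng.getrandbits(1) for _ in range(k))
        clauses.append((variables, (1 + negations) % 2))
    return clauses

def ucp_decimation(n, clauses, rng):
    """Assign x_0, ..., x_{n-1} in order with unit clause propagation; None on failure."""
    work = [(set(vs), rhs) for vs, rhs in clauses]   # private, mutable copy
    assignment = {}

    def assign(x, value):
        """Fix x := value, substitute, and propagate unit clauses; False on contradiction."""
        stack = [(x, value)]
        while stack:
            y, val = stack.pop()
            if y in assignment:
                if assignment[y] != val:
                    return False
                continue
            assignment[y] = val
            for i, (vs, rhs) in enumerate(work):
                if y in vs:
                    vs.discard(y)
                    rhs ^= val
                    work[i] = (vs, rhs)
                    if not vs and rhs:        # empty clause demanding 0 = 1
                        return False
                    if len(vs) == 1:          # unit clause forces its last variable
                        (u,) = vs
                        stack.append((u, rhs))
        return True

    for x in range(n):
        if x not in assignment:
            # No unit clause has forced x so far: flip a fair coin.
            if not assign(x, rng.getrandbits(1)):
                return None
    return assignment
```

Every clause passes through the unit stage before its last variable is set, so any assignment returned by this sketch satisfies all clauses.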

This proposition implies that the success probability of BPGD established in Theorem 6.2.3 equals that of the UCP algorithm. Thus, the second result of the thesis covers all clause lengths $k \ge 3$ and determines, depending on the regime of $d$, whether the BPGD and UCP algorithms succeed or fail. This substantially improves over Yung's result [102], which requires lower bounds on the clause length ($k \ge 9$ for UCP and $k \ge 13$ for BPGD).

Footnote 1: $V_{0,\ell}(F)$ is the set of variables $x_i$ such that $\sigma_i = 0$ for all $\sigma \in \ker A_F$ for which $\sigma_h = 0$ for all variables $x_h \in \partial^{\ell} x_i$.


Based on: The random k-SAT Gibbs Uniqueness Threshold revisited [21]

Arnab Chatterjee, Amin Coja-Oghlan, Catherine Greenhill,

Vincent Pfenninger, Maurice Rolvien, Pavel Zakharov, Konstantinos Zampetakis

arXiv:2506.01359 (2025)

7 On the Gibbs Uniqueness in random k-SAT

“· · · the theory of Gibbs measures, which provides a very effective and flexible way to define collections of “locally dependent” random variables.”

–Amir Dembo et al.

In the previous chapter we investigated the random k-XORSAT problem, analyzing the performance of BPGD and its connection to the phase transitions of the decimation process. We now turn to random k-SAT, one of the most extensively studied random constraint satisfaction problems. Beyond its central role in computational complexity, it provides a natural framework for exploring fundamental phenomena such as the number of satisfying assignments, clustering, reconstruction and the Gibbs uniqueness phase transition. In this chapter our focus is to establish rigorous lower bounds on the number of satisfying assignments of random k-SAT up to the Gibbs uniqueness threshold, guided by the 'replica symmetric solution' from statistical physics [77, 78].

7.1 Motivation and History

Since the 1990s, pinpointing the satisfiability threshold of random k-SAT, defined as the largest clause-to-variable density up to which satisfying assignments exist [6], has been a guiding theme of research in the area of random CSPs. The physics-inspired 'cavity method' predicts the satisfiability threshold exactly for every $k \ge 3$ [72], but for 'small' $k \ge 3$ the problem remains a hard nut to crack. From the viewpoint of statistical physics, one of the most important tasks is to determine the exact number of satisfying assignments (known as the 'partition function' in physics jargon) and, from it, the satisfiability threshold. Three prior contributions towards proving the physics prediction stand out. First, Panchenko and Talagrand [90] proved that the physics formula yields a rigorous upper bound, using a proof technique called the 'interpolation method'. Second, Achlioptas et al. [2] proved the physics formula in the case $k = 2$, which is conceptually easier than $k \ge 3$. Third, Montanari and Shah [83] obtained a correct approximation of the number of 'good' assignments, which satisfy all but $o(n)$ clauses, for all $k \ge 3$. Our paper verifies that the number of satisfying assignments predicted by the 'replica symmetric solution' is correct for any $k \ge 3$ up to the Gibbs uniqueness threshold. Moreover, we derive a lower bound on the Gibbs uniqueness threshold which improves significantly over the work of Montanari and Shah [83] for small $k \ge 3$.

7.2 Main Results.

In this section we state the third result of this thesis [21].

Let $F_{k\mathrm{SAT}} = F_{d,k}(n)$ be the random k-CNF on $n$ Boolean variables $x_1, \dots, x_n$ with $m \overset{d}{=} \mathrm{Po}(dn/k)$ clauses $a_1, \dots, a_m$. As in random 2-SAT (Chapter 5) and random k-XORSAT (Chapter 6), the clauses $a_i$ are drawn independently and uniformly from the set of all $2^k \binom{n}{k}$ possible clauses with $k$ distinct variables. The parameter $d$ prescribes the expected number of clauses in which a given variable appears. Also, let $S(F_{k\mathrm{SAT}})$ be the set of satisfying assignments of $F_{k\mathrm{SAT}}$ and let $Z(F_{k\mathrm{SAT}}) = |S(F_{k\mathrm{SAT}})|$ be the number of satisfying assignments of the random k-SAT formula $F_{k\mathrm{SAT}}$.

The aim of this chapter is twofold:

(i). To study the logarithm of the number of satisfying assignments of random k-SAT (the 'partition function' of statistical physics, as defined in Chapter 2), i.e., the quantity $\frac{1}{n} \log Z(F_{k\mathrm{SAT}})$ as $n \to \infty$. The 'replica symmetric solution' predicts this quantity in terms of the 'Bethe free entropy', a functional defined on probability measures $\pi \in \mathcal{P}(0,1)$.

(ii). To rigorously derive a lower bound on the Gibbs uniqueness threshold for any $k \ge 3$, which improves significantly over the work in [83] for small values of $k$.

7.2.1 Limit in probability of log-partition function in random k-SAT

In addition to the Gibbs uniqueness property defined in Chapter 4, as a final preparation we need to recall the 'replica symmetric solution' from [77, 78]. This prediction comes in terms of a fixed point problem on the space $\mathcal{P}(0,1)$ of probability measures on the open unit interval. Consider the Belief Propagation operator

\[ \mathrm{BP}_{d,k} : \mathcal{P}(0,1) \to \mathcal{P}(0,1), \quad \pi \mapsto \hat\pi = \mathrm{BP}_{d,k}(\pi) \tag{7.2.1} \]



defined as follows. Let $d^+, d^- \overset{d}{=} \mathrm{Po}(d/2)$ be Poisson variables with expectation $d/2$. Moreover, let $(\mu_{\pi,i,j})_{i,j \ge 1}$ be a sequence of i.i.d. random variables, each with distribution $\pi$. All these random variables are mutually independent. Further, let

\[ \mu_{\pi,i} = 1 - \prod_{j=1}^{k-1} \mu_{\pi,i,j} \ \text{ for } i \ge 1, \qquad \hat\mu_\pi = \frac{\prod_{i=1}^{d^-} \mu_{\pi,2i-1}}{\prod_{i=1}^{d^-} \mu_{\pi,2i-1} + \prod_{i=1}^{d^+} \mu_{\pi,2i}}. \tag{7.2.2} \]

Then $\hat\pi$ is the distribution of $\hat\mu_\pi$. Furthermore, define the Bethe free entropy

\[ \mathcal{B}_{d,k}(\pi) = \mathbb{E}\left[ \log\left( \prod_{i=1}^{d^-} \mu_{\pi,2i} + \prod_{i=1}^{d^+} \mu_{\pi,2i-1} \right) - \frac{d(k-1)}{k} \log\left( 1 - \prod_{j=1}^{k} \mu_{\pi,1,j} \right) \right], \tag{7.2.3} \]

provided that the expectation on the r.h.s. exists.

Theorem 7.2.1 (Theorem 1.1, [21]). Let $k \ge 3$ and assume that $0 < d < d_{\mathrm{uniq}}(k)$. Then the weak limit

\[ \pi_{d,k} = \lim_{\ell\to\infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2}) \in \mathcal{P}(0,1) \tag{7.2.4} \]

exists and

\[ \lim_{n\to\infty} \frac{1}{n} \log Z(F_{k\mathrm{SAT}}) = \mathcal{B}_{d,k}(\pi_{d,k}) \quad\text{in probability.} \tag{7.2.5} \]

Here $\mathrm{BP}^{\ell}_{d,k}$ is the $\ell$-fold application of the operator $\mathrm{BP}_{d,k}$ and $\delta_{1/2} \in \mathcal{P}(0,1)$ is the atom at $1/2$. Although the formula (7.2.5) is not explicit, the proof of Theorem 7.2.1 reveals that the convergence to the weak limit $\pi_{d,k}$ occurs rapidly. In the next subsection we introduce a few threshold values of $d$ and provide an improved lower bound on the Gibbs uniqueness threshold for any $k \ge 3$.

7.2.2 Lower bound on Gibbs uniqueness

Before we dive into the second result of this chapter, we introduce a few threshold values of $d$, some known and one new, that are relevant to the number of satisfying assignments of random k-SAT. The title of this chapter raises a natural question: how can we determine the Gibbs uniqueness threshold $d_{\mathrm{uniq}}$ of random k-SAT?

The threshold $d_{\mathrm{uniq}}$ is known precisely only in the case $k = 2$, where it coincides with the value $d_{\mathrm{SAT}}$ for random 2-SAT. For $k \ge 3$ the precise value of $d_{\mathrm{uniq}}$ is not known, but Montanari and Shah [83] proved that it is upper bounded by the pure literal threshold $d_{\mathrm{pure}}$ defined in [19, 74]:

\[ d_{\mathrm{uniq}}(k) \le d_{\mathrm{pure}}(k) = \min_{z > 0} \frac{z}{(1 - \exp(-z/2))^{k-1}}. \tag{7.2.6} \]



Figure 7.1: Comparison of $\mathcal{B}_{d,k}(\pi_{d,k})$ with known bounds for $\lim_{n\to\infty} \frac{1}{n} \log Z(F_{k\mathrm{SAT}})$ for k = 3. [21]

Complementing the upper bound (7.2.6), Montanari and Shah derived a lower bound $d_{\mathrm{MS}}(k)$:

\[ d_{\mathrm{MS}}(k) = \sup\left\{ d > 0 : d(k-1) \left( 1 - \frac{\exp(-d/2)}{4} \right) \left( 1 - \frac{\exp(-d/2)}{2} \right)^{k-2} < 1 \right\} \le d_{\mathrm{uniq}}(k). \tag{7.2.7} \]

Unfortunately, this bound is not tight even for $k = 2$. Moreover, their result only yields the number of 'good' assignments that satisfy all but $o(n)$ clauses, rather than the number of actual satisfying assignments. In the following theorem we derive a new lower bound $d_{\mathrm{our}}$ on the Gibbs uniqueness threshold $d_{\mathrm{uniq}}$, below which we count the actual satisfying assignments of random k-SAT.

Theorem 7.2.2 (Theorem 1.2, [21]). For all $k \ge 3$ we have

\[ d_{\mathrm{uniq}}(k) \ge d_{\mathrm{our}}(k) := \sup\left\{ d > 0 : \frac{d(k-1)}{2} \left( 1 - \frac{\exp(-d/2)}{2} \right)^{k-2} < 1 \right\}. \tag{7.2.8} \]

An easy calculation shows that for every $k \ge 2$,

\[ d_{\mathrm{MS}}(k) < d_{\mathrm{our}}(k). \]
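The three bounds (7.2.6)–(7.2.8) can be compared numerically. The sketch below assumes that the left-hand sides in (7.2.7) and (7.2.8) are increasing in d, so that each supremum is the unique root found by bisection, and it approximates the minimum in (7.2.6) on a grid. The function names, the bracket [0, 2k] and the grid resolution are choices made for this example.

```python
import math

def bisect_increasing(f, lo, hi, steps=200):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def d_ms(k):
    """Montanari-Shah lower bound (7.2.7)."""
    def f(d):
        e = math.exp(-d / 2)
        return d * (k - 1) * (1 - e / 4) * (1 - e / 2) ** (k - 2) - 1
    return bisect_increasing(f, 0.0, 2.0 * k)

def d_our(k):
    """Lower bound (7.2.8)."""
    def f(d):
        e = math.exp(-d / 2)
        return d * (k - 1) / 2 * (1 - e / 2) ** (k - 2) - 1
    return bisect_increasing(f, 0.0, 2.0 * k)

def d_pure(k, grid=200000, z_max=20.0):
    """Pure literal threshold (7.2.6), minimised over a grid of z in (0, z_max]."""
    return min(z / (1 - math.exp(-z / 2)) ** (k - 1)
               for z in (z_max * (i + 1) / grid for i in range(grid)))
```

For k = 3 this yields roughly 0.88 for the Montanari-Shah bound, 1.34 for the new bound and 4.9 for the pure literal threshold, and for k = 2 the new bound evaluates to 2.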

Besides, the best prior rigorous bounds on the number of satisfying assignments beyond the giant component threshold $d_{\mathrm{giant}} = 1/(k-1)$ derive from the first and second moment methods. The first moment bound reads

\[ \frac{1}{n} \log Z(F_{k\mathrm{SAT}}) \le \log 2 + \frac{d}{k} \log(1 - 2^{-k}) + o(1) \quad\text{w.h.p.} \tag{7.2.9} \]

Moreover, Achlioptas and Peres [7] perform a second moment argument on the number of satisfying assignments that enjoy a peculiar additional condition required to keep the second moment under control. They show that w.h.p.

\[ \frac{1}{n} \log Z(F_{k\mathrm{SAT}}) \ge (1 - d) \log 2 + \frac{d}{k} \log\left[ \left( \lambda^{1/2} + \lambda^{-1/2} \right)^k - \lambda^{-k/2} \right] + o(1), \tag{7.2.10} \]

where

\[ (1 - \lambda)(1 + \lambda)^{k-1} = 1, \quad \lambda > 0. \tag{7.2.11} \]
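Both moment bounds are explicit enough to evaluate directly. The following sketch computes the right-hand sides of (7.2.9) and (7.2.10) without the o(1) terms. The function names are assumptions, and λ from (7.2.11) is obtained by bisection on the interval ((k−2)/k, 1), on which (1−λ)(1+λ)^(k−1) decreases from a value above 1 to 0 when k ≥ 3.

```python
import math

def first_moment_upper(d, k):
    """Right-hand side of (7.2.9) without the o(1) term."""
    return math.log(2) + d / k * math.log(1 - 2.0 ** (-k))

def achlioptas_peres_lambda(k, steps=200):
    """Positive solution of (1 - lam)(1 + lam)^(k-1) = 1 from (7.2.11), for k >= 3."""
    g = lambda lam: (1 - lam) * (1 + lam) ** (k - 1) - 1
    lo, hi = (k - 2) / k, 1.0        # g(lo) > 0 > g(hi); g is decreasing on [lo, hi]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def second_moment_lower(d, k):
    """Right-hand side of (7.2.10) without the o(1) term."""
    lam = achlioptas_peres_lambda(k)
    inner = (lam ** 0.5 + lam ** -0.5) ** k - lam ** (-k / 2)
    return (1 - d) * math.log(2) + d / k * math.log(inner)
```

For k = 3 the solution of (7.2.11) is the reciprocal of the golden ratio, and at d = 1 the two bounds evaluate to roughly 0.632 and 0.649.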

Figure 7.1 illustrates the bounds (7.2.9)–(7.2.10) along with (7.2.5) for k = 3.

Figure 7.1 description:

• The red dotted line depicts the first moment upper bound (7.2.9).

• The green dotted line represents the lower bound provided by (7.2.10).

• The blue line displays a numerical approximation of $\mathcal{B}_{d,3}(\pi_{d,3})$. To obtain our values, we generated $10^6$ samples from $\pi \approx \mathrm{BP}^{25}_{d,3}(\delta_{1/2})$ and then evaluated the corresponding empirical average of the expression in (7.2.3).
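The numerical procedure described for the blue line can be imitated with a population dynamics heuristic. The sketch below is an approximation under stated assumptions and not the code used for Figure 7.1: a finite population of messages stands in for the distribution obtained by iterating the BP operator from the atom at 1/2, resampling from the population replaces exact sampling, and the default population size, number of iterations and sample count are far smaller than the figure's sample size. All function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def clause_message(pop, k, rng):
    """1 - product of k-1 messages resampled from the population, cf. (7.2.2)."""
    return 1.0 - math.prod(rng.choice(pop) for _ in range(k - 1))

def product_of_messages(pop, d, k, rng):
    """Product of Po(d/2) independent clause messages (the empty product is 1)."""
    return math.prod(clause_message(pop, k, rng) for _ in range(poisson(d / 2, rng)))

def bp_step(pop, d, k, rng):
    """One approximate application of the BP operator to the empirical distribution."""
    new_pop = []
    for _ in range(len(pop)):
        minus = product_of_messages(pop, d, k, rng)
        plus = product_of_messages(pop, d, k, rng)
        new_pop.append(minus / (minus + plus))
    return new_pop

def bethe_estimate(d, k, pop_size=2000, iterations=25, samples=20000, seed=0):
    """Empirical average of the expression inside the expectation in (7.2.3)."""
    rng = random.Random(seed)
    pop = [0.5] * pop_size                      # the atom at 1/2
    for _ in range(iterations):
        pop = bp_step(pop, d, k, rng)
    total = 0.0
    for _ in range(samples):
        variable_term = math.log(product_of_messages(pop, d, k, rng)
                                 + product_of_messages(pop, d, k, rng))
        clause_term = math.log(1.0 - math.prod(rng.choice(pop) for _ in range(k)))
        total += variable_term - d * (k - 1) / k * clause_term
    return total / samples
```

The roles of the odd and even indices in (7.2.2) and (7.2.3) are not tracked separately here, which does not affect the distribution because the two Poisson variables are identically distributed and all messages are independent.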

Finally, combining Theorems 7.2.1 and 7.2.2 we obtain the following corollary:

Corollary 7.2.3 (Corollary 1.3, [21]). For $k \ge 3$ and $d < d_{\mathrm{our}}(k)$ the statement (7.2.5) holds, i.e.,

\[ \lim_{n\to\infty} \frac{1}{n} \log Z(F_{k\mathrm{SAT}}) = \mathcal{B}_{d,k}(\pi_{d,k}) \quad\text{in probability.} \]

7.3 Proof Strategy

The proof of the two results of this chapter comprises several steps, which are discussed below.

7.3.1 Existence of fixed point.

For every $d < d_{\mathrm{uniq}}$, the existence of the limit $\pi_{d,k}$ is an easy consequence of the Gibbs uniqueness property discussed in Chapter 4, and the limit $\pi_{d,k} = \lim_{\ell\to\infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ is a fixed point of the Belief Propagation operator from (7.2.1). The following proposition shows that the limit defined in (7.2.4) exists with respect to the $W_1$-metric.

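Proposition 7.3.1 (Proposition 2.1, [21]). The $W_1$-limit $\pi_{d,k} = \lim_{\ell\to\infty} \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ exists and

\[ \mathbb{E}\left[ \log^2 \mu_{\pi_{d,k},1,1} \right] + \mathbb{E}\left| \log\left( \prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1} \right) \right| + \mathbb{E}\left| \log\left( 1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j} \right) \right| < \infty. \tag{7.3.1} \]

In addition, $\mu_{\pi_{d,k},1,1}$ and $1 - \mu_{\pi_{d,k},1,1}$ are identically distributed.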

Having established the existence of the fixed point, we now discuss the upper and lower bounds on $\log Z(F_{k\mathrm{SAT}})$, obtained via the 'interpolation method' and the 'Aizenman-Sims-Starr scheme', respectively, which together prove our result in the regime $d < d_{\mathrm{uniq}}$.



7.3.2 Interpolation method: matching upper bound

In mathematical physics as well as in the study of random constraint satisfaction problems, several works [31, 32, 90] use the interpolation method to provide a matching upper bound on the normalized log-partition function as $n \to \infty$. The basic idea is to construct a family of random CSPs, parameterized by $t \in [0,1]$, which at $t = 1$ coincides with the random graph $\hat G$ of interest, while at $t = 0$ the CSP is so simple that we can calculate the partition function easily.

Setup

Define a family of interpolating CSPs $\{G_t\}_{t\in[0,1]}$:

• At $t = 0$: $G_0$ is a trivial/decoupled CSP, often consisting of independent clauses on single variables.

• At $t = 1$: $G_1 = \hat G$, the full random CSP model of interest. Each clause consists of exactly $k$ variables.

Indeed, at $t = 0$ the logarithm of the partition function is asymptotically equal to $n B_{d,k}$. To obtain the matching upper bound on $\mathbb{E}[\log Z(\hat G)]$, one shows that the mean of the logarithm of the partition function is monotone in $t$, so that its value at $t = 1$ does not exceed its value at $t = 0$ up to lower-order terms.

In our model, the interpolation method along with Proposition 7.3.1 easily implies that

\[ \limsup_{n\to\infty} \frac{1}{n}\mathbb{E}\bigl[\log(Z(\Phi)\vee 1)\bigr] \le B_{d,k}(\pi_{d,k}). \]

Ultimately, Theorem 7.2.1 is a direct consequence of the following corollary and proposition.

Corollary 7.3.2 (Corollary 2.2, [21]). If $d < d_{\mathrm{uniq}}(k)$ then w.h.p. we have

\[ \frac{1}{n}\log Z(\Phi) \le B_{d,k}(\pi_{d,k}) + o(1). \]

Proposition 7.3.3 (Proposition 2.3, [21]). If $d < d_{\mathrm{uniq}}(k)$ then

\[ \mathbb{E}\bigl[\log(Z(\Phi_{d,k}(n+1))\vee 1)\bigr] - \mathbb{E}\bigl[\log(Z(\Phi_{d,k}(n))\vee 1)\bigr] = B_{d,k}(\pi_{d,k}) + o(1). \]

In order to evaluate the expectation from Proposition 7.3.3 we harness a 'soft' version of the k-SAT problem where violated clauses are discouraged but not strictly forbidden. Define for a real $\beta > 0$:

\[ Z_\beta(F_{k\mathrm{SAT}}) = \sum_{\sigma\in\{\pm 1\}^{V(F_{k\mathrm{SAT}})}} \ \prod_{a\in C(F_{k\mathrm{SAT}})} \exp\bigl(-\beta\,\mathbb{1}\{\sigma \not\models a\}\bigr). \tag{7.3.2} \]

The above definition of the partition function ensures that $Z_\beta(F_{k\mathrm{SAT}}) \ge Z(F_{k\mathrm{SAT}})$ for all $\beta > 0$. Then, by means of an interpolation argument, we obtain the following theorem and lemma.
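Before turning to these statements, definition (7.3.2) and the inequality between the soft and the hard partition function can be sanity-checked by brute force on a tiny formula. The sketch below is illustrative only: the clause encoding (signed integers, +i for x_i and -i for its negation), the helper names and the example formula are assumptions made here and are not taken from [21] or [90].

```python
import itertools
import math


def violated(clause, sigma):
    """True if no literal of the clause is satisfied by sigma (dict var -> +1/-1)."""
    return all(sigma[abs(lit)] != (1 if lit > 0 else -1) for lit in clause)


def assignments(n):
    """All 2**n assignments of the variables 1..n with values in {+1, -1}."""
    for values in itertools.product((1, -1), repeat=n):
        yield dict(zip(range(1, n + 1), values))


def soft_partition(clauses, n, beta):
    """Z_beta from (7.3.2): every violated clause costs a factor exp(-beta)."""
    return sum(
        math.exp(-beta * sum(violated(c, sigma) for c in clauses))
        for sigma in assignments(n)
    )


def hard_partition(clauses, n):
    """Z: the number of assignments that violate no clause."""
    return sum(
        not any(violated(c, sigma) for c in clauses) for sigma in assignments(n)
    )


if __name__ == "__main__":
    # Hypothetical 3-CNF on three variables; each clause excludes one assignment.
    formula = [(1, 2, 3), (-1, 2, -3), (1, -2, 3)]
    for beta in (0.5, 1.0, 5.0):
        print(beta, soft_partition(formula, 3, beta), hard_partition(formula, 3))
```

In this example the three clauses exclude three distinct assignments, so Z = 5 and Z_β = 5 + 3e^{-β}, which decreases to Z as β grows.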



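Theorem 7.3.4 ([90, Theorem 1]). For any $k \ge 3$, any $\beta > 0$ and any probability measure $\pi$ on $[0,1]$ we have

\[ \frac{1}{n}\mathbb{E}\bigl[\log Z_\beta(F_{k\mathrm{SAT}})\bigr] \le \mathbb{E}\Biggl[\log\Bigl(\prod_{i=1}^{d^-}\mu_{\beta,\pi,2i} + \prod_{i=1}^{d^+}\mu_{\beta,\pi,2i-1}\Bigr) - \frac{d(k-1)}{k}\log\Bigl(1-\bigl(1-e^{-\beta}\bigr)\prod_{j=1}^{k}\mu_{\pi,1,j}\Bigr)\Biggr], \tag{7.3.3} \]

where $\mu_{\beta,\pi,i} = 1-(1-\exp(-\beta))\prod_{j=1}^{k-1}\mu_{\pi,i,j}$ (for $i \ge 1$).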

Applying the monotone convergence theorem with the measure $\pi = \pi_{d,k}$, we obtain an explicit expression for the r.h.s. of (7.3.3). A routine application of the Azuma-Hoeffding inequality to this soft model with $\beta < \infty$ then gives the following concentration bound:

Lemma 7.3.5 (Lemma 5.2, [21]). For any fixed $\beta > 0$ we have

\[ \mathbb{P}\Bigl[\bigl|\log Z_\beta(F_{k\mathrm{SAT}}) - \mathbb{E}\log Z_\beta(F_{k\mathrm{SAT}})\bigr| > \sqrt{n}\,\log n\Bigr] = o(1/n). \]

The lemma holds because the clauses of the random formula $F_{k\mathrm{SAT}}$ are drawn independently, and adding or removing a single clause can alter the value of $\log Z_\beta(\cdot)$ by no more than $\beta$ in absolute value. Finally, Corollary 7.3.2 follows directly from Theorem 7.3.4 and Lemma 7.3.5.
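The bounded-difference property behind Lemma 7.3.5 can be spelled out in two lines. This is a sketch under stated assumptions rather than the argument of [21]: the notation $F + a$ for a formula $F$ with one extra clause $a$ is introduced here, and the tail bound is stated conditionally on the number $m$ of clauses, leaving aside the randomness of $m$.

```latex
% Each clause contributes a factor in [e^{-\beta}, 1] to every summand of (7.3.2):
e^{-\beta}\, Z_\beta(F) \;\le\; Z_\beta(F + a) \;\le\; Z_\beta(F)
\quad\Longrightarrow\quad
\bigl|\log Z_\beta(F + a) - \log Z_\beta(F)\bigr| \;\le\; \beta .

% Hence replacing one clause by another changes \log Z_\beta by at most \beta, and
% Azuma-Hoeffding for m independent clauses gives, with X = \log Z_\beta(F),
\mathbb{P}\bigl[\, |X - \mathbb{E}[X \mid m]| > t \;\big|\; m \,\bigr]
\;\le\; 2\exp\!\Bigl(-\frac{t^{2}}{2 m \beta^{2}}\Bigr),
\qquad\text{so } t = \sqrt{n}\,\log n,\ m = O(n)
\text{ yields } \exp\bigl(-\Omega(\log^{2} n)\bigr) = o(1/n).
```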

7.3.3 Aizenman-Sims-Starr: matching lower bound

The key step of this chapter is to establish a lower bound on $\log Z(F_{k\mathrm{SAT}})$ that matches the upper bound from Corollary 7.3.2.

Here, we couple the random k-CNF $F_{k\mathrm{SAT}}(n) = F_{d,k}(n)$ with $n$ variables with the random k-CNF $F_{k\mathrm{SAT}}(n+1) = F_{d,k}(n+1)$ with $n+1$ variables. The most important part of the proof of the lower bound in Proposition 7.3.3 consists of the coupling argument described below, together with the necessary tail bound.

CPL1 Let $F'_{k\mathrm{SAT}}$ be a random k-CNF with variables $x_1,\ldots,x_n$ and $m' \stackrel{d}{=} \mathrm{Po}(d(n-k+1)/k)$ clauses.

CPL2 Obtain $F''_{k\mathrm{SAT}}$ from $F'_{k\mathrm{SAT}}$ by adding another $\Delta'' \stackrel{d}{=} \mathrm{Po}(d(k-1)/k)$ independent random clauses.

CPL3 Obtain $F'''_{k\mathrm{SAT}}$ from $F'_{k\mathrm{SAT}}$ by adding one new variable $x_{n+1}$ and $\Delta''' \stackrel{d}{=} \mathrm{Po}(d)$ independent random clauses that each contain $x_{n+1}$ and $k-1$ other variables from $\{x_1,\ldots,x_n\}$.
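The three steps CPL1 to CPL3 can be sketched in code. The example below is a hedged illustration, not the construction of [21] verbatim: it assumes that a random clause consists of k distinct variables with independent uniform signs, it encodes literals as signed integers, and the helper names `poisson`, `random_clause` and `coupled_formulas` are hypothetical. The Poisson means are consistent with the model because d(n-k+1)/k + d(k-1)/k = dn/k and d(n-k+1)/k + d = d(n+1)/k.

```python
import math
import random


def poisson(mean, rng):
    """Sample Po(mean) by Knuth's multiplication method (adequate for moderate means)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def random_clause(variables, k, rng, forced=None):
    """k distinct variables with uniform signs; `forced`, if given, is always included."""
    if forced is None:
        chosen = rng.sample(variables, k)
    else:
        chosen = [forced] + rng.sample(variables, k - 1)
    return tuple(v if rng.random() < 0.5 else -v for v in chosen)


def coupled_formulas(n, k, d, rng):
    """Return (F', F'', F''') as lists of clauses sharing the common part F'."""
    old_vars = list(range(1, n + 1))
    # CPL1: base formula on x_1..x_n with Po(d(n-k+1)/k) clauses.
    base = [random_clause(old_vars, k, rng)
            for _ in range(poisson(d * (n - k + 1) / k, rng))]
    # CPL2: add Po(d(k-1)/k) further clauses on the old variables.
    f2 = base + [random_clause(old_vars, k, rng)
                 for _ in range(poisson(d * (k - 1) / k, rng))]
    # CPL3: add variable x_{n+1} and Po(d) clauses that all contain it.
    f3 = base + [random_clause(old_vars, k, rng, forced=n + 1)
                 for _ in range(poisson(d, rng))]
    return base, f2, f3


if __name__ == "__main__":
    base, f2, f3 = coupled_formulas(30, 3, 1.2, random.Random(0))
    print(len(base), len(f2) - len(base), len(f3) - len(base))
```

Because both F'' and F''' extend the same base formula F', the ratios Z(F'')/Z(F') and Z(F''')/Z(F') only depend on a bounded expected number of added clauses.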

Figure 7.2 shows a graphical representation of the Aizenman-Sims-Starr scheme (the coupling technique used above). Based on the coupling we have the following fact and tail bound:

Fact 7.3.6 (Fact 2.4, [21]). For any $d > 0$ we have $Z(F_{k\mathrm{SAT}}(n)) \stackrel{d}{=} Z(F''_{k\mathrm{SAT}})$ and $Z(F_{k\mathrm{SAT}}(n+1)) \stackrel{d}{=} Z(F'''_{k\mathrm{SAT}})$.

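Proposition 7.3.7 (Proposition 2.7, [21]). For $d < d_{\mathrm{uniq}}(k)$ we have

\[ \mathbb{E}\Biggl[\,\Bigl|\log\frac{Z(F''_{k\mathrm{SAT}})\vee 1}{Z(F'_{k\mathrm{SAT}})\vee 1}\Bigr|^{3/2} + \Bigl|\log\frac{Z(F'''_{k\mathrm{SAT}})\vee 1}{Z(F'_{k\mathrm{SAT}})\vee 1}\Bigr|^{3/2}\Biggr] = O(1). \tag{7.3.4} \]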



Figure 7.2: A graphical representation of the coupling technique (Aizenman-Sims-Starr scheme). [Figure omitted; its two panels are each labelled "independent random clauses".]

Finally, towards the proof of Theorem 7.2.1, the existence of the limit comes from Proposition 7.3.1 (for the detailed proof we refer to [21]), and the value of the normalized log-partition function comes from the Aizenman-Sims-Starr lower bound along with the 'PULP' algorithm discussed in Chapter 3. Mathematically, equation (7.3.5) below, along with Corollary 7.3.2, implies the result.

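\[ \frac{1}{n}\mathbb{E}\bigl[\log(1\vee Z(F_{k\mathrm{SAT}}(n)))\bigr] = \frac{1}{n}\sum_{N=0}^{n-1}\Bigl(\mathbb{E}\bigl[\log(1\vee Z(F_{k\mathrm{SAT}}(N+1)))\bigr] - \mathbb{E}\bigl[\log(1\vee Z(F_{k\mathrm{SAT}}(N)))\bigr]\Bigr) = B_{d,k}(\pi_{d,k}) + o(1). \tag{7.3.5} \]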
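The two equalities in (7.3.5) are a telescoping sum followed by a Cesàro average. The sketch below makes the steps explicit under two assumptions introduced here for illustration: the formula on zero variables has no clauses and exactly one (empty) assignment, so that its contribution vanishes, and the increments $a_N$ are uniformly bounded in $N$.

```latex
% Telescoping, using \log(1 \vee Z(F_{k\mathrm{SAT}}(0))) = \log 1 = 0:
\mathbb{E}\bigl[\log(1\vee Z(F_{k\mathrm{SAT}}(n)))\bigr]
  = \sum_{N=0}^{n-1} a_N ,
\qquad
a_N := \mathbb{E}\bigl[\log(1\vee Z(F_{k\mathrm{SAT}}(N+1)))\bigr]
     - \mathbb{E}\bigl[\log(1\vee Z(F_{k\mathrm{SAT}}(N)))\bigr].

% Ces\`aro: if a_N = B_{d,k}(\pi_{d,k}) + o(1) as N \to \infty and \sup_N |a_N| < \infty, then
\frac{1}{n}\sum_{N=0}^{n-1} a_N \;=\; B_{d,k}(\pi_{d,k}) + o(1)
\qquad (n \to \infty).
```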

7.3.4 Lower bound on Gibbs uniqueness threshold

Finally, we are left with the proof of the lower bound on the Gibbs uniqueness threshold stated in Theorem 7.2.2. An obvious challenge in establishing the Gibbs uniqueness property discussed in Chapter 4 is to estimate the marginal of the root variable given any possible boundary condition at distance $2\ell$ from the root $r$. With the help of [2], however, we may confine ourselves to a single, explicit boundary configuration $\tau^+$ that satisfies

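\[ \mathbb{P}\bigl[\tau^{(\ell)}(r) = 1 \,\big|\, T,\ \forall x\in\partial^{2\ell}r : \tau^{(\ell)}(x) = \tau^+(x)\bigr] = \max_{\tau\in S(T^{(\ell)})} \mathbb{P}\bigl[\tau^{(\ell)}(r) = 1 \,\big|\, T,\ \forall x\in\partial^{2\ell}r : \tau^{(\ell)}(x) = \tau(x)\bigr] \tag{7.3.6} \]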



and define, for any variable $w$ at distance $2q$ from $x$ with parent clause $a$ and grandparent variable $u$,

\[ \tau^+(w) = \mathrm{sign}(w,a)\cdot\mathbb{1}\{\mathrm{sign}(u,a)\ne\tau^+(u)\} - \mathrm{sign}(w,a)\cdot\mathbb{1}\{\mathrm{sign}(u,a)=\tau^+(u)\}. \tag{7.3.7} \]
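The recursion (7.3.7) is easy to run top-down on a small tree. The following sketch is only an illustration under assumptions made here: the tree is given as a list of clauses, each with its parent variable, the sign of the parent in the clause and its signed children; sign(v, a) is taken to be +1 if v appears positively in a and -1 otherwise; and the recursion is started from τ+(root) = +1, which is a hypothetical base case not stated in the text above.

```python
from collections import deque


def extremal_boundary(root, clauses):
    """
    Propagate tau^+ from the root using (7.3.7).

    clauses: list of (parent_var, parent_sign, [(child_var, child_sign), ...]).
    Returns a dict mapping every variable of the tree to +1 or -1.
    """
    by_parent = {}
    for parent, parent_sign, children in clauses:
        by_parent.setdefault(parent, []).append((parent_sign, children))

    tau = {root: 1}  # assumed base case: tau^+(root) = +1
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for parent_sign, children in by_parent.get(u, []):
            for w, w_sign in children:
                # (7.3.7): +sign(w, a) if u does not satisfy a under tau^+,
                # and -sign(w, a) if u already satisfies a.
                tau[w] = w_sign if parent_sign != tau[u] else -w_sign
                queue.append(w)
    return tau


if __name__ == "__main__":
    # Hypothetical tree with k = 3: two clauses below the root 0, one below variable 1.
    tree = [
        (0, +1, [(1, +1), (2, -1)]),
        (0, -1, [(3, +1), (4, +1)]),
        (1, -1, [(5, +1), (6, -1)]),
    ]
    print(extremal_boundary(0, tree))
```

Intuitively, whenever the parent already satisfies a clause the children are set to violate it, so the clause pins the parent to its current value; otherwise the children satisfy the clause and it exerts no pressure on the parent.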

Then, for any integer $\ell > 0$, the assignment $\tau^+$ satisfies (7.3.6). Hence Theorem 7.2.2 reduces to the following statement.

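Proposition 7.3.8 (Proposition 2.13, [21]). For $d < d_{\mathrm{our}}(k)$ we have that

\[ \lim_{\ell\to\infty} \mathbb{E}\Bigl[\mathbb{P}\bigl[\tau^{(\ell)}(r) = 1 \,\big|\, T,\ \forall x\in\partial^{2\ell}r : \tau^{(\ell)}(x) = \tau^+(x)\bigr] - \mathbb{P}\bigl[\tau^{(\ell)}(r) = 1 \,\big|\, T\bigr]\Bigr] = 0. \tag{7.3.8} \]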

This seems delicate, because the boundary condition $\tau^+$ depends on the tree $T^{(\ell)}$. To overcome this problem, we generalize another technique from the work [2] on random 2-SAT to $k \ge 3$: we introduce a quantity that behaves in a 'Markovian' way as we pass up and down the tree and that allows us to prove (7.3.8).

Finally, by combining the mechanism of coupling and contraction with the treatment of both pure and mixed literals (the detailed proof can be found in [21]), we conclude that Theorem 7.2.2 follows directly from Proposition 7.3.8 together with the triangle inequality.


8
The Last Chapter

“The important thing is not to stop questioning. Curiosity has its

own reason for existing.”

–Albert Einstein

In this concluding chapter we take a break from formulating and answering questions. Instead, we summarize the main ideas of this thesis and assess the author's contribution to each paper. We also discuss some potential future directions and mention some relevant research areas that were beyond the scope of the previous chapters and of the papers (found in the appendix).

8.1 Summary of the thesis

So far, this thesis has addressed several problems in the probabilistic analysis of random combinatorial structures, more particularly in random constraint satisfaction problems (CSPs). While Chapters 5, 6 and 7 each focused on a specific model, together they contribute to a unified understanding of how combinatorial structure, randomness and ideas from statistical physics interact in discrete probability. Below we give a very brief overview of the results stated in the previous three chapters and a quick comparison with earlier results.

Random 2-SAT: Our first result concerns random 2-SAT, where we establish a central limit theorem showing that the number of random 2-SAT solutions exhibits log-normal fluctuations. This provides a precise distributional description of the size of the solution space and sharpens the probabilistic picture of random satisfiability. The proof analyzes the contraction property of $\log\mathrm{BP}^{\otimes}_{d,t}$, which ensures the existence of a unique fixed point $\rho_{d,t}$, and Belief Propagation on a Galton-Watson tree, which connects the fixed point to the BP marginals.


Finally, we derive our main result from a general martingale central limit theorem, which is a special case of [55, Theorem 3.2].

Comparison with previous work:

Known results:

• In 1996, Goerdt [53] showed that $d = 2$ is the satisfiability threshold of a random 2-SAT formula. In other words, for any $\varepsilon > 0$, the probability that a random 2-SAT formula is satisfiable tends to one if $d < 2-\varepsilon$ and tends to zero if $d > 2+\varepsilon$ as $n \to \infty$. A natural follow-up question is to determine the number of satisfying assignments in the satisfiable regime, i.e., when $d < 2$.

• In 2021, Achlioptas et al. [2] provided a first-order approximation: the normalized logarithm of the number of solutions $Z(F)$ of a random 2-SAT formula $F$ on $n$ variables converges in probability to a constant $\mu_d$ that does not depend on $n$. Mathematically,

\[ \frac{\log Z(F)}{n} \xrightarrow{\;p\;} \mu_d. \]

Our results:

In this thesis, we say even more about the number of solutions of random 2-SAT. The logarithm of the number of random 2-SAT solutions exhibits fluctuations of order $\sqrt{n}$. More precisely,

\[ \frac{\log Z(F) - \mathbb{E}[\log Z(F)]}{\eta_d\sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0,1), \]

where $\eta_d \ge 0$ does not depend on $n$. Along with the asymptotic normality, we also obtain a formula for the variance that can be evaluated effectively.

Note: By contrast, for other random CSPs the typical fluctuations of the logarithm of the number of solutions are bounded throughout all or most of the satisfiable regime.

Random k-XORSAT: For the second result of this thesis, we analyzed the performance of Belief Propagation Guided Decimation (BPGD), an algorithm inspired by statistical physics, on random k-XORSAT. Our rigorous analysis verifies the heuristic work of Ricci-Tersenghi and Semerjian [96]. Specifically, we derive an explicit threshold up to which the BPGD algorithm succeeds with a strictly positive probability Ω(1) and beyond which the algorithm fails with



high probability. Due to the algebraic structure of the problem, BPGD is equivalent to the purely combinatorial algorithm called 'Unit Clause Propagation' (UCP), so the results for BPGD hold for UCP as well, for the same parameter values. In addition, we analyze the 'decimation process', for which we identify a (non-)reconstruction and a condensation phase transition.

Comparison with previous work:

Known results:

• In 2009, Ricci-Tersenghi and Semerjian [96] gave a heuristic derivation, without a rigorous proof, of the success probability of both BPGD and UCP:

\[ P_{\mathrm{succ}} = \exp\Bigl(-\int_0^1 \frac{dt}{4(1-t)}\,\frac{f(t)^2}{1-f(t)}\Bigr) \quad\text{with}\quad f(t) = \alpha k(k-1)t^{k-2}(1-t), \]

where $\alpha$ is the clause-to-variable density.

• A recent paper by Yung [102] uses the 'Overlap Gap' paradigm, which relies on moment computations (the 'annealed' technique in physics jargon) and therefore provides only one-sided bounds; in particular, it implies no positive result for the algorithm. For the same reason, his lower bounds on the clause length, $k \ge 9$ for UCP and $k \ge 13$ for BPGD, are not tight. Moreover, according to his argument, both BPGD and UCP fail for $d > d_{\mathrm{core}}$.

Our results:

We mathematically verified the heuristic predictions of Ricci-Tersenghi and Semerjian [96] on the performance of BPGD by providing an explicit formula for the success probability of the algorithm over an explicit range of clause-to-variable densities.

For $k \ge 3$, we prove the following:

• If $d < d_{\min}(k)$, then

\[ \lim_{n\to\infty} P_{\mathrm{succ}} = \exp\Bigl(-\frac{d^2(k-1)^2}{4}\int_0^1 \frac{z^{2k-4}(1-z)}{1-d(k-1)z^{k-2}(1-z)}\,dz\Bigr). \]

• If $d_{\min}(k) < d < d_{\mathrm{SAT}}(k)$, then $P_{\mathrm{succ}} = o(1)$.

Moreover, in contrast to Yung's result, our technique relies on a 'quenched' argument, which shows that for any $k \ge 3$ both BPGD and UCP find a satisfying assignment with strictly positive probability for $d < d_{\min}$, where $d_{\min} < d_{\mathrm{core}}$.

We also analyze the phase transitions of the decimation process: we pinpoint the (non-)reconstruction and condensation phase transitions in terms of $d$ and $\theta$ for any $k \ge 3$.
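The success probability formula can be evaluated numerically, and with d = αk it agrees term by term with the heuristic expression of [96], since f(t)²/(4(1-t)(1-f(t))) = d²(k-1)² t^{2k-4}(1-t)/(4(1-f(t))). The sketch below is a hedged illustration: the function names and the midpoint quadrature rule are choices made here, and the code only assumes that the denominator 1 - d(k-1)z^{k-2}(1-z) stays positive on [0,1]; it does not compute d_min(k).

```python
import math


def midpoint_integral(func, a, b, steps=20000):
    """Composite midpoint rule; it never evaluates func at the endpoints."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def success_probability(d, k):
    """Limiting success probability from the formula above (denominator must stay positive)."""
    def integrand(z):
        return z ** (2 * k - 4) * (1 - z) / (1 - d * (k - 1) * z ** (k - 2) * (1 - z))

    return math.exp(-(d * (k - 1)) ** 2 / 4 * midpoint_integral(integrand, 0.0, 1.0))


def success_probability_physics(alpha, k):
    """Heuristic form of [96] with f(t) = alpha * k * (k - 1) * t**(k - 2) * (1 - t)."""
    def integrand(t):
        f = alpha * k * (k - 1) * t ** (k - 2) * (1 - t)
        return f * f / (4 * (1 - t) * (1 - f))

    return math.exp(-midpoint_integral(integrand, 0.0, 1.0))


if __name__ == "__main__":
    k = 3
    for d in (0.5, 1.0, 1.5):
        print(d, success_probability(d, k), success_probability_physics(d / k, k))
```

The midpoint rule is used because the second integrand is of the form 0/0 at t = 1, although its limit there is finite.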



Random k-SAT: Turning to random k-SAT, we revisited the Gibbs uniqueness threshold. We prove that for any $k \ge 3$ and for clause-to-variable densities up to the Gibbs uniqueness threshold, the number of satisfying assignments of random k-SAT is given by the physics-inspired 'replica symmetric solution'. Our result sharpens the understanding of the onset of long-range correlations in random k-SAT by clarifying the transition of the Gibbs measure from the uniqueness to the non-uniqueness regime. Below we compare our result with previous results in this context.

Comparison with previous work:

Known results:

• In 2004, Panchenko and Talagrand [90] proved a rigorous upper bound matching the physics formula for the number of satisfying assignments, using a proof technique called the 'interpolation method'.

• Achlioptas et al. [2] proved the physics formula in the case $k = 2$, which is much simpler than the case $k \ge 3$.

• The most important work in this regard is due to Montanari and Shah [83], who in 2007 showed that the physics formula correctly approximates the number of 'good' assignments that satisfy all but $o(n)$ clauses for $k \ge 3$. However, it seems difficult to estimate the gap between the number of such 'good' assignments and the number of actual satisfying assignments.

Our results:

Our paper verifies that the number of satisfying assignments is correctly predicted by the physics 'replica symmetric solution' used in [77, 78], for any $k \ge 3$ and up to the Gibbs uniqueness threshold.

Moreover, we derive a lower bound on the Gibbs uniqueness threshold which improves significantly over the work of Montanari and Shah [83] for small $k \ge 3$.

Below we provide the values of $d_{\mathrm{giant}}$ (giant component threshold of the hypergraph induced by a random k-CNF formula), $d_{\mathrm{MS}}$ (Montanari-Shah bound), $d_{\mathrm{our}}$ (our bound), $d_{\mathrm{pure}}$ (pure literal threshold) and $d_{\mathrm{SAT}}$ (satisfiability threshold) for small values of $k$.

k          2        3        4        5
d_giant    1.0000   0.5000   0.3333   0.2500
d_MS       1.1625   0.8792   0.8695   0.9236
d_our      2.0000   1.3431   1.2451   1.2635
d_pure     2.0000   4.9108   6.1782   7.0178
d_SAT      2.0000   12.801   39.724   105.585

Taken together, the results in this thesis concern the estimation of the partition function (in other words, the number of satisfying assignments) of random formulas and their phase transitions, and they provide a clear picture of the solution space geometry of different random satisfiability problems.



Therefore, the thesis establishes a bridge between probabilistic combinatorics, random graphs and statistical physics. More specifically, it shows how tools such as local weak convergence, recursive distributional fixed point equations, message passing algorithms (such as Belief Propagation, Warning Propagation and UCP) and correlation decay methods can be leveraged to analyze long-range dependencies, phase transitions, solution space geometry and algorithmic thresholds in random structures.

Several challenging and interesting open problems remain, which we discuss in the next section.

8.2 Future Directions

Paper 1 [23]: The number of random 2-SAT solutions is asymptotically log-normal

■ Investigate whether the present method of considering correlated instances can be extended to random optimization problems. More generally, establishing central limit theorems for random optimization problems is of great interest.

Cao [20] provided a general framework based on the 'objective method' [9]. Unfortunately, the conditions of Cao's theorem tend to be unwieldy for MAX-CSP problems with hard constraints. Recent work of Kreačič [58] and Glasgow, Kwan, Sah, Sawhney [52] on the matching number therefore instead resorts to the use of stochastic differential equations.

■ Another interesting question is whether log-normal fluctuations also occur for the number of solutions in other models, such as random Horn-SAT or planted CSPs.

■ Understand the precise behavior of the variance near the satisfiability threshold, and determine for which families of random CSPs such fluctuations emerge.

Paper 2 [22]: Belief Propagation Guided Decimation on random k-XORSAT

■ Unlike in random k-XORSAT, due to the lack of inherent symmetry in random k-SAT, the BPGD algorithm provably fails to find satisfying assignments on random k-SAT instances even below the threshold where the set of satisfying assignments shatters into well-separated clusters [1, 59]. For this reason, a more sophisticated message passing algorithm called 'Survey Propagation Guided Decimation' (SPGD) has been suggested in [72, 96]. In random k-XORSAT, BPGD and SPGD are equivalent, but the two algorithms are substantially different in random k-SAT. Therefore, investigating SPGD on random k-SAT is one of the most interesting problems in this context, and one might hope that SPGD outperforms BPGD on random k-SAT and finds satisfying assignments up to the aforementioned shattering transition. A negative result to the effect that Survey Propagation Guided Decimation fails asymptotically beyond the shattering transition point for large enough k exists [56]. Yet a complete analysis of SPGD/BPGD on random k-SAT, like the one carried out for random k-XORSAT in this thesis, remains an outstanding challenge.



■ Investigate the interplay between the decimation process and the solution space geometry, especially how frozen variables emerge dynamically.

■ Apply the techniques used in this paper to inference problems in graphical models (e.g., community detection, coding theory and planted problems).

■ Finally, one of the most interesting open problems concerns the performance of various types of algorithms, such as greedy, message passing or local search algorithms, that aim to find an assignment that violates the least possible number of clauses. A first step based on the heuristic 'dynamical cavity method' was recently undertaken by Maier, Behrens and Zdeborová [64].

Paper 3 [21]: The random k-SAT Gibbs Uniqueness threshold revisited

■ Pinpoint the exact location and nature of the Gibbs uniqueness threshold for general $k \ge 3$ and its connection with statistical physics predictions.

■ Explore in detail the other phase transitions beyond Gibbs uniqueness, such as the (non-)reconstruction, condensation and freezing transitions, and their interactions.

Physics predicts a sequence of phase transitions [59]: replica symmetric (non-reconstruction) → clustering/dynamic phase → condensation → freezing, where the reconstruction threshold coincides with the clustering threshold. Physics predictions for the reconstruction threshold in random k-SAT, obtained via the heuristic 'cavity method' (or replica symmetry), place it at $\alpha_{\mathrm{rec}} \approx \frac{2^k}{k}\log k$ for large $k$. So, as far as long-range correlations in the Gibbs measure are concerned, a rigorous mathematical proof of the exact reconstruction threshold is still a challenging problem.

■ Extending the results on Gibbs uniqueness to other random CSPs, such as hypergraph coloring, independent sets in random hypergraphs and spin glasses, is a promising and interesting research direction.

8.3 Contribution of the authors

In the previous chapters we gave an overview of the results and the proofs, along with the tools we used to obtain them. The full versions of the papers can be found in the appendix. We conclude the thesis with a list of the papers that form its backbone and to which the author of this thesis contributed.

The first result of this thesis is from the paper titled “The number of random 2-SAT solutions is asymptotically log-normal” by Arnab Chatterjee, Amin Coja-Oghlan, Noela Müller, Connor Riddlesden, Maurice Rolvien, Pavel Zakharov and Haodong Zhu. The extended version of the paper has been published in the journal 'Theory of Computing' (TOC) and the preliminary version appeared in



'Approximation, Randomization, and Combinatorial Optimization' (APPROX/RANDOM'24), Leibniz International Proceedings in Informatics (LIPIcs), volume 317, 39:1–39:15, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2024). This paper establishes a central limit theorem ('CLT') for the logarithm of the number of solutions of a random 2-SAT formula throughout the satisfiable regime. In addition, the paper calculates the variance effectively. The problem was first raised when NM, CR and HZ visited TU Dortmund for a week and introduced the martingale central limit theorem from the book of Hall and Heyde [55]; AC, ACO, MR and PZ jointly discussed with them the martingale CLT for the number of satisfying assignments of random 2-SAT and how to prove the finiteness of the variance. Later AC, ACO, MR and PZ worked on the variance calculation by constructing two correlated formulas and on how to obtain a pruned correlated formula using UCP. AC, ACO, MR and PZ jointly examined the effect of removing a single clause from the original formula on the number of solutions of the pruned formula. AC contributed to the construction of the Galton-Watson tree that is the local limit of the two correlated formulas. Besides this, AC, NM and HZ were involved in establishing the contraction property of $\log\mathrm{BP}^{\otimes}_{d,t}$, which ensures the existence of a unique fixed point $\rho_{d,t}$, and in the analysis of Belief Propagation on a Galton-Watson tree, which connects this fixed point to the BP marginals and thus completes the proof of our main result.

The second result of this thesis is from the paper titled “Belief Propagation Guided Decimation on random k-XORSAT” by Arnab Chatterjee, Amin Coja-Oghlan, Mihyung Kang, Lena Krieg, Maurice Rolvien and Gregory Sorkin. The extended version of the paper has been submitted to the journal 'Theory of Computing' (TOC), and the conference version appeared in the '52nd International Colloquium on Automata, Languages and Programming' (ICALP'25), Leibniz International Proceedings in Informatics (LIPIcs), volume 334, 47:1–47:21, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2025). This paper analyzes the performance of the BPGD algorithm on random k-XORSAT formulas and the phase transitions of the decimation process. The problem was first raised when GS visited TU Dortmund and ACO introduced it by pointing out that the paper of the two physicists Ricci-Tersenghi and Semerjian [96] only provides a heuristic derivation of the success probability of the algorithm. AC, MR and GS worked towards the tight analysis of BPGD by analyzing the function $\Phi_{d,k}$ (the Bethe free entropy) and some threshold values of $d$. AC carried out the calculus that leads to the proof of the behavior of this function with respect to the parameters $(d,\lambda)$. Later, AC, ACO and LK jointly developed the lemmas on the presence of so-called toxic cycles in a sub-formula, which are obstacles to the success of the algorithm.

The last result of this thesis is from the paper titled “The random k-SAT Gibbs uniqueness threshold revisited” by Arnab Chatterjee, Amin Coja-Oghlan, Catherine Greenhill, Vincent Pfenninger, Maurice Rolvien, Pavel Zakharov and Konstantinos Zampetakis. The paper has been submitted to the journal “Combinatorics, Probability and Computing” (CPC). The aim of this paper is to determine the number of actual satisfying assignments of a random k-SAT formula for clause-to-variable densities up to the Gibbs uniqueness threshold. The problem was first raised when CG and VP visited TU Dortmund and we came up with the problem of counting the actual satisfying assignments of a random k-SAT formula. Then we all looked at the paper by Montanari and Shah [83], in which they showed that for $k \ge 3$



and certain clause/variable densities the 'replica symmetric solution' from physics correctly approximates the number of 'good' assignments that satisfy all but $o(n)$ clauses. Unfortunately, their bound was not tight even in the case $k = 2$. In this context AC, ACO, PZ and KZ jointly developed the pure literal pursuit ('PULP') algorithm, whose purpose is to trace the repercussions of setting a relatively small number of variables to specific truth values; this constitutes the main technical challenge in the proof of the main result. AC and KZ were involved in constructing the pure and mixed literal operator $\mathrm{LL}^{\star}_{k,d}$ and in proving the contraction of this operator with respect to a metric related to the $W_1$-Wasserstein distance, which is the main step in the proof of the second theorem of the paper.

nainaṁ chhindanti śhastrāṇi nainaṁ dahati pāvakaḥ
na chainaṁ kledayantyāpo na śhoṣhayati mārutaḥ

The atma (soul) cannot be shattered by weapons, it cannot be burnt by fires,

it cannot be drenched by the waters and it cannot be rendered dry by the winds.

– Srimat Bhagavad Gita (2.23)


Bibliography

[1] D. Achlioptas and A. Coja-Oghlan. Algorithmic barriers from phase transitions. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25–28, 2008, Philadelphia, PA, USA, IEEE Computer Society, pp. 793–802 (2008).

[2] D. Achlioptas, A. Coja-Oghlan, M. Hahn-Klimroth, J. Lee, N. Müller, M. Penschuck, G. Zhou: The

number of satisfying assignments of random 2-SAT formulas. Random Structures and Algorithms

58, pp. 609–647, (2021).

[3] D. Achlioptas and E. Friedgut: A sharp threshold for k-colorability. Random Structures & Algorithms 14, pp. 63–70, (1999).

[4] D. Achlioptas, C. Moore: Random k-SAT: two moments suffice to cross a sharp threshold. SIAM

Journal on Computing 36, 3, pp.740–762, (2006).

[5] D. Achlioptas and A. Naor: The Two Possible Values of the Chromatic Number of a Random Graph.

Annals of Mathematics, Second Series, Vol. 162, 3, pp. 1335-1351, (2005).

[6] D. Achlioptas, A. Naor and Y. Peres: Rigorous location of phase transitions in hard optimization

problems. Nature 435, 7043, pp.759–764, (2005).

[7] D. Achlioptas, Y. Peres: The threshold for random k-SAT is 2k log2−O(k). Journal of the AMS 17,

pp.947–973, (2004).

[8] D. Achlioptas and F. Ricci-Tersenghi. On the solution-space geometry of random constraint

satisfaction problems. In Proceedings of 38th STOC, 130-139, New York, NY, USA, (2006), ACM.

[9] D. Aldous, J. Steele: The objective method: probabilistic combinatorial optimization and local

weak convergence. In: H. Kesten (ed.): Probability on Discrete Structures. Springer (2004).

[10] N. Alon and J. Spencer: The probabilistic method. Wiley, 2nd edition, (2000).

[11] S. Arora and B. Barak. Computational complexity: a modern approach. Cambridge University

Press, (2009).

[12] P. Ayre, A. Coja-Oghlan, P. Gao, N. Müller: The satisfiability threshold for random linear equations.

Combinatorica 40, pp. 179–235, (2020).



[13] A.B.Babaev, A.K.Murtazaev and F.A. Kassan-Ogly: Ground State of an Antiferromagnetic Three-

State Potts Model on a Triangular Lattice with Competing Interactions. Journal of Experimental

and Theoretical Physics, Vol. 127, pp. 323–327, (2018).

[14] H. A. Bethe. Statistical physics of superlattices. Proceedings of the Royal Society of London A,

150:552–575, (1935).

[15] G. Biroli, R. Monasson and M. Weigt. A variational description of the ground state structure in random satisfiability problems. The European Physical Journal B, 14:551, (2000).

[16] R. Biswas, W. Chen, A. Sen: On the replica symmetric solution in general diluted spin glasses.

arXiv:2410.15599 (2024).

[17] S.C. Brailsford, C.N. Potts and B.M. Smith. Constraint satisfaction problems: Algorithms and

applications. European journal of operational research, 119(3):557–581, (1999).

[18] G. Bresler, B. Huang: The algorithmic phase transition of random k-sat for low degree polynomials.

In 62nd Annual Symposium on Foundations of Computer Science, FOCS 2021, IEEE Computer

Society, Los Alamitos, CA, pp.298–309, (2022).

[19] A. Broder, A. Frieze, E. Upfal: On the satisfiability and maximum satisfiability of random 3-CNF

formulas. Proc. 4th SODA, pp.322–330, (1993).

[20] S. Cao: Central limit theorems for combinatorial optimization problems on sparse Erdős-Rényi

graphs. Annals of Applied Probability 31, pp.1687–1723, (2021).

[21] A. Chatterjee, A. Coja-Oghlan, C. Greenhill, V. Pfenninger, M. Rolvien, P. Zakharov, K. Zampetakis:

The random k-SAT Gibbs Uniqueness Threshold revisited. arXiv preprint arXiv:2506.01359, (2025).

[22] A. Chatterjee, A. Coja-Oghlan, M. Kang, L. Krieg, M. Rolvien, G. Sorkin: Belief Propagation Guided

Decimation on random k-XORSAT. In Proceedings of the 52nd ICALP, (2025).

[23] A. Chatterjee, A. Coja-Oghlan, N. Müller, C. Riddlesden, M. Rolvien, P. Zakharov, H. Zhu: The

number of random 2-SAT solutions is asymptotically log-normal. In Proceedings of 28th RANDOM,

39, (2024).

[24] P. Cheeseman, B. Kanefsky, W. Taylor: Where the really hard problems are. In Proceedings of the

IJCAI, pp.331–337, (1991).

[25] W.-K. Chen, P. Dey, D. Panchenko: Fluctuations of the free energy in the mixed p-spin models

with external field. Probability Theory and Related Fields 168, pp.41–53, (2017).

[26] V. Chvátal, B.A. Reed: Mick gets some (the odds are on his side). In 33rd Annual Symposium

on Foundations of Computer Science, Pittsburgh, Pennsylvania, USA, 24–27 October 1992, IEEE

Computer Society, pp.620–627.



[27] A. Coja-Oghlan. A better algorithm for random k-SAT. SIAM Journal on Computing, 39(7):2823–

2864, (2010).

[28] A. Coja-Oghlan: Belief Propagation fails on random formulas. Journal of the ACM 63 (2017) #49.

[29] A. Coja-Oghlan, A. Ergür, P. Gao, S. Hetterich, M. Rolvien: The rank of sparse random matrices.

Proc. 31st SODA, pp. 579–591, (2020).

[30] A. Coja-Oghlan, A. Haqshenas and S. Hetterich. Walksat stalls well below satisfiability. SIAM

Journal on Discrete Mathematics, 31(2):1160–1173, (2017).

[31] A. Coja-Oghlan, T. Kapetanopoulos, N. Müller: The replica symmetric phase of random constraint

satisfaction problems. Combinatorics, Probability and Computing 29, 3 , pp.346-422, (2020).

[32] A. Coja-Oghlan, F. Krzakala, W. Perkins, L. Zdeborová: Information-theoretic thresholds from the

cavity method. Advances in Mathematics 333, pp.694–795, (2018).

[33] A. Coja-Oghlan, A. Pachon-Pinzon: The decimation process in random k-SAT. SIAM Journal on

Discrete Mathematics 26, pp.1471–1509, (2012).

[34] A. Coja-Oghlan and K. Panagiotou. The asymptotic k-SAT threshold. Advances in Mathematics,

288, pp.985–1068, (2016).

[35] A. Coja-Oghlan, N. Wormald: The number of satisfying assignments of random regular k-SAT

formulas. Combinatorics, Probability and Computing 27, pp.496–530, (2018).

[36] A. Crisanti, G. Paladin and H.-J.S.A. Vulpiani: Replica trick and fluctuations in disordered systems.

Journal de Physique I, 2(7):1325–1332, (1992).

[37] A. Dembo, A. Montanari: Ising models on locally tree-like graphs. Annals of Applied Probability

20, pp.565–592, (2010).

[38] A. Dembo, A. Montanari: Gibbs measures and phase transitions on sparse random graphs. Brazil-

ian Journal of Probability and Statistics 24, pp.137–211, (2010).

[39] A. Dembo, A. Montanari, A. Sly, N. Sun: The replica symmetric solution for Potts models on

d-regular graphs. Communications in Mathematical Physics, 327, pp. 551–575, (2014).

[40] A. Dembo, A. Montanari, N. Sun: Factor models on locally tree-like graphs. Annals of Probability

41, pp.4162–4213, (2013).

[41] J. Ding, A. Sly and N. Sun. Proof of the satisfiability conjecture for large k. Annals of Mathematics

196, pp.1–388, (2022).

[42] O. Dubois, J. Mandler: The 3-XORSAT threshold. In 43rd Annual IEEE Symposium on Foundations

of Computer Science, FOCS, pp. 769–778, (2002).



[43] A. Dylan: PhD Thesis : Gibbs Measures on Sparse Random Graphs. Princeton University, Computer

Science Dept. Engineering Quadrangle Princeton, NJ, United States, (2022).

[44] C. Efthymiou: On sampling symmetric Gibbs distributions on sparse random graphs and hyper-

graphs. In 49th EATCS International Conference on Automata, Languages, and Programming, vol.

229 of LIPICs. Leibniz Int. Proc. Inform. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, (2022),

page Art. No. 57, 16.

[45] J. Feigenbaum. The use of coding theory in computational complexity. In Proceedings of Sympo-

sium in Applied Mathematics, volume 50, pp.207–233, (1995).

[46] E.C. Freuder and A.K. Mackworth. Constraint satisfaction: An emerging paradigm. In Foundations

of Artificial Intelligence, volume 2, 13–27. Elsevier, (2006).

[47] E. Friedgut. Sharp thresholds of graph properties, and the k-SAT problem. Journal of the AMS,

12:1017–1054, (1999).

[48] A. Frieze and M. Karoński: Introduction to Random Graphs. Cambridge University Press, (2015).

[49] R. G. Gallager: Low-density parity check codes. IEEE Transaction on Information Theory, 8:21–28,

(1962).

[50] D. Gamarnik and M. Sudan. Performance of sequential local algorithms for the random NAE-k-SAT

problem. SIAM Journal on Computing, 46(2):590–619, (2017).

[51] M.R. Garey and D.S. Johnson. Computers and intractability: A guide to the theory of NP-

completeness. Freeman, San Francisco, (1979).

[52] M. Glasgow, M. Kwan, A. Sah, M. Sawhney: A central limit theorem for the matching number of a

sparse random graph. arXiv:2402.05851 (2024).

[53] A. Goerdt: A threshold for unsatisfiability. Journal of Computer and System Sciences, 53, pp.469–

486, (1996).

[54] J. Gu and R. Sosic. A parallel architecture for constraint satisfaction. In International conference

on industrial and engineering applications of artificial intelligence and expert systems, pp. 229–237,

(1991).

[55] P. Hall, C. Heyde: Martingale limit theory and its applications. Academic Press (1980).

[56] S. Hetterich: Analysing survey propagation guided decimation on random formulas. arXiv

preprint arXiv:1602.08519, (2016).

[57] L.M. Kirousis, E. Kranakis, D. Krizanc and Y.C. Stamatiou: Approximating the unsatisfiability

threshold of random formulas. Random Structures & Algorithms, 12(3):253–269, (1998).



[58] E. Kreačič: Some problems related to the Karp-Sipser algorithm on random graphs. Ph.D. thesis,

University of Oxford, (2017).

[59] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian and L. Zdeborová: Gibbs states and

the set of solutions of random constraint satisfaction problems. In Proceedings of the National

Academy of Sciences 104 (25), pp. 10318–10323, (2007).

[60] F.R. Kschischang, B. Frey and H. A. Loeliger: Factor graphs and the sum-product algorithm. IEEE

Transactions on Information Theory, 47(2):498–519, (2001).

[61] V. Kumar. Algorithms for constraint-satisfaction problems: A survey. AI magazine, 13(1):32–32,

(1992).

[62] L. Lovász: Large networks and graph limits. AMS (2012).

[63] T. Łuczak and J.C. Wierman: The chromatic number of random graphs at the double-jump

threshold. Combinatorica, Volume 9, pages 39–49, (1989).

[64] A. Maier, F. Behrens, L. Zdeborová: Dynamical cavity method for hypergraphs and its application

to quenches in the k-XOR-SAT problem. arXiv:2412.14794 (2024).

[65] S. Mertens, M. Mézard and R. Zecchina. Threshold values of random k-SAT from the cavity

method. Random Structures and Algorithms, 28(3):340–373, (2006).

[66] M. Mézard and A. Montanari : Information, physics and computation. Oxford University Press,

(2009).

[67] M. Mézard and A. Montanari. Reconstruction on trees and spin glass transition. Journal of Statis-

tical Physics, 124:1317-1350, (2006).

[68] M. Mézard and T. Mora. Constraint satisfaction problems and neural networks: A statistical

physics perspective. Journal of Physiology-Paris, 103(1-2): 107–113, (2009).

[69] M. Mézard and G. Parisi. The Bethe lattice spin glass revisited. The European Physical Journal B,

20:217, (2001).

[70] M. Mézard, G. Parisi, N. Sourlas, G. Toulouse, and M. Virasoro: Replica symmetry breaking and

the nature of the spin glass phase. Journal de Physique, 45(5), pp. 843–854, (1984).

[71] M. Mézard, G. Parisi and M. Virasoro: Spin glass theory and beyond: An introduction to the

replica method and its applications, World Scientific Publishing Company, Vol. 9, (1987).

[72] M. Mézard, G. Parisi and R. Zecchina. Analytic and algorithmic solution of random satisfiability

problems. Science, 297:812–815, (2002).

[73] D. Mitchell, B. Selman and H. Levesque. Hard and easy distributions of SAT problems. In Proceed-

ings of the 10th National Conference on Artificial Intelligence, pp. 459-465, (1992).



[74] M. Molloy. Models for random constraint satisfaction problems. SIAM Journal on Computing,

32(4):935–949, (2003).

[75] M. Molloy: Cores in random hypergraphs and Boolean formulas. Random Structures and Algo-

rithms 27, pp.124–135, (2005).

[76] M. Molloy. The freezing threshold for k-colorings of a random graph. In Proceedings of the 44th

symposium on Theory of Computing, 921. ACM, (2012).

[77] R. Monasson, R. Zecchina. The entropy of the k-satisfiability problem. Physical Review Letters, 76,

3881, (1996).

[78] R. Monasson, R. Zecchina: Statistical mechanics of the random K -SAT model. Phys. Rev. E 56,

pp.1357–1370, (1997).

[79] R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman and L. Troyansky. 2+p-sat: Relation of typical-

case complexity to the nature of the phase transition. Random Structures and Algorithms, 15:414,

(1999).

[80] A. Montanari, R. Restrepo and P. Tetali. Reconstruction and clustering in random constraint

satisfaction problems. SIAM Journal on Discrete Mathematics, 25(2):771–808, (2011).

[81] A. Montanari, F. Ricci-Tersenghi, G. Semerjian. Cluster of solutions and replica symmetry breaking

in random k-satisfiability. Journal of Statistical Mechanics, P04004, (2008).

[82] A. Montanari and G. Semerjian: Rigorous inequalities between length and time scales in glassy

systems. Journal of Statistical Physics, 125: 23, (2006).

[83] A. Montanari and D. Shah. Counting good truth assignments of random k-SAT formulae. In

Proceedings of the 18th SODA, pp. 1255–1264, (2007).

[84] C. Moore and S. Mertens. The Nature of Computation. Oxford University Press, (2011).

[85] E. Mossel and Y. Peres. Information flow on trees. Annals of Applied Probability, 13(3):817–844, 08

(2003).

[86] N. Müller, R. Neininger, H. Zhu. Random 2-SAT: The set of atoms of the limiting empirical marginal

distribution. arXiv:2410.17749 [math.PR], (2024).

[87] D. Panchenko: The Sherrington-Kirkpatrick model. Springer (2013).

[88] D. Panchenko: Spin glass models from the point of view of spin distributions. Annals of Probability

41, pp.1315–1361, (2013).

[89] D. Panchenko: On the replica symmetric solution of the K -sat model. Electron. J. Probab. 19

(2014) #67.



[90] D. Panchenko, M. Talagrand: Bounds for diluted mean-fields spin glass models. Probab. Theory

Relat. Fields 130, pp. 319–336, (2004).

[91] C.H. Papadimitriou. Computational complexity. Addison-Wesley, (1994).

[92] J. Pearl: Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan

Kaufmann Publishers Inc., San Francisco, CA, USA, (1988).

[93] B. Pittel, G.B. Sorkin: The satisfiability threshold for k-XORSAT. Combinatorics, Probability and

Computing 25, 2, pp. 236–268, (2016).

[94] B. Pittel, J. Spencer and N. Wormald: Sudden emergence of a giant k-core in a random graph.

Journal of Combinatorial Theory, Series B, Vol. 67, (1996).

[95] F. Rassmann : On the number of solutions in random graph k-coloring. Combinatorics, Probability

and Computing 28, 1, pp. 130–158, (2019).

[96] F. Ricci-Tersenghi, G. Semerjian: On the cavity method for decimated random constraint satisfac-

tion problems and the analysis of belief propagation guided decimation algorithms. Journal of

Statistical Mechanics, P09001, (2009).

[97] R. Robinson, N. Wormald: Almost all regular graphs are Hamiltonian. Random Structures and

Algorithms 5, pp. 363–374, (1994).

[98] A. Sly: Computational transition at the uniqueness threshold. Proc. 51st FOCS, pp. 287–296,

(2010).

[99] E.P.K. Tsang. Foundations of constraint satisfaction. Academic press, (1993).

[100] L.G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Comput-

ing, 8(3):410–421, (1979).

[101] J.S. Yedidia, W.T. Freeman and Y. Weiss: Constructing free energy approximations and general-

ized belief propagation algorithms. Technical report TR-2002–35, Mitsubishi Electrical Research

Laboratories, (2002).

[102] K. Yung: Limits of sequential local algorithms on the random k-XORSAT problem. Proc. 51st

ICALP (2024) #123.



A
List of Papers



THE NUMBER OF RANDOM 2-SAT SOLUTIONS IS ASYMPTOTICALLY LOG-NORMAL

ARNAB CHATTERJEE, AMIN COJA-OGHLAN, NOËLA MÜLLER, CONNOR RIDDLESDEN, MAURICE ROLVIEN, PAVEL ZAKHAROV,
HAODONG ZHU

ABSTRACT. We prove that throughout the satisfiable phase, the logarithm of the number of satisfying assignments of a
random 2-SAT formula satisfies a central limit theorem. This implies that the log of the number of satisfying assignments
exhibits fluctuations of order $\sqrt{n}$, with n the number of variables. The formula for the variance can be evaluated
effectively. By contrast, for numerous other random constraint satisfaction problems the typical fluctuations of the logarithm
of the number of solutions are bounded throughout all or most of the satisfiable regime. MSC: 05C80, 60C05, 68Q87

1. INTRODUCTION

1.1. Background and motivation. The quest for satisfiability thresholds has been a guiding theme of research
into random constraint satisfaction problems [7, 17, 25]. But once the satisfiability threshold has been pinpointed
a question of no less consequence is to determine the distribution of the number of satisfying assignments within
the satisfiable phase [35]. Indeed, the number of solutions is intimately tied to phase transitions that affect the
geometry of the solution space, which in turn impacts the computational nature of finding or sampling solu-
tions [4, 18, 29]. However, few tools are currently available to count solutions of random problems. Where precise
rigorous results exist (such as in random NAESAT or XORSAT), the proofs typically rely on the method of moments
(e.g., [6, 27, 43, 44]). Yet a necessary condition for the success of this approach is that the problem in question
exhibits certain symmetries, which are absent in many interesting cases [7, 21].

The aim of the present paper is to shed a closer light on the number of satisfying assignments in random 2-SAT,
the simplest random CSP that lacks said symmetry properties. While the random 2-SAT satisfiability threshold has
been known since the 1990s [20, 32], a first-order approximation to the number of satisfying assignments has been
obtained only recently [5]. This timeline reflects the computational complexity of the respective questions. As is
well known, deciding the satisfiability of a 2-CNF reduces to directed reachability, solvable in polynomial time [10].
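
The reduction to directed reachability can be sketched as follows. This is a minimal illustration rather than the algorithm of [10]: the encoding of literals as signed integers is an assumption made here, and for brevity the sketch runs one graph search per variable instead of a linear-time strongly connected components computation.

```python
def is_satisfiable(clauses):
    """Decide a 2-CNF via reachability in its implication graph.

    Literals are nonzero ints (assumed encoding): +v is variable v, -v its negation.
    A clause (p, q) contributes the implications -p => q and -q => p.
    """
    implications = {}
    for p, q in clauses:
        implications.setdefault(-p, set()).add(q)
        implications.setdefault(-q, set()).add(p)

    def reaches(source, target):
        # Plain depth-first search over implied literals.
        seen, stack = {source}, [source]
        while stack:
            lit = stack.pop()
            if lit == target:
                return True
            for nxt in implications.get(lit, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    variables = {abs(lit) for clause in clauses for lit in clause}
    # The formula is unsatisfiable iff some x implies -x and -x implies x.
    return not any(reaches(v, -v) and reaches(-v, v) for v in variables)
```

For instance, the four clauses on two variables that together exclude every assignment are rejected, while any three of them are accepted.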

By contrast, calculating the number of satisfying assignments Z(Φ) of a 2-CNF Φ is a #P-hard task [48]. Nonetheless,
Monasson and Zecchina [38] put forward a delicate physics-inspired conjecture as to the exponential order of
the number of satisfying assignments of random 2-CNFs. Achlioptas et al. [5] recently proved this conjecture. Their
theorem provides a first-order, law-of-large-numbers approximation of the logarithm of the number of satisfying
assignments. The present paper contributes a much more precise result, namely a central limit theorem. We show
that throughout the satisfiable phase the logarithm of the number of satisfying assignments, suitably shifted and
scaled, converges to a Gaussian. This is the first central limit theorem of this type for any random CSP.

Let $\Phi = \Phi_{n,m}$ be a random 2-CNF on n Boolean variables $x_1, \ldots, x_n$ with m clauses, drawn independently and
uniformly from all $4\binom{n}{2}$ possible 2-clauses. Suppose that m ∼ dn/2 for a fixed real d > 0. Thus, d gauges the average
number of clauses in which a variable $x_i$ appears. The value d = 2 marks the satisfiability threshold; hence, Φ is
satisfiable with high probability (‘w.h.p.’) if d < 2, and unsatisfiable w.h.p. if d > 2 [20, 32]. Achlioptas et al. [5]
determined a function φ(d) > 0 such that for all d < 2, i.e., throughout the entire satisfiable phase we have

Z (Φ) = exp(nφ(d)+o(n)) w.h.p. , (1.1)

thereby determining the leading exponential order of Z (Φ).
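
For very small n the random model and the quantity Z(Φ) can be explored by brute force. The sketch below is illustrative only: the signed-integer encoding of literals and the function names are assumptions, and exhaustive enumeration over all 2^n assignments is feasible only for tiny instances, far from the asymptotic regime of (1.1).

```python
import itertools
import random

def random_2cnf(n, m, rng):
    """Draw m independent uniform 2-clauses on variables 1..n.

    Each clause picks two distinct variables and two independent signs,
    which is uniform over the 4 * C(n, 2) possible 2-clauses.
    """
    clauses = []
    for _ in range(m):
        x, y = rng.sample(range(1, n + 1), 2)
        clauses.append((x * rng.choice((1, -1)), y * rng.choice((1, -1))))
    return clauses

def count_solutions(n, clauses):
    """Count satisfying assignments Z by enumerating all 2^n truth assignments."""
    count = 0
    for bits in itertools.product((False, True), repeat=n):
        if all(bits[abs(p) - 1] == (p > 0) or bits[abs(q) - 1] == (q > 0)
               for p, q in clauses):
            count += 1
    return count
```

A single clause on two variables excludes exactly one of the four assignments, so `count_solutions(2, [(1, 2)])` returns 3.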
However, (1.1) fails to identify the limiting distribution of Z (Φ). To be precise, since (1.1) shows that Z (Φ) scales

exponentially, we expect this random variable to exhibit multiplicative fluctuations. Therefore, the appropriate
goal is to find the limiting distribution of the logarithm of this random variable, i.e., of log Z (Φ). Indeed, physics
intuition suggests that log Z (Φ) should be asymptotically Gaussian [36]. The main result of the present paper
confirms this hunch. Specifically, letting Γη(d) be a Gaussian with mean 0 and standard deviation η(d) > 0, we
prove that for all 0 < d < 2, log Z (Φ) satisfies

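\[
\mathbb{P}\left[\log Z(\Phi) - \mathbb{E}[\log Z(\Phi) \mid Z(\Phi) > 0] < z\sqrt{m}\right] \sim \mathbb{P}\left[\Gamma_{\eta(d)} < z\right] \qquad (z \in \mathbb{R}). \tag{1.2}
\]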


arXiv:2405.03302v2 [cs.DM] 20 Sep 2024


[Figure 1. Left panel: plot over d ∈ [0, 2] with two vertical axes, 'Variance η(d)²' (0.00 to 0.08) and 'Expectation φ(d)' (0.40 to 0.70); legend entry 'First moment bound'. Right panel: tree illustration.]

FIGURE 1. Left: Numerical approximations to the function φ(d) from (1.1) (red) and the variance
η(d)² from (1.7) (green). The black dashed line is the first moment bound d ↦ log(2) + (d/2) log(3/4).
Right: An illustration of the tree T⊗ from Section 2.6.

The order $\Theta(\sqrt{n})$ of fluctuations confirmed by (1.2) sets random 2-SAT apart from a large family of other random
constraint satisfaction problems. For example, for random graph q-colouring with q ≥ 3 colours the log of the
number of q-colourings superconcentrates, i.e., merely has bounded fluctuations throughout most of the regime
where the random graph is q-colourable [12].¹ The same is true of random NAESAT, XORSAT and the symmetric
perceptron [1, 11, 21, 43]. In each of these cases, certain fundamental symmetry properties (e.g., that the set
of q-colourings remains invariant under permutations of the colours) enable the computation of the number of
solutions via the method of moments. Random 2-SAT lacks the respective symmetry (as the set of satisfying as-
signments is not generally invariant under swapping ‘true’ and ‘false’), and accordingly (1.2) establishes that the
number of solutions fails to superconcentrate (for more details see [21]).

1.2. The main result. The formula for the standard deviation η(d) from (1.2) comes in terms of a fixed point equation
on a space of probability measures. Thus, let $\mathcal{P}(\mathbb{R}^2)$ be the set of all (Borel) probability measures on $\mathbb{R}^2$. For
0 < d < 2 and 0 ≤ t ≤ 1 we define an operator

\[
\mathrm{logBP}^{\otimes}_{d,t} : \mathcal{P}(\mathbb{R}^2) \to \mathcal{P}(\mathbb{R}^2), \qquad \rho \mapsto \hat\rho = \mathrm{logBP}^{\otimes}_{d,t}(\rho), \tag{1.3}
\]

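as follows. Let

\[
(\xi_{\rho,i})_{i\ge 1},\ (\xi'_{\rho,i})_{i\ge 1},\ (\xi''_{\rho,i})_{i\ge 1}, \qquad
\xi_{\rho,i} = \begin{pmatrix}\xi_{\rho,i,1}\\ \xi_{\rho,i,2}\end{pmatrix},\quad
\xi'_{\rho,i} = \begin{pmatrix}\xi'_{\rho,i,1}\\ \xi'_{\rho,i,2}\end{pmatrix},\quad
\xi''_{\rho,i} = \begin{pmatrix}\xi''_{\rho,i,1}\\ \xi''_{\rho,i,2}\end{pmatrix}
\]

be random vectors with distribution ρ, let $\boldsymbol{d} \stackrel{\mathrm{dist}}{=} \mathrm{Po}(td)$, $\boldsymbol{d}', \boldsymbol{d}'' \stackrel{\mathrm{dist}}{=} \mathrm{Po}((1-t)d)$ and let $s_i, s'_i, s''_i, r_i, r'_i, r''_i$ for i ≥ 1 be
uniformly random on {±1}, all mutually independent. Then ρ̂ is the distribution of the vector

\[
\begin{pmatrix}
\sum_{i=1}^{\boldsymbol{d}} s_i \log\left(\frac{1}{2}\left(1 + r_i \tanh(\xi_{\rho,i,1}/2)\right)\right) + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\left(\frac{1}{2}\left(1 + r'_i \tanh(\xi'_{\rho,i,1}/2)\right)\right) \\[1ex]
\sum_{i=1}^{\boldsymbol{d}} s_i \log\left(\frac{1}{2}\left(1 + r_i \tanh(\xi_{\rho,i,2}/2)\right)\right) + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\left(\frac{1}{2}\left(1 + r''_i \tanh(\xi''_{\rho,i,2}/2)\right)\right)
\end{pmatrix} \in \mathbb{R}^2 .
\]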

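In addition, define a function $B^{\otimes}_{d,t} : \mathcal{P}(\mathbb{R}^2) \to (0,\infty]$ by letting

\[
B^{\otimes}_{d,t}(\rho) = \mathbb{E}\left[\prod_{h=1}^{2} \log\left(1 - \frac{1}{4}\left(1 + r_1 \tanh(\xi_{\rho,1,h}/2)\right)\left(1 + r_2 \tanh(\xi_{\rho,2,h}/2)\right)\right)\right]. \tag{1.4}
\]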

¹Formally, up to the so-called condensation threshold, which precedes the q-colourability threshold by a small additive constant, the
logarithm of the number of q-colourings minus its expectation converges in distribution to a random variable with bounded moments [12, 13, 21].



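Theorem 1.1. For any 0 < d < 2, t ∈ [0,1] there exists a unique probability measure $\rho_{d,t} \in \mathcal{P}(\mathbb{R}^2)$ such that

\[
\rho_{d,t} = \mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t}) \quad\text{and}\quad \int_{\mathbb{R}^2} \|\xi\|_2^2 \,\mathrm{d}\rho_{d,t}(\xi) < \infty. \tag{1.5}
\]

Furthermore,

\[
\lim_{n\to\infty} \frac{\log Z(\Phi) - \mathbb{E}[\log Z(\Phi) \mid Z(\Phi) > 0]}{\sqrt{m}} = \Gamma_{\eta(d)} \quad\text{in distribution, where} \tag{1.6}
\]

\[
\eta(d)^2 = \int_0^1 B^{\otimes}_{d,t}(\rho_{d,t})\,\mathrm{d}t - B^{\otimes}_{d,0}(\rho_{d,0}) \in (0,\infty). \tag{1.7}
\]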

The conditioning on Z(Φ) > 0 is necessary in (1.6), because even for d < 2 the formula Φ is unsatisfiable with
probability $\Omega(n^{-1})$, in which case log Z(Φ) = −∞. Moreover, the $L^2$-bound from (1.5) ensures that the integral (1.7)
is well-defined. Finally, (1.6) implies (1.2).

How can the formula (1.7) be evaluated? Because the proof of the uniqueness of the stochastic fixed point
$\rho_{d,t}$ from (1.5) is based on the contraction method, a fixed point iteration will converge rapidly. In effect, for any
d, t a discrete distribution that approximates $\rho_{d,t}$ arbitrarily well (in Wasserstein distance) can be computed via a
randomised algorithm called population dynamics [36, Chapter 14]. Since $B^{\otimes}_{d,t}(\rho_{d,t})$ varies continuously in d and
t, η(d)² can thus be approximated within any desired accuracy, see Figure 1.
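
A minimal population dynamics sketch for (1.3), (1.4) and (1.7) might look as follows. It is an illustration under stated assumptions, not the code behind Figure 1: the population size, number of sweeps, grid over t, trapezoidal quadrature, all-zero initial population and the clipping of tanh are choices made here, and all function names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means (at most 2) used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def clipped_tanh(x):
    # Keep tanh strictly inside (-1, 1) so that the logarithms below stay finite.
    return max(-1 + 1e-12, min(1 - 1e-12, math.tanh(x / 2)))

def message(xi, r):
    """The summand log((1 + r * tanh(xi / 2)) / 2) from the definition of logBP."""
    return math.log(0.5 * (1 + r * clipped_tanh(xi)))

def log_bp_step(pop, d, t, rng):
    """One sweep of logBP applied to the empirical distribution of `pop`."""
    new_pop = []
    for _ in pop:
        v1 = v2 = 0.0
        for _ in range(poisson(t * d, rng)):        # shared terms enter both coordinates
            xi, s, r = rng.choice(pop), rng.choice((1, -1)), rng.choice((1, -1))
            v1 += s * message(xi[0], r)
            v2 += s * message(xi[1], r)
        for _ in range(poisson((1 - t) * d, rng)):  # terms private to coordinate 1
            xi, s, r = rng.choice(pop), rng.choice((1, -1)), rng.choice((1, -1))
            v1 += s * message(xi[0], r)
        for _ in range(poisson((1 - t) * d, rng)):  # terms private to coordinate 2
            xi, s, r = rng.choice(pop), rng.choice((1, -1)), rng.choice((1, -1))
            v2 += s * message(xi[1], r)
        new_pop.append((v1, v2))
    return new_pop

def estimate_B(pop, rng, samples=20000):
    """Monte Carlo estimate of the functional (1.4) on the empirical distribution."""
    total = 0.0
    for _ in range(samples):
        xi1, xi2 = rng.choice(pop), rng.choice(pop)
        r1, r2 = rng.choice((1, -1)), rng.choice((1, -1))
        prod = 1.0
        for h in (0, 1):
            a = 1 + r1 * clipped_tanh(xi1[h])
            b = 1 + r2 * clipped_tanh(xi2[h])
            prod *= math.log(1 - 0.25 * a * b)
        total += prod
    return total / samples

def eta_squared(d, grid=11, pop_size=2000, sweeps=30, seed=0):
    """Approximate (1.7): trapezoidal rule over t in [0, 1], minus the value at t = 0."""
    rng = random.Random(seed)
    values = []
    for j in range(grid):
        t = j / (grid - 1)
        pop = [(0.0, 0.0)] * pop_size
        for _ in range(sweeps):
            pop = log_bp_step(pop, d, t, rng)
        values.append(estimate_B(pop, rng))
    step = 1 / (grid - 1)
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return integral - values[0]
```

At t = 1 only shared terms occur, so a population started with equal coordinates keeps both coordinates equal, which offers a simple sanity check of the update.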

2. PROOF STRATEGY

The main challenge towards the proof of Theorem 1.1 is to get a handle on the variance of log Z (Φ) given satis-
fiability. The key idea, inspired by spin glass theory [19] but novel to random constraint satisfaction, is to count
the joint number of satisfying assignments of two correlated random formulas. Once this is accomplished Theo-
rem 1.1 will follow from the careful application of a general martingale central limit theorem. To get acclimatised
we first revisit the method of moments, the reasons it fails on random 2-SAT and the combinatorial interpretation
of the law of large numbers (1.1).

2.1. The method of moments fails. The default approach to estimating the number of solutions to a random
CSP is the venerable second moment method [7]. Its thrust is to show that the second moment of the number of
solutions is of the same order as the square of the expected number of solutions. If so then the moment compu-
tation together with small subgraph conditioning yields the precise limiting distribution of the number of solu-
tions [24, 45]. However, this approach works only if the log of the number of solutions superconcentrates around
the log of the expected number of solutions.

This necessary condition is not satisfied in random 2-SAT. In fact, a straightforward calculation yields

\[
\frac{1}{n} \log \mathbb{E}[Z(\Phi)] \sim \log 2 + \frac{d}{2} \log(3/4). \tag{2.1}
\]

The formula on the r.h.s. is displayed as the black dashed line in Figure 1. As can be verified analytically, this
line strictly exceeds the function φ(d) from (1.1) for any 0 < d < 2. Consequently, (1.1) implies that log Z (Φ) ≤
logE[Z (Φ)]−Ω(n) w.h.p. In other words, the expected number of solutions E[Z (Φ)] overshoots the typical number
of solutions by an exponential factor w.h.p. ; cf. the discussion in [6, 8].
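
For completeness, the calculation behind (2.1) can be spelled out. A fixed truth assignment violates a uniformly random 2-clause with probability 1/4, the m clauses are independent, and there are 2^n assignments, so linearity of expectation gives:

```latex
\[
\mathbb{E}[Z(\Phi)] = 2^n \left(\frac{3}{4}\right)^{m},
\qquad\text{hence}\qquad
\frac{1}{n}\log \mathbb{E}[Z(\Phi)] = \log 2 + \frac{m}{n}\log\frac{3}{4}
\;\sim\; \log 2 + \frac{d}{2}\log\frac{3}{4}
\quad\text{since } m \sim \frac{dn}{2}.
\]
```

In particular, at d = 2 the first moment bound equals log(3/2) ≈ 0.405.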

2.2. Belief Propagation. Instead of the method of moments, the prescription of the physics-based work of Monas-
son and Zecchina [38] is to estimate log Z (Φ) by way of the Belief Propagation (BP) message passing algorithm. This
approach was vindicated rigorously by Achlioptas et al. [5].

As we will reuse certain elements of that analysis we dwell on BP briefly. For a clause a of a 2-CNFΦ let ∂a = ∂Φa
be the set of variables that a contains. Moreover, for x ∈ ∂a let signΦ(x, a) = sign(x, a) ∈ {±1} be the sign with
which x appears in a. Analogously, let ∂x = ∂Φx be the set of clauses in which variable x appears. BP introduces
‘messages’ between clauses a and the variables x ∈ ∂a. More precisely, each such clause-variable pair a, x comes
with two messages µx→a ,µa→x . The messages are probability distributions on ‘true’ and ‘false’, which we represent
by ±1. Thus, µx→a(±1),µa→x (±1) ≥ 0 and µx→a(1)+µx→a(−1) =µa→x (1)+µa→x (−1) = 1.

The messages get updated iteratively by an operator

BP : (µx→a ,µa→x )a,x∈∂a 7→ (µ̂x→a , µ̂a→x )a,x∈∂a = BP((µx→a ,µa→x )a,x∈∂a). (2.2)



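For a clause a with adjacent variables ∂a = {x, y} the updated messages $\hat\mu_{a\to x}(\pm 1)$ are defined by

\[
\hat\mu_{a\to x}(\mathrm{sign}(x,a)) = \frac{1}{1 + \mu_{y\to a}(\mathrm{sign}(y,a))}, \qquad
\hat\mu_{a\to x}(-\mathrm{sign}(x,a)) = \frac{\mu_{y\to a}(\mathrm{sign}(y,a))}{1 + \mu_{y\to a}(\mathrm{sign}(y,a))}. \tag{2.3}
\]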

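Moreover, for a variable x and a clause a ∈ ∂x we define²

\[
\hat\mu_{x\to a}(s) = \frac{\prod_{b\in\partial x\setminus\{a\}} \mu_{b\to x}(s)}{\prod_{b\in\partial x\setminus\{a\}} \mu_{b\to x}(1) + \prod_{b\in\partial x\setminus\{a\}} \mu_{b\to x}(-1)} \qquad (s \in \{\pm 1\}). \tag{2.4}
\]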

The purpose of BP is to heuristically ‘approximate’ the marginal probabilities that a random satisfying assignment
σ = σ_Φ of Φ will set a certain variable to a specific truth value. The ‘approximation’ given by the set
$(\mu_{x\to a}, \mu_{a\to x})_{a, x\in\partial a}$ of messages reads

\[
\mu_x(s) = \frac{\prod_{b\in\partial x} \mu_{b\to x}(s)}{\prod_{b\in\partial x} \mu_{b\to x}(1) + \prod_{b\in\partial x} \mu_{b\to x}(-1)} \qquad (s \in \{\pm 1\}). \tag{2.5}
\]

The BP ‘ansatz’ now asks that we iterate the BP operator until an (approximate) fixed point is reached, i.e.,
ideally until µ̂a→x = µa→x and µ̂x→a = µx→a for all a, x. Then we evaluate the BP marginals (2.5) and plug them
into a generic formula called the Bethe free entropy, which yields the BP ‘approximation’ of log Z (Φ); an excellent
exposition can be found in [36]. The BP recipe provably yields the correct result if the bipartite graph induced by
the clause-variable incidences of the 2-CNFΦ is acyclic, but may be totally off otherwise.
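
The updates (2.3)–(2.4) and the marginals (2.5) can be sketched in a few lines. The code below is a minimal illustration under assumptions made here: a clause is encoded as a pair of (variable, sign) tuples with two distinct variables, all messages start uniform, updates are applied in parallel for a fixed number of rounds, and the Bethe free entropy is not computed.

```python
def bp_marginals(n, clauses, rounds=50):
    """Run BP on a 2-CNF and return the marginals (2.5).

    clauses: list of ((x, sx), (y, sy)) with variables 0..n-1 and signs in {+1, -1};
    the sign is the truth value under which the variable satisfies the clause.
    """
    nbrs = [[] for _ in range(n)]            # indices of the clauses containing each variable
    for a, ((x, _), (y, _)) in enumerate(clauses):
        nbrs[x].append(a)
        nbrs[y].append(a)
    pairs = [(a, x) for x in range(n) for a in nbrs[x]]
    var_to_cl = {pair: {1: 0.5, -1: 0.5} for pair in pairs}   # mu_{x -> a}
    cl_to_var = {pair: {1: 0.5, -1: 0.5} for pair in pairs}   # mu_{a -> x}

    def combine(x, incoming, skip=None):
        # Normalised product of clause-to-variable messages, as in (2.4) and (2.5).
        weight = {1: 1.0, -1: 1.0}
        for b in nbrs[x]:
            if b != skip:
                for s in (1, -1):
                    weight[s] *= incoming[(b, x)][s]
        total = weight[1] + weight[-1]
        if total == 0:                        # vanishing denominator: fall back to uniform
            return {1: 0.5, -1: 0.5}
        return {s: weight[s] / total for s in (1, -1)}

    for _ in range(rounds):
        new_cl_to_var = {}
        for a, (lit1, lit2) in enumerate(clauses):
            for (x, sx), (y, sy) in ((lit1, lit2), (lit2, lit1)):
                p = var_to_cl[(a, y)][sy]     # mu_{y -> a}(sign(y, a))
                new_cl_to_var[(a, x)] = {sx: 1 / (1 + p), -sx: p / (1 + p)}   # update (2.3)
        new_var_to_cl = {(a, x): combine(x, cl_to_var, skip=a) for a, x in pairs}
        var_to_cl, cl_to_var = new_var_to_cl, new_cl_to_var

    return [combine(x, cl_to_var) for x in range(n)]
```

On the acyclic formula (x0 ∨ x1) ∧ (¬x1 ∨ x2), which has four satisfying assignments, the returned marginals of 'true' are 3/4, 1/2 and 3/4, in agreement with direct enumeration.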

Of course, for 1 < d < 2 the bipartite graph associated with the random formulaΦ contains cycles in abundance.
Nonetheless, (1.1) confirms that the BP formula provides a valid approximation to within o(n). The proof is based
on two observations. First, that the local structure of the clause-variable incidence graph can be described by a
Galton-Watson tree. Second, that the Galton-Watson tree enjoys a spatial mixing property called Gibbs uniqueness.

Since the proof of Theorem 1.1 also harnesses Gibbs uniqueness, let us elaborate. To mimic the local structure
ofΦ consider a multitype Galton-Watson tree T whose types are variable nodes and clause nodes of four sub-types
(s, s′) with s, s′ ∈ {±1}. The root o is a variable node. The offspring of any variable node is a Po(d/4) number of clause
nodes of each of the four sub-types. Finally, the offspring of a clause node is a single variable node. The clause type
(s, s′) indicates that s is the sign with which the parent variable appears in the clause, while s′ determines the sign
of the child variable. Thus, the Galton-Watson tree T can be viewed as a (possibly infinite) 2-CNF. For an integer
ℓ ≥ 0 let T (2ℓ) be the finite tree/2-CNF obtained by deleting all variables and clauses at a distance larger than 2ℓ
from the root.

The tree T approximates Φ locally in the sense that for any fixed ℓ and any given variable xi the distribution
of the depth-2ℓ neighbourhood of xi in Φ converges to T (2ℓ) as n →∞ (in the sense of local weak convergence).
Moreover, Gibbs uniqueness posits that under a random satisfying assignment σ of the tree-CNF $T^{(2\ell)}$ the truth value
$\sigma_o$ of the root decouples from the values $\sigma_y$ of the variables $y \in \partial^{2\ell} o$ at
distance precisely 2ℓ from o for large enough ℓ. Formally, with $S(T^{(2\ell)})$ the set of satisfying assignments of the
2-CNF $T^{(2\ell)}$, the following is true.

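Proposition 2.1 ([5, Proposition 2.2]). We have

\[
\lim_{\ell\to\infty} \mathbb{E}\left[\max_{\tau\in S(T^{(2\ell)})} \left|\mathbb{P}\left[\sigma_o = 1 \mid T^{(2\ell)}, \sigma_{\partial^{2\ell} o} = \tau_{\partial^{2\ell} o}\right] - \mathbb{P}\left[\sigma_o = 1 \mid T^{(2\ell)}\right]\right|\right] = 0. \tag{2.6}
\]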

2.3. Approaching the variance. The proof of the formula (1.1) combines the Gibbs uniqueness property and the
local convergence to the Galton-Watson tree with a coupling argument called the ‘Aizenman-Sims-Starr scheme’ [5].
Unfortunately, this combination does not seem precise enough to get a handle on the limiting distribution of
log Z (Φ) by a long shot. Actually, it is anything but clear how even the order of the standard deviation of log Z (Φ)
could be derived along these lines. One specific problem is that the rate of convergence of (2.6) diminishes as d
approaches the satisfiability threshold.

To tackle this challenge we devise a combinatorial interpretation of $\log^2 Z(\Phi)$. A key idea, which we borrow from
spin glass theory [19], is to set up a family of correlated random formulas. Specifically, given integers M, M′ ≥ 0 we
construct a correlated pair $(\Phi_1(M, M'), \Phi_2(M, M'))$ of formulas on the variable set $V_n = \{x_1, \ldots, x_n\}$ as follows. Let
$(a_i)_{i\ge 1}, (a'_i)_{i\ge 1}, (a''_i)_{i\ge 1}$ be sequences of mutually independent uniformly random clauses on $V_n$. Then

\[
\Phi_1(M, M') = a_1 \wedge \cdots \wedge a_M \wedge a'_1 \wedge \cdots \wedge a'_{M'}, \qquad
\Phi_2(M, M') = a_1 \wedge \cdots \wedge a_M \wedge a''_1 \wedge \cdots \wedge a''_{M'}. \tag{2.7}
\]

²For the sake of tidiness, if the above denominator vanishes we simply let $\hat\mu_{x\to a}(\pm 1) = \frac{1}{2}$.



Thus, the two formulas share clauses $a_1, \ldots, a_M$. Additionally, each contains another M′ independent clauses. In
particular, $\Phi_1(m,0), \Phi_2(m,0)$ are identical, while $\Phi_1(0,m), \Phi_2(0,m)$ are independent.
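
The construction (2.7) is easy to mimic in code. In the sketch below the three clause sequences are drawn once, truncated to length m, and every pair (Φ1(M, M′), Φ2(M, M′)) is read off as a pair of prefixes, so that pairs with different parameters are coupled as in the text. The signed-integer encoding of literals and the function names are assumptions made for illustration.

```python
import random

def random_clause(n, rng):
    """A uniformly random 2-clause on variables 1..n, as a pair of signed ints."""
    x, y = rng.sample(range(1, n + 1), 2)
    return (x * rng.choice((1, -1)), y * rng.choice((1, -1)))

def clause_sequences(n, m, rng):
    """The first m entries of the sequences (a_i), (a'_i), (a''_i)."""
    return tuple([random_clause(n, rng) for _ in range(m)] for _ in range(3))

def correlated_pair(sequences, M, M_prime):
    """Return (Phi_1(M, M'), Phi_2(M, M')) as lists of clauses, following (2.7)."""
    shared, own1, own2 = sequences
    return shared[:M] + own1[:M_prime], shared[:M] + own2[:M_prime]
```

With these conventions, `correlated_pair(seqs, m, 0)` yields two identical formulas, while `correlated_pair(seqs, 0, m)` yields two formulas built from disjoint parts of the randomness.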

Interpolating between these extreme cases offers a promising avenue for computing the variance: given that
$\Phi_1(M, m-M)$ and $\Phi_2(M, m-M)$ are satisfiable for all M, we can write a telescoping sum

\[
\begin{aligned}
&\log Z(\Phi_1(m,0)) \cdot \log Z(\Phi_2(m,0)) - \log Z(\Phi_1(0,m)) \cdot \log Z(\Phi_2(0,m)) \\
&\quad= \sum_{M=1}^{m} \Big[ \log Z(\Phi_1(M, m-M)) \cdot \log Z(\Phi_2(M, m-M)) \\
&\qquad\qquad - \log Z(\Phi_1(M-1, m-M+1)) \cdot \log Z(\Phi_2(M-1, m-M+1)) \Big].
\end{aligned} \tag{2.8}
\]

If we could take the expectation on the l.h.s. of (2.8), we would precisely obtain the variance of log Z (Φ). Moreover,
each summand on the r.h.s. amounts to a ‘local’ change of swapping a shared clause for a pair of independent
clauses. Yet we cannot just take the expectation of (2.8), because some Φh(M ,m − M) may be unsatisfiable. To
remedy this, we will replace log Z (Φ) by a tamer random variable with the same limiting distribution. Its construc-
tion is based on the Unit Clause Propagation algorithm.
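
To see why the expectation of the l.h.s. of (2.8) would equal the variance, one can argue formally as follows, ignoring the event that some formula is unsatisfiable and assuming the relevant moments are finite. The formulas Φ1(m,0) and Φ2(m,0) coincide and are distributed as Φ, whereas Φ1(0,m) and Φ2(0,m) are independent copies of Φ:

```latex
\[
\begin{aligned}
\mathbb{E}\big[\log Z(\Phi_1(m,0)) \cdot \log Z(\Phi_2(m,0))\big] &= \mathbb{E}\big[\log^2 Z(\Phi)\big], \\
\mathbb{E}\big[\log Z(\Phi_1(0,m)) \cdot \log Z(\Phi_2(0,m))\big] &= \mathbb{E}\big[\log Z(\Phi)\big]^2, \\
\text{so the difference equals}\quad
\mathbb{E}\big[\log^2 Z(\Phi)\big] - \mathbb{E}\big[\log Z(\Phi)\big]^2 &= \mathrm{Var}\big[\log Z(\Phi)\big].
\end{aligned}
\]
```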

2.4. Unit Clause Propagation. Employed by all modern SAT solvers as a sub-routine, Unit Clause Propagation is a
linear time algorithm that tracks the implications of partial assignments. The algorithm receives as input a 2-CNF
Φ along with a set L of literals. These literals are deemed to be ‘true’. The algorithm then pursues direct logical
implications, thereby identifying additional ‘implied’ literals that need to be true so that no clause gets violated.
This procedure is outlined in Steps 1–2 of Algorithm 1; the outcome of Steps 1–2 is independent of the order in
which literals/clauses are processed.

Input: A 2-CNF Φ along with a set L of literals deemed true.
1 while there exists a clause a ≡ l ∨ ¬l′ with l′ ∈ L and l ∉ L do
2     add literal l to L;
3 For variables x ∈ V(Φ) such that x ∈ L or ¬x ∈ L let
      σ_x = 1   if x ∈ L and ¬x ∉ L,
      σ_x = −1  if ¬x ∈ L and x ∉ L,
      σ_x = 0   otherwise.
  Let C be the set of all clauses a such that σ_x = 0 for all x ∈ ∂a and return L, C, σ;

Algorithm 1: Pessimistic Unit Clause Propagation (‘PUC’).

Clearly, trouble brews if PUC ends up placing both a literal l and its negation ¬l into the set L . Our ‘pessimistic’
Unit Clause variant makes no attempt at mitigating such contradictions. Instead, Step 3 just constructs a partial
assignment where all conflicting literals are set to a dummy value zero. Additionally, PUC identifies the set C of
conflict clauses that contain conflicted variables only.
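
A direct transcription of Algorithm 1, together with the pruning step that removes the conflict clauses found from every starting literal, might look as follows. This is a sketch under assumptions made here: literals are signed integers, clauses are pairs of literals on two distinct variables, and a variable untouched by L is not regarded as conflicted.

```python
def puc(clauses, literals):
    """Pessimistic Unit Clause Propagation: return (L, C, sigma)."""
    L = set(literals)
    forced_by = {}                       # forced_by[l'] lists every l with a clause (l or not l')
    for p, q in clauses:
        forced_by.setdefault(-p, []).append(q)
        forced_by.setdefault(-q, []).append(p)
    stack = list(L)
    while stack:                         # Steps 1-2: follow direct implications
        lit = stack.pop()
        for forced in forced_by.get(lit, []):
            if forced not in L:
                L.add(forced)
                stack.append(forced)
    sigma = {}                           # Step 3: partial assignment, 0 marks a conflict
    for lit in L:
        v = abs(lit)
        if v in L and -v in L:
            sigma[v] = 0
        else:
            sigma[v] = 1 if v in L else -1
    C = [c for c in clauses if all(sigma.get(abs(lit)) == 0 for lit in c)]
    return L, C, sigma

def prune(clauses):
    """Remove every clause that is a conflict clause of PUC for some starting literal."""
    variables = {abs(lit) for c in clauses for lit in c}
    conflicts = set()
    for v in variables:
        for lit in (v, -v):
            conflicts.update(puc(clauses, [lit])[1])
    return [c for c in clauses if c not in conflicts]
```

For example, on the four clauses over two variables that exclude every assignment, PUC started from either literal marks both variables as conflicted, so pruning removes all four clauses and leaves the empty (trivially satisfiable) formula.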

Now consider a 2-CNF Φ on a set of variables V(Φ). For each possible literal l ∈ {x, ¬x : x ∈ V(Φ)} we run
PUC(Φ, L = {l}). Let C(Φ, {l}) be the set of conflict clauses returned by PUC. Obtain the pruned formula Φ̂ from Φ
by removing all clauses in $C(\Phi) = \bigcup_{l} C(\Phi, \{l\})$. Then it is easy to verify the following.³

Fact 2.2. For any 2-CNF Φ the pruned 2-CNF Φ̂ is satisfiable.

Generally, the pruned formula Φ̂ could have far fewer clauses than the original formula Φ. Accordingly, even
if Φ is satisfiable the number Z (Φ̂) of satisfying assignments of Φ̂ could dramatically exceed Z (Φ). However, the
following proposition shows that on a random formula, the impact of pruning is modest.

Proposition 2.3. With probability $1 - o(n^{-1/2})$ we have $|\log Z(\hat\Phi) - \log Z(\Phi)| \le n^{1/3}$.

³See Section 4.2 for a detailed proof.



2.5. Variance redux. The error bound from Proposition 2.3 is tight enough so that towards the proof of Theorem 1.1
it suffices to establish a central limit theorem for log Z(Φ̂), i.e., the log of the number of satisfying assignments
of the pruned formula. Once again the pivotal task to this end is to compute the variance of log Z(Φ̂). Revisiting
the telescoping sum (2.8), we obtain the following expression. Recalling (2.7), we write $\hat\Phi_h(M, M') = \widehat{\Phi_h(M, M')}$
for the formula obtained by pruning $\Phi_h(M, M')$.

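Lemma 2.4. Let

\[
\Delta(M) = \mathbb{E}\left[\log\left(\frac{Z(\hat\Phi_1(M, m-M))}{Z(\hat\Phi_1(M-1, m-M))}\right) \cdot \log\left(\frac{Z(\hat\Phi_2(M, m-M))}{Z(\hat\Phi_2(M-1, m-M))}\right)\right], \tag{2.9}
\]

\[
\Delta'(M) = \mathbb{E}\left[\log\left(\frac{Z(\hat\Phi_1(M-1, m-M+1))}{Z(\hat\Phi_1(M-1, m-M))}\right) \cdot \log\left(\frac{Z(\hat\Phi_2(M-1, m-M+1))}{Z(\hat\Phi_2(M-1, m-M))}\right)\right]. \tag{2.10}
\]

Then
\[
\mathrm{Var}\left[\log Z(\hat\Phi)\right] = \sum_{M=1}^{m} \bigl(\Delta(M) - \Delta'(M)\bigr).
\]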

Lemma 2.4 expresses the variance as a sum of local changes. For example, Φ1(M ,m − M) is obtained from
Φ1(M −1,m −M) by adding a single random clause, namely aM . Thus, ∆(M) equals the expected change upon
addition of a single shared clause—modulo the effect of pruning, that is.

But fortunately, on random formulas only a few clauses get pruned w.h.p. In effect, we can express the impact of
these random changes neatly in terms of random satisfying assignments of the ‘small’ formulas Φ̂h(M −1,m −M)
that appear in (2.9)–(2.10). Specifically, the quotients in (2.9)–(2.10) boil down to the probabilities that random
satisfying assignments of the ‘small’ formulas survive the extra clause that gets added to obtain the 2-CNFs in the
respective numerators. Thus, with $\sigma = (\sigma_y)_{y\in V_n}$ denoting a random satisfying assignment of $\hat\Phi_h(M-1, m-M)$, we
obtain the following.

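Proposition 2.5. Let 1 ≤ M ≤ m. W.h.p. we have

\[
\begin{aligned}
\frac{Z(\hat\Phi_h(M, m-M))}{Z(\hat\Phi_h(M-1, m-M))} &= 1 - \prod_{y\in\partial a_M} \mathbb{P}\left[\sigma_y \ne \mathrm{sign}(y, a_M) \mid \hat\Phi_h(M-1, m-M), a_M\right] + o(1) \qquad (h = 1, 2), \\
\frac{Z(\hat\Phi_1(M-1, m-M+1))}{Z(\hat\Phi_1(M-1, m-M))} &= 1 - \prod_{y\in\partial a'_{m-M+1}} \mathbb{P}\left[\sigma_y \ne \mathrm{sign}(y, a'_{m-M+1}) \mid \hat\Phi_1(M-1, m-M), a'_{m-M+1}\right] + o(1), \\
\frac{Z(\hat\Phi_2(M-1, m-M+1))}{Z(\hat\Phi_2(M-1, m-M))} &= 1 - \prod_{y\in\partial a''_{m-M+1}} \mathbb{P}\left[\sigma_y \ne \mathrm{sign}(y, a''_{m-M+1}) \mid \hat\Phi_2(M-1, m-M), a''_{m-M+1}\right] + o(1).
\end{aligned}
\]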

2.6. Local convergence in probability. To evaluate the expressions from Proposition 2.5 we need to get a grip on
the joint distribution of the truth values of y under random satisfying assignments of the two correlated formulas
Φ̂h(M − 1,m − M). To this end we will devise a Galton-Watson tree T ⊗ that mimics the joint distribution of the
local structure of (Φ̂1(M −1,m −M),Φ̂2(M −1,m −M)). Subsequently, we will establish Gibbs uniqueness for this
Galton-Watson tree to compute the expressions from Proposition 2.5.

The Galton-Watson tree T from Section 2.2 that describes the local topology of the ‘plain’ random formula Φ
had one type of variable nodes and four types (±1,±1) of clause nodes. To approach the correlated pair (Φ̂1(M ,m−
M−1),Φ̂2(M ,m−M−1)) we need a Galton-Watson process with three types of variable nodes and a full dozen types
of clause nodes. Specifically, there are shared, 1-distinct and 2-distinct variable nodes. The root o of T ⊗ is a shared
variable node. The clause node types are (s, s′)-shared, (s, s′) 1-distinct and (s, s′) 2-distinct for s, s′ ∈ {±1}.

In addition to d ∈ (0,2) the offspring distributions of $T^\otimes=T^\otimes_{d,t}$ involve a second parameter t ∈ [0,1]:

• A shared variable spawns Po(d t/4) shared clauses of type (s, s′) as well as Po(d(1− t )/4) 1-distinct clauses
of type (s, s′) and Po(d(1− t )/4) 2-distinct clauses of type (s, s′) for any s, s′ ∈ {±1}.

• An h-distinct variable begets Po(d/4) h-distinct clauses of type (s, s′) for any s, s′ ∈ {±1} (h = 1,2).
• A shared clause has precisely one shared variable as its offspring.
• An h-distinct clause spawns a single h-distinct variable (h = 1,2).
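
The four offspring rules above can be sketched in code. The following Python fragment is only an illustration and not part of the construction in the text: the nested-dictionary encoding, the function names and the convention that `depth` counts variable generations (so that `depth = ℓ` corresponds to truncation at distance 2ℓ from the root) are assumptions made for this sketch.

```python
import math
import random

# The four clause sign patterns (s, s') with s, s' in {+1, -1}.
SIGNS = [(s, sp) for s in (1, -1) for sp in (1, -1)]

def poisson(lam, rng):
    """Sample Po(lam) by Knuth's multiplication method (adequate for small means)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_variable(kind, d, t, depth, rng):
    """Sample the subtree below a variable node.

    kind is "shared", 1 or 2; depth is the number of remaining variable generations.
    """
    node = {"kind": kind, "clauses": []}
    if depth == 0:
        return node
    if kind == "shared":
        rates = {"shared": d * t / 4, 1: d * (1 - t) / 4, 2: d * (1 - t) / 4}
    else:
        rates = {kind: d / 4}
    for clause_kind, rate in rates.items():
        for signs in SIGNS:
            for _ in range(poisson(rate, rng)):
                # A shared clause has a shared child, an h-distinct clause an h-distinct child.
                child = sample_variable(clause_kind, d, t, depth - 1, rng)
                node["clauses"].append({"kind": clause_kind, "signs": signs, "child": child})
    return node

def project(node, h):
    """Delete all (3-h)-distinct clauses and variables, which yields T_h."""
    kept = [c for c in node["clauses"] if c["kind"] != 3 - h]
    return {"kind": node["kind"],
            "clauses": [dict(c, child=project(c["child"], h)) for c in kept]}
```

For instance, `sample_variable("shared", 1.9, 0.5, 3, random.Random(0))` draws one truncated tree, and `project(tree, 1)`, `project(tree, 2)` give the corresponding correlated pair.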

Figure 1 provides an illustration of the tree T ⊗. Shared variables/clauses are indicated in red, 1-distinct vari-
ables/clauses in green and 2-distinct ones in blue.

From T ⊗ we extract a pair (T 1,T 2) of correlated random trees. Specifically, T h is obtained from T ⊗ by deleting
all (3−h)-distinct variables and clauses. Hence, the parameter t determines how ‘similar’ T 1,T 2 are. Specifically,

if t = 1 then no {1,2}-distinct clauses exist and thus T 1,T 2 are identical. By contrast, if t = 0 then T 1,T 2 are inde-
pendent copies of the tree T from Section 2.2.

For an integer ℓ ≥ 0 obtain $T^{\otimes,(2\ell)}$, $T_1^{(2\ell)}$, $T_2^{(2\ell)}$ from $T^\otimes$, $T_1$, $T_2$ by omitting all nodes at a distance greater than 2ℓ
from the root o. As in Section 2.2, we can interpret these trees as 2-CNFs, with the type (s, s′) of a clause indicating
the signs of its parent and child variables. We say that two possible outcomes T,T ′ of T ⊗, (2ℓ) are isomorphic if there
is a tree isomorphism that preserves the root o as well as all types.
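
To make the interpretation of a typed tree as a 2-CNF concrete, the following Python sketch counts the satisfying assignments of such a tree bottom-up and returns the exact probability that the root is set to +1 under a uniformly random satisfying assignment. The nested-dictionary encoding (each clause stores its type `(s, s')` and its child variable) and the function names are assumptions made for illustration; the sketch does not reproduce the message passing equations (2.2)–(2.4).

```python
from fractions import Fraction

def count(node):
    """Return {+1: Z_plus, -1: Z_minus}: the number of satisfying assignments of
    the sub-formula below `node` when the variable at `node` is fixed to +1 / -1."""
    z = {1: 1, -1: 1}
    for clause in node["clauses"]:
        s_parent, s_child = clause["signs"]
        below = count(clause["child"])
        for value in (1, -1):
            if value == s_parent:
                # The parent already satisfies the clause; the child is unconstrained.
                z[value] *= below[1] + below[-1]
            else:
                # The clause must be satisfied by the child taking the value s_child.
                z[value] *= below[s_child]
    return z

def root_marginal(tree):
    """Probability that the root takes the value +1."""
    z = count(tree)
    return Fraction(z[1], z[1] + z[-1])

# A root with a single clause of type (+1, +1), i.e. the formula (o OR y).
leaf = {"clauses": []}
example = {"clauses": [{"signs": (1, 1), "child": leaf}]}
```

In the example the formula has three satisfying assignments, two of which set the root to +1, so `root_marginal(example)` equals 2/3.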

Further, a variable x ∈ Vn is called a 2ℓ-instance of T in (Φ̂1(M , M ′),Φ̂2(M , M ′)) if there exist isomorphisms ιh of
the 2-CNFs Th obtained from T by deleting all (3−h)-distinct variables/clauses to the depth-2ℓ neighbourhoods
$\partial^{\le 2\ell}_{\hat\Phi_h(M,M')}x$ of x in Φ̂h(M , M ′) such that

• the root gets mapped to x, i.e., ι1(o) = ι2(o) = x,
• for any shared variable y of T1,T2 the image variables coincide, i.e., ι1(y) = ι2(y),
• for any shared clause a of T1,T2 the image ι1(a) = ι2(a) ∈ {a_1, . . . , a_M} is a shared clause,
• for any 1-distinct clause a whose parent in T1 is a shared variable, ι1(a) ∈ {a′_1, . . . , a′_{M′}}, and
• for any 2-distinct clause a whose parent in T2 is a shared variable, ι2(a) ∈ {a″_1, . . . , a″_{M′}}.

Let $N^{(2\ell)}(T,(\Phi_1(M,M'),\Phi_2(M,M')))$ be the number of 2ℓ-instances of T in (Φ1(M , M ′),Φ2(M , M ′)). The following
proposition confirms that T ⊗ models the local structure of (Φ̂1(M , M ′),Φ̂2(M , M ′)) faithfully.

Proposition 2.6. Let ℓ > 0 be a fixed integer, let t ∈ [0,1] and suppose that M ∼ tdn/2 and M ′ ∼ (1− t )dn/2. Then
w.h.p. for all possible outcomes T of $T^{\otimes,(2\ell)}$ we have
$$N^{(2\ell)}\big(T,(\hat\Phi_1(M,M'),\hat\Phi_2(M,M'))\big)\sim n\,\mathbb{P}\big[T^{\otimes,(2\ell)}\cong T\big].$$

2.7. Correlated Belief Propagation. Now that we have a branching process description of our pair of correlated
formulas the next step is to run BP on the random trees (T 1,T 2) to find the joint distribution of the truth values
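$\sigma_{T_1^{(2\ell)},o}$, $\sigma_{T_2^{(2\ell)},o}$ assigned to the root. Hence, let

$$\mu^{(2\ell)}=\Big(\mathbb{P}\big[\sigma_{T_1^{(2\ell)},o}=1\mid T^\otimes\big],\ \mathbb{P}\big[\sigma_{T_2^{(2\ell)},o}=1\mid T^\otimes\big]\Big)\in(0,1)^2. \tag{2.11}$$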

Since BP is exact on trees, we could calculate these marginals by iterating (2.2)–(2.4) for 2ℓ steps, starting from
all-uniform messages. But our objective is not merely to calculate the marginals of a specific pair of trees, but the
distribution of the vector (2.11) for a random T ⊗. Fortunately, due to the Markovian nature of the Galton-Watson
tree T ⊗, the bottom-up BP computation on a random tree can be expressed by a fixed point iteration on the space
of probability distributions on R². The appropriate operator is the $\mathrm{logBP}^\otimes_{d,t}$-operator from (1.3). To be precise, that
operator expresses the updates of the log-likelihood ratios of the BP messages from (2.3)–(2.4). Thus, let

$$t:(z_1,z_2)\in\mathbb{R}^2\mapsto\big((1+\tanh(z_1/2))/2,\ (1+\tanh(z_2/2))/2\big)\in(0,1)^2$$

be the function that maps log-likelihood ratios back to probabilities. Furthermore, for a probability measure ρ ∈
P (R²) let t(ρ) be the pushforward probability measure on (0,1)².⁴

Proposition 2.7. Let $\rho^{(0)}_{d,t}\in\mathcal P(\mathbb{R}^2)$ be the atom at the origin and let $\rho^{(\ell)}_{d,t}=\mathrm{logBP}^\otimes_{d,t}(\rho^{(\ell-1)}_{d,t})$. Then $\mu^{(2\ell)}$ has distribution $t(\rho^{(\ell)}_{d,t})$.

We employ the contraction method to show that the sequence $(\rho^{(\ell)}_{d,t})_{\ell\ge1}$ of measures converges.

Proposition 2.8. There exists a unique $\rho_{d,t}\in\mathcal P(\mathbb{R}^2)$ that satisfies (1.5) and $\lim_{\ell\to\infty}\rho^{(\ell)}_{d,t}=\rho_{d,t}$ weakly.

Furthermore, the Gibbs uniqueness property (2.6) extends to T 1 and T 2.

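Corollary 2.9. For all t ∈ [0,1] and h = 1,2 we have

$$\lim_{\ell\to\infty}\mathbb{E}\Big[\max_{\tau\in S(T_h^{(2\ell)})}\Big|\mathbb{P}\big[\sigma_{T_h^{(2\ell)},o}=1\mid T^\otimes,\ \sigma_{T_h^{(2\ell)},\partial^{2\ell}o}=\tau_{\partial^{2\ell}o}\big]-\mathbb{P}\big[\sigma_{T_h^{(2\ell)},o}=1\mid T^\otimes\big]\Big|\Big]=0. \tag{2.12}$$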

Combining Propositions 2.7 and 2.8 and Corollary 2.9, we are now in a position to pinpoint the joint marginals
of Φ̂1(M , M ′),Φ̂2(M , M ′). Formally, let

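$$\pi_{\hat\Phi_1(M,M'),\hat\Phi_2(M,M')}=\frac1n\sum_{i=1}^{n}\delta_{\big(\mathbb{P}[\sigma_{\hat\Phi_1(M,M'),x_i}=1\mid\hat\Phi_1(M,M')],\ \mathbb{P}[\sigma_{\hat\Phi_2(M,M'),x_i}=1\mid\hat\Phi_2(M,M')]\big)}\in\mathcal P([0,1]^2)$$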

⁴ That is, for a measurable A ⊆ (0,1)² we have t(ρ)(A) = ρ(t⁻¹(A)).

FIGURE 2. The distributions t(ρd ,t ) for d = 1.9 and t = 0.1,0.5,0.9.

be the empirical distribution of the joint marginals of Φ̂1(M , M ′) and Φ̂2(M , M ′), which we need to know to evaluate
the expressions from Proposition 2.5. Furthermore, denote by W1( · , · ) the Wasserstein L1-distance of two
probability measures on [0,1]².

Corollary 2.10. For any t ∈ [0,1] and any M ∼ tdn/2, M ′ ∼ (1− t )dn/2 we have

$$\mathbb{E}\Big[W_1\big(\pi_{\hat\Phi_1(M,M'),\hat\Phi_2(M,M')},\ t(\rho_{d,t})\big)\Big]=o(1).$$

Finally, combining Proposition 2.5 with Corollary 2.10, we obtain the variance of log Z (Φ̂).

Corollary 2.11. With η(d)² from (1.7) we have η(d) > 0 and $\mathrm{Var}\log Z(\hat\Phi)\sim m\,\eta(d)^2$.

Because the proof of Proposition 2.8 is based on a contraction argument, for any d , t the distribution ρd ,t can
be approximated effectively within any given accuracy via a fixed point iteration. Figure 2 displays approximations
to t(ρd ,t ) for different values of t and shows how correlations between the two coordinates of the random vector
increase with t (brighter diagonal).

2.8. The central limit theorem. With the variance computation done, we have now overcome the greatest hur-
dle en route to Theorem 1.1. Indeed, to obtain the desired asymptotic normality we just need to combine the
techniques from the variance computation with a generic martingale central limit theorem.

To this end we set up a filtration $(\mathcal F_{n,M})_{0\le M\le m_n}$ by letting $\mathcal F_{n,M}$ be the σ-algebra generated by a_1, . . . , a_M. Hence,
conditioning on $\mathcal F_{n,M}$ amounts to conditioning on a_1, . . . , a_M, while averaging over the remaining clauses a_{M+1}, . . . , a_m.
The conditional expectations

$$Z_{n,M}=m^{-1/2}\,\mathbb{E}\big[\log Z(\hat\Phi)\mid\mathcal F_{n,M}\big] \tag{2.13}$$

then form a Doob martingale. Let $X_{n,M}=Z_{n,M}-Z_{n,M-1}$ be the martingale differences.

Proposition 2.12. For all 0 < d < 2 the martingale (2.13) satisfies

$$\lim_{n\to\infty}\mathbb{E}\Big[\max_{1\le M\le m}|X_{n,M}|\Big]=0\quad\text{and}\quad\lim_{n\to\infty}\mathbb{E}\,\Big|\eta(d)^2-\sum_{M=1}^{m}X_{n,M}^2\Big|=0. \tag{2.14}$$

Thanks to pruning, the first condition from (2.14) is easily checked. Furthermore, the steps that we pursued
towards the proof of Corollary 2.11, i.e., the variance calculation, also imply the second condition without further
ado. Finally, as (2.14) demonstrates that the martingale differences are small and that the variance process converges
to a deterministic limit, Theorem 1.1 follows from the general martingale central limit theorem from [28].

3. DISCUSSION

The hunt for satisfiability thresholds of random constraint satisfaction problems was launched by the experimen-
tal work of Cheeseman, Kanefsky and Taylor [17]. The 2-SAT threshold was the first one to be caught [20, 32].
Subsequent successes include the 1-in-k-SAT threshold [3] and the k-XORSAT threshold [27, 43]. Furthermore,

Friedgut [30] proved the existence of non-uniform (i.e., n-dependent) satisfiability thresholds in considerable gen-
erality. The plot thickened when physicists employed a compelling but non-rigorous technique called the cavity
method to ‘predict’ the exact satisfiability thresholds of many further problems, including the k-SAT problem for
k ≥ 3 [37]. A line of rigorous work [6, 8, 23] culminated in the verification of this physics prediction for large k [25].

Even though the satisfiability threshold of random 2-SAT was determined already in the 1990s, the problem
continued to receive considerable attention. For example, Bollobás, Borgs, Chayes, Kim and Wilson [15] investi-
gated the scaling window around the satisfiability threshold, a point on which a recent contribution by Dovgal,
de Panafieu and Ravelomanana elaborates [26]. Abbe and Montanari [2] made the first substantial step towards
the study of the number of satisfying assignments by showing that (1/n) log Z (Φ) converges in probability to a
deterministic limit ϕ(d) for Lebesgue-almost all d ∈ (0,2). However, their techniques do not reveal the value ϕ(d).
Moreover, Montanari and Shah [39] obtain a ‘law-of-large-numbers’ estimate of the number of assignments that satisfy all but o(n)
clauses for d < 1.16. Finally, the aforementioned article of Achlioptas et al. [5] verifies the prediction from [38] as
to the number of satisfying assignments for all d < 2. The main result of the present paper refines these results
considerably by establishing a central limit theorem.

For random k-CNFs with k ≥ 3 an upper bound on the number of satisfying assignments can be obtained
via the interpolation method from mathematical physics [42]. This bound matches the predictions of the cav-
ity method [36]. However, no matching lower bound is currently known. The precise physics prediction called the
‘replica symmetric solution’ has only been verified for ‘soft’ versions of random k-SAT where unsatisfied clauses
are penalised but not strictly forbidden, and for clause-to-variable ratios well below the satisfiability threshold [39,
41, 47].

Random CSPs such as random k-XORSAT or random k-NAESAT that exhibit stronger symmetry properties than
random k-SAT tend to be amenable to the method of moments [6].⁵ Therefore, more is known about their number
of solutions. For example, due to the inherent connection to linear algebra, the number of satisfying assignments
of random k-XORSAT formulas is known to concentrate on a single value right up to the satisfiability threshold [11,
27, 43]. Furthermore, in random k-NAESAT, random graph colouring and several related problems the logarithm
of the number of solutions superconcentrates, i.e., has only bounded fluctuations for constraint densities up to the
so-called condensation threshold, a phase transition that shortly precedes the satisfiability threshold [12, 21, 44].
The same is true of random k-SAT instances with regular literal degrees [24]. A further example is the symmetric
perceptron [1], where the number of solutions superconcentrates but the limiting distribution is a log-normal with
bounded variance. Going beyond the condensation transition, Sly, Sun and Zhang [46] proved that the number
of satisfying assignments of random regular k-NAESAT formulas matches the ‘1-step replica symmetry breaking’
prediction from physics.

Apart from the superconcentration results for symmetric problems from [12, 24, 21, 44], the limiting distribution
of the logarithm of the number of solutions has not been known in any random constraint satisfaction problem.
In particular, Theorem 1.1 is the first central limit theorem for this quantity in any random CSP. We expect that the
technique developed in the present work, particularly the use of two correlated random instances in combination
with spatial mixing, can be extended to other problems. The present use of correlated instances is inspired by the
work of Chen, Dey and Panchenko [19] on the p-spin model from mathematical physics, a generalisation of the
famous Sherrington-Kirkpatrick model. That said, on a technical level the present use of correlated instances is
quite different from the approach from [19]. Specifically, while here we construct correlated 2-CNFs that share a
specific fraction of their clauses and employ a martingale central limit theorem, Chen, Dey and Panchenko com-
bine a continuous interpolation of two mixed p-spin Hamiltonians with Stein’s method.

A further line of work deals with central limit theorems for random optimisation problems. Cao [16] provided
a general framework based on the ‘objective method’ [9]. Unfortunately, the conditions of Cao’s theorem tend to
be unwieldy for MAX CSP problems with hard constraints. Recent work of Kreačič [34] and of Glasgow, Kwan, Sah
and Sawhney [31] on the matching number therefore instead resorts to the use of stochastic differential equations.
A promising question for future work might be whether the present method of considering correlated instances
might extend to random optimisation problems.

Organisation. In the rest of the paper we carry out the strategy from Section 2 in detail. After some preliminaries
in Section 4, we prove Proposition 2.3 in Section 5. Subsequently Section 6 deals with the proof of Proposition 2.6.
The proof of Proposition 2.5 follows in Section 7. Moreover, Section 8 contains the proof of Proposition 2.8. Further,

⁵ Formally, by ‘symmetry’ we mean that the empirical distribution of the marginals of random solutions converges to an atom; cf. [22].

in Section 9 we prove Proposition 2.7 and Section 11 contains the proofs of Proposition 2.12 and Corollary 2.11.
Finally, in Section 12 we complete the proof of Theorem 1.1.

4. PRELIMINARIES AND NOTATION

4.1. Boolean formulas. A 2-SAT formula or 2-CNF Φ consists of a finite set V (Φ) of propositional variables and
another set F (Φ) of clauses. Unless specified otherwise, we assume that each clause contains two distinct variables.

For a clause a ∈ F (Φ) we denote by ∂a = ∂_Φ a the set of variables that appear in clause a. Similarly, for a variable
x ∈ V (Φ) let ∂x = ∂_Φ x signify the set of clauses in which x appears. Thus, the formula Φ induces a bipartite graph on
variables and clauses, the so-called incidence graph of Φ. Further, the shortest path metric on the incidence graph
induces a metric on the variables and clauses of Φ. Accordingly, for a variable or clause u let $\partial^{\ell}u=\partial^{\ell}_{\Phi}u$ be the set of
all nodes at a distance precisely ℓ from u. Moreover, let $\partial^{\le\ell}u=\partial^{\le\ell}_{\Phi}u$ be the sub-formula of Φ obtained by deleting
all clauses and variables at a distance greater than ℓ from u. In other words, $\partial^{\le\ell}u$ is the depth-ℓ neighbourhood of
u.

We encode the Boolean values ‘true’ and ‘false’ as ±1. Accordingly, let $S(\Phi)\subseteq\{\pm1\}^{V(\Phi)}$ be the set of satisfying
assignments of Φ and let Z (Φ) = |S(Φ)|. Further, sign(x, a) = sign_Φ(x, a) ∈ {±1} denotes the sign with which variable
x appears in clause a, i.e., sign(x, a) = 1 if x appears in a positively and sign(x, a) = −1 if a contains the negation
¬x. Finally, for a literal l ∈ {x,¬x} we let |l | = x denote the underlying Boolean variable.

Assuming S(Φ) ≠ ∅ let

$$\mu_\Phi(\sigma)=\mathbb{1}\{\sigma\in S(\Phi)\}/Z(\Phi)\qquad(\sigma\in\{\pm1\}^{V(\Phi)}) \tag{4.1}$$

be the uniform distribution on S(Φ). We write $\sigma=\sigma_\Phi=(\sigma_{\Phi,x})_{x\in V(\Phi)}\in\{\pm1\}^{V(\Phi)}$ for a sample from µ_Φ, i.e., a
uniformly random satisfying assignment of Φ.

In contrast to k-SAT for k ≥ 3, the 2-SAT problem can be solved in polynomial time. This is because a 2-SAT
instance is unsatisfiable if and only if it contains a peculiar sub-formula called a bicycle. To be precise, let Φ be
a CNF with clauses of length one or two. A bicycle of Φ is an alternating sequence l0, a1, l1, a2, . . . , ak , lk of literals
l0, . . . , lk and clauses a1, . . . , ak ∈ F (Φ) such that

BIC1: l0 = lk ,
BIC2: li =¬l0 for some 0 < i < k and
BIC3: ai ≡¬li−1 ∨ li ≡ li−1 → li .

(Observe that a clause a comprising only a single literal l is logically equivalent to l∨l ≡¬l → l .) Hence, the bicycle
consists of clauses that are logically equivalent to a chain of implications l0 →¬l0 → l0.

Fact 4.1 ([10]). A CNF Φ with clauses of lengths one or two is unsatisfiable iff Φ contains a bicycle.
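
Fact 4.1 translates into a simple polynomial-time test: a bicycle exists precisely if some literal l0 reaches ¬l0 and ¬l0 reaches l0 along implications induced by the clauses. The following Python sketch implements this reachability test. The encoding of literals as nonzero integers (with `-v` denoting the negation of variable `v`) and the function names are assumptions made for this illustration.

```python
from collections import deque

def implication_edges(clauses):
    """Map each literal to the literals it directly implies.

    A clause is a tuple of one or two literals. The clause (a, b) yields the
    implications -a -> b and -b -> a; a unit clause (a,) is read as (a, a).
    """
    edges = {}
    for clause in clauses:
        a, b = clause[0], clause[-1]
        edges.setdefault(-a, set()).add(b)
        edges.setdefault(-b, set()).add(a)
    return edges

def reachable(edges, start):
    """Breadth-first search over implication edges, starting from one literal."""
    seen, queue = {start}, deque([start])
    while queue:
        lit = queue.popleft()
        for nxt in edges.get(lit, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def has_bicycle(clauses):
    """True iff some literal v satisfies v -> ... -> -v -> ... -> v."""
    edges = implication_edges(clauses)
    variables = {abs(lit) for clause in clauses for lit in clause}
    return any(-v in reachable(edges, v) and v in reachable(edges, -v)
               for v in variables)
```

For example, `has_bicycle([(1, 2), (-1, 2), (1, -2), (-1, -2)])` holds, since these four clauses rule out every assignment of two variables, whereas `has_bicycle([(1, 2)])` does not.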

4.2. Unit Clause Propagation. The PUC algorithm (Algorithm 1) takes as input a CNF Φ along with an initial set
L0 of literals. PUC outputs a set L =L (Φ,L0) ⊇L0 of literals. Let

V (Φ,L0) = {|l | : l ∈L (Φ,L0)}

be the set of underlying variables. In addition to L (Φ,L0), PUC also outputs a partial assignment

σ=σΦ,L0 : V (Φ,L0) → {0,±1}

that sets each x ∈ V either to a truth value ±1 or to the dummy value 0. Let

V0(Φ,L0) = {x ∈ V (Φ,L0) :σΦ,L0,x = 0}

be the set of variables that receive the dummy value. Finally, the algorithm identifies a set C (Φ,L0) of conflict
clauses, i.e., clauses a such that ∂a ⊆ V0(Φ,L0).

We make a note of a few basic facts about PUC. These remarks apply to any CNF Φ with clauses of length at
most two. To get started, we say that a literal l ′ is implication-reachable from another literal l if there exists an
alternating sequence l = l0, a1, l1, . . . , ak , lk = l ′ of literals li and clauses ai of Φ such that ai ≡¬li−1 ∨ li ≡ li−1 → li

for all 1 ≤ i ≤ k. We call this sequence an implication chain from l to l ′. Observe that a unit clause (clause of length
one) comprising a single literal l is equivalent to the implication ¬l → l ≡ l ∨ l . Furthermore, if l ′ is implication-
reachable from l , then ¬l is implication-reachable from ¬l ′. Indeed, if l = l0, a1, l1, . . . , ak , lk = l ′ is an implication
chain from l to l ′, then its contraposition

¬l ′ =¬lk , ak ,¬lk−1, . . . ,¬l1, a1,¬l

10


is an implication chain from ¬l ′ to ¬l .

Lemma 4.2. Let Φ be a CNF with clauses of length at most two and let L0 be a set of literals of Φ. Then L (Φ,L0) is
the set of all literals l ′ that are implication-reachable from a literal l ∈L0.

Proof. This is an easy induction on the length of the shortest implication chain from l to l ′. □
An immediate consequence of Lemma 4.2 is that the order in which PUC proceeds is irrelevant.
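
Lemma 4.2 characterises the output set of PUC without reference to the order of execution, which suggests the following minimal Python sketch. It computes the set of implication-reachable literals directly and does not reproduce Algorithm 1 itself (in particular, it neither assigns dummy values nor collects conflict clauses). Literals are encoded as nonzero integers with `-v` the negation of `v`; this encoding and the function name are assumptions made for illustration.

```python
from collections import deque

def implication_closure(clauses, initial):
    """All literals that are implication-reachable from a literal in `initial`.

    A clause (a, b) is read as the two implications -a -> b and -b -> a;
    a unit clause (a,) is read as (a, a), i.e. -a -> a.
    """
    found, queue = set(initial), deque(initial)
    while queue:
        lit = queue.popleft()
        for clause in clauses:
            a, b = clause[0], clause[-1]
            for premise, conclusion in ((-a, b), (-b, a)):
                if premise == lit and conclusion not in found:
                    found.add(conclusion)
                    queue.append(conclusion)
    return found

# The chain 1 -> 2 -> 3, written as the clauses (-1 OR 2) and (-2 OR 3).
chain = [(-1, 2), (-2, 3)]
forward = implication_closure(chain, [1])     # literals reachable from 1
backward = implication_closure(chain, [-3])   # contraposition: reachable from -3
```

Here `forward` is `{1, 2, 3}` and `backward` is `{-3, -2, -1}`, in line with the observation that an implication chain from l to l′ yields one from ¬l′ to ¬l.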

Finally, for the sake of completeness, we carry out the proof of Fact 2.2.

Proof of Fact 2.2. Fix some order l1, . . . , lν of the literals {x,¬x : x ∈ V (Φ)}. Let σi be the assignment produced
by PUC on input (Φ, {li }). We construct an assignment σ : V (Φ) → {0,±1} by proceeding as follows for i = 1, . . . ,ν.
For each variable x such that $\{x,\neg x\}\cap\mathcal L(\Phi,l_i)\setminus\bigcup_{1\le h<i}\mathcal L(\Phi,l_h)\neq\emptyset$ let σx = σi (x). We claim that for each clause
a ∈ Φ̂ there is a variable x ∈ ∂a such that σx = sign(x, a). Indeed, it is not possible that σx = 0 for all x ∈ ∂a;
for otherwise a ∈ C (Φ, li ) for some i , and thus a would not be present in Φ̂. Thus, there exists x ∈ ∂a such that
σx ∈ {±1}. If σx ≠ sign(x, a), then PUC would have included the second literal k that appears in a into the set L .
Hence, σ|k| = sign(|k|, a), because otherwise a would have been a contradiction and therefore omitted from Φ̂. □
4.3. Random 2-SAT. Recall from Section 1.1 that Φ denotes the random 2-CNF formula with variables Vn = {x1, . . . , xn}
and clauses Fm = {a1, . . . , am}, where m = mn ∼ dn/2 for a fixed d > 0. We tacitly assume that 0 < d < 2, i.e., that
we are in the satisfiable regime.

In the following sections we will need estimates of the sizes of the sets |V (Φ,L )|, |C (Φ,L )| produced by PUC
on the random formula Φ for singletons L . Thus, suppose we start the PUC algorithm from an initial literal L =
{l }. Since the ensuing chain of implications traced by PUC is stochastically dominated by a sub-critical branching
process (for d < 2), we obtain the following bound.

Lemma 4.3 ([5, Claim 6.8]). For any literal l and every t > 8/(2−d) we have

P [|V (Φ, {l })| > t ] ≤ (2+o(1))exp(−d t/40).

Corollary 4.4. With probability 1−o(n⁻²) we have

$$\max_{1\le i\le n}|\mathcal V(\Phi,\{x_i\})|+|\mathcal V(\Phi,\{\neg x_i\})|\le\log^2 n.$$

Proof. This is an immediate consequence of Lemma 4.3. □
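
For completeness, the union bound behind this proof can be spelled out as follows. The derivation reads the bound in Corollary 4.4 as log² n = (log n)² and applies Lemma 4.3 with t = (log² n)/2, which exceeds 8/(2−d) once n is sufficiently large; both points are our reading of the statements above rather than details given in the text.

```latex
\begin{align*}
\mathbb{P}\Big[\max_{1\le i\le n}|\mathcal V(\Phi,\{x_i\})|+|\mathcal V(\Phi,\{\neg x_i\})|>\log^2 n\Big]
 &\le \sum_{l\in\{x_1,\neg x_1,\ldots,x_n,\neg x_n\}}\mathbb{P}\Big[|\mathcal V(\Phi,\{l\})|>\tfrac12\log^2 n\Big]\\
 &\le 2n\,(2+o(1))\exp\Big(-\frac{d}{80}\log^2 n\Big)
  = n^{-\omega(1)} = o(n^{-2}).
\end{align*}
```

The first step uses that a sum of two terms exceeding log² n forces one of the terms to exceed (log² n)/2; the last step uses exp(−(d/80) log² n) = n^{−(d/80) log n} for fixed d > 0.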
Finally, the following statement estimates the probability that a random formula is unsatisfiable.

Lemma 4.5. We have $\mathbb{P}[\Phi\text{ is unsatisfiable}]\le n^{o(1)-1}$.

Proof. This follows from Fact 4.1 and [5, Claim 6.9]. □
Recall the Galton-Watson tree T from Section 2.2. The following lemma shows that T mimics the local structure
of the ‘plain’ random formula with n′ variables and m′ independent random 2-clauses. Also recall that $\partial^{\le2\ell}_{\Phi}x$
denotes the sub-formula of Φ comprising all clauses and variables at distance at most 2ℓ from x.

Lemma 4.6 ([5, p. 15]). Let ℓ ≥ 0 be an integer and let T be a possible outcome of $T^{(2\ell)}$. Let Φ0 be a random 2-CNF
with n′ ∼ n variables and m′ ∼ dn/2 clauses. Then w.h.p. the number $N^{(2\ell)}(T,\Phi_0)$ of variables xi of Φ0 such that
$\partial^{\le2\ell}_{\Phi_0}x_i\cong T$ satisfies

$$N^{(2\ell)}(T,\Phi_0)=n\,\mathbb{P}\big[T^{(2\ell)}\cong T\big]+o(n).$$

As a final preparation we need an upper bound on the maximum variable degree.

Lemma 4.7. With probability 1−o(n⁻¹⁰) the degree of any variable node xi , i = 1, . . . ,n in Φ is bounded by log² n.

Proof. The number of clauses that contain a given variable xi has distribution Bin(m,2/n). Therefore, the assertion
follows from the Chernoff bound. □
Corollary 4.8. With probability 1−o(n⁻¹⁰) the degree of any variable node xi , i = 1, . . . ,n in (Φ1,Φ2) is bounded by
log² n.

Proof. Since Φ1,Φ2 separately are distributed as Φ, the assertion follows from Lemma 4.7 and the union bound.
□

4.4. Convergence of probability measures. For a measurable subset Ω of Euclidean space R^k we let P (Ω) denote
the space of all probability distributions on Ω equipped with the Borel σ-algebra. Moreover, for p ≥ 1 we define
$\mathcal W_p(\Omega)$ to be the set of all µ ∈ P (Ω) such that $\int_\Omega\|x\|_2^p\,\mathrm{d}\mu(x)<\infty$. We equip $\mathcal W_p(\Omega)$ with the Wasserstein metric

$$W_p(\mu,\mu')=\inf_{X,X'}\mathbb{E}\big[\|X-X'\|_2^p\big]^{1/p}\qquad(\mu,\mu'\in\mathcal W_p(\Omega)), \tag{4.2}$$

where the infimum is taken over all pairs of random variables X , X ′ that are defined on some common probability
space such that X has distribution µ and X ′ has distribution µ′.

The infimum in (4.2) is attained for all µ,µ′ [14]. Pairs of random vectors X , X ′ for which the infimum is attained
are called optimal couplings.

The spaces (Wp (Ω),Wp ) are complete metric spaces [14]. Finally, convergence in (Wp (Ω),Wp ) implies weak
convergence of the corresponding probability measures.
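
To illustrate (4.2) in the simplest setting, the following Python sketch computes W_p between two empirical measures that each put mass 1/n on n points of R^k. In this special case an optimal coupling can be chosen as a permutation matching the two point sets, so for small n a brute-force search over permutations suffices. The function name and the restriction to equally many atoms are assumptions made for this sketch; it is not meant as a practical algorithm.

```python
import itertools
import math

def wasserstein_empirical(xs, ys, p=1):
    """W_p between the uniform distributions on two equal-length lists of points.

    Brute force over all matchings; only sensible for a handful of points.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both measures need the same number of atoms")
    best = min(
        sum(math.dist(x, ys[j]) ** p for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(n))
    )
    return (best / n) ** (1 / p)
```

For example, two point sets that differ only in the order of their atoms are at distance zero, and the point masses at (0, 0) and (3, 4) are at distance 5 for every p ≥ 1.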

For a measure ρ ∈ P (Ω) and a measurable function f : Ω → Ω′ from Ω to another probability space Ω′ we denote
by f (ρ) the pushforward measure of ρ. Thus, f (ρ) is the measure that assigns mass ρ( f ⁻¹(A)) to measurable A ⊆ Ω′.

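Throughout the paper we let

$$t:\mathbb{R}^2\to(0,1)^2,\quad t\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\frac{1+\tanh(x_1/2)}{2}\\[2pt]\frac{1+\tanh(x_2/2)}{2}\end{pmatrix},\qquad l:(0,1)^2\to\mathbb{R}^2,\quad l\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}\log\frac{x_1}{1-x_1}\\[2pt]\log\frac{x_2}{1-x_2}\end{pmatrix}.$$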
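
As a quick sanity check of these two definitions, the following Python sketch implements both maps coordinate-wise and applies t to a finite list of atoms, which is how the pushforward of an empirical measure would be computed. The function names are chosen for this illustration only.

```python
import math

def tanh_map(z1, z2):
    """The map t: log-likelihood ratios in R^2 to probabilities in (0,1)^2."""
    return ((1 + math.tanh(z1 / 2)) / 2, (1 + math.tanh(z2 / 2)) / 2)

def logit_map(x1, x2):
    """The map l: probabilities in (0,1)^2 back to log-likelihood ratios."""
    return (math.log(x1 / (1 - x1)), math.log(x2 / (1 - x2)))

def pushforward(atoms):
    """Pushforward under t of the uniform distribution on a list of atoms in R^2."""
    return [tanh_map(z1, z2) for z1, z2 in atoms]
```

Since (1 + tanh(z/2))/2 = 1/(1 + e^{-z}), the two maps are inverse to each other; for instance `logit_map(*tanh_map(1.3, -0.7))` recovers (1.3, −0.7) up to rounding, and the atom at the origin is mapped to (1/2, 1/2).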

5. PROOF OF PROPOSITION 2.3

In this section we estimate the difference between the number of satisfying assignments of the pruned random
formula Φ̂ and the original formula Φ. We begin with a basic observation about the Unit Clause Propagation
algorithm, and then estimate the number of clauses that the pruning process removes. Apart from proving Propo-
sition 2.3, the considerations in this section also pave the way for the proof of the variance formula in Section 11.

5.1. Tracing Unit Clause Propagation. For a 2-CNF Φ and a set of literals L0 consider a set of conflict clauses
C = C (Φ,L0) that PUC produces along with a set V = V (Φ,L0) of conflict variables. Let Φ−C be the formula
obtained from Φ by deleting the clauses from C . Clearly Z (Φ) ≤ Z (Φ−C ). Conversely, the following lemma puts a
bound on how much bigger Z (Φ−C ) may be.

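Lemma 5.1. Assume that Φ is a satisfiable 2-CNF. For any set L0 of literals we have

$$Z(\Phi-\mathcal C(\Phi,\mathcal L_0))\le2^{|\mathcal V(\Phi,\mathcal L_0)|\cdot\mathbb{1}\{\mathcal C(\Phi,\mathcal L_0)\neq\emptyset\}}\,Z(\Phi). \tag{5.1}$$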

Towards the proof of Lemma 5.1 let L = L (Φ,L0) be the final set of literals that PUC produces. Moreover, let
σ : V → {0,±1} be the function that PUC outputs and let V0 = {x ∈ V :σx = 0}. Further, let Φ0 be a CNF with variable
set V that contains the following clauses:

(i) any clause a ∈ F (Φ) with ∂a ⊆ V ,
(ii) a unit clause l for every literal l with |l | ∈ V such thatΦ contains a clause a ≡ l ∨ l ′ with |l ′| ̸∈ V .

Thus,Φ0 contains clauses of length one or two.

Claim 5.2. The formulaΦ0 possesses a satisfying assignment τ such that τx =σx for all x ∈ V \V0.

Proof. Obtain a formula Φ1 by adding to Φ0 a unit clause x for every variable x ∈ V with σx = 1 and a unit clause
¬x for every x ∈ V with σx =−1. Then we just need to show thatΦ1 is satisfiable.

Assume otherwise. Then by Fact 4.1 Φ1 contains a bicycle l0, a1, l1, a2, . . . , ak , lk . This bicycle is logically equiva-
lent to an implication chain

l0 → l1 →···→¬l0 →···→ lk−1 → lk = l0. (5.2)

The contraposition of this chain reads

¬l0 =¬lk →¬lk−1 →···→ l0 →···→¬l1 →¬l0. (5.3)

Since Φ is satisfiable, Fact 4.1 shows that the bicycle (5.2) cannot be contained in Φ. Therefore, the bicycle contains
a unit clause li ∈ F (Φ1) \ F (Φ) for some 1 ≤ i ≤ k. Hence, the constructions of Φ0 and Φ1 ensure that li ∈ L (Φ,L0).
Indeed, letting U be the set of all literals li that appear in (5.2) as unit clauses, we obtain U ⊆ L (Φ,L0).

We claim that in fact l0, . . . , lk ∈ L (Φ,L0). To see this, pick any 0 ≤ j ≤ k such that l j does not appear as a
unit clause in Φ1. Define l−i = lk−i for 0 ≤ i < k and let 1−k ≤ i < j be the largest index such that li ∈ U . Then

Φ contains the implication chain li → ··· → l j . Therefore, Lemma 4.2 implies that l j ∈ L (Φ,L0). Analogously,
considering the contraposition (5.3), we conclude that the negations of the literals l0, . . . , lk belong to L (Φ,L0). In
summary,

l0,¬l0, . . . , lk ,¬lk ∈L (Φ,L0). (5.4)

But (5.4) implies that |l0|, . . . , |lk | ∈ V0. Consequently, none of these literals belongs to a unit clause u ∈ F (Φ1) \
F (Φ0). Furthermore, none of the literals li ,¬li belongs to a unit clause a ∈ F (Φ0) \ F (Φ). This is because if Φ
contains a clause li ∨ l ′ or ¬li ∨ l ′ and li ,¬li ∈ L (Φ,L0), then PUC added l ′ to L (Φ,L0) as well. Thus, we conclude
that the bicycle (5.2) consists of clauses of Φ only. But by Fact 4.1 this contradicts the fact that Φ is satisfiable. □

Proof of Lemma 5.1. Clearly, if C (Φ,L0) = ∅, the statement is true. Hence, assume that C (Φ,L0) ≠ ∅ and let τ be a
satisfying assignment of Φ0 from Claim 5.2. Consider a satisfying assignment χ of Φ−C and let $\chi':V(\Phi)\setminus\mathcal V\to\{\pm1\}$
be the restriction of χ to $V(\Phi)\setminus\mathcal V$. We extend χ′ to a satisfying assignment χ′′ of Φ by letting

$$\chi''_x=\mathbb{1}\{x\in\mathcal V\}\,\tau_x+\mathbb{1}\{x\notin\mathcal V\}\,\chi'_x;$$

clearly, χ′′ satisfies all clauses a such that $\partial a\cap\mathcal V=\emptyset$, because all these clauses are contained in Φ−C . Moreover,
χ′′ satisfies all a such that $\partial a\subseteq\mathcal V$, because these clauses belong to Φ0. Further, if a = l ∨ l ′ is a clause such that
$|l|\in\mathcal V$ but $|l'|\notin\mathcal V$, then τ|l | = σ|l |. Since $|l'|\notin\mathcal V$, this means that σ|l | = sign(|l |, a), as otherwise PUC would have
added l ′ to L . Therefore, χ′′|l | = τ|l | = σ|l | satisfies a. Since the map χ ↦ χ′ only discards the values of the variables
in $\mathcal V$, we obtain the bound (5.1). □

5.2. Cycles in random formulas. To prove Proposition 2.3 we need a good estimate of the total number of clauses
that will be removed fromΦ to obtain Φ̂. This estimate is provided by the following lemma.

Lemma 5.3. Fix any δ > 0. With probability 1−o(n⁻¹) the number of literals l such that C (Φ, {l }) ≠ ∅ is smaller
than n^δ.

Proof. Let N be the number of literals l such that C (Φ, {l }) ≠ ∅. We are going to show that for any fixed (i.e.,
n-independent) ℓ ≥ 1 and large enough n we have

$$\mathbb{E}\Big[\prod_{i=1}^{\ell}(N-i+1)\Big]\le\big(\ell\log^3 n\big)^{\ell}. \tag{5.5}$$

Provided that ℓ ≥ 2/δ and n is sufficiently large, Markov’s inequality then shows that

$$\mathbb{P}\big[N\ge n^{\delta}\big]\le\mathbb{P}\Big[\prod_{i=1}^{\ell}(N-i+1)\ge(n^{\delta}/2)^{\ell}\Big]\le\Big(\frac{2\ell\log^3 n}{n^{\delta}}\Big)^{\ell}=o(n^{-1}),$$

which implies the assertion.
Thus, we are left to prove (5.5). By symmetry it suffices to bound the probability of the event

$$\mathcal E=\bigcap_{i=1}^{\ell}\{\mathcal C(\Phi,\{x_i\})\neq\emptyset\}$$

that PUC will produce at least one conflict clause from each of the literals x1, . . . , xℓ; then

$$\mathbb{E}\Big[\prod_{i=1}^{\ell}(N-i+1)\Big]\le(2n)^{\ell}\,\mathbb{P}[\mathcal E]. \tag{5.6}$$

In order to estimate the probability of E we are going to launch PUC from the initial set L = {x1, . . . , xℓ}. While the
order in which the literals and clauses are processed does not affect the ultimate outcome of PUC, for the present
analysis we assume that PUC processes the literals one at a time, each time pursuing all the clauses l ∨¬l ′ that
contain the negation of a specific l ′. We also presume that the literals are processed in the same order as they
get inserted into the set L . In other words, PUC proceeds in breadth-first-search order. Let Ht be the history of
the execution of PUC up to and including the point where the first t literals and their adjacent clauses have been
explored. Formally, Ht is the σ-algebra generated by these first t literals that get added to L and their adjacent
clauses.

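Lemma 4.3 implies that with probability 1−o(n⁻¹) the set $\mathcal L$ returned by PUC has size at most L = ℓ log² n. Let
$\mathcal E_t$ be the event that at time t we explored a clause that contains two variables from $\mathcal L$ and $|\mathcal L|\le L$. Moreover, let
$S=\sum_t\mathbb{1}\{\mathcal E_t\}$. Let 0 < t1 < . . . < tℓ ≤ L be distinct time steps. Then

$$\mathbb{P}[\mathcal E]\le\mathbb{P}[S\ge\ell]\le\sum_{t_1,\ldots,t_\ell}\mathbb{P}\Big[\bigcap_{i=1}^{\ell}\mathcal E_{t_i}\Big]=\sum_{t_1,\ldots,t_\ell}\prod_{i=1}^{\ell}\mathbb{P}\Big[\mathcal E_{t_i}\,\Big|\,\bigcap_{j=1}^{i-1}\mathcal E_{t_j}\Big]. \tag{5.7}$$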

To bound the r.h.s. of (5.7) we will estimate the probability of $\mathcal E_{t+1}$ given the history $\mathcal H_t$ of the process up to time t ,
showing that for all t ≥ 0,

$$\mathbb{P}[\mathcal E_{t+1}\mid\mathcal H_t]\le\frac{L}{n}. \tag{5.8}$$

In fact, the probability that in step t +1 we run into an already discovered variable is bounded by the probability
that the literal explored during that step shares a clause with an already explored variable, which is bounded by
Lm/n ≤ L/n; for if more than L literals have already been explored, the event $\mathcal E_{t+1}$ does not occur by definition.
Finally, because the event $\mathcal E_t$ is $\mathcal H_t$-measurable, (5.8) implies

$$\prod_{i=1}^{\ell}\mathbb{P}\Big[\mathcal E_{t_i}\,\Big|\,\bigcap_{j=1}^{i-1}\mathcal E_{t_j}\Big]\le\Big(\frac{L}{n}\Big)^{\ell}.$$

Thus (5.7) gives $\mathbb{P}[\mathcal E]\le\binom{L}{\ell}\frac{L^{\ell}}{n^{\ell}}\le\Big(\frac{\mathrm{e}L^2}{\ell n}\Big)^{\ell}$, which together with (5.6) implies (5.5). □

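Proof of Proposition 2.3. Lemma 4.5 shows that P [Z (Φ) = 0] = o(n^{−1/2}). Thus, we may condition on the event that
Φ is satisfiable. Furthermore, Lemma 5.1 shows that given that Φ is satisfiable we have

$$\log Z(\hat\Phi)-\log Z(\Phi)\le\sum_{i=1}^{n}\mathbb{1}\{\mathcal C(\Phi,\{x_i\})\neq\emptyset\}\,|\mathcal V(\Phi,\{x_i\})|+\mathbb{1}\{\mathcal C(\Phi,\{\neg x_i\})\neq\emptyset\}\,|\mathcal V(\Phi,\{\neg x_i\})|. \tag{5.9}$$

Finally, Corollary 4.4 and Lemma 5.3 (applied with δ < 1/3) imply that with probability 1−o(n^{−1/2}),

$$\begin{aligned}
&\sum_{i=1}^{n}\mathbb{1}\{\mathcal C(\Phi,\{x_i\})\neq\emptyset\}\,|\mathcal V(\Phi,\{x_i\})|+\mathbb{1}\{\mathcal C(\Phi,\{\neg x_i\})\neq\emptyset\}\,|\mathcal V(\Phi,\{\neg x_i\})|\\
&\qquad\le\big(|\{x\in V_n:\mathcal C(\Phi,\{x\})\neq\emptyset\}|+|\{x\in V_n:\mathcal C(\Phi,\{\neg x\})\neq\emptyset\}|\big)\max_{1\le i\le n}\big(|\mathcal V(\Phi,\{x_i\})|+|\mathcal V(\Phi,\{\neg x_i\})|\big)=o(n^{1/3}).
\end{aligned} \tag{5.10}$$

Thus, the assertion follows from (5.9)–(5.10). □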

6. PROOF OF PROPOSITION 2.6

The proof of Proposition 2.6 is based on a combination of a coupling and a second moment argument. As a first
step we observe that we do not need to worry about trees of very high maximum degree.

Lemma 6.1. For any ε > 0, ℓ ≥ 0 there exists L > 0 such that for all t ∈ [0,1] with probability at least 1−ε the tree
T ⊗, (2ℓ) has maximum degree less than L.

Proof. The construction of the tree $T^{\otimes,(2\ell)}$ in Section 2.6 ensures that every variable node has a Poisson number of
clauses as offspring. The mean of this Poisson variable is always bounded by 2d . Hence, Bennett’s inequality shows
that for any L > 2d the probability that a specific variable has more than L offspring is bounded by exp(−L²/(4d +
L)). Thus, choosing L sufficiently large so that $\varepsilon>L^{2\ell}\exp(-L^2/(4d+L))$ and applying the union bound, we obtain
the assertion (combined with the chain rule starting from the root). □

Thus, in the following we confine ourselves to trees T with a maximum degree bounded by a large enough
number L. First we are going to count the number of copies of such trees T in (Φ1(M , M ′),Φ2(M , M ′)) via the
method of moments. The following lemma estimates the first moment.

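Lemma 6.2. For any fixed integers $L,\ell$, any possible outcome $T$ of $\boldsymbol{T}^{\otimes,(2\ell)}$ of maximum degree at most $L$ and any $M \sim tdn/2$, $M' \sim (1-t)dn/2$ we have

\[
\mathbb{E}\big[N^{(2\ell)}(T,(\Phi_1(M,M'),\Phi_2(M,M')))\big] \sim n\,\mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell)} \cong T\big].
\]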


Proof. We proceed by induction on $\ell$. In the case $\ell = 0$ the tree $T$ consists of nothing but the root, so that there is nothing to show. Hence, let $\ell \ge 1$. Let $\lambda_{0,s_1,s_2}$ be the number of shared children of the root $o$ of $T$ where $o$ appears with sign $s_1 \in \{-1,+1\}$ and the other variable appears with sign $s_2 \in \{-1,+1\}$. Also let $\lambda_{h,s_1,s_2}$ be the number of $h$-distinct children of $o$ ($h = 1,2$), where $o$ appears with sign $s_1 \in \{-1,+1\}$ and the other variable appears with sign $s_2 \in \{-1,+1\}$.

Consider the event $\mathcal{E}$ that variable $x_1$ is a $2\ell$-instance of $T$. Further, consider the event $\mathcal{R}$ that, for every $s_1,s_2 \in \{\pm 1\}$, $x_1$ occurs in precisely $\lambda_{0,s_1,s_2}$ clauses among $a_1,\ldots,a_M$ where the sign of $x_1$ is $s_1$ and the sign of the other variable is $s_2$, in precisely $\lambda_{1,s_1,s_2}$ clauses among $a'_1,\ldots,a'_{M'}$ with these signs, and in precisely $\lambda_{2,s_1,s_2}$ clauses among $a''_1,\ldots,a''_{M'}$ with these signs. Since $M \sim dtn/2$ and $\lambda_{h,\pm1,\pm1} \le L$ for $h \in \{0,1,2\}$ we have

\[
\begin{aligned}
\mathbb{P}[\mathcal{R}] &\sim \prod_{s_1,s_2\in\{\pm1\}} \mathbb{P}\big[\mathrm{Bin}(M,(2n)^{-1}) = \lambda_{0,s_1,s_2}\big]\, \mathbb{P}\big[\mathrm{Bin}(m-M,(2n)^{-1}) = \lambda_{1,s_1,s_2}\big]\, \mathbb{P}\big[\mathrm{Bin}(m-M,(2n)^{-1}) = \lambda_{2,s_1,s_2}\big] \\
&\sim \prod_{s_1,s_2\in\{\pm1\}} \mathbb{P}\big[\mathrm{Po}(dt/4) = \lambda_{0,s_1,s_2}\big]\, \mathbb{P}\big[\mathrm{Po}(d(1-t)/4) = \lambda_{1,s_1,s_2}\big]\, \mathbb{P}\big[\mathrm{Po}(d(1-t)/4) = \lambda_{2,s_1,s_2}\big].
\end{aligned}
\tag{6.1}
\]
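The passage from the binomial to the Poisson probabilities in (6.1) is the classical Poisson limit: with $M \sim dtn/2$ trials of success probability $(2n)^{-1}$ the mean is $dt/4$. As a rough numerical illustration (a sketch with illustrative parameter values, not taken from the text), the following compares the two probability mass functions for bounded values of $\lambda$.

```python
import math

def binom_pmf(n_trials, p, k):
    """P[Bin(n_trials, p) = k]."""
    return math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)

def poisson_pmf(lam, k):
    """P[Po(lam) = k]."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_gap(n, d, t, k_max):
    """Largest pointwise gap between Bin(M, 1/(2n)) and Po(dt/4) for k <= k_max, M = round(tdn/2)."""
    M = round(t * d * n / 2)
    return max(
        abs(binom_pmf(M, 1 / (2 * n), k) - poisson_pmf(d * t / 4, k))
        for k in range(k_max + 1)
    )

if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, max_gap(n, d=1.5, t=0.5, k_max=8))
```

The gap shrinks as $n$ grows, which is all that the asymptotic equivalence in (6.1) uses, because the values $\lambda_{h,s_1,s_2}$ are bounded by the fixed number $L$.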

Let $\lambda_h = \lambda_{h,-1,-1} + \lambda_{h,-1,+1} + \lambda_{h,+1,-1} + \lambda_{h,+1,+1}$ for $h \in \{0,1,2\}$. Given $\mathcal{R}$ let $(v_{0,i})_{1\le i\le\lambda_0}$ be the second variables (other than $x_1$) contained in neighbours of $x_1$ among $a_1,\ldots,a_M$. Analogously, let $(v_{1,i})_{1\le i\le\lambda_1}$ be the second variables contained in neighbours of $x_1$ among $a'_1,\ldots,a'_{M'}$ and $(v_{2,i})_{1\le i\le\lambda_2}$ be the second variables contained in neighbours of $x_1$ among $a''_1,\ldots,a''_{M'}$. For $h = 1,2$ let $\Phi_h^-$ denote the random formula obtained from $\Phi_h(M,M')$ by deleting $x_1$ and its adjacent clauses. Let $\mathcal{F}$ be the event that the distance between any two of $v_{0,1},\ldots,v_{2,\lambda_2}$ in both $\Phi_1^-$ and $\Phi_2^-$ is at least $2\ell$. A routine union bound argument shows that

\[
\mathbb{P}[\mathcal{F}] = 1-o(1). \tag{6.2}
\]

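Further, let $T_{0,i}$ be the sub-tree obtained from $T$ comprising the $i$-th shared grandchild of $o$ and its descendants. Consider the event $\mathcal{H}_0$ that $\mathcal{R}$ and $\mathcal{F}$ occur and $v_{0,i}$ is a $(2\ell-2)$-instance of $T_{0,i}$ in $(\Phi_1^-,\Phi_2^-)$ for all $i = 1,\ldots,\lambda_0$.

Since the depth and the maximum degree of $T$ are bounded, by induction we obtain

\[
\mathbb{P}\big[v_{0,i} \text{ is a } (2\ell-2)\text{-instance of } T_{0,i} \text{ in } (\Phi_1^-,\Phi_2^-)\big] = \mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell-2)} \cong T_{0,i}\big] + o(1)
\]

for $i = 1,\ldots,\lambda_0$. Thus

\[
\mathbb{P}[\mathcal{H}_0 \mid \mathcal{F}\cap\mathcal{R}] = \mathbb{P}\Big[\bigcap_{i=1}^{\lambda_0} \big\{v_{0,i} \text{ is a } (2\ell-2)\text{-instance of } T_{0,i} \text{ in } (\Phi_1^-,\Phi_2^-)\big\}\Big] = \prod_{i=1}^{\lambda_0} \mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell-2)} \cong T_{0,i}\big] + o(1). \tag{6.3}
\]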

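Analogously, let $T_{h,i}$ be the sub-tree of $T$ pending on the $i$-th $h$-distinct grandchild of the root. Consider the events $\mathcal{H}_h$ that $\mathcal{F}$ and $\mathcal{R}$ occur and that the depth-$(2\ell-2)$ neighbourhood of $v_{h,i}$ is isomorphic to $T_{h,i}$ in $\Phi_h^-$ for all $i = 1,\ldots,\lambda_h$, $h = 1,2$. Since $v_{1,i}$ and $v_{2,j}$ are chosen independently for all $i$ and $j$, using the same embedding process as above in combination with Lemma 4.6 we obtain

\[
\mathbb{P}[\mathcal{H}_1 \mid \mathcal{F}\cap\mathcal{R}\cap\mathcal{H}_0] = \prod_{i=1}^{\lambda_1} \mathbb{P}\big[\boldsymbol{T}^{(2\ell-2)} \cong T_{1,i}\big] + o(1), \tag{6.4}
\]

\[
\mathbb{P}[\mathcal{H}_2 \mid \mathcal{F}\cap\mathcal{R}\cap\mathcal{H}_0\cap\mathcal{H}_1] = \prod_{i=1}^{\lambda_2} \mathbb{P}\big[\boldsymbol{T}^{(2\ell-2)} \cong T_{2,i}\big] + o(1). \tag{6.5}
\]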

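Finally, combining (6.1)–(6.5) we obtain

\[
\mathbb{P}[\mathcal{E}] \sim \mathbb{P}[\mathcal{R}] \prod_{i=1}^{\lambda_0} \mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell-2)} \cong T_{0,i}\big] \prod_{i=1}^{\lambda_1} \mathbb{P}\big[\boldsymbol{T}^{(2\ell-2)} \cong T_{1,i}\big] \prod_{i=1}^{\lambda_2} \mathbb{P}\big[\boldsymbol{T}^{(2\ell-2)} \cong T_{2,i}\big] \sim \mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell)} \cong T\big].
\]

As $\mathbb{E}[N^{(2\ell)}(T,(\Phi_1(M,M'),\Phi_2(M,M')))] = n\,\mathbb{P}[\mathcal{E}]$ by the linearity of expectation, the assertion follows. □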

We also need an estimate of the second moment of N (2ℓ)(T, (Φ1(M , M ′),Φ2(M , M ′))).

Lemma 6.3. For any fixed integers $L,\ell$ and any possible outcome $T$ of $\boldsymbol{T}^{(2\ell)}$ of maximum degree at most $L$ and any $M \sim tdn/2$, $M' \sim (1-t)dn/2$ we have

\[
\mathbb{E}\big[N^{(2\ell)}(T,(\Phi_1(M,M'),\Phi_2(M,M')))^2\big] \sim n^2\,\mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell)} \cong T\big]^2.
\]


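Proof. Consider the event $\mathcal{E}_{ij}$ that both variables $x_i$ and $x_j$ are $2\ell$-instances of $T$ for $i,j = 1,\ldots,n$. We can rewrite the second moment as follows:

\[
\mathbb{E}\big[N^{(2\ell)}(T,(\Phi_1(M,M'),\Phi_2(M,M')))^2\big] = \sum_{i,j=1}^{n} \mathbb{P}[\mathcal{E}_{ij}] = n\,\mathbb{P}[\mathcal{E}_{11}] + n(n-1)\,\mathbb{P}[\mathcal{E}_{12}]. \tag{6.6}
\]

From Lemma 6.2 we know that $\mathbb{P}[\mathcal{E}_{11}] = \mathbb{P}[\boldsymbol{T}^{\otimes,(2\ell)} \cong T] + o(1)$, so we only need to estimate $\mathbb{P}[\mathcal{E}_{12}]$. Let $\mathcal{F}$ be the event that the distance between $x_1$ and $x_2$ is at least $2\ell$. A routine union bound argument shows that

\[
\mathbb{P}[\mathcal{F}] = 1-o(1).
\]

Consequently,

\[
\begin{aligned}
\mathbb{P}[\mathcal{E}_{12}] &= \mathbb{P}[\mathcal{E}_{12} \mid \mathcal{F}] + o(1) = \mathbb{P}[x_1 \text{ is a } 2\ell\text{-instance of } T \mid \mathcal{F}] \cdot \mathbb{P}[x_2 \text{ is a } 2\ell\text{-instance of } T \mid \mathcal{F}] + o(1) \\
&= \mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell)} \cong T\big]^2 + o(1).
\end{aligned}
\tag{6.7}
\]

Thus the assertion follows from (6.6) and (6.7). □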

Proof of Proposition 2.6. From Lemmas 6.1–6.3 in combination with Chebyshev's inequality it follows that for any $\ell \ge 0$ and any $T$, w.h.p.

\[
N^{(2\ell)}(T,(\Phi_1(M,M'),\Phi_2(M,M'))) \sim n\,\mathbb{P}\big[\boldsymbol{T}^{\otimes,(2\ell)} \cong T\big]. \tag{6.8}
\]

We need to extend this to the pruned formulas $(\hat\Phi_1(M,M'),\hat\Phi_2(M,M'))$. Let $N^{(2\ell),+}(T,(\Phi_1,\Phi_2))$ be the number of variable nodes $x$ such that $x$ is a $2\ell$-instance of $T$ in $(\hat\Phi_1,\hat\Phi_2)$ but not in $(\Phi_1,\Phi_2)$. Similarly, let $N^{(2\ell),-}(T,(\Phi_1,\Phi_2))$ be the number of variable nodes $x$ such that $x$ is a $2\ell$-instance of $T$ in $(\Phi_1,\Phi_2)$ but not in $(\hat\Phi_1,\hat\Phi_2)$. Then

\[
N^{(2\ell)}(T,(\hat\Phi_1,\hat\Phi_2)) = N^{(2\ell)}(T,(\Phi_1,\Phi_2)) + N^{(2\ell),+}(T,(\Phi_1,\Phi_2)) - N^{(2\ell),-}(T,(\Phi_1,\Phi_2)). \tag{6.9}
\]

Note that neither $N^{(2\ell),+}(T,(\Phi_1,\Phi_2))$ nor $N^{(2\ell),-}(T,(\Phi_1,\Phi_2))$ exceeds the number of variable nodes $x$ whose depth-$2\ell$ neighbourhood in $(\Phi_1,\Phi_2)$ contains at least one clause from $\bigcup_{l\in\{x_i,\neg x_i :\, 1\le i\le n\}} \mathcal{C}(\Phi,\{l\})$. Moreover, Corollary 4.4 and Lemma 5.3 show that w.h.p.

\[
\Big|\bigcup_{l\in\{x_i,\neg x_i :\, 1\le i\le n\}} \mathcal{C}(\Phi,\{l\})\Big| \le n^{0.1}. \tag{6.10}
\]

It follows from Lemma 4.8 that w.h.p. the depth-$2\ell$ neighbourhood of each vertex consists of no more than $\log^{4\ell+4} n$ vertices. Combining this fact with (6.10) we conclude that

\[
N^{(2\ell),+}(T,(\Phi_1,\Phi_2)) \le n^{0.1}\log^{4\ell+4} n, \qquad N^{(2\ell),-}(T,(\Phi_1,\Phi_2)) \le n^{0.1}\log^{4\ell+4} n \tag{6.11}
\]

w.h.p. Finally, the assertion follows from (6.8), (6.9) and (6.11). □

7. PROOF OF PROPOSITION 2.5

We will deal with the quotient $Z(\hat\Phi_h(M,m-M)) / Z(\hat\Phi_h(M-1,m-M))$ in detail; the arguments for the other two quotients are similar.

Lemma 7.1. Let h ∈ {1,2}. W.h.p. Φ̂h(M ,m −M) is obtained from Φ̂h(M −1,m −M) by adding a clause aM .

Proof. Let l , l ′ be the constituent literals of aM , i.e., aM = l ∨ l ′. Moreover, let Q be the event that Φ̂h(M ,m −M)
does not result from Φ̂h(M−1,m−M) by adding clause aM . Thus, on the event Q the additional clause aM triggers
the pruning of clauses that do not get pruned from Φ̂(M −1,m −M) (including potentially aM itself).

We are going to construct events E,E′ whose probabilities are easy to estimate such that

Q⊆E∪E′. (7.1)

To this end, for a literal l let Ll =L (Φh(M −1,m−M), {l }) be the final set of literals that PUC(Φh(M −1,m−M), {l })
produces. Call l a trigger of ¬l if ¬l ∈Ll . Further, let E be the event that there exists a trigger l of ¬l such that

E1: $\mathcal{C}(\Phi(M-1,m-M),\{l\}) \cup \mathcal{C}(\Phi(M-1,m-M),\{l,l'\}) \neq \emptyset$, or
E2: $\neg l' \in \bigcup_{\lambda\in\{l,l'\}} \mathcal{L}_\lambda$.


Define E′ analogously with the roles of l , l ′ swapped.
We claim that these events $\mathcal{E},\mathcal{E}'$ satisfy (7.1). To see this, assume that neither $\mathcal{E}$ nor $\mathcal{E}'$ occurs. We claim that then

\[
\mathcal{C}(\Phi_h(M-1,m-M),\{l\}) = \mathcal{C}(\Phi_h(M,m-M),\{l\}) \tag{7.2}
\]

for all literals $l$; if so, then clearly $\mathcal{Q}$ does not occur either.
Thus, assume that (7.2) is false and that $l$ is a literal such that

\[
\mathcal{C}(\Phi_h(M-1,m-M),\{l\}) \neq \mathcal{C}(\Phi_h(M,m-M),\{l\}). \tag{7.3}
\]

Then l must be a trigger of ¬l or of ¬l ′; for otherwise the presence of the extra clause aM has no impact on the
set of conflict clauses. Hence, suppose that l is a trigger of ¬l . Then the presence of clause aM in Φh(M ,m −M)
causes PUC to add l ′ to L (Φh(M ,m −M), {l }). Since the event E does not occur, neither does E1 and we conclude
that $\mathcal{C}(\Phi_h(M-1,m-M),\{l\}) = \mathcal{C}(\Phi_h(M-1,m-M),\{l,l'\}) = \emptyset$. Hence, none of the clauses $a \in F(\Phi_h(M-1,m-M))$ is a conflict clause and thus (7.3) implies that

\[
\{a_M\} = \mathcal{C}(\Phi_h(M,m-M),\{l\}) \setminus \mathcal{C}(\Phi_h(M-1,m-M),\{l\}).
\]

But this is not possible either. For if aM ∈C (Φh(M ,m−M), {l }), then Lemma 4.2 shows that one of l , l , l ′ is a trigger
of ¬l ′, and thus E2 occurs. Thus, we obtain (7.2).

To complete the proof we are going to show that

\[
\mathbb{P}[\mathcal{E}],\ \mathbb{P}[\mathcal{E}'] = o(1). \tag{7.4}
\]

Indeed, Lemma 5.3 shows that the number of literals $l$ such that $\mathcal{C}(\Phi_h(M-1,m-M),\{l\}) \neq \emptyset$ can be bounded by $n^{0.1}$ w.h.p. Furthermore, Corollary 4.4 shows that $|\mathcal{V}(\Phi_h(M-1,m-M),\{l\})| \le \log^2 n$ w.h.p. for all $l$. Hence, w.h.p. the total number of literals $\lambda$ that have a trigger $l$ such that $\mathcal{C}(\Phi_h(M-1,m-M),\{l\}) \neq \emptyset$ is bounded by $O(n^{0.1}\log^2 n)$. Consequently, the probability that the random literal $\neg l$ possesses such a trigger is bounded by $O(n^{-0.9}\log^2 n)$. Moreover, since $l'$ is a random literal as well, Lemma 5.3 shows that $\mathbb{P}[\mathcal{C}(\Phi(M-1,m-M),\{l'\}) = \emptyset] = 1-O(n^{-0.9})$. Additionally, w.h.p. for any trigger $l$ of $\neg l$ we have $\mathcal{V}(\Phi(M-1,m-M),\{l\}) \cap \mathcal{V}(\Phi(M-1,m-M),\{l'\}) = \emptyset$, because $l,l'$ are drawn independently of $\Phi(M-1,m-M)$. Similarly,

\[
\mathbb{P}\big[\neg l' \in \mathcal{L}(\Phi(M-1,m-M),\{l\})\big] = o(1) \quad\text{and}\quad \mathbb{P}\big[\neg l' \in \mathcal{L}(\Phi(M-1,m-M),\{l'\})\big] = o(1).
\]

Combining these estimates, we conclude that P [E] = o(1). By symmetry, the same estimate holds for E′. Thus, we
obtain (7.4). Finally, the assertion follows from (7.1) and (7.4). □

Corollary 7.2. Let h ∈ {1,2}. W.h.p. we have

\[
\frac{Z(\hat\Phi_h(M,m-M))}{Z(\hat\Phi_h(M-1,m-M))} = \mu_{\hat\Phi_h(M-1,m-M)}\big(\{\sigma \models a_M\}\big).
\]

Proof. From Lemma 7.1 we know that w.h.p.

\[
Z(\hat\Phi_h(M,m-M)) = Z(\hat\Phi_h(M-1,m-M) + a_M). \tag{7.5}
\]

Assuming that (7.5) is correct, $Z(\hat\Phi_h(M,m-M))$ equals the number of satisfying assignments of $\hat\Phi_h(M-1,m-M)$ that also happen to satisfy $a_M$. Dividing by $Z(\hat\Phi_h(M-1,m-M))$ yields the assertion. □
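The counting identity behind Corollary 7.2 can be checked by brute force on a toy formula. The sketch below is an illustration only: the clause encoding (pairs of `(variable, sign)` literals) and the example formula are assumptions made for this example, and no pruning is involved. It verifies that $Z(\Phi + a)/Z(\Phi)$ equals the probability that a uniformly random satisfying assignment of $\Phi$ satisfies the extra clause $a$.

```python
from fractions import Fraction
from itertools import product

def satisfies(sigma, clause):
    """A clause is a pair of literals (variable, sign); sigma maps variables to +1/-1."""
    return any(sigma[v] == s for v, s in clause)

def solutions(clauses, n_vars):
    """All satisfying assignments of a 2-CNF, by exhaustive enumeration."""
    return [sig for sig in product((1, -1), repeat=n_vars)
            if all(satisfies(sig, c) for c in clauses)]

def count_sat(clauses, n_vars):
    """Z(Phi): the number of satisfying assignments."""
    return len(solutions(clauses, n_vars))

def gibbs_prob(clauses, n_vars, new_clause):
    """mu_Phi({sigma |= a}) for the uniform measure on satisfying assignments of Phi."""
    sols = solutions(clauses, n_vars)
    return Fraction(sum(satisfies(sig, new_clause) for sig in sols), len(sols))

if __name__ == "__main__":
    phi = [((0, 1), (1, -1)), ((1, 1), (2, 1))]   # (x0 or not x1) and (x1 or x2)
    a = ((0, -1), (2, -1))                        # (not x0 or not x2)
    ratio = Fraction(count_sat(phi + [a], 3), count_sat(phi, 3))
    print(ratio, gibbs_prob(phi, 3, a))
```

Exact rational arithmetic is used so that the two sides can be compared without rounding.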

Additionally, we need the following asymptotic independence property, known as ‘replica symmetry’ in physics
parlance.

Lemma 7.3. Let h ∈ {1,2}. For all s, s′ ∈ {±1} we have

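\[
\frac{1}{n^2}\sum_{i,j=1}^{n} \mathbb{E}\Big|\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_i} = s, \sigma_{x_j} = s'\}) - \mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_i} = s\})\,\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_j} = s'\})\Big| = o(1).
\]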

Proof. We adapt an argument from [40] to the present setting. By exchangeability it suffices to prove that

\[
\mathbb{E}\Big|\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1} = s, \sigma_{x_2} = s'\}) - \mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1} = s\})\,\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_2} = s'\})\Big| = o(1).
\]

The proof rests on the Gibbs uniqueness property. Indeed, Proposition 2.6 shows that for any fixed $\ell$ the depth-$2\ell$ neighbourhood $\partial^{\le 2\ell}x_i$ of $x_i$ in $\hat\Phi_h(M-1,m-M)$ is within total variation distance $o(1)$ of the Galton-Watson tree $\boldsymbol{T}_h^{(2\ell)}$. Furthermore, the distribution of $\boldsymbol{T}_h^{(2\ell)}$ by itself is identical to the distribution of the Galton-Watson tree $\boldsymbol{T}^{(2\ell)}$.


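Additionally, Proposition 2.1 shows that $\boldsymbol{T}^{(2\ell)}$ enjoys the Gibbs uniqueness property (2.6). Consequently, taking $\ell = \ell(n) \to \infty$ sufficiently slowly as $n \to \infty$, we see that w.h.p.

\[
\sum_{s\in\{\pm1\}} \max_{\kappa\in S(\hat\Phi_h(M-1,m-M))} \Big|\mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1} = s \mid \sigma_{\partial^{2\ell}x_1} = \kappa_{\partial^{2\ell}x_1}\}) - \mu_{\hat\Phi_h(M-1,m-M)}(\{\sigma_{x_1} = s\})\Big| = o(1). \tag{7.6}
\]

Furthermore, provided that $\ell = \ell(n) \to \infty$ slowly enough, the distance between $x_1, x_2$ exceeds $4\ell$ w.h.p. In this case, writing $\mu = \mu_{\hat\Phi_h(M-1,m-M)}$ for brevity, (7.6) gives

\[
\begin{aligned}
\mu(\{\sigma_{x_1} = s, \sigma_{x_2} = s'\}) &= \mu(\{\sigma_{x_1} = s \mid \sigma_{x_2} = s'\}) \cdot \mu(\{\sigma_{x_2} = s'\}) \\
&= \mu(\{\sigma_{x_2} = s'\}) \cdot \sum_{\kappa\in\{\pm1\}^{\partial^{2\ell}x_1}} \mu(\{\sigma_{x_1} = s \mid \sigma_{\partial^{2\ell}x_1} = \kappa, \sigma_{x_2} = s'\}) \cdot \mu(\{\sigma_{\partial^{2\ell}x_1} = \kappa \mid \sigma_{x_2} = s'\}) \\
&= \mu(\{\sigma_{x_2} = s'\}) \cdot \sum_{\kappa\in\{\pm1\}^{\partial^{2\ell}x_1}} \mu(\{\sigma_{x_1} = s \mid \sigma_{\partial^{2\ell}x_1} = \kappa\}) \cdot \mu(\{\sigma_{\partial^{2\ell}x_1} = \kappa \mid \sigma_{x_2} = s'\}) \\
&= \mu(\{\sigma_{x_1} = s\})\,\mu(\{\sigma_{x_2} = s'\}) \cdot (1+o(1)),
\end{aligned}
\tag{7.7}
\]

as claimed. □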
Proof of Proposition 2.5. The proposition follows from Corollary 7.2 and Lemma 7.3. □

8. PROOF OF PROPOSITION 2.8

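In this section, we prove Proposition 2.8 via a contraction argument. For this, recall the operator $\mathrm{logBP}^{\otimes}_{d,t}$ from (1.3). For notational convenience we let

\[
V = \begin{pmatrix}
\displaystyle\sum_{i=1}^{\boldsymbol{d}} s_i \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\Big) + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\Big(\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{2}\Big) \\[2ex]
\displaystyle\sum_{i=1}^{\boldsymbol{d}} s_i \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{2}\Big) + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\Big(\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{2}\Big)
\end{pmatrix}.
\]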

The main step towards Proposition 2.8 is the following lemma:

Lemma 8.1. $\mathrm{logBP}^{\otimes}_{d,t}$ is a contraction on the space $(\mathcal{W}_2(\mathbb{R}^2),W_2)$ for all $0 < d < 2$ and $0 \le t \le 1$.

Indeed, it immediately follows from Lemma 8.1 and Banach's fixed point theorem that for every $d \in (0,2)$ and $t \in [0,1]$, there is a unique $\rho_{d,t} \in \mathcal{W}_2(\mathbb{R}^2)$ with $\rho_{d,t} = \mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t})$, and that for any $\rho \in \mathcal{W}_2(\mathbb{R}^2)$ and $\ell \to \infty$, the $\ell$-fold application of $\mathrm{logBP}^{\otimes}_{d,t}$ to $\rho$ converges to $\rho_{d,t}$ in Wasserstein distance.
We prove Lemma 8.1 in the following subsection, and conclude the section with the proof of Proposition 2.8.
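The fixed point iteration that Lemma 8.1 and Banach's theorem justify can be mimicked numerically by a population-dynamics heuristic. The sketch below is an illustration under stated assumptions, not the procedure analysed in the text: it represents a distribution on $\mathbb{R}^2$ by a finite list of sample pairs, and one step resamples every pair according to the definition of the vector $V$ above, with $\mathrm{Po}(td)$ shared terms and $\mathrm{Po}((1-t)d)$ terms private to each coordinate. The population size, number of rounds, seed, and all function names are hypothetical choices. The identity $\log\frac{1+r\tanh(\xi/2)}{2} = -\log(1+e^{-r\xi})$ for $r \in \{\pm1\}$ is used for numerical stability.

```python
import math
import random

def log_weight(r, xi):
    """log((1 + r*tanh(xi/2)) / 2) = -log(1 + exp(-r*xi)), evaluated stably."""
    y = -r * xi
    return -(max(y, 0.0) + math.log1p(math.exp(-abs(y))))

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sign(rng):
    """Uniform random sign in {+1, -1}."""
    return 1 if rng.random() < 0.5 else -1

def log_bp_step(pop, d, t, rng):
    """One sample-based application of the operator to the population `pop`."""
    new_pop = []
    for _ in range(len(pop)):
        v1 = v2 = 0.0
        for _ in range(poisson(t * d, rng)):        # shared terms: same s, r, joint sample
            s, r = sign(rng), sign(rng)
            xi1, xi2 = rng.choice(pop)
            v1 += s * log_weight(r, xi1)
            v2 += s * log_weight(r, xi2)
        for _ in range(poisson((1 - t) * d, rng)):  # terms entering coordinate 1 only
            s, r = sign(rng), sign(rng)
            v1 += s * log_weight(r, rng.choice(pop)[0])
        for _ in range(poisson((1 - t) * d, rng)):  # terms entering coordinate 2 only
            s, r = sign(rng), sign(rng)
            v2 += s * log_weight(r, rng.choice(pop)[1])
        new_pop.append((v1, v2))
    return new_pop

def iterate(d, t, size=2000, rounds=20, seed=0):
    """Iterate from the atom at (0, 0) and return the final population."""
    rng = random.Random(seed)
    pop = [(0.0, 0.0)] * size
    for _ in range(rounds):
        pop = log_bp_step(pop, d, t, rng)
    return pop

if __name__ == "__main__":
    pop = iterate(d=1.5, t=0.5)
    print(sum(a * a + b * b for a, b in pop) / len(pop))  # empirical second moment
```

For $t = 1$ only shared terms occur, so both coordinates of every sample coincide; for $t = 0$ the two coordinates are built from disjoint sets of terms. A finite population only approximates the measure-valued iteration, so the output should be read as a heuristic picture of $\rho_{d,t}$.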

8.1. Proof of Lemma 8.1. We first check that the operator $\mathrm{logBP}^{\otimes}_{d,t}$ is well-defined in the sense that it maps the space $(\mathcal{W}_2(\mathbb{R}^2),W_2)$ to itself.

Claim 8.2. The operator logBP⊗d ,t maps the space (W2(R2),W2) to itself.

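Proof. Let $\rho \in (\mathcal{W}_2(\mathbb{R}^2),W_2)$ and $V$ be a random vector with distribution $\mathrm{logBP}^{\otimes}_{d,t}(\rho)$. By the definition of $\mathrm{logBP}^{\otimes}_{d,t}$,

\[
\begin{aligned}
\mathbb{E}\big[\|V\|_2^2\big] = \mathbb{E}\Bigg[&\Bigg(\sum_{i=1}^{\boldsymbol{d}} s_i \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\Big) + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\Big(\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{2}\Big)\Bigg)^2 \\
&+ \Bigg(\sum_{i=1}^{\boldsymbol{d}} s_i \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{2}\Big) + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\Big(\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{2}\Big)\Bigg)^2\Bigg].
\end{aligned}
\tag{8.1}
\]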

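By the independence of the random variables $(s_i)_{i\ge1}$, $(s'_i)_{i\ge1}$ and $(s''_i)_{i\ge1}$ from everything else, all cross-terms in the evaluation of the squares in (8.1) vanish, e.g. for $i \neq j$,

\[
\begin{aligned}
&\mathbb{E}\Big[s_i s_j \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\Big) \log\Big(\frac{1+r_j\tanh(\xi_{\rho,j,1}/2)}{2}\Big)\Big] \\
&\qquad= \mathbb{E}[s_i]\,\mathbb{E}\Big[s_j \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2}\Big) \log\Big(\frac{1+r_j\tanh(\xi_{\rho,j,1}/2)}{2}\Big)\Big] = 0.
\end{aligned}
\]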


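As a consequence, (8.1) in combination with the independence of the Poisson random variables gives that

\[
\begin{aligned}
\mathbb{E}\big[\|V\|_2^2\big] = \mathbb{E}\Big[&td \log^2\Big(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{2}\Big) + (1-t)d \log^2\Big(\frac{1+r'_1\tanh(\xi'_{\rho,1,1}/2)}{2}\Big) \\
&+ td \log^2\Big(\frac{1+r_1\tanh(\xi_{\rho,1,2}/2)}{2}\Big) + (1-t)d \log^2\Big(\frac{1+r''_1\tanh(\xi''_{\rho,1,2}/2)}{2}\Big)\Big].
\end{aligned}
\tag{8.2}
\]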

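Finally, conditioning on the value of $r_1$ and an application of the fundamental theorem of calculus, followed by the Cauchy-Schwarz inequality, give

\[
\begin{aligned}
\mathbb{E}\Big[\log^2\Big(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{2}\Big)\Big] &= \frac12\,\mathbb{E}\Big[\log^2\Big(\frac{1+\tanh(\xi_{\rho,1,1}/2)}{2}\Big) + \log^2\Big(\frac{1-\tanh(\xi_{\rho,1,1}/2)}{2}\Big)\Big] \\
&= \frac12\,\mathbb{E}\Big[\Big(\int_0^{\xi_{\rho,1,1}} \frac{1-\tanh(x/2)}{2}\,dx - \log 2\Big)^2 + \Big(\int_0^{\xi_{\rho,1,1}} \frac{1+\tanh(x/2)}{2}\,dx + \log 2\Big)^2\Big] \\
&\le 2\,\mathbb{E}\big[\log^2 2 + \xi_{\rho,1,1}^2\big].
\end{aligned}
\]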

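Analogous bounds can be derived for the remaining three terms in (8.2). Therefore, for any vector $\xi \in \mathbb{R}^2$ with distribution $\rho$,

\[
\mathbb{E}\big[\|V\|_2^2\big] \le \mathbb{E}\Big[2td\big(\xi_{\rho,1,1}^2 + \xi_{\rho,1,2}^2\big) + 2(1-t)d\big(\xi'^{\,2}_{\rho,1,1} + \xi''^{\,2}_{\rho,1,2}\big)\Big] + 4d\log^2 2 = 2d\,\mathbb{E}\big[\|\xi\|_2^2\big] + 4d\log^2 2 < \infty.
\]

□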

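Proof of Lemma 8.1. Let $\rho,\nu \in \mathcal{W}_2(\mathbb{R}^2)$ be arbitrary. To show contraction, consider three independent sequences of optimally coupled pairs $(\xi_{\rho,i},\xi_{\nu,i})_{i\ge1}$, $(\xi'_{\rho,i},\xi'_{\nu,i})_{i\ge1}$ and $(\xi''_{\rho,i},\xi''_{\nu,i})_{i\ge1}$ such that for each $\zeta = \xi,\xi',\xi''$, the $\zeta_{\rho,i} = (\zeta_{\rho,i,1},\zeta_{\rho,i,2}) \in \mathbb{R}^2$ have distribution $\rho$, the $\zeta_{\nu,i} = (\zeta_{\nu,i,1},\zeta_{\nu,i,2}) \in \mathbb{R}^2$ have distribution $\nu$ and

\[
W_2(\rho,\nu) = \mathbb{E}\big[\|\zeta_{\rho,i} - \zeta_{\nu,i}\|_2^2\big]^{1/2}. \tag{8.3}
\]

Let $\boldsymbol{d} \sim \mathrm{Po}(td)$ and $\boldsymbol{d}',\boldsymbol{d}'' \sim \mathrm{Po}((1-t)d)$ all be independent. Moreover, let $(s_i)_{i\ge1}$, $(r_i)_{i\ge1}$, $(s'_i)_{i\ge1}$, $(r'_i)_{i\ge1}$, $(s''_i)_{i\ge1}$ and $(r''_i)_{i\ge1}$ be independent sequences of i.i.d. Rademacher random variables with parameter $1/2$; all random variables that are not explicitly coupled are assumed to be independent. Then with $\hat\rho = \mathrm{logBP}^{\otimes}_{d,t}(\rho)$ and $\hat\nu = \mathrm{logBP}^{\otimes}_{d,t}(\nu)$ we obtain

\[
\begin{aligned}
W_2(\hat\rho,\hat\nu)^2 \le{}& \mathbb{E}\Bigg[\Bigg(\sum_{i=1}^{\boldsymbol{d}} s_i \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{1+r_i\tanh(\xi_{\nu,i,1}/2)}\Big) + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\Big(\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{1+r'_i\tanh(\xi'_{\nu,i,1}/2)}\Big)\Bigg)^2\Bigg] \\
&+ \mathbb{E}\Bigg[\Bigg(\sum_{i=1}^{\boldsymbol{d}} s_i \log\Big(\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{1+r_i\tanh(\xi_{\nu,i,2}/2)}\Big) + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\Big(\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{1+r''_i\tanh(\xi''_{\nu,i,2}/2)}\Big)\Bigg)^2\Bigg].
\end{aligned}
\]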

Analogously to the derivation of (8.2), by the independence of the random signs, the expectations of the cross-terms vanish. Combined with the independence of the Poisson random variables, we conclude that

\[
\begin{aligned}
W_2(\hat\rho,\hat\nu)^2 \le{}& td\,\mathbb{E}\Big[\log^2\Big(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{1+r_1\tanh(\xi_{\nu,1,1}/2)}\Big)\Big] + (1-t)d\,\mathbb{E}\Big[\log^2\Big(\frac{1+r'_1\tanh(\xi'_{\rho,1,1}/2)}{1+r'_1\tanh(\xi'_{\nu,1,1}/2)}\Big)\Big] \\
&+ td\,\mathbb{E}\Big[\log^2\Big(\frac{1+r_1\tanh(\xi_{\rho,1,2}/2)}{1+r_1\tanh(\xi_{\nu,1,2}/2)}\Big)\Big] + (1-t)d\,\mathbb{E}\Big[\log^2\Big(\frac{1+r''_1\tanh(\xi''_{\rho,1,2}/2)}{1+r''_1\tanh(\xi''_{\nu,1,2}/2)}\Big)\Big].
\end{aligned}
\tag{8.4}
\]

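Moreover, conditioning on the value of $r_1$ and an application of the fundamental theorem of calculus yield

\[
\log^2 \frac{1+\tanh(\xi_{\rho,1,1}/2)}{1+\tanh(\xi_{\nu,1,1}/2)} = \Big[\int_{\xi_{\nu,1,1}}^{\xi_{\rho,1,1}} \frac{\partial \log(1+\tanh(z/2))}{\partial z}\,dz\Big]^2 = \Big[\int_{\xi_{\rho,1,1}\wedge\xi_{\nu,1,1}}^{\xi_{\rho,1,1}\vee\xi_{\nu,1,1}} \frac{1-\tanh(z/2)}{2}\,dz\Big]^2, \tag{8.5}
\]

\[
\log^2 \frac{1-\tanh(\xi_{\rho,1,1}/2)}{1-\tanh(\xi_{\nu,1,1}/2)} = \Big[\int_{\xi_{\nu,1,1}}^{\xi_{\rho,1,1}} \frac{\partial \log(1-\tanh(z/2))}{\partial z}\,dz\Big]^2 = \Big[\int_{\xi_{\rho,1,1}\wedge\xi_{\nu,1,1}}^{\xi_{\rho,1,1}\vee\xi_{\nu,1,1}} \frac{1+\tanh(z/2)}{2}\,dz\Big]^2. \tag{8.6}
\]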


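Combining (8.5) and (8.6) and applying the Cauchy-Schwarz inequality, together with the fact that the two non-negative integrands in (8.5) and (8.6) sum to one, we obtain

\[
\mathbb{E}\Big[\log^2\Big(\frac{1+r_1\tanh(\xi_{\rho,1,1}/2)}{1+r_1\tanh(\xi_{\nu,1,1}/2)}\Big)\Big] \le \frac12\,\mathbb{E}\big[(\xi_{\rho,1,1} - \xi_{\nu,1,1})^2\big]. \tag{8.7}
\]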
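The pointwise inequality behind (8.7) is easy to probe numerically. The following sketch (the grid and tolerance are arbitrary choices made for this illustration) averages the squared log-ratio over the two values of the Rademacher sign and compares it with $(x-y)^2/2$, using $\log\frac{1+r\tanh(x/2)}{2} = -\log(1+e^{-rx})$ so that the common factor $1/2$ cancels in the ratio.

```python
import math

def log_sigmoid(x):
    """log((1 + tanh(x/2)) / 2) = -log(1 + exp(-x)), evaluated stably."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))

def averaged_sq_log_ratio(x, y):
    """Average over r in {+1,-1} of log^2((1 + r*tanh(x/2)) / (1 + r*tanh(y/2)))."""
    return 0.5 * sum((log_sigmoid(r * x) - log_sigmoid(r * y)) ** 2 for r in (1, -1))

def check_grid(lo=-8.0, hi=8.0, steps=81, tol=1e-9):
    """True if the averaged squared log-ratio never exceeds (x - y)^2 / 2 on the grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return all(averaged_sq_log_ratio(x, y) <= 0.5 * (x - y) ** 2 + tol
               for x in grid for y in grid)

if __name__ == "__main__":
    print(check_grid())
```

This is only a sanity check on a finite grid; the inequality itself follows from (8.5) and (8.6) as stated above.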

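An identical argument can be made for $\xi_{\rho,1,2}$, $\xi_{\nu,1,2}$, for $\xi'_{\rho,1,1}$, $\xi'_{\nu,1,1}$ and for $\xi''_{\rho,1,2}$, $\xi''_{\nu,1,2}$. Finally, (8.3), (8.4) and (8.7) yield

\[
\begin{aligned}
W_2(\hat\rho,\hat\nu)^2 &\le \frac{td}{2}\,\mathbb{E}\Big[\big(\xi_{\rho,1,1} - \xi_{\nu,1,1}\big)^2 + \big(\xi_{\rho,1,2} - \xi_{\nu,1,2}\big)^2\Big] + \frac{(1-t)d}{2}\,\mathbb{E}\Big[\big(\xi'_{\rho,1,1} - \xi'_{\nu,1,1}\big)^2 + \big(\xi''_{\rho,1,2} - \xi''_{\nu,1,2}\big)^2\Big] \\
&= \frac{td}{2}\,W_2(\rho,\nu)^2 + \frac{(1-t)d}{2}\,W_2(\rho,\nu)^2 = \frac{d}{2}\,W_2(\rho,\nu)^2,
\end{aligned}
\tag{8.8}
\]

which implies contraction because $d < 2$. □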

8.2. Proof of Proposition 2.8. The uniqueness of $\rho_{d,t} \in \mathcal{W}_2(\mathbb{R}^2)$ with $\mathrm{logBP}^{\otimes}_{d,t}(\rho_{d,t}) = \rho_{d,t}$ (which yields (1.5)) follows from Lemma 8.1 and the Banach fixed point theorem. As the Dirac measure in zero is an element of $\mathcal{W}_2(\mathbb{R}^2)$, Lemma 8.1 also implies the weak convergence of $(\rho^{(\ell)}_{d,t})_{\ell\ge0}$ to $\rho_{d,t}$.

9. PROOF OF PROPOSITION 2.7

As a first step toward the proof of Proposition 2.7 we are going to introduce an operator on probability distributions
on the unit square that resembles the Belief Propagation update equations (2.3)–(2.4). We will see that this operator
is closely related to the operator from (1.3). Specifically, (1.3) is the log-likelihood version of the new operator.
Subsequently, we will show that the Belief Propagation operator correctly implements marginal computations on
the Galton-Watson trees (T 1,T 2).

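9.1. Density evolution. Recall that $\mathcal{P}((0,1)^2)$ is the space of all Borel probability measures on the unit square $(0,1)^2$. We define an operator

\[
\mathrm{BP}^{\otimes}_{d,t} : \mathcal{P}((0,1)^2) \to \mathcal{P}((0,1)^2), \qquad \pi \mapsto \hat\pi = \mathrm{BP}^{\otimes}_{d,t}(\pi) \tag{9.1}
\]

as follows. For $s \in \{\pm1\}$ let

\[
(\mu_{\pi,s,i,1},\mu_{\pi,s,i,2})_{i\ge1}, \qquad (\mu'_{\pi,s,i,1},\mu'_{\pi,s,i,2})_{i\ge1}, \qquad (\mu''_{\pi,s,i,1},\mu''_{\pi,s,i,2})_{i\ge1}
\]

be three sequences of random vectors with distribution $\pi$. Further, let $(\boldsymbol{d}_s,\boldsymbol{d}'_s,\boldsymbol{d}''_s)_{s\in\{\pm1\}}$ be Poisson variables with $\mathbb{E}[\boldsymbol{d}_s] = td/2$ and $\mathbb{E}[\boldsymbol{d}'_s] = \mathbb{E}[\boldsymbol{d}''_s] = (1-t)d/2$. Finally, let $((r_{s,i},r'_{s,i},r''_{s,i}))_{s\in\{\pm1\},i\ge1}$ be uniformly distributed on $\{\pm1\}^3$. All of these random variables are mutually independent. Then $\hat\pi \in \mathcal{P}((0,1)^2)$ is the distribution of the random vector

\[
\begin{aligned}
\Bigg(&\frac{2^{-\boldsymbol{d}_{-1}-\boldsymbol{d}'_{-1}} \prod_{i=1}^{\boldsymbol{d}_{-1}} \big(1+r_{-1,i}(2\mu_{\pi,-1,i,1}-1)\big) \prod_{i=1}^{\boldsymbol{d}'_{-1}} \big(1+r'_{-1,i}(2\mu'_{\pi,-1,i,1}-1)\big)}{\sum_{s\in\{\pm1\}} 2^{-\boldsymbol{d}_s-\boldsymbol{d}'_s} \prod_{i=1}^{\boldsymbol{d}_s} \big(1+r_{s,i}(2\mu_{\pi,s,i,1}-1)\big) \prod_{i=1}^{\boldsymbol{d}'_s} \big(1+r'_{s,i}(2\mu'_{\pi,s,i,1}-1)\big)}, \\
&\frac{2^{-\boldsymbol{d}_{-1}-\boldsymbol{d}''_{-1}} \prod_{i=1}^{\boldsymbol{d}_{-1}} \big(1+r_{-1,i}(2\mu_{\pi,-1,i,2}-1)\big) \prod_{i=1}^{\boldsymbol{d}''_{-1}} \big(1+r''_{-1,i}(2\mu''_{\pi,-1,i,2}-1)\big)}{\sum_{s\in\{\pm1\}} 2^{-\boldsymbol{d}_s-\boldsymbol{d}''_s} \prod_{i=1}^{\boldsymbol{d}_s} \big(1+r_{s,i}(2\mu_{\pi,s,i,2}-1)\big) \prod_{i=1}^{\boldsymbol{d}''_s} \big(1+r''_{s,i}(2\mu''_{\pi,s,i,2}-1)\big)}\Bigg) \in (0,1)^2.
\end{aligned}
\tag{9.2}
\]

Let $u^{\otimes} \in \mathcal{P}((0,1)^2)$ denote the atom on the centre $(\frac12,\frac12)$ of the unit square. We write $\mathrm{BP}^{\otimes(\ell)}_{d,t}$ for the $\ell$-fold application of the operator $\mathrm{BP}^{\otimes}_{d,t}$. We are going to perform a fixed point iteration using the operator $\mathrm{BP}^{\otimes}_{d,t}$, starting from $u^{\otimes}$. This fixed point iteration is known as density evolution in physics jargon [36]. Let

\[
\pi^{(\ell)}_{d,t} = \mathrm{BP}^{\otimes(\ell)}_{d,t}(u^{\otimes}).
\]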

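Lemma 9.1. Let $d \in (0,2)$, $t \in [0,1]$ and set $\pi_{d,t} = \mathbf{t}(\rho_{d,t})$, where $\rho_{d,t}$ is the unique fixed point of $\mathrm{logBP}^{\otimes}_{d,t}$ from Proposition 2.8. Then $\pi_{d,t}$ is a fixed point of $\mathrm{BP}^{\otimes}_{d,t}$, and

\[
\pi_{d,t} = \lim_{\ell\to\infty} \pi^{(\ell)}_{d,t}.
\]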


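Proof. For $s \in \{\pm1\}$ let

\[
(\xi_{\rho,s,i,1},\xi_{\rho,s,i,2})_{i\ge1}, \qquad (\xi'_{\rho,s,i,1},\xi'_{\rho,s,i,2})_{i\ge1}, \qquad (\xi''_{\rho,s,i,1},\xi''_{\rho,s,i,2})_{i\ge1} \tag{9.3}
\]

be three sequences of random vectors with distribution $\rho_{d,t}$. Further, let $(\boldsymbol{d}_s,\boldsymbol{d}'_s,\boldsymbol{d}''_s)_{s\in\{\pm1\}}$ be Poisson variables with $\mathbb{E}[\boldsymbol{d}_s] = td/2$ and $\mathbb{E}[\boldsymbol{d}'_s] = \mathbb{E}[\boldsymbol{d}''_s] = (1-t)d/2$ and let $(\boldsymbol{d},\boldsymbol{d}',\boldsymbol{d}'')$ be Poisson variables with $\mathbb{E}[\boldsymbol{d}] = td$ and $\mathbb{E}[\boldsymbol{d}'] = \mathbb{E}[\boldsymbol{d}''] = (1-t)d$. Finally, let $((r_{s,i},r'_{s,i},r''_{s,i}))_{s\in\{\pm1\},i\ge1}$, $((s_i,s'_i,s''_i))_{i\ge1}$ and $((r_i,r'_i,r''_i))_{i\ge1}$ all be uniformly distributed on $\{\pm1\}^3$. All of these random variables are mutually independent. Throughout the proof, we write $\mathbf{l} = (l_1,l_2)$ and $\mathbf{t} = (t_1,t_2)$ for $l_h : (0,1) \to \mathbb{R}$, $l_h(x) = \log\frac{x}{1-x}$, and $t_h : \mathbb{R} \to (0,1)$, $t_h(x) = (1+\tanh(x/2))/2$, where $h \in \{1,2\}$.
Then, since $\mathbf{t}(\xi_{\rho,s,1,1},\xi_{\rho,s,1,2}) = (t_1(\xi_{\rho,s,1,1}),t_2(\xi_{\rho,s,1,2}))$ has distribution $\mathbf{t}(\rho_{d,t})$, using the definitions of $\mathbf{l}$ and $\mathbf{t}$,

\[
\begin{aligned}
\mathrm{BP}^{\otimes}_{d,t}(\mathbf{t}(\rho_{d,t})) \stackrel{\mathrm{dist}}{=}{}& \Bigg(\frac{2^{-\boldsymbol{d}_{-1}-\boldsymbol{d}'_{-1}} \prod_{i=1}^{\boldsymbol{d}_{-1}} \big(1+r_{-1,i}(2t_1(\xi_{\rho,-1,i,1})-1)\big) \prod_{i=1}^{\boldsymbol{d}'_{-1}} \big(1+r'_{-1,i}(2t_1(\xi'_{\rho,-1,i,1})-1)\big)}{\sum_{s\in\{\pm1\}} 2^{-\boldsymbol{d}_s-\boldsymbol{d}'_s} \prod_{i=1}^{\boldsymbol{d}_s} \big(1+r_{s,i}(2t_1(\xi_{\rho,s,i,1})-1)\big) \prod_{i=1}^{\boldsymbol{d}'_s} \big(1+r'_{s,i}(2t_1(\xi'_{\rho,s,i,1})-1)\big)}, \\
&\ \ \frac{2^{-\boldsymbol{d}_{-1}-\boldsymbol{d}''_{-1}} \prod_{i=1}^{\boldsymbol{d}_{-1}} \big(1+r_{-1,i}(2t_2(\xi_{\rho,-1,i,2})-1)\big) \prod_{i=1}^{\boldsymbol{d}''_{-1}} \big(1+r''_{-1,i}(2t_2(\xi''_{\rho,-1,i,2})-1)\big)}{\sum_{s\in\{\pm1\}} 2^{-\boldsymbol{d}_s-\boldsymbol{d}''_s} \prod_{i=1}^{\boldsymbol{d}_s} \big(1+r_{s,i}(2t_2(\xi_{\rho,s,i,2})-1)\big) \prod_{i=1}^{\boldsymbol{d}''_s} \big(1+r''_{s,i}(2t_2(\xi''_{\rho,s,i,2})-1)\big)}\Bigg) \\
\stackrel{\mathrm{dist}}{=}{}& \mathbf{t}\begin{pmatrix}
\displaystyle\sum_{i=1}^{\boldsymbol{d}} s_i \log\frac{1+r_i(2t_1(\xi_{\rho,i,1})-1)}{2} + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\frac{1+r'_i(2t_1(\xi'_{\rho,i,1})-1)}{2} \\[2ex]
\displaystyle\sum_{i=1}^{\boldsymbol{d}} s_i \log\frac{1+r_i(2t_2(\xi_{\rho,i,2})-1)}{2} + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\frac{1+r''_i(2t_2(\xi''_{\rho,i,2})-1)}{2}
\end{pmatrix} \\
\stackrel{\mathrm{dist}}{=}{}& \mathbf{t}\begin{pmatrix}
\displaystyle\sum_{i=1}^{\boldsymbol{d}} s_i \log\frac{1+r_i\tanh(\xi_{\rho,i,1}/2)}{2} + \sum_{i=1}^{\boldsymbol{d}'} s'_i \log\frac{1+r'_i\tanh(\xi'_{\rho,i,1}/2)}{2} \\[2ex]
\displaystyle\sum_{i=1}^{\boldsymbol{d}} s_i \log\frac{1+r_i\tanh(\xi_{\rho,i,2}/2)}{2} + \sum_{i=1}^{\boldsymbol{d}''} s''_i \log\frac{1+r''_i\tanh(\xi''_{\rho,i,2}/2)}{2}
\end{pmatrix}.
\end{aligned}
\]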

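Since $\rho_{d,t}$ is a fixed point of $\mathrm{logBP}^{\otimes}_{d,t}$, the last argument vector of $\mathbf{t}$ has distribution $\rho_{d,t}$, and we get that

\[
\mathrm{BP}^{\otimes}_{d,t}(\mathbf{t}(\rho_{d,t})) = \mathbf{t}(\rho_{d,t}).
\]

So $\mathbf{t}(\rho_{d,t})$ is a fixed point of $\mathrm{BP}^{\otimes}_{d,t}$.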
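The algebra behind the change of variables between the Belief Propagation quotient and the signed sum of logarithms can be verified on concrete numbers. In the sketch below, which is an illustration with hypothetical names, each term is a triple `(s, r, xi)`; terms with `s = +1` are placed in the numerator group of the quotient and terms with `s = -1` in the complementary group. This pairing is an assumption of the sketch that mirrors, up to relabelling, the two groups indexed by $\pm1$ in (9.2). The check confirms that the quotient equals $t_h$ applied to $\sum_i s_i \log\frac{1+r_i\tanh(\xi_i/2)}{2}$.

```python
import math
import random

def t_map(x):
    """t_h(x) = (1 + tanh(x/2)) / 2."""
    return (1 + math.tanh(x / 2)) / 2

def bp_quotient(terms):
    """Quotient form: product over the s=+1 group divided by the sum of both group products."""
    group = {1: 1.0, -1: 1.0}
    for s, r, xi in terms:
        group[s] *= (1 + r * (2 * t_map(xi) - 1)) / 2
    return group[1] / (group[1] + group[-1])

def log_form(terms):
    """Log-likelihood form: t applied to the signed sum of logarithms."""
    total = sum(s * math.log((1 + r * math.tanh(xi / 2)) / 2) for s, r, xi in terms)
    return t_map(total)

if __name__ == "__main__":
    rng = random.Random(0)
    terms = [(rng.choice((1, -1)), rng.choice((1, -1)), rng.uniform(-3, 3)) for _ in range(6)]
    print(bp_quotient(terms), log_form(terms))
```

Both functions return the same value up to floating-point rounding, which is the deterministic identity underlying the distributional equalities in the display above.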

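Next, let $n^{\otimes} \in \mathcal{P}(\mathbb{R}^2)$ denote the atom in $(0,0)$. Then since $\mathbf{t}(0,0) = (\frac12,\frac12)$, we have $u^{\otimes} = \mathbf{t}(n^{\otimes})$. By a computation analogous to the first part of the proof, one can show inductively that for all $\ell \ge 1$, $\pi^{(\ell)}_{d,t} = \mathrm{BP}^{\otimes(\ell)}_{d,t}(\mathbf{t}(n^{\otimes})) = \mathbf{t}(\mathrm{logBP}^{\otimes(\ell)}_{d,t}(n^{\otimes}))$. As

\[
\rho_{d,t} = \lim_{\ell\to\infty} \mathrm{logBP}^{\otimes(\ell)}_{d,t}(n^{\otimes}),
\]

the second part of the claim now follows from the continuous mapping theorem. □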

9.2. Belief Propagation on the Galton-Watson tree. The proof of Proposition 2.7 relies on the fact that Belief Prop-
agation is ‘exact’ on trees. The following fact, which is a direct consequence of [36, Theorem 14.1], furnishes the
precise statement that we will use.

Fact 9.2. Assume that the bipartite graph associated with the 2-CNF $\Phi$ is a (finite) tree. Let $z \in V(\Phi)$ be a variable and let $\ell \ge 1$ be an integer such that no variable or clause of $\Phi$ has distance greater than $2\ell$ from $z$. Let

\[
\mu^{(0)}_{x\to a}(s) = \mu^{(0)}_{a\to x}(s) = \frac12 \qquad\text{for all } x \in V(\Phi),\ a \in \partial x,\ s \in \{\pm1\}.
\]

Furthermore, obtain the messages $(\mu^{(i+1)}_{x\to a}(s),\mu^{(i+1)}_{a\to x}(s))_{x,a,s}$ by applying the BP operator (2.2) to $(\mu^{(i)}_{x\to a}(s),\mu^{(i)}_{a\to x}(s))_{x,a,s}$. Then for all $i \ge 2\ell$ we have

\[
\mu_\Phi(\{\sigma_z = s\}) = \frac{\prod_{a\in\partial z} \mu^{(i)}_{a\to z}(s)}{\prod_{a\in\partial z} \mu^{(i)}_{a\to z}(1) + \prod_{a\in\partial z} \mu^{(i)}_{a\to z}(-1)}. \tag{9.4}
\]
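Fact 9.2 can be illustrated on a small tree-shaped formula. The sketch below does not reproduce the operator (2.2), which is defined elsewhere; it assumes the standard sum-product updates for 2-SAT, namely that a clause tells a variable "1" for the value satisfying the clause and otherwise the probability that the other variable satisfies it, while a variable forwards the normalised product of its other incoming messages. Literals are encoded as `(variable, sign)` pairs, and the example formula is a hypothetical choice. The marginal computed via (9.4) is compared with exhaustive enumeration.

```python
from itertools import product

def brute_marginal(clauses, n_vars, z):
    """mu_Phi({sigma_z = 1}) by enumerating all satisfying assignments."""
    sols = [sig for sig in product((1, -1), repeat=n_vars)
            if all(any(sig[v] == s for v, s in c) for c in clauses)]
    return sum(sig[z] == 1 for sig in sols) / len(sols)

def normalise(raw):
    """Scale a message {+1: a, -1: b} so that its entries sum to one."""
    total = raw[1] + raw[-1]
    return {s: raw[s] / total for s in raw}

def bp_marginal(clauses, z, iters):
    """Synchronous BP from uniform messages, followed by the marginal formula (9.4)."""
    edges = [(v, a) for a, c in enumerate(clauses) for v, _ in c]
    v2c = {e: {1: 0.5, -1: 0.5} for e in edges}
    c2v = {e: {1: 0.5, -1: 0.5} for e in edges}
    for _ in range(iters):
        new_c2v, new_v2c = {}, {}
        for a, c in enumerate(clauses):
            for k, (v, sv) in enumerate(c):
                w, sw = c[1 - k]  # the other literal of the clause
                new_c2v[(v, a)] = normalise(
                    {s: 1.0 if s == sv else v2c[(w, a)][sw] for s in (1, -1)})
        for v, a in edges:
            raw = {1: 1.0, -1: 1.0}
            for u, b in edges:
                if u == v and b != a:
                    for s in raw:
                        raw[s] *= c2v[(u, b)][s]
            new_v2c[(v, a)] = normalise(raw)
        v2c, c2v = new_v2c, new_c2v
    num = {1: 1.0, -1: 1.0}
    for u, a in edges:
        if u == z:
            for s in num:
                num[s] *= c2v[(u, a)][s]
    return num[1] / (num[1] + num[-1])

if __name__ == "__main__":
    # (x0 or not x1), (not x0 or x2), (x2 or x3): the factor graph is a tree.
    tree = [((0, 1), (1, -1)), ((0, -1), (2, 1)), ((2, 1), (3, 1))]
    for z in range(4):
        print(z, bp_marginal(tree, z, iters=10), brute_marginal(tree, 4, z))
```

Ten iterations exceed the diameter of this factor graph, so the messages have stabilised; on graphs with cycles the same code would in general only give an approximation.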

As a preparation toward the proof of Proposition 2.7 we establish the following ‘univariate’ variant of the propo-
sition.

Lemma 9.3. Let $h \in \{1,2\}$. Let $\pi^{(\ell)}_{d,t,h}$ be the distribution of the $h$-th component of a random vector with distribution $\pi^{(\ell)}_{d,t}$. Then $\mu_{\boldsymbol{T}_h^{(2\ell)}}(\{\sigma_o = 1\})$ has distribution $\pi^{(\ell)}_{d,t,h}$.


Proof. We proceed by induction on $\ell$. For $\ell = 0$ there is nothing to show because $\mu_{\boldsymbol{T}_h^{(0)}}(\{\sigma_o = 1\}) = \frac12$ and $\pi^{(0)}_{d,t,h}$ is the atom on $1/2$. To go from $\ell-1$ to $\ell \ge 1$ we exploit the fact that $\boldsymbol{T}_h$ by itself has the same distribution as the 'plain' Galton-Watson tree $\boldsymbol{T}$ from Section 2.2, after all distinctions between different types of clauses and variables are dropped. In effect, the tree $\boldsymbol{T}_{h,x}$ pending on any grandchild $x \in \partial^2 o$ of the root has the same distribution as $\boldsymbol{T}$ itself, and these trees are mutually independent for all $x \in \partial^2 o$. Consequently, by induction we know that the marginal $\mu_{\boldsymbol{T}^{(2(\ell-1))}_{h,x}}(\{\sigma_x = 1\})$ of $x$ in $\boldsymbol{T}^{(2(\ell-1))}_{h,x}$ has distribution $\pi^{(\ell-1)}_{d,t,h}$.

Now let $a_x \in \partial x \cap \partial o$ be the clause that links $x$ and $o$. Fact 9.2 implies that the marginals $\mu_{\boldsymbol{T}^{(2(\ell-1))}_{h,x}}(\{\sigma_x = s\})$ coincide with the messages $\mu^{(2(\ell-1))}_{\boldsymbol{T}^{(2\ell)},x\to a_x}(s)$. Indeed, the marginal formula (9.4) for the tree $\boldsymbol{T}^{(2(\ell-1))}_{h,x}$ coincides with the message update formula (2.4), because clause $a_x$ is not part of $\boldsymbol{T}_{h,x}$. Furthermore, because the trees pending on the different grandchildren of the root $o$ are mutually independent, the incoming messages $(\mu^{(2(\ell-1))}_{\boldsymbol{T}^{(2\ell)}_{h,x},x\to a_x}(\pm1))_{x\in\partial^2 o}$ are mutually independent. Moreover, upon substituting in the update equation (2.3), the quotient from (9.4) can be rewritten as

\[
\mu_{\boldsymbol{T}_h^{(2\ell)}}(\{\sigma_o = 1\}) = \frac{\prod_{x\in\partial^2 o} \Big(\mathbb{1}\{\mathrm{sign}(o,a_x) = 1\} + \mathbb{1}\{\mathrm{sign}(o,a_x) = -1\}\,\mu^{(2(\ell-1))}_{\boldsymbol{T}^{(2\ell)}_{h,x},x\to a_x}(\mathrm{sign}(x,a_x))\Big)}{\sum_{s\in\{\pm1\}} \prod_{x\in\partial^2 o} \Big(\mathbb{1}\{\mathrm{sign}(o,a_x) = s\} + \mathbb{1}\{\mathrm{sign}(o,a_x) = -s\}\,\mu^{(2(\ell-1))}_{\boldsymbol{T}^{(2\ell)}_{h,x},x\to a_x}(\mathrm{sign}(x,a_x))\Big)}. \tag{9.5}
\]

Also recall that $o$ has $\mathrm{Po}(d)$ children. Hence, comparing (9.5) with the $h$-component of (9.2), we conclude that $\mu_{\boldsymbol{T}_h^{(2\ell)}}(\{\sigma_o = 1\})$ has distribution $\pi^{(\ell)}_{d,t,h}$. □

Lemma 9.4. Let $t \in [0,1]$. Then $\mu^{(2\ell)}$ has distribution $\pi^{(\ell)}_{d,t}$.

Proof. As in the proof of Lemma 9.3 we proceed by induction on $\ell$. For $\ell = 0$ there is nothing to show. To go from $\ell-1$ to $\ell \ge 1$ we reuse the rewritten update equation (9.5). In each of the correlated trees $\boldsymbol{T}_1,\boldsymbol{T}_2$ the root $o$ has $\mathrm{Po}((1-t)d)$ $\{1,2\}$-distinct grandchildren. By construction, the trees pending on these grandchildren are mutually independent copies of the tree $\boldsymbol{T}$. Hence, the same consideration as in the proof of Lemma 9.3 shows that the messages that the $\{1,2\}$-distinct grandchildren pass up are independent with distribution $\pi^{(\ell-1)}_{d,t,h}$. The same is true of the messages $\mu'_{\pi^{(\ell-1)}_{d,t},\pm1,i,h}$, $\mu''_{\pi^{(\ell-1)}_{d,t},\pm1,i,h}$ from (9.2). Consequently, the contribution of the $\{1,2\}$-distinct children in (9.2) matches the corresponding contribution to the update equation (9.5).
With respect to the shared grandchildren, we apply induction as in the proof of Lemma 9.3. Indeed, the trees pending on the shared grandchildren $x \in \partial^2_{\boldsymbol{T}^{\otimes}} o$ have the same distribution as the original tree $\boldsymbol{T}^{\otimes}$ and are mutually independent. Therefore, by the induction hypothesis, the pair of messages $(\mu^{(2(\ell-1))}_{\boldsymbol{T}_h^{(2\ell-2)},x\to a_x}(1))_{h=1,2}$ that a shared grandchild $x$ sends towards the root has distribution $\pi^{(\ell-1)}_{d,t}$. Finally, since $o$ has $\mathrm{Po}(dt)$ shared grandchildren, matching the expressions (9.5) and (9.2) completes the proof. □

Proof of Proposition 2.7. The assertion follows from Lemmas 9.1 and 9.4. □

10. PROOF THAT THE VARIANCE IS FINITE

The main goal of this section is to show that both the evaluation of the functional $\mathcal{B}^{\otimes}_{d,t}$ on $\rho_{d,t}$ and the integration to obtain $\eta(d)^2$ yield finite values for any $d \in (0,2)$ and $t \in [0,1]$.

Lemma 10.1. For any $d \in (0,2)$ and $t \in [0,1]$, $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t}) < \infty$. Moreover, for any $d \in (0,2)$, $\eta(d)^2 < \infty$.

10.1. Proof of Lemma 10.1. Let $\rho^{(\ell)}_{d,t} \in \mathcal{W}_2(\mathbb{R}^2)$ be the result of $\ell$ iterations of $\mathrm{logBP}^{\otimes}_{d,t}$ launched from $n^{\otimes}$, the atom at $(0,0)$. In the proof of Lemma 10.1, the following properties of the fixed point $\pi_{d,t}$ will be used:

Claim 10.2. Let $\pi_{d,t} = \mathbf{t}(\rho_{d,t})$ and let $\mu_{\pi_{d,t}} = (\mu_{\pi_{d,t},1},\mu_{\pi_{d,t},2})$ be a random vector with distribution $\pi_{d,t}$. Then

\[
\mu_{\pi_{d,t},1} \stackrel{\mathrm{dist}}{=} \mu_{\pi_{d,t},2} \qquad\text{and}\qquad \mu_{\pi_{d,t},1} \stackrel{\mathrm{dist}}{=} 1-\mu_{\pi_{d,t},1}.
\]

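Proof. Recall the definition of $\mathrm{BP}^{\otimes}_{d,t}$ from (9.2). The first claim then follows from the following limiting argument: By Lemma 9.1, $\pi_{d,t} = \lim_{\ell\to\infty} \mathrm{BP}^{\otimes(\ell)}_{d,t}(u^{\otimes})$, where $u^{\otimes}$ is the Dirac measure on $(1/2,1/2)$. As the marginal distributions of the initial distribution $u^{\otimes}$ are identical, inspection of the update rule (9.2) yields that also the marginal distributions of $\mathrm{BP}^{\otimes(1)}_{d,t}(u^{\otimes})$ are identical. Analogously, it is immediate from (9.2) that any $\pi$ with two identical marginals will be mapped to a measure $\mathrm{BP}^{\otimes}_{d,t}(\pi)$ with two identical marginals, such that the marginal distributions of $\mathrm{BP}^{\otimes(\ell)}_{d,t}(u^{\otimes})$ for any $\ell \ge 0$ are identical. Hence, also in the limit,

\[
\mu_{\pi_{d,t},1} \stackrel{\mathrm{dist}}{=} \mu_{\pi_{d,t},2}.
\]

On the other hand, Lemma 9.1 also implies that $\pi_{d,t} = \mathrm{BP}^{\otimes}_{d,t}(\pi_{d,t})$, so that the distribution of $\mu_{\pi_{d,t},1}$ is the same as the distribution of

\[
\frac{2^{-\boldsymbol{d}_{-1}-\boldsymbol{d}'_{-1}} \prod_{i=1}^{\boldsymbol{d}_{-1}} \big(1+r_{-1,i}(2\mu_{\pi_{d,t},-1,i,1}-1)\big) \prod_{i=1}^{\boldsymbol{d}'_{-1}} \big(1+r'_{-1,i}(2\mu'_{\pi_{d,t},-1,i,1}-1)\big)}{\sum_{s\in\{\pm1\}} 2^{-\boldsymbol{d}_{-s}-\boldsymbol{d}'_{-s}} \prod_{i=1}^{\boldsymbol{d}_{-s}} \big(1+r_{-s,i}(2\mu_{\pi_{d,t},-s,i,1}-1)\big) \prod_{i=1}^{\boldsymbol{d}'_{-s}} \big(1+r'_{-s,i}(2\mu'_{\pi_{d,t},-s,i,1}-1)\big)} \tag{10.1}
\]

while the distribution of $1-\mu_{\pi_{d,t},1}$ is the same as the distribution of

\[
\frac{2^{-\boldsymbol{d}_{1}-\boldsymbol{d}'_{1}} \prod_{i=1}^{\boldsymbol{d}_{1}} \big(1+r_{1,i}(2\mu_{\pi_{d,t},1,i,1}-1)\big) \prod_{i=1}^{\boldsymbol{d}'_{1}} \big(1+r'_{1,i}(2\mu'_{\pi_{d,t},1,i,1}-1)\big)}{\sum_{s\in\{\pm1\}} 2^{-\boldsymbol{d}_{-s}-\boldsymbol{d}'_{-s}} \prod_{i=1}^{\boldsymbol{d}_{-s}} \big(1+r_{-s,i}(2\mu_{\pi_{d,t},-s,i,1}-1)\big) \prod_{i=1}^{\boldsymbol{d}'_{-s}} \big(1+r'_{-s,i}(2\mu'_{\pi_{d,t},-s,i,1}-1)\big)}. \tag{10.2}
\]

This immediately shows that (10.1) and (10.2) have the same distribution. As a consequence, the second claim holds as well. □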

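Proof of Lemma 10.1. Recall that $\pi_{d,t} = \mathbf{t}(\rho_{d,t})$. Let

\[
\mu_{\pi_{d,t},1} = (\mu_{\pi_{d,t},1,1},\mu_{\pi_{d,t},1,2}) = \mathbf{t}(\xi_{\rho_{d,t},1,1},\xi_{\rho_{d,t},1,2})
\]

and

\[
\mu_{\pi_{d,t},2} = (\mu_{\pi_{d,t},2,1},\mu_{\pi_{d,t},2,2}) = \mathbf{t}(\xi_{\rho_{d,t},2,1},\xi_{\rho_{d,t},2,2}).
\]

Then they are independent random vectors with distribution $\pi_{d,t}$. Let $r_1, r_2$ be independent Rademacher random variables with parameter $1/2$, independent of $\mu_{\pi_{d,t},1}$ and $\mu_{\pi_{d,t},2}$. Conditioning on the values of $r_1$ and $r_2$ yields the upper bound

\[
\begin{aligned}
|\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})| \le{}& \frac14\,\mathbb{E}\Big[\Big|\log\big(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\big) \log\big(1-\mu_{\pi_{d,t},1,2}\mu_{\pi_{d,t},2,2}\big)\Big|\Big] \\
&+ \frac14\,\mathbb{E}\Big[\Big|\log\big(1-(1-\mu_{\pi_{d,t},1,1})\mu_{\pi_{d,t},2,1}\big) \log\big(1-(1-\mu_{\pi_{d,t},1,2})\mu_{\pi_{d,t},2,2}\big)\Big|\Big] \\
&+ \frac14\,\mathbb{E}\Big[\Big|\log\big(1-\mu_{\pi_{d,t},1,1}(1-\mu_{\pi_{d,t},2,1})\big) \log\big(1-\mu_{\pi_{d,t},1,2}(1-\mu_{\pi_{d,t},2,2})\big)\Big|\Big] \\
&+ \frac14\,\mathbb{E}\Big[\Big|\log\big(1-(1-\mu_{\pi_{d,t},1,1})(1-\mu_{\pi_{d,t},2,1})\big) \log\big(1-(1-\mu_{\pi_{d,t},1,2})(1-\mu_{\pi_{d,t},2,2})\big)\Big|\Big].
\end{aligned}
\]

The Cauchy-Schwarz inequality further gives that

\[
\begin{aligned}
|\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})| \le{}& \frac14\,\mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\big)\Big]^{1/2} \mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,2}\mu_{\pi_{d,t},2,2}\big)\Big]^{1/2} \\
&+ \frac14\,\mathbb{E}\Big[\log^2\big(1-(1-\mu_{\pi_{d,t},1,1})\mu_{\pi_{d,t},2,1}\big)\Big]^{1/2} \mathbb{E}\Big[\log^2\big(1-(1-\mu_{\pi_{d,t},1,2})\mu_{\pi_{d,t},2,2}\big)\Big]^{1/2} \\
&+ \frac14\,\mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,1}(1-\mu_{\pi_{d,t},2,1})\big)\Big]^{1/2} \mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,2}(1-\mu_{\pi_{d,t},2,2})\big)\Big]^{1/2} \\
&+ \frac14\,\mathbb{E}\Big[\log^2\big(1-(1-\mu_{\pi_{d,t},1,1})(1-\mu_{\pi_{d,t},2,1})\big)\Big]^{1/2} \mathbb{E}\Big[\log^2\big(1-(1-\mu_{\pi_{d,t},1,2})(1-\mu_{\pi_{d,t},2,2})\big)\Big]^{1/2}.
\end{aligned}
\]

As $\mu_{\pi_{d,t},1,1} \stackrel{\mathrm{dist}}{=} \mu_{\pi_{d,t},1,2}$ and $\mu_{\pi_{d,t},1,1} \stackrel{\mathrm{dist}}{=} 1-\mu_{\pi_{d,t},1,1}$ thanks to Claim 10.2, we further get

\[
|\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})| \le \mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\big)\Big].
\]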


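Next, recalling (9.3),

\[
\begin{aligned}
\mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,1}\mu_{\pi_{d,t},2,1}\big)\Big] &\le \mathbb{E}\Big[\log^2\big(1-\mu_{\pi_{d,t},1,1}\big)\Big] \\
&\le \mathbb{E}\Big[\mathbb{1}\Big\{\mu_{\pi_{d,t},1,1} \le \frac12\Big\}\Big|\log^2\big(1-\mu_{\pi_{d,t},1,1}\big)\Big|\Big] + \mathbb{E}\Big[\mathbb{1}\Big\{\mu_{\pi_{d,t},1,1} > \frac12\Big\}\Big|\log^2\big(1-\mu_{\pi_{d,t},1,1}\big)\Big|\Big] \\
&\le \mathbb{E}\Big[\mathbb{1}\Big\{\mu_{\pi_{d,t},1,1} \le \frac12\Big\}\log^2 2\Big] + \mathbb{E}\Big[\mathbb{1}\Big\{\mu_{\pi_{d,t},1,1} > \frac12\Big\}\Big(\log\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}} - \log\mu_{\pi_{d,t},1,1}\Big)^2\Big] \\
&\le \log^2 2 + \mathbb{E}\Big[\log^2\Big(\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}\Big)\Big] - 2\,\mathbb{E}\Big[\mathbb{1}\Big\{\mu_{\pi_{d,t},1,1} > \frac12\Big\}\log\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}\,\log\mu_{\pi_{d,t},1,1}\Big] \\
&\le \log^2 2 + \mathbb{E}\Big[\log^2\Big(\frac{\mu_{\pi_{d,t},1,1}}{1-\mu_{\pi_{d,t},1,1}}\Big)\Big] = \log^2 2 + \mathbb{E}\big[\xi_{\rho_{d,t},1,1}^2\big].
\end{aligned}
\]

However, $\mathbb{E}[\xi_{\rho_{d,t},1,1}^2] < \infty$, since $\rho_{d,t} \in \mathcal{W}_2(\mathbb{R}^2)$. Thus, $\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t}) < \infty$.
Moreover, it is easy to see that the distribution of the marginal sample $\xi_{\rho_{d,t},1,1}$ does not depend on $t \in [0,1]$. Call this distribution $\rho_d$ and let $\xi_{\rho_d}$ be a sample from $\rho_d$. Then the previous upper bound yields

\[
\eta(d)^2 \le \int_0^1 |\mathcal{B}^{\otimes}_{d,t}(\rho_{d,t})|\,dt + |\mathcal{B}^{\otimes}_{d}(\rho_{d,0})| \le \int_0^1 \big(\log^2 2 + \mathbb{E}\big[\xi_{\rho_d}^2\big]\big)\,dt + \log^2 2 + \mathbb{E}\big[\xi_{\rho_d}^2\big] \le 2\big(\log^2 2 + \mathbb{E}\big[\xi_{\rho_d}^2\big]\big) < \infty.
\]

□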

11. PROOF OF PROPOSITION 2.12

We combine the results from the previous sections in order to analyse the variance process. As a first step we derive a rough upper bound on the potential change in the number of satisfying assignments upon insertion of a single clause (Lemmas 11.1 and 11.6). Subsequently we derive a combinatorial formula for the squared martingale difference (Lemma 11.9), which easily implies Lemma 2.4. The combinatorial formula puts us in a position to obtain an $L^2$-bound on the squared martingale difference (Lemma 11.10). With these ingredients in place, we complete the proof of Proposition 2.12 and of Theorem 1.1 in Section 11.5.

11.1. A pessimistic estimate. Let $\Phi,\Psi$ be two 2-CNFs on the same set of variables such that $\Psi$ is obtained from $\Phi$ by adding a single clause $e$. We are going to need a baseline estimate of the difference $|\log Z(\hat\Phi) - \log Z(\hat\Psi)|$. The principal difficulty here is to assess the impact of the additional clause on the pruning operation. The issue is that the extra clause may also cause additional pruning. Indeed, while clearly

\[
F(\hat\Psi) \setminus \{e\} \subseteq F(\hat\Phi),
\]

i.e., any clause $a \neq e$ that survives pruning on $\Psi$ also remains present in $\hat\Phi$, the pruned formula $\hat\Psi$ may ironically end up having strictly fewer clauses than $\hat\Phi$.

To get a handle on the potential repercussions of pruning, let $\{v,v'\} = \partial e$ be the variables that appear in clause $e$. Let $N(\Phi,v)$ be the set of all literals $l$ such that $\{v,\neg v\}\cap L(\Phi,\{l\}) \neq \emptyset$. Thus, PUC may reach $v$ or $\neg v$ once $l$ is deemed true. Observe that $v \in N(\Phi,v)$ and $\neg v \in N(\Phi,v)$. Define $N(\Phi,v')$ analogously. Further, let

\[
N(\Phi,e) = \bigcup_{l\in N(\Phi,v)\cup N(\Phi,v')} L(\Phi,\{l\}). \tag{11.1}
\]

Thus, $N(\Phi,e)$ contains all literals that PUC can reach by tracing the implications of a literal from $N(\Phi,v)\cup N(\Phi,v')$. The definition of the sets $N(\Phi,v)$, $N(\Phi,v')$ ensures that

\[
v,\neg v \in N(\Phi,v), \qquad v',\neg v' \in N(\Phi,v'). \tag{11.2}
\]
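The sets $N(\Phi,v)$ and $N(\Phi,e)$ can be made concrete on small formulas. The following is a minimal sketch, assuming (in the spirit of Lemma 4.2) that $L(\Phi,\{l\})$ is the set of literals reachable from $l$ along implication chains; the PUC procedure itself is not reproduced here. The function names and the integer encoding of literals (`x` for a variable, `-x` for its negation) are illustrative choices, not notation from the text.

```python
from collections import defaultdict

def implication_graph(clauses):
    """Map each literal to the literals it directly implies.

    A clause (a OR b) is read as the two implications -a -> b and -b -> a.
    Literals are nonzero ints; -x denotes the negation of variable x.
    """
    graph = defaultdict(set)
    for a, b in clauses:
        graph[-a].add(b)
        graph[-b].add(a)
    return graph

def reach(clauses, start):
    """Assumed stand-in for L(Phi, {start}): literals reachable from start."""
    graph = implication_graph(clauses)
    seen, stack = {start}, [start]
    while stack:
        lit = stack.pop()
        for nxt in graph[lit]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def n_var(clauses, num_vars, v):
    """Stand-in for N(Phi, v): literals whose reachable set meets {v, -v}."""
    literals = [s * x for x in range(1, num_vars + 1) for s in (1, -1)]
    return {l for l in literals if reach(clauses, l) & {v, -v}}

def n_clause(clauses, num_vars, e):
    """Stand-in for N(Phi, e) as in (11.1), for a new clause e = (l1, l2)."""
    v, w = abs(e[0]), abs(e[1])
    seeds = n_var(clauses, num_vars, v) | n_var(clauses, num_vars, w)
    out = set()
    for l in seeds:
        out |= reach(clauses, l)
    return out
```

For instance, with the single clause `(1, 2)` on three variables, `n_var([(1, 2)], 3, 1)` consists of the literals `1`, `-1` and `-2`, in line with (11.2).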

Lemma 11.1. Let $\Phi$ be a 2-CNF formula. Suppose that $\Psi$ is obtained from $\Phi$ by adding a single clause $e$. Then

\[
\left|\log(Z(\hat\Phi))-\log(Z(\hat\Psi))\right| \le |N(\Phi,e)|\log 2. \tag{11.3}
\]

As a first step towards the proof of Lemma 11.1 we observe that (11.1) can be rewritten as follows.


Claim 11.2. We have

\[
N(\Phi,e) = \bigcup_{l\in N(\Phi,v)\cup N(\Phi,v')} L(\Psi,\{l\}). \tag{11.4}
\]

Proof. Clearly $L(\Psi,\{l\}) \supseteq L(\Phi,\{l\})$ for every literal $l$. Hence, we just need to show that

\[
L(\Psi,\{l\}) \subseteq N(\Phi,e) \quad\text{for all } l \in N(\Phi,v)\cup N(\Phi,v'). \tag{11.5}
\]

Hence, let $l \in N(\Phi,v)\cup N(\Phi,v')$ and let $l' \in L(\Psi,\{l\})$. Then Lemma 4.2 shows that there exists an implication chain

\[
l = l_0, a_1, l_1, \ldots, a_k, l_k = l' \tag{11.6}
\]

comprising literals $l_i$ and clauses $a_i \in F(\Psi)$ such that $a_i \equiv l_{i-1} \to l_i$ for all $1 \le i \le k$. If $a_i \neq e$ for all $i$, then the chain (11.6) is contained in $\Phi$ and thus $l' \in L(\Phi,\{l\}) \subseteq N(\Phi,e)$. Otherwise let $1 \le j \le k$ be the largest index such that $a_j = e$. Then $l_j$ is one of the constituent literals of $a_j$ and thus $l_j \in \{v,\neg v,v',\neg v'\}$. Furthermore, the implication chain $l_j, a_{j+1}, l_{j+1}, \ldots, a_k, l_k = l'$ from $l_j$ to $l'$ is contained in $\Phi$. Therefore, (11.2) shows in combination with Lemma 4.2 and (11.1) that $l' \in L(\Phi,\{l_j\}) \subseteq N(\Phi,e)$. □

We proceed to show that $N(\Phi,e)$ contains the variables of all clauses $a \in F(\Phi)$ on which the pruning processes run on $\Phi$, $\Psi$ differ.

Claim 11.3. For any clause $a \in F(\hat\Phi)\setminus F(\hat\Psi)$ we have $\partial a \subseteq N(\Phi,e)$.

Proof. Consider a clause $a \in F(\Phi)$ that was removed by pruning applied to $\Psi$ but not by pruning applied to $\Phi$. Let $w, w'$ be the constituent literals of $a$, i.e., $a \equiv w \vee w'$. Then $\mathrm{PUC}(\Psi,\{l\})$ added $a$ to the set $C(\Psi,\{l\})$ of conflict clauses for some literal $l$. Hence,

\[
w,\neg w,w',\neg w' \in L(\Psi,\{l\}). \tag{11.7}
\]

Consequently, Lemma 4.2 shows that for each literal $k \in \{\neg w,\neg w'\}$, $\mathrm{PUC}(\Psi,\{l\})$ traverses an implication chain

\[
l_{0,k} = l,\ a_{0,k},\ l_{1,k},\ a_{1,k},\ \ldots,\ l_{j_k,k} = k
\]

of literals $l_{i,k} \in L(\Psi,\{l\})$ and clauses $a_{i,k} \equiv \neg l_{i,k} \vee l_{i+1,k} \equiv l_{i,k} \to l_{i+1,k}$ for $0 \le i < j_k$. Because $a \notin C(\Phi,l)$, at least one of these two sequences includes the clause $e$ and thus at least one of $v,\neg v$ and one of $v',\neg v'$. Hence, $l \in N(\Phi,v)\cup N(\Phi,v')$. Therefore, combining (11.4) and (11.7), we conclude that $\partial a = \{|w|,|w'|\} \subseteq N(\Phi,e)$. □

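Let $\tilde\Phi$ be the formula obtained from $\hat\Phi$ by removing all variables $x \in V(\hat\Phi)$ such that $\{x,\neg x\}\cap N(\Phi,e) \neq \emptyset$ along with their adjacent clauses.

Claim 11.4. For any $\tilde\sigma \in S(\tilde\Phi)$ there exists $\sigma \in S(\hat\Psi)$ such that $\sigma_x = \tilde\sigma_x$ for all $x \in V(\tilde\Phi)$.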

Proof. Let $\check\Psi$ be a CNF with variables

\[
V(\check\Psi) = V(\hat\Psi)\setminus V(\tilde\Phi) = \{x \in V(\hat\Phi) : \{x,\neg x\}\cap N(\Phi,e) \neq \emptyset\}.
\]

The clauses of $\check\Psi$ include all $a \in F(\hat\Psi)$ such that $\partial a \subseteq V(\check\Psi)$. Additionally, for every clause $a \in F(\hat\Psi)$ that contains exactly one literal $l$ with $|l| \in V(\check\Psi)$ we include the literal $l$ as a unit clause into $\check\Psi$. In light of Claim 11.3, to prove the assertion it suffices to show that $\check\Psi$ is satisfiable. For then we could extend any $\sigma \in S(\tilde\Phi)$ to a satisfying assignment of $\hat\Psi$ by simply setting the variables $x \in V(\hat\Psi)\setminus V(\tilde\Phi)$ in accordance with a satisfying assignment of $\check\Psi$.

As in the proof of Fact 2.2, to construct a satisfying assignment of $\check\Psi$ we fix an order $l_1,\ldots,l_k$ of the literals $N(\Phi,e)$. Let $\sigma_i$ be the assignment that PUC outputs on input $\Psi,\{l_i\}$. Further, define a $\{0,\pm1\}$-valued assignment $(\sigma_x)_{x\in V(\check\Psi)}$ by letting $\sigma_x = \sigma_{i,x}$ for the least index $i$ such that $\{x,\neg x\}\cap L(\Psi,l_i) \neq \emptyset$.

We claim that

\[
\forall a \in F(\check\Psi)\ \exists x \in \partial_{\check\Psi}a : \sigma_x = \mathrm{sign}(x,a); \tag{11.8}
\]

thus, we can turn $\sigma$ into a satisfying assignment of $\check\Psi$ by assigning those variables $y$ with $\sigma_y = 0$ arbitrarily. To verify (11.8), we consider two cases separately.



Case 1: $|\partial_{\check\Psi}a| = 2$: then $a \in F(\Psi)$. Let $\partial a = \{x,x'\}$ and let $i$ be the smallest index such that $L(\Psi,l_i)\cap\{x,\neg x,x',\neg x'\} \neq \emptyset$. Also let $l,l'$ be the constituent literals of $a$ such that $|l| = x$ and $|l'| = x'$. Suppose that $l \in L(\Psi,\{l_i\})$. If $\neg l \notin L(\Psi,l_i)$, then $\sigma_x = \mathrm{sign}(x,a)$ by construction. Hence, assume that $l,\neg l \in L(\Psi,l_i)$. Then the construction in Steps 1–2 of PUC ensures that $l' \in L(\Psi,l_i)$ as well. Moreover, if $\neg l' \notin L(\Psi,l_i)$, then $\sigma_{x'} = \mathrm{sign}(x',a)$. Finally, the case $l,l',\neg l,\neg l' \in L(\Psi,l_i)$ cannot occur because otherwise $a$ would have been pruned, i.e., $a \notin F(\hat\Psi)$.

Case 2: $|\partial_{\check\Psi}a| = 1$: there exists a clause $b \in F(\hat\Psi)$ and literals $l,l'$ with $|l'| \notin V(\check\Psi)$ such that $b = l \vee l'$ and $a = l$. Let $i$ be the least index such that $\{l,\neg l\}\cap L(\Psi,\{l_i\}) \neq \emptyset$. If $\neg l \in L(\Psi,\{l_i\})$, then $\mathrm{PUC}(\Psi,\{l_i\})$ would have added $l'$ to the set $L(\Psi,\{l_i\})$ as well and thus $|l'| \in V(\check\Psi)$. But $|l'| \notin V(\check\Psi)$. Hence, $\{l,\neg l\}\cap L(\Psi,\{l_i\}) = \{l\}$ and thus $\sigma_{|l|} = \sigma_{i,|l|} = \mathrm{sign}_{\Psi}(|l|,b) = \mathrm{sign}_{\check\Psi}(|l|,a)$.

Thus, in either case $\sigma$ satisfies clause $a$. □

In perfect analogy to the above let $\tilde\Psi$ be the formula obtained from $\hat\Psi$ by removing all variables $x \in V(\hat\Psi)$ such that $\{x,\neg x\}\cap N(\Phi,e) \neq \emptyset$, along with their adjacent clauses.

Claim 11.5. For any $\tilde\sigma \in S(\tilde\Psi)$ there exists $\sigma \in S(\hat\Phi)$ such that $\sigma_x = \tilde\sigma_x$ for all $x \in V(\tilde\Psi)$.

Proof. Let $\check\Phi$ be a CNF with variables $V(\check\Phi) = V(\hat\Phi)\setminus V(\tilde\Psi)$. Include in $\check\Phi$ all $a \in F(\hat\Phi)$ with $\partial a \subseteq V(\check\Phi)$. Moreover, for every $a \in F(\hat\Phi)$ that contains exactly one literal $l$ with $|l| \in V(\check\Psi)$ add $l$ as a clause to $\check\Phi$. As in the proof of Claim 11.4 it suffices to construct a satisfying assignment of $\check\Phi$. Due to (11.1) the same argument as in the proof of Claim 11.4 extends. □

Proof of Lemma 11.1. We use Claim 11.4 to prove that $Z(\hat\Phi) \le 2^{|N(\Phi,e)|}Z(\hat\Psi)$; similar reasoning based on Claim 11.5 yields the reverse bound. To show the desired bound split a satisfying assignment $\sigma \in S(\hat\Phi)$ up into two parts $\tilde\sigma = (\sigma_x)_{\{x,\neg x\}\cap N(\Phi,e)=\emptyset}$, $\check\sigma = (\sigma_x)_{\{x,\neg x\}\cap N(\Phi,e)\neq\emptyset}$. Claim 11.4 shows that the number of possible first parts $\tilde\sigma$ for $\sigma \in S(\hat\Phi)$ is bounded by $Z(\hat\Psi)$, because every $\tilde\sigma$ extends to a satisfying assignment of $\hat\Psi$. Moreover, the total number of possible second parts is bounded by $2^{|N(\Phi,e)|}$. □

11.2. A tail bound. As a next step we are going to derive a bound on the r.h.s. of (11.3) on random formulas. More specifically, obtain the formula $\Phi'$ from $\Phi$ by deleting the last clause $a_m$. Let $N' = |N(\Phi',a_m)|$.

Lemma 11.6. There exists $c = c(d) > 0$ such that for all $t > c$ we have

\[
\mathbb{P}\left[N' > t^2\right] \le c\exp(-t/c).
\]

As a first step we are going to estimate the size of the set $N(\Phi',x_1)$ that contains all literals $l$ such that $L(\Phi',l)\cap\{x_1,\neg x_1\} \neq \emptyset$.

Claim 11.7. There exists $c_1 = c_1(d) > 0$ such that for all $t > c_1$ we have $\mathbb{P}\left[|N(\Phi',x_1)| > t\right] \le c_1\exp(-t/c_1)$.

Proof. We use a classical branching process argument. Let $R$ be the set of literals $l$ such that $x_1 \in L(\Phi',l)$. By symmetry it suffices to bound $|R|$.

For every $l \in R$ there exists an alternating sequence $l = l_0, a_1, l_1, a_2, \ldots, l_k = x_1$ of literals and clauses such that $a_i \equiv \neg l_{i-1} \vee l_i$. Flipping the negations along this sequence yields a reverse sequence $l'_0 = \neg x_1 = \neg l_k$, $a'_1 = a_k$, $l'_1 = \neg l_{k-1}$, \ldots, $l'_k = \neg l$ such that $a'_i \equiv \neg l'_{i-1} \vee l'_i$. Hence, $R$ is precisely the set of literals $l$ that are reachable from $x_1$ via such an alternating sequence $l'_0, a'_1, \ldots, l'_k$. Furthermore, for any literal $l$ the expected number of clauses $a_i$ such that $a_i \equiv l \vee l'$ for some other literal $l'$ equals $m/2n \sim d/2$. Therefore, $|R|$ is stochastically dominated by the progeny of a branching process with offspring $\mathrm{Po}(d/2)$. Standard branching process tail bounds therefore yield the desired bound on $|R|$. □
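The exponential tail of the dominating branching process can be checked numerically. The sketch below simulates the total progeny of a Galton-Watson process with $\mathrm{Po}(\lambda)$ offspring, where $\lambda = d/2$; it assumes $\lambda < 1$ so that the process dies out almost surely. The helper names and the safety cap are illustrative and not part of the argument above.

```python
import math
import random

def poisson(lam, rng):
    """Sample Po(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def total_progeny(lam, rng, cap=10**6):
    """Total number of individuals in a Galton-Watson tree with Po(lam) offspring.

    The exploration stops at `cap` individuals, which only matters if lam >= 1.
    """
    alive, total = 1, 1
    while alive > 0 and total < cap:
        children = poisson(lam, rng)
        alive += children - 1   # one individual is explored, its children join
        total += children
    return total

def empirical_tail(lam, t, samples, rng):
    """Fraction of sampled trees whose total progeny exceeds t."""
    return sum(total_progeny(lam, rng) > t for _ in range(samples)) / samples
```

For $\lambda < 1$ the mean total progeny is $1/(1-\lambda)$, so with $d = 1$ the sample mean should be close to 2, and `empirical_tail` should drop off quickly as `t` grows.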

Claim 11.8. There exists $c_2 = c_2(d) > 0$ such that for all $t > c_2$ and for every literal $l \neq x_1$ we have

\[
\mathbb{P}\left[|L(\Phi',l)| > t \mid x_1 \in L(\Phi',l)\right] \le c_2\exp(-t/c_2).
\]

Proof. We combine a branching process argument with Bayes’ formula. Specifically, because the formula $\Phi'$ is random, the set $L(\Phi',l)\setminus\{l\}$ is random given its size. Hence, for an integer $\ell$ we have

\[
\mathbb{P}\left[x_1 \in L(\Phi',l) \mid |L(\Phi',l)| = \ell\right] = \frac{\ell-1}{2n-1}. \tag{11.9}
\]



Furthermore, the size $|L(\Phi',l)|$ is stochastically dominated by the progeny of a branching process with offspring $\mathrm{Po}(d/2)$. Therefore, there exists $c'_2 = c'_2(d) > 0$ such that for all $t > c'_2$ we have

\[
\mathbb{P}\left[|L(\Phi',l)| > t\right] \le c'_2\exp(-t/c'_2). \tag{11.10}
\]

Moreover, for any $d > 0$ there exists $c''_2 = c''_2(d) > 0$ such that

\[
\mathbb{P}\left[x_1 \in L(\Phi',l)\right] \ge c''_2/n. \tag{11.11}
\]

Hence, combining (11.9)–(11.11) with Bayes’ rule, we obtain for $\ell > c'_2$,

\[
\mathbb{P}\left[|L(\Phi',l)| = \ell \mid x_1 \in L(\Phi',l)\right]
\le \frac{\mathbb{P}\left[x_1 \in L(\Phi',l) \mid |L(\Phi',l)| = \ell\right]\,\mathbb{P}\left[|L(\Phi',l)| = \ell\right]}{\mathbb{P}\left[x_1 \in L(\Phi',l)\right]}
\le \frac{c'_2}{c''_2}\,\ell\exp(-\ell/c'_2),
\]

which implies the assertion. □

Proof of Lemma 11.6. Let $R(\ell)$ be the event that there exists $l \in N(\Phi',x_1)\setminus\{x_1\}$ such that $|L(\Phi',l)| > \ell$. Claim 11.7 implies that there exists $c_3 = c_3(d) > 0$ such that

\[
\mathbb{P}\left[x_1 \in L(\Phi',l)\right] \le c_3/n. \tag{11.12}
\]

Hence, by Claim 11.8, (11.12) and the union bound,

\[
\mathbb{P}\left[R(\ell)\right] \le \sum_{l\neq x_1}\mathbb{P}\left[x_1 \in L(\Phi',l),\ |L(\Phi',l)| > \ell\right] \le 2c_2c_3\exp(-\ell/c_2). \tag{11.13}
\]

Furthermore, Claim 11.7 shows that

\[
\mathbb{P}\left[|N(\Phi',x_1)\setminus\{x_1\}| > \ell\right] \le c_1\exp(-\ell/c_1). \tag{11.14}
\]

Combining (11.13) and (11.14), we obtain

\[
\mathbb{P}\left[\sum_{l\in N(\Phi',x_1)}|L(\Phi',l)| > \ell^2\right]
\le \mathbb{P}\left[R(\ell)\right]+\mathbb{P}\left[|N(\Phi',x_1)\setminus\{x_1\}| > \ell\right]
\le c_1\exp(-\ell/c_1)+2c_2c_3\exp(-\ell/c_2). \tag{11.15}
\]

By symmetry the same bound holds with $x_1$ replaced by $\neg x_1$. Therefore, the assertion follows from (11.15) and the union bound. □

11.3. The squared martingale difference. We derive a combinatorial formula for the squared martingale differences $X_i^2$. Let

\begin{align*}
\Delta(M) &= \log\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log\left(\frac{Z(\hat\Phi_2(M,m-M))}{Z(\hat\Phi_2(M-1,m-M))}\right),\\
\Delta'(M) &= \log\left(\frac{Z(\hat\Phi_1(M-1,m-M+1))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log\left(\frac{Z(\hat\Phi_2(M-1,m-M+1))}{Z(\hat\Phi_2(M-1,m-M))}\right),\\
\Delta''(M) &= \log\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log\left(\frac{Z(\hat\Phi_2(M-1,m-M+1))}{Z(\hat\Phi_2(M-1,m-M))}\right).
\end{align*}

Lemma 11.9. We have $mX_M^2 = \mathbb{E}\left[\Delta(M)+\Delta'(M)-2\Delta''(M) \mid \mathcal{F}_M\right]$.

Proof. This follows from a direct computation. □

Proof of Lemma 2.4. Lemma 2.4 is an immediate consequence of Lemma 11.9. □

11.4. An L2-bound. The following L2-bound will enable us to deal with error terms.

Lemma 11.10. Uniformly for all $1 \le M \le m$ we have $\mathbb{E}\left[\Delta(M)^2+\Delta'(M)^2+\Delta''(M)^2\right] = O(1)$.


Proof. We will bound $\mathbb{E}[\Delta(M)^2]$; the bounds on the other two terms follow analogously. Invoking the Cauchy-Schwarz inequality, we obtain

\begin{align*}
\mathbb{E}\left[\Delta(M)^2\right]
&= \mathbb{E}\left[\log^2\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log^2\left(\frac{Z(\hat\Phi_2(M,m-M))}{Z(\hat\Phi_2(M-1,m-M))}\right)\right]\\
&\le \mathbb{E}\left[\log^4\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\right]^{1/2}\mathbb{E}\left[\log^4\left(\frac{Z(\hat\Phi_2(M,m-M))}{Z(\hat\Phi_2(M-1,m-M))}\right)\right]^{1/2}\\
&= \mathbb{E}\left[\log^4\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\right]. \tag{11.16}
\end{align*}

Furthermore, the random formula $\Phi_1(M,m-M)$ is obtained from $\Phi_1(M-1,m-M)$ by adding a single random clause $a_M$, which is independent of $\Phi_1(M-1,m-M)$. Therefore, Lemma 11.1 implies that

\[
\log\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right) \le |N(\Phi_1(M-1,m-M),a_M)|\log 2. \tag{11.17}
\]

Moreover, since $|N(\Phi_1(M-1,m-M),a_M)|$ has the same distribution as the random variable $N'$ from Lemma 11.6, we obtain

\[
\mathbb{E}\left[|N(\Phi_1(M-1,m-M),a_M)|^4\right] = O(1) \tag{11.18}
\]

uniformly for all $M$. Finally, the assertion follows from (11.16)–(11.18). □
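The passage from the tail bound of Lemma 11.6 to the fourth-moment bound (11.18) can be spelled out as follows. This is a sketch that uses only the statement $\mathbb{P}[N' > t^2] \le c\exp(-t/c)$ for $t > c$; the explicit constant on the right-hand side is an illustration and is not optimized.

```latex
\begin{align*}
\mathbb{E}\left[N'^4\right]
 &= \int_0^\infty \mathbb{P}\left[N'^4 > s\right] ds
  = \int_0^\infty 8t^7\,\mathbb{P}\left[N' > t^2\right] dt
  && \text{(substituting } s = t^8\text{)}\\
 &\le \int_0^c 8t^7\,dt + \int_c^\infty 8t^7\,c\exp(-t/c)\,dt\\
 &\le c^8 + 8c\int_0^\infty t^7\exp(-t/c)\,dt
  = c^8 + 8c\cdot 7!\,c^8 = c^8 + 8!\,c^9 .
\end{align*}
```

Since $c = c(d)$ depends only on $d$, the bound is uniform in $M$.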

To facilitate the following steps we introduce truncated versions of $\Delta(M)$, $\Delta'(M)$, $\Delta''(M)$: for $B > 0$ and $x > 0$ let

\[
\Lambda_B(x) =
\begin{cases}
-B & \text{if } \log(x) < -B,\\
B & \text{if } \log(x) > B,\\
\log(x) & \text{otherwise.}
\end{cases}
\]
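In computational terms $\Lambda_B$ is simply the logarithm clamped to the interval $[-B,B]$. A minimal sketch, with the hypothetical function names `truncated_log` and `truncated_product` chosen for illustration:

```python
import math

def truncated_log(x, B):
    """Lambda_B(x): log(x) clamped to the interval [-B, B], for x > 0 and B > 0."""
    if x <= 0 or B <= 0:
        raise ValueError("truncated_log expects x > 0 and B > 0")
    return max(-B, min(B, math.log(x)))

def truncated_product(ratio1, ratio2, B):
    """Product of two truncated logarithms; its absolute value is at most B**2."""
    return truncated_log(ratio1, B) * truncated_log(ratio2, B)
```

The point of the truncation is that any product of two such terms is bounded by $B^2$ in absolute value, which makes bounded-convergence and law-of-large-numbers arguments available.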

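Further, let

\begin{align*}
\Delta_B(M) &= \Lambda_B\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\Lambda_B\left(\frac{Z(\hat\Phi_2(M,m-M))}{Z(\hat\Phi_2(M-1,m-M))}\right),\\
\Delta'_B(M) &= \Lambda_B\left(\frac{Z(\hat\Phi_1(M-1,m-M+1))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\Lambda_B\left(\frac{Z(\hat\Phi_2(M-1,m-M+1))}{Z(\hat\Phi_2(M-1,m-M))}\right),\\
\Delta''_B(M) &= \Lambda_B\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\Lambda_B\left(\frac{Z(\hat\Phi_2(M-1,m-M+1))}{Z(\hat\Phi_2(M-1,m-M))}\right).
\end{align*}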

Combining Lemma 11.10 with the Cauchy-Schwarz inequality, we obtain the following.

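Corollary 11.11. For any $\varepsilon > 0$ there exists $B > 0$ such that for all $1 \le M \le m$ we have

\[
\mathbb{E}\left|\Delta(M)-\Delta_B(M)\right|+\mathbb{E}\left|\Delta'(M)-\Delta'_B(M)\right|+\mathbb{E}\left|\Delta''(M)-\Delta''_B(M)\right| < \varepsilon.
\]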

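11.5. The variance process. In light of Lemma 11.9, to prove Proposition 2.12 we need to show that

\[
\frac{1}{m}\sum_{M=1}^{m}\mathbb{E}\left[\Delta(M)+\Delta'(M)-2\Delta''(M) \mid \mathcal{F}_M\right] \to \eta(d)^2 \quad\text{in probability.}
\]

To this end we divide the above sum up into batches $\bar\Sigma(L,L') = \Sigma(L,L')+\Sigma'(L,L')-2\Sigma''(L,L')$, where

\begin{align*}
\Sigma(L,L') &= \frac{1}{L'-L}\sum_{M=L}^{L'-1}\mathbb{E}\left[\Delta(M) \mid \mathcal{F}_M\right],\\
\Sigma'(L,L') &= \frac{1}{L'-L}\sum_{M=L}^{L'-1}\mathbb{E}\left[\Delta'(M) \mid \mathcal{F}_M\right],\\
\Sigma''(L,L') &= \frac{1}{L'-L}\sum_{M=L}^{L'-1}\mathbb{E}\left[\Delta''(M) \mid \mathcal{F}_M\right].
\end{align*}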



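Then for any sequence $1 = L_0 < \cdots < L_k = m$ we have

\[
\frac{1}{n}\sum_{M=1}^{m}\mathbb{E}\left[\Delta(M)+\Delta'(M)-2\Delta''(M) \mid \mathcal{F}_M\right] = \sum_{i=1}^{k}\frac{L_i-L_{i-1}}{n}\,\bar\Sigma(L_{i-1},L_i).
\]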

The following lemma is the centerpiece of the proof.

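Lemma 11.12. For any $\varepsilon > 0$ there exists $\omega > 0$ such that uniformly for all $1 \le L < L' \le m$ with $\omega \le L'-L \le 2\omega$ we have

\[
\mathbb{E}\left|\Sigma(L,L')-B^{\otimes}_{d,t}(\pi_{d,t})\right|+\mathbb{E}\left|\Sigma'(L,L')-B^{\otimes}_{d,0}(\pi_{d,0})\right|+\mathbb{E}\left|\Sigma''(L,L')-B^{\otimes}_{d,0}(\pi_{d,0})\right| < \varepsilon+o(1), \quad\text{where } t = L/m.
\]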

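We will carry out the details for the first term $\mathbb{E}|\Sigma(L,L')-B^{\otimes}_{d,t}(\pi_{d,t})|$, which is the most delicate; similar but slightly simpler steps yield the other two estimates. We begin by replacing $\Delta(M)$ by its truncated version $\Delta_B(M)$. Accordingly, let

\begin{align*}
\Sigma_B(L,L') &= \frac{1}{L'-L}\sum_{M=L}^{L'-1}\mathbb{E}\left[\Delta_B(M) \mid \mathcal{F}_M\right],\\
\Sigma'_B(L,L') &= \frac{1}{L'-L}\sum_{M=L}^{L'-1}\mathbb{E}\left[\Delta'_B(M) \mid \mathcal{F}_M\right],\\
\Sigma''_B(L,L') &= \frac{1}{L'-L}\sum_{M=L}^{L'-1}\mathbb{E}\left[\Delta''_B(M) \mid \mathcal{F}_M\right].
\end{align*}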

Claim 11.13. For any $\varepsilon > 0$ there exists $B_0 > 0$ such that for all $B > B_0$ and all $L,L' > 0$ we have

\[
\mathbb{E}\left|\Sigma(L,L')-\Sigma_B(L,L')\right| < \varepsilon+o(1).
\]

Proof. This is an immediate consequence of Corollary 11.11. □

We proceed to relate the change in the pruned partition function to the marginal distribution of the truth values of the variables of the additional clause $a_M$.

Claim 11.14. Let $B > 0$. W.h.p. we have

\[
\Lambda_B\left(\frac{Z(\hat\Phi_h(M,m-M))}{Z(\hat\Phi_h(M-1,m-M))}\right) = \Lambda_B\left(1-\prod_{y\in\partial a_M}\mu_{\hat\Phi_h(M-1,m-M)}\left(\sigma_y \neq \mathrm{sign}(y,a_M)\right)\right)+o(1) \qquad (h = 1,2).
\]

Proof. Since the function $\Lambda_B$ is bounded and continuous, this follows from Proposition 2.5. □

A combinatorial interpretation of $\Sigma(L,L')$ is that the sum gauges the cumulative effect of adding a total of $L'-L$ ‘shared’ clauses, one after the other. Claim 11.14 expresses the effect of adding a shared clause in terms of the marginals of the formula $\hat\Phi_h(M-1,m-M)$. So long as the total number $L'-L$ of clauses added is not too large, we may expect that this marginal distribution does not shift all too much as we add clauses one by one. This is what the following claim verifies.

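Claim 11.15. Let $t = M/m$. If $L'-L = O(1)$, then w.h.p. we have

\[
\sum_{M=L}^{L'-1}W_1\left(\pi_{\hat\Phi_1(M-1,m-M),\hat\Phi_2(M-1,m-M)},\pi_{d,t}\right) = o(1).
\]

Proof. This follows from Corollary 2.10. □

As a next step we truncate the functional $B^{\otimes}_{d,t}$ from (1.4). Hence, for $B > 0$ let

\begin{align*}
B^{\otimes}_{B,d,t}(\pi) = \mathbb{E}\Big[
&\Lambda_B\left(1-\left(\mathbf{1}\{r_{-1,1}=-1\}+r_{-1,1}\mu_{\pi,-1,1,1}\right)\left(\mathbf{1}\{r_{-1,2}=-1\}+r_{-1,2}\mu_{\pi,-1,2,1}\right)\right)\\
&\cdot\Lambda_B\left(1-\left(\mathbf{1}\{r_{-1,1}=-1\}+r_{-1,1}\mu_{\pi,-1,1,2}\right)\left(\mathbf{1}\{r_{-1,2}=-1\}+r_{-1,2}\mu_{\pi,-1,2,2}\right)\right)\Big]. \tag{11.19}
\end{align*}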

Claim 11.16. For any $\varepsilon > 0$ there exists $B_0 > 0$ such that for all $B > B_0$ and all $t \in [0,1]$ we have

\[
\left|B^{\otimes}_{d,t}(\pi_{d,t})-B^{\otimes}_{B,d,t}(\pi_{d,t})\right| < \varepsilon.
\]

Proof. Since $B^{\otimes}_{B,d,t}(\pi_{d,t}) \uparrow B^{\otimes}_{d,t}(\pi_{d,t})$ as $B \to \infty$, this follows from Proposition 2.8. □


Proof of Lemma 11.12. Lemma 10.1 ensures that $B^{\otimes}_{d,t}(\pi_{d,t}) < \infty$ for all $t$. Moreover, Claims 11.13 and 11.16 imply that we just need to show that for large $B > 0$,

\[
\mathbb{E}\left|\Sigma_B(L,L')-B^{\otimes}_{B,d,t}(\pi_{d,t})\right| < \varepsilon. \tag{11.20}
\]

Let $t = L/m$ and let $(S_M)_{L\le M<L'}$ be independent copies of the random variable

\[
\prod_{h=1}^{2}\Lambda_B\left(1-\left(\mathbf{1}\{r_{-1,1}=-1\}+r_{-1,1}\mu_{\pi_{d,t},-1,1,h}\right)\left(\mathbf{1}\{r_{-1,2}=-1\}+r_{-1,2}\mu_{\pi_{d,t},-1,2,h}\right)\right).
\]

Furthermore, let

\[
D_M = \prod_{h=1}^{2}\Lambda_B\left(\frac{Z(\hat\Phi_h(M,m-M))}{Z(\hat\Phi_h(M-1,m-M))}\right).
\]

Then Claims 11.14–11.15 show that

\[
\mathbb{E}\left[W_1\left(\sum_{M=L}^{L'-1}D_M,\ \sum_{M=L}^{L'-1}S_M\right)\right] = o(1). \tag{11.21}
\]

Finally, since $\sum_{M=L}^{L'-1}S_M$ is a sum of bounded independent random variables, (11.20) follows from (11.21) and the strong law of large numbers. □
Proof of Proposition 2.12. The second equality in (2.14) follows from Corollary 11.11, Lemma 11.12 and the triangle inequality. Thus, we are left to verify the first condition. Since Lemma 11.9 shows that

\[
X_M^2 = \frac{1}{m}\mathbb{E}\left[\Delta(M)+\Delta'(M)-2\Delta''(M) \mid \mathcal{F}_M\right],
\]

it suffices to prove that

\[
\mathbb{E}\max_{1\le M\le m}\Delta(M)^2+\mathbb{E}\max_{1\le M\le m}\Delta'(M)^2+\mathbb{E}\max_{1\le M\le m}\Delta''(M)^2 = o(m). \tag{11.22}
\]

We will bound the first expectation; similar arguments apply to the others.
Retracing the steps of the proof of Lemma 11.10, we write

\[
\Delta(M)^2 = \log^2\left(\frac{Z(\hat\Phi_1(M,m-M))}{Z(\hat\Phi_1(M-1,m-M))}\right)\cdot\log^2\left(\frac{Z(\hat\Phi_2(M,m-M))}{Z(\hat\Phi_2(M-1,m-M))}\right).
\]

Since $\Phi_1(M,m-M)$ is obtained from $\Phi_1(M-1,m-M)$ by adding the random clause $a_M$, Lemma 11.1 implies that

\[
\log\left(\frac{Z(\hat\Phi_h(M,m-M))}{Z(\hat\Phi_h(M-1,m-M))}\right) \le |N(\Phi_h(M-1,m-M),a_M)|\log 2 \qquad (h = 1,2). \tag{11.23}
\]

As $|N(\Phi_h(M-1,m-M),a_M)|$ has the same distribution as $N'$ from Lemma 11.6, we obtain

\[
\mathbb{E}\left[|N(\Phi_1(M-1,m-M),a_M)|^4\right] = O(1)
\]

uniformly for all $M$. Therefore, Markov’s inequality implies

\[
\mathbb{P}\left[|N(\Phi_1(M-1,m-M),a_M)| > m^{1/3}\right] \le O(m^{-4/3}). \tag{11.24}
\]

Finally, (11.22) follows from (11.23), (11.24) and the union bound. □
As a final preparation towards the proof of Corollary 2.11 we need a lower bound on the variance of $\log Z(\hat\Phi)$.

Lemma 11.17. We have $\mathrm{Var}(\log Z(\hat\Phi)) = \Omega(n)$.

Proof. Let $\mathcal{C}$ be the set of isolated sub-formulas of $\hat\Phi$ with precisely two clauses and three variables that are acyclic and whose unique variable of degree two appears with the same sign in both its adjacent clauses. Moreover, let $\mathcal{C}'$ be the set of isolated sub-formulas of $\hat\Phi$ with precisely two clauses and three variables that are acyclic such that the unique variable of degree two appears with two different signs in its adjacent clauses. Then $\mathbb{E}|\mathcal{C}| = \mathbb{E}|\mathcal{C}'| = \Omega(n)$ and w.h.p. we have

\[
\mathrm{Var}\left(|\mathcal{C}| \mid |\mathcal{C}|+|\mathcal{C}'|\right) = \mathrm{Var}\left(|\mathcal{C}'| \mid |\mathcal{C}|+|\mathcal{C}'|\right) = \Omega(n). \tag{11.25}
\]

Additionally, for each sub-formula $C \in \mathcal{C}$ we have $Z(C) = 5$, while for $C' \in \mathcal{C}'$ we have $Z(C') = 4$. Since with the sum ranging over the connected components $C$ of $\hat\Phi$ we have $\log Z(\hat\Phi) = \sum_C \log Z(C)$, the assertion follows from (11.25). □
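The two counts $Z(C) = 5$ and $Z(C') = 4$ can be confirmed by brute force. The sketch below reads the sub-formulas as paths consisting of two 2-clauses on three variables, with the middle variable shared by both clauses; this reading, the function name `count_solutions` and the integer encoding of literals are assumptions made for the illustration.

```python
from itertools import product

def count_solutions(clauses, num_vars):
    """Brute-force Z: number of assignments satisfying every 2-clause.

    A clause is a pair of nonzero ints; literal x means variable x is true,
    -x means variable x is false. Variables are numbered 1..num_vars.
    """
    def holds(lit, assignment):
        return assignment[abs(lit) - 1] == (lit > 0)

    return sum(
        all(holds(a, s) or holds(b, s) for a, b in clauses)
        for s in product([False, True], repeat=num_vars)
    )

same_sign = [(1, 2), (2, 3)]        # middle variable 2 appears positively twice
opposite_sign = [(1, 2), (-2, 3)]   # middle variable 2 appears with both signs
print(count_solutions(same_sign, 3), count_solutions(opposite_sign, 3))  # prints: 5 4
```

Because the two types of component contribute $\log 5$ and $\log 4$ respectively, fluctuations in how many components are of each type translate directly into fluctuations of $\log Z(\hat\Phi)$.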



Proof of Corollary 2.11. The corollary is an immediate consequence of Lemma 2.4, Lemma 10.1,
Corollary 11.11, Lemma 11.12 and Lemma 11.17. □

12. PROOF OF THEOREM 1.1

We derive Theorem 1.1 from the following general martingale central limit theorem, which is a special case of [33,
Theorem 3.2] (see also the subsequent remark there).

Theorem 12.1 ([33, Theorem 3.2]). Let $(Z_{n,i},\mathcal{F}_{n,i})_{0\le i\le m_n,\,n\ge1}$ be a zero-mean, square-integrable martingale array with differences $X_{n,i} = Z_{n,i}-Z_{n,i-1}$ for $1 \le i \le m_n$. Assume that there exists a constant $\eta^2$ such that

\begin{align*}
&\lim_{n\to\infty}\max_{1\le i\le m_n}|X_{n,i}| = 0 \quad\text{in probability,} \tag{12.1}\\
&\lim_{n\to\infty}\sum_{i=1}^{m_n}X_{n,i}^2 = \eta^2 \quad\text{in probability,} \tag{12.2}\\
&\mathbb{E}\left[\max_{1\le i\le m_n}X_{n,i}^2\right] \text{ is bounded in } n. \tag{12.3}
\end{align*}

Then $Z_{n,m_n}$ converges in distribution to a Gaussian distribution with mean zero and variance $\eta^2$.

Proof of Theorem 1.1. We apply Theorem 12.1 to the filtration $(\mathcal{F}_{n,M})_{0\le M\le m_n}$ from Section 2.8 and to the Doob martingale $(Z_{n,M}-\mathbb{E}[Z_{n,M}])_M$ from (2.13). This is zero-mean by construction and square-integrable, as $\log Z(\hat\Phi)$ is non-negative and bounded above by $n$. Let $X_{n,M} = Z_{n,M}-Z_{n,M-1}$ be the martingale differences. Proposition 2.12 immediately implies conditions (12.1)–(12.2) of Theorem 12.1 since $L^1$-convergence implies convergence in probability. Condition (12.3) also follows from Proposition 2.12 by observing that

\[
\mathbb{E}\left[\max_{1\le M\le m_n}X_{n,M}^2\right] \le \mathbb{E}\left[\sum_{M=1}^{m_n}X_{n,M}^2\right] \le \mathbb{E}\left|\sum_{M=1}^{m_n}X_{n,M}^2-\eta(d)^2\right|+\eta(d)^2.
\]

Furthermore, Lemma 10.1 guarantees that $\eta(d) < \infty$, while Corollary 2.11 shows that $\eta(d) > 0$. Thus, the assertion follows from Theorem 12.1. □
Acknowledgement. Amin Coja-Oghlan’s research is supported by DFG CO 646/3, DFG CO 646/5 and DFG CO 646/6. Pavel Zakharov’s research is supported by DFG CO 646/6. Haodong Zhu’s research is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 945045, and by the NWO Gravitation project NETWORKS under grant no. 024.002.003. Noela Müller’s research is supported by the NWO Gravitation project NETWORKS under grant no. 024.002.003. We thank Nicola Kistler for helpful discussions, and particularly for bringing [19] to our attention.

REFERENCES

[1] E. Abbe, S. Li, A. Sly: Proof of the contiguity conjecture and lognormal limit for the symmetric perceptron. Proc. 62nd FOCS (2022) 327–338.
[2] E. Abbe, A. Montanari: On the concentration of the number of solutions of random satisfiability formulas. Random Structures and Algorithms 45 (2014) 362–382.
[3] D. Achlioptas, A. Chtcherba, G. Istrate, C. Moore: The phase transition in 1-in-k SAT and NAE 3-SAT. Proc. 12th SODA (2001) 721–722.
[4] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
[5] D. Achlioptas, A. Coja-Oghlan, M. Hahn-Klimroth, J. Lee, N. Müller, M. Penschuck, G. Zhou: The number of satisfying assignments of random 2-SAT formulas. Random Structures and Algorithms 58 (2021) 609–647.
[6] D. Achlioptas, C. Moore: Random k-SAT: two moments suffice to cross a sharp threshold. SIAM Journal on Computing 36 (2006) 740–762.
[7] D. Achlioptas, A. Naor, Y. Peres: Rigorous location of phase transitions in hard optimization problems. Nature 435 (2005) 759–764.
[8] D. Achlioptas, Y. Peres: The threshold for random k-SAT is 2^k ln 2 − O(k). Journal of the AMS 17 (2004) 947–973.
[9] D. Aldous, J. Steele: The objective method: probabilistic combinatorial optimization and local weak convergence. In: H. Kesten (ed.): Probability on Discrete Structures. Springer (2004).
[10] B. Aspvall, M.F. Plass, R.E. Tarjan: A linear-time algorithm for testing the truth of certain quantified boolean formulas. Information Processing Letters 8 (1979) 121–123.
[11] P. Ayre, A. Coja-Oghlan, P. Gao, N. Müller: The satisfiability threshold for random linear equations. Combinatorica 40 (2020) 179–235.
[12] V. Bapst, A. Coja-Oghlan, C. Efthymiou: Planting colourings silently. Combinatorics, Probability and Computing 26 (2017) 338–366.
[13] V. Bapst, A. Coja-Oghlan, S. Hetterich, F. Rassmann, D. Vilenchik: The condensation phase transition in random graph coloring. Communications in Mathematical Physics 341 (2016) 543–606.
[14] P. J. Bickel, D. A. Freedman: Some asymptotic theory for the bootstrap. Annals of Statistics 9 (1981) 1196–1217.
[15] B. Bollobás, C. Borgs, J. Chayes, J. Kim, D. Wilson: The scaling window of the 2-SAT transition. Random Structures and Algorithms 18 (2001) 201–256.



[16] S. Cao: Central limit theorems for combinatorial optimization problems on sparse Erdős-Rényi graphs. Annals of Applied Probability 31 (2021) 1687–1723.
[17] P. Cheeseman, B. Kanefsky, W. Taylor: Where the really hard problems are. Proc. IJCAI (1991) 331–337.
[18] G. Bresler, B. Huang: The algorithmic phase transition of random k-SAT for low degree polynomials. Proc. 62nd FOCS (2021) 298–309.
[19] W.-K. Chen, P. Dey, D. Panchenko: Fluctuations of the free energy in the mixed p-spin models with external field. Probability Theory and Related Fields 168 (2017) 41–53.
[20] V. Chvátal, B. Reed: Mick gets some (the odds are on his side). Proc. 33rd FOCS (1992) 620–627.
[21] A. Coja-Oghlan, T. Kapetanopoulos, N. Müller: The replica symmetric phase of random constraint satisfaction problems. Combinatorics, Probability and Computing 29 (2020) 346–422.
[22] A. Coja-Oghlan, F. Krzakala, W. Perkins, L. Zdeborová: Information-theoretic thresholds from the cavity method. Advances in Mathematics 333 (2018) 694–795.
[23] A. Coja-Oghlan, K. Panagiotou: The asymptotic k-SAT threshold. Advances in Mathematics 288 (2016) 985–1068.
[24] A. Coja-Oghlan, N. Wormald: The number of satisfying assignments of random regular k-SAT formulas. Combinatorics, Probability and Computing 27 (2018) 496–530.
[25] J. Ding, A. Sly, N. Sun: Proof of the satisfiability conjecture for large k. Annals of Mathematics 196 (2022) 1–388.
[26] S. Dovgal, É. de Panafieu, V. Ravelomanana: Exact enumeration of satisfiable 2-SAT formulae. arXiv:2108.08067 (2021).
[27] O. Dubois, J. Mandler: The 3-XORSAT threshold. Proc. 43rd FOCS (2002) 769–778.
[28] G. Eagleson: Martingale convergence to mixtures of infinitely divisible laws. Annals of Probability 3 (1975) 557–562.
[29] C. Efthymiou: On sampling symmetric Gibbs distributions on sparse random graphs and hypergraphs. Proc. 49th ICALP (2022) #57.
[30] E. Friedgut: Sharp thresholds of graph properties, and the k-SAT problem. Journal of the AMS 12 (1999) 1017–1054.
[31] M. Glasgow, M. Kwan, A. Sah, M. Sawhney: A central limit theorem for the matching number of a sparse random graph. arXiv:2402.05851 (2024).
[32] A. Goerdt: A threshold for unsatisfiability. J. Comput. Syst. Sci. 53 (1996) 469–486.
[33] P. Hall, C. Heyde: Martingale limit theory and its applications. Academic Press (1980).
[34] E. Kreačič: Some problems related to the Karp-Sipser algorithm on random graphs. Ph.D. thesis, University of Oxford, 2017.
[35] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. National Academy of Sciences 104 (2007) 10318–10323.
[36] M. Mézard, A. Montanari: Information, physics and computation. Oxford University Press (2009).
[37] M. Mézard, G. Parisi, R. Zecchina: Analytic and algorithmic solution of random satisfiability problems. Science 297 (2002) 812–815.
[38] R. Monasson, R. Zecchina: The entropy of the k-satisfiability problem. Phys. Rev. Lett. 76 (1996) 3881.
[39] A. Montanari, D. Shah: Counting good truth assignments of random k-SAT formulae. Proc. 18th SODA (2007) 1255–1264.
[40] E. Mossel, J. Neeman, A. Sly: Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields (2014) 1–31.
[41] D. Panchenko: On the replica symmetric solution of the K-sat model. Electron. J. Probab. 19 (2014) #67.
[42] D. Panchenko, M. Talagrand: Bounds for diluted mean-fields spin glass models. Probab. Theory Relat. Fields 130 (2004) 319–336.
[43] B. Pittel, G. Sorkin: The satisfiability threshold for k-XORSAT. Combinatorics, Probability and Computing 25 (2016) 236–268.
[44] F. Rassmann: On the number of solutions in random graph k-colouring. Combinatorics, Probability and Computing 28 (2019) 130–158.
[45] R. Robinson, N. Wormald: Almost all regular graphs are Hamiltonian. Random Structures and Algorithms 5 (1994) 363–374.
[46] A. Sly, N. Sun, Y. Zhang: The number of solutions for random regular NAE-SAT. Probability Theory and Related Fields 182 (2022) 1–109.
[47] M. Talagrand: The high temperature case for the random K-sat problem. Probab. Theory Related Fields 119 (2001) 187–212.
[48] L. Valiant: The complexity of enumeration and reliability problems. SIAM Journal on Computing 8 (1979) 410–421.
ARNAB CHATTERJEE, arnab.chatterjee@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORT-
MUND 44227, GERMANY.

AMIN COJA-OGHLAN, amin.coja-oghlan@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATH-
EMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

NOELA MÜLLER, n.s.muller@tue.nl, EINDHOVEN UNIVERSITY OF TECHNOLOGY, DEPARTMENT OF MATHEMATICS AND COMPUTER SCI-
ENCE, METAFORUM MF 4.084, 5600 MB EINDHOVEN, THE NETHERLANDS.

CONNOR RIDDLESDEN, c.d.riddlesden@tue.nl, EINDHOVEN UNIVERSITY OF TECHNOLOGY, DEPARTMENT OF MATHEMATICS AND COM-
PUTER SCIENCE, METAFORUM MF 4.084, 5600 MB EINDHOVEN, THE NETHERLANDS.

MAURICE ROLVIEN, maurice.rolvien@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORT-
MUND 44227, GERMANY.

PAVEL ZAKHAROV, pavel.zakharov@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS,
12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

HAODONG ZHU, h.zhu1@tue.nl, EINDHOVEN UNIVERSITY OF TECHNOLOGY, DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE,
5600 MB EINDHOVEN, THE NETHERLANDS.



BELIEF PROPAGATION GUIDED DECIMATION ON RANDOM k-XORSAT

ARNAB CHATTERJEE, AMIN COJA-OGHLAN, MIHYUN KANG, LENA KRIEG, MAURICE ROLVIEN, GREGORY B. SORKIN

ABSTRACT. We analyse the performance of Belief Propagation Guided Decimation, a physics-inspired message passing
algorithm, on the random k-XORSAT problem. Specifically, we derive an explicit threshold up to which the algorithm
succeeds with a strictly positive probability Ω(1) that we compute explicitly, but beyond which the algorithm with high
probability fails to find a satisfying assignment. In addition, we analyse a thought experiment called the decimation
process for which we identify a (non-)reconstruction and a condensation phase transition. The main results of the present
work confirm physics predictions from [Ricci-Tersenghi and Semerjian: J. Stat. Mech. 2009] that link the phase transitions
of the decimation process with the performance of the algorithm, and improve over partial results from a recent article
[Yung: Proc. ICALP 2024]. MSc: 60B20, 68W20

1. INTRODUCTION AND RESULTS

1.1. Background and motivation. The random k-XORSAT problem shares many characteristics of other intensely
studied random constraint satisfaction problems (‘CSPs’) such as random k-SAT. For instance, random k-XORSAT
possesses a sharp satisfiability threshold preceded by a reconstruction or ‘shattering’ phase transition that affects
the geometry of the set of solutions [2, 12, 17, 24]. As in random k-SAT, these transitions appear to significantly
impact the performance of certain classes of algorithms [7, 16]. At the same time, random k-XORSAT is more
amenable to mathematical analysis than, say, random k-SAT. This is because the XOR operation is equivalent to
addition modulo two, which is why a k-XORSAT instance translates into a linear system over F2. In effect, k-
XORSAT can be solved in polynomial time by means of Gaussian elimination. In addition, the algebraic nature of
the problem induces strong symmetry properties that simplify its study [3].

Because of its similarities with other random CSPs combined with said relative amenability, random k-XORSAT
provides an instructive benchmark. This was noticed not only in combinatorics, but also in the statistical physics
community, which has been contributing intriguing ‘predictions’ on random CSPs since the early 2000s [19, 22].
Among other things, physicists have proposed a message passing algorithm called Belief Propagation Guided Dec-
imation (‘BPGD’) that, according to computer experiments, performs impressively on various random CSPs [21].
Furthermore, Ricci-Tersenghi and Semerjian [25] put forward a heuristic analysis of BPGD on random k-SAT and
k-XORSAT. Their heuristic analysis proceeds by way of a thought experiment based on an idealized version of the
algorithm. We call this thought experiment the decimation process. Based on physics methods Ricci-Tersenghi and
Semerjian surmise that the decimation process undergoes two phase transitions, specifically a reconstruction and
a condensation transition. A key prediction of Ricci-Tersenghi and Semerjian is that these phase transitions are
directly linked to the performance of the BPGD algorithm. Due to the linear algebra-induced symmetry properties,
in the case of random k-XORSAT all of these conjectures come as elegant analytical expressions.

The aim of this paper is to verify the predictions from [25] on random k-XORSAT mathematically. Specifically,
our aim is to rigorously analyse the BPGD algorithm on random k-XORSAT, and to establish the link between its
performance and the phase transitions of the decimation process. A first step towards a rigorous analysis of BPGD
on random k-XORSAT was undertaken in a recent contribution by Yung [27]. However, Yung’s analysis turns out
to be not tight. Specifically, apart from requiring spurious lower bounds on the clause length k, Yung’s results do
not quite establish the precise connection between the decimation process and the performance of BPGD. One
reason for this is that [27] relies on ‘annealed’ techniques, i.e., essentially moment computations. Here we instead
harness ‘quenched’ arguments that were partly developed in prior work on the rank of random matrices over finite
fields [3, 8].

Throughout we let k ≥ 3 and n ≥ k be integers and d > 0 a positive real. Let $m \overset{\mathrm{dist}}{=} \mathrm{Po}(dn/k)$ and let $F = F(n,d,k)$
be a random k-XORSAT formula with variables $x_1,\dots,x_n$ and m random clauses of length k. To be precise, every

Amin Coja-Oghlan is supported by DFG CO 646/3, DFG CO 646/5 and DFG CO 646/6.
This research was funded in part by the Austrian Science Fund (FWF) 10.55776/I6502.


arXiv:2501.17657v1 [math.CO] 29 Jan 2025


clause of F is an XOR of precisely k distinct variables, each of which may or may not come with a negation sign.
The m clauses are drawn uniformly and independently out of the set of all $2^k\binom{n}{k}$ possibilities. Thus, d equals the
average number of clauses that a given variable $x_i$ appears in. An event E occurs with high probability (‘w.h.p.’) if
$\lim_{n\to\infty} P[F \in E] = 1$. We always keep d, k fixed as n → ∞.
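
As an illustration, the random model just described can be sampled as follows. This is a minimal sketch, not code from the paper: the helper names `sample_poisson` and `random_xorsat` and the representation of a clause as a pair (variables, signs) are assumptions made for the example. The Poisson variable is generated by counting unit-rate exponential arrivals, since the Python standard library offers no Poisson sampler.

```python
import random

def sample_poisson(mean, rng):
    """Sample Po(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def random_xorsat(n, d, k, rng):
    """Return a list of clauses (variables, signs) of a random k-XORSAT formula.

    variables: sorted tuple of k distinct indices in range(n)
    signs:     tuple of k bits, where 1 means the variable carries a negation sign
    """
    m = sample_poisson(d * n / k, rng)          # m is distributed as Po(dn/k)
    formula = []
    for _ in range(m):
        variables = tuple(sorted(rng.sample(range(n), k)))   # uniform k-subset
        signs = tuple(rng.randrange(2) for _ in range(k))    # uniform sign pattern
        formula.append((variables, signs))
    return formula

rng = random.Random(0)
n, d, k = 2000, 2.0, 3
formula = random_xorsat(n, d, k, rng)
average_degree = k * len(formula) / n           # should be close to d
print(len(formula), round(average_degree, 3))
```

The empirical average degree concentrates around d as n grows, in line with the remark that d equals the average number of clauses per variable.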

1.2. Belief Propagation Guided Decimation. The first result vindicates the predictions from [25] concerning the
success probability of the BPGD algorithm. BPGD sets its ambitions higher than merely finding a solution to the
k-XORSAT instance F : the algorithm attempts to sample a solution uniformly at random. To this end BPGD assigns
values to the variables x1, . . . , xn of F one after the other. In order to assign the next variable the algorithm attempts
to compute the marginal probability that the variable is set to ‘true’ under a random solution to the k-XORSAT
instance, given all previous assignments. More precisely, suppose BPGD has assigned values to the variables
$x_1,\dots,x_t$ already. Write $\sigma_{\mathrm{BP}}(x_1),\dots,\sigma_{\mathrm{BP}}(x_t) \in \{0,1\}$ for their values, with 1 representing ‘true’ and 0 ‘false’.
Further, let $F_{\mathrm{BP},t}$ be the simplified formula obtained by substituting $\sigma_{\mathrm{BP}}(x_1),\dots,\sigma_{\mathrm{BP}}(x_t)$ for $x_1,\dots,x_t$. We drop
any clauses from $F_{\mathrm{BP},t}$ that contain variables from $\{x_1,\dots,x_t\}$ only, deeming any such clauses satisfied. Thus,
$F_{\mathrm{BP},t}$ is a XORSAT formula with variables $x_{t+1},\dots,x_n$. Its clauses contain at least one and at most k variables,
as well as possibly a constant (the XOR of the values substituted in for $x_1,\dots,x_t$).

Let $\sigma_{F_{\mathrm{BP},t}}$ be a uniformly random solution of the XORSAT formula $F_{\mathrm{BP},t}$, assuming that $F_{\mathrm{BP},t}$ remains
satisfiable. Then BPGD aims to compute the marginal probability $P[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ that a random satisfying
assignment of $F_{\mathrm{BP},t}$ sets $x_{t+1}$ to true. This is where Belief Propagation (‘BP’) comes in. An efficient message passing
heuristic for computing precisely such marginals, BP returns an ‘approximation’ $\mu_{F_{\mathrm{BP},t}}$ of $P[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$.

We will recap the mechanics of BP in Section 2.2 (the value $\mu_{F_{\mathrm{BP},t}}$ is defined precisely in (2.11)). Having computed
the BP ‘approximation’, BPGD proceeds to assign $x_{t+1}$ the value ‘true’ with probability $\mu_{F_{\mathrm{BP},t}}$, otherwise sets $x_{t+1}$ to
‘false’, then moves on to the next variable. The pseudocode is displayed as Algorithm 1.

Data: a random k-XORSAT formula F with variables $x_1,\dots,x_n$, conditioned on being satisfiable
for $t = 0,\dots,n-1$ do
    compute the BP approximation $\mu_{F_{\mathrm{BP},t}}$;
    set $\sigma_{\mathrm{BP}}(x_{t+1}) = 1$ with probability $\mu_{F_{\mathrm{BP},t}}$ and $\sigma_{\mathrm{BP}}(x_{t+1}) = 0$ with probability $1-\mu_{F_{\mathrm{BP},t}}$;
return $\sigma_{\mathrm{BP}}$;

Algorithm 1: The BPGD algorithm.

Let us pause for a few remarks. First, if the BP approximations are exact, i.e., if $F_{\mathrm{BP},t}$ is satisfiable and
$\mu_{F_{\mathrm{BP},t}} = P[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}]$ for all t, then Bayes’ formula shows that BPGD outputs a uniformly random solution of F .

However, there is no universal guarantee that BP returns the correct marginals. Accordingly, the crux of analysing
BPGD is precisely to figure out whether this is the case. Indeed, the heuristic work of [25] ties the accuracy of BP to
a phase transition of the decimation process thought experiment, to be reviewed momentarily.

Second, the strategy behind the BPGD algorithm, particularly the message passing heuristic for ‘approximating’
the marginals, generalizes well beyond k-XORSAT. For instance, the approach applies to k-SAT verbatim. That said,
due to the algebraic nature of the XOR operation, BPGD is far easier to analyse on k-XORSAT. In fact, in XORSAT the
marginal probabilities are guaranteed to be half-integral as seen in Fact 2.3, i.e.,

\[ P\left[\sigma_{F_{\mathrm{BP},t}}(x_{t+1}) = 1 \mid F_{\mathrm{BP},t}\right] \in \{0, 1/2, 1\}. \tag{1.1} \]

As a consequence, on XORSAT the BPGD algorithm effectively reduces to a purely combinatorial algorithm called
Unit Clause Propagation [19, 25] as per Proposition 6.1, a fact that we will exploit extensively (see Section 6).

2


1.3. A tight analysis of BPGD. In order to state the main results we need to introduce a few threshold values. To
this end, given d ,k and an additional real parameter λ≥ 0, consider the functions 1

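\[ \phi_{d,k,\lambda} : [0,1] \to [0,1], \qquad z \mapsto 1 - \exp\left(-\lambda - d z^{k-1}\right), \tag{1.2} \]
\[ \Phi_{d,k,\lambda} : [0,1] \to \mathbb{R}, \qquad z \mapsto \exp\left(-\lambda - d z^{k-1}\right) - \frac{d(k-1)}{k} z^{k} + d z^{k-1} - \frac{d}{k}. \tag{1.3} \]

Let $\alpha_*(\lambda) = \alpha_*(d,k,\lambda) \in [0,1]$ be the smallest and $\alpha^*(\lambda) = \alpha^*(d,k,\lambda) \ge \alpha_*(d,k,\lambda)$ the largest fixed point of
$\phi_{d,k,\lambda}$ in [0,1]. Figure 1 visualizes $\Phi_{d,k,\lambda}(z)$ for different values of λ. Further, define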

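\[ d_{\min}(k) = \left(\frac{k-1}{k-2}\right)^{k-2}, \quad d_{\mathrm{core}}(k) = \sup\left\{d > 0 : \alpha^*(0) = 0\right\}, \quad d_{\mathrm{sat}}(k) = \sup\left\{d > 0 : \Phi_{d,k,0}(\alpha^*(0)) \le \Phi_{d,k,0}(0)\right\}. \tag{1.4} \]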

The value dsat(k) is the random k-XORSAT satisfiability threshold [3, 12, 24]. Thus, for d < dsat(k) the random
k-XORSAT formula F possesses satisfying assignments w.h.p., while F is unsatisfiable for d > dsat(k) w.h.p. Further-
more, dcore(k) equals the threshold for the emergence of a giant 2-core within the k-uniform hypergraph induced
by F [3, 23]. This implies that for d < dcore(k) the set of solutions of F is contiguous in a certain well-defined way,
while for dcore(k) < d < dsat(k) the set of solutions shatters into an exponential number of well-separated clus-
ters [16, 19]. Moreover, a simple linear time algorithm is known to find a solution w.h.p. for d < dcore(k) [16]. The
relevance of dmin(k) will emerge momentarily. A bit of calculus reveals that

0 < dmin(k) < dcore(k) < dsat(k) < k. (1.5)
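
The three thresholds in (1.4) can be evaluated numerically. The following is a rough sketch under stated assumptions rather than the method used in the paper: the largest fixed point of φ is approximated by iterating z ↦ φ(z) from z = 1 (the iterates decrease because φ is increasing), and the suprema are located by bisection on d, assuming the defining conditions are monotone in d. The function names and the tolerances are hypothetical choices for this example.

```python
import math

def phi(z, d, k, lam):
    """Map (1.2)."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1))

def Phi(z, d, k, lam):
    """Function (1.3)."""
    return (math.exp(-lam - d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)

def largest_fixed_point(d, k, lam, iterations=5000):
    """Iterate z <- phi(z) from z = 1; the iterates decrease to the largest fixed point."""
    z = 1.0
    for _ in range(iterations):
        z_next = phi(z, d, k, lam)
        if abs(z_next - z) < 1e-14:
            break
        z = z_next
    return z

def bisect_threshold(holds, lo, hi, steps=30):
    """Approximate sup{d in [lo, hi] : holds(d)}, assuming holds is monotone in d."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if holds(mid):
            lo = mid
        else:
            hi = mid
    return lo

def d_min(k):
    return ((k - 1) / (k - 2)) ** (k - 2)

def d_core(k):
    # alpha^*(0) = 0 is tested up to a small numerical tolerance
    return bisect_threshold(lambda d: largest_fixed_point(d, k, 0.0) < 1e-9, 0.0, float(k))

def d_sat(k):
    def holds(d):
        a = largest_fixed_point(d, k, 0.0)
        return Phi(a, d, k, 0.0) <= Phi(0.0, d, k, 0.0)
    return bisect_threshold(holds, 0.0, float(k))

dm, dc, ds = d_min(3), d_core(3), d_sat(3)
print(dm, round(dc, 4), round(ds, 4))
```

The upper end k of the bisection interval relies on the bound $d_{\mathrm{sat}}(k) < k$ from (1.5). Close to $d_{\mathrm{core}}(k)$ the fixed point iteration slows down considerably, so the iteration cap limits the accuracy of the sketch.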

The following theorem determines the precise clause-to-variable densities where BPGD succeeds/fails. To be
precise, in the ‘successful’ regime BPGD does not actually succeed with high probability, but with an explicit prob-
ability strictly between zero and one, which is displayed in Figure 2 for k = 3,4,5.

[Figure 1: plot of $\Phi_{d,k,\lambda}(z)$ against z ∈ [0,1].]

FIGURE 1. $\Phi_{d,k,\lambda}$ for k = 3 and d = 2.4, for λ from 0 to 0.3 (maximum at z = 0) and from 0.4 to 0.9

[Figure 2: plot against d, with one curve each for k = 3, k = 4 and k = 5.]

FIGURE 2. Success probability of BPGD for $0 < d < d_{\min}(k)$ and various k.

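Theorem 1.1. Let k ≥ 3.

(i) If $d < d_{\min}(k)$, then
\[ \lim_{n\to\infty} P\left[\mathrm{BPGD}(F) \text{ finds a satisfying assignment}\right] = \exp\left(-\frac{d^2(k-1)^2}{4} \int_0^1 \frac{z^{2k-4}(1-z)}{1 - d(k-1)z^{k-2}(1-z)}\,\mathrm{d}z\right). \tag{1.6} \]

(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, then
\[ P\left[\mathrm{BPGD}(F) \text{ finds a satisfying assignment}\right] = o(1). \]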
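
The limit in (1.6) is straightforward to evaluate numerically. The sketch below uses a composite Simpson rule; the function name `bpgd_success_probability` and the number of integration steps are assumptions made for illustration. For $d < d_{\min}(k)$ the denominator of the integrand is bounded away from zero on [0,1], since $d(k-1)z^{k-2}(1-z) \le d/d_{\min}(k) < 1$.

```python
import math

def bpgd_success_probability(d, k, steps=20000):
    """Evaluate the right-hand side of (1.6) for 0 < d < d_min(k)."""
    d_min = ((k - 1) / (k - 2)) ** (k - 2)
    if not 0 < d < d_min:
        raise ValueError("formula (1.6) is stated for 0 < d < d_min(k) only")

    def integrand(z):
        return z ** (2 * k - 4) * (1 - z) / (1 - d * (k - 1) * z ** (k - 2) * (1 - z))

    # Composite Simpson rule on [0, 1]; steps must be even.
    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    return math.exp(-(d ** 2) * (k - 1) ** 2 / 4 * integral)

for d in (0.5, 1.0, 1.5, 1.9):
    print(d, bpgd_success_probability(d, 3))
```

The values decrease in d, and the integral grows without bound as d approaches $d_{\min}(k)$, so accuracy of the quadrature degrades near that endpoint.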

Theorem 1.1 vindicates the predictions from Ricci-Tersenghi and Semerjian [25, Section 4] as to the perfor-
mance of BPGD, and improves over the results from Yung [27]. Specifically, Theorem 1.1 (i) verifies the formula for
the success probability from [25, Eq. (38)]. Combinatorially, the formula (1.6) results from the possible presence of
bounded-length cycles (so-called toxic cycles) that may cause the algorithm to run into contradictions. By contrast,
Yung has no positive result on the performance of BPGD. Moreover, Yung’s negative results [27, Theorems 2–3]

1 The function $\Phi_{d,k,\lambda}$ is known in physics parlance as the “Bethe free entropy” [8, 19]. The stationary points of $\Phi_{d,k,\lambda}$ coincide with the
fixed points of $\phi_{d,k,\lambda}$, as we will verify in Section 2.1.



only apply to k ≥ 9 and to d > dcore(k), while Theorem 1.1 (ii) covers all k ≥ 3 and kicks in at the correct threshold
dmin(k) < dcore(k) predicted in [25].

1.4. The decimation process. In addition to the BPGD algorithm itself, the heuristic work [25] considers an ide-
alised version of the algorithm, the decimation process. This thought experiment highlights the conceptual reasons
behind the success/failure of BPGD. Just like BPGD, the decimation process assigns values to variables one after the
other for good. But instead of the BP ‘approximations’ the decimation process uses the actual marginals given its
previous decisions. To be precise, suppose that the input formula F is satisfiable and that variables $x_1,\dots,x_t$ have
already been assigned values $\sigma_{\mathrm{DC}}(x_1),\dots,\sigma_{\mathrm{DC}}(x_t)$ in the previous iterations. Obtain $F_{\mathrm{DC},t}$ by substituting the
values $\sigma_{\mathrm{DC}}(x_1),\dots,\sigma_{\mathrm{DC}}(x_t)$ for $x_1,\dots,x_t$ and dropping any clauses that do not contain any of $x_{t+1},\dots,x_n$. Thus, $F_{\mathrm{DC},t}$
is a XORSAT formula with variables $x_{t+1},\dots,x_n$. Let $\sigma_{F_{\mathrm{DC},t}}$ be a random satisfying assignment of $F_{\mathrm{DC},t}$. Then the
decimation process sets $x_{t+1}$ according to the true marginal $P[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$, thus ultimately returning
a uniformly random satisfying assignment of F .

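Data: a random k-XORSAT formula F , conditioned on being satisfiable
for $t = 0,\dots,n-1$ do
    compute $\pi_{F_{\mathrm{DC},t}} = P[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$;
    set $\sigma_{\mathrm{DC}}(x_{t+1}) = 1$ with probability $\pi_{F_{\mathrm{DC},t}}$ and $\sigma_{\mathrm{DC}}(x_{t+1}) = 0$ with probability $1-\pi_{F_{\mathrm{DC},t}}$;
return $\sigma_{\mathrm{DC}}$;

Algorithm 2: The decimation process.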
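
Because a XORSAT formula is a linear system over $\mathbb{F}_2$, the true marginals used by the decimation process can be computed exactly by Gaussian elimination. The following sketch illustrates this; it is not the paper's construction. It assumes that the formula is given as a list of equations (variables, rhs), meaning that the XOR of the listed variables equals rhs, and it uses 0-based indices, so that index t plays the role of $x_{t+1}$. The names `reduce_row` and `decimation_process` are hypothetical.

```python
import random

def reduce_row(mask, rhs, basis):
    """Eliminate the leading bits of the row (mask, rhs) using an echelon basis."""
    while mask:
        pivot = mask.bit_length() - 1
        if pivot not in basis:
            break
        row_mask, row_rhs = basis[pivot]
        mask ^= row_mask
        rhs ^= row_rhs
    return mask, rhs

def decimation_process(n, equations, rng):
    """Return (assignment, marginals) for a satisfiable system over F_2."""
    basis = {}                                   # pivot bit -> (mask, rhs)
    for variables, rhs in equations:
        mask = 0
        for v in variables:
            mask ^= 1 << v
        mask, rhs = reduce_row(mask, rhs, basis)
        if mask:
            basis[mask.bit_length() - 1] = (mask, rhs)
        elif rhs:
            raise ValueError("formula is unsatisfiable")
    assignment, marginals = [], []
    for x in range(n):
        # Reduce the unit row "x = 0"; a zero mask means x is already determined.
        mask, rhs = reduce_row(1 << x, 0, basis)
        if mask == 0:
            value, marginal = rhs, float(rhs)    # forced value, marginal in {0, 1}
        else:
            marginal = 0.5                       # x is free: both values are equally likely
            value = rng.randrange(2)
            basis[mask.bit_length() - 1] = (mask, rhs ^ value)
        assignment.append(value)
        marginals.append(marginal)
    return assignment, marginals

equations = [((0, 1, 2), 1), ((1, 2), 0)]        # x0+x1+x2 = 1 and x1+x2 = 0 over F_2
assignment, marginals = decimation_process(3, equations, random.Random(1))
print(assignment, marginals)
```

In this small example the first variable is forced to 1, the second is free, and the third must then copy the second, so the marginals encountered are 1, 1/2 and a value in {0, 1}.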

Clearly, if indeed the BP ‘approximations’ are correct, then the decimation process and BPGD are identical. Thus,
a key question is for what parameter regimes the two processes coincide or diverge, respectively. As it turns out, this
question is best answered by parametrizing not only in terms of the average variable degree d , but also in terms of
the ‘time’ parameter t of the decimation process.

1.5. Phase transitions of the decimation process. Ricci-Tersenghi and Semerjian heuristically identify several
phase transitions in terms of d and t that the decimation process undergoes. We will confirm these predictions
mathematically and investigate how they relate to the performance of BPGD.

The first set of relevant phase transitions concerns the so-called non-reconstruction property. Roughly speaking,
non-reconstruction means that the marginal $\pi_{F_{\mathrm{DC},t}} = P[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \mid F_{\mathrm{DC},t}]$ is determined by short-range
rather than long-range effects. Since Belief Propagation is essentially a local algorithm, one might expect that the
(non-)reconstruction phase transition coincides with the threshold up to which BPGD succeeds; cf. the discussions
in [5, 17].

To define (non-)reconstruction precisely, we associate a bipartite graph $G(F_{\mathrm{DC},t})$ with the formula $F_{\mathrm{DC},t}$. The
vertices of this graph are the variables and clauses of $F_{\mathrm{DC},t}$. Each variable is adjacent to the clauses in which it
appears. For a (variable or clause) vertex v of $G(F_{\mathrm{DC},t})$ let $\partial v$ be the set of neighbours of v. More generally, for an
integer ℓ ≥ 1 let $\partial^{\ell} v$ be the set of vertices of $G(F_{\mathrm{DC},t})$ at shortest path distance precisely ℓ from v . Following [17],
we say that $F_{\mathrm{DC},t}$ has the non-reconstruction property if

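\[ \lim_{\ell\to\infty} \limsup_{n\to\infty} E\left[\, \Big| P\Big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\Big|\, F_{\mathrm{DC},t}, \{\sigma_{F_{\mathrm{DC},t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}}\Big] - P\Big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\Big|\, F_{\mathrm{DC},t}\Big] \Big| \;\Big|\; F \text{ satisfiable} \right] = 0. \tag{1.7} \]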

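Conversely, $F_{\mathrm{DC},t}$ has the reconstruction property if

\[ \liminf_{\ell\to\infty} \liminf_{n\to\infty} E\left[\, \Big| P\Big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\Big|\, F_{\mathrm{DC},t}, \{\sigma_{F_{\mathrm{DC},t}}(y)\}_{y \in \partial^{2\ell} x_{t+1}}\Big] - P\Big[\sigma_{F_{\mathrm{DC},t}}(x_{t+1}) = 1 \,\Big|\, F_{\mathrm{DC},t}\Big] \Big| \;\Big|\; F \text{ satisfiable} \right] > 0. \tag{1.8} \]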

To parse (1.7), notice that in the left probability term we condition on both the outcome $F_{\mathrm{DC},t}$ of the first t steps
of the decimation process and on the values $\sigma_{F_{\mathrm{DC},t}}(y)$ that the random solution $\sigma_{F_{\mathrm{DC},t}}$ assigns to the variables
y at distance exactly 2ℓ from $x_{t+1}$. By contrast, in the right probability term we only condition on $F_{\mathrm{DC},t}$. Thus,
the second probability term matches the probability $\pi_{F_{\mathrm{DC},t}}$ from the decimation process. Hence, (1.7) compares
the probability that a random solution sets $x_{t+1}$ to one given the values $\sigma_{F_{\mathrm{DC},t}}(y)$ of all variables y at distance 2ℓ
from $x_{t+1}$ with the plain marginal probability that $x_{t+1}$ is set to one. What (1.7) asks is that these two probabilities
be asymptotically equal in the limit of large ℓ, with high probability over the choice of F and the prior steps of
the decimation process. Thus, so long as non-reconstruction holds, ‘long-range effects’, meaning anything beyond
distance 2ℓ for large enough but fixed ℓ, are negligible.

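Confirming the predictions from [25], the following theorem identifies the precise regimes of d , t where
(non-)reconstruction holds. To state the theorem, we need to know that for $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ the polynomial
$d(k-1)z^{k-2}(1-z) - 1$ has precisely two roots $0 < z_* = z_*(d,k) < z^* = z^*(d,k) < 1$; we are going to prove this as part of
Proposition 2.2 below. Let

\[ \lambda^* = \lambda^*(d,k) = -\log(1-z_*) - \frac{z_*}{(k-1)(1-z_*)} \;>\; \lambda_* = \lambda_*(d,k) = \max\left\{0,\; -\log(1-z^*) - \frac{z^*}{(k-1)(1-z^*)}\right\} \ge 0, \tag{1.9} \]
\[ \theta^* = \theta^*(d,k) = 1 - \exp(-\lambda^*) \;>\; \theta_* = \theta_*(d,k) = 1 - \exp(-\lambda_*). \tag{1.10} \]

Additionally, let $\lambda_{\mathrm{cond}}(d,k)$ be the solution to the ODE

\[ \frac{\partial \lambda_{\mathrm{cond}}(d,k)}{\partial d} = -\frac{\alpha^*(\lambda_{\mathrm{cond}}(d,k))^k - \alpha_*(\lambda_{\mathrm{cond}}(d,k))^k}{k\left(\alpha^*(\lambda_{\mathrm{cond}}(d,k)) - \alpha_*(\lambda_{\mathrm{cond}}(d,k))\right)}, \qquad \lambda_{\mathrm{cond}}(d_{\mathrm{sat}}(k),k) = 0 \tag{1.11} \]

on the interval $(d_{\min}, d_{\mathrm{sat}}]$ and set $\theta_{\mathrm{cond}} = \theta_{\mathrm{cond}}(d,k) = 1 - \exp(-\lambda_{\mathrm{cond}}(d,k))$. Note that

\[ \theta_* < \theta_{\mathrm{cond}} < \theta^*. \]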
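
The quantities in (1.9) and (1.10) are easy to compute. The sketch below finds the two roots of $d(k-1)z^{k-2}(1-z) - 1$ by bisection on either side of the maximiser $z = (k-2)/(k-1)$ of $z^{k-2}(1-z)$, and then evaluates the resulting λ and θ values. The function name `reconstruction_window` is an assumption, and so is the pairing of the larger value of λ with the smaller root, which is what the ordering in (1.9) requires.

```python
import math

def reconstruction_window(d, k):
    """Return (z_low, z_high, lam_low, lam_high, theta_low, theta_high)."""
    def poly(z):
        return d * (k - 1) * z ** (k - 2) * (1 - z) - 1

    z_peak = (k - 2) / (k - 1)          # maximiser of z^(k-2) (1 - z) on [0, 1]
    if poly(z_peak) <= 0:
        raise ValueError("two roots exist only for d > d_min(k)")

    def bisect(lo, hi):
        # poly changes sign exactly once on [lo, hi]
        for _ in range(200):
            mid = (lo + hi) / 2
            if (poly(lo) < 0) == (poly(mid) < 0):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    z_low, z_high = bisect(0.0, z_peak), bisect(z_peak, 1.0)

    def lam(z):
        return -math.log(1 - z) - z / ((k - 1) * (1 - z))

    lam_high = lam(z_low)               # the larger lambda comes from the smaller root
    lam_low = max(0.0, lam(z_high))     # the smaller lambda comes from the larger root
    theta_low = 1 - math.exp(-lam_low)
    theta_high = 1 - math.exp(-lam_high)
    return z_low, z_high, lam_low, lam_high, theta_low, theta_high

z_low, z_high, lam_low, lam_high, theta_low, theta_high = reconstruction_window(2.4, 3)
print(z_low, z_high, lam_low, lam_high, theta_low, theta_high)
```

For k = 3 the polynomial is quadratic, so the two roots are symmetric about 1/2, which gives a quick sanity check on the bisection.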

Theorem 1.2. Let k ≥ 3 and let 0 ≤ t = t (n) ≤ n be a sequence such that $\lim_{n\to\infty} t/n = \theta \in (0,1)$.

(i) If $d < d_{\min}(k)$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.
(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta < \theta_*$ or $\theta > \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the non-reconstruction property w.h.p.
(iii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta_* < \theta < \theta_{\mathrm{cond}}$, then $F_{\mathrm{DC},t}$ has the reconstruction property w.h.p.

Theorem 1.2 shows that $d_{\min}(k)$ marks the precise threshold of d up to which the decimation process $F_{\mathrm{DC},t}$
exhibits non-reconstruction for all 0 ≤ t ≤ n w.h.p. By contrast, for $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ there is a regime of t where
reconstruction occurs. In fact, as Proposition 2.2 shows, for $d > d_{\mathrm{core}}(k)$ we have $\theta_* = 0$ and thus reconstruction
holds even at t = 0, i.e., for the original, undecimated random formula F . Prior to the contribution [25], it had
been suggested that this precise scenario (reconstruction on the original problem instance) is the stone on which
BPGD stumbles [5]. In fact, Yung’s negative result kicks in at this precise threshold $d_{\mathrm{core}}(k)$. However, Theorems 1.1
and 1.2 show that matters are more subtle. Specifically, for $d_{\min}(k) < d < d_{\mathrm{core}}(k)$ reconstruction, even though
absent in the initial formula F , occurs at a later ‘time’ t > 0 as decimation proceeds, which suffices to trip BPGD up.
Also, remarkably, Theorem 1.2 shows that non-reconstruction is not ‘monotone’. The property holds for $\theta < \theta_*$ and
then again for $\theta > \theta_{\mathrm{cond}}$, but not on the interval $(\theta_*, \theta_{\mathrm{cond}})$ as visualised in Figure 3.

But there is one more surprise. Namely, Theorem 1.2 (ii) might suggest that for $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ Belief
Propagation manages to compute the correct marginals for $t/n \sim \theta > \theta_{\mathrm{cond}}$, as non-reconstruction kicks back in.
But remarkably, this is not quite true. Despite the fact that non-reconstruction holds, BPGD goes astray because
the algorithm starts its message passing process from a mistaken, oblivious initialisation. As a consequence, for
$t/n \sim \theta \in (\theta_{\mathrm{cond}}, \theta^*)$ the BP ‘approximations’ remain prone to error. To be precise, the following result identifies
the precise ‘times’ where BP succeeds/fails. To state the result let $\mu_{F_{\mathrm{DC},t}}$ denote the BP ‘approximation’ of the
true marginal $\pi_{F_{\mathrm{DC},t}}$ of variable $x_{t+1}$ in the formula $F_{\mathrm{DC},t}$ created by the decimation process (see Section 2.2 for a
reminder of the definition). Also recall that $\pi_{F_{\mathrm{DC},t}}$ denotes the correct marginal as used by the decimation process.

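Theorem 1.3. Let k ≥ 3 and let 0 ≤ t = t (n) ≤ n be a sequence such that $\lim_{n\to\infty} t/n = \theta \in (0,1)$.

(i) If $0 < d < d_{\min}(k)$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.
(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta < \theta_{\mathrm{cond}}$ or $\theta > \theta^*$, then $\mu_{F_{\mathrm{DC},t}} = \pi_{F_{\mathrm{DC},t}}$ w.h.p.
(iii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$ and $\theta_{\mathrm{cond}} < \theta < \theta^*$, then $E\left|\mu_{F_{\mathrm{DC},t}} - \pi_{F_{\mathrm{DC},t}}\right| = \Omega(1)$.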

The upshot of Theorems 1.2–1.3 is that the relation between the accuracy of BP and reconstruction is subtle.
Everything goes well so long as $d < d_{\min}$ as non-reconstruction holds throughout and the BP approximations
are correct. But if $d_{\min} < d < d_{\mathrm{sat}}$ and $\theta_* < \theta < \theta_{\mathrm{cond}}$, then Theorem 1.2 (iii) shows that reconstruction occurs.
Nonetheless, Theorem 1.3 (ii) demonstrates that the BP approximations remain valid in this regime. By contrast,
for $\theta_{\mathrm{cond}} < \theta < \theta^*$ we have non-reconstruction by Theorem 1.2 (ii), but Theorem 1.3 (iii) shows that BP misses its
mark with a non-vanishing probability. Finally, for $\theta > \theta^*$ everything is in order once again as BP regains its footing
and non-reconstruction holds. Unfortunately BPGD is unlikely to reach this happy state because the algorithm is
bound to make numerous mistakes at times $t/n \in (\theta_{\mathrm{cond}}, \theta^*)$.



[Figure 3: three panels, (A) k = 3, (B) k = 4, (C) k = 5, each with d on the horizontal axis, the positions of $d_{\min}$, $d_{\mathrm{core}}$, $d_{\mathrm{sat}}$ marked, and the curves $\theta^*$, $\theta_{\mathrm{cond}}$, $\theta_*$.]

FIGURE 3. The phase diagrams for k = 3,4,5 with $d \in (d_{\min}, d_{\mathrm{sat}})$ on the horizontal and θ on the
vertical axis. The hatched area displays the regime $\theta < \theta_*$ and $\theta_{\mathrm{cond}} < \theta$ where non-reconstruction
holds. In the non-hatched area, where $\theta_* < \theta < \theta_{\mathrm{cond}}$, we have reconstruction. Similarly, the
blue area displays $\theta < \theta_{\mathrm{cond}}$ and $\theta > \theta^*$ where BP is correct whereas in the orange area, BP is
inaccurate.

Theorems 1.2 and 1.3 confirm the predictions from [25, Section 4]. To be precise, while θcond matches the
predictions of Ricci-Tersenghi and Semerjian, the ODE formula (1.11) for the threshold, which is easy to evaluate
numerically, does not appear in [25]. Instead of the ODE formulation, Ricci-Tersenghi and Semerjian define $\lambda_{\mathrm{cond}}$
as the (unique) λ ≥ 0 such that $\Phi_{d,k,\lambda}(\alpha_*) = \Phi_{d,k,\lambda}(\alpha^*)$; Proposition 2.2 below shows that both are equivalent.
Illustrating Theorems 1.2–1.3, Figure 3 displays the phase diagram in terms of d and θ ∼ t/n for k = 3,4,5.
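
The characterisation of the condensation threshold as the value of λ at which Φ takes equal values at the smallest and the largest fixed point of φ lends itself to a simple numerical procedure. The sketch below is an illustration under assumptions, not the evaluation used for Figure 3: fixed points are approximated by iterating φ from 0 and from 1, and the threshold is found by bisection. The bracket [0.03, 0.14] for d = 2.4, k = 3 is an assumption; it was chosen to lie just inside the two values obtained from (1.9) for these parameters. All function names are hypothetical.

```python
import math

def phi(z, d, k, lam):
    return 1.0 - math.exp(-lam - d * z ** (k - 1))

def Phi(z, d, k, lam):
    return (math.exp(-lam - d * z ** (k - 1))
            - d * (k - 1) / k * z ** k + d * z ** (k - 1) - d / k)

def fixed_point(start, d, k, lam, iterations=100000):
    """Iterate phi from start; start = 0 yields the smallest and start = 1
    the largest fixed point, because phi is increasing."""
    z = start
    for _ in range(iterations):
        z_next = phi(z, d, k, lam)
        if abs(z_next - z) < 1e-14:
            break
        z = z_next
    return z

def phi_gap(lam, d, k):
    """Phi at the largest fixed point minus Phi at the smallest fixed point."""
    small = fixed_point(0.0, d, k, lam)
    large = fixed_point(1.0, d, k, lam)
    return Phi(large, d, k, lam) - Phi(small, d, k, lam)

def lambda_cond(d, k, lam_lo, lam_hi, steps=50):
    """Bisection; assumes phi_gap(lam_lo) < 0 < phi_gap(lam_hi)."""
    for _ in range(steps):
        mid = (lam_lo + lam_hi) / 2
        if phi_gap(mid, d, k) < 0:
            lam_lo = mid
        else:
            lam_hi = mid
    return (lam_lo + lam_hi) / 2

d, k = 2.4, 3
lam_c = lambda_cond(d, k, 0.03, 0.14)
theta_c = 1 - math.exp(-lam_c)
print(lam_c, theta_c)
```

Repeating the computation over a grid of d values would trace out the middle curve of the phase diagram, with the bracket recomputed from (1.9) for each d.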

2. OVERVIEW

This section provides an overview of the proofs of Theorems 1.1–1.3. In the final paragraph we conclude with a
discussion of further related work. We assume throughout that k ≥ 3 is an integer and that 0 < d < dsat(k). Moreover,
t = t (n) denotes an integer sequence 0 ≤ t (n) ≤ n such that limn→∞ t (n)/n = θ ∈ (0,1).

2.1. Fixed points and thresholds. The first item on our agenda is to study the functions $\phi_{d,k,\lambda}$, $\Phi_{d,k,\lambda}$ from
(1.2)–(1.3). Specifically, we are concerned with the maxima of $\Phi_{d,k,\lambda}$ and the fixed points of $\phi_{d,k,\lambda}$, the combinatorial
relevance of which will emerge as we analyse BPGD and the decimation process. We begin by observing that the
fixed points of $\phi_{d,k,\lambda}$ are precisely the stationary points of $\Phi_{d,k,\lambda}$.

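Fact 2.1. For any d > 0, λ ≥ 0 the stationary points z ∈ (0,1) of $\Phi_{d,k,\lambda}$ coincide with the fixed points of $\phi_{d,k,\lambda}$ in (0,1).
Furthermore, for a fixed point z ∈ (0,1) of $\phi_{d,k,\lambda}$ we have

\[ \Phi''_{d,k,\lambda}(z) \begin{cases} < 0 & \text{if } \phi'_{d,k,\lambda}(z) < 1, \\ = 0 & \text{if } \phi'_{d,k,\lambda}(z) = 1, \\ > 0 & \text{if } \phi'_{d,k,\lambda}(z) > 1. \end{cases} \tag{2.1} \]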

Proof. Differentiating $\Phi_{d,k,\lambda}$, we obtain

\[ \Phi'_{d,k,\lambda}(z) = d(k-1)z^{k-2}\left(\phi_{d,k,\lambda}(z) - z\right). \tag{2.2} \]

Hence, a point z ∈ (0,1) is a fixed point of $\phi_{d,k,\lambda}$ iff $\Phi'_{d,k,\lambda}(z) = 0$. Differentiating (2.2) once more, we obtain

\[ \Phi''_{d,k,\lambda}(z) = d(k-1)z^{k-3}\left[(k-2)\left(\phi_{d,k,\lambda}(z) - z\right) + z\left(\phi'_{d,k,\lambda}(z) - 1\right)\right]. \tag{2.3} \]

Clearly, if $\phi_{d,k,\lambda}(z) = z$, then (2.3) simplifies to $\Phi''_{d,k,\lambda}(z) = d(k-1)z^{k-2}(\phi'_{d,k,\lambda}(z) - 1)$, whence (2.1) follows. □

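We recall that $0 \le \alpha_* = \alpha_*(d,k,\lambda) \le \alpha^* = \alpha^*(d,k,\lambda) \le 1$ are the smallest and the largest fixed point of $\phi_{d,k,\lambda}$ in
[0,1], respectively. Fact 2.1 shows that $\Phi_{d,k,\lambda}$ attains its global maximum in [0,1] at $\alpha_*$ or $\alpha^*$. Let

\[ \alpha_{\max} = \alpha_{\max}(d,k,\lambda) \in \{\alpha_*, \alpha^*\} \]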

be the maximiser of Φd ,k,λ; if Φd ,k,λ(α∗) =Φd ,k,λ(α∗), set αmax = α∗. The following proposition characterises the
fixed points of φd ,k,λ and the maximiser αmax.

[Figure 4: two panels plotted against θ, (A) $\alpha_{\max}$ together with $\alpha_*$ and $\alpha^*$, (B) $\Phi(\alpha_{\max})$ together with $\Phi(\alpha_*)$ and $\Phi(\alpha^*)$; the position of $\theta_{\mathrm{cond}}$ is marked.]

FIGURE 4. $\alpha_{\max}$ and $\Phi(\alpha_{\max})$ for d = 2.4 and k = 3 from $\theta_*$ to $\theta^*$.

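Proposition 2.2.

(i) If $d < d_{\min}(k)$, then for all λ > 0 we have $\alpha_*(d,k,\lambda) = \alpha^*(d,k,\lambda)$, the function $\lambda \in (0,\infty) \mapsto \alpha_*(d,k,\lambda) \in (0,1)$ is
analytic, and $\alpha_*(d,k,\lambda)$ is the unique stable fixed point of $\phi_{d,k,\lambda}$.

(ii) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, then the polynomial $d(k-1)z^{k-2}(1-z) - 1$ has precisely two roots $0 < z_* < z^* < 1$, the
numbers $\lambda_*, \lambda^*$ from (1.9) satisfy $0 \le \lambda_* < \lambda^*$ and the following is true.
(a) If $\lambda < \lambda_*$ or $\lambda > \lambda^*$, then $\alpha_*(d,k,\lambda) = \alpha^*(d,k,\lambda) \in (0,1)$ is the unique stable fixed point of $\phi_{d,k,\lambda}$.
(b) If $\lambda_* < \lambda < \lambda^*$, then $0 < \alpha_*(d,k,\lambda) < \alpha^*(d,k,\lambda) < 1$ are the only stable fixed points of $\phi_{d,k,\lambda}$.
(c) The functions $\lambda \in (0,\lambda^*) \mapsto \alpha_*(d,k,\lambda)$ and $\lambda \in (\lambda_*,\infty) \mapsto \alpha^*(d,k,\lambda)$ are analytic.
(d) If $d_{\min}(k) < d < d_{\mathrm{sat}}(k)$, then the solution $\lambda_{\mathrm{cond}}$ of (1.11) satisfies $\lambda_* < \lambda_{\mathrm{cond}} = \lambda_{\mathrm{cond}}(d) < \lambda^*$ and

\[ \alpha_{\max}(d,k,\lambda) = \begin{cases} \alpha_*(d,k,\lambda) & \text{if } \lambda < \lambda_{\mathrm{cond}}, \\ \alpha^*(d,k,\lambda) & \text{if } \lambda > \lambda_{\mathrm{cond}}. \end{cases} \]

Furthermore, $\Phi_{d,k,\lambda}(\alpha_*(d,k,\lambda)) \ne \Phi_{d,k,\lambda}(\alpha^*(d,k,\lambda))$ unless $\lambda = \lambda_{\mathrm{cond}}$. Thus, the function $\lambda \mapsto \alpha_{\max}(d,k,\lambda)$
is analytic on $(0,\lambda_{\mathrm{cond}})$ and on $(\lambda_{\mathrm{cond}},\infty)$, but discontinuous at $\lambda = \lambda_{\mathrm{cond}}$.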

2.2. Belief Propagation. Having done our analytic homework, we proceed to recall how Belief Propagation computes
the ‘approximations’ $\mu_{F_{\mathrm{BP},t}}$ that the BPGD algorithm relies upon. We will see that due to the inherent symmetries
of XORSAT the Belief Propagation computations simplify and boil down to a simpler message passing process
called Warning Propagation. Subsequently we will explain the connection between Warning Propagation and the
fixed points $\alpha_*, \alpha^*$ of $\phi_{d,k,\lambda}$.

It is probably easiest to explain BP on a general XORSAT instance F with a set V (F ) of variables and a set C (F ) of
clauses of lengths between one and k. As in Section 1.5 we consider the graph G(F ) induced by F , with vertex set
V (F )∪C (F ) and an edge xa between x ∈ V (F ) and a ∈ C (F ) iff a contains x. Let $\partial v = \partial_F v$ be the set of neighbours
of v ∈ V (F )∪C (F ). Additionally, given an assignment $\tau \in \{0,1\}^{\partial a}$ of the variables that appear in a, we write τ |= a iff
τ satisfies a.

With each clause/variable pair x, a such that x ∈ ∂a Belief Propagation associates two sequences of ‘messages’
$(\mu_{F,x\to a,\ell})_{\ell\ge 0}$, $(\mu_{F,a\to x,\ell})_{\ell\ge 0}$ directed from x to a and from a to x, respectively. These messages are probability
distributions on {0,1}, i.e.,

\[ \mu_{F,x\to a,\ell} = (\mu_{F,x\to a,\ell}(0), \mu_{F,x\to a,\ell}(1)), \quad \mu_{F,a\to x,\ell} = (\mu_{F,a\to x,\ell}(0), \mu_{F,a\to x,\ell}(1)) \in [0,1]^2 \quad\text{and} \tag{2.4} \]
\[ \mu_{F,x\to a,\ell}(0) + \mu_{F,x\to a,\ell}(1) = \mu_{F,a\to x,\ell}(0) + \mu_{F,a\to x,\ell}(1) = 1. \tag{2.5} \]

The initial messages are uniform, i.e.,

\[ \mu_{F,x\to a,0}(s) = \mu_{F,a\to x,0}(s) = 1/2 \qquad (s \in \{0,1\}). \tag{2.6} \]



Further, the messages at step ℓ+1 are obtained from the messages at step ℓ via the Belief Propagation equations

\[ \mu_{F,a\to x,\ell+1}(s) \propto \sum_{\tau \in \{0,1\}^{\partial a}} \mathbf{1}\{\tau_x = s,\ \tau \models a\} \prod_{y \in \partial a \setminus \{x\}} \mu_{F,y\to a,\ell}(\tau_y), \tag{2.7} \]
\[ \mu_{F,x\to a,\ell+1}(s) \propto \prod_{b \in \partial x \setminus \{a\}} \mu_{F,b\to x,\ell}(s). \tag{2.8} \]

In (2.7)–(2.8) the ∝-symbol represents the normalisation required to ensure that the updated messages satisfy (2.5).
In the case of (2.8) such a normalisation may be impossible because the expressions on the r.h.s. could vanish for
both s = 0 and s = 1. In this event we agree that

\[ \mu_{F,x\to a,\ell+1}(s) = \begin{cases} \mu_{F,x\to a,\ell}(s) & \text{if } \mu_{F,x\to a,\ell}(s) \ne 1/2, \\ \mathbf{1}\{s = 0\} & \text{otherwise} \end{cases} \qquad (s \in \{0,1\}); \]

in other words, we retain the message from the previous iteration unless its value was 1/2, in which case we set
$\mu_{F,x\to a,\ell+1}(0) = 1$. The same convention applies to $\mu_{F,a\to x,\ell+1}(s)$. Further, at any iteration ℓ the BP messages render a
heuristic ‘approximation’ of the marginal probability that a random solution to the formula F sets a variable x to
s ∈ {0,1}:

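\[ \mu_{F,x,\ell}(s) \propto \prod_{b \in \partial x} \mu_{F,b\to x,\ell}(s). \tag{2.9} \]

We set $\mu_{F,x,\ell}(0) = 1 - \mu_{F,x,\ell}(1) = 1$ if the normalisation in (2.9) fails, i.e., if $\sum_{s \in \{0,1\}} \prod_{b \in \partial x} \mu_{F,b\to x,\ell}(s) = 0$.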

Fact 2.3. The BP messages and marginals are half-integral for all ℓ, i.e., for all ℓ ≥ 0 and s ∈ {0,1} we have

\[ \mu_{F,x\to a,\ell}(s),\ \mu_{F,a\to x,\ell}(s),\ \mu_{F,x,\ell}(s) \in \{0, 1/2, 1\}. \tag{2.10} \]

Furthermore, for all $\ell > 2\sum_{a \in C(F)} |\partial a|$ we have $\mu_{F,x,\ell}(s) = \mu_{F,x,\ell+1}(s)$.

Proof. The half-integrality (2.10) follows from a straightforward induction on ℓ. Furthermore, another induction
on ℓ and inspection of (2.7)–(2.8) shows that for any x, a, ℓ such that $\mu_{F,x\to a,\ell}(1) \ne 1/2$ we have $\mu_{F,x\to a,\ell+1}(s) = \mu_{F,x\to a,\ell}(s)$
(s ∈ {0,1}). A similar statement holds for $\mu_{F,a\to x,\ell+1}(s)$. In particular, the number of messages that take
the value 1/2 is monotonically decreasing in ℓ. Since the total number of messages is bounded by $2\sum_{a \in C(F)} |\partial a|$,
we conclude that the messages will have converged pointwise after this number of iterations. □

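Finally, in light of Fact 2.3 it makes sense to define the approximations for BPGD by letting

\[ \mu_{F_{\mathrm{BP},t}} = \lim_{\ell\to\infty} \mu_{F_{\mathrm{BP},t},\,x_{t+1},\,\ell}(1), \qquad \mu_{F_{\mathrm{DC},t}} = \lim_{\ell\to\infty} \mu_{F_{\mathrm{DC},t},\,x_{t+1},\,\ell}(1). \tag{2.11} \]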
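
To make the update rules concrete, here is a small sketch of the synchronous BP iteration (2.6)–(2.9) with exact rational arithmetic. It is an illustration only: the clause format (variables, rhs), meaning that a clause is satisfied iff the XOR of its variables equals rhs, the function names and the choice to store only the value µ(1) of each message are assumptions of this example. The sum in (2.7) is carried out by brute-force enumeration, which is affordable because clauses have bounded length.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def normalise(w0, w1, previous):
    """Return mu(1); if both weights vanish, keep the previous message unless it
    was 1/2, in which case mu(0) = 1, i.e. mu(1) = 0."""
    if w0 + w1 == 0:
        return previous if previous != HALF else Fraction(0)
    return w1 / (w0 + w1)

def bp_marginals(n, clauses, rounds):
    """Return mu_{x,rounds}(1) for every variable x in range(n)."""
    edges = [(a, x) for a, (variables, _) in enumerate(clauses) for x in variables]
    var_to_clause = {e: HALF for e in edges}      # mu_{x->a}(1), keyed by (a, x)
    clause_to_var = {e: HALF for e in edges}      # mu_{a->x}(1), keyed by (a, x)
    neighbours = [[] for _ in range(n)]
    for a, x in edges:
        neighbours[x].append(a)

    for _ in range(rounds):
        new_c2v, new_v2c = {}, {}
        for a, x in edges:
            variables, rhs = clauses[a]
            weight = [Fraction(0), Fraction(0)]
            for tau in product((0, 1), repeat=len(variables)):   # equation (2.7)
                if sum(tau) % 2 != rhs:
                    continue                                      # tau violates clause a
                term = Fraction(1)
                for y, value in zip(variables, tau):
                    if y != x:
                        p1 = var_to_clause[(a, y)]
                        term *= p1 if value else 1 - p1
                weight[tau[variables.index(x)]] += term
            new_c2v[(a, x)] = normalise(weight[0], weight[1], clause_to_var[(a, x)])
            w0 = w1 = Fraction(1)
            for b in neighbours[x]:                               # equation (2.8)
                if b != a:
                    w0 *= 1 - clause_to_var[(b, x)]
                    w1 *= clause_to_var[(b, x)]
            new_v2c[(a, x)] = normalise(w0, w1, var_to_clause[(a, x)])
        clause_to_var, var_to_clause = new_c2v, new_v2c

    marginals = []
    for x in range(n):                                            # equation (2.9)
        w0 = w1 = Fraction(1)
        for b in neighbours[x]:
            w0 *= 1 - clause_to_var[(b, x)]
            w1 *= clause_to_var[(b, x)]
        marginals.append(normalise(w0, w1, HALF))
    return marginals

clauses = [((0,), 1), ((0, 1), 0), ((1, 2, 3), 1)]   # x0 = 1, x0+x1 = 0, x1+x2+x3 = 1
result = bp_marginals(4, clauses, 10)
print(result)
```

In this example the unit clause fixes the first variable, the second clause propagates the value to the second variable, and the remaining two variables stay at 1/2, so every output lies in {0, 1/2, 1} as asserted by Fact 2.3.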

2.3. Warning Propagation. Thanks to the half-integrality (2.10) of the messages, Belief Propagation is equivalent
to a purely combinatorial message passing procedure called Warning Propagation (‘WP’) [19]. Like BP, WP
associates two message sequences $(\omega_{F,x\to a,\ell}, \omega_{F,a\to x,\ell})_{\ell\ge 0}$ with every adjacent clause/variable pair. The
messages take one of three possible discrete values {f,u,n} (‘frozen’, ‘uniform’, ‘null’). To trace the BP messages from
Section 2.2 actually only the two values {n,u} would be necessary. However, the third value f will prove useful in
order to compare the BP approximations with the actual marginals. Perhaps unexpectedly given the all-uniform
initialisation (2.6), we launch WP from all-frozen start values:

ωF,x→a,0 =ωF,a→x,0 = f for all a, x. (2.12)

Subsequently the messages get updated according to the rules

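\[ \omega_{F,a\to x,\ell+1} = \begin{cases} \mathtt{n} & \text{if } \omega_{F,y\to a,\ell} = \mathtt{n} \text{ for all } y \in \partial a \setminus \{x\}, \\ \mathtt{f} & \text{if } \omega_{F,y\to a,\ell} \ne \mathtt{u} \text{ for all } y \in \partial a \setminus \{x\} \text{ and } \omega_{F,y\to a,\ell} \ne \mathtt{n} \text{ for at least one } y \in \partial a \setminus \{x\}, \\ \mathtt{u} & \text{otherwise,} \end{cases} \tag{2.13} \]

\[ \omega_{F,x\to a,\ell+1} = \begin{cases} \mathtt{n} & \text{if } \omega_{F,b\to x,\ell} = \mathtt{n} \text{ for at least one } b \in \partial x \setminus \{a\}, \\ \mathtt{f} & \text{if } \omega_{F,b\to x,\ell} \ne \mathtt{n} \text{ for all } b \in \partial x \setminus \{a\} \text{ and } \omega_{F,b\to x,\ell} = \mathtt{f} \text{ for at least one } b \in \partial x \setminus \{a\}, \\ \mathtt{u} & \text{otherwise.} \end{cases} \tag{2.14} \]

In addition to the messages we also define the mark of variable node x by letting

\[ \omega_{F,x,\ell} = \begin{cases} \mathtt{n} & \text{if } \omega_{F,b\to x,\ell} = \mathtt{n} \text{ for at least one } b \in \partial x, \\ \mathtt{f} & \text{if } \omega_{F,b\to x,\ell} \ne \mathtt{n} \text{ for all } b \in \partial x \text{ and } \omega_{F,b\to x,\ell} = \mathtt{f} \text{ for at least one } b \in \partial x, \\ \mathtt{u} & \text{otherwise.} \end{cases} \tag{2.15} \]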
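
The rules (2.12)–(2.15) translate almost literally into code. The sketch below is an illustration under assumptions: clauses are given as tuples of 0-based variable indices (the right-hand sides play no role in WP), updates are performed synchronously, and the names `wp_marks`, `clause_rule` and `variable_rule` are hypothetical.

```python
def variable_rule(incoming):
    """Rules (2.14) and (2.15): combine clause-to-variable messages."""
    if 'n' in incoming:
        return 'n'
    if 'f' in incoming:
        return 'f'
    return 'u'

def clause_rule(incoming):
    """Rule (2.13): combine variable-to-clause messages."""
    if all(w == 'n' for w in incoming):     # also covers unit clauses (no other variable)
        return 'n'
    if 'u' not in incoming:                 # no 'u' and not all 'n', so at least one 'f'
        return 'f'
    return 'u'

def wp_marks(n, clauses, rounds):
    """Return the marks omega_{x,rounds} in {'f', 'u', 'n'} for x in range(n)."""
    edges = [(a, x) for a, variables in enumerate(clauses) for x in variables]
    v2c = {e: 'f' for e in edges}           # omega_{x->a}, keyed by (a, x); start (2.12)
    c2v = {e: 'f' for e in edges}           # omega_{a->x}, keyed by (a, x); start (2.12)
    neighbours = [[] for _ in range(n)]
    for a, x in edges:
        neighbours[x].append(a)
    for _ in range(rounds):
        new_c2v = {(a, x): clause_rule([v2c[(a, y)] for y in clauses[a] if y != x])
                   for a, x in edges}
        new_v2c = {(a, x): variable_rule([c2v[(b, x)] for b in neighbours[x] if b != a])
                   for a, x in edges}
        c2v, v2c = new_c2v, new_v2c
    return [variable_rule([c2v[(b, x)] for b in neighbours[x]]) for x in range(n)]

chain = wp_marks(4, [(0,), (0, 1), (1, 2, 3)], 10)
doubled = wp_marks(3, [(0, 1, 2), (0, 1, 2)], 5)
print(chain, doubled)
```

In the first example a unit clause triggers null marks that propagate along the chain, while the two leaf variables of the last clause end up uniform. In the second example every variable lies in two clauses and no unit clause is present, so all marks remain at their frozen initial value.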

The following statement summarises the relationship between BP and WP.

Fact 2.4. For all ℓ ≥ 0 and all x, a we have

µx→a,ℓ(1) = 1/2 ⇔ ωF,x→a,ℓ ̸= n, (2.16)

µa→x,ℓ(1) = 1/2 ⇔ ωF,a→x,ℓ ̸= n, (2.17)

µx,ℓ(1) = 1/2 ⇔ ωF,x,ℓ ̸= n. (2.18)

Moreover, for all ℓ> 2|C (F )| we have ωF,x→a,ℓ =ωF,x→a,ℓ+1 and ωF,a→x,ℓ =ωF,a→x,ℓ+1.

Proof. The fact that $\omega_{F,x\to a,\ell} = \omega_{F,x\to a,\ell+1}$ and $\omega_{F,a\to x,\ell} = \omega_{F,a\to x,\ell+1}$ for all ℓ > 2|C (F )| follows from the observation
that the number of f-messages is monotonically decreasing, while the number of n-messages is monotonically
increasing. The equivalences (2.16)–(2.18) follow by induction on ℓ. Initially all BP messages are uniform, i.e.,
$\mu_{x\to a,0}(1) = \mu_{a\to x,0}(1) = 1/2$, whereas WP starts from all-frozen messages as per (2.12). Hence, in light of
(2.13)–(2.15), the equivalences (2.16)–(2.18) hold for ℓ = 0. For ℓ = 1 the BP messages and marginals are obtained
from the initial messages. If the message obtained from (2.7) is uniform, then (2.13) ensures that $\omega_{F,a\to x,1} \ne \mathtt{n}$
because $\omega_{F,y\to a,0} = \mathtt{f}$. Conversely, if the WP message at step ℓ = 1 is not null, then the BP message from (2.7)
equals 1/2 after normalisation. So (2.16) holds for ℓ = 1.
Now assume that (2.16) is true at step ℓ. If the BP message at step ℓ+1 obtained from step ℓ via (2.7) equals 1/2,
then the WP message satisfies $\omega_{F,a\to x,\ell+1} \ne \mathtt{n}$ because $\omega_{F,y\to a,\ell} = \mathtt{u}$ for at least one y ∈ ∂a \ {x}. Conversely, if
$\omega_{F,a\to x,\ell+1} \ne \mathtt{n}$, then this message is either ‘uniform’ or ‘frozen’. If there is at least one uniform incoming
message, then $\mu_{a\to x,\ell+1}(1) = 1/2$. If all incoming messages are frozen, the claim follows from the initialisation
(2.12) of WP, which corresponds to $\mu_{a\to x,\ell+1}(1) = 1/2$. So (2.16) holds at step ℓ+1, and we
conclude that (2.16) holds for every ℓ. Similarly, induction on ℓ shows that (2.17)–(2.18)
hold for every ℓ. □

Fact 2.4 implies that the WP messages and marks ‘converge’ in the limit of large ℓ, in the sense that eventually
they do not change any more. Let ωF,x→a ,ωF,a→x ,ωF,x ∈ {f,u,n} be these limits. Furthermore, let Vf,ℓ(F ), Vu,ℓ(F ),
Vn,ℓ(F ) be the sets of variables with the respective mark after ℓ ≥ 0 iterations. Also let Vf(F ),Vu(F ),Vn(F ) be the
sets of variables where the limit ωF,x takes the respective value. The following statement traces WP on the random
formula F DC,t produced by the decimation process.

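Proposition 2.5. Let ε > 0 and assume that d > 0, t = t (n) ∼ θn satisfy one of the following conditions:

(i) $d < d_{\min}$, or
(ii) $d > d_{\min}$ and $\theta \notin \{\theta_*, \theta^*\}$.

Then there exists $\ell_0 = \ell_0(d,\theta,\varepsilon) > 0$ such that for any fixed $\ell \ge \ell_0$ with λ = −log(1−θ) w.h.p. we have

\[ \Big| t + |V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n \Big| < \varepsilon n, \qquad \Big| t + |V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})| - (\alpha^* - \alpha_*) n \Big| < \varepsilon n, \qquad \Big| V_{\mathtt{n}}(F_{\mathrm{DC},t}) \,\triangle\, V_{\mathtt{n},\ell}(F_{\mathrm{DC},t}) \Big| < \varepsilon n. \tag{2.19} \]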

2.4. The check matrix. Since the XOR operation is equivalent to addition modulo two, a XORSAT formula F with
variables $x_1,\dots,x_n$ and clauses $a_1,\dots,a_m$ translates into a linear system over $\mathbb{F}_2$, as follows. Let $A_F$ be the
m×n-matrix over $\mathbb{F}_2$ whose (i , j )-entry equals one iff variable $x_j$ appears in clause $a_i$. Adopting coding parlance, we refer
to $A_F$ as the check matrix of F . Furthermore, let $y_F \in \mathbb{F}_2^m$ be the vector whose i th entry is one plus the sum of any
constant term and the number of negation signs of clause $a_i$ mod two. Then the solutions $\sigma \in \mathbb{F}_2^n$ of the linear
system $A_F \sigma = y_F$ are precisely the satisfying assignments of F .

The algebraic properties of $A_F$ therefore have a direct impact on the satisfiability of F . For example, if $A_F$ has
rank m, we may conclude immediately that F is satisfiable. Furthermore, the set of solutions of F is an affine
subspace of $\mathbb{F}_2^n$ (if non-empty). In effect, if F is satisfiable, then the number of satisfying assignments equals the
size of the kernel of $A_F$. Hence the nullity $\mathrm{nul}\, A_F = \dim \ker A_F$ of the check matrix is a key quantity.
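
The correspondence between satisfying assignments and solutions of $A_F\sigma = y_F$ can be checked on a toy instance. The sketch below stores each row of the check matrix as an integer bitmask and computes the rank by Gaussian elimination over $\mathbb{F}_2$; a brute-force count then confirms that a satisfiable formula has $2^{\mathrm{nul}\,A_F}$ solutions. The clause format (variables, signs) without constant terms and all function names are assumptions of this example.

```python
from itertools import product

def check_matrix(formula):
    """Return the rows (mask, y_i): mask encodes row i of A_F, and y_i is one plus
    the number of negation signs of clause i, modulo two."""
    rows = []
    for variables, signs in formula:
        mask = 0
        for v in variables:
            mask |= 1 << v
        rows.append((mask, (1 + sum(signs)) % 2))
    return rows

def rank_and_consistency(rows):
    """Gaussian elimination over F_2; returns (rank of A_F, system solvable?)."""
    basis, consistent = {}, True
    for mask, rhs in rows:
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, rhs)
                break
            mask ^= basis[pivot][0]
            rhs ^= basis[pivot][1]
        else:
            if rhs:                      # row reduced to 0 = 1
                consistent = False
    return len(basis), consistent

def count_solutions(n, formula):
    """Brute-force count of assignments under which every clause, read as an XOR of
    literals, evaluates to true."""
    count = 0
    for sigma in product((0, 1), repeat=n):
        if all(sum(sigma[v] ^ s for v, s in zip(variables, signs)) % 2 == 1
               for variables, signs in formula):
            count += 1
    return count

n = 4
formula = [((0, 1, 2), (0, 0, 0)), ((1, 2, 3), (1, 0, 0)), ((0, 2, 3), (0, 1, 1))]
rank, consistent = rank_and_consistency(check_matrix(formula))
nullity = n - rank
print(rank, nullity, consistent, count_solutions(n, formula))
```

Here the three rows are linearly independent, so the rank equals m = 3 and the formula is satisfiable, in line with the remark above.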

Indeed, the single most significant ingredient towards turning the heuristic arguments from [25] into rigorous
proofs is a formula for the nullity of the check matrix of the XORSAT instance $F_{\mathrm{DC},t}$ from the decimation process.
To unclutter the notation set $A_t = A_{F_{\mathrm{DC},t}}$. We derive the following proposition from a recent general result about
the nullity of random matrices over finite fields [8, Theorem 1.1]. The proposition clarifies the semantics of the
function $\Phi_{d,k,\lambda}$ and its maximiser $\alpha_{\max}$. In physics jargon $\Phi_{d,k,\lambda}$ is known as the Bethe free entropy.

Proposition 2.6. Let d > 0 and λ = −log(1−θ). Then

\[ \lim_{n\to\infty} \frac{1}{n}\,\mathrm{nul}\, A_t = \Phi_{d,k,\lambda}(\alpha_{\max}) \quad \text{in probability.} \]

2.5. Null variables. Proposition 2.6 enables us to derive crucial information about the set of satisfying assignments of $F_{\mathrm{DC},t}$. Specifically, for any XORSAT instance $F$ with variables $x_1,\dots,x_n$ let $V_0(F)$ be the set of variables $x_i$ such that $\sigma_i = 0$ for all $\sigma \in \ker A_F$. We call the variables $x_i \in V_0(F)$ null variables. Since the set of solutions of $F$, if non-empty, is a translation of $\ker A_F$, any two solutions $\sigma,\sigma'$ of $F$ set the variables in $V_0(F)$ to exactly the same values. The following proposition shows that WP identifies certain variables as null.
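The definition of null variables can be checked by brute force on a tiny instance. The following sketch enumerates the kernel of a check matrix directly, which is only feasible for very small $n$; the function names and the toy matrix are assumptions made for illustration.

```python
from itertools import product

def kernel_gf2(rows, n):
    """All sigma in F_2^n with A sigma = 0 (brute force, small n only)."""
    return [s for s in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(r, s)) % 2 == 0 for r in rows)]

def null_variables(rows, n):
    """Indices i such that sigma_i = 0 for every sigma in ker A."""
    ker = kernel_gf2(rows, n)
    return {i for i in range(n) if all(s[i] == 0 for s in ker)}

def solutions(rows, rhs, n):
    """All sigma in F_2^n with A sigma = y (brute force, small n only)."""
    return [s for s in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(r, s)) % 2 == y
                   for r, y in zip(rows, rhs))]

# Toy system: x0 = 1 (a clause shortened to length one) and x1 + x2 + x3 = 0.
A = [[1, 0, 0, 0], [0, 1, 1, 1]]
y = [1, 0]
V0 = null_variables(A, 4)
```

In this example only the first variable is null, and every solution assigns it the same value, in line with the observation that the solution set is a translate of the kernel.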

Proposition 2.7. W.h.p. the following two statements are true for any fixed integer ℓ> 0.

(i) We have Vn,ℓ(F DC,t ) ⊆V0(F DC,t ).
(ii) We have |Vu,ℓ(F DC,t )∩V0(F DC,t )| = o(n).

Propositions 2.6 and 2.7 enable us to calculate the number of null variables of F DC,t , so long as we remain clear
of the point θcond where αmax is discontinuous.

Proposition 2.8. If θ ̸= θcond then |V0(F DC,t )| =αmaxn +o(n) w.h.p.

Let us briefly summarise what we have learned thus far. First, because all Belief Propagation messages are half-integral, BP reduces to WP. Second, Proposition 2.5 shows that the fixed points $\alpha_*, \alpha^*$ of $\varphi_{d,k,\lambda}$ determine the number of variables marked n or f by WP. Third, the function $\Phi_{d,k,\lambda}$ and its maximiser $\alpha_{\max}$ govern the nullity of the check matrix and thereby the number of null variables of $F_{\mathrm{DC},t}$. Clearly, the null variables $x_i$ are precisely the ones whose actual marginals $\mathbb{P}\left[\sigma_{F_{\mathrm{DC},t}}(x_i) = s \mid F_{\mathrm{DC},t}\right]$ are not uniform. As a next step, we investigate whether BP/WP identify these variables correctly.

In light of Proposition 2.5, in order to investigate the accuracy of BP it suffices to compare the numbers of variables marked n by WP with the true marginals. The following corollary summarises the result.

Corollary 2.9. For any d, θ the following statements are true.

(i) If d < dmin, or d > dmin and θ < θcond, or d > dmin and θ > θ∗, then |V0(F DC,t )△Vn(F DC,t )| = o(n) w.h.p.
(ii) If d > dmin and θcond < θ < θ∗, then |V0(F DC,t )△Vn(F DC,t )| =Ω(n) w.h.p.

Thus, so long as d < dmin or d > dmin and θ < θcond or θ > θ∗, the BP/WP approximations are mostly correct.
By contrast, if d > dmin and θcond < θ < θ∗, the BP/WP approximations are significantly at variance with the true
marginals w.h.p. Specifically, w.h.p. BP deems Ω(n) frozen variables unfrozen, thereby setting itself up for failure.
Indeed, Corollary 2.9 easily implies Theorem 1.3, which in turn implies Theorem 1.1 (ii) without much ado.

In addition, to settle the (non-)reconstruction thresholds set out in Theorem 1.2 we need to investigate the
conditional marginals given the values of variables at a certain distance from xt+1 as in (1.7). This is where the
extra value f from the construction of WP enters. Indeed, for a XORSAT instance F with variables x1, . . . , xn and an
integer ℓ let V0,ℓ(F ) be the set of variables xi such that σi = 0 for all σ ∈ ker AF for which σh = 0 for all variables
xh ∈ ∂ℓxi .

Corollary 2.10. Assume that d > dmin and let ε> 0.

(i) If θ < θcond, then for any fixed ℓ we have |Vf,ℓ(F DC,t )∩V0,ℓ(F DC,t )| < εn w.h.p.
(ii) If θ > θcond, then there exists ℓ0 = ℓ0(d ,θ,ε) such that for any fixed ℓ> ℓ0 we have

|(Vn,ℓ(F DC,t )∪Vf,ℓ(F DC,t ))△V0,ℓ(F DC,t )| < εn w.h.p.

Comparing the number of actually frozen variables with the ones marked f by WP, we obtain Theorem 1.2.

2.6. Proving BPGD successful. We are left to prove Theorem 1.1. First, we need to compute the (strictly positive)
success probability of BPGD for d < dmin. At this point, the fact that BPGD has a fair chance of succeeding for
d < dmin should not come as a surprise. Indeed, Corollary 2.9 implies that the BP approximations of the marginals
are mostly correct for d < dmin, at least on the formula F DC,t created by the decimation process. Furthermore,
so long as the marginals are correct, the decimation process F DC,t and the execution of the BPGD algorithm F BP,t



move in lockstep. The sole difficulty in analysing BPGD lies in proving that the estimates of the algorithm are not
just mostly correct, but correct up to only a bounded expected number of discrepancies over the entire execution
of the algorithm. To prove this fact we combine the method of differential equations with a subtle analysis of the
sources of the remaining bounded number of discrepancies. These discrepancies result from the presence of short
(i.e., bounded-length) cycles in the graph G(F ). Finally, the proof of the second (negative) part of Theorem 1.1
follows by coupling the execution of BPGD with the decimation process, and invoking Theorem 1.3. The details of
both arguments can be found in Section 6.

2.7. Discussion. The thrust of the present work is to verify the predictions from [25] on the BPGD algorithm and
the decimation process rigorously. Concerning the decimation process, the main gap in the deliberations of Ricci-
Tersenghi and Semerjian [25] that we needed to plug is the proof of Proposition 2.8 on the actual number of null
variables in the decimation process. The proof of Proposition 2.8, in turn, hinges on the formula for the nullity
from Proposition 2.6, whereas Ricci-Tersenghi and Semerjian state the (as it turns out, correct) formulas for the
nullity and the number of null variables based on purely heuristic arguments.

Regarding the analysis of the BPGD algorithm, Ricci-Tersenghi and Semerjian state that they rely on the heuris-
tic techniques from the insightful article [11] to predict the formula (1.6), but do not provide any further details;
the article [11] principally employs heuristic arguments involving generating functions. By contrast, the method
that we use to prove (1.6) is a bit more similar to that of Frieze and Suen [13] for the analysis of a variant of the
unit clause algorithm on random k-SAT instances, for which they also obtain the asymptotic success probabil-
ity. Yet by comparison to the argument of Frieze and Suen, we pursue a more combinatorially explicit approach
that demonstrates that certain small sub-formulas that we call ‘toxic cycles’ are responsible for the failure of BPGD.
Specifically, the proof of (1.6) combines the method of differential equations with Poissonisation. Finally, the proof
of Theorem 1.1 (ii) is an easy afterthought of the analysis of the decimation process.

Yung’s work [27] on random k-XORSAT is motivated by the ‘overlap gap paradigm’ [14], the basic idea behind
which is to show that a peculiar clustered geometry of the set of solutions is an obstacle to certain types of algo-
rithms. Specifically, Yung only considers the Unit Clause Propagation algorithm and (a truncated version of) BPGD.
Following the path beaten in [20], Yung performs moment computations to establish the overlap gap property.
However, moment computations (also called ‘annealed computations’ in physics jargon) only provide one-sided
bounds. As a consequence, Yung’s results require spurious lower bounds on the clause length k (k ≥ 9 for Unit
Clause and k ≥ 13 for BPGD). By contrast, the present proof strategy pivots on the number of null variables rather
than overlaps, and Proposition 2.8 provides the precise ‘quenched’ count of null variables. A further improvement
over [27] is that the present analysis pinpoints the precise threshold up to which BPGD (as well as Unit Clause) suc-
ceeds for any k ≥ 3. Specifically, Yung proves that these algorithms fail for d > dcore, while Theorem 1.1 shows that
failure occurs already for d > dmin with dmin < dcore. Conversely, Theorem 1.1 shows that the algorithms succeed
with a non-vanishing probability for d < dmin. Thus, Theorem 1.1 identifies the correct threshold for the success
of BPGD, as well as the correct combinatorial phenomenon that determines this threshold, namely the onset of
reconstruction in the decimation process (Theorems 1.2 and 1.3).

The BPGD algorithm as detailed in Section 2.2 applies to a wide variety of problems beyond random k-XORSAT.
Of course, the single most prominent example is random k-SAT. Lacking the symmetries of XORSAT, random k-
SAT does not allow for the simplification to discrete messages; in particular, the BP messages are not generally
half-integral. In effect, BP and WP are no longer equivalent. In addition to random k-XORSAT, the article [25]
also provides a heuristic study of BPGD on random k-SAT. But once again due to the lack of half-integrality, the
formulas for the phase transitions no longer come as elegant finite-dimensional expressions. Instead, they now
come as infinite-dimensional variational problems. Furthermore, the absence of half-integrality also entails that
the present proof strategy does not extend to k-SAT.

The lack of inherent symmetry in random k-SAT can partly be compensated by assuming that the clause length
k is sufficiently large (viz. larger than some usually unspecified constant k0). Under this assumption the random
k-SAT version of both the decimation process and the BPGD algorithm have been analysed rigorously [6, 10]. The
results are in qualitative agreement with the predictions from [25]. In particular, the BPGD algorithm provably fails
to find satisfying assignments on random k-SAT instances even below the threshold where the set of satisfying
assignments shatters into well-separated clusters [1, 17]. Furthermore, on random k-SAT a more sophisticated
message passing algorithm called Survey Propagation Guided Decimation has been suggested [21, 25]. While on
random XORSAT Survey Propagation and Belief Propagation are equivalent, the two algorithms are substantially



different on random k-SAT. One might therefore hope that Survey Propagation Guided Decimation outperforms
BPGD on random k-SAT and finds satisfying assignments up to the aforementioned shattering transition. A negative result [15] shows that, for large enough k, Survey Propagation Guided Decimation fails asymptotically beyond the shattering transition point. Yet a complete analysis of Belief/Survey Propagation Guided Decimation on random k-SAT for any k ≥ 3 in analogy to the results obtained here for random k-XORSAT remains an outstanding challenge.

Finally, returning to random k-XORSAT, a question for future work may be to investigate the performance of
various types of algorithms such as greedy, message passing or local search that aim to find an assignment that
violates the least possible number of clauses. Of course, this question is relevant even for d > dsat(k). A first step
based on the heuristic ‘dynamical cavity method’ was recently undertaken by Maier, Behrens and Zdeborová [18].

2.8. Preliminaries and notation. Throughout we assume that $k \ge 3$ and $0 < d < d_{\min}$ and $\theta \in (0,1)$ are fixed independently of $n$. We always let $t = t(n) \in \{0,1,\dots,n\}$ be an integer sequence such that $\lim_{n\to\infty} t/n = \theta$. Unless specified otherwise we tacitly assume that $n$ is sufficiently large for our various estimates to hold. Asymptotic notation such as $O(\cdot)$ refers to the limit of large $n$ by default, with $k, d, \theta$ fixed. We continue to denote by $\alpha_* = \alpha_*(\lambda) = \alpha_*(d,k,\lambda)$ and $\alpha^* = \alpha^*(\lambda) = \alpha^*(d,k,\lambda)$ the smallest/largest fixed points of $\varphi_{d,k,\lambda}$ in $[0,1]$ and by $\lambda_* = \lambda_*(d,k)$, $\lambda^* = \lambda^*(d,k)$, $\theta_* = \theta_*(d,k)$, $\theta^* = \theta^*(d,k)$ the quantities defined in (1.9)–(1.10).

For a formula F and a partial assignment σ : U → {0,1} with U ⊆ V (F ) let F [σ] be the simplified formula obtained
by substituting constants for the variables in U . The length of a clause of F [σ] is defined as the number of variables
from V (F ) \U that the clause contains.

The following fact provides the correctness of BP on formulas represented by acyclic graphs G(F ).

Fact 2.11 ([19, Chapter 14]). For a XORSAT formula $F$ with an acyclic bipartite graph $G(F)$ the BP marginals as defined in (2.9) are exact, i.e.

\[ \lim_{\ell\to\infty} \mu_{F,x,\ell}(1) = \mathbb{P}\left[\sigma_F(x) = 1\right]. \]

2.9. Organisation. The rest of the paper is organised as follows. Section 3 contains the proof of Proposition 2.2.
Subsequently in Section 4 we investigate Warning Propagation to prove Propositions 2.5 and 2.7. Furthermore,
Section 5 deals with the study of the check matrix; here we prove Propositions 2.6 and 2.8 as well as Corollaries 2.9
and 2.10. Additionally, with all these preparations completed we put all the pieces together to complete the proofs
of Theorems 1.2 and 1.3 in Section 5.5. Finally, Section 6 contains the proof of Theorem 1.1.

3. PROOF OF PROPOSITION 2.2

Even though a few steps are mildly intricate, the proof of Proposition 2.2 mostly consists of ‘routine calculus’. As a
convenient shorthand we introduce

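\[ \zeta_\lambda(z) = \zeta_{d,k,\lambda}(z) = \varphi_{d,k,\lambda}(z) - z = 1 - \exp\left(-\lambda - d z^{k-1}\right) - z. \]

Its derivatives read

\[ \zeta'_\lambda(z) = d(k-1)\, z^{k-2} \exp(-\lambda - d z^{k-1}) - 1 \quad\text{and} \tag{3.1} \]

\[ \zeta''_\lambda(z) = d(k-1)\, z^{k-3} \exp(-\lambda - d z^{k-1}) \left[ (k-2) - d(k-1) z^{k-1} \right]. \tag{3.2} \]

Also let

\[ z_0 = z_0(d,k) = \left( \frac{k-2}{d(k-1)} \right)^{\frac{1}{k-1}}. \tag{3.3} \]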

We begin by investigating the zeros of ζλ, obviously identical with fixed points of φd ,k,λ.

Lemma 3.1. Assume that λ> 0.

(i) The function ζλ has either one or three zeros in z ∈ [0,1], possibly including multiple zeros. If ζλ has three zeros,
then at least one lies in the interval [0, z0] and at least one lies in the interval [z0,1].

(ii) Also, ζλ has at most two stationary points, a minimum and a maximum, and if it has both, the minimum
occurs left of the maximum.

(iii) If $\zeta_\lambda$ has a unique zero, then $\alpha_*$ is a stable fixed point of $\varphi_{d,k,\lambda}$ and $\sup_{z\in[0,1]} \varphi'_{d,k,\lambda}(z) < 1$.



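(iv) If $\zeta_\lambda$ has three zeros but no double zero, then $\alpha_*, \alpha^*$ are stable fixed points of $\varphi_{d,k,\lambda}$. Additionally, $\varphi_{d,k,\lambda}$ possesses an unstable fixed point $\alpha_u \in (\alpha_*, \alpha^*)$. Furthermore, there exists $\varepsilon = \varepsilon(d,\lambda) > 0$ such that

\[ \sup_{z\in[0,\alpha_*+\varepsilon]} \varphi'_{d,k,\lambda}(z) < 1, \qquad \sup_{z\in[\alpha^*-\varepsilon,1]} \varphi'_{d,k,\lambda}(z) < 1. \]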

Proof. Since $\zeta_\lambda(0) > 0$ and $\zeta_\lambda(1) < 0$, the number of zeros must be odd, so towards (i) it suffices to show that there cannot be more than three zeros. Indeed, by Rolle's theorem, between any two zeros of $\zeta_\lambda$ there is a zero of $\zeta'_\lambda$. So, if $\zeta_\lambda$ had four or more zeros then $\zeta'_\lambda$ would have at least three zeros in $(0,1]$, and in turn $\zeta''_\lambda$ would have at least two. From (3.2) it is clear that $\zeta''_\lambda$ has only two zeros, at $z = 0$ (outside the relevant range) and at the inflection point where $k-2 = d(k-1)z^{k-1}$, namely for $z = z_0$. So, $\zeta''_\lambda$ has at most two zeros, thus $\zeta_\lambda$ has at most three zeros, therefore either one or three.

The second assertion follows from $\zeta''_\lambda(z_0) = 0$ and that by inspection of (3.2), $\zeta''_\lambda(z)$ is decreasing in $z$, so a local minimum of $\zeta_\lambda$ at $z_1$ implies $\zeta''_\lambda(z_1) > 0$ thus $z_1 < z_0$, and symmetrically a local maximum at $z_2$ implies that $z_2 > z_0$.

Moving on to (iii), we observe that $\zeta_\lambda(\alpha_*) = 0$. Furthermore, since $\zeta_\lambda(0) > 0$ while $\zeta_\lambda(1) < 0$, we conclude that $\zeta'_\lambda(\alpha_*) < 0$, which implies that $0 < \varphi'_{d,k,\lambda}(\alpha_*) < 1$. Hence, $\alpha_*$ is a stable fixed point.

With respect to (iv), if $\zeta_\lambda$ has three zeros, then $\alpha_* < \alpha^*$ are the smallest and the largest zero, respectively. Since we assume that $\zeta_\lambda$ does not have a double zero, the same reasoning as under (iii) shows that $\zeta'_\lambda(\alpha_*) < 0$ and thus $0 < \varphi'_{d,k,\lambda}(\alpha_*) < 1$. Further, if $\zeta_\lambda$ has three zeros, then by Rolle's theorem and (ii) the function has a local minimum followed by a local maximum, which is followed by the zero $\alpha^*$. Hence, $\zeta'_\lambda(\alpha^*) < 0$, and thus $0 < \varphi'_{d,k,\lambda}(\alpha^*) < 1$. □

The following statement implies that φd ,k,λ has only a single fixed point if d < dmin.

Lemma 3.2. Let λ> 0. If d < dmin, then ζλ has a unique zero and is strictly decreasing.

Proof. Suppose that $z$ is a zero of $\zeta_\lambda$. Then $\exp(-\lambda - d z^{k-1}) = 1 - z$ and thus

\[ \varphi'_{d,k,\lambda}(z) = d(k-1) z^{k-2} \exp(-\lambda - d z^{k-1}) = d(k-1)(z^{k-2} - z^{k-1}). \tag{3.4} \]

The expression on the r.h.s. is positive for $z \in (0,1)$ and zero at $z \in \{0,1\}$. Moreover, its derivative works out to be

\[ \frac{\partial}{\partial z}\, d(k-1)(z^{k-2} - z^{k-1}) = d(k-1) z^{k-3} (k-2-(k-1)z). \]

Thus, the expression on the r.h.s. of (3.4) takes its maximum value of $d((k-2)/(k-1))^{k-2}$ at $z^\dagger = (k-2)/(k-1)$. Hence, (3.4) implies that $\varphi'_{d,k,\lambda}(z) < 1$ and thus $\zeta'_\lambda(z) < 0$. Consequently, the function $\varphi_{d,k,\lambda}$ only has stable fixed points and thus has only a single fixed point by Lemma 3.1. □
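The uniqueness statement of Lemma 3.2 can be probed numerically. The sketch below is illustrative only: it assumes the closed form $d_{\min} = ((k-1)/(k-2))^{k-2}$, which is the value at which the bound $d((k-2)/(k-1))^{k-2}$ from the proof equals one (the definition of $d_{\min}$ itself is not restated in this section), and it counts sign changes of $\zeta_\lambda$ on a grid. The function names are hypothetical.

```python
import math

def zeta(z, d, k, lam):
    """zeta_lambda(z) = 1 - exp(-lam - d z^(k-1)) - z."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1)) - z

def slope_at_zero_bound(d, k):
    """Maximum over z of d(k-1)(z^(k-2) - z^(k-1)), attained at (k-2)/(k-1)."""
    return d * ((k - 2) / (k - 1)) ** (k - 2)

def count_sign_changes(d, k, lam, grid=20000):
    """Sign changes of zeta on a uniform grid of [0, 1] (illustrative only)."""
    vals = [zeta(i / grid, d, k, lam) for i in range(grid + 1)]
    return sum(1 for a, b in zip(vals, vals[1:]) if (a > 0) != (b > 0))

k = 3
d_min = ((k - 1) / (k - 2)) ** (k - 2)  # assumed closed form; 2.0 for k = 3
```

For parameter choices with $d$ below this value, the grid count finds a single sign change, matching the single fixed point asserted by the lemma.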

Proceeding to average degrees $d > d_{\min}$, we verify that the values $\lambda_*, \lambda^*$ from Section 1.5 are well defined and satisfy the inequality (1.9).

Lemma 3.3. If $d > d_{\min}$, then the polynomial $d(k-1)z^{k-2}(1-z) - 1$ has precisely two roots $0 < z_* < z^* < 1$ and the values $\lambda_*, \lambda^*$ defined in (1.9) satisfy $\lambda^* > \lambda_*$. Furthermore, $d_{\mathrm{core}} > d_{\min}$ and $\lambda_* = 0$ iff $d \ge d_{\mathrm{core}}$.

Proof. Let $z^\dagger = (k-2)/(k-1)$. The polynomial $z^{k-2}(1-z)$ is non-negative on $[0,1]$, strictly increasing on $[0, z^\dagger]$ and strictly decreasing on $[z^\dagger, 1]$. Hence, at $z^\dagger$ the polynomial attains its maximum value of

\[ \max_{0\le z\le 1} z^{k-2}(1-z) = \frac{(k-2)^{k-2}}{(k-1)^{k-1}}. \tag{3.5} \]

If $d > d_{\min}$, the equation

\[ z^{k-2}(1-z) = \frac{1}{d(k-1)} \tag{3.6} \]

therefore has two distinct solutions $0 < z_* < z^\dagger < z^* < 1$. Letting

\[ l(z) = -\log(1-z) - \frac{z}{(1-z)(k-1)}, \]

we obtain $\lambda^* = l(z_*)$ and $\lambda_* = \max\{l(z^*), 0\}$.

The function $l(z)$ is positive and monotonically increasing on $(0, z^\dagger)$, and monotonically decreasing on $(z^\dagger, 1)$. Indeed, the derivative works out to be

\[ l'(z) = \frac{k-2-(k-1)z}{(k-1)(1-z)^2}, \tag{3.7} \]

which is positive for small $z > 0$ and has its unique zero at $z^\dagger$. Since $z_* < z^\dagger$, we conclude that $\lambda^* > 0$.

Further, [8, Theorem 1.2] shows that at $d = d_{\mathrm{core}}$ we have $l(z^*) = 0$. Since $z^*$ is an increasing function of $d$ while $l(z)$ is strictly decreasing in $z > z^\dagger$, we conclude that $l(z^*) < 0$ for $d > d_{\mathrm{core}}$, $l(z^*) = 0$ for $d = d_{\mathrm{core}}$ and $l(z^*) = \lambda_* > 0$ for $d_{\min} < d < d_{\mathrm{core}}$.

Thus, we are left to verify that $\lambda^* > \lambda_*$, which amounts to showing that $l(z^*) < l(z_*)$. Rearranging (3.6) into $d = 1/((k-1)(1-z_*)z_*^{k-2})$ and $d = 1/((k-1)(1-z^*)(z^*)^{k-2})$ and applying the inverse function theorem, we obtain

\[ \frac{\partial z_*}{\partial d} = -\frac{(k-1)(1-z_*)^2 z_*^{k-1}}{k-2-(k-1)z_*}, \qquad \frac{\partial z^*}{\partial d} = -\frac{(k-1)(1-z^*)^2 (z^*)^{k-1}}{k-2-(k-1)z^*}. \tag{3.8} \]

Combining (3.7) and (3.8) with the chain rule, we arrive at

\[ \frac{\partial}{\partial d}\, l(z_*) = -z_*^{k-1}, \qquad \frac{\partial}{\partial d}\, l(z^*) = -(z^*)^{k-1}. \tag{3.9} \]

Since $z^* > z_*$ for all $d > d_{\min}$, integrating (3.9) on $d$ shows that $\lambda^* > \lambda_*$, thereby completing the proof. □
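The quantities appearing in Lemma 3.3 are easy to evaluate numerically. The following sketch solves (3.6) by bisection on either side of $z^\dagger$ and evaluates $l$ at the two roots; the helper names are hypothetical, and the parameter values used to exercise it are arbitrary illustrative choices.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_points(d, k):
    """The two roots of d(k-1) z^(k-2) (1-z) - 1, one on each side of z_dagger."""
    g = lambda z: d * (k - 1) * z ** (k - 2) * (1 - z) - 1
    z_dagger = (k - 2) / (k - 1)
    return bisect(g, 0.0, z_dagger), bisect(g, z_dagger, 1.0)

def ell(z, k):
    """l(z) = -log(1-z) - z / ((1-z)(k-1))."""
    return -math.log(1 - z) - z / ((1 - z) * (k - 1))

def lambda_thresholds(d, k):
    """Return (lower threshold, upper threshold) as in the proof of Lemma 3.3."""
    z_small, z_large = critical_points(d, k)
    return max(ell(z_large, k), 0.0), ell(z_small, k)
```

For $k = 3$ equation (3.6) is a quadratic, so the bisection output can be compared against the closed-form roots; the ordering of the two thresholds then matches the statement of the lemma.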

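We are ready to identify the zeros of $\zeta_\lambda$ for $d > d_{\min}$, depending on the regime of $\lambda$.

Lemma 3.4. Let $\lambda > 0$ and assume that $d > d_{\min}$.

(i) If $\lambda < \lambda_*$, then $\zeta_\lambda$ has a unique zero.
(ii) If $\lambda_* < \lambda < \lambda^*$, then $\zeta_\lambda$ has three distinct zeros.
(iii) If $\lambda > \lambda^*$, then $\zeta_\lambda$ has a unique zero.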

Proof. Assume that $d > d_{\min}$. For fixed $k$ and $d$, the function $\zeta_\lambda$ varies continuously with $\lambda$, so there are contiguous regimes of $\lambda$ where it has one zero, regimes where it has three zeros, and these regimes are divided by critical values of $\lambda$ where $\zeta_\lambda$ has three zeros, two of which coincide in a double zero. In this case, the slope at the double zero is also 0. (By Rolle's theorem, the slope is 0 somewhere between the two zeros, and this is the limiting case.)

Thus, the separation between the regimes with one and three zeros occurs at values of $\lambda$ such that $\zeta_\lambda(z) = \zeta'_\lambda(z) = 0$. Recalling the definition of $\zeta_\lambda$ and the derivative $\zeta'_\lambda$ from (3.1), we obtain

\[ 1 - z = \exp(-\lambda - d z^{k-1}) \quad\text{and}\quad d(k-1) z^{k-2} = \frac{1}{\exp(-\lambda - d z^{k-1})}. \tag{3.10} \]

Substituting the left equation for the exponential in the right equation, we conclude that (3.10) holds only if $z$ is a solution to (3.6). Further, substituting the two solutions $0 < z_* < z^\dagger = (k-2)/(k-1) < z^*$ into either one of the equations from (3.10) and solving for $\lambda$, we obtain

\[ \lambda^* = -\log(1-z_*) - \frac{z_*}{(1-z_*)(k-1)}, \qquad \lambda_\star = -\log(1-z^*) - \frac{z^*}{(1-z^*)(k-1)}. \]

Observe that $\lambda_* = \max\{\lambda_\star, 0\}$.

Suppose $0 < \lambda < \lambda^*$. Since $\zeta_{\lambda^*}(z_*) = 0$, the function $\lambda \mapsto \zeta_\lambda(z_*)$ is strictly increasing and $\zeta_\lambda(0) > 0$, we conclude that $\zeta_\lambda$ has a zero in the interval $(0, z_*)$. Similarly, if $\lambda > \lambda_*$, then the function $\zeta_\lambda$ has a zero in the interval $(z^*, 1)$. Hence, (ii) is an immediate consequence of Lemma 3.1.

Now assume that $0 < \lambda < \lambda_*$. Since $\lambda^* > \lambda_*$ by Lemma 3.3, Lemma 3.1 implies that $\zeta_{\lambda_*}$ has precisely three zeros. The largest one is $\alpha^* = z^*$, satisfies $\alpha^* > z^\dagger > z_0$, is a double zero and simultaneously a local maximum of $\zeta_{\lambda_*}$. Since $\alpha^*$ is a double zero and a local maximum, the smallest zero $\alpha_*$ satisfies $\alpha_* < z_0$ by Rolle's theorem. Hence, $\zeta'_{\lambda_*}(z) < 0$ for all $0 < z < \alpha_*$. Since the function $\lambda \mapsto \zeta_\lambda(z)$ is strictly increasing for all $z \in (0,1)$, Lemma 3.1(i) implies that for $\lambda < \lambda_*$ only a single zero remains, which is smaller than $z_0$.

Finally, suppose that $\lambda > \lambda^* > \lambda_*$. Lemma 3.1(i) implies that $\zeta_{\lambda^*}$ has precisely three zeros, with a double zero occurring at $z_*$ and another zero at $\alpha^*(\lambda^*) > z^\dagger > z_0$. By Lemma 3.1 and the choice of $z_*, \lambda^*$, the double zero at $z_*$ is a local minimum. Therefore, $\zeta'_{\lambda^*}(z) < 0$ for all $z > \alpha^*$. Since the function $\lambda \mapsto \zeta_\lambda(z)$ is strictly increasing for all $z \in (0,1)$, we conclude that $\zeta_\lambda(z) > 0$ for all $\lambda > \lambda^*$ and $z \in [0, z_0]$. Hence, by Lemma 3.1(i) for $\lambda > \lambda^*$ only a single zero remains, which lies in the interval $[z_0, 1]$. □
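The three regimes of Lemma 3.4 can be observed numerically for $k = 3$, where (3.6) reduces to the quadratic $2dz(1-z) = 1$. The sketch below is a rough illustration rather than a proof: it counts sign changes of $\zeta_\lambda$ on a grid, and the function names as well as the sample values $d = 2.2$ and $\lambda \in \{0.05, 0.14, 0.30\}$ are assumptions chosen for the example.

```python
import math

def zeta(z, d, k, lam):
    """zeta_lambda(z) = 1 - exp(-lam - d z^(k-1)) - z."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1)) - z

def count_zeros(d, k, lam, grid=20000):
    """Number of sign changes of zeta on a uniform grid of [0, 1]."""
    vals = [zeta(i / grid, d, k, lam) for i in range(grid + 1)]
    return sum(1 for a, b in zip(vals, vals[1:]) if (a > 0) != (b > 0))

def thresholds_k3(d):
    """(lower, upper) threshold for k = 3 via the quadratic 2 d z (1 - z) = 1."""
    root = math.sqrt(1 - 2 / d)
    z_small, z_large = (1 - root) / 2, (1 + root) / 2
    ell = lambda z: -math.log(1 - z) - z / (2 * (1 - z))
    return max(ell(z_large), 0.0), ell(z_small)
```

With $d = 2.2$ the value $\lambda = 0.14$ falls between the two thresholds and the grid count reports three zeros, whereas the two other sample values each yield a single zero.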

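Combining Lemmas 3.3 and 3.4, we can now verify the analytic properties of the functions $\lambda \mapsto \alpha_*$ and $\lambda \mapsto \alpha^*$.

Lemma 3.5. Let $0 < d < d_{\mathrm{sat}}$ and $\lambda > 0$.

(i) If $d < d_{\min}$, then the function $\lambda \in (0,\infty) \mapsto \alpha_* = \alpha^*$ is analytic with derivative

\[ \frac{\partial \alpha_*}{\partial \lambda} = \frac{1-\alpha_*}{1 - d(k-1)\alpha_*^{k-2}(1-\alpha_*)} < 1. \tag{3.11} \]

(ii) If $d > d_{\min}$, then $\lambda \in (0, \lambda^*) \mapsto \alpha_*$ is analytic with derivative (3.11).
(iii) If $d > d_{\min}$, then $\lambda \in (\lambda_*, \infty) \mapsto \alpha^*$ is analytic with derivative

\[ \frac{\partial \alpha^*}{\partial \lambda} = \frac{1-\alpha^*}{1 - d(k-1)(\alpha^*)^{k-2}(1-\alpha^*)}. \]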

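Proof. Assume that $d > d_{\min}$ and $\lambda \in (0, \lambda^*)$. We know from the proof of Lemma 3.4 that $z_*$ is a double root and local minimum of $\zeta_{\lambda^*}$. Furthermore, $z_* < z_0$ and the function $\lambda \mapsto \zeta_\lambda(z)$ is strictly increasing in $\lambda$. Hence, Lemma 3.1 implies that for any $0 < \lambda < \lambda^*$ the function $\zeta_\lambda$ has a unique zero in $(0, z_*)$. Similarly, if $d < d_{\min}$ then Lemma 3.2 shows that $\zeta_\lambda$ has a unique zero at $\alpha_*$. Therefore, the implicit function theorem implies that in cases (i) and (ii) the function $\lambda \mapsto \alpha_*$ is continuously differentiable.

Thus, we are left to work out $\partial\alpha_*(d,k,\lambda)/\partial\lambda$. Consider the function $\mathcal{A} : \binom{z}{\lambda} \mapsto \binom{\zeta_\lambda(z)}{\lambda}$, which is one-to-one in an open interval around $\alpha_*$. The Jacobi matrix reads

\[ D\mathcal{A} = \begin{pmatrix} \partial\varphi_{d,k,\lambda}/\partial\alpha - 1 & \partial\varphi_{d,k,\lambda}/\partial\lambda \\ 0 & 1 \end{pmatrix}. \]

Furthermore,

\[ \frac{\partial\varphi_{d,k,\lambda}}{\partial\alpha}\Big|_{\alpha=\alpha_*} = d(k-1)\alpha^{k-2}\exp(-\lambda - d\alpha^{k-1})\Big|_{\alpha=\alpha_*} = d(k-1)\alpha_*^{k-2}(1-\alpha_*), \]

\[ \frac{\partial\varphi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha=\alpha_*} = \exp(-\lambda - d\alpha^{k-1})\Big|_{\alpha=\alpha_*} = 1-\alpha_*. \]

Hence, by the inverse function theorem the derivative of $\mathcal{A}^{-1}$ reads

\[ (D\mathcal{A})^{-1} = \begin{pmatrix} \left[\partial\varphi_{d,k,\lambda}/\partial\alpha - 1\right]^{-1} & -\left[\partial\varphi_{d,k,\lambda}/\partial\lambda\right] / \left[\partial\varphi_{d,k,\lambda}/\partial\alpha - 1\right] \\ 0 & 1 \end{pmatrix}, \]

and thus

\[ \frac{\partial\alpha_*}{\partial\lambda} = -\frac{\partial\varphi_{d,k,\lambda}/\partial\lambda}{\partial\varphi_{d,k,\lambda}/\partial\alpha - 1} = \frac{1-\alpha_*}{1 - d(k-1)\alpha_*^{k-2}(1-\alpha_*)}. \]

Thus, we obtain (i) and (ii). A similar argument applies to $\lambda \in (\lambda_*, \infty) \mapsto \alpha^*$ in the case $d > d_{\min}$ and yields (iii). □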
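The closed-form expression for the derivative in (3.11) can be compared against a central finite difference. The sketch below assumes that iterating $\varphi_{d,k,\lambda}$ from $z = 0$ converges to the smallest fixed point, which holds because $\varphi_{d,k,\lambda}$ is increasing in $z$; the function names, the step size and the parameter values are illustrative assumptions.

```python
import math

def phi(z, d, k, lam):
    """phi_{d,k,lam}(z) = 1 - exp(-lam - d z^(k-1))."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1))

def smallest_fixed_point(d, k, lam, iters=10000):
    """Iterate phi from z = 0; the iterates increase to the smallest fixed point."""
    z = 0.0
    for _ in range(iters):
        z = phi(z, d, k, lam)
    return z

def derivative_formula(alpha, d, k):
    """Closed-form expression for d(alpha)/d(lambda) at a fixed point alpha."""
    return (1 - alpha) / (1 - d * (k - 1) * alpha ** (k - 2) * (1 - alpha))

# Central finite difference at an arbitrary sample point (assumed values).
d, k, lam, h = 1.0, 3, 0.5, 1e-5
alpha = smallest_fixed_point(d, k, lam)
finite_diff = (smallest_fixed_point(d, k, lam + h)
               - smallest_fixed_point(d, k, lam - h)) / (2 * h)
```

The two values agree up to the discretisation error of the finite difference, which is a quick sanity check on the implicit differentiation carried out in the proof.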

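As a final preparation towards the proof of Proposition 2.2 we investigate the solution $\lambda_{\mathrm{cond}}$ to the differential equation (1.11); notice that Lemma 3.5 shows that this ODE does indeed possess a unique solution on $(d_{\min}, d_{\mathrm{sat}}]$.

Lemma 3.6. For any $0 < d < d_{\mathrm{sat}}$ we have $0 < \lambda_{\mathrm{cond}} < \lambda^*$. Furthermore, for all $0 < \lambda < \lambda_{\mathrm{cond}}$ we have $\Phi_{d,k,\lambda}(\alpha_*) > \Phi_{d,k,\lambda}(\alpha^*)$, while $\Phi_{d,k,\lambda}(\alpha_*) < \Phi_{d,k,\lambda}(\alpha^*)$ for $\lambda^* > \lambda > \lambda_{\mathrm{cond}}$.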

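Proof. For $d < d_{\mathrm{sat}}$ define

\[ \lambda^*_{\mathrm{cond}} = \inf\{\lambda \ge 0 : \Phi_{d,k,\lambda}(\alpha^*) > \Phi_{d,k,\lambda}(\alpha_*)\}. \tag{3.12} \]

For any $d < d_{\mathrm{sat}}$ we have $\Phi_{d,k,0}(0) > \Phi_{d,k,0}(z)$ for all $0 < z \le 1$; this follows from the characterisation of the k-XORSAT threshold from [3, Theorem 1.1]. Hence, $\lambda^*_{\mathrm{cond}} > 0$ for all $d < d_{\mathrm{sat}}$.

Further, the function $\zeta_{\lambda^*}$ has a double zero and a local minimum at $\alpha_* = z_*$. Since the sign of $\zeta_{\lambda^*}(z)$ matches the sign of $\Phi'_{d,k,\lambda^*}(z)$, this means that $\Phi_{d,k,\lambda^*}(\alpha^*) > \Phi_{d,k,\lambda^*}(\alpha_*)$. Hence, there exists $\varepsilon > 0$ such that for $0 < \lambda^* - \varepsilon < \lambda < \lambda^*$ we have $\Phi_{d,k,\lambda}(\alpha^*) > \Phi_{d,k,\lambda}(\alpha_*)$. Therefore,

\[ 0 < \lambda^*_{\mathrm{cond}} < \lambda^*. \tag{3.13} \]

As a next step we show that

\[ \Phi_{d,k,\lambda}(\alpha_*) < \Phi_{d,k,\lambda}(\alpha^*) \quad\text{for } \lambda^* > \lambda > \lambda^*_{\mathrm{cond}}. \tag{3.14} \]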



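To this end, we compute the derivatives of $\Phi_{d,k,\lambda}(\alpha_*)$, $\Phi_{d,k,\lambda}(\alpha^*)$ with respect to $0 < \lambda < \lambda^*$. Since $\alpha_*, \alpha^*$ are stationary points of $\Phi_{d,k,\lambda}$, the chain rule yields

\[ \frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_*) = \frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*} + \frac{\partial\Phi_{d,k,\lambda}}{\partial\alpha}\Big|_{\alpha_*}\frac{\partial\alpha_*}{\partial\lambda} = \frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*} = -\exp(-\lambda - d\alpha_*^{k-1}) = \alpha_* - 1, \tag{3.15} \]

\[ \frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha^*) = \alpha^* - 1. \tag{3.16} \]

Since $\alpha_* < \alpha^*$ for all $\lambda_* < \lambda < \lambda^*$, (3.14) follows from (3.15)–(3.16).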
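Finally, we verify that $\lambda^*_{\mathrm{cond}}$ equals the solution $\lambda_{\mathrm{cond}}$ to the differential equation (1.11). Recalling the definition (3.12), we see that it suffices to check that $\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*) = \Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)$ for all $d_{\min} < d < d_{\mathrm{sat}}$. To this end, we notice that by definition of $d_{\mathrm{sat}}$ we have $\Phi_{d_{\mathrm{sat}},k,0}(0) = \Phi_{d_{\mathrm{sat}},k,0}(\alpha^*)$, in line with the initial condition $\lambda_{\mathrm{cond}}(d_{\mathrm{sat}}) = 0$. Additionally, we claim that $\lambda_{\mathrm{cond}}(d_{\mathrm{sat}})$ satisfies

\[ \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}}{\partial d}\Big|_{\alpha_*} = \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}}{\partial d}\Big|_{\alpha^*}. \]

Indeed, using the chain rule and the fact that $\alpha_*, \alpha^*$ are stationary points, with $\lambda = \lambda(d)$ we obtain

\begin{align*}
\frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*)}{\partial d}
&= \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}}{\partial d}\Big|_{\alpha_*,\lambda_{\mathrm{cond}}} + \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}}{\partial\alpha}\Big|_{\alpha_*,\lambda_{\mathrm{cond}}}\frac{\partial\alpha_*}{\partial d} + \frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*,\lambda}\frac{\partial\lambda_{\mathrm{cond}}}{\partial d} \\
&= \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}}{\partial d}\Big|_{\alpha_*,\lambda_{\mathrm{cond}}} + \frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\Big|_{\alpha_*,\lambda}\frac{\partial\lambda_{\mathrm{cond}}}{\partial d} = \alpha_*^{k-1} + (\alpha_* - 1)\frac{\partial\lambda_{\mathrm{cond}}}{\partial d}.
\end{align*}

Analogously,

\[ \frac{\partial\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)}{\partial d} = (\alpha^*)^{k-1} + (\alpha^* - 1)\frac{\partial\lambda_{\mathrm{cond}}}{\partial d}. \]

Hence, the solution $\lambda_{\mathrm{cond}}$ to (1.11) satisfies $\Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha_*) = \Phi_{d,k,\lambda_{\mathrm{cond}}}(\alpha^*)$, and thus $\lambda_{\mathrm{cond}} = \lambda^*_{\mathrm{cond}}$. Therefore, the assertion follows from (3.15) and (3.14). □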

Proof of Proposition 2.2. The first assertion is an immediate consequence of Lemmas 3.2 and 3.5. Moreover, the
second assertion follows from Lemmas 3.3, 3.4 and 3.5. Finally, the last assertion follows from Lemma 3.6. □

4. WARNING PROPAGATION AND LOCAL WEAK CONVERGENCE

In this section we prove Propositions 2.5 and 2.7. The proofs rely on the concept of local weak convergence. Specif-
ically, we are going to set up a Galton-Watson process that mimics the local topology of the graph G(F DC,t ) up to
any fixed depth ℓ. Subsequently we will analyse WP on the Galton-Watson tree and argue that the result extends
to G(F DC,t ).

4.1. Local weak convergence. The construction of the Galton-Watson process T = T(d ,k, t ) is pretty straightfor-
ward. The process has two types called variable nodes and check nodes. The process starts with a single variable
node v0. Furthermore, each variable node begets a Po(d) number of check nodes as offspring, while the offspring
of a check node is a Bin(k −1,1− t/n) number of variable nodes.
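The two-type branching process just described can be simulated directly. The following is a minimal sketch that uses only the standard library: the Poisson sampler is Knuth's multiplication method, the binomial offspring count is drawn as a sum of Bernoulli trials, and $t/n$ is replaced by a parameter `theta`. The function names and the nested-list representation of the tree are assumptions made for illustration.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method; adequate for moderate means."""
    limit, prod, count = math.exp(-mean), rng.random(), 0
    while prod > limit:
        prod *= rng.random()
        count += 1
    return count

def sample_tree(rng, d, k, theta, depth):
    """Sample the tree truncated at distance 2 * depth from the root.

    A variable node is represented by the list of its check children, and a
    check node by the list of its variable children.
    """
    if depth == 0:
        return []
    checks = []
    for _ in range(poisson(rng, d)):          # Po(d) check nodes per variable
        n_vars = sum(rng.random() < 1 - theta for _ in range(k - 1))
        checks.append([sample_tree(rng, d, k, theta, depth - 1)
                       for _ in range(n_vars)])  # Bin(k-1, 1-theta) variables
    return checks
```

Sampling many depth-one trees and averaging the root degree gives a value close to $d$, as expected for Poisson offspring.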

Let $T$ be the Galton-Watson tree rooted at $v_0$ that this process generates; $T$ may be infinite. Hence, for an integer $\ell$ obtain $T^{(\ell)}$ from $T$ by deleting all variable/check nodes at distance greater than $2\ell$ from $v_0$. Thus, $T^{(\ell)}$ is a finite random tree rooted at $v_0$. For any graphs $T, T'$ rooted at $v, v'$, respectively, we write $T \cong T'$ if there is a graph isomorphism $\iota : T \to T'$ such that $\iota(v) = v'$. Furthermore, for a vertex $v$ of $G(F_{\mathrm{DC},t})$ and an integer $\ell$ we let $\partial^{\le\ell}_{F_{\mathrm{DC},t}} v$ be the subgraph obtained from $G(F_{\mathrm{DC},t})$ by deleting all vertices at distance greater than $2\ell$ from $v$, rooted at $v$. Finally, for a rooted graph $g$ and an integer $\ell$ we let $N^{(\ell)}_t(g)$ be the number of vertices $v$ of $G(F_{\mathrm{DC},t})$ such that $\partial^{\le\ell}_{F_{\mathrm{DC},t}} v \cong g$.

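Lemma 4.1. For any rooted tree $g$ we have

\[ \mathbb{E}\left| N^{(\ell)}_t(g) - (n-t)\,\mathbb{P}\left[T^{(\ell)} \cong g\right] \right| = o(n). \tag{4.1} \]

Proof. The proof is based on a routine second moment argument; that is, we claim that

\[ \mathbb{E}\left[N^{(\ell)}_t(g)\right] = (n-t)\,\mathbb{P}\left[T^{(\ell)} \cong g\right] + o(n), \qquad \mathbb{E}\left[N^{(\ell)}_t(g)^2\right] = (n-t)^2\,\mathbb{P}\left[T^{(\ell)} \cong g\right]^2 + o(n^2). \tag{4.2} \]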



Combining (4.2) with the Markov and Chebyshev inequalities then yields the assertion.

We prove (4.2) and thereby (4.1) by induction on $\ell$. Recall that $F_{\mathrm{DC},t}$ is a XORSAT instance with variables $x_{t+1}, \dots, x_n$. Let us begin with the estimate of the first moment. Due to the linearity of expectation, it suffices to show that

\[ \mathbb{P}\left[\partial^{\le\ell}(F_{\mathrm{DC},t}, x_{t+1}) \cong g\right] = \mathbb{P}\left[T^{(\ell)} \cong g\right] + o(1). \tag{4.3} \]

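For $\ell = 0$ there is nothing to show. Hence, suppose that (4.1) is true with $\ell$ replaced by $\ell - 1$. Furthermore, let $\Delta$ be the degree of the root $r$ of $g$ and let $1 \le \kappa_1 \le \dots \le \kappa_\Delta \le k$ be the degrees of the children of the root; thus, we order the children of $r$ so that their degrees are increasing. For an integer $1 \le i \le k$ let $K_i$ be the number of $j \in [\Delta]$ such that $\kappa_j = i$. Further, let $(g_{i,j})_{1\le i\le\Delta,\,1\le j\le\kappa_i}$ be the trees pending on the grandchildren of the root. In addition, let $\boldsymbol{\Delta}$ be the degree of $x_{t+1}$ in $G(F_{\mathrm{DC},t})$ and let $1 \le \boldsymbol{\kappa}_1 \le \dots \le \boldsymbol{\kappa}_{\boldsymbol{\Delta}} \le k$ be the degrees of the neighbours of $x_{t+1}$. Then $\partial^{\le\ell}(F_{\mathrm{DC},t}, x_{t+1}) \cong g$ is possible only if $\boldsymbol{\Delta} = \Delta$ and $\boldsymbol{\kappa}_i = \kappa_i$ for all $1 \le i \le \Delta$. Since the clauses of the random formula $F$ are drawn uniformly and independently and $G(F_{\mathrm{DC},t})$ is obtained from $G(F)$ by deleting the variable nodes $x_1, \dots, x_t$ along with any ensuing isolated check nodes, we conclude that the event $\mathcal{D} = \{\boldsymbol{\Delta} = \Delta,\ \bigwedge_{1\le i\le\Delta} \boldsymbol{\kappa}_i = \kappa_i\}$ has probability

\[ \mathbb{P}[\mathcal{D}] = \mathbb{P}[\mathrm{Po}(d) = \Delta] \binom{\Delta}{K_1, \dots, K_k} \prod_{i=1}^{k} \mathbb{P}\left[\mathrm{Bin}(k-1, 1-t/n) = i\right]^{K_i}. \tag{4.4} \]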

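Further, let $\mathcal{G} = \{g_{i,j} : 1 \le i \le \Delta,\ 1 \le j \le \kappa_i\}$ and let $\mathcal{E}$ be the event that $N^{(\ell-1)}_t(\gamma) = (n-t)\,\mathbb{P}\left[T^{(\ell-1)} \cong \gamma\right] + o(n)$ for all $\gamma \in \mathcal{G}$. Then by induction we have

\[ \mathbb{P}[\mathcal{E} \mid \mathcal{D}] = 1 - o(1). \tag{4.5} \]

Now, obtain $G^-(F_{\mathrm{DC},t})$ from $G(F_{\mathrm{DC},t})$ by deleting $x_{t+1}$ along with its adjacent check nodes. Let $N^{(\ell),-}_t(g_{i,j})$ be the number of vertices $v$ of $G^-(F_{\mathrm{DC},t})$ such that $\partial^{\le\ell}(G^-(F_{\mathrm{DC},t}), v) \cong g_{i,j}$. Moreover, let $\mathcal{E}^-$ be the event that $N^{(\ell-1),-}_t(g_{i,j}) = (n-t)\,\mathbb{P}\left[T^{(\ell-1)} \cong g_{i,j}\right] + o(n)$ for all $i, j$. Since $x_{t+1}$ has degree $\Delta = O(1)$ given $\mathcal{D}$ and all adjacent check nodes have degree at most $k$, (4.5) implies that

\[ \mathbb{P}[\mathcal{E}^- \mid \mathcal{D}] = 1 - o(1). \tag{4.6} \]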

Finally, since $F_{\mathrm{DC},t}$ is uniformly random, given $\mathcal{D}$ the checks $a$ of $F_{\mathrm{DC},t}$ adjacent to $x_{t+1}$ simply choose their other neighbours uniformly at random from the variable nodes $x_{t+2}, \dots, x_n$ of $G^-(F_{\mathrm{DC},t})$. Therefore, (4.4) implies that

\[ \mathbb{P}\left[\partial^{\le\ell}(F_{\mathrm{DC},t}, x_{t+1}) \cong g\right] = \mathbb{P}\left[T^{(\ell)} \cong g\right] + o(1), \]

thereby proving (4.3) and thus the first part of (4.2).

The proof of the second part of (4.2) (the estimate of the second moment) proceeds along similar lines, except that we need to explore the depth-$2\ell$ neighbourhoods of two variable nodes of $F_{\mathrm{DC},t}$ simultaneously. Specifically, the proof of the second moment bound comes down to showing that

\[ \mathbb{P}\left[\partial^{\le\ell}(F_{\mathrm{DC},t}, x_{t+1}) \cong g,\ \partial^{\le\ell}(F_{\mathrm{DC},t}, x_{t+2}) \cong g\right] = \mathbb{P}\left[T^{(\ell)} \cong g\right]^2 + o(1). \tag{4.7} \]

Exploiting that the variable nodes $x_{t+1}, x_{t+2}$ are at distance greater than $4\ell$ w.h.p., we conduct a similar induction as above to verify (4.7) and thus (4.2). □

4.2. Proof of Proposition 2.5. To prove Proposition 2.5 we estimate the sizes $|V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})|$, $|V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})|$ separately. Recall that $\theta \sim t/n$.

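Lemma 4.2. Let $\varepsilon > 0$ and assume that one of the following conditions is satisfied:

(i) $d < d_{\min}$, or
(ii) $d > d_{\min}$ and $|\theta^* - \theta| > \varepsilon$.

Then there exists $\ell_0 = \ell_0(d,\varepsilon) > 0$ such that for any fixed $\ell \ge \ell_0$ with $\lambda = -\log(1-\theta)$ w.h.p. we have

\[ \left| t + |V_{\mathtt{n},\ell}(F_{\mathrm{DC},t})| - \alpha_* n \right| < \varepsilon n. \]

Proof. In light of Lemma 4.1 it suffices to investigate WP on the random tree $T^{(\ell)}$ for large enough $\ell$. Specifically, let $p^{(\ell)}$ be the probability that WP marks the root of $T^{(\ell)}$ as n. In formulas, recalling (2.15), this means that

\[ p^{(\ell)} = \mathbb{P}\left[\omega_{T^{(\ell)},r,\ell} = \mathtt{n}\right] \text{ for } \ell \ge 1, \text{ and } p^{(0)} = 0. \tag{4.8} \]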



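Let $\Delta$ be the degree of the root $r$ of $T^{(\ell)}$ and let $\kappa_1, \dots, \kappa_\Delta$ be the degrees of the children of $r$. Since the sub-trees of $T^{(\ell)}$ pending on the grandchildren of $r$ are independent copies of $T^{(\ell-1)}$, the WP update rules (2.13)–(2.14) yield the recurrence

\[ p^{(\ell)} = 1 - \mathbb{E}\left[ \prod_{i=1}^{\Delta} \left( 1 - \prod_{j=0}^{\kappa_i - 1} p^{(\ell-1)} \right) \right] \qquad (\ell > 0). \tag{4.9} \]

By the construction of $T$ the degree $\Delta$ of $r$ has distribution $\mathrm{Po}(d)$. Furthermore, each child of $r$ has $\mathrm{Bin}(k-1, 1-t/n)$ children; thus, $\kappa_i - 1 \overset{\mathrm{dist}}{=} \mathrm{Bin}(k-1, 1-t/n)$. Consequently, (4.9) yields

\begin{align*}
p^{(\ell)} &= 1 - \exp(-d) \sum_{\Delta=0}^{\infty} \frac{d^\Delta}{\Delta!} \left( 1 - \sum_{\kappa=0}^{k-1} \binom{k-1}{\kappa} \exp(-\lambda\kappa)(1-\exp(-\lambda))^{k-1-\kappa} \left(p^{(\ell-1)}\right)^{\kappa} \right)^{\Delta} \\
&= 1 - \exp\left( -d \left( 1 - \exp(-\lambda)(1 - p^{(\ell-1)}) \right)^{k-1} \right). \tag{4.10}
\end{align*}

Letting $z^{(\ell)} = 1 - \exp(-\lambda)(1 - p^{(\ell)})$ and recalling the definition (1.2) of $\varphi_{d,k,\lambda}$, we see that (4.10) amounts to

\[ z^{(\ell)} = \varphi_{d,k,\lambda}(z^{(\ell-1)}). \tag{4.11} \]

Moreover, Lemma 3.1 (iii)–(iv), Lemma 3.2 and Lemma 3.4 show that if (i) or (ii) above hold, then $\varphi_{d,k,\lambda}$ is a contraction on $[0, \alpha_*]$. Therefore, (4.11) shows that $\lim_{\ell\to\infty} p^{(\ell)} = \frac{\alpha_* - \theta}{1 - \theta}$. Thus, the assertion follows from Lemma 4.1. □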

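Lemma 4.3. Let $\varepsilon > 0$ and assume that $d > 0$, $t = t(n)$ are such that one of the following conditions is satisfied:

(i) $d < d_{\min}$, or
(ii) $d > d_{\min}$, $|\theta_* - \theta| > \varepsilon$ and $|\theta^* - \theta| > \varepsilon$.

Then there exists $\ell_0 = \ell_0(d,\varepsilon) > 0$ such that for any fixed $\ell \ge \ell_0$ with $\lambda = -\log(1-\theta)$ w.h.p. we have

\[ \left| |V_{\mathtt{f},\ell}(F_{\mathrm{DC},t})| - (\alpha^* - \alpha_*) n \right| < \varepsilon n. \]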

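Proof. Once again it suffices to trace WP on $T^{(\ell)}$ for large $\ell$. As in the proof of Lemma 4.2, let

\[ p^{(\ell)} = \mathbb{P}\left[\omega_{T^{(\ell)},r,\ell} \ne \mathtt{u}\right] \text{ for } \ell \ge 1, \text{ and } p^{(0)} = 1. \tag{4.12} \]

Then with $\Delta$ the degree of $r$ and $\kappa_1, \dots, \kappa_\Delta$ the degrees of the children of $r$, the WP update rules (2.13)–(2.14) translate into

\[ p^{(\ell)} = 1 - \mathbb{E}\left[ \prod_{i=1}^{\Delta} \left( 1 - \prod_{j=0}^{\kappa_i - 1} p^{(\ell-1)} \right) \right] \qquad (\ell > 0). \tag{4.13} \]

Thus, the recurrence is identical to (4.9), but this time with the initial condition $p^{(0)} = 1$. Hence, letting $z^{(\ell)} = 1 - \exp(-\lambda)(1 - p^{(\ell)})$ and $z^{(0)} = 1$ and retracing the steps towards (4.11), we obtain

\[ z^{(\ell)} = 1 - \exp(-\lambda)(1 - p^{(\ell)}). \tag{4.14} \]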

Invoking Lemmas 3.1, 3.2 and 3.4, we conclude that (i) or (ii) ensure that φd ,k,λ contracts on [0,α∗]. Consequently,
(4.14) implies that lim_{ℓ→∞} p(ℓ) = (α^* − θ)/(1 − θ). Thus, the assertion follows from Lemmas 4.1 and 4.2. □

Finally, we compare the set Vn,ℓ(F DC,t ) obtained after a (large but) bounded number of iterations with the ulti-
mate sets Vn(F DC,t ) obtained upon convergence of WP. The proof of the following lemma is an adaptation of the
argument from [23] for cores of random hypergraphs.

Lemma 4.4. Assume that θ ∈ (0,1) \ {θ_*, θ^*}. Then for any ε > 0 there exists ℓ0 = ℓ0(d, ε, θ) such that for all ℓ > ℓ0 we
have |Vn,ℓ(F DC,t) △ Vn(F DC,t)| < εn w.h.p.

Proof. In place of the WP message passing process from Section 2.3 we consider the following simpler peeling
process, which reproduces the same set Vn(F DC,t ). Let G0 = G(F ) be the bipartite graph induced by F DC,t . For
h ≥ 0 obtain Gh+1 from Gh by performing the following peeling operation.

Remove all check nodes of degree one along with their variable node neighbours. (4.15)



Clearly, this process will reach a fixed point (i.e., Gh+1 =Gh) after at most m iterations. Moreover, a straightforward
induction on ℓ shows that V (G0) \ V (Gℓ) = Vn,ℓ(F DC,t ) and thus V (G0) \ V (Gm ) = Vn(F DC,t ). Hence, it suffices to
prove that for large enough ℓ= ℓ(d ,ε,θ) we have

|V (Gℓ)△V (Gm )| < εn w.h.p. (4.16)

Towards the proof of (4.16) let d h = (d h(u))u∈V (Gh )∪C (Gh ) be the degree sequence of Gh . By the principle of
deferred decisions Gh is uniformly random given d h . Further, let

\[
\Delta_h(j) = \bigl|\{x \in V(G_h) : d_h(x) = j\}\bigr|, \qquad \Delta'_h(j) = \bigl|\{a \in C(G_h) : d_h(a) = j\}\bigr|
\]

be the number of variable/check nodes of degree j ≥ 0. Pick δ = δ(d, ε, θ), δ′ = δ′(d, δ, θ), δ′′ = δ′′(d, δ′, θ) small enough
and ℓ ≥ ℓ0(d, δ′′, θ) large enough. Then Lemma 4.2 implies that w.h.p.

|V (Gℓ) \V (Gℓ+1)| < δ′′n. (4.17)

Furthermore, we claim that

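\[
\sum_{j\ge 0}\left|\frac{\Delta_\ell(j)}{|V(G_\ell)|} - \mathbb{P}\left[\mathrm{Po}\bigl(d(1-\alpha_*^{k-1})\bigr) = j\right]\right| < \delta',
\qquad
\sum_{j\ge 2}\left|\frac{\Delta'_\ell(j)}{|C(G_\ell)|} - \frac{\mathbb{P}\left[\mathrm{Bin}(k,1-\alpha_*) = j\right]}{\mathbb{P}\left[\mathrm{Bin}(k,1-\alpha_*) \ge 2\right]}\right| < \delta'. \tag{4.18}
\]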

Indeed, Lemma 4.1 shows that we just need to study WP on the random tree T(ℓ), as in the proof of Lemma 4.2.
Thus, let ∆ be the degree of the root variable and let κ1, . . . ,κ∆ be the degrees of the children of the root. Since
the sub-trees pending on the children of the root are independent copies of T(ℓ−1), Lemma 4.2 shows that the
probability that any one of the ∆ children sends an n-message to r falls into the interval (1 − α_*^{k−1} − δ′′, 1 − α_*^{k−1} + δ′′),
provided that ℓ is large enough. Since ∆ has distribution Po(d), the first part of (4.18) follows from Poisson thinning.

Similarly, to obtain the second part of (4.18) consider a clause a that is a child of the root r of T(ℓ). Then by
the same token as in the previous paragraph the number of children of a that do not send a n-message after ℓ
iterations of WP lies in the interval (1 − α_*^{k−1} − δ′′, 1 − α_*^{k−1} + δ′′). Furthermore, the number of children a′ ≠ a of
r has distribution Po(d). Hence, the probability that the WP-message from r to a equals n comes to α_* ± δ′′, and
this event is independent of the messages that the children of a send to a. Finally, the probability that one of the
messages that a receives after ℓ iterations of WP differs from the message received after ℓ−1 iterations is smaller
than δ′′ for large enough ℓ. Since the peeling process removes any checks a with at least k−1 incoming n-messages,
we obtain (4.18).

To complete the proof we are going to deduce from (4.17)–(4.18) that the peeling process (4.15) will remove
no more than εn/2 further nodes from Gℓ before it stops. Following [23], we consider a slowed-down version of
the process where no longer all checks of degree one get removed simultaneously, but rather one-at-a-time. Let
(Gℓ[ν])ν≥0 be the sequences of graphs produced by this modified process, with Gℓ[0] =Gℓ and Gℓ[ν+1] =Gℓ[ν] if
all checks of Gℓ[ν] have degree at least two. Further, let Uℓ[ν] be the number of unary checks of Gℓ[ν]. Let D be
the event that the bounds (4.17)–(4.18) hold. Then it suffices to prove that on the event {Uℓ[ν] > 0}∩D we have

E [Uℓ[ν+1]−Uℓ[ν] |Uℓ[ν]] < 0 for all 0 ≤ ν≤ εn/2. (4.19)

Invoking the principle of deferred decisions, in order to verify (4.19) we compute the expected number of new
degree one checks produced by the removal of a single random variable node x . Due to (4.18), for ν ≤ εn/2 the
expected number of neighbours a of x of degree precisely two is bounded by

\[
d\,\mathbb{P}\left[\mathrm{Bin}(k-1,1-\alpha_*) = 1\right] + \delta = d(k-1)(1-\alpha_*)\alpha_*^{k-2} + \delta = \varphi'_{d,k,\lambda}(\alpha_*) + \delta < 1,
\]

provided that δ> 0 is chosen sufficiently small. Hence, we obtain (4.19). □
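
For illustration, the parallel peeling operation (4.15) can be sketched as follows. This is a toy implementation under assumptions of ours: the factor graph is given as a list of sets of variable indices (one set per check), and the names `peel`, `removed_rounds` and `live` are hypothetical.

```python
def peel(checks):
    """Repeatedly apply (4.15): delete every degree-one check together with its variable.

    checks: list of sets of variable indices, one set per check node.
    Returns (removed_rounds, live), where removed_rounds[h] is the set of variables
    deleted in round h and live is the list of surviving checks at the fixed point.
    """
    live = [set(c) for c in checks]
    removed_rounds = []
    while True:
        # Variables adjacent to a check of degree exactly one.
        unit_vars = {next(iter(c)) for c in live if len(c) == 1}
        if not unit_vars:
            break
        removed_rounds.append(unit_vars)
        # Drop the unary checks and delete the removed variables everywhere else.
        live = [c - unit_vars for c in live if len(c) != 1]
    return removed_rounds, live
```

In this encoding a decimated variable shows up as a check of degree one, so the first round removes exactly the variables that were fixed directly.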

Proof of Proposition 2.5. The proposition is an immediate consequence of Lemmas 4.2–4.4. □

4.3. Proof of Proposition 2.7. We deal with the two claims separately. Towards the first claim we establish the
following stronger, deterministic statement.

Lemma 4.5. For any XORSAT instance F with variables Vn = {x1, . . . , xn} and any integer ℓ ≥ 0 we have Vn,ℓ(F ) ⊆
V0(F ).



Proof. We proceed by induction on ℓ. For ℓ ≤ 1 there is nothing to show because Vn,ℓ(F) = ∅ by construction.
Hence, assume that ℓ > 1 and that Vn,ℓ−1(F) ⊆ V0(F). If x ∈ Vn,ℓ−1, then (2.15) shows that there exists a check
node b ∈ ∂x such that ωF,b→x,ℓ = n. Furthermore, (2.13) shows that if ωF,b→x,ℓ = n, then for all y ∈ ∂b \ {x} we have
ωF,y→b,ℓ−1 = n. Additionally, (2.14) shows that if ωF,y→b,ℓ−1 = n, then there exists a ∈ ∂y \ {b} such that ωF,a→y,ℓ−2 = n.
Hence, (2.15) ensures that ωF,y,ℓ−2 = n and thus

y ∈V0(F ) for all y ∈ ∂b \ {x} (4.20)

by induction. Now suppose that ∂b = {x j1 , . . . , x jh } with pairwise distinct indices 1 ≤ j1, . . . , jh ≤ n such that x = x j1 .
Consider σ ∈ ker A(F ). Then (4.20) implies that σ j2 = ·· · =σ jh = 0. Consequently, σ j1 = 0 and thus x ∈V0(F ). □

The following lemma deals with the variables that WP marks u.

Lemma 4.6. For any fixed ℓ≥ 0 we have |Vu,ℓ(F DC,t )∩V0(F DC,t )| = o(n) w.h.p.

Proof. We are going to show by induction on ℓ that E|Vu,ℓ(F DC,t )∩V0(F DC,t )| = o(n). To this end, because the
distribution of F DC,t is invariant under permutations of the variables xt+1, . . . , xn , it suffices to show that

\[
\mathbb{P}\left[x_n \in V_{\mathtt{u},\ell}(F_{DC,t}) \cap V_0(F_{DC,t})\right] = o(1). \tag{4.21}
\]

Indeed, let A be the event that the depth-2ℓ neighbourhood ∂≤ℓxn of xn in F DC,t is acyclic. Since Lemma 4.1
shows that P [A ] = 1−o(1), towards (4.21) it suffices to prove that on the event A we have

xn ̸∈Vu,ℓ(F DC,t )∩V0(F DC,t ). (4.22)

But (4.22) follows from the well known fact that BP is exact on acyclic factor graphs (see Fact 2.11). □

Proof of Proposition 2.7. The proposition is an immediate consequence of Lemmas 4.5 and 4.6. □

5. ANALYSIS OF THE CHECK MATRIX

In this section we prove Propositions 2.6 and 2.8. Proposition 2.6 is an easy consequence of [8, Theorem 1.1].
Furthermore, Proposition 2.8 follows from Proposition 2.6 by interpolating on the parameter λ; a related argument
was recently used in [9] to show that certain random combinatorial matrices have full rank w.h.p. In addition, we
prove Corollaries 2.9 and 2.10 and subsequently complete the proofs of Theorems 1.2–1.3.

5.1. Proof of Proposition 2.6. We use a general result [8, Theorem 1.1] about the rank of sparse random matrices
from a fairly universal class of distributions. The definition of this general random matrix goes as follows. Let
d, k ≥ 0 be integer-valued random variables such that 0 < E[d^3] + E[k^3] < ∞. Moreover, let (d_i, k_i)_{i≥0} be families of
mutually independent random variables such that d_i has the same distribution as d and k_i has the same distribution as k.
Let d̄ = E[d] and k̄ = E[k] and for an integer n > 0 let m = m_n have distribution Po(d̄n/k̄). The sequence (m_n)_n is
independent of (d_i, k_i)_{i≥0}. Further, let S_n be the event that

\[
\sum_{i=1}^{n} \boldsymbol{d}_i = \sum_{i=1}^{\boldsymbol{m}_n} \boldsymbol{k}_i. \tag{5.1}
\]

It is a known fact that P [Sn] = Ω(n−1/2) [8, Proposition 1.10]. Given that Sn occurs, create a simple random bi-
partite graph Gn with a set Vn = {x1, . . . ,xn} of variable nodes and a set Cn = {c1, . . . ,cmn } of check nodes uniformly
at random subject to the condition that x j has degree d j and ci has degree ki for all 1 ≤ j ≤ n and 1 ≤ i ≤mn . Fi-
nally, let An be the biadjacency matrix of Gn . Thus, An has size mn ×n and its (i , j )-entry equals 1 iff x j and ci are
adjacent in Gn .

Theorem 5.1 (special case of [8, Theorem 1.1]). Let D(z) = \sum_{h=0}^{∞} P[d = h] z^h and K(z) = \sum_{h=0}^{∞} P[k = h] z^h be the
probability generating functions of d, k, respectively. Furthermore, let

\[
F : [0,1] \to \mathbb{R}, \qquad z \mapsto D\bigl(1 - K'(z)/K'(1)\bigr) - \frac{D'(1)}{K'(1)}\bigl(1 - K(z) - (1-z)K'(z)\bigr). \tag{5.2}
\]

Then

\[
\lim_{n\to\infty} \frac{1}{n}\,\mathrm{nul}\,A_n = \max_{z\in[0,1]} F(z) \qquad \text{in probability.}
\]

We now derive Proposition 2.6 from Theorem 5.1 by identifying suitable distributions d,k such that An resem-
bles At .



Proof of Proposition 2.6. Recall that 0 ≤ t = t(n) ≤ n satisfies t = θn + o(n) for a fixed 0 ≤ θ ≤ 1. We continue to set
λ = − log(1−θ). We are going to construct several random matrices that can be coupled such that their nullities
differ by no more than o(n) w.h.p. The first of these random matrices is the matrix At from Proposition 2.6, and the
last is the matrix An from Theorem 5.1, with suitably chosen d,k.

For a start, consider the check matrix A′ = A0 of the original, ‘undecimated’ k-XORSAT formula F = F DC,0.
Obtain A′_t from A′ by adding t new rows to A′. Each of these rows contains precisely a single non-zero entry. The
positions of the non-zero entries are chosen uniformly without replacement. Thus, the extra t rows have the effect
of fixing t uniformly random coordinates to zero. Since the distribution of the random matrix A′ is invariant under
column permutations, we conclude that

\[
\mathrm{nul}\,A_t \overset{\mathrm{dist}}{=} \mathrm{nul}\,A'_t. \tag{5.3}
\]

Further, let A[λ] be the matrix obtained from A′ by adding a random number l = Po(λn) of rows. Each of these
rows contains a single non-zero entry, which is placed in a uniformly random position. The extra rows are chosen
mutually independently (thus, ‘with replacement’) and independently of A′. By Poisson thinning, for any column
index j ∈ [n] the probability that one of the new l rows has a non-zero entry in the j-th column equals 1 − exp(−λ) =
θ. Since t ∼ θn, the total number of such indices j has distribution Bin(n, θ). Since
P[|Bin(n, θ) − t| ≤ √n log n] ≥ 1 − 1/n by the Chernoff bound, we can couple A′_t and A[λ] such that

\[
\mathrm{nul}\,A'_t = \mathrm{nul}\,A[\lambda] + o(n) \qquad \text{w.h.p.} \tag{5.4}
\]

Finally, let A′[λ] be the matrix obtained as follows. Let d,k have probability generating functions

\[
D(z) = \exp\bigl((\lambda+d)(z-1)\bigr), \qquad K(z) = \frac{d z^{k} + k\lambda z}{d + k\lambda}. \tag{5.5}
\]

In other words, d has distribution Po(d +λ) while k equals one with probability kλ/(d + kλ) and equals k with
probability d/(d +kλ). The definition (5.5) readily yields

\[
\bar{d} = D'(1) = \lambda + d, \qquad \bar{k} = K'(1) = \frac{k(d+\lambda)}{d + k\lambda}. \tag{5.6}
\]

Hence, the number m = m_n of rows of A = A_n, which has distribution Po(nd̄/k̄), can be written as a sum of independent
random variables m = m′ + m′′ with distributions

m′ = Po(dn/k), m′′ = Po(λn). (5.7)

The first summand m′ prescribes the number of rows of A with k non-zero entries, while m′′ details the number of
rows with a single non-zero entry. Consequently, (5.7) shows that the numbers of rows with k or with just a single
non-zero entry have the same distributions in both A and A[λ].

We are left to argue that in A the positions of the non-zero entries in the different rows are nearly independent
and uniform. To see this, let (h_{i,j})_{1≤i≤m, 1≤j≤k} be a family of mutually independent and uniform random variables
with values in [n] = {1, . . . , n}. Moreover, let X be the number of indices 1 ≤ i ≤ m′ such that there exist 1 ≤ j1 < j2 ≤ k
such that h_{i,j1} = h_{i,j2}; in other words, h_{i,1}, . . . , h_{i,k} fail to be pairwise distinct. A routine calculation shows that

E[X ] =O(1). (5.8)

Now, let us think of (hi , j )1≤i≤m′,1≤ j≤k and (hi ,1)m′<i≤m as the ‘bins’ where km′+m′′ randomly tossed ‘balls’ land.
Then the standard Poissonisation of the balls-into-bins experiment shows that given the event (5.1) the loads of
the bins are distributed precisely as the vector (d1, . . . ,dn). Therefore, (5.8) shows that A[λ],A can be coupled such
that

nul A[λ] = nulA+o(n) w.h.p. (5.9)

Combining (5.3), (5.4) and (5.9), we see that At and A can be coupled such that

nul At = nulA+o(n) w.h.p. (5.10)

Hence, Theorem 5.1 implies that

\[
\lim_{n\to\infty} \frac{1}{n}\,\mathrm{nul}\,A_t = \max_{z\in[0,1]} F(z) \qquad \text{in probability.} \tag{5.11}
\]



Further, recalling the definitions (5.2), (5.5) of F,D,K and performing a bit of calculus, we verify that F(z) coincides
with the function Φd ,k,λ(z) from (1.3). Finally, the assertion follows from (5.11) and the fact that Φd ,k,λ(αmax) =
maxz∈[0,1]Φd ,k,λ(z). □
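
The "bit of calculus" can be double-checked numerically. The sketch below evaluates F from (5.2) with the generating functions (5.5) and compares it with the closed form exp(−λ − d z^{k−1}) + d z^{k−1} − d(k−1)z^k/k − d/k that the substitution produces. That this closed form is the function Φd,k,λ from (1.3) is an assumption here, since (1.3) is not reproduced in this section; the function names are illustrative.

```python
import math


def F_from_pgf(z, d, k, lam):
    """Evaluate F(z) from (5.2) using D and K as given in (5.5)."""
    def D(x):
        return math.exp((lam + d) * (x - 1.0))

    def K(x):
        return (d * x ** k + k * lam * x) / (d + k * lam)

    def K_prime(x):
        return (d * k * x ** (k - 1) + k * lam) / (d + k * lam)

    D_prime_at_1 = lam + d  # D'(1), see (5.6)
    K_prime_at_1 = K_prime(1.0)  # K'(1), see (5.6)
    return (D(1.0 - K_prime(z) / K_prime_at_1)
            - D_prime_at_1 / K_prime_at_1 * (1.0 - K(z) - (1.0 - z) * K_prime(z)))


def Phi_closed_form(z, d, k, lam):
    """Closed form obtained by simplifying F; assumed to coincide with (1.3)."""
    return (math.exp(-lam - d * z ** (k - 1))
            + d * z ** (k - 1)
            - d * (k - 1) / k * z ** k
            - d / k)
```

Comparing the two functions on a grid of z values for a few parameter triples (d, k, λ) gives a quick sanity check of the simplification.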
5.2. Proof of Proposition 2.8. We continue to work with the random matrix A[λ] from the above proof of Propo-
sition 2.6. As we recall, this matrix is obtained by adding l = Po(λn) stochastically independent new rows to the
matrix A(F ) that each contain a single non-zero entry in a uniformly random position. Combining (5.3)–(5.4), we
see that

|E[nul A[λ]] − E[nul At]| = o(n) for λ = −log(1−θ). (5.12)

Towards the proof of Proposition 2.8 we observe that nul A[λ],nul At concentrate about their expectations.

Lemma 5.2. We have

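\[
\mathbb{P}\left[\bigl|\mathrm{nul}\,A_t - \mathbb{E}[\mathrm{nul}\,A_t]\bigr| > \sqrt{n}\log n\right] = o(n^{-10}), \qquad
\mathbb{P}\left[\bigl|\mathrm{nul}\,A[\lambda] - \mathbb{E}[\mathrm{nul}\,A[\lambda]]\bigr| > \sqrt{n}\log n\right] = o(n^{-10}). \tag{5.13}
\]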

Proof. We combine the Azuma–Hoeffding inequality with the simple observation that the nullity satisfies a Lipschitz
condition. Specifically, adding a single row to a matrix or removing one from it changes the nullity by at most one. We
apply this observation to the matrix A′_t from the proof of Proposition 2.6, which consists of m + t independent
random rows. Indeed, Azuma–Hoeffding implies together with the Lipschitz property that

\[
\mathbb{P}\left[\bigl|\mathrm{nul}\,A'_t - \mathbb{E}[\mathrm{nul}\,A'_t \mid m]\bigr| > u \,\Big|\, m\right] \le 2\exp\left(-\frac{u^2}{2(m+t)}\right) \qquad \text{for any } u > 0. \tag{5.14}
\]

Furthermore, Bennett’s concentration inequality for Poisson variables shows that

\[
\mathbb{P}\left[|m - dn/k| > \sqrt{n}\log^{2/3} n\right] = o(n^{-10}). \tag{5.15}
\]

Combining (5.14)–(5.15) with the Lipschitz property and setting u = √n log^{2/3} n, we obtain the first part of (5.13).

Similar reasoning applies to the second matrix A[λ]; for given l and m the Lipschitz property yields

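\[
\mathbb{P}\left[\bigl|\mathrm{nul}\,A[\lambda] - \mathbb{E}[\mathrm{nul}\,A[\lambda] \mid l, m]\bigr| > u \,\Big|\, l, m\right] \le 2\exp\left(-\frac{u^2}{2(l+m)}\right) \qquad \text{for any } u > 0. \tag{5.16}
\]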

Moreover, in analogy to (5.15) we have

\[
\mathbb{P}\left[|l - \lambda n| > \sqrt{n}\log^{2/3} n\right] = o(n^{-10}). \tag{5.17}
\]

Thus, (5.15)–(5.17) and Azuma-Hoeffding imply the second part of (5.13). □
We are going to estimate |V0(F DC,t)| by way of estimating changes of nul A[λ] as λ varies. Since nul A[λ]/n
converges to Φd,k,λ(αmax) by Proposition 2.6, we thus need to estimate the derivative ∂Φd,k,λ(αmax)/∂λ.

Lemma 5.3. Let d > 0 and assume that

(i) d < dmin, or
(ii) d > dmin and λ ∈ (0,∞) \ {λcond}.

Then
\[
\frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_{\max}) = \alpha_{\max} - 1. \tag{5.18}
\]

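Proof. The seeming difficulty is that αmax = αmax(λ) varies with λ. Yet Proposition 2.2 (iii) ensures that the function
λ ↦ αmax is continuously differentiable for λ ≠ λcond. Moreover, Fact 2.1 shows that αmax is a local maximum of
Φd,k,λ. Hence, applying the chain rule we obtain

\[
\frac{\partial}{\partial\lambda}\Phi_{d,k,\lambda}(\alpha_{\max})
= \left.\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\right|_{\lambda,\alpha_{\max}}
+ \left.\frac{\partial\Phi_{d,k,\lambda}}{\partial\alpha}\right|_{\lambda,\alpha_{\max}}\frac{\partial\alpha_{\max}}{\partial\lambda}
= \left.\frac{\partial\Phi_{d,k,\lambda}}{\partial\lambda}\right|_{\lambda,\alpha_{\max}}
= -\exp\left(-\lambda - d\,\alpha_{\max}^{k-1}\right). \tag{5.19}
\]

In fact, since Fact 2.1 shows that αmax is a fixed point of φd,k,λ, the r.h.s. of (5.19) simplifies to (5.18). □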
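
The identity (5.18) lends itself to a finite-difference sanity check. The sketch below rests on two assumptions that are not taken from this section: φd,k,λ(z) = 1 − exp(−λ − d z^{k−1}), and Φd,k,λ(z) = exp(−λ − d z^{k−1}) + d z^{k−1} − d(k−1)z^k/k − d/k (the closed form that (5.2) and (5.5) produce). It is only meant for parameters where φ has a single fixed point, so that plain iteration finds αmax; all names are illustrative.

```python
import math


def phi(z, d, k, lam):
    """Assumed form of phi_{d,k,lambda}."""
    return 1.0 - math.exp(-lam - d * z ** (k - 1))


def Phi(z, d, k, lam):
    """Assumed closed form of Phi_{d,k,lambda}."""
    return (math.exp(-lam - d * z ** (k - 1))
            + d * z ** (k - 1)
            - d * (k - 1) / k * z ** k
            - d / k)


def fixed_point(d, k, lam, z0=0.0, iters=10_000):
    """Fixed point of phi by plain iteration (adequate when phi is a contraction)."""
    z = z0
    for _ in range(iters):
        z = phi(z, d, k, lam)
    return z


def derivative_check(d, k, lam, h=1e-5):
    """Return (central difference of lam -> Phi(alpha_max(lam)), alpha_max(lam) - 1)."""
    def value(l):
        return Phi(fixed_point(d, k, l), d, k, l)

    numeric = (value(lam + h) - value(lam - h)) / (2.0 * h)
    analytic = fixed_point(d, k, lam) - 1.0
    return numeric, analytic
```

For instance, `derivative_check(1.0, 3, 0.5)` compares the two sides of (5.18) for k = 3 and d = 1, a choice for which φ is a contraction on [0, 1].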
Complementing the analytic formula (5.18), we now derive a combinatorial interpretation of the derivative of
the nullity. For a matrix A of size m × n let V0(A) be the set of all indices i ∈ [n] such that σi = 0 for all σ ∈ ker A.

Lemma 5.4. For any d ,λ> 0 we have

\[
\frac{\partial}{\partial\lambda}\mathbb{E}[\mathrm{nul}\,A[\lambda]] = \frac{\mathbb{E}|V_0(A[\lambda])|}{n} - 1.
\]



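Proof. Recall that A[λ] is obtained from A(F) by adding m′′ stochastically independent rows with a single
non-zero entry in a uniformly random position, where m′′ has distribution Po(λn). Consequently,

\[
\begin{aligned}
\frac{\partial}{\partial\lambda}\mathbb{E}[\mathrm{nul}\,A[\lambda]]
&= \frac{\partial}{\partial\lambda}\sum_{\ell=0}^{\infty}\mathbb{P}\left[m'' = \ell\right]\mathbb{E}[\mathrm{nul}\,A[\lambda] \mid m'' = \ell]
= \sum_{\ell=0}^{\infty}\mathbb{E}[\mathrm{nul}\,A[\lambda] \mid m'' = \ell]\,\frac{\partial}{\partial\lambda}\frac{(\lambda n)^{\ell}}{\ell!}\exp(-\lambda n) \\
&= \sum_{\ell=0}^{\infty}\mathbb{E}[\mathrm{nul}\,A[\lambda] \mid m'' = \ell]\left(\mathbb{1}\{\ell \ge 1\}\frac{(\lambda n)^{\ell-1}}{(\ell-1)!} - \frac{(\lambda n)^{\ell}}{\ell!}\right)\exp(-\lambda n) \\
&= \sum_{\ell=0}^{\infty}\mathbb{E}[\mathrm{nul}\,A[\lambda] \mid m'' = \ell]\left(\mathbb{P}\left[m'' = \ell\right] - \mathbb{P}\left[m'' = \ell+1\right]\right).
\end{aligned}
\tag{5.20}
\]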

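Hence, obtain A[λ]^+ from A[λ] by adding one more row with a single non-zero entry in a uniformly random position
j^+ ∈ [n]. Then nul A[λ]^+ − nul A[λ] = −1{j^+ ∉ V0(A[λ])}, because the extra row lowers the nullity by one
unless the coordinate j^+ already vanishes on all of ker A[λ]. Hence, (5.20) yields

\[
\frac{\partial}{\partial\lambda}\mathbb{E}[\mathrm{nul}\,A[\lambda]]
= \mathbb{E}\left[\mathrm{nul}(A[\lambda]^{+}) - \mathrm{nul}(A[\lambda])\right]
= \mathbb{P}\left[j^{+} \in V_0(A[\lambda])\right] - 1
= \frac{\mathbb{E}|V_0(A[\lambda])|}{n} - 1,
\]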

as claimed. □
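
The combinatorial mechanism behind Lemma 5.4 can be checked on small matrices over GF(2). The following is an illustrative sketch with an encoding of our own choosing: each row is a Python integer used as a bitmask, with bit j standing for column j. A coordinate j belongs to V0(A) precisely when the unit vector e_j lies in the row space of A, that is, when appending e_j does not increase the rank.

```python
def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        cur = row
        while cur:
            top = cur.bit_length() - 1
            if top not in basis:
                basis[top] = cur
                break
            cur ^= basis[top]  # eliminate the leading bit and keep reducing
    return len(basis)


def nullity(rows, n):
    """Nullity of a matrix with n columns."""
    return n - rank_gf2(rows)


def frozen_set(rows, n):
    """V0(A): coordinates that vanish on every vector in the kernel of A."""
    base = rank_gf2(rows)
    return {j for j in range(n) if rank_gf2(rows + [1 << j]) == base}
```

Appending the unit row for a coordinate outside `frozen_set` lowers the nullity by exactly one, while a frozen coordinate leaves it unchanged, so averaging over a uniformly random coordinate gives |V0(A)|/n − 1.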
With these preparations in place we can now derive the desired formulas for |V0(At)|. We treat the cases αmax = α_*
and αmax = α^* separately.

Lemma 5.5. Assume that d ,λ> 0 satisfy

\[
\Phi_{d,k,\lambda}(\alpha_*) > \Phi_{d,k,\lambda}(\alpha) \qquad \text{for all } \alpha \in [0,1]\setminus\{\alpha_*\}. \tag{5.21}
\]

Then |V0(A[λ])| = α_* n + o(n) w.h.p.

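Proof. Proposition 2.5 and Lemma 4.5 yield the lower bound

\[
|V_0(A[\lambda])| \ge \alpha_* n + o(n) \qquad \text{w.h.p.} \tag{5.22}
\]

To derive the matching upper bound, fix ε > 0 and assume that the event E = {|V0(A[λ])| > (α_* + ε)n} has probability
P[E] > ε. Then by Proposition 2.2 (iii) there exists λ′ > λ such that αmax(λ′′) = α_*(λ′′) and α_*(λ′′) < α_*(λ) + ε²/2 for
all λ′′ ∈ [λ, λ′]. Hence, Lemma 5.3 yields

\[
\Phi_{d,k,\lambda'}(\alpha_{\max}(\lambda')) - \Phi_{d,k,\lambda}(\alpha_{\max}(\lambda)) \le \int_{\lambda}^{\lambda'}\bigl(\alpha_*(\lambda'') - 1\bigr)\,\mathrm{d}\lambda'' \le (\lambda'-\lambda)\bigl(\alpha_*(\lambda) + \varepsilon^2/2 - 1\bigr). \tag{5.23}
\]

Combining (5.23) with Proposition 2.6 and Lemma 5.2, we obtain

\[
n^{-1}\Bigl[\mathbb{E}\bigl[\mathrm{nul}\,A[\lambda']\bigr] - \mathbb{E}\bigl[\mathrm{nul}\,A[\lambda]\bigr]\Bigr] \le (\lambda'-\lambda)\bigl(\alpha_*(\lambda) + \varepsilon^2/2 - 1 + o(1)\bigr). \tag{5.24}
\]

On the other hand, since adding checks can only increase the number of frozen variables, Lemma 5.4 shows that

\[
n^{-1}\Bigl[\mathbb{E}\bigl[\mathrm{nul}\,A[\lambda']\bigr] - \mathbb{E}\bigl[\mathrm{nul}\,A[\lambda]\bigr]\Bigr] \ge (\lambda'-\lambda)\bigl(\alpha_*(\lambda) + \mathbb{P}[E]\,\varepsilon - 1 + o(1)\bigr) \ge (\lambda'-\lambda)\bigl(\alpha_*(\lambda) + \varepsilon^2 - 1 + o(1)\bigr). \tag{5.25}
\]

Finally, since (5.24) and (5.25) contradict each other, we have refuted the assumption P[E] > ε. □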

Lemma 5.6. Assume that d, λ > 0 are such that

\[
\Phi_{d,k,\lambda}(\alpha^*) > \Phi_{d,k,\lambda}(\alpha) \qquad \text{for all } \alpha \in [0,1]\setminus\{\alpha^*\}. \tag{5.26}
\]

Then |V0(A[λ])| = α^* n + o(n) w.h.p.

Proof. We use a similar strategy as in the proof of Lemma 5.5. Hence, assume that d, λ > 0 satisfy (5.26). Combining
Proposition 2.5 and Lemma 4.6, we see that |V0(A[λ])| ≤ α^* n + o(n) w.h.p. Now choose a small enough ε > 0 and
assume that E = {|V0(A[λ])| < (α^* − ε)n} occurs with probability P[E] > ε. Then Proposition 2.2 shows that there
exists λ′ < λ such that αmax(λ′′) = α^*(λ′′) and α^*(λ′′) > α^*(λ) − ε²/2 for all λ′′ ∈ [λ′, λ]. Hence, Lemma 5.3 yields

\[
\Phi_{d,k,\lambda}(\alpha_{\max}(\lambda)) - \Phi_{d,k,\lambda'}(\alpha_{\max}(\lambda')) = \int_{\lambda'}^{\lambda}\bigl(\alpha^*(\lambda'') - 1\bigr)\,\mathrm{d}\lambda'' \ge (\lambda-\lambda')\bigl(\alpha^*(\lambda) - \varepsilon^2/2 - 1\bigr). \tag{5.27}
\]

But once again because adding checks can only increase the number of frozen variables, Lemma 5.4 yields

\[
n^{-1}\Bigl[\mathbb{E}\bigl[\mathrm{nul}\,A[\lambda]\bigr] - \mathbb{E}\bigl[\mathrm{nul}\,A[\lambda']\bigr]\Bigr] \le (\lambda-\lambda')\bigl(\alpha^*(\lambda) - \mathbb{P}[E]\,\varepsilon - 1 + o(1)\bigr) \le (\lambda-\lambda')\bigl(\alpha^*(\lambda) - \varepsilon^2 - 1 + o(1)\bigr). \tag{5.28}
\]

However, Proposition 2.6 and Lemma 5.3 show that (5.27)–(5.28) are in contradiction. □

Proof of Proposition 2.8. Since αmax ∈ {α_*, α^*}, the assertion is an immediate consequence of Lemmas 5.5–5.6. □



5.3. Proof of Corollary 2.9. There are four cases to consider separately. Let ε> 0.

Case 1: d < dmin: As Proposition 2.2 (i) shows, in this case we have α_* = α^* for all λ > 0; thus, the function
φd,k,λ has only the single fixed point α_*, which is stable. Furthermore, Proposition 2.5 shows that
||Vn,ℓ(F DC,t)| − α_* n| < εn/2 for large enough ℓ w.h.p. Moreover, Proposition 2.8 yields |V0(F DC,t)| = α_* n +
o(n) w.h.p. Therefore, Proposition 2.7 implies that |V0(F DC,t) △ Vn,ℓ(F DC,t)| < εn w.h.p. for large enough ℓ.
Since Vn,ℓ(F DC,t) ⊆ V0(F DC,t) w.h.p. and |Vn,ℓ(F DC,t) △ Vn(F DC,t)| < εn by (2.19), the assertion follows.

Case 2: dmin < d < dsat and θ > θ^*: A similar argument as under Case 1 applies. Indeed, Proposition 2.2 (ii)
shows that α_* = α^* is the unique and stable fixed point of φd,k,λ. Since ||Vn,ℓ(F DC,t)| − α_* n| < εn/2 for
large ℓ w.h.p. by Proposition 2.5 and |V0(F DC,t)| = α_* n + o(n) w.h.p. by Proposition 2.8, Proposition 2.7
yields |V0(F DC,t) △ Vn,ℓ(F DC,t)| < εn w.h.p. Therefore, (2.19) implies the assertion.

Case 3: dmin < d < dsat and θ < θcond: Proposition 2.2 (ii) shows that α_* < α^* in this case. Moreover,
Proposition 2.5 yields ||Vn,ℓ(F DC,t)| − α_* n| < εn/2 for large ℓ w.h.p., while Proposition 2.8 and Proposition 2.2 (iii)
imply that |V0(F DC,t)| = α_* n + o(n) w.h.p. Thus, the same steps as in Cases 1–2 complete the proof.

Case 4: dmin < d < dsat and θcond < θ < θ^*: Once again Proposition 2.2 (ii) shows that α_* < α^*, Proposition 2.5
yields ||Vn,ℓ(F DC,t)| − α_* n| < εn/2 for large ℓ w.h.p., and Proposition 2.8 and Proposition 2.2 (iii) show that
|V0(F DC,t)| = α^* n + o(n) w.h.p. Since Vn,ℓ(F DC,t) ⊆ V0(F DC,t) w.h.p., the assertion follows from (2.19) and
the fact that α_* < α^*.

5.4. Proof of Corollary 2.10. Assume first that θ < θcond. Then Corollary 2.9 shows that |V0(F DC,t) △ Vn(F DC,t)| =
o(n) for large enough ℓ. Since Vn(F DC,t) ∩ Vf(F DC,t) = ∅ by construction, the first assertion follows.

Now suppose θ > θcond. Then Proposition 2.5 yields ||Vf,ℓ(F DC,t )|−α∗n| < εn/2 for large ℓ w.h.p., while Propo-
sition 2.8 and Proposition 2.2 (iii) show that |V0(F DC,t )| =α∗n+o(n) w.h.p. Additionally, Proposition 2.5 shows that
|Vu,ℓ(F DC,t )∩V0(F DC,t )| < εn for large ℓ, which implies the assertion.

5.5. Proofs of Theorems 1.2 and 1.3. We begin with the following observation.

Lemma 5.7. Let σ ∈ ker(F DC,t ) be uniformly random. For any ℓ> 0 w.h.p. we have

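\[
\mathbb{P}\left[\sigma_{x_{t+1}} = 0 \,\Big|\, F_{DC,t}, \sigma_{\partial^{2\ell}x_{t+1}}\right] = \frac{1}{2}\Bigl(1 + \mathbb{1}\{x_{t+1} \in V_{\mathtt{f},\ell}(F_{DC,t}) \cup V_{\mathtt{n},\ell}(F_{DC,t})\}\Bigr), \tag{5.29}
\]
\[
\pi_{F_{DC,t}} = \mathbb{P}\left[\sigma_{x_{t+1}} = 0 \,\Big|\, F_{DC,t}\right] = \frac{1}{2}\Bigl(1 + \mathbb{1}\{x_{t+1} \in V_0(F_{DC,t})\}\Bigr). \tag{5.30}
\]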

Proof. Notice that for d < dsat the random XORSAT instance F is satisfiable w.h.p.; therefore, so is F DC,t.
We begin with the proof of (5.30). The first equality π_{F DC,t} = P[σ_{x_{t+1}} = 0 | F DC,t] follows from the fact that the set
of solutions of F DC,t is an affine translation of ker(A(F DC,t)). Moreover, the second equality sign follows from the
well known fact that the marginal P[σ_{x_{t+1}} = 0 | F DC,t] is equal to 1/2 or to 1.

Moving on to (5.29), we recall from Lemma 4.1 that the depth-2ℓ neighbourhood ∂^{≤ℓ}x_{t+1} of x_{t+1} in F DC,t is
acyclic w.h.p. Furthermore, we can think of P[σ_{x_{t+1}} = 0 | F DC,t, σ_{∂^{≤ℓ}x_{t+1}}] as the marginal probability that x_{t+1}
receives the value zero under a random vector from the kernel of the check matrix of ∂^{≤ℓ}x_{t+1}, subject to imposing
the values σ_{∂^{≤ℓ}x_{t+1}} upon the variables at distance exactly 2ℓ from x_{t+1}. Let F^{(ℓ)}_{DC,t} signify the XORSAT instance thus
obtained. Then we conclude that P[σ_{x_{t+1}} = 0 | F DC,t, σ_{∂^{≤ℓ}x_{t+1}}] = 1 iff x_{t+1} ∈ V0(F^{(ℓ)}_{DC,t}). Furthermore, because BP
is exact on acyclic factor graphs, we have x_{t+1} ∈ V0(F^{(ℓ)}_{DC,t}) iff x_{t+1} ∈ Vf,ℓ(F DC,t) ∪ Vn,ℓ(F DC,t). Thus, we obtain
(5.29). □

Proof of Theorem 1.2. We begin with claim (i) concerning d < dmin. As Proposition 2.2 (i) shows, in this case we
have α_* = α^*. Furthermore, Proposition 2.5 shows that ||Vn,ℓ(F DC,t)| − α_* n| < εn and |Vf,ℓ(F DC,t)| < εn for large
enough ℓ w.h.p. Moreover, Proposition 2.8 yields |V0(F DC,t)| = α_* n + o(n) w.h.p. Therefore, Proposition 2.7 implies
that |V0(F DC,t) △ Vn,ℓ(F DC,t)| < εn w.h.p. for large enough ℓ. Hence, Lemma 5.7 shows that the non-reconstruction
property (1.7) holds w.h.p.

Similarly, towards the proof of (ii) assume that dmin < d < dsat and θ < θ_*. Then Proposition 2.2 (ii) shows that
α_* = α^* is the unique (stable) fixed point of φd,k,λ. Therefore, the argument from the previous paragraph shows
that (1.7) holds w.h.p. Further, suppose that dmin < d < dsat and θ > θcond. Then Corollary 2.10 (ii) shows that
|(Vn,ℓ(F DC,t) ∪ Vf,ℓ(F DC,t)) △ V0(F DC,t)| < εn w.h.p. Therefore, Lemma 5.7 implies the non-reconstruction property,
and thus the proof of (ii) is complete.



Finally, suppose that dmin < d < dsat and θ_* < θ < θcond. Then Proposition 2.5 shows that ||Vn,ℓ(F DC,t)| − α_* n| <
εn and ||Vf,ℓ(F DC,t)| − (α^* − α_*)n| < εn for large enough ℓ w.h.p. Moreover, Corollary 2.10 shows that |Vf,ℓ(F DC,t) ∩ V0(F DC,t)| < εn
w.h.p. Consequently, Lemma 5.7 demonstrates that the reconstruction condition (1.8) holds w.h.p. □

Proof of Theorem 1.3. Part (i) regarding the case d < dmin is an immediate consequence of Fact 2.4 (the equivalence
of WP and BP), Corollary 2.9 (i) and Lemma 5.7. The same is true of part (ii) concerning dmin < d < dsat and
θ < θcond or θ > θ∗. Furthermore, (iii) follows from Corollary 2.9 (ii) and Lemma 5.7. □

6. BELIEF PROPAGATION GUIDED DECIMATION

In this section we prove Theorem 1.1. We begin by arguing that BPGD is actually equivalent to the simple combina-
torial Unit Clause Propagation algorithm. Then we prove the ‘positive’ part, i.e., the formula (1.6) for the success
probability for d < dmin. Subsequently we prove the second part of the theorem concerning dmin < d < dsat.

6.1. Unit Clause Propagation redux. The simple-minded Unit Clause Propagation algorithm attempts to assign
random values to as yet unassigned variables one after the other. After each such random assignment the algo-
rithm pursues the ‘obvious’ implications of its decisions. Specifically, the algorithm substitutes its chosen truth
values for all occurrences of the already assigned variables. If this leaves a clause with only a single unassigned
variable, a so-called ‘unit clause’, the algorithm assigns that variable so as to satisfy the unit clause. If a conflict
occurs because two unit clauses impose opposing values on a variable, the algorithm declares that a conflict has
occurred, sets the variable to false and continues; of course, in the event of a conflict the algorithm will ultimately
fail to produce a satisfying assignment. The pseudocode for the algorithm is displayed in Algorithm 3.

 1  Let U = ∅ and let σUC : U → {0,1} be the empty assignment;
 2  for t = 0, . . . , n−1 do
 3      if xt+1 ∉ U then
 4          add xt+1 to U;
 5          choose σUC(xt+1) ∈ {0,1} uniformly at random;
 6          while F[σUC] contains a unit clause a do
 7              let x be the variable in a;
 8              let s ∈ {0,1} be the truth value that x needs to take to satisfy a;
 9              if another unit clause a′ exists that requires x be set to 1−s then
10                  output ‘conflict’ and let σUC(x) = 0;
11              else
12                  add x to U and let σUC(x) = s;
13  return σUC;

Algorithm 3: The UCP algorithm.

Let F UC,t denote the simplified formula obtained after the first t iterations (in which the truth values chosen for
x1, . . . , xt and any values implied by Unit Clauses have been substituted). We notice that the values assigned during
Steps 6–12 are deterministic consequences of the choices in Step 5. In particular, the order in which unit clauses
are processed in Steps 6–12 does not affect the output of the algorithm.
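
For concreteness, here is a small executable sketch of Unit Clause Propagation in the XORSAT setting. It is an illustration under assumptions of ours, not a transcription of Algorithm 3: a clause is encoded as a pair (variables, rhs) demanding that the XOR of the listed, pairwise distinct variables equals rhs, a variable that triggers a conflict is simply set to 0 and treated as assigned, and the names `ucp` and `satisfies` are hypothetical. The implementation favours clarity over speed.

```python
import random


def ucp(n, clauses, rng=random):
    """Unit Clause Propagation on an XORSAT instance with variables 0, ..., n-1.

    clauses: list of (variables, rhs); XOR of the variables must equal rhs (0 or 1).
    Returns (sigma, conflict): the full assignment and whether a conflict occurred.
    """
    sigma = {}
    conflict = False

    def unit_clauses():
        """Yield (variable, forced value) for each clause with one unassigned variable."""
        for variables, rhs in clauses:
            free = [v for v in variables if v not in sigma]
            if len(free) == 1:
                value = rhs
                for v in variables:
                    if v in sigma:
                        value ^= sigma[v]
                yield free[0], value

    for x in range(n):
        if x in sigma:
            continue
        sigma[x] = rng.randrange(2)  # free step: uniformly random truth value
        while True:
            units = list(unit_clauses())
            if not units:
                break
            var, s = units[0]
            if any(v == var and s_other != s for v, s_other in units):
                conflict = True  # two unit clauses demand opposite values
                sigma[var] = 0
            else:
                sigma[var] = s
    return sigma, conflict


def satisfies(sigma, clauses):
    """Check whether the assignment sigma satisfies every XOR clause."""
    return all(sum(sigma[v] for v in variables) % 2 == rhs for variables, rhs in clauses)
```

On the chain `[((0, 1), 1), ((1, 2), 0)]` with n = 3 the first random choice forces the other two variables, whereas the contradictory pair `[((0, 1), 0), ((0, 1), 1)]` always ends in a conflict.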

Proposition 6.1. We have

\[
\mathbb{P}\left[\text{BPGD outputs a satisfying assignment of } F\right] = \mathbb{P}\left[\text{UCP outputs a satisfying assignment of } F\right].
\]

Proof. We employ the following coupling. Let τ ∈ {0,1}n be a uniformly random vector. The BPGD algorithm sets
σBP(xt+1) = τt+1 if µF BP,t = 1/2. Analogously, UCP sets σUC(xt+1) = τt+1 in Step 5 (if xt+1 ̸∈ U ). Hence, because
(1.1) guarantees that the BP marginals µF BP,t are half-integral, the coupling ensures that the “free steps” of the two
algorithms pick the same truth values.

We now proceed by induction on 0 ≤ t ≤ n to prove the following two statements.

UCP1: unless UCP encountered a conflict before time t we have σBP(xi ) =σUC(xi ) for i = 1, . . . , t .
UCP2: if t < n and there has been no conflict before time t we have µF BP,t+1 = 1/2 iff xt+1 ∉ U.



For t = 0 both of these statements are clearly correct because µF BP,0 = 1/2 and x1 ∉ U.
Now assume that UCP1–UCP2 hold at time t−1 and that no conflict has occurred yet. Then we already know
that σBP(xi) = σUC(xi) for i = 1, . . . , t−1. Furthermore, since UCP2 is correct at time t−1 we have µF BP,t = 1/2 iff
xt ∉ U. Consequently, if xt ∉ U then σBP(xt) = σUC(xt). Hence, suppose that xt ∈ U and thus µF BP,t ∈ {0,1}. Then
given σBP(x1) = σUC(x1), . . . , σBP(xt−1) = σUC(xt−1) the value σUC(xt) is implied by unit clause propagation. But a
glimpse at the BP update rules (2.7)–(2.8) shows that these encompass the unit clause rule. Specifically, if x is the
only remaining variable in clause a, then (2.7) ensures that the message from a to x gives probability one to the
value that satisfies clause a. Therefore, the definition (2.9) of the BP marginal demonstrates that µF BP,t = σUC(xt)
and thus σBP(xt) = σUC(xt). Thus, UCP1 continues to hold for t.

Similar reasoning yields UCP2. Indeed, revisiting (2.7), we see that the BP message that clause a sends to vari-
able x equals 1/2 unless a is a unit clause. In effect, (2.9) shows that the BP marginal µF BP,t+1 is equal to 1/2 unless
the value of xt+1 is implied by the unit clause rule. This completes the induction.

To complete the proof assume that UCP manages to find a satisfying assignment. Then UCP1 applied to t = n
demonstrates that BPGD outputs the very same satisfying assignment. Conversely, if UCP encounters a conflict at
some time t , then UCP1 shows that BPGD chose the same assignment up to time t . Therefore, it is not possible to
extend the partial assignment σBP(x1), . . . , σBP(xt) to a satisfying assignment of F and thus BPGD will ultimately fail
to output a satisfying assignment. □

In light of Proposition 6.1 we are left to study the success probability of UCP. The following two subsections deal
with this task for d < dmin and d > dmin, respectively.

6.2. The success probability of UCP for d < dmin. We continue to denote by F UC,t the sub-formula obtained after
the first t iterations of UCP. Let V (t ) ⊆ {xt+1, . . . , xn} be the set of variables of F UC,t . Thus, V (t ) contains those
variables among xt+1, . . . , xn whose values are not implied by the assignment of x1, . . . , xt via unit clauses. Also
let C (t ) be the set of clauses of F UC,t ; these clauses contain variables from V (t ) only, and each clause contains at
least two variables. Let V̄ (t ) = Vn \ V (t ) be the set of assigned variables. Thus, after its first t iterations UCP has
constructed an assignment σUC : V̄ (t ) → {0,1}. Moreover, let V ′(t +1) = V (t ) \ V (t +1) be the set of variables that
receive values in the course of the iteration t+1 for 0 ≤ t < n. Additionally, let C′(t+1) be the set of clauses of F UC,t
that consist of variables from V′(t+1) only. Finally, let F′_{UC,t+1} be the formula comprising the variables V′(t+1)
and the clauses C′(t+1).
To characterise the distribution of F UC,t let n(t) = |V(t)| and let mℓ(t) be the number of clauses of length ℓ, i.e.,
clauses that contain precisely ℓ variables from V(t). Observe that m1(t) = 0 because unit clauses get eliminated.
Let Ft be the σ-algebra generated by n(t ) and (mℓ(t ))2≤ℓ≤k .

Fact 6.2. The XORSAT formula F UC,t is uniformly random given Ft . In other words, the variables that appear in
each clause are uniformly random and independent, as are their signs.

Proof. This follows from the principle of deferred decisions. □

We proceed to estimate the random variables n(t ),mℓ(t ). Let α(t ) = |V̄ (t )|/n so that n(t ) = n(1−α(t )). Let
λ = λ(θ) = −log(1−θ) with θ ∼ t/n and recall that α_* = α_*(d, k, λ) denotes the smallest fixed point of φd,k,λ. The
proof of the following proposition can be found in Section 6.2.1.

Proposition 6.3. Suppose that d < dmin(k). There exists a function δ= δ(n) = o(1) such that for all 0 ≤ t < n and all
2 ≤ ℓ≤ k we have

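\[
\mathbb{P}\left[|\alpha(t) - \alpha_*| > \delta\right] = O(n^{-2}), \qquad
\mathbb{P}\left[\left|m_\ell(t) - \frac{dn}{k}\binom{k}{\ell}(1-\alpha_*)^{\ell}\alpha_*^{k-\ell}\right| > \delta n\right] = O(n^{-2}). \tag{6.1}
\]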

Proposition 6.3 paves the way for the actual computation of the success probability of UCP. Let Rt be the event
that a conflict occurs in iteration t . The following proposition gives us the correct value of P [Rt |Ft ] w.h.p. Since
Ft is a random variable the value for the probability P [Rt |Ft ] is random as well.

Proposition 6.4. Fix ε> 0, let 0 ≤ t < (1−ε)n and define

\[
f_n(t) = d(k-1)(1-\alpha_*)\alpha_*^{k-2}. \tag{6.2}
\]

Then with probability 1 − o(1/n) we have

\[
\mathbb{P}\left[R_t \mid \mathcal{F}_t\right] = \frac{f_n(t)^2}{4(n-t)(1-f_n(t))^2} + o(1/n).
\]

The proof of Proposition 6.4 can be found in Section 6.2.2. Moreover, in Section 6.2.3 we prove the following.

Proposition 6.5. Fix ε> 0 and ℓ≥ 1. For any 0 ≤ t1 < ·· · < tℓ < (1−ε)n we have

\[
\mathbb{P}\left[\bigcap_{i=1}^{\ell} R_{t_i}\right] \sim \prod_{i=1}^{\ell}\frac{f_n(t_i)^2}{4(n-t_i)(1-f_n(t_i))^2}. \tag{6.3}
\]

Finally, the following statement deals with the εn final steps of the algorithm.

Proposition 6.6. For any δ > 0 there exists ε > 0 such that $P\left[\bigcup_{(1-\varepsilon)n<t<n} R_t\right] < \delta$.

Before we proceed we notice that Propositions 6.4–6.6 imply the first part of Theorem 1.1.

Proof of Theorem 1.1 (i). Pick δ > 0, fix a small enough ε = ε(δ) > 0 and let $R = \sum_{t=0}^{n-1} 1\{R_t\}$ be the total number of times at which conflicts occur. Proposition 6.1 shows that the probability that BPGD succeeds equals P[R = 0]. In order to calculate P[R = 0], let $R_\varepsilon = \sum_{0\le t\le(1-\varepsilon)n} 1\{R_t\}$ be the number of failures before time (1−ε)n. Proposition 6.5 shows that for any fixed ℓ ≥ 1 we have

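\[
\begin{aligned}
E\left[\prod_{i=1}^{\ell}(R_\varepsilon - i + 1)\right]
&= \ell! \sum_{0\le t_1<\cdots<t_\ell\le(1-\varepsilon)n} P\left[\bigcap_{i=1}^{\ell} R_{t_i}\right]
\sim \ell! \sum_{0\le t_1<\cdots<t_\ell\le(1-\varepsilon)n} \prod_{i=1}^{\ell} \frac{f_n(t_i)^2}{4(n-t_i)(1-f_n(t_i))^2} \\
&= (1+o(1)) \sum_{0\le t_1,\ldots,t_\ell\le(1-\varepsilon)n} \prod_{i=1}^{\ell} \frac{f_n(t_i)^2}{4(n-t_i)(1-f_n(t_i))^2}
\sim E[R_\varepsilon]^{\ell}.
\end{aligned} \tag{6.4}
\]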

Hence, the inclusion/exclusion principle (e.g., [4, Theorem 1.21]) implies that

P [Rε = 0] ∼ exp(−E[Rε]). (6.5)
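The passage from (6.4) to (6.5) can be illustrated numerically. The sketch below is not part of the original argument: it assumes that the ℓ-th factorial moment equals μ^ℓ exactly for a hypothetical value `mu` of E[R_ε], and checks that the truncated inclusion/exclusion sums bracket exp(−μ).

```python
import math

def prob_zero_partial_sum(mu, terms):
    """Sum of (-1)^l * mu^l / l! for l = 0, ..., terms - 1.

    This is the truncated inclusion/exclusion series for P[R = 0]
    when the l-th factorial moment of R equals mu^l (Poisson case).
    """
    return sum((-mu) ** l / math.factorial(l) for l in range(terms))

mu = 0.7  # hypothetical value of E[R_eps], chosen for illustration only
upper = prob_zero_partial_sum(mu, 5)   # last included term has l = 4 (positive)
lower = prob_zero_partial_sum(mu, 6)   # last included term has l = 5 (negative)
limit = prob_zero_partial_sum(mu, 40)  # practically the full series
```

Truncating after an even index overshoots and truncating after an odd index undershoots, which is the bracketing behaviour that the inclusion/exclusion argument relies on.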

Further, using Proposition 6.4 and the linearity of expectation, we obtain with λ(θ) =− log(1−θ)

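\[
\begin{aligned}
E[R_\varepsilon] &= \sum_{0\le t\le(1-\varepsilon)n} P[R_t]
\sim \sum_{0\le t\le(1-\varepsilon)n} \frac{f_n(t)^2}{4(n-t)(1-f_n(t))^2}
\sim \frac{1}{4n}\int_0^{1-\varepsilon} \frac{f_n(\theta n)^2}{(1-\theta)(1-f_n(\theta n))^2}\,d\theta \\
&= \frac{1}{4n}\int_0^{1-\varepsilon} \frac{f_n(\theta n)^2}{(1-\alpha_*)(1-f_n(\theta n))}\,\frac{\partial\alpha_*}{\partial\lambda}\,\frac{\partial\lambda(\theta)}{\partial\theta}\,d\theta && \text{[by (3.11)]} \\
&= \frac{d^2(k-1)^2}{4}\int_0^{1-\varepsilon} \frac{z^{2k-4}(1-z)}{1-d(k-1)z^{k-2}(1-z)}\,dz && \text{[by (6.2)]}.
\end{aligned} \tag{6.6}
\]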

Finally, Proposition 6.6 implies that

P [R > Rε] < δ. (6.7)

Thus, the assertion follows from (6.5)–(6.7) upon taking the limit δ→ 0. □
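For concreteness, the last expression in (6.6) can be evaluated numerically. The following is a minimal sketch under stated assumptions, not the authors' computation: the function names are hypothetical, the integral is taken over [0, 1] (the presumed range after the limit δ → 0), and a plain midpoint rule is used.

```python
import math

def expected_conflicts(d, k, upper=1.0, steps=20000):
    """Midpoint-rule value of
    (d^2 (k-1)^2 / 4) * int_0^upper z^(2k-4) (1-z) / (1 - d(k-1) z^(k-2) (1-z)) dz.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        denom = 1.0 - d * (k - 1) * z ** (k - 2) * (1.0 - z)
        if denom <= 0.0:
            # The formula only makes sense while d(k-1) z^(k-2) (1-z) < 1.
            raise ValueError("integrand is singular for this d and k")
        total += z ** (2 * k - 4) * (1.0 - z) / denom
    return d * d * (k - 1) ** 2 / 4.0 * total * h

def success_probability(d, k):
    """exp(-E[R]) as suggested by (6.5), with E[R] taken from (6.6)."""
    return math.exp(-expected_conflicts(d, k))
```

For example, `success_probability(1.0, 3)` returns a value strictly between 0 and 1, and the value decreases as `d` grows towards the point where the denominator of the integrand can vanish.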

6.2.1. Proof of Proposition 6.3. The proof of Proposition 6.3 is based on the method of differential equations.
Specifically, based on Fact 6.2 we derive a system of ODEs that track the random variables α(t ),m2(t ), . . . ,mk (t ).
We will then identify the unique solution to this system. As a first step we work out the conditional expectations of
α(t +1),m2(t +1), . . . ,mk (t +1) given Ft .

Lemma 6.7. If 2m2(t )/n(t ) < 1−Ω(1) and n(t ) =Ω(n), then

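\[
E[n(t)-n(t+1) \mid \mathcal{F}_t] = \frac{n(t)^2}{(n-t)(n(t)-2m_2(t))} + o(1), \tag{6.8}
\]
\[
E[m_\ell(t+1)-m_\ell(t) \mid \mathcal{F}_t] = \frac{n(t)^2}{(n-t)(n(t)-2m_2(t))}\cdot\frac{(\ell+1)m_{\ell+1}(t)-\ell m_\ell(t)}{n(t)} + o(1) \quad (2\le\ell<k), \tag{6.9}
\]
\[
E[m_k(t+1)-m_k(t) \mid \mathcal{F}_t] = -\frac{n(t)^2}{(n-t)(n(t)-2m_2(t))}\cdot\frac{k\,m_k(t)}{n(t)} + o(1). \tag{6.10}
\]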



Proof. Going from time t to time t +1 involves the express assignment of variable xt+1, unless it had already been
assigned a value due to previous decisions, and the subsequent pursuit of unit clause implications. The probability
given Ft that xt+1 was set in a previous iteration equals

\[
q_{t+1} = 1 - \frac{n(t)}{n-t}. \tag{6.11}
\]

Indeed, the first t iterations assigned values to a total of n −n(t ) variables, including x1, . . . , xt , and Fact 6.2 shows
that the identities of the assigned variables among xt+1, . . . , xn are random.

Let Qt+1 be the event that xt+1 was not assigned previously. Given Qt+1 we need to pursue unit clause implications. To this end, recall the bipartite graph representation G(F UC,t ) of the formula F UC,t . Let G2(F UC,t ) be the subgraph of G(F UC,t ) obtained by removing all clauses of length greater than two. Then Fact 6.2 shows that G2(F UC,t ) is a uniformly random bipartite graph with n(t ) nodes on one side and m2(t ) nodes of degree two on the other side. Furthermore, the number of variables whose values are implied by unit clause propagation is lower bounded by the number of variable nodes in the component of xt+1 in G2(F UC,t ). The expected size of this component can be computed as the expected progeny of a branching process with offspring Po(2m2(t )/n(t )). As is well known, under the assumption 2m2(t )/n(t ) < 1−Ω(1), which ensures that the branching process is sub-critical, the expected progeny comes to (1−2m2(t )/n(t ))^{−1}. Hence, we obtain

\[
E[n(\alpha(t+1)-\alpha(t)) \mid \mathcal{F}_t] \ge \frac{1-q_{t+1}}{1-2m_2(t)/n(t)}. \tag{6.12}
\]

Strictly speaking, (6.12) only gives a lower bound on E [n(α(t +1)−α(t )) |Ft ] because additional unit clause im-
plications could arise from clauses of length greater than two. However, for this to happen a clause would have
to contain at least two variables that are set in iteration t + 1 (i.e., either xt+1 itself or a variable whose value is
implied due to unit clause propagation). But since 2m2(t )/n(t ) < 1−Ω(1), the expected number of such implica-
tions is bounded, and thus the expected number of longer clauses that turn into unit clauses is of order O(1/n).
Consequently, the lower bound (6.12) is tight up to an O(1/n) error term, whence we obtain (6.8).
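The expected-progeny formula used above can be checked directly. The sketch below relies on a standard fact that is not stated in the text: the total progeny of a Galton-Watson process with Po(μ) offspring, μ < 1, follows the Borel distribution P[T = n] = e^{−μn}(μn)^{n−1}/n!. The function names and the truncation point `cutoff` are illustrative choices.

```python
import math

def borel_pmf(n, mu):
    """P[total progeny = n] for Po(mu) offspring with 0 < mu < 1, for n >= 1."""
    # Work in log space so that large n does not overflow.
    log_p = -mu * n + (n - 1) * math.log(mu * n) - math.lgamma(n + 1)
    return math.exp(log_p)

def expected_progeny(mu, cutoff=5000):
    """Truncated mean of the Borel distribution; the tail beyond cutoff is negligible
    for the values of mu used here."""
    return sum(n * borel_pmf(n, mu) for n in range(1, cutoff + 1))
```

With μ standing in for 2m2(t)/n(t), the value `expected_progeny(mu)` agrees with (1−μ)^{−1} up to truncation error, which is the quantity that enters (6.12).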

Moving on to (6.9)–(6.10) we notice that for 2 ≤ ℓ< k there are two ways in which the number of clauses of length
ℓ can change from iteration t to iteration t +1. First, it could be that clauses of length ℓ contain one variable that
gets a value assigned. Any such clauses shorten to length ℓ−1 (if ℓ> 2) or become unit clauses and subsequently
disappear (ℓ = 2). In light of Fact 6.2, the probability that a given clause of length ℓ suffers this fate comes to
ℓ(n(t )−n(t +1))/n(t )+o(1). Conversely, if ℓ< k additional clauses of length ℓ may result from the shortening of
clauses of length ℓ+1. Analogously to the previous computation, the probability that a given clause of length ℓ+1
shortens to length ℓ comes to (ℓ+1)(n(t )−n(t +1))/n(t )+o(1). Of course, there could also be clauses that contain
more than one variable that receives a value during iteration t +1. However, the probability of this event is of order
O(1/n2). Hence, (6.8) implies (6.9) and (6.10). □

Lemma 6.7 puts us in a position to derive a system of ODEs to track the random variables n(t ),m2(t ), . . . ,mk (t ).
Specifically, we obtain the following.

Corollary 6.8. Let n,m2, . . . ,mk : [0,1] →R be continuously differentiable functions such that

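\[
n(0) = 1, \qquad m_k(0) = \frac{d}{k}, \tag{6.13}
\]
\[
\frac{\partial n}{\partial\theta} = -\frac{n^2}{(1-\theta)(n-2m_2)}, \tag{6.14}
\]
\[
\frac{\partial m_\ell}{\partial\theta} = \frac{n\left((\ell+1)m_{\ell+1}-\ell m_\ell\right)}{(1-\theta)(n-2m_2)} \quad (2\le\ell<k), \qquad
\frac{\partial m_k}{\partial\theta} = -\frac{k\,n\,m_k}{(1-\theta)(n-2m_2)}. \tag{6.15}
\]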

Assume, furthermore, that

\[
\sup_{\theta\in[0,1]} 2m_2(\theta)/n(\theta) < 1. \tag{6.16}
\]

Then with probability 1−o(n−2) for all 0 ≤ t ≤ n we have

n(t )/n = n(t/n)+o(1), mℓ(t )/n =mℓ(t/n)+o(1) (2 ≤ ℓ≤ k).

Proof. This follows from Lemma 6.7 in combination with [26, Theorem 2]. □
As a next step we construct an explicit solution to the system (6.13)–(6.15).



Lemma 6.9. If d < dmin, then the functions

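\[
n^*(\theta) = 1-\alpha_*(\lambda(\theta)), \qquad
m^*_\ell(\theta) = \frac{d}{k}\binom{k}{\ell}\left(1-\alpha_*(\lambda(\theta))\right)^{\ell}\alpha_*(\lambda(\theta))^{k-\ell} \tag{6.17}
\]

satisfy (6.13)–(6.16).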

Proof. The initial condition (6.13) is satisfied because α∗(λ(0)) = 0. Furthermore, (3.11) shows that

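\[
\frac{\partial n^*}{\partial\theta} = -\frac{\partial\alpha_*}{\partial\lambda}\cdot\frac{\partial\lambda}{\partial\theta}
= -\frac{1-\alpha_*}{1-d(k-1)\alpha_*^{k-2}(1-\alpha_*)}\cdot\frac{1}{1-\theta}
= -\frac{n^*}{(1-\theta)(1-2m^*_2/n^*)}. \tag{6.18}
\]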

Hence, (6.14) is satisfied. Furthermore, (6.18) implies that for 2 ≤ ℓ< k we have

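\[
\begin{aligned}
\frac{\partial m^*_\ell}{\partial\theta}
&= \frac{d}{k}\cdot\frac{\partial\lambda}{\partial\theta}\cdot\frac{\partial\alpha_*}{\partial\lambda}\cdot\binom{k}{\ell}\left[(k-\ell)\alpha_*^{k-\ell-1}(1-\alpha_*)^{\ell} - \ell\alpha_*^{k-\ell}(1-\alpha_*)^{\ell-1}\right] \\
&= \frac{n^*}{(1-\theta)(1-2m^*_2/n^*)}\cdot\frac{d}{k(1-\alpha_*)}\cdot\left[(\ell+1)\binom{k}{\ell+1}(1-\alpha_*)^{\ell+1}\alpha_*^{k-\ell-1} - \ell\binom{k}{\ell}\alpha_*^{k-\ell}(1-\alpha_*)^{\ell}\right] \\
&= \frac{n^*}{(1-\theta)(n^*-2m^*_2)}\cdot\left[(\ell+1)m^*_{\ell+1} - \ell m^*_\ell\right],
\end{aligned}
\]

where the second step uses $(k-\ell)\binom{k}{\ell} = (\ell+1)\binom{k}{\ell+1}$. This is the first part of (6.15). An analogous computation yields the second part of (6.15). Finally, (6.16) follows from (3.11). □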
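Lemma 6.9 can be sanity-checked numerically by integrating the system (6.13)–(6.15) and comparing with (6.17). The sketch below makes two assumptions that are not taken from this section: the fixed point map is φ(α) = 1 − exp(−λ − dα^{k−1}), which is consistent with the derivative formula quoted from (3.11), and the initial values m_ℓ(0) = 0 for ℓ < k, which is what (6.17) gives at θ = 0. All function names are hypothetical.

```python
import math

def alpha_star(lam, d, k, iters=500):
    """Smallest fixed point of the ASSUMED map a -> 1 - exp(-lam - d * a^(k-1)),
    obtained by iterating from a = 0."""
    a = 0.0
    for _ in range(iters):
        a = 1.0 - math.exp(-lam - d * a ** (k - 1))
    return a

def closed_form(theta, d, k):
    """[n*, m_2*, ..., m_k*] as given by (6.17)."""
    a = alpha_star(-math.log(1.0 - theta), d, k)
    ms = [d / k * math.comb(k, l) * (1 - a) ** l * a ** (k - l) for l in range(2, k + 1)]
    return [1.0 - a] + ms

def rhs(theta, y, k):
    """Right-hand sides of (6.14)-(6.15) for y = [n, m_2, ..., m_k]."""
    n, m = y[0], y[1:]  # m[l - 2] stores m_l
    c = n / ((1.0 - theta) * (n - 2.0 * m[0]))
    out = [-c * n]
    for l in range(2, k):
        out.append(c * ((l + 1) * m[l - 1] - l * m[l - 2]))
    out.append(-c * k * m[k - 2])
    return out

def rk4(d, k, theta_end, steps=2000):
    """Classical Runge-Kutta integration from theta = 0 to theta_end."""
    y = [1.0] + [0.0] * (k - 2) + [d / k]
    h = theta_end / steps
    theta = 0.0
    for _ in range(steps):
        k1 = rhs(theta, y, k)
        k2 = rhs(theta + h / 2, [a + h / 2 * b for a, b in zip(y, k1)], k)
        k3 = rhs(theta + h / 2, [a + h / 2 * b for a, b in zip(y, k2)], k)
        k4 = rhs(theta + h, [a + h * b for a, b in zip(y, k3)], k)
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        theta += h
    return y
```

Comparing `rk4(1.0, 3, 0.5)` with `closed_form(0.5, 1.0, 3)` component by component gives agreement up to the integration error, as one would expect if the assumed map matches the definition of α∗.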

Proof of Proposition 6.3. The proposition is an immediate consequence of Corollary 6.8 and Lemma 6.9. □

6.2.2. Proof of Proposition 6.4. Recall that $F'_{UC,t+1}$ is the XORSAT formula that contains the variables $V'(t+1)$ that get assigned during iteration $t+1$ and the clauses $C'(t+1)$ of F UC,t that contain variables from $V'(t+1)$ only. Also recall that $G(F'_{UC,t+1})$ signifies the graph representation of this XORSAT formula. Unless $V'(t+1) = \emptyset$, the graph $G(F'_{UC,t+1})$ is connected.

Lemma 6.10. Fix ε > 0 and let 0 ≤ t ≤ (1−ε)n. With probability 1−o(1/n) the graph $G(F'_{UC,t+1})$ satisfies $|E(G(F'_{UC,t+1}))| \le |V(G(F'_{UC,t+1}))|$.
Proof. We recall from the proof of Lemma 6.7 that iteration $t+1$ of UCP can be described by a branching process on the random graph G(F UC,t ). Given that $x_{t+1}$ is still unassigned, the offspring distribution of the branching process has mean $2m_2(t)/n(t)$. Moreover, Proposition 6.3 shows that with probability $1-O(n^{-2})$ we have $2m_2(t)/n(t) \sim d(k-1)(1-\alpha_*)\alpha_*^{k-2} < 1$ (as $d < d_{\min}$). Hence, the branching process is sub-critical. As a consequence, with probability $1-O(n^{-2})$ we have

\[
P\left[|V(G(F'_{UC,t+1}))| \ge \log^2 n\right] = O(n^{-2}). \tag{6.19}
\]

Each step of the branching process corresponds to pursuing the unit clause implications of assigning a truth value to a single variable $x$. A cycle in $G(F'_{UC,t+1})$ can only ensue if a clause that contains $x$ also contains a variable that has already been set previously during iteration $t+1$. In light of (6.19), with probability $1-O(n^{-2})$ there are no more than $\log^2 n$ such variables. Hence, the probability that the assignment of $x$ closes a cycle is of order $O(\log^2 n/n)$. Additionally, by the principle of deferred decisions the probability that two different clauses processed by unit clause propagation both close cycles is of order $O(\log^4 n/n^2)$. Finally, since by (6.19) we may assume that the total number of clauses does not exceed $O(\log^2 n)$, we conclude that

\[
P\left[|E(G(F'_{UC,t+1}))| > |V(G(F'_{UC,t+1}))|\right] = O(\log^6 n/n^2) = o(1/n),
\]

as desired. □

Thus, with probability $1-o(1/n)$ the graph $G(F'_{UC,t+1})$ contains at most one cycle. While it is easy to check that no conflict occurs in iteration $t+1$ if $G(F'_{UC,t+1})$ is acyclic, in the case that $G(F'_{UC,t+1})$ contains a single cycle there is a chance of a conflict. The following definition describes the type of cycle that poses an obstacle.

Definition 6.11. For a XORSAT formula $F$ we call a sequence of variables and clauses $C = (v_1, c_1, \ldots, v_\ell, c_\ell, v_{\ell+1} = v_1)$ a toxic cycle of length $\ell$ if

TOX1: $c_i$ contains the variables $v_i, v_{i+1}$ only, and
TOX2: the total number of negations in $c_1, \ldots, c_\ell$ is odd iff $\ell$ is even.

Lemma 6.12. (i) If $F'_{UC,t+1}$ contains a toxic cycle, then a conflict occurs in iteration $t+1$.

(ii) If $F'_{UC,t+1}$ contains no toxic cycle and $|E(G(F'_{UC,t+1}))| \le |V(G(F'_{UC,t+1}))|$, then no conflict occurs in iteration $t+1$.

Proof. Towards (i) we show that $F'_{UC,t+1}$ is not satisfiable if there is a toxic cycle $C = (v_1, c_1, \ldots, c_\ell, v_{\ell+1} = v_1)$; then UCP will, of course, run into a contradiction. To see that $F'_{UC,t+1}$ is unsatisfiable, we transform each of the clauses $c_1, \ldots, c_\ell$ into a linear equation $c_i \equiv (v_i + v_{i+1} = y_i)$ over $\mathbb{F}_2$. Here $y_i \in \mathbb{F}_2$ equals 1 iff $c_i$ contains an even number of negations. Since every variable of the cycle appears in exactly two of these equations, adding them up yields $\sum_{i=1}^{\ell} y_i = 0$ in $\mathbb{F}_2$. This condition is violated if $C$ is toxic.

Let us move on to (ii). Assume for contradiction that there exists a formula $F$ without a toxic cycle such that $|E(G(F))| \le |V(G(F))|$ and such that given $F'_{UC,t+1} = F$, UCP may run into a conflict. Consider such a formula $F$ that minimises $|V(F)| + |C(F)|$. Since UCP succeeds on acyclic $F$, we have $|V(G(F))| = |E(G(F))|$. Thus, $G(F)$ contains a single cycle $C = (v_1, c_1, \ldots, v_\ell, c_\ell, v_{\ell+1} = v_1)$. Apart from the cycle, $F$ contains (possibly empty) acyclic formulas $F'_1, \ldots, F'_\ell$ attached to $v_1, \ldots, v_\ell$ and $F''_1, \ldots, F''_\ell$ attached to $c_1, \ldots, c_\ell$. The formulas $F'_1, F''_1, \ldots, F'_\ell, F''_\ell$ are mutually disjoint and do not contain unit clauses.

We claim that $F'_1, \ldots, F'_\ell$ are empty because $|V(F)| + |C(F)|$ is minimum. This is because given any truth assignment of $v_1, \ldots, v_\ell$, UCP will find a satisfying assignment of the acyclic formulas $F'_1, \ldots, F'_\ell$.

Further, assume that one of the formulas $F''_1, \ldots, F''_\ell$ is non-empty; say, $F''_1$ is non-empty. If the start variable that UCP assigns were to belong to $F''_1$, then $c_1$, containing $v_1$ and $v_2$, would not shrink to a unit clause, and thus UCP would not assign values to these variables. Hence, UCP starts by assigning a truth value to one of the variables $v_1, \ldots, v_\ell$; say, UCP starts with $v_1$. We claim that then UCP does not run into a conflict. Indeed, the clauses $c_2, \ldots, c_\ell$ may force UCP to assign truth values to $v_2, \ldots, v_\ell$, but no conflict can ensue because UCP will ultimately satisfy $c_1$ by assigning appropriate truth values to the variables of $F''_1$.

Thus, we may finally assume that all of $F'_1, F''_1, \ldots, F'_\ell, F''_\ell$ are empty. In other words, $F$ consists of the cycle $C$ only. Since $C$ is not toxic, TOX2 does not occur. Consequently, UCP will construct an assignment that satisfies all clauses $c_1, \ldots, c_\ell$. This final contradiction implies (ii). □
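The dichotomy behind Lemma 6.12 can be verified by brute force on small cycles. The sketch below assumes the clause semantics implicit in the proof of part (i): a two-variable XOR clause is satisfied iff its two literals sum to 1 over F_2, so that an unnegated clause encodes v_i + v_{i+1} = 1. The encoding of a cycle as a list of sign pairs and the function names are illustrative.

```python
from itertools import product

def is_toxic(signs):
    """signs[i] = (s, t) with s, t in {0, 1}: negation flags of v_i and v_{i+1}
    in clause c_i. Implements TOX2: total negations odd iff length even."""
    ell = len(signs)
    negations = sum(s + t for s, t in signs)
    return (negations % 2 == 1) == (ell % 2 == 0)

def cycle_satisfiable(signs):
    """Exhaustive search over assignments of v_1, ..., v_ell.
    Clause c_i is satisfied iff (v_i xor s) xor (v_{i+1} xor t) == 1."""
    ell = len(signs)
    for v in product((0, 1), repeat=ell):
        if all(((v[i] ^ s) ^ (v[(i + 1) % ell] ^ t)) == 1
               for i, (s, t) in enumerate(signs)):
            return True
    return False

def check_all_cycles(max_len=5):
    """True iff 'toxic' and 'unsatisfiable' coincide for every sign pattern."""
    pairs = list(product((0, 1), repeat=2))
    return all(cycle_satisfiable(signs) != is_toxic(signs)
               for ell in range(2, max_len + 1)
               for signs in product(pairs, repeat=ell))
```

Under this encoding, a bare cycle is unsatisfiable exactly when it is toxic, which matches both directions of the lemma in the case where the formula consists of the cycle only.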

Corollary 6.13. Fix ε > 0 and let 0 ≤ t ≤ (1−ε)n. Then

\[
P[R_{t+1}] = P\left[F'_{UC,t+1} \text{ contains a toxic cycle}\right] + o(1/n).
\]

Proof. This is an immediate consequence of Lemma 6.10 and Lemma 6.12. □

Thus, we are left to calculate the probability that $F'_{UC,t+1}$ contains a toxic cycle. To this end, we estimate the number of toxic cycles in the ‘big’ formula F UC,t . Let $T_t(\ell)$ be the number of toxic cycles of length ℓ in F UC,t .

Lemma 6.14. Fix ε> 0 and let 1 ≤ t ≤ (1−ε)n.

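(i) For any fixed ℓ, with probability $1-O(n^{-2})$ we have

\[
E[T_t(\ell) \mid \mathcal{F}_t] = \beta_\ell + o(1), \qquad\text{where}\quad
\beta_\ell = \frac{1}{4\ell}\left(d(k-1)(1-\alpha_*)\alpha_*^{k-2}\right)^{\ell} = \frac{1}{4\ell}\,f_n(t)^{\ell}.
\]

(ii) For any 1 ≤ ℓ ≤ n, with probability $1-O(n^{-2})$ we have $E[T_t(\ell) \mid \mathcal{F}_t] \le \beta_\ell \exp(\varepsilon\ell)$.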

Proof. In light of Fact 6.2, the calculation of the expected number of toxic cycles is straightforward. Indeed, we just
need to pick sequences of ℓ distinct variables and clauses, place the variables into the clauses in a cyclic fashion,
and multiply by the probability that the clauses contain no other variables and that the parity of the signs of the
clauses works out as per TOX2. Of course, in this way we over count toxic cycles 2ℓ times (due to the choice of the
starting point and the orientation). Hence, we obtain

\[
E[T_t(\ell) \mid \mathcal{F}_t] = \frac{(n)_\ell\,(m)_\ell}{4\ell\,n^{2\ell}}\,(k(k-1))^{\ell}\,(1-\alpha(t))^{\ell}\,\alpha(t)^{\ell(k-2)}. \tag{6.20}
\]

Thus, (i) follows from (6.20) and Proposition 6.3. Further, (6.20) demonstrates that

\[
E[T_t(\ell) \mid \mathcal{F}_t] \le \frac{1}{4\ell}\left(d(k-1)(1-\alpha(t))\alpha(t)^{k-2}\right)^{\ell}. \tag{6.21}
\]

Finally, combining (6.21) with Proposition 6.3, we obtain (ii). □


Proof of Proposition 6.4. In light of Corollary 6.13 we just need to calculate the probability that $F'_{UC,t+1}$ contains a toxic cycle. Clearly, if during iteration $t+1$ UCP encounters a variable of F UC,t that lies on a toxic cycle, UCP will proceed to add the entire toxic cycle to $F'_{UC,t+1}$ (and run into a contradiction). Furthermore, Lemma 6.14 shows that with probability $1-O(n^{-2})$ given $\mathcal{F}_t$ the probability that a random variable of F UC,t belongs to a toxic cycle comes to

\[
\bar\beta = \sum_{\ell\ge 2} \ell\beta_\ell + o(1) = \sum_{\ell\ge 2} \frac{1}{4}\,f_n(t)^{\ell} = \frac{f_n(t)^2}{4(1-f_n(t))} + o(1) = O(1). \tag{6.22}
\]

We now use (6.22) to calculate the desired probability of encountering a toxic cycle. To this end we recall from the proof of Lemma 6.7 that the $(t+1)$-st iteration of UCP corresponds to a branching process with expected offspring $f_n(t)$, unless the root variable $x_{t+1}$ has already been assigned. Due to (6.11) and Proposition 6.3, with probability $1-O(n^{-2})$ the conditional probability of this latter event equals $(n\alpha_*-t)/(n-t)+o(1)$. Further, given that the root variable has not been assigned previously, the expected progeny of the branching process, i.e., the expected number of variables in $F'_{UC,t+1}$, equals $1/(1-f_n(t))+o(1)$. Since with probability $1-O(n^{-2})$ given $\mathcal{F}_t$ there remain $n(t) = (1-\alpha_*+o(1))n$ unassigned variables in total, (6.22) implies that with probability $1-o(1/n)$,

\[
P[R_{t+1} \mid \mathcal{F}_t] \sim \frac{\bar\beta}{(1-\alpha_*)n}\cdot\frac{1-\alpha_*}{1-t/n}\cdot\frac{1}{1-f_n(t)}
= \frac{f_n(t)^2}{4(1-f_n(t))^2(n-t)} + o(1/n),
\]

as claimed. □
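The arithmetic in the last display can be retraced with a few lines of code. This is only an illustrative sketch with hypothetical function names and parameter values: it sums the series from (6.22) and multiplies the three factors from the final display to confirm that they combine to the closed expression of Proposition 6.4.

```python
def f_n(d, k, alpha):
    """The quantity from (6.2) for a given value alpha of alpha_*."""
    return d * (k - 1) * (1.0 - alpha) * alpha ** (k - 2)

def beta_bar(f, terms=2000):
    """Truncated sum over l >= 2 of l * beta_l, with beta_l = f^l / (4 l)."""
    return sum(l * (f ** l / (4.0 * l)) for l in range(2, terms))

def conflict_probability(f, n, t):
    """Leading term of P[R_t | F_t] from Proposition 6.4."""
    return f * f / (4.0 * (n - t) * (1.0 - f) ** 2)

# Hypothetical parameters, chosen only so that f < 1.
d, k, alpha, n, t = 1.0, 3, 0.3, 1000, 200
f = f_n(d, k, alpha)
three_factors = (beta_bar(f) / ((1.0 - alpha) * n)
                 * (1.0 - alpha) / (1.0 - t / n)
                 * 1.0 / (1.0 - f))
```

The three factors are, in order, the chance that a given unassigned variable lies on a toxic cycle, the chance that the root variable is still unassigned, and the expected progeny of the branching process.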

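6.2.3. Proof of Proposition 6.5. We combine Fact 6.2 with the tower rule. Specifically, let $0 \le t_1 < \cdots < t_h < (1-\varepsilon)n$ be distinct time indices. Then repeated application of the tower rule gives

\[
\begin{aligned}
P\left[\bigcap_{i=1}^{h} R_{t_i}\right]
&= E\left[\prod_{i=1}^{h} 1\{R_{t_i}\}\right]
= E\left[E\left[\prod_{i=1}^{h} 1\{R_{t_i}\} \,\middle|\, \mathcal{F}_{t_h-1}\right]\right] \\
&= E\left[\left(\prod_{i=1}^{h-1} 1\{R_{t_i}\}\right) P\left[R_{t_h} \mid \mathcal{F}_{t_h-1}\right]\right]
= \cdots = E\left[\prod_{i=1}^{h} P\left[R_{t_i} \mid \mathcal{F}_{t_i-1}\right]\right].
\end{aligned} \tag{6.23}
\]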

Furthermore, Proposition 6.4 shows that with probability 1−o(1/n),

\[
P\left[R_{t_i} \mid \mathcal{F}_{t_i-1}\right] = \frac{f_n(t_i)^2}{4(n-t_i)(1-f_n(t_i))^2} + o(1/n) \quad\text{for all } 1\le i\le h. \tag{6.24}
\]

Combining (6.23)–(6.24) completes the proof.

6.2.4. Proof of Proposition 6.6. Given δ> 0 pick ε> 0 small enough and let t = ⌈(1−ε)n⌉. We are going to show that
the graph G(F UC,t ) is acyclic with probability at least 1−δ. Since all clauses of F UC,t contain at least two variables,
UCP will find a satisfying assignment if G(F UC,t ) is acyclic.

To show that G(F UC,t ) is acyclic, we observe that α∗ ≥ t/n. Hence, α∗ approaches one as t/n → 1. Further,
Fact 6.2 shows that G(F UC,t ) is uniformly random given the degree distribution (6.1) of the clause nodes. Indeed,
the expression (6.1) shows that with probability 1−O(n−2) the expected size of the second neighbourhood of a
given variable node is asymptotically equal to

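\[
\gamma = \gamma(\varepsilon) = \frac{1}{(1-\alpha_*)n}\cdot\frac{dn}{k}\sum_{\ell=2}^{k} \ell\binom{k}{\ell}(1-\alpha_*)^{\ell}\alpha_*^{k-\ell} = d\left(1-\alpha_*^{k-1}\right).
\]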

Hence, as limε→0γ = 0, the average degree of the random graph G(F UC,t ) tends to zero as ε→ 0. Therefore, for
small enough ε> 0 the random graph G(F UC,t ) is acyclic with probability greater than 1−δ.
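The closed form for γ follows from the binomial mean, and the identity is easy to confirm numerically. The sketch below uses hypothetical function names; the factor n cancels and is therefore omitted.

```python
import math

def gamma_sum(d, k, alpha):
    """Left-hand side: d / (k (1 - alpha)) * sum_{l=2}^k l * C(k,l) (1-alpha)^l alpha^(k-l)."""
    s = sum(l * math.comb(k, l) * (1.0 - alpha) ** l * alpha ** (k - l)
            for l in range(2, k + 1))
    return d / (k * (1.0 - alpha)) * s

def gamma_closed(d, k, alpha):
    """Right-hand side: d (1 - alpha^(k-1))."""
    return d * (1.0 - alpha ** (k - 1))
```

The identity holds because the full sum over ℓ = 0, …, k equals k(1−α), and the only non-zero term left out, ℓ = 1, contributes k(1−α)α^{k−1}. As α approaches one, `gamma_closed` tends to zero, which is the property used in the proof.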

6.3. Failure of UCP for dmin < d < dsat. In this section we assume that dmin < d < dsat. As in Section 6.2 we are going
to trace UCP via the method of differential equations. In particular, we keep the notation from Section 6.2. Thus,
n(t ) signifies the number of unassigned variables after t iterations, and mℓ(t ) denotes the number of clauses that
contain precisely 2 ≤ ℓ ≤ k unassigned variables. Moreover, F UC,t is the formula comprising these variables and
clauses. The following statement is the analogue of Proposition 6.3 for dmin < d < dsat. Its proof relies on similar
arguments as the proof of Proposition 6.3.

Proposition 6.15. Suppose that dmin(k) < d < dsat(k), fix ε,δ > 0 and let 0 < t < (1−ε)θ∗n. Then (6.1) holds with
probability 1−O(n−2).



Proof. The formulas (6.8)–(6.10) for the conditional expected changes $n(t+1)-n(t)$, $m_\ell(t+1)-m_\ell(t)$ continue to hold for $d_{\min} < d < d_{\mathrm{sat}}$, so long as we assume that $2m_2(t)/n(t) < 1-\Omega(1)$ and $n(t) = \Omega(n)$. Indeed, the proof of Lemma 6.7 only hinges on these assumptions on $n(t), m_2(t)$, irrespective of $d$. Hence, if $n, m_2, \ldots, m_k : [0,\theta_*-\delta] \to \mathbb{R}$ are functions that satisfy the conditions (6.13)–(6.15) and that satisfy

\[
\sup_{\theta\in[0,\theta_*-\delta]} 2m_2(\theta)/n(\theta) < 1, \tag{6.25}
\]

then [26, Theorem 2] implies that for all 0 ≤ t < (1−δ)θ∗n we have

n(t )/n = n(t/n)+o(1), mℓ(t )/n =mℓ(t/n)+o(1) (2 ≤ ℓ≤ k).

Finally, we claim that the functions $n^* : [0,\theta_*-\delta] \to \mathbb{R}$, $m^*_\ell : [0,\theta_*-\delta] \to \mathbb{R}$ defined by (6.17) satisfy (6.13)–(6.15) and (6.25). In fact, the same manipulations as in the proof of Lemma 6.9 yield (6.13)–(6.15). Additionally, (6.25) follows from Lemma 3.5 (ii) and Proposition 2.2 (ii), which shows that $\alpha_*$ is a stable fixed point and therefore

\[
2m_2(\theta)/n(\theta) = d(k-1)(1-\alpha_*)\alpha_*^{k-2} < 1 \quad\text{for } 0\le\theta\le\theta_*-\delta.
\]

Thus, we obtain (6.1) for $0 \le \theta < \theta_*$. □

Proof of Theorem 1.1 (ii). Let u1, . . . ,un ∈ {0,1} be uniformly distributed, mutually independent and independent
of all other randomness. We couple the execution of the decimation process and of the UCP algorithm on a random
formula F as follows. At every time t where πF DC,t = 1/2, the decimation process sets σDC(xt+1) = u t+1. Similarly,
whenever UCP executes Step 5 we set σUC(xt+1) = u t+1. Let ∆ be the first time 0 ≤ t < n such that σDC(xt+1) ≠ σUC(xt+1); if σDC(xt+1) = σUC(xt+1) for all t , we set ∆ = n.

We claim that UCP encounters a conflict if ∆ < n. To see this, assume that 0 ≤ t < n satisfies σDC(xt+1) ≠ σUC(xt+1) but σDC(xs+1) = σUC(xs+1) for all 0 ≤ s < t and that UCP did not encounter a conflict at any time s ≤ t . Then πF DC,t ∈ {0,1} but Step 5 of UCP sets σUC(xt+1) = u t+1 ≠ σDC(xt+1). Consequently, F possesses no satisfying assignment σ such that σUC(xi ) = σ(xi ) for 1 ≤ i ≤ t +1, and thus UCP will ultimately encounter a conflict.

To complete the proof we claim that P [∆ < n] = 1−o(1). To verify this consider a time t with (1+ε)θcond < t/n < (1−ε)θ∗. Then Proposition 2.2 and Proposition 2.8 show that |V0(F DC,t )| = α∗n +o(n) w.h.p., while Proposition 6.15 shows that α(t ) = α∗+o(1) w.h.p. In particular, even if ∆ ≥ (1+ε)θcond n, the probability that πF DC,t ∈ {0,1} while UCP assigns xt+1 randomly is Ω(1). Therefore, ∆ < θ∗n w.h.p. □

REFERENCES

[1] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
[2] D. Achlioptas, M. Molloy: The solution space geometry of random linear equations. Random Structures and Algorithms 46 (2015) 197–231.
[3] P. Ayre, A. Coja-Oghlan, P. Gao, N. Müller: The satisfiability threshold for random linear equations. Combinatorica 40 (2020) 179–235.
[4] B. Bollobás: Random graphs. Cambridge University Press (2001).
[5] A. Braunstein, M. Mézard, R. Zecchina: Survey propagation: An algorithm for satisfiability. Random Structures and Algorithms 27 (2005) 201–226.
[6] A. Coja-Oghlan: Belief Propagation fails on random formulas. Journal of the ACM 63 (2017) #49.
[7] A. Coja-Oghlan: A better algorithm for random k-SAT. SIAM J. Computing 39 (2010) 2823–2864.
[8] A. Coja-Oghlan, A. Ergür, P. Gao, S. Hetterich, M. Rolvien: The rank of sparse random matrices. Proc. 31st SODA (2020) 579–591.
[9] A. Coja-Oghlan, P. Gao, M. Hahn-Klimroth, J. Lee, N. Müller, M. Rolvien: The full rank condition for sparse random matrices. Combinatorics, Probability and Computing 33 (2024) 643–707.
[10] A. Coja-Oghlan, A. Pachon-Pinzon: The decimation process in random k-SAT. SIAM Journal on Discrete Mathematics 26 (2012) 1471–1509.
[11] C. Deroulers, R. Monasson: Criticality and universality in the unit-propagation search rule. Eur. Phys. J. B 49 (2006) 339–369.
[12] O. Dubois, J. Mandler: The 3-XORSAT threshold. Proc. 43rd FOCS (2002) 769–778.
[13] A. Frieze, S. Suen: Analysis of two simple heuristics on a random instance of k-SAT. J. Algorithms 20 (1996) 312–355.
[14] D. Gamarnik: The overlap gap property: a topological barrier to optimizing over random structures. PNAS 118 (2021) e2108492118.
[15] S. Hetterich: Analysing survey propagation guided decimation on random formulas. Proc. 43rd ICALP (2016) #65.
[16] M. Ibrahimi, Y. Kanoria, M. Kraning, A. Montanari: The set of solutions of random XORSAT formulae. Annals of Applied Probability 25 (2015) 2743–2808.
[17] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. National Academy of Sciences 104 (2007) 10318–10323.
[18] A. Maier, F. Behrens, L. Zdeborová: Dynamical cavity method for hypergraphs and its application to quenches in the k-XOR-SAT problem. arXiv:2412.14794 (2024).
[19] M. Mézard, A. Montanari: Information, physics and computation. Oxford University Press (2009).
[20] M. Mézard, T. Mora, R. Zecchina: Clustering of solutions in the random satisfiability problem. Phys. Rev. Lett. 94 (2005) 197205.
[21] M. Mézard, G. Parisi, R. Zecchina: Analytic and algorithmic solution of random satisfiability problems. Science 297 (2002) 812–815.
[22] M. Mézard, F. Ricci-Tersenghi, R. Zecchina: Two solutions to diluted p-spin models and XORSAT problems. Journal of Statistical Physics 111 (2003) 505–533.
[23] M. Molloy: Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms 27 (2005) 124–135.
[24] B. Pittel, G. Sorkin: The satisfiability threshold for k-XORSAT. Combinatorics, Probability and Computing 25 (2016) 236–268.
[25] F. Ricci-Tersenghi, G. Semerjian: On the cavity method for decimated random constraint satisfaction problems and the analysis of belief propagation guided decimation algorithms. J. Stat. Mech. (2009) P09001.
[26] N. Wormald: Differential equations for random processes and random graphs. Ann. Appl. Probab. 5 (1995) 1217–1235.
[27] K. Yung: Limits of sequential local algorithms on the random k-XORSAT problem. Proc. 51st ICALP (2024) #123.

ARNAB CHATTERJEE, arnab.chatterjee@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORT-
MUND 44227, GERMANY.

AMIN COJA-OGHLAN, amin.coja-oghlan@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATH-
EMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

MIHYUN KANG, kang@math.tugraz.at, TU GRAZ, INSTITUTE OF DISCRETE MATHEMATICS, STEYRERGASSE 30, 8010 GRAZ, AUSTRIA.

LENA KRIEG, lena.krieg@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227,
GERMANY.

MAURICE ROLVIEN, maurice.rolvien@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORT-
MUND 44227, GERMANY.

GREGORY B. SORKIN, g.b.sorkin@lse.ac.uk, THE LONDON SCHOOL OF ECONOMICS AND POLITICAL SCIENCE, DEPARTMENT OF MATHE-
MATICS, COLUMBIA HOUSE, HOUGHTON ST, LONDON WC2A 2AE, UNITED KINGDOM



THE RANDOM k-SAT GIBBS UNIQUENESS THRESHOLD REVISITED

ARNAB CHATTERJEE, AMIN COJA-OGHLAN, CATHERINE GREENHILL, VINCENT PFENNINGER, MAURICE ROLVIEN, PAVEL ZAKHAROV,
KOSTAS ZAMPETAKIS

ABSTRACT. We prove that for any k ≥ 3 for clause/variable ratios up to the Gibbs uniqueness threshold of the corre-
sponding Galton-Watson tree, the number of satisfying assignments of random k-SAT formulas is given by the ‘replica
symmetric solution’ predicted by physics methods [Monasson, Zecchina: Phys. Rev. Lett. 76 (1996)]. Furthermore, while
the Gibbs uniqueness threshold is still not known precisely for any k ≥ 3, we derive new lower bounds on this thresh-
old that improve over prior work [Montanari and Shah: SODA (2007)]. The improvement is significant particularly for
small k. MSc: 68Q87, 60C05, 68R07

1. INTRODUCTION

1.1. Background and motivation. Going back to experimental work from the 1990s, the most prominent question
concerning random k-SAT has been to pinpoint the satisfiability threshold, defined as the largest density m/n of
clauses m to variables n up to which satisfying assignments likely exist [6, 18]. Currently, the satisfiability thresh-
old is known precisely in the case of k = 2 [21, 40] and for k ≥ k0 with k0 an undetermined (large) constant [33].
The latter result confirms ‘predictions’ based on an analytic but non-rigorous physics technique called the ‘cavity
method’. Indeed, the cavity method predicts the satisfiability threshold for every k ≥ 3 [51], but random k-SAT for
‘small’ k ≥ 3 appears to be a particularly hard nut to crack. Additionally, according to the cavity method several
phase transitions precede the satisfiability threshold and are expected to impact, among other things, the perfor-
mance of algorithms [47]. One of these phase transitions, the Gibbs uniqueness transition, pertains to a spatial
mixing property that also plays a pivotal role in the computational complexity of counting and sampling [61].

From a statistical physics viewpoint, the satisfiability threshold is only the second most important quantity as-
sociated with random k-SAT. The first place firmly belongs to the typical number of satisfying assignments, known
as the partition function in physics parlance [50]. All the other predictions, including the location of the satisfi-
ability threshold, ultimately derive from the formula for the number of satisfying assignments or closely related
variables [49]. Yet there has been little progress on confirming the physics formula for the number of satisfying
assignments rigorously.

Three prior contributions stand out. First, a proof technique called the ‘interpolation method’ turns the physics
prediction into a rigorous upper bound [35, 41, 60].1 Second, in the case k = 2, conceptually much simpler than
k ≥ 3, the physics formula has been proved correct [3]. Third, Montanari and Shah [56] proved that also for k ≥ 3 for
certain clause/variable densities the ‘replica symmetric solution’ from physics correctly approximates the number
of ‘good’ assignments that satisfy all but o(n) clauses. However, it seems difficult to estimate the gap between the
number of such ‘good’ assignments and the number of actual satisfying assignments. A rigorous method to this
effect would likely imply the existence of uniform satisfiability thresholds for all k ≥ 3, thereby resolving a long-
standing conundrum [12, 36]. The proof of Montanari and Shah is based on the aforementioned Gibbs uniqueness
property.

The aim of the present paper is to determine the number of actual satisfying assignments of random k-SAT for-
mulas for clause/variable densities up to the Gibbs uniqueness threshold. Specifically, we verify that the ‘replica
symmetric solution’ from [54, 55] yields the correct answer for any k ≥ 3 right up to the Gibbs uniqueness threshold,
even though the precise value of this threshold is not currently known. Additionally, we derive a new lower bound
on the Gibbs uniqueness threshold. The improvement is particularly significant for ‘small’ k ≥ 3. Combining these
two results, we obtain the first rigorous formula for the number of satisfying assignments of random k-SAT for-
mula for a non-trivial regime of clause/variable densities. Crucially, the result covers meaningful clause/variable
densities even for small k ≥ 3.

1Strictly speaking, the contributions [35, 41, 60] deal with the ‘random k-SAT model at positive temperature’, see Section 2.6. In Corollary 2.2 below we combine the interpolation method with a concentration argument to bound the number of actual satisfying assignments.

arXiv:2506.01359v2 [cs.DM] 18 Nov 2025

1.2. Results. Let $\Phi = \Phi_{d,k}(n)$ be the random k-CNF on n Boolean variables $x_1, \ldots, x_n$ with $m = m_n \sim \mathrm{Po}(dn/k)$ clauses $a_1, \ldots, a_m$. The clauses $a_i$ are drawn independently and uniformly from the set of all $2^k\binom{n}{k}$ possible clauses with k distinct variables. Hence, the parameter d prescribes the expected number of clauses in which a given variable appears. Let S(Φ) be the set of satisfying assignments of Φ and let Z (Φ) = |S(Φ)|. We encode the Boolean values ‘true’ and ‘false’ by +1 and −1, respectively. Since right up to the satisfiability threshold Z (Φ) is of order exp(Θ(n)) w.h.p. for trivial reasons2, our objective is to study the random variable $n^{-1}\log Z(\Phi)$ as n → ∞.
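For very small n the quantity Z(Φ) can be computed by exhaustive enumeration, which makes the definitions above concrete. The sketch below is an illustration only: the function names are hypothetical, the number of clauses `m` is passed explicitly instead of being drawn from Po(dn/k), and the running time is exponential in n.

```python
import itertools
import math
import random

def random_kcnf(n, m, k, rng):
    """m clauses, each on k distinct variables with independent uniform signs.
    A clause is a list of literals (variable index, sign) with sign in {+1, -1}."""
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(n), k)
        clauses.append([(v, rng.choice((+1, -1))) for v in variables])
    return clauses

def count_satisfying(n, clauses):
    """Z(Phi): number of sigma in {+1, -1}^n such that every clause contains
    at least one literal (v, s) with sigma[v] == s."""
    count = 0
    for sigma in itertools.product((+1, -1), repeat=n):
        if all(any(sigma[v] == s for v, s in clause) for clause in clauses):
            count += 1
    return count

rng = random.Random(1)                      # fixed seed, for reproducibility only
phi = random_kcnf(n=10, m=12, k=3, rng=rng)
z = count_satisfying(10, phi)
entropy = math.log(z) / 10 if z > 0 else float("-inf")  # n^{-1} log Z(Phi)
```

A single k-clause rules out exactly a 2^{−k} fraction of all assignments, so a formula with one clause on n variables has 2^n − 2^{n−k} satisfying assignments.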

1.2.1. The number of satisfying assignments up to the Gibbs uniqueness threshold. The first main result vindicates
the ‘replica symmetric solution’ for values of d up to the Gibbs uniqueness threshold of the Galton-Watson tree
that mimics the local topology of Φ. Let us define these concepts precisely.

We begin with the Galton-Watson tree T = Td ,k , which is generated by a two-type branching process. The two
types are variable nodes and clause nodes. The process starts with a single root variable node x. The offspring of
any variable node is a Po(d) number of clause nodes, while every clause node begets precisely k−1 variable nodes.
Additionally, independently for each clause node a and every variable node x that is either a child or the parent of
a a sign, denoted sign(x, a) ∈ {±1}, is chosen uniformly at random. The resulting random tree T models the local
structure of the random formula Φ in the sense of local weak convergence [9, 48].3

Next, we define the Gibbs uniqueness property on the tree T. For an integer ℓ ≥ 0 let T(ℓ) be the finite tree
obtained by removing all variable and clause nodes at a distance greater than 2ℓ from the root x. We identify the
finite tree T(ℓ) with a Boolean formula whose variables/clauses are precisely the variable/clause nodes of T(ℓ).
Let S(T(ℓ)) ≠ ∅ be the set of satisfying assignments of this formula and let τ(ℓ) ∈ S(T(ℓ)) be a uniformly random
satisfying assignment. Moreover, let ∂2ℓx be the set of variable nodes of T(ℓ) at distance precisely 2ℓ from the root
x. Then for given d ,k the tree T=Td ,k has the Gibbs uniqueness property if

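\[
\lim_{\ell\to\infty} E\left[\max_{\tau\in S(T^{(\ell)})} \left| P\left[\tau^{(\ell)}(x) = 1 \mid T\right] - P\left[\tau^{(\ell)}(x) = 1 \mid T,\ \forall x\in\partial^{2\ell}x : \tau^{(\ell)}(x) = \tau(x)\right] \right|\right] = 0 \quad\text{(see [47])}. \tag{1.1}
\]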

In words, in the limit of large ℓ the truth value τ(ℓ)(x) of the root x is asymptotically independent of the truth values
{τ(ℓ)(x)}x∈∂2ℓx of the variables at distance 2ℓ from x. In light of the above, for any k ≥ 2 we further define duniq(k) as

duniq(k) = inf{d > 0 : condition (1.1) fails to hold for d ,k} . (1.2)

It is easy to see that duniq(k) is strictly positive and finite for any k ≥ 2. Indeed, in Theorem 1.2 we will derive explicit
lower bounds on duniq(k). However, the exact value of duniq(k) is not currently known for any k ≥ 3.

As a final preparation we need to spell out the ‘replica symmetric solution’ from [54]. This prediction comes
in terms of a distributional fixed point problem, i.e., a fixed point problem on the space P (0,1) of probability
measures on the open unit interval. Specifically, consider the Belief Propagation operator

\[
\mathrm{BP}_{d,k} : \mathcal{P}(0,1)\to\mathcal{P}(0,1), \qquad \pi\mapsto\hat\pi=\mathrm{BP}_{d,k}(\pi) \tag{1.3}
\]

defined as follows. Let $d^+, d^- \sim \mathrm{Po}(d/2)$ be Poisson variables with expectation d/2. Moreover, let $(\mu_{\pi,i,j})_{i,j\ge 1}$
be a sequence of i.i.d. random variables, each following distribution π. All these random variables are mutually
independent. Further, let

\[
\mu_{\pi,i} = 1-\prod_{j=1}^{k-1}\mu_{\pi,i,j} \quad\text{for } i\ge 1, \qquad\text{and}\qquad
\hat\mu_\pi = \frac{\prod_{i=1}^{d^-}\mu_{\pi,2i-1}}{\prod_{i=1}^{d^-}\mu_{\pi,2i-1}+\prod_{i=1}^{d^+}\mu_{\pi,2i}}. \tag{1.4}
\]

Then π̂ is the distribution of µ̂π. Furthermore, for a probability measure π ∈ P(0,1) define the Bethe free entropy4

\[
\mathcal{B}_{d,k}(\pi) = \mathbb{E}\left[\log\left(\prod_{i=1}^{d^-}\mu_{\pi,2i-1}+\prod_{i=1}^{d^+}\mu_{\pi,2i}\right) - \frac{d(k-1)}{k}\log\left(1-\prod_{j=1}^{k}\mu_{\pi,1,j}\right)\right], \tag{1.5}
\]

provided that the expectation on the r.h.s. exists. Finally, let δ1/2 ∈ P(0,1) be the atom at 1/2 and let us write $\mathrm{BP}^{\ell}_{d,k}$
for the ℓ-fold application of the operator $\mathrm{BP}_{d,k}$.
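To make the operator (1.3)–(1.4) and the functional (1.5) concrete, here is a minimal population-style sketch: a finite list of numbers in (0,1) stands in for π, one sweep replaces every entry by a fresh draw of µ̂π, and (1.5) is estimated by an empirical average. The function names, the population size and the use of Knuth's Poisson sampler are assumptions made for illustration; this is not claimed to be the procedure of [50] or the code behind Figure 1.

```python
import math
import random

def poisson(lam, rng):
    """Sample Po(lam) by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def clause_product(population, d, k, rng):
    """Product over a Po(d/2) number of terms mu_{pi,i} = 1 - prod_{j<k} mu_{pi,i,j}."""
    result = 1.0
    for _ in range(poisson(d / 2.0, rng)):
        result *= 1.0 - math.prod(rng.choice(population) for _ in range(k - 1))
    return result

def bp_step(population, d, k, rng):
    """One application of BP_{d,k} to the empirical distribution of `population`."""
    new_population = []
    for _ in population:
        minus = clause_product(population, d, k, rng)
        plus = clause_product(population, d, k, rng)
        new_population.append(minus / (minus + plus))  # mu-hat from (1.4)
    return new_population

def bethe_estimate(population, d, k, samples, rng):
    """Monte Carlo average of the expression inside the expectation in (1.5)."""
    total = 0.0
    for _ in range(samples):
        minus = clause_product(population, d, k, rng)
        plus = clause_product(population, d, k, rng)
        clause_term = math.log(1.0 - math.prod(rng.choice(population) for _ in range(k)))
        total += math.log(minus + plus) - d * (k - 1) / k * clause_term
    return total / samples

rng = random.Random(1)
population = [0.5] * 200            # the atom delta_{1/2}
for _ in range(5):                  # a few sweeps of the operator
    population = bp_step(population, 1.0, 3, rng)
estimate = bethe_estimate(population, 1.0, 3, 2000, rng)
```

For d = 0 both products are empty, so the estimate equals log 2 exactly, as it should for a formula without clauses.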

2For example, w.h.p. there are Ω(n) variables that do not appear in any clause.
3Corollary 3.7 below provides a precise statement to this effect.
4Throughout the paper log refers to the natural logarithm.



FIGURE 1. Comparison of $\mathcal{B}_{d,k}(\pi_{d,k})$ with known bounds for $\lim_{n\to\infty}\frac1n\log Z(\Phi)$ for k = 3. The
red dotted line depicts the first moment upper bound (1.13), while the green dotted line represents
the lower bound provided by (1.14). The blue line displays a numerical approximation of
$\mathcal{B}_{d,3}(\pi_{d,3})$. To obtain our values, we generated $10^6$ samples from $\pi\approx\mathrm{BP}^{25}_{d,3}(\delta_{1/2})$ and then
evaluated the corresponding empirical average of the expression in (1.5).

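Theorem 1.1. Let k ≥ 3 and assume that 0 < d < duniq(k). Then the weak limit

\[
\pi_{d,k} = \lim_{\ell\to\infty}\mathrm{BP}^{\ell}_{d,k}(\delta_{1/2}) \in \mathcal{P}(0,1) \tag{1.6}
\]

exists and

\[
\lim_{n\to\infty}\frac1n\log Z(\Phi) = \mathcal{B}_{d,k}(\pi_{d,k}) \quad\text{in probability.} \tag{1.7}
\]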

The formula (1.7) matches the prediction from [54] precisely. Of course, part of the assertion of Theorem 1.1
is that the Bethe free entropy Bd ,k (πd ,k ) is well defined. Admittedly, the formula (1.7) is not ‘explicit’. But the
proof of Theorem 1.1 evinces that the convergence (1.6) occurs rapidly. Therefore, a randomised algorithm called
‘population dynamics’ [50] can be used to approximate (1.7) within any desired numerical accuracy.

1.2.2. An improved lower bound on Gibbs uniqueness. The obvious next task is to determine the Gibbs uniqueness
threshold duniq(k). Currently, its value is known precisely only in the case k = 2, where duniq(2) = 2 coincides with
the random 2-SAT satisfiability threshold [3, 21, 40]. Furthermore, Montanari and Shah [56] proved that the pure
literal threshold5 dpure(k) upper bounds duniq(k) for all k ≥ 2.6 The value of dpure(k) admits a neat formula [16, 53]:

\[
d_{\mathrm{uniq}}(k) \le d_{\mathrm{pure}}(k) = \min_{z>0}\frac{z}{(1-\exp(-z/2))^{k-1}}. \tag{1.8}
\]

Complementing the upper bound (1.8), Montanari and Shah derived a lower bound dMS(k):

\[
d_{\mathrm{MS}}(k) = \sup\left\{ d>0 : d(k-1)\left(1-\frac{\exp(-d/2)}{4}\right)\left(1-\frac{\exp(-d/2)}{2}\right)^{k-2} < 1 \right\} \le d_{\mathrm{uniq}}(k). \tag{1.9}
\]

5This marks the threshold up to which the pure literal algorithm, which repeatedly assigns the preferred value to all variables appearing
with a single sign, produces a satisfying assignment w.h.p.

6To be precise, Montanari and Shah established an upper bound on the Gibbs uniqueness threshold that turns out to coincide with the
pure literal threshold, albeit without pointing out this identity.



Unfortunately, the bound (1.9) is not tight even in the case k = 2, where duniq(2) = 2 while dMS(2) ≈ 1.16. That said,
the lower and upper bounds dMS(k) and dpure(k) match asymptotically in the limit of large k, as

\[
d_{\mathrm{MS}}(k),\ d_{\mathrm{pure}}(k) = (2+o_k(1))\log k, \tag{1.10}
\]

with $o_k(1)$ hiding a term that vanishes as k → ∞. The following theorem yields an improved lower bound dcon(k)
on duniq(k).

Theorem 1.2. For all k ≥ 3 we have

\[
d_{\mathrm{uniq}}(k) \ge d_{\mathrm{con}}(k) := \sup\left\{ d>0 : \frac{d(k-1)}{2}\left(1-\frac{\exp(-d/2)}{2}\right)^{k-2} < 1 \right\}. \tag{1.11}
\]

An easy calculation reveals that

dMS(k) < dcon(k) for every k ≥ 2. (1.12)

Moreover, it is satisfactory that the formula (1.11) reproduces the correct (previously known) threshold duniq(2) =
dcon(2) = dpure(2) = 2. That said, we have no reason to believe that (1.11) is tight for any k ≥ 3.

k        2        3        4        5

dgiant   1.0000   0.5000   0.3333   0.2500
dMS      1.1625   0.8792   0.8695   0.9236
dcon     2.0000   1.3431   1.2451   1.2635
dpure    2.0000   4.9108   6.1782   7.0178
dsat     2.0000   12.801   39.724   105.585

TABLE 1. The values of dMS(k),dcon(k), and dpure(k) for 2 ≤ k ≤ 5. Additionally, dgiant(k) =
1/(k−1) marks the giant component threshold of the hypergraph induced by the random k-CNF
formula. Moreover, dsat(k) is the satisfiability threshold according to physics predictions [49]. It
is not hard to show that dgiant(k) ≤ dMS(k) ≤ dcon(k) ≤ duniq(k) ≤ dpure(k) ≤ dsat(k), for all k ≥ 2.
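The closed-form rows of Table 1 can be recomputed directly from (1.8), (1.9) and (1.11). The sketch below is an illustration only; it assumes that the left-hand sides in (1.9) and (1.11) are increasing in d (each factor is), so that the supremum is the unique root, and it uses plain bisection and ternary search with hypothetical function names and search intervals.

```python
import math

def bisect_root(f, lo, hi, iterations=200):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def d_ms(k):
    """Threshold from (1.9)."""
    def f(d):
        e = math.exp(-d / 2.0)
        return d * (k - 1) * (1.0 - e / 4.0) * (1.0 - e / 2.0) ** (k - 2) - 1.0
    return bisect_root(f, 0.0, 10.0)

def d_con(k):
    """Threshold from (1.11)."""
    def f(d):
        return d * (k - 1) / 2.0 * (1.0 - math.exp(-d / 2.0) / 2.0) ** (k - 2) - 1.0
    return bisect_root(f, 0.0, 10.0)

def d_pure(k):
    """Minimum in (1.8), located by ternary search on an assumed interval."""
    def g(z):
        # -expm1(-z/2) equals 1 - exp(-z/2) without cancellation for small z
        return z / (-math.expm1(-z / 2.0)) ** (k - 1)
    lo, hi = 1e-9, 60.0
    for _ in range(300):
        a, b = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return g((lo + hi) / 2.0)

for k in range(2, 6):
    print(k, round(1.0 / (k - 1), 4), round(d_ms(k), 4), round(d_con(k), 4), round(d_pure(k), 4))
```

For k = 2 the function minimised in (1.8) is increasing, so the search runs into the left end of the interval and returns a value approaching 2 from above.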

Combining Theorems 1.1 and 1.2, we obtain the following.

Corollary 1.3. Let k ≥ 3. If d < dcon(k) then (1.7) holds.

Corollary 1.3 constitutes the first rigorous result to determine the precise asymptotic value of log Z (Φ) for a non-
trivial regime of d for any k ≥ 3. To elaborate, the formula (1.7) is trivially true for d < 1/(k −1) because for such d
the k-uniform hypergraph induced by the clauses of Φ has no giant component and Belief Propagation is exact on
acyclic graphical models [50]. But Corollary 1.3 applies to d well beyond this threshold, as displayed in Table 1. In
particular, in contrast to much of the prior work on random k-SAT, Corollary 1.3 applies to a non-trivial regime of
d even for ‘small’ k ≥ 3.

Although Table 1 contains the values dMS(k) from [56] for comparison, we emphasise that Montanari and Shah’s
result only yields the number of ‘good’ assignments satisfying all but o(n) clauses, rather than of actual satisfying
assignments. In fact, the best prior rigorous bounds on the number of satisfying assignments for d > 1/(k − 1)
derive from the first and the second moment methods. Specifically, the folklore first moment bound reads

\[
\frac1n\log Z(\Phi) \le \log 2 + \frac{d}{k}\log\left(1-2^{-k}\right) + o(1) \quad\text{w.h.p.} \tag{1.13}
\]

Furthermore, Achlioptas and Peres [7] perform a second moment argument on the number of balanced satisfying
assignments, i.e., satisfying assignments that enjoy a peculiar additional condition required to keep the second
moment under control. They show that w.h.p.

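\[
\frac1n\log Z(\Phi) \ge (1-d)\log 2 + \frac{d}{k}\log\left[\left(\lambda^{1/2}+\lambda^{-1/2}\right)^k-\lambda^{-k/2}\right] + o(1), \quad\text{where } (1-\lambda)(1+\lambda)^{k-1}=1,\ \lambda>0. \tag{1.14}
\]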

Figure 1 illustrates the bounds (1.13)–(1.14) along with (1.7) for k = 3. As the figure shows, the correct value
(1.7) is quite close to the first moment bound. That said, the first moment bound strictly exceeds Bd ,k (πd ,k ) for
all d > 0, k ≥ 3 [24]. On the other hand, Figure 1 demonstrates that the ‘balanced second moment bound’ (1.14)
significantly undershoots Bd ,3(πd ,3). Recall that Figure 1 is on a logarithmic scale; thus, even small differences
translate into exponentially large errors.
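The two explicit bounds are easy to evaluate numerically. The following sketch computes the right-hand sides of (1.13) and (1.14) without the o(1) terms; the function names and the bisection bracket for λ are assumptions made for illustration.

```python
import math

def first_moment_bound(d, k):
    """Right-hand side of (1.13) without the o(1) term."""
    return math.log(2) + d / k * math.log(1.0 - 2.0 ** -k)

def balanced_lambda(k):
    """Positive root of (1 - x)(1 + x)^(k-1) = 1 for k >= 3, by bisection.

    The left-hand side exceeds 1 just to the right of 0 and equals 0 at x = 1.
    """
    lo, hi = 1e-6, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if (1.0 - mid) * (1.0 + mid) ** (k - 1) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def second_moment_bound(d, k):
    """Right-hand side of (1.14) without the o(1) term."""
    lam = balanced_lambda(k)
    inner = (math.sqrt(lam) + 1.0 / math.sqrt(lam)) ** k - lam ** (-k / 2.0)
    return (1.0 - d) * math.log(2) + d / k * math.log(inner)

for d in (0.5, 1.0, 1.3):
    print(d, second_moment_bound(d, 3), first_moment_bound(d, 3))
```

For k = 3 the equation for λ reduces to λ² + λ = 1, so λ = (√5 − 1)/2, which gives a quick check on the bisection.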



1.3. Preliminaries and notation. Let Φ be a Boolean expression in conjunctive normal form such that no clause contains
the same variable twice. We write V (Φ) for the set of Boolean variables of Φ and F (Φ) for the set of clauses.
The formula Φ gives rise to a bipartite graph G(Φ) on the vertex set V (Φ)∪F (Φ) in which a variable x and a clause
a are adjacent iff variable x appears in clause a (either positively or negatively). Let E(Φ) denote the edge set of the
graph G(Φ). Furthermore, for a vertex v ∈V (Φ)∪F (Φ) let ∂Φv be the set of neighbours of v ; where the reference to
Φ is self-evident, we just write ∂v .

The graph G(Φ) induces a metric on V (Φ)∪F (Φ) by letting distΦ(v, w) equal the length of the shortest path from
v to w . For a vertex v and an integer ℓ≥ 0 let ∂ℓΦv = ∂ℓv be the set of all vertices w at distance precisely ℓ from v .

For a clause a and a variable x ∈ ∂a we define signΦ(x, a) = 1 if a contains x as a positive literal, and signΦ(x, a) =
−1 if a contains the negation ¬x. (This is unambiguous because clause a is not allowed to contain both x and ¬x.)
For a variable x ∈ V (Φ) and s ∈ {±1} we let $\partial^s_\Phi x = \partial^s x$ be the set of clauses a ∈ ∂Φx such that signΦ(x, a) = s. Where
convenient we use the shorthand $\partial^{\pm}x = \partial^{\pm1}x$. We say that a variable x is pure in Φ if signΦ(x, a) = signΦ(x, b) for all
a, b ∈ ∂x. More specifically, say that x is a pure literal of Φ if $\partial^-x = \emptyset$. Similarly, ¬x is called a pure literal if $\partial^+x = \emptyset$.
A variable or literal that fails to be pure is called mixed.

For a literal l ∈ {x,¬x : x ∈ V (Φ)} we let |l | denote the underlying variable; thus, |x| = |¬x| = x for x ∈ V (Φ).
Moreover, we define sign(x) = 1 and sign(¬x) =−1. Further, for a literal l we define 1 · l = l and (−1) · l =¬l .

If Φ is satisfiable, then σΦ = (σΦ(x))x∈V (Φ) denotes a uniformly random satisfying assignment of Φ. Where the
reference to Φ is obvious we just write σ.

Let µ, ν be two probability measures on $\mathbb{R}^h$, let q ≥ 1 and assume that $\int_{\mathbb{R}^h}\|x\|_q^q\,d\mu(x),\ \int_{\mathbb{R}^h}\|x\|_q^q\,d\nu(x) < \infty$. We
recall that the $L^q$-Wasserstein distance of µ, ν is defined as

\[
W_q(\mu,\nu) = \inf_{(\xi,\zeta)} \mathbb{E}\left[\|\xi-\zeta\|_q^q\right]^{1/q},
\]

where the infimum is taken over all pairs (ξ, ζ) of random variables defined on the same probability space Ω such
that ξ has distribution µ and ζ has distribution ν. If X, Y are random variables with distributions µ, ν, it is convenient
to use the shorthand $W_q(X,Y) = W_q(\mu,\nu)$, provided that $\mathbb{E}[\|X\|_q^q],\ \mathbb{E}[\|Y\|_q^q] < \infty$.

For two random variables X ,Y we write X ∼ Y if X ,Y are identically distributed. Moreover, for a probability
distribution µ and a random variable X we write X ∼µ if X has distribution µ.

We will make repeated use of the following tail bound for Poisson variables.

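Lemma 1.4 (Bennett’s inequality [14, Theorem 2.9]). Suppose that X ∼ Po(λ) with λ > 0 and let ϕ(x) = (1+x) log(1+x) − x
for x > −1. Then

\[
\mathbb{P}[X\ge\lambda+t] \le \exp(-\lambda\varphi(t/\lambda)) \quad\text{for any } t>0,
\]
\[
\mathbb{P}[X\le\lambda-t] \le \exp(-\lambda\varphi(-t/\lambda)) \quad\text{for any } 0<t<\lambda.
\]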
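As a quick numerical illustration of Lemma 1.4, the sketch below compares the two bounds with the exact Poisson tails for one choice of λ. All function names are hypothetical, and the exact tails are obtained by summing the probability mass function directly.

```python
import math

def phi(x):
    """phi(x) = (1 + x) log(1 + x) - x for x > -1."""
    return (1.0 + x) * math.log1p(x) - x

def poisson_pmf(lam, j):
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def upper_tail(lam, t):
    """Exact P[X >= lam + t] for X ~ Po(lam)."""
    return 1.0 - sum(poisson_pmf(lam, j) for j in range(math.ceil(lam + t)))

def lower_tail(lam, t):
    """Exact P[X <= lam - t] for X ~ Po(lam)."""
    return sum(poisson_pmf(lam, j) for j in range(math.floor(lam - t) + 1))

def bennett_upper(lam, t):
    return math.exp(-lam * phi(t / lam))

def bennett_lower(lam, t):
    return math.exp(-lam * phi(-t / lam))

lam = 3.0
for t in (0.5, 1.0, 2.0):
    print(t, upper_tail(lam, t), bennett_upper(lam, t), lower_tail(lam, t), bennett_lower(lam, t))
```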

For reals a,b we write

a ∨b = max{a,b}, a ∧b = min{a,b}.

Unless specified otherwise asymptotic notation o( · ), O( · ), etc. is understood to refer to the limit n →∞. The
symbol Õ( · ) is understood to swallow polylog(n) terms. Throughout we tacitly assume that n is sufficiently large
so that the various estimates are valid. We use the conventions log0 =−∞ and log∞=∞. Finally, throughout the
paper we assume that k ≥ 3 is a fixed integer.

2. OVERVIEW

In this section we survey the proofs of the main results. Subsequently, we discuss further related work. The proof
details are deferred to the remaining sections; see Section 2.7 for pointers. We assume throughout that k ≥ 3.

2.1. Existence of the fixed point and upper bound. As a first step towards the proof of Theorem 1.1 we prove that
the limit (1.6) exists for d < duniq(k). More precisely, we will establish the following statement.

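Proposition 2.1. For every k ≥ 3 and every d < duniq(k), the W1-limit $\pi_{d,k} = \lim_{\ell\to\infty}\mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ exists and

\[
\mathbb{E}\left[\log^2\mu_{\pi_{d,k},1,1}\right] + \mathbb{E}\left|\log\left(\prod_{i=1}^{d^-}\mu_{\pi_{d,k},2i}+\prod_{i=1}^{d^+}\mu_{\pi_{d,k},2i-1}\right)\right| + \mathbb{E}\left|\log\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right| < \infty. \tag{2.1}
\]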

In addition, µπd ,k ,1,1 and 1−µπd ,k ,1,1 are identically distributed.


The existence of the limit πd ,k is an easy consequence of the Gibbs uniqueness property. As an aside, the limit
πd ,k = limℓ→∞ BPℓd ,k (δ1/2) is a fixed point of the Belief Propagation operator, i.e.,

πd ,k = BPd ,k (πd ,k ). (2.2)

The proof of the bound (2.1) is a bit more subtle and requires a few preparations, but we will come to that. The
upshot of (2.1) is that the Bethe free entropy Bd ,k (πd ,k ) is well defined.

With the fixed point πd,k in hand we can bring to bear the ‘interpolation method’ to upper bound the likely
value of log Z (Φ).

Corollary 2.2. If d < duniq(k) then w.h.p. we have $\frac1n\log Z(\Phi) \le \mathcal{B}_{d,k}(\pi_{d,k})+o(1)$.

The interpolation method is a mainstay of the study of disordered systems in mathematical physics and has also
been used to investigate random constraint satisfaction problems. In particular, the variant of the interpolation
method from [60] (in combination with Proposition 2.1) easily implies that

\[
\limsup_{n\to\infty}\frac1n\mathbb{E}\left[\log(Z(\Phi)\vee 1)\right] \le \mathcal{B}_{d,k}(\pi_{d,k});
\]

taking the logarithm of Z (Φ)∨1 ensures that the expectation is well defined, as it is possible (albeit unlikely for
d < duniq(k)) that Φ is unsatisfiable. The added value of Corollary 2.2 is that we obtain a bound that holds with
high probability, rather than just a bound on the expectation. The interpolation method was used in [3] in a similar
fashion to prove a ‘with high probability’ bound on the number of satisfying assignments of random 2-CNFs. The
proof of Corollary 2.2 is an adaptation of that argument to k ≥ 3.

2.2. A matching lower bound. The key step towards Theorem 1.1 is to establish a lower bound on log Z (Φ) that
matches the upper bound from Corollary 2.2. To accomplish this task we employ a coupling argument known
as the ‘Aizenman-Sims-Starr scheme’ in mathematical physics. Its original version was intended to estimate the
partition function of the Sherrington-Kirkpatrick model, a spin glass model [8]. But the technique has since been
employed in probabilistic combinatorics (e.g., [24, 25, 65]). By comparison to the mathematical physics context,
the crucial difference is that here our objective is to count actual satisfying assignments where every single clause
imposes a hard constraint, whereas in spin glass theory constraints are soft. The same issue occurred in previ-
ous work on the random 2-SAT problem [3]. However, in that case a relatively simple percolation argument was
sufficient to deal with the ensuing complications. As we will see, for k ≥ 3 considerably more care is needed.

But first things first. The basic idea behind the Aizenman-Sims-Starr argument is to perform a kind of induction.
Translated to random k-SAT this means that we couple the random k-CNF Φd,k(n) with n variables with the
random k-CNF Φd,k(n+1) with n+1 variables. Recall that Φd,k(n) comprises mn ∼ Po(dn/k) independent random
clauses. Ultimately Theorem 1.1 is going to be a consequence of Corollary 2.2 and the following statement.

Proposition 2.3. If d < duniq(k) then

\[
\mathbb{E}\left[\log(Z(\Phi_{d,k}(n+1))\vee 1)\right] - \mathbb{E}\left[\log(Z(\Phi_{d,k}(n))\vee 1)\right] = \mathcal{B}_{d,k}(\pi_{d,k})+o(1).
\]

Once again we work with Z (Φd ,k (n))∨1 and Z (Φd ,k (n +1))∨1 to ensure that the expectations are well defined.
To prove Proposition 2.3 we couple the random formulas Φd,k(n+1) and Φd,k(n) as follows.

CPL1: Let Φ′ be a random k-CNF with variables x1, . . . , xn and m′ ∼ Po(d(n − k + 1)/k) clauses.
CPL2: Obtain Φ′′ from Φ′ by adding another ∆′′ ∼ Po(d(k − 1)/k) independent random clauses.
CPL3: Obtain Φ′′′ from Φ′ by adding one new variable xn+1 and ∆′′′ ∼ Po(d) independent random clauses
      that each contain xn+1 and k − 1 other variables from {x1, . . . , xn}.

Observe that Φ′′ ultimately has variables x1, . . . , xn and a total of mn ∼ Po(dn/k) random clauses. Thus, Φ′′ has
the same distribution as the random formula Φd,k(n). Similarly, Φ′′′ has the same distribution as Φd,k(n+1). Consequently, we
obtain the following.

Fact 2.4. For any d > 0 we have Z (Φd ,k (n)) ∼ Z (Φ′′) and Z (Φd ,k (n +1)) ∼ Z (Φ′′′).
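The Poisson parameters in CPL1–CPL3 can be checked by hand, using that a sum of independent Poisson variables is again Poisson with the means added:

\[
\frac{d(n-k+1)}{k} + \frac{d(k-1)}{k} = \frac{dn}{k},
\qquad
\frac{d(n-k+1)}{k} + d = \frac{d(n+1)}{k}.
\]

The first identity gives the clause count of Φ′′, the second that of Φ′′′. For the second one, assuming that every clause of Φd,k(n+1) picks its k distinct variables uniformly at random, a given clause contains xn+1 with probability k/(n+1). Poisson thinning then splits the Po(d(n+1)/k) clauses into Po(d) clauses that contain xn+1 and, independently, Po(d(n+1−k)/k) clauses that do not, which are precisely the parameters used in CPL3 and CPL1.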

The coupling CPL1–CPL3 reduces the proof of Proposition 2.3 to getting a handle on the differences log(Z (Φ′′)∨
1)−log(Z (Φ′)∨1) and log(Z (Φ′′′)∨1)−log(Z (Φ′)∨1). More precisely, recalling (1.4)–(1.5), we see that Proposition 2.3
is a consequence of the following two statements.

Proposition 2.5. If d < duniq(k) then

\[
\mathbb{E}\left[\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\right] = \frac{d(k-1)}{k}\,\mathbb{E}\left[\log\left(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\right)\right]+o(1). \tag{2.3}
\]



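Proposition 2.6. If d < duniq(k) then

\[
\mathbb{E}\left[\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\right] = \mathbb{E}\left[\log\left(\prod_{i=1}^{d^-}\mu_{\pi_{d,k},2i}+\prod_{i=1}^{d^+}\mu_{\pi_{d,k},2i-1}\right)\right]+o(1). \tag{2.4}
\]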

To prove Propositions 2.5–2.6 we effectively need to trace the impact that local changes have on the number
of satisfying assignments. Indeed, under the coupling CPL1–CPL3, the formula Φ′′ is obtained from the ‘base
formula’ Φ′ by adding just a bounded expected number of random clauses. Thus, if we imagine that, as both
the first moment upper bound (1.13) and the balanced second moment lower bound (1.14) suggest, each addi-
tional random clause typically reduces the number of satisfying assignments by a constant factor, then the quantity
| log(Z(Φ′′)/Z(Φ′))| should be bounded with probability close to one. Similar reasoning applies to Φ′′′.

Yet while with high probability the local changes that turn Φ′ into Φ′′ or Φ′′′ are indeed benign, because we are
dealing with hard constraints there is a non-negligible probability that log(Z(Φ′′)/Z(Φ′)) and log(Z(Φ′′′)/Z(Φ′))
could be large. Indeed, a single extra clause might wipe out all satisfying assignments of Φ′, in which case

\[
\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1} = -\log Z(\Phi') = -\Omega(n).
\]

Hence, we need to argue that such drastic changes are sufficiently rare. The following statement furnishes the
necessary tail bound.

Proposition 2.7. For d < duniq(k) we have

\[
\mathbb{E}\left[\left|\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\right|^{3/2} + \left|\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\right|^{3/2}\right] = O(1). \tag{2.5}
\]

2.3. Pure literal pursuit. The proof of Proposition 2.7 constitutes the main technical challenge towards the proof
of Theorem 1.1. The linchpin of the proof is an algorithm that we call Pure Literal Pursuit (‘PULP’). Its purpose is
to trace the repercussions of setting a relatively small number of variables to specific truth values. More precisely,
PULP will allow us to compare the number of satisfying assignments that set a few chosen variables to specific
values to the total number of satisfying assignments.

To this end PULP attempts to solve the following optimisation task. Suppose we are given a k-CNF Φ and a set
L of literals of Φ that we deem to be set to ‘true’. We would like to identify a superset L̄ ⊇ L of literals with the
following properties; think of L̄ as a ‘closure’ of L .

PULP1: every clause a of Φ that contains a literal from ¬L̄ = {¬l : l ∈ L̄} also contains a literal from L̄.
PULP2: there is no literal l such that l ,¬l ∈ L̄ .

Of course, it may be impossible to satisfy PULP1 and PULP2 simultaneously. In this case we ask PULP to report a
‘contradiction’. But if PULP1–PULP2 can be satisfied, we aim to find a closure L̄ of as small size |L̄ | as possible.

The combinatorial idea behind PULP1–PULP2 is as follows. Deeming the literals from the initial set L ‘true’,
our goal is to reconcile this assumption with the formula Φ. To this end we enhance the set L . Clearly, any clause
that contains the negation ¬l of a literal l that we deem true also needs to contain another literal l ′ that is set to
true. This is what PULP1 asks. Furthermore, it would be contradictory to deem both l and its negation ¬l true;
this is PULP2.

The size of the closure L̄ yields a bound on the reduction in the number of satisfying assignments if we indeed
insist on all literals l ∈ L being set to true. Formally, let S(Φ,L ) be the set of all satisfying assignments σ ∈ S(Φ)
under which all literals l ∈ L evaluate to ‘true’. Also set Z(Φ, L) = |S(Φ, L)|.

Lemma 2.8. For any Φ, L and any L̄ ⊇ L that satisfies PULP1–PULP2 we have $Z(\Phi) \le 2^{|\bar{L}|} Z(\Phi, L)$.

In order to identify a ‘small’ closure L̄ the PULP algorithm resorts to pure literal elimination, a simple trick
commonplace to satisfiability algorithms. A variable x is pure in a CNF formula Φ if sign(x, a) = sign(x,b) for any
two clauses a,b ∈ ∂x. Clearly, if our objective is to construct a satisfying assignment, we might as well set all pure
variables x to the value that satisfies all clauses a ∈ ∂x and disregard these clauses henceforth. In light of this
observation, pure literal elimination repeatedly removes all clauses that contain a pure variable. Naturally, every
round of clause removals may create new pure variables, and thus more clauses may be ripe for removal in the
next round. For a clause a of the original formula Φ let ha(Φ) ≥ 1 be the number of the round at which pure literal
elimination removes a. If a is never removed then we set ha(Φ) =∞.
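The round numbers ha(Φ) can be computed by a direct simulation of pure literal elimination. The sketch below assumes a hypothetical encoding in which a formula is a list of clauses and each clause is a list of `(variable, sign)` pairs; the function name is likewise an assumption.

```python
import math

def elimination_rounds(clauses):
    """Return a list whose entry a is the round h_a at which clause a is removed.

    Clauses that are never removed keep the value math.inf.
    """
    alive = set(range(len(clauses)))
    rounds = [math.inf] * len(clauses)
    current = 1
    while alive:
        # collect the signs with which each variable occurs in surviving clauses
        signs = {}
        for a in alive:
            for x, s in clauses[a]:
                signs.setdefault(x, set()).add(s)
        pure = {x for x, seen in signs.items() if len(seen) == 1}
        # all clauses containing a pure variable are removed in the same round
        removable = {a for a in alive if any(x in pure for x, _ in clauses[a])}
        if not removable:
            break
        for a in removable:
            rounds[a] = current
        alive -= removable
        current += 1
    return rounds

# (x1 or x2), (not x2 or x3), (not x3 or x4): x1 and x4 are pure in round 1,
# which frees the middle clause for removal in round 2.
chain = [[(1, 1), (2, 1)], [(2, -1), (3, 1)], [(3, -1), (4, 1)]]
print(elimination_rounds(chain))  # [1, 2, 1]
```

A formula in which every variable occurs with both signs, such as the four 2-clauses over two variables, is left untouched and all of its clauses receive the value ∞.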



The PULP algorithm invokes a slightly modified version of pure literal elimination to accommodate the initial set
L of literals. Specifically, for a variable x of a CNF Φ and s ∈ {±1} let Φ[x ↦ s] be the CNF obtained by removing all
clauses a ∈ ∂x with sign(x, a) = s and removing the literal −s · x from all a ∈ ∂x with sign(x, a) =−s. The definition
reflects that if we set x to value s, all a ∈ ∂s x will be satisfied, while all a ∈ ∂−s x will have to be satisfied by one of
their other constituent literals. Further, let

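\[
h_x(s,\Phi) = \begin{cases} 0 & \text{if } \partial^{-s}_{\Phi}x = \emptyset,\\ \max\left\{h_a(\Phi[x\mapsto s]) : a\in\partial^{-s}_{\Phi}x\right\} & \text{otherwise} \end{cases} \quad\in[0,\infty]. \tag{2.6}
\]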

We refer to hx(s, Φ) as the height of literal s · x in Φ.

The PULP algorithm, displayed as Algorithm 1, harnesses the heights as follows. In its attempt to precipitate
PULP1 and PULP2 the algorithm iteratively enhances the set L of literals deemed to be ‘true’. For any clause a
that violates PULP1 and that contains a literal l ∉ ¬L the algorithm adds one such literal l of minimum height
to L . This choice is intended to keep the ultimate size of the closure small; one could say that PULP uses height
as a proxy of ‘size’. If at any point the algorithm encounters a clause a that consists of literals from ¬L only, the
algorithm reports a contradiction and aborts.

Input: A k-CNF Φ and a set L of literals.
1 Let L̄ = L;
2 while there is a clause a that contains a literal from ¬L̄ but no literal from L̄ do
3     Pick such a clause a that minimises the distance from the initial variable set {|l| : l ∈ L};
4     if a consists of literals l ∈ ¬L̄ only then
5         return ‘contradiction’ and halt;
6     else
7         choose x ∈ ∂a with x, ¬x ∉ L̄ that minimises hx(sign(x, a), Φ) and add sign(x, a) · x to L̄;
8 return L̄

Algorithm 1: The PULP algorithm

Remark 2.9. To break ties that may occur in the execution of Steps 3 and 7 of PULP we assume that the variables
and clauses of Φ are numbered so that Steps 3 and 7 can choose the clause/variable with the smallest number that
satisfies the respective requirements. In due course we will run PULP on (finite subtrees of) the Galton-Watson tree
T. To number the variables and clauses of T we equip each of them with an independent Gaussian label. Since T
comprises a countable number of clauses/variables, these labels will almost surely be pairwise distinct.
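For experimentation, Algorithm 1 can be prototyped as follows. This is a simplified sketch and not the exact procedure: Step 3 is replaced by taking the first violating clause in input order instead of the one closest to the initial set, ties in Step 7 are broken by the smallest variable index, a literal s · x is encoded as the pair `(x, s)`, and a contradiction is reported as `None`. All names are hypothetical. The round computation of pure literal elimination is included so that the block is self-contained.

```python
import math

def elimination_rounds(clauses):
    """Round at which pure literal elimination removes each clause (inf if never)."""
    alive, rounds, current = set(range(len(clauses))), [math.inf] * len(clauses), 1
    while alive:
        signs = {}
        for a in alive:
            for x, s in clauses[a]:
                signs.setdefault(x, set()).add(s)
        pure = {x for x, seen in signs.items() if len(seen) == 1}
        removable = {a for a in alive if any(x in pure for x, _ in clauses[a])}
        if not removable:
            break
        for a in removable:
            rounds[a] = current
        alive -= removable
        current += 1
    return rounds

def height(clauses, x, s):
    """The height h_x(s, Phi) from (2.6), computed on Phi[x -> s]."""
    reduced, watched = [], []
    for clause in clauses:
        signs = dict(clause)
        if signs.get(x) == s:
            continue                       # clause satisfied by x -> s: removed
        if x in signs:                     # sign(x, a) = -s: drop the literal
            watched.append(len(reduced))
            clause = [(y, t) for y, t in clause if y != x]
        reduced.append(clause)
    if not watched:
        return 0
    rounds = elimination_rounds(reduced)
    return max(rounds[i] for i in watched)

def pulp(clauses, literals):
    """Return a closure satisfying PULP1 and PULP2, or None on a contradiction."""
    closure = set(literals)
    while True:
        violated = [c for c in clauses
                    if any((x, -s) in closure for x, s in c)
                    and not any((x, s) in closure for x, s in c)]
        if not violated:
            return closure
        clause = violated[0]               # simplification of Step 3
        free = [(x, s) for x, s in clause if (x, -s) not in closure]
        if not free:
            return None                    # clause consists of negated literals only
        closure.add(min(free, key=lambda lit: (height(clauses, lit[0], lit[1]), lit[0])))

formula = [[(1, -1), (2, 1), (3, 1)], [(2, -1), (4, 1), (5, 1)], [(3, -1), (4, -1), (5, -1)]]
print(pulp(formula, {(1, 1)}))
```

On this small formula, deeming x1 true forces the procedure to add x2, then x4, then ¬x3, so the closure has four literals.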

From here on we write L̄ for the set of literals returned by PULP if the algorithm does not encounter a contradiction;
in the event of a contradiction we let L̄ = {x, ¬x : x ∈ V (Φ)} be the set of all literals. Where the reference
to the formula Φ is not entirely obvious, we write L̄Φ. The analysis of PULP on the random formula Φ′ furnishes
the following bound on |L̄ | in terms of the size of the initial set L . This bound is the key ingredient towards the
proof of Proposition 2.7.

Lemma 2.10. There exists C = C(d, k) > 0 such that the following is true. Let L be a set of literals of Φ′ such that
$1 \le |L| \le \log^2 n$ and such that {xi, ¬xi} ⊈ L for all 1 ≤ i ≤ n. Then $\mathbb{E}[|\bar{L}|^{3/2}] \le C|L|^{3/2}$.

The proof of Lemma 2.10 is one of the main technical challenges of the present work. The difficulty stems from
the stochastic dependencies that are inherent to the PULP algorithm. Specifically, in order to decide which literals
to add to the set L , PULP requires knowledge of the heights hx (±1,Φ′). But these heights depend on the other
variables y ∈ ∂a \ {x}, the clauses that these variables y appear in, etc. Furthermore, in its subsequent iterations
the algorithm is apt to revisit some of these variables and clauses at a point when their heights have already been
revealed. These repetitions rule out an analysis of PULP by way of routine techniques such as the principle of de-
ferred decisions or the differential equations method. The reason why we manage to cope with these complicated
dependencies at all is that, remarkably, the heights hx (±1,Φ′) have only a tiny upper tail. More precisely, as we will
see the tails of these random variables decay at a doubly exponential rate.

Proposition 2.7 follows from the analysis of PULP. The basic idea is to apply the algorithm to an initial set L of
literals that contains one literal from each of the extra clauses that are present in Φ′′ or Φ′′′ but not in Φ′. With a bit
of care the bounds from Lemmas 2.8 and 2.10 then imply (2.5). Finally, the analysis of PULP that leads up to the
proof of Lemma 2.10 also implies the necessary tail bounds to verify the bounds from (2.1). Specifically, the proof
of Lemma 2.10 proceeds by way of analysing PULP on the Galton-Watson tree Td,k, and the bounds (2.1) come out
as a byproduct of that analysis.

2.4. Completing the Aizenman-Sims-Starr scheme. To obtain Propositions 2.5–2.6 we combine Proposition 2.7
with an analysis of the quotients Z (Φ′′)/Z (Φ′) and Z (Φ′′′)/Z (Φ′) on a likely ‘good’ event. On this good event the
empirical distribution of the marginal probabilities (P[σΦ′ (xi ) = 1 |Φ′])1≤i≤n of the different variables xi receiving
the value ‘true’ under a random satisfying assignment is ‘close’ to the limiting distribution πd ,k from Proposi-
tion 2.1. Additionally, on the good event the joint distribution of the truth values assigned to a moderate number
of variables is well approximated by a product measure. Of course, to make this precise we need to investigate the
empirical distribution

\[
\pi'_n = \frac1n\sum_{i=1}^{n}\delta_{\mathbb{P}[\sigma_{\Phi'}(x_i)=1\mid\Phi']} \in \mathcal{P}(0,1) \tag{2.7}
\]

of the marginals (P[σΦ′ (xi ) = 1 |Φ′])1≤i≤n .

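Proposition 2.11. Assume that d < duniq(k). Then $\mathbb{E}[W_1(\pi'_n,\pi_{d,k})] = o(1)$ and for any ℓ = O(1) we have

\[
\sum_{\sigma\in\{\pm1\}^{\ell}}\mathbb{E}\left|\mathbb{P}\left[\forall 1\le i\le\ell : \sigma_{\Phi'}(x_i)=\sigma_i \mid \Phi'\right] - \prod_{i=1}^{\ell}\mathbb{P}\left[\sigma_{\Phi'}(x_i)=\sigma_i \mid \Phi'\right]\right| = o(1).
\]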

The proof of Proposition 2.11 hinges on the Gibbs uniqueness property and the convergence of the local topol-
ogy of the random formula Φ′ to the Galton-Watson tree Td ,k . Together with careful coupling arguments Propo-
sitions 2.7–2.11 imply Propositions 2.5–2.6. Moreover, in combination with Fact 2.4 these two propositions yield
Proposition 2.3. We complete this paragraph by showing how Theorem 1.1 follows from Corollary 2.2 and Propo-
sition 2.3.

Proof of Theorem 1.1. The existence of the limit (1.6) follows from Proposition 2.1. With respect to (1.7), we apply
Proposition 2.3 to obtain

\[
\frac1n\mathbb{E}\left[\log(1\vee Z(\Phi_{d,k}(n)))\right] = \frac1n\sum_{N=0}^{n-1}\left(\mathbb{E}\left[\log(1\vee Z(\Phi_{d,k}(N+1)))\right]-\mathbb{E}\left[\log(1\vee Z(\Phi_{d,k}(N)))\right]\right) = \mathcal{B}_{d,k}(\pi_{d,k})+o(1). \tag{2.8}
\]

Since, conversely, Corollary 2.2 shows that $\frac1n\log Z(\Phi) \le \mathcal{B}_{d,k}(\pi_{d,k})+o(1)$ w.h.p. and since log Z(Φ) ≤ n log 2
deterministically, the assertion follows from (2.8). □

2.5. Lower-bounding the uniqueness threshold. The proof of Theorem 1.2 combines three ingredients. From the
work [3] on random 2-SAT we borrow the idea of constructing an explicit extremal boundary configuration. In
effect, in order to prove Gibbs uniqueness we just have to consider one single boundary configuration, instead
of an enormous number of possible configurations τ that grows quickly with the height ℓ as in the original defini-
tion (1.1). Second, from the work [56] of Montanari and Shah we borrow the idea of expressly considering the effect
of pure literals. As it turns out, without explicit consideration of pure literals it seems difficult to even recover the
correct asymptotic order (1.8) of the Gibbs uniqueness threshold. Third, and most importantly, the improvement
over the bound from [56] stems from a new subtle coupling argument that we will explain in due course.

2.5.1. The extremal boundary condition. An obvious challenge associated with establishing the Gibbs uniqueness
property (1.1) seems to be that we need to estimate the marginal of the root variable given any possible boundary
condition, i.e., given any assignment of the variables at distance 2ℓ from x. As we expect to see $(d(k-1))^{\ell}$ variables
at distance 2ℓ from x, we thus face a doubly exponential number $2^{(d(k-1))^{\ell}}$ of possible boundary conditions. But
fortunately, following [3] we may confine ourselves to just a single, explicit boundary configuration τ+ that satisfies

\[
\mathbb{P}\left[\tau^{(\ell)}(r)=1 \mid T,\ \forall x\in\partial^{2\ell}r : \tau^{(\ell)}(x)=\tau^+(x)\right] = \max_{\tau\in S(T^{(\ell)})}\mathbb{P}\left[\tau^{(\ell)}(r)=1 \mid T,\ \forall x\in\partial^{2\ell}r : \tau^{(\ell)}(x)=\tau(x)\right]. \tag{2.9}
\]

Due to the inherent symmetry of the distribution of T with respect to the signs of the clauses, towards the proof
of (1.1) it is sufficient to show that the difference (2.9) vanishes as ℓ→∞.



The extremal boundary condition can be constructed explicitly. Specifically, given T(ℓ) we construct a satisfying
assignment τ+ ∈ S(T(ℓ)) by working our way down the tree T(ℓ). We begin by setting τ+(x) = 1. Now suppose that
for q ≥ 1, the values of the variables at distance 2(q −1) from x have been already determined. Let w be a variable
at distance 2q from x with parent clause a and grandparent variable u. Then we define

\[
\tau^+(w) = \mathrm{sign}(w,a)\cdot\mathbb{1}\{\mathrm{sign}(u,a)\neq\tau^+(u)\} - \mathrm{sign}(w,a)\cdot\mathbb{1}\{\mathrm{sign}(u,a)=\tau^+(u)\}. \tag{2.10}
\]

The idea behind (2.10) is for τ+(w) to “nudge” u towards τ+(u) by making sure that w satisfies clause a if setting u
to τ+(u) does not, and conversely making sure that w fails to satisfy clause a if setting u to τ+(u) does. A simple
induction on ℓ shows that τ+ is a satisfying assignment for which (2.9) holds.

Lemma 2.12. For any integer ℓ≥ 0 the assignment τ+ defined via (2.10) satisfies (2.9).
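The recurrence (2.10) translates into a short top-down pass over the tree. The sketch below assumes a hypothetical encoding: variables are integers with root 0, each clause is a list of `(variable, sign)` pairs whose first entry is the parent variable, and clauses are listed so that every parent is assigned before its children (for instance in breadth-first order).

```python
def extremal_assignment(clauses, root=0):
    """Compute tau^+ via (2.10) on a tree-shaped formula."""
    tau = {root: 1}
    for clause in clauses:
        (u, sign_u), children = clause[0], clause[1:]
        for w, sign_w in children:
            # w satisfies the clause exactly when u, set to tau[u], does not
            tau[w] = sign_w if sign_u != tau[u] else -sign_w
    return tau

# root 0 with one clause (x0 or x1 or not x2); x1 with one clause (not x1 or x3 or x4)
tree = [[(0, 1), (1, 1), (2, -1)], [(1, -1), (3, 1), (4, 1)]]
tau = extremal_assignment(tree)
print(tau)  # {0: 1, 1: -1, 2: 1, 3: -1, 4: -1}
```

In each clause either the parent literal is true, or all child literals are, so the assignment produced this way satisfies every clause of the tree.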

Hence, proving Theorem 1.2 reduces to establishing the following.

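Proposition 2.13. For d < dcon(k) we have that

\[
\lim_{\ell\to\infty}\mathbb{E}\left[\mathbb{P}\left[\tau^{(\ell)}(x)=1 \mid T,\ \forall x\in\partial^{2\ell}x : \tau^{(\ell)}(x)=\tau^+(x)\right] - \mathbb{P}\left[\tau^{(\ell)}(x)=1 \mid T\right]\right] = 0. \tag{2.11}
\]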

The proof of Proposition 2.13 may seem delicate because the boundary condition τ+ depends on the tree T(ℓ).
To sidestep this problem, we generalise another idea from the work [3] on random 2-SAT to k ≥ 3 by introducing a
quantity that allows us to prove (2.13) but that behaves ‘Markovian’ as we pass up and down the tree. Specifically,
for a variable x of T(ℓ) let $T^{(\ell)}_x$ be the sub-formula of T(ℓ) comprising x and its progeny. Moreover, for a satisfying
assignment τ ∈ S(T(ℓ)) let

\[
S(T^{(\ell)}_x,\tau) = \left\{\chi\in S(T^{(\ell)}_x) : \forall y\in V(T^{(\ell)}_x)\cap\partial^{2\ell}_{T}x : \chi_y=\tau_y\right\}, \qquad Z(T^{(\ell)}_x,\tau) = \left|S(T^{(\ell)}_x,\tau)\right|.
\]

In words, $S(T^{(\ell)}_x,\tau)$ contains all satisfying assignments of $T^{(\ell)}_x$ that comply with the boundary condition induced by
τ. Additionally, for t = ±1 let

\[
S(T^{(\ell)}_x,\tau,t) = \left\{\chi\in S(T^{(\ell)}_x,\tau) : \chi_x=t\right\}, \qquad Z(T^{(\ell)}_x,\tau,t) = \left|S(T^{(\ell)}_x,\tau,t)\right|
\]

be the set and number of satisfying assignments of $T^{(\ell)}_x$ that agree with τ on the boundary and assign value t to x.
Finally, let

\[
\eta^{(\ell)}_x = \log\frac{Z(T^{(\ell)}_x,\tau^+,\tau^+(x))}{Z(T^{(\ell)}_x,\tau^+,-\tau^+(x))} \in \mathbb{R}\cup\{\pm\infty\} \tag{2.12}
\]

be the log-likelihood ratio that gauges how likely a random satisfying assignment τ of $T^{(\ell)}_x$ subject to the τ+-boundary
condition is to set x to its designated value τ+(x) from (2.10). In terms of (2.12), the proof of Proposition 2.13
comes down to showing that for d < dcon(k),

\[
\lim_{\ell\to\infty}\eta^{(\ell)}_x = \log\left(\frac{\mu_{\pi_{d,k},1,1}}{1-\mu_{\pi_{d,k},1,1}}\right) \quad\text{in distribution.} \tag{2.13}
\]

For a start, the following lemma bounds the tails of $\eta^{(\ell)}_x$ for large enough ℓ and x reasonably close to the root
variable x.

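Lemma 2.14. For every 0 < d < dcon(k) there exist c = c(d, k) and a sequence $(\varepsilon_t)_t$ with $\lim_{t\to\infty}\varepsilon_t = 0$ such that for
any t > 0, $\ell > ct^c$ we have

\[
\mathbb{P}\left[\max_{x\in\partial^{2t}x}\left|\eta^{(\ell)}_x\right| \le 2t^c\right] > 1-\varepsilon_t. \tag{2.14}
\]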

The proof of Lemma 2.14 rests on combinatorial arguments reminiscent of the analysis of PULP.
A key feature of the definition (2.12) is that the random variables $\eta^{(\ell)}_x$ exhibit a ‘reverse Markovian’ behaviour. This is because $\eta^{(\ell)}_x$ depends only on $\tau^+(x)$ and the part $T^{(\ell)}_x$ of the tree pending on $x$. Furthermore, because the distribution of the random tree $T^{(\ell)}_x$ is symmetric with respect to sign flips, even the dependence on the value $\tau^+(x)$ can be eliminated. All we need to keep in mind is that the values $\tau^+(y)$ for $y\in V(T^{(\ell)}_x)$ are constructed from the value $\tau^+(x)$ in accordance with the recurrence (2.10). Thus, by flipping all signs in the tree $T^{(\ell)}_x$ if necessary, we could assume without loss of generality that $\tau^+(x)=1$ without changing the distribution of $\eta^{(\ell)}_x$ with respect to the randomness of $T^{(\ell)}_x$. As a consequence, it is possible to set up a recurrence that expresses the log-likelihood ratios $\eta^{(\ell)}_x$ of variables $x$ at distance $q$ from the root $\mathfrak{x}$ in terms of the $\eta^{(\ell)}_y$ for $y$ at distance $q+2$ from $\mathfrak{x}$.
Due to the recursive nature of the random tree $T$, it suffices to set up this recurrence for the root $\mathfrak{x}$ of the tree. In other words, to prove (2.13) we just need a recurrence that expresses the distribution of the random variable $\eta^{(\ell+1)}_{\mathfrak{x}}$ in terms of the law of $\eta^{(\ell)}_{\mathfrak{x}}$ for $\ell\ge0$. A bit of reflection (see Claim 7.1) reveals that the corresponding distributional operator

\[
\mathrm{LL}^+_{d,k}:\mathcal{P}((-\infty,\infty])\to\mathcal{P}((-\infty,\infty]),\qquad\rho\mapsto\hat\rho=\mathrm{LL}^+_{d,k}(\rho)
\]

has the following shape. For a distribution $\rho\in\mathcal{P}((-\infty,\infty])$ let $(\eta_{\rho,i,j})_{i,j\ge1}$ be a family of random variables with distribution $\rho$. Moreover, let $(s_i)_{i\ge1}$ be a sequence of uniformly random $\pm1$-valued random variables and let $\boldsymbol{d}\sim\mathrm{Po}(d)$. All of these random variables are mutually independent. Additionally, for $q\ge0$ and $z_1,\dots,z_q\in\mathbb{R}\cup\{\pm\infty\}$ define

\[
\Gamma(z_1,\dots,z_q)=\prod_{i=1}^{q}\frac{1+\tanh(z_i/2)}{2}.
\tag{2.15}
\]

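Then $\hat\rho=\mathrm{LL}^+_{d,k}(\rho)$ is the distribution of the random variable

\[
-\sum_{i=1}^{\boldsymbol{d}}s_i\cdot\log\left(1-\Gamma\left(s_i\cdot\left(\eta_{\rho,i,1},\dots,\eta_{\rho,i,k-1}\right)\right)\right).
\tag{2.16}
\]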
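To make the shape of (2.15)–(2.16) concrete, the following Python sketch draws a single sample from $\mathrm{LL}^+_{d,k}(\rho)$ when $\rho$ is represented by a finite pool of samples. The pool representation, the helper names `gamma`, `poisson` and `ll_plus_sample`, and the elementary Poisson sampler are illustrative assumptions; they are not part of the formal development.

```python
import math
import random


def gamma(zs):
    """Gamma(z_1, ..., z_q) from (2.15); the empty product equals 1."""
    out = 1.0
    for z in zs:
        out *= (1.0 + math.tanh(z / 2.0)) / 2.0
    return out


def poisson(lam, rng):
    """Sample Po(lam) by multiplying uniforms (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def ll_plus_sample(pool, d, k, rng):
    """One sample of the random variable (2.16); `pool` stands in for rho."""
    total = 0.0
    for _ in range(poisson(d, rng)):
        s = rng.choice((-1, 1))
        # k-1 independent draws from rho, all multiplied by the sign s
        etas = [s * rng.choice(pool) for _ in range(k - 1)]
        g = gamma(etas)
        # log(1 - g), with the value -infinity when g rounds to 1
        log_term = math.log1p(-g) if g < 1.0 else -math.inf
        total -= s * log_term
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [0.0] * 100  # start from the atom at zero
    for _ in range(5):   # a few rounds of the recursion on the sample pool
        pool = [ll_plus_sample(pool, 1.0, 3, rng) for _ in range(100)]
    print(sum(pool) / len(pool))
```

Iterating the map on a pool of samples in this way is the usual 'population dynamics' heuristic; it only illustrates the operator and carries no guarantee of convergence.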

Ultimately we will derive (2.13), and thereby Proposition 2.13, from Lemma 2.14 and a contraction argument.
However, this is not quite as straightforward as one might be inclined to expect. Indeed, at first glance, a natural
approach to proving (2.13) from Lemma 2.14 seems to be to show that $\mathrm{LL}^+_{d,k}$ is a contraction, say, with respect to the $W_1$-metric. This is indeed carried out in [3] for $k=2$, where it is shown that $\mathrm{LL}^+_{d,2}$ contracts for all $0<d<2$, i.e., right up to the random 2-SAT satisfiability threshold. However, for $k\ge3$ we can only show that $\mathrm{LL}^+_{d,k}$ contracts for $d<2/(k-1)$, a value well below $d_{\mathrm{con}}(k)$ and short of the correct asymptotic order (1.10).

2.5.2. Pure and mixed literals. To cover a larger range of $d$ we borrow from [56] the idea of expressly taking into account pure literals. To elaborate, while $\mathrm{LL}^+_{d,k}$ describes how the law of $\eta^{(\ell+1)}_x$ results from that of $\eta^{(\ell)}_x$, the operator fails to take into account that $x$ itself as well as some of the grandchildren of $x$ in $T$ may be pure literals. However, the pure literal property has a marked effect on the log-likelihood ratios. For if, say, $x$ only appears positively, then a simple double counting argument shows that $\eta^{(\ell)}_x\ge0$ for all $\ell$. By extension, pure literals among the grandchildren of $x$ have a ‘dampening’ effect and may thus improve the range of $d$ for which we can establish contraction.

For a variable node $x$ of $T$, let us denote by $T_x$ the subtree of $T$ rooted at $x$ and containing its progeny. Leveraging the above observation, we classify a variable $x$ of $T$ as $\pm$, $\oplus$, $\ominus$, or $\#$, depending on whether $x$ appears both positively and negatively in $T_x$, only positively, only negatively, or whether $x$ has no children at all, respectively. Furthermore, instead of just tracing the law of $\eta^{(\ell)}_x$ for $\ell\ge0$, we study the four separate conditional distributions given the type $\pm$, $\oplus$, $\ominus$ or $\#$ of $x$. Of course, the distribution of $\eta^{(\ell)}_x$ given type $\#$ (i.e., $x$ has no children) is just the atom at zero for all $\ell$.
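To describe the evolution of the other distributions we introduce the operator

\[
\mathrm{LL}^\star_{d,k}:\mathcal{P}(-\infty,\infty]\times\mathcal{P}(0,+\infty]\times\mathcal{P}(-\infty,0]\to\mathcal{P}(-\infty,\infty]\times\mathcal{P}(0,+\infty]\times\mathcal{P}(-\infty,0],
\]
\[
(\rho_\pm,\rho_\oplus,\rho_\ominus)\mapsto(\hat\rho_\pm,\hat\rho_\oplus,\hat\rho_\ominus)=\mathrm{LL}^\star_{d,k}(\rho_\pm,\rho_\oplus,\rho_\ominus)
\tag{2.17}
\]

defined as follows. Let $\boldsymbol{d}^\star_+,\boldsymbol{d}^{\star\prime}_+,\boldsymbol{d}^\star_-,\boldsymbol{d}^{\star\prime}_-$ be Poisson variables with parameter $d/2$, conditioned on being positive. Moreover, let $\boldsymbol{r}_1=(\boldsymbol{r}_{\pm,1},\boldsymbol{r}_{\oplus,1},\boldsymbol{r}_{\ominus,1},\boldsymbol{r}_{\#,1})$, $\boldsymbol{r}_2=(\boldsymbol{r}_{\pm,2},\boldsymbol{r}_{\oplus,2},\boldsymbol{r}_{\ominus,2},\boldsymbol{r}_{\#,2})$, … be multinomial variables with $k-1$ trials and probabilities

\[
p_\pm=(1-e^{-d/2})^2,\qquad p_\oplus=p_\ominus=e^{-d/2}(1-e^{-d/2}),\qquad p_\#=e^{-d}.
\tag{2.18}
\]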
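As a quick sanity check on (2.18), the sketch below evaluates the four type probabilities and confirms numerically that they sum to one. It assumes, in line with the Po(d/2) law for the number of child clauses of a given sign used elsewhere in the text, that $e^{-d/2}$ is the probability that a variable has no child clause of a given sign; the function name `type_probabilities` is a hypothetical helper.

```python
import math


def type_probabilities(d):
    """Return (p_mixed, p_plus, p_minus, p_leaf) as in (2.18)."""
    q = math.exp(-d / 2.0)  # probability that Po(d/2) equals zero
    return ((1.0 - q) ** 2, q * (1.0 - q), q * (1.0 - q), q * q)


if __name__ == "__main__":
    for d in (0.5, 2.0, 4.0):
        probs = type_probabilities(d)
        print(d, probs, sum(probs))
```

The identity $(1-q)^2+2q(1-q)+q^2=1$ with $q=e^{-d/2}$ is what the printed sums reflect.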



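For $i,j\ge1$ let $\eta_{\pm,i,j}$, $\eta_{\oplus,i,j}$, $\eta_{\ominus,i,j}$ be random variables with law $\rho_\pm$, $\rho_\oplus$, $\rho_\ominus$, respectively. All of the aforementioned random variables are mutually independent. Further, for a sign $\varepsilon\in\{\pm1\}$ and a vector $r=(r_\pm,r_\oplus,r_\ominus,r_\#)$ of non-negative integers with $r_\pm+r_\oplus+r_\ominus+r_\#=k-1$ and $i\ge0$, $1\le j\le4$ we let

\[
\boldsymbol{\Xi}_{i,j}(\varepsilon,r)=1-\frac{1}{2^{r_\#}}\cdot
\Gamma\left(\varepsilon\left(\eta_{\pm,4i+j,1},\dots,\eta_{\pm,4i+j,r_\pm}\right)\right)
\Gamma\left(\varepsilon\left(\eta_{\oplus,4i+j,1},\dots,\eta_{\oplus,4i+j,r_\oplus}\right)\right)
\Gamma\left(\varepsilon\left(\eta_{\ominus,4i+j,1},\dots,\eta_{\ominus,4i+j,r_\ominus}\right)\right).
\tag{2.19}
\]

The r.h.s. of (2.19) amounts to rewriting the argument of the logarithm in (2.16) when the number of variables of each type is distributed according to $r$. Finally, let

\[
\boldsymbol{\Xi}_{i,j}=\boldsymbol{\Xi}_{i,j}\left((-1)^{j+1},\boldsymbol{r}_{4i+j}\right).
\tag{2.20}
\]

Then the operator (2.17) maps $\rho_\pm,\rho_\oplus,\rho_\ominus$ to the distributions $\hat\rho_\pm,\hat\rho_\oplus,\hat\rho_\ominus$ of the random variables

\[
\hat\rho_\pm\sim-\sum_{i=1}^{\boldsymbol{d}^\star_+}\log\boldsymbol{\Xi}_{i,1}+\sum_{i=1}^{\boldsymbol{d}^\star_-}\log\boldsymbol{\Xi}_{i,2},\qquad
\hat\rho_\oplus\sim-\sum_{i=1}^{\boldsymbol{d}^{\star\prime}_+}\log\boldsymbol{\Xi}_{i,3},\qquad
\hat\rho_\ominus\sim+\sum_{i=1}^{\boldsymbol{d}^{\star\prime}_-}\log\boldsymbol{\Xi}_{i,4}.
\tag{2.21}
\]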
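The construction (2.17)–(2.21) can be mimicked numerically. The following Python sketch draws one triple of samples from the three output laws, with each input law represented by a finite pool of samples. The pool representation, all function names, the rejection sampler for the conditioned Poisson variables, and the use of a fresh type vector for every factor are illustrative assumptions; the code further assumes that the pool entries are finite and of moderate size, so that every $\boldsymbol{\Xi}$ stays strictly positive.

```python
import math
import random


def gamma(zs):
    """Gamma from (2.15); the empty product equals 1."""
    out = 1.0
    for z in zs:
        out *= (1.0 + math.tanh(z / 2.0)) / 2.0
    return out


def poisson(lam, rng):
    """Sample Po(lam) by multiplying uniforms (adequate for small lam)."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def positive_poisson(lam, rng):
    """Po(lam) conditioned on being positive, via simple rejection."""
    while True:
        n = poisson(lam, rng)
        if n > 0:
            return n


def type_counts(d, k, rng):
    """Multinomial vector with k-1 trials and the probabilities (2.18)."""
    q = math.exp(-d / 2.0)
    weights = [(1 - q) ** 2, q * (1 - q), q * (1 - q), q * q]
    counts = [0, 0, 0, 0]
    for t in rng.choices(range(4), weights=weights, k=k - 1):
        counts[t] += 1
    return counts


def xi(eps, pools, d, k, rng):
    """One copy of Xi(eps, r) from (2.19) with a freshly drawn r."""
    r_mixed, r_plus, r_minus, r_leaf = type_counts(d, k, rng)
    value = 0.5 ** r_leaf
    for pool, r in zip(pools, (r_mixed, r_plus, r_minus)):
        value *= gamma([eps * rng.choice(pool) for _ in range(r)])
    return 1.0 - value


def ll_star_sample(pools, d, k, rng):
    """Samples (mixed, plus, minus) following (2.21); pools = (mixed, plus, minus)."""
    def log_sum(eps):
        n = positive_poisson(d / 2.0, rng)
        return sum(math.log(xi(eps, pools, d, k, rng)) for _ in range(n))

    mixed = -log_sum(+1) + log_sum(-1)
    plus = -log_sum(+1)
    minus = log_sum(-1)
    return mixed, plus, minus


if __name__ == "__main__":
    rng = random.Random(1)
    print(ll_star_sample(([0.0], [0.0], [0.0]), 2.0, 3, rng))
```

Since every $\boldsymbol{\Xi}$ lies in $(0,1)$ for such inputs, the second output is positive and the third is negative, in accordance with the supports $(0,+\infty]$ and $(-\infty,0]$ in (2.17).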

2.5.3. Coupling and contraction. While Montanari and Shah [56] do not write their proof of the lower bound
dMS(k) ≤ duniq(k) in the language of distributional recurrences, translating their argument to the current formal-
ism evinces two key differences by comparison to the approach that we are going to take. First, Montanari and
Shah establish contraction with respect to messages from clauses to variables, instead of messages from variables
to clauses as considered here.

While this change of perspective may seem innocuous at first, working with respect to variables provides us with greater control over how the change in log-likelihood ratios propagates. In particular, working with variable-to-clause messages and taking into account the four variable types $\pm,\oplus,\ominus,\#$ allows us to optimise the metric with respect to which we establish contraction. Hence, for $t>0$ we endow the space $\mathcal{P}(-\infty,\infty]\times\mathcal{P}(0,+\infty]\times\mathcal{P}(-\infty,0]$ with the metric

\[
\mathrm{dist}_t\left((\rho_\pm,\rho_\oplus,\rho_\ominus),(\rho'_\pm,\rho'_\oplus,\rho'_\ominus)\right)
=(1-e^{-t/2})\cdot W_1(\rho_\pm,\rho'_\pm)+e^{-t/2}\cdot W_1(\rho_\oplus,\rho'_\oplus)+e^{-t/2}\cdot W_1(\rho_\ominus,\rho'_\ominus).
\tag{2.22}
\]
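For intuition, the metric (2.22) is easy to evaluate on empirical distributions. The sketch below assumes that each law is given by a list of samples and that the two lists being compared have the same length, in which case the $W_1$-distance on the real line is the average absolute difference of the sorted samples. The function names `w1` and `dist_t` are hypothetical.

```python
import math


def w1(xs, ys):
    """W1 distance between two empirical laws with equally many atoms."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal length")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def dist_t(t, rho, rho_prime):
    """Weighted metric (2.22); rho = (mixed, plus, minus) lists of samples."""
    w = math.exp(-t / 2.0)
    weights = (1.0 - w, w, w)
    return sum(c * w1(a, b) for c, a, b in zip(weights, rho, rho_prime))


if __name__ == "__main__":
    rho = ([0.0, 1.0], [0.5, 2.0], [-1.0, -0.5])
    rho_prime = ([0.2, 1.1], [0.4, 2.5], [-1.5, -0.5])
    print(dist_t(1.0, rho, rho_prime))
```

The weights show the design choice behind (2.22): for large $t$ the pure types $\oplus,\ominus$ are discounted by $e^{-t/2}$, while the mixed type carries almost full weight.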

The following proposition summarises the main step towards the proof of Theorem 1.2.

Proposition 2.15. For every $d<d_{\mathrm{con}}(k)$, the operator $\mathrm{LL}^\star_{d,k}$ is a contraction with respect to the metric $\mathrm{dist}_d$.

The second key difference between [56] and the present approach will emerge in the proof of Proposition 2.15 itself. As we are about to see, leveraging the four variable types enables us to derive a sharper bound on the derivative of our operator $\mathrm{LL}^\star_{d,k}$. This comes in the form of a subtle combinatorial coupling between variable types among clauses with opposite signs. To explain this, we recall that $\mathrm{LL}^\star_{d,k}$ describes how the laws $\rho_\pm$, $\rho_\oplus$ and $\rho_\ominus$ of the log-likelihood ratios evolve given the corresponding laws of the variables one generation below. Recall also that we are always considering the positive boundary condition, i.e., the one maximising the value of each log-likelihood ratio.

Let us write $\rho=(\rho_\pm,\rho_\oplus,\rho_\ominus)$, $\rho'=(\rho'_\pm,\rho'_\oplus,\rho'_\ominus)$, and $\hat\rho,\hat\rho'$ for their corresponding images under the operator $\mathrm{LL}^\star_{d,k}$. We wish to establish that $\mathrm{dist}_d(\hat\rho,\hat\rho')<c\cdot\mathrm{dist}_d(\rho,\rho')$ for some constant $c=c(d,k)<1$. We call a clause $a$ positive if it contains its parent variable as a direct literal; otherwise, we call $a$ negative. The change between the output distributions $\hat\rho,\hat\rho'$ describing the log-likelihood law of, say, variable $x$, comes from two sources: the positive and the negative children of $x$. Observe that there is no obvious symmetry between the two, as we have imposed the positive boundary condition, and therefore the influence of positive clauses is typically more pronounced. In turn, the change caused by each clause can be further attributed to that of the $k-1$ grandchildren variables it features. To be more precise, let us consider the contribution of a single positive clause $a$. Let us write $\boldsymbol{r}=(\boldsymbol{r}_\pm,\boldsymbol{r}_\oplus,\boldsymbol{r}_\ominus,\boldsymbol{r}_\#)$ for the type-distribution of the children variables of $a$, where $\boldsymbol{r}$ follows the law described in (2.18). Consider also an arbitrary enumeration of the variables of each type $t\in\{\pm,\oplus,\ominus\}$, and write $D^t_i(z,\boldsymbol{r};+1)$ for the magnitude of the partial derivative of the message clause $a$ sends to $x$, with respect to the message clause $a$ receives from its $i$-th variable of type $t$. Then the expected contribution of clause $a$ to the distance $\mathrm{dist}(\hat\rho,\hat\rho')$ is bounded in terms of the $D^t_i(z,\boldsymbol{r};+1)$ as follows:

\[
\mathbb{E}\left[\sum_{i=1}^{\boldsymbol{r}_\pm}\left|\int_{\eta_{\pm,i}}^{\eta'_{\pm,i}}D^{\pm}_i(w_i,\boldsymbol{r};+1)\,\mathrm{d}w_i\right|
+\sum_{j=1}^{\boldsymbol{r}_\oplus}\left|\int_{\eta_{\oplus,j}}^{\eta'_{\oplus,j}}D^{\oplus}_j(y_j,\boldsymbol{r};+1)\,\mathrm{d}y_j\right|
+\sum_{\ell=1}^{\boldsymbol{r}_\ominus}\left|\int_{\eta_{\ominus,\ell}}^{\eta'_{\ominus,\ell}}D^{\ominus}_\ell(z_\ell,\boldsymbol{r};+1)\,\mathrm{d}z_\ell\right|\right],
\tag{2.23}
\]

where $\eta_{t,i},\eta'_{t,i}$ follow the law of $\rho_t,\rho'_t$, respectively. Expanding the expectation with respect to the type-distribution $\boldsymbol{r}$, and writing $P(r)=\mathbb{P}[\boldsymbol{r}=r]$ for the probability of a vector $r=(r_\pm,r_\oplus,r_\ominus,r_\#)$, we rewrite (2.23) as

\[
\sum_{r}P(r)\left(r_\pm\cdot\mathbb{E}\left|\int_{\eta_{\pm,1}}^{\eta'_{\pm,1}}D^{\pm}_1(z,r;+1)\,\mathrm{d}z\right|
+r_\oplus\cdot\mathbb{E}\left|\int_{\eta_{\oplus,1}}^{\eta'_{\oplus,1}}D^{\oplus}_1(z,r;+1)\,\mathrm{d}z\right|
+r_\ominus\cdot\mathbb{E}\left|\int_{\eta_{\ominus,1}}^{\eta'_{\ominus,1}}D^{\ominus}_1(z,r;+1)\,\mathrm{d}z\right|\right).
\tag{2.24}
\]

FIGURE 2. Example of a coupling between derivative terms in (2.24)–(2.25). For vector $r$ and type $t\in\{\pm,\oplus,\ominus\}$, we pair the term $D^t(z,r;+1)$ in (2.24) with the term $D^t(z,p_t(r);-1)$ in (2.25).

The expected contribution of a negative clause is given by an expression similar to (2.24), albeit in terms of $D^t_1(z,r;-1)$, i.e., the partial derivative of the message $a\to x$ with respect to the message from a variable of type $t$ to clause $a$. Specifically, the expected contribution of a negative clause reads:

\[
\sum_{r'}P(r')\left(r'_\pm\cdot\mathbb{E}\left|\int_{\eta_{\pm,1}}^{\eta'_{\pm,1}}D^{\pm}_1(z,r';-1)\,\mathrm{d}z\right|
+r'_\oplus\cdot\mathbb{E}\left|\int_{\eta_{\oplus,1}}^{\eta'_{\oplus,1}}D^{\oplus}_1(z,r';-1)\,\mathrm{d}z\right|
+r'_\ominus\cdot\mathbb{E}\left|\int_{\eta_{\ominus,1}}^{\eta'_{\ominus,1}}D^{\ominus}_1(z,r';-1)\,\mathrm{d}z\right|\right).
\tag{2.25}
\]

It is not hard to see that pure literals have a ‘dampening’ effect on each partial derivative $D^t_1(z,r;\pm1)$. Consider a clause $a$ whose children variables are distributed among the different types according to $r$. Then each derivative in (2.24)–(2.25) can be bounded in terms of the number of pure literals featured in $a$, excluding the variable of type $t$ with respect to which the derivative is taken. Notice that the operator $\mathrm{LL}^\star_{d,k}$ effectively incorporates the positive boundary condition by imposing the sign of each variable with respect to its parent clause $a$ to be $+$ if $a$ is positive, and $-$ if $a$ is negative. With that in mind, we see that if $a$ is a positive clause, the total number of pure literals it contains is just $r_\ominus+r_\#$. On the other hand, if $a$ is negative, then the total number of pure literals it contains is $r_\oplus+r_\#$. Bounding each derivative $D^t_i(z,r;\pm1)$ in (2.24)–(2.25) separately and invoking the mean value theorem yields an upper bound on the contraction constant $c$.

However, we can do better by partitioning the derivatives in (2.24)–(2.25) into groups and optimising them jointly. Indeed, a careful examination of the expression (2.19) reveals that, for example, any sum of the form $D^\oplus(z,(*,*,r_\ominus,r_\#);+1)+D^\oplus(z,(*,r_\ominus+1,*,r_\#);-1)$ can be explicitly maximised, and the resulting maximum is smaller than the sum of the maxima of the parts. At first sight, this seems to be of little use, if any, since in order to implement such a coupling between the terms of (2.24)–(2.25) we should also match their coefficients; that is, the quantity $P(r)\cdot r_\oplus$ must remain invariant under the coupling. Somewhat unexpectedly, it turns out that the coupling $r\mapsto r'$ with $r'=(r_\pm,r_\ominus+1,r_\oplus-1,r_\#)$ enjoys both features. Similar coupling strategies (depicted in Figure 2) facilitate the maximisation of the $\oplus,\ominus$-terms. The full proof of Proposition 2.15 can be found in Section 7.
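The coefficient matching claimed above can be checked numerically. The following Python sketch reads 'invariant' as the identity $P(r)\cdot r_\oplus=P(r')\cdot r'_\oplus$ for $r'=(r_\pm,r_\ominus+1,r_\oplus-1,r_\#)$, which is our interpretation of the statement, and evaluates both sides for the multinomial law with the probabilities (2.18). The function names are hypothetical.

```python
import math
from itertools import product


def multinomial_pmf(r, probs):
    """Probability of the count vector r under a multinomial with sum(r) trials."""
    coeff = math.factorial(sum(r))
    for count in r:
        coeff //= math.factorial(count)
    value = float(coeff)
    for count, p in zip(r, probs):
        value *= p ** count
    return value


def coupling_defect(d, k):
    """Largest |P(r) r_plus - P(r') r'_plus| over all r with r_plus >= 1."""
    q = math.exp(-d / 2.0)
    probs = ((1 - q) ** 2, q * (1 - q), q * (1 - q), q * q)
    worst = 0.0
    for r in product(range(k), repeat=4):
        if sum(r) != k - 1 or r[1] == 0:
            continue
        r_mixed, r_plus, r_minus, r_leaf = r
        r_prime = (r_mixed, r_minus + 1, r_plus - 1, r_leaf)
        lhs = multinomial_pmf(r, probs) * r_plus
        rhs = multinomial_pmf(r_prime, probs) * r_prime[1]
        worst = max(worst, abs(lhs - rhs))
    return worst


if __name__ == "__main__":
    print(coupling_defect(3.0, 4))
```

The defect vanishes up to rounding because $p_\oplus=p_\ominus$ and $r_\oplus/(r_\oplus!\,r_\ominus!)=(r_\ominus+1)/((r_\ominus+1)!\,(r_\oplus-1)!)$.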

We conclude the section explaining how Theorem 1.2 follows from the above.

Proof of Theorem 1.2. From the triangle inequality, and Lemma 2.12, it is immediate to obtain Theorem 1.2 from
Proposition 2.13. □



2.6. Discussion. The location of the random 2-SAT satisfiability threshold was pinpointed already in the 1990s [21,
40] essentially because the threshold coincides with the giant component phase transition of a directed random
graph whose edges correspond to the clauses. This argument also implies that both the pure literal algorithm
and another efficient algorithm called unit clause propagation find satisfying assignments up to the satisfiability
threshold w.h.p. By contrast, in the case of random k-SAT with k ≥ 3 the satisfiability threshold is known only for k
exceeding an undetermined (but large) constant k0 [33]. The proof is based on a sophisticated, physics-inspired
second moment argument that significantly extends ideas from earlier work [5, 7, 27]. Asymptotically in the limit of large $k$ the satisfiability threshold reads

\[
d_{\mathrm{sat}}(k)=k\left[2^k\log2-\frac{1+\log2}{2}\right]+\varepsilon_k,\qquad\text{where }\lim_{k\to\infty}\varepsilon_k=0.
\tag{2.26}
\]

Even though [5, 7, 33, 27] rely on the second moment method, they do not yield asymptotically tight estimates of
the number of satisfying assignments for any regime of d . This is because the second moment method is applied
not to the number of satisfying assignments, but to another, exponentially smaller random variable. The assump-
tion that k exceeds a large constant is used critically in [27, 33] to ensure certain concentration and expansion
properties.

For 3 ≤ k < k0 even the existence of a uniform satisfiability threshold remains an open problem, although a
sharp threshold sequence that may vary with n is known to exist [36]. That said, an upper bound on the satisfiabil-
ity threshold (sequence) that matches the so-called ‘1-step replica symmetry breaking’ prediction from statistical
physics can be verified using the interpolation method from mathematical physics [35, 49, 51, 60]. However, the
currently known lower bounds for small k (say, k = 3,4,5) fall short of this upper bound [7, 42, 46]. For exam-
ple, in the case k = 3 the best current lower bound is dsat(3) ≥ 10.56, while dsat(3) ≈ 12.801 according to physics
predictions [49, 51].

Thus, the satisfiability of random formulas continues to pose a substantial challenge for ‘small’ 3 ≤ k < k0. In
light of this, a particularly satisfactory aspect of the present results is that they apply and are meaningful for all
k ≥ 3. In fact, comparing the asymptotic bounds (1.10) and (2.26), we see that the Gibbs uniqueness threshold
duniq(k) is much smaller than dsat(k) for large k . Thus, Theorems 1.1 and 1.2 cover larger shares of the satisfiable
regime of d for smaller values of k; cf. Table 1.

The best current lower bounds on the satisfiability thresholds for k ≥ 4 are non-constructive. With respect to
the algorithmic problem of finding a satisfying assignment of a random k-CNF the best current results for ‘small’
k are based on simple combinatorial algorithms, analysed via the method of differential equations [37, 42, 46].
Asymptotically for large $k$ the best known efficient algorithm [22] succeeds up to

\[
d_{\mathrm{alg}}(k)=(1-\varepsilon_k)2^k\log k,\qquad\text{where }\lim_{k\to\infty}\varepsilon_k=0,
\tag{2.27}
\]

about a factor of $\log(k)/k$ below (2.26). There is evidence that certain types of algorithms do not succeed for much larger values of $d$, at least for large enough $k$ [2, 15, 23, 44]. Apart from the task of finding a satisfying assignment, an important line of work deals with the problem of counting and sampling satisfying assignments of random $k$-CNFs for large $k$ [19, 20, 43]. The best current result [20] covers the regime $d\le2^k/k^c$ for an undetermined (large enough) constant $c>0$. Since for large $k$ the bound $2^k/k^c$ significantly exceeds the pure literal threshold (1.10), it might be an interesting question whether ideas from [19, 20, 43] can be used to verify the replica symmetric solution (1.7) for $d$ beyond the Gibbs uniqueness threshold for large $k$.
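The gap between (2.26) and (2.27) can be made tangible by evaluating the leading terms. The sketch below drops the error terms $\varepsilon_k$ altogether, which is an assumption made purely for illustration, and compares the ratio of the two leading terms with $\log k/(k\log2)$. The function names are hypothetical.

```python
import math


def d_sat_leading(k):
    """Leading part of (2.26), with the error term eps_k dropped."""
    return k * (2 ** k * math.log(2) - (1 + math.log(2)) / 2)


def d_alg_leading(k):
    """Leading part of (2.27), with the factor (1 - eps_k) dropped."""
    return 2 ** k * math.log(k)


if __name__ == "__main__":
    for k in (10, 20, 40):
        ratio = d_alg_leading(k) / d_sat_leading(k)
        print(k, ratio, math.log(k) / (k * math.log(2)))
```

The two printed columns agree closely, which is the sense in which the algorithmic bound lies about a factor of $\log(k)/k$ below the satisfiability threshold.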

Most of the prior work on the rigorous verification of the replica symmetric solution focuses on a soft version of random $k$-SAT, the so-called random $k$-SAT model at inverse temperature $\beta>0$ [57]. The partition function $Z_\beta(\Phi)$ of this model, the key quantity of interest, is defined as follows. For a clause $a$ of the random formula $\Phi$ and a truth assignment $\sigma$ write $\sigma\models a$ if $\sigma$ satisfies clause $a$. Then

\[
Z_\beta(\Phi)=\sum_{\sigma\in\{\pm1\}^n}\exp\left(-\beta\sum_{i=1}^{m}\mathbb{1}\{\sigma\not\models a_i\}\right).
\tag{2.28}
\]

Thus, each assignment $\sigma$ contributes a summand equal to $\exp(-\beta)$ raised to the power of the number of clauses that $\sigma$ fails to satisfy. In effect,

\[
Z(\Phi)=\lim_{\beta\to\infty}Z_\beta(\Phi).
\tag{2.29}
\]
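On a toy formula the definitions (2.28)–(2.29) can be evaluated by brute force. The following Python sketch does so; the encoding of clauses as lists of signed integers and the function names are illustrative assumptions, and the enumeration over all $2^n$ assignments is feasible only for very small $n$.

```python
import math
from itertools import product


def satisfies(sigma, clause):
    """True if sigma (a tuple of +1/-1 values) satisfies the clause.

    A literal v > 0 stands for x_v, a literal v < 0 for the negation of x_|v|.
    """
    return any(sigma[abs(lit) - 1] == (1 if lit > 0 else -1) for lit in clause)


def z_beta(clauses, n, beta):
    """Partition function (2.28) by exhaustive enumeration."""
    total = 0.0
    for sigma in product((-1, 1), repeat=n):
        violated = sum(not satisfies(sigma, c) for c in clauses)
        total += math.exp(-beta * violated)
    return total


def z_hard(clauses, n):
    """Number of satisfying assignments, the beta -> infinity limit (2.29)."""
    return sum(all(satisfies(s, c) for c in clauses)
               for s in product((-1, 1), repeat=n))


if __name__ == "__main__":
    formula = [[1, 2, 3], [-1, 2, -3]]
    for beta in (0.0, 1.0, 5.0, 50.0):
        print(beta, z_beta(formula, 3, beta))
    print(z_hard(formula, 3))
```

For this formula each clause is violated by exactly one assignment and the two violating assignments differ, so $Z_\beta=6+2e^{-\beta}$, which decreases from $8$ at $\beta=0$ to $Z=6$.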



A line of prior work [13, 59, 62] deals with the derivation of the ‘thermodynamic limit’

\[
\lim_{n\to\infty}\frac1n\mathbb{E}\left[\log Z_\beta(\Phi)\right]
\tag{2.30}
\]

for small $d$ and/or small $\beta$. Specifically, these works verify that (2.30) is given by the replica symmetric solution at inverse temperature $\beta$ from [54, 55] under the assumption

\[
d(k-1)\min\left\{1,6\beta\exp(4\beta)\right\}<1.
\tag{2.31}
\]

We observe that for large β the bound (2.31) holds only up to the giant component threshold d = 1/(k −1), where
the replica symmetric solution trivially follows from the fact that Belief Propagation is exact on acyclic graphical
models [50, Theorem 4.1]. That said, a technique called the interpolation method shows that the replica symmetric
solution yields an upper bound on (2.30) for all d ,β > 0 [35, 60]. In particular, we will combine the interpolation
method with a concentration argument in order to prove Corollary 2.2.

According to physics predictions the ‘replica symmetric solution’ from [54, 55] yields the correct value of both $\lim_{n\to\infty}n^{-1}\log Z_\beta(\Phi)$ for all $\beta>0$ and of $\lim_{n\to\infty}n^{-1}\log Z(\Phi)$ for all $d$ up to a threshold $d_{\mathrm{rsb}}(k)$ close to but strictly below the satisfiability threshold $d_{\mathrm{sat}}(k)$ for all $k\ge3$ [47]. The threshold $d_{\mathrm{rsb}}(k)$ is known as the ‘1-step replica symmetry breaking phase transition’ in physics jargon; its asymptotic value is predicted as

\[
d_{\mathrm{rsb}}(k)=k\left[2^k\log2-2\log2\right]+\varepsilon_k,\qquad\text{where }\lim_{k\to\infty}\varepsilon_k=0.
\tag{2.32}
\]

Indeed, the interpolation method can be used to verify that the replica symmetric solution ceases to be correct
for drsb(k)+εk < d < dsat(k) with εk → 0. Conversely, the replica symmetric solution is known to be correct for all
d and β > 0 where a certain correlation decay condition is satisfied [26], provided that k is large enough. Physics
methods predict that this condition holds for all β> 0 and all d < drsb(k) [47].

The aforementioned work of Montanari and Shah [56] also deals with the soft variant of random k-SAT (2.28),
but allows for an inverse temperature $\beta=\beta(n)$ that tends to infinity slowly as $n\to\infty$. Specifically, considering a small power $\beta=n^\delta$ enables Montanari and Shah to estimate the number of assignments that satisfy all but $o(n)$ clauses. The proof combines an interpolation on $0\le\beta\le n^\delta$ with a contraction argument that improves over the
we use the Aizenman-Sims-Starr scheme. Because we count actual satisfying assignments, this requires the care-
ful combinatorial analysis of tail events, which is where the PULP algorithm and its analysis come in. Additionally,
towards the proof of Theorem 1.2 we devise an improved version of the contraction argument from [56]. Following
Montanari and Shah, we also take advantage of the impact of pure literals on the Belief Propagation operator. But
we develop an improved coupling scheme that yields a better range of d for which contraction occurs. Addition-
ally, once again because we deal with actual satisfying assignments, the proof of the Gibbs uniqueness property
involves the analysis of the PULP algorithm on a Galton-Watson tree in order to cope with unlikely events.

By comparison to the ‘soft’ random $k$-SAT model (2.28), few prior contributions deal with the actual number $Z(\Phi)$ of satisfying assignments. A result of Abbe and Montanari [1] implies that a deterministic limit (in probability)

\[
\lim_{n\to\infty}\frac1n\log Z(\Phi)
\tag{2.33}
\]

exists for Lebesgue-almost all $0<d<d_{\mathrm{pure}}(k)$ for all $k\ge2$. However, the proof, which is based on the interpolation
method, does not reveal the value of (2.33). In fact, prior to the present work the limit (2.33) was known only in
two cases. First, in the trivial regime d < 1/(k −1) below the giant component threshold. Second, in the case k = 2
for 0 < d < dsat(2) = 2 [3]. In both cases the limit (2.33) coincides with the replica symmetric solution from [54].
Beyond the convergence in probability, log Z (Φ) is known to satisfy a central limit theorem in the case k = 2 [17].

To compute the limit (2.33) in the case k = 2 the contribution [3] employs the Aizenman-Sims-Starr scheme.
The couplings that we use towards the proofs of Propositions 2.5–2.6 generalise the argument from [3] to k ≥ 3. The
main technical novelty lies in the way that moderately unlikely events are treated. Specifically, in the case k = 2
the simple Unit Clause propagation algorithm, which essentially boils down to directed reachability, was sufficient
to derive a tail bound similar to (and actually stronger than) (2.5). By contrast, since in the case k ≥ 3 the clauses
“branch out”, the analysis of tail events and, accordingly, the derivation of (2.5) is far more delicate. The core of
this derivation is the detailed analysis of the PULP algorithm right up to dpure(k).

Finally, by contrast to random k-SAT the validity of the replica symmetric solution is known for the optimal pa-
rameter range for several other random constraint satisfaction problems that enjoy certain symmetry properties.



Examples include random graph colouring or random k-NAESAT [11, 25]. Due to the symmetry property (see footnote 7) the
replica symmetric solution simply coincides with the first moment of the number of solutions. In effect, in many
symmetric problems it is even possible to precisely determine the limiting distribution of the number of solutions,
which superconcentrates on the first moment [24]. By contrast, in random k-SAT the first moment overshoots
the typical number of satisfying assignments by an exponential factor [5], which is why random k-SAT is so much
more delicate than symmetric problems. That said, there is a regular variant of random k-SAT (where every vari-
able appears an equal number of times positively and negatively) where symmetry and superconcentration are
recovered [29].

2.7. Organisation. In the remaining sections we work our way through the proofs of Theorems 1.1 and 1.2. Specifically, in Section 3 we analyse the PULP algorithm introduced in Section 2.3, proving Lemmas 2.8–2.10, which facilitate many of the subsequent results.

In Section 4 we establish Proposition 2.1, verifying that the quantities appearing in Theorem 1.1 are well-
defined. The proof of Corollary 2.2 follows in Section 5. Section 6 is devoted to the Aizenman-Sims-Starr scheme,
and in particular the proof of Proposition 2.3. There we also complete the proof of Theorem 1.1.

Our final Section 7 deals with the remaining proofs toward establishing Theorem 1.2. We begin by proving
Lemma 2.14, showing that the log-likelihood ratios of the random Galton-Watson formula close to the root are
bounded w.h.p. This enables us to compare the output distribution of the non-random operator introduced in
Section 2.5 with that of actual ratios on the random tree. We then proceed with the proof of Proposition 2.15, and
conclude with the proof of (2.13), completing the proof of Theorem 1.2.

3. ANALYSIS OF PULP

This section is concerned with the analysis of PULP from Section 2.3. In particular, we prove Lemma 2.10. But let
us get the proof of Lemma 2.8 out of the way first.

3.1. Proof of Lemma 2.8. Suppose that $\bar{\mathcal{L}}\supseteq\mathcal{L}$ satisfies PULP1–PULP2. Let $U=\{|l|:l\in\bar{\mathcal{L}}\}$ be the set of variables underlying the literals $\bar{\mathcal{L}}$. Moreover, let $\chi:U\to\{\pm1\}$ be the truth assignment under which all literals $l\in\bar{\mathcal{L}}$ evaluate to ‘true’. Due to PULP2, the assignment $\chi$ is well defined. Moreover, since $\mathcal{L}\subseteq\bar{\mathcal{L}}$, under $\chi$ all literals $l\in\mathcal{L}$ evaluate to ‘true’. Hence, for a satisfying assignment $\sigma\in S(\Phi)$ define an assignment $\sigma'$ by letting

\[
\sigma'(x)=\mathbb{1}\{x\in U\}\chi(x)+\mathbb{1}\{x\notin U\}\sigma(x).
\]

Because $\bar{\mathcal{L}}$ satisfies condition PULP1, we have $\sigma'\in S(\Phi,\mathcal{L})$. Finally, because for a satisfying assignment $\tau'\in S(\Phi,\mathcal{L})$ there are no more than $2^{|U|}=2^{|\bar{\mathcal{L}}|}$ satisfying assignments $\tau\in S(\Phi)$ such that $\tau(x)=\tau'(x)$ for all $x\notin U$, we obtain the desired bound $Z(\Phi)\le2^{|\bar{\mathcal{L}}|}Z(\Phi,\mathcal{L})$.

3.2. Turning a tree to PULP. While the ultimate goal of this section is to study the PULP algorithm on the random formula $\Phi'$ to prove Lemma 2.10, a necessary preparation is to investigate the algorithm on the random Galton-Watson tree $T=T_{d,k}$. Of course, since $T$ may be infinite we should formally confine ourselves to the finite trees $T^{(\ell)}$ truncated at the $2\ell$-th level from the root $\mathfrak{x}$. Hence, recalling (2.6), we aim to estimate the height $h_{\mathfrak{x}}(s,T^{(\ell)})$ for finite $\ell$. That said, since these random variables are monotonically increasing in $\ell$, it makes sense to define

\[
h_{\mathfrak{x}}(s,T)=\lim_{\ell\to\infty}h_{\mathfrak{x}}(s,T^{(\ell)})\in[0,\infty].
\tag{3.1}
\]

We point out that for $d<d_{\mathrm{pure}}(k)$ the tails of $h_{\mathfrak{x}}(s,T)$ decay at a doubly exponential rate.

Lemma 3.1. For any $d<d_{\mathrm{pure}}(k)$ there exist $c_1=c_1(d,k),c_2=c_2(d,k)>0$ such that

\[
\mathbb{P}\left[h_{\mathfrak{x}}(\pm1,T)\ge h\right]\le c_1\cdot\exp\left(-\exp(c_2\cdot h)\right)\qquad\text{for every }h\ge1.
\]

Proof. By symmetry it suffices to consider $h_{\mathfrak{x}}(1,T)$. Thus, let $p_{h,\ell}=\mathbb{P}\left[h_{\mathfrak{x}}(1,T^{(\ell)})\ge h\right]$. All variables at distance $2\ell$ from $\mathfrak{x}$ are leaves and therefore pure in the tree $T^{(\ell)}$. Consequently, pure literal elimination removes all clauses of $T^{(\ell)}$ within at most $\ell$ rounds. Hence, $p_{h,\ell}=0$ for $h>\ell$. Furthermore, we claim that

\[
p_{h,\ell}=\phi_{d,k}(p_{h-1,\ell-1}),\qquad\text{where }\phi_{d,k}(z)=1-\exp\left(-\frac d2z^{k-1}\right)\qquad(1\le h\le\ell).
\tag{3.2}
\]

Footnote 7. A formal definition of ‘symmetry’ could be that the uniform distribution on spins is a fixed point of the Belief Propagation operator on the respective Galton-Watson tree.


Indeed, if $h_{\mathfrak{x}}(1,T^{(\ell)})\ge h\ge1$ then by (2.6) there exists a clause $a\in\partial_T\mathfrak{x}$ with $\mathrm{sign}(\mathfrak{x},a)=-1$ such that $h_a(T^{(\ell)}[\mathfrak{x}\mapsto1])\ge h-1$. In other words, pure literal elimination on the sub-tree $T^{(\ell)}[a]$ of $T^{(\ell)}$ rooted at clause $a$ and with variable $\mathfrak{x}$ removed takes at least $h-1$ rounds to remove clause $a$. Consequently, pure literal elimination on $T^{(\ell)}[a]$ takes at least $h-1$ rounds to remove any of the variables $x\in\partial_Ta\setminus\{\mathfrak{x}\}$. In other words, the sub-tree $T^{(\ell)}[x]$ comprising $x$ and its successors satisfies

\[
h_x(\mathrm{sign}(x,a),T^{(\ell)}[x])\ge h-1\qquad\text{for every }x\in\partial_Ta\setminus\{\mathfrak{x}\}.
\tag{3.3}
\]

But since $T$ is a Galton-Watson tree, the sub-tree $T^{(\ell)}[x]$ has the same distribution as the random tree $T^{(\ell-1)}$. Hence, (3.3) implies that for every $a\in\partial_T\mathfrak{x}$ with $\mathrm{sign}(\mathfrak{x},a)=-1$,

\[
\mathbb{P}\left[h_a(T^{(\ell)}[\mathfrak{x}\mapsto1])\ge h-1\mid T^{(1)}\right]=p^{k-1}_{h-1,\ell-1}.
\tag{3.4}
\]

Finally, the construction of $T$ ensures that the number of $a\in\partial_T\mathfrak{x}$ with $\mathrm{sign}(\mathfrak{x},a)=-1$ has distribution $\mathrm{Po}(d/2)$. Therefore, (3.4) shows that

\[
p_{h,\ell}=\sum_{i=0}^{\infty}\mathbb{P}\left[\mathrm{Po}(d/2)=i\right]\left(1-\left(1-p^{k-1}_{h-1,\ell-1}\right)^i\right)=1-\exp\left(-\frac d2p^{k-1}_{h-1,\ell-1}\right),
\]

which completes the proof of (3.2).

Since the sequences $(p_{h,\ell})_\ell$ are non-decreasing, the limits $p_h=\lim_{\ell\to\infty}p_{h,\ell}$ exist. Moreover, (3.2) shows that

\[
p_h=\phi_{d,k}(p_{h-1})\qquad(h\ge1).
\tag{3.5}
\]

Hence, recalling the definition (1.8) of $d_{\mathrm{pure}}(k)$, we find

\[
\left(\frac{p_{h+1}}{p_h}\right)^{k-1}=\left(\frac{\phi_{d,k}(p_h)}{p_h}\right)^{k-1}
=d\cdot\frac{\left(1-\exp\left(-dp_h^{k-1}/2\right)\right)^{k-1}}{dp_h^{k-1}}\le\frac{d}{d_{\mathrm{pure}}}<1.
\]

Consequently,

\[
\lim_{h\to\infty}p_h=0.
\tag{3.6}
\]

To complete the proof we expand $\phi_{d,k}(z)$ around $z=0$:

\[
\phi_{d,k}(z)=\frac d2z^{k-1}+O(z^{2k-2})\qquad\text{as }z\to0.
\tag{3.7}
\]

Thus, the function $\phi_{d,k}(z)$ is well approximated by a $(k-1)$-th power. Since $k\ge3$, combining (3.5)–(3.7), we conclude that for sufficiently large $h$ we have $p_h\le(d/2+1)p_{h-1}^{k-1}$. Consequently, $p_h\le c_1\cdot\exp\left(-\exp(c_2\cdot h)\right)$ for suitable $c_1=c_1(d,k)$ and $c_2=c_2(d,k)$. □
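The doubly exponential decay asserted in Lemma 3.1 is already visible when the recursion (3.5) is iterated numerically. The sketch below starts the iteration at $p_0=1$, which is an assumption reflecting that the height is always at least zero, and uses the sample values $d=3$, $k=3$, for which the iterates happen to collapse quickly; the code does not check whether a given $d$ lies below $d_{\mathrm{pure}}(k)$. The function names are hypothetical.

```python
import math


def phi(z, d, k):
    """phi_{d,k}(z) = 1 - exp(-(d/2) z^(k-1)), evaluated stably via expm1."""
    return -math.expm1(-(d / 2.0) * z ** (k - 1))


def height_tail(d, k, h_max):
    """Return [p_0, ..., p_{h_max}] for the recursion (3.5) with p_0 = 1."""
    ps = [1.0]
    for _ in range(h_max):
        ps.append(phi(ps[-1], d, k))
    return ps


if __name__ == "__main__":
    for h, p in enumerate(height_tail(3.0, 3, 13)):
        print(h, p)
```

Once $p_h$ is small, (3.7) gives $p_{h+1}\approx(d/2)p_h^{k-1}$, so $-\log p_h$ grows by a factor of roughly $k-1$ per step; this is the mechanism behind the bound $c_1\exp(-\exp(c_2h))$.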

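We remind ourselves that $\overline{\{\pm1\cdot\mathfrak{x}\}}_{T^{(\ell)}}$ signifies the output of PULP run on the formula $T^{(\ell)}$ with initial literal set $\{\pm1\cdot\mathfrak{x}\}$. We extend the definition of the closure to the (possibly infinite) tree $T$ by letting

\[
\overline{\{\pm1\cdot\mathfrak{x}\}}_{T}=\bigcap_{\ell_0\ge1}\bigcup_{\ell\ge\ell_0}\overline{\{\pm1\cdot\mathfrak{x}\}}_{T^{(\ell)}}.
\]

This definition ensures that if the height $h_{\mathfrak{x}}(\pm1,T)$ from (3.1) is finite, then

\[
\overline{\{\pm1\cdot\mathfrak{x}\}}_{T}=\overline{\{\pm1\cdot\mathfrak{x}\}}_{T^{(\ell)}}\qquad\text{for all }\ell\ge h_{\mathfrak{x}}(\pm1,T).
\]

In order to estimate the size of this set, we combine Lemma 3.1 with a crude bound on the total number of variable nodes of the Galton-Watson tree $T^{(\ell)}$. Recall that $V(T^{(\ell)})$ signifies the set of variable nodes of $T^{(\ell)}$.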

Lemma 3.2. Let $d>0$. For any $\ell\ge1$ and any $t>100(1+d(k-1))^2$ we have

\[
\mathbb{P}\left[|V(T^{(\ell)})|>t^\ell\right]\le\ell\exp(-t^{\ell/2}/4).
\]

Proof. Let $N_\ell=|V(T^{(\ell)})|$ for brevity, set $g=10(1+d(k-1))$ and notice that $t>g^2$. The construction of the Galton-Watson tree $T$ ensures that $N_0=1$ and that for $\ell\ge1$ given $N_{\ell-1}$ we have $N_\ell\sim(k-1)\cdot\mathrm{Po}(dN_{\ell-1})$. Therefore, Bennett’s inequality shows that

\[
\mathbb{P}\left[N_h>g^{h-\ell}t^\ell\mid N_{h-1}\le g^{h-1-\ell}t^\ell\right]\le\exp\left(-\frac{t^\ell}{4g^{\ell-h}}\right)\le\exp(-t^{\ell/2}/4)\qquad(1\le h\le\ell).
\tag{3.8}
\]

Furthermore, if $N_\ell>t^\ell$ then there exists $1\le h\le\ell$ such that $N_h>g^{h-\ell}t^\ell$ while $N_{h-1}\le g^{h-1-\ell}t^\ell$. Hence, combining (3.8) with the union bound completes the proof. □
Corollary 3.3. For any $d<d_{\mathrm{pure}}(k)$ there exists $c_3=c_3(d,k)>0$ such that

\[
\mathbb{P}\left[\left|\overline{\{\pm1\cdot\mathfrak{x}\}}_{T}\right|>t\right]\le c_3\exp(-t^{1/c_3})\qquad\text{for all }t>0.
\]

Proof. By symmetry it suffices to consider $N=|\overline{\{\mathfrak{x}\}}_{T}|$. Since Lemma 3.1 shows that $\mathbb{P}[h_{\mathfrak{x}}(1,T)<\infty]=1$, we may assume from now on that indeed $h_{\mathfrak{x}}(1,T)<\infty$. Moreover, picking $c_3=c_3(d,k)>0$ large enough, we may assume that $t>t_0$ for a large $t_0=t_0(d,k)$. Let $N_\ell=|V(T^{(\ell)})|$, $p_h=\mathbb{P}\left[h_{\mathfrak{x}}(1,T)=h\right]$ and $g=10(1+d(k-1))$. It is an immediate consequence of the way that PULP proceeds that for all $l\in\overline{\{\mathfrak{x}\}}_{T}$ we have $|l|\in V(T^{(h_{\mathfrak{x}}(1,T))})$. Hence, $N\le N_{h_{\mathfrak{x}}(1,T)}$. Therefore, by the law of total probability,

\[
\mathbb{P}\left[N>t\right]\le\sum_{h\ge0}S_h,\qquad\text{where }S_h=\mathbb{P}\left[h_{\mathfrak{x}}(1,T)=h\right]\cdot\mathbb{P}\left[N_h>t\mid h_{\mathfrak{x}}(1,T)=h\right].
\tag{3.9}
\]

Depending on the value of $t$ in relation to $h$, we use either Lemma 3.1 or Lemma 3.2 to bound $S_h$.

Case 1: $t_0<t\le g^{2h}$. Lemma 3.1 shows that for certain $c_1,c_2>0$ we have

\[
S_h\le\mathbb{P}\left[h_{\mathfrak{x}}(1,T)=h\right]\le c_1\exp(-\exp(c_2h))\le c_12^{-h}\exp(-t^{1/c_3}),
\tag{3.10}
\]

provided $c_3$ is chosen large enough.

Case 2: $t>g^{2h}$. We apply Lemma 3.2 to obtain

\[
S_h\le\mathbb{P}\left[N_h>t\right]\le h\exp(-\sqrt{t}/4)\le h2^{-h}\exp(-t^{1/3}),
\tag{3.11}
\]

provided that $t>t_0$ is sufficiently large.

Combining the bounds (3.9)–(3.11) completes the proof. □

Corollary 3.4. For any $d<d_{\mathrm{pure}}(k)$ we have $\mathbb{E}\left[\left|\overline{\{\pm1\cdot\mathfrak{x}\}}_{T}\right|^2\right]<\infty$.

Proof. This is an immediate consequence of Corollary 3.3. □
3.3. Proof of Lemma 2.10. Because the distribution of $\Phi'$ is invariant under variable permutations and inversions, we may assume the initial set $\mathcal{L}$ of literals passed to PULP is just $\mathcal{L}=\{x_1,\dots,x_L\}$ for an integer $L=\tilde O(1)$. For an integer $\ell\ge1$ let $\varphi'_{\ell,L}$ be the sub-formula of $\Phi'$ comprising all clauses and variables at distance at most $2\ell$ from $\mathcal{L}$. We recall that this formula has a bipartite graph representation $G(\varphi'_{\ell,L})$ with variable nodes $V(\varphi'_{\ell,L})$, clause nodes $F(\varphi'_{\ell,L})$ and edges $E(\varphi'_{\ell,L})$. The excess of $\varphi'_{\ell,L}$ is defined as

\[
X_{\ell,L}=|E(\varphi'_{\ell,L})|-|V(\varphi'_{\ell,L})|-|F(\varphi'_{\ell,L})|.
\]

Thus, $X_{\ell,L}=-L$ iff $G(\varphi'_{\ell,L})$ consists of $L$ acyclic components.
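The excess is straightforward to compute for a concrete formula. In the sketch below a formula is encoded as a list of clauses, each clause a list of signed integers; this encoding, the optional argument for variables that occur in no clause, and the function name `excess` are illustrative assumptions.

```python
def excess(clauses, extra_variables=()):
    """|E| - |V| - |F| for the bipartite variable/clause graph of a formula.

    `extra_variables` lists variable indices that occur in no clause but
    should still count as (isolated) variable nodes.
    """
    variables = {abs(lit) for clause in clauses for lit in clause}
    variables |= set(extra_variables)
    # one edge per distinct variable occurring in a clause
    edges = sum(len({abs(lit) for lit in clause}) for clause in clauses)
    return edges - len(variables) - len(clauses)


if __name__ == "__main__":
    print(excess([[1, 2, 3], [-3, 4, 5]]))              # a single tree
    print(excess([[1, 2, 3], [4, 5, 6]]))               # two disjoint trees
    print(excess([[1, 2, 3], [-3, 4, 5], [1, -4, 6]]))  # one cycle
```

A forest with $c$ components has exactly $c$ fewer edges than nodes, so its excess is $-c$; every additional edge that closes a cycle raises the excess by one.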

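Lemma 3.5. Let $d>0$, $c>0$ and assume that $L\le\log^cn$ and $\ell\le c\log\log n$. Then

\[
\mathbb{P}\left[X_{\ell,L}>-L\right]=\tilde O(n^{-1}),\qquad\mathbb{P}\left[X_{\ell,L}>1-L\right]=\tilde O(n^{-2}).
\tag{3.12}
\]

Furthermore, there exists $c_4=c_4(c,d,k)>0$ such that

\[
\mathbb{P}\left[|V(\varphi'_{\ell,L})|+|F(\varphi'_{\ell,L})|>\log^{c_4}n\right]=O(n^{-2}).
\tag{3.13}
\]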

Proof. We study breadth first search (‘BFS’) on the graph G(Φ′) from the start vertices L by means of a routine
deferred decisions argument. Throughout the execution of BFS each variable node is in one of three possible
states: unexplored, active, or finished.

Towards the proof of (3.13) we study a ‘parallel’ version of BFS. More precisely, let $\mathcal{A}_0=\mathcal{L}$ be the set of initially active variables, let $\mathcal{U}_0=\{x_1,\dots,x_n\}\setminus\mathcal{L}$ comprise the initially unexplored variables and let $\mathcal{F}_0=\emptyset$. Further, for $t\ge0$ define $\mathcal{A}_{t+1},\mathcal{U}_{t+1},\mathcal{F}_{t+1}$ as follows. If $\mathcal{A}_t=\emptyset$ then the process has stopped and we let $\mathcal{A}_{t+1}=\mathcal{A}_t=\emptyset$, $\mathcal{U}_{t+1}=\mathcal{U}_t$, $\mathcal{F}_{t+1}=\mathcal{F}_t$. Otherwise let $\mathcal{A}_{t+1}$ be the set of all variable nodes $y\in\mathcal{U}_t$ such that there exist an active variable node $x\in\mathcal{A}_t$ and a clause $a$ that contains $x$ and $y$; in symbols, $x,y\in\partial_{\Phi'}a$. Further, let $\mathcal{F}_{t+1}=\mathcal{F}_t\cup\mathcal{A}_t$ and $\mathcal{U}_{t+1}=\mathcal{U}_t\setminus\mathcal{A}_{t+1}$. The BFS exploration occurs ‘in parallel’ in the sense that all active vertices activate their previously unexplored second neighbours simultaneously.
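The parallel exploration just described can be sketched in a few lines of Python. The encoding of a formula as a list of clauses with signed integer literals and the function name `parallel_bfs` are illustrative assumptions; signs are irrelevant here because the exploration only depends on which variables share a clause.

```python
def parallel_bfs(clauses, start, rounds):
    """Return the active sets [A_0, A_1, ..., A_rounds] of the parallel BFS."""
    clause_vars = [frozenset(abs(lit) for lit in clause) for clause in clauses]
    all_vars = set().union(*clause_vars) | set(start)
    active = set(start)              # A_0
    unexplored = all_vars - active   # U_0
    finished = set()                 # F_0
    history = [set(active)]
    for _ in range(rounds):
        if active:
            reached = set()
            for members in clause_vars:
                if members & active:
                    # unexplored variables sharing a clause with an active one
                    reached |= members & unexplored
            finished |= active       # F_{t+1} = F_t united with A_t
            unexplored -= reached    # U_{t+1} = U_t minus A_{t+1}
            active = reached         # A_{t+1}
        history.append(set(active))
    return history


if __name__ == "__main__":
    formula = [[1, 2, 3], [-3, 4, 5], [5, -6, 7]]
    print(parallel_bfs(formula, {1}, 4))
```

Once the active set is empty the process is frozen, matching the convention $\mathcal{A}_{t+1}=\mathcal{A}_t=\emptyset$ above.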

Let $\mathfrak{F}_t$ be the σ-algebra generated by the first $t$ rounds of parallel exploring. Then the distribution of $|\mathcal{A}_{t+1}|$ given $\mathfrak{F}_t$ is stochastically dominated by a random variable with distribution $(k-1)\mathrm{Po}(d|\mathcal{A}_t|)$. This is because by the construction of the formula $\Phi'$ the total number of clauses containing a given variable node has distribution $\mathrm{Po}(d(1-(k-1)/n))$. Hence, for any $u>0$ we have

\[
\mathbb{P}\left[|\mathcal{A}_{t+1}|>u\mid\mathfrak{F}_t\right]\le\mathbb{P}\left[(k-1)\mathrm{Po}(d|\mathcal{A}_t|)>u\right].
\tag{3.14}
\]

To complete the proof of (3.13) we mimic the argument from the proof of Lemma 3.2. Thus, let u = logc4−3 n
for a large enough c4 = c4(c,d ,k) and set g = 10(1+d(k −1)). Since ℓ ≤ c loglogn, the bound (3.14) and Bennett’s
inequality show that

P
[
|At+1| > g t+1−ℓu | |At | ≤ g t−ℓu

]
≤ exp

(
− u

4gℓ−t+1

)
≤ exp(−pu/4) =O(n−3) (0 ≤ t < ℓ).

Hence, taking a union bound on $0 \le t < \ell$ and observing that $V(\varphi'_{\ell,L}) \subseteq A_0 \cup \cdots \cup A_\ell$, we obtain

\[ \mathbb{P}\left[|V(\varphi'_{\ell,L})| > u\log n\right] = \tilde{O}(n^{-3}). \tag{3.15} \]

Finally, another application of Bennett’s inequality demonstrates that with probability $1-O(n^{-2})$ no variable of $\Phi'$ appears in more than $\log n$ clauses. Thus, $|F(\varphi'_{\ell,L})| \le |V(\varphi'_{\ell,L})| \log n$. Hence, (3.15) implies (3.13).
We are left to establish (3.12). The way we set up the BFS process implies that there are only two ways in which excess edges can come about. First, there may be clauses $a$ with $\partial_{\Phi'} a \subseteq A_t \cup A_{t+1}$ such that $|\partial_{\Phi'} a \cap A_t| \ge 2$. Given that $|A_t \cup A_{t+1}| \le \log^{c_4} n$, the number of such $a$ with $|\partial_{\Phi'} a \cap A_t| = 2$ has distribution $\mathrm{Po}(\tilde{O}(1/n))$, and the number of $a$ with $|\partial_{\Phi'} a \cap A_t| > 2$ has distribution $\mathrm{Po}(\tilde{O}(1/n^2))$. The second possibility is that for a variable $x \in A_{t+1}$ there exist clauses $a, b \in \partial_{\Phi'} x$ with $\partial_{\Phi'} a, \partial_{\Phi'} b \subseteq A_t \cup A_{t+1}$. Once again the number of such clauses has distribution $\mathrm{Po}(\tilde{O}(1/n))$ given $|A_t \cup A_{t+1}| \le \log^{c_4} n$. Furthermore, excess inducing clauses occur independently at different rounds $t$ of the BFS process. Thus, (3.12) follows from (3.13). □

We proceed to derive bounds on |L̄ | = |L̄Φ′ | depending on the value of the excess. To deal with the case of
excess −L, let Λ = Θ(loglogn) and let (T[i ])i≥1 be a sequence of independent copies of the random tree T. In
the case that the excess XΛ,L equals −L, the bound on |L̄ | follows from the fact that the Galton-Watson tree T
captures the local structure of the graph G(Φ′) in combination with the bound from Corollary 3.3. More precisely,
the following is true.

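Lemma 3.6. For any $0 < d < d_{\mathrm{uniq}}(k)$ and $c > 0$ there exists $\zeta = \zeta(c,d,k) > 0$ such that with $\Lambda = \lceil \zeta\log\log n \rceil$ uniformly for all $1 \le L \le \log^c n$ and all $u > 0$ we have

\[ \mathbb{P}\left[\mathbb{1}\{X_{\Lambda,L} = -L\}\,|\bar{\mathcal{L}}| > u\right] \le \mathbb{P}\left[\sum_{i=1}^{L} |\{x\}_{\mathbf{T}[i]}| > u\right] + O(n^{-2}). \]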

Proof. We begin by coupling the random formula $\varphi'_{\ell,1}$ with the Galton-Watson tree $\mathbf{T}^{(\ell)}[1]$ for $0 \le \ell \le \Lambda$. The coupling operates in accordance with the iterations of the BFS process from the proof of Lemma 3.5. Under the coupling some of the variable and clause nodes of $\varphi'_{\ell,1}$ and of the tree $\mathbf{T}^{(\ell)}[1]$ are identical, but both $\mathbf{T}^{(\ell)}[1]$ and $\varphi'_{\ell,1}$ may contain additional clauses or variables. These additional clauses/variables result from excess edges of $G(\varphi'_{\ell,1})$, i.e., edges that close cycles or merge different components in the course of the BFS process.

For $\ell = 0$ we just identify the start variable $x_1$ with the root $x$ of the Galton-Watson tree $\mathbf{T}[1]$. Going from $\ell$ to $\ell+1$, we recall the sets $A_\ell, A_{\ell+1}$ from the proof of Lemma 3.5. For each variable $x \in A_\ell$ let $C_x$ be the set of clauses $a \in \partial_{\Phi'} x$ such that $|\partial_{\Phi'} a \cap A_{\ell+1}| = k-1$ and also such that none of the variables $y \in A_{\ell+1} \cap \partial_{\Phi'} a$ appear in another clause $b \ne a$ with $\partial_{\Phi'} b \subseteq A_\ell \cup A_{\ell+1}$. In other words, $C_x$ contains all clauses $a \in \partial_{\Phi'} x$ that do not induce excess edges. Let $d_x = |C_x|$ be the number of such clauses. As we pointed out in the proof of Lemma 3.5, $d_x$ is stochastically dominated by a $\mathrm{Po}(d)$ variable. Hence, there is a random variable $d'_x$ such that $d_x + d'_x \sim \mathrm{Po}(d)$.

For any variable $x \in A_\ell$ that is also a variable node of $\mathbf{T}^{(\ell)}[1]$ we add all clauses $a \in C_x$ and the $k-1$ variables $y \in \partial_{\Phi'} a \cap A_{\ell+1}$ to $\mathbf{T}^{(\ell+1)}[1]$. Additionally, $\mathbf{T}^{(\ell+1)}[1]$ contains $d'_x$ independent random clauses that contain $x$ and $k-1$ new variable nodes without a counterpart in $\varphi'_{\ell+1,1}$. Finally, to complete $\mathbf{T}^{(\ell+1)}[1]$ every variable $y$ of $\mathbf{T}^{(\ell)}[1]$ at distance precisely $2\ell$ from the root such that $y \notin V(\varphi'_{\ell,1})$ independently begets $\mathrm{Po}(d)$ offspring clause nodes, each containing $k-1$ new variable nodes that do not belong to $V(\varphi'_{\ell+1,1})$.

The coupling ensures that $\varphi'_{\Lambda,1}$ is a sub-formula of $\mathbf{T}^{(\Lambda)}[1]$ unless $X_{\Lambda,1} > -L$. The extension of this coupling to $\mathcal{L} = \{x_1,\ldots,x_L\}$ is straightforward. We simply perform BFS exploration from the start variables $x_1,\ldots,x_L$ one after the other. Given that $X_{\Lambda,L} = -L$, we thus couple the sub-formula of $\Phi'$ explored from each $x_i$ with $\mathbf{T}^{(\Lambda)}[i]$ for $1 \le i \le L$ such that $\varphi'_{\Lambda,L}$ is contained in the union of $\mathbf{T}^{(\Lambda)}[1],\ldots,\mathbf{T}^{(\Lambda)}[L]$. Finally, we obtain independent copies $\mathbf{T}[1],\ldots,\mathbf{T}[L]$ of the (possibly infinite) tree $\mathbf{T}$ by continuing the Galton-Watson processes $\mathbf{T}^{(\Lambda)}[i]$ independently for depths $\ell > \Lambda$.
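For intuition about the trees used in the coupling, here is a hedged Python sketch that samples only the generation sizes of a Galton-Watson tree in which every variable begets Po(d) clauses and every clause begets k − 1 variables. The helper names `poisson` and `sample_gw_levels` are illustrative, and the Poisson sampler is Knuth's elementary method, chosen to avoid external dependencies.

```python
import math
import random


def poisson(mean, rng):
    """Sample Po(mean) by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_gw_levels(d, k, depth, rng):
    """Numbers of variable nodes at distances 0, 2, 4, ..., 2*depth from the root."""
    levels = [1]  # the root variable
    for _ in range(depth):
        # every variable of the current generation begets Po(d) clauses ...
        clauses = sum(poisson(d, rng) for _ in range(levels[-1]))
        # ... and every clause contributes k - 1 fresh variable nodes
        levels.append((k - 1) * clauses)
    return levels


rng = random.Random(0)
levels = sample_gw_levels(d=1.5, k=3, depth=4, rng=rng)
empty_tree = sample_gw_levels(d=0.0, k=3, depth=3, rng=rng)
```

The expected size of generation t is (d(k − 1))^t, so the sampled tree may well be infinite when d(k − 1) > 1; the sketch truncates at a fixed depth, as T^(ℓ) does.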

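The remaining task is to compare $|\bar{\mathcal{L}}|$ with $\sum_{i=1}^{L} |\{x\}_{\mathbf{T}[i]}|$. If $X_{\Lambda,L} = -L$ and if $h_x(1,\mathbf{T}[i]) < \Lambda$ for all $1 \le i \le L$, then the coupling ensures that all clauses and variables of $\varphi'_{\Lambda,L}$ are contained in the disjoint union of the trees $\mathbf{T}[1],\ldots,\mathbf{T}[L]$, and thus $|\bar{\mathcal{L}}| \le \sum_{i=1}^{L} |\{x\}_{\mathbf{T}[i]}|$. Therefore, for any $u > 0$ we have

\[ \mathbb{P}\left[\mathbb{1}\left\{X_{\Lambda,L} = -L,\ \max_{1\le i\le L} h_x(1,\mathbf{T}[i]) < \Lambda\right\} |\bar{\mathcal{L}}| > u\right] \le \mathbb{P}\left[\sum_{i=1}^{L} |\{x\}_{\mathbf{T}[i]}| > u\right]. \tag{3.16} \]

Furthermore, since $\Lambda \ge \zeta\log\log n$ for a large $\zeta > 0$, Lemma 3.1 ensures that

\[ \mathbb{P}\left[\max_{1\le i\le L} h_x(1,\mathbf{T}[i]) \ge \Lambda\right] \le O(n^{-2}). \tag{3.17} \]

Combining (3.16) and (3.17) completes the proof. □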

For later reference we make a note of the following immediate consequence of the coupling from the proof of Lemma 3.6. For two rooted Boolean formulas $\varphi, \varphi'$ we write $\varphi \cong \varphi'$ if there is an isomorphism of $\varphi$ and $\varphi'$ that preserves the root variable. We consider the random formula $\varphi'_{\ell,1}$ rooted at $x_1$.

Corollary 3.7. For every $\ell \ge 0$ and any fixed tree $T$ we have

\[ \left|\mathbb{P}\left[\mathbf{T}^{(\ell)} \cong T\right] - \mathbb{P}\left[\varphi'_{\ell,1} \cong T\right]\right| = o(1). \]

From here on, we set Λ= ⌈c5 loglogn⌉ for a large enough c5 = c5(d ,k) > 0. We obtain the following bound on the
second moment of |L̄ | on the event that the excess equals −L.

Corollary 3.8. For any $0 < d < d_{\mathrm{uniq}}(k)$ and any $1 \le L \le \log^2 n$ we have $\mathbb{E}[\mathbb{1}\{X_{\Lambda,L} = -L\} \cdot |\bar{\mathcal{L}}|^2] = O(1)$.

Proof. Since |L̄ | ≤ 2n deterministically, this is an immediate consequence of Corollary 3.3 and Lemma 3.6. □

As a next step we deal with the case that the excess equals $1-L$. More precisely, with $c_6 = c_6(d,k) \gg c_5$ a large enough constant let $\Lambda^+ = \lceil c_6\log\log n \rceil$. We are going to bound $|\bar{\mathcal{L}}|$ on the event that $X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L$. The proof combines the bound on the probability of this event from Lemma 3.5 with a crude bound on $|\bar{\mathcal{L}}|$. To elaborate, since Lemma 3.5 shows that the event $X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L$ has probability $\tilde{O}(n^{-1})$, we can essentially get away with simply bounding $|\bar{\mathcal{L}}|$ by the total number of variables within a $2\Lambda^+$ radius around the start variables $\mathcal{L}$. Indeed, as Lemma 3.5 shows, this number of variables is very likely polylogarithmic in $n$. Working out the details, we obtain the following.

Lemma 3.9. Let $0 < d < d_{\mathrm{uniq}}(k)$ and let $1 \le L \le \log^2 n$. Then $\mathbb{E}\left[\mathbb{1}\{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}\,|\bar{\mathcal{L}}|^{3/2}\right] = o(1)$.

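Proof. Let $V^+ = V(\varphi'_{\Lambda,L}) \setminus V(\varphi'_{\Lambda-1,L})$ and obtain $\psi^-$ from $\Phi'$ by deleting all variables from $V(\varphi'_{\Lambda-1,L})$ and all clauses from $F(\varphi'_{\Lambda,L})$. Further, let $\Lambda^- = \Lambda^+ - \Lambda$ and let $\psi^+$ be the sub-formula of $\psi^-$ comprising all clauses and variables of $\psi^-$ with distance at most $2\Lambda^-$ from $V^+$ (see Figure 3 below). If $X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L$ then

\[ |V(\varphi'_{\Lambda,L})| + |F(\varphi'_{\Lambda,L})| - |E(\varphi'_{\Lambda,L})| = L-1, \tag{3.18} \]

\[ |V(\psi^+)| + |F(\psi^+)| - |E(\psi^+)| = |V^+|. \tag{3.19} \]

Moreover, Lemma 3.5 shows that for suitable $c'_5 = c'_5(d,k,c_5)$, $c'_6 = c'_6(d,k,c_6) > 0$ we have

\[ \mathbb{P}\left[|V(\varphi'_{\Lambda,L})| + |F(\varphi'_{\Lambda,L})| > \log^{c'_5} n\right] = O(n^{-2}), \tag{3.20} \]

\[ \mathbb{P}\left[|V(\varphi'_{\Lambda^+,L})| + |F(\varphi'_{\Lambda^+,L})| > \log^{c'_6} n\right] = O(n^{-2}). \tag{3.21} \]

Let $\hat{\mathcal{L}} \subseteq \bar{\mathcal{L}}$ be the set of literals $l$ that were added to the output set $\bar{\mathcal{L}}$ by Step 7 of PULP by way of clauses $a \in F(\varphi'_{\Lambda,L})$, i.e., at distance less than $2\Lambda$ from the initial set $\mathcal{L}$. Let $V^- = \{|l| : l \in \hat{\mathcal{L}}\} \cap V^+$ be the set of all variables at distance $2\Lambda$ from $\mathcal{L}$ in $\Phi'$ that underlie a literal from $\hat{\mathcal{L}}$. If $X_{\Lambda,L} = 1-L$, the variables and clauses at distance at most $2\Lambda$ from $\mathcal{L}$ do not cause PULP to run into a contradiction, because each clause contains $k \ge 3$ literals.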
Therefore, there does not exist a variable $x$ such that both $x$ and $\neg x$ belong to $\hat{\mathcal{L}}$. Hence, because the signs of $\Phi'$ are uniformly random and PULP proceeds in a BFS order, we may assume without loss of generality that $\hat{\mathcal{L}}$ contains positive literals only. Thus,

\[ V^- = \hat{\mathcal{L}} \cap V^+ \subseteq V^+. \tag{3.22} \]

FIGURE 3. A sketch depicting the subformulas $\psi^+$, $\psi^-$, $\varphi'_{\Lambda,L}$, and $\varphi'_{\Lambda^+,L}$ of $\Phi'$ constructed above.

We now apply Lemma 3.6 to the random formula $\psi^-$. Specifically, let $\bar{\mathcal{L}}^+$ be the output of PULP on the formula $\psi^-$ with the start set $\mathcal{L}^+$ comprised by the positive literals of $V^+$. Further, let $\mathcal{E}$ be the event that $|V(\varphi'_{\Lambda,L})| + |F(\varphi'_{\Lambda,L})| \le \log^{c'_5} n$. Let

\[ X^+ = |E(\psi^+)| - |V(\psi^+)| - |F(\psi^+)| \]

be the excess of $\psi^+$. Since $0 < d < d_{\mathrm{uniq}}(k)$, given $\mathcal{E}$ the formula $\psi^-$ has the same distribution as a random $k$-CNF with $n^- = n - O(\log^c n)$ variables and $m^- \sim \mathrm{Po}(d^- n^-/k)$ random clauses, with $0 < d^- = d + o(1) < d_{\mathrm{uniq}}(k)$. Hence, assuming that $c_6 = c_6(d,k) > c'_5$ is sufficiently large, Lemma 3.6 shows that

\[ \mathbb{P}\left[\mathbb{1}_{\mathcal{E} \cap \{X^+ = -|V^+|\}} \cdot |\bar{\mathcal{L}}^+| > u\right] \le \mathbb{P}\left[\sum_{1 \le i \le \log^{c'_5} n} |\{x\}_{\mathbf{T}[i]}| > u\right] + O(n^{-2}) \qquad (u > 0). \tag{3.23} \]

Combining (3.23) with Corollary 3.3, we conclude that for a large enough $c_7 = c_7(d,k) > c'_6$,

\[ \mathbb{P}\left[\mathbb{1}_{\mathcal{E} \cap \{X^+ = -|V^+|\}} \cdot |\bar{\mathcal{L}}^+| > \log^{c_7} n\right] = O(n^{-2}). \tag{3.24} \]

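In light of the above, we see that

\begin{align*}
\mathbb{E}\left[\mathbb{1}\{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}\,|\bar{\mathcal{L}}|^{3/2}\right]
&\le \mathbb{E}\left[\mathbb{1}_{\mathcal{E} \cap \{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}}\,|\bar{\mathcal{L}}|^{3/2}\right] + (2n)^{3/2}(1-\mathbb{P}[\mathcal{E}]) && [\text{since } |\bar{\mathcal{L}}| \le 2n] \\
&\le \mathbb{E}\left[\mathbb{1}_{\mathcal{E} \cap \{X_{\Lambda,L} = X_{\Lambda^+,L} = 1-L\}}\,(|\bar{\mathcal{L}}^+| + \log^{c'_5} n)^{3/2}\right] + o(1) && [\text{from (3.20)}] \\
&\le \mathbb{E}\left[\mathbb{1}_{\mathcal{E} \cap \{X^+ = -|V^+|\}}\,(|\bar{\mathcal{L}}^+| + \log^{c'_5} n)^{3/2}\right] + o(1) && [\text{from (3.18), (3.19)}] \\
&\le \mathbb{P}\left[\mathbb{1}_{\mathcal{E} \cap \{X^+ = -|V^+|\}} \cdot |\bar{\mathcal{L}}^+| > \log^{c_7} n\right](2n)^{3/2} \\
&\qquad + \mathbb{P}\left[X_{\Lambda,L} = 1-L\right](2\log^{c_7} n)^{3/2} + o(1) && [\text{total probability}] \\
&= o(1) && [\text{from (3.24), Lemma 3.5}]
\end{align*}

completing the proof. □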

Proof of Lemma 2.10. The assertion is immediate from Corollary 3.8, Lemma 3.9, Lemma 3.5 and the deterministic
bound |L̄ | ≤ 2n. □

4. PROOF OF PROPOSITION 2.1

Let $\pi^{(\ell)}_{d,k} = \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ be the distribution obtained after $\ell$ iterations of $\mathrm{BP}_{d,k}(\cdot)$, with the convention $\pi^{(0)}_{d,k} = \delta_{1/2}$. We recall that $(\mu_{\pi,i,j})_{i,j \ge 1}$ signify independent random variables with distribution $\pi$.

Fact 4.1. For all $\ell \ge 0$ the random variables $\mu_{\pi^{(\ell)}_{d,k},1,1}$ and $1-\mu_{\pi^{(\ell)}_{d,k},1,1}$ are identically distributed.

Proof. This is an immediate consequence of the fact that the random variables $d^+, d^-$ from the definition (1.3)–(1.4) are identically distributed. □
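The iterates π^(ℓ) can be approximated numerically by population dynamics. The sketch below is an illustration only: it assumes that d^+ and d^- are independent Po(d/2) variables (the definition (1.3)–(1.4) is not restated in this section, so this is an assumption), it uses the update rule that appears in the proof of Fact 4.2, and all function names are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Sample Po(mean) by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def bp_step(population, d, k, rng):
    """One population-dynamics sweep approximating pi -> BP_{d,k}(pi)."""

    def clause_factor():
        # 1 - prod_{j=1}^{k-1} mu_j with mu_j drawn from the current population
        prod = 1.0
        for _ in range(k - 1):
            prod *= rng.choice(population)
        return 1.0 - prod

    new_population = []
    for _ in range(len(population)):
        d_minus = poisson(d / 2, rng)  # assumed law of d^-
        d_plus = poisson(d / 2, rng)   # assumed law of d^+
        numerator = math.prod(clause_factor() for _ in range(d_minus))
        denominator = numerator + math.prod(clause_factor() for _ in range(d_plus))
        new_population.append(numerator / denominator)
    return new_population


def iterate_bp(d, k, rounds, size=1000, seed=0):
    """Approximate pi^(rounds) by a population started from the point mass at 1/2."""
    rng = random.Random(seed)
    population = [0.5] * size
    for _ in range(rounds):
        population = bp_step(population, d, k, rng)
    return population


sample = iterate_bp(d=2.0, k=3, rounds=3, size=200, seed=1)
trivial = iterate_bp(d=0.0, k=3, rounds=2, size=5)
```

In line with Fact 4.1, a histogram of `sample` should look roughly symmetric about 1/2, although the sketch makes no claim about the rate at which the population approximates π^(ℓ).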

While the following is a direct consequence of the fact that Belief Propagation is ‘exact on trees’ (see [50, Chapter 14] for precise statements), we carry out a detailed proof for the sake of completeness. Following the conventions from Section 1.2.1, we continue to denote by $\tau^{(\ell)}$ a random satisfying assignment of the $k$-CNF $\mathbf{T}^{(\ell)} = \mathbf{T}^{(\ell)}_{d,k}$.

Fact 4.2. For all $\ell \ge 0$, $d > 0$ we have $\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}\right] \sim \pi^{(\ell)}_{d,k}$.

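Proof. We proceed by induction on $\ell$. As $\pi^{(0)}_{d,k} = \delta_{1/2}$, for $\ell = 0$ there is nothing to show. To go from $\ell-1$ to $\ell \ge 1$, for a clause $a \in \partial_{\mathbf{T}} x$ and a variable $y \in \partial_{\mathbf{T}} a \setminus \{x\}$ let $\mathbf{T}_{y \to a}$ be the component of the forest $\mathbf{T} - a$ obtained by removing clause $a$ that contains variable $y$. We consider $y$ the root of $\mathbf{T}_{y \to a}$. Further, obtain $\mathbf{T}^{(\ell-1)}_{y \to a}$ from $\mathbf{T}_{y \to a}$ by deleting all clauses and variables at a distance greater than $2(\ell-1)$ from $y$. Additionally, for $s \in \{\pm 1\}$ let

\[ Z^{(\ell)}(s) = \left|\left\{\sigma \in S(\mathbf{T}^{(\ell)}) : \sigma(x) = s\right\}\right|, \qquad Z^{(\ell-1)}_{y \to a}(s) = \left|\left\{\sigma \in S(\mathbf{T}^{(\ell-1)}_{y \to a}) : \sigma(y) = s\right\}\right|. \tag{4.1} \]

In words, $Z^{(\ell)}(s)$ is the number of satisfying assignments of $\mathbf{T}^{(\ell)}$ that set the root $x$ to $s$, and $Z^{(\ell-1)}_{y \to a}(s)$ is the corresponding quantity for the sub-tree $\mathbf{T}^{(\ell-1)}_{y \to a}$.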

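Clearly, setting $x$ to $s \in \{\pm 1\}$ immediately satisfies all clauses $a \in \partial^{s}_{\mathbf{T}} x$. By contrast, once $x$ is assigned the value $s$ each clause $a \in \partial^{-s}_{\mathbf{T}} x$ needs to be satisfied by setting some other variable $y \in \partial_{\mathbf{T}} a \setminus \{x\}$ to the value $\mathrm{sign}(y,a)$. Hence,

\[ Z^{(\ell)}(s) = \left[\prod_{a \in \partial^{s}_{\mathbf{T}} x}\ \prod_{y \in \partial_{\mathbf{T}} a \setminus \{x\}}\ \sum_{t \in \{\pm 1\}} Z^{(\ell-1)}_{y \to a}(t)\right] \cdot \left[\prod_{a \in \partial^{-s}_{\mathbf{T}} x} \left(\prod_{y \in \partial_{\mathbf{T}} a \setminus \{x\}}\ \sum_{t \in \{\pm 1\}} Z^{(\ell-1)}_{y \to a}(t) - \prod_{y \in \partial_{\mathbf{T}} a \setminus \{x\}} Z^{(\ell-1)}_{y \to a}\big(-\mathrm{sign}(y,a)\big)\right)\right]. \tag{4.2} \]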

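Furthermore, the definition of the Galton-Watson tree $\mathbf{T}$ ensures that the sub-trees $\mathbf{T}^{(\ell-1)}_{y \to a}$ are independent copies of $\mathbf{T}^{(\ell-1)}$. Hence, by induction we have

\[ \frac{Z^{(\ell-1)}_{y \to a}(1)}{\sum_{s \in \{\pm 1\}} Z^{(\ell-1)}_{y \to a}(s)} \sim \pi^{(\ell-1)}_{d,k} \qquad \text{for all } a \in \partial_{\mathbf{T}} x,\ y \in \partial_{\mathbf{T}} a \setminus \{x\}, \tag{4.3} \]

and the random variables $Z^{(\ell-1)}_{y \to a}(1) / \sum_{s \in \{\pm 1\}} Z^{(\ell-1)}_{y \to a}(s)$ are mutually independent. Combining (4.2)–(4.3) with Fact 4.1, we finally obtain

\[ \mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}\right] = \frac{Z^{(\ell)}(1)}{\sum_{s \in \{\pm 1\}} Z^{(\ell)}(s)} \sim \frac{\prod_{i=1}^{d^-}\left[1 - \prod_{j=1}^{k-1} \mu_{\pi^{(\ell-1)}_{d,k},2i-1,j}\right]}{\prod_{i=1}^{d^-}\left[1 - \prod_{j=1}^{k-1} \mu_{\pi^{(\ell-1)}_{d,k},2i-1,j}\right] + \prod_{i=1}^{d^+}\left[1 - \prod_{j=1}^{k-1} \mu_{\pi^{(\ell-1)}_{d,k},2i,j}\right]} \sim \pi^{(\ell)}_{d,k}, \]

thereby completing the induction. □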
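The counting recursion behind this proof can be checked by hand on a small tree formula. The following Python sketch is an illustration, not part of the original argument: clauses are tuples of (variable, sign) pairs, `count_root` is a hypothetical name for the recursion, and the result is compared with exhaustive enumeration.

```python
from itertools import product


def count_root(clauses, root, s, parent=None):
    """Number of satisfying assignments of the sub-tree hanging off `root`
    (ignoring the clause with index `parent`) that set `root` to s.
    Only valid when the factor graph of `clauses` is a tree."""
    total = 1
    for j, clause in enumerate(clauses):
        signs = dict(clause)
        if j == parent or root not in signs:
            continue
        others = [(y, sy) for (y, sy) in clause if y != root]
        all_ways, bad_ways = 1, 1
        for y, sy in others:
            z_plus = count_root(clauses, y, 1, parent=j)
            z_minus = count_root(clauses, y, -1, parent=j)
            all_ways *= z_plus + z_minus
            # assignments of the sub-tree of y in which y fails to satisfy clause j
            bad_ways *= z_minus if sy == 1 else z_plus
        # a clause in which root appears with sign s is satisfied outright;
        # otherwise some other variable has to satisfy it
        total *= all_ways if signs[root] == s else all_ways - bad_ways
    return total


def brute_force(clauses, n, root, s):
    """Count satisfying assignments with sigma(root) = s by enumeration."""
    count = 0
    for values in product((1, -1), repeat=n):
        if values[root] != s:
            continue
        if all(any(values[v] == sign for v, sign in c) for c in clauses):
            count += 1
    return count


# A tree formula on variables 0..6, rooted at variable 0.
tree_formula = [
    ((0, 1), (1, -1), (2, 1)),
    ((0, -1), (3, 1), (4, 1)),
    ((1, 1), (5, -1), (6, -1)),
]
z_plus = count_root(tree_formula, 0, 1)
z_minus = count_root(tree_formula, 0, -1)
root_marginal = z_plus / (z_plus + z_minus)
```

For this example the recursion gives 42 assignments with the root set to +1 and 40 with the root set to −1, so the root marginal is 42/82.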

Combining the combinatorial interpretation of the distributions $\pi^{(\ell)}_{d,k}$ with the Gibbs uniqueness property, we proceed to show that the sequence $(\pi^{(\ell)}_{d,k})_\ell$ converges in the weak topology. To this end, it suffices to show that the sequence is Cauchy with respect to the Wasserstein $W_1$ metric.

Lemma 4.3. If $d < d_{\mathrm{uniq}}(k)$ then $(\pi^{(\ell)}_{d,k})_{\ell \ge 0}$ is a $W_1$-Cauchy sequence.

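Proof. If $d < d_{\mathrm{uniq}}(k)$ then the random tree $\mathbf{T} = \mathbf{T}_{d,k}$ enjoys the Gibbs uniqueness property; hence, (1.1) is satisfied. Consequently, given $0 < \varepsilon < 1$ we can choose $\ell_0 = \ell_0(d,k,\varepsilon) > 0$ large enough so that the event

\[ \mathcal{U}_{\varepsilon,\ell} = \left\{\max_{\tau \in S(\mathbf{T}^{(\ell)})} \left|\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}\right] - \mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T},\ \forall y \in \partial^{2\ell} x : \tau^{(\ell)}(y) = \tau(y)\right]\right| > \varepsilon\right\} \]

has probability

\[ \mathbb{P}\left[\mathcal{U}_{\varepsilon,\ell}\right] < \varepsilon \qquad \text{for all } \ell \ge \ell_0. \tag{4.4} \]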

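Now suppose that $\ell_0 \le \ell < \ell'$. Let $\tau^{(\ell)}, \tau^{(\ell')}$ be independent uniformly random satisfying assignments of $\mathbf{T}^{(\ell)}$ and $\mathbf{T}^{(\ell')}$, respectively. We claim that

\[ \mathbb{P}\left[\left|\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}\right] - \mathbb{P}\left[\tau^{(\ell')}(x) = 1 \mid \mathbf{T}\right]\right| > \varepsilon\right] < \varepsilon. \tag{4.5} \]

To see this, let $\tau^{(\ell,\ell')} = (\tau^{(\ell')}(y))_{y \in \partial^{2\ell}_{\mathbf{T}} x}$ comprise the truth values that $\tau^{(\ell')}$ assigns to the variables at distance exactly $2\ell$ from $x$. Then

\begin{align*}
\mathbb{P}\left[\tau^{(\ell')}(x) = 1 \mid \mathbf{T}\right]
&= \mathbb{E}\left[\mathbb{P}\left[\tau^{(\ell')}(x) = 1 \mid \mathbf{T}, \tau^{(\ell,\ell')}\right] \mid \mathbf{T}\right] \\
&= \mathbb{E}\left[\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}, \tau^{(\ell,\ell')},\ \forall y \in \partial^{2\ell} x : \tau^{(\ell)}(y) = \tau^{(\ell,\ell')}_y\right] \mid \mathbf{T}\right].
\end{align*}

Hence, for every $T \notin \mathcal{U}_{\varepsilon,\ell}$ we have

\[ \left|\mathbb{P}\left[\tau^{(\ell')}(x) = 1 \mid \mathbf{T} = T\right] - \mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T} = T\right]\right| \le \varepsilon. \tag{4.6} \]

Thus, (4.5) follows from (4.4) and (4.6).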

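Finally, since Fact 4.2 demonstrates that $\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T} = T\right] \sim \pi^{(\ell)}_{d,k}$ and $\mathbb{P}\left[\tau^{(\ell')}(x) = 1 \mid \mathbf{T} = T\right] \sim \pi^{(\ell')}_{d,k}$, (4.5) shows that

\[ W_1\left(\pi^{(\ell)}_{d,k}, \pi^{(\ell')}_{d,k}\right) < 2\varepsilon \qquad \text{for all } \ell_0 \le \ell < \ell'. \]

Hence, the sequence $(\pi^{(\ell)}_{d,k})_\ell$ is Cauchy. □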
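For readers who want to experiment with the W1 metric numerically: for two empirical distributions on the real line with the same number of atoms, the Wasserstein W1 distance equals the average absolute difference of the sorted samples. The short Python sketch below implements this standard one-dimensional formula; the function name is an illustrative choice.

```python
def wasserstein1_empirical(xs, ys):
    """W1 distance between the empirical measures of two equal-size samples.

    In one dimension the optimal coupling matches order statistics, so W1 is
    the mean absolute difference between the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


same_measure = wasserstein1_empirical([0.0, 1.0], [1.0, 0.0])
point_masses = wasserstein1_empirical([0.0, 0.0], [1.0, 1.0])
mixed = wasserstein1_empirical([0.2, 0.4], [0.9, 0.3])
```

Since all marginals live in [0,1], every W1 distance considered in this section is at most 1, which is why bounds of the form "ε plus the probability of a bad event" suffice.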

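We are left to bound the lower tail of the limiting distribution $\pi_{d,k} = \lim_{\ell \to \infty} \pi^{(\ell)}_{d,k}$.

Lemma 4.4. If $d < d_{\mathrm{uniq}}(k)$ then $\mathbb{E}\log^2 \mu_{\pi_{d,k},1,1} < \infty$.

Proof. We are going to bound $\mathbb{E}\log^2 \mu_{\pi^{(\ell)}_{d,k},1,1}$ and subsequently invoke the monotone convergence theorem to complete the proof. First, we note that for all $\ell \ge 0$ we have

\begin{align*}
\mathbb{E}\log^2 \mu_{\pi^{(\ell)}_{d,k},1,1}
&= \mathbb{E}\log^2 \frac{Z(\mathbf{T}^{(\ell)}, \{x\})}{Z(\mathbf{T}^{(\ell)})} && [\text{by Fact 4.2}] \\
&\le \mathbb{E}\left[|\{x\}_{\mathbf{T}}|^2\right] && [\text{by Lemma 2.8}] \\
&< \infty && [\text{by Corollary 3.4}].
\end{align*}

Since $\pi_{d,k}$ is the weak limit of $(\pi^{(\ell)}_{d,k})_\ell$, we conclude that for any $N \in \mathbb{N}$,

\[ \mathbb{E}\left[N \wedge \log^2 \mu_{\pi_{d,k},1,1}\right] = \lim_{\ell \to \infty} \mathbb{E}\left[N \wedge \log^2 \mu_{\pi^{(\ell)}_{d,k},1,1}\right] \le \mathbb{E}\left[|\{x\}_{\mathbf{T}}|^2\right] < \infty. \tag{4.7} \]

Finally, applying the monotone convergence theorem to the limit $N \to \infty$, we see that the uniform bound (4.7) implies the assertion. □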
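
Proof of Proposition 2.1. In light of Fact 4.1 and Lemmas 4.3 and 4.4, it only remains to show that

\[ \mathbb{E}\left|\log\left(\prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1}\right)\right| < \infty \qquad \text{and} \qquad \mathbb{E}\left|\log\left(1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j}\right)\right| < \infty. \]

Recall the definition of $\mu_{\pi_{d,k},i}$ from (1.4). Using Fact 4.1 and Lemma 4.4, we obtain

\begin{align*}
\mathbb{E}\left|\log\left(\prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1}\right)\right|
&\le \log(2) + \mathbb{E}\left|\log \prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i}\right|
\le \log(2) + \frac{d}{2}\,\mathbb{E}\left|\log \mu_{\pi_{d,k},1}\right| \\
&\le \log(2) + \frac{d}{2}\sqrt{\mathbb{E}\left|\log^2 \mu_{\pi_{d,k},1,1}\right|} < \infty,
\end{align*}

yielding the first inequality. Similarly, invoking Fact 4.1 and Lemma 4.4 for the second l.h.s. above gives

\[ \mathbb{E}\left|\log\left(1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j}\right)\right| \le \mathbb{E}\left|\log\left(1 - \mu_{\pi_{d,k},1,1}\right)\right| = \mathbb{E}\left|\log\left(\mu_{\pi_{d,k},1,1}\right)\right| \le \sqrt{\mathbb{E}\left|\log^2 \mu_{\pi_{d,k},1,1}\right|} < \infty, \]

thereby completing the proof. □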

5. PROOF OF COROLLARY 2.2

In order to turn the estimate of the expectation of $\log(1 \vee Z(\Phi))$ provided by Proposition 2.3 into a ‘with high probability’ statement, we harness a ‘soft’ version of the $k$-SAT problem where violated clauses are discouraged but not strictly forbidden. To be precise, for a $k$-CNF $\Phi$ and a real $\beta > 0$ define

\[ Z_\beta(\Phi) = \sum_{\sigma \in \{\pm 1\}^{V(\Phi)}}\ \prod_{a \in F(\Phi)} \exp\left(-\beta\,\mathbb{1}\{\sigma \not\models a\}\right). \tag{5.1} \]

Thus, each satisfying assignment contributes one to the sum on the r.h.s. of (5.1), while the contribution of assignments that violate a number $M$ of clauses equals $\exp(-\beta M)$. The value $Z_\beta(\Phi)$, called the partition function of the random $k$-SAT model at inverse temperature $\beta$, has received a considerable amount of attention in the mathematical physics literature (see, e.g., [57]). Crucially, by means of an interpolation argument [35, 41] it is possible to prove the following.

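Theorem 5.1 ([60, Theorem 1]). For any $k \ge 3$, any $\beta > 0$ and any probability measure $\pi$ on $[0,1]$ we have

\[ \frac{1}{n}\mathbb{E}\left[\log Z_\beta(\Phi)\right] \le \mathbb{E}\left[\log\left(\prod_{i=1}^{d^-} \mu_{\beta,\pi,2i} + \prod_{i=1}^{d^+} \mu_{\beta,\pi,2i-1}\right) - \frac{d(k-1)}{k}\log\left(1 - \left(1 - e^{-\beta}\right)\prod_{j=1}^{k} \mu_{\pi,1,j}\right)\right], \tag{5.2} \]

where

\[ \mu_{\beta,\pi,i} = 1 - (1 - \exp(-\beta))\prod_{j=1}^{k-1} \mu_{\pi,i,j} \qquad \text{for } i \ge 1. \]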

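We emphasise that the bound (5.2) holds for any $n \ge k$ without an error term. We also notice that by the monotone convergence theorem for the measure $\pi = \pi_{d,k}$ from Theorem 1.1 we have

\begin{align*}
&\lim_{\beta \to \infty} \mathbb{E}\left[\log\left(\prod_{i=1}^{d^-} \mu_{\beta,\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\beta,\pi_{d,k},2i-1}\right) - \frac{d(k-1)}{k}\log\left(1 - \left(1 - e^{-\beta}\right)\prod_{j=1}^{k} \mu_{\pi_{d,k},1,j}\right)\right] \\
&\qquad = \mathbb{E}\left[\log\left(\prod_{i=1}^{d^-} \mu_{\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\pi_{d,k},2i-1}\right) - \frac{d(k-1)}{k}\log\left(1 - \prod_{j=1}^{k} \mu_{\pi_{d,k},1,j}\right)\right] = \mathcal{B}_{d,k}(\pi_{d,k}). \tag{5.3}
\end{align*}

The reason why we proceed by way of the ‘soft’ model with $\beta < \infty$ is that for this model a routine application of Azuma-Hoeffding implies the following concentration bound.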

Lemma 5.2. For any fixed $\beta > 0$ we have $\mathbb{P}\left[\left|\log Z_\beta(\Phi) - \mathbb{E}\log Z_\beta(\Phi)\right| > \sqrt{n}\log n\right] = o(1/n)$.

Proof. The clauses of the random formula $\Phi$ are drawn independently, and adding or removing a single clause can alter the value of $\log Z_\beta(\,\cdot\,)$ by at most $\beta$ in absolute value. □
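The bounded-difference property used in this proof is easy to verify by brute force on a toy instance. The sketch below evaluates the soft partition function (5.1) by enumeration and checks that adding one clause lowers log Z_β by an amount between 0 and β, because every summand is multiplied by a factor in [e^{−β}, 1]. The clause encoding as (variable, sign) pairs and the function names are illustrative assumptions.

```python
import math
from itertools import product


def soft_partition(clauses, n, beta):
    """Z_beta of (5.1): each assignment is weighted by exp(-beta * #violated clauses)."""
    total = 0.0
    for sigma in product((1, -1), repeat=n):
        violated = sum(1 for clause in clauses
                       if all(sigma[v] != sign for v, sign in clause))
        total += math.exp(-beta * violated)
    return total


def count_satisfying(clauses, n):
    """Z: the number of assignments that violate no clause."""
    return sum(1 for sigma in product((1, -1), repeat=n)
               if all(any(sigma[v] == sign for v, sign in clause) for clause in clauses))


formula = [((0, 1), (1, -1), (2, 1)), ((1, 1), (2, -1), (3, 1))]
extra_clause = ((0, -1), (2, -1), (3, -1))
beta = 2.0

log_before = math.log(soft_partition(formula, 4, beta))
log_after = math.log(soft_partition(formula + [extra_clause], 4, beta))
drop = log_before - log_after
```

The same enumeration also illustrates the inequality Z_β(Φ) ≥ Z(Φ) used in the proof of Corollary 2.2, since the satisfying assignments contribute exactly one each.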

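Proof of Corollary 2.2. Towards a contradiction, assume there exists an $\varepsilon > 0$ such that for infinitely many $n \ge 1$ we have

\[ \mathbb{P}\left[\frac{1}{n}\log Z(\Phi) > \mathcal{B}_{d,k}(\pi_{d,k}) + \varepsilon\right] > \varepsilon. \tag{5.4} \]

Moreover, by (5.3) we can find a $\beta_0 > 0$ such that for every $\beta \ge \beta_0$ we have

\[ \left|\mathbb{E}\left[\log\left(\prod_{i=1}^{d^-} \mu_{\beta,\pi_{d,k},2i} + \prod_{i=1}^{d^+} \mu_{\beta,\pi_{d,k},2i-1}\right) - \frac{d(k-1)}{k}\log\left(1 - \left(1 - e^{-\beta}\right)\prod_{j=1}^{k} \mu_{\pi_{d,k},1,j}\right)\right] - \mathcal{B}_{d,k}(\pi_{d,k})\right| < \varepsilon/3. \tag{5.5} \]

Invoking Lemma 5.2 for $\beta = \beta_0$ and sufficiently large $n$ gives

\[ \mathbb{P}\left[\frac{1}{n}\log Z_{\beta_0}(\Phi) > \frac{1}{n}\mathbb{E}\log Z_{\beta_0}(\Phi) + \varepsilon/3\right] \le \varepsilon/3. \tag{5.6} \]

The definition (5.1) of the partition function ensures that $Z_\beta(\Phi) \ge Z(\Phi)$ for all $\beta > 0$. Therefore, combining (5.4)–(5.6), and Theorem 5.1 we see that for large enough $n$ the following holds with probability at least $1 - \frac{2}{3}\varepsilon$:

\[ \frac{1}{n}\log Z(\Phi) \le \frac{1}{n}\log Z_{\beta_0}(\Phi) \le \frac{1}{n}\mathbb{E}\log Z_{\beta_0}(\Phi) + \frac{\varepsilon}{3} \le \mathcal{B}_{d,k}(\pi_{d,k}) + \frac{2}{3}\varepsilon, \]

contradicting our assumption, and thus completing the proof. □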

6. PROOF OF PROPOSITION 2.3

In this section we prove Propositions 2.5 and 2.6, which, in light of Fact 2.4, imply Proposition 2.3. Both proofs follow a similar structure and make use of Propositions 2.7 and 2.11, which we therefore prove first.

6.1. Proof of Proposition 2.7. We show that both terms of (2.5) have finite expectation. Let us begin with the first one.

Lemma 6.1. If $d < d_{\mathrm{uniq}}(k)$ then $\mathbb{E}\left[\left|\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = O(1)$.

Proof. Since $\Phi''$ is obtained from $\Phi'$ by adding clauses, we have

\[ 0 \le Z(\Phi'') \le Z(\Phi') \le 2^n. \tag{6.1} \]

Hence,

\[ \log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1} = 0 \qquad \text{if } Z(\Phi') = 0. \tag{6.2} \]

Therefore, we may assume from now on that $\Phi'$ is satisfiable.

The number $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$ of new clauses is a Poisson variable with bounded mean. Therefore, Bennett’s inequality shows that $\mathbb{P}\left[\Delta'' > \log n\right] = O(n^{-2})$. Since (6.1) shows that $|\log((Z(\Phi'') \vee 1)/(Z(\Phi') \vee 1))|^{3/2} \le n^{3/2}$, we conclude that

\[ \mathbb{E}\left[\mathbb{1}\{\Delta'' > \log n\} \cdot \left|\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = o(1). \tag{6.3} \]

Further, let $c_1,\ldots,c_{\Delta''}$ be the new clauses added by CPL2. Let $x_{1,1},\ldots,x_{1,k},\ldots,x_{\Delta'',1},\ldots,x_{\Delta'',k}$ be their constituent variables and let $\mathcal{X} = \{x_{1,1},\ldots,x_{1,k},\ldots,x_{\Delta'',1},\ldots,x_{\Delta'',k}\}$. Since the clauses $c_1,\ldots,c_{\Delta''}$ are chosen uniformly and independently, a routine balls-into-bins consideration shows that

\[ \mathbb{P}\left[|\mathcal{X}| \le k(\Delta''-1) \mid \Delta'' \le \log n\right] = \tilde{O}(n^{-2}). \tag{6.4} \]

Now, consider the ‘good’ event

\[ \mathcal{G} = \left\{Z(\Phi') > 0,\ \Delta'' \le \log n,\ |\mathcal{X}| > k(\Delta''-1)\right\}. \]

Combining (6.1)–(6.4), we see that

\[ \mathbb{E}\left[(1 - \mathbb{1}_{\mathcal{G}}) \cdot \left|\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = o(1). \tag{6.5} \]

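Hence, we are left to bound $\mathbb{E}[\mathbb{1}_{\mathcal{G}} \cdot |\log((Z(\Phi'') \vee 1)/(Z(\Phi') \vee 1))|^{3/2}]$. If $\mathcal{G}$ occurs and thus $|\mathcal{X}| > k(\Delta''-1)$, then there exists a set of literals $\mathcal{L} \subseteq \{x_{1,1}, \neg x_{1,1},\ldots,x_{1,k}, \neg x_{1,k},\ldots,x_{\Delta'',1}, \neg x_{\Delta'',1},\ldots,x_{\Delta'',k}, \neg x_{\Delta'',k}\}$ such that

• every clause $c_i$ contains a literal from $\mathcal{L}$ ($1 \le i \le \Delta''$), and
• there does not exist $x \in \mathcal{X}$ such that $x \in \mathcal{L}$ and $\neg x \in \mathcal{L}$.

Moreover, on $\mathcal{G}$ we have $|\mathcal{L}| \le |\mathcal{X}| \le k\log n$. Let $\bar{\mathcal{L}} = \bar{\mathcal{L}}_{\Phi'}$ be the output of PULP on $(\Phi', \mathcal{L})$. Then Lemma 2.8 shows that

\[ \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot \left|\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] \le \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot |\bar{\mathcal{L}}|^{3/2}\right]. \tag{6.6} \]

Furthermore, since by CPL2 the new clauses $c_1,\ldots,c_{\Delta''}$ are chosen independently of the formula $\Phi'$, Lemma 2.10 implies that there exists $C = C(d,k) > 0$ such that

\[ \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot |\bar{\mathcal{L}}|^{3/2} \mid \Delta''\right] \le C \cdot (\Delta'')^{3/2}. \tag{6.7} \]

Combining (6.6)–(6.7) and recalling that $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$, we obtain

\[ \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot \left|\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = O(1). \tag{6.8} \]

Finally, the assertion follows from (6.5) and (6.8). □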

We move on to the second term of (2.5).

Lemma 6.2. If $d < d_{\mathrm{uniq}}(k)$ then $\mathbb{E}\left[\left|\log\frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = O(1)$.

Proof. We proceed similarly as in the proof of Lemma 6.1. The construction in CPL3 ensures that $\Phi'''$ contains one additional variable $x_{n+1}$ and $\Delta''' \sim \mathrm{Po}(d)$ new clauses $b_1,\ldots,b_{\Delta'''}$ that each contain $x_{n+1}$ and $k-1$ other variables. Let $x_{1,1},\ldots,x_{1,k-1},\ldots,x_{\Delta''',1},\ldots,x_{\Delta''',k-1} \in \{x_1,\ldots,x_n\}$ be the variables among $x_1,\ldots,x_n$ that appear in $b_1,\ldots,b_{\Delta'''}$ and let $\mathcal{X} = \{x_{1,1},\ldots,x_{\Delta''',k-1}\}$. Then

\[ 0 \le Z(\Phi''') \le 2Z(\Phi') \le 2^{n+1}. \tag{6.9} \]

Hence, if $\Phi'$ is unsatisfiable, then so is $\Phi'''$ and thus

\[ \log\frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1} = 0 \qquad \text{if } Z(\Phi') = 0. \tag{6.10} \]

Furthermore, since $\Delta''' \sim \mathrm{Po}(d)$, Bennett’s inequality shows that $\mathbb{P}\left[\Delta''' > \log n\right] = O(n^{-2})$. Therefore, (6.9) shows that

\[ \mathbb{E}\left[\mathbb{1}\{\Delta''' > \log n\} \cdot \left|\log\frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = o(1). \tag{6.11} \]

Moreover, since the $k-1$ variables among $x_1,\ldots,x_n$ that appear in the clauses $b_1,\ldots,b_{\Delta'''}$ are chosen uniformly and independently, a simple balls-into-bins argument shows that

\[ \mathbb{P}\left[|\mathcal{X}| \le (k-1)(\Delta'''-1) \mid \Delta''' \le \log n\right] = \tilde{O}(n^{-2}). \tag{6.12} \]

Hence, consider the event

\[ \mathcal{G} = \left\{Z(\Phi') > 0,\ \Delta''' \le \log n,\ |\mathcal{X}| > (k-1)(\Delta'''-1)\right\}. \]

Combining (6.9)–(6.12), we obtain

\[ \mathbb{E}\left[(1 - \mathbb{1}_{\mathcal{G}}) \cdot \left|\log\frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] = o(1). \tag{6.13} \]

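
Furthermore, if the event $\mathcal{G}$ occurs, then there exists a set $\mathcal{L} \subseteq \{x, \neg x : x \in \mathcal{X}\}$ of literals such that each clause $b_i$, $1 \le i \le \Delta'''$, contains a literal $l \in \mathcal{L}$ and such that $\{x, \neg x\} \not\subseteq \mathcal{L}$ for all $x \in \mathcal{X}$. Hence, with $\bar{\mathcal{L}} = \bar{\mathcal{L}}_{\Phi'}$ the output of PULP on $(\Phi', \mathcal{L})$, Lemma 2.8 shows that

\[ \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot \left|\log\frac{Z(\Phi''') \vee 1}{Z(\Phi') \vee 1}\right|^{3/2}\right] \le \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot |\bar{\mathcal{L}}|^{3/2}\right]. \tag{6.14} \]

Furthermore, since the clauses $b_1,\ldots,b_{\Delta'''}$ are drawn independently of $\Phi'$, Lemma 2.10 shows that there exists $C = C(d,k) > 0$ such that

\[ \mathbb{E}\left[\mathbb{1}_{\mathcal{G}} \cdot |\bar{\mathcal{L}}|^{3/2} \mid \Delta'''\right] \le C \cdot (\Delta''')^{3/2}. \tag{6.15} \]

Finally, since $\Delta''' \sim \mathrm{Po}(d)$, the assertion follows from (6.13), (6.14) and (6.15). □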

Proof of Proposition 2.7. The proposition follows immediately from Lemmas 6.1–6.2. □

6.2. Proof of Proposition 2.11. Let $\pi^{(\ell)}_{d,k} = \mathrm{BP}^{\ell}_{d,k}(\delta_{1/2})$ be the result of an $\ell$-fold application of the operator $\mathrm{BP}_{d,k}$ from (1.3) to the point mass at $1/2$. Also recall from (2.7) that $\pi'_n$ denotes the empirical distribution of the marginals $(\mathbb{P}[\sigma_{\Phi'}(x_i) = 1 \mid \Phi'])_{1 \le i \le n}$.

Lemma 6.3. Suppose that $d < d_{\mathrm{uniq}}(k)$. For any $\varepsilon > 0$ there exists $\ell_0 = \ell_0(d,k,\varepsilon) > 0$ such that for all $\ell \ge \ell_0$ we have

\[ \mathbb{E}\left[W_1(\pi'_n, \pi^{(\ell)}_{d,k}) \mid Z(\Phi') > 0\right] < \varepsilon + o(1). \]

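Proof. Assume that $\ell \ge \ell_0$ for a large enough $\ell_0 = \ell_0(d,k,\varepsilon) > 0$. Since $d < d_{\mathrm{uniq}}(k)$ and since $\mathbf{T} = \mathbf{T}_{d,k}$ is a Galton-Watson tree in which every variable node has $\mathrm{Po}(d)$ clause nodes as offspring and the offspring of every clause node consists of $k-1$ variable nodes, there exists a set $\mathcal{T}_\ell$ of trees, with $|\mathcal{T}_\ell| = O(1)$, such that the following hold:

T0: for every $T \in \mathcal{T}_\ell$ we have $\mathbb{P}\left[\mathbf{T}^{(\ell)} = T\right] > 0$.
T1: $\mathbb{P}\left[\mathbf{T}^{(\ell)} \in \mathcal{T}_\ell\right] > 1 - \varepsilon$.
T2: given $\mathbf{T}^{(\ell)} \in \mathcal{T}_\ell$ we have

\[ \max_{\tau \in S(\mathbf{T}^{(\ell)})} \left|\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}^{(\ell)}\right] - \mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}^{(\ell)},\ \forall y \in \partial^{2\ell} x : \tau^{(\ell)}(y) = \tau(y)\right]\right| < \varepsilon. \]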

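For a variable node $x_i$ of $\Phi'$ obtain $\varphi'_\ell(x_i)$ from $\Phi'$ by deleting all variables and clauses at distance greater than $2\ell$ from $x_i$. We regard $x_i$ as the root of $\varphi'_\ell(x_i)$. Moreover, for a tree $T \in \mathcal{T}_\ell$ let $V_T$ be the set of variable nodes $x_i$, $1 \le i \le n$, such that $\varphi'_\ell(x_i) \cong T$; thus, there is an isomorphism of the CNFs $T$ and $\varphi'_\ell(x_i)$ that maps the root $x$ of $T$ to $x_i$. Consider the event

\[ \mathfrak{T}_\ell = \left\{\sum_{T \in \mathcal{T}_\ell} \left|\mathbb{P}\left[\mathbf{T}^{(\ell)} \cong T\right] - |V_T|/n\right| < \varepsilon\right\}. \tag{6.16} \]

Then Corollary 3.7 implies that

\[ \mathbb{P}\left[\mathfrak{T}_\ell\right] = 1 - o(1) \qquad \text{for every } \ell \ge 0. \tag{6.17} \]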

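We now claim that

\[ \left|\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}^{(\ell)} = T\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_i) = 1 \mid \Phi'\right]\right| < \varepsilon \qquad \text{for all } T \in \mathcal{T}_\ell,\ x_i \in V_T. \tag{6.18} \]

To see this, let $S_\ell(\Phi', x_i)$ be the set of all assignments $\sigma \in \{\pm 1\}^{\partial^{2\ell} x_i}$ of the variables at distance $2\ell$ from $x_i$ in $\Phi'$ such that there exists a satisfying assignment $\sigma' \in S(\Phi')$ with $\sigma'(y) = \sigma(y)$ for all $y \in \partial^{2\ell} x_i$. Then the law of total probability shows that

\[ \mathbb{P}\left[\sigma_{\Phi'}(x_i) = 1 \mid \Phi'\right] = \sum_{\sigma \in S_\ell(\Phi', x_i)} \mathbb{P}\left[\sigma_{\Phi'}(x_i) = 1 \mid \Phi',\ \forall y \in \partial^{2\ell} x_i : \sigma_{\Phi'}(y) = \sigma(y)\right] \mathbb{P}\left[\forall y \in \partial^{2\ell} x_i : \sigma_{\Phi'}(y) = \sigma(y) \mid \Phi'\right]. \tag{6.19} \]

Further, since for $T \in \mathcal{T}_\ell$ and $x_i \in V_T$ we have $\varphi'_\ell(x_i) \cong T$, condition T2 implies that

\[ \left|\mathbb{P}\left[\sigma_{\Phi'}(x_i) = 1 \mid \Phi',\ \forall y \in \partial^{2\ell} x_i : \sigma_{\Phi'}(y) = \sigma(y)\right] - \mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}^{(\ell)} = T\right]\right| < \varepsilon. \tag{6.20} \]

Combining (6.19) and (6.20), we obtain (6.18).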


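To complete the proof, we recall from Fact 4.2 that $\pi^{(\ell)}_{d,k}$ is precisely the distribution of $\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}^{(\ell)}\right]$. Therefore, coupling the formulas $\mathbf{T}^{(\ell)}, \Phi'$ on the event $\mathfrak{T}_\ell$ we have

\begin{align*}
W_1(\pi'_n, \pi^{(\ell)}_{d,k})
&\le \mathbb{P}\left[\mathbf{T}^{(\ell)} \notin \mathcal{T}_\ell\right] + \frac{1}{n}\sum_{T \in \mathcal{T}_\ell}\sum_{x \in V_T} \left|\mathbb{P}\left[\tau^{(\ell)}(x) = 1 \mid \mathbf{T}^{(\ell)} = T\right] - \mathbb{P}\left[\sigma_{\Phi'}(x) = 1 \mid \Phi'\right]\right| + \varepsilon && [\text{by (6.16)}] \\
&\le 3\varepsilon && [\text{by T1 and (6.18)}].
\end{align*}

Combining this bound with (6.17) completes the proof. □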

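Proof of Proposition 2.11. The first assertion follows from Proposition 2.1, Lemma 6.3 and the fact that, since $0 < d < d_{\mathrm{uniq}}(k) < d_{\mathrm{sat}}(k)$, we have that $\mathbb{P}\left[Z(\Phi') > 0\right] = 1 - o(1)$.

The second assertion follows from a routine argument, which we present for the case $\ell = 2$; the extension to any finite $\ell$ is standard (see [65, Proposition 2.5]). Let $t = \Theta(\log\log n)$ and recall the definitions of $\varphi'_t(x_i)$, $\mathcal{T}_t$ and $S_t(\Phi', x_i)$ from the proof of Lemma 6.3. Consider the event $\mathcal{D} = \{\varphi'_t(x_1), \varphi'_t(x_2) \text{ are disjoint tree formulas}\}$. From Lemma 3.5, we have that $\mathbb{P}[\mathcal{D}] = 1 - o(1)$. On the event $\mathcal{D}$, Lemma 6.3 implies that for every $\sigma_1, \sigma_2 \in \{\pm 1\}$, and $\tau_1 \in S_t(\Phi', x_1)$, $\tau_2 \in S_t(\Phi', x_2)$ we have

\[ \left|\mathbb{P}\left[\sigma_{\Phi'}(x_i) = \sigma_i \mid \Phi',\ \forall y \in \partial^{2t} x_i : \sigma_{\Phi'}(y) = \tau_i(y)\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_i) = \sigma_i \mid \Phi'\right]\right| = o(1) \qquad \text{for } i = 1,2. \tag{6.21} \]

Therefore, from the law of total probability and the triangle inequality we see that for every $\sigma_1, \sigma_2 \in \{\pm 1\}$

\begin{align*}
&\left|\mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1, \sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi'\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi'\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi'\right]\right| \\
&\quad \le \Big|\, \left|\mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1, \sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi'\right] - \mathbb{E}_{\tau_1,\tau_2}\left[\mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1, \sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi', \tau_1, \tau_2\right]\right]\right| \\
&\qquad\quad - \left|\mathbb{E}_{\tau_1,\tau_2}\left[\mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi', \tau_1\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi', \tau_2\right]\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi'\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi'\right]\right| \,\Big| \\
&\quad = \left|\mathbb{E}_{\tau_1,\tau_2}\left[\mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi', \tau_1\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi', \tau_2\right]\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi'\right]\mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi'\right]\right| \\
&\quad \le \mathbb{E}_{\tau_1}\left|\mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi', \tau_1\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_1) = \sigma_1 \mid \Phi'\right]\right| + \mathbb{E}_{\tau_2}\left|\mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi', \tau_2\right] - \mathbb{P}\left[\sigma_{\Phi'}(x_2) = \sigma_2 \mid \Phi'\right]\right| \\
&\quad = o(1) \qquad [\text{by (6.21)}].
\end{align*}

Summing over the four sign combinations of $\sigma_1, \sigma_2$ gives the desired result. □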

6.3. Proof of Proposition 2.5. As in the proof of Lemma 6.1 let c1, . . . ,c∆′′ be the new clauses added by CPL2 and let
x1,1, . . . , x1,k , . . . , x∆′′,1, . . . , x∆′′,k be their constituent variables. Let X = {x1,1, . . . , x1,k , . . . , x∆′′,1, . . . , x∆′′,k }. For ε > 0
and z ∈ R define λε(z) = log(z ∨ ε). Finally, let (si )i≥0 be a sequence of uniformly random ±1-valued random
variables, mutually independent and independent of all other randomness.

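Lemma 6.4. Assume that $d < d_{\mathrm{uniq}}(k)$. There exists $B = B(d,k) > 0$ such that for all $0 < \varepsilon < 1$ we have

\[ \limsup_{n \to \infty} \mathbb{E}\left[\left(\sum_{i=1}^{\Delta''} \lambda_\varepsilon\left(1 - \prod_{j=1}^{k} \mathbb{P}\left[\sigma(x_{i,j}) \ne \mathrm{sign}(x_{i,j}, c_i) \mid \Phi'\right]\right)\right)^2 \;\middle|\; Z(\Phi') > 0\right] \le B. \]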

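Proof. Given $Z(\Phi') > 0$ we have

\[ 0 \ge \lambda_\varepsilon\left(1 - \prod_{j=1}^{k} \mathbb{P}\left[\sigma(x_{1,j}) \ne \mathrm{sign}(x_{1,j}, c_1) \mid \Phi'\right]\right) \ge \lambda_\varepsilon\left(1 - \mathbb{P}\left[\sigma(x_{1,1}) \ne \mathrm{sign}(x_{1,1}, c_1) \mid \Phi'\right]\right). \tag{6.22} \]

Recalling that $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$, we combine (6.22) with Cauchy-Schwarz to obtain $B' = B'(d,k) > 0$ such that

\begin{align*}
&\mathbb{E}\left[\left(\sum_{i=1}^{\Delta''} \lambda_\varepsilon\left(1 - \prod_{j=1}^{k} \mathbb{P}\left[\sigma(x_{i,j}) \ne \mathrm{sign}(x_{i,j}, c_i) \mid \Phi'\right]\right)\right)^2 \;\middle|\; Z(\Phi') > 0\right] \\
&\qquad \le B' \cdot \mathbb{E}\left[\lambda_\varepsilon\left(1 - \mathbb{P}\left[\sigma(x_{1,1}) \ne \mathrm{sign}(x_{1,1}, c_1) \mid \Phi'\right]\right)^2 \;\middle|\; Z(\Phi') > 0\right]. \tag{6.23}
\end{align*}

Further, since the function $\lambda_\varepsilon$ is bounded and continuous for every $\varepsilon > 0$ and since $\mathrm{sign}(x_{1,1}, c_1)$ is chosen independently of $\Phi'$, Proposition 2.11 shows that for any $\varepsilon > 0$,

\begin{align*}
\mathbb{E}\left[\lambda_\varepsilon\left(1 - \mathbb{P}\left[\sigma(x_{1,1}) \ne \mathrm{sign}(x_{1,1}, c_1) \mid \Phi'\right]\right)^2 \;\middle|\; Z(\Phi') > 0\right]
&= \mathbb{E}\left[\lambda_\varepsilon\left(\mu_{\pi_{d,k},1,1}\right)^2\right] + o(1) \\
&\le \mathbb{E}\left[\log\left(\mu_{\pi_{d,k},1,1}\right)^2\right] + o(1). \tag{6.24}
\end{align*}

Since Proposition 2.1 shows that $\mathbb{E}\left[\log\left(\mu_{\pi_{d,k},1,1}\right)^2\right] = O(1)$, the assertion follows from (6.23) and (6.24). □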

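Lemma 6.5. Assume that $d < d_{\mathrm{uniq}}(k)$. For any $\delta > 0$ there exists $\varepsilon_0 > 0$ such that for all $\varepsilon_0 > \varepsilon > 0$ we have

\[ \limsup_{n \to \infty} \left|\mathbb{E}\left[\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right] - \frac{d(k-1)}{k}\,\mathbb{E}\left[\lambda_\varepsilon\left(1 - \prod_{j=1}^{k} \mathbb{P}\left[\sigma(x_j) = s_j \mid \Phi'\right]\right) \;\middle|\; Z(\Phi') > 0\right]\right| < \delta. \]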

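Proof. We choose small enough $\xi = \xi(d,k,\delta) > \zeta(\xi) > \eta = \eta(\zeta) > \varepsilon_0 = \varepsilon_0(\eta) > 0$, let $0 < \varepsilon < \varepsilon_0$ and assume that $n \ge n_0(\varepsilon)$ is large enough. Also let $\gamma = \gamma(n) = o(1)$ be a sequence that tends to zero sufficiently slowly. Additionally, let $\mathcal{E}$ be the event that all of the following conditions occur.

E1: $Z(\Phi') > 0$.
E2: $\Delta'' \le \zeta^{-1}$.
E3: $|\mathcal{X}| = k\Delta''$.
E4: $\max_{x \in \mathcal{X}, s \in \{\pm 1\}} \mathbb{P}[\sigma(x) = s \mid \Phi'] \le 1 - \eta$.
E5: $\sum_{\tau \in \{\pm 1\}^{\mathcal{X}}} \left|\mathbb{P}[\forall x \in \mathcal{X} : \sigma(x) = \tau(x) \mid \Phi'] - \prod_{x \in \mathcal{X}} \mathbb{P}[\sigma(x) = \tau(x) \mid \Phi']\right| < \gamma$.

We claim that

\[ \mathbb{P}[\mathcal{E}] \ge 1 - 2\xi + o(1). \tag{6.25} \]

Indeed, since $0 < d < d_{\mathrm{uniq}}(k) < d_{\mathrm{sat}}(k)$, we have that $\mathbb{P}\left[Z(\Phi') > 0\right] = 1 - o(1)$. Moreover, since $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$, Markov’s inequality shows that $\mathbb{P}\left[\Delta'' > \zeta^{-1}\right] \le \zeta d < \xi$. Further, since the new clauses $c_1,\ldots,c_{\Delta''}$ are chosen independently, we have $\mathbb{P}\left[|\mathcal{X}| = k\Delta'' \mid \Delta'' \le \zeta^{-1}\right] = 1 - O(1/n)$.

Moreover, Proposition 2.11 shows that the joint distribution of the assignments over $\mathcal{X}$ is approximately a product measure. The tails of the limiting distribution of the latter are controlled by (2.1). Therefore, for small enough $\eta$ we have

\[ \mathbb{P}\left[\max_{x \in \mathcal{X}, s \in \{\pm 1\}} \mathbb{P}[\sigma(x) = s \mid \Phi'] \le 1 - \eta \;\middle|\; \Delta'' \le \zeta^{-1},\ Z(\Phi') > 0\right] \ge 1 - \xi. \]

Similarly, Proposition 2.11 shows together with Markov’s inequality that

\[ \mathbb{P}\left[\text{E5 occurs} \mid \Delta'' \le \zeta^{-1},\ Z(\Phi') > 0\right] = 1 - o(1), \]

provided that $\gamma \to 0$ sufficiently slowly. Thus, we obtain (6.25).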
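Furthermore, (6.25) implies together with Proposition 2.7 and Hölder’s inequality that

\[ \mathbb{E}\left|(1 - \mathbb{1}_{\mathcal{E}}) \cdot \log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1}\right| \le \delta/3 + o(1), \tag{6.26} \]

provided that $\xi = \xi(d,k,\delta) > 0$ is small enough. Analogously, (6.25), Lemma 6.4 and Cauchy-Schwarz yield

\[ \mathbb{E}\left|(1 - \mathbb{1}_{\mathcal{E}})\,\lambda_\varepsilon\left(1 - \prod_{j=1}^{k} \mathbb{P}\left[\sigma(x_j) = s_j \mid \Phi'\right]\right)\right| \le \delta/3 + o(1). \tag{6.27} \]

Thus, we confine ourselves to the event $\mathcal{E}$, on which we have $Z(\Phi'), Z(\Phi'') > 0$ due to E1, E3, E4 and E5. Hence,

\begin{align*}
\log\frac{Z(\Phi'') \vee 1}{Z(\Phi') \vee 1} = \log\frac{Z(\Phi'')}{Z(\Phi')}
&= \log\sum_{\tau \in \{\pm 1\}^{\mathcal{X}}} \mathbb{1}\{\tau \models c_1,\ldots,c_{\Delta''}\}\,\mathbb{P}[\forall x \in \mathcal{X} : \sigma(x) = \tau(x) \mid \Phi'] \\
&= \log\sum_{\tau \in \{\pm 1\}^{\mathcal{X}}} \mathbb{1}\{\tau \models c_1,\ldots,c_{\Delta''}\}\prod_{x \in \mathcal{X}} \mathbb{P}[\sigma(x) = \tau(x) \mid \Phi'] + o(1) && [\text{by E4, E5}] \\
&= \sum_{i=1}^{\Delta''} \log\left[1 - \prod_{j=1}^{k} \mathbb{P}\left[\sigma(x_{i,j}) \ne \mathrm{sign}(x_{i,j}, c_i) \mid \Phi'\right]\right] + o(1) && [\text{by E3}]. \tag{6.28}
\end{align*}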
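The identity behind (6.28) becomes exact when the marginals of the variables touched by a new clause are genuinely independent. The Python sketch below illustrates this on a toy formula whose three clauses live on disjoint variable sets, so the uniform satisfying assignment is a product measure across the blocks; the added clause picks one variable from each block. The formula, the helper names and the use of exact fractions are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def satisfies(sigma, clause):
    """True if the assignment satisfies the clause of (variable, sign) literals."""
    return any(sigma[v] == sign for v, sign in clause)


def satisfying_assignments(clauses, n):
    return [sigma for sigma in product((1, -1), repeat=n)
            if all(satisfies(sigma, c) for c in clauses)]


def marginal(solutions, v, value):
    """P[sigma(v) = value] under the uniform measure on `solutions`."""
    return Fraction(sum(1 for s in solutions if s[v] == value), len(solutions))


# Three clauses on disjoint variable blocks {0,1,2}, {3,4,5}, {6,7,8}.
base = [
    ((0, 1), (1, 1), (2, 1)),
    ((3, -1), (4, 1), (5, -1)),
    ((6, 1), (7, -1), (8, 1)),
]
new_clause = ((0, -1), (3, 1), (6, -1))  # one variable from each block

solutions_base = satisfying_assignments(base, 9)
solutions_new = satisfying_assignments(base + [new_clause], 9)
ratio = Fraction(len(solutions_new), len(solutions_base))  # Z(new) / Z(base)

# 1 - prod_j P[sigma(x_j) != sign(x_j, c)], computed from the base marginals
prod_unsat = Fraction(1)
for v, sign in new_clause:
    prod_unsat *= marginal(solutions_base, v, -sign)
cavity_prediction = 1 - prod_unsat
```

Here each factor equals 4/7, so both sides are 1 − (4/7)³ = 279/343. In the random formula the independence only holds approximately, which is what condition E5 quantifies.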

29


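Further, E4 ensures that for any $1 \le i \le \Delta''$,
\[
\left|\log\Bigl[1-\prod_{j=1}^{k}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},c_i) \mid \Phi']\Bigr] - \lambda_\varepsilon\Bigl[1-\prod_{j=1}^{k}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},c_i) \mid \Phi']\Bigr]\right| < \xi. \tag{6.29}
\]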

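Thus, combining (6.28) and (6.29), we obtain
\[
\mathbb{E}\left|\mathbb{1}_{\mathcal{E}}\Bigl(\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1} - \sum_{i=1}^{\Delta''}\lambda_\varepsilon\Bigl(1-\prod_{j=1}^{k}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},c_i) \mid \Phi']\Bigr)\Bigr)\right| < \delta/3+o(1). \tag{6.30}
\]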

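Further, combining (6.26) and (6.30) with Lemma 6.4, we obtain
\[
\left|\mathbb{E}\Bigl[\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\Bigr] - \mathbb{E}\Bigl[\sum_{i=1}^{\Delta''}\lambda_\varepsilon\Bigl(1-\prod_{j=1}^{k}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},c_i) \mid \Phi']\Bigr) \Bigm| Z(\Phi') > 0\Bigr]\right| < \delta+o(1). \tag{6.31}
\]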

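Finally, since the clauses $c_1,\dots,c_{\Delta''}$ are drawn uniformly and independently and since the distribution of $\Phi'$ is invariant under permutation of the variable nodes, we find
\[
\mathbb{E}\Bigl[\sum_{i=1}^{\Delta''}\lambda_\varepsilon\Bigl(1-\prod_{j=1}^{k}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},c_i) \mid \Phi']\Bigr) \Bigm| Z(\Phi') > 0\Bigr]
= \frac{d(k-1)}{k}\,\mathbb{E}\Bigl[\lambda_\varepsilon\Bigl(1-\prod_{j=1}^{k}\mathbb{P}[\sigma(x_j)=s_j \mid \Phi']\Bigr) \Bigm| Z(\Phi') > 0\Bigr]. \tag{6.32}
\]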

Combining (6.31) and (6.32) completes the proof. □
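The Markov step used for condition E2 in the proof above can be checked numerically. The following sketch is an illustration and not part of the original argument: it computes the exact tail $\mathbb{P}[\Delta'' > \zeta^{-1}]$ for $\Delta'' \sim \mathrm{Po}(d(k-1)/k)$ and compares it with the bound $\zeta d$. The function names and the sample parameters are assumptions chosen for the example.

```python
import math

def poisson_tail(mu, threshold):
    """Return P[X > threshold] for X ~ Poisson(mu), threshold >= 0,
    by summing the probability mass function up to floor(threshold)."""
    pmf = math.exp(-mu)               # P[X = 0]
    cdf = pmf
    for n in range(1, int(math.floor(threshold)) + 1):
        pmf *= mu / n                 # P[X = n] obtained from P[X = n-1]
        cdf += pmf
    return max(0.0, 1.0 - cdf)

def markov_bound(d, k, zeta):
    """Markov bound zeta * E[X] for X ~ Po(d(k-1)/k); it never exceeds zeta * d."""
    return zeta * d * (k - 1) / k

# Hypothetical sample parameters, chosen only for illustration.
d, k, zeta = 1.5, 3, 0.1
tail = poisson_tail(d * (k - 1) / k, 1.0 / zeta)
print(tail, markov_bound(d, k, zeta), zeta * d)
```

The exact tail is far smaller than the Markov bound for such parameters; the proof only needs the cruder estimate $\zeta d < \xi$.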
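Proof of Proposition 2.5. Proposition 2.11 shows together with Lemma 6.5 that
\[
\mathbb{E}\Bigl[\log\frac{Z(\Phi'')\vee 1}{Z(\Phi')\vee 1}\Bigr] = \frac{d(k-1)}{k}\,\mathbb{E}\Bigl[\lambda_\varepsilon\Bigl(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\Bigr)\Bigr] + o_\varepsilon(1), \tag{6.33}
\]
with $o_\varepsilon(1)$ hiding a term that vanishes in the limit $\varepsilon\to 0$. Furthermore, in light of (2.1) the monotone convergence theorem yields
\[
\mathbb{E}\Bigl[\log\Bigl(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\Bigr)\Bigr] = \lim_{\varepsilon\to 0}\mathbb{E}\Bigl[\lambda_\varepsilon\Bigl(1-\prod_{j=1}^{k}\mu_{\pi_{d,k},1,j}\Bigr)\Bigr]. \tag{6.34}
\]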

The assertion follows from (6.33) and (6.34). □
6.4. Proof of Proposition 2.6. We adapt the steps from Section 6.3 to the coupling of $\Phi', \Phi'''$. Recall that the latter is obtained by adding to $\Phi'$ a single variable $x_{n+1}$ along with $\Delta'''$ clauses $b_1,\dots,b_{\Delta'''}$ that each contain $x_{n+1}$ and $k-1$ other variables. Thus, let $x_{1,1},\dots,x_{1,k-1},\dots,x_{\Delta''',1},\dots,x_{\Delta''',k-1} \in \{x_1,\dots,x_n\}$ be the variables other than $x_{n+1}$ that appear in $b_1,\dots,b_{\Delta'''}$ and let $X = \{x_{1,1},\dots,x_{\Delta''',k-1}\}$ be the set comprising all these variables.

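Lemma 6.6. Assume that $0 < d < d_{\mathrm{uniq}}(k)$. There exists $B = B(d,k) > 0$ such that for all $0 < \varepsilon < 1$ we have
\[
\limsup_{n\to\infty}\ \mathbb{E}\Bigl[\lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr)^{2} \Bigm| Z(\Phi') > 0\Bigr] \le B.
\]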

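Proof. Given that $\Phi'$ is satisfiable, and noticing that $\lambda_\varepsilon$ is increasing and $\varepsilon \in (0,1)$, we see that
\begin{align}
0\wedge\lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr)
&\ge \lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr) \nonumber\\
&\ge \lambda_\varepsilon\Bigl(\prod_{i=1}^{\Delta'''}\bigl(1-\mathbb{P}[\sigma(x_{i,1})\neq\mathrm{sign}(x_{i,1},b_i) \mid \Phi']\bigr)\Bigr) \nonumber\\
&= \lambda_\varepsilon\Bigl(\prod_{i=1}^{\Delta'''}\mathbb{P}[\sigma(x_{i,1})=\mathrm{sign}(x_{i,1},b_i) \mid \Phi']\Bigr)
\ge \sum_{i=1}^{\Delta'''}\lambda_\varepsilon\bigl(\mathbb{P}[\sigma(x_{i,1})=\mathrm{sign}(x_{i,1},b_i) \mid \Phi']\bigr). \tag{6.35}
\end{align}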



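We also notice that
\[
0\vee\lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr) < 1 . \tag{6.36}
\]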

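In light of the above, we now bound
\begin{align}
&\mathbb{E}\Bigl[\lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr)^{2} \Bigm| Z(\Phi') > 0\Bigr] \nonumber\\
&\qquad\le \mathbb{E}\Bigl[1+\Bigl(\sum_{i=1}^{\Delta'''}\lambda_\varepsilon\bigl(\mathbb{P}[\sigma(x_{i,1})=\mathrm{sign}(x_{i,1},b_i) \mid \Phi']\bigr)\Bigr)^{2} \Bigm| Z(\Phi') > 0\Bigr] && \text{[from (6.35), (6.36)]} \nonumber\\
&\qquad\le d(d+1)\,\mathbb{E}\Bigl[1+\bigl(\lambda_\varepsilon(\mathbb{P}[\sigma(x_{1,1})=\mathrm{sign}(x_{1,1},b_1) \mid \Phi'])\bigr)^{2} \Bigm| Z(\Phi') > 0\Bigr] && [\Delta''' \sim \mathrm{Po}(d)] \nonumber\\
&\qquad\le d(d+1)\Bigl(1+\mathbb{E}\bigl[\lambda_\varepsilon(\mathbb{P}[\sigma(x_{1,1})=\mathrm{sign}(x_{1,1},b_1) \mid \Phi'])^{2} \bigm| Z(\Phi') > 0\bigr]\Bigr). \tag{6.37}
\end{align}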

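Further, Proposition 2.11 implies that for any $\varepsilon > 0$,
\[
\mathbb{E}\bigl[\lambda_\varepsilon(\mathbb{P}[\sigma(x_{1,1})=\mathrm{sign}(x_{1,1},b_1) \mid \Phi'])^{2} \bigm| Z(\Phi') > 0\bigr] = \mathbb{E}\bigl[\lambda_\varepsilon(\mu_{\pi_{d,k},1,1})^{2}\bigr] + o(1) \le \mathbb{E}\bigl[\log^{2}\mu_{\pi_{d,k},1,1}\bigr] + o(1). \tag{6.38}
\]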

Finally, the assertion follows from (6.37) and (6.38). □

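Lemma 6.7. Assume that $0 < d < d_{\mathrm{uniq}}(k)$. For any $\delta > 0$ there exists $\varepsilon_0 > 0$ such that for all $\varepsilon_0 > \varepsilon > 0$ we have
\[
\limsup_{n\to\infty}\left|\mathbb{E}\Bigl[\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1}\Bigr] - \mathbb{E}\Bigl[\lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr) \Bigm| Z(\Phi') > 0\Bigr]\right| < \delta.
\]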

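Proof. Choose small enough $\xi=\xi(d,k,\delta) > \zeta(\xi) > \eta=\eta(\zeta) > \varepsilon_0=\varepsilon_0(\eta) > 0$, let $0 < \varepsilon < \varepsilon_0$, suppose that $n > n_0(\varepsilon)$ is sufficiently large and let $0 < \gamma=\gamma(n)=o(1)$ be a sequence that converges to zero slowly. Let $\mathcal{E}$ be the event that the following conditions occur.

E1: $Z(\Phi') > 0$.
E2: $\Delta''' \le \zeta^{-1}$.
E3: $|X| = (k-1)\Delta'''$.
E4: $\max_{x\in X,\,s\in\{\pm1\}} \mathbb{P}[\sigma(x)=s \mid \Phi'] \le 1-\eta$.
E5: $\sum_{\tau\in\{\pm1\}^{X}} \bigl|\mathbb{P}[\forall x\in X:\sigma(x)=\tau(x) \mid \Phi'] - \prod_{x\in X}\mathbb{P}[\sigma(x)=\tau(x) \mid \Phi']\bigr| < \gamma$.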

As in the proof of Lemma 6.5 we find that

P [E] ≥ 1−2ξ+o(1). (6.39)

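Let
\[
L_\varepsilon = \lambda_\varepsilon\Bigl(\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr)
\]
for brevity. Combining Proposition 2.7, Lemma 6.6 and (6.39) and using Hölder's inequality, we obtain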

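\[
\mathbb{E}\left|(1-\mathbb{1}_{\mathcal{E}})\log\frac{Z(\Phi''')}{Z(\Phi')}\right| \le \delta/3+o(1), \qquad \mathbb{E}\bigl[\,|(1-\mathbb{1}_{\mathcal{E}})L_\varepsilon| \bigm| Z(\Phi') > 0\bigr] \le \delta/3+o(1). \tag{6.40}
\]

Hence, we are left to compare $\mathbb{E}\bigl|\mathbb{1}_{\mathcal{E}}\cdot\log\frac{Z(\Phi''')}{Z(\Phi')}\bigr|$ and $\mathbb{E}\bigl[\,|\mathbb{1}_{\mathcal{E}}\cdot L_\varepsilon| \bigm| Z(\Phi') > 0\bigr]$. On the event $\mathcal{E}$ we have $Z(\Phi'), Z(\Phi''') > 0$. Consequently,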

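\begin{align}
\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1} = \log\frac{Z(\Phi''')}{Z(\Phi')}
&= \log\sum_{\tau\in\{\pm1\}^{X\cup\{x_{n+1}\}}} \mathbb{1}\{\tau \models b_1,\dots,b_{\Delta'''}\}\,\mathbb{P}[\forall x\in X:\sigma(x)=\tau(x) \mid \Phi'] \nonumber\\
&= \log\sum_{\tau\in\{\pm1\}^{X\cup\{x_{n+1}\}}} \mathbb{1}\{\tau \models b_1,\dots,b_{\Delta'''}\}\prod_{x\in X}\mathbb{P}[\sigma(x)=\tau(x) \mid \Phi'] + o(1) && \text{[by E4, E5]} \nonumber\\
&= \log\Bigl[\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr] + o(1) && \text{[by E3].} \tag{6.41}
\end{align}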

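Now, E4 guarantees that
\[
\log\Bigl[\sum_{s\in\{\pm1\}}\prod_{i=1}^{\Delta'''}\Bigl(1-\mathbb{1}\{\mathrm{sign}(x_{n+1},b_i)\neq s\}\prod_{j=1}^{k-1}\mathbb{P}[\sigma(x_{i,j})\neq\mathrm{sign}(x_{i,j},b_i) \mid \Phi']\Bigr)\Bigr] = L_\varepsilon. \tag{6.42}
\]
Therefore, we combine (6.41) and (6.42) to obtain
\[
\mathbb{E}\left|\mathbb{1}_{\mathcal{E}}\Bigl(\log\frac{Z(\Phi''')\vee 1}{Z(\Phi')\vee 1} - L_\varepsilon\Bigr)\right| < \delta/3+o(1). \tag{6.43}
\]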

Finally, the assertion follows from (6.40) and (6.43). □

Proof of Proposition 2.6. Following similar steps as in the proof of Proposition 2.5, we see that the assertion follows
from Lemma 6.7, Proposition 2.1, Proposition 2.11, and the dominated convergence theorem. □

Proof of Proposition 2.3. Immediate from Fact 2.4, Proposition 2.5 and Proposition 2.6. □

7. PROOF OF PROPOSITION 2.15

7.1. Proof of Lemma 2.12. The proof is by induction on the height of the tree. The following claim summarises
the main step of the induction.

Claim 7.1. For all $\ell \ge 0$, all variables $x$ of $T^{(\ell)}$ and all satisfying assignments $\tau \in S(T^{(\ell)})$ we have
\[
\frac{Z(T^{(\ell)}_x,\tau,\tau^+(x))}{Z(T^{(\ell)}_x,\tau)} \le \frac{Z(T^{(\ell)}_x,\tau^+,\tau^+(x))}{Z(T^{(\ell)}_x,\tau^+)}. \tag{7.1}
\]

Proof. For boundary variables $x \in \partial^{2\ell}x$ there is nothing to show because the r.h.s. of (7.1) equals one. Hence, consider a variable $x \in \partial^{2q}x$ for some $q < \ell$. If $Z(T^{(\ell)}_x,\tau,\tau^+(x)) = 0$, then (7.1) is trivially satisfied. Hence, assume that $Z(T^{(\ell)}_x,\tau,\tau^+(x)) > 0$.

Let $a^+_1,\dots,a^+_g$ be the children (clauses) of $x$ with $\mathrm{sign}(x,a^+_i)=\tau^+(x)$. Also let $y_{11},\dots,y_{1(k-1)},\dots,y_{g1},\dots,y_{g(k-1)}$ be the children (variables) of $a^+_1,\dots,a^+_g$. Similarly, let $a^-_1,\dots,a^-_h$ be the children of $x$ with $\mathrm{sign}(x,a^-_i)=-\tau^+(x)$ and let $z_{11},\dots,z_{1(k-1)},\dots,z_{h1},\dots,z_{h(k-1)}$ be their children. We claim that for all $\tau \in S(T^{(\ell)})$,

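\begin{align}
Z(T^{(\ell)}_x,\tau,\tau^+(x)) &= \Bigl(\prod_{i=1}^{g}\prod_{t=1}^{k-1} Z(T^{(\ell)}_{y_{it}},\tau)\Bigr)\cdot\prod_{j=1}^{h}\Bigl(\prod_{t=1}^{k-1} Z(T^{(\ell)}_{z_{jt}},\tau) - \prod_{t=1}^{k-1} Z(T^{(\ell)}_{z_{jt}},\tau,-\tau^+(z_{jt}))\Bigr), \tag{7.2}\\
Z(T^{(\ell)}_x,\tau,-\tau^+(x)) &= \prod_{i=1}^{g}\Bigl(\prod_{t=1}^{k-1} Z(T^{(\ell)}_{y_{it}},\tau) - \prod_{t=1}^{k-1} Z(T^{(\ell)}_{y_{it}},\tau,\tau^+(y_{it}))\Bigr)\cdot\Bigl(\prod_{j=1}^{h}\prod_{t=1}^{k-1} Z(T^{(\ell)}_{z_{jt}},\tau)\Bigr). \tag{7.3}
\end{align}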

Indeed, setting $x$ to $\tau^+(x)$ satisfies $a^+_1,\dots,a^+_g$; hence, arbitrary satisfying assignments of the sub-trees $T^{(\ell)}_{y_{it}}$ can be combined, which explains the first product in (7.2). By contrast, upon assigning $x$ the value $\tau^+(x)$ we need to ensure that each of the clauses $a^-_1,\dots,a^-_h$ is satisfied by at least one variable other than $x$. This explains the second factor of (7.2). A similar argument yields (7.3). Dividing (7.3) by (7.2) and invoking the induction hypothesis (for $q+1$), we obtain

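\begin{align*}
\frac{Z(T^{(\ell)}_x,\tau,-\tau^+(x))}{Z(T^{(\ell)}_x,\tau,\tau^+(x))}
&= \prod_{i=1}^{g}\Bigl(1-\prod_{t=1}^{k-1}\frac{Z(T^{(\ell)}_{y_{it}},\tau,\tau^+(y_{it}))}{Z(T^{(\ell)}_{y_{it}},\tau)}\Bigr)\cdot\prod_{j=1}^{h}\Bigl(1-\prod_{t=1}^{k-1}\frac{Z(T^{(\ell)}_{z_{jt}},\tau,-\tau^+(z_{jt}))}{Z(T^{(\ell)}_{z_{jt}},\tau)}\Bigr)^{-1}\\
&\ge \prod_{i=1}^{g}\Bigl(1-\prod_{t=1}^{k-1}\frac{Z(T^{(\ell)}_{y_{it}},\tau^+,\tau^+(y_{it}))}{Z(T^{(\ell)}_{y_{it}},\tau^+)}\Bigr)\cdot\prod_{j=1}^{h}\Bigl(1-\prod_{t=1}^{k-1}\frac{Z(T^{(\ell)}_{z_{jt}},\tau^+,-\tau^+(z_{jt}))}{Z(T^{(\ell)}_{z_{jt}},\tau^+)}\Bigr)^{-1}\\
&= \frac{Z(T^{(\ell)}_x,\tau^+,-\tau^+(x))}{Z(T^{(\ell)}_x,\tau^+,\tau^+(x))},
\end{align*}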

completing the induction. □

Proof of Lemma 2.12. Applying Claim 7.1 with x equal to the root of T(ℓ) completes the proof of Lemma 2.12. □

7.2. Proof of Lemma 2.14. We employ the PULP algorithm introduced in Section 2.3 and its analysis on the random
tree from Section 3.2. Recall that given an initial set of literals L , PULP returns a superset L̄ with the property that
the partial assignment obtained from setting all literals of L̄ to true, leaves no clause with only unsatisfying literals.
Let us write L̄ = L̄x,s for the set returned by PULP algorithm, initialized with the literal set L = {s · x}.

Claim 7.2. Let $0 \le t < \ell$ and assume that $x \in \partial^{2t}_{T}x$, $s \in \{\pm1\}$, satisfy $|\bar{L}_{x,s}| < \ell - t$. Then for all $\tau \in S(T^{(\ell)})$
\[
Z(T^{(\ell)}_x,\tau) \le 2^{|\bar{L}_{x,s}|}\cdot Z(T^{(\ell)}_x,\tau,s) . \tag{7.4}
\]

Proof. Notice that under our assumption on the size of L̄x,s , the assignment τ does not clash with the one imposed
by PULP. The assertion therefore follows immediately from the same argument as in the proof of Lemma 2.8. □

Claim 7.3. We have $\lim_{t\to\infty}\mathbb{P}\bigl[|\partial^{2t}_{T}x| > (200d\cdot(k-1))^{t}\bigr] = 0$.

Proof. This is an immediate consequence of Lemma 3.2. □

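Proof of Lemma 2.14. Assume that $\ell > ct^{c}$ for a large enough $c = c(d,k) > 0$ and that $t > t_0 = t_0(d,k)$ is sufficiently large. Then Corollary 3.3 shows that
\[
\mathbb{P}\bigl[|\bar{L}_{x,\pm1}| \ge t^{c}\bigr] \le \exp(-t^{2}) . \tag{7.5}
\]
Combining Claim 7.3 with (7.5) and using the union bound, we obtain a sequence $\varepsilon_t \to 0$ such that
\[
\mathbb{P}\bigl[\forall x \in \partial^{2t}_{T}x : |\bar{L}_{x,\pm1}| < t^{c}\bigr] \ge 1-\varepsilon_t . \tag{7.6}
\]
If $x \in \partial^{2t}_{T}x$ satisfies $|\bar{L}_{x,\pm1}| < t^{c}$ and $\ell > ct^{c}$, then Claim 7.2 yields that for all $x \in \partial^{2t}x$
\[
\bigl|\eta^{(\ell)}_x\bigr| \le \log\frac{Z(T^{(\ell)}_x,\sigma^+)}{Z(T^{(\ell)}_x,\sigma^+,+1)} + \log\frac{Z(T^{(\ell)}_x,\sigma^+)}{Z(T^{(\ell)}_x,\sigma^+,-1)} \le |\bar{L}_{x,+1}| + |\bar{L}_{x,-1}| \le 2t^{c} . \tag{7.7}
\]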

The result now follows from (7.6) and (7.7). □

7.3. Proof of Proposition 2.15. We focus on the operator LL⋆d ,k introduced in Section 2.5. Let ρ = (
ρ ,ρ⊕,ρ⊖

)
,

and ρ′ = (
ρ′
 ,ρ′

⊕,ρ′
⊖
)

be two arbitrary triplets in P (−∞,∞] ×P (0,+∞] ×P (−∞,0], and write ρ̂ = (
ρ̂ ρ̂⊕, ρ̂⊖

)

and ρ̂′ = (
ρ̂′
 ρ̂

′
⊕, ρ̂′

⊖
)

for the images LL⋆d ,k (ρ) and LL⋆d ,k (ρ′), respectively. We wish to bound distd
(
ρ̂, ρ̂′) in terms

of distd
(
ρ,ρ′).

To this end, we begin with bounding the W1-distance separately for each of the coordinates (ρ̂⊕, ρ̂′
⊕), (ρ̂⊖, ρ̂′

⊖)
and (ρ̂ , ρ̂′

 ). Observe that it is sufficient to consider only W1(ρ̂⊕, ρ̂′
⊕) and W1(ρ̂⊖, ρ̂′

⊖), as the triangle inequality
implies that W1(ρ̂ , ρ̂′

 ) ≤W1(ρ̂⊕, ρ̂′
⊕)+W1(ρ̂⊖, ρ̂′

⊖).
To spell out our bounds, we need to introduce some additional notation. Recall that for i , j ≥ 1 the random

variables η ,i , j , η⊕,i , j , η⊖,i , j follow the law of ρ , ρ⊕, ρ⊖, respectively. Similarly, let η′
 ,i , j , η′

⊕,i , j , η′
⊖,i , j be random

variables with law ρ′
 ,ρ′

⊕ and ρ′
⊖, respectively. We denote with η∧

 ,i , j the random variable η ,i , j ∧η′
 ,i , j , and with

η∨
 ,i , j the random variable η ,i , j ∨η′

 ,i , j . Similarly, we write η∧
⊕,i , j = η⊕,i , j ∧η′

⊕,i , j and η∨
⊕,i , j = η⊕,i , j ∨η′

⊕,i , j , and

also write $\eta^{\wedge}_{\ominus,i,j} = \eta_{\ominus,i,j}\wedge\eta'_{\ominus,i,j}$ and $\eta^{\vee}_{\ominus,i,j} = \eta_{\ominus,i,j}\vee\eta'_{\ominus,i,j}$.



Moreover, for a sign ε ∈ {±1} and a vector r = (r ,r⊕,r⊖,r#) of non-negative integers with r +r⊕+r⊖+r# = k −1
and 1 ≤ i ≤ r , 1 ≤ j ≤ r⊕, 1 ≤ ℓ≤ r⊖, we let

D 
i (z,r ;ε)

=
∣∣∣∣
∂

∂z
log

(
1− 1

2r#
Γ

(
ε(η ,1,1, . . . ,η ,1,i−1, z,η′

 ,1,i+1, . . .η′
 ,1,r )

)
Γ

(
ε(η′

⊕,1,1, . . . ,η′
⊕,1,r⊕ )

)
Γ

(
ε(η′

⊖,1,1, . . . ,η′
⊖,1,r⊖ )

))∣∣∣∣ .

Analogously, we define

D⊕
j (z,r ;ε)

=
∣∣∣∣
∂

∂z
log

(
1− 1

2r#
Γ

(
ε(η ,1,1, . . .η ,1,r )

)
Γ

(
ε(η⊕,1,1, . . . ,η⊕,1, j−1, z,η′

⊕,1, j+1, . . . ,η′
⊕,1,r⊕ )

)
Γ

(
ε(η′

⊖,1,1, . . .η′
⊖,1,r⊖ )

))∣∣∣∣ ,

D⊖
ℓ (z,r ;ε)

=
∣∣∣∣
∂

∂z
log

(
1− 1

2r#
Γ

(
ε(η ,1,1, . . .η ,1,r )

)
Γ

(
ε(η⊕,1,1, . . . ,η⊕,1,r⊕ )

)
Γ

(
ε(η⊖,1,1, . . . ,η⊖,1,ℓ−1, z,η′

⊖,1,ℓ+1, . . . ,η′
⊖,1,r⊖ )

))∣∣∣∣ .

With the above notation in place, we are now ready to bound W1(ρ̂⊕, ρ̂′
⊕). For each of the pairs of distributions

(ρ ,ρ′
 ), (ρ⊕,ρ′

⊕), and (ρ⊖,ρ′
⊖), fix an arbitrary coupling among its coordinates.

Lemma 7.4. W1(ρ̂⊕, ρ̂′
⊕) is upper bounded by

d/2

1−e−
d
2

·E
[r ,1∑

i=1

∫ η∨ ,1,i

η∧ ,1,i

D 
i (wi ,r 1;+1)dwi +

r⊕,1∑
j=1

∫ η∨⊕,1, j

η∧⊕,1, j

D⊕
j (y j ,r 1;+1)dy j +

r⊖,1∑
ℓ=1

∫ η∨⊖,1,ℓ

η∧⊖,1,ℓ

D⊖
ℓ (zℓ,r 1;+1)dzℓ

]
. (7.8)

Proof. Let us write Ξ′i,j(ε, r) for the expression on the r.h.s. of (2.19) where the distribution ρ′ is used instead of ρ, i.e.,

ΞΞΞ′i , j (ε,r ) = 1− 1

2r#
Γ

(
ε
(
η′
 ,4i+ j ,1, . . . ,η′

 ,4i+ j ,r 

))
Γ

(
ε
(
η′
⊕,4i+ j ,1, . . . ,η′

⊕,4i+ j ,r⊕
))
Γ

(
ε
(
η′
⊖,4i+ j ,1, . . . ,η′

⊖,4i+ j ,r⊖
))

. (7.9)

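By identically coupling the number of clauses and the types of the children variables of each clause in $\hat\rho_\oplus, \hat\rho'_\oplus$, we see that by the definition of the $W_1$ norm,
\[
W_1(\hat\rho_\oplus,\hat\rho'_\oplus) \le \mathbb{E}\left[\left|-\sum_{i=1}^{d^\star_+}\log\frac{\boldsymbol{\Xi}_{i,3}(+1,r_{4i+3})}{\boldsymbol{\Xi}'_{i,3}(+1,r_{4i+3})}\right|\right].
\]
Applying Wald's lemma, we further obtain
\[
W_1(\hat\rho_\oplus,\hat\rho'_\oplus) \le \frac{d/2}{1-e^{-d/2}}\cdot\mathbb{E}\left[\left|\log\frac{\boldsymbol{\Xi}_{1,3}(+1,r_{7})}{\boldsymbol{\Xi}'_{1,3}(+1,r_{7})}\right|\right] = \frac{d/2}{1-e^{-d/2}}\cdot\mathbb{E}\left[\left|\log\frac{\boldsymbol{\Xi}_{0,1}(+1,r_{1})}{\boldsymbol{\Xi}'_{0,1}(+1,r_{1})}\right|\right]. \tag{7.10}
\]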

Let us now focus on the expectation in the r.h.s. of (7.10). Recalling the definition of ΞΞΞ in (2.19), and the definition
of ΞΞΞ′ in (7.9), we expand

log
ΞΞΞ0,1(+1,r 1)

ΞΞΞ′0,1(+1,r 1)
= log

1−2−r# ·Γ
(
η ,1,1, . . . ,η ,1,r ,1

)
·Γ

(
η⊕,1,1, . . . ,η⊕,1,r⊕,1

)
·Γ

(
η⊖,1,1, . . . ,η⊖,1,r⊖,1

)

1−2−r# ·Γ
(
η′
 ,1,1, . . . ,η′

 ,1,r ,1

)
·Γ

(
η′
⊕,1,1, . . . ,η′

⊕,1,r⊕,1

)
·Γ

(
η′
⊖,1,1, . . . ,η′

⊖,1,r⊖,1

) . (7.11)

Telescoping over the arguments of the functions Γ in the r.h.s of (7.11), invoking the fundamental theorem of
calculus for each term, and applying the triangle inequality we further obtain
∣∣∣∣∣log

ΞΞΞ0,1(+1,r 1)

ΞΞΞ′0,1(+1,r 1)

∣∣∣∣∣≤
r ,1∑
i=1

∣∣∣∣∣
∫ η ,1,i

η′ ,1,i

D 
i (wi ,r 1;+1)dwi

∣∣∣∣∣+
r⊕,1∑
j=1

∣∣∣∣∣
∫ η⊕,1, j

η′⊕,1, j

D⊕
j (y j ,r 1;+1)dy j

∣∣∣∣∣+
r⊖,1∑
ℓ=1

∣∣∣∣∣
∫ η⊖,1,ℓ

η′⊖,1,ℓ

D⊖
ℓ (zℓ,r 1;+1)dzℓ

∣∣∣∣∣ .

Plugging the above into (7.10) gives the result. □
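The prefactor $\frac{d/2}{1-e^{-d/2}}$ produced by Wald's lemma in (7.10) coincides with the mean of a Poisson variable with parameter $d/2$ conditioned on being at least one. The sketch below checks this closed form by direct summation. That $d^\star_+$ has exactly this conditioned Poisson law is an assumption here, since its definition is not restated in this section; the function names are hypothetical.

```python
import math

def truncated_poisson_mean(lam, terms=200):
    """Mean of X ~ Poisson(lam) conditioned on X >= 1, by direct summation."""
    pmf = math.exp(-lam)              # P[X = 0]
    total = 0.0
    for n in range(1, terms):
        pmf *= lam / n                # P[X = n] obtained from P[X = n-1]
        total += n * pmf
    return total / (1.0 - math.exp(-lam))

def wald_prefactor(d):
    """Closed form (d/2) / (1 - exp(-d/2)) appearing in (7.10)."""
    return (d / 2.0) / (1.0 - math.exp(-d / 2.0))

for d in (0.5, 1.0, 3.0):             # hypothetical sample values of d
    print(d, truncated_poisson_mean(d / 2.0), wald_prefactor(d))
```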

Following the same steps as above, but replacing ‘+1’ with ‘−1’, yields the corresponding bound for W1(ρ̂⊖, ρ̂′
⊖).

Lemma 7.5. W1(ρ̂⊖, ρ̂′
⊖) is upper bounded by

d/2

1−e−
d
2

·E
[r ,1∑

i=1

∫ η∨ ,1,i

η∧ ,1,i

D 
i (wi ,r 1;−1)dwi +

r⊕,1∑
j=1

∫ η∨⊕,1, j

η∧⊕,1, j

D⊕
j (y j ,r 1;−1)dy j +

r⊖,1∑
ℓ=1

∫ η∨⊖,1,ℓ

η∧⊖,1,ℓ

D⊖
ℓ (zℓ,r 1;−1)dzℓ

]
. (7.12)



Exploiting the signs of the variables with types ⊕ and ⊖, we obtain the following bounds for each of the D-functions. For $\lambda \in (0,1]$, we define the real function $\psi_\lambda : [0,1] \to \mathbb{R}$ as
\[
\psi_\lambda(w) = \frac{\lambda\cdot w}{1-\lambda\cdot w}\cdot(1-w) . \tag{7.13}
\]
It is easy to check that $\psi_{\lambda'}(w) \le \psi_\lambda(w)$ for every $\lambda' \le \lambda$.

Claim 7.6. For every r = (
r ,r⊕,r⊖,r#

)
, and i ∈ [r ] we have that

D 
i (wi ,r ;+1) ≤ψ2−r#−r⊖

(
1+ tanh(wi /2)

2

)
and D 

i (wi ,r ;−1) ≤ψ2−r#−r⊕
(

1− tanh(wi /2)

2

)
. (7.14)

Similarly, we also have that for j ∈ [r⊕],

D⊕
j

(
y j ,r ;+1

)≤ψ2−r#−r⊖
(

1+ tanh(y j /2)

2

)
and D⊕

j

(
y j ,r ;−1

)≤ψ2−r#−(r⊕−1)

(
1− tanh(y j /2)

2

)
, (7.15)

and for ℓ ∈ [r⊖]

D⊖
ℓ (zℓ,r ;+1) ≤ψ2−r#−(r⊖−1)

(
1+ tanh(zℓ/2)

2

)
and D⊖

ℓ (zℓ,r ;−1) ≤ψ2−r#−r⊕
(

1− tanh(zℓ/2)

2

)
. (7.16)

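Proof. We only prove the first inequality of (7.14) as the rest of them follow in a similar manner. A straightforward calculation shows that for $z \in \mathbb{R}^{q}$, $\varepsilon \in \{\pm1\}$, and $i \in [q]$ we have
\[
\frac{\partial}{\partial z_i}\Gamma(\varepsilon\cdot z) = \varepsilon\cdot\frac{1-\tanh(\varepsilon\cdot z_i/2)}{2}\cdot\Gamma(\varepsilon\cdot z) . \tag{7.17}
\]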

Writing K = 2−r#Γ
(
η ,1,1, . . . ,η ,1,i−1,η′

 ,1,i+1, . . .η′
 ,1,r 

)
Γ

(
η′
⊕,1,1, . . . ,η′

⊕,1,r⊕
)
Γ

(
η′
⊖,1,1, . . . ,η′

⊖,1,r⊖
)
, applying the chain

rule, and using (7.17), we see that

D 
i (wi ,r ;+1) =ψK

(
1+ tanh(wi /2)

2

)
. (7.18)

Using the fact that $\rho'_\ominus$ is supported in $(-\infty,0]$, and that $\Gamma \le 1$, we obtain $K \le 2^{-r_\#-r_\ominus}$. The monotonicity of $\psi_\lambda$ with respect to the parameter $\lambda$ concludes the proof. □
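The chain-rule computation behind (7.18) can be sanity-checked numerically. The sketch below assumes that $\Gamma$ is the product $\Gamma(z) = \prod_i (1+\tanh(z_i/2))/2$, which is consistent with (7.17) but is not restated in this section. It compares a finite-difference derivative of $w \mapsto \log(1 - K\cdot(1+\tanh(w/2))/2)$ with $\psi_K((1+\tanh(w/2))/2)$. All helper names and sample values are hypothetical.

```python
import math

def gamma(z):
    """The map z -> (1 + tanh(z/2)) / 2."""
    return (1.0 + math.tanh(z / 2.0)) / 2.0

def Gamma(zs):
    """ASSUMPTION: Gamma(z) is the product of gamma(z_i) over the coordinates."""
    out = 1.0
    for z in zs:
        out *= gamma(z)
    return out

def psi(lam, w):
    """psi_lambda(w) = lambda * w * (1 - w) / (1 - lambda * w), as in (7.13)."""
    return lam * w * (1.0 - w) / (1.0 - lam * w)

def D_numeric(w, others, scale, h=1e-6):
    """|d/dw log(1 - scale * Gamma(others + [w]))| via central differences."""
    f = lambda x: math.log(1.0 - scale * Gamma(others + [x]))
    return abs((f(w + h) - f(w - h)) / (2.0 * h))

others, scale, w = [0.3, -1.2], 0.25, 0.7   # hypothetical sample inputs
K = scale * Gamma(others)                   # plays the role of K in (7.18)
print(D_numeric(w, others, scale), psi(K, gamma(w)))
```

The two printed values agree up to the finite-difference error, which is what (7.18) predicts under the stated assumption on $\Gamma$.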

Using Claim 7.6 and maximising each of the functions $\psi_\lambda$ appearing in (7.14)–(7.16), we can recover the bounds of [50]. To obtain sharper bounds, a natural idea is to optimise groups of summands, instead of optimising each D-summand of $W_1(\hat\rho_\oplus,\hat\rho'_\oplus)+W_1(\hat\rho_\ominus,\hat\rho'_\ominus)$ in isolation. In particular, it is tempting to pair terms of the form $D(\cdot,-1)$ with corresponding terms of the form $D(\cdot,+1)$, as Lemma 7.7 suggests.

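Lemma 7.7. Let $\varphi_\lambda : [0,1] \to \mathbb{R}$ be the function $\varphi_\lambda(w) = \psi_\lambda(w)+\psi_\lambda(1-w)$. For every $\lambda \in (0,1]$, we have that $\varphi_\lambda(w) \le \varphi_\lambda(1/2) = \frac{\lambda/2}{1-\lambda/2}$ for all $w \in [0,1]$.

Proof. For $\lambda = 1$, we have that $\psi_\lambda(w) = w$, implying $\varphi_\lambda(w) = 1$, and thus the result holds trivially. Let now $\lambda \in (0,1)$. Differentiating $\psi_\lambda$ gives
\[
\psi'_\lambda(w) = \frac{\lambda^{2}w^{2}-2\lambda w+\lambda}{\lambda^{2}w^{2}-2\lambda w+1} = 1-\frac{1-\lambda}{(1-\lambda w)^{2}} .
\]
Therefore,
\[
\varphi'_\lambda(w) = 1-\frac{1-\lambda}{(1-\lambda w)^{2}} - \Bigl(1-\frac{1-\lambda}{(1-\lambda(1-w))^{2}}\Bigr) = \frac{1-\lambda}{(1-\lambda(1-w))^{2}} - \frac{1-\lambda}{(1-\lambda w)^{2}} .
\]
It is straightforward to check that the above expression has only one root at $w = 1/2$, being non-negative for $w \in [0,1/2)$ and non-positive for $w \in (1/2,1]$. Therefore, $\varphi_\lambda(1/2) = \frac{\lambda/2}{1-\lambda/2}$ is the maximum value of $\varphi_\lambda$. □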
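As a quick numerical illustration of Lemma 7.7, the following sketch evaluates $\varphi_\lambda$ on a grid and confirms that the maximum is attained at $w = 1/2$ with value $\frac{\lambda/2}{1-\lambda/2}$. The grid resolution and the sample values of $\lambda$ are arbitrary choices made for this example; $\lambda = 1$ is excluded because the formula for $\psi_1$ is of the form $0/0$ at $w = 1$.

```python
def psi(lam, w):
    """psi_lambda(w) = lambda * w * (1 - w) / (1 - lambda * w), as in (7.13)."""
    return lam * w * (1.0 - w) / (1.0 - lam * w)

def phi(lam, w):
    """phi_lambda(w) = psi_lambda(w) + psi_lambda(1 - w), as in Lemma 7.7."""
    return psi(lam, w) + psi(lam, 1.0 - w)

def grid_max(lam, steps=1000):
    """Largest value of phi_lambda over an evenly spaced grid on [0, 1]."""
    return max(phi(lam, i / steps) for i in range(steps + 1))

for lam in (0.1, 0.25, 0.5, 0.9):         # sample parameters, lambda < 1
    claimed = (lam / 2.0) / (1.0 - lam / 2.0)
    print(lam, grid_max(lam), phi(lam, 0.5), claimed)
```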

However, directly applying Lemma 7.7 to $W_1(\hat\rho_\oplus,\hat\rho'_\oplus)+W_1(\hat\rho_\ominus,\hat\rho'_\ominus)$ seems hopeless, since in the bounds supplied by Claim 7.6, the parameters of the functions $\psi$ bounding the $D(\cdot,r,+1)$-terms in $W_1(\hat\rho_\oplus,\hat\rho'_\oplus)$ are quite different from the parameters of the functions $\psi$ bounding the corresponding $D(\cdot,r,-1)$-terms in $W_1(\hat\rho_\ominus,\hat\rho'_\ominus)$.

The following lemma reveals a somewhat unexpected symmetry between $W_1(\hat\rho_\oplus,\hat\rho'_\oplus)$ and $W_1(\hat\rho_\ominus,\hat\rho'_\ominus)$ that facilitates our pairing strategy.



Some additional notation is in order. We denote with R(k) for the set of all vectors r = (r ,r⊕,r⊖,r#) of non-
negative integer entries which sum to k −1. For every r ∈R(k) we use the shorthand

P (r ) = (k −1)!

r !r⊕!r⊖!r#!
·pr 

 pr⊕
⊕ pr⊖

⊖ pr#
# ,

where p , p⊕, p⊖, p# are the probabilities defined in (2.18). Finally, we define

E =
∑

r∈R(k)
r ≥1

P (r ) · r ·E
[∫ η∨ ,1,1

η∧ ,1,1

φ2−r#−r⊖
(

1+ tanh(w/2)

2

)
dw

]
, (7.19)

E⊕ =
∑

r∈R(k)
r⊕≥1

P (r ) · r⊕ ·E
[∫ η∨⊕,1,1

η∧⊕,1,1

φ2−r#−r⊖
(

1+ tanh(y/2)

2

)
dy

]
, (7.20)

E⊖ =
∑

r∈R(k)
r⊖≥1

P (r ) · r⊖ ·E
[∫ η∨⊖,1,1

η∧⊖,1,1

φ2−r#−r⊕
(

1+ tanh(z/2)

2

)
dz

]
. (7.21)

Lemma 7.8. We have that

W1(ρ̂⊕, ρ̂′
⊕)+W1(ρ̂⊖, ρ̂′

⊖) ≤ d/2

1−e−d/2

(
E +E⊕+E⊖

)
. (7.22)

Proof. Expanding the expectation in (7.8) with respect to r = (
r  ,r ⊕,r ⊖,r #

)
, and using the shorthand

E±
 (r ) = E

[∫ η∨ ,1,1

η∧ ,1,1

D 
1 (w,r ;±1)dw

]
, E±

⊕(r ) = E
[∫ η∨⊕,1,1

η∧⊕,1,1

D⊕
1

(
y,r ;±1

)
dy

]
, E±

⊖(r ) = E
[∫ η∨⊖,1,1

η∧⊖,1,1

D⊖
1 (z,r ;±1)dz

]
,

we see that

W1(ρ̂⊕, ρ̂′
⊕) ≤ d/2

1−e−d/2

( ∑
r∈R(k)

P (r ) · r ·E+
 (r )+

∑
r∈R(k)

P (r ) · r⊕ ·E+
⊕(r )+

∑
r∈R(k)

P (r ) · r⊖ ·E+
⊖(r )

)

= d/2

1−e−d/2




∑
r∈R(k)

r ≥1

P (r ) · r ·E+
 (r )+

∑
r∈R(k)
r⊕≥1

P (r ) · r⊕ ·E+
⊕(r )+

∑
r∈R(k)
r⊖≥1

P (r ) · r⊖ ·E+
⊖(r )


 . (7.23)

In a similar manner, we derive

W1(ρ̂⊖, ρ̂′
⊖) ≤ d/2

1−e−d/2




∑
r∈R(k)

r ≥1

P (r ) · r ·E−
 (r )+

∑
r∈R(k)
r⊕≥1

P (r ) · r⊕ ·E−
⊕(r )+

∑
r∈R(k)
r⊖≥1

P (r ) · r⊖ ·E−
⊖(r )


 . (7.24)

Let us now consider the bound on the sum W1(ρ̂⊕, ρ̂′
⊕)+W1(ρ̂⊖, ρ̂′

⊖) obtained by summing (7.23), (7.24). We
next group each of the three sums in (7.23) with the corresponding sum in (7.24), carefully pairing their terms.
Specifically, for the  –sums we match the term of

∑
r P (r ) · r ·E+

 (r ) corresponding to r = (r ,r⊕,r⊖,r#) with the
term of

∑
r ′ P (r ′) · r ′

 ·E−
 (r ′) that corresponds to r ′ = (r ,r⊖,r⊕,r#). Since r 7→ r ′ is a bijection of R(k)∩ {r : r ≥ 1},

and r ′
 = r , and P (r ′) = P (r ) we see that

∑
r∈R(k)

r ≥1

P (r ) · r 
(
E+
 (r )+E−

 (r )
)=

∑
r∈R(k)

r ≥1

P (r ) · r 
(
E+
 (r )+E−

 (r ′)
)

. (7.25)

Invoking the bounds (7.14) of Claim 7.6, and recalling the definitions ofφ, E , we upper bound the r.h.s. of (7.25) by

∑
r∈R(k)

r ≥1

P (r )r 

(
E

[∫ η∨ ,1,1

η∧ ,1,1

ψ2−r#−r⊖
(

1+ tanh(w/2)

2

)
dw

]
+E

[∫ η∨ ,1,1

η∧ ,1,1

ψ2−r#−r⊖
(

1− tanh(w/2)

2

)
dw

])
= E . (7.26)

The matchings between the terms for the ⊕,⊖–sums of (7.23), (7.24) are more delicate. In particular, for the ⊕–sum
it turns out that we can pull off the same trick as above by pairing the term of

∑
r P (r ) · r⊕ ·E+

⊕(r ) corresponding to


the vector r = (r ,r⊕,r⊖,r#) with the term of
∑

r ′′ P (r ′′) ·r ′′
⊕ ·E−

⊕(r ′′) that corresponds to r ′′ = (r ,r⊖+1,r⊕−1,r#). To
see this, note that the mapping r 7→ r ′′ is a bijection of R(k)∩ {r : r⊕ ≥ 1}, leaving the quantity P (r ) · r⊕ invariant as

P (r ) · r⊕ = (k −1)!

r !(r⊕−1)!r⊖!r#!
·pr 

 pr⊕
⊕ pr⊖

⊖ pr#
# = (k −1)!

r !(r⊖+1)!(r⊕−1)!r#!
·pr 

 pr⊕
⊕ pr⊖

⊖ pr#
# · (r⊖+1) = P (r ′′) · r ′′

⊕ .

Invoking the bounds (7.15) of Claim 7.6, recalling the definitions of φ, E⊕, and arguing as above, we obtain
∑

r∈R(k)
r⊕≥1

P (r ) · r⊕
(
E+
⊕(r )+E−

⊕(r ′′)
)≤ E⊕ . (7.27)

Similarly, using the mapping r 7→ r ′′′, with r ′′′= (r ,r⊖−1,r⊕+1,r#), and following the same steps as above, we get
∑

r∈R(k)
r⊖≥1

P (r ) · r⊖
(
E+
⊖(r )+E−

⊖(r ′′′)
)≤ E⊖ . (7.28)

Summing (7.26)–(7.28) concludes the proof. □
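The pairing used for the ⊕-sums in the proof above rests on two facts: the map $r \mapsto r''$ is a bijection of $\{r \in R(k) : r_\oplus \ge 1\}$, and it leaves $P(r)\cdot r_\oplus$ invariant. The sketch below verifies both by brute force for a small $k$. It assumes only that $p_\oplus = p_\ominus$, which the displayed identity uses implicitly; the concrete probability vector is a hypothetical placeholder and not the one from (2.18).

```python
from math import factorial

def multinomial_weight(r, p):
    """P(r): multinomial coefficient of r times the product of p_i ** r_i."""
    coef = factorial(sum(r))
    for ri in r:
        coef //= factorial(ri)
    weight = float(coef)
    for ri, pi in zip(r, p):
        weight *= pi ** ri
    return weight

def vectors(k):
    """All non-negative integer 4-vectors with entries summing to k - 1."""
    n = k - 1
    return [(a, b, c, n - a - b - c)
            for a in range(n + 1)
            for b in range(n + 1 - a)
            for c in range(n + 1 - a - b)]

def swap_shift(r):
    """The map r -> r'' : (r0, r_plus, r_minus, r_hash) -> (r0, r_minus + 1, r_plus - 1, r_hash)."""
    r0, rp, rm, rh = r
    return (r0, rm + 1, rp - 1, rh)

k = 5
p = (0.4, 0.2, 0.2, 0.2)                  # hypothetical; only p[1] == p[2] matters
domain = [r for r in vectors(k) if r[1] >= 1]
image = [swap_shift(r) for r in domain]
is_bijection = sorted(image) == sorted(domain)
max_gap = max(abs(multinomial_weight(r, p) * r[1] - multinomial_weight(s, p) * s[1])
              for r, s in zip(domain, image))
print(is_bijection, max_gap)
```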

In light of the above, we are now ready to finish the proof of Proposition 2.15.

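Proof of Proposition 2.15. Applying Lemma 7.7 to the function $\varphi$ on the r.h.s. of (7.19) gives
\[
\varphi_{2^{-r_\#-r_\ominus}}\Bigl(\frac{1+\tanh(w/2)}{2}\Bigr) \le \frac{2^{-r_\ominus-r_\#-1}}{1-2^{-r_\ominus-r_\#-1}} \le \Bigl(\frac{1}{2}\Bigr)^{r_\ominus+r_\#} . \tag{7.29}
\]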

Plugging the above into (7.19) and applying the binomial theorem, further yields

E ≤ (k −1) ·p 

(
1− e−

d
2

2

)k−2

E
[|η ,1,1 −η′

 ,1,1|
]

. (7.30)

Working in a similar manner, we obtain

E⊕ ≤ (k −1) ·p⊕

(
1− e−

d
2

2

)k−2

E
[|η⊕,1,1 −η′

⊕,1,1|
]

, and E⊖ ≤ (k −1) ·p⊖

(
1− e−

d
2

2

)k−2

E
[|η⊖,1,1 −η′

⊖,1,1|
]

. (7.31)
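The binomial-theorem step leading to (7.30) and (7.31) is an instance of the generic identity $\sum_{r} P(r)\, r_1\, 2^{-(r_3+r_4)} = (k-1)\,p_1\,\bigl(1-(p_3+p_4)/2\bigr)^{k-2}$, valid for any probability vector $(p_1,p_2,p_3,p_4)$, where the sum ranges over non-negative integer vectors with entries summing to $k-1$. The sketch below checks this identity by enumeration. Matching the base $1 - e^{-d/2}/2$ in (7.30) additionally requires $p_\ominus + p_\# = e^{-d/2}$; this is taken as an assumption about (2.18), which is not restated here. The probability vector in the code is a hypothetical placeholder.

```python
from math import factorial

def multinomial_weight(r, p):
    """P(r): multinomial coefficient of r times the product of p_i ** r_i."""
    coef = factorial(sum(r))
    for ri in r:
        coef //= factorial(ri)
    weight = float(coef)
    for ri, pi in zip(r, p):
        weight *= pi ** ri
    return weight

def vectors(k):
    """All non-negative integer 4-vectors with entries summing to k - 1."""
    n = k - 1
    return [(a, b, c, n - a - b - c)
            for a in range(n + 1)
            for b in range(n + 1 - a)
            for c in range(n + 1 - a - b)]

def enumerated_sum(k, p):
    """Sum over r of P(r) * r[0] * (1/2) ** (r[2] + r[3])."""
    return sum(multinomial_weight(r, p) * r[0] * 0.5 ** (r[2] + r[3])
               for r in vectors(k))

def closed_form(k, p):
    """(k - 1) * p[0] * (1 - (p[2] + p[3]) / 2) ** (k - 2)."""
    return (k - 1) * p[0] * (1.0 - (p[2] + p[3]) / 2.0) ** (k - 2)

p = (0.4, 0.2, 0.2, 0.2)                  # hypothetical probability vector
for k in (2, 4, 6):
    print(k, enumerated_sum(k, p), closed_form(k, p))
```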

Finally, plugging the bounds (7.30), and (7.31) into (7.22) we see that W1(ρ̂⊕, ρ̂′
⊕)+W1(ρ̂⊖, ρ̂′

⊖) is upper bounded by

d · (k −1)

2
·
(

1− e−
d
2

2

)k−2[(
1−e−

d
2

)
E
[|η ,1,1 −η′

 ,1,1|
]+e−

d
2 E

[|η⊕,1,1 −η′
⊕,1,1|

]+e−
d
2 E

[|η⊖,1,1 −η′
⊖,1,1|

]]
. (7.32)

Recall that we established (7.32) assuming an arbitrary coupling between the coordinates of each pair of distribu-
tions (ρ ,ρ′

 ), (ρ⊕,ρ′
⊕), and (ρ⊖,ρ′

⊖). Therefore, the definition of W1 norm and (7.32), imply the first inequality
below, while (7.33) follows by the definition (2.22) of distd

W1(ρ̂⊕, ρ̂′
⊕)+W1(ρ̂⊖, ρ̂′

⊖) ≤ d(k −1)

2

(
1− e−

d
2

2

)k−2 [(
1−e−

d
2

)
W1(ρ ,ρ′

 )+e−
d
2 W1(ρ⊕,ρ′

⊕)+e−
d
2 W1(ρ⊖,ρ′

⊖)
]

≤ d(k −1)

2

(
1− e−

d
2

2

)k−2

distd (ρ,ρ′) . (7.33)

Moreover, as per the triangle inequality we see that

W1(ρ̂ , ρ̂′
 ) ≤W1(ρ̂⊕, ρ̂′

⊕)+W1(ρ̂⊖, ρ̂′
⊖) ≤ d(k −1)

2

(
1− e−

d
2

2

)k−2

distd (ρ,ρ′) . (7.34)

Plugging the bounds (7.33) and (7.34) into the expression of distd
(
ρ̂, ρ̂′) yields

distd
(
ρ̂, ρ̂′)= (1−e−d/2) ·W1(ρ̂ , ρ̂′

 )+e−d/2 (
W1(ρ̂⊕, ρ̂′

⊕)+W1(ρ̂⊖, ρ̂′
⊖)

)≤ d(k −1)

2

(
1− e−

d
2

2

)k−2

distd (ρ,ρ′) .

Recalling the definition of dcon, we see that for d < dcon(k), the operator LL⋆d ,k contracts with respect to the metric
distd , as desired. □
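To get a feeling for the contraction factor $\frac{d(k-1)}{2}\bigl(1-\frac{e^{-d/2}}{2}\bigr)^{k-2}$ obtained above, the following sketch evaluates it and locates by bisection the value of $d$ at which it reaches one. The factor is increasing in $d$ for $k \ge 2$, so this crossing point is unique. Whether the crossing point coincides with $d_{\mathrm{con}}(k)$ depends on the definition of $d_{\mathrm{con}}$, which is not restated in this section; the code makes no such claim, and its function names are hypothetical.

```python
import math

def contraction_factor(d, k):
    """The factor d(k-1)/2 * (1 - exp(-d/2)/2)**(k-2) from the final bound."""
    return d * (k - 1) / 2.0 * (1.0 - math.exp(-d / 2.0) / 2.0) ** (k - 2)

def unit_crossing(k, lo=0.0, hi=10.0, iters=80):
    """Bisection for the d in [lo, hi] at which the factor equals one.
    Relies on the factor being increasing in d and exceeding one at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if contraction_factor(mid, k) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for k in (2, 3, 4, 5):
    d_star = unit_crossing(k)
    print(k, d_star, contraction_factor(d_star, k))
```

For $k = 2$ the factor reduces to $d/2$, so the crossing point is $d = 2$.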



7.4. Proof of Proposition 2.13. To get a handle on the $\eta^{(\ell)}_x$ from (2.12), we show that these quantities can be calculated by propagating the extremal boundary condition $\sigma^+$ bottom-up toward the root of the tree. Specifically, we consider the operator
\[
\Lambda^+_{T^{(\ell)}} : (-\infty,\infty]^{V(T^{(\ell)})} \to (-\infty,\infty]^{V(T^{(\ell)})} , \qquad \eta \mapsto \hat\eta = \Lambda^+_{T^{(\ell)}}(\eta) ,
\]
defined as follows. For all $x \in \partial^{2\ell}x$ we set $\hat\eta_x = \infty$. Moreover, for a variable $x \in \partial^{2q}x$ with $q < \ell$ having children clauses $a_1,\dots,a_t$ and grandchildren variables $y_{1,1},\dots,y_{1,(k-1)},\dots,y_{t,1},\dots,y_{t,(k-1)}$ we define
\[
\hat\eta_x = -\sum_{i=1}^{t}\tau^+(x)\,\mathrm{sign}(x,a_i)\cdot\log\Bigl(1-\Gamma\bigl(\tau^+(x)\,\mathrm{sign}(x,a_i)\cdot(\eta_{y_{i,1}},\dots,\eta_{y_{i,(k-1)}})\bigr)\Bigr) . \tag{7.35}
\]

It may not be apparent that the sum above is well-defined, as $-\infty$ summands may appear. The following lemma rules out this possibility and shows that the $\ell$-fold iteration $\Lambda^{+\,(\ell)}_{T^{(\ell)}}$, initialised at the all-$(+\infty)$ vector, yields $\eta^{(\ell)} = (\eta^{(\ell)}_x)_{x\in V(T^{(\ell)})}$.

Lemma 7.9. The operator $\Lambda^+_{T^{(\ell)}}$ is well-defined and $\Lambda^{+\,(t)}_{T^{(\ell)}}(+\infty,\dots,+\infty) = \eta^{(\ell)}$ for every $t \ge \ell$.

Proof. To show that $\Lambda^+_{T^{(\ell)}}$ is well defined we verify that, in the notation of (7.35), $\hat\eta_x \in (-\infty,\infty]$ for all $x$. Indeed, in the expression on the r.h.s. of (7.35) a $\pm\infty$ summand can arise only from variables $y_{i,j}$ with $\eta_{y_{i,j}} = \infty$. But the definition of $\tau^+$ ensures that such $y_{i,j}$ either render a zero summand if $\tau^+(x)\,\mathrm{sign}(x,a_i) = -1$, or a $+\infty$ summand if $\tau^+(x)\,\mathrm{sign}(x,a_i) = 1$. Thus, the sum is well-defined and $\hat\eta_x \in (-\infty,\infty]$.

Further, to verify the identity $\eta^{(\ell)} = \Lambda^{+\,(\ell)}_{T^{(\ell)}}(\infty,\dots,\infty)$, consider a variable $x$ of $T^{(\ell)}$. Let $a^+_1,\dots,a^+_g$ be the children (clauses) of $x$ with $\mathrm{sign}(x,a^+_i) = \tau^+(x)$. Also let $y_{11},\dots,y_{1(k-1)},\dots,y_{g1},\dots,y_{g(k-1)}$ be the children of $a^+_1,\dots,a^+_g$. Similarly, let $a^-_1,\dots,a^-_h$ be the children of $x$ with $\mathrm{sign}(x,a^-_i) = -\tau^+(x)$ and let $z_{11},\dots,z_{1(k-1)},\dots,z_{h1},\dots,z_{h(k-1)}$ be their children. Then (7.2) and (7.3) yield
\begin{align*}
\eta^{(\ell)}_x &= -\sum_{i=1}^{g}\log\Bigl(1-\prod_{q=1}^{k-1}\frac{Z(T^{(\ell)}_{y_{iq}},\tau^+,\tau^+(y_{iq}))}{Z(T^{(\ell)}_{y_{iq}},\tau^+)}\Bigr) + \sum_{j=1}^{h}\log\Bigl(1-\prod_{q=1}^{k-1}\frac{Z(T^{(\ell)}_{z_{jq}},\tau^+,-\tau^+(z_{jq}))}{Z(T^{(\ell)}_{z_{jq}},\tau^+)}\Bigr)\\
&= -\sum_{i=1}^{g}\log\Bigl(1-\Gamma\bigl(\mathrm{sign}(x,a^+_i)\,\tau^+(x)\cdot(\eta^{(\ell)}_{y_{i1}},\dots,\eta^{(\ell)}_{y_{i(k-1)}})\bigr)\Bigr) + \sum_{j=1}^{h}\log\Bigl(1-\Gamma\bigl(\mathrm{sign}(x,a^-_j)\,\tau^+(x)\cdot(\eta^{(\ell)}_{z_{j1}},\dots,\eta^{(\ell)}_{z_{j(k-1)}})\bigr)\Bigr) .
\end{align*}
The assertion follows because $\mathrm{sign}(x,a^+_i)\,\tau^+(x) = 1$ and $\mathrm{sign}(x,a^-_j)\,\tau^+(x) = -1$. □

The next aim is to approximate the $\ell$-fold iteration of $\Lambda^+_{T^{(\ell)}}$, and more specifically the distribution of $\eta^{(\ell)}_x$, using a non-random operator. To this end, we need to cope with the $\pm\infty$-entries of the vector $\eta^{(\ell)}$. This is addressed by Lemma 2.14, proven in Section 7.2, which provides a bound on $\eta^{(\ell)}_x$ for variables $x$ near the root of the tree. In the following we continue to write $c$ and $(\varepsilon_t)_t$ for the number and the sequence supplied by Lemma 2.14. Guided by Lemma 2.14 we consider the vector $\eta^{(\ell)}_{\wedge t}$ of truncated log-likelihood ratios
\[
\bigl(\eta^{(\ell)}_{\wedge t}\bigr)_x =
\begin{cases}
-2t^{c} & \text{if } x \in \partial^{2t}x \text{ and } \eta^{(\ell)}_x < -2t^{c},\\
2t^{c} & \text{if } x \in \partial^{2t}x \text{ and } \eta^{(\ell)}_x > 2t^{c},\\
\eta^{(\ell)}_x & \text{otherwise}.
\end{cases} \tag{7.36}
\]

Further, let $\eta^{(\ell,t)}$ be the result of $t$ iterations of $\Lambda^+_{T^{(\ell)}}(\,\cdot\,)$ starting from $\eta^{(\ell)}_{\wedge t}$. The following corollary is a direct consequence of Lemma 7.9 and Lemma 2.14.

Corollary 7.10. For any $\ell > ct^{c}$ we have $\mathbb{P}[\eta^{(\ell,t)}_x \neq \eta^{(\ell)}_x] < \varepsilon_t$.

Proof. Due to Lemma 2.14, the truncation in (7.36) is inconsequential with probability at least $1-\varepsilon_t$, in which case
\[
\eta^{(\ell,t)} = \Lambda^{+\,(t)}_{T^{(\ell)}}\bigl(\eta^{(\ell)}_{\wedge t}\bigr) = \Lambda^{+\,(t)}_{T^{(\ell)}}(\eta^{(\ell)}) = \Lambda^{+\,(\ell+t)}_{T^{(\ell)}}(+\infty,\dots,+\infty) = \eta^{(\ell)} ,
\]
where the last equality follows from Lemma 7.9. □
Recall that we defined the non-random operator LL⋆d ,k from (2.17), mimicking Λ+

T(ℓ) . To make the connection

between the random operatorΛ+
T(ℓ) and LL⋆d ,k precise, we introduce the following concepts. Given a tree formula

T we write V (T ), for the set of x variables of T that appear both as positive and negative literals in the sub-tree Tx

comprising x and its progeny. We define V#(T ),V⊕(T ), and V⊖(T ) similarly. Note that the above sets constitute


a partition of V (T ). We use tp : V (T ) → { ,⊕,⊖,#} to indicate the part each vertex belongs to. We denote with
T

(ℓ)
 the random Galton-Watson formula T conditioned on the root satisfying tp(x) = . We define T(ℓ)

⊕ , and T
(ℓ)
⊖

analogously. Degenerately, we also write T(ℓ)
# for the formula comprising a single variable x. Let us denote by

η̂(ℓ,t )
 the distribution of (η(ℓ)

∧t )x in T(ℓ)
 . Moreover, let η̄(ℓ−t )

 be the distribution of

η(ℓ−t )
x · 1

{
|η(ℓ−t )

x | ≤ 2t c
}
+2t c · 1

{
η(ℓ−t )
x > 2t c

}
−2t c · 1

{
η(ℓ−t )
x <−2t c

}
,

i.e., the truncation of η(ℓ−t )
x in T

(ℓ)
 . Analogously we define the distributions η̂(ℓ,t )

⊕ , η̂(ℓ,t )
⊖ , and η̄(ℓ−t )

⊕ , η̄(ℓ−t )
⊖ . Notice

that, degenerately, η̂(ℓ,t )
# = η̄(ℓ−t )

# = δ0.

Lemma 7.11. For ℓ> ct c we have that
(
η̂(ℓ,t )
 , η̂(ℓ,t )

⊕ , η̂(ℓ,t )
⊖

)
= LL⋆d ,k

(
η̄(ℓ−t )
 , η̄(ℓ−t )

⊕ , η̄(ℓ−t )
⊖

)
.

Proof. We use induction on t . Specifically, let ν= (
ν ,ν⊕,ν⊖

)
be any triplet in P (−∞,∞]×P [0,+∞]×P [−∞,0],

and ν(t ) = LL⋆(t )
d ,k (ν) be the outcome of the t-fold application of LL⋆d ,k . Moreover, let (ηx )x∈V (T(t )) be a vector of

independent samples with ηx ∼ νtp(x). We claim that root value η(t )
x of the random operatorΛ+ (t )

T(t ) , coincides with
νtp(x). Indeed, for t = 1 the claim follows readily from the definitions. For the inductive step, we notice that the t-
fold application of LL⋆d ,k is obtained by applying LL⋆d ,k to the (t −1)-fold application. Per the induction hypothesis

(
Λ+ (t−1)
T(t−1) (ηx )x

)
x
∼ ν(t−1)

tp(x) . (7.37)

Applying LL⋆d ,k to ν(t−1) implies the result as the first layer of T(t ) is independent of the subtrees rooted at the

grandchildren ∂2x of the root, which are distributed as i.i.d. copies of T(t−1). The lemma follows from applying the

above identity to ν=
(
η̄(ℓ−t )
 , η̄(ℓ−t )

⊕ , η̄(ℓ−t )
⊖

)
. □

Refining the definition of the BPd ,k operator in (1.3), we write BP d ,k for the operator obtained from BPd ,k upon

conditioning on d+,d− ≥ 1. Similarly BP⊕
d ,k and BP⊖

d ,k are obtained from BPd ,k upon conditioning on d+ ≥ 1,d− =
0, and d+ = 0,d− ≥ 1, respectively. We define

π d ,k = BP d ,k

(
πd ,k

)
, π⊕d ,k = BP⊕

d ,k

(
πd ,k

)
, π⊖d ,k = BP⊖

d ,k

(
πd ,k

)
.

Let us write γ,γ−1 for the continuous and mutually inverse real functions

γ :R→ (0,1), z 7→ (1+ tanh(z/2))/2, γ−1 : (0,1) →R, p 7→ log(p/(1−p)) . (7.38)
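That the two maps in (7.38) are mutually inverse follows from the identity $(1+\tanh(z/2))/2 = 1/(1+e^{-z})$, so $\gamma$ is the logistic function and $\gamma^{-1}$ the log-odds map. The short sketch below checks this numerically; the function names `gamma` and `gamma_inv` and the sample points are choices made for this example.

```python
import math

def gamma(z):
    """gamma(z) = (1 + tanh(z/2)) / 2, a map from the reals onto (0, 1)."""
    return (1.0 + math.tanh(z / 2.0)) / 2.0

def gamma_inv(p):
    """gamma^{-1}(p) = log(p / (1 - p)), a map from (0, 1) onto the reals."""
    return math.log(p / (1.0 - p))

for z in (-5.0, -0.3, 0.0, 1.0, 3.0):     # sample points
    logistic = 1.0 / (1.0 + math.exp(-z))
    print(z, gamma(z), logistic, gamma_inv(gamma(z)))
```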

Let ρ d ,k = γ−1(π d ,k ), and define ρ⊕d ,k ,ρ⊖d ,k similarly.

Claim 7.12. The vector
(
ρ d ,k ,ρ⊕d ,k ,ρ⊖d ,k

)
is a fixed point of the operator LL⋆d ,k .

Proof. Let ρd ,k = γ−1
(
πd ,k

)
. First, we claim that

LL⋆d ,k

(
ρd ,k ,ρd ,k ,ρd ,k

)=
(
ρ d ,k ,ρ⊕d ,k ,ρ⊖d ,k

)
. (7.39)

Indeed, since all input distributions are the same, by Proposition 2.1, the two summands in the left term of (2.21)
corresponding to d⋆

+ and d⋆
− are identically distributed, and also identically distributed to the sums that appear in

the other two terms. Therefore, (7.39) follows directly from the definitions of BP d ,k ,BP⊕
d ,k , and BP⊖

d ,k . The claim

now follows from (7.39), the definition of the operator LL⋆d ,k , and the law of total probability. □

Let $\rho^{(\ell)}$ be the distribution of the log-likelihood ratio $\eta^{(\ell)}_x$.

Corollary 7.13. For $d < d_{\mathrm{con}}(k)$ the sequence $\bigl(\gamma(\rho^{(\ell)})\bigr)_\ell$ converges weakly to $\pi_{d,k}$.

Proof. The result follows by combining Corollary 7.10, Lemma 7.11, Proposition 2.15, Claim 7.12, and applying the
continuous mapping theorem and the law of total probability. □

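Proof of Proposition 2.13. Recall that we write $\Lambda^{+\,(\ell)}_{T^{(\ell)}}$ for the $\ell$-fold iteration of the operator $\Lambda^+_{T}$. Let us write $\theta^{(\ell)}_x = \bigl(\Lambda^{+\,(\ell)}_{T^{(\ell)}}(0,\dots,0)\bigr)_x$. Using arguments similar to Fact 4.2, we can show that $\theta^{(\ell)}_x$ is nothing but the distribution of the random variable $\gamma^{-1}\bigl(\mathbb{P}[\tau^{(\ell)}(x) = 1 \mid T]\bigr)$. Therefore,
\[
\mathbb{P}[\tau^{(\ell)}(x) = 1 \mid T] \sim \gamma(\theta^{(\ell)}_x) , \quad\text{and}\quad \mathbb{P}[\tau^{(\ell)}(x) = 1 \mid \forall y \in \partial^{2\ell}x : \tau^{(\ell)}(y) = \tau^+(y),\, T] \sim \gamma(\eta^{(\ell)}_x) .
\]

Due to Lemma 2.12, $0 \le \gamma(\theta^{(\ell)}_x) \le \gamma(\eta^{(\ell)}_x) \le 1$. Moreover, from Lemma 7.11, Proposition 2.15, and Claim 7.12, we see that for $d < d_{\mathrm{con}}(k)$ the sequence $\bigl(\gamma(\theta^{(\ell)}_x)\bigr)_\ell$ converges weakly to $\pi_{d,k}$. Finally, Corollary 7.13 implies that $\bigl(\gamma(\eta^{(\ell)}_x)\bigr)_\ell$ also converges weakly to $\pi_{d,k}$, and thus,
\[
\lim_{\ell\to\infty}\mathbb{E}\bigl|\gamma(\theta^{(\ell)}_x)-\gamma(\eta^{(\ell)}_x)\bigr| = \lim_{\ell\to\infty}\bigl|\mathbb{E}\bigl[\gamma(\theta^{(\ell)}_x)\bigr]-\mathbb{E}\bigl[\gamma(\eta^{(\ell)}_x)\bigr]\bigr| = 0 ,
\]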

implying the assertion. □

ACKNOWLEDGEMENTS

We would like to thank the anonymous referees for thoroughly reviewing our paper and for suggesting valuable
corrections and improvements.

Amin Coja-Oghlan is supported by DFG CO 646/3, DFG CO 646/5 and DFG CO 646/6. Catherine Greenhill is
supported by ARC DP250101611. Vincent Pfenninger is supported by the Austrian Science Fund (FWF) [10.55776 /
16502]. Pavel Zakharov is supported by DFG CO 646/6. Kostas Zampetakis is supported by DFG CO 646/5. For
open access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version
arising from this submission.

REFERENCES

[1] E. Abbe, A. Montanari: On the concentration of the number of solutions of random satisfiability formulas. Random Structures and Algo-
rithms 45 (2014) 362–382.

[2] D. Achlioptas, A. Coja-Oghlan: Algorithmic barriers from phase transitions. Proc. 49th FOCS (2008) 793–802.
[3] D. Achlioptas, A. Coja-Oghlan, M. Hahn-Klimroth, J. Lee, N. Müller, M. Penschuck, G. Zhou: The number of satisfying assignments of

random 2-SAT formulas. Random Structures and Algorithms 58 (2021) 609–647.
[4] D. Achlioptas, A. Coja-Oghlan, F. Ricci-Tersenghi: On the solution-space geometry of random constraint satisfaction problems. Random

Structures and Algorithms 38 (2011) 251–268.
[5] D. Achlioptas, C. Moore: Random k-SAT: two moments suffice to cross a sharp threshold. SIAM Journal on Computing 36 (2006) 740–762.
[6] D. Achlioptas, A. Naor, Y. Peres: Rigorous location of phase transitions in hard optimization problems. Nature 435 (2005) 759–764.
[7] D. Achlioptas, Y. Peres: The threshold for random k-SAT is 2k ln2−O(k). Journal of the AMS 17 (2004) 947–973.
[8] M. Aizenman, R. Sims, S. Starr: An extended variational principle for the SK spin-glass model. Phys. Rev. B 68 (2003) 214403.
[9] D. Aldous, J. Steele: The objective method: probabilistic combinatorial optimization and local weak convergence. In: H. Kesten (ed.):

Probability on Discrete Structures. Springer (2004).
[10] N. Alon, J. Spencer: The probabilistic method. Wiley (2016).
[11] V. Bapst, A. Coja-Oghlan, S. Hetterich, F. Rassmann, D. Vilenchik: The condensation phase transition in random graph coloring. Commu-

nications in Mathematical Physics 341 (2016) 543–606.
[12] M. Bayati, D. Gamarnik, P. Tetali: Combinatorial approach to the interpolation method and scaling limits in sparse random graphs. Ann.

Probab. 41 (2013) 4080–4115.
[13] R. Biswas, W. Chen, A. Sen: On the replica symmetric solution in general diluted spin glasses. arXiv:2410.15599 (2024).
[14] S. Boucheron, G. Lugosi, P. Massart: Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford (2013).
[15] G. Bresler, B. Huang: The algorithmic phase transition of random k-sat for low degree polynomials. Proc. 62th FOCS (2021) 298–309.
[16] A. Broder, A. Frieze, E. Upfal: On the satisfiability and maximum satisfiability of random 3-CNF formulas. Proc. 4th SODA (1993) 322–330.
[17] A. Chatterjee, A. Coja-Oghlan, N. Müller, C. Riddlesden, M. Rolvien, P. Zakharov, H. Zhu: The number of random 2-SAT solutions is asymp-

totically log-normal. Proc. 28th RANDOM (2024) #39.
[18] P. Cheeseman, B. Kanefsky, W. Taylor: Where the really hard problems are. Proc. IJCAI (1991) 331–337.
[19] Z. Chen, A. Galanis, L. A. Goldberg, H. Guo, A. Herrera-Poyatos, N. Mani, A. Moitra: Fast sampling of satisfying assignments from random

k-SAT with applications to connectivity. SIAM J. Disc. Math. 38 (2024) 2750–2811.
[20] Z. Chen, A. Lonkar, C. Wang, K. Yang, Y. Yin: Counting random k-SAT near the satisfiability threshold. arXiv 2411.02980v1 (2024).
[21] V. Chvátal, B. Reed: Mick gets some (the odds are on his side). Proc. 33th FOCS (1992) 620–627.
[22] A. Coja-Oghlan: A better algorithm for random k-SAT. SIAM Journal on Computing 39 (2010) 2823–2864.
[23] A. Coja-Oghlan: Belief Propagation fails on random formulas. Journal of the ACM 63 (2017) #49.
[24] A. Coja-Oghlan, T. Kapetanopoulos, N. Müller: The replica symmetric phase of random constraint satisfaction problems. Combinatorics,

Probability and Computing 29 (2020) 346-422.
[25] A. Coja-Oghlan, F. Krzakala, W. Perkins, L. Zdeborová: Information-theoretic thresholds from the cavity method. Advances in Mathematics

333 (2018) 694–795.
[26] A. Coja-Oghlan, N. Müller, J. Ravelomanana: Belief Propagation on the random k-SAT model. Annals of Applied Probability 32 (2022)

3718–3796.
[27] A. Coja-Oghlan, K. Panagiotou: The asymptotic k-SAT threshold. Advances in Mathematics 288 (2016) 985–1068.
[28] A. Coja-Oghlan, W. Perkins: Belief Propagation on replica symmetric random factor graph models. Annales de l’institut Henri Poincare D

5 (2018) 211–249.
[29] A. Coja-Oghlan, N. Wormald: The number of satisfying assignments of random regular k-SAT formulas. Combinatorics, Probability and

Computing 27 (2018) 496–530.

40


[30] A. Dembo, A. Montanari: Gibbs measures and phase transitions on sparse random graphs. Brazilian Journal of Probability and Statistics
24 (2010) 137–211.

[31] A. Dembo, A. Montanari: Ising models on locally tree-like graphs. Annals of Applied Probability 20 (2010) 565–592.
[32] A. Dembo, A. Montanari, N. Sun: Factor models on locally tree-like graphs. Annals of Probability 41 (2013) 4162–4213.
[33] J. Ding, A. Sly, N. Sun: Proof of the satisfiability conjecture for large k. 20 Annals of Mathematics 196 (2022) 1–388.
[34] O. Dubois, J. Mandler: The 3-XORSAT threshold. Proc. 43rd FOCS (2002) 769–778.
[35] S. Franz, M. Leone: Replica bounds for optimization problems and diluted spin systems. J. Stat. Phys. 111 (2003) 535–564.
[36] E. Friedgut: Sharp thresholds of graph properties, and the k-SAT problem. Journal of the AMS 12 (1999) 1017–1054.
[37] A. Frieze, S. Suen: Analysis of two simple heuristics on a random instance of k-SAT. Journal of Algorithms 20 (1996) 312–355.
[38] A. Galanis, L. A. Goldberg, H. Guo, K. Yang. Counting solutions to random SAT formulas. SIAM J. Comput. 50 (2021) 1701–1738.
[39] H.-O. Georgii: Gibbs measures and phase transitions. De Gruyter (1988).
[40] A. Goerdt: A threshold for unsatisfiability. J. Comput. Syst. Sci. 53 (1996) 469–486
[41] F. Guerra: Broken replica symmetry bounds in the mean field spin glass model. Comm. Math. Phys. 233 (2003) 1–12.
[42] M. Hajiaghayi, G. Sorkin: The satisfiability threshold of random 3-SAT is at least 3.52. IBM Research Report RC22942 (2003).
[43] K. He, K. Wu, K. Yang: Improved bounds for sampling solutions of random SAT formulas. Proc. 34th SODA (2023) 3330–3361.
[44] S. Hetterich: Analysing Survey Propagation Guided Decimationon Random Formulas. Proc. 43rd ICALP (2016) #65.
[45] S. Janson, T. Luczak, A. Ruciński: Random Graphs. Wiley (2000).
[46] A. Kaporis, L. Kirousis, E. Lalas: The probabilistic analysis of a greedy satisfiability algorithm. Random Structures and Algorithms 28 (2006)

444–480.
[47] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová: Gibbs states and the set of solutions of random constraint

satisfaction problems. Proc. National Academy of Sciences 104 (2007) 10318–10323.
[48] L. Lovász: Large networks and graph limits. AMS (2012).
[49] S. Mertens, M. Mézard, Riccardo Zecchina: Threshold values of random K-SAT from the cavity method. Random Structures and Algorithms

28 (2006) 340–373.
[50] M. Mézard, A. Montanari: Information, physics and computation. Oxford University Press (2009).
[51] M. Mézard, G. Parisi, R. Zecchina: Analytic and algorithmic solution of random satisfiability problems. Science 297 (2002) 812–815.
[52] A. Moitra. Approximate counting, the Lovasz local lemma, and inference in graphical models. J. ACM 66 #10 (2019).
[53] M. Molloy: Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms 27 (2005) 124–135.
[54] R. Monasson, R. Zecchina: The entropy of the k-satisfiability problem. Phys. Rev. Lett. 76 (1996) 3881.
[55] R. Monasson, R. Zecchina: Statistical mechanics of the random K -SAT model. Phys. Rev. E 56 (1997) 1357–1370.
[56] A. Montanari, D. Shah: Counting good truth assignments of random k-SAT formulae. Proc. 18th SODA (2007) 1255–1264.
[57] D. Panchenko: The Sherrington-Kirkpatrick model. Springer (2013).
[58] D. Panchenko: Spin glass models from the point of view of spin distributions. Annals of Probability 41 (2013) 1315–1361.
[59] D. Panchenko: On the replica symmetric solution of the K -sat model. Electron. J. Probab. 19 (2014) #67.
[60] D. Panchenko, M. Talagrand: Bounds for diluted mean-fields spin glass models. Probab. Theory Relat. Fields 130 (2004) 319–336.
[61] A. Sly: Computational transition at the uniqueness threshold. Proc. 51st FOCS (2010) 287–296.
[62] M. Talagrand: The high temperature case for the random K -sat problem. Probab. Theory Related Fields 119 (2001) 187–212.
[63] L. Valiant: The complexity of enumeration and reliability problems. SIAM Journal on Computing 8 (1979) 410–421.
[64] C. Wang Y. Yin: A sampling Lovasz local lemma for large domain sizes. Proc. 65th FOCS (2024) 129–150.
[65] A. Coja-Oghlan, W. Perkins: Bethe states of random factor graphs. Communications in Mathematical Physics 366 (2019) 173–201.

ARNAB CHATTERJEE, arnab.chatterjee@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

AMIN COJA-OGHLAN, amin.coja-oghlan@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

CATHERINE GREENHILL, c.greenhill@unsw.edu.au, SCHOOL OF MATHEMATICS AND STATISTICS, UNSW SYDNEY, NSW 2052, AUSTRALIA.

VINCENT PFENNINGER, pfenninger@math.tu-graz.at, TU GRAZ, INSTITUTE OF DISCRETE MATHEMATICS, STEYRERGASSE 30, 8010 GRAZ, AUSTRIA.

MAURICE ROLVIEN, maurice.rolvien@uni-hamburg.de, UNIVERSITY OF HAMBURG, FACULTY OF MATHEMATICS, INFORMATICS AND NATURAL SCIENCES, DEPARTMENT OF INFORMATICS, VOGT-KÖLLN-STR. 30, 22527 HAMBURG, GERMANY.

PAVEL ZAKHAROV, pavel.zakharov@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE AND FACULTY OF MATHEMATICS, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.

KOSTAS ZAMPETAKIS, konstantinos.zampetakis@tu-dortmund.de, TU DORTMUND, FACULTY OF COMPUTER SCIENCE, 12 OTTO-HAHN-ST, DORTMUND 44227, GERMANY.


