Machine Learning Renormalization Group for Statistical Physics

Wanda Hou    Yi-Zhuang You Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA
Abstract

We develop a Machine-Learning Renormalization Group (MLRG) algorithm to explore and analyze many-body lattice models in statistical physics. Using the representation learning capability of generative modeling, MLRG automatically learns the optimal renormalization group (RG) transformations from self-generated spin configurations and formulates RG equations without human supervision. The algorithm does not focus on simulating any particular lattice model but broadly explores all possible models compatible with the internal and lattice symmetries given the on-site symmetry representation. It can uncover the RG monotone that governs the RG flow, assuming a strong form of the c-theorem. This enables several downstream tasks, including unsupervised classification of phases, automatic location of phase transitions or critical points, controlled estimation of critical exponents and operator scaling dimensions. We demonstrate the MLRG method in two-dimensional lattice models with Ising symmetry and show that the algorithm correctly identifies and characterizes the Ising criticality.

I Introduction

Renormalization group (RG) is an elegant conceptual framework and powerful computational method in statistical physics and quantum field theory. RG extracts the relevant features at every given scale by progressively coarsening the degrees of freedom in a physics system in hierarchies. Conventionally, real space RG relies on human physicists to design coarse-graining transformations based on their intuition of the physical system. In this research, we aim to explore the potential for artificial intelligence (AI) to learn optimal RG transformations automatically from energy models. Unsupervised machine learning is particularly well-suited for this task due to its ability to learn low-dimensional representations or relevant features from data and to remove noise and irrelevant features without supervision. This approach aligns with the goal of learning RG transformations, which aim to extract relevant features of a physical system by transforming fine-grained configurations to coarse-grained ones while preserving essential information and correlations.

Prior research has demonstrated that neural networks can learn to perform hierarchical feature extraction at the configuration level [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. However, a more fascinating aspect of RG is its ability to quantitatively analyze the flow of physics theory in the parameter space at the model level [12, 13]. Therefore, this research aims to develop a novel machine-learning renormalization group (MLRG) method that can automatically formulate RG flow equations, discover RG monotones, propose effective theories, identify critical points, and estimate critical exponents, all starting from the symmetry and dimension of the physical system.

Refer to caption
Figure 1: High-level architecture of the MLRG algorithm. Three AI models are involved. The teacher and student models are both generative models based on RBM with different predefined neural network connects, representing the fine-grained and coarse-grained local energy models. The moderator model is a discriminative model to predict the RG monotone. The moderator first guides a Hamiltonian Monte Carlo (HMC) sampler to propose the teacher model parameters that is most worth training, and then predicts the student model parameters following the RG flow. After both RBMs get their model parameters, the teacher generates the data to train the student. The gradient back-propagates from the student to the moderator to train the RG monotone network. After training, moderator gains good knowledge about the RG flow throughout the parameter space, which can then be used for various downstream tasks.

In this study, we will focus on statistical mechanics models defined on regular lattices and develop machine learning algorithms to analyze them within the real space RG framework. Our proposed MLRG algorithm is depicted in Fig. 1. We introduce two restricted Boltzmann machines (RBMs) [14, 15, 16, 17] to model the local energy model at different scales on the fine-grained and coarse-grained lattices respectively. The fine-grained model serves as the teacher by generating samples of visible configurations, which are then used to train the coarse-grained model. The coarse-grained model acts as the student and learns its energy model to describe the training data provided by the teacher. We also introduce a third model, the RG monotone network, as a moderator that observes the teacher-student learning process and learns to predict how the model parameters of the coarse-grained model are related to those of the fine-grained model. This allows the moderator model to learn the RG flow and use its knowledge to guide a Hamiltonian Monte Carlo (HMC) [18, 19, 20] sampler to propose new model parameters that are most worth training. After training, we can use the machine-learned RG flow to identify RG fixed points in the parameter space and automate the analysis of physical properties at these fixed points.

The paper will be organized as follows. We will first introduce the MLRG algorithm in Sec. II, which includes (i) the teacher-student learning system Sec. II.2 to model the RG flow and (ii) the moderator Sec. II.4 and HMC sampling Sec. II.5 system to extract RG monotone and use it to guide the training. The teacher and student are modeled by equivariant RBMs, as formulated in Sec. II.1, whose point group representation choices are elaborated in Sec. II.3. We then demonstrate the application of MLRG on 2D Ising models in Sec. III. Sec. III.1 describes the problem setup. Sec. III.2 shows the machine-learned RG monotone and RG flow diagram. A few quantitative results are then presented, including the critical point Sec. III.3, the ground state degeneracy Sec. III.4, and the scaling dimension Sec. III.6. Sec. III.5 explains how to use Newton’s method to localize the RG fixed point. We summarize the advantage and limitations of MLRG and comment on its connections to related works in Sec. IV.

II Methodology

II.1 Statistical Mechanics Models

We begin with a general definition of a statistical mechanics system on a lattice. Let the lattice be described by a graph 𝒢=(𝒱,), where 𝒱 denotes the set of vertices (sites) and denotes the set of edges (bonds). On each site i𝒱, we introduce a vector sin to characterize the on-site degrees of freedom, which are generally referred to as a spin. The entire spin configuration s on the lattice can be viewed as a map s:𝒱n. A central theme of equilibrium statistical physics is to model the probability distribution p(s)eE(s) using a local energy function E(s), such as

EJ(s)=ijϵJ(si,sj), (1)

where ϵJ is a scalar function describing the energy associated with each pair of spins (si,sj) across the edge ij. The subscript J represents the collection of parameters that parametrize the energy model.

Symmetry plays a significant role in defining the spin degrees of freedom and constraining the energy function. Let G be the internal symmetry group and R be the point group of the lattice. Assuming G and R commute and form a direct product group G×R, the on-site spin si should form an n-dimensional linear representation of the G×R symmetry. Under the symmetry action gG and rR, the spin si transforms as

si(TrTg)sr(i), (2)

where Tg and Tr are matrix representations of the internal g and the point group r symmetry transformations. r(i) denotes the resulting site obtained from the original site i by applying the point group transformation r.

The energy function EJ(s) must be invariant under the internal and the lattice symmetry (including the point group and translation symmetries). This requires ϵJ to satisfy the following symmetry constraint:

ϵJ(si,sj)=ϵJ((TrTg)sr(i),(TrTg)sr(j)) (3)

for all (g,r)G×R. This means it is sufficient to define the energy function ϵJ on a representative bond and use the lattice symmetry to carry it to every other bond on the lattice (assuming all bonds on the lattice are related by lattice symmetry transformations).

Refer to caption
Figure 2: (a) Tensor network representation of the bond energy function ϵJ(si,sj) on the representative bond. (b) The kernel K is invariant under the internal symmetry gG.

More explicitly, suppose the spin components siaα are labeled by two indices a and α, separately indexing the representation space basis of the point group and internal symmetries. One possible design of an equivariant bond energy function ϵJ on the representative bond is to use a tensor network:

ϵJ(si,sj)=JabγKαβγsiaαsjbβ, (4)

where repeated indices are automatically summed over, as illustrated in Fig. 2(a). K is a G-invariant tensor satisfying the symmetry constraint TgααKαβγTgββ=Kαβγ (gG), as depicted in Fig. 2(b). The tensor J contains all parameters that determine the energy function. They can be viewed as coupling constants among different symmetry representations. Although Eq. (4) does not yet represent the most general equivariant energy model, it is already sufficiently expressive if the representation space dimension n is large enough. So we will not dive into more complicated equivariant neural network models [21, 22, 23, 24, 25, 26] but adopt this tensor network design to construct the equivariant restricted Boltzmann machines later.

In summary, given the internal symmetry group G and the lattice graph 𝒢 (with the lattice symmetry given by the automorphism group of 𝒢), one can specify the on-site spin si by its representation under the internal and point group symmetries. Statistical mechanical models are generally defined by symmetric energy functions EJ(s), parametrized by J, as in Eq. (1). The RG analysis aims to determine how the model parameters J effectively change across different scales.

II.2 Real Space Renormalization Group

For concreteness, we will focus on two-dimensional lattice models. In particular, we will consider the Lieb lattice [27] (bond-intercalated square lattice) as shown in Fig. 3(a,b). The Ising model defined on the Lieb lattice is physically equivalent to the simple square lattice but the Lieb lattice is more convenient for describing real-space RG schemes. Extending our approach to other lattices and higher dimensions is possible in general.

Refer to caption
Figure 3: (a) Fine-grained and (b) coarse-grained Lieb lattices. Shaded squares mark out the local blocks. In each block, the teacher and student RBMs are respectively defined on the (c) square and (d) cross graphs. The arrows mark the bond directions. All bonds are related by C4v point group transformations, which include the four-fold rotation C4 and the mirror reflection σ.

Starting with an energy model defined on the Lieb lattice, our RG scheme can be described as follows: (i) Divide the lattice into overlapping 2×2 blocks as in Fig. 3(a). (ii) Within each block, replace the local energy model on a square graph Fig. 3(c) with a new model on a cross graph Fig. 3(d), such that their marginal distributions on the corner spins match as closely as possible. (iii) Embed the new local energy model back into the original lattice. The new lattice Fig. 3(b) becomes a coarse-grained Lieb lattice, with lattice constant enlarged by 2. Repeating the procedure recursively, the energy model parameters will be renormalized to a larger and larger lattice scale.

The key objective of this RG scheme is to learn the new local energy model. For this purpose, we view the square- and cross-graph local energy models as two restricted Boltzmann machines (RBMs) [14, 15, 16, 17]. We will call the RBM on the fine-grained square graph the teacher machine, in which the corner spins xi are the visible variables and the decorated spins zi are the hidden variables, see Fig. 3(c). Its energy model reads:

EJtch(x,z)=ijsquareϵJtch(zi,xj). (5)

The RBM on the coarse-grained cross graph will be called the student machine, in which the corner spins xi are still the visible variables, but there is only one hidden spin z0 at the center, see Fig. 3(d). Its energy model is :

EJstd(x,z)=ijcrossϵJstd(xi,zj). (6)

The teacher and student RBMs are separately parametrized by Jtch and Jstd. In both Eq. (5) and Eq. (6), the bond energy function ϵJ(xi,xj) is defined on a representative bond along the ij direction using the tensor network model Eq. (4). In Fig. 3(c,d), the representative bonds are colored in blue, and arrows indicate the bond directions.

Both the square and cross graphs respect the C4v point group symmetry, which is generated by a four-fold rotation C4 and a mirror reflection σ, see Fig. 3(c,d). The mirror reflection σ is always assigned to preserve the representative bond (in blue). It only imposes symmetry constraints on the bond energy model ϵJ as required by Eq. (3). The four-fold rotation C4 further takes the representative bond to other bonds (of other colors), under which the bond energy model will change equivariantly following Eq. (3). This way, the RBMs will respect the internal G and point group R=C4v symmetries by design.

The objective is to train the student RBM to learn the coarse-grained local energy model from the visible spin configurations generated by the teacher RBM. Given the teacher model parameters Jtch, we optimize student model parameters Jstd by minimizing the Kullback-Leibler (KL) divergence between the visible variable distributions of the two models,

DKL(ptch(x)pstd(x))=xptch(x)logptch(x)pstd(x), (7)

where the marginal distributions ptch(x) and pstd(x) are defined by tracing out the hidden spins:

ptch(x)=zeEJtch(x,z)x,zeEJtch(x,z),pstd(x)=zeEJstd(x,z)x,zeEJstd(x,z). (8)

Although directly evaluating the KL divergence in Eq. (7) is not tractable, the optimization can be performed by stochastic gradient descent following the standard contrastive divergence (CD) [15] training technique for the RBM.

In this way, the student machine automatically learns the coarse-grained model parameters Jstd given the fine-grained model parameters Jtch of the teacher machine. This contrasts with the conventional real space RG approach, where human physicists must design the coarse-graining rule that maps the visible spin configuration x to a hidden spin z0(x) in each block. Our approach differs in two aspects: (i) The coarse-grained variable z0 is no longer a deterministic map of x but is defined in a probabilistic manner via a conditional distribution pstd(z0|x) given by the student machine. (ii) The conditional distribution pstd(z0|x) is not specified by humans but is learned from the data. This allows the student machine to develop the optimal RG transformation within its available representation space dimensions, extracting the most relevant features z0 from the visible configurations x without supervision.

After training, we can replace the teacher model parameters Jtch with the student model parameters Jstd and move on to train the next-generation student. Therefore, generation by generation, the teacher-student learning iteration will trace the RG flow of J in the parameter space.

II.3 Choosing Point Group Representations

To demonstrate how well the student RBM can approximate the teacher RBM, we consider a concrete example based on the two-dimensional Ising model on the square lattice. In this case, the internal symmetry is G=2, and the point group symmetry is R=C4v. Then every spin variable (either visible or hidden) in both RBMs will be specified as a representation of 2×C4v. The 2 group only has one non-trivial representation, i.e., the odd (signed) representation that transforms as ss. So we will assume every component of the spin to transform as this odd representation under the 2 Ising symmetry. The C4v point group has richer irreducible representations, summarized in Tab. 1. The expressive power of the RBM depends on the choice of the C4v point group representations for each spin, which will be elaborated in the following.

irrep dim transforms as TC4 Tσ
𝖠1 1 1 [1] [1]
𝖠2 1 xy(x2y2) [1] [1]
𝖡1 1 x2y2 [1] [1]
𝖡2 1 xy [1] [1]
𝖤 2 (x,y) [0110] [1001]
Table 1: Irreducible representations of the C4v group. The columns shows their dimensions, Cartesian products (x-axis along the representative bond), and transformation matrices Tr for r=C4,σ (the two generators of C4v).

A single Ising spin in the conventional Ising model corresponds to a 2-odd variable carrying 𝖠1 representation, which does not transform under the point group symmetry at all. However, under RG, the coarse-grained spin (such as the z0 spin in the student RBM) can carry an enlarged point group representation. From Fig. 3(c,d), it is clear that the RG procedure is effectively merging the four hidden spins in the teacher model (square graph) to one hidden spin in the student model (cross graph). Therefore the hidden spin z0 in the student model must contain more internal structures and carry orbital angular momentum (i.e., non-trivial representation of C4v) to resolve the complication of inhomogeneous hidden spin configurations in the teacher model.

For example, by saying that a 2 spin variable z0 carries the 𝖠1𝖤 representation, we imply that z03 is a three-component vector, consisting of three Ising variables: one Ising variable forms the 𝖠1 representation and remains invariant under lattice rotations, and two Ising variables form the two-dimensional 𝖤 representation that can rotate as a vector. Then the C4 and σ transformations are represented as

TC4z0=[100001010]z0,Tσz0=[100010001]z0, (9)

according to the transformation matrices listed in Tab. 1.

Suppose we start from a fine-grained teacher model with both the visible and hidden spins as ordinary Ising spins in the 𝖠1 representation, i.e., xi,zi{±1}. The parameter tensor Jtch will contain only one component given the symmetry constraints. According to the square graph in Fig. 3(c), the teacher model is described by the energy function below

EJtch(x,z)=Jtchi=03zi(xi+x(i+1)mod4). (10)

The hidden spins zi can be immediately traced out (marginalized) in the partition function, leaving us an effective model for visible spins xi, described by

EJtch(x)=Jeff(x0x1+x1x2+x2x3+x3x0), (11)

where Jeff=12logcosh(2Jtch) is an effective coupling that depends on Jtch. Correspondingly, the visibile spin discributions is ptch(x)=Z1eEJtch(x).

To build a coarse-grained student model to approximate ptch(x), the first step is to specify the choice of the point group representation for the coarse-grained variable z0 in the student RBM. For example, if z0 is chosen to carry the 𝖠1𝖤 representation with totally three components as z0=(z01,z02,z03), where each component is an 2 Ising variable that z0a{±1}, the student model can be written as (see Fig. 3(d) for the connectivity of these variables)

EJstd(x,z)=Jstd𝖠1𝖠1(x0+x1+x2+x3)z01Jstd𝖠1𝖤((x0x2)z02+(x1x3)z03). (12)

The fluctuation of the 𝖠1 representation z01 mediates an equal amount of positive correlation xixj between every pair of visible spins, which is indeed the dominant correlation pattern in a ferromagnetic Ising model. However, to better match the teacher model ptch(x), the x0x2 and x1x3 correlations should be further weakened relative to others because there is no direct coupling between these diagonal spins in the original teacher model Eq. (11). This is where the 𝖤 representation plays a role, as the z02 and z03 fluctuations will mediate some negative correlation in x0x2 and x1x3 due to the minus signs in Eq. (12), which helps to bring the student model’s visible distribution pstd(x) closer to that of the teacher model ptch(x). This argument explains why including more point group representations to the hidden spin in the student model is generally helpful to improve its expressive power.

Refer to caption
Figure 4: Minimal KL divergence DKL(ptch(x)pstd(x)) (in the unit of bit) vs. the parameter Jtch of the teacher RBM, for different choices of point group representations of z0 in the student RBM.

Within this setup, we tested the performance of a series of student RBMs, built with different choices of the point group representation for the hidden spin z0. The performance is evaluated by the minimal KL divergence DKL(ptch(x)pstd(x)) achieved after optimizing the student machine. In Fig. 4, we show the minimum KL divergence over a large range of the parameter Jtch of the teacher model under different representation choices of z0. The general representations are constructed by combining irreducible representations 𝖠1, 𝖡1, and 𝖤. The other two irreducible representations 𝖠2 and 𝖡2 are not used in this specific calculation because they will not couple to the xi spins in the 𝖠1 representation due to the constraint from the mirror reflection symmetry σ. But in more general cases, when the representation of xi is larger, all irreducible representations can appear on z0 in principle. This calculation quantitatively demonstrates that the student RBM can learn to approximate the visible spin distribution from the teacher RBM. The approximation can be progressively improved by introducing larger representations on the hidden spin z0, even though an exact match might only be achievable in the large representation dimension limit n.

In practice, the RBM can only handle a finite representation space dimension n. So we have to truncate the representation space at a maximum dimension nmax. This will introduce inaccuracy in each RG step, but the hope is that the error introduced by the truncation will become irrelevant under RG flow, such that the truncation only affect the RG fixed point behavior in a controllable manner. This is also the underlying assumption of many real space RG schemes, where each RG step does not need to preserve the partition function exactly.

As we carry out the MLRG procedure by recursively training the student RBM by the teacher RBM and replacing the teacher with the trained student, the RG flow is supposed to bring us (close) to some fixed point RBM model, at which we can further investigate the universal properties (such as critical exponents and operator scaling dimensions) of the RG fixed point. We will provide numerical evidence to show that the MLRG algorithm can locate RG fixed points and estimate their universal properties more accurately when the representation dimension n gets larger, such that it could offer a useful and controllable numerical method to automatically map out phase diagrams and study critical phenomena in statistical physics models without human supervision.

II.4 Learning the RG Monotone

While the above teacher-student learning approach for real space RG is appealing, it faces a serious challenge in actual training. This challenge comes from the stochastic nature of training the RBM, such that the student RBM parameter Jstd will always fluctuate around its optimum in every RG step. This inevitably injects random noise into the entire RG flow, causing the model to perform random walks in the parameter space. Since the RG flow near the critical point (i.e., unstable RG fixed point) is particularly sensitive to small perturbations, stochastic RG flow will almost always miss the critical point. Therefore, without addressing the problem of parameter space random walk, the above naive MLRG algorithm is useless for studying any critical phenomena.

To address this challenge, we introduce a third AI system, called the moderator, which operates outside the teacher-student system. The moderator’s objective is to monitor the stochastic RG flow over time in many different scenarios and learn the underlying deterministic RG flow throughout the entire parameter space. The key idea here is to assume the existence of an RG monotone C(J), which is a real scalar function of the RBM model parameters J, such that the RG flow can be formulated as a gradient flow of the RG monotone dJ/d=JC(J), where parametrize the RG step. This is the strongest form of the c-theorem [28, 29, 30, 31]. Instead of trying to construct such RG monotone by humans, we introduce a feed-forward neural network Cθ(J), called the RG monotone network, to model the function C(J) and train it jointly with the RBMs in the RG flow. Here θ denotes the collection of all parameters in the neural network.

The training starts from a random choice of the teacher model parameters Jtch in the parameter space. The moderator takes the initial condition J(=0)=Jtch and evolves J() from =0 to =1 by solving the RG equation

dJd=JCθ(J), (13)

following the gradient signal provided by the RG monotone network Cθ. The neural ordinary differential equation (neuralODE) technique [32, 33] is employed here to enable gradient backpropagation through the ODE solver. The moderator then passes the solution to the student RBM as its model parameters Jstd=J(=1). The teacher RBM then starts sampling visible spin configurations and sends them to the student RBM. The student RBM receives the training data and evaluates the KL divergence in Eq. (7) as the total loss function. All models are trained jointly by minimizing the KL divergence using the CD training technique. The loss function gradient will eventually back-propagate to θ to update the RG monotone network. The training is performed at many random choices of initial parameters Jtch, such that a large amount of training data can be aggregated to optimize the RG monotone throughout the parameter space.

Although the nature of the training is still stochastic, the random noise will be averaged out in fitting the RG monotone, such that after training, the optimal fit Cθ(J) can be used to generate a deterministic RG flow following Eq. (13). In this way, the random walk behavior in the parameter space can be avoided, which enables the MLRG algorithm to locate RG fixed points accurately.

II.5 Sampling the Parameter Space

Next, we will discuss how to sample Jtch more efficiently to speed up the training. Since the RBM model parameters J live in high-dimensional parameter space, uniform sampling might not be an efficient strategy. We propose an importance sampling strategy that focuses on RG fixed points.

The sampler makes use of the knowledge about the RG monotone to sample the parameters J from the following probability distribution

p(J)eβJCθ(J)2, (14)

with some inverse-temperature β as a hyperparameter. We set β=0 initially, such that the sampler will propose model parameters J uniformly in the parameter space. The RG monotone network gradually accumulates knowledge about the RG flow as the training progresses. RG fixed points emerge as local saddle points where the gradient JCθ(J)=0 vanishes. We then gradually increase β, so the sampler will be biased to sample more around RG fixed points (including both stable and unstable). This design encourages the RG monotone network to be trained more intensively near RG fixed points, such that the fixed point location can be estimated more accurately, a desirable feature for the automatic discovery of critical points (phase transitions) in statistical mechanics systems.

Since J varies continuously, the Hamiltonian Monte Carlo (HMC) approach [18, 19, 20] becomes a natural choice of the sampling method. HMC is a Markov chain Monte Carlo algorithm that uses Hamiltonian dynamics to efficiently propose moves in the parameter space of the target distribution p(J). It is a powerful method for sampling complex distributions of continuous variables in high-dimensional space. We implement the method using multiple HMC samplers with replica exchange to mitigate the possibility of getting trapped at a single fixed point.

In summary, the RG monotone network Cθ(J) plays a fundamental role in the MLRG algorithm, as illustrated in Fig. 1. It first guides the HMC sampler to propose new teacher model parameters Jtch following the probability distribution in Eq. (14) and then predicts the student model parameters Jstd by solving the RG flow differential equation in Eq. (13). In this sense, it indeed serves as a moderator to moderate the teaching-learning process. In return, the RG monotone network gets trained when the student RBM learns to generate similar spin configurations as those generated by the teacher RBM. It is worth mentioning that the HMC sampled teacher model parameters Jtch are detached from the computational graph, such that the RG monotone network parameters θ will only receive gradient signals from the student model parameters Jstd. In this way, the training of the RG monotone will not be affected by the HMC sampling.

III Results and Analysis

III.1 Symmetry Assignment and Coupling Parameters

To apply the MLRG method, we focus on two-dimensional lattice models with Ising symmetry. We do not need to specify any particular Ising model Hamiltonian, as the MLRG can automatically explore all possible models that are consistent with the internal and lattice symmetry, given the on-site symmetry representation.

In the following, we will always take the G=2 internal symmetry and the R=C4v point group symmetry. Regarding the symmetry representation of on-site spins, we consider that every spin (whether visible or hidden) is odd under 2 and carries 𝖠1 or 𝖠1𝖤 or 2𝖠1𝖤 representation under C4v. The choice of these point group representations is based on our experience that they are the most efficient representation within their respective representation dimensions in minimizing the KL divergence between teacher and student RBMs, as shown in Fig. 4. The above symmetry assignment is summarized in Tab. 2, which is all we need to set up the equivariant RBMs and prepare the MLRG model for training.

symmetry representation
G 2 1
R C4v 𝖠1,𝖠1𝖤,2𝖠1𝖤
Table 2: Settings of symmetries and spin representations for the two-dimensional Ising MLRG.

For 2 spins, the (representative) bond energy function in Eq. (4) reduces to the following form

ϵJ(si,sj)=Jabsiasjb, (15)

where a,b label the basis of the C4v symmetry representation. The internal symmetry representation labels α,β in Eq. (4) are omitted because the 2 group only has one non-trivial irreducible representation. The coupling J is a matrix that can be parametrized as follows under different choices of the point group representations:

  • 𝖠1 representation (1-dimension, 1 coupling parameter)

    J=[J0], (16)
  • 𝖠1𝖤 representation (3-dimensional, 5 coupling parameters)

    J=[J0J10J2J3000J4], (17)
  • 2𝖠1𝖤 representation (4-dimensional, 10 coupling parameters)

    J=[J0J1J20J3J4J50J6J7J80000J9]. (18)

In the above matrices, we use lines to separate different irreducible representations of the C4v symmetry. Some matrix elements are zero because of the restriction imposed by the mirror reflection symmetry σ, as required by Eq. (3) in general. In all cases, the parameter J0 always denotes the Ising coupling between the first 𝖠1 representations, directly connected to the bare Ising coupling in the lattice model.

III.2 RG Monotone and RG Flow

Choosing the smallest point group representation 𝖠1, the coupling matrix J is parametrized by a single variable J0. In this case, the RG flow is simply defined in the one-dimensional parameter space. The RG monotone C(J0), determined through the MLRG method, is depicted in Fig. 5(a). A local maximum of the RG monotone is observed at J0=0.82±0.02. Since the RG flow is designed to follow the gradient descent trajectory of the RG monotone as in Eq. (13), it implies that the parameter J0 will depart from the unstable fixed point J0 and flow towards one of the two stable fixed points, either J0=0 or J0. The RG flow can also be understood from the plot of J0C in Fig. 5(b), which corresponds to the beta function in the context of RG theory.

Refer to caption
Figure 5: (a) RG monotone C(J0) and (b) its negative gradient J0C(J0) (beta function), as functions of the unique coupling parameter J0 in the 𝖠1 representation. The shaded area indicates the two-standard deviation uncertainty range, estimated based on ten independently trained MLRG models. The vertical solid line marks the estimated critical point J0, while the exact critical point J0c is displayed as the vertical dashed line. Arrows indicate the RG flow directions.

The maximal position J0:=argmaxJ0C(J0) of the RG monotone C(J0) provides an estimation of the Ising critical point that separates the paramagnetic phase (J0=0) and the ferromagnetic phase (J0). The exact Ising critical point of the Lieb lattice model is located at

J0c=12arccosh(1+2)0.7643. (19)

The estimation J0=0.82±0.02 is close to but still deviates from the exact value J0c. This is because the dimension of the 𝖠1 representation is small, which limits the ability of the student RBM to approximate the teacher RBM in the MLRG algorithm. We should expect the estimation to be improved as the on-site spin involves larger representations of the point group.

Refer to caption
Figure 6: RG flow diagram for models with 𝖠1𝖤 representation. J0 parametrize the Ising coupling between 𝖠1 spins. J1,2, parametrize the remaining coupling in Eq. (17) in their most relevant direction. The background color (and contours) indicate the RG monotone C(J). The arrows trace out the RG flow directions. The stable (unstable) fixed points are marked by black (red) dots.

Going beyond the 𝖠1 representation, the RG flow will be defined in high-dimensional parameter space. To visualize the RG monotone C(J), we separate the parameter J0 from the remaining parameters J1,2, in the coupling matrix J. We always initialize the RG flow by setting J0 to the bare Ising coupling of the lattice model and allowing J1,2, to be generated under the RG flow. On the sub-manifold spanned by J0 and the most relevant flow direction of J1,2, (which denotes a particular linear combintation of J1,J2, parameters along which the gradient of the RG monotone is maximal), we can plot the RG monotone C(J) obtained by the MLRG method, as well as its gradient directions (RG flow directions). An example is shown in Fig. 6, which is obtained by training with the 𝖠1𝖤 on-site representation.

The MLRG algorithm learns the RG monotone C(J) throughout the parameter space, based on which the RG flow diagram can be obtained. As we tune the bare coupling J0 to the critical point J0 along the horizontal axis, the RG flow takes us to the true RG fixed point J away from the horizontal axis. The parameter J defines a statistical mechanics model approximating the Ising conformal field theory (CFT). The approximation is expected to be better with larger on-site point group representations.

III.3 Determining the Critical Point

To estimate the Ising critical point J0, we start by initializing the coupling matrix J with a given J0 and J1,2,=0, denoted as

J(J0,=0)=[J0000]. (20)

Then we flow the coupling matrix J by solving the RG equation dJ/d=JC(J) from =0 to some large . We denote the solution as J(J0,) as it depends on the initial condition J0 and the RG scale . Under our RG scheme, the linear system size will be effectively enlarged by 2/2 at the RG scale . Fig. 7 shows the RG monotone C(J(J0,)) as a function of different initial value of J0 at different RG scales .

Refer to caption
Figure 7: Flow of the RG monotone C(J(J0,)) starting from given J0 parameters, under the representation choice of (a) 𝖠1, (b) 𝖠1𝖤, and (c) 2𝖠1𝖤. labels the steps of RG flow. The vertical solid line marks the estimated critical point J0, while the exact critical point J0c is displayed as the vertical dashed line.

One can see that the RG monotone peaks at the critical point, and the peak becomes sharper with longer RG flow . With a sufficiently large , we can estimate the critical point J0 by finding the local maximum of the RG monotone

J0=argmaxJ0C(J(J0,)), (21)

The result J0 will be insensitive to the RG scale as long as it is large enough that the RG flow has converged. We trained several different MLRG models for each on-site point group representation choice and estimated the Ising critical point J0 using the abovementioned method. Our result is shown in Fig. 8. We can see a clear trend that the estimated critical point converges to its exact value J0c with larger point group representations (hence stronger RBM models).

Refer to caption
Figure 8: Estimated critical point J0 under different choices of the on-site point group representation. The dashed line indicates the exact value J0c=12arccosh(1+2).

III.4 Boltzmann Weight Tensor

To further analyze the physical properties at different RG fixed points, we define the Boltzmann weight tensor T(J) from the RBM energy model given the coupling parameter J. The tensor elements are specified as

T(J)x0,x1,x2,x3:=zeEJ(x,z), (22)

where x=(x0,x1,x2,x3) denotes the visible spins jointly and z denotes the hidden spins jointly. T(J) is a rank-4 tensor with each tensor element encoding the Boltzmann weight of a particular configuration of visible spins x, as illustrated in Fig. 9. We can use either the teacher RBM as Eq. (5) or the student RBM as Eq. (6) for the energy model EJ(x,z). They should not have much difference as long as the model parameter J has converged to an RG fixed point where the teacher and the student should behave the same in the ideal limit. Nevertheless, in the following analysis, we will always adopt the teacher model as we found it a little more accurate than the student model.

Refer to caption
Figure 9: Tensor representation of the RBM energy model, such that each tensor element encodes the Boltzmann weight of a configuration of visible spins.

The Boltzmann weight tensor T(J) enables us to define various physical properties of the statistical mechanical model conveniently. For example, the ground state degeneracy Z(J) (regularized partition function) [34, 35] can be defined as the ratio of the following tensor contractions (repeated indices are summed automatically)

Z(J)=(T(J)abab)2T(J)acbcT(J)bdad=([Uncaptioned image])2[Uncaptioned image]. (23)

For Ising models, the ground state degeneracy characterizes the order of the broken symmetry group: Z=|1|=1 at the paramagnetic (disordered) fixed point where the internal 2 symmetry is preserved, and Z=|2|=2 at the ferromagnetic (ordered) fixed point where the internal 2 symmetry is broken.

Refer to caption
Figure 10: Flow of the ground state degeneracy Z starting from given J0 parameters, under the representation choice of (a) 𝖠1, (b) 𝖠1𝖤, and (c) 2𝖠1𝖤. labels the steps of RG flow. The vertical solid line marks the estimated critical point J0, while the exact critical point J0c is displayed as the vertical dashed line.

Following the approach described in Sec. III.3, we start with the initialization of the coupling matrix J by a single parameter J0 as in Eq. (20) and follow the RG flow to obtain J(J0,). At different RG scale , we plot the ground state degeneracy Z(J(J0,)) as a function of the initial parameter J0, the result is shown in Fig. 10. As J0 goes across the critial value J0, the ground state degeneracy transitions from Z=1 (paramagnet) to Z=2 (ferromagnet). The transition becomes sharper with increasing along the RG flow. The intersection of the ground state degeneracy curve at different RG scales can be used to infer the Ising critical point J0. As we enlarge the point group representation, J0 progressively approaches the exact Ising critical point at J0c as shown in Fig. 8.

The MLRG can automatically label different symmetry-breaking phases by their ground state degeneracy using this analysis. This can be viewed as an approach for unsupervised phase classification. The ground state degeneracy curves Z(J(J0,)) also provide a method to estimate the critical point J0 by finding their intersections.

III.5 Locating the Monotone Saddle Point

Our goal is to estimate the universal properties at the critical point. This requires us to accurately locate the RG fixed point parameter J within the high-dimensional parameter space, as exemplified in Fig. 6.

However, if we simply follow the RG flow to approach J, it is almost inevitable that we will miss it. This is due to the tendency of the RG flow to deviate from the unstable fixed points. Fortunately, the MLRG algorithm has been trained to learn the RG monotone function C(J). This feature enables us to find the RG fixed point by locating the saddle point of the RG monotone via Newton’s method. By doing so, we can effectively circumvent the complication of fine-tuning the parameters in the parameter space.

Given the RG monotone C(J) and starting from an initial guess of J near an RG fixed point, Newton’s method recursively improves the approximation by the following update

JJ(J2C)1JC, (24)

where JC denotes the gradient vector and J2C denotes the Hessian matrix of C(J) at J. Fig. 11 demonstrates the flow of J along the vector field of (J2C)1JC. Differing from the RG flow in Fig. 6, Newton’s method converges to the RG fixed point in its neighborhood from all directions, regardless of whether the RG fixed point is stable or unstable. This enables us to pinpoint the RG fixed point from an initial guess near it.

Refer to caption
Figure 11: Flow diagram of Newton’s method iteration Eq. (24) to local RG fixed points for models with 𝖠1𝖤 representation. Both stable and unstable RG fixed points are attractive under Newton’s flow. The dashed curves mark the borders between different attractive basins.

To locate the Ising critical fixed point J in the parameter space, the initial guess can be constructed by setting the J0 component of the matrix J to its critical value J0, which can be estimated by methods described in Sec. II.4 or Sec. III.4. The convergence can be monitored by evaluating C(J). We stop the iteration when C(J) does not change significantly between successive steps. The iteration is expected to converge to a saddle point J of the RG monotone where JC=0.

III.6 Calculating Scaling Dimensions

Once we locate the RG fixed point J by the iteration in Eq. (24), we can substitute it into Eq. (22) to construct the fixed-point Boltzmann weight tensor T(J). The tensor can then be used to compute the transfer matrix M, with the following matrix elements

Mab:=T(J)acbc=[Uncaptioned image]. (25)

The eigenvalues of the transfer matrix M are related to the scaling dimensions Δk (k=0,1,) of operators in the CFT [34, 35, 36, 13, 37],

TrMnke2πn(Δkc/12), (26)

with c being the central charge. The proportionality factor can be fixed by requiring the identity operator (i.e., the vacuum state) to have zero scaling dimension Δ0=0. This allows us to extract the lowest few scaling dimensions associated with the most relevant operators.

Refer to caption
Figure 12: Estimated scaling dimension Δ1 under different on-site point group representation choices. The dashed line indicates the exact value Δ1=1/8.

Fig. 12 shows the estimation of the lowest scaling dimension Δ1 by the Ising MLRG with different point group representation choices. This should be the scaling dimension of the 2-odd Ising order parameter, whose exact value is known to be Δ1=1/8. We can see that the MLRG can approach the exact value with larger point group representations.

IV Summary and Discussion

In this study, we proposed the MLRG algorithm and used it to analyze many-body lattice models in statistical physics. Our method incorporates several modern machine learning techniques, including using equivariant neural networks [21, 22, 23, 24, 25, 26] for RBMs and neural ordinary differential equations [32, 33] for modeling the RG flow. The source code and raw data are available in the MLRG GitHub repository [38].

The MLRG algorithm demonstrates the power of machine learning to automate and enhance the study of statistical physics systems. We used the representation learning capability provided by generative modeling to learn the optimal RG transformation from the data directly, without requiring direct human intervention or prior knowledge about the system (apart from symmetry and dimensionality).

The design of the MLRG algorithm exemplifies the paradigm of introspective learning [39], an effective approach in machine learning for scientific discovery. It refers to the ability of the algorithm to self-analyze and self-adapt during the learning process, drawing on multiple levels of learning and analysis to generate a more comprehensive and insightful understanding of the problem at hand. MLRG’s design incorporates this introspective nature through its multi-level, multi-model architecture. It uses two “lower-level” machines, the teacher and student models, to mimic the renormalization group process of extracting relevant features by representation learning. It also incorporates a “higher-level” machine, the moderator model, which learns the RG flow and uses its knowledge to guide the sampling of new model parameters that are most worth training. This multi-level, multi-model design allows for a clear separation between task performance, carried out by the teacher and student models, and knowledge extraction, carried out by the moderator model. This separation endows the high-level model with error-correcting capabilities, enabling it to resist the influence of noise in RBM training on learning the RG flow. This ability to compress knowledge and correct errors is crucial for AI to make scientific discoveries [40].

The MLRG approach is related to (and distinguished from) existing works in the following ways:

  • Monte Carlo Renormalization Group (MCRG): MCRG [41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54] uses Monte Carlo sampling to facilitate non-perturbative RG techniques. This process begins with sampling configurations from a fine-grained model given the model parameters. It then employs local RG transformations to generate coarse-grained configurations and estimates the new model parameters accordingly. This approach aligns with our method of training the student RBM from the teacher RBM. In particular, recent developments [4, 7, 8] have introduced generative models to learn the optimal RG transformation, a goal that aligns with ours.

    However, our method leverages specific lattice structures, such as the Lieb lattice, to restrict the RG operation within a small spin cluster. In the lack of this design, MCRG has to perform the sampling on a much larger lattice and model many long-range multi-spin couplings, which increases computational complexity. Furthermore, MCRG does not learn the RG monotone and can not automatically locate RG fixed points, such that it requires fine-tuning model parameters when studying critical point properties, adding to the computational burden.

  • Deep Learning and RG: Several studies [55, 56, 57, 58, 59] have drawn parallels between deep neural networks and RG. Notably, they recognize that generation is the inverse process of renormalization [6, 11]. Therefore, generative models can be used to implement data-driven RG. These discussions focus on optimizing RG transformations using deep learning, which often relies on Monte Carlo simulated spin configurations for hierarchical feature extraction. Despite their effectiveness in extracting features of stable phases, they lack controlled accuracy in predicting universal properties of phase transitions, as they did not learn the RG equation or the RG monotone.

  • RG Flow-Based Generative Modeling: Techniques such as Neural-RG [2, 6] and RG-Flow [10, 11] embed RG transformations in multi-level flow-based generative models [60, 61, 62], applying deep learning methods to learn optimal RG transformations from model Hamiltonians by minimizing free energy. These methods are based on the invertible RG framework, which designs the local RG transformation as a bijective (invertible) deterministic map from spin configurations to relevant and irrelevant features.

    However, the requirement for bijectivity limits the possibilities for RG transformation. Moreover, flow-based models struggle to model discrete variable probability distributions, limiting their application in various statistical mechanics problems. They also lack an asymptotic exact limit, meaning they can typically only serve as configuration update proposers to accelerate Monte Carlo calculations rather than replacing Monte Carlo as an unbiased simulation method.

    In contrast, MLRG applies to both discrete and continuous variables, offers more flexible RG transformations with stochastic coarse-graining maps, allows the extension of on-site degrees of freedom, and is exact when the on-site degrees of freedom tend to infinity. These characteristics make MLRG a valuable method for studying statistical physics.

  • Information Theoretical Approach to RG: Some research has explored the information theoretical criterion for optimal RG, indicating that RG transformation should maximize the mutual information of relevant features with the environment [1, 5, 63, 64, 65, 66] or minimize the mutual information among irrelevant features [6]. While MLRG does not conflict with these principles, it does not explicitly use them as optimization criteria. Instead, it uses the match of marginal distributions of teacher and student models on their common spins to define optimal RG transformations, a method that is more direct and easier to optimize.

As a numerical algorithm to solve statistical mechanical problems, the MLRG showcases several advantages:

  • Efficiency in Small Cluster Sampling: Compared to traditional Monte Carlo (MC) simulations, the MLRG algorithm operates on a smaller, lighter-weight scale. It only requires sampling spin configurations within a small cluster of spins (within a unit cell). Moreover, the Gibbs sampling in different spin clusters can be effectively parallelized on modern computing devices. This compact operation allows the algorithm to perform more efficiently than larger-scale simulations.

  • Exploration of the Full Parameter Space: The MLRG algorithm is designed to efficiently traverse the entire parameter space in a single pass of training. This differs significantly from MC simulations and Tensor Network Renormalization Group (TNRG) [67, 68, 34, 69, 70, 35], which must solve the Hamiltonian multiple times at different parameters. The MLRG algorithm’s ability to explore the full parameter space reduces computational costs and time spent scanning the phase diagram, which can be beneficial when the parameter space dimension is large.

  • Discovery and Analysis of Critical Points: The MLRG makes use of the knowledge about the RG monotone to identify critical points (i.e., unstable RG fixed points). It uses machine-learned RG flow to calculate fixed-point properties, eliminating the need for finite-size scaling. This automated discovery and analysis process enhances the algorithm’s capability to analyze critical properties in complex physical systems, especially for multi-critical points with multiple unstable directions.

  • Controlled Convergence of the Algorithm: Similar to tensor-network methods, the MLRG algorithm’s estimation of physical properties converges as the dimension of the latent space increases. This behavior ensures that the algorithm’s predictions become increasingly accurate and reliable as more computational resources are investigated.

  • Interpretability: Compared to other machine-learning methods for RG, the MLRG approach provides better interpretability by labeling the hidden spins in the model with symmetry representation, leading to physically meaningful coupling parameters and RG rules. This feature is particularly valuable in physical sciences, where the goal extends beyond accurate prediction to gaining physical insights.

The current training of the RBMs that model local energy in MLRG is a stochastic process. This brings about fluctuations and noise, which can result in instability in the RG monotone network. The stability of the RG monotone network directly impacts the reliability of critical property predictions. A potential approach to improve this stability is to replace the RBMs with tensor networks, like in Fig. 9. This could allow for a deterministic training approach based on tensor network optimization, thus reducing the noise and increasing the stability of the RG monotone network. With the stability and reliability improvements, the MLRG can be extended to handle more complicated spin models. These models involve larger internal symmetry groups, such as Sn groups, which are significant to the study of categorical symmetry [71, 72] and entanglement transitions [73, 74, 75, 76, 77, 78].

Acknowledgements.
We acknowledge the discussions with Rokas Veitas, John McGreevy, Roger Melko, Han Ma, and Lei Wang. WH and YZY are supported by a startup fund by UCSD and the National Science Foundation (NSF) Grant No. DMR-2238360. The research was first presented at Swarma Club’s reading group on Causal Emergence in 2021 summer and benefited from the interaction with the audience during the event. We acknowledge the OpenAI GPT4 model for providing editing suggestions throughout the process of writing this paper.

References