Complex hypergraphs
Abstract
Providing an abstract representation of natural and human complex structures is a challenging problem. Accounting for the system heterogenous components while allowing for analytical tractability is a difficult balance. Here I introduce complex hypergraphs (chygraphs), bringing together concepts from hypergraphs, multi-layer networks, simplicial complexes and hyperstructures. To illustrate the applicability of this combinatorial structure I calculate the component sizes statistics and identify the transition to a giant component. To this end I introduce a vectorization technique that tackles the multi-level nature of chygraphs. I conclude that chygraphs are a unifying representation of complex systems allowing for analytical insight.
I Introduction
Graphs are structural abstractions of many-particle systems, with vertices representing particles and edges the pairwise interactions between them Albert and Barabási (2002). Hypergraphs are an extension allowing for interactions between two or more vertices Ghoshal et al. (2009); Vazquez (2009); Coutinho et al. (2020); Civilini et al. (2021); Sun and Bianconi (2021). Multiplex networks introduce layers accounting for different interaction types Gómez et al. (2013). Simplicial complexes extend connectivity to hierarchical structures of inclusion Petri and Barrat (2018); Courtney and Bianconi (2018); Iacopini et al. (2019).
There are self-referential constructions as well. Joslyn and Nowak introduced ubergraphs Joslyn and Nowak (2017), a combinatorial structure where edges can contain other edges. ubergraphs has been used to organize relations in knowledge databases Yadati et al. (2021). At a higher level of abstraction, Baas introduced higher order structures, also called hyperstructures, using concepts from category theory Baas (2019).
In practice what is a suitable representation is a matter of balance. We would like the flexibility of hypergraphs to go beyond pairwise interactions, the possibility of multiple layers and the hierarchical inclusion structure of simplicial complexes and hyperstructures. In turn we would like the simplicity of graphs to calculate aggregate properties, such as the size of the giant component.
Here I introduce complex hypergraphs as a flexible combinatorial structure. General enough to include a big volume of previous work and new applications. Simple enough to allow for analytical calculations. The work is organized as follows. In Sect. II I define complex hypergraphs and comment on its relation with other structures. In Sects. III and IV I provide analytical methods to characterize the emergence of the chygraph giant component, using a generating function formalism. I discuss particular cases and provide numerical examples validating the analytical results. I finish with some key conclusions in Sec.V.
II Key definitions
II.1 Motivation
Complex systems are simplified to facilitate their analysis. Gradually we remove some of the simplifications and move closer to the real system. For example, the system of scientific publications has been represented by different network structures depending on the question asked Newman (2001); Barabási et al. (2002). Citation networks indicate the flow of knowledge along publications Redner (1998); Lehmann et al. (2003). In citation networks nodes are publications and citations are represented by directed links. Co-authorship networks are better suited when focusing on collaborations. Authors-publications networks can be further expanded to explicitly represent authors and publications, resulting in the bipartite graph Newman (2001). The same authors-publications relations can be represented as an authors hypergraph, where publications are hyperedges associating one, two or more authors Vasilyeva et al. (2021). These networks and hypergraphs are simplifications loosing some aspects of the original system. We need a more complete representation with a richer combinatorial structure. A scientific publication contains both authors and references. The document internal structure can be represented by a hypergraph with two edges, the list of authors and references. On top of that, the publication is a vertex in a higher order structure where the building blocks are authors and publications. Informally speaking, a hypergraph of hypergraphs, a complex hypergraph.
II.2 Complex systems
Here I use the term complex in the structural sense: made of different parts. I make a distinction between the parts that are not decomposable into other parts, the atoms, and the complexes that are made of other parts, including atoms. The atoms could have a finer structure, but they have been chosen as the primary building blocks. These preliminaries lead to the self-consistent definition of complex system.
Definition. A complex system is a set of atoms and complexes, where complexes are made of atoms and other complexes.
The latter serves as a philosophical definition of complex system. For practical applications we need a precise mathematical construction. The choice depends on the internal structure of the complex.
II.3 Hypergraphs
A hypergraph is a vertex set and a hyperedge set , where a hyperedge is a subset of . Just for the sake of illustration, I rewrite this definition based on the wording of complex systems introduced above. A hypergraph is an atom set and a complex set , where the complexes are subsets (hyperedges) of . Obviously hypergraphs do not posses the self-referential property of complex systems. They are simple structures.
II.4 ubergraphs
To make hypergraphs self-referential, Joslyn and Nowak introduced ubergraphs Joslyn and Nowak (2017), a combinatorial structure where hyperedges can contain other hyperedges. Here I redefine them using the wording of complex systems introduced above.
Definition. An ubergraph is a set of atoms and a set of complexes , where the complexes are subsets (hyperedges) of .
II.5 Complex hypergraphs
To add more internal structure I choose hypergraphs. That leads to the definition of complex hypergraphs.
Definition. A complex hypergraph (chygraph) is a set of atoms and a set of complexes , where the complexes are hypergraphs with vertex sets in .
Let us unravel this definition with some examples. A graph is represented by the chygraph where the complexes are edges. A multiplex graph Gómez et al. (2013) with layer graphs , , is represented by the chygraph plus some partition structure discussed below. The system of scientific publications is represented by the chygraph , where atoms are authors, complexes are publications, the publications are represented by a hypergraph with two edges ( for the authors and for the references) and the index runs across all publications. The two edges have no overlap and therefore the complexes representing scientific publications have two internal components (Fig. 1).
We can think of chygraphs as ubergraphs with two edge types: the edges in the complexes associated hypegraphs and the complexes. The complex act as a special edge that we designate as the basic unit of inclusion. In turn, ubergraphs are chygraphs where all intra-complex hypergraphs contain a single edge. Obviously in this case the intra-complex hypergraph has only one component with a size equal to the hyperedge cardinality. Therefore, all results derived in the following sections are valid for ubergraphs, after replacing component size by hyperedge cardinality.
II.6 Chygraphs properties
Many properties of graphs/hypergraphs are carried on to chygraphs. To make a distinction from the metrics associated with the complexes hypergraph structure, I will use greek letters to name quantities at the chygraph level. Given a complex , let be its associated hypergraph. Let be an atom or complex, where and . I will use the notation as equivalent to . The chy-adjacency matrix is the matrix with matrix elements
(1) |
where . The associated vertex chy-degrees
(2) |
Additional metrics are needed to characterize potential multi-level structure. Chygraphs may contain multiple levels of inclusion. At the lower level we have atoms, whose fine structure is null or not specified. One level above we have complexes, their internal structure being specified as hypergraph containing atoms and/or other complexes as vertices. In some systems it makes sense to define higher levels of inclusions. For example, when describing human populations by location, we speak of neighborhoods, cities, countries, continents and the world. This inclusion hierarchy leads to the definition of chygraph length.
Definition: chygraph length . Let be a chygraph. Let be a partition of that is non-intercepting ( for ), hierarchical (if then ) and complexes within the same partition have similar statistical properties. The chygraph length, denoted by , is the maximum among all such partitions.
The differentiation of partitions by statistical properties allows for the specification of multi-type structures. This is the case in multiplex graphs and hypergraphs. In this context two layers may have the same vertex set but the complexes may have different statistical properties depending on the layer.
As in the case of higher order networks De Domenico et al. (2013), we could extend other metrics such as the clustering coefficient and the entropy to chygraphs. Here I have limited my attention to metrics that are needed in the following sections.
III Percolation theory I: no intra-layer inclusions
In spite of its complexity the chygraph combinatorial structure is suitable for analytical treatment. I focus on percolation, a central problem in graph theory.
The generating functions technique has been used to solve percolation problems in graphs and hypergraphs Callaway et al. (2000); Ghoshal et al. (2009); Coutinho et al. (2020); Sun and Bianconi (2021). It follows a simple recipe: express the generating function of the component sizes distribution as a recursive function of itself, modulated by the generating functions of other relevant distributions. In the context of chygraphs, the key quantities are the component sizes and excess component sizes when the components are sampled from layer at random or coming from another layer , respectively. Here we need to be precise about how we arrive from another layer.
We can arrive to a reference complex in two different ways. From the complexes it includes (coming from below) or from the complexes where it is included (coming from above). If there are no intra-layer connections the notation does not require any further specification. When it is clear we are arriving from below. In contrast, when we are arriving from above. In this section I assume there are no intra-layer connections. The case with intra-layer connections will be the subject of next section.
The component sizes depend on the joint distribution of chy-degrees , in-complex hypergraph component sizes and their excess equivalents and when reached from layer . The notation ) indicates that a component within a complex at layer is composed of vertices from . In turn, indicates that the chy-degree of a vertex at layer is decomposed into chy-degrees to vertices in layers . The probability generating functions of , , , , and are denoted by , , , , and , respectively. Since they are generating functions of probability distributions, they are all equal to 1 when evaluated at and their first derivatives are equal to the corresponding expected values.
III.1 Mean component size
The definition of chygraph is translated into a set of self-consistent equations for the component size generating functions. As a guideline, we arrive to a reference complex and from the complex we navigate the chygraph. Following the chygraph adjacency matrix, leading us to complexes including the reference complex, or going through the complexes included in the reference complex hypergraph. More precisely,
(3) | |||||
(4) | |||||
where . Note that the first in the right hand side of these equations corresponds to the reference complex, the terms containing and to navigations using the chygraph adjacency matrix (complexes including the reference complex) and the terms containing and to navigations through the complexes in the reference complex hypergraph.
III.2 Mean component size
The mean excess component sizes can be calculated from Eq. (4), resulting in
(5) |
where . Note that and only when . When we come from a layer into a layer and then return to layer . Now comes the vectorization trick.
The matrix equation (5) can be solved by the vectorization method for matrix equations Horn et al. (1994). The vectorization operator transform a matrix into a column vector by stacking the columns of . For example,
(6) |
To handle tensors with four indexes I generalize the vectorization operator. The vectorization operator acting on the tensor creates a matrix by stacking the upper indexes along columns and lower indexes along rows. For example,
(7) |
Applying vectorization Eq. (5) is transformed to
(8) |
where
(9) |
(10) |
and is an integer Heaviside step function ( if or 0 otherwise). The linear systems of equations (8) has Cramer’s rule as a formal solution
(11) |
where is derived from by replacing the th column by . This solution is valid provided that is not singular. When the mean component sizes diverge and the system achieve percolation. Therefore, the chygraph critical percolation condition is given by
(12) |
III.3 Giant component
The equation for the mean component sizes (8) is valid provided . In the following I demonstrate that corresponds with the subcritical phase. Let be the probability that a vertex from layer selected at random does not belong to the giant component and let be the probability that a vertex at layer selected from a complex at layer does not belong to the giant component. These probabilities satisfy the self-consistent equations
(13) |
(14) |
This system of equations does not have an explicit analytic solution. A solution can be found by successive approximations, where the left hand side is interpreted as the iteration after plugging in iteration into the right hand side. In particular, in the absence of a giant component, Eqs. (13) and (14) admit the solution . Let us assume that , where . Keeping terms up to first order in in Eq. (14) results in the recursive approximation equations
(15) |
The linear map (15) converges to if and only if , where is the largest eigenvalue of . Therefore is the control parameter for the existence of a giant component. In the subcritical (supercritical) phase () there is not (there is) a giant component and (). The percolation transition takes place at the criticality condition , which is equivalent to the Eq. (12).
III.4
A hypergraph is mapped to a one layer chygraph where the complexes are the hypergraph edges: . In this case the excess component sizes is the excess hyperedges cardinality and the excess atoms degree is the vertices excess degree . Substituting into Eq. (17) one obtains the critical condition for hypergraphs: , in agreement with the result of Coutinho et al Coutinho et al. (2020). Furthermore, graphs are hypergraphs with excess cardinality 1 and, therefore, the criticality condition reduces to , as previously reported by Molloy and Reed Molloy and Reed (1998) and Callaway et al Callaway et al. (2000).
III.5
When there are two complex layers () then in Eq. (9) is reduced to
(18) |
with determinant
(19) |
We could expand the determinant but it would lead to a complicated algebraic expression. For most practical purposes it is best to work with the explicit matrix form in Eq. (19), including numerical calculations. Further simplifications are obtained when we consider combinatorial constructions with additional constraints.
III.6 Hierarchical inclusion
There are many systems where inclusion follows a hierarchy. Geographical zooming for example Okabe and Sadahiro (1996). It is straightforward to build hierarchical inclusion in chygraphs: atoms included in layer 1 complexes, layer 1 complexes included in layer 2 complexes, … , layer complexes included in layer complexes. In this context the tensor elements of are zero if or For the case this constraint leads to the elimination of the second and fifth row and the second and fifth column in Eq. (19), resulting in
(20) |
Expanding the latter determinant and substituting the explicit form of the tensor (Eq. (9)) we obtain
(21) | |||||
Within the parentheses we have the criticality condition for each layer alone. The last term represents the interactions including both layers. The latter is the bona fide complexity of hierarchical inclusion. The terms within the first two parenthesis can be positive, meaning no standard 1 layer percolation, and can become 0 due to the last interaction term.
III.7 Multiplex hypergraphs
Multiplex hypergraphs is another type of multi-layer structure Sun and Bianconi (2021). A multiplex hypergraph is a set of hypergraphs with the same set of vertices. Note that when the statistical properties of the hypergraphs are different the multiplex hypergraph is not statistically equivalent to . A multiplex hypergraph can be mapped to the chygraph , where all edges are represented by complexes and the complexes are partitioned according to the hypergraph they originated from. For the case , the absence of inclusion of complexes into complexes leads to the elimination of the fourth and sixth row and the fourth and sixth column of (19), resulting in
(22) |
Expanding the latter determinant and substituting the explicit form of the tensor (Eq. (9)) we obtain
(23) | |||||
Within the first two parentheses there is the contribution of each hypergraph layer alone. The last term represents the interaction between the two hypergraphs via the vertices. The latter is the bona fide complexity of multiplex hypergraphs. The terms within the first two parenthesis can be positive, meaning no standard hypergraph percolation, and can become 0 due to the last interaction term.
III.8 Numerical example
To test the analytical results I will use a random chygraph model with a multiplex structure: one layer of atoms, two layers of complexes with and complexes and inclusions of a randomly chosen atom into a randomly chosen complex in layer , . Since inclusions are random both the chydegrees and the component sizes have Poisson distributions with averages and , . For the Poisson distribution the excess average coincides with the average and therefore and , . Substituting these values into Eq. (23) we arrive to the criticality condition
(24) |
where , .
To test this expression I computed the giant component fraction and the mean component size excluding the giant component numerically. To this end I project the chygraph into a network where nodes represent atoms and complexes, while links represent inclusion relations. Figure 2 reports the numerical estimation of the giant component fraction and the mean component size as a function of , computed using Eq. (24). There is a phase transition at , with the emergence of a giant component and a maximum of the mean component size excluding the giant component. The agreement demonstrates the validity of the analytical results for multiplex chygraphs.
In turns out this example is a validation for the hierarchical inclusion as well. Consider another random chygraph model with the following inclusion structure: one layer of atoms, a layer labeled 1 with complexes and inclusions of a randomly chosen atom into a randomly chosen complex from layer 1, and a layer labeled 2 with complexes and inclusions of a randomly chosen complex from layer 1 into a randomly chosen complex from layer 2. One can check that the criticality condition for chygraphs with hierarchical inclusion in Eq. (21) is reduced to Eq. (24). Furthermore, projecting the chygraph into a network, where nodes represent atoms and complexes, while links represent inclusion relations, we obtain the same network as for the multiplex chygraph described above. Note that this equivalence holds for . For there is no equivalence between random chygraphs with a multiplex or hierarchical inclusion structure. For the multiplex structure the network projection has one layer connected to layers. For the hierarchical inclusion the network projection has ordered layers with links between adjacent ayers only.
IV Percolation theory II: with intra-layer inclusions
We can arrive to a reference complex from the complexes it includes (from below) or where it is included (from above). When there are no intra-layer connections the pair of indexes is sufficient to distinguish how we arrived to the reference complex. If we came from below. If we cam from above. However, when there are intra-layer inclusions the two indexes are not sufficient to specify how we arrived to a complex for the case . We need an extra index. In the context of the generating function formalism that means we need two component size generating functions when arriving to a complex from an atom or another complex. I will denote them by when arriving from below and when arriving from above. With this distinction Eq. (3) and (4) are rewritten as
(25) | |||||
(26) | |||||
where and
(27) |
(28) |
Note I have implicitly assumed that the chydegrees and the intra-complex hypergraph components are independent.
The mean excess component sizes , , can be calculated from Eq. (26), resulting in
(29) |
where
(30) |
(31) |
We can apply the vectorization trick to transform Eq. (29) into a standard linear system of equations. To do so we need to apply the vectorization trick twice. Once as done before for the indexes and another one for the index . After the first vectorization Eq. (29) can be rewritten as
(32) |
and after the second vectorization as
(33) |
where
(34) |
(35) |
Finally, the criticality condition is
(36) |
IV.1
IV.2 Numerical example
To test Eq. (37) I consider a random chygraph with a layer of atoms, a layer of complexes, inclusions of a randomly selected atom into a randomly selected complex and inclusions of a randomly selected complex into a randomly selected complex. A possible realization of this model is illustrated in Fig. 1.
Since inclusions are random both the chydegrees and the component sizes have Poisson distributions with averages , , and . For the Poisson distribution the excess average coincides with the average and therefore , , and . Substituting these values into Eq. (37) we obtain
(38) |
where and , .
Figure 3 reports the numerical estimation of the giant component fraction and the mean component size as a function of , computed using Eq. (38). There is a phase transition at , with the emergence of a giant component and a maximum of the mean component size excluding the giant component. The agreement demonstrates the validity of the analytical results for chygraphs with intra-layer connections.
V Conclusions
In conclusion, chygraphs are a versatile combinatorial structure to represent complex systems. They allow for encoding different types of structural heterogeneities and hierarchical constructions. The key ingredient is the fractal nature of the chygraph: a complex is composed of atoms and other complexes and it can be part of other complexes as well. I have calculated the component sizes statistics of chygraphs using vectorization. Future work is required to extend this formalism to other problems.
References
- Albert and Barabási (2002) R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
- Ghoshal et al. (2009) G. Ghoshal, V. Zlatić, G. Caldarelli, and M. E. J. Newman, Phys. Rev. E 79, 066118 (2009).
- Vazquez (2009) A. Vazquez, J. Stat. Mech.: Theory Exp. 2009, P07006 (2009).
- Coutinho et al. (2020) B. C. Coutinho, A.-K. Wu, H.-J. Zhou, and Y.-Y. Liu, Phys. Rev. Lett. 124, 248301 (2020).
- Civilini et al. (2021) A. Civilini, N. Anbarci, and V. Latora, Phys. Rev. Lett. 127, 268301 (2021).
- Sun and Bianconi (2021) H. Sun and G. Bianconi, Phys. Rev. E 104, 034306 (2021).
- Gómez et al. (2013) S. Gómez, A. Díaz-Guilera, J. Gómez-Gardeñes, C. J. Pérez-Vicente, Y. Moreno, and A. Arenas, Phys. Rev. Lett. 110, 028701 (2013).
- Petri and Barrat (2018) G. Petri and A. Barrat, Phys. Rev. Lett. 121, 228301 (2018).
- Courtney and Bianconi (2018) O. T. Courtney and G. Bianconi, Phys. Rev. E 97, 052303 (2018).
- Iacopini et al. (2019) I. Iacopini, G. Petri, A. Barrat, and V. Latora, Nature Comm. 10, 2485 (2019).
- Joslyn and Nowak (2017) C. Joslyn and K. Nowak, “Ubergraphs: A definition of a recursive hypergraph structure,” (2017), https://doi.org/10.48550/arXiv.1704.05547.
- Yadati et al. (2021) N. Yadati, D. R S, V. S, I. K M, and S. G, in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (Association for Computational Linguistics, Online, 2021) pp. 448–454.
- Baas (2019) N. A. Baas, Int. J. Gen. Syst. 48, 603 (2019).
- Newman (2001) M. E. J. Newman, Proc. Natl. Acad. Sci. USA 98, 404 (2001).
- Barabási et al. (2002) A. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek, Physica A 311, 590 (2002).
- Redner (1998) S. Redner, Eur Phys J B 4, 131 (1998).
- Lehmann et al. (2003) S. Lehmann, B. Lautrup, and A. D. Jackson, Phys. Rev. E 68, 026113 (2003).
- Vasilyeva et al. (2021) E. Vasilyeva, A. Kozlov, K. Alfaro-Bittner, D. Musatov, A. M. Raigorodskii, M. Perc, and S. Boccaletti, Sci. Rep. 11, 5666 (2021).
- De Domenico et al. (2013) M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, and A. Arenas, Phys. Rev. X 3, 041022 (2013).
- Callaway et al. (2000) D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. Lett. 85, 5468 (2000).
- Horn et al. (1994) R. Horn, R. Horn, and C. Johnson, Topics in Matrix Analysis (Cambridge University Press, New York, 1994).
- Molloy and Reed (1998) M. Molloy and B. Reed, Comb., Probab. Comp. 7, 295–305 (1998).
- Okabe and Sadahiro (1996) A. Okabe and Y. Sadahiro, Environment and Planning A: Economy and Space 28, 1533 (1996).