
Is This for Real? Fast Graphicality Testing


November/December 2015 Copublished by the IEEE CS and the AIP 1521-9615/15/$31.00 © 2015 IEEE Computing in Science & Engineering 91

Computing Prescriptions
Editors: Francis Sullivan, [email protected] | Ernst Mucke, [email protected]


Brian Cloteaux | National Institute of Standards and Technology

One of the major advances in modeling networks is the realization that many of them (social, Internet, power grid, and so on) have predictable degree sequences; that is, the expected number of connections from a given node comes from some well-understood distribution. Models of these networks usually begin with random graphs whose degrees have the desired degree distribution.

One task that repeatedly comes up in creating these model networks is testing whether a graph can even be created from a given integer sequence [1]. Simply drawing a random sequence of integers from some distribution doesn’t guarantee that there even exists a graph with that degree sequence. When we have an integer sequence that can be realized as the degree sequence of some graph, we say that the sequence is graphic. While almost any graph theory book describes how a sequence can be tested to see if it’s graphic, surprisingly (because this testing has common applications), very little has been written about how to do it efficiently. This highlights the gap between simply knowing that a problem is theoretically tractable and being able to implement an efficient algorithm for practical use.

Determining Graphicality

The most common result for determining whether a sequence of positive integers is graphic or not (and the one that we’ll investigate here) was originally given by Paul Erdős and Tibor Gallai [2]. Another common approach to graphicality testing is the Havel-Hakimi method [3], an algorithm that creates a graph to realize the degree sequence but that’s asymptotically slower than an efficient implementation of the Erdős-Gallai criterion.

The Erdős-Gallai test involves first preprocessing the sequence and then checking it against a series of inequalities. One part of the preprocessing step is to ensure that the sum of the sequence is even. This requirement for all graphic degree sequences is commonly called the handshaking lemma. The name comes from the idea that if you total all the handshakes that each individual at a party has had, this sum will always be even: each handshake is counted twice, once for each of the two individuals involved in the exchange. The same reasoning applies when we count the edges connected to each node in a graph. Each edge is counted twice, making the sum even.

If the sum of the sequence is even, we’ve only ensured that the sequence is potentially graphic; we still need to perform the checks involving the inequalities. But before we can perform those checks, we need an additional preprocessing step that sorts the sequence into nonincreasing order (largest value first). This is one application where simply using an optimized library sort routine doesn’t give the best performance.

Comparison-based sorts, such as quicksort, run in O(|d| log |d|) time on average, where d is a sequence of integers and |d| is the sequence length. In this case, though, a potential degree sequence can be sorted in linear time by using a bin (or bucket) sort [4].

The key observation about why we can use a bin sort is that the range of possible integers for a graphic sequence is restricted to being between 0 and |d| − 1. If any integer in the sequence is greater than |d| − 1, it would require a node in the resulting graph to have more connections than the remaining number of nodes in the graph, so we can reject the entire sequence as nongraphic. To perform the bin sort, we set up |d| bins labeled from 0 through |d| − 1. The output from the bin sort is an array n where n_i is the number of times the value i appears in the sequence d. Figure 1 shows an example of using a bin sort on an integer sequence.
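The bin sort described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `bin_sort_degrees` and the choice to return `None` for an out-of-range value are hypothetical and are not part of the routine in Figure 4.

```python
def bin_sort_degrees(d):
    """Bin-sort a candidate degree sequence in linear time.

    Returns (n, sorted_d), where n[i] counts how often the value i occurs
    and sorted_d is d in nonincreasing order. Returns None if any value
    lies outside 0..len(d)-1, because such a sequence cannot be graphic.
    """
    size = len(d)
    n = [0] * size  # one bin per possible value 0..size-1
    for value in d:
        if value < 0 or value >= size:
            return None
        n[value] += 1
    # Read the bins from the largest value down to rebuild the sorted sequence.
    sorted_d = []
    for value in range(size - 1, -1, -1):
        sorted_d.extend([value] * n[value])
    return n, sorted_d


# The sequence used in Figure 1.
n, sorted_d = bin_sort_degrees([2, 2, 5, 3, 2, 4, 3, 2, 3])
print(n)         # [0, 0, 4, 3, 1, 1, 0, 0, 0]
print(sorted_d)  # [5, 4, 3, 3, 3, 2, 2, 2, 2]
```

Both loops touch each element or bin a constant number of times, which is where the linear running time comes from.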

Index:           0  1  2  3  4  5  6  7  8
Unsorted d:      2  2  5  3  2  4  3  2  3
Binned array n:  0  0  4  3  1  1  0  0  0
Sorted d:        5  4  3  3  3  2  2  2  2

Figure 1. An example of the bin-sort routine for an integer sequence. Notice that the indices for the array in this example are zero-based. This is because the integers for a graphic sequence can take any value from 0 to |d| − 1.

We can also use this preprocessing pass over the values to implement some additional inexpensive checks of whether the sequence is graphic or not. One useful check involves looking at the largest and smallest (nonzero) values in the sequence, d_max and d_min. If the sequence sum is even and the following inequality (which we call the ZZ condition) holds,

$$|d| \ge \frac{(d_{\max} + d_{\min} + 1)^2}{4\, d_{\min}}, \tag{1}$$

then the sequence d is graphic [5], and we can immediately accept the sequence and return. An example is the sequence in Figure 1, which we can immediately accept as graphic because

$$9 \ge \frac{(5 + 2 + 1)^2}{4 \cdot 2} = 8. \tag{2}$$

Index:  1  2  3  4  5  6  7  8  9
d:      5  4  3  3  3  2  2  2  2

Sum of d_i for i = 1 to 2: 9. Sum of min(d_i, 2) for i = 3 to 9: 14.

Figure 2. Computing one of the Erdős-Gallai inequalities, k = 2, where $\sum_{i=1}^{2} d_i = 9 \le 2(2-1) + \sum_{i=3}^{9} \min(d_i, 2) = 16$. In this example, the array is numbered starting from 1 instead of 0 because the Erdős-Gallai inequality assumes that the numbering in the sequence starts at 1. To code this example in a zero-based language, such as C, we would need to adjust the indices.
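The ZZ condition of Equation 1 can be tried out with a short sketch. The function name `passes_zz_condition` is our own, and we assume, following the routine in Figure 4, that only the nonzero entries are counted. A result of `False` means only that this shortcut is inconclusive, not that the sequence is nongraphic.

```python
def passes_zz_condition(d):
    """Sufficient (but not necessary) acceptance test based on Equation 1.

    Returns True when d can be accepted as graphic right away and False
    when the shortcut does not apply and further checks are needed.
    """
    nonzero = [value for value in d if value > 0]
    if sum(nonzero) % 2:
        return False  # odd sum: the ZZ condition does not apply
    if not nonzero:
        return True   # no edges are needed for an all-zero sequence
    d_max, d_min = max(nonzero), min(nonzero)
    # Multiply Equation 1 through by 4 * d_min to stay in integer arithmetic.
    return 4 * d_min * len(nonzero) >= (d_max + d_min + 1) ** 2


print(passes_zz_condition([2, 2, 5, 3, 2, 4, 3, 2, 3]))  # True: 72 >= 64
print(passes_zz_condition([1, 1]))  # False: graphic, but the shortcut misses it
```

The second call shows why the remaining Erdős-Gallai checks are still needed: the sequence 1, 1 is realized by a single edge, yet it does not satisfy the inequality.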

At this point, if we have neither accepted nor rejected the sequence during the preprocessing steps, we need to perform the Erdős-Gallai test. The original theorem says that an integer sequence d whose sum is even and that is sorted in nonincreasing order is graphic if and only if the following inequality holds for every k from 1 to |d|:

$$\sum_{i=1}^{k} d_i \le k(k-1) + \sum_{i=k+1}^{|d|} \min(d_i, k). \tag{3}$$

Here, the theorem assumes that the first position in the sequence is at the index k = 1, so we have to adjust this formula for languages that use zero-based arrays. Figure 2 shows an example of this test.
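As a baseline, the inequalities in Equation 3 can be coded directly. The sketch below is a deliberately simple quadratic-time version with the indices shifted for zero-based lists; the name `erdos_gallai_direct` is hypothetical, and the library sort is used only for brevity.

```python
def erdos_gallai_direct(d):
    """Test graphicality by evaluating every inequality of Equation 3."""
    seq = sorted(d, reverse=True)  # nonincreasing order
    if any(value < 0 for value in seq) or sum(seq) % 2:
        return False  # negative entry or odd sum
    size = len(seq)
    for k in range(1, size + 1):
        # seq[:k] holds d_1..d_k and seq[k:] holds d_(k+1)..d_|d|.
        left = sum(seq[:k])
        right = k * (k - 1) + sum(min(value, k) for value in seq[k:])
        if left > right:
            return False
    return True


print(erdos_gallai_direct([5, 4, 3, 3, 3, 2, 2, 2, 2]))  # True
print(erdos_gallai_direct([3, 3, 1, 1]))                 # False (fails at k = 2)
```

Each of the |d| iterations recomputes two sums over the sequence, so the running time grows quadratically with |d|.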

Adjusting Our Algorithms

If we code a straightforward implementation of these inequalities as nested loops, then we’ll have an algorithm that takes quadratic time, but we can do much better. If we’re careful, we can implement an algorithm to run in linear time. In addition, if we consider some additional results, we can create an algorithm that’s extremely efficient. This is where we close the gap between having a simple implementation of a result and creating an extremely efficient algorithm for practical use.

Two results are often used to speed up Erdős-Gallai checks; both state that we don’t have to consider all the inequalities to ensure that the sequence is graphic. The first result says that we only need to consider up to the largest index s in d, where d_s ≥ s − 1 [6]. If a sequence has passed all the Erdős-Gallai checks up to this index s, then it’s guaranteed to pass all of them. This result limits the maximum number of inequalities that need to be checked to

$$s \le \sqrt{\sum_{i=1}^{|d|} d_i}. \tag{4}$$

While this bound is s ≤ |d| in the worst case, for many common distributions, such as power law, this reduces the number of inequalities that we need to check to something closer to $\sqrt{|d|}$.

The second result limits the number of inequalities we need to check by showing that for runs of identical values in the sequence, we need only check one index with the Erdős-Gallai inequalities [7]. If the last index in a run of values passes the Erdős-Gallai inequality, then we’re guaranteed that all the indices with that same value will also pass the checks. Because most sequences have several runs of identical values, this result, in practice, greatly reduces the number of inequalities we need to consider.
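To get a feel for how much these two results shrink the work, the following sketch computes the index bound s and the run-ending indices for a sorted sequence. Both helper names are hypothetical, and the sketch only lists indices; it does not show how the two reductions are combined into a single test.

```python
def largest_index_to_check(sorted_d):
    """Largest 1-based index s with d_s >= s - 1 (0 for an empty sequence).

    Assumes sorted_d is in nonincreasing order.
    """
    s = 0
    for position, value in enumerate(sorted_d, start=1):
        if value >= position - 1:
            s = position
    return s


def run_end_indices(sorted_d):
    """1-based indices of the last element in each run of identical values."""
    size = len(sorted_d)
    return [k for k in range(1, size + 1)
            if k == size or sorted_d[k - 1] != sorted_d[k]]


sorted_d = [5, 4, 3, 3, 3, 2, 2, 2, 2]  # sorted sequence from Figure 1
print(largest_index_to_check(sorted_d))  # 4
print(run_end_indices(sorted_d))         # [1, 2, 5, 9]
```

For this nine-element sequence, the first result caps the checks at index 4, and the second leaves only four run-ending indices to examine instead of nine.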

If we use an additional array to hold the partial sums of the sequence, we can create a linear-time implementation, but this requires another pass through the data to create this array of partial sums. To avoid this additional pass, we can use a variant of the Erdős-Gallai inequalities that lets us test the sequence directly from the terms of the binned array we produced during the sort. This version was introduced by I. Zverovich and V. Zverovich [5], interestingly, not as a computational result but instead as an intermediate form for a proof. Their result changes the original Erdős-Gallai inequalities to

$$\sum_{i=1}^{k} d_i \le k\,(|d| - 1) - k \sum_{i=0}^{k-1} n_i + \sum_{i=0}^{k-1} i\, n_i, \tag{5}$$

where we only check the values of k from 1 to the maximum index s, where d_s ≥ s. Using this result, after we bin the sequence’s values, we directly apply the tests, saving an additional pass through the sequence. Figure 3 shows an example.
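The two sides of Equation 5 can be tabulated with running sums, so that each new k costs only constant work. The generator below is a sketch under our own naming (`zz_sides` is hypothetical); it takes the binned array n and the sorted sequence as separate arguments purely for readability, whereas Figure 4 works from the bins alone.

```python
def zz_sides(n, sorted_d):
    """Yield (k, left, right) for Equation 5 while d_k >= k holds.

    n[i] is the number of times the value i occurs; sorted_d is the
    sequence in nonincreasing order. k is 1-based, as in the text.
    """
    length = len(sorted_d)
    left = sum_ni = sum_ini = 0
    for k in range(1, length + 1):
        if sorted_d[k - 1] < k:
            break  # past the maximum index s with d_s >= s
        left += sorted_d[k - 1]          # add d_k
        sum_ni += n[k - 1]               # add n_(k-1)
        sum_ini += (k - 1) * n[k - 1]    # add (k-1) * n_(k-1)
        yield k, left, k * (length - 1) - k * sum_ni + sum_ini


n = [0, 0, 4, 3, 1, 1, 0, 0, 0]
sorted_d = [5, 4, 3, 3, 3, 2, 2, 2, 2]
print(list(zz_sides(n, sorted_d)))  # [(1, 5, 8), (2, 9, 16), (3, 12, 20)]
```

The entry for k = 2 reproduces the values 9 and 16 from Figure 2, and the loop stops after k = 3 because d_4 = 3 is smaller than 4.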

Index:           0  1  2  3  4  5  6  7  8
Binned array n:  0  0  4  3  1  1  0  0  0

Sum of n_i for i = 0 to 1: 0. Sum of i · n_i for i = 0 to 1: 0. Sum of d_i for i = 1 to 2: 9.

Figure 3. One instance of the Zverovich and Zverovich formulation on the binned array n. Here, the inequality for the second position in the original sequence (k = 2) is $\sum_{i=1}^{2} d_i = 9 \le 2(9-1) - 2\sum_{i=0}^{1} n_i + \sum_{i=0}^{1} i\, n_i = 16 - 0 + 0 = 16$, matching the inequality from Figure 2.

Putting everything together, we get the algorithm in Figure 4. How fast is this implementation in practice? It’s straightforward to see that the algorithm runs in linear time and, in fact, never needs to take more than two passes through the sequence. Sorting the original sequence requires one pass, while testing the inequalities is the second. Actually, for the second pass, because we only test to the maximum index s, where d_s ≥ s, the number of inequalities that we need to check is usually a small fraction of the total length. In addition, because we only need to test once for identical values in a sequence, we consider each bin once with the assurance that any value in it would pass the test. In practice, this algorithm usually requires a little more than one pass through the sequence.

```python
def is_graphical(deg_sequence):
    """Return True if deg_sequence is the degree sequence of some simple graph."""
    # Sort and perform some simple tests on the sequence
    p = len(deg_sequence)
    num_degs = [0]*p
    dmax, dmin, dsum, n = 0, p, 0, 0
    for d in deg_sequence:
        # Reject if degree is negative or larger than the sequence length
        if d<0 or d>=p:
            return False
        # Process only the non-zero integers
        elif d>0:
            dmax, dmin, dsum, n = max(dmax,d), min(dmin,d), dsum+d, n+1
            num_degs[d] += 1
    # Reject sequence if it has odd sum
    if dsum%2:
        return False
    # Accept if sequence has no non-zero degrees or passes the ZZ condition
    elif n==0 or 4*dmin*n >= (dmax+dmin+1) * (dmax+dmin+1):
        return True
    # Perform the EG checks using the reformulation of Zverovich and Zverovich
    k, sum_deg, sum_ni, sum_ini = 0, 0, 0, 0
    for dk in range(dmax, dmin-1, -1):
        if dk < k+1:  # Check if we have gone past the end
            return True
        if num_degs[dk] > 0:
            run_size = num_degs[dk]  # Process a run of identical values
            if dk < k+run_size:  # Check if end of run is past the end
                run_size = dk-k  # Adjust back to the last index
            sum_deg += run_size * dk
            for v in range(run_size):
                sum_ni += num_degs[k+v]
                sum_ini += (k+v) * num_degs[k+v]
            k += run_size
            if sum_deg > k*(n-1) - k*sum_ni + sum_ini:
                return False
    return True
```

Figure 4. A Python routine for testing if a sequence is graphic.

This algorithm nicely illustrates that, when faced with a particular problem, a theoretical result for solving that problem is really only a starting point. The challenge is being able to adapt the result to quickly solve either a current problem or perhaps a set of “typical” problems. This is where the result truly becomes an algorithm.

References

1. J. Blitzstein and P. Diaconis, “A Sequential Importance Sampling Algorithm for Generating Random Graphs with Prescribed Degrees,” Internet Mathematics, vol. 6, no. 4, 2010, p. 489; http://dx.doi.org/10.1080/15427951.2010.557277.

2. P. Erdős and T. Gallai, “Graphs with Prescribed Degrees of Vertices,” Matematikai Lapok, vol. 11, 1960, pp. 264–274.

3. S.L. Hakimi, “On Realizability of a Set of Integers as Degrees of the Vertices of a Linear Graph II. Uniqueness,” J. SIAM, vol. 11, no. 1, 1963, pp. 135–147.

4. T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed., MIT Press, 2009.

5. I. Zverovich and V. Zverovich, “Contributions to the Theory of Graphic Sequences,” Discrete Mathematics, vol. 105, nos. 1-3, 1992, pp. 293–303.

6. S.Y.R. Li, “Graphic Sequences with Unique Realization,” J. Combinatorial Theory Series B, vol. 19, no. 1, 1975, pp. 42–68.

7. A. Tripathi and S. Vijay, “A Note on a Theorem of Erdős & Gallai,” Discrete Mathematics, vol. 265, nos. 1-3, 2003, pp. 417–420.

Brian Cloteaux is a computer scientist in the Applied and Computational Mathematics Division at the National Institute of Standards and Technology. His research interests include algorithms, computational complexity, and network science. Cloteaux has a PhD in computer science from New Mexico State University. Contact him at [email protected].
