Analyzing Kleinberg’s(and other)Small-world Models∗
Chip Martel Computer Science Dept.
University of California
Davis,CA95616 martel@cs.ucdavis.edu
Van Nguyen
Computer Science Dept.
University of California
Davis,CA95616 nguyenvk@cs.ucdavis.edu
ABSTRACT
We analyze the properties of Small-World networks,where links are much more likely to connect“neighbor nodes”than distant nodes.In particular,our analysis provides new re-sults for Kleinberg’s Small-World model and its extensions. Kleinberg adds a number of directed long-range ran
dom links to an n×n lattice network(vertices as nodes of a grid, undirected edges between any two adjacent nodes).Links have a non-uniform distribution that favors arcs to close nodes over more distant ones.He shows that the following phenomenon occurs:between any two nodes a path with ex-pected length O(log2n)can be found using a simple greedy algorithm which has no global knowledge of long-range links. We show that Kleinberg’s analysis is tight:his algorithm achievesθ(log2n)delivery time.Moreover,we show that the expected diameter of the graph isθ(log n),a log n fac-tor smaller.We also extend our results to the general k-dimensional model.Our diameter results extend traditional work on the diameter of random graphs which largely fo-cuses on uniformly distributed arcs.Using a little addi-tional knowledge of the graph,we show that we canfind shorter paths:with expected length O(log3/2n)in the ba-sic2-dimensional model and O(log1+1/k n)in the general k-dimensional model(for k≥1).
Finally,we suggest a general approach to analyzing a broader class of random graphs with non-uniform edge prob-abilities.Thus we show expectedθ(log n)diameter results for higher dimensional grids,as well as settings with less uni-form base structures:where links can be missing,where the probability can vary at different nodes,or where grid-related he use of lattice distance)has a weaker role or is dismissed,and constraints(such as the uniformness of degree distribution)are relaxed.
∗This work was supported by NSF grant CCR-85961 Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.
PODC’04,July25–28,2004,St.Johns,Newfoundland,Canada. Copyright2004ACM1-58113-802-4/$5.00.Categories and Subject Descriptors
F.2[Analysis of Algorithms and Problem Complex-ity]:Miscellaneous;
G.2.2[Discrete Mathematics]:Graph Theory—Graph Algorithms,Network Problems;C.2.2[Net-work Protocols]:Routing Protocols
General Terms
Algorithms,Performance,Theory
Keywords
Small-world network,random graphs,routing,diameter 1.INTRODUCTION
Small-world networks(SWN)have been an active and common topic in many disciplines,including the social and natural sciences.These networks possess a striking property, the so called small-world phenomenon,also often spoken of as“six degrees of separation”(between any two people in the United States).Milgram discovered this in his pioneer-ing work in the1960’s[21],and the more recent work by Dodds et al.suggests its still true[10].Since many real net-works exhibit small-world properties,a number of network models have been proposed as a framework to study this phenomenon.Recently,Kleinberg[14],building on the work of Watts and Strogatz[22],proposed a family of SWNs to study another compelling aspect of Milgram’s originalfind-ings:a greedy algorithm using only local information can construct short paths.
As Kleinberg has commented,it is striking that short paths not only exist but can be found with limited knowl-edge of the global network.Algorithmic results in this area improve our understanding of many practical network struc-tures and bring in potential applications related to routing problems in the Internet and peer-to-peer networks.Thus,finding models with this feature of SWNs,with an emphasis on algorithmic aspects,is well motivated.Kleinberg de-veloped an interesting model of this,and it has generated considerable followup work.
Kleinberg’s basic model uses a two-dimensional grid as a base with long-range random links added be
tween any two nodes u and v with a probability proportional to d−2(u,v), the inverse square of the lattice distance between u and v. In the basic model,from each node there is an undirected local link to each of its four grid neighbors and one directed long-range random link.In this setting Kleinberg shows that a simple greedy algorithm using only local information
finds routes between any source and destination using only O(log2n)expected links[14].
Kleinberg leaves two important issues open in the analysis of routing in his model.We complete the analysis in this paper and then extend our techniques to a broader range of settings.First we show that the O(log2n)expected time analysis is tight(thus except for pairs which are quite close, Kleinberg’s algorithm uses expectedΘ(log2n)links).
Our second main result shows that the expected diame-ter of this graph isΘ(log n).This extends traditional work on the diameter of random graphs which largely focuses on uniformly distributed arcs[7].This diameter result shows that an algorithm with global knowledge of the random links can improve on Kleinberg’s decentralized algorithm by a log factor.We then give an intermediate algorithm which uses some additional,fairly local,information to improve the ex-pected time to O(log3/2n)in the basic2-dimensional model and O(log1+1/k n)in the k-dimensional model(for k≥1). In addition,we develo
p techniques for analyzing random graphs with non-uniform arc distributions.We are able to characterize properties which lead to small diameter graphs, and use these to prove O(log n)expected bounds on the di-ameter of several settings:k-dimensional versions of Klein-berg’s model;settings where arc probabilities are propor-tional to d−r(u,v)for0≤r<2;also for some non-grid settings and where nodes have differing degree or arc distri-butions.
Since real-world networks often are better modeled by random networks with non-uniform arc distributions,our analysis results may help create and analyze more accurate network models.
The structure of this paper.Section3presents defi-nitions and supporting facts on Kleinberg’s Small-world set-ting.Section4discusses our bound on Kleinberg’s delivery time and introduces our alternative algorithm.Section5de-scribes our diameter results for Kleinberg’s model and more general settings.In Section6we summarize our results and suggest additional open problems.
2.RELATED WORK
There has been considerable work on the small-world phe-nomenon.See[15]for early surveys and[14]for a more recent account on modeling small-world networks.Before Kleinberg’s model,Watts and Strogatz[22]proposed a re-fined model by randomly rewiring the edges of a ring lattice each with a
probability parameter p.Watts and Strogatz observed that for small p the model reflects many practi-cal small-world networks with small typical path length and non-negligible clustering coefficient.
Applications have been found using Kleinberg’s small-world model or the ideas behind it,such as decentralized search protocols in peer-to-peer systems[20,25],and gos-sip protocols for spreading information in a communication network[11].See also[13]for a generalization that encom-passes both lattice-based and tree-based(“taxonomic”or “hierarchical”)small-world networks,and[1]for a practical approach to measure the diameter of the World Wide Web using a simulation model based on the power law distribu-tion.
The diameter of random graphs is a classic problem[5,6, 7,8]but most results use uniformly distributed arcs.Bol-lobas and Chung[6],study a graph model very similar to Watts and Strogatz in[22]with the nodes of a cycle(or a“ring”)randomly matched to form additional long-range links.The closest diameter work with non-uniform arc prob-abilities is by Benjamini and Berger who study the diameter of1-D long-range percolation graphs[4]and Coppersmith et al.who extend this to k-D grids[9].As in Kleinberg’s model,a grid with(undirected)local links is augmented by undirected long-range random links whose probability is inversely related to their distance.Both papers prove di-ameter results which show how the diameter changes as the arc probability parameters change.Note that in contrast to Klei
nberg’s model,the added links are undirected,and the out-degree of a node is notfixed.Thus the analysis tech-niques here are rather different than those to analyze Klein-berg’s and related models.OurΘ(log n)diameter bound improves on the best prior result of O(log n log log2n)due to Lebhar and Schabanel’s work onfinding short paths[16]. There have also been several recent papers which analyze greedy routing in other small-world like networks.Barriere et al.[3]give matching upper and lower bounds for greedy routing in a ring network augmented by random links.They also give a nice general framework for analyzing greedy rout-ing,and prove a polylog upper bound on greedy routing for a broad class of what they call“Long-range contact”graphs. Our results have some similarities,though we focus on di-ameter issues and use mostly different techniques.Aspnes et al.[2]and Manku and Manku et al.[18,17]look at using small-world type networks for routing in peer-to-peer sys-tems.Manku et alpares deterministic and randomized structures,as well as considering the effect of one-lookahead. Aspnes et al.analyze expected performance with additional links and look at the effect of edge failures.They produce some useful results for doing analysis though their focus is on the one dimensional setting.Recently,Fraigniaud et al.
[12]study greedy routing and also show that with some ex-tra knowledge about the graph it is possible to route using O(log1+1/k n)expected steps in the general k-dimensional model(for k≥1).However,they
achieve this with an oblivious greedy algorithm,while ours requires keeping ad-ditional state information.In addition,they show this path length is the best possible for a range of greedy routing algo-rithms(even ones which have complete global knowledge). They give a general lower-bound on greedy routing which also shows that Kleinberg’s O(log2n)analysis is tight for greedy routing in the basic model.
3.BASIC FACTS AND DEFINITIONS
We now present our notation and basic facts on Klein-berg’s small world setting.Let V denote the set of all nodes, the size of which is n2.We use K(n,p,q)to refer to the class of all random graphs based on Kleinberg’s model:an n×n grid,undirected local links(from a node)to all nodes within distance p,and q long range directed links/node such that the probability of a link from u to v is proportional to d−2(u,v).We also use K∗(n,p,q)for a similar class,de-fined with respect to the lattice distance with wrap-around: d(u,v)=min{|k−i|,n−|k−i|}+min{|l−j|,n−|l−j|}.For simplicity,we usually show our main results for K∗(n,p,q) but most of our work can be easily extended to K(n,p,q). Let B l(u)denote the set of all the nodes within lattice distance l from u;we call this a ball of radius l and center u (actually shaped as a diamond in a two dimensional grid). Define b l(u)as the number of nodes at distance l from u, des on the‘surface’of B l(u).
First,we consider some basic facts for the simple models with p=q=1,which,we denote by K∗and K.For any two nodes u and v,let p(u,v)denote the probability that there is a random link from u to v.Since p(u,v)∼d−2(u,v),we have p(u,v)=d−2(u,v)/c u,where c u is the inverse normalized coefficient,a constant depending on n and the position of u:
c u=
∀v=u d−2(u,v)=
2n−2
j=1
b j(u)j−2.
Note that we always have p(u,v)=p(v,u)if the graph is from K∗,and so c u is the same for all u;call this value c∗.It is easy to see that b j(u)=θ(j),so c u can be approximated by a harmonic sum.The following fact is trivially implied from Kleinberg’s analysis in[14].
Fact1.For graphs from K∗or K,the inverse normalized coefficient c u=θ(log n)and for any two distinct nodes u and v,p(u,v)=Ω((n2log n)−1).Especially,c∗=4ln n+O(1)1.
3.1Links into or out of a ball
Our analysis of the expected diameter(section5)builds on a standard approach.We consider the expected length of a path from an arbitrary start s to a destination t;for simplicity,we start with K∗(n,1,4)for a basic framework then extend our analysis to K∗(n,1,1)and others.For K∗(n,1,4),using only the long-range links consider all nodes at distance one from s,then all new nodes these can reach (so at distance two),and so on.If these sets grow expo-nentially in size,we will quickly reach a large subset of the nodes in the graph.We prove two important facts which help to analyze the growth rate.
For a given set Q of nodes,we construct a set R,a col-lection of nodes not in Q but reachable by a random link from Q;we also consider a reverse scenario where R con-tains nodes with a random link i nto Q.We focus on lower bounding the ratio|R|/|Q|,which is supported by these two elementary experiments:to see if a random link from the center of a ball goes out of this ball and to see if there is a random link from outside the ball which goes to its center. Note that we have non-uniform probabilities,so the ratio |R|/|Q|depends on the shape of Q as well as its size;we also have directed links,so the analysis on incoming links differs from outgoing links2.
Fact2.On a graph from K∗,given any positiveθ<1, any integer1≤l≤nθ,for n large enough:i)the probability
that a random link from a given node u goes to a node outside of B l(u)is greater than1−θ−o(1);ii)the probability that there is a random link to u from a node outside of B l(u)is greater than1−eθ+o(1)−almost1−eθ−1).
We are most interested in values ofθjust above0.5since then|B l(u)|=Ω(n log n).Thus,a link going out has prob-ability 0.5(1−θ)and in is 0.39(1−eθ−1). Proof.For i)let E be the event that u has a random link going to a node v/∈B l(u).We have
Pr[E]=
v∈B l(u)d−2(u,v)/c∗≤
l
j=1
b j(u)(j−2)/c∗.
1Since in K∗,b
j (u)=4j if j≤ n/2 and0otherwise.
2In K∗|K,each node has exactly one outgoing random link, but the number of incoming links varies.Since b j(u)≤4j we have
l
j=1
b j(u)(j−2)≤4
l
j=1
j−1≤4ln(3l)≤4ln(3nθ).
With c∗=4ln n+O(1)we have
Pr[E]≤
4ln(3nθ)
4ln n+O(1)
θln n+ln3
ln n+O(1)
=θ+o(1)
Thus Pr[E]≥1−θ−o(1).
For ii)let F be the event that there is a random link coming to u from a node v outside B l(u);thus,
Pr[F]=
v/∈B l(u)
(1−p(v,u))≤
v/∈B l(u)
e−p(v,u)=e
p(v,u)
v/∈B l(u)
Here we use the well known fact e x≥1+x to obtain e−p(u,v)≥1−p(u,v).Since p(u,v)=p(v,u),
Pr[F]≤e
p(u,v)
v/∈B l(u)=e−Pr[E]≤eθ+o(1)−1
So Pr[F]=1−P r[F]≥1−eθ+o(1)−1
.
In a similar manner,we have an equivalent result for K as follows.See[19]for the proof of this.
Fact3.On a graph from K,given any positiveθ<1and integer1≤l≤nθ,for n large enough:i)the probability that a random link from a given node u goes to a node outside of B l(u)is greater than1−θ
1+3θ
+o(1);ii)the probability that there is a random link to u from a node outside of B l(u)is greater than1−e−(1−θ)/4+o(1).
3.2Extended models
As a natural generalization,we can extend all these con-cepts and basic facts to the classes of graphs based on a k-dimensional grid for k≥1by updating the distribution rule of the random links:p(u,v)∼d−k(u,v)instead.Let K∗(k,n,p,q)and K(k,n,p,q)denote such classes of graphs based on this k-dimensional grid(with size n in each di-mension)with and without wrap-around,respectively.In K(k,n,p,q),for a node u close to the edge of the grid,a ball centered at u may not be a‘full’ball but it is easy to extend all our results with balls to cover such a case.
It is relatively easy to prove that b(j)=θ(j k−1)3.So, b j(u)j−k=θ(j−1),which means that we can still use h
ar-monic sums to bound the likes of
b j(u)j−k as before. Thus,we still have
c u=θ(log n).More importantly,we can generalize the facts on“links into or out of a ball”as follows(for simplicity,we do not claim more exact bounds).
Fact4.On a graph from K(k,n,p,1)or K∗(k,n,p,1), for any given4positiveθ<0.6,there exist positive constants ξ1andξ2such that for n large enough and l≤nθ:
i)the probability that there is a random link from a node u to a node outside of B l(u)is≥ξ1;
ii)the probability that there is a random link(with a given label)to u from a node outside of B l(u)is≥ξ2.
3Based on counting the number of ways to choose k positive numbers such that they sum to a given positive number(j). 4The fact holds for any0<θ<1but we useθ<.6since we only needθabout0.5in our later analysis).
It is also easy to extend this fact for arbitrary q≥  1. Each node now has q independent random links where each of them can be labeled from1to q.Fact3is then,concerned with a link specified by a given label and an arbitrary center node u.
3.3Links to a spherical surface
We now study another probabilistic experiment in the general models K(k,n,p,q)and K∗(k,n,p,q),namely,if a random link from a given node u goes to the surface of a given ball B,where u is outside of B.We denote the prob-ability of this by P(u,S B),where S B is the set of nodes on the surface of ball B.Also define distance d(u,B)between node u and ball B as the minimum lattice distance from u to a node on B’s surface.Our lower bound in section4will be based on this experiment.
Fact5.For any k≥1,there exists a constantˆc such that for any ball B=B l(v)and node u outside B on a graph from K∗|K(k,n,p,q):P(u,S B)≤ˆc
m log n
,where m=d(u,B).
Also,if k≥2and l≤m then P(u,S B)≤ˆc l
2
.
Note that for thefirst(main)part of the fact,the bound
(ˆc
m log n )depends on the distance from u to the ball only,and
thus,is independent of the size of the ball.Intuitively,if we think of this probability as a measure of‘attractive force’(to u),the force generated by S B is not stronger than the joint force generated by m nodes at about distance m from u.That is,a small fraction of nodes in S B,which are at the ‘pole’closest to u,generates a dominant term for P(u,S B).
A rigorous proof for thefirst part is complicated,so we leave the proof to the appendix.Meanwhile,the second
part is easy:P(u,S B)=
w∈S B p(u,w)=O(l
2
)with
p(u,w)≤1
m k logn
and|S B|=b(l)=O(l k−1)≤l×O(m k−2).
4.DECENTRALIZED ROUTING
Kleinberg proved that the expected number of links used to route from a source s to a destination t(Kleinberg’s de-livery time)by the greedy algorithm is O(log2n)for the two-dimensional case[14]5.We will prove that indeed the greedy algorithm takes expectedΩ(log2n)delivery time for the general k-dimensional model and suggest other variants with better delivery time.Let GA denote the greedy al-gorithm.For a node v we call N(v)the next node found by GA when we are at v(initially,v=s)and seeking a path to the destination node t.Define random variable δ(v)=d(v,t)−d(N(v),t),which characterizes the speed that GA converges to t.The following lemma,which is ac-tually a re-statement of fact5,establishes a probabilistic bound for this speed.
Lemma6.For each k≥1(the dimension),there ex-ists a constantˆc such that,for a graph from K(k,n,p,q) or K∗(k,n,p,q),any two nodes v and t,and any integer
1<m<d(v,t),Pr[δ(v)=m]≤ˆc
mlogn ×min{1,d(v,t)−m
m
}.
The following theorem shows that for the majority of s−t pairs,Kleinberg’s delivery time can not be o(log2n),there-fore we have the tight bound ofθ(log2n)for the greedy al-gorithm.
5It is straightforward to generalize his proof for the general k-dimensional model
Theorem7.For any constant c1>0,there exists a con-stant c2>0such that,for any two nodes s and t on a graph
from K∗|K(k,n,1,1)with d(s,t)>c1n,Kleinberg’s delivery dpoints s and t is greater than c2log2n with probability at least0.5,when n is large enough6.
Proof.Consider a series of random trials,where,given a node v,wefind N(v)the node for the next trial.De
fine random variable X v=d(v,t),which reflects the ratio between the distance to t before and after such a trial.Let {v1=s,v2,...,v k}be the nodes on a run of GA after k such steps(thus v i+1=N(v i)).
k−1
i=1
X v
i
reflects the ratio between the distance at the initial and at the current state. To make this ratio always well-defined,if N(v)=t we define X v=d(v,t)and if v=t we define X v=1.
It is clear that1≤
k−1
i=1
X v
i
≤d(s,t)and we need to analyze the expected k to have
k−1
i=1
X v
i
=d(s,t)(indicat-ing that d(v k,t)≤1,so GA is about tofinish),or
Z v
i
= ln(d(s,t))where Z v=lnX v.From lemma6,we can prove that E[Z v]<c/log n for some constant c(proved below),if d(v,t)≥ln n.To get rid of the condition,d(v i,t)>ln n,we modify GA a bit,so whenever we get to a node v within a distance ln n from d(v,t)≤ln n),set N(v)=t and redefine Z v=0(this will not weaken ou
r proof since when we get there we can simply walk to t by at most lnn local links).Thus we always have E[Z v
i
]≤c/lnn,∀i=1..k. Consider Z k=Z v
1
+Z v
2
+...+Z v
k
.If we reach t with less than k steps,or say,v j=t for some j<k,then as mentioned before,we have Z v
j
=Z v
j+1
=...=Z v
k−1
=0. We now show that
P r[Z k<M]≥0.5(1) where k= ln2n and M=ln(d(s,t))≥ln n+ln c1−ln(ln n).
Since E[Z v
i
]<c/ln n,we have
E[Z k]≤k×
c
ln n
lnn
4
<0.5M
for n large enough.Thus,P r[Z k≥M]<0.5since otherwise E[Z k]≥0.5M.Hence,P r[Z k<M]≥0.5.From(1), choose c2= 1
4c
then at least half of the time,GA can not finish with less than c2log2n steps;the theorem follows. Now,we show that there exists a constant c such that E[Z v]≤c/ln n if d=d(v,t)≥ln n.Define the function f(i)=ln(d
d−i
);note that Z v=ln(d
d−δ(v)
).We need to show that
d
i=1
f(i)Pr[δ(v)=i]≤c/ln n for some constant c. First we have
d/2
i=1
f(i)Pr[δ(v)=i]≤
1
d−1
+
c
ln n
d−1
i=d/2
1/i≤
c +1
ln n
for some constant c .Note that we have Pr[δ(v)=i]≤c
i ln n
from lemma6and f(i)≤i
(d−i)
from the common
fact that ln(1+x)<x,∀x>−1.Similarly,we have  d
connect下载
i=d/2
f(i)Pr[δ(v)=i]≤c
ln n
for some constant c  ,us-
ing Pr[δ(v)=i]≤c  (d−i)
i2ln n
from lemma6.Now,choose c=c +c  +
1.
6Note that we can actually make the probability above an arbitrarily high constant(<1),but we use0.5for simplicity.
4.1Algorithms for improved delivery time
We now consider variants of Kleinberg’s greedy algorithm which use additional knowledge of the graph to improve ex-pected path length.Our basic algorithm operates on graphs from class K ∗(n,1,1)but it can be easily extended to more general classes.We assume that each node u knows the long-range links of the log n neighbor nodes closest to u in the grid.Call W u ,the set of these neighbors and their long range contacts,the view at u .During a basic step of routing,if u is the current message router,we go next to v ,the node in W u closest to t (we may need to follow several local links and,possibly one final random link to v ).Once we reach v ,it becomes the current node.
In Kleinberg’s algorithm W u includes only the nodes in-cident to links from u (at most 5in a two-dimensional grid)and a routing decision is made at each node on the path.In our algorithm,once we make a decision for a basic step (at the initial message router)we follow the sub-path to the next node where a decision is made.When we are at an intermediate node in a basic step,we follow local links (up,down,left or right)except for the final link which may be a random long-range link).We can describe
this sub-path using only O (log log n )extra bits to specify the last node on the sub-path reached by local links only.Alternately,we can use 2√
log n bits to describe the sequence of local moves (two bits per local move).
We now adapt Kleinberg’s analysis to show that the ex-pected number of links used in this algorithm is O (log 3/2n ).We say that node u is in phase i if 2i <d (u,t )≤2i +1.Kleinberg proved that if u is in phase i ,and v is u  s long range contact,the probability that v is in some phase j <i is proportional to 1/log n (for i >log log n ).Let X i be a random variable recording the number of basic steps our algorithm executes from nodes in phase i .
Lemma 8.For i >log log n ,E [X i ]<c for a constant c .Proof.Each basic step considers at least log n/2new long-range links (since at most half of the current message holder’s set W u overlaps all previous sets),and by Klein-berg’s result has at least a constant probability of finding a link to a node within distance 2i of t .
Theorem 9.Our improved algorithm visits expected O (log 3/2n )nodes in K ∗(n,1,1).
Proof.Using the lemma above,the expected number of basic steps till we get within distance log n is O (log n )(constant per phase for log n phases).Each basic step visits at most √log n nodes
(for a 2-D grid),so the total length is O (log 3/2n ).
More careful work (using tail inequalities)gives a slightly stronger result:a longest path found has length O (log 3/2n )with overwhelming probability.The underlying intuition is to think of doing a chain of basic steps until we have about log n which advance to a new phase.
If we use the same algorithm on classes K ∗/K (n,p,q )where p,q ≥1,it is relatively easy to show a better bound
of O (log 3/2
n p √q ).So,the bound is just O (log n )if for instance,p =Ω(log 1/2n )and q is a constant,or p is a constant and q =Ω(log n ).It is also easy to extend our algorithm to K ∗/K (k,n,p,q ).For p =1,q =1,we obtain the upper bound O (log 1+1/k n ),which is close to the ideal O (log n )when k is large.
A related approach is analyzed in [12]who show that the O (log 1+1/k n )bound for k -dimensions can be achieved by an oblivious algorithm.They also show that this bound is tight for a range of greedy algorithms (even with much more information).
5.EXPECTED DIAMETER 5.1
Basic approach
For simplicity we will first consider Kleinberg’s basic 2-D grid,but with four long-range links per node.We show that the expected diameter of a graph from K ∗(n,1,4)is Θ(log n )and then extend our approach to other and more general classes of graphs.
For a source s and a destination t =s chosen arbitrarily from V ,we will show that there is an O (log n )length path from s to t with overwhelming probability.The basic idea is to construct two sets of nodes S and T which are each of size Θ(n log n )such that:all the nodes in S can be reached from s by a directed path of length O (log n )of long-range links;all the nodes in T have a directed path of length O (log n )to t using long-range links.It is then simple to show that with overwhelming probability,there is a directed arc from a node in S to a node in T ,thus there is an O (log n )length path from s to t .The bulk of the proof is showing that appropriate sets S and T exist with high probability.Define χ(u )with u ∈V as the set of nodes v such that (u,v )is a long-range link;also χ(A )= u ∈A χ(u ),for any A ⊂V .Our basic idea is to ‘grow two trees’with roots from s and t to create S and T .We construct a chain of disjoint
subsets {S k }µ
k =0with S 0=B r (s )where r =r 0√log n for
some constant r 07.Let C k =
k i =0S i and define S k +1=χ(S k )−C k .Thus,S k +1is built by iteratively applying χon elements of S k and taking only ‘fresh’nodes,which have not been in any preceding subsets.
Later we construct a similar subset chain {T k }νk =0to cre-ate the t −tree.The probability of success (finding sets S,T which connect)can be made as close to one as needed by choosing r 0(and thus S 0)sufficiently large.Thus we get an O (log n )bound on the expected diameter of a graph in K ∗(n,1,4).
A worst-case analysis with a ball experiment .
We will show that the subset chain almost surely grows exponentially in size:there is a constant γ>1such that Pr[|S k +1|/|S k |>γ]is almost one when c log n <|S k |<αn log n for suitable constants c and α.Our goal is to finally form a set S µsuch that |S µ|≥αn log n and µ=O (log n ).For a fixed γ,consider the following S-construction process :starting with S 0we successively create S 1,S 2,...until either we get a set S µwith |S µ|≥αn log n (we succeed)or we have some |S k +1|/|S k |≤γand we fail.As soon as either case happens we stop the process.Note that if we succeed we do so in O (log n )
χ(u )steps,so µ=O (log n )and S µis our desired set S .The following analysis shows that our S -construction succeeds with high probability.
The main concern is how many fresh nodes we get from χ(S k )to include in S k +1.We fix an order to scan S k and for each u ∈S k ,a node v ∈χ(u )is fresh if it has not been included in a subset G ,which contains the current tree (the union of the current S k +1and all the preceding S i ,i ≤k ).7
Thus,there are θ(log n )nodes in S 0all reachable from s by O (√log n )local links.
To analyze this we make the following crucial observation. We consider this experiment E(u,G):we generate a random link8from u and consider if u/∈G;let X(u,G)denote the indicator random variable of this happening.We do this ex-periment4times for each u∈S k and|S k+1|will be the sum of these4|S k|random variables X(u,G);note that these variables are not identical since G keeps growing larger.Let G k denote the whole process.Note that Pr[X(u,G)=1] depends on both the size and shape of G.Thus we ask:if we move the elements in G around,how would we minimize Pr[X(u,G)=1]?The nature of the inverse second power distribution makes it clear:this can be done by moving ele-ments of G closer to u;in fact,Pr[X(u,G)=1]is minimized (for afixed size for G)when G is like a ball with u as its center.
Thus we can lower bound|S k+1|/|S k|using the follow-ing worst-case setting for selecting a link from u.
Let C=Θ(n log n)be the maximum size set C j we can ever have in our S-construction process.Let H be a ball with cen-ter s and size C(or the next larger size to make a full ball).Since H is chosen such that|H|≥|C k+1|(i.e.always ≥|G|)and has the worst possible shape,Pr[X(u,G)=1]≥Pr[X(s,H)=1]for any pair u,G used in our S-construction process.Since by fact2,E[X(s,H)]>βfor anyfixed β<0.5when n is large enough,we conclude:
Fact10.For anyfixedβ<0.5,E[X(u,G)]>βfor each u,G pair in our S-construction.
We now show that each new set is successfully constructed with high probability:
Lemma11.For anyη>0,andγ∈(1,2)we can choose a constantˆc such that,for k≥0,if|S k|≥ˆc ln n and |C k+1|=O(n log n)then Pr[|S k+1|/|S k|>γ]=1−O(n−η) for n large enough.
Proof.Let m=|S k|,so|S k+1|is the sum of4m X(u,G) indicator R.V’s.By fact10,we treat this as the sum of4m independent Bernoulli random variables each with expecta-tion≥β.Applying Chernoff’s inequality,for0<δ<1,
Pr[|S k+1|≤(4mβ)(1−δ)]≤e−2mβδ2≤n−2ˆcβδ2≤n−ηfor a proper choice ofˆc.Thus,Pr[|S k+1|/|S k|>γ]=1−O(n−η)forγ=4β(1−δ).
Indeed we can always chooseδ>0small enough so that γ=4β(1−δ)can be arbitrarily close to4β,and thus,ar-bitrarily close to2asβ=0.5−o(1).Once a value from (1,2)is chosen forγ(and so0<δ<1and0<β<0.5 are chosen accordingly),given anyη>0,we can chooseˆc such that when m≥ˆc log n we have Pr[|S k+1|/|S k|>γ]= 1−O(n−η
).
In fact,by some simple computation,we can see thatˆc>η/(1−γ)2is a sufficient condition tofind suchˆc.So,if we haveη=6(9),γ=1.8(byβ=0.48andδ=0.0625for example)then we can chooseˆc=601.
Thus,givenfixed constantsηandγwe can chooseˆc as above,so if|S0|≥ˆc log n,we can expect the cardi-nality series{|S i|}to grow exponentially before reaching 8Using inverse second power distribution
9Later,in ourfinal step,we use this value to prove the two trees,s-tree and t-tree,connect with overwhelming proba-bility
or exceeding the thresholdαnlogn during thefirst g+1 terms,where g= logγ(αn/ˆc) =θ(log n).By iterating lemma11appropriately(in each step of growing S i+1from S i when|S i|<αnlogn),we can show that almo
st surely |Sµ|≥αnlogn for someµ≤g=θ(log n).Note that Sµcontains only‘fresh’nodes such that we have not yet con-sidered their random links.We now prove that we almost surely get a large s-tree within O(log n)steps.
Lemma12.For any given node s∈V,anyθ>0and anyα>0,by constructing s’s subset chain as above,we will obtain subset Sµwhere any node in Sµcan be reached from s by an O(log n)path and Pr[|Sµ|≥αnlogn]=1−O(n−θ).
Proof.Consider a successful S-construction where we have a succession of steps where|S k+1|/|S k|>γuntil we reach sizeαn log n.From lemma11we can pickη>θso that Pr[|S k+1|/|S k|>γ]>1−O(n−η)for each step.
The probability that the S-construction process succeeds is greater than the probability we get g consecutive success-ful growth steps with g= logγ(αn/ˆc) =θ(log n)Thus Pr[S-construction process succeeds]≥
1−c1n−η
g
≥1−gc1n−ηfor some constant c1>0(using a basic calculus fact: (1+x)n≥1+nx for any x>−1and n≥1).Sinceθ<ηand,g=O(log n),it is easy to see that gc1n−η=O(n−θ) and hence,we create the desiredfinal set with probability ≥1−O(n−θ).
Another ball experiment to analyze‘t-tree’.
We now consider a tree of nodes with paths to t.We now use a functionˆχwhich,given an input node u,outputs the nodes with a random link to u.As before,we construct a subset chain{T k}ν0by having T k+1=ˆχ(T k)−ˆC k where ˆC
k
=
k
i=0
T i.Thus,we include into T k+1all the fresh nodes which have a random link to any node in T k.We can use much the same approach as before,but with some modifications and additional details.We still use
a state-variable G to denote the set of all nodes in the tree we have reached so far:G=ˆC k if we have justfinished thefirst k subsets,otherwise G is the union ofˆC k and the developing T k+1.Let G=V−G,the set of nodes not in the tree yet. Also,for each random link from each node u∈V we assign a label i=1..4.
We now look closer at processˆG k,the growing step of T k+1from T k,which is more complicated than the coun-terpart in the s-tree.LetˆE(u,i,G)denote an experiment which has each node w∈G look at its random link labeled i,and if this link hits u we add w to G(for simplicity we may also useˆE u instead).ProcessˆG k simply repeats the ˆE(u,i,G)experiment10for each node u∈T
k
and for each i=1..4.As with the s−tree,we now consider an artifi-cial experiment which should be‘outperformed’by the real ˆG
k
but is easier to analyze.Let H be a ball centered at u with size at leastˆC,the maximum size we allow a set to get in our T-construction process.LetˆX(u,H)be a ran-dom variable which is one if at least one ra
ndom link from a node in V−H hits u,otherwise it is zero.In our T-construction process,letˆX(u,i,G)be an indicator random variable recording whether some node in¯G has a link with label i to u.We now argue that P r[ˆX(u,i,G)=1]>.39 (the bound on a link into a ball in Fact2).
10Note that G is changing as we discover new fresh nodes and we canfix the order to scan T k initially.

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。