
arXiv:1201.4422v1 [math.PR] 21 Jan 2012

Jim Pitman∗ and Nathan Ross University of California, Berkeley

Abstract. We discuss a characterization of the centered Gaussian distribution which can be read from results of Archimedes and Maxwell, and relate it to Charles Stein’s well-known characterization of the same distribution. These characterizations fit into a more general framework involving the beta-gamma algebra, which explains some other characterizations appearing in the Stein’s method literature.

1 CHARACTERIZING THE GAUSSIAN DISTRIBUTION

One of Archimedes’ proudest accomplishments was a proof that the surface area of a sphere is equal to the surface area of the tube of the smallest cylinder containing it; see Figure 1. Legend has it that he was so pleased with this result that he arranged to have an image similar to Figure 1 inscribed on his tomb.

Figure 1: An illustration of the inscription on Archimedes’ tomb.

∗ Research supported in part by NSF grant DMS-0806118.

More precisely, in the work “On the Sphere and Cylinder, Book I” as translated on Page 1 of [2], Archimedes states that for every plane perpendicular to the axis of the tube of the cylinder, the surface areas lying above the plane on the sphere and on the tube are equal. See Figure 2 for illustration and also the discussion around Corollary 7 of [1].

Figure 2: The surface area of the shaded “cap” of the sphere above a plane is equal to the striped surface area on the tube of the cylinder above the same plane.

In probabilistic terms, if a point is picked uniformly at random according to surface area on the unit sphere in three dimensions, then its projection onto any given axis having origin at the center of the sphere is uniformly distributed on the interval $(-1, 1)$, independent of the angular component in the plane perpendicular to that axis. Formally, we have the following result.

Proposition 1.1. If $V$ is uniformly distributed on the interval $(-1, 1)$ and $\Theta$ is uniformly distributed on the interval $(0, 2\pi)$ and is independent of $V$, then

\[ \left( V, \ \sqrt{1 - V^2}\cos(\Theta), \ \sqrt{1 - V^2}\sin(\Theta) \right) \]

is uniformly distributed on the surface of the two dimensional sphere of radius one.

In this article, we take Proposition 1.1 as a starting point for a discussion of characterizations of the centered Gaussian distribution which arise in Stein’s method of distributional approximation. This discussion culminates in Theorem 1.6 at the end of this section. We then generalize some of these results in Section 2 to obtain the characterization of the gamma distribution found in Proposition 2.1, and also mention an analog of Theorem 1.6 for the exponential distribution. We conclude in Section 3 with a discussion of some related literature.

To move from Archimedes’ result above to characterizing the Gaussian distribution, we state the following result which was first realized by the astronomer Herschel and made well known by the physicist Maxwell in his study of the velocities of a large number of gas particles in a container; see the introduction of [6].

Proposition 1.2. Let $X = (X_1, X_2, X_3)$ be a vector of independent and identically distributed (i.i.d.) random variables. Then $X_1$ has a mean zero Gaussian distribution if and only if for all rotations $R : \mathbb{R}^3 \to \mathbb{R}^3$, $RX$ has the same distribution as $X$.

Propositions 1.1 and 1.2 are related by the following observations. It is clear that if $X$ is an $\mathbb{R}^3 \setminus \{0\}$ valued random vector such that $RX$ has the same distribution as $X$ for all rotations $R$, then $X/\|X\|$ has a rotation invariant distribution on the surface of the two dimensional unit sphere and is independent of $\|X\| := \sqrt{X_1^2 + X_2^2 + X_3^2}$. Since the unique rotation invariant distribution on the surface of a sphere of any dimension is the uniform distribution (Theorem 4.1.2 of [6]), the propositions of Archimedes and Herschel-Maxwell suggest the following characterization of mean zero Gaussian distributions; we provide a proof and discussion of generalizations in Section 3.

Proposition 1.3. Let $X = (X_1, X_2, X_3)$ be a vector of i.i.d. random variables. Then $X_1$ has a mean zero Gaussian distribution if and only if for $V$ uniform on $(-1, 1)$ and independent of $X$,

\[ X_1 \stackrel{d}{=} V \sqrt{X_1^2 + X_2^2 + X_3^2}. \]
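As an informal numerical illustration of Propositions 1.1 and 1.3 (a Monte Carlo sketch, not part of the argument), the Python code below builds points on the sphere by Archimedes' construction and checks that a different coordinate also looks uniform on $(-1, 1)$, then compares $V\sqrt{X_1^2 + X_2^2 + X_3^2}$ for standard normal $X_i$ with the standard normal distribution. The helper names, the seed, and the sample size are our own assumptions.

```python
import math
import random

def sphere_point(rng):
    """Archimedes' construction from Proposition 1.1."""
    v = rng.uniform(-1.0, 1.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - v * v)
    return (v, s * math.cos(theta), s * math.sin(theta))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a target CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

def fixed_point_draw(rng):
    """One draw of V * sqrt(X1^2 + X2^2 + X3^2) with X_i i.i.d. N(0, 1)."""
    v = rng.uniform(-1.0, 1.0)
    radius = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
    return v * radius

rng = random.Random(2012)  # arbitrary seed
points = [sphere_point(rng) for _ in range(20000)]

# Every point should lie on the unit sphere.
max_norm_error = max(abs(x * x + y * y + z * z - 1.0) for x, y, z in points)
# By rotation invariance the second coordinate should also be uniform on (-1, 1).
ks_second_coord = ks_distance([p[1] for p in points], lambda t: (t + 1.0) / 2.0)
# Proposition 1.3 with standard normal inputs: the product should be N(0, 1).
ks_normal = ks_distance([fixed_point_draw(rng) for _ in range(20000)], normal_cdf)

print(max_norm_error, ks_second_coord, ks_normal)
```

Small Kolmogorov-Smirnov distances here are consistent with the two propositions but of course do not prove them.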

Here and in what follows, $\stackrel{d}{=}$ denotes equality in distribution of two random variables. The distribution of $\sqrt{X_1^2 + X_2^2 + X_3^2}$, where $X_1, X_2, X_3$ are independent standard normal variables, is referred to as the Maxwell or Maxwell-Boltzmann distribution; see page 453 of [12].

Proposition 1.3 characterizes centered Gaussian distributions as the one parameter scale family of fixed points of the distributional transformation which takes the distribution of a random variable $X$ to the distribution of $V \sqrt{X_1^2 + X_2^2 + X_3^2}$, where $X_1, X_2, X_3$ are i.i.d. copies of $X$, and $V$ is uniform on $(-1, 1)$ independent of $(X_1, X_2, X_3)$. Such characterizations of distributions as the unique fixed point of a transformation are often used in Stein’s method for distributional approximation (see [20] for an introduction). In the case of the Gaussian distribution, these transformations are put to use through Stein’s Lemma.

Lemma 1.4 (Stein’s Lemma). [21] A random variable $W$ has the mean zero, variance one Gaussian distribution if and only if for all absolutely continuous functions $f$ with bounded derivative,

\[ \mathbb{E} f'(W) = \mathbb{E} W f(W). \]

We can relate the characterizations provided by Proposition 1.3 and Lemma 1.4, but first we need the following definition.

Definition 1.1. Let $X$ be a random variable with distribution function $F$ and such that $\mu_\alpha := \mathbb{E}|X|^\alpha < \infty$. We define $F^{(\alpha)}$, the $\alpha$-power bias distribution of $F$, by the relation

\[ dF^{(\alpha)}(x) = \frac{|x|^\alpha \, dF(x)}{\mu_\alpha}, \]

and we write $X^{(\alpha)}$ for a random variable having this distribution. Otherwise put, $X^{(\alpha)}$ has the $\alpha$-power bias distribution of $X$ if and only if for every measurable function $f$ such that $\mathbb{E}|X|^\alpha |f(X)| < \infty$,

\[ \mathbb{E} f(X^{(\alpha)}) = \frac{\mathbb{E}|X|^\alpha f(X)}{\mathbb{E}|X|^\alpha}. \tag{1.1} \]
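For a distribution with finite support, Definition 1.1 amounts to reweighting each atom by $|x|^\alpha$ and renormalizing. The following Python sketch illustrates this and checks relation (1.1) on a toy example; the function names, the example distribution, and the test function are hypothetical choices of ours.

```python
def power_bias(pmf, alpha):
    """alpha-power bias of a finite distribution given as {x: P(X = x)}."""
    mu_alpha = sum(abs(x) ** alpha * p for x, p in pmf.items())
    return {x: abs(x) ** alpha * p / mu_alpha for x, p in pmf.items()}

def expect(pmf, f):
    """Expectation of f under a finite distribution."""
    return sum(f(x) * p for x, p in pmf.items())

def f(x):
    # Arbitrary test function for relation (1.1).
    return x ** 3 + 1.0

# Toy example: X uniform on {-2, -1, 1, 2}, biased with alpha = 2.
pmf = {-2: 0.25, -1: 0.25, 1: 0.25, 2: 0.25}
alpha = 2
biased = power_bias(pmf, alpha)

lhs = expect(biased, f)                                   # E f(X^(alpha))
rhs = (expect(pmf, lambda x: abs(x) ** alpha * f(x))
       / expect(pmf, lambda x: abs(x) ** alpha))          # E|X|^a f(X) / E|X|^a
print(biased, lhs, rhs)
```

The biased distribution puts more mass on the atoms of larger absolute value, which is the sense in which power biasing generalizes size biasing.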

Taking $\alpha = 1$ and $X > 0$, $X^{(1)}$ has the size-biased distribution of $X$, a notion which frequently arises in probability theory and applications [3, 5]. We can now state and prove the following result which sheds some light on the relationship between Proposition 1.3 and Lemma 1.4.

Lemma 1.5. If $W$ is a random variable with finite second moment and $f$ is an absolutely continuous function with bounded derivative, then for $V$ uniform on the interval $(-1, 1)$ and independent of $W^{(2)}$,

\[ 2\, \mathbb{E} W^2 \, \mathbb{E} f'(V W^{(2)}) = \mathbb{E} W f(W) - \mathbb{E} W f(-W). \tag{1.2} \]

Proof. The lemma is implied by the following calculation

\[ \mathbb{E} f'(V W^{(2)}) = \frac{1}{2} \mathbb{E} \int_{-1}^{1} f'(u W^{(2)}) \, du = \frac{1}{2} \mathbb{E} \left[ \frac{f(W^{(2)}) - f(-W^{(2)})}{W^{(2)}} \right] = \frac{\mathbb{E}\left[ W f(W) - W f(-W) \right]}{2\, \mathbb{E} W^2}, \]

where in the final equality we use (1.1).

We now have the following main result for the Gaussian distribution.

Theorem 1.6. Let $W$ be a random variable with finite second moment. The following are equivalent:

1. $W$ has the standard normal distribution.

2. For all absolutely continuous functions $f$ with bounded derivative,

\[ \mathbb{E} f'(W) = \mathbb{E} W f(W). \tag{1.3} \]

3. $\mathbb{E} W^2 = 1$ and $W \stackrel{d}{=} V W^{(2)}$, where $V$ is uniform on $(-1, 1)$ and independent of $W^{(2)}$.
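To see Item 2 at work on a concrete test function, the sketch below evaluates $\mathbb{E} f'(W) - \mathbb{E} W f(W)$ by numerical quadrature for $f = \sin$, once for the standard normal density and once for the uniform distribution on $(-\sqrt{3}, \sqrt{3})$, which also has mean zero and variance one. The Simpson rule, the truncation of the normal density to $[-10, 10]$, and all function names are our own assumptions made for illustration.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(g, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def stein_gap(pdf, a, b, f, fprime):
    """E f'(W) - E W f(W) for W with density pdf supported on [a, b]."""
    lhs = integrate(lambda x: fprime(x) * pdf(x), a, b)
    rhs = integrate(lambda x: x * f(x) * pdf(x), a, b)
    return lhs - rhs

# Standard normal, truncated to [-10, 10] for quadrature (an approximation).
gap_normal = stein_gap(normal_pdf, -10.0, 10.0, math.sin, math.cos)

# Uniform on (-sqrt(3), sqrt(3)): mean zero, variance one, but not Gaussian.
c = math.sqrt(3.0)
gap_uniform = stein_gap(lambda x: 1.0 / (2.0 * c), -c, c, math.sin, math.cos)

print(gap_normal, gap_uniform)
```

For the uniform case a direct calculation gives a gap of $\cos(\sqrt{3}) \neq 0$, so the identity (1.3) already fails for this single test function, while the gap for the normal density vanishes up to quadrature error.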

Proof. The equivalence of the first two items of the theorem is (Stein’s) Lemma 1.4 above.

The fact that Item 1 implies Item 3 follows from Proposition 1.3 above coupled with the simple fact that for $X_1, X_2, X_3$ i.i.d. standard normal random variables, the density of $(X_1^2 + X_2^2 + X_3^2)^{1/2}$ is proportional to $x^2 e^{-x^2/2}$ (that is, $(X_1^2 + X_2^2 + X_3^2)^{1/2}$ has the same distribution as $X_1^{(2)}$).

Finally, we show Item 2 follows from Item 3. If $W \stackrel{d}{=} V W^{(2)}$ and $\mathbb{E} W^2 = 1$, then using Lemma 1.5 we find that for functions $f$ with bounded derivative,

\[ \mathbb{E} f'(W) = \mathbb{E} f'(V W^{(2)}) = \frac{1}{2} \left( \mathbb{E} W f(W) - \mathbb{E} W f(-W) \right) = \mathbb{E} W f(W), \]

where the last equality follows from the assumptions of Item 3 which imply $W$ has the same distribution as $-W$.

Remark 1.2. The equivalence of Items 1 and 3 is essentially the content of Proposition 2.3 of [8], which uses the concept of the “zero-bias” transformation of Stein’s method, first introduced in [11]. For a random variable $W$ with mean zero and variance $\sigma^2 < \infty$, we say that $W^*$ has the zero-bias distribution of $W$ if for all $f$ with $\mathbb{E}|W f(W)| < \infty$,

\[ \sigma^2 \, \mathbb{E} f'(W^*) = \mathbb{E} W f(W). \]

We think of the zero-bias transformation acting on probability distributions with zero mean and finite variance, and Stein’s Lemma implies that this transformation has the centered Gaussian distribution as its unique fixed point. Proposition 2.3 of [8] states that for a random variable $W$ with support symmetric about zero with unit variance, the transformation $W \to V W^{(2)}$ provides a representation of the zero-bias transformation. The equivalence of Items 1 and 3 of the theorem follows easily from these results.

2 BETA-GAMMA ALGEBRA

The equivalence between Items 1 and 3 in Theorem 1.6 can be generalized as follows. For $r, s > 0$, let $G_r$ and $B_{r,s}$ denote standard gamma and beta random variables having respective densities

\[ \frac{1}{\Gamma(r)} x^{r-1} e^{-x}, \quad x > 0, \qquad \text{and} \qquad \frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} y^{r-1} (1 - y)^{s-1}, \quad 0 < y < 1, \]

where $\Gamma$ denotes the gamma function.

Proposition 2.1. Fix $p, r, s > 0$. A non-negative random variable $W$ has the distribution of $c\, G_r^p$ for some constant $c > 0$ if and only if

\[ W \stackrel{d}{=} B_{r,s}^p \, W^{(s/p)}, \]

where $B_{r,s}$ is independent of $W^{(s/p)}$.

Remark 2.1. The equivalence in Items 1 and 3 of Theorem 1.6 follows by taking $p = r = 1/2$, $s = 1$ in Proposition 2.1 and using the well known fact that for $Z$ having the standard normal distribution, $Z^2 \stackrel{d}{=} 2 G_{1/2}$.

The proof of Proposition 2.1 uses the following result.

Lemma 2.2. Let $\alpha, \beta > 0$. If $X > 0$ is a random variable such that $\mathbb{E} X^\alpha < \infty$, then

\[ (X^{(\alpha)})^\beta \stackrel{d}{=} (X^\beta)^{(\alpha/\beta)}. \]

Proof. By the definition of $\alpha/\beta$-power biasing, we only need to show that

\[ \mathbb{E} X^\alpha \, \mathbb{E} f((X^{(\alpha)})^\beta) = \mathbb{E} X^\alpha f(X^\beta) \tag{2.1} \]

for all $f$ such that the expectation on the left hand side exists. By the definition of $\alpha$-power biasing, we have that for $g(t) = f(t^\beta)$,

\[ \mathbb{E} X^\alpha \, \mathbb{E} g(X^{(\alpha)}) = \mathbb{E} X^\alpha g(X), \]

which is (2.1).


Proof of Proposition 2.1. The usual beta-gamma algebra (see [9]) implies that $G_r \stackrel{d}{=} B_{r,s} G_{r+s}$, where $B_{r,s}$ and $G_{r+s}$ are independent. Using the elementary fact that $G_{r+s} \stackrel{d}{=} G_r^{(s)}$, we find that for fixed $r, s > 0$, $G_r$ satisfies $G_r \stackrel{d}{=} B_{r,s} G_r^{(s)}$. Now applying Lemma 2.2 to $G_r$ with $\alpha = s$ and $\beta = p$, we have that $W = G_r^p$ satisfies $W \stackrel{d}{=} B_{r,s}^p W^{(s/p)}$, and the forward implication now follows after noting that $(cX)^{(\alpha)} \stackrel{d}{=} c X^{(\alpha)}$.
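The two facts used in the forward implication can be sanity-checked at the level of integer moments, using the standard formulas $\mathbb{E} G_r^k = \Gamma(r+k)/\Gamma(r)$ and the beta moment formula quoted later in this proof. The Python sketch below does this with log-gamma functions; the parameter values and function names are our own assumptions, and agreement of finitely many moments is only a consistency check, not a proof of equality in distribution.

```python
import math

def gamma_moment(r, k):
    """E[G_r^k] = Gamma(r + k) / Gamma(r) for a standard gamma variable."""
    return math.exp(math.lgamma(r + k) - math.lgamma(r))

def beta_moment(r, s, k):
    """E[B_{r,s}^k] = Gamma(r + k) Gamma(r + s) / (Gamma(r + k + s) Gamma(r))."""
    return math.exp(math.lgamma(r + k) + math.lgamma(r + s)
                    - math.lgamma(r + k + s) - math.lgamma(r))

def biased_gamma_moment(r, s, k):
    """E[(G_r^{(s)})^k] = E[G_r^{k+s}] / E[G_r^s], by relation (1.1)."""
    return gamma_moment(r, k + s) / gamma_moment(r, s)

r, s = 0.5, 1.0  # hypothetical parameter choice
checks = []
for k in range(1, 6):
    product = beta_moment(r, s, k) * gamma_moment(r + s, k)  # E[(B G_{r+s})^k]
    checks.append((k, gamma_moment(r, k), product,
                   biased_gamma_moment(r, s, k), gamma_moment(r + s, k)))

for row in checks:
    print(row)
```

In each row the second and third entries agree (moments of $G_r$ and of $B_{r,s} G_{r+s}$), as do the fourth and fifth (moments of $G_r^{(s)}$ and of $G_{r+s}$).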

Now, assume that $W \stackrel{d}{=} B_{r,s}^p W^{(s/p)}$ for fixed $p, r, s > 0$ and we show that $W \stackrel{d}{=} c\, G_r^p$ for some $c > 0$. First, note by Lemma 2.2, that if $X = W^{1/p}$, then

\[ X \stackrel{d}{=} B_{r,s} X^{(s)} \tag{2.2} \]

and we will be done if this implies that $X \stackrel{d}{=} c\, G_r$ for some $c > 0$. Note that by writing $X^{(s)}$, we have been tacitly assuming that $\mathbb{E} W^{s/p} = \mathbb{E} X^s < \infty$, which implies that $\mathbb{E}(B_{r,s} X^{(s)})^s < \infty$, so that using the definition of power biasing yields $\mathbb{E} X^{2s} < \infty$. Continuing in this way we find that $\mathbb{E} X^{ks} < \infty$ for all $k = 1, 2, \ldots$ and thus that $\mathbb{E} X^q < \infty$ for all $q > s$. Moreover, writing $a_k := \mathbb{E} X^{ks}$, and taking expectations in (2.2) after raising both sides to the power $ks$, we have

\[ a_k = \mathbb{E} B_{r,s}^{ks} \, \frac{a_{k+1}}{a_1}, \]

where we have again used the definition of power biasing. We can solve this recursion after noting that for $\alpha > -r$,

\[ \mathbb{E} B_{r,s}^{\alpha} = \frac{\Gamma(r + \alpha)\Gamma(r + s)}{\Gamma(r + \alpha + s)\Gamma(r)}, \]

to find that for $k = 0, 1, \ldots$,

\[ a_k = \left( \frac{a_1 \Gamma(r)}{\Gamma(r + s)} \right)^{k} \frac{\Gamma(r + sk)}{\Gamma(r)}. \]
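The closed form for $a_k$ can be compared numerically with the values obtained by iterating the recursion $a_{k+1} = a_1 a_k / \mathbb{E} B_{r,s}^{ks}$ from $a_0 = 1$. The Python sketch below does so for one hypothetical choice of $a_1$, $r$, and $s$; the function names are ours.

```python
import math

def beta_moment(r, s, alpha):
    """E[B_{r,s}^alpha] for alpha > -r, via log-gamma functions."""
    return math.exp(math.lgamma(r + alpha) + math.lgamma(r + s)
                    - math.lgamma(r + alpha + s) - math.lgamma(r))

def closed_form(a1, r, s, k):
    """a_k = (a1 Gamma(r) / Gamma(r + s))^k * Gamma(r + s k) / Gamma(r)."""
    log_scale = math.log(a1) + math.lgamma(r) - math.lgamma(r + s)
    return math.exp(k * log_scale + math.lgamma(r + s * k) - math.lgamma(r))

def by_recursion(a1, r, s, kmax):
    """Iterate a_{k+1} = a1 * a_k / E[B^{ks}] starting from a_0 = 1, a_1 = a1."""
    a = [1.0, a1]
    for k in range(1, kmax):
        a.append(a1 * a[k] / beta_moment(r, s, k * s))
    return a

a1, r, s, kmax = 2.0, 0.5, 1.5, 8  # hypothetical values
rec = by_recursion(a1, r, s, kmax)
for k, value in enumerate(rec):
    print(k, value, closed_form(a1, r, s, k))
```

The two columns should agree up to floating point error for every $k$ computed.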

For any value of $a_1 > 0$, it is easy to see using Stirling’s formula that the sequence $(a_k)_{k \ge 1}$ satisfies Carleman’s condition

\[ \sum_{k=1}^{n} a_{2k}^{-1/(2k)} \to \infty, \quad \text{as } n \to \infty, \]

so that for a given value of $a_1$, there is exactly one probability distribution having moment sequence $(a_k)_{k \ge 1}$ (see the remark following Theorem (3.11) in Chapter 2 of [10]). Finally, it is easy to see that the random variable

\[ X^s := \frac{a_1 \Gamma(r)}{\Gamma(r + s)} \, G_r^s \]

has moment sequence $(a_k)_{k \ge 1}$.

2.1 Exponential Distribution

The exponential distribution has many characterizing properties, many of which stem from its relation to Poisson processes. For example, by superimposing two independent Poisson processes into one, we easily find that if $Z_1$ and $Z_2$ are independent rate one exponential variables, then $2\min\{Z_1, Z_2\}$ is also a rate one exponential (this is in fact characterizing as shown in Theorem 3.4.1 of [6]).

For our framework above, we use the memoryless property of the exponential distribution in the context of renewal theory. In greater detail, for any non-negative random variable $X$, we define the renewal sequence generated from $X$ as $(S_1, S_2, \ldots)$, where $S_i = \sum_{k=1}^{i} X_k$ and the $X_k$ are i.i.d. copies of $X$. For a fixed $t > 0$, the distribution of the length of the interval $[S_{K_t}, S_{K_t+1}]$ containing $t$ and the position of $t$ in this interval depend on $t$ and the distribution of $X$ in some rather complicated way. We can remove this dependence on $t$ by starting the sequence in “stationary”, meaning that we look instead at the sequence $(X', X' + S_1, \ldots)$, where $X'$ has the limiting distribution of $S_{K_t+1} - t$ as $t$ goes to infinity; see Chapter 5, Sections 6 and 7.b of [13].

If $X$ has a continuous distribution with finite mean, then the distribution of $X'$ is the size-biased distribution of $X$ times an independent variable which is uniform on $(0, 1)$ [13]. Heuristically, the memoryless property which characterizes the exponential distribution (Chapter 12 of [4]) implies that the renewal sequence generated by an exponential distribution is stationary (that is, $X$ and $X'$ have the same distribution) and vice versa. The following result implies this intuition is correct.

Theorem 2.3. [16] Let $W$ be a non-negative random variable with finite mean. The following are equivalent:

1. $W$ has the exponential distribution with mean one.

2. For all absolutely continuous functions $f$ with bounded derivative,

\[ \mathbb{E} f'(W) = \mathbb{E} f(W) - f(0). \]

3. $\mathbb{E} W = 1$ and $W \stackrel{d}{=} U W^{(1)}$, where $U$ is uniform on $(0, 1)$ and independent of $W^{(1)}$.
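Item 3 can be illustrated by simulation. For $W$ exponential with mean one, the size-biased variable $W^{(1)}$ has density $x e^{-x}$, that is, a gamma distribution with shape two, so $U W^{(1)}$ should again be exponential. As a contrast, the sketch below also applies the map to $W$ uniform on $(0, 2)$, which has mean one but is not a fixed point. The helper names, seed, and sample size are our own assumptions.

```python
import math
import random

def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a target CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

def equilibrium_of_exponential(rng):
    """U * W^(1) for W ~ Exp(1); here W^(1) is Gamma(shape=2, scale=1)."""
    return rng.random() * rng.gammavariate(2.0, 1.0)

def equilibrium_of_uniform(rng):
    """U * W^(1) for W ~ Uniform(0, 2); W^(1) has density x / 2 on (0, 2)."""
    size_biased = 2.0 * math.sqrt(rng.random())  # inverse-CDF sampling
    return rng.random() * size_biased

rng = random.Random(2011)  # arbitrary seed
n = 20000
ks_exp = ks_distance([equilibrium_of_exponential(rng) for _ in range(n)],
                     lambda x: 1.0 - math.exp(-x))
ks_unif = ks_distance([equilibrium_of_uniform(rng) for _ in range(n)],
                      lambda x: min(x / 2.0, 1.0))
print(ks_exp, ks_unif)
```

The first distance should be small, while the second stays near $1/4$, since $U W^{(1)}$ for the uniform input has density $(2 - x)/2$ on $(0, 2)$ rather than the uniform density.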

Similar to the case of the normal distribution, the crucial link between Items 2 and 3 of Theorem 2.3 is provided by the following lemma; the proof is similar to that of Lemma 1.5.

Lemma 2.4. If $W$ is a non-negative random variable with finite mean and $f$ is an absolutely continuous function with bounded derivative, then

\[ \mathbb{E} W \, \mathbb{E} f'(U W^{(1)}) = \mathbb{E} f(W) - f(0). \]

Proof of Theorem 2.3. The equivalence of Items 1 and 3 is a special case of Proposition 2.1 with $r = s = p = 1$, and the equivalence of Items 2 and 3 can be read from Lemma 2.4 (note in particular that Item 2 with $f(x) = x$ implies that $\mathbb{E} W = 1$).

Remark 2.2. For a non-negative random variable $W$ with finite mean, the transformation $W \to U W^{(1)}$ is referred to in the Stein’s method literature as the “equilibrium” transformation, first defined in this context in [16], where Theorem 2.3 is also shown. Due to the close relationship between the exponential and geometric distributions, it is not surprising that there is a discrete analog of Theorem 2.3 with the exponential distribution replaced by the geometric; see [17] for this discussion in the context of Stein’s method.

3 PROOF OF PROPOSITION 1.3 AND DISCUSSION

Proof of Proposition 1.3. We will show that for $n \ge 2$ and $Y_1, \ldots, Y_n$ non-negative i.i.d. random variables, $Y_1 \stackrel{d}{=} c\, G_{1/(n-1)}$ for some $c > 0$ if and only if

\[ Y_1 \stackrel{d}{=} B_{1/(n-1),1} (Y_1 + \cdots + Y_n), \tag{3.1} \]

where $B_{1/(n-1),1}$ is independent of $(Y_1, \ldots, Y_n)$, and $G_a$, $B_{a,b}$ are gamma and beta variables as defined above. The proposition then follows from this fact with $n = 3$ after noting that $V^2 \stackrel{d}{=} B_{1/2,1}$ and if $X$ has a mean zero and variance one normal distribution, then $X^2 \stackrel{d}{=} 2 G_{1/2}$.

The forward implication is a consequence of Proposition 2.1 coupled with the fact that $G_{a+b} \stackrel{d}{=} G_a + G_b$, where $G_a$ and $G_b$ are independent. To establish the reverse implication we assume (3.1) and show $Y_1 \stackrel{d}{=} c\, G_{1/(n-1)}$. Since we assume that $Y_1$ is non-negative, we define the Laplace transform $\varphi(\lambda) = \mathbb{E} e^{-\lambda Y_1}$, $\lambda \ge 0$. By conditioning on the value of $B_{1/(n-1),1}$ in (3.1), we find for $\lambda > 0$,

\[ \varphi(\lambda) = \mathbb{E} \varphi(B_{1/(n-1),1} \lambda)^n = \frac{1}{n-1} \int_0^1 u^{-(n-2)/(n-1)} \varphi(u\lambda)^n \, du = \frac{1}{(n-1)\lambda^{1/(n-1)}} \int_0^\lambda t^{-(n-2)/(n-1)} \varphi(t)^n \, dt, \]

where we have made the change of variable $t = u\lambda$ in the last equality. We can differentiate the equation above with respect to $\lambda$ which yields

\[ \varphi'(\lambda) = -\frac{\lambda^{-n/(n-1)}}{(n-1)^2} \int_0^\lambda t^{-(n-2)/(n-1)} \varphi(t)^n \, dt + \frac{1}{(n-1)\lambda} \varphi(\lambda)^n = \frac{-\varphi(\lambda) + \varphi(\lambda)^n}{(n-1)\lambda}. \tag{3.2} \]

Thus, we find that $\varphi$ satisfies the differential equation (3.2) with boundary condition $\varphi(0) = 1$. By computing the derivative of the ratio below using (3.2) and using that $0 < \varphi(\lambda) \le 1$ for $\lambda > 0$, we find that for some constant $c > 0$,

\[ \frac{1 - \varphi(\lambda)^{n-1}}{\lambda \varphi(\lambda)^{n-1}} = c, \quad \lambda > 0. \]

Solving this equation for $\varphi(\lambda)$ implies

\[ \varphi(\lambda) = (1 + c\lambda)^{-1/(n-1)}, \]

which is the Laplace transform of $c\, G_{1/(n-1)}$, as desired.

The proof of Proposition 1.3 and the beta-gamma algebra suggest the following conjecture.

Conjecture 3.1. Let $n \ge 2$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ be a vector of i.i.d. random variables. Then $Y_1$ is equal in distribution to $c\, G_a$ for some constant $c > 0$ if and only if for $V = B_{a,(n-1)a}$ independent of $Y$,

\[ Y_1 \stackrel{d}{=} V (Y_1 + Y_2 + \cdots + Y_n). \tag{3.3} \]
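The forward implication of Conjecture 3.1 is easy to probe by simulation: draw $Y_1, \ldots, Y_n$ i.i.d. standard gamma with parameter $a$, multiply their sum by an independent $B_{a,(n-1)a}$, and compare the result with a fresh gamma sample. The Python sketch below uses a two-sample Kolmogorov-Smirnov distance; the values of $a$ and $n$, the seed, and the helper names are hypothetical choices of ours, and the check says nothing about the reverse implication.

```python
import random

def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov distance (assumes no ties)."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

def rhs_draw(rng, a, n_vars):
    """One draw of V * (Y_1 + ... + Y_n), V ~ Beta(a, (n - 1) a), Y_i ~ Gamma(a)."""
    v = rng.betavariate(a, (n_vars - 1) * a)
    total = sum(rng.gammavariate(a, 1.0) for _ in range(n_vars))
    return v * total

rng = random.Random(33)  # arbitrary seed
a, n_vars, size = 0.5, 4, 20000
lhs_sample = [rng.gammavariate(a, 1.0) for _ in range(size)]
rhs_sample = [rhs_draw(rng, a, n_vars) for _ in range(size)]
ks_conjecture = ks_two_sample(lhs_sample, rhs_sample)
print(ks_conjecture)
```

A small distance is what the beta-gamma algebra predicts for gamma inputs.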

The forward implication of the conjecture is an easy consequence of the following beta-gamma algebra facts: for $G_a$, $G_b$, and $B_{a,b}$ independent,

\[ B_{a,b} G_{a+b} \stackrel{d}{=} G_a, \quad \text{and} \quad G_a + G_b \stackrel{d}{=} G_{a+b}. \]

Conversely, assuming (3.3), it is possible to follow the proof of Proposition 1.3, which leads to an integral equation for the Laplace transform of $Y_1$. It is easy to verify that the Laplace transform of the appropriate gamma distribution satisfies this equation, so it is only a matter of showing the integral equation has a unique scale family of solutions. In the case $a = 1/(n-1)$ the integral equation has a simpler form from which the required uniqueness follows from the proof of Proposition 1.3 above. In the general case, we do not have an argument for the uniqueness of the solution. However, under the assumption that $Y_1$ has all positive integer moments finite, the conjecture follows after using (3.3) to obtain a recursion relation for the moments which, up to the scale factor, determines those of a gamma distribution with the appropriate parameter.

Conjecture 3.1 is very similar to Lukacs’ characterization of the gamma distribution [14] that positive, non-degenerate, independent variables $X, Y$ have the gamma distribution if and only if $X + Y$ and $X/(X + Y)$ are independent. However, it does not appear that this result can be used to show the difficult implication of the conjecture. Note also that Lukacs’ result characterizes beta distributions as the only distributions which can be written as $X/(X + Y)$ independent of $X + Y$ for positive, non-degenerate, independent variables $X, Y$. Thus, a question related to our conjecture is that if (3.3) holds for independent variables $Y_1, \ldots, Y_n$ and $V$, does this imply that $V$ has a beta distribution?

Conjecture 3.1 is connected to the observation of Poincaré (see the introduction of [15]) that the coordinates of a point uniformly chosen on the $(n-1)$ dimensional sphere of radius $\sqrt{n}$ are asymptotically distributed as independent standard Gaussians. Analogous to the discussion in the introduction, we can realize these uniformly distributed points as $\sqrt{n} R^{-1} (X_1, \ldots, X_n)$, where $X_1, \ldots, X_n$ are independent standard normal variables and $R = (X_1^2 + \cdots + X_n^2)^{1/2}$. Squaring these coordinates, Poincaré’s result implies that $n X_1^2 / (X_1^2 + \cdots + X_n^2)$ is asymptotically distributed as $X_1^2$. Since $X_1^2 \stackrel{d}{=} 2 G_{1/2}$, taking the limit as $n \to \infty$ on the right side of (3.3) with $a = 1/2$ yields a related fact.

The forward implication of Proposition 1.3 is evidenced also by creation of a three-dimensional Bessel process by conditioning a one-dimensional Brownian motion not to hit zero. Indeed, a process version of Proposition 1.3 is involved in the proof of the “$2M - X$” theorem provided in [19]; see Section 2 and especially Section 2.3 of [7]. More generally, process analogs of the beta-gamma algebra can be found in Section 3 of [7].

Some extensions of the characterizations discussed in this article to more complicated distributions can be found in the recent work [18].

REFERENCES

[1] T. M. Apostol and M. A. Mnatsakanian. A fresh look at the method of Archimedes. Amer. Math. Monthly, 111(6):496–508, 2004.

[2] Archimedes and T. L. Heath. The works of Archimedes. Cambridge University Press, 1897. Translated by T. L. Heath.

[3] R. Arratia and L. Goldstein. Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent? http://arxiv.org/abs/1007.3910, 2011.

[4] N. Balakrishnan and A. P. Basu, editors. The exponential distribution. Gordon and Breach Publishers, Amsterdam, 1995. Theory, methods and applications.

[5] M. Brown. Exploiting the waiting time paradox: applications of the size-biasing transformation. Probab. Engrg. Inform. Sci., 20(2):195–230, 2006.

[6] W. Bryc. The normal distribution: characterizations with applications, volume 100 of Lecture Notes in Statistics. Springer-Verlag, New York, 1995.

[7] P. Carmona, F. Petit, and M. Yor. Beta-gamma random variables and intertwining relations between certain Markov processes. Rev. Mat. Iberoamericana, 14(2):311–367, 1998.

[8] L. H. Y. Chen, L. Goldstein, and Q.-M. Shao. Normal Approximation by Stein’s Method. Probability and its applications. Springer, 2010.

[9] D. Dufresne. Algebraic properties of beta and gamma distributions, and applications. Adv. in Appl. Math., 20(3):285–299, 1998.

[10] R. Durrett. Probability: theory and examples. Duxbury Press, Belmont, CA, second edition, 1996.

[11] L. Goldstein and G. Reinert. Stein’s method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab., 7(4):935–952, 1997.

[12] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous univariate distributions. Vol. 1. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. John Wiley & Sons Inc., New York, second edition, 1994. A Wiley-Interscience Publication.

[13] S. Karlin and H. M. Taylor. A first course in stochastic processes. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, second edition, 1975.

[14] E. Lukacs. A characterization of the gamma distribution. Ann. Math. Statist., 26:319–324, 1955.

[15] H. P. McKean. Geometry of differential space. Ann. Probab., 1:197–206, 1973.

[16] E. Peköz and A. Röllin. New rates for exponential approximation and the theorems of Rényi and Yaglom. Ann. Probab., 39(2):587–608, 2011.

[17] E. Peköz, A. Röllin, and N. Ross. Total variation error bounds for geometric approximation. http://arxiv.org/abs/1005.2774, 2010. To appear in Bernoulli.

[18] E. Peköz, A. Röllin, and N. Ross. Degree asymptotics with rates for preferential attachment random graphs. http://arxiv.org/abs/1108.5236, 2011.

[19] L. C. G. Rogers and J. W. Pitman. Markov functions. Ann. Probab., 9(4):573–582, 1981.

[20] N. Ross. Fundamentals of Stein’s method. Probability Surveys, 8:210–293, 2011.

[21] C. Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 583–602, Berkeley, Calif., 1972. Univ. California Press.
