
The Simple Economics of Optimal Persuasion∗ Piotr Dworczak† and Giorgio Martini‡

April 15, 2017

Abstract We study Bayesian persuasion problems in which the Sender’s preferences depend only on the mean of posterior beliefs. In this environment, the economics of optimal persuasion are simple: Given a price schedule for posterior means, the Sender faces a consumer-like choice problem, purchasing posterior means using the prior distribution as her endowment. We propose a verification tool for optimality and characterize the structure of prices that support the optimal solution. Our approach provides a tractable solution method for persuasion problems with infinite state and action spaces, and yields a necessary and sufficient condition on the Sender’s objective function under which the optimal persuasion mechanism can be guaranteed to have a monotone partitional structure.

Keywords: Bayesian Persuasion, duality, Lagrangian, mean-preserving spreads.



This paper was previously circulated under the title "A Duality Approach to Bayesian Persuasion." We would like to thank Isaías Chaves Villamizar, Matthew Gentzkow, Shota Ichihashi, Emir Kamenica, Paul Milgrom, Michael Ostrovsky, Andy Skrzypacz, and Alexey Suzdaltsev for helpful comments, and Anthony Lee Zhang for discussions that were crucial to this project.
† Stanford University, Graduate School of Business, [email protected]
‡ Stanford University, Graduate School of Business, [email protected]

1 Introduction

Bayesian persuasion has become a canonical model of communication with commitment power.¹ However, the standard approach based on concavification of the value function has limited power when the state space and action space are large. The concavification method alone is typically not sufficient to characterize the optimal signal, and fails to provide intuition for the structure of the underlying persuasion scheme. To overcome this difficulty, we develop a duality-based approach to Bayesian persuasion under the assumption that the Sender's preferences only depend on the mean of posterior beliefs.² We show that this assumption is satisfied in many natural economic environments.

In such environments, the economics of optimal persuasion are simple. The Sender's problem can be interpreted as one of consumer choice: posterior means have prices, and the Sender purchases posterior means that maximize her utility net of prices, subject to two constraints. The first is a budget constraint: the total cost of the chosen posteriors must not exceed the value of the endowment, i.e., the value of the prior distribution under the same prices. Second, the distribution of posterior means has to be feasible, that is, induced by some signal structure. Since the Sender can only garble information, the prior has to be a mean-preserving spread of the chosen distribution of posteriors.

The first main result of the paper, Theorem 1, shows that if the Sender maximizes utility given prices, exhausts her budget, and purchases a feasible distribution of posterior means, then this distribution of posterior means is optimal. The result provides a verification tool for candidate distributions of posterior means. Verification requires finding the correct price schedule for posterior means. We characterize the structure of the optimal solution, which simplifies the search for the relevant prices. In Theorem 2, we prove that under mild regularity assumptions it is always possible to find a price schedule that rationalizes the optimal persuasion scheme.

Persuasion mechanisms used in practice often have a simple structure: information is either fully revealed or adjacent types are pooled together (e.g., the coarse ratings used by bond rating agencies). An important question in the communication and persuasion literature concerns the conditions under which such a monotone partitional signal structure is optimal. In Theorem 3, we use our approach to derive a necessary and sufficient condition on the Sender's utility function under which a monotone partitional signal is optimal for any prior distribution of the state.

Two examples illustrate how our approach can be employed to solve persuasion problems that could not be directly solved using previous methods. Both examples feature a continuum of states and actions. In the first, an agent must be persuaded to exert effort on a project. The agent is rewarded with a fraction of the value of the project, but only the principal knows how much the project is worth if successful. We prove that the principal should disclose the project's value when it is low, and pool high realizations into the lowest signal that induces maximal effort. In the second example, a financial analyst who possesses private information on the profitability of a risky asset wants to persuade an agent to invest in it; the optimal persuasion mechanism has a tractable structure in which the informativeness of the analyst's recommendation depends on the agent's degree of risk aversion. In both applications, a simple graphical analysis combined with our main theorem is enough to characterize the optimal persuasion mechanism.

Concurrent work by the first author uses our results in an applied setting. In Duffie, Dworczak, and Zhu (2016), dealers and customers meet in an over-the-counter financial market for an asset. Customers don't know the common cost of providing the asset, and as a consequence face uncertainty over the prices quoted by dealers. Because of costly search for the best price, entry by customers is limited. A social planner decides how much information about the common cost to reveal prior to trading. The distribution of the cost is continuous, resulting in a continuum of states. Our Theorem 1 is used to solve for the social planner's optimal information revelation scheme.

Other authors have studied related settings and techniques. Along with their analysis of the general model, Kamenica and Gentzkow (2011) apply the concavification approach to the same setting as ours. The concavification of the value function over posterior means reveals whether the Sender benefits from persuasion, but does not explicitly characterize the optimal signal. In contrast, our approach directly establishes the structure of the optimal signal. Gentzkow and Kamenica (2015) focus on the same setting as ours and characterize the set of feasible distributions over posterior means using a graphical method. Their method is then used to solve simple persuasion problems in which the Receiver chooses between two or three actions, and hence the Sender's preferences over posterior means are a step function. In contrast, our techniques apply to an almost arbitrary objective function of the Sender, and in particular allow for a continuum of both states and actions. We complement their graphical method by characterizing when a distribution of posterior means is induced by a monotone partitional signal.

1. Key references are Aumann and Maschler (1995) and Kamenica and Gentzkow (2011). Related models are considered by Calzolari and Pavan (2006), Ostrovsky and Schwarz (2010), and Rayo and Segal (2010).

2. This setting has also been considered by Ostrovsky and Schwarz (2010), Kamenica and Gentzkow (2011), Gentzkow and Kamenica (2015), and Kolotilin (2016).

Kolotilin (2016) develops a duality theory for a model of persuasion in which the Receiver is privately informed and chooses between two actions. As noted by Kolotilin, this is equivalent to the model in which the Receiver is uninformed, there is a continuum of actions and the Sender’s preferences only depend on the posterior mean. While our paper relies on similar proof techniques, the aim is different. We provide general sufficient conditions for optimality, and a tight characterization of the form of the optimal Lagrange multiplier which simplifies applying the method in practice. Related models of persuasion with a privately informed receiver (but not using duality techniques) are considered by Kolotilin, Li, Mylovanov, and Zapechelnyuk (2015) and Guo and Shmaya (2017). The rest of the paper is organized as follows. In the next section we introduce the persuasion problem and reduce it to a constrained optimization problem. In Section 3, we present our sufficient conditions for optimality. In Section 4, we analyze the structure of the optimal solution and its supporting prices. In Section 5, we show that, under additional regularity assumptions, there always exists a price schedule that rationalizes the optimal persuasion scheme. In Section 6, we give a necessary and sufficient condition for optimality of monotone partitional signals. In Section 7, we work through two applications of our methods. Finally, in Section 8, we show how our method relates to the concavification approach via classical consumer choice duality. Proofs and additional applications are collected in Appendix A.

2 Model

The state of nature is the realization of a real-valued random variable X with a cumulative distribution function F. We assume that X has realizations in some bounded interval $[\underline{x}, \bar{x}]$. To simplify notation, we normalize the support so that supp(F) ⊆ [0, 1].³ F is common knowledge between the two players, Sender and Receiver. The Sender commits to an information structure which determines the signal that is sent to the Receiver. An information structure is a mapping π : [0, 1] → Δ(S), measurable with respect to X, for some arbitrary signal space S. Given an information structure π, every signal realization s ∈ S induces a posterior belief over the distribution of X. We assume that the Sender's final utility depends on posterior beliefs only through the posterior mean.

3. We assume throughout that there exists a probability space (Ω, F, P) on which X is defined. The explicit probability space plays no further role in the analysis.

Formally, there exists a measurable function u : [0, 1] → R such that u(x) is the ex-post utility of the Sender when the induced posterior mean is x. The assumption is satisfied when the Receiver's optimal action only depends on the expected state and when the Sender's preferences over actions depend linearly on the state (in particular, if they are state-independent). Both assumptions automatically hold when the state is binary. The Receiver's problem only influences the persuasion problem via the shape of the function u, and thus the Receiver will not play any role in the analysis.

Under this assumption, the expected value of an information structure π depends only on the distribution of posterior means that it induces. It is thus natural to optimize over distributions of posterior means directly. Given the prior F, a distribution of posterior means G is induced by some information structure if and only if F is a mean-preserving spread of G (Blackwell, 1953; Kolotilin, 2016; Gentzkow and Kamenica, 2015). The Sender's problem is thus

$$\max_{G}\ \int_0^1 u(x)\, dG(x) \tag{2.1}$$

subject to the constraint that F is a mean-preserving spread of G.
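To see the feasibility constraint in computable form, the following minimal sketch (our illustration, not part of the paper) checks Blackwell's criterion on a grid; the function name, discretization, and tolerances are assumptions of the example.

```python
import numpy as np

# Check whether F is a mean-preserving spread of G via the integrated cdfs:
# c_F(x) = int_0^x F >= int_0^x G = c_G(x) for all x, with equal means at x = 1.
# F_cdf, G_cdf are cdf values on the grid; tolerances are at grid resolution.

def is_mean_preserving_spread(F_cdf, G_cdf, grid, tol=1e-3):
    cF = np.concatenate([[0.0], np.cumsum((F_cdf[:-1] + F_cdf[1:]) / 2 * np.diff(grid))])
    cG = np.concatenate([[0.0], np.cumsum((G_cdf[:-1] + G_cdf[1:]) / 2 * np.diff(grid))])
    return bool(np.all(cF >= cG - tol) and abs(cF[-1] - cG[-1]) < tol)

# Example: F uniform on [0, 1]; G pools all mass at the prior mean 1/2.
x = np.linspace(0.0, 1.0, 1001)
F = x.copy()                           # uniform cdf
G = (x >= 0.5).astype(float)           # point mass at 1/2
print(is_mean_preserving_spread(F, G, x))  # True: pooling is always feasible
print(is_mean_preserving_spread(G, F, x))  # False: information cannot be "un-garbled"
```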

3 Sufficient conditions for optimality

Theorem 1. If there exist a cumulative distribution function G and a convex function p : [0, 1] → R that satisfy

$$\operatorname{supp}(G) \subseteq \underset{x \in [0,1]}{\arg\max}\, [u(x) - p(x)], \tag{3.1}$$

$$\int_0^1 p(x)\, dG(x) = \int_0^1 p(x)\, dF(x), \quad\text{and} \tag{3.2}$$

$$F \text{ is a mean-preserving spread of } G, \tag{3.3}$$

then G is a solution to problem (2.1).

Proof. See Appendix A.1.

The theorem gives a method to verify optimality of a solution G by finding a corresponding Lagrange multiplier p. Moreover, the result allows us to reinterpret the Bayesian persuasion problem as a familiar consumer choice problem. The consumer "purchases" posterior means x at prices p(x). The chosen posterior means (the support of G) have to


maximize the consumer's utility net of prices (condition (3.1)). The consumer has a budget constraint, $\int_0^1 p(x)\,dG(x) \le \int_0^1 p(x)\,dF(x)$, where the left-hand side can be interpreted as the total expenditure on the "bundle" G, and the right-hand side as the value of the "initial endowment" F. Condition (3.2) says that at the optimal bundle, the consumer exhausts her budget. Finally, condition (3.3) is a constraint on feasible consumption bundles.

Convexity of the price function p also has a natural economic interpretation. The Sender can commit to (further) garbling for free: after purchasing posterior means x₁ and x₂ with probability weights π₁ and π₂, respectively, the Sender could pool the two at $x_0 = \frac{\pi_1}{\pi_1+\pi_2}x_1 + \frac{\pi_2}{\pi_1+\pi_2}x_2$ with total probability weight π₁ + π₂. If prices were not convex, that is, if $p(x_0) > \frac{\pi_1}{\pi_1+\pi_2}p(x_1) + \frac{\pi_2}{\pi_1+\pi_2}p(x_2)$, then the Sender would prefer to purchase x₁ and x₂ and pool them at x₀ rather than purchase x₀ directly at its price p(x₀). Therefore, the effective price of x₀ would no longer be p(x₀). Convexity of prices is thus a necessary economic property when signals can be garbled for free.

To illustrate Theorem 1, we first show how it can be used to establish well-known results in two simple cases.

Corollary 1. If there exists an affine function q such that q(x) ≥ u(x) for all x and q(EX) = u(EX),⁴ then pooling (revealing nothing about X) is a solution to (2.1).

Indeed, in this case it is enough to take p ≡ q. Since p is affine, condition (3.2) is equivalent to equality of the unconditional means of F and G. The distribution G that puts all mass on EX also satisfies conditions (3.1) and (3.3), and hence is optimal by Theorem 1.

Corollary 2. If u is convex, then full revelation (always revealing X) is a solution to problem (2.1).

Indeed, in this case we can simply take p ≡ u and G ≡ F.

Theorem 1 also allows us to easily solve persuasion problems where the Receiver chooses one of two actions. This case has been solved using other methods (Gentzkow and Kamenica, 2015; Ivanov, 2015). For comparison, in Appendix A.2 we solve the two-action problem using Theorem 1. In Section 7, we illustrate the usefulness of Theorem 1 with two additional applications.

4. That is, u is superdifferentiable at EX.
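As a concrete illustration of the verification logic, the sketch below (ours, not the paper's; the specific u, grid, and tolerances are assumptions) checks conditions (3.1) and (3.2) numerically for the pooling solution of Corollary 1 with concave u(x) = √x and a uniform prior; condition (3.3) holds because pooling is always feasible.

```python
import numpy as np

# Toy check of Theorem 1 for Corollary 1: u(x) = sqrt(x) concave, F uniform on
# [0,1], candidate G a point mass at EX = 1/2, and p the tangent line at 1/2.

x = np.linspace(0.0, 1.0, 100001)
u = np.sqrt(x)
p = np.sqrt(0.5) + (x - 0.5) / (2 * np.sqrt(0.5))   # affine, supports u at 1/2

# Condition (3.1): supp(G) = {1/2} must maximize u - p.
gap = u - p
print(np.max(gap) <= 1e-12, abs(x[np.argmax(gap)] - 0.5) < 1e-6)

# Condition (3.2): int p dG = p(1/2); int p dF by the trapezoid rule.
int_p_dF = np.sum((p[:-1] + p[1:]) / 2 * np.diff(x))
print(abs(int_p_dF - np.sqrt(0.5)) < 1e-6)           # budget is exhausted
```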

4 Structure of optimal prices

The conditions of Theorem 1 are easy to check for a given pair (G, p), but do not give much guidance for the search for the correct (G, p). In this section we prove several implications of the three conditions, and show how these implications narrow down the class of candidate pairs (G, p). First, without loss of generality, assume that p(x) ≥ u(x) for all x and p(x₀) = u(x₀) for some x₀ ∈ [0, 1] (we can always add a constant to the prices p). Then condition (3.1) is equivalent to

$$\operatorname{supp}(G) \subseteq \{x \in [0,1] : u(x) = p(x)\}. \tag{3.1'}$$

In the remainder of the paper we will use (3.1′) in lieu of (3.1).

We impose two regularity assumptions which we will maintain throughout. First, we assume that F has no mass points and has full support on [0, 1]. Second, we impose some regularity conditions on the objective function.

Definition 1. Function u is regular if:

1. u is upper semi-continuous with finitely many (possibly zero) one-sided jump discontinuities at interior points y₁, ..., y_k ∈ (0, 1);

2. u has a bounded slope in each interval (y_i, y_{i+1}) (where y₀ = 0, y_{k+1} = 1), that is, there exists M < ∞ such that

$$\sup\left\{ \frac{|u(x) - u(x')|}{|x - x'|} : x, x' \in (y_i, y_{i+1}) \right\} \le M, \quad \forall i = 0, \ldots, k;$$

3. for every convex function v such that v ≥ u pointwise, the set {x : v(x) > u(x)} can be represented as a finite union of intervals.

The above regularity conditions are relatively mild. Upper semi-continuity of u is necessary to guarantee existence of a solution. The bounded slope assumption rules out the possibility that u gets infinitely steep at some points. The third condition excludes pathological utility functions that switch from convexity to concavity infinitely many times.⁵

5. If the function u is upper semi-continuous, then the set {x : v(x) > u(x)} is always a union of intervals, but the union can in general require infinitely many elements. For example, $f(x) = \sin\left(\frac{1}{1-x}\right)(1-x)^2$ for x < 1 and f(1) = 0 is continuous and has bounded slope, but is not regular.


Proposition 1. Suppose that u is regular, and a convex function p, continuous at the endpoints, satisfies conditions (3.1′)–(3.3) for some distribution G. Let 0 = x₀ < x₁ < ... < x_{n+1} = 1 be the unique coarsest partition of [0, 1] such that on each [x_i, x_{i+1}], p is either (1) strictly convex, or (2) affine.⁶ Then,

• in case (1), p(x) = u(x), and G(x) = F(x), for all x ∈ [x_i, x_{i+1}];

• in case (2), G(x_i) = F(x_i), G(x_{i+1}) = F(x_{i+1}), $\int_{x_i}^{x_{i+1}} t\,dG(t) = \int_{x_i}^{x_{i+1}} t\,dF(t)$, and p(y_i) = u(y_i) for at least one y_i ∈ [x_i, x_{i+1}].

6. Existence of such a partition with finitely many elements is proven in Appendix A.3.

Proof. See Appendix A.3.

Proposition 1 provides insight about the structure of the optimal solution. Summarizing, any pair (G, p) which satisfies conditions (3.1′)–(3.3) has the following properties: (i) the function p is a convex majorant of u, (ii) the distribution G is supported on a subset of {x ∈ [0, 1] : u(x) = p(x)}, (iii) whenever p is strictly convex, it must coincide with u, and then G fully reveals the state, and (iv) when p is affine on some interval [x_i, x_{i+1}], then F is a mean-preserving spread of G conditional on the realization being in [x_i, x_{i+1}]. Property (iv) implies that it is enough to verify the mean-spread condition (3.3) separately for every maximal interval in which p is affine (the condition holds automatically conditional on realizations in intervals where p is strictly convex, by point (iii) above). Figures 1 and 2 show an example u and F along with (G, p) that satisfy (3.1′)–(3.3), and hence all the implications above.

Figure 1: Utility u(·) and prices p(·).

Figure 2: Prior CDF F(·) and CDF of posterior means G(·).

Proposition 1 also implies that when p(x) > u(x) for all x ∈ [a, b], then p is piecewise affine with at most one kink in [a, b]. Indeed, such [a, b] can intersect at most two consecutive intervals [x_i, x_{i+1}], [x_{i+1}, x_{i+2}], because in every interval of the partition, p and u coincide at at least one point.

The conditions listed above often restrict the set of possible prices to a small class. Because the convex price function p can only coincide with u where u is convex, and p is piecewise affine with at most one kink between regions where u ≡ p, for relatively simple u the set of potential price functions can be parameterized by a low-dimensional parameter. For any p in that set, the points where u ≡ p determine the support of G, and the correct price function can be found by solving for the parameter value that delivers condition (3.3), as in the sketch below. We further aid the search for the solution in Section 6, where we provide sufficient conditions for when a monotone partitional signal is optimal. In Section 7, we illustrate the method by applying it to two persuasion problems.
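The following schematic sketch (ours, not the paper's) illustrates this one-parameter search for an "upper censorship" candidate, where p coincides with u on [0, θ] and is affine on [θ, 1]: the cutoff θ is pinned down by requiring the pooled posterior mean E[X | X ≥ θ] to hit an assumed target point ȳ where the affine piece must touch u.

```python
import numpy as np

def conditional_mean_above(theta, density, grid):
    mask = grid >= theta
    w = density[mask]
    return np.sum(grid[mask] * w) / np.sum(w)

def solve_cutoff(y_bar, density, grid, iters=60):
    lo, hi = grid[0], y_bar            # the cutoff must lie below the target mean
    for _ in range(iters):              # bisection: E[X | X >= theta] increases in theta
        mid = (lo + hi) / 2
        if conditional_mean_above(mid, density, grid) < y_bar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

grid = np.linspace(0, 1, 10001)
density = np.ones_like(grid)            # uniform prior (an assumption)
print(solve_cutoff(0.8, density, grid)) # ~0.6, since E[X | X >= t] = (t + 1)/2
```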

5 Existence of optimal prices

For a regular function u, we can guarantee existence of a solution G supported by a price function p satisfying conditions (3.1′)–(3.3), and hence possessing properties (i)–(iv) listed in the previous section. This means that under the regularity assumptions the consumer-choice approach to the persuasion problem is always valid in our setting.

Theorem 2. For any regular u there exist an optimal solution G and a convex p : [0, 1] → R (continuous at the endpoints) such that the pair (G, p) satisfies conditions (3.1′)–(3.3).

Proof. See Appendix A.4.

The proof of Theorem 2 uses techniques from the literature on optimization with stochastic dominance constraints (Dentcheva and Ruszczynski, 2003). Theorem 2 does not follow immediately from the results of Dentcheva and Ruszczynski because (i) the constraint qualification used by Dentcheva and Ruszczynski does not hold in our setting, (ii) we use a different stochastic dominance constraint, and (iii) we allow for discontinuities in the objective function. To deal with (i), we consider a perturbed version of the problem for which the constraint qualification holds, and then prove the appropriate convergence properties. To accommodate the mean-preserving spread condition, we modify the proof of Dentcheva and Ruszczynski (2003). Finally, we establish the result for a discontinuous objective function by constructing its continuous approximation.

6 Monotone partitional signals

Persuasion mechanisms that are seen in practice often only use monotone partitional signals: pooling, if present, occurs only between adjacent types. For example, many schools only release coarse information on student performance (Ostrovsky and Schwarz, 2010). Bond credit ratings also have a coarse structure (where very fine categories can be interpreted as full disclosure). These signal structures also appear in other models of communication: Crawford and Sobel (1982) show that all equilibria in their model feature monotone partitional signals.

Definition 2. A distribution of posterior means G is induced by a monotone partitional signal if there exists a finite partition of [0, 1] into intervals $\{[x_i, x_{i+1}]\}_{i=1}^{k}$ such that for each i, either (i) G ≡ F in [x_i, x_{i+1}] (full revelation), or (ii) G puts all mass in [x_i, x_{i+1}] on E[X | X ∈ [x_i, x_{i+1}]] (pooling).

We will prove that the following definition gives a necessary and sufficient condition on the objective function under which a monotone partitional signal is optimal regardless of the underlying prior distribution over states.

Definition 3. A function u is affine-closed if there do not exist 0 < x < y < 1 and an affine function q such that:

1. u(x) = q(x) and u(y) = q(y);

2. q(z) ≥ u(z) for all z ∈ [x, y];

3. q(z) > u(z) for all z ∈ {w} ∪ (x − ε, x) ∪ (y, y + ε) for some ε > 0 and some w ∈ (x, y).

Roughly speaking, an affine-closed function u cannot have two interior, proper local maxima. This characterization is not exact because (i) adding an affine function to u can change the set of local maxima but does not change the optimal solution to our problem, and (ii) point 3 of the definition has stronger implications for functions that are locally affine. For functions u that are not affine in any interval, we can formalize the above intuition: such a function u is affine-closed if and only if u + q has at most one interior local maximum for any affine function q.

Convex and concave functions are always affine-closed. More complex examples of affine-closed functions are shown in Figures 3 and 4. In both cases there does not exist an affine function q with properties 1–3 listed in Definition 3. The affine function q₁ in Figure 3 does not satisfy property 1 because the points of support must be interior (not 0 or 1). The affine function q₂ that supports u at x and y does not satisfy property 3, because there does not exist w ∈ (x, y) such that q₂(w) > u(w). Finally, the function in Figure 4 is affine-closed because the affine function q₃ cannot simultaneously satisfy properties 2 and 3, regardless of how we choose the support points x and y.

Figure 3: An affine-closed function (solid black line) and affine functions q₁ and q₂ (dotted lines)

Figure 4: An affine-closed function (solid black line) and an affine function q₃ (dotted line)

Theorem 3. Let u be regular. If u is affine-closed, then for any continuous prior F there exists an optimal solution G for problem (2.1) which is induced by a monotone partitional signal. Conversely, if u is not affine-closed, then there exists a (continuous) prior F such that no optimal G can be induced by a monotone partitional signal.

Proof. See Appendix A.5.

In the proof, we use Theorem 2 to generate a solution G and a corresponding multiplier p. Starting from G, we construct a modified distribution of posteriors which is induced by a monotone partitional signal. Using Proposition 1 and the affine-closure property, we show that the multiplier p still supports the modified distribution, thus proving its optimality by Theorem 1. To prove the converse, we use the violation of affine-closure to construct a distribution F such that the optimal signal cannot have a monotone partitional structure.

Further intuition behind the proof, and the importance of the affine-closure assumption, can be gained from Figure 5. Consider the non-affine-closed function (black solid line). Suppose that F is the uniform distribution on [0, 1]. If h ≤ 1/4 in Figure 5, then by Theorem 1, the optimal distribution of posterior means has two atoms at the two peaks 1/2 ± h of the objective function (the multiplier p is a horizontal line tangent at the two peaks). Except for the non-generic case h = 1/4, this distribution of posterior means cannot be induced by a monotone partitional signal.

Figure 5: A non-affine-closed function (solid black line) and an affine-closed function (dotted blue line).

Consider instead the affine-closed function in Figure 5 (blue dotted line) as the objective. Although the same signal remains optimal, we can now modify it to obtain a monotone partition: we construct a signal which pools all realizations into one posterior mean 1/2, and achieves the same payoff as the mixture between the points 1/2 − h and 1/2 + h. The affine-closure property implies that the objective function coincides with the (locally affine) price function at the pooled mean 1/2, allowing an optimal G to put an atom at 1/2, in line with condition (3.1′).

We emphasize that, despite some superficial similarities, the affine-closure property is fundamentally different from the notion of the concave closure of a function. In Figures 3–5, the x-axis is the space of states, not the space of probability distributions over the state. In particular, there does not exist a prior-independent notion of an affine closure of a function. However, by Theorem 2, for any regular objective function u and prior distribution F, we can find an affine-closed function that yields the same value of the persuasion problem: it is simply the price function p (because p is convex, it is affine-closed, and by property (3.1′), p and u have the same value when integrated against the optimal G).

Continuity of F is essential for the first half of the theorem. Consider the extreme case where F puts all mass on two points. Then, there are only two distinct distributions G that are induced by monotone partitions: G = F (full revelation) and G = δ_{E[X]} (pooling at the prior mean). It is easy to construct (affine-closed) u for which neither of these is optimal.

One natural example where the affine-closure assumption may not hold is when the Receiver chooses one of n ≥ 3 actions. As noted in Gentzkow and Kamenica (2015), the optimal signal structure may fail to be monotone partitional in this case. Our method

Figure 6: A non-affine-closed function

Figure 7: An affine-closed function

yields a complete characterization of the cases in which the existence of an optimal monotone partitional signal is guaranteed. When the three actions are ordered as in Figure 6, u is not affine-closed. By Theorem 3, there exists a prior distribution for which no monotone partitional signal is optimal. On the other hand, when the actions are not ordered (both when the central action is the worst, as in Figure 7, and when it is the best), u is affine-closed and hence there always exists an optimal monotone partitional signal.

Other papers have found sufficient conditions for monotone partitional signals in related models. Mensch (2016) gives a sufficient condition based on supermodularity of the Sender's and Receiver's preferences in the general setting of Kamenica and Gentzkow (2011). An alternative sufficient condition is provided by Ivanov (2015), who uses results from majorization theory in a model similar to ours. His condition relies on finiteness of the signal space and requires a particular convexity structure of the objective function.

We conclude this section by presenting a graphical characterization of when a distribution G is induced by a monotone partitional signal, which complements the approach of Gentzkow and Kamenica (2015). Gentzkow and Kamenica prove that G is induced by some signal if and only if the function $c_G(x) = \int_0^x G(t)\,dt$ lies between $c_F(x) = \int_0^x F(t)\,dt$ and $c_{\underline{G}}(x) := \int_0^x \mathbf{1}_{\{t \ge EX\}}\,dt$, where $\underline{G}$ is the least informative distribution, obtained by pooling all realizations at the prior mean.

Proposition 2. Let G be a distribution of posterior means induced by a monotone partitional signal. Then c_G(x) coincides with c_F(x) in every interval with full revelation and is tangent to c_F(x) in every interval with pooling. Conversely, if c satisfies $c_F \ge c \ge c_{\underline{G}}$, is piecewise linear in all regions where c < c_F, and every linear piece is tangent to c_F(x), then c is induced by a monotone partitional signal. In both cases, the points of tangency coincide with the endpoints of the partition intervals.

Proof. See Appendix A.6.
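The sketch below (our illustration; the uniform prior and the cutoff 0.4 are assumptions) applies Proposition 2's test: it builds c_G for the monotone partitional signal that pools [0.4, 1] and checks that it lies between c_F and c_G̲, with tangency at the partition endpoint.

```python
import numpy as np

x = np.linspace(0, 1, 10001)
F = x.copy()                                   # uniform prior cdf
pool_mean = (0.4 + 1.0) / 2                    # E[X | X in [0.4, 1]] = 0.7
G = np.where(x < 0.4, F, np.where(x < pool_mean, 0.4, 1.0))

def integrate_cdf(H):                          # c_H(x) = int_0^x H(t) dt
    return np.concatenate([[0.0], np.cumsum((H[:-1] + H[1:]) / 2 * np.diff(x))])

cF, cG = integrate_cdf(F), integrate_cdf(G)
cGbar = np.maximum(0.0, x - 0.5)               # pooling everything at EX = 1/2

# feasibility band (tolerances at grid resolution)
print(bool(np.all(cF >= cG - 1e-3) and np.all(cG >= cGbar - 1e-3)))  # True
i = np.searchsorted(x, 0.4)                    # tangency at the cutoff 0.4
print(abs(cF[i] - cG[i]) < 1e-6)               # True: c_G touches c_F at 0.4
```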

7 Applications

7.1 Motivating through strategic disclosure

A principal wants to motivate an agent to exert costly effort in order to complete a project. The project, if completed successfully, has value X to the principal. We assume that the agent receives a fraction β ∈ (0, 1) of the value of the project ("sales commission"). The agent chooses an effort level e ∈ [0, ē], with ē ≤ 1, where e is interpreted as the probability that the project will be successfully completed. Choosing effort level e has disutility c(e) = e^α, where α > 1.⁷ The value X is distributed according to a continuous distribution F on $[\underline{x}, \bar{x}]$. The principal observes the realization x of X, but the agent does not. The principal commits to a disclosure scheme in order to maximize her profits. Given the agent's belief that the expected value of the project is y, the chosen level of effort is

$$e^\star(y) = \min\left\{\left(\frac{\beta y}{\alpha}\right)^{\frac{1}{\alpha-1}},\ \bar{e}\right\}.$$

Let $\bar{y} = (\alpha/\beta)\,\bar{e}^{\,\alpha-1}$ be the smallest expected value of the project such that $e^\star(\bar{y}) = \bar{e}$, i.e., the agent exerts maximal effort. To make the analysis interesting, we assume that $\underline{x} < \bar{y} < \bar{x}$. The value to the principal from inducing the belief y about the expected value of the project is

$$u(y) = (1-\beta)\,e^\star(y)\,y.$$

It is easy to check that u is strictly convex in $[\underline{x}, \bar{y}]$ and affine in $[\bar{y}, \bar{x}]$. The shape of the function u(·) is depicted in Figure 8.

7. All the results in this section continue to hold under the assumption that $(c')^{-1}(e)\cdot e$ is convex, without any particular choice of functional form.

Proposition 3. If EX ≥ ȳ, it is optimal to reveal no information. Otherwise, let x⋆ be defined by E[X | X ≥ x⋆] = ȳ. Then, it is optimal to disclose x whenever x < x⋆, and to reveal only that x ≥ x⋆ if x ≥ x⋆.

Proof. In the first case, the objective function is superdifferentiable at EX, so by Corollary 1, it is optimal not to reveal anything. As for the second case, by Theorem 3, there exists a monotone partitional signal that is optimal, and by Theorem 2, we know we can find an appropriate price function. Because

Pooling

p(x)

u(x)

x

x

x

?

y7

x 7

Figure 8: Motivating through strategic disclosure EX < y¯, we can rule out the case that p coincides with u on [¯ y , x¯]. Thus, if p coincides with u on some interval, it has to be an interval contained in [x, y¯], and by Proposition 1, there must be full revelation in that region. Given these insights, it is natural to consider the following price function (depicted in red in Figure 8):  u(x) if x < x? , p(x) = u(x? ) + ∆(x − x? ) if x ≥ x? where ∆≡

u(¯ y ) − u(x? ) . y¯ − x?

Function p is convex. We then define the cdf G of posterior means as

G(x) =

 F (x) F (x? ) + 1

if x < x? ? ? {x≥¯ y } (1 − F (x )) if x ≥ x

.

That is, G coincides with F up to x? , and then puts an atom at y¯ = E [X| X ≥ x? ] . Condition (3.10 ) holds because u and p coincide for x ≤ x? and x = y¯. Condition (3.2) is satisfied because p is affine whenever F 6= G, and the conditional means of F and G are equal in that region. By the way we defined G, F is a mean-preserving spread of G, which verifies condition (3.3). By Theorem 1, G is optimal.

15

Proposition 3 has the following economic interpretation. If the agent believes the value of the project to be high enough ex-ante (EX ≥ y¯), he exerts maximal effort, so it is optimal for the principal to release no additional information. In the opposite case EX < y¯, the principal uses an “upper censorship” rule. She discloses the exact value of the project for low realizations but garbles the signal for high realizations by only informing the agent that the project is “important” (the realized value of X is above x? ).

7.2

Investment recommendation

A risk-averse investor chooses how to divide her wealth w between a risk-free asset and a single risky asset, on which she can take a short or a long position. (A similar analysis is possible if only one of the two positions is available.) The amount invested in the risky asset either doubles in value or is entirely lost. To take any non-zero position in the risky asset, the investor must pay a fixed cost c > 0. The investor has prior belief 21 that a long position will double her investment. The investor consults a financial analyst, who has access to additional information about the payoff of the risky asset. Specifically, she knows the probability X that a long position will double investment. (With probability 1 − X a short position will double investment.) X is distributed according to F which is symmetric around the mean E[X] = 21 and admits a strictly positive density on [0, 1]. The analyst commits to a persuasion mechanism in order to influence the belief of the agent about the payoff of the risky asset. If the posterior belief is too close to the prior, because of the fixed cost the agent will optimally invest zero. The closer to 0 or 1 the belief, the more the agent is willing to invest. We consider two different shapes of the analyst’s utility u. In the first (Figure 9), as posterior belief approaches 0 and 1, the function is concave, meaning there are diminishing returns from inducing polarized beliefs. In the second (Figure 10), the function u is convex near 0 and 1. In both cases u is flat near 12 , where there is no investment. We provide a microfoundation of these shapes in Appendix A.7 assuming that the investor has CRRA preferences and the analyst’s utility is proportional to the amount invested. The shape of u depends on the measure of relative risk aversion, with u becoming more convex as risk aversion increases. The linear boundary case is attained when the investor has log-utility. Similar shapes can also arise because of non-linearities in the analyst’s fee structure. In Proposition 4 we solve the analyst’s persuasion problem under the assumption that   1 x0 > E X X ≤ 2 16

(7.1)

Pooling


Pooling


Figure 10: Portfolio recommendation: convex case
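A small sketch (ours) can illustrate assumption (7.1): for an assumed symmetric Beta(2, 2) prior, the pooled mean E[X | X ≤ 1/2] equals 0.3125, so any investment threshold x₀ above that value satisfies (7.1). The threshold used below is an assumed parameter, not derived from the microfoundation in Appendix A.7.

```python
import numpy as np

x = np.linspace(0, 1, 200001)
density = 6 * x * (1 - x)                       # Beta(2,2) pdf, symmetric around 1/2
low = x <= 0.5
pooled_low = np.sum(x[low] * density[low]) / np.sum(density[low])

x0 = 0.40                                        # assumed investment threshold
print(round(pooled_low, 4), x0 > pooled_low)     # ~0.3125, True: (7.1) holds
```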

8 Concavification through prices

In this section, we briefly discuss the relationship between our approach and the concavification approach of Aumann and Maschler (1995) and Kamenica and Gentzkow (2011). We also draw another parallel to consumer choice theory. To simplify the analysis, we restrict attention to a binary model in which the state of the world is either 0 or 1 with probabilities 1 − α and α, respectively. The main assumption of this paper, that the utility of the Sender depends only on the posterior mean, is without loss of generality when the state is binary: the mean (α under the prior) is a one-dimensional statistic that uniquely pins down the entire distribution. In this special case, by the main result of Kamenica and Gentzkow (2011), the optimal payoff of the Sender is equal to the concave closure of u at α (the concave closure is the pointwise smallest concave function on [0, 1] that bounds u from above).

Proposition 5. In the binary setting, there always exists an affine price function p_α(x) which supports the optimal solution. Moreover, the value of the problem is equal to the price of the prior mean, p_α(α).

Proof. See Appendix A.8.


In light of Proposition 5 and the above, we can write

$$\mathrm{co}(u)(\alpha) = p_\alpha(\alpha) = \min\{p_x(\alpha) : x \in [0, 1]\},$$

where the second equality comes from the proof of Proposition 5. The economic consequences of these relationships are illustrated in Figure 11. All possible price functions (optimal for different prior distributions) trace out the concave closure of the utility function u. This is analogous to how supporting prices can be used to recover the utility function (up to concavification) in a consumer choice problem as the initial endowment varies.
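The identity co(u)(α) = min_x p_x(α) suggests a direct computation, sketched below with a toy two-action step utility (our example, not the paper's): the concave closure is obtained as the upper hull of the graph of u, and each linear segment of the hull corresponds to a supporting affine price function.

```python
import numpy as np

def concave_closure(xs, us):
    hull = []                          # upper hull via a monotone-chain scan
    for pt in zip(xs, us):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the last point while it lies (weakly) below the chord
            if (x2 - x1) * (pt[1] - y1) - (y2 - y1) * (pt[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(pt)
    hx, hy = zip(*hull)
    return np.interp(xs, hx, hy)       # piecewise-linear concave majorant

xs = np.linspace(0, 1, 1001)
us = np.where(xs < 0.5, 0.0, 1.0)      # two-action step utility
co = concave_closure(xs, us)
alpha = 0.25
print(co[np.searchsorted(xs, alpha)])  # 0.5: value of persuasion at prior 1/4
```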

Figure 11: Concavification through prices

References

Aumann, Robert J. and Michael Maschler. Repeated Games with Incomplete Information. MIT Press, 1995.

Blackwell, David. Equivalent Comparisons of Experiments. The Annals of Mathematical Statistics, 24(2):265–272, 1953.

Bonnans, J. F. and Alexander Shapiro. Perturbation Analysis of Optimization Problems. Springer, 2000.

Calzolari, Giacomo and Alessandro Pavan. On the optimality of privacy in sequential contracting. Journal of Economic Theory, 130(1):168–204, September 2006.

Crawford, Vincent P. and Joel Sobel. Strategic Information Transmission. Econometrica, 50(6):1431–51, November 1982.

Dentcheva, Darinka and Andrzej Ruszczynski. Optimization with stochastic dominance constraints. SIAM Journal on Optimization, 14(2):548–566, 2003.

Duffie, Darrell, Piotr Dworczak, and Haoxiang Zhu. Benchmarks in search markets. Journal of Finance, forthcoming, 2016.

Gentzkow, Matthew and Emir Kamenica. A Rothschild-Stiglitz approach to Bayesian persuasion. American Economic Review Papers & Proceedings, 2015.

Guo, Yingni and Eran Shmaya. The Interval Structure of Optimal Disclosure. Working paper, 2017.

Ivanov, Maxim. Optimal Signals in Bayesian Persuasion Mechanisms with Ranking. Working paper, 2015.

Kamenica, Emir and Matthew Gentzkow. Bayesian Persuasion. American Economic Review, 101:2590–2615, 2011.

Kolotilin, Anton. Optimal Information Disclosure: A Linear Programming Approach. Working paper, 2016.

Kolotilin, Anton, Ming Li, Tymofiy Mylovanov, and Andriy Zapechelnyuk. Persuasion of a Privately Informed Receiver. Working paper, 2015.

Mensch, Jeffrey. Monotone persuasion. Working paper, 2016.

Ostrovsky, Michael and Michael Schwarz. Information Disclosure and Unraveling in Matching Markets. American Economic Journal: Microeconomics, 2(2):34, 2010.

Rayo, Luis and Ilya Segal. Optimal Information Disclosure. Journal of Political Economy, 118(5):949–987, 2010.

A Proofs and additional material

A.1 Proof of Theorem 1⁸

Let (G, p) satisfy conditions (3.1)–(3.3). To show that G is a solution to the Sender's problem, it is enough to show that $\int_0^1 u(x)\,dG(x) \ge \int_0^1 u(x)\,dH(x)$ for any H such that F is a mean-preserving spread of H. By (3.1),

$$\int_0^1 (u(x) - p(x))\,dG(x) \ge \int_0^1 (u(x) - p(x))\,dH(x).$$

Rearranging,

$$\int_0^1 u\,dG - \int_0^1 u\,dH \;\ge\; \int_0^1 p\,dG - \int_0^1 p\,dH \;=\; \int_0^1 p\,dF - \int_0^1 p\,dH \;\ge\; \int_0^1 p\,dF - \int_0^1 p\,dF \;=\; 0,$$

where the equality is given by (3.2) and the final inequality holds because −p(x) is concave and F is a mean-preserving spread of H, by assumption. Therefore $\int_0^1 u(x)\,dG(x) \ge \int_0^1 u(x)\,dH(x)$.

8. We thank Shota Ichihashi for showing us this direct proof. Our previous proof employed duality techniques from the literature on optimization with stochastic dominance constraints.

A.2 Receiver has two actions

A simple example where our methods can be applied is when the Receiver chooses one of two actions. Suppose that the Receiver takes the Sender-preferred action if and only if her posterior mean is greater than or equal to x₀ (so that indifferences are broken in the Sender's favor). This example can be solved by other methods (see both Gentzkow and Kamenica, 2015 and Ivanov, 2015).

Proposition 6. Let u be the non-decreasing step function

$$u(x) = \begin{cases} u_0 & \text{if } 0 \le x < x_0, \\ u_1 & \text{if } x_0 \le x \le 1. \end{cases}$$


Figure 12: Two-action case

If EX ≥ x₀, then the optimal mechanism reveals nothing. If EX < x₀, then the optimal mechanism reveals whether x is below or above x⋆, where x⋆ satisfies E[X | X ≥ x⋆] = x₀.

Proof. If EX ≥ x₀, the objective function u is superdifferentiable at EX, so by Corollary 1, it is optimal to reveal nothing. Now assume EX < x₀. Let x⋆ be defined by E[X | X ≥ x⋆] = x₀. Consider the piecewise affine p given by

$$p(x) = \begin{cases} u_0 & 0 \le x < x^\star, \\ u_0 + \frac{u_1 - u_0}{x_0 - x^\star}\,(x - x^\star) & x^\star \le x \le 1, \end{cases}$$

and the cdf G given by

$$G(x) = \begin{cases} 0 & 0 \le x < E[X \mid X < x^\star], \\ F(x^\star) & E[X \mid X < x^\star] \le x < x_0, \\ 1 & x_0 \le x \le 1. \end{cases}$$

The function p is convex by construction. Condition (3.1′) holds because u and p coincide for x ≤ x⋆ and at x = x₀. Condition (3.2) is satisfied because p is piecewise affine, and the conditional means of F and G are equal in both regions, by construction. By the way we defined G, F is a mean-preserving spread of G, which verifies condition (3.3). Thus, by Theorem 1, G is optimal.
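For concreteness, the sketch below (ours; the uniform prior, x₀ = 0.75, u₀ = 0, u₁ = 1 are assumed values) verifies the budget-balance condition (3.2) for this construction and compares the Sender's value against full and no revelation.

```python
import numpy as np

x0, u0, u1 = 0.75, 0.0, 1.0
x_star = 2 * x0 - 1                        # solves E[X | X >= x_star] = x0 -> 0.5

x = np.linspace(0, 1, 100001)
u = np.where(x >= x0, u1, u0)
p = np.where(x < x_star, u0, u0 + (u1 - u0) / (x0 - x_star) * (x - x_star))

# (3.2): G puts mass 1/2 on E[X|X<0.5] = 0.25 (price 0) and mass 1/2 on x0 (price u1).
int_p_dG = 0.5 * 0.0 + 0.5 * u1
int_p_dF = np.sum((p[:-1] + p[1:]) / 2 * np.diff(x))
print(round(int_p_dG, 4), round(int_p_dF, 4))   # 0.5 0.5 -> budget balances

# Sender's value: the optimal signal beats full and no revelation.
print(0.5 * u1, round(float(np.mean(u)), 4), u0)  # 0.5, ~0.25, 0.0
```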

A.3 Proof of Proposition 1

Lemma 1. If (G, p) satisfy conditions (3.2) and (3.3), and p is convex and continuous at the endpoints, then

$$\int_0^1 \left( \int_0^x F(t)\,dt - \int_0^x G(t)\,dt \right) dp'(x) = 0, \tag{A.1}$$

interpreted as a Riemann–Stieltjes integral with respect to the measure induced by the non-decreasing function p′.

Proof. Because p is convex, it is absolutely continuous in the interior of the domain, and continuous at the endpoints by assumption. We can use integration by parts for the Riemann–Stieltjes integral:

$$\int_0^1 p(x)\,dG(x) = [p(x)G(x)]_0^1 - \int_0^1 G(x)\,dp(x) = p(1) - \int_0^1 p'(x)G(x)\,dx,$$

where the second equality uses the fact that dp(x) = p′(x)dx by absolute continuity of p. Next, we have

$$\int_0^1 p'(x)G(x)\,dx = \int_0^1 p'(x)\,d\!\left( \int_0^x G(t)\,dt \right),$$

and we can use integration by parts for the Riemann–Stieltjes integral again to obtain

$$\int_0^1 p'(x)G(x)\,dx = \left[ p'(x)\int_0^x G(t)\,dt \right]_0^1 - \int_0^1 \left( \int_0^x G(t)\,dt \right) dp'(x) = p'(1)\int_0^1 G(t)\,dt - \int_0^1 \left( \int_0^x G(t)\,dt \right) dp'(x).$$

Because G was arbitrary, the same transformations are true for G = F, and hence condition (3.2) is equivalent to

$$p'(1)\int_0^1 G(t)\,dt - \int_0^1 \left( \int_0^x G(t)\,dt \right) dp'(x) = p'(1)\int_0^1 F(t)\,dt - \int_0^1 \left( \int_0^x F(t)\,dt \right) dp'(x).$$

By condition (3.3), F and G have the same mean, and thus

$$p'(1)\int_0^1 G(t)\,dt = p'(1)\int_0^1 F(t)\,dt,$$

which ends the proof.

By condition (3.3), F is a mean-preserving spread of G, which implies that G second-order stochastically dominates F. Thus,

$$\int_0^x F(t)\,dt \ge \int_0^x G(t)\,dt, \quad \forall x \in [0, 1]. \tag{A.2}$$

Because p is convex, p′ is non-decreasing, and thus p′ induces a positive measure. Therefore, condition (A.1) is satisfied if and only if $\int_0^x F(t)\,dt = \int_0^x G(t)\,dt$ for p′-almost all x ∈ [0, 1]. That is, the equality has to hold on every set that has positive measure under p′, in particular for each x at which there is a jump in p′, and for every interval on which p is strictly convex. We conclude that:

1. $\int_0^x F(t)\,dt = \int_0^x G(t)\,dt$ in every interval [a, b] ⊂ [0, 1] in which p is strictly convex;

2. p is affine in every interval [a, b] ⊂ [0, 1] such that $\int_0^t F(x)\,dx > \int_0^t G(x)\,dx$ for all t ∈ [a, b];

3. $\int_0^x F(t)\,dt = \int_0^x G(t)\,dt$ for each x at which p has a jump in the first derivative.

To strengthen the conclusion of points 1 and 3 above, we prove the following lemma.

Lemma 2. If x ∈ (0, 1) is such that $\int_0^x F(t)\,dt = \int_0^x G(t)\,dt$, then F(x) = G(x).

Proof. We will prove the contrapositive: if F(x) ≠ G(x) for some x ∈ (0, 1), then $\int_0^x F(t)\,dt \ne \int_0^x G(t)\,dt$. Fix x ∈ (0, 1) and first suppose that F(x) < G(x). Since F and G are right-continuous, there exists a z > x such that F(t) < G(t) for every t ∈ (x, z). Then, since $\int_0^z F(t)\,dt \ge \int_0^z G(t)\,dt$ holds by equation (A.2), we have

$$\int_0^x F(t)\,dt = \int_0^z F(t)\,dt - \int_x^z F(t)\,dt > \int_0^z G(t)\,dt - \int_x^z G(t)\,dt = \int_0^x G(t)\,dt.$$

If instead F(x) > G(x), since G is non-decreasing and F has full support and no atoms, there exists a w < x such that F(t) > G(t) for all t ∈ (w, x). Then

$$\int_0^x F(t)\,dt = \int_0^w F(t)\,dt + \int_w^x F(t)\,dt > \int_0^w G(t)\,dt + \int_w^x G(t)\,dt = \int_0^x G(t)\,dt.$$

By Lemma 2, F(x) = G(x) in every interval in which p is strictly convex, and for every x at which p has a jump in the first derivative.

Take any maximal interval [a, b] in which p is affine (that is, p is not affine on any other [c, d] which contains [a, b]). By maximality, we must have F(a) = G(a), F(b) = G(b), and $\int_a^b F(t)\,dt = \int_a^b G(t)\,dt$, as we would otherwise violate the observation in the previous paragraph. Moreover, there exists x₀ ∈ [a, b] such that u(x₀) = p(x₀), because otherwise, by condition (3.1′), the function G would be constant on [a, b] (while F is strictly increasing because the distribution has full support).

Now, take any interval [a, b] where p is strictly convex. Then, F(x) = G(x) for all x ∈ [a, b], and because F has full support, [a, b] ⊆ supp(G). Because G and p satisfy condition (3.1′), we must have u(x) = p(x) for all x ∈ [a, b].

Suppose that p(x) > u(x) in some interval [a, b]. Because F has full support, the function $\int_0^x F(t)\,dt$ is strictly convex. Because G satisfies condition (3.1′), it is not supported in [a, b], and thus $\int_0^x G(t)\,dt$ is affine on [a, b]. Using inequality (A.2), we conclude that $\int_0^x F(t)\,dt$ and $\int_0^x G(t)\,dt$ coincide at at most one point in (a, b); call it z⋆. From condition 2 above, we obtain that p is affine on (a, z⋆) and (z⋆, b). Thus, whenever p(x) > u(x) in some interval, p(x) is piecewise affine with at most one kink in that interval.

We can now recursively define the partition 0 = x₀ < x₁ < ... < x_{n+1} = 1. Given x_i, we define x_{i+1} = inf{α > x_i : p is affine on [x_i, α] or p is strictly convex on [x_i, α]}. We first prove that x_{i+1} > x_i and that the partition is finite. By regularity of u, the set {x : p(x) > u(x)} is a finite union of intervals. In every such interval, as proven above, p is piecewise affine with at most one kink. The complement set {x : p(x) = u(x)} is also a finite union of intervals, and u is convex in each such interval. By regularity, each such interval can be decomposed into a finite union of intervals in which p is either affine or strictly convex.⁹ Thus, there are finitely many candidate points for any x_i, and such a partition is well defined and finite. By construction, the partition is the coarsest one such that p is either affine or strictly convex on each interval of the partition. Because each element of the partition is a maximal interval in which p is either strictly convex or affine, the properties listed in Proposition 1 follow directly from the above observations.

9. Proof: suppose it is not possible, i.e., there are infinitely many intervals in which u is alternately affine and strictly convex. Then, we can define a piecewise affine and globally convex function v which coincides with u exactly in the intervals where u is affine. This would violate regularity.

A.4 Proof of Theorem 2

We prove the theorem in three steps. In the first step, we make the additional assumption that u is continuous, and study a perturbed problem. The perturbation allows us to show that a generalized Slater condition holds, which yields existence of the appropriate Lagrange multiplier. In the second step, we show that the solution to the perturbed problem provides the correct approximation of the solution to the unperturbed problem. By taking the limit, we obtain the statement of Theorem 2 for the special case of a continuous u. In the third step, we relax the assumption that u is continuous, again using an approximation approach. We relegate some technical lemmas to Appendix A.9.

We make two preliminary observations. First, a solution to the Sender's problem always exists because the objective function is upper semi-continuous, and the set of feasible points is compact in the weak⋆ topology. Second, the optimization problem is unaffected by adding a constant to the utility function u, so we can assume without loss of generality that u(x) ≥ 0 for all x ∈ [0, 1].

Step 1. In the first step, we rely heavily on the proof technique developed by Dentcheva and Ruszczynski (2003), who provide a duality theory for optimization problems with stochastic dominance constraints. They study a case where the constraint takes the form of second-order stochastic dominance. Our constraint additionally incorporates equality of unconditional means, resulting in a mean-preserving spread condition. Thus, the proof of Dentcheva and Ruszczynski (2003) has to be modified. For completeness, we sketch the entire argument.

Assume that u is continuous. Consider the following perturbed problem, where the mean-preserving spread condition is only required on the interval [ε, 1 − ε] instead of on [0, 1]:

$$\max_{G \in \Delta([0,1])} \int_0^1 u(x)\,dG(x) \tag{A.3}$$

$$\text{s.t.} \quad \int_0^x F(t)\,dt \ge \int_0^x G(t)\,dt \quad \text{for all } x \in [\varepsilon, 1-\varepsilon], \tag{A.4}$$

$$\int_x^1 G(t)\,dt \ge \int_x^1 F(t)\,dt \quad \text{for all } x \in [\varepsilon, 1-\varepsilon]. \tag{A.5}$$

Conditions (A.4) and (A.5) enforced on the entire interval [0, 1] would be jointly equivalent to F being a mean-preserving spread of G. Note that although the perturbed problem (A.3)


only requires the mean-preserving spread condition on [ε, 1 − ε], all distributions are defined on [0, 1]. We take ε > 0 small enough so that ε < E[F] < 1 − ε. Let C_ε := C([ε, 1 − ε]) denote the space of continuous functions on [ε, 1 − ε]. Define K_ε as the cone of continuous non-negative functions on [ε, 1 − ε], that is, K_ε := {g ∈ C_ε : g(x) ≥ 0, ∀x ∈ [ε, 1 − ε]}. Let Φ : Δ([0, 1]) → C_ε × C_ε be defined by

$$(\Phi(G))_i(x) = \begin{cases} \int_0^x F(t)\,dt - \int_0^x G(t)\,dt & \text{if } i = 1, \\ \int_x^1 G(t)\,dt - \int_x^1 F(t)\,dt & \text{if } i = 2, \end{cases}$$

for any x ∈ [ε, 1 − ε]. Conditions (A.4) and (A.5) are now equivalent to Φ(G) ∈ K_ε × K_ε. The operator Φ is concave with respect to the product cone K_ε × K_ε. By the Riesz representation theorem, the space dual to C_ε is the space rca([ε, 1 − ε]) of regular countably additive measures on [ε, 1 − ε] having finite variation. We define a Lagrangian Λ : Δ([0, 1]) × rca([ε, 1 − ε]) × rca([ε, 1 − ε]) → R,

$$\Lambda(G, \mu_1, \mu_2) = \int_0^1 u(x)\,dG(x) + \int_\varepsilon^{1-\varepsilon} (\Phi(G))_1(x)\,d\mu_1(x) + \int_\varepsilon^{1-\varepsilon} (\Phi(G))_2(x)\,d\mu_2(x).$$

We now show that the generalized Slater condition holds for the problem (A.3). By Bonnans and Shapiro (2000) (Proposition 2.106), it is enough to show that there exists G̃ ∈ Δ([0, 1]) such that Φ(G̃) ∈ int(K_ε × K_ε). That is, we have to find a distribution (cdf) G̃ supported on [0, 1] such that Φ(G̃) is a Cartesian product of two functions that are both in K_ε and are bounded away from zero on [ε, 1 − ε]. Let G̃(x) = 1_{{x ≥ E[F]}}. Using the full support assumption on F, we conclude that the function (Φ(G̃))_i(x), for i = 1, 2, is equal to 0 at x = 0 and x = 1, non-negative, and strictly concave. Therefore, that function has to be bounded away from zero on [ε, 1 − ε]. Thus, the generalized Slater condition holds. By Bonnans and Shapiro (2000) (Theorem 3.4), we conclude that if G is a solution to the problem (A.3) (we have already argued that a solution exists), then there exist


non-negative measures μ₁⋆, μ₂⋆ ∈ rca([ε, 1 − ε]) such that

$$\Lambda(G, \mu_1^\star, \mu_2^\star) = \max_{\hat{G} \in \Delta([0,1])} \Lambda(\hat{G}, \mu_1^\star, \mu_2^\star), \quad\text{and} \tag{A.6}$$

$$\int_\varepsilon^{1-\varepsilon} (\Phi(G))_i(x)\,d\mu_i^\star(x) = 0, \quad i = 1, 2. \tag{A.7}$$

Using an argument analogous to the one used by Dentcheva and Ruszczynski (2003), we can associate with each measure μᵢ⋆ a function pᵢ⋆ : [0, 1] → R,

$$p_1^\star(x) = \begin{cases} \int_x^{1-\varepsilon} \mu_1^\star([\tau, 1-\varepsilon])\,d\tau & x < 1-\varepsilon, \\ 0 & x \ge 1-\varepsilon, \end{cases} \qquad p_2^\star(x) = \begin{cases} 0 & x < \varepsilon, \\ \int_\varepsilon^x \mu_2^\star([\varepsilon, \tau])\,d\tau & x \ge \varepsilon, \end{cases}$$

where each μᵢ⋆ is extended to [0, 1] by putting zero mass beyond the interval [ε, 1 − ε]. By the properties of μᵢ⋆, the function p₁⋆ is non-increasing and convex, and p₂⋆ is non-decreasing and convex. We have (by changing the order of integration)

$$\int_\varepsilon^{1-\varepsilon} \left( \int_0^x G(t)\,dt \right) d\mu_1^\star(x) = \int_0^1 \left( \int_\varepsilon^{1-\varepsilon} \mathbf{1}_{\{t \le x\}}\,d\mu_1^\star(x) \right) G(t)\,dt = \int_0^1 \mu_1^\star([t, 1-\varepsilon])\,G(t)\,dt.$$

Using the definition of pᵢ⋆, we can write

$$\int_\varepsilon^{1-\varepsilon} \left( \int_0^x G(t)\,dt \right) d\mu_1^\star(x) = -\int_0^1 G(t)\,dp_1^\star(t).$$

Similarly, we have

$$\int_\varepsilon^{1-\varepsilon} \left( \int_x^1 G(t)\,dt \right) d\mu_2^\star(x) = \int_0^1 G(t)\,dp_2^\star(t).$$

ˆ

1

G(t)dp?i (t)

=

p?i (1)

p?i (x)dG(x).

− 0

0

28

1

Therefore, the complementary-slackness condition (A.7) becomes ˆ

ˆ

1

p?i (x)dG(x) 0

1

p?i (x)dF (x), i = 1, 2.

= 0

Finally, define p(x) = p?1 (x) + p?2 (x). Then, p(x) is convex, and obviously satisfies ˆ

ˆ

1

p(x)dG(x) = 0

1

p(x)dF (x).

(A.8)

0

Finally, condition (A.6) implies that ˆ G ∈ argmaxG

ˆ

1

u(x)dG(x) − 0

1

 p(x)dG(x) .

(A.9)

0

We can always add a constant to p without changing any of it properties. Because of property (A.9), we can normalize p so that condition (3.10 ) holds. That is, p ≥ u and supp(G) ⊆ {x ∈ [0, 1] : u(x) = p(x)}.

(A.10)

In particular, because u ≥ 0, we also have p ≥ 0. Therefore, for the perturbed problem, we have shown that there exists a convex p : [0, 1] → R such that conditions (A.8) and (A.10) both hold. Step 2. In this step, we show that we can take the limit of the perturbed problems from Step 1, and obtain a solution to the original (unperturbed) problem (still maintaining the assumption that u is continuous). Consider a sequence of problems (A.3) - (A.5) defined by taking  = 1/n for n = 1, 2, . . .. We obtain a sequence (Gn , pn ) of pairs satisfying conditions (A.8), (A.10), as well as (A.4) and (A.5). We will first show a simplified version of the proof using the following assumption. Assumption 1. The sequence of functions (pn ) converges uniformly on [0, 1] to some convex function p. In Appendix A.9, we present a full version of the proof without Assumption 1. This adds technical complications because the sequence pn can potentially explode near the endpoints of the interval [0, 1]. In Appendix A.9, we show that we can still establish the result by appropriately modifying the functions pn , and using the special structure of the 29

problem. Under Assumption 1, we have a well defined limit p of the sequence pn . The sequence (Gn ), seen as a sequence of probability measures, lives in a compact set (in the weak? topology). Because the space of measures is metrizable, compactness is equivalent to sequential compactness, and thus we can choose a converging subsequence. Without loss of generality, Gn converges in the weak? topology to some distribution G ∈ ∆([0, 1]). We have thus defined the limiting pair (G, p). We want to prove that (G, p) satisfies conditions (3.10 )–(3.3) on [0, 1]. First, as n → ∞, conditions (A.4) and (A.5) imply that F is a mean-preserving spread of G on the entire interval [0, 1]. This establishes condition (3.3). Second, we note that10 supp(G) ⊆ lim sup{supp(Gn )}. n

Given that condition (A.10) holds for each n, and because pn converges to p, lim sup{supp(Gn )} ⊆ lim sup {x ∈ [0, 1] : u(x) = pn (x)} ⊆ {x ∈ [0, 1] : u(x) = p(x)}. n

n

We conclude that supp(G) ⊆ {x ∈ [0, 1] : u(x) = p(x)},

(A.11)

establishing condition (3.10 ). ´1 ´1 Third, we will show that (3.2) holds. We will argue that 0 pn (x)dGn (x) → 0 p(x)dG(x). To see this, note that ˆ

ˆ

1

pn dGn − 0

ˆ

1

(pn − p)dGn −

pdG = 0

ˆ

1

1

p(dG − dGn ). 0

0

The second integral converges to zero by definition of convergence of Gn to G in the weak? topology. For the first integral, we have ˆ

1

(pn − p)dGn ≤ sup {|pn (x) − p(x)|} 0 10

x∈[0, 1]

The lim sup of a sequence of sets An is defined as lim sup An = {x : ∃(xn )n s.t. xn ∈ An , ∀n, and xn → x}. n

30

which converges to zero because pn converges to p uniformly on [0, 1], by Assumption 1. Similarly, we have ˆ 1 ˆ 1 pdF. pn dF = lim n

0

0

Combining these two results with condition (A.8), we get ˆ

ˆ

1

1

p(x)dG(x),

p(x)dF (x) = 0

0

which is what we wanted to prove. This finishes the proof of Step 2, i.e., we have shown that for a continuous u conditions (3.10 ) - (3.3) hold with (G, p). By Theorem 1, G is the optimal solution. Step 3. We now prove Theorem 2 without the additional assumption that u is continuous. As stated in the regularity assumption (Definition 1), u has finitely many one-sided jump discontinuities at y1 , ..., yk ∈ (0, 1). First, we construct a continuous approximation of u. Fix  > 0. We only modify the function u in an -neighborhood of each yi . For small enough , these neighborhoods are disjoint. Take any i = 1, ..., k, and suppose without loss of generality that limy↑yi u(y) < u(yi ) = limy↓yi u(y). Then, for small enough , and using regularity of u, we can connect the point yi −  with yi using an affine function that lies (weakly) above u in [yi − , yi ]. We denote by u¯ the continuous function constructed by replacing u in each such neighborhood [yi − , yi ]11 with an affine majorant described above. Because u¯ is a continuous function with a bounded slope, by Steps 1 and 2, there exists an optimal distribution G and a Lagrange multiplier p such that (G , p ) satisfies conditions (3.10 )–(3.3).12 Lemma 3. The sequence p constructed by taking  = 1/n and n → ∞ has a subsequence that converges to some convex continuous p uniformly on [0, 1]. Proof. See Appendix A.9. The lemma gives us a limit p of the sequence p . By the same argument as in the first step of the proof, a subsequence of solutions G as  → 0 also converges to some distribution G ∈ ∆([0, 1]). 11

Or [yi , yi + ] if the function u “jumps down”, that is, its value is locally lower to the right of yi . Note that we only need u ¯ to have a bounded slope for a fixed . We do not claim that the slope is bounded uniformly in . 12

31

The last step of the proof is to show that (G, p) satisfy conditions (3.10 )–(3.3). This is immediate by the same arguments as used in the first step of the proof, and the fact that u and u¯ coincide on the support of G for small enough .13

A.5

Proof of Theorem 3

Let u be regular (Definition 1) and affine-closed (Definition 3), and let F be continuous. By Theorem 2, we can find G and p that satisfy conditions (3.10 )–(3.3). From G we will build an optimal distribution of posterior means H which is induced by a monotone partitional signal. By Proposition 1, there are finitely many intervals [xi , xi+1 ] in which p is either strictly convex or affine. On each of these intervals [a, b] = [xi , xi+1 ] we will verify that supp(H) ∩ [a, b] ⊆ {x ∈ [a, b] : u(x) = p(x)}, ˆ

ˆ

b

p(x)dH(x) = a

(A.12)

b

p(x)dF (x), and

(A.13)

a

F |[a,b] is a mean-preserving spread of H|[a,b] .

(A.14)

If (A.12), (A.13), and (A.14) hold on each interval, then for (H, p), (3.10 ), (3.2) and (3.3) hold on [0, 1]. By Theorem 1, this is sufficient to verify optimality of H. Consider each interval [a, b] = [xi , xi+1 ] in turn, beginning with [0, x1 ]. If [a, b] is an interval where p is strictly convex, set H(x) = G(x) for all x ∈ [a, b]. By Proposition 1, u = p on [a, b], hence (A.12) is satisfied. Since F = G = H on [a, b], (A.13) and (A.14) are automatically satisfied. The signal which induces H is full revelation of X on [a, b], which is part of a monotone partitional signal (Definition 2). If instead p is affine on [a, b], let y = E[X|a ≤ X ≤ b]. By Proposition 1, the mean of G conditional on [a, b] is equal to y. If u(y) = p(y), modify G by specifying that H puts all mass in the interval [a, b] on y (pooling in the interval [a, b]). Formally, H(x) = G(a) for x ∈ [a, y) and H(x) = G(b) for x ∈ [y, b]. Condition (A.12) holds because u(y) = p(y). Condition (A.13) holds because p is affine on [a, b] and H and F have the same conditional mean. Finally, condition (A.14) holds because F |[a,b] is a mean-preserving spread of G|[a,b] , G|[a,b] is a mean-preserving spread of H|[a,b] , and the mean-preserving spread relationship is transitive. The remaining case is when p is affine on [a, b] and u(y) < p(y). Let A := {x ∈ [a, b] : 13

Formally, this follows from the fact that p have a uniformly bounded slope, as shown in the proof of Lemma 3.

32

u(x) = p(x)}. The support of G restricted to [a, b] is a subset of A, by (3.10 ). Since u is upper semi-continuous and u ≤ p, A is a closed subset of [a, b]. Since y 6∈ A by assumption and y is the conditional mean of G on [a, b], the support of G (hence its superset A) must contain points in [a, y) and in (y, b]. Write A as the disjoint union A = AL t AR , where AL ⊂ [a, y) and AR ⊂ (y, b] are closed and nonempty. Next we show that affine-closure implies that at least one of AL and AR is a closed interval that extends to a or b respectively; that is, either AL = [a, c], or AR = [d, b], or both, for some c, d ∈ (0, 1). Suppose neither were true. Write the closed set AL as a union of disjoint closed intervals, choose any one of these intervals that has its left endpoint not equal to a, and define α as its left endpoint. Similarly we can define β < b as the right endpoint of an interval of AR . By construction, the definition of affine-closure applies to u and p at α and β: they belong to A, so u(α) = p(α) and u(β) = p(β); p ≥ u at all points in [a, b]; and p > u in a left-neighborhood and right-neighborhood of α and β respectively. Thus, by affine-closure, u(x) = p(x) for all α ≤ x ≤ β. In particular this holds for x = y, which contradicts our previous assumption that u(y) < p(y). From now on suppose that AL = [a, c] for some c < y; the symmetric case AR = [d, b] follows from the same argument. We now construct H on the interval [a, b]. Let δ := min AR , and define ω as the smallest solution to E[X|ω ≤ X ≤ b] = δ. The solution exists because E[X|ω ≤ X ≤ b], as a function of ω, is nondecreasing, continuous (F has no mass points), and ranges from y < δ at ω = a to b > δ at ω = b. Now consider the following monotone partition: [a, ω], (ω, b]. The H that it induces has a mass point at γ := E[X|a ≤ X ≤ ω] and one at δ. As it is a monotone partition of F , it satisfies (A.14). Finally, we check that (3.10 ) holds. The required equality u(δ) = p(δ) holds by construction, so the only thing left is to check that u(γ) = p(γ). Since AL = [a, c], it is enough to check that γ ≤ c. Suppose instead that γ > c. We will derive a contradiction by showing that this implies that F is not a mean-preserving spread of G in the interval [a, b], contradicting Proposition 1. When γ > c, all mass that G puts to the left of y must be to the left of γ. We show that this mass is smaller than what H puts: G(γ) < H(γ). If instead G(γ) ≥ H(γ), then G(z) ≥ H(z) for all z ∈ [a, ω] with the inequality strict for at least some z < γ, since H(a) = G(a) (because H = G at all endpoints of the convex regions to the left of a, by construction) and ´ω ´ω both G and H are constant for z ∈ (γ, ω]. Therefore a G(z)dz > a H(z)dz. The right´ω ´ω ´ω hand side evaluates to a H(z)dz = F (ω)(ω−γ) = ωF (ω)− a zdF (z) = a F (z)dz, where ´ω ´ω the last equality is integration by parts. Putting things together, a G(z)dz > a F (z)dz, 33

which contradicts the assumption that F is a mean-preserving spread of G. ´b Next, note that ω H(z)dz = F (ω)(δ − ω) + (b − δ) = bF (b) − ωF (ω) − (1 − F (ω))δ = ´b F (z)dz, where the last equality is again integration by parts. Furthermore, 1 = H(z) ≥ ω G(z) for z ∈ [δ, b], and H(z) > G(z) for z ∈ [γ, δ) (because both G and H are constant ´b ´b ´b on [γ, δ) and H(γ) > G(γ)). Therefore ω G(z)dz < ω H(z)dz = ω F (z)dz. Since ´ω ´ω ´b F is a mean-preserving spread of G, a G(z)dz ≤ 0 F (z)dz. But then 0 G(z)dz = ´ω ´b ´ω ´b ´b G(z)dz+ G(z)dz < F (z)dz+ F (z)dz = F (z)dz, contradicting the assumption a ω a ω a ´b ´b that F is a mean-preserving spread of G (which implies a G(z)dz = 0 F (z)dz). This concludes the proof that (3.10 ) holds on [a, b]. Note that H is well-defined, as by construction H = G on all endpoints of the intervals [a, b]. We conclude that H is optimal, and by construction it is induced by a monotone partitional signal. We now prove the converse. If u is not affine-closed, then there exist x, y ∈ (0, 1), x < y, and an affine function q such that: u(x) = q(x), u(y) = q(y); q(z) ≥ u(z) for all z ∈ (x, y); there exists w ∈ (x, y) such that q(w) > u(w); and there exists ε > 0 such that q(z) > u(z) for all z ∈ (x − ε, x) ∪ (y, y + ε), where ε is chosen so that x − ε > 0 and y + ε < 1. Consider the distribution F that puts weight α > 0 uniformly on [x − ε, x − ε/2] and weight 1 − α uniformly on [y + ε/2, y + ε], where α is chosen so that E[F ] = w, where q(w) > u(w) holds. That is, let F have density    0     2α    ε f (z) = 0    2(1−α)    ε    0

if z < x − ε if x − ε ≤ z ≤ x −

ε 2

if x −

ε 2


ε 2

if y +

ε 2

≤z ≤y+ε

.

if z > y + ε

Given such a prior F , the Sender cannot do strictly better than choosing a distribution of posterior means G that has support limited to x and y. To see this, first note that the mean-preserving spread condition implies that supp G ⊆ [x − ε, y + ε]. Suppose the Sender’s utility were q. Since q is linear, all feasible distributions of posterior means are optimal; in particular, so is one with support equal to {x, y}. Since q ≥ u on [x − ε, y + ε], this gives an upper bound to the utility attainable under u. The upper bound is attained if and only if all mass is concentrated at points where q(z) = u(z). In particular, the

34

distribution

   0 if 0 ≤ z < x   G(z) = α if x ≤ z < y    1 if y ≤ z ≤ 1

is feasible and attains the upper bound. To conclude it is enough to show that no monotone partitional signal achieves the upper bound. First consider the trivial partition that pools all realizations at the prior mean w = E[F ]. By our choice of α, u(w) < q(w), hence this is not optimal. Next consider monotone partitions with two or more signals. Since all prior mass is concentrated on [x − ε, x − ε/2] and [y + ε/2, y + ε], at least one of the intervals in the partition has a conditional posterior mean in [x − ε, x − ε/2] or [y + ε/2, y + ε]. But by construction u(z) < q(z) on such intervals. Hence the monotone partition is not optimal. We conclude that no monotone partition is optimal for such a prior F .

A.6

Proof of Proposition 2

Suppose G is induced by a monotone partitional signal with n intervals [a0 , a1 ) = [0, a1 ), [a1 , a2 ), [a2 , a3 ), . . ., [an−1 , an ] = [an−1 , 1], where intervals are either of full revelation or pooling. Let ˆ ai+1 1 tdF (t) mi := F (ai+1 ) − F (ai ) ai be the conditional mean of F on the interval [ai , ai+1 ). On intervals of pooling, G(·) is constant with jumps of size F (ai+1 ) − F (ai ) at points mi ; on intervals of full revelation, G = F . Therefore cG (·) is continuous, convex, piecewise linear on intervals of pooling, and differentiable everywhere except at those mi where [ai , ai+1 ) is a pooling interval. For each i = 0, 1, . . . , n we will prove by induction that cG (ai ) = cF (ai ). The base case i = 0 is trivial: cG (a0 ) = cG (0) = 0 = cF (a0 ). Now suppose that cG (aj ) = cF (aj ); we want to show that cG (aj+1 ) = cF (aj+1 ) holds. Write ˆ

ˆ

aj

cG (aj+1 ) =

G(t)dt + 0

ˆ cF (aj+1 ) =

a ˆ jaj+1

aj

aj+1

G(t)dt = cG (aj ) +

F (t)dt + 0

ˆ

aj+1

G(t)dt a ˆ jaj+1

F (t)dt = cF (aj ) + aj

F (t)dt aj

´a Since cG (aj ) = cF (aj ) by inductive hypothesis, it is enough to show that ajj+1 G(t)dt = ´ aj+1 F (t)dt. If [aj , aj+1 ) is an interval of full revelation this is immediate because F = G. aj 35

If instead it is a pooling interval, ˆ

ˆ

aj+1

G(t)dt = aj

ˆ

mj

aj+1

G(t)dt +

G(t)dt

a ˆ jmj

=

mj

ˆ

aj+1

F (aj )dt +

F (aj+1 )dt

aj

mj

= (mj − aj )F (aj ) + (aj+1 − mj )F (aj+1 ) and, integrating by parts, ˆ

ˆ

aj+1

aj+1

F (t)dt = aj+1 F (aj+1 ) − aj F (aj ) − aj

tdF (t) aj

ˆ

aj+1

= aj+1 F (aj+1 ) − aj F (aj ) − mj (F (aj+1 ) − F (aj )) =

G(t)dt aj

We conclude that cG (ai ) = cF (ai ) for all i. Tangency also requires that c0G (ai ) = c0F (ai ). For ´x d this it is enough to note that c0F (ai ) = dx F (t)dt = F (ai ) and c0G (ai ) = F (ai ) because 0 x=ai cG (x) is linear near x = ai (at least one of two adjacent intervals is a pooling interval), with slope F (ai ) by construction. Finally, if [ai , ai+1 ) is an interval of full revelation (hence ´x G ≡ F on the interval), cG coincides with cF because cG (x) = cG (ai ) + ai G(t)dt = ´x cF (ai ) + ai F (t)dt = cF (x) for x ∈ [ai , ai+1 ). Next we prove the reverse implication. Let c(·) be a continuous and convex function that satisfies cF ≥ c ≥ cG , that is piecewise linear where c < cF and tangent to cF (·) in every affine piece. Call the points of tangency 0 = a0 < a1 < . . . < an = 1. Then c(·) is generated by the partitional signal with intervals [ak , ak+1 ), k = 0, 1, . . . , n − 1, where the intervals are of pooling when c(·) is affine and of full revelation when c(·) is strictly convex (and coincides with cF ). To see this, consider    0 x<0   Gc (x) := ∂+ c(x) x ∈ [0, 1]    1 x>1 where ∂+ c(x) is the right derivative of c(·) at x. Consider two consecutive points of tangency ai < ai+1 such that c(·) is piecewise linear between ai and ai+1 . c(·) is linear to the right of ai with slope F (ai ), and linear to the left of ai+1 with slope F (ai+1 ). The two affine segments meet at some point x ∈ (ai , ai+1 ), and the difference in slopes (which corresponds to the jump for Gc ) is F (ai+1 ) − F (ai ), as 36

desired. To determine the intersection x, note that the affine segment around ai is described by the equation y = F (ai )x + (c(ai ) − F (ai )ai ) (the line with slope F (ai ) that passes through (ai , c(ai )), and similarly the equation for the line around ai+1 is y = F (ai+1 )x + (c(ai+1 ) − F (ai+1 )ai+1 ). Setting these equal and rearranging gives x= =

F (ai+1 )ai+1 − F (ai )ai − (c(ai+1 ) − c(ai )) F (ai+1 ) − F (ai ) ´a F (ai+1 )ai+1 − F (ai )ai − aii+1 F (t)dt

F (ai+1 ) − F (ai ) tdF (t) ai = = mi F (ai+1 ) − F (ai ) ´ ai+1

where the second to last equality uses integration by parts. Therefore the kink in c(·) (hence jump in Gc (·)) occurs precisely at the posterior mean of F conditional on the interval [ai , ai+1 ). By construction, cGc ≡ c. Therefore c is generated by a monotone partition.

A.7

Material for Section 7.2 and proof of Proposition 4

We first present the parametrization of preferences that underlies Figures 9 and 10. Assume that the investor has CRRA utility v with Arrow-Pratt measure of relative risk aversion 1−η η > 0 over final wealth z; that is, v(z) = z 1−η−1 if η 6= 1, and v(z) = ln z if η = 1. Given a posterior belief x > 12 that the long position is profitable (x < 21 is symmetric), the investor chooses the level of investment y > 0 in the risky asset that maximizes xv(w + y − c) + (1 − x)v(w − y − c), or decides not to invest (y = 0) and receives the outside option v(w). Let y ? (x) be the optimal investment as a function of belief x. Assume the analyst’s payoff is proportional to the amount invested. Then (omitting the irrelevant proportionality constant) u(x) ≡ y ? (x) can be shown to be

u(x) =

 1 η  1−( 1−x x )   −(w − c) 1   η 1+( 1−x  x )  0   1  η  1−( 1−x  x )  (w − c) 1−x η1 1+( x )

if x ≤ x0 if x0 < x < 1 − x0 if x ≥ 1 − x0

where x0 is the threshold belief at which the investor is indifferent between investing or not: 37

v(w) = x0 v(w + y ? (x0 ) − c) + (1 − x0 )v(w − y ? (x0 ) − c). If η < 1 (less risk averse than log utility) u(x) is strictly concave in [0, x0 ] and [1 − x0 , 1]; if η > 1 (more risk averse than log utility) then u(x) is strictly convex in the same regions. For all η > 0, u is affine-closed. By Theorem 3, there exists a monotone partitional optimal signal. Since the multiplier p(x) must be convex, hence continuous, p(x) cannot coincide with u(x) in a neighborhood of the discontinuities x0 and 1 − x0 . Thus, by Proposition 1, x0 and 1 − x0 must be contained in pooling regions. These insights lead us to consider the multipliers depicted in Figure 9 and Figure 10. Define w(x) as 



w(x) = −1{x≤ 1 } + 1{x> 1 } (w − c) 2

2

1−

1−x x

1+

1−x x

 η1  η1 .

  This coincides with u for x ≤ x0 and x ≥ 1 − x0 . For η < 1, w is strictly concave on 0, 21   and 12 , 1 . For η > 1, w is (globally) strictly convex. Proof of Proposition 4. In the concave case (η < 1), define y = E[X|X ≤ 12 ] and consider the piece-wise linear, convex function  u0 (y)(x − y) + u(y) if x ≤ p(x) = u0 (1 − y)(x − (1 − y)) + u(1 − y) if x >

1 2 1 2

By construction, p is tangent to u at y and 1 − y, and has a kink at 21 . Furthermore, since and u coincide at y and 1 − y, and w is concave, p(x) ≥ w(x) ≥ u(x) for all x ∈ [0, 1] and p(x) > u(x) for x 6∈ {y, 1 − y}. Consider the distribution of posterior means

G(x) =

   0   1

2    1

if x < y if y ≤ x < 1 − y if x ≥ 1 − y

 that puts atoms of size F 21 = 12 on y = E[X|X ≤ 12 ] and on 1 − y = E[X|X ≥ 21 ]. Conditions (3.1)–(3.3) are satisfied, hence, by Theorem 1, G is optimal. Now consider the convex case (η > 1). Consider the set of lines passing through < m < u0 (x0 ). Since u(x) ≤ w(x) and w (x0 , u(x0 )) with slopes m that satisfy u(xx00)−u(0) −0 is strictly convex, each such line intersects u(x) at two points besides x0 : a(m) ∈ (0, x0 ) and b(m) ∈ (x0 , 12 ). Functions a(·) and b(·) are continuous and strictly increasing. Let

38

t(m) = E[X|X ∈ [a(m), b(m)]]. Since F is continuous and strictly increasing   by assump< tion, t(·) is also continuous and strictly increasing. By assumption (7.1), t u(xx00)−u(0) −0   1  E X|X ∈ 0, 2 < x0 . By construction, t(u0 (x0 )) = E[X|X ∈ [x0 , b(u0 (x0 ))]] > x0 . By the intermediate value theorem, there exists a (unique) m? such that t(m? ) = x0 . Define a? := a(m? ), b? := b(m? ). Now consider the following p and G:    u(x)       m? (x − a? ) + u(a? )   p(x) = u(x)      −m? (x − (1 − a? )) + u(1 − a? )     u(x)

if x < a? if a? ≤ x < b? if b? ≤ x < 1 − b?

,

if 1 − b? ≤ x < 1 − a? if x ≥ 1 − a?

   F (x)       F (a? ) + 1{x≥x0 } (F (b? ) − F (a? ))   G(x) = F (b? ) + 1{x≥ 1 } (F (1 − b? ) − F (b? )) 2     ?  F (1 − b ) + 1{x≥1−x0 } (F (1 − a? ) − F (1 − b? ))     F (x)

if x < a? if a? ≤ x < b? if b? ≤ x < 1 − b?

.

if 1 − b? ≤ x < 1 − a? if x ≥ 1 − a?

G reveals x when x < a? or x ≥ 1 − a? . The remaining intervals [a? , b? ], (b? , 1 − b? ), and [1 − b? , 1 − a? ] are pooled at x0 , 21 and 1 − x0 , respectively. Conditions (3.1)–(3.3) are satisfied by construction, hence, by Theorem 1, G is optimal.

A.8

Proof of Proposition 5

To prove the proposition, we first characterize distributions of posterior means satisfying condition (3.3). Because the state is binary, it is without loss to look at binary signals, i.e. distributions of posterior means G with up to two points in their support. Let the support of G be {x, x¯} with x ≤ x¯ (possibly with equality). Then, {x, x¯} with probability weights (1 − β, β) satisfies (3.3) if and only if x ≤ α ≤ x¯ and β = (α − x)/(¯ x − x) (where we define β = 1 when x = α = x¯), where α is the prior mean. If p is affine and condition (3.3) holds for G, then condition (3.2) holds for (G, p). If (G, p) additionally satisfies condition (3.10 ), then the expected payoff to the Sender under G is equal to p(α). In this case, p bounds u from above, and satisfies p(x) = u(x) and 39

p(¯ x) = u(¯ x). For any upper semi-continuous u and any prior distribution α, we can find the affine price function with the properties described above. Formally, let pα ∈ argmin{q(α) : q is affine on [0, 1], q ≥ u}.14 Then, pα has to touch u at some points x and x¯ which lie on the opposite sides of α. By the above arguments, these points uniquely pin down a feasible distribution of posterior means G, and by Theorem 1, G is optimal. It is possible that x = α = x¯ in which case pα is simply the supporting hyperplane of u at α, and the optimal G is a degenerate distribution that puts all mass on the unconditional mean α. By definition, the price of the prior, pα (α), is equal to the concave closure of u at α. Thus, co(u)(x) = px (x), for all x ∈ [0, 1], where co(u) denotes the concave closure of u.

A.9

Technical Appendix

In this Appendix, we fill the gaps in the proof of Theorem 2. A.9.1

Step 2 without Assumption 1

For a fixed , consider a pair (G, p) from Step 1 of the proof of Theorem 2 from Appendix A.4. We first show how to modify the function p to ensure that its slope is uniformly bounded (this will be important in the next step of the proof). Intuitively, when we consider the sequence pn , the slope of pn could diverge to infinity close to the endpoints, upsetting uniform convergence. We show in the lemma below that we can control the slope by using the properties of the pair (G, p) established in Step 1. Lemma 4. Consider problem (A.3) - (A.5) for a fixed  > 0. There exists a convex function q which satisfies (A.8) and (A.10) and has a slope uniformly bounded by ( max

c c ´x , ´1 (x − x)dF (x) x¯ (x − x¯)dF (x) 0

) ,

14 Such function pα can be found because the set {q : q is affine on [0, 1], q ≥ u} can be made compact without affecting the definition of pα by requiring that q be pointwise smaller than some sufficiently large number M (upper semi-continuity of u implies that the set is closed).

40

where c is a constant that does not depend on , and x := inf{x ∈ [0, 1] : u(x) = p(x)}, x¯ := sup{x ∈ [0, 1] : u(x) = p(x)}. Proof. In the case when either x = 0 or x¯ = 1, there is nothing to prove because the bound ´x is equal to ∞. We assume otherwise, and focus on showing the bound by c/ 0 (x−x)dF (x) on [0, 1/2] (an analogous argument establishes the other bound on [1/2, 1]). Recall that u is assumed continuous, and that by the regularity assumption u has a slope uniformly bounded by M < ∞. This also implies that u is bounded: kuk∞ < ∞. Because p is convex, p ≥ u, and p coincides with u at x and x¯, p inherits the bound M on the slope in the interval [x, x¯]. Next, note that condition (A.10) is not affected by modifying p(x) on [0, x) or (¯ x, 1]. x, 1] by property (A.10), the value of the integral Because G puts no mass on [0, x) ∪ (¯ ´1 p(x)dG(x) is also unaffected. Therefore, we can replace p on [0, x) by some other 0 function q and preserve conditions (A.8) and (A.10) as long as ˆ

ˆ

x

x

p(x)dF (x) =

q(x)dF (x).

0

(A.15)

0

Consider a function q that coincides with p on [x, 1], and is affine otherwise: q(x) = p(x) + ∆(x − x), for some ∆ > 0, for x ∈ [0, x]. Choose ∆ to satisfy equation (A.15). Because p is convex, it is clear that such ∆ has to be larger than the slope of p at x, so q remains convex. Moreover, we have ˆ kuk∞ ≥

1

(1)

ˆ

u(x)dG(x) = 0

0

1

(2)

ˆ

p(x)dG(x) = ˆ (4) =

1

(3)

ˆ

x

p(x)dF (x) ≥ 0 x

p(x)dF (x) ˆ

0 (5)

q(x)dF (x) ≥ ∆

0

x

(x − x)dF (x), (A.16) 0

where (1) follows from (A.10), (2) follows from (A.8), (3) follows from the fact that p is non-negative, and (4) and (5) from the definition of q. We conclude that kuk∞ . (x − x)dF (x) 0

∆ ≤ ´x

41

Therefore, the slope of q is bounded by max{kuk∞ , M } ´x . (x − x)dF (x) 0 This finishes the proof.15 We come back to the proof of Step 2 without Assumption 1. Recall that we have a sequence (Gn , pn ) satisfying (A.4), (A.5), (A.8) and (A.10), with each pn convex. Moreover, by Lemma 4, we can modify the functions pn so that pn has a slope bounded uniformly by ( max

c c ´ xn , ´1 (xn − x)dF (x) x¯ (x − x¯n )dF (x) 0 n

) ,

where c does not depend on n, and xn and x¯n are defined by Lemma 4: xn := inf{x ∈ [0, 1] : u(x) = pn (x)}, x¯n := sup{x ∈ [0, 1] : u(x) = pn (x)}. We can assume without loss (by passing to a subsequence if necessary) that both xn and x¯n converge to some x and x¯, respectively. If x > 0 and x¯ < 1, then, for sufficiently high n, all pn have a slope uniformly bounded by ( max

´ x/2 0

c (x/2 − x)dF (x)

, ´1

c

(x − (¯ x + 1)/2)dF (x) (¯ x+1)/2

) ,

using the assumption that F has full support. Consider the opposite case when either (i) x = 0 or (ii) x¯ = 1. Then, for a sufficiently small δ > 0, all pn have a uniformly bounded slope on [δ, 1 − δ], for sufficiently high n. This is because each pn ≥ u, pn is convex, and thus pn has a slope bounded by the slope of u (which is bounded by M by the regularity assumption). We can thus conclude that for every (small enough) δ > 0, pn have a uniformly bounded slope on [δ, 1 − δ]. Thus, the sequence of functions pn is uniformly bounded on [δ, 1 − δ]. This follows from the fact that each pn is convex, has a uniformly bounded slope, and the 15

A careful reader will notice that we have not verified that such q satisfies q ≥ u on [0, x]. However, this does not pose a problem for the remainder of the proof because we will only need the bound derived in the lemma to hold for x sufficiently close to 0. If q does not satisfy q ≥ u, we can always replace x with some smaller x0 > 0, and the rest of the proof remains the same.

42

domain [δ, 1−δ] is compact. A uniformly bounded sequence of convex functions is Lipshitz continuous with a common Lipshitz constant L. In particular, the sequence (pn )n is equicontinuous on [δ, 1 − δ]. By the Arzela-Ascoli Theorem, pn has a uniformly converging subsequence on every interval [δ, 1 − δ]. Therefore, a subsequence of pn converges to some continuous convex p on (0, 1), uniformly on each compact subset of (0, 1). We can complete the definition of p by specifying that p is continuous at 0 and at 1 (the properties of p at any single point do not play a role). Just as in Step 2 of the proof from Appendix A.4, we prove that Gn converges in the weak? topology to some G ∈ ∆([0, 1]), and that the limiting pair (G, p) satisfies conditions (3.10 ) and (3.3). A separate argument is needed to show condition (3.2) because now we only have convergence of pn to p uniformly on every compact subset of (0, 1) but not necessarily on [0, 1]. Define, for each n, the smallest convex function qn that coincides with pn on [xn , x¯n ]. Note that on [xn , x¯n ] the slope of pn is bounded by the slope of u (which is bounded by M by assumption), so qn can be constructed by linearly extending pn beyond [xn , x¯n ] with the slope equal to the relevant derivative of pn at the endpoints xn and x¯n . Obviously, we have pn ≥ qn . By construction, qn has a uniformly bounded slope on [0, 1] (bounded by M ), so by the same argument as above, it has a uniformly convergent subsequence to some function ´1 ´1 q. Therefore (indexing the subsequence by n again), 0 qn (x)dGn (x) → 0 q(x)dG(x). To see this, note that ˆ

ˆ

1

qn dGn −

ˆ

1

1

q(dG − dGn );

(qn − q)dGn −

qdG = 0

0

ˆ

1

0

0

The second integral converges to zero by the definition of convergence of Gn to G in the weak? topology. For the first integral, we have ˆ

1

(qn − q)dGn ≤ sup {|qn (x) − q(x)|} 0

x∈[0, 1]

which converges to zero because qn converges to q uniformly on [0, 1].

43

We have ˆ 0

1

(1)

ˆ

1

(2)

ˆ

ˆ

1

1

p(x)dF (x) ≥ p(x)dG(x) ≥ q(x)dG(x) = lim qn (x)dGn (x) n 0 0 0 ˆ 1−δ ˆ 1 ˆ 1 (5) (4) (3) pn (x)dF (x) pn (x)dF (x) ≥ lim pn (x)dGn (x) = lim = lim n n n δ 0 0 ˆ 1 ˆ 1−δ (7) (6) p(x)dF (x) ≥ p(x)dF (x) − ε(δ), (A.17) = δ

0

where (1) follows because p is convex and F is a mean-preserving spread of G, (2) follows because the inequality pn (x) ≥ qn (x) is preserved in the limit, (3) follows because, by definition, pn and qn coincide on the support of Gn , (4) follows from (A.8), (5) follows for any δ > 0 from non-negativity of pn , (6) follows because pn converges to p uniformly on every compact subset of (0, 1), and (7) is true for some ε(δ) which goes to zero as δ → 0. Because δ (and hence ε(δ)) can be arbitrarily small, we must have: ˆ

ˆ

1

p(x)dF (x) = 0

1

p(x)dG(x), 0

which is what we wanted to prove. This finishes the proof of Step 2, i.e. we have shown conditions (3.10 ) - (3.3) for the case of a continuous u (and hence the optimality of G, by Theorem 1). A.9.2

Proof of Lemma 3

We prove that the functions in the sequence p are uniformly bounded. Suppose not. Then there exists a subsequence of p (which we take to be the original sequence to simplify notation) such that lim→0 kp k∞ = ∞. By the properties of Lagrange multipliers proved in Proposition 1, and the assumption that u has a uniformly bounded slope in the intervals where it is continuous, the only possibility is that p has an affine piece whose slope diverges to infinity and which touches u at one of the points of discontinuity yi . Because there are finitely many points of discontinuity of u, we can choose a divergent subsequence (which we take to be the sequence itself) in which each p touches u with an affine piece at the same discontinuity point yi . Using Proposition 1 and the properties of u and u¯ , for small enough , the affine piece of p that touches u at yi must have the following properties: (i) p is affine on [δ , 1] for some δ ≤ yi , and is not affine on any interval that strictly contains [δ , 1], and (ii) p

44

touches u¯ only at yi in the interval [yi , 1], and (iii) δ → yi as  → 0. We argue why properties (i) - (iii) are indeed true: Properties (i) and (ii) hold because, by the choice of the sequence p , the affine piece of p that touches u at yi has a divergent slope. Because u has a uniformly bounded slope whenever it is continuous, and only finitely many discontinuities, for small enough , p cannot touch u¯ to the right of yi . It follows from Proposition 1 that p is affine to the right of yi . Then, we can define δ ≤ yi by requiring that [δ , 1] is the maximal interval in which p is affine (no interval on which p is affine strictly contains it). Property (iii) then follows from the fact that the slope of the affine piece of p on [δ , 1] goes to plus infinity but p touches u¯ at yi . We are ready to obtain a contradiction. By Proposition 1, in the interval [δ , 1], F is a mean-preserving spread of G . Because p does not touch u to the right of yi , G must put all mass on [δ , yi ], and hence it has to be that E[X|X ≥ δ ] ∈ [δ , yi ]. In the limit, we obtain E[X|X ≥ yi ] = yi which is a contradiction because F has full support and yi < 1. The obtained contradiction implies that p are uniformly bounded. Because each p is convex, the family is equi-continuous, and, as before, we can use the Arzel`a-Ascoli theorem to conclude that a subsequence of p converges to some convex continuous p.

45