Optimal Information Disclosure: A Linear Programming Approach∗

Anton Kolotilin†

First Version: November 2011. This Version: July 2017.

Abstract

An uninformed sender designs a mechanism that discloses information about her type to a privately informed receiver, who then decides whether to act. I impose a single-crossing assumption, so that the receiver with a higher type is more willing to act. Using a linear programming approach, I characterize optimal information disclosure and provide conditions under which full and no revelation are optimal. Assuming further that the sender's utility depends only on the sender's expected type, I provide conditions under which interval revelation is optimal. Finally, I show that the expected utilities are not monotonic in the precision of the receiver's private information.

Keywords: Bayesian persuasion, information design, information disclosure, informed receiver

JEL classification: C72, D82, D83

∗ This paper is based on the second chapter of my 2012 Ph.D. dissertation at MIT and was previously circulated under the title "Optimal Information Disclosure: Quantity vs. Quality." I thank Robert Gibbons and Muhamet Yildiz for their invaluable guidance and advice. I thank Hongyi Li for many detailed comments. I also thank Ricardo Alonso, Sandeep Baliga, Abhijit Banerjee, Gabriel Carroll, Denis Chetverikov, Glenn Ellison, Richard Holden, Jin Li, Uliana Loginova, Niko Matouschek, Parag Pathak, Michael Powell, Andrea Prat, Juuso Toikka, Alexander Wolitzky, Juan Xandri, Luis Zermeño, various seminar and conference participants, anonymous referees, and especially the editor, George Mailath, for helpful comments and suggestions. Financial support from the Australian Research Council is acknowledged.

† UNSW Australia, School of Economics. Email: [email protected]


1 Introduction

In the Bayesian persuasion literature (Rayo and Segal 2010 and Kamenica and Gentzkow 2011), an uninformed sender (she) designs an information disclosure mechanism to influence the beliefs of a receiver (he) about the sender's type. I use a linear programming approach to study this problem.

In my model, the receiver privately knows his one-dimensional type and chooses between two actions: to act or not to act. Before observing her type, the sender can commit to any (stochastic) mapping from her types to messages, which I call an information disclosure mechanism. After observing the message generated by the mechanism and his type, the receiver decides whether to act. The sender and receiver's types are drawn from a continuous joint prior distribution. The sender and receiver have continuous utility functions that depend on the sender and receiver's types. I impose a single-crossing assumption which ensures that each message of a mechanism induces the receiver to act if and only if his type exceeds a threshold type.

It turns out that my model is equivalent to an alternative model where the receiver is uninformed, chooses a one-dimensional action, and has utility that is single-peaked in his action for each message of a mechanism. In my model, each message of a mechanism corresponds to a threshold type above which the receiver acts. Likewise, in the alternative model, each message of a mechanism corresponds to an optimal action of the uninformed receiver. That is, the receiver's threshold type in my model is isomorphic to the receiver's optimal action in the alternative model.

I characterize conditions for a candidate mechanism to be optimal, and derive comparative statics on the precision of the receiver's private information. The characterization results apply directly both to my model and to the alternative model. But the comparative statics results do not apply to the alternative model, in which the receiver is uninformed. Hereafter, I discuss my results in the context of my model, in which the receiver is privately informed and chooses between two actions.

For concreteness, consider a school that wishes to persuade a potential employer to hire a student by choosing a grade disclosure policy for the student. The school can freely choose what information about the student's grades appears on the student's transcript. Moreover, the school chooses this disclosure policy before observing anything about the student. The employer observes the student's transcript but also obtains private information, for example, from conducting an employment interview with the student and competing candidates. The single-crossing assumption requires that all possible interview outcomes can be appropriately ranked.

The sender's problem of finding an optimal mechanism reduces to a linear program, because a mechanism is described by the conditional probabilities of messages given the sender's types, and the expected utilities are linear in these probabilities. The linear programming approach gives necessary and sufficient conditions under which a candidate mechanism is optimal. This enables the characterization of conditions that justify many commonly observed grade disclosure policies, such as those reported in Ostrovsky and Schwarz (2010). These conditions imply that, to verify that a grade disclosure policy is optimal, it suffices to check that there is no simple deviation from this policy that the school prefers.

At one extreme, some schools report all grades and class rank on transcripts. Such a full revelation mechanism is optimal if and only if the sender prefers to reveal any two of her types rather than pool them. At the other extreme, some schools release no transcripts. Such a no revelation mechanism is optimal if and only if the sender prefers to pool any three of her types with the uninformative message rather than pool two of them and reveal the third.

Assume further that the sender's utility (under the receiver's optimal action) depends on the message only through the posterior expectation of the sender's type given this message.1 Under this assumption, the sender can choose any distribution of posterior expectations of the sender's type, subject to the constraint that the prior distribution of the sender's type is a mean-preserving spread of this distribution. As a result, the shape of the optimal mechanism is jointly determined by the convexity properties of the sender's utility function and by the prior distribution of her type. I provide necessary and sufficient conditions under which the sender optimally chooses an interval revelation mechanism that reveals moderate types and hides extreme types.

In general, the sender and receiver's expected utilities under the optimal mechanism are not monotonic in the precision of the receiver's private information. First, as the receiver becomes more informed, his expected utility may decrease, despite the fact that he is the only player who takes an action that directly affects his utility. This happens because the optimal mechanism depends on the precision of the receiver's private information, and the sender may prefer to disclose significantly less information if the receiver's information is more precise. Returning to the school–employer example, this suggests that low reliability of employment interview procedures (summarized by Arvey and Campion 1982) may be beneficial for employers, as it motivates schools to design more informative disclosure policies. Second, it may be easier for the sender to influence a more informed receiver. This happens because the sender may optimally choose to target only the receiver with favorable private information, and it becomes easier for the sender to persuade such a receiver as he becomes more informed.

The linear programming approach to Bayesian persuasion complements the standard concavification approach of Kamenica and Gentzkow (2011). They work with the distribution of posterior beliefs induced by a mechanism. They define the sender's indirect utility of posterior beliefs, and derive the optimal mechanism by taking the concave closure of this indirect utility function. In contrast, the linear programming approach solves the dual problem: it derives conditions under which a given mechanism is optimal. As Gentzkow and Kamenica (2016) point out, the concavification approach has limited applicability when the set of sender's types is an interval, because the set of posterior beliefs becomes infinite-dimensional. The linear programming approach instead works with utilities directly expressed as functions of the one-dimensional sender and receiver's types, and thus yields sharper results for the class of problems I consider.

My model is a special case of Kamenica and Gentzkow (2011), who restrict neither the sets of receiver's types and actions nor the functional form of the receiver's utility. Rayo and Segal (2010), on the other hand, is a special case of my model. They assume that the receiver's type is uniformly distributed and does not affect the sender's utility.2 Subsequent to the first version of this paper, some papers on Bayesian persuasion have assumed that the sender's utility depends only on the sender's expected type. Gentzkow and Kamenica (2016) provide an alternative characterization of the set of feasible mechanisms, and use it to find optimal mechanisms in stylized examples. Kolotilin et al. (2017) allow the sender to condition mechanisms on the receiver's reports, and provide simple sufficient conditions for optimality of upper censorship, a special case of the interval revelation mechanisms characterized in this paper.3

1 The statement of this assumption is silent about the sets of receiver's types and actions. Consequently, the assumption and the corresponding results apply directly both to my model and to the equivalent alternative model with an uninformed receiver. Kamenica and Gentzkow (2011) refer to this assumption as "Sender's payoff depends only on the expected state". Ostrovsky and Schwarz (2010) also impose this assumption and characterize the unique equilibrium (rather than optimal) information disclosure mechanism.

2 In Section 4, I discuss in more detail how my paper relates to Rayo and Segal (2010), Kamenica and Gentzkow (2011), and other papers on Bayesian persuasion.

3 The Bayesian persuasion problem of this paper is mathematically similar to the delegation problem initiated by Holmstrom (1984). Alonso and Matouschek (2008) and Amador and Bagwell (2013) characterize necessary and sufficient conditions under which interval delegation is optimal. Their conditions resemble my conditions under which interval revelation is optimal, but their proofs are more involved.


2 School–Employer Example

A school chooses a grading policy to maximize the probability of an employer hiring a student. The student is either a peach or a lemon. The school and the employer have a common prior belief that the student is a peach with probability 0.2. The employer hires the student if the employer believes that the student is a peach with probability at least 0.5.

The timing of the game is as follows. First, the school chooses a grading policy Φ described by a finite (ordered) set M of grades and the conditional distribution of grades given the student's type. Second, the student's type is drawn and the grade is generated according to Φ. Third, the employer observes the grade and conducts an employment interview. The interview produces a two-valued signal about the student's type with precision p ∈ [1/2, 1], in the sense that Pr(positive|peach) = Pr(negative|lemon) = p and Pr(negative|peach) = Pr(positive|lemon) = 1 − p. The employer is positive (negative) if the interview signal is positive (negative). Finally, the employer makes a hiring decision.

I restrict attention to grading policies that generate three possible grades, A, B, or C, which convince both employers, only the positive employer, or neither employer to hire. This is without loss of generality because convincing the negative employer also convinces the positive employer. The school chooses Φ to maximize the probability of hire,

$$1 \cdot \Pr_\Phi(A) + \Pr_\Phi(\text{positive} \mid B) \cdot \Pr_\Phi(B) + 0 \cdot \Pr_\Phi(C),$$

subject to the constraint imposed by the prior distribution of the student's ability,

$$\sum_{m \in M} \Pr_\Phi(\text{peach} \mid m) \cdot \Pr_\Phi(m) = \Pr(\text{peach}) = 0.2.$$

Under the optimal grading policy, grades A and B barely persuade the negative and positive employers to hire, whereas grade C makes the employer certain that the student is a lemon; so, after some algebra, PrΦ(peach|A) = p, PrΦ(peach|B) = 1 − p, and PrΦ(peach|C) = 0. Using these conditions, it is easy to show that PrΦ(positive|B) = 2p(1 − p). The school's problem is thus a linear program: maximize the utility function

$$\Pr(A) + 2p(1-p)\Pr(B)$$

over probabilities Pr(A), Pr(B), and Pr(C), subject to the Bayesian budget constraint

$$p\Pr(A) + (1-p)\Pr(B) = 0.2.$$

The marginal utilities of grades A, B, and C are 1, 2p(1 − p), and 0; the prices of these grades are p, (1 − p), and 0. Thus, the school faces a tradeoff: choose a grading policy that generates A with a small probability and persuades both the negative and positive employers, or a grading policy that generates B with a high probability but persuades only the positive employer. The school resolves this tradeoff by choosing a policy that frequently generates grades with the highest marginal utility-price ratio (1/p for A and 2p for B). As an aside, this argument that optimal grading policies should frequently generate messages with high marginal utility-price ratios requires the student's type to take only two values, but does not rely on the cardinality of the set of receiver's types, the utility functional forms, or the form of the joint distribution of the sender and receiver's types.

The optimal grading policy can take three forms depending on the interview precision. If the interview is imprecise (1/2 ≤ p < 1/√2), the marginal utility-price ratio is higher for A than for B; so the optimal grading policy targets the negative employer and generates grades A and C. If the interview is precise (p > 1/√2), the ratio is higher for B, so the optimal policy targets the positive employer. In this case, if it is impossible to convince the positive employer with probability one (p < 4/5), the optimal policy generates grades B and C; otherwise (p > 4/5), the optimal policy generates grades A and B.

When the interview is not too precise (p < 4/5), the optimal grading policy exhibits grade inflation: peaches and some lemons get a good grade, but only lemons get a bad grade. Moreover, when the interview is imprecise (p < 1/√2), grade inflation is moderate and a good grade impresses all employers. When the interview is relatively precise (1/√2 < p < 4/5), grade inflation is severe and a good grade convinces only the positive employer. Finally, when the interview is too precise (p > 4/5), the optimal grading policy is noisy: with a positive probability, a peach gets a bad grade and a lemon gets a good grade.

Figure 1 shows that the school and employer's expected utilities under the optimal grading policy are not monotonic in the interview precision.4 Naive intuition may suggest that (i) the school's expected utility should decrease with p because it is harder to influence a better informed employer, and (ii) the employer's expected utility should increase with p because a better informed employer takes a more appropriate hiring decision. This naive intuition, however, ignores that the optimal mechanism changes with p, and the school may choose to disclose significantly less information when the employer is more informed. This effect may overturn the naive intuition. In equilibrium, a more informative interview may help the school because the employer hires more students; it may also hurt the employer because the employer hires worse students.

4 Figure 1 normalizes the utility functions as follows. Both the school and the employer get utility 0 if the student is not hired. The school gets utility 1 if the student is hired. The employer gets utility 1 from hiring a peach and utility −1 from hiring a lemon.
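To make the school's linear program concrete, the following sketch solves it numerically with scipy. This is my own illustration rather than code from the paper; the function name optimal_grading and the sampled precisions are assumptions of the sketch.

```python
# Minimal sketch of the school's linear program (illustrative, not from the paper).
# Variables: Pr(A), Pr(B), Pr(C). Objective: Pr(A) + 2p(1-p) Pr(B).
# Bayesian budget: p Pr(A) + (1-p) Pr(B) = Pr(peach) = 0.2.
import numpy as np
from scipy.optimize import linprog

def optimal_grading(p, prior=0.2):
    c = [-1.0, -2.0 * p * (1.0 - p), 0.0]        # linprog minimizes, so negate
    A_eq = [[p, 1.0 - p, 0.0],                   # Bayesian budget constraint
            [1.0, 1.0, 1.0]]                     # grade probabilities sum to one
    b_eq = [prior, 1.0]
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * 3)
    return res.x, -res.fun                       # (Pr(A), Pr(B), Pr(C)) and E[v]

for p in (0.60, 0.75, 0.90):                     # one point in each regime
    probs, value = optimal_grading(p)
    print(f"p = {p:.2f}: Pr(A, B, C) = {np.round(probs, 3)}, school's E[v] = {value:.3f}")
```

Running the sketch recovers the three regimes described above: only grades A and C are used at p = 0.60, only B and C at p = 0.75, and only A and B at p = 0.90.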


[Figure 1 here: two panels plotting the school's expected utility E[v] and the employer's expected utility E[u] against the interview precision p ∈ [1/2, 1], under optimal, full, and no revelation, with the regimes A,C / B,C / A,B separated at p = 1/√2 and p = 4/5.]

Figure 1: The sender and receiver's expected utilities in the school–employer example

In fact, the school's expected utility strictly increases with the interview precision for p ∈ (1/√2, 4/5), where the optimal grading policy targets the positive employer. As the interview precision increases, the positive employer becomes more positive, so it becomes easier for the school to persuade the employer.

Moreover, the employer's expected utility drops down to 0 as the interview precision exceeds 1/√2. At p around 1/√2, neither the positive nor the negative employer would hire if the grading policy were completely uninformative. For p slightly above 1/√2, the optimal grading policy targets the positive employer and thus extracts all rent from the positive employer. In contrast, for p slightly below 1/√2, the optimal grading policy targets the negative employer, and thus leaves some rent to the positive employer.
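The two regime boundaries can be derived in one line from the marginal utility-price ratios (a short derivation of my own, consistent with the thresholds above): the A and B ratios cross where 1/p = 2p, and grade B alone can absorb the prior only while PrΦ(B) = 0.2/(1 − p) ≤ 1:

$$\frac{1}{p} = 2p \iff p = \frac{1}{\sqrt{2}}, \qquad \frac{0.2}{1-p} \le 1 \iff p \le \frac{4}{5}.$$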

3 Model

3.1 Setup

Consider a communication game between a sender and a receiver. The sender chooses an information disclosure mechanism (described below) and the receiver takes one of two actions: to act (a = 1) or not to act (a = 0).

The set of receiver's types is R = [r̲, r̄], and the set of sender's types is S = [s̲, s̄]. The pair (r, s) has some joint prior distribution. For this joint distribution, the marginal distribution F(s) of s and the conditional distribution G(r|s) of r given s admit strictly positive densities f(s) and g(r|s) that are continuous in s and continuously differentiable in r. The sender and receiver's utilities from a = 0 are normalized to zero. The sender and receiver's utilities from a = 1 are v(r, s) and u(r, s), respectively, where the functions v and u are continuous in s and continuously differentiable in r.

Before s is realized, the sender chooses a mechanism that sends a message m ∈ R to the receiver as a (stochastic) function of the sender's type s. Specifically, the sender chooses a joint distribution Φ(m, s) of m and s such that the marginal distribution of s under Φ equals the prior marginal distribution F.

The timing of the communication game is as follows. First, the sender publicly chooses a mechanism Φ. Second, a triple (m, s, r) is drawn according to distributions Φ and G. Third, the receiver observes (m, r) and takes an action a. Finally, the sender and receiver's utilities are realized.

Let PΦ(s|m) denote the distribution of s given m under Φ. The distribution of s given m and r is then

$$P_\Phi(s \mid m, r) = \frac{\int_{\underline{s}}^{s} g(r \mid \tilde{s})\, dP_\Phi(\tilde{s} \mid m)}{\int_{\underline{s}}^{\overline{s}} g(r \mid \tilde{s})\, dP_\Phi(\tilde{s} \mid m)}.$$

The receiver's expected utility from a = 1 given m and r is

$$\int_{\underline{s}}^{\overline{s}} u(r, s)\, dP_\Phi(s \mid m, r) = \frac{\int_{\underline{s}}^{\overline{s}} u(r, s)\, g(r \mid s)\, dP_\Phi(s \mid m)}{\int_{\underline{s}}^{\overline{s}} g(r \mid s)\, dP_\Phi(s \mid m)}.$$

Therefore, the receiver strictly prefers to act if ∫_S ũ(r, s) dPΦ(s|m) > 0 and strictly prefers not to act if ∫_S ũ(r, s) dPΦ(s|m) < 0, where ũ(r, s) ≡ u(r, s) g(r|s).

I impose a single-crossing assumption which ensures that each message of a mechanism induces the receiver to act if and only if his type exceeds a threshold type.

Assumption 1 (Single crossing) For each distribution Q on S, there exists rQ ∈ R such that ∫_S ũ(r, s) dQ(s) ⋛ 0 if r ⋛ rQ. Moreover, there exists a strictly decreasing function r∗ that satisfies u(r∗(s), s) = 0 for all s ∈ S.

A message m of a mechanism Φ induces a distribution Q of s. By Assumption 1, after observing m, the receiver of type rQ is indifferent between the two actions, and the receiver of type r > rQ strictly prefers to act. Without loss of generality, I restrict attention to mechanisms Φ such that each message m of Φ induces the receiver to act if and only if r ≥ m.5 Therefore, the set of feasible messages is the image R∗ ≡ r∗(S) of S under r∗, and the sender's expected utility from message m ∈ R∗ is

$$V(m, s) \equiv \int_{m}^{\overline{r}} v(r, s)\, g(r \mid s)\, dr.$$

Remark 1 Assumption 1 is stronger than a standard single-crossing assumption, which requires that, for each distribution Q on S, the inequality ∫_S ũ(r1, s) dQ(s) ≥ (>) 0 implies ∫_S ũ(r2, s) dQ(s) ≥ (>) 0 whenever r2 > r1. This standard single-crossing assumption holds if and only if, for all s1, s2 ∈ S, the functions ũ(r, s1) and ũ(r, s2) of r ∈ R satisfy signed-ratio monotonicity (Theorem 1 of Quah and Strulovici 2012). In particular, it holds if u(r, s) increases with (r, s) and the types (r, s) are affiliated.

Section 4.3 imposes a stronger linearity assumption which ensures that the sender's expected utility depends only on the sender's expected type, EΦ[s|m].

Assumption 2 (Linearity) For all (r, s) ∈ R × S, u(r, s) = s − r, v(r, s) = v(r), G(r|s) = G(r), and S ⊂ R.6

Under Assumption 2, the receiver acts if and only if r ≤ EΦ[s|m]. Therefore, a message m of a mechanism Φ satisfies m = EΦ[s|m], and the sender's expected utility from a message m is

$$V(m) \equiv \int_{\underline{r}}^{m} v(r)\, g(r)\, dr,$$

which depends only on EΦ[s|m].

Remark 2 Instead of Assumption 2, I could directly assume that the sender's expected utility V depends only on EΦ[s|m]. Kamenica and Gentzkow (2011) refer to this assumption as "Sender's payoff depends only on the expected state". This assumption may hold even if the receiver has more than two actions. In particular, it holds if the set of actions is compact, the sender and receiver's types are independent, and the sender and receiver's utility functions are linear in the sender's type and continuous in the receiver's type and action.

5 Although type r = m is indifferent between the two actions, I assume that he acts. This assumption is innocuous because, for any r ∈ R, the receiver has type r with probability 0, since G admits a density.

6 Notice that if r is replaced with −r in Assumption 2, then higher types of the receiver are more willing to act, and Assumption 1 holds. But exposition is easier without this replacement. Notice also that Assumption 2 requires r to be independent of s. In the context of the school–employer example, s may correspond to the student's type privately known by the school, and r to the opportunity cost from hiring privately known by the employer.
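As a concrete illustration of Assumption 2 (a numerical example of my own, not from the paper), suppose the receiver's type r is uniform on [0, 1] and v(r) = 1 for all r. Then

$$V(m) = \int_{0}^{m} v(r)\, g(r)\, dr = \int_{0}^{m} 1\, dr = m = \Pr(r \le m),$$

so the sender's expected utility from a message m is exactly the probability that the receiver acts, and it depends on the message only through the posterior mean m = EΦ[s|m].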


3.2 Equivalent Alternative Model

Consider an alternative model where an uninformed receiver takes an action r from the set R = [r̲, r̄]. If the receiver takes action r and the sender's type is s, then the sender and receiver's utilities are V(r, s) and U(r, s), where V and U are continuous in s and twice continuously differentiable in r. The set of sender's types remains S = [s̲, s̄], and the prior distribution of s remains F. Parallel to Assumption 1, I impose an assumption which ensures that, for each message of a mechanism, the receiver's utility is single-peaked in his action.

Assumption 1′ (Single crossing) For each distribution Q on S, there exists rQ ∈ R such that ∫_S [−∂U(r, s)/∂r] dQ(s) ⋛ 0 if r ⋛ rQ. Moreover, there exists a strictly decreasing function r∗ that satisfies ∂U(r∗(s), s)/∂r = 0 for all s ∈ S.

It turns out that the sender's problem of choosing an optimal mechanism in this alternative model under Assumption 1′ is the same as in the original model from Section 3.1 under Assumption 1. Given v, u, and g from the original model, set V(r, s) = ∫_r^r̄ v(r̃, s) g(r̃|s) dr̃ and U(r, s) = ∫_r^r̄ u(r̃, s) g(r̃|s) dr̃.7 Notice that in both models, a message m under mechanism Φ induces some distribution Q of s. In the original model, Q induces the receiver to act if and only if r ≥ rQ, so the sender's utility is ∫_{rQ}^r̄ v(r̃, s) g(r̃|s) dr̃ = V(rQ, s); in the alternative model, Q induces the receiver to take action rQ, so the sender's utility is again V(rQ, s). The receiver's threshold type rQ in the original model is thus isomorphic to the receiver's optimal action rQ in the alternative model.

Similar to the equivalence between Assumptions 1 and 1′, the following assumption is equivalent to Assumption 2.

Assumption 2′ (Linearity) For all (r, s) ∈ R × S, U(r, s) = −(r − s)², V(r, s) = V(r), and S ⊂ R.

Under Assumption 2′, the receiver takes action r = EΦ[s|m]. Therefore, a message m of a mechanism satisfies m = EΦ[s|m], and the sender's expected utility from a message m is V(m), as in the original model. To sum up, the alternative model with Assumptions 1′ and 2′ is equivalent to the original model with Assumptions 1 and 2. Consequently, all the results in Section 4 hold verbatim in the alternative model.

7 Equivalently, given V and U from the alternative model, set v, u, and g such that v(r, s) g(r|s) = −∂V(r, s)/∂r and u(r, s) g(r|s) = −∂U(r, s)/∂r.


4 Optimal Mechanisms

Section 4.1 sets up the sender's problem as a linear program and presents some basic duality results. Under Assumption 1, Section 4.2 characterizes necessary and sufficient conditions under which the full and no revelation mechanisms are optimal. Under Assumption 2, Section 4.3 characterizes necessary and sufficient conditions under which an interval revelation mechanism is optimal.

A mechanism is called an interval revelation mechanism with bounds sL and sH if sL, sH ∈ S, sL ≤ sH, and it generates one message for all s ∈ [s̲, sL), another message for all s ∈ (sH, s̄], and a different message for each s ∈ (sL, sH).8 In particular, the full revelation mechanism (denoted by Φfull) is an interval revelation mechanism with bounds sL = s̲ and sH = s̄; and the no revelation mechanism (denoted by Φno) is an interval revelation mechanism with bounds sL = sH = s̲ (or equivalently sL = sH = s̄).

4.1 Linear Programming Characterization

Under Assumption 1, an optimal mechanism is a distribution Φ that solves the following primal linear program:

$$\text{maximize} \quad \int_{R^* \times S} V(r, s)\, d\Phi(r, s) \tag{P}$$

$$\text{subject to} \quad \int_{R^* \times \tilde{S}} d\Phi(r, s) = \int_{\tilde{S}} f(s)\, ds \quad \text{for any measurable set } \tilde{S} \subset S, \tag{P1}$$

$$\int_{\tilde{R} \times S} \tilde{u}(r, s)\, d\Phi(r, s) = 0 \quad \text{for any measurable set } \tilde{R} \subset R^*. \tag{P2}$$

The objective function is the sender's expected utility under Φ. The first constraint (P1) is the feasibility requirement that the marginal distribution of s under Φ is F. The second constraint (P2) is the consistency requirement that message m = r makes the receiver r indifferent between the two actions.

The dual problem is to find bounded measurable functions η and ν that

$$\text{minimize} \quad \int_{S} \eta(s)\, f(s)\, ds \tag{D}$$

$$\text{subject to} \quad \eta(s) + \tilde{u}(r, s)\, \nu(r) \ge V(r, s) \quad \text{for all } (r, s) \in R^* \times S. \tag{D1}$$

The variables η(s) and ν(r) are multipliers for constraints (P1) and (P2).

8 Since the distribution F of s admits a density, it is not necessary to specify what message is sent for a finite number of types, such as sL and sH.


Say that Φ is feasible for (P) if it is a distribution that satisfies (P1) and (P2). Similarly, say that η and ν are feasible for (D) if they are bounded measurable functions that satisfy (D1). Feasible Φ and (η, ν) that solve their respective problems (P) and (D) are called optimal solutions. Lemma 1 gives sufficient conditions under which candidate feasible solutions Φ and (η, ν) are optimal:

Lemma 1 Suppose Assumption 1 holds. If Φ is feasible for (P), (η, ν) is feasible for (D), and

$$\int_{R^* \times S} \big( \eta(s) + \tilde{u}(r, s)\, \nu(r) - V(r, s) \big)\, d\Phi(r, s) = 0, \tag{C}$$

then Φ and (η, ν) are optimal solutions, and the values of (P) and (D) are the same.

Lemma 2 establishes the existence of optimal solutions and shows that the complementarity condition (C) is not only sufficient but also necessary for optimality of Φ and (η, ν):

Lemma 2 Suppose Assumption 1 holds. There exists an optimal mechanism Φ, an optimal solution to the primal problem (P). There exists an optimal solution to the dual problem (D) in which η is continuous. Moreover, (C) holds for these optimal Φ and (η, ν).

Lemmas 1 and 2 yield necessary and sufficient conditions under which a given mechanism is optimal. Specifically, a candidate mechanism Φ is optimal if and only if there exists (η, ν) that satisfies the feasibility condition (D1) and the complementarity condition (C). Given (D1), condition (C) holds if and only if η(s) = V(r, s) − ũ(r, s) ν(r) for all (r, s) in the support of Φ. For this η(s), we can find conditions on the primitives ũ, V, and F that are equivalent to the existence of a function ν(r) satisfying (D1).9 Lemma 1 (Lemma 2) implies that these conditions are sufficient (necessary) for Φ to be optimal.

9 This step is known as Fourier–Motzkin elimination of ν(r).
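To see the primal program (P) in action, here is a discretized sketch of my own (not from the paper) under Assumption 2, where ũ(m, s) is proportional to s − m, so constraint (P2) reduces to E[s | m] = m. The grid size, the uniform prior, and the convex utility V(m) = m² are assumptions of the sketch; because V is convex, the LP value should coincide with the full revelation value, in line with Corollary 2 in Section 4.3.

```python
# Discretized version of the primal program (P) under Assumption 2 (illustrative).
# phi[i, j] = Pr(message m_i, sender type s_j), flattened row-major.
import numpy as np
from scipy.optimize import linprog

n = 41
s = np.linspace(0.0, 1.0, n)      # sender types with a uniform prior on [0, 1]
m = s.copy()                      # candidate messages; (P2) forces m = E[s|m]
f = np.full(n, 1.0 / n)           # prior probability mass
V = m ** 2                        # sender's utility from message m (convex)

c = -np.repeat(V, n)              # maximize sum_ij V(m_i) phi_ij (linprog minimizes)
A_eq, b_eq = [], []
for j in range(n):                # (P1): sum_i phi_ij = f(s_j)
    row = np.zeros(n * n)
    row[j::n] = 1.0
    A_eq.append(row); b_eq.append(f[j])
for i in range(n):                # (P2): sum_j (s_j - m_i) phi_ij = 0
    row = np.zeros(n * n)
    row[i * n:(i + 1) * n] = s - m[i]
    A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.vstack(A_eq), b_eq=b_eq, bounds=(0.0, None))
print("LP value:", -res.fun, "  full revelation value:", float(V @ f))
```

The two printed values agree, which is the discrete counterpart of part 2 of Corollary 2 below.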

4.2 Full and No Revelation under Single Crossing

Besides their simplicity and widespread use, the full revelation mechanism Φfull and the no revelation mechanism Φno satisfy two important properties under Assumption 1. First, Φfull and Φno are extremal in the following strong sense: (i) Φfull uniquely maximizes the receiver's expected utility, and (ii) Φno uniquely minimizes the receiver's expected utility (see Proposition 6 in the Appendix). Second, if the sender privately knew s, did not have commitment power, and always preferred to act in that v(r, s) > 0 for all (r, s), then: (i) Φfull would be the unique equilibrium outcome of a persuasion game (Milgrom 1981) in which the sender can withhold information but cannot misrepresent it; and (ii) Φno would be the unique equilibrium outcome of a cheap-talk game (Crawford and Sobel 1982) in which the sender can say anything.10

The first main result derives necessary and sufficient conditions under which Φfull and Φno are optimal. Note that Φfull generates message r∗(s) for each s ∈ S, and Φno generates the same message rno for all s ∈ S, where rno is the unique r that solves ∫_S ũ(r, s) f(s) ds = 0. Let sno be the unique s that solves u(rno, s) = 0.

Proposition 1 Suppose Assumption 1 holds.

1. All mechanisms are optimal if and only if, for all s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)),

$$\frac{V(r^*(s_2), s_2) - V(r, s_2)}{\tilde{u}(r, s_2)} = \frac{V(r^*(s_1), s_1) - V(r, s_1)}{\tilde{u}(r, s_1)}. \tag{1}$$

2. The full revelation mechanism is optimal if and only if, for all s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)),

$$\frac{V(r^*(s_2), s_2) - V(r, s_2)}{\tilde{u}(r, s_2)} \ge \frac{V(r^*(s_1), s_1) - V(r, s_1)}{\tilde{u}(r, s_1)}. \tag{2}$$

3. The no revelation mechanism is optimal if and only if, for all s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)),

$$\frac{V(r, s_2) - V(r_{no}, s_2)}{\tilde{u}(r, s_2)} + \frac{\tilde{u}(r_{no}, s_2)}{\tilde{u}(r, s_2)} \cdot \frac{\partial V(r_{no}, s_{no})/\partial r}{\partial \tilde{u}(r_{no}, s_{no})/\partial r} \le \frac{V(r, s_1) - V(r_{no}, s_1)}{\tilde{u}(r, s_1)} + \frac{\tilde{u}(r_{no}, s_1)}{\tilde{u}(r, s_1)} \cdot \frac{\partial V(r_{no}, s_{no})/\partial r}{\partial \tilde{u}(r_{no}, s_{no})/\partial r}. \tag{3}$$

10 In the persuasion game, if the sender sent the same message r for two or more different s in equilibrium, then there would exist s̃ such that the sender s̃ sent r but u(r, s̃) > 0, which leads to a contradiction because the sender s̃ would strictly prefer to reveal her type. In the cheap-talk game, if the sender sent two different messages r1 and r2 in equilibrium, then she would strictly prefer to send min{r1, r2} regardless of s, which leads to a contradiction.

To verify optimality of a mechanism Φ, one needs to check that no deviation from Φ to any feasible mechanism increases the sender's expected utility, which requires many checks. It turns out that Proposition 1 can be interpreted as follows: for optimality of Φfull and Φno, it is necessary and sufficient to check that certain simple deviations from these mechanisms do not increase the sender's expected utility. I now define these deviations.

For any s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)), say that the sender prefers to reveal s1 and s2 rather than pool them at r if, for the prior distribution that assigns probabilities p1 and p2 = 1 − p1 to states s1 and s2 that make type r indifferent between the two actions, $\sum_{i=1}^{2} p_i \tilde{u}(r, s_i) = 0$, the sender's expected utility is higher under the full revelation mechanism than under the no revelation mechanism,

$$\sum_{i=1}^{2} p_i V(r^*(s_i), s_i) \ge \sum_{i=1}^{2} p_i V(r, s_i). \tag{4}$$

Similarly, say that the sender is indifferent between revealing s1 and s2 and pooling them at r if (4) holds with equality.

For any s1, s2, s3 ∈ S and r ∈ R such that s2 > s1, r ∈ (r∗(s2), r∗(s1)), and sgn(rno − r) = sgn(r∗(s3) − rno), say that the sender prefers to pool s1, s2, and s3 at rno rather than pool s1 and s2 at r and reveal s3 if, for the prior distribution that assigns probabilities p1, p2, and p3 = 1 − p1 − p2 to states s1, s2, and s3 that make type rno indifferent between the two actions, $\sum_{i=1}^{3} p_i \tilde{u}(r_{no}, s_i) = 0$, and make type r indifferent between the two actions given that s ≠ s3, $\sum_{i=1}^{2} p_i \tilde{u}(r, s_i) = 0$, the sender's expected utility is higher under the no revelation mechanism than under the mechanism that generates message r for s1 and s2 and message r∗(s3) for s3,

$$\sum_{i=1}^{3} p_i V(r_{no}, s_i) \ge \sum_{i=1}^{2} p_i V(r, s_i) + p_3 V(r^*(s_3), s_3). \tag{5}$$

Finally, say that the sender prefers to pool s1, s2, and s3 at rno rather than pool s1 and s2 at r and reveal s3 for s3 approaching sno if (5) holds in the limit as s3 → sno.

Using these definitions, I can now restate Proposition 1 as follows.

Corollary 1 Suppose Assumption 1 holds.

1. All mechanisms are optimal if and only if, for all s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)), the sender is indifferent between revealing s1 and s2 and pooling them at r.

2. The full revelation mechanism is optimal if and only if, for all s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)), the sender prefers to reveal s1 and s2 rather than pool them at r.

3. The no revelation mechanism is optimal if and only if, for all s1, s2 ∈ S and r ∈ R such that s2 > s1 and r ∈ (r∗(s2), r∗(s1)), the sender prefers to pool s1, s2, and s3 at rno rather than pool s1 and s2 at r and reveal s3 for s3 approaching sno.


Conditions (1)–(3) in Proposition 1 are weaker than the corresponding conditions (i)–(iii) below from Kamenica and Gentzkow (2011).11 Following the notation of Assumption 1, suppose a message m of a mechanism Φ generates posterior distribution Q(s) = PΦ(s|m) of s, and let rQ be the receiver's type who is indifferent between the two actions. The sender's indirect expected utility under Q is V̂(Q) = ∫_S V(rQ, s) dQ(s). Kamenica and Gentzkow (2011) show the following: (i) All mechanisms are optimal if V̂(Q) is linear in Q, so that the sender is indifferent between separating posteriors Q1 and Q2 and pooling them at αQ1 + (1 − α)Q2; (ii) Full revelation Φfull is optimal if V̂(Q) is convex in Q, so that the sender prefers to separate Q1 and Q2 rather than pool them at αQ1 + (1 − α)Q2; (iii) No revelation Φno is optimal if the concave closure of V̂ evaluated at the prior F is equal to V̂(F),12 so that, for Q and QF whose mean is arbitrarily close to EF[s], the sender prefers to pool Q and QF at F rather than separate them.

Proposition 1 shows that it is sufficient to check (i) and (ii) only for degenerate distributions Q1 and Q2 whose supports are s1 and s2, respectively, and to check (iii) only for discrete Q whose support is {s1, s2} and degenerate QF whose support is s3, where s3 is arbitrarily close to sno.

Conditions (1)–(3) are necessary because, for optimality of a candidate mechanism, one needs to check all deviations from the mechanism, including those described in (1)–(3). The proof of sufficiency of conditions (1)–(3) relies on Lemmas 1 and 2, but we can build the intuition by focusing on decomposed mechanisms in which each message is sent by at most two types of the sender. To justify this focus, I construct a decomposed version of Φno for the case in which u(r, s) is linear in s, s is uniformly distributed on S = [−1, 1], and r is independent of s. Consider a mechanism that sends a message m0 for s = 0 and a different message me for each pair {−e, e} of S, where e ∈ (0, 1]. Noting that E[s|me] = E[s] = 0 for all e ∈ [0, 1] implies that this (decomposed) mechanism induces the same mapping from (r, s) to the receiver's action as Φno. This argument can be generalized to show that any mechanism can be decomposed in this way.

I now discuss the intuition for the sufficiency conditions of Proposition 1, starting with part 1. Consider any non-trivial message of a decomposed mechanism. This message is sent by some two types of the sender. By (1), the sender is indifferent between revealing these two types and pooling them, so the sender is indifferent between the original mechanism and the mechanism that differs only in that it reveals these two types. Continuously modifying the original mechanism for each message until all types are revealed implies that the sender is indifferent between the original mechanism and Φfull, so part 1 follows.

I now turn to part 2 of Proposition 1. Again, consider any non-trivial message of a decomposed mechanism. This message is sent by some two types of the sender. By (2), the sender prefers to reveal these two types rather than pool them, so the sender prefers the mechanism that differs from the original one only in that it reveals these two types. Continuously modifying the original mechanism for each message until all types are revealed implies that the sender prefers Φfull to the original mechanism, so part 2 follows.

Finally, I provide the intuition for a weaker version of part 3 of Proposition 1. Namely, if the sender prefers to pool s1, s1′, s2, s2′ at rno rather than pool s1, s1′ at r1 and pool s2, s2′ at r2 for all feasible s1, s1′, s2, s2′, r1, r2, then Φno is optimal. Consider two non-trivial messages of a decomposed mechanism. Suppose that the first message m1 is sent by s1 and s1′ and makes the receiver r1 ≤ rno indifferent. Similarly, suppose that the second message m2 is sent by s2 and s2′ and makes the receiver r2 ≥ rno indifferent. The sender prefers the mechanism that differs only in that it sends one message that makes the receiver rno indifferent instead of sending both m1 and m2. Continuously applying this argument to pairs of messages until all types are pooled implies that the sender prefers Φno to the original mechanism, so this weaker version of part 3 follows.

11 Rayo and Segal (2010) study a special case of my model with u(r, s) = s − r, v(r, s) = v(s), and g(r|s) = (r̄ − r̲)⁻¹ for all (r, s) ∈ R × S. Proposition 1 can be used to establish their key lemma (Lemma 1), which shows that pooling two types s2 > s1 yields a higher (lower) expected utility to the sender than separating them if v(s2) ≤ v(s1) (if v(s2) ≥ v(s1)). If, in addition, v(s) = bs + c for all s, then Assumption 2 holds and Proposition 1 implies: (i) all mechanisms are optimal if b = 0, (ii) Φfull is optimal if b > 0, and (iii) Φno is optimal if b < 0 (see Corollary 2 below).

12 Intuitively, a concave closure of a function (defined on a convex set) is the smallest concave function that is everywhere greater than the original function.


4.3 Interval Revelation under Linearity

Under Assumption 2, Proposition 2 simplifies the sender's problem of finding an optimal mechanism to a problem of finding an optimal distribution of messages.

Proposition 2 Suppose Assumption 2 holds, and let H denote the marginal distribution of m under the optimal mechanism. Then H

$$\text{maximizes} \quad \int_{\underline{s}}^{\overline{s}} V(m)\, dH(m) \quad \text{subject to: } F \text{ is a mean-preserving spread of } H. \tag{6}$$
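Before interpreting (6), here is a quick numerical sanity check of the feasibility constraint (a sketch of my own; the uniform prior and the cutoff sH = 0.6 are arbitrary choices). It verifies that the message distribution H induced by revealing s < sH and pooling s > sH at rH = EF[s | s > sH] is a mean-preserving contraction of F, using the integrated-CDF criterion for second-order stochastic dominance.

```python
# Check that the uniform prior F on [0, 1] is a mean-preserving spread of the
# message distribution H induced by pooling the types above s_H (illustrative).
import numpy as np

sH = 0.6
rH = (sH + 1.0) / 2.0                          # pooled message E[s | s > sH]
x = np.linspace(0.0, 1.0, 100001)
F = x                                          # uniform CDF
H = np.where(x < sH, x, np.where(x < rH, sH, 1.0))

dx = x[1] - x[0]
IF, IH = np.cumsum(F) * dx, np.cumsum(H) * dx  # integrated CDFs
print("equal means:", np.isclose(IF[-1], IH[-1], atol=1e-4))
print("F is a mean-preserving spread of H:", bool(np.all(IF >= IH - 1e-9)))
```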

The objective function in (6) represents the sender's expected utility, and the constraint in (6) describes the set of feasible distributions of m.13 The intuition for the constraint is as follows. If F is a mean-preserving spread of H, then F is more informative about the sender's type than H in the sense of Blackwell (1953). A mechanism can garble the sender's information to achieve any distribution H of m that is less informative than the prior F. Conversely, because a mechanism can only garble the sender's information, F must be a mean-preserving spread of H for any feasible mechanism.

By Proposition 2, the curvature of V determines the form of the optimal mechanism.

Corollary 2 Suppose Assumption 2 holds and let rno = EF[s].

1. All mechanisms are optimal if and only if V is linear on S.

2. Φfull is optimal if and only if V is convex on S.

3. Φno is optimal if and only if V(r) ≤ V(rno) + V′(rno)(r − rno) for all r ∈ S.

All three parts of Corollary 2 are straightforward implications of (6).14 First, if V is linear, then the sender is risk neutral, so all mechanisms are equivalent. Second, if V is convex, then the sender is risk loving, so the full revelation mechanism is optimal. Third, if V is concave, then the sender is risk averse, so the no revelation mechanism is optimal. More precisely, part 3 requires that the concave closure V̄ of V on S equals V at rno.

The second main result derives necessary and sufficient conditions under which an interval revelation mechanism with bounds sL and sH is optimal. This mechanism generates message rL = EF[s | s < sL] for all s ∈ [s̲, sL), message rH = EF[s | s > sH] for all s ∈ (sH, s̄], and message s for each s ∈ (sL, sH). In the special case sL = sH, the revelation interval (sL, sH) is empty, so the mechanism generates only two messages, rL and rH.

Proposition 3 Suppose Assumption 2 holds.

1. An interval revelation mechanism with bounds sL < sH is optimal if and only if

V(r) ≤ V(rL) + V′(rL)(r − rL) for all r ∈ [s̲, sL], with equality at sL,
V(r) ≤ V(rH) + V′(rH)(r − rH) for all r ∈ [sH, s̄], with equality at sH,
V(r) is convex for all r ∈ (sL, sH).

2. An interval revelation mechanism with bounds sL = sH is optimal if and only if

V(r) ≤ V(rL) + V′(rL)(r − rL) for all r ∈ [s̲, sL],
V(r) ≤ V(rH) + V′(rH)(r − rH) for all r ∈ [sH, s̄],
V(rL) + V′(rL)(sL − rL) = V(rH) + V′(rH)(sH − rH),
V′(rL) ≤ V′(rH).

I now discuss implications of Proposition 3 for the case when the derivative V′(r) of the sender's expected utility is either unimodal or bimodal. The derivative V′ is unimodal if it has a unique local (and therefore global) maximum at rm ∈ R; the maximum point rm is called a mode. Consider the case of unimodal V′ in which rm ∈ S and rno < rt, where rt is the point of tangency illustrated in Figure 2(a).15 If F were to assign strictly positive probabilities only to s̲ and s̄, then the optimal mechanism would send the two messages s̲ and rt, and the sender's expected utility would achieve the concave closure V̄(rno). This mechanism, however, is not feasible when F admits a density, because s is equal to s̲ with probability 0. By part 1 of Proposition 3, sL = s̲ and sH ∈ (s̲, s̄), so the optimal mechanism reveals s for s < sH and sends the same message rH for all s > sH, where the bound sH is determined by the condition that the sender is indifferent between revealing sH and pooling it with rH.16

13 Kamenica and Gentzkow (2011) note that all feasible H have the same mean as F, but that not all such H are feasible. Proposition 2 shows that H is feasible if and only if F is a mean-preserving spread of H. Using Proposition 2, Gentzkow and Kamenica (2016) provide an alternative characterization of feasible mechanisms.

14 Rayo and Segal (2010) and Kamenica and Gentzkow (2011) also obtain versions of Corollary 2.

15 In the remaining cases of unimodal V′, either Φfull or Φno is optimal by Corollary 2.

16 In an extreme case, when V is a step function with V(r) = 0 for r < rt and V(r) = 1 for r ≥ rt, the optimal mechanism reveals s for s < sH and sends the same message rt for s > sH, where sH is the unique solution to EF[s | s > sH] = rt. This is the case of an uninformed receiver (r = rt with probability 1) studied in Kolotilin (2015).
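The tangency geometry in Figure 2 can be explored numerically. The sketch below (my own illustration; the S-shaped logistic utility is an assumption) computes the concave closure V̄ on a grid: for this V, with a unimodal derivative as in Figure 2(a), the closure follows a chord from s̲ up to a tangency point rt and coincides with V thereafter.

```python
# Sketch: concave closure of an S-shaped sender utility V on S = [0, 1].
import numpy as np

def concave_closure(x, y):
    """Upper concave envelope of the points (x_i, y_i), x strictly increasing."""
    hull = []                         # indices of points kept on the envelope
    for i in range(len(x)):
        # pop the last hull point while it lies on or below the chord to point i
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (y[i2] - y[i1]) * (x[i] - x[i1]) <= (y[i] - y[i1]) * (x[i2] - x[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

r = np.linspace(0.0, 1.0, 1001)
V = 1.0 / (1.0 + np.exp(-12.0 * (r - 0.6)))    # convex then concave: V' is unimodal
Vbar = concave_closure(r, V)
rt = r[Vbar > V + 1e-9].max()                  # upper end of the chord
print(f"closure lies strictly above V for r below rt = {rt:.3f}")
```

With a two-point prior on {s̲, s̄}, sending the messages s̲ and rt would attain V̄(rno), as the text notes; when F admits a density, that mechanism is infeasible and part 1 of Proposition 3 instead pins down the interior bound sH.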


[Figure 2 here: three panels (a), (b), and (c) plotting the sender's utility V and its concave closure V̄ (top) and the derivative V′ (bottom), with the tangency points rt and rt′ and the modes rm and rm′ marked on the r axis.]

Figure 2: Sender's utility V and concave closure V̄ when the derivative V′ is unimodal (a) and bimodal (b, c)

The derivative V′ is bimodal if it has two local maxima at rm, rm′ ∈ R. If rm < s̲ < rt′ < rno < s̄ < rm′, where rt′ is the point of tangency illustrated in Figure 2(b), then, by part 1 of Proposition 3, sH = s̄ and sL ∈ (s̲, s̄), so the optimal mechanism reveals s for s > sL and sends the same message rL for all s < sL. If rm < s̲ < rt′ < rno < rm′ < rt < s̄, where rt and rt′ are the points of tangency illustrated in Figure 2(c), then the optimal mechanism takes one of the following three forms. The first two forms correspond to the interval revelation mechanisms (with interior bounds sL, sH ∈ (s̲, s̄)) from parts 1 and 2 of Proposition 3. The third form corresponds to the mechanism that sends the two messages rt and rt′, so that the sender's expected utility achieves the concave closure V̄(rno).17

17 This mechanism does not generally belong to the class of interval revelation mechanisms, so this case is not considered in Proposition 3.


5 Comparative Statics

This section studies the value of the receiver's information. I depart from the assumptions of Section 3 and instead impose the following three assumptions.

Assumption 3 u and v are increasing in s.

Assumption 4 v(r, s) = v(s) and u(r, s) = u(s) for all (r, s) ∈ R × S.

Assumption 5 v(s) > 0, u(s) > 0, and EF[u(s) | s : v(s) > 0] < 0.

Assumption 3 requires that the sender and receiver are more willing to act for higher types s. Assumption 4 requires that the receiver's type affects the receiver's belief but does not directly affect the sender and receiver's utilities. Assumption 5 requires that the sender can influence the receiver's action but cannot achieve her first-best outcome if the receiver is uninformed. Assumption 3 is mainly for ease of presentation; Assumption 4 is crucial for Proposition 4; and Assumption 5 is crucial for Proposition 5.

Let the set of receiver's types R be finite, so the receiver's information structure G can be described by conditional probabilities q(r|s) of r given s, where q(r|s) is a measurable function of s for each r ∈ R. For each G, the sender and receiver's expected utilities under the optimal mechanism are denoted by VG and UG, respectively. I use Blackwell (1953)'s ordering of information structures: G is more informative than G′ if there exists a stochastic matrix D such that q′(r′|s) = Σ_{r∈R} D(r′|r) q(r|s) for all (r′, s) ∈ R′ × S. An information structure G is public if q(r|s) is either 0 or 1 for all (r, s) ∈ R × S; that is, the receiver's type is deterministically determined by the sender's type if G is public. Let Gfull and Gno represent the fully informative and completely uninformative (public) information structures. That is, Rfull = S and qfull(s|s) = 1 for all s ∈ S; Rno = {rno} and qno(rno|s) = 1 for all s ∈ S. Although Rfull is not finite, it is clear that any information structure G is less informative than Gfull and more informative than Gno.

Before discussing non-monotone comparative statics, I present a benchmark result (also found in Kolotilin 2015) that provides sufficient conditions for monotone comparative statics.18 The receiver's expected utility increases and the sender's expected utility decreases with the precision of the receiver's private information if this precision is either very low or very high. Moreover, this monotonicity holds for all levels of precision if the receiver's information is public.

18 Relatedly, Bergemann and Morris (2016a) show that the set of implementable outcomes decreases as the information structure becomes more informative, which implies the sender's part of Proposition 4.
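For binary symmetric information structures, Blackwell's ordering reduces to a one-parameter garbling. The sketch below (my own illustration; the precisions 0.9 and 0.7 are arbitrary choices) constructs the stochastic matrix D in the definition above and verifies q′(r′|s) = Σ_r D(r′|r) q(r|s).

```python
# Garbling check: a precision-0.7 binary structure is a Blackwell garbling of a
# precision-0.9 one (illustrative).
import numpy as np

p_hi, p_lo = 0.9, 0.7
lam = (p_lo - (1.0 - p_hi)) / (2.0 * p_hi - 1.0)  # solves p_lo = lam*p_hi + (1-lam)*(1-p_hi)
D = np.array([[lam, 1.0 - lam],
              [1.0 - lam, lam]])                  # stochastic matrix D(r'|r)
Q_hi = np.array([[p_hi, 1.0 - p_hi],
                 [1.0 - p_hi, p_hi]])             # q(r|s) for the more informative G
Q_lo = np.array([[p_lo, 1.0 - p_lo],
                 [1.0 - p_lo, p_lo]])             # q'(r'|s) for the less informative G'
print(bool(np.allclose(D @ Q_hi, Q_lo)))          # True: G' is a garbling of G
```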


Proposition 4 Suppose Assumptions 3–5 hold and the distribution F admits a strictly positive density f on S.

1. For any information structure G, we have UGfull ≥ UG ≥ UGno and VGno ≥ VG ≥ VGfull.

2. For any two public information structures G and G′ such that G is more informative than G′, we have UG ≥ UG′ and VG′ ≥ VG.

Intuitively, if G and G′ are public, and G is more informative than G′, then, under G′, the sender can first make the public information more precise, from G′ to G, and then implement any mechanism Φ available under G, implying VG′ ≥ VG. To get the intuition for the receiver's part of Proposition 4, suppose that the sender's utility is type-independent, so that v(s) = 1 for all s. In this case, under public G, the optimal mechanism and the no revelation mechanism give the same expected utility to the receiver, implying UG ≥ UG′.

In light of Proposition 4, the utility non-monotonicity presented in Section 2 can only arise when the receiver's information is private and its precision is intermediate. The next proposition shows that, if the sender's type can take only two values, it is always possible to increase the precision of the receiver's private information in such a way that the sender and receiver's expected utilities change non-monotonically.19

Proposition 5 Suppose Assumptions 3–5 hold and the support of F is {s̲, s̄}. There exist two binary information structures G and G′, such that G is more informative than G′, yet UG′ > UG and VG > VG′.

The intuition for Proposition 5 is similar to that in the school–employer example. Since the sender's type can take only two values, without loss of generality, I assume that r = Pr(s̄|r) for all r ∈ R. By Assumption 5, the sender wants to persuade the receiver to act, but the receiver prefers not to act if he has no information beyond the prior. By continuity, we can find the receiver's type r̄ > Pr(s̄) who would still prefer not to act under the no revelation mechanism. A binary information structure of the receiver with r̲ < r̄ becomes more informative if r̲ decreases and r̄ stays constant. As r̲ decreases, the probability of r̄ increases, because Pr(r̄) r̄ + Pr(r̲) r̲ = Pr(s̄), so it becomes relatively more attractive for the sender to target r̄ than r̲. There exists a critical value of r̲ at which the sender is exactly indifferent between which of the two types of the receiver to target. Above this value, the sender targets r̲, and the receiver's expected utility is strictly positive, because r̄ strictly prefers to act whenever r̲ acts. Below this value, the sender targets r̄, the receiver's expected utility is zero, and the sender's expected utility increases as the receiver's private information becomes more precise (r̲ decreases), because the probability of r̄ increases.

19 Bergemann and Morris (2016b) consider an example with v(s̲) = v(s̄) = 1, u(s̄) = 9/10, u(s̲) = −1, Pr(s̄) = 1/2, and a binary information structure with the restriction that q(r̄|s̄) = q(r̲|s̲) = p. They show that the set of implementable outcomes (and, thus, the sender's expected utility) decreases with the precision of the receiver's private information p. This monotonicity does not hold without the restriction that q(r̄|s̄) = q(r̲|s̲) = p. Since their example satisfies Assumptions 3–5, Proposition 5 implies that there exist two binary information structures G and G′, such that G is more informative than G′, yet the sender is strictly better off under G.
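The targeting logic can be traced numerically in the school–employer example, reusing the optimal_grading sketch from Section 2 (my own illustration). The employer-surplus formula below is my back-of-the-envelope calculation, not a formula from the paper: grade B leaves the hiring employer just indifferent and so contributes zero surplus, while grade A carries posterior p on a peach and is hired after either interview outcome, contributing 2p − 1 per unit of probability.

```python
# Sweep the interview precision p and trace both expected utilities under the
# optimal grading policy (illustrative; reuses optimal_grading from Section 2).
import numpy as np

for p in np.linspace(0.50, 1.00, 11):
    (prA, prB, prC), school_value = optimal_grading(p)
    employer_value = prA * (2.0 * p - 1.0)   # grade B yields zero employer surplus
    print(f"p = {p:.2f}:  E[v] = {school_value:.3f}   E[u] = {employer_value:.3f}")
```

The printout reproduces the non-monotonicity in Figure 1: the employer's surplus drops to zero once p crosses 1/√2 and the school switches to targeting the positive employer, while the school's expected utility rises with p on (1/√2, 4/5).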

Appendix: Proofs Proof of Lemma 1. The lemma can be proved by applying Theorem 2.1 of Anderson and Nash (1987) to my model. But, to make the paper self-contained, I prove this lemma here. Since η is bounded and measurable on set S, (P1) implies Z Z η (s) dΦ (r, s) . η (s) f (s) ds = R∗ ×S

S

Since ν is bounded and measurable on set R, (P2) implies Z u e (r, s) ν (r) dΦ (r, s) = 0. R∗ ×S

Summing up these two equalities gives Z Z η (s) f (s) ds = R∗ ×S

S

(η (s) + u e (r, s) ν (r)) dΦ (r, s) .

Integrating (D1) over R∗ × S gives Z Z V (r, s) dΦ (r, s) ≤ R∗ ×S

R∗ ×S

(η (s) + u e (r, s) ν (r)) dΦ (r, s) .

(7)

(8)

Suppose that (C) holds for some feasible (η, ν) and Φ. Conditions (7) and (8) yield Z Z V (r, s) dΦ (r, s) = η (s) f (s) ds. (9) R∗ ×S

S

e Conditions (7) and (8) imply Consider any other feasible Φ. Z Z e V (r, s) dΦ (r, s) ≤ η (s) f (s) ds. R∗ ×S

S

Combining this inequality with (9) gives Z Z e (r, s) ≤ V (r, s) dΦ R∗ ×S

R∗ ×S

22

V (r, s) dΦ (r, s) ,

showing that Φ is an optimal solution to the primal problem (P). An analogous argument proves that (η,ν) is an optimal solution to (D). Finally, (9) shows that the values of (P) and (D) are the same.

Proof of Lemma 2. The proof of this lemma is a modification of the proof of Theorem 5.2 in Anderson and Nash (1987), whose notation I closely follow.

Conventions. The primal variable Φ is in Mr(R∗×S), the space of finite signed measures on R∗×S with the total variation norm. The mechanism Φ is chosen from the positive closed convex cone P of finite positive measures on R∗×S. The dual constraint function V(r,s) is in C(R∗×S), the space of continuous measurable functions on R∗×S with the uniform norm. The dual variables (η,ν) are in L∞(S) × L∞(R∗), the space of bounded measurable functions with the uniform norm. The primal constraint function (f, θ) is in L1(S) × L1(R∗), the space of absolutely integrable functions with the 1-norm, where θ is a zero function on the right-hand side of (P2).

Optimal solution to (P). One feasible Φ for the primal problem (P) is the full revelation mechanism. The feasible set of the primal problem is bounded because the total variation of any probability measure Φ is equal to one. The constraint map in (P1) is continuous because it is a projection; the constraint map in (P2) is continuous because ũ is continuous. The space Mr is the dual of C by Corollary 14.15 of Aliprantis and Border (2006). Therefore, there exists an optimal solution Φ by Theorem 3.20 in Anderson and Nash (1987).

Optimal solution to (D). Since V is continuous on the compact set R∗×S, there exists a finite value V̄ = max_{r,s} V(r,s). The functions η(s) = V̄ and ν(r) = 0 are feasible for the dual problem, and the set of feasible (η,ν) can be bounded without affecting the value of the dual problem. The constraint map in (D1) is continuous because ũ is continuous. The space L∞ is the dual of L1 by Theorem 13.28 of Aliprantis and Border (2006). Therefore, there exists an optimal solution (η,ν) by Theorem 3.20 in Anderson and Nash (1987).

Equality (C) under optimal solutions. As can be seen from above, the dual problem has a finite value, and the functions η(s) = 2V̄ and ν(r) = 0 are in the interior of the constraint set (D1). Therefore, there is no duality gap by Theorem 3.13 in Anderson and Nash (1987).

Continuity of η. Observe that if (η,ν) is optimal, then (η∗,ν) is also optimal, where

η∗(s) = sup_{r∈R∗} {V(r,s) − ũ(r,s)ν(r)}.

Indeed, η∗ is feasible and η∗ ≤ η, because η satisfies (D1) for all r, so the objective in (D) is smaller under η∗. I now show that η∗ is continuous. Since R∗×S is compact, V and ũ are uniformly continuous. Thus, since ν is bounded, for any ε > 0 there exists δ > 0 such that

|(V(r,s) − ũ(r,s)ν(r)) − (V(r,s′) − ũ(r,s′)ν(r))| < ε   (10)

for all r ∈ R∗ and s, s′ ∈ S such that |s − s′| < δ. By the definition of η∗, for any s there exists r such that

η∗(s) < V(r,s) − ũ(r,s)ν(r) + ε.   (11)

Thus,

η∗(s′) ≥ V(r,s′) − ũ(r,s′)ν(r) > V(r,s) − ũ(r,s)ν(r) − ε > η∗(s) − 2ε,

where the first inequality holds by the definition of η∗, the second by (10), and the third by (11). Analogously, η∗(s) > η∗(s′) − 2ε, so |η∗(s) − η∗(s′)| < 2ε whenever |s − s′| < δ, implying that η∗ is continuous.
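Before turning to the propositions, the primal-dual pair (P)-(D) can be made concrete in a finite case. The following minimal Python sketch (my illustration, not part of the paper) discretizes the problem on a grid under the hypothetical specification ũ(r,s) = s − r (so r∗(s) = s), V(r,s) = r², and a uniform prior; it solves the discretized (P) with an off-the-shelf LP solver and confirms that the dual value ∫ η(s) f(s) ds, recovered from the equality multipliers, equals the primal value, as Lemmas 1 and 2 assert.

    import numpy as np
    from scipy.optimize import linprog

    # Discretized persuasion LP (hypothetical specification, for illustration):
    # common grid for types s and messages r, uniform prior f,
    # obedience utility u~(r, s) = s - r, sender's utility V(r, s) = r**2.
    n = 21
    grid = np.linspace(0.0, 1.0, n)
    f = np.full(n, 1.0 / n)
    V = np.repeat(grid ** 2, n)      # V at (r_i, s_j), flattened as i*n + j

    A_eq, b_eq = [], []
    for j in range(n):               # (P1): sum_r Phi(r, s_j) = f(s_j)
        row = np.zeros(n * n); row[j::n] = 1.0
        A_eq.append(row); b_eq.append(f[j])
    for i in range(n):               # (P2): sum_s u~(r_i, s) Phi(r_i, s) = 0
        row = np.zeros(n * n); row[i * n:(i + 1) * n] = grid - grid[i]
        A_eq.append(row); b_eq.append(0.0)

    res = linprog(-V, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")   # maximize sum V * Phi

    eta = -res.eqlin.marginals[:n]   # dual variable eta(s) on the (P1) rows
    print("primal value:", -res.fun)         # sender's maximal expected utility
    print("dual value  :", float(eta @ f))   # sum_s eta(s) f(s): no duality gap

With this convex V, the two printed values coincide and equal the prior mean of V, since full revelation is optimal here (compare Corollary 2).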

Proof of Proposition 1. If part of part 1. Consider any mechanism Φ. Note that condition (1) holds if and only if there exists a function b(r) such that, for all s ∈ S and all r ∈ (r∗(s̄), r∗(s̲)), we have V(r∗(s),s) − V(r,s) = b(r) ũ(r,s). Substituting this equation into (P2) gives

∫_{R∗×S} V(r,s) dΦ(r,s) = ∫_{R∗×S} V(r∗(s),s) dΦ(r,s).

Taking into account (P1) gives

∫_{R∗×S} V(r,s) dΦ(r,s) = ∫_S V(r∗(s),s) f(s) ds,

which implies that the sender's expected utility is the same under all mechanisms.

Only if part of part 1. Suppose, to get a contradiction, that there exist s2 > s1 and r ∈ (r∗(s2), r∗(s1)) such that

(V(r∗(s2),s2) − V(r,s2)) / ũ(r,s2) > (V(r∗(s1),s1) − V(r,s1)) / ũ(r,s1).   (12)

(The case in which the left-hand side of (12) is strictly smaller than the right-hand side is analogous.) Let w1(x) = ∫_{s1}^{x} ũ(r,s) f(s) ds and w2(x) = ∫_{x}^{s2} ũ(r,s) f(s) ds. There exists ε1 > 0 such that the function w1 is continuously differentiable, strictly decreasing on [s1, s1 + ε1], and vanishing at s1. Likewise, there exists ε2 > 0 such that the function w2 is continuously differentiable, strictly decreasing on [s2 − ε2, s2], and vanishing at s2. Thus, on [s2 − ε2, s2], we can define a continuously differentiable and strictly decreasing function s∗1(x) that satisfies w1(s∗1(x)) + w2(x) = 0. By the implicit function theorem,

ds∗1(x)/dx = ũ(r,x) f(x) / (ũ(r,s∗1(x)) f(s∗1(x))).   (13)

By continuity, there exists x2 < s2 such that (12) continues to hold when (s1, s2) is replaced by any pair in [s1, s∗1(x2)] × [x2, s2]. Consider two mechanisms that differ only in that one reveals s for all s ∈ [s1, s∗1(x2)] ∪ [x2, s2] and the other sends the same message for all s ∈ [s1, s∗1(x2)] ∪ [x2, s2]. That is, the former mechanism sends r∗(s) and the latter sends r, because w1(s∗1(x2)) + w2(x2) = 0. The sender strictly prefers the former mechanism, because the difference in the sender's expected utility between the former and latter mechanisms is

∫_{[s1,s∗1(x2)]∪[x2,s2]} (V(r∗(s),s) − V(r,s)) f(s) ds
  > ∫_{s1}^{s∗1(x2)} (V(r∗(s),s) − V(r,s)) f(s) ds + ∫_{x2}^{s2} (V(r∗(s∗1(s)),s∗1(s)) − V(r,s∗1(s))) (ũ(r,s)/ũ(r,s∗1(s))) f(s) ds = 0,

where the inequality holds by (12) and the equality holds by (13) and the change of variables formula. This concludes the proof of the "only if" part of part 1.

Part 2. By Lemmas 1 and 2, Φfull is optimal if and only if there exists feasible (η,ν) that satisfies

∫_{R∗×S} (η(s) + ũ(r,s)ν(r) − V(r,s)) dΦfull(r,s) = 0.   (14)

By (D1), the integrand is nonnegative, so (14) holds if and only if η(s) + ũ(r∗(s),s)ν(r∗(s)) = V(r∗(s),s) almost everywhere. Since ũ(r∗(s),s) = 0, we have η(s) = V(r∗(s),s) almost everywhere. Since η is continuous by Lemma 2, and V and r∗ are continuous by assumption, η(s) = V(r∗(s),s) holds for all s ∈ S. Therefore, Φfull is optimal if and only if there exists ν that satisfies (D1):

V(r∗(s),s) + ũ(r,s)ν(r) ≥ V(r,s) for all (r,s) ∈ R∗×S,   (15)

which is equivalent to

(V(r,s2) − V(r∗(s2),s2)) / ũ(r,s2) ≤ ν(r) ≤ (V(r∗(s1),s1) − V(r,s1)) / (−ũ(r,s1))

for all r ∈ (r∗(s̄), r∗(s̲)) and s1, s2 such that r ∈ (r∗(s2), r∗(s1)). (For r ∈ {r∗(s̄), r∗(s̲)}, the existence of ν is obvious because (15) bounds ν only from one side.) There exists such ν if and only if (2) holds.

Part 3. Analogously to part 2, Φno is optimal if and only if there exists feasible (η,ν) that satisfies

η(s) + ũ(rno,s)ν(rno) = V(rno,s) for all s ∈ S.   (16)

Therefore, Φno is optimal if and only if there exists ν that satisfies (D1):

V(rno,s) − ũ(rno,s)ν(rno) + ũ(r,s)ν(r) ≥ V(r,s) for all (r,s) ∈ R∗×S,   (17)

which is equivalent to

(V(r,s2) − (V(rno,s2) − ũ(rno,s2)ν(rno))) / ũ(r,s2) ≤ ν(r) ≤ ((V(rno,s1) − ũ(rno,s1)ν(rno)) − V(r,s1)) / (−ũ(r,s1))   (18)

for all r ∈ (r∗(s̄), r∗(s̲)) and s1, s2 ∈ S such that r ∈ (r∗(s2), r∗(s1)). (For r ∈ {r∗(s̄), r∗(s̲)}, the existence of ν is obvious because (17) bounds ν only from one side.) At r = rno, both sides of (18) become ν(rno). Thus, for (18) to be satisfied everywhere, the derivatives of both sides of (18) with respect to r evaluated at r = rno must coincide, which gives

ν(rno) = ((∂V(rno,s1)/∂r) / ũ(rno,s1) − (∂V(rno,s2)/∂r) / ũ(rno,s2)) / ((∂ũ(rno,s1)/∂r) / ũ(rno,s1) − (∂ũ(rno,s2)/∂r) / ũ(rno,s2)).   (19)

Taking the limit of (19) as s2 ↓ sno gives

ν(rno) = (∂V(rno,sno)/∂r) / (∂ũ(rno,sno)/∂r).   (20)

Substituting ν(rno) from (20) into (18) completes the proof of Proposition 1.

Proof of Corollary 1. Parts 1 and 2. Since p2 = 1 − p1 and Σ_{i=1}^{2} p_i ũ(r,s_i) = 0,

p1 = ũ(r,s2) / (ũ(r,s2) − ũ(r,s1)) and p2 = −ũ(r,s1) / (ũ(r,s2) − ũ(r,s1)).

By Assumption 1, ũ(r,s1) < 0 < ũ(r,s2) because r ∈ (r∗(s2), r∗(s1)); so p1, p2 ∈ (0,1). Substituting p1 and p2 in (4) gives (2). Finally, (4) with equality is equivalent to (1).

Part 3. Since sgn(rno − r) = sgn(r∗(s3) − rno), either rno ∈ (r, r∗(s3)) or rno ∈ (r∗(s3), r) is true. Suppose that rno ∈ (r, r∗(s3)) (the other case is analogous). By Assumption 1, ũ(rno,s3) < 0 because rno < r∗(s3), ũ(r,s1) < 0 < ũ(r,s2) because r ∈ (r∗(s2), r∗(s1)), and Σ_{i=1}^{2} p_i ũ(rno,s_i) > 0 because rno > r and Σ_{i=1}^{2} p_i ũ(r,s_i) = 0. Therefore, the system of equations p1 + p2 + p3 = 1, Σ_{i=1}^{3} p_i ũ(rno,s_i) = 0, and Σ_{i=1}^{2} p_i ũ(r,s_i) = 0 has a unique solution p1, p2, p3 ∈ (0,1). Substituting these p1, p2, and p3 in (5) and rearranging gives

(V(r,s2) − V(rno,s2)) / ũ(r,s2) + (ũ(rno,s2)/ũ(r,s2)) · (V(rno,s3) − V(r∗(s3),s3)) / ũ(rno,s3)
  ≤ (V(r,s1) − V(rno,s1)) / ũ(r,s1) + (ũ(rno,s1)/ũ(r,s1)) · (V(rno,s3) − V(r∗(s3),s3)) / ũ(rno,s3).

Taking the limit of this inequality as s3 → sno gives (3), because

lim_{s3→sno} (V(rno,s3) − V(r∗(s3),s3)) / ũ(rno,s3) = −((∂V(rno,sno)/∂r) (dr∗(sno)/ds)) / (∂ũ(rno,sno)/∂s) = (∂V(rno,sno)/∂r) / (∂ũ(rno,sno)/∂r),

where the first equality holds by L'Hôpital's rule and the second by the implicit function theorem applied to ũ(r∗(s),s) = 0.
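As a quick arithmetic check of the weights p1 and p2 from parts 1 and 2 (with illustrative numbers, not from the paper), any pair with ũ(r,s1) < 0 < ũ(r,s2) yields a probability vector under which the receiver r is exactly indifferent:

    u1, u2 = -0.6, 0.9        # hypothetical values of u~(r, s1) and u~(r, s2)
    p1 = u2 / (u2 - u1)       # = 0.6
    p2 = -u1 / (u2 - u1)      # = 0.4
    assert abs(p1 + p2 - 1.0) < 1e-12
    assert abs(p1 * u1 + p2 * u2) < 1e-12   # p1 u~(r,s1) + p2 u~(r,s2) = 0
    print(p1, p2)             # both strictly inside (0, 1), as claimed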

Proof of Proposition 2. Any mechanism Φ whose messages m satisfy m = EΦ[s|m] generates messages with a distribution H having the property that the distribution F is a mean-preserving spread of H. It remains to verify that any distribution H with this property can be generated by a feasible mechanism. If F is a mean-preserving spread of H, then, by definition, s has the same distribution as m + z for some z such that E[z|m] = 0. Define Φ(m̃,s̃) = Pr(m ≤ m̃, m + z ≤ s̃) for all (m̃,s̃) ∈ S×S. For this Φ, the marginal distribution of s is F and EΦ[s|m] = EΦ[m + z|m] = m. Therefore, Φ is a feasible mechanism whose messages are distributed according to H.
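A small simulation (my illustration, with a hypothetical pair F, H) makes the construction concrete: take F uniform on [0,1] and let H be the two-point distribution of conditional means obtained by pooling [0, 1/2] and [1/2, 1]; drawing m from H and then s = m + z with E[z|m] = 0 recovers F as the marginal distribution of s and makes every message equal to the posterior mean.

    import numpy as np

    rng = np.random.default_rng(0)
    m = rng.choice([0.25, 0.75], size=100_000)      # message m ~ H
    z = rng.uniform(-0.25, 0.25, size=m.size)       # noise with E[z|m] = 0
    s = m + z                                       # then s ~ F = Uniform[0,1]

    print("mean of s:", s.mean())                   # approx 1/2, matches F
    for mm in (0.25, 0.75):
        print("E[s|m=%.2f]:" % mm, s[m == mm].mean())   # approx m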

Proof of Corollary 2. Assumption 1 and Proposition 1 hold after replacing r with −r. With this change of variables, r∗(s) = s, V(r,s) = V(r), and ũ(r,s) = (s − r) g(r). By part 1 of Proposition 1, all mechanisms are optimal if and only if (1):

V(r) = ((s2 − r)/(s2 − s1)) V(s1) + ((r − s1)/(s2 − s1)) V(s2) for all s1, s2, r ∈ S.

By part 2 of Proposition 1, Φfull is optimal if and only if (2):

V(r) ≤ ((s2 − r)/(s2 − s1)) V(s1) + ((r − s1)/(s2 − s1)) V(s2) for all s1, s2, r ∈ S.

By part 3 of Proposition 1, Φno is optimal if and only if (3), which simplifies to the condition of part 3 of Corollary 2.

Proof of Proposition 3. Only if part of part 1. By Lemma 2, if the described mechanism Φ is optimal, then there exists feasible (η,ν) that satisfies

∫_{R×S} (η(s) + (s − r)ν(r) − V(r)) dΦ(r,s) = 0.

By the feasibility condition (D1), the integrand is nonnegative. Moreover, by Lemma 2, η is continuous, so

η(s) = V(rL) − (s − rL)ν(rL) for all s ∈ [s̲, sL],
η(s) = V(s) for all s ∈ (sL, sH),
η(s) = V(rH) − (s − rH)ν(rH) for all s ∈ [sH, s̄].

The feasibility condition (D1) implies

V(s) + (s − r)ν(r) ≥ V(r) for all s, r ∈ (sL, sH).

Taking the limits s ↑ r and s ↓ r yields ν(r) = −V′(r) for all r ∈ (sL, sH). Substituting back gives

V(s) ≥ V(r) + V′(r)(s − r) for all s, r ∈ (sL, sH),

which implies that V(r) is convex for all r ∈ (sL, sH). The feasibility condition (D1) also implies

V(rL) − (s − rL)ν(rL) + (s − r)ν(r) ≥ V(r) for all s, r ∈ [s̲, sL].

Writing these inequalities for s = sL and s = s̲ gives

V(rL) − (sL − rL)ν(rL) + (sL − r)ν(r) ≥ V(r),
V(rL) − (s̲ − rL)ν(rL) + (s̲ − r)ν(r) ≥ V(r).

Multiplying the first inequality by (r − s̲), the second by (sL − r), and adding up yields

(sL − s̲)(V(rL) + (rL − r)ν(rL) − V(r)) ≥ 0.

Taking the limits r ↑ rL and r ↓ rL yields ν(rL) = −V′(rL). Substituting back gives V(r) ≤ V(rL) + V′(rL)(r − rL) for all r ∈ [s̲, sL], with equality at sL, where the equality holds by continuity of η. To complete the proof, we can use the same argument to get ν(rH) = −V′(rH) and V(r) ≤ V(rH) + V′(rH)(r − rH) for all r ∈ [sH, s̄], with equality at sH.

If part of part 1. Consider the described mechanism Φ and the constructed pair (η,ν):

η(s) = V(rL) + V′(rL)(s − rL) for s ∈ [s̲, sL], η(s) = V(s) for s ∈ (sL, sH), η(s) = V(rH) + V′(rH)(s − rH) for s ∈ [sH, s̄];
ν(r) = −V′(rL) for r ∈ [s̲, sL], ν(r) = −V′(r) for r ∈ (sL, sH), ν(r) = −V′(rH) for r ∈ [sH, s̄].

The complementarity condition (C) holds by construction. Moreover, (η,ν) is feasible for (D), because, for all (r,s) ∈ S×S, we have

η(s) + (s − r)ν(r) ≥ η(r) ≥ V(r),

where the first inequality holds because η is convex on S and −ν(r) is a subderivative of η at r for all r ∈ S. Therefore, by Lemma 1, Φ is optimal.

Only if part of part 2. The proof of the first three conditions is the same as in part 1. To prove V′(rL) ≤ V′(rH), write the feasibility condition (D1) for s ≥ sH and r = rL:

V(rH) + V′(rH)(s − rH) ≥ V(rL) + V′(rL)(s − rL),

and notice that both sides are equal at s = sH = sL and linear in s for s ≥ sH.

If part of part 2. Consider the described mechanism Φ and the pair (η,ν) from part 1. By the same argument as in part 1, Φ is optimal.
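The certificate constructed in the "if" part can also be checked numerically. The sketch below (my illustration; the instance is hypothetical) takes a uniform prior on S = [0,1], pools [0, sL] and [sH, 1] with sL = 0.4 and sH = 0.6 (so the pooled posterior means are rL = 0.2 and rH = 0.8), and uses a V that is convex on (sL, sH) and lies below the tangent lines at rL and rH on the pooled intervals, with equality at sL and sH; it verifies the feasibility inequality η(s) + (s − r)ν(r) ≥ V(r) on a grid.

    import numpy as np

    sL, sH, rL, rH = 0.4, 0.6, 0.2, 0.8   # pooled intervals and their means

    def V(r):
        # convex (r - 0.5)**2 in the middle; on the pooled intervals V dips
        # below the tangent lines at rL and rH (the squared factors enforce
        # tangency there), with equality at sL and sH.
        if r < sL:
            return 0.13 - 0.3 * r - (r - rL) ** 2 * (sL - r)
        if r <= sH:
            return (r - 0.5) ** 2
        return 0.3 * r - 0.17 - (r - rH) ** 2 * (r - sH)

    def eta(s):   # eta from the "if" part: tangent lines outside, V inside
        if s < sL:
            return 0.13 - 0.3 * s
        if s <= sH:
            return (s - 0.5) ** 2
        return 0.3 * s - 0.17

    def nu(r):    # nu = -V'(rL), -V'(r), -V'(rH) on the three regions
        if r < sL:
            return 0.3
        if r <= sH:
            return -2.0 * (r - 0.5)
        return -0.3

    g = np.linspace(0.0, 1.0, 401)
    slack = min(eta(s) + (s - r) * nu(r) - V(r) for r in g for s in g)
    print("min slack:", slack)   # = 0 up to rounding: (eta, nu) is feasible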

Proof of Proposition 4. Although this proposition follows almost immediately from Kolotilin (2015), below I provide a simpler proof adapted to this setting.

Part 1. VG ≥ VGfull, because the sender can always achieve VGfull by choosing the full revelation mechanism. Similarly, VGno ≥ VG, because an outcome produced under G by Φ can also be achieved under Gno by a mechanism Φ′ that generates r according to q and m according to Φ. Trivially, UGfull ≥ UG, because the receiver is best off when he knows the sender's type. Finally, UG ≥ UGno, because the optimal mechanism under Gno gives the same expected utility to the receiver as the no revelation mechanism; that is, UGno = max{EF[u(s)], 0}. Indeed, since EF[u(s)|s : v(s) > 0] < 0 by Assumption 5, the optimal mechanism induces the receiver to act if and only if s ≥ s∗, where s∗ is the unique solution to EF[u(s)|s ≥ s∗] = 0 (Kolotilin 2015).

Part 2. Public G partitions S into disjoint subsets Sr = {s : q(r|s) = 1}. Moreover, since public G is more informative than public G′, G is a refinement of G′; that is, for each r ∈ R, there exists r′ ∈ R′ such that Sr ⊂ Sr′. Therefore, an outcome produced under G by Φ can also be achieved under G′ by a mechanism Φ′ that refines G′ to G and generates m according to Φ, which implies that VG′ ≥ VG.

Under public G, r is deterministically determined by s, so we can allow mechanisms to be conditioned on r. By Kolotilin (2015), the optimal mechanism induces the receiver to act if and only if s ≥ s∗(r), where s∗(r) is the minimum s̃ ∈ S such that v(s̃) ≥ 0 and EF[u(s)|s ∈ Sr : s ≥ s̃] ≥ 0. Therefore,

UG = Σ_{r∈R} max{ ∫_{s∈Sr} u(s) f(s) ds, ∫_{s∈Sr : v(s)≥0} u(s) f(s) ds, 0 },

which implies that UG ≥ UG′, because G is a refinement of G′.

Proof of Proposition 5. If s can take only two values, Assumption 5 implies that u(s̲) < 0 < v(s̲). Therefore, EF[u(s)|s : v(s) > 0] < 0 simplifies to EF[u(s)] < 0, which holds if and only if µ = Pr(s̄) < 1/(1 + x), where x = −u(s̄)/u(s̲). For any r1 and r2 such that 0 < r1 < µ < r2 < 1/(1 + x), there exists a binary information structure of the receiver with R = {r1, r2}, where r1 = Pr(s̄|r1) and r2 = Pr(s̄|r2). Similarly to the school–employer example, we can restrict attention to mechanisms that generate only three messages m2, m1, and m0, where m2 makes the receiver r2 indifferent, m1 makes the receiver r1 indifferent, and m0 convinces both types of the receiver that s = s̲. Since r2 < 1/(1 + x), neither type of the receiver would act if the sender chose the no revelation mechanism. It is easy to show then that the optimal mechanism is either Φ1, which generates m1 and m0, or Φ2, which generates m2 and m0. Without loss of generality, assume that m = Pr(s̄|m) for all m. After receiving mi, where i ∈ {1, 2}, the receiver ri holds the posterior

Pr(s̄|mi,ri) = (mi ri/µ) / (mi ri/µ + (1 − mi)(1 − ri)/(1 − µ)).

Since mi makes ri indifferent, we have Pr(s̄|mi,ri) = 1/(1 + x), which is equivalent to

mi = µ(1 − ri) / (µ(1 − ri) + (1 − µ) ri x).

Since the posteriors Pr(s̄|m) must average out to the prior Pr(s̄), the mechanism Φi generates mi with probability µ/mi. Therefore, the sender's expected utility under Φ1 is

V1 = (µ/m1)((1 − m1) v(s̲) + m1 v(s̄)) = ((1 − µ) r1 x v(s̲) + µ(1 − r1) v(s̄)) / (1 − r1).

Similarly, the sender's expected utility under Φ2 is

V2 = (µ/m2)((1 − m2) Pr(r2|s̲) v(s̲) + m2 Pr(r2|s̄) v(s̄)) = (µ − r1) r2 (x v(s̲) + v(s̄)) / (r2 − r1).
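The two closed forms can be confirmed by direct computation with illustrative parameters (not from the paper): u(s̄) = 1 and u(s̲) = −2, so x = 1/2; v(s̲) = v(s̄) = 1; and µ = 1/4 < 1/(1 + x). The receiver's signal probabilities Pr(r2|s) are backed out from Bayes' rule and the martingale property Pr(r2) = (µ − r1)/(r2 − r1).

    x, mu = 0.5, 0.25
    v_lo = v_hi = 1.0                    # v(s_low) and v(s_bar), hypothetical
    m = lambda r: mu * (1 - r) / (mu * (1 - r) + (1 - mu) * r * x)

    r1, r2 = 0.1, 0.5                    # 0 < r1 < mu < r2 < 1/(1+x)
    m1, m2 = m(r1), m(r2)
    V1 = (mu / m1) * ((1 - m1) * v_lo + m1 * v_hi)
    V1_closed = ((1 - mu) * r1 * x * v_lo + mu * (1 - r1) * v_hi) / (1 - r1)

    p_r2 = (mu - r1) / (r2 - r1)         # Pr(r2), from E[posterior] = prior
    q_hi = p_r2 * r2 / mu                # Pr(r2 | s_bar)
    q_lo = p_r2 * (1 - r2) / (1 - mu)    # Pr(r2 | s_low)
    V2 = (mu / m2) * ((1 - m2) * q_lo * v_lo + m2 * q_hi * v_hi)
    V2_closed = (mu - r1) * r2 * (x * v_lo + v_hi) / (r2 - r1)
    print(V1, V1_closed)                 # agree: both 0.29166...
    print(V2, V2_closed)                 # agree: both 0.28125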

In the limit r1 ↓ 0, we have V2 = µ(x v(s̲) + v(s̄)) > µ v(s̄) = V1; but in the limit r1 ↑ µ, we have V1 = µ(x v(s̲) + v(s̄)) > 0 = V2. Since both V1 and V2 are continuous in r1, for each r2∗ ∈ (µ, 1/(1 + x)), there exists an r1∗ ∈ (0, µ) at which V1 = V2.

Let G and G′ be the two information structures with R = {r1∗/2, r2∗} and R′ = {r1∗, r2∗}, respectively. Because r1 ≤ r1′ and r2 ≥ r2′, G is more informative than G′. Under G′, the sender is indifferent between Φ1 and Φ2, so Φ1 is an optimal mechanism. Since V1 increases with r1 and V2 decreases with r1, the sender's optimal mechanism under G is Φ2. The receiver's expected utility is 0 under Φ2 and is strictly positive under Φ1; so UG′ > UG. Since V2 decreases with r1, the sender is strictly better off under G; so VG > VG′.

Proposition 6 Suppose Assumption 1 holds.

1. The receiver's expected utility under Φfull is strictly higher than under any other Φ.

2. The receiver's expected utility under Φno is strictly lower than under any other Φ.

Proof of Proposition 6. The receiver's expected utility under Φ, Φfull, and Φno is:

EΦ[u] = ∫_{R∗×S} (∫_r^{r̄} ũ(r̃,s) dr̃) dΦ(r,s),   (21)

EΦfull[u] = ∫_S (∫_{r∗(s)}^{r̄} ũ(r̃,s) dr̃) f(s) ds = ∫_{R∗×S} (∫_{r∗(s)}^{r̄} ũ(r̃,s) dr̃) dΦ(r,s),   (22)

EΦno[u] = ∫_S (∫_{rno}^{r̄} ũ(r̃,s) dr̃) f(s) ds = ∫_{R∗×S} (∫_{rno}^{r̄} ũ(r̃,s) dr̃) dΦ(r,s).   (23)

Equation (21) holds because a message m induces the receiver r to act if and only if r ≥ m. The first equality in (22) holds because Φfull generates message r∗(s) for each s ∈ S. Similarly, the first equality in (23) holds because Φno generates rno for all s ∈ S. The second equality in (22) and (23) holds because the marginal distribution of s under any mechanism Φ coincides with the prior distribution of s.

Part 1. Fubini's Theorem together with the condition ũ(r∗(s),s) = 0 gives

EΦfull[u] − EΦ[u] = ∫_S ∫_{r>r∗(s)} (∫_{r∗(s)}^{r} ũ(r̃,s) dr̃) dΦ(r,s) − ∫_S ∫_{r<r∗(s)} (∫_{r}^{r∗(s)} ũ(r̃,s) dr̃) dΦ(r,s).   (24)

By Assumption 1, ũ(r̃,s) > 0 for r̃ > r∗(s), so ∫_{r∗(s)}^{r} ũ(r̃,s) dr̃ > 0 for r > r∗(s).

Any Φ that differs from Φfull puts strictly positive probability on the event r > r∗(s); otherwise ∫_{R∗×S} ũ(r,s) dΦ(r,s) would be strictly negative rather than zero. Therefore, the first integral in (24) is strictly positive. The analogous argument shows that the second integral in (24) is strictly negative, so EΦfull[u] − EΦ[u] > 0 for any Φ that differs from Φfull.

Part 2. For a mechanism Φ, denote the conditional distribution of s given a message r by PΦ(s|r) and the marginal distribution of messages r by PΦ(r). Fubini's Theorem gives

EΦ[u] − EΦno[u] = ∫_{r̲}^{rno} (∫_r^{rno} (∫_S ũ(r̃,s) dPΦ(s|r)) dr̃) dPΦ(r) − ∫_{rno}^{r̄} (∫_{rno}^{r} (∫_S ũ(r̃,s) dPΦ(s|r)) dr̃) dPΦ(r).   (25)

By Assumption 1 and (P2), we have ∫_S ũ(r̃,s) dPΦ(s|r) > 0 for r̃ > r. Therefore,

∫_r^{rno} (∫_S ũ(r̃,s) dPΦ(s|r)) dr̃ > 0 for r < rno.

Since PΦ(r) of any mechanism Φ that differs from Φno puts strictly positive probability on messages in [r̲, rno), the first integral in (25) is strictly positive. The analogous argument shows that the second integral in (25) is strictly negative, so EΦ[u] − EΦno[u] > 0 for any Φ that differs from Φno.
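Finally, the ranking in Proposition 6 is easy to see numerically. The sketch below (my illustration) uses the hypothetical specification ũ(r̃,s) = r̃ + s − 1 with s uniform on [0,1] and receiver types on [0,1], so that r∗(s) = 1 − s and rno = 1/2; the inner integral in (21) has the closed form W(r,s) = ∫_r^1 (t + s − 1) dt, and the receiver's expected utility is compared under full revelation, a two-interval pooling, and no revelation.

    import numpy as np

    # Inner integral of (21): W(r, s) = int_r^1 (t + s - 1) dt
    W = lambda r, s: (s - 0.5) - r ** 2 / 2 - (s - 1.0) * r

    s = np.linspace(0.0, 1.0, 100001)                  # uniform prior grid
    full = W(1.0 - s, s).mean()                        # message r*(s) = 1 - s
    mid = W(np.where(s < 0.5, 0.75, 0.25), s).mean()   # pool [0,1/2], [1/2,1]
    no = W(0.5 * np.ones_like(s), s).mean()            # single message 1/2
    print(full, mid, no)   # approx 1/6 > 5/32 > 1/8: full > partial > no

Each pooled message is the type that is indifferent on average against the pooled posterior (for instance, pooling [0, 1/2] yields E[s|m] = 1/4 and hence m = 3/4), so all three mechanisms are feasible.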

References

Aliprantis, Charalambos D. and Kim C. Border (2006) Infinite Dimensional Analysis: A Hitchhiker's Guide, Berlin: Springer-Verlag.

Alonso, Ricardo and Niko Matouschek (2008) "Optimal Delegation," Review of Economic Studies, Vol. 75, pp. 259–293.

Amador, Manuel and Kyle Bagwell (2013) "The Theory of Optimal Delegation With an Application to Tariff Caps," Econometrica, Vol. 81, pp. 1541–1599.

Anderson, Edward J. and Peter Nash (1987) Linear Programming in Infinite-Dimensional Spaces: Theory and Applications, New York: John Wiley and Sons.

Arvey, Richard D. and James E. Campion (1982) "The Employment Interview: A Summary and Review of Recent Research," Personnel Psychology, Vol. 35, pp. 281–322.

Bergemann, Dirk and Stephen Morris (2016a) "Bayes Correlated Equilibrium and the Comparison of Information Structures in Games," Theoretical Economics, Vol. 11, pp. 487–522.

Bergemann, Dirk and Stephen Morris (2016b) "Information Design, Bayesian Persuasion, and Bayes Correlated Equilibrium," American Economic Review: Papers and Proceedings, Vol. 106, pp. 586–591.

Blackwell, David (1953) "Equivalent Comparisons of Experiments," Annals of Mathematical Statistics, Vol. 24, pp. 265–272.

Crawford, Vincent P. and Joel Sobel (1982) "Strategic Information Transmission," Econometrica, Vol. 50, pp. 1431–1451.

Gentzkow, Matthew and Emir Kamenica (2016) "A Rothschild-Stiglitz Approach to Bayesian Persuasion," American Economic Review: Papers and Proceedings, Vol. 106, pp. 597–601.

Holmstrom, Bengt (1984) "On the Theory of Delegation," in M. Boyer and R. Kihlstrom eds. Bayesian Models in Economic Theory, New York: North-Holland.

Kamenica, Emir and Matthew Gentzkow (2011) "Bayesian Persuasion," American Economic Review, Vol. 101, pp. 2590–2615.

Kolotilin, Anton (2015) "Experimental Design to Persuade," Games and Economic Behavior, Vol. 90, pp. 215–226.

Kolotilin, Anton, Tymofiy Mylovanov, Andriy Zapechelnyuk, and Ming Li (2017) "Persuasion of a Privately Informed Receiver," Econometrica, forthcoming.

Milgrom, Paul (1981) "Good News and Bad News: Representation Theorems and Applications," Bell Journal of Economics, Vol. 12, pp. 380–391.

Ostrovsky, Michael and Michael Schwarz (2010) "Information Disclosure and Unraveling in Matching Markets," American Economic Journal: Microeconomics, Vol. 2, pp. 34–63.

Quah, John K.-H. and Bruno Strulovici (2012) "Aggregating the Single Crossing Property," Econometrica, Vol. 80, pp. 2333–2348.

Rayo, Luis and Ilya Segal (2010) "Optimal Information Disclosure," Journal of Political Economy, Vol. 118, pp. 949–987.