What Is A Good Risk Measure - FDIC


What Is a Good Risk Measure: Bridging the Gaps between Data, Coherent Risk Measures, and Insurance Risk Measures C. C. Heyde, S. G. Kou, X. H. Peng Columbia University June 26, 2006

Abstract: Two main axiomatically based risk measures are the coherent risk measure, which assumes subadditivity for arbitrary random variables, and the insurance risk measure, which assumes additivity for comonotonic random variables. We propose a new, data-based risk measure, called the natural risk statistic, characterized by a new set of axioms. The new axioms only require subadditivity for comonotonic random variables, which is consistent with prospect theory in psychology. Compared to the two previous measures, the natural risk statistic includes the tail conditional median, which is more robust than the tail conditional expectation suggested by the coherent risk measure; and, unlike the insurance risk measures, natural risk statistics can also incorporate scenario analysis. The natural risk statistic includes VaR as a special case and therefore shows that VaR, though simple, is not irrational. Keywords: risk measure, tail conditional expectation, tail conditional median, value at risk, quantile, robust statistics, L-statistics

1 Introduction

Broadly speaking, a risk measure attempts to assign a single numerical value to a potential random financial loss. Obviously, it can be problematic to use one number to summarize the whole statistical distribution of the financial loss, and one should avoid doing so whenever possible. In many cases, however, there is no alternative. Examples include margin requirements in financial trading, insurance risk premiums, and government regulatory deposit requirements, such as the Basel Accord [5, 6, 7] for banking regulation. Consequently, how to design a good risk measure is a problem of great practical importance, essential to the financial and insurance industries as well as to regulators. There are two main approaches in the literature: the coherent risk measure suggested by Artzner et al. [3] and the insurance risk measure of Wang et al. [24]. For a coherent risk measure, one first chooses a set of scenarios (different probability measures) and then computes

the coherent risk measure as the maximal expectation of the loss under these scenarios. For an insurance risk measure, one fixes a distorted probability and then computes the insurance risk measure as the expectation with respect to that distorted probability. Both approaches are axiomatic: some axioms are postulated first, and all the risk measures satisfying those axioms are then identified. Of course, once axioms are set, there is room to evaluate whether they are reasonable for one's particular needs and, if not, to discuss possible alternative axioms. In particular, the subadditivity axiom of the coherent risk measure has been criticized by Daníelsson et al. [9] from the viewpoint of tail distributions, and by Dhaene et al. [11] and Dhaene et al. [12] from regulatory and merger viewpoints. In this paper we complement the previous approaches of coherent and insurance risk measures by postulating a different set of axioms; the resulting risk measures are fully characterized in the paper. More precisely, the contribution of the current paper is sixfold. (1) We give more reasons why a different set of axioms is needed: (a) In addition to reviewing the critiques of subadditivity in [9], [11], [12], we point out new critiques from the psychological and robustness viewpoints. See Section 3. (b) The main drawback of the insurance risk measure is that it does not incorporate scenario analysis; i.e., unlike the coherent risk measure, the insurance risk measure chooses a single (distorted) probability measure and does not allow one to compare different distorted probability measures. See Section 2. (c) What is missing in both coherent and insurance risk measures is the consideration of data. Our approach is based on data (either observed or simulated or both) rather than on hypothetical distributions.
(2) A different set of axioms based on data and comonotonic subadditivity is postulated in Section 4, resulting in the definition of the natural risk statistic. A complete characterization of natural risk statistics is given in Theorem 1. (3) An alternative characterization of the natural risk statistic based on statistical acceptance sets is given in Section 5. (4) VaR, or the quantile, is among the most widely used risk measures in practice (see, e.g., the Basel Accord). However, the coherent risk measure rules out the use of VaR. In Section 6 we show that the natural risk statistic gives an axiomatic justification for the use of VaR. (5) Unlike the insurance risk measures, we show in Section 6 that the natural risk statistic can incorporate scenario analysis by putting different weights on the sample order statistics. (6) We point out in Section 7 that the natural risk statistic includes the tail conditional median as a special case, which leads to more robust measures of risk than the tail


conditional mean suggested by the coherent risk measure. The mathematical difficulty of the current paper lies in the proof of Theorem 1. Unlike in the case of coherent risk measures, one cannot use the results in Huber [18] directly, because we only require comonotonic subadditivity and the comonotonic sets are not open sets. Therefore, one has to be careful in applying the separating hyperplane theorem. We also need to show that the weights are nonnegative and add up to one.

2 Review of Two Risk Measures

2.1 Coherent Risk Measure

Let Ω be the set of all possible states and X be the set of all random financial losses defined on Ω. A risk measure is then a mapping from X to R.

In Artzner et al. [3] a risk measure ρ is called a coherent risk measure if it satisfies the following three axioms:

Axiom A1. Translation invariance and positive homogeneity: ρ(aX + b) = aρ(X) + b, ∀a ≥ 0, b ∈ R.

Axiom A2. Monotonicity: ρ(X) ≤ ρ(Y), if X ≤ Y almost surely.

Axiom A3. Subadditivity: ρ(X + Y) ≤ ρ(X) + ρ(Y), for any X, Y ∈ X.

Axiom A1 states that the risk of a financial position is proportional to the size of the position and that a sure loss of amount b simply increases the risk by b. Axiom A2 is a minimal requirement for a reasonable risk measure. What is controversial is the subadditivity requirement in Axiom A3, which basically means that "a merger does not create extra risk" ([3], p. 209). We will discuss the controversies related to this axiom in Section 3. If Ω has a finite number of elements, Artzner et al. [3] showed that a risk measure ρ is coherent if and only if there exists a family P of probability measures on Ω such that

ρ(X) = sup_{P∈P} E^P[X], ∀X ∈ X,

where E^P[X] is the expectation of X under the probability measure P. Therefore, if Ω has a finite number of elements, a coherent risk measure amounts to computing the maximal expectation under different scenarios (different P's), thus justifying the scenario analysis used in practice. For a finite Ω, Artzner et al. [3] also presented an equivalent approach that defines the coherent risk measure through the specification of acceptance sets, the sets of financial positions accepted by regulators or investors.
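On a finite state space this representation is directly computable: the risk measure is the largest expected loss across the family of scenario probabilities. The following minimal Python sketch is our own illustration, not from the paper; the state space, loss values, and scenario probabilities are all hypothetical.

```python
import numpy as np

def coherent_risk(losses, scenario_probs):
    """Coherent risk measure on a finite state space Omega:
    the maximal expected loss over a family of probability measures.

    losses:         shape (n_states,), the loss X(omega) in each state
    scenario_probs: shape (n_scenarios, n_states), each row a probability
                    measure P on Omega (rows sum to 1)
    """
    expectations = scenario_probs @ losses  # E^P[X] for each scenario P
    return float(expectations.max())        # sup over the family P

# Three states, two scenarios that disagree on the bad state's likelihood.
X = np.array([0.0, 10.0, 100.0])
P = np.array([[0.90, 0.09, 0.01],
              [0.80, 0.10, 0.10]])
print(coherent_risk(X, P))  # the more pessimistic scenario dominates
```

The second scenario's expectation (11.0) exceeds the first's (1.9), so the measure reports the worst case, exactly as in scenario analysis.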

The coherent risk measure gives an axiomatic justification for the use of the "SPAN" system in calculating initial portfolio margin requirements at the Chicago Mercantile Exchange (see [8]). SPAN essentially defines a margin requirement as the maximal expected loss of the portfolio under 14 pre-specified scenarios. However, from a practical viewpoint, a main drawback of the coherent risk measure is that it rules out the use of quantiles as risk measures, as it only involves expectations. One of the most widely used risk measures for regulators and for internal risk management in banks is Value-at-Risk, or VaR for short, which is nothing but a quantile at some pre-defined probability level. More precisely, given α ∈ (0, 1), the value-at-risk VaRα at level α of the loss variable X is defined as the α-quantile of X, i.e.,

VaRα(X) = min{x | P(X ≤ x) ≥ α}.    (1)

For example, the banking regulation "Basel Accord" specifies a risk measure as VaR at the 99th percentile. Therefore, the very fact that coherent risk measures exclude VaR and quantiles poses a serious inconsistency between academic theory and industry practice. The main source of this inconsistency is the subadditivity in Axiom A3, a controversial axiom, as we will explain in Section 3. By relaxing this axiom and requiring subadditivity only for comonotonic random variables, we are able to find a new set of axioms in Section 4 which include VaR and quantiles, thus eliminating this inconsistency.
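Applied to the empirical distribution of a sample, definition (1) reduces to an order statistic. A minimal sketch (our own illustration; the sample is hypothetical):

```python
import math

def var_alpha(sample, alpha):
    """Empirical VaR at level alpha per definition (1): the smallest x
    with P(X <= x) >= alpha under the sample's empirical distribution."""
    xs = sorted(sample)
    n = len(xs)
    k = math.ceil(alpha * n)  # need at least k of the n points <= x
    return xs[k - 1]          # the k-th order statistic

losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(var_alpha(losses, 0.90))  # -> 9
print(var_alpha(losses, 0.99))  # -> 10
```

With ten equally likely losses, the 90% VaR is the 9th order statistic: nine of the ten points lie at or below it.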

2.2 Insurance Risk Measure

Insurance risk premiums can also be viewed as risk measures, as they aim at using a single number to summarize future random financial losses. Wang et al. [24] proposed four axioms to characterize insurance risk premiums. More precisely, a risk measure ρ is said to be an insurance risk measure if it satisfies the following four axioms.

Axiom C1. Conditional state independence: ρ(X) = ρ(Y), if X and Y have the same distribution. This means that the risk of a position is determined only by the loss distribution.

Axiom C2. Monotonicity: ρ(X) ≤ ρ(Y), if X ≤ Y almost surely.

Axiom C3. Comonotonic additivity: ρ(X + Y) = ρ(X) + ρ(Y), if X and Y are comonotonic, where random variables X and Y are comonotonic if and only if

(X(ω1) − X(ω2))(Y(ω1) − Y(ω2)) ≥ 0

holds almost surely for ω1 and ω2 in Ω.

Axiom C4. Continuity:

lim_{d→0} ρ((X − d)+) = ρ(X+),  lim_{d→∞} ρ(min(X, d)) = ρ(X),  lim_{d→−∞} ρ(max(X, d)) = ρ(X),

where (X − d)+ = max(X − d, 0).

The notion of comonotonic random variables in Axiom C3 is discussed by Schmeidler [25], Yaari [29] and Denneberg [10]. If two random variables X and Y are comonotonic, then X(ω) and Y(ω) always move in the same direction as the state ω changes. Wang et al. [24] imposed Axiom C3 based on the argument that comonotonic random variables do not hedge against each other, leading to additivity of the risks. However, this is only true if one focuses on a single scenario; if one has multiple scenarios, the counterexample at the end of Section 6 shows that comonotonic additivity fails to hold. Wang et al. [24] proved that if X contains all the Bernoulli(p) random variables, 0 ≤ p ≤ 1,

then a risk measure ρ satisfies Axioms C1-C4 and ρ(1) = 1 if and only if ρ has a Choquet integral representation with respect to a distorted probability:

ρ(X) = ∫ X d(g ◦ P) = ∫_{−∞}^{0} (g(P(X > t)) − 1) dt + ∫_{0}^{∞} g(P(X > t)) dt,    (2)

where g(·) is called the distortion function, which is nondecreasing with g(0) = 0 and g(1) = 1, and g ◦ P(A) := g(P(A)) is called the distorted probability. A detailed discussion of Choquet integration can be found in Denneberg [10].

It should be emphasized that VaR satisfies Axioms C1-C4 (see Corollary 4.6 in Denneberg [10] for a proof that VaR satisfies Axiom C4) and hence is an insurance risk measure. But VaR is not a coherent risk measure, because it may not satisfy subadditivity (see [3]). In general, the insurance risk measure in (2) does not satisfy subadditivity unless the distortion function g(·) is concave (see Denneberg [10]). A main drawback of the insurance risk measure is that it does not incorporate scenario analysis. More precisely, unlike the coherent risk measure, the insurance risk measure chooses a fixed distortion function g and a fixed probability measure P, and does not allow one to compare different measures within a family P of probability measures. This is inconsistent with industry practice, as people generate different scenarios to obtain a suitable risk measure.
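For a nonnegative loss under the empirical distribution of a sample, the first integral in (2) vanishes (since g(P(X > t)) = g(1) = 1 for t < 0) and the survival function is a step function, so the Choquet integral is a finite sum. A minimal sketch (our own illustration; the sample and distortion functions are hypothetical):

```python
import numpy as np

def choquet_risk(sample, g):
    """Choquet integral (2) of a nonnegative loss under the empirical
    distribution: rho(X) = integral_0^inf g(P(X > t)) dt, computed as a
    sum over the steps of the empirical survival function."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    prev, total = 0.0, 0.0
    for i, x in enumerate(xs):
        # On the interval (prev, x], the survival probability is (n - i)/n.
        total += g((n - i) / n) * (x - prev)
        prev = x
    return total

sample = [1.0, 2.0, 3.0, 4.0]
# With the identity distortion g(u) = u, (2) reduces to the plain mean.
print(choquet_risk(sample, lambda u: u))         # 2.5, the sample mean
# A concave distortion overweights large losses, inflating the premium.
print(choquet_risk(sample, lambda u: u ** 0.5))
```

The concave square-root distortion gives a value above the mean, consistent with the remark that subadditivity holds precisely when g is concave.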

The main reason the insurance risk measure rules out scenario analysis is that it requires comonotonic additivity. The counterexample at the end of Section 6 shows that even for comonotonic random variables, with different scenarios we may get strict subadditivity rather than additivity. In our new approach in Section 4 we shall require comonotonic subadditivity instead of comonotonic additivity.

Remark: What is missing in both coherent and insurance risk measures is the consideration of data. Our approach in Section 4 is based on data (either observed or simulated or both) rather than on hypothetical distributions.

3 Is Subadditivity Too Restrictive?

3.1 Review of Existing Critiques

3.1.1 Tail Subadditivity of VaR

VaR has been criticized for its lack of subadditivity, except for elliptically distributed random vectors (see Embrechts et al. [14]). In fact, if Z = (Z1, ..., Zn) is an n-dimensional random vector with an elliptical distribution, then for any two linear portfolios X = Σ_{i=1}^n a_i Z_i and Y = Σ_{i=1}^n b_i Z_i, a_i, b_i ∈ R, the subadditivity holds: VaRα(X + Y) ≤ VaRα(X) + VaRα(Y).
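This subadditivity for elliptical losses can be checked by simulation. A small Monte Carlo sketch (our own illustration; the covariance matrix, sample size, and level are arbitrary choices):

```python
import numpy as np

def empirical_var(sample, alpha):
    """Empirical alpha-quantile of the loss sample, as in definition (1)."""
    return float(np.quantile(sample, alpha))

rng = np.random.default_rng(42)
# Jointly normal (hence elliptical) losses; X and Y are linear portfolios.
cov = [[1.0, 0.3], [0.3, 2.0]]
Z = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
X, Y = Z[:, 0], Z[:, 1]

alpha = 0.99
lhs = empirical_var(X + Y, alpha)
rhs = empirical_var(X, alpha) + empirical_var(Y, alpha)
print(lhs <= rhs)  # subadditivity holds comfortably for these elliptical losses
```

Here the correlation is below one, so diversification strictly lowers the 99% VaR of the combined position; equality would require perfectly comonotonic portfolios.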

Although VaR may violate subadditivity in the center of a distribution, in practice VaR is mostly used in the tail regions. In an interesting paper, Daníelsson et al. [9] showed that VaR is subadditive in the tail regions, provided that the tails are not too fat. More precisely, they proved that: (1) If X and Y are two asset returns having jointly regularly varying nondegenerate tails with tail index bigger than one, then there exists α0 ∈ (0, 1) such that VaRα(X + Y) ≤ VaRα(X) + VaRα(Y), ∀α ∈ (α0, 1). (2) If the tail indices of X and Y are different, then a weaker form of tail subadditivity holds:

lim sup_{α→1} VaRα(X + Y) / (VaRα(X) + VaRα(Y)) ≤ 1.

Asset returns with tail index less than one have very fat tails; they are hard to find and easy to identify, and Daníelsson et al. [9] argued that they can be treated as special cases in financial modeling. According to the Basel Accord, 99% VaR is the primary risk measurement tool for determining capital charges against market risk. Daníelsson et al. [9] also carried out simulations showing that VaRα is indeed subadditive for α ∈ [95%, 99%] in most practical applications.

3.1.2 Critique from the Viewpoint of the Regulator

For a risky business like an insurance company with potential loss X, a regulator imposes a solvency capital requirement ρ(X) on the company to protect the policyholders from insolvency, where ρ is a risk measure. In other words, the company has to hold the capital ρ(X) in order to pay out the claims from the policyholders. The shortfall of a portfolio with loss X and solvency capital requirement ρ(X) is therefore max(0, X − ρ(X)) = (X − ρ(X))+, which is the part of the loss that cannot be covered by the insurer.

Dhaene et al. [11] suggested that from the point of view of the regulator, a merger of two portfolios X and Y should decrease the shortfall, i.e.,

(X + Y − ρ(X + Y))+ ≤ (X − ρ(X))+ + (Y − ρ(Y))+.

However, Dhaene et al. [11] showed that this condition implies superadditivity, not subadditivity; they also showed that even the weaker requirement

E[(X + Y − ρ(X + Y))+] ≤ E[(X − ρ(X))+] + E[(Y − ρ(Y))+]

is violated by coherent risk measures such as the tail conditional expectation.

3.1.3 Critique from the Merger Viewpoint

Subadditivity basically means that "a merger does not create extra risk" ([3], p. 209). However, Dhaene et al. [12] pointed out that a merger may often increase risk, particularly because of the bankruptcy protection available to each individual firm. For example, it may be better to split a risky trading business into a separate sub-firm: even if the loss from the sub-firm is enormous, the parent firm can simply let the sub-firm go bankrupt, confining the loss to that one sub-firm. Creating sub-firms may therefore incur smaller losses than keeping everything inside the parent firm, and in this sense a merger may increase firm risk. The famous collapse of Britain's Barings Bank (which had a long operating history and even helped finance the Louisiana Purchase by the United States) in February 1995, due to the failure of a single trader, Nick Leeson, in Singapore, clearly indicates that a merger may increase risk. Had Barings Bank set up a separate firm for its Singapore unit, the bankruptcy of that unit would not have sunk the entire bank.

3.2 Two New Critiques

3.2.1 Critique from the Psychological Theory of Uncertainty and Risk

Risk measures have a close connection with the psychological theory of people's preferences under uncertainty and risk. By the late 1970s, psychologists and economists had discovered many preference anomalies in which rational people may systematically violate the axioms of

the expected utility theory. In particular, Kahneman and Tversky [19] proposed a model of choice under uncertainty called "prospect theory," which departed from the expected utility model and led to a Nobel prize in Economics. The theory has been further developed by many people, including the contributions of Kahneman and Tversky [27], Tversky and Wakker [28], Quiggin [21], Schmeidler [25], [26] and Yaari [29] on using comonotonic random variables in prospect theory. Their models are also referred to as "anticipated utility," "rank-dependent models" and "Choquet expected utility." The prospect theory postulates that (a) it is better to impose preferences on comonotonic random variables rather than on arbitrary random variables; and (b) people evaluate uncertain prospects using "decision weights" that may be viewed as distorted probabilities of outcomes. The theory can explain a variety of preference anomalies, including the Allais and Ellsberg paradoxes. Here is a simple example, adapted from a similar one in Schmeidler [26], illustrating that risk associated with non-comonotonic random variables may violate subadditivity, because of Ellsberg's paradox. Suppose there are two urns, I and II. Urn I contains 50 black balls and 50 white balls, while Urn II contains an unknown number of red and green balls. Let IB be the event that a ball drawn from Urn I is black, and let IW, IIR, IIG be defined similarly. Participants in the experiment are asked about their risk preferences for losing 10,000 if one of the four events IB, IW, IIR, IIG happens. Almost all participants agree that the risks of IB and IW should be the same, and likewise for IIR and IIG. Furthermore, if we assign 10,000 as the risk measure for a sure loss of 10,000 with probability one, then people have no problem accepting that the risk measures for IB and IW are both equal to 5,000.
However, the paradox is that a significant number of rational participants (for example, people who know calculus), due to the uncertainty in Urn II and people's attitude towards large losses (p. 298 in Tversky and Kahneman [27]), prefer the loss of 10,000 associated with the event IIR over that with the event IB, and the event IIG over IW. In other words, these rational participants view the risk of loss associated with IIR as smaller than that with IB, and similarly for IIG and IW. Mathematically we have ρ(IB) > ρ(IIR), ρ(IW) > ρ(IIG), ρ(IB) = 5000, ρ(IW) = 5000. But IIR ∪ IIG corresponds to a sure loss of 10,000, which leads to

ρ(IIR ∪ IIG) = ρ(10,000) = 10,000 = ρ(IB) + ρ(IW) > ρ(IIR) + ρ(IIG),    (3)

violating the subadditivity. Clearly the random losses associated with IIR and IIG are not comonotonic. Therefore, this example shows that non-comonotonic random variables may fail subadditivity. Schmeidler [26] attributes this phenomenon, such as the inequality in (3), to the difference between randomness and uncertainty, and further postulates that, due to uncertainty, even rational decision makers' subjective probabilities may not add up to one. Schmeidler [26] also indicated that risk preferences for comonotonic random variables are easier to justify than risk preferences for arbitrary random variables. Following the prospect theory, we think it may be appropriate to relax subadditivity to comonotonic subadditivity; in other words, we impose ρ(X + Y) ≤ ρ(X) + ρ(Y) only for comonotonic random variables X and Y.

The insurance risk measure imposes comonotonic additivity in Axiom C3 based on the argument that comonotonic losses have no hedging effect against each other. However, this intuition only holds when one focuses on a single scenario or a single distorted probability. The counterexample at the end of Section 6 shows that if one incorporates different scenarios, then additivity may not hold even for comonotonic random variables. Hence, the comonotonic additivity condition in Axiom C3 may be too restrictive, and its relaxation to comonotonic subadditivity may be a better choice.

3.2.2 Critique from the Robustness Viewpoint

When a regulator imposes a risk measure based on either probabilistic models or historical data of the loss, the risk measure must be unambiguous, stable, and must give consistent answers. Otherwise, different firms using different models may report very different risk measures to the regulator, and the risk measures may fluctuate dramatically when firms update their historical databases. In short, from a regulator's viewpoint, the risk measure should be robust with respect to the underlying models and to updates of historical data, in order to enforce the regulation and maintain its stability. The robustness of the coherent risk measure is questionable: (1) The coherent risk measure suggests using tail expectations to compute risk measures, and tail expectations may be sensitive to model assumptions about the heaviness of tail distributions, a controversial subject. For example, although it is accepted that real return data have tails heavier than those of the normal distribution, one school of thought believes the tails to be of exponential type (e.g. [13], [4]) and another believes in power-type tails (e.g. [22]).


Heyde and Kou [17] show that it is very difficult to distinguish between exponential-type and power-type tails with 5,000 observations (about 20 years of daily observations), mainly because the quantiles of exponential-type and power-type distributions may overlap. For example, surprisingly, an exponential distribution has a larger 99th percentile than the corresponding t-distribution with 5 degrees of freedom. Therefore, with ordinary sample sizes (e.g. 20 years of daily data), one cannot easily identify exact tail behavior from data, and the tail behavior may be a subjective issue depending on people's modeling preferences. We show in Section 7 that the tail conditional expectation, a widely used coherent risk measure, is sensitive to the assumption on the tail behavior of the loss distribution. (2) Some risk measures may be coherent, satisfying subadditivity, but not robust at all. A simple example is the sample maximum. More precisely, given a set of observations x̃ = (x1, ..., xn) from a loss random variable X, let (x_(1), ..., x_(n)) denote the order statistics of the data x̃ with x_(n) being the largest. Then x_(n) is a coherent risk measure, as it satisfies subadditivity. However, the maximum loss x_(n) is not robust at all, being quite sensitive to both outliers and model assumptions. More generally, let w̃′ = (w′1, ..., w′n) ∈ Rⁿ be a weight vector with 0 ≤ w′1 ≤ w′2 ≤ ··· ≤ w′n and Σ_{i=1}^n w′i = 1. Then the risk measure ρ̂(x̃) = Σ_{i=1}^n w′i x_(i) is an empirically coherent risk measure satisfying subadditivity, as will be shown in Section 6. However, since this risk measure puts larger weights on larger observations, it is obviously not robust with respect to outliers in the data or model assumptions. We will discuss the issue of robustness in detail in Section 7.
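The contrast between the tail conditional expectation and the tail conditional median under data contamination can be seen numerically. The following sketch is our own illustration (the empirical cutoff convention and the simulated sample are illustrative choices, not from the paper):

```python
import numpy as np

def tail_cond_expectation(sample, alpha):
    """Mean of the losses at or beyond the empirical alpha-quantile."""
    xs = np.sort(np.asarray(sample, dtype=float))
    k = int(np.ceil(alpha * len(xs)))
    return float(xs[k - 1:].mean())

def tail_cond_median(sample, alpha):
    """Median of the losses at or beyond the empirical alpha-quantile."""
    xs = np.sort(np.asarray(sample, dtype=float))
    k = int(np.ceil(alpha * len(xs)))
    return float(np.median(xs[k - 1:]))

rng = np.random.default_rng(0)
losses = rng.standard_normal(1000)
spoiled = np.append(losses, 1e6)  # a single corrupt record

# The tail mean explodes; the tail median barely moves.
print(tail_cond_expectation(losses, 0.95), tail_cond_expectation(spoiled, 0.95))
print(tail_cond_median(losses, 0.95), tail_cond_median(spoiled, 0.95))
```

One bad data point drags the tail conditional expectation up by thousands, while the tail conditional median shifts by at most one order statistic, which is the robustness property exploited in Section 7.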

4 Natural Risk Statistic and Its Representation

4.1 The Axioms and the Representation

In this section we shall propose a new measure of risk based on data. Suppose we have a collection of data observations x̃ = (x1, x2, ..., xn) ∈ Rⁿ on the random variable X. The collection x̃ may be a set of empirical observations of X, a set of simulated observations regarding (subjective) possible outcomes of X from a given model, or a combination of the two. Our risk measure, called the natural risk statistic, is based on the data x̃. More precisely, a risk statistic ρ̂ is a mapping from the data in Rⁿ to a numerical value in R.

Remark: In the setting of the coherent risk measure, Ω has to be finite and consequently the random variable X has to be discrete. In our setting of risk statistics, X can be any random variable, discrete or continuous. What we need is a set of data observations (empirical, subjective, or both) x̃ = (x1, x2, ..., xn) ∈ Rⁿ from X. Next we shall introduce a set of axioms for ρ̂.

Axiom D1. Positive homogeneity and translation invariance: ρ̂(a x̃ + b1) = a ρ̂(x̃) + b, ∀x̃ ∈ Rⁿ, a ≥ 0, b ∈ R, where 1 = (1, 1, ..., 1)ᵀ ∈ Rⁿ.

Axiom D2. Monotonicity: ρ̂(x̃) ≤ ρ̂(ỹ), if x̃ ≤ ỹ, where x̃ ≤ ỹ if and only if xi ≤ yi, i = 1, ..., n.

The above two axioms were proposed for the coherent risk measure; here we simply adapt them to the case of risk statistics. Note that Axiom D1 yields ρ̂(0·1) = 0 and ρ̂(b1) = b, b ∈ R. Also, Axioms D1 and D2 imply the continuity of ρ̂. Indeed, suppose ρ̂ satisfies Axioms D1 and D2. Then for any x̃ ∈ Rⁿ, ε > 0, and ỹ satisfying |yi − xi| < ε, i = 1, ..., n, we have x̃ − ε1 < ỹ < x̃ + ε1. By the monotonicity in Axiom D2, ρ̂(x̃ − ε1) ≤ ρ̂(ỹ) ≤ ρ̂(x̃ + ε1). Applying Axiom D1, the inequality becomes ρ̂(x̃) − ε ≤ ρ̂(ỹ) ≤ ρ̂(x̃) + ε, which establishes the continuity of ρ̂.

Axiom D3. Comonotonic subadditivity: ρ̂(x̃ + ỹ) ≤ ρ̂(x̃) + ρ̂(ỹ), if x̃ and ỹ are comonotonic, where x̃ and ỹ are comonotonic if and only if (xi − xj)(yi − yj) ≥ 0 for any i ≠ j.

In Axiom D3 we relax the subadditivity requirement of the coherent risk measure so that it is only enforced for comonotonic data. This also relaxes the comonotonic additivity requirement of the insurance risk measure. Comonotonic subadditivity is consistent with the prospect theory of risk in psychology, as we only specify a preference among comonotonic random variables.

Axiom D4. Permutation invariance: ρ̂((x1, ..., xn)) = ρ̂((xi1, ..., xin)), for any permutation (i1, ..., in). This axiom is postulated because we focus on risk measures of a single random variable X with data observation x̃. In other words, just like the coherent risk measure and the insurance risk measure, we discuss static rather than dynamic risk measures.

Definition 1. A risk statistic ρ̂ : Rⁿ → R is called a natural risk statistic if it satisfies Axioms D1-D4.

The following representation theorem for natural risk statistics is a main result of the current paper.

Theorem 1. Let x_(1), ..., x_(n) be the order statistics of the observation x̃, with x_(n) being the largest. Then ρ̂ is a natural risk statistic if and only if there exists a set of weights W = {w̃ = (w1, ..., wn)} ⊂ Rⁿ, with each w̃ ∈ W satisfying Σ_{i=1}^n wi = 1 and wi ≥ 0, ∀1 ≤ i ≤ n, such that

ρ̂(x̃) = sup_{w̃∈W} { Σ_{i=1}^n wi x_(i) }, ∀x̃ ∈ Rⁿ.    (4)

The main difficulty in proving Theorem 1 is the "only if" part. Axiom D3 implies that the functional ρ̂ satisfies subadditivity on comonotonic subsets of Rⁿ, for example on the set B = {ỹ ∈ Rⁿ | y1 ≤ y2 ≤ ··· ≤ yn}. However, unlike in the case of the coherent risk measure, the existence of the set of weights W such that (4) holds does not follow easily from the proof in Huber [18]. The main difference here is that the comonotonic set B is not an open set in Rⁿ; its boundary points may not have the nice properties that interior points do, so boundary points must be treated with more care. In particular, one should be cautious when applying separating hyperplane results. Furthermore, some effort is needed in Lemma 2 to show that wi ≥ 0, ∀1 ≤ i ≤ n.

4.2 Proof of Theorem 1

The proof relies on the following two lemmas, which depend heavily on the properties of interior points; we can therefore only prove them for the interior points of B. The results for boundary points will be obtained by approximating the boundary points by interior points and by employing continuity and uniform convergence.

Lemma 1. Let B = {ỹ ∈ Rⁿ | y1 ≤ y2 ≤ ··· ≤ yn}, and denote by Bᵒ the interior of B. For any fixed z̃ = (z1, ..., zn) ∈ Bᵒ and any ρ̂ satisfying Axioms D1-D4 and ρ̂(z̃) = 1, there exists a weight w̃ = (w1, ..., wn) such that the linear functional λ(x̃) := Σ_{i=1}^n wi xi satisfies

λ(z̃) = 1,    (5)

λ(x̃) < 1 for all x̃ such that x̃ ∈ B and ρ̂(x̃) < 1.    (6)

Proof. Let U = {x̃ | ρ̂(x̃) < 1} ∩ B. For any x̃, ỹ ∈ B, we know that x̃ and ỹ are comonotonic, so Axioms D1 and D3 imply that U is convex, and therefore the closure Ū of U is also convex.

For any ε > 0, since ρ̂(z̃ − ε1) = ρ̂(z̃) − ε = 1 − ε < 1, it follows that z̃ − ε1 ∈ U. Since z̃ − ε1 tends to z̃ as ε ↓ 0 and ρ̂(z̃) = 1, we know that z̃ is a boundary point of U. Therefore, there exists a supporting hyperplane for Ū at z̃, i.e., there exists a nonzero vector w̃ = (w1, ..., wn) ∈ Rⁿ such that λ(x̃) := Σ_{i=1}^n wi xi satisfies λ(x̃) ≤ λ(z̃) for all x̃ ∈ Ū. In particular, we have

λ(x̃) ≤ λ(z̃), ∀x̃ ∈ U.    (7)

We shall show that strict inequality holds in (7). Suppose, by contradiction, that there exists x̃0 ∈ U such that λ(x̃0) = λ(z̃). For any α ∈ (0, 1), let x̃α = αz̃ + (1 − α)x̃0. Then we have

λ(x̃α) = αλ(z̃) + (1 − α)λ(x̃0) = λ(z̃).    (8)

In addition, since z̃ and x̃0 are comonotonic (as they both belong to B), we have

ρ̂(x̃α) ≤ αρ̂(z̃) + (1 − α)ρ̂(x̃0) < α + (1 − α) = 1, ∀α ∈ (0, 1).    (9)

Since z̃ ∈ Bᵒ, there exists a small enough α0 ∈ (0, 1) such that x̃α0 is also an interior point of B. Hence, for all small enough ε > 0,

x̃α0 + εw̃ ∈ B.    (10)

With wmax = max(w1, w2, ..., wn), we have x̃α0 + εw̃ ≤ x̃α0 + εwmax·1. Thus, the monotonicity in Axiom D2 and the translation invariance in Axiom D1 yield

ρ̂(x̃α0 + εw̃) ≤ ρ̂(x̃α0 + εwmax·1) = ρ̂(x̃α0) + εwmax.    (11)

Since ρ̂(x̃α0) < 1 by (9), we have by (10) and (11) that for all small enough ε > 0, ρ̂(x̃α0 + εw̃) < 1 and x̃α0 + εw̃ ∈ U. Hence, (7) implies λ(x̃α0 + εw̃) ≤ λ(z̃). However, by (8) we have the opposite inequality λ(x̃α0 + εw̃) = λ(x̃α0) + ε|w̃|² > λ(x̃α0) = λ(z̃), leading to a contradiction. In summary, we have shown that

λ(x̃) < λ(z̃), ∀x̃ ∈ U.    (12)

Since ρ̂(0) = 0, we have 0 ∈ U. Letting x̃ = 0 in (12) yields λ(z̃) > 0, so we can re-scale λ such that

λ(z̃) = 1 = ρ̂(z̃).

Thus, (12) becomes: λ(x̃) < 1 for all x̃ such that x̃ ∈ B and ρ̂(x̃) < 1, from which (6) holds. ∎

Lemma 2. Let B = {ỹ ∈ Rⁿ | y1 ≤ y2 ≤ ··· ≤ yn}, and denote by Bᵒ the interior of B. For any fixed z̃ = (z1, ..., zn) ∈ Bᵒ and any ρ̂ satisfying Axioms D1-D4, there exists a weight w̃ = (w1, ..., wn) such that

Σ_{i=1}^n wi = 1,    (13)

wk ≥ 0, k = 1, ..., n,    (14)

ρ̂(x̃) ≥ Σ_{i=1}^n wi xi, ∀x̃ ∈ B, and ρ̂(z̃) = Σ_{i=1}^n wi zi.    (15)

Proof. We will show this by considering three cases.

Case 1: ρ̂(z̃) = 1. From Lemma 1, there exists a weight w̃ = (w_1, ..., w_n) such that the linear functional λ(x̃) := Σ_{i=1}^n w_i x_i satisfies (5) and (6).

First we prove that w̃ satisfies (13). For this, it is sufficient to show that λ(1) = Σ_{i=1}^n w_i = 1. To this end, first note that for any c < 1, Axiom D1 implies ρ̂(c1) = c < 1. Thus, (6) implies λ(c1) < 1, and, by the continuity of λ, we obtain that λ(1) ≤ 1. Secondly, ∀c > 1, Axiom D1 implies ρ̂(2z̃ − c1) = 2ρ̂(z̃) − c = 2 − c < 1. Then it follows from (6) and (5) that 1 > λ(2z̃ − c1) = 2λ(z̃) − cλ(1) = 2 − cλ(1), i.e., λ(1) > 1/c for any c > 1. So λ(1) ≥ 1, and w̃ satisfies (13).

Next, we will prove that w̃ satisfies (14). Let ẽ_k = (0, ..., 0, 1, 0, ..., 0) be the k-th standard basis vector of R^n; then w_k = λ(ẽ_k). Since z̃ ∈ B^o, there exists δ > 0 such that z̃ − δẽ_k ∈ B. For any ε > 0, we have

ρ̂(z̃ − δẽ_k − ε1) = ρ̂(z̃ − δẽ_k) − ε ≤ ρ̂(z̃) − ε = 1 − ε < 1,

where the inequality follows from the monotonicity in Axiom D2. Then (6) and (5) imply

1 > λ(z̃ − δẽ_k − ε1) = λ(z̃) − δλ(ẽ_k) − ελ(1) = 1 − ε − δλ(ẽ_k).

Hence w_k = λ(ẽ_k) > −ε/δ, and the conclusion follows by letting ε go to 0.

Finally, we will prove that w̃ satisfies (15). It follows from Axiom D1 and (6) that ∀c > 0,

λ(x̃) < c for all x̃ such that x̃ ∈ B and ρ̂(x̃) < c.  (16)

For any c ≤ 0, choose b > 0 such that b + c > 0. Then by (16), we have λ(x̃ + b1) < c + b for all x̃ such that x̃ ∈ B and ρ̂(x̃ + b1) < c + b. Since λ(x̃ + b1) = λ(x̃) + bλ(1) = λ(x̃) + b and ρ̂(x̃ + b1) = ρ̂(x̃) + b, we have ∀c ≤ 0,

λ(x̃) < c for all x̃ such that x̃ ∈ B and ρ̂(x̃) < c.  (17)

It follows from (16) and (17) that

ρ̂(x̃) ≥ λ(x̃) for all x̃ ∈ B,

which in combination with ρ̂(z̃) = λ(z̃) = 1 completes the proof of (15).

Case 2: ρ̂(z̃) ≠ 1 and ρ̂(z̃) > 0. Since ρ̂(z̃/ρ̂(z̃)) = 1 and z̃/ρ̂(z̃) is still an interior point of B, it follows from the result proved in Case 1 that there exists a linear functional λ(x̃) := Σ_{i=1}^n w_i x_i, with w̃ = (w_1, ..., w_n) satisfying (13), (14) and

ρ̂(x̃) ≥ λ(x̃), ∀x̃ ∈ B, and ρ̂(z̃/ρ̂(z̃)) = λ(z̃/ρ̂(z̃)),

or equivalently ρ̂(x̃) ≥ λ(x̃), ∀x̃ ∈ B, and ρ̂(z̃) = λ(z̃). Thus, w̃ also satisfies (15).

Case 3: ρ̂(z̃) ≤ 0. Choose b > 0 such that ρ̂(z̃ + b1) > 0. Since z̃ + b1 is an interior point of B, it follows from the result proved in Case 2 that there exists a linear functional λ(x̃) := Σ_{i=1}^n w_i x_i with w̃ = (w_1, ..., w_n) satisfying (13), (14) and

ρ̂(x̃) ≥ λ(x̃), ∀x̃ ∈ B, and ρ̂(z̃ + b1) = λ(z̃ + b1),

or equivalently ρ̂(x̃) ≥ λ(x̃), ∀x̃ ∈ B, and ρ̂(z̃) = λ(z̃). Thus, w̃ also satisfies (15). ¤

Proof of Theorem 1. (I) The proof of the "if" part. Suppose ρ̂ is defined by (4). Then obviously ρ̂ satisfies Axioms D1 and D4. To check Axiom D2, write (y_(1), y_(2), ..., y_(n)) = (y_{i_1}, y_{i_2}, ..., y_{i_n}),

where (i_1, ..., i_n) is a permutation of (1, ..., n). Then for any x̃ ≤ ỹ, we have

y_(k) = max{y_{i_j}, j = 1, ..., k} ≥ max{x_{i_j}, j = 1, ..., k} ≥ x_(k), 1 ≤ k ≤ n,

which implies that ρ̂ satisfies Axiom D2 because

ρ̂(ỹ) = sup_{w̃∈W} {Σ_{i=1}^n w_i y_(i)} ≥ sup_{w̃∈W} {Σ_{i=1}^n w_i x_(i)} = ρ̂(x̃).

To check Axiom D3, note that if x̃ and ỹ are comonotonic, then there exists a permutation (i_1, ..., i_n) of (1, ..., n) such that x_{i_1} ≤ x_{i_2} ≤ ... ≤ x_{i_n} and y_{i_1} ≤ y_{i_2} ≤ ... ≤ y_{i_n}. Hence, we have (x̃ + ỹ)_(i) = x_(i) + y_(i), i = 1, ..., n. Therefore,

ρ̂(x̃ + ỹ) = sup_{w̃∈W} {Σ_{i=1}^n w_i (x̃ + ỹ)_(i)} = sup_{w̃∈W} {Σ_{i=1}^n w_i (x_(i) + y_(i))}
≤ sup_{w̃∈W} {Σ_{i=1}^n w_i x_(i)} + sup_{w̃∈W} {Σ_{i=1}^n w_i y_(i)} = ρ̂(x̃) + ρ̂(ỹ),

which implies that ρ̂ satisfies Axiom D3.

(II) The proof of the "only if" part. By Axiom D4, we only need to show that there exists a set of weights W = {w̃ = (w_1, ..., w_n)} ⊂ R^n, with each w̃ ∈ W satisfying Σ_{i=1}^n w_i = 1 and w_i ≥ 0, ∀1 ≤ i ≤ n, such that

ρ̂(x̃) = sup_{w̃∈W} {Σ_{i=1}^n w_i x_i}, ∀x̃ ∈ B,  (18)

where recall that B = {ỹ ∈ R^n | y_1 ≤ y_2 ≤ ··· ≤ y_n}.

By Lemma 2, for any point ỹ ∈ B^o, there exists a weight w̃^(ỹ) satisfying (13), (14) and (15). Therefore, we can take the collection of such weights as

W = {w̃^(ỹ) | ỹ ∈ B^o}.

Then from (15), for any fixed x̃ ∈ B^o we have

ρ̂(x̃) ≥ Σ_{i=1}^n w_i^(ỹ) x_i, ∀ỹ ∈ B^o, and ρ̂(x̃) = Σ_{i=1}^n w_i^(x̃) x_i.

Therefore,

ρ̂(x̃) = sup_{ỹ∈B^o} {Σ_{i=1}^n w_i^(ỹ) x_i} = sup_{w̃∈W} {Σ_{i=1}^n w_i x_i}, ∀x̃ ∈ B^o,  (19)

where each w̃ ∈ W satisfies (13) and (14).

Next, we will prove that the above equality is also true for any boundary point of B, i.e.,

ρ̂(x̃) = sup_{w̃∈W} {Σ_{i=1}^n w_i x_i}, ∀x̃ ∈ ∂B.  (20)

Let x̃^0 be any boundary point of B. Then there exists a sequence {x̃^k}_{k=1}^∞ ⊂ B^o such that x̃^k → x̃^0 as k → ∞. By the continuity of ρ̂, we have

ρ̂(x̃^0) = lim_{k→∞} ρ̂(x̃^k) = lim_{k→∞} sup_{w̃∈W} {Σ_{i=1}^n w_i x_i^k},  (21)

where the last equality follows from (19). If we can interchange the sup and the limit in (21), i.e., if

lim_{k→∞} sup_{w̃∈W} {Σ_{i=1}^n w_i x_i^k} = sup_{w̃∈W} {lim_{k→∞} Σ_{i=1}^n w_i x_i^k} = sup_{w̃∈W} {Σ_{i=1}^n w_i x_i^0},  (22)

then (20) holds and the proof is complete. To show (22), note that by the Cauchy-Schwarz inequality,

|Σ_{i=1}^n w_i x_i^k − Σ_{i=1}^n w_i x_i^0| ≤ (Σ_{i=1}^n w_i²)^{1/2} (Σ_{i=1}^n (x_i^k − x_i^0)²)^{1/2} ≤ (Σ_{i=1}^n (x_i^k − x_i^0)²)^{1/2}, ∀w̃ ∈ W,

because w_i ≥ 0 and Σ_{i=1}^n w_i = 1, ∀w̃ ∈ W. Therefore, Σ_{i=1}^n w_i x_i^k → Σ_{i=1}^n w_i x_i^0 uniformly for all w̃ ∈ W, and (22) follows. ¤

5  Another Representation of the Natural Risk Statistic via Acceptance Sets

Similar to the coherent risk measure, the proposed natural risk statistic can also be characterized via acceptance sets. A statistical acceptance set is a subset of R^n. Given a statistical acceptance set A ⊂ R^n, the risk statistic ρ̂_A associated with A is defined to be the minimal amount of risk-free investment that has to be added to the original position so that the resulting position is acceptable; in mathematical form,

ρ̂_A(x̃) = inf{m | x̃ − m1 ∈ A}, ∀x̃ ∈ R^n.  (23)

On the other hand, given a risk statistic ρ̂, one can define the statistical acceptance set associated with ρ̂ by

A_ρ̂ = {x̃ ∈ R^n | ρ̂(x̃) ≤ 0}.  (24)

We shall postulate the following axioms for a statistical acceptance set A:

Axiom E1. The acceptance set A contains R^n_-, where R^n_- = {x̃ ∈ R^n | x_i ≤ 0, i = 1, ..., n}.

Axiom E2. The acceptance set A does not intersect the set R^n_{++}, where R^n_{++} = {x̃ ∈ R^n | x_i > 0, i = 1, ..., n}.

Axiom E3. If x̃ and ỹ are comonotonic and x̃ ∈ A, ỹ ∈ A, then λx̃ + (1 − λ)ỹ ∈ A for ∀λ ∈ [0, 1].

Axiom E4. The acceptance set A is positively homogeneous, i.e., if x̃ ∈ A, then λx̃ ∈ A for all λ ≥ 0.

Axiom E5. If x̃ ≤ ỹ and ỹ ∈ A, then x̃ ∈ A.

Axiom E6. If x̃ ∈ A, then (x_{i_1}, ..., x_{i_n}) ∈ A for any permutation (i_1, ..., i_n).

We will show that a natural risk statistic and a statistical acceptance set satisfying Axioms E1-E6 are mutually representable. More precisely, we have the following theorem:

Theorem 2. (I) If ρ̂ is a natural risk statistic, then the statistical acceptance set A_ρ̂ is closed and satisfies Axioms E1-E6.

(II) If a statistical acceptance set A satisfies Axioms E1-E6, then the risk statistic ρ̂_A is a natural risk statistic.

(III) If ρ̂ is a natural risk statistic, then ρ̂ = ρ̂_{A_ρ̂}.

(IV) If a statistical acceptance set D satisfies Axioms E1-E6, then A_{ρ̂_D} = D̄, the closure of D.

Remark: Theorem 2 shows that the risk statistic ρ̂ calculated from the data x̃ is equivalent to the amount of risk-free investment that has to be added to make the original position acceptable. This alternative characterization of the natural risk statistic is consistent with a similar characterization of the coherent risk measure in Artzner et al. [3].

Proof. (I) (1) For ∀x̃ ≤ 0, Axiom D2 implies ρ̂(x̃) ≤ ρ̂(0) = 0, hence x̃ ∈ A_ρ̂ by definition.

Thus, E1 holds. (2) For any x̃ ∈ R^n_{++}, there exists α > 0 such that 0 ≤ x̃ − α1. Axioms D2 and D1 imply that ρ̂(0) ≤ ρ̂(x̃ − α1) = ρ̂(x̃) − α. So ρ̂(x̃) ≥ α > 0 and hence x̃ ∉ A_ρ̂, i.e., E2 holds. (3) If x̃ and ỹ are comonotonic and x̃ ∈ A_ρ̂, ỹ ∈ A_ρ̂, then ρ̂(x̃) ≤ 0, ρ̂(ỹ) ≤ 0, and λx̃ and (1 − λ)ỹ are comonotonic for any λ ∈ [0, 1]. Thus D3 implies

ρ̂(λx̃ + (1 − λ)ỹ) ≤ ρ̂(λx̃) + ρ̂((1 − λ)ỹ) = λρ̂(x̃) + (1 − λ)ρ̂(ỹ) ≤ 0.

Hence λx̃ + (1 − λ)ỹ ∈ A_ρ̂, i.e., E3 holds. (4) For any x̃ ∈ A_ρ̂ and a > 0, we have ρ̂(x̃) ≤ 0, and D1 implies ρ̂(ax̃) = aρ̂(x̃) ≤ 0. Thus, ax̃ ∈ A_ρ̂, i.e., E4 holds. (5) For any x̃ ≤ ỹ and ỹ ∈ A_ρ̂, we have ρ̂(ỹ) ≤ 0. By D2, ρ̂(x̃) ≤ ρ̂(ỹ) ≤ 0. Hence x̃ ∈ A_ρ̂, i.e., E5 holds. (6) If x̃ ∈ A_ρ̂, then ρ̂(x̃) ≤ 0. For any permutation (i_1, ..., i_n), D4 implies ρ̂((x_{i_1}, ..., x_{i_n})) = ρ̂(x̃) ≤ 0. So (x_{i_1}, ..., x_{i_n}) ∈ A_ρ̂, i.e., E6 holds. (7) Suppose x̃^k ∈ A_ρ̂, k = 1, 2, ..., and x̃^k → x̃ as k → ∞. Then ρ̂(x̃^k) ≤ 0, ∀k. Suppose the limit x̃ ∉ A_ρ̂. Then ρ̂(x̃) > 0, so there exists δ > 0 such that ρ̂(x̃ − δ1) > 0. Since x̃^k → x̃, it follows that there exists K ∈ N such that x̃^K > x̃ − δ1. By D2, ρ̂(x̃^K) ≥ ρ̂(x̃ − δ1) > 0, which contradicts ρ̂(x̃^K) ≤ 0. So x̃ ∈ A_ρ̂, i.e., A_ρ̂ is closed.

(II) (1) For ∀x̃ ∈ R^n, ∀b ∈ R, we have

ρ̂_A(x̃ + b1) = inf{m | x̃ + b1 − m1 ∈ A} = b + inf{m | x̃ − m1 ∈ A} = b + ρ̂_A(x̃).

For ∀x̃ ∈ R^n, ∀a ≥ 0, if a = 0, then

ρ̂_A(ax̃) = inf{m | 0 − m1 ∈ A} = 0 = a·ρ̂_A(x̃),

where the second equality follows from E1 and E2. If a > 0, then by E4,

ρ̂_A(ax̃) = inf{m | ax̃ − m1 ∈ A} = a·inf{u | a(x̃ − u1) ∈ A} = a·inf{u | x̃ − u1 ∈ A} = a·ρ̂_A(x̃).

Therefore, D1 holds. (2) Suppose x̃ ≤ ỹ. For any m ∈ R, if ỹ − m1 ∈ A, then E5 and x̃ − m1 ≤ ỹ − m1 imply that x̃ − m1 ∈ A. Hence {m | ỹ − m1 ∈ A} ⊂ {m | x̃ − m1 ∈ A}. Taking the infimum on both sides, we obtain ρ̂_A(ỹ) ≥ ρ̂_A(x̃), i.e., D2 holds. (3) Suppose x̃ and ỹ are comonotonic. For any m and n such that x̃ − m1 ∈ A and ỹ − n1 ∈ A, since x̃ − m1 and ỹ − n1 are comonotonic, it follows from E3 that (1/2)(x̃ − m1) + (1/2)(ỹ − n1) ∈ A. By E4, this implies x̃ + ỹ − (m + n)1 ∈ A. Therefore,

ρ̂_A(x̃ + ỹ) ≤ m + n.

Taking the infimum over all m and n satisfying x̃ − m1 ∈ A and ỹ − n1 ∈ A on both sides of the above inequality yields

ρ̂_A(x̃ + ỹ) ≤ ρ̂_A(x̃) + ρ̂_A(ỹ).

So D3 holds. (4) Fix any x̃ ∈ R^n and any permutation (i_1, ..., i_n). Then for any m ∈ R, E6 implies that x̃ − m1 ∈ A if and only if (x_{i_1}, ..., x_{i_n}) − m1 ∈ A. Hence

{m | x̃ − m1 ∈ A} = {m | (x_{i_1}, ..., x_{i_n}) − m1 ∈ A}.

Taking the infimum on both sides, we obtain ρ̂_A(x̃) = ρ̂_A((x_{i_1}, ..., x_{i_n})), i.e., D4 holds.

(III) For ∀x̃ ∈ R^n, we have, via D1,

ρ̂_{A_ρ̂}(x̃) = inf{m | x̃ − m1 ∈ A_ρ̂} = inf{m | ρ̂(x̃ − m1) ≤ 0} = inf{m | ρ̂(x̃) ≤ m} ≥ ρ̂(x̃).

On the other hand, for any δ > ρ̂(x̃), we have

δ > ρ̂(x̃) ⇒ ρ̂(x̃ − δ1) < 0 (since ρ̂ satisfies D1)
⇒ x̃ − δ1 ∈ A_ρ̂ (by definition)
⇒ ρ̂_{A_ρ̂}(x̃ − δ1) ≤ 0 (by definition)
⇒ ρ̂_{A_ρ̂}(x̃) ≤ δ (since ρ̂_{A_ρ̂} satisfies D1).

Letting δ ↓ ρ̂(x̃), we obtain ρ̂_{A_ρ̂}(x̃) ≤ ρ̂(x̃). Therefore, ρ̂(x̃) = ρ̂_{A_ρ̂}(x̃).

(IV) For any x̃ ∈ D, we have ρ̂_D(x̃) ≤ 0. Hence x̃ ∈ A_{ρ̂_D}. Therefore, D ⊂ A_{ρ̂_D}. By the results (I) and (II), A_{ρ̂_D} is closed. So D̄ ⊂ A_{ρ̂_D}. On the other hand, for any x̃ ∈ A_{ρ̂_D}, we have ρ̂_D(x̃) ≤ 0, i.e., inf{m | x̃ − m1 ∈ D} ≤ 0. Hence for any ε > 0 there exists m < ε such that x̃ − m1 ∈ D. Then x̃ − ε1 ≤ x̃ − m1, so Axiom E5 yields x̃ − ε1 ∈ D. Letting ε ↓ 0 gives x̃ ∈ D̄. Therefore A_{ρ̂_D} ⊂ D̄, and hence A_{ρ̂_D} = D̄. ¤
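The identity ρ̂ = ρ̂_{A_ρ̂} in part (III) can also be checked numerically. The sketch below is illustrative code, not from the paper: it inverts the acceptance-set definition (23) by bisection, which works because translation invariance (Axiom D1) makes m ↦ ρ̂(x̃ − m1) monotone; the example risk statistic is an assumed toy choice.

```python
# Sketch of Theorem 2(III): recover a natural risk statistic rho from its
# acceptance set A = {x : rho(x) <= 0} via rho_A(x) = inf{m : x - m*1 in A}.
# The example rho below is an illustrative choice, not from the paper.

def rho(x):
    """Example natural risk statistic: average of the two largest losses."""
    xs = sorted(x)
    return 0.5 * (xs[-1] + xs[-2])

def rho_from_acceptance_set(x, lo=-1e6, hi=1e6, tol=1e-9):
    """inf{m : rho(x - m*1) <= 0}, found by bisection; translation
    invariance (Axiom D1) makes the acceptability of m monotone in m."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho([xi - mid for xi in x]) <= 0:
            hi = mid            # mid is large enough, try a smaller shift
        else:
            lo = mid
    return hi

x = [3.0, -1.0, 7.0, 2.0]
print(rho(x))                       # 5.0
print(rho_from_acceptance_set(x))   # ~5.0, matching Theorem 2(III)
```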

6  Relation between the Natural Risk Statistic, Coherent Risk Measure and Insurance Risk Measure

6.1  Relation with Coherent Risk Measures

To compare natural risk statistics with coherent risk measures in a formal manner, we first have to extend the coherent risk measure to coherent risk statistics.

Definition 2. A risk statistic ρ̂ : R^n → R is called a coherent risk statistic if it satisfies Axioms D1, D2 and the following Axiom F3:

Axiom F3. Subadditivity: ρ̂(x̃ + ỹ) ≤ ρ̂(x̃) + ρ̂(ỹ), for every x̃, ỹ ∈ R^n.

By using the result from Huber [18], one can easily see that a risk statistic is a coherent risk statistic if and only if there exists a set of weights W = {w̃ = (w_1, ..., w_n)} ⊂ R^n, with each w̃ ∈ W satisfying Σ_{i=1}^n w_i = 1 and w_i ≥ 0, ∀1 ≤ i ≤ n, such that

ρ̂(x̃) = sup_{w̃∈W} {Σ_{i=1}^n w_i x_i}, ∀x̃ ∈ R^n.  (25)
The next theorem shows the connection between natural risk statistics and coherent risk statistics.

Theorem 3. Consider a fixed scenario set W, where each w̃ ∈ W satisfies Σ_{i=1}^n w_i = 1 and w_i ≥ 0, ∀1 ≤ i ≤ n. Let ρ̂ be the natural risk statistic induced by W:

ρ̂(x̃) = sup_{w̃∈W} {Σ_{i=1}^n w_i x_(i)}, ∀x̃ ∈ R^n.  (26)

If every weight w̃ is monotonic, i.e.,

w_1 ≤ w_2 ≤ ... ≤ w_n, ∀w̃ ∈ W,  (27)

then ρ̂ satisfies subadditivity and is, therefore, a coherent risk statistic.

Remark: Comparing (25), (26), and (27), we see the main differences between the natural risk statistic and the coherent risk statistic: (1) A natural risk statistic is a supremum of L-statistics (weighted averages of order statistics), while a coherent risk statistic is a supremum of weighted sample averages. There is no simple linear transformation from L-statistics to weighted sample averages. (2) Although VaR is not a coherent risk statistic, VaR is a natural risk statistic. In other words, though simple, VaR is not without justification, as it satisfies a different set of axioms. (3) If one assigns larger weights to larger observations, resulting in (27), then a natural risk statistic becomes a coherent risk statistic. However, assigning larger weights to larger observations leads to less robust risk statistics, as the statistics become more sensitive to large observations.

Proof of Theorem 3. We only need to show that under condition (27), the risk statistic (26) satisfies subadditivity for any x̃ and ỹ ∈ R^n. Let (k_1, ..., k_n) be the permutation of (1, ..., n)

such that (x̃ + ỹ)_{k_1} ≤ (x̃ + ỹ)_{k_2} ≤ ... ≤ (x̃ + ỹ)_{k_n}. Then for i = 1, ..., n − 1, the partial sum up to i satisfies

Σ_{j=1}^i (x̃ + ỹ)_(j) = Σ_{j=1}^i (x̃ + ỹ)_{k_j} = Σ_{j=1}^i (x_{k_j} + y_{k_j}) ≥ Σ_{j=1}^i (x_(j) + y_(j)).  (28)

In addition, we have for the total sum

Σ_{j=1}^n (x̃ + ỹ)_(j) = Σ_{j=1}^n (x_j + y_j) = Σ_{j=1}^n (x_(j) + y_(j)).  (29)

Re-arranging the summation terms yields

ρ̂(x̃ + ỹ) = sup_{w̃∈W} {Σ_{i=1}^n w_i (x̃ + ỹ)_(i)} = sup_{w̃∈W} {Σ_{i=1}^{n−1} (w_i − w_{i+1}) Σ_{j=1}^i (x̃ + ỹ)_(j) + w_n Σ_{j=1}^n (x̃ + ỹ)_(j)}.

This, along with the fact that w_i − w_{i+1} ≤ 0 and equations (28) and (29), shows that

ρ̂(x̃ + ỹ) ≤ sup_{w̃∈W} {Σ_{i=1}^{n−1} (w_i − w_{i+1}) Σ_{j=1}^i (x_(j) + y_(j)) + w_n Σ_{j=1}^n (x_(j) + y_(j))}
= sup_{w̃∈W} {Σ_{i=1}^n w_i x_(i) + Σ_{i=1}^n w_i y_(i)}
≤ sup_{w̃∈W} {Σ_{i=1}^n w_i x_(i)} + sup_{w̃∈W} {Σ_{i=1}^n w_i y_(i)} = ρ̂(x̃) + ρ̂(ỹ),

from which the proof is completed. ¤
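To make the comparison in the remark concrete, here is a small numerical sketch (illustrative code, not from the paper): VaR corresponds to a single weight vector concentrated on one order statistic, which is not monotone in the sense of (27) and can violate subadditivity, while monotone weights give subadditivity, in line with Theorem 3.

```python
# Sketch of (26): a natural risk statistic as a supremum of L-statistics
# evaluated on the order statistics x_(1) <= ... <= x_(n).

def natural_risk_statistic(x, scenarios):
    xs = sorted(x)
    return max(sum(wi * xi for wi, xi in zip(w, xs)) for w in scenarios)

# VaR-type scenario on n = 3: all weight on the 2nd order statistic.
w_var = [[0.0, 1.0, 0.0]]               # not monotone in the sense of (27)
x, y = [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]
s = [a + b for a, b in zip(x, y)]
# rho(x) = rho(y) = 0 but rho(x + y) = 1: subadditivity fails for VaR.
print(natural_risk_statistic(x, w_var),
      natural_risk_statistic(y, w_var),
      natural_risk_statistic(s, w_var))  # 0.0 0.0 1.0

# Monotone weights (average of the two largest observations) satisfy (27),
# and subadditivity holds, as Theorem 3 asserts.
w_mono = [[0.0, 0.5, 0.5]]
lhs = natural_risk_statistic(s, w_mono)
rhs = natural_risk_statistic(x, w_mono) + natural_risk_statistic(y, w_mono)
print(lhs <= rhs)                        # True
```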

6.2  Relation with the Insurance Risk Measure

Similar to the coherent risk statistic, we can extend the insurance risk measure to the insurance risk statistic as follows:

Definition 3. A risk statistic ρ̂ : R^n → R is called an insurance risk statistic if it satisfies the following Axioms G1-G3:

Axiom G1. Monotonicity: ρ̂(x̃) ≤ ρ̂(ỹ), if x̃ ≤ ỹ.

Axiom G2. Comonotonic additivity: ρ̂(x̃ + ỹ) = ρ̂(x̃) + ρ̂(ỹ), if x̃ and ỹ are comonotonic.

Axiom G3. Continuity:

lim_{d→0} ρ̂((x̃ − d)⁺) = ρ̂(x̃⁺),  lim_{d→∞} ρ̂(min(x̃, d)) = ρ̂(x̃),  lim_{d→−∞} ρ̂(max(x̃, d)) = ρ̂(x̃),

where (x̃ − d)⁺ = max(x̃ − d, 0).

Given an observation x̃ = (x_1, ..., x_n) from the loss variable X, the empirical distribution of X is given by F̂(x) = (1/n) Σ_{i=1}^n 1_{x_i ≤ x}. Replacing the tail probability P(X > t) by the empirical tail probability 1 − F̂(t) in the distorted probability representation of the insurance risk measure (2), we obtain the representation of the insurance risk statistic

ρ̂(x̃) = ∫_{−∞}^0 (g(1 − F̂(t)) − 1) dt + ∫_0^∞ g(1 − F̂(t)) dt = Σ_{i=1}^n w_i x_(i),  (30)

where w_i = g((n − i + 1)/n) − g((n − i)/n), i = 1, ..., n. Since g is nondecreasing, g(0) = 0 and g(1) = 1, it follows that w_i ≥ 0, i = 1, ..., n, and Σ_{i=1}^n w_i = 1.
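The weights in (30) are easy to compute for a given distortion function. In the sketch below (illustrative code, not from the paper), the square-root distortion is only one convenient choice of a nondecreasing g with g(0) = 0 and g(1) = 1:

```python
import math

# Weights of (30): w_i = g((n-i+1)/n) - g((n-i)/n); the sum telescopes
# to g(1) - g(0) = 1, and monotonicity of g makes each w_i >= 0.
def distortion_weights(n, g):
    return [g((n - i + 1) / n) - g((n - i) / n) for i in range(1, n + 1)]

def insurance_risk_statistic(x, g):
    """The single L-statistic of (30) for the empirical distribution of x."""
    w = distortion_weights(len(x), g)
    return sum(wi * xi for wi, xi in zip(w, sorted(x)))

x = [1.0, 4.0, 2.0, 3.0]
print(sum(distortion_weights(4, math.sqrt)))        # 1.0 (telescoping)
print(insurance_risk_statistic(x, math.sqrt))       # concave g overweights large losses
print(insurance_risk_statistic(x, lambda u: u))     # g(u) = u gives the sample mean, 2.5
```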

22

Comparing (4) and (30), we see that the natural risk statistic is a supremum of L-statistics, while the insurance risk statistic is just one L-statistic. Therefore, the insurance risk statistic cannot incorporate different scenarios. On the other hand, each weight w̃ = (w_1, ..., w_n) in a natural risk statistic can be considered as a "scenario" in which a (subjective or objective) evaluation of the importance of each ordered observation is specified. Hence, the natural risk statistic reflects the idea of evaluating the risk under different scenarios, similar to the coherent risk measure. The following counterexample shows that if one incorporates different scenarios, then comonotonic additivity may fail, as strict comonotonic subadditivity may prevail.

A Counterexample: Consider the natural risk statistic defined by

ρ̂(x̃) = max(0.5x_(1) + 0.5x_(2), 0.72x_(1) + 0.08x_(2) + 0.2x_(3)), ∀x̃ ∈ R³.

Let z̃ = (3, 2, 4) and ỹ = (9, 4, 16). By simple calculation we have

ρ̂(z̃ + ỹ) = 9.28 < ρ̂(z̃) + ρ̂(ỹ) = 2.5 + 6.8 = 9.3,

even though z̃ and ỹ are comonotonic. Therefore, comonotonic additivity fails, and this natural risk statistic is not an insurance risk statistic. In summary, an insurance risk statistic cannot incorporate even these two simple scenarios.
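The arithmetic of the counterexample is easy to verify directly. The short check below is illustrative code, with the first scenario read as 0.5x_(1) + 0.5x_(2), which reproduces the stated values 2.5, 6.8, and 9.28:

```python
def rho(x):
    """The two-scenario natural risk statistic of the counterexample."""
    xs = sorted(x)
    return max(0.5 * xs[0] + 0.5 * xs[1],
               0.72 * xs[0] + 0.08 * xs[1] + 0.2 * xs[2])

z, y = [3.0, 2.0, 4.0], [9.0, 4.0, 16.0]
s = [a + b for a, b in zip(z, y)]
# rho(z + y) = 9.28 < rho(z) + rho(y) = 2.5 + 6.8: strict subadditivity
# for comonotonic vectors, so comonotonic additivity fails.
print(rho(z), rho(y), rho(s))
```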

7  Tail Conditional Median: a Robust Natural Risk Statistic

In this section, we propose a special case of the natural risk statistic, which we call the tail conditional median, and compare it with an existing coherent risk measure, the tail conditional expectation. Theoretical and numerical results are provided to illustrate the robustness of the proposed tail conditional median.

7.1  The Differences between Tail Conditional Expectation and Tail Conditional Median

A special case of coherent risk measures, the tail conditional expectation (TCE for short), has gained popularity since it was proposed by Artzner et al. [3]. TCE satisfies subadditivity for continuous random variables, and also for discrete random variables if one defines quantiles for discrete random variables properly; see Acerbi and Tasche [2]. The TCE is also called expected shortfall by Acerbi et al. [1] and conditional value-at-risk by Rockafellar and Uryasev [23] and Pflug [20]. More precisely, the TCE at level α is defined by

TCE_α(X) = mean of the α-tail distribution of X,  (31)

where the distribution in question is the one with distribution function G_α(x) defined by

G_α(x) = 0 for x < VaR_α(X), and G_α(x) = (P(X ≤ x) − α)/(1 − α) for x ≥ VaR_α(X).

If the distribution of X is continuous, then

TCE_α(X) = E[X | X ≥ VaR_α(X)].  (32)

Essentially, TCE_α(X) is the regularized version of the tail conditional expectation E[X | X ≥ VaR_α(X)].

As we will see, TCE is not robust: it is sensitive to model assumptions and outliers. Here we propose an alternative, the tail conditional median (TCM), as a way of measuring risk that ameliorates the problem of robustness. The TCM at level α is defined as

TCM_α(X) = median[X | X ≥ VaR_α(X)].  (33)

Essentially, TCM_α(X) is the conditional median of X given that X ≥ VaR_α(X). If X is continuous, then TCM_α(X) = VaR_{(1+α)/2}(X). However, for discrete random variables or data, one simply uses the definition (33), and there is a difference between TCM_α(X) and VaR_{(1+α)/2}(X), depending on the way quantiles are defined for discrete random variables and discrete data.

There are several differences between the TCE and TCM. First, there are theoretical differences. For example, the TCM will not in general satisfy subadditivity, although TCE generally does; and the TCM is a natural risk statistic while the TCE is a coherent risk statistic. Secondly, there may be significant numerical differences between TCE and TCM. To illustrate this, we shall use three data sets. The first is a data set of auto insurance claims with sample size 736. In Table 1, we calculate the risk measures TCE_α and TCM_α with α ranging from 95% to 99%. The absolute and relative differences of the two measures are reported in the last two columns of the table, which shows that the relative differences can be over 60% when α < 99% and about 30% when α = 99%. The second is a data set of fire (incendie) insurance claims with sample size 736. Table 2 shows TCE_α and TCM_α for this data set. The relative differences between TCE_α and TCM_α are also very significant for this data set.
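For data, both measures reduce to simple sample statistics. The sketch below uses one simple convention for the sample quantile (the paper notes that discrete data admit several such conventions, so the exact cut-off is an assumption of this illustration); appending one extreme outlier moves the empirical TCE dramatically while leaving the empirical TCM unchanged, previewing the robustness comparison of Section 7.2:

```python
import math, statistics

def empirical_var(x, alpha):
    """Sample alpha-quantile: the ceil(alpha*n)-th order statistic
    (one common convention among several)."""
    xs = sorted(x)
    k = math.ceil(alpha * len(xs))
    return xs[min(k, len(xs)) - 1]

def tce(x, alpha):
    """Empirical tail conditional expectation, as in (31)."""
    v = empirical_var(x, alpha)
    tail = [xi for xi in x if xi >= v]
    return sum(tail) / len(tail)

def tcm(x, alpha):
    """Empirical tail conditional median, as in (33)."""
    v = empirical_var(x, alpha)
    return statistics.median([xi for xi in x if xi >= v])

losses = list(range(1, 101))        # losses 1, 2, ..., 100
print(tce(losses, 0.95), tcm(losses, 0.95))   # 97.5 97.5

losses[-1] = 10_000                 # one extreme outlier replaces the maximum
print(tce(losses, 0.95))            # jumps to 1747.5
print(tcm(losses, 0.95))            # stays at 97.5
```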

Table 1: TCM and TCE for auto insurance claims

α       TCE_α           TCM_α           TCE_α − TCM_α   (TCE_α − TCM_α)/TCE_α
99.0%   6390627.0523    4489416.3847    1901210.6676    29.75%
98.5%   4454513.7015    1682970.0123    2771543.6892    62.22%
98.0%   3681944.0471    1384060.8997    2297883.1474    62.41%
97.5%   3014237.8755    1039186.8726    1975051.0028    65.52%
97.0%   2579508.4877    962778.2851     1616730.2026    62.68%
96.5%   2333814.6040    851033.8563     1482780.7477    63.53%
96.0%   2073066.4541    705136.3357     1367930.1185    65.99%
95.5%   1865231.5196    676514.4433     1188717.0763    63.73%
95.0%   1736077.5343    662045.2762     1074032.2581    61.87%

Table 2: TCM and TCE for fire (incendie) insurance claims

α       TCE_α            TCM_α            TCE_α − TCM_α    (TCE_α − TCM_α)/TCE_α
99.0%   326941430.7051   221019625.0743   105921805.6308   32.40%
98.5%   217524078.9696   68744819.4502    148779259.5194   68.40%
98.0%   174769416.7362   41636726.9290    133132689.8072   76.18%
97.5%   138375399.0228   24408413.1754    113966985.8474   82.36%
97.0%   114576698.9187   23633013.0148    90943685.9039    79.37%
96.5%   101683055.9714   18352824.0186    83330231.9528    81.95%
96.0%   88411141.0208    13845230.0549    74565910.9660    84.34%
95.5%   78274066.9951    10493594.7803    67780472.2148    86.59%
95.0%   72110217.3841    8572087.3993     63538129.9848    88.11%

The third data set is the S&P 500 daily index from January 03, 1980 to December 21, 2005. More precisely, let x̃ = (x_1, x_2, ..., x_n) be the negative of the net daily returns calculated from the data, where n = 6556. Then we apply TCE and TCM to x̃ to measure the risk. Table 3 shows the results for α ranging from 95.0% to 99.9%. The relative differences between TCE and TCM are in the range of 12.72% to 25.70%.

7.2  Robustness Comparison between the Tail Conditional Expectation and Tail Conditional Median

Next we show that the tail conditional median is more robust than the tail conditional expectation. The left panel of Figure 1 shows the value of TCE_α with respect to log(1 − α) for the Laplace distribution and t-distributions, where α is in the range [0.95, 0.999]. As demonstrated in the graph, if the model assumes the loss distribution to be Laplace while the underlying true loss distribution is a t-distribution, the calculated TCE value can be far from the true value.

α       TCE_α    TCM_α    TCE_α − TCM_α   (TCE_α − TCM_α)/TCE_α
99.9%   0.0922   0.0685   0.0237          25.70%
99.5%   0.0487   0.0389   0.0098          20.21%
99.0%   0.0383   0.0306   0.0078          20.24%
98.5%   0.0337   0.0280   0.0057          16.97%
98.0%   0.0308   0.0259   0.0050          16.15%
97.5%   0.0288   0.0245   0.0043          14.94%
97.0%   0.0272   0.0233   0.0038          14.13%
96.5%   0.0259   0.0224   0.0035          13.54%
96.0%   0.0248   0.0217   0.0032          12.72%
95.5%   0.0239   0.0207   0.0032          13.21%
95.0%   0.0231   0.0196   0.0035          15.05%

Table 3: The TCE and TCM for S&P 500 index daily losses (negative returns) from Jan 03, 1980 to Dec 21, 2005, with α ranging from 95.0% to 99.9%. The table shows a significant difference between the TCE and TCM.


Figure 1: TCE and TCM for the Laplace distribution and t-distributions with degrees of freedom 3, 5, and 12. All distributions are normalized to mean 0 and variance 1. The TCM is less sensitive to changes in the distribution, as the right panel has a narrower range on the y-axis.

The right panel of Figure 1 shows the value of TCM_α with respect to log(1 − α) for the Laplace distribution and t-distributions. As seen from the figure, TCM_α is more robust than TCE_α in the sense that it is less sensitive to the tail type of the underlying distribution.

7.3  Influence Functions of Tail Conditional Expectation and Tail Conditional Median

The influence function, introduced by Hampel [15, 16], is a useful tool in assessing the robustness of an estimator. Let F be a distribution function. The quantity of interest determined by F is represented by an estimator T(F), where T is a functional mapping from a family of distribution functions to R. For x ∈ R, let δ_x be the point mass 1 at x. The influence function of the estimator T(F) at x is defined by

IF(x, T, F) = lim_{ε↓0} [T((1 − ε)F + εδ_x) − T(F)] / ε.

The influence function yields information about the rate of change of the estimator T(F) with respect to a contamination point x of the distribution F. An estimator T is called bias robust at F if its influence function is bounded, i.e.,

γ* = sup_x IF(x, T, F) < ∞.

If the influence function of an estimator T(F) is unbounded, an outlier in the data may cause problems.

Lemma 3. Suppose the loss distribution has a density f_X(·) which is continuous and positive at VaR_{(1+α)/2}(X). Then the influence function of TCM_α is given by

IF(x, TCM_α, X) = ((α − 1)/2) / f_X(VaR_{(1+α)/2}(X)),  if x < VaR_{(1+α)/2}(X);
IF(x, TCM_α, X) = 0,  if x = VaR_{(1+α)/2}(X);
IF(x, TCM_α, X) = ((1 + α)/2) / f_X(VaR_{(1+α)/2}(X)),  if x > VaR_{(1+α)/2}(X).

Suppose the loss distribution has a density f_X(·) which is continuous and positive at VaR_α(X). Then the influence function of TCE_α is given by

IF(x, TCE_α, X) = VaR_α(X) − E[X | X ≥ VaR_α(X)],  if x ≤ VaR_α(X);
IF(x, TCE_α, X) = x/(1 − α) − E[X | X ≥ VaR_α(X)] − (α/(1 − α))VaR_α(X),  if x > VaR_α(X).  (34)

Proof. The result for TCM is from equation (3.2.3) of [30]. To show (34), note that by equation

(3.2.4) in [30], the influence function of the (1 − α)-trimmed mean T_{1−α}(X) := E[X | X < VaR_α(X)] is

IF(x, T_{1−α}, X) = (x − (1 − α)VaR_α(X))/α − E[X | X < VaR_α(X)],  if x ≤ VaR_α(X);
IF(x, T_{1−α}, X) = VaR_α(X) − E[X | X < VaR_α(X)],  if x > VaR_α(X).  (35)

By simple calculation, the influence function of E[X] is

IF(x, E[X], X) = x − E[X].  (36)

Since E[X] = αT_{1−α}(X) + (1 − α)TCE_α, it follows that

IF(x, E[X], X) = α·IF(x, T_{1−α}, X) + (1 − α)·IF(x, TCE_α, X).  (37)

Then (34) follows from equations (35), (36) and (37). ¤

From Lemma 3, we see that

sup_x IF(x, TCM_α, X) < ∞,  sup_x IF(x, TCE_α, X) = ∞.

Hence, TCE has an unbounded influence function while TCM has a bounded influence function, which implies that TCM is more robust against the influence of outliers in the data.
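The boundedness claim can be made concrete in closed form. The sketch below is illustrative code, not from the paper: it evaluates the influence functions of Lemma 3 for a unit exponential loss, using VaR_q(X) = −ln(1 − q), f_X(x) = e^{−x}, and (by memorylessness) E[X | X ≥ VaR_α(X)] = VaR_α(X) + 1.

```python
import math

def if_tcm(x, alpha):
    """Influence function of TCM_alpha (Lemma 3) for X ~ Exp(1)."""
    q = (1 + alpha) / 2
    v = -math.log(1 - q)            # VaR_{(1+alpha)/2} for Exp(1)
    f = math.exp(-v)                # density at that quantile
    if x < v:
        return (alpha - 1) / 2 / f
    if x == v:
        return 0.0
    return (1 + alpha) / 2 / f      # constant in x: bounded

def if_tce(x, alpha):
    """Influence function of TCE_alpha (equation (34)) for X ~ Exp(1)."""
    v = -math.log(1 - alpha)        # VaR_alpha for Exp(1)
    tail_mean = v + 1.0             # E[X | X >= v] by memorylessness
    if x <= v:
        return v - tail_mean        # equals -1
    return x / (1 - alpha) - tail_mean - alpha * v / (1 - alpha)

alpha = 0.95
print(if_tcm(100.0, alpha) == if_tcm(1000.0, alpha))  # True: flat beyond the quantile
print(if_tce(1000.0, alpha) > if_tce(100.0, alpha))   # True: grows linearly in x
```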

8  Conclusion

We propose a new, data-based risk measure, called the natural risk statistic, that is characterized by a new set of axioms. The new axioms only require subadditivity for comonotonic random variables, thus relaxing the subadditivity for all random variables in the coherent risk measure and the comonotonic additivity in the insurance risk measure. The relaxation is consistent with prospect theory in psychology. Compared with the two previous risk measures, the natural risk statistic includes the tail conditional median, which is more robust than the tail conditional expectation suggested by the coherent risk measure; and, unlike the insurance risk measure, the natural risk statistic can also incorporate scenario analysis. The natural risk statistic includes VaR as a special case and therefore shows that VaR, though simple, is not irrational.

Several open problems remain. First, the natural risk statistic proposed here is only a static risk measure; it would be of great interest to extend it to dynamic risk measures. Furthermore, just as subadditivity in the coherent risk measure has been extended to convex risk measures, we conjecture that comonotonic subadditivity can be extended to comonotonic convexity.

References

[1] Acerbi, C., C. Nordio, and C. Sirtori (2001). Expected shortfall as a tool for financial risk management. Working paper, available at http://www.gloriamundi.org.

[2] Acerbi, C. and D. Tasche (2002). On the coherence of expected shortfall. Journal of Banking and Finance, 26, 1487-1503.

[3] Artzner, P., F. Delbaen, J.-M. Eber, and D. Heath (1999). Coherent measures of risk. Mathematical Finance, 9, 203-228.

[4] Barndorff-Nielsen, O. E. and N. Shephard (2001). Non-Gaussian Ornstein-Uhlenbeck based models and some of their uses in financial economics (with discussion). J. Roy. Statist. Soc. Ser. B, 63, 167-241.

[5] Basel Committee on Banking Supervision (1988). International Convergence of Capital Measurement and Capital Standards.

[6] Basel Committee on Banking Supervision (1996). Amendment to the Capital Accord to Incorporate Market Risks.

[7] Basel Committee on Banking Supervision (2004). International Convergence of Capital Measurement and Capital Standards: A Revised Framework.

[8] Chicago Mercantile Exchange (1995). Standard Portfolio Analysis of Risk.

[9] Daníelsson, J., B. N. Jorgensen, G. Samorodnitsky, M. Sarma, and C. G. de Vries (2005). Subadditivity re-examined: the case for Value-at-Risk. Working paper, available at http://ideas.repec.org/p/fmg/fmgdps/dp549.html.

[10] Denneberg, D. (1994). Non-Additive Measure and Integral, Kluwer Academic Publishers, Boston.

[11] Dhaene, J., R. J. A. Laeven, S. Vanduffel, G. Darkiewicz, and M. J. Goovaerts (2005). Can a coherent risk measure be too subadditive? Working paper, available at http://www.econ.kuleuven.be/insurance/pdfs/subadditive.pdf.

[12] Dhaene, J., M. J. Goovaerts, and R. Kaas (2003). Economic capital allocation derived from risk measures. North American Actuarial Journal, 7, 44-56.

[13] Eberlein, E. and U. Keller (1995). Hyperbolic distributions in finance. Bernoulli, 1, 281-299.

[14] Embrechts, P., A. McNeil, and D. Straumann (2001). Correlation and dependency in risk management: properties and pitfalls. In Risk Management: Value at Risk and Beyond, edited by M. A. H. Dempster, Cambridge University Press, U.K.

[15] Hampel, F. R. (1968). Contributions to the theory of robust estimation. Ph.D. thesis, University of California, Berkeley.

[16] Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 62, 1179-1186.

[17] Heyde, C. C. and S. G. Kou (2004). On the controversy over tailweight of distributions. Operations Research Letters, 32, 399-408.

[18] Huber, P. J. (1981). Robust Statistics, Wiley, New York.

[19] Kahneman, D. and A. Tversky (1979). Prospect theory: an analysis of decision under risk. Econometrica, 47, 263-292.

[20] Pflug, G. (2000). Some remarks on the Value-at-Risk and the conditional Value-at-Risk. In Probabilistic Constrained Optimization: Methodology and Applications, edited by S. Uryasev, Kluwer.

[21] Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behaviour and Organization, 3, 323-343.

[22] Rachev, S. and S. Mittnik (2000). Stable Paretian Models in Finance, Wiley, New York.

[23] Rockafellar, R. T. and S. Uryasev (2002). Conditional Value-at-Risk for general loss distributions. Journal of Banking and Finance, 26, 1443-1471.

[24] Wang, S. S., V. R. Young, and H. H. Panjer (1997). Axiomatic characterization of insurance prices. Insurance: Mathematics and Economics, 21, 173-183.

[25] Schmeidler, D. (1986). Integral representation without additivity. Proceedings of the American Mathematical Society, 97, 255-261.

[26] Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica, 57, 571-587.

[27] Tversky, A. and D. Kahneman (1992). Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297-323.

[28] Wakker, P. and A. Tversky (1993). An axiomatization of cumulative prospect theory. Journal of Risk and Uncertainty, 7, 147-176.

[29] Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55, 95-115.

[30] Staudte, R. G. and S. J. Sheather (1990). Robust Estimation and Testing, Wiley, New York.