
Recommendations on the use of Bayesian optimal designs for choice experiments Roselinde Kessels, Bradley Jones, Peter Goos and Martina Vandebroek

DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI)

KBI 0617 Electronic copy of this paper is available at: http://ssrn.com/abstract=968586

Recommendations on the use of Bayesian optimal designs for choice experiments Roselinde Kessels∗ Bradley Jones† Peter Goos‡ Martina Vandebroek§

Abstract

In this paper, we argue that some of the prior parameter distributions used in the literature for the construction of Bayesian optimal designs are internally inconsistent. We rectify this error and provide practical advice on how to properly specify the prior parameter distribution. Also, we present two pertinent examples to illustrate that Bayesian optimal designs generally outperform utility-neutral optimal designs that are based on linear design principles.

Keywords: choice experiments, Bayesian optimal designs, prior parameter distribution, utility-neutral optimal designs

1 Introduction

Choice experiments have become an increasingly popular method to understand consumers’ preference structures for the attributes of a product or service. In such experiments, respondents make a sequence of choices. In each case they indicate their preferred product or service among a choice set of alternatives or profiles. A profile is characterized by a combination of attribute levels. The design of a choice experiment comprises a select number of choice sets administered to each respondent. The aim of a choice experiment is to estimate the importance of each attribute and its levels based on the respondents’ preferences. The estimates are then used to mimic real marketplace choices by making predictions about consumers’ future purchases.

∗ Department of Decision Sciences and Information Management, Faculty of Economics and Applied Economics, Katholieke Universiteit Leuven, Naamsestraat 69, 3000 Leuven, Belgium.
† SAS Institute Inc., SAS Campus Drive, Cary, NC 27513, USA.
‡ Department of Mathematics, Statistics and Actuarial Sciences, Faculty of Applied Economics, Universiteit Antwerpen, Prinsstraat 13, 2000 Antwerpen, Belgium.
§ Department of Decision Sciences and Information Management & University Center for Statistics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium.


The question of how to design efficient choice experiments has received a great deal of attention recently. Designing an efficient choice experiment involves selecting those choice sets that result in a precisely estimated model providing accurate predictions. At present, two design approaches are prevalent: the Bayesian design approach and the linear design approach. We review these current practices for setting up choice experiments.

Bayesian choice designs have so far been constructed for the multinomial logit model (McFadden 1974). This discrete choice model predicts, for profile $j$, $j = 1, \ldots, J$, in choice set $s$, $s = 1, \ldots, S$, the probability that people prefer it: $p_{js} = e^{x_{js}'\beta} / \sum_{t=1}^{J} e^{x_{ts}'\beta}$. Here, $x_{js}$ is a $k \times 1$ vector of the attribute levels of profile $j$ in choice set $s$ and $\beta$ is a $k \times 1$ vector of parameter values. The multinomial logit probability is derived from people’s latent utility for profile $j$ in choice set $s$: $u_{js} = x_{js}'\beta + \varepsilon_{js}$, where $\varepsilon_{js}$ is an i.i.d. extreme value error term. Since the multinomial logit model is nonlinear in the parameters, like all other choice models, the quality of a given design depends on the unknown parameter vector. The Bayesian design approach deals with this problem by assuming a prior distribution of likely parameters. It thereby takes into account the uncertainty about the proposed parameters.

To date, most of the Bayesian research focus has been on designs for main-effects models. Sándor and Wedel (2001) were the first to introduce the Bayesian design procedure in the choice design literature. They generated Bayesian designs using the D-optimality criterion for the multinomial logit model. This design criterion seeks to minimize the determinant of the variance-covariance matrix of the parameter estimators. In the Bayesian framework, it is referred to as the $D_B$-optimality criterion.
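For reference, the quantities described above can be written out as follows. This is a sketch in the notation of this section. The symbols $X$ (the full design), $X_s$ (the $J \times k$ matrix with rows $x_{js}'$), $p_s$, $P_s$, $M$ and $N$ (the number of respondents) are introduced here for illustration, and the expressions are the standard ones for the multinomial logit model rather than formulas quoted from this paper.

```latex
\begin{align*}
p_{js}(\beta) &= \frac{e^{x_{js}'\beta}}{\sum_{t=1}^{J} e^{x_{ts}'\beta}}, \\
M(X,\beta) &= N \sum_{s=1}^{S} X_s' \left( P_s - p_s p_s' \right) X_s,
  \qquad p_s = [p_{1s}, \ldots, p_{Js}]', \quad P_s = \mathrm{diag}(p_s), \\
D_P(X \mid \beta) &= \left\{ \det M^{-1}(X,\beta) \right\}^{1/k}, \\
D_B(X) &= \int \left\{ \det M^{-1}(X,\beta) \right\}^{1/k} \pi(\beta)\, d\beta .
\end{align*}
```

The local criterion $D_P$ evaluates a design at a single parameter vector, whereas $D_B$ averages the same quantity over the prior distribution $\pi(\beta)$.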
Sándor and Wedel (2001) showed that the $D_B$-optimal designs generally outperform the locally $D_P$-optimal designs, which are based on a point estimate for the unknown parameter vector (Huber and Zwerina 1996). Sándor and Wedel (2005) continued the Bayesian approach to construct so-called heterogeneous $D_B$-optimal designs that include several different designs, each offered to different respondents. Kessels et al. (2006a) expanded the work on Bayesian choice designs by also considering design criteria other than the commonly used $D_B$-optimality criterion. They compared the $D_B$- and $A_B$-optimality criteria with the $G_B$- and $V_B$-optimality criteria for the multinomial logit model. The $D_B$- and $A_B$-optimality criteria concentrate on producing precise estimates, whereas the $G_B$- and $V_B$-optimality criteria focus on providing precise predictions, which is key in choice experiments. Using a simulation study, Kessels et al. (2006a) demonstrated that the $D_B$- and $A_B$-optimal designs actually produce more precise estimates and that the $G_B$- and $V_B$-optimal designs produce better predictions. Also, they showed that the $D_B$-optimal designs perform reasonably well in terms of prediction. To quickly generate the Bayesian designs, Kessels et al. (2006b) developed an adaptive algorithm. The high speed of this algorithm stems from the use of a small designed sample of prior parameters to approximate the prior distribution, Meyer and Nachtsheim’s (1995) coordinate-exchange algorithm, and an update approach to economically calculate the criterion values of designs that differ in only one profile from another design. Kessels et al. (2006b) recommended using $V_B$-optimal designs primarily because they are faster to compute. Also, Kessels et al. (2006b) preferred minimizing the average prediction variance to minimizing the maximum prediction variance over the design region, as the $V_B$- and $G_B$-optimal designs do, respectively.

Currently, however, linear design principles are still used to construct designs for choice experiments. Such designs are based on an implicit assumption that the respondents are indifferent to all attribute levels, and thus to all alternatives. Moreover, there is no uncertainty associated with the indifference. This is equivalent to adopting a zero prior parameter vector with zero prior variance for the multinomial logit model. The designs are therefore referred to as utility-neutral designs and they are utility balanced by assumption (Huber and Zwerina 1996). Utility-neutral designs for main effects as well as main effects plus interactions have been discussed at length. To generate them, Kuhfeld and Tobias (2005) proposed a D-efficient factorial design algorithm implemented in the SAS %MktEx macro. This algorithm combines Cook and Nachtsheim’s (1980) modification of Fedorov’s (1972) exchange algorithm, the coordinate-exchange algorithm with simulated annealing, and a very large catalog of orthogonal arrays. Street et al. (2001) and Street and Burgess (2004) followed a more theoretical approach, providing generators to construct utility-neutral paired comparison designs for two-level attributes. In paired comparison designs, profiles are arranged in choice sets of size two. The authors used the nonlinear Bradley-Terry model, that is, the logit model for paired evaluations, for which they assumed zero prior parameter values. In most of this work, the focus was on the D-optimality criterion, but Street et al. (2001) also computed A-optimal pairs, which minimize the sum or the average of the variances of the parameter estimators.
Furthermore, Burgess and Street (2003) derived D-optimal utility-neutral designs for two-level attributes of any choice set size. Even more flexible designs allowing for attributes with any number of levels are elaborated in Burgess and Street (2005). Finally, Street et al. (2005) showed that the theoretical strategies proposed in their aforementioned papers produce utility-neutral designs that are better than those based on common strategies.

The assumption of complete indifference among all alternatives underlying the utility-neutral designs is surely unrealistic. The Bayesian design approach is more practical since it incorporates all available prior information in the designs. Moreover, Bayesian optimal designs are on average more efficient than utility-neutral optimal designs. In this paper we show that this is indeed true using two design examples. Before doing so, we first rectify a misunderstanding with respect to the specification of the prior parameter distribution. In a number of Bayesian design examples studied by Sándor and Wedel (2001), Kessels et al. (2006a) and Kessels et al. (2006b), where prior information on the parameter vector $\beta$ from previous experiments is lacking, the specification of the prior distribution is impractical. In these examples, Bayesian optimal designs are constructed for a multivariate normal distribution $\pi(\beta) = N(\beta \mid \beta_0, \Sigma_0)$, where the elements in the prior mean $\beta_0$ are equally spaced between −1 and 1 for each attribute and the prior variance-covariance matrix $\Sigma_0$ is the identity matrix. In the next section, we show that these specifications of $\beta_0$ and $\Sigma_0$ conflict. Also, we provide some general recommendations on how to properly specify the prior parameter distribution $\pi(\beta) = N(\beta \mid \beta_0, \Sigma_0)$ for any design case.

2 Guidance on correctly specifying the prior parameter distribution

We illustrate for the $3^2 \times 2/24$ design example, initiated in Kessels et al. (2006a) and extended in Kessels et al. (2006b), that the prior parameter distribution used to construct the Bayesian designs is unrealistic. In this example, the profiles are composed of two attributes at three levels and one attribute at two levels. The total number of design profiles is 24 and they have been arranged in choice sets of size two, three and four. With effects-type coding, the number of elements, $k$, in the parameter vector $\beta$ is five. The prior parameter distribution used was the multivariate normal distribution $\pi(\beta) = N(\beta \mid \beta_0, \Sigma_0)$. The parameter values of the prior mean $\beta_0$ were evenly spaced between −1 and 1 for each attribute so that $\beta_0 = [-1, 0, -1, 0, -1]'$, and the prior variance-covariance matrix $\Sigma_0$ was the identity matrix $I_5$. We explain why these specifications of $\beta_0$ and $\Sigma_0$ are contradictory.

First, however, a note should be made about the effects-type coding we adopt in this paper. For a two-level attribute, Sándor and Wedel (2001), Kessels et al. (2006a) and Kessels et al. (2006b) coded the first level as −1 and the second level as 1 while specifying a prior mean parameter value of −1. In this way, a utility of 1 is attached to the first level and a utility of −1 to the second level, so that the utilities decrease with the attribute levels. On the other hand, for attributes with more than two levels, the authors coded the levels such that the utilities increase with the levels given prior mean values that are equally spaced between −1 and 1. For example, for an attribute with three levels, the first level is coded as [1 0], the second level as [0 1] and the third level as [−1 −1]. Given prior mean values of $[-1, 0]'$, the utilities associated with the three levels are −1, 0 and 1, respectively.
In order to have the utilities increase with the levels for all attributes, we change the coding for a two-level attribute to 1 for the first level and to −1 for the second level. Consider now the two-alternative choice set in Table 1 for the $3^2 \times 2/24$ design example. This choice set is special in the sense that Alternative I consists of the worst possible levels for each attribute given the prior mean $\beta_0 = [-1, 0, -1, 0, -1]'$ and Alternative II consists of the best possible attribute levels. As a result, Alternative II dominates the choice set. This can also be seen from the logit probabilities given $\beta_0$. The probability that Alternative I is chosen is 0.00247 and the probability that Alternative II is chosen is 0.99753. These are the most extreme probabilities attainable: there is no other two-alternative choice set for the $3^2 \times 2/24$ example with more extreme logit probabilities given $\beta_0$. So, these probabilities imply very strong prior information. In other words, the prior mean is very informative about the overall attractiveness of the two alternatives in the choice set of Table 1.

Table 1: A two-alternative choice set for the $3^2 \times 2/24$ design example.

Alt   Attr 1   Attr 2   Attr 3
I     1        1        1
II    3        3        2
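The two logit probabilities quoted for the choice set of Table 1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the effects-type coding described above (last level coded as all −1, first level of the two-level attribute coded as 1); the function names `effects_code`, `code_profile` and `logit_probabilities` are illustrative and not part of the original study.

```python
import math

def effects_code(level, n_levels):
    """Effects-type coding: level i < n_levels gets a 1 in position i,
    the last level is coded as all -1."""
    if level == n_levels:
        return [-1.0] * (n_levels - 1)
    return [1.0 if i == level else 0.0 for i in range(1, n_levels)]

def code_profile(levels, n_levels_per_attr):
    """Concatenate the effects-coded columns of all attributes of a profile."""
    x = []
    for level, n in zip(levels, n_levels_per_attr):
        x.extend(effects_code(level, n))
    return x

def logit_probabilities(profiles, beta):
    """Multinomial logit probabilities for the coded profiles of one choice set."""
    utilities = [sum(xi * bi for xi, bi in zip(x, beta)) for x in profiles]
    u_max = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

N_LEVELS = (3, 3, 2)                      # two three-level attributes, one two-level
BETA_0 = [-1.0, 0.0, -1.0, 0.0, -1.0]     # prior mean from the text

alt_I = code_profile((1, 1, 1), N_LEVELS)    # worst level of every attribute
alt_II = code_profile((3, 3, 2), N_LEVELS)   # best level of every attribute
p_I, p_II = logit_probabilities([alt_I, alt_II], BETA_0)
print(f"P(Alt I) = {p_I:.5f}, P(Alt II) = {p_II:.5f}")
```

The utilities of the two alternatives are −3 and 3, so the probability of Alternative I equals $1/(1 + e^{6})$.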

Now, given the prior variance $\Sigma_0 = I_5$, the parameters $\beta_1 = [0, 0, 0, 0, 0]'$ and $\beta_2 = [-2, 0, -2, 0, -2]'$ are equally likely under the prior with mean $\beta_0$, and neither is improbable when drawn in a Monte Carlo sample. Using $\beta_1$, the probabilities of choosing Alternatives I and II are 0.5 each. Using $\beta_2$, the probabilities of choosing Alternatives I and II are 0.00001 and 0.99999, respectively. The differences in probabilities from using $\beta_1$ and $\beta_2$ seem to imply that not much prior information is assumed when $\Sigma_0 = I_5$ is used as the variance-covariance matrix of the prior distribution. To illustrate a more extreme case where prior information is completely lacking, we consider the parameters $\beta_3 = [1, 0, 1, 0, 1]'$ and $\beta_4 = [-3, 0, -3, 0, -3]'$. Similar to $\beta_1$ and $\beta_2$, these parameters are equally likely when $\beta_0$ and $I_5$ are used as prior mean and variance, and neither is improbable in a Monte Carlo sample. However, since they are further away from the prior mean, they are less plausible than $\beta_1$ and $\beta_2$. Using $\beta_3$, the probabilities of choosing Alternatives I and II are 0.99753 and 0.00247, respectively. Using $\beta_4$, the probabilities are reversed, essentially equaling 0 and 1.

Based on the above observations, the prior mean indicates that one has a substantial amount of prior knowledge about people’s preferences for the alternatives in the choice set of Table 1. In fact, one has so much prior information that the choice set should not be included in the design. This is indeed the case when examining the optimal designs with choice sets of size two generated by Kessels et al. (2006b). On the other hand, the prior variance implies that one has very little prior knowledge because the range of expected probabilities for the two alternatives in the choice set essentially goes from zero to one. Hence, the prior parameter distribution $\pi(\beta) = N(\beta \mid \beta_0, \Sigma_0)$, with $\beta_0 = [-1, 0, -1, 0, -1]'$ and $\Sigma_0 = I_5$, is internally inconsistent.
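The comparison of the four parameter vectors can be checked numerically. The sketch below assumes the effects-coded alternatives of Table 1 and uses the fact that, with an identity prior variance, two vectors have equal prior density exactly when they have the same squared distance to the prior mean. The helper names are illustrative.

```python
import math

X_I = [1, 0, 1, 0, 1]          # Alternative I of Table 1, effects-coded
X_II = [-1, -1, -1, -1, -1]    # Alternative II of Table 1, effects-coded
BETA_0 = [-1, 0, -1, 0, -1]    # prior mean

def prob_alt_I(beta):
    """Logit probability of choosing Alternative I over Alternative II."""
    diff = sum((b - a) * c for a, b, c in zip(X_I, X_II, beta))  # u_II - u_I
    return 1.0 / (1.0 + math.exp(diff))

def sq_distance_to_prior_mean(beta):
    """Squared Euclidean distance to the prior mean; with an identity prior
    variance, equal distances mean equal prior density."""
    return sum((b - m) ** 2 for b, m in zip(beta, BETA_0))

candidates = {
    "beta_1": [0, 0, 0, 0, 0],
    "beta_2": [-2, 0, -2, 0, -2],
    "beta_3": [1, 0, 1, 0, 1],
    "beta_4": [-3, 0, -3, 0, -3],
}
results = {name: (sq_distance_to_prior_mean(b), prob_alt_I(b))
           for name, b in candidates.items()}
for name, (d2, p) in results.items():
    print(f"{name}: squared distance = {d2}, P(Alt I) = {p:.5f}")
```

The pairs ($\beta_1$, $\beta_2$) and ($\beta_3$, $\beta_4$) each share one squared distance, while the implied probability of Alternative I ranges from nearly 0 to nearly 1.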
If one knows as much as the mean implies, then the variance should be smaller. If one knows as little as the variance implies, then the mean should be closer to the zero vector. Consequently, to specify a proper prior parameter distribution, one has to choose between

1. an informative mean with a small variance and
2. a less informative mean with a larger variance.

The first option makes sense if you are augmenting a previous study, or verifying its results. In that case, the posterior mean and variance from the previous work can be used as the prior mean and variance for the new study. The second strategy is appropriate in a case where no previous work has been done. Most often, one has some prior beliefs about the relative preferences for the attributes and their levels. It is sensible to incorporate these notions in the prior mean. However, if one is completely without intuition about what choices will be made by the market segment one is targeting, then the zero vector should be used as prior mean. When dealing with ordinal attributes such as the price of an apartment, the speed of a computer or the size of a house, one generally has a clear idea about the overall preference for the attribute levels. Utilities usually either increase or decrease when going from the low to the high setting of an ordinal attribute. It is wise to reflect this information in the prior mean.

To ensure that one uses an appropriate prior mean $\beta_0$ for the prior parameter distribution $\pi(\beta) = N(\beta \mid \beta_0, \Sigma_0)$ when no previous studies have been performed, we propose the following sanity check for $\beta_0$:

1. List all possible choice sets of size two and compute the multinomial logit probabilities for each profile in these choice sets given $\beta_0$.
2. Check whether the probabilities for all alternatives are reasonable. Do they match one’s subjective probabilities or beliefs? Does one feel as confident about the alternatives as the logit probabilities imply?
3. (a) If yes, then $\beta_0$ is a good choice.
   (b) If no, then choose a new prior mean in accordance with your understanding:
       i. If the probabilities of the alternatives tend to overestimate one’s beliefs, or one knows less than the probabilities indicate, then $\beta_0$ should be taken closer to the zero vector. This draws the probabilities nearer to each other.
       ii. If the probabilities underestimate one’s understanding, or one knows more than the probabilities reveal, then $\beta_0$ should be taken somewhat further from the zero vector. This pulls the probabilities further apart.
   Note that overstating one’s beliefs occurs more frequently than understating one’s beliefs. Subsequently, verify whether the new prior mean is suitable by repeating the procedure.

Instead of going through all possible choice sets of size two, a quicker check on the suitability of the prior mean $\beta_0$ is to examine only the choice set with the least attractive alternative and the most attractive alternative given $\beta_0$. Assuming main-effects models, the least attractive alternative is composed by selecting the worst possible level for each attribute and the most attractive alternative is composed by selecting the best possible

level for each attribute. As already mentioned, the choice set of Table 1 groups these alternatives for the $3^2 \times 2/24$ design example given the prior mean $\beta_0 = [-1, 0, -1, 0, -1]'$. Once the choice set with the most extreme alternatives is constructed, the logit probabilities should be computed and studied in order to evaluate $\beta_0$ using steps 2 and 3 of the sanity check proposed above. The probabilities of this single choice set provide a reasonably quick test of the appropriateness of $\beta_0$.

From the above discussion on the sanity check for the prior mean $\beta_0$, it is clear that the number of attributes plays a role in the specification of $\beta_0$. The more attributes are involved, the more extreme the logit probabilities for any choice set might be. In particular, the probabilities for the choice set with the most extreme alternatives might be close to zero and one. The more extreme the probabilities, the more confident one is supposed to be about the preferences for the alternatives. Consequently, the probabilities may readily overstate one’s beliefs. In the case of a large number of attributes, we therefore advise against taking a prior mean far away from the zero vector and recommend using smaller absolute prior parameter values. We illustrate this argument by comparing the prior means $\beta_{01} = [-1, -1]'$ and $\beta_{02} = [-1, -1, -1, -1, -1, -1]'$ associated with two and six two-level attributes, respectively. Both prior means assume equally spaced elements between −1 and 1 for the levels of each attribute. The choice sets with the most extreme alternatives given each of these priors are shown in Tables 2a and 2b. Using $\beta_{01}$, the probabilities that Alternatives I and II in Table 2a are chosen are 0.01799 and 0.98201, respectively. Using $\beta_{02}$, the probabilities that Alternatives I and II in Table 2b are chosen equal 0.00001 and 0.99999.
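The sanity check for the prior mean can be sketched in code for the three-attribute example with two three-level attributes and one two-level attribute. The sketch enumerates every choice set of size two, computes the logit probabilities under the prior mean, and lists the pairs whose larger probability exceeds a cut-off. The cut-off `THRESHOLD` is a hypothetical value chosen only for illustration; whether a probability is "reasonable" remains the subjective judgment described in step 2.

```python
import itertools
import math

N_LEVELS = (3, 3, 2)                     # two three-level and one two-level attribute
BETA_0 = [-1.0, 0.0, -1.0, 0.0, -1.0]    # prior mean under scrutiny
THRESHOLD = 0.95                         # hypothetical cut-off for "too confident"

def utility(levels, beta, n_levels=N_LEVELS):
    """Utility of a profile under effects-type coding (last level coded as all -1)."""
    u, pos = 0.0, 0
    for level, n in zip(levels, n_levels):
        coefs = beta[pos:pos + n - 1]
        u += -sum(coefs) if level == n else coefs[level - 1]
        pos += n - 1
    return u

# Step 1: all profiles and all choice sets of size two with their probabilities.
profiles = list(itertools.product(*[range(1, n + 1) for n in N_LEVELS]))
pairs = []
for a, b in itertools.combinations(profiles, 2):
    p_a = 1.0 / (1.0 + math.exp(utility(b, BETA_0) - utility(a, BETA_0)))
    pairs.append((max(p_a, 1.0 - p_a), a, b))

# Step 2: inspect the most lopsided choice sets first.
pairs.sort(reverse=True)
flagged = [entry for entry in pairs if entry[0] > THRESHOLD]
most_extreme = pairs[0]

print(f"{len(pairs)} choice sets of size two, {len(flagged)} above {THRESHOLD}")
print("most extreme choice set:", most_extreme)
```

The first entry of the sorted list is the choice set with the least and most attractive alternatives, so inspecting `most_extreme` alone corresponds to the quick check described above.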

Table 2: Two choice sets with the most extreme alternatives given a) $\beta_{01} = [-1, -1]'$ and b) $\beta_{02} = [-1, -1, -1, -1, -1, -1]'$.

a)
Alt   Attr 1   Attr 2
I     1        1
II    2        2

b)
Alt   Attr 1   Attr 2   Attr 3   Attr 4   Attr 5   Attr 6
I     1        1        1        1        1        1
II    2        2        2        2        2        2

It is obvious that the probabilities for the choice set with two attributes in Table 2a are less extreme than those for the choice set with six attributes in Table 2b. For the choice set in Table 2b, one has to be virtually certain about the alternative that people prefer, whereas for the choice set in Table 2a, there is still room for a little hesitation. We believe that, without any data from a previous study, it is very rare to be completely confident about people’s preference evaluations for the choice set in Table 2b. In fact, already in the case of three attributes, the most extreme logit probabilities, being 0.00247 and 0.99753, most probably overstate one’s beliefs. Note that these probabilities are independent of the number of levels for each attribute if the parameter values in $\beta_0$ are evenly spaced between −1 and 1 per attribute. In that case, only the value of −1 for the first level matters for each attribute, since the values for the other levels cancel each other out (for an example see Section 3.2). So we do not advocate the use of a prior mean $\beta_0$ with equally spaced elements between −1 and 1 for each attribute in the case of more than two attributes either.

Concerning the specification of the prior variance-covariance matrix $\Sigma_0$, we argue that the variances should not be larger than 1. This is because a prior variance of 1 already indicates a great amount of uncertainty.
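The dependence of the most extreme logit probabilities on the number of attributes, discussed earlier in this section, follows from a simple calculation: when the utilities of every attribute run from −1 to 1, the worst and best alternatives differ by 2 utility units per attribute. The sketch below reproduces the quoted probabilities and checks that the gap per attribute does not depend on the number of levels. The function names are illustrative.

```python
import math

def extreme_probability(n_attributes, first_level_value=-1.0):
    """Probability that the worst alternative is chosen over the best one when
    each attribute's utilities run from first_level_value to -first_level_value."""
    utility_gap = 2.0 * abs(first_level_value) * n_attributes
    return 1.0 / (1.0 + math.exp(utility_gap))

def worst_best_gap(coefs):
    """Utility gap between the last and the first level of one effects-coded
    attribute, where the last level has utility -sum(coefs)."""
    return -sum(coefs) - coefs[0]

for m in (2, 3, 6):
    p = extreme_probability(m)
    print(f"{m} attributes: {p:.5f} and {1.0 - p:.5f}")

# Equally spaced utilities between -1 and 1 for two, three and four levels:
# the gap is 2 in every case because the middle values cancel.
gaps = [worst_best_gap(c) for c in ([-1.0], [-1.0, 0.0], [-1.0, -1.0 / 3, 1.0 / 3])]
print("gap per attribute:", gaps)
```

The same function can be used to see how much a prior mean must shrink toward zero before the most extreme probabilities match one's beliefs, by passing a smaller absolute `first_level_value`.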

3 Bayesian designs outperforming utility-neutral designs

We now show with two design cases how Bayesian optimal designs outperform utility-neutral optimal designs on average. We focus on Bayesian designs computed by means of the $D_B$- and $V_B$-optimality criteria since these are the most appealing criteria from an estimation and prediction viewpoint, respectively (Kessels et al. 2006b). Moreover, Kessels et al. (2006a) demonstrated that the $D_B$-optimality criterion also scores well in terms of prediction. In both design cases, we assume main-effects models for which no prior information is available from previous studies.

3.1 The $2^6/2/8$ case: $D_B$- and $V_B$-optimal choice designs versus an orthogonally blocked fractional factorial design

For a first design case, we computed $D_B$- and $V_B$-optimal designs and a utility-neutral optimal design of class $2^6/2/8$. The design profiles are described by six two-level attributes and are grouped two by two in each of eight choice sets. So in total, the designs consist of 16 profiles each. Using effects-type coding, the number of elements, $k$, in the parameter vector is six. We constructed the Bayesian designs by assuming some prior beliefs about people’s preferences for the attribute levels. In accordance with the guidelines presented in Section 2, we incorporated these beliefs in the prior parameter distribution $\pi(\beta) = N(\beta \mid \beta_0, \Sigma_0)$ by specifying the prior mean as $\beta_0 = [-0.5, -0.5, -0.5, -0.5, -0.5, -0.5]'$ and the prior variance-covariance matrix as $\Sigma_0$ = 0.72 × $I_6$. We created the $D_B$- and $V_B$-optimal designs using the adaptive algorithm of Kessels et al. (2006b) provided in MATLAB 7. We performed 1,000 tries or random starts of this algorithm for each of the criteria. As input to the algorithm, we constructed a systematic 20-point sample for generating the tries and drew a random 1,000-point sample for evaluating the resulting designs.

As a utility-neutral optimal design, we used an orthogonally blocked $2^{6-2}$ fractional factorial design with blocks of size two. This fractional factorial design is locally D-, A-, G- and V-optimal for $\beta_P = [0, 0, 0, 0, 0, 0]'$ given the present choice design configuration. We produced it in JMP 6. The orthogonally blocked $2^{6-2}$ fractional factorial design and the $D_B$- and $V_B$-optimal designs appear in Table 3. As can be seen, the choice sets of the fractional factorial design are completely level balanced, whereas those of the Bayesian optimal designs exhibit some level overlap.

Table 3: An orthogonally blocked $2^{6-2}$ fractional factorial design (FracF) used as utility-neutral optimal design and the $D_B$- and $V_B$-optimal designs for the $2^6/2/8$ example.

Choice          2^(6−2) FracF      D_B                V_B
set     Alt     Attr 1 2 3 4 5 6   Attr 1 2 3 4 5 6   Attr 1 2 3 4 5 6
1       I            2 2 2 1 1 1        1 1 2 1 2 1        2 1 1 1 1 2
1       II           1 1 1 2 2 2        1 1 1 2 1 2        1 1 1 1 2 2
2       I            2 1 1 2 2 1        2 2 2 1 2 2        1 1 2 2 1 2
2       II           1 2 2 1 1 2        1 1 2 2 2 2        2 1 1 1 2 2
3       I            1 2 1 1 2 1        2 1 1 1 1 2        2 1 1 2 1 2
3       II           2 1 2 2 1 2        1 2 1 2 2 1        2 2 1 1 1 1
4       I            2 2 1 1 2 2        2 1 1 1 2 2        2 1 2 1 2 1
4       II           1 1 2 2 1 1        1 2 2 1 1 1        1 2 1 1 2 2
5       I            2 2 1 2 1 1        2 1 2 2 1 1        2 2 2 1 1 2
5       II           1 1 2 1 2 2        1 2 1 1 2 2        2 1 1 2 2 1
6       I            2 1 2 1 2 1        2 2 1 2 1 1        1 1 2 1 2 1
6       II           1 2 1 2 1 2        1 1 2 1 1 2        2 2 1 2 2 2
7       I            2 1 1 1 1 2        2 1 1 2 2 1        2 1 1 1 1 2
7       II           1 2 2 2 2 1        2 2 2 2 1 2        2 1 2 1 2 2
8       I            2 2 2 2 2 2        2 1 2 2 2 2        2 1 2 1 1 2
8       II           1 1 1 1 1 1        2 2 1 1 1 2        2 2 2 2 1 1

Figure 1 contains two plots comparing the utility-neutral optimal design, that is, the orthogonally blocked $2^{6-2}$ fractional factorial design, to the $D_B$- and $V_B$-optimal designs. Figure 1a shows the relative $D_P$-efficiencies of the fractional factorial design to the $D_B$-optimal design for various true parameter vectors and Figure 1b shows the relative $V_P$-efficiencies of the fractional factorial design to the $V_B$-optimal design. The plus signs in the graphs correspond to true parameter vectors going from $[-1.5, -1.5, -1.5, -1.5, -1.5, -1.5]'$ through the prior mean of the Bayesian designs, $\beta_0 = [-0.5, -0.5, -0.5, -0.5, -0.5, -0.5]'$, and finally to the implied prior mean of the utility-neutral design, $\beta_P = [0, 0, 0, 0, 0, 0]'$. Thus each plus sign represents a true parameter of the form $[c, c, c, c, c, c]'$ where $c$ is in the interval $[-1.5, 0]$.

[Figure 1, two panels. Horizontal axis: true value for each of 6 parameter elements, from −1.5 to 0. Vertical axes: relative D-efficiency and relative V-efficiency of the fractional factorial design, from 0 to 2.2.]
(a) Relative $D_P$-efficiencies of the $2^{6-2}$ design to the $D_B$-optimal design
(b) Relative $V_P$-efficiencies of the $2^{6-2}$ design to the $V_B$-optimal design

Figure 1: Relative local efficiencies of the orthogonally blocked $2^{6-2}$ fractional factorial design to the Bayesian optimal designs for various true parameter vectors starting from $[-1.5, -1.5, -1.5, -1.5, -1.5, -1.5]'$ and moving toward $[0, 0, 0, 0, 0, 0]'$ with equal values for each parameter element.

At the far left-hand side of Figure 1a comparing $D_P$-efficiencies, the $D_B$-optimal design is about 40% more efficient than the utility-neutral design. The relative $D_P$-efficiency of the utility-neutral design increases until $c = -0.64$, where the two designs are equally efficient. For less negative values of $c$, the utility-neutral design is more efficient than the $D_B$-optimal design. Consequently, at the prior mean of the Bayesian designs, where $c = -0.5$, the utility-neutral design outperforms the $D_B$-optimal design, but only slightly, by less than 10%. For the zero parameter vector, the utility-neutral design is about 45% more efficient than the $D_B$-optimal design.

Figure 1b shows a similar trend for the relative $V_P$-efficiencies. There, the crossover point at which the $V_B$-optimal design and the utility-neutral design are equally efficient is found at $c = -0.73$. At the prior mean of the Bayesian designs, at $c = -0.5$, the utility-neutral design is about 35% more efficient than the $V_B$-optimal design. Note, however, that at the far left-hand side of this plot, the relative $V_P$-efficiency of the utility-neutral design is less than 20%. Alternatively, one could say that the $V_B$-optimal design is roughly five times more efficient than the utility-neutral design for the parameter vectors in this corner. By contrast, at the zero parameter vector the utility-neutral design is only twice as efficient.

In summary, we can conclude the following from Figure 1. While the utility-neutral optimal design is more efficient with respect to the $D_P$- and $V_P$-optimality criteria than the Bayesian optimal designs for true parameter vectors that are small in magnitude, the Bayesian designs are far more robust to true parameter values that are some distance away from the prior mean. Since the prior mean of the Bayesian designs has its parameter values of −0.5 fairly close to zero, the utility-neutral design is slightly more efficient there than the Bayesian designs.
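The kind of comparison shown in Figure 1a can be sketched as follows. The sketch assumes that the relative $D_P$-efficiency of one design to another is the ratio of the determinants of their information matrices raised to the power $1/k$, that two-level attributes are effects-coded with level 1 as +1 and level 2 as −1, and that the choice sets are as transcribed from Table 3. All function names are illustrative; this is not the authors' MATLAB code, and the printed values should be treated as an independent recomputation rather than the figures reported in the text.

```python
import math

# Choice sets as (Alternative I, Alternative II) level tuples, transcribed from
# Table 3; the transcription itself is an assumption to be checked.
FRACF = [
    ((2, 2, 2, 1, 1, 1), (1, 1, 1, 2, 2, 2)), ((2, 1, 1, 2, 2, 1), (1, 2, 2, 1, 1, 2)),
    ((1, 2, 1, 1, 2, 1), (2, 1, 2, 2, 1, 2)), ((2, 2, 1, 1, 2, 2), (1, 1, 2, 2, 1, 1)),
    ((2, 2, 1, 2, 1, 1), (1, 1, 2, 1, 2, 2)), ((2, 1, 2, 1, 2, 1), (1, 2, 1, 2, 1, 2)),
    ((2, 1, 1, 1, 1, 2), (1, 2, 2, 2, 2, 1)), ((2, 2, 2, 2, 2, 2), (1, 1, 1, 1, 1, 1)),
]
DB_DESIGN = [
    ((1, 1, 2, 1, 2, 1), (1, 1, 1, 2, 1, 2)), ((2, 2, 2, 1, 2, 2), (1, 1, 2, 2, 2, 2)),
    ((2, 1, 1, 1, 1, 2), (1, 2, 1, 2, 2, 1)), ((2, 1, 1, 1, 2, 2), (1, 2, 2, 1, 1, 1)),
    ((2, 1, 2, 2, 1, 1), (1, 2, 1, 1, 2, 2)), ((2, 2, 1, 2, 1, 1), (1, 1, 2, 1, 1, 2)),
    ((2, 1, 1, 2, 2, 1), (2, 2, 2, 2, 1, 2)), ((2, 1, 2, 2, 2, 2), (2, 2, 1, 1, 1, 2)),
]

def code(levels):
    """Effects-type coding for two-level attributes: level 1 -> +1, level 2 -> -1."""
    return [1.0 if lv == 1 else -1.0 for lv in levels]

def information_matrix(design, beta):
    """Logit information matrix for choice sets of size two (one respondent):
    sum over choice sets of p(1 - p) d d', with d the coded profile difference."""
    k = len(beta)
    m = [[0.0] * k for _ in range(k)]
    for alt_a, alt_b in design:
        d = [a - b for a, b in zip(code(alt_a), code(alt_b))]
        p = 1.0 / (1.0 + math.exp(-sum(di * bi for di, bi in zip(d, beta))))
        w = p * (1.0 - p)
        for i in range(k):
            for j in range(k):
                m[i][j] += w * d[i] * d[j]
    return m

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return det

def relative_d_efficiency(design, reference, beta):
    """Relative local D-efficiency of `design` to `reference` at `beta`."""
    k = len(beta)
    ratio = (determinant(information_matrix(design, beta))
             / determinant(information_matrix(reference, beta)))
    return ratio ** (1.0 / k)

for c in (-1.5, -1.0, -0.5, 0.0):
    eff = relative_d_efficiency(FRACF, DB_DESIGN, [c] * 6)
    print(f"c = {c:5.2f}: relative D-efficiency of FracF to D_B design = {eff:.3f}")
```

A value above 1 means the fractional factorial design is locally more efficient at that parameter vector; a value below 1 favors the $D_B$-optimal design.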
To further illustrate this, we plotted graphs similar to those in Figure 1 but for a different range of true parameter vectors. The relative $D_P$-efficiencies of the fractional factorial design to the $D_B$-optimal design appear in Figure 2a and the relative $V_P$-efficiencies of the fractional factorial design to the $V_B$-optimal design appear in Figure 2b. Here, the true parameter vectors go from $[-2, -0.5, 0, 0, 0, 0]'$ to the implied prior mean of the utility-neutral design, $\beta_P = [0, 0, 0, 0, 0, 0]'$. Each plus sign now corresponds to a true parameter of the form $[c, c/4, 0, 0, 0, 0]'$ where $c$ is in the interval $[-2, 0]$.

[Figure 2, two panels. Horizontal axis: true value for the first parameter element, from −2 to 0. Vertical axes: relative D-efficiency and relative V-efficiency of the fractional factorial design, from 0.2 to 2.2.]
(a) Relative $D_P$-efficiencies of the $2^{6-2}$ design to the $D_B$-optimal design
(b) Relative $V_P$-efficiencies of the $2^{6-2}$ design to the $V_B$-optimal design

Figure 2: Relative local efficiencies of the orthogonally blocked $2^{6-2}$ fractional factorial design to the Bayesian optimal designs for various true parameter vectors starting from $[-2, -0.5, 0, 0, 0, 0]'$ and proportionally moving toward $[0, 0, 0, 0, 0, 0]'$.

At the far left-hand side of Figure 2a, the $D_B$-optimal design is more than twice as efficient as the utility-neutral design. The same can be observed for the $V_B$-optimal design in terms of $V_P$-efficiency at the far left-hand side of Figure 2b. This happens despite the fact that the parameter vector $[-2, -0.5, 0, 0, 0, 0]'$ is almost equally far from $\beta_P = [0, 0, 0, 0, 0, 0]'$ as from $\beta_0 = [-0.5, -0.5, -0.5, -0.5, -0.5, -0.5]'$. More specifically, if we denote $[-2, -0.5, 0, 0, 0, 0]'$ by $\beta_t$, then the Euclidean distances $d(\beta_t, \beta_P)$ and $d(\beta_t, \beta_0)$ approximately equal two.

In Figure 2a the $D_B$-optimal design and the utility-neutral design are equally efficient at $c = -0.99$. The crossover point at which the $V_B$-optimal design and the utility-neutral design are equally efficient occurs at $c = -1.17$ in Figure 2b. These values of $c$ are more negative than in Figures 1a and 1b because four of the six parameter elements remain at zero, which is advantageous to the utility-neutral design. Furthermore, it should be noted that many of the parameter vectors to the left of the vertical lines in Figures 2a and 2b are actually closer to the zero vector than to the prior mean of the Bayesian designs. Yet in spite of this, the $D_B$- and $V_B$-optimal designs are more efficient than the utility-neutral design in the left-hand regions of these figures.

We now examine more closely the estimation and prediction capabilities of the utility-neutral and Bayesian optimal designs at the true parameter $\beta_t = [-2, -0.5, 0, 0, 0, 0]'$.

In this way, we show how poorly the utility-neutral design performs when the true parameter vector consists of values at a distance away from zero. The relative $D_P$- and $V_P$-efficiencies of the utility-neutral and Bayesian optimal designs at the true parameter $\beta_t$ are included in Table 4. The $D_B$-optimal design turns out to be fairly efficient in terms of the $V_P$-optimality criterion compared with the $V_B$-optimal design. Also, the $V_B$-optimal design is fairly efficient in terms of the $D_P$-optimality criterion relative to the $D_B$-optimal design. So in terms of relative $D_P$- and $V_P$-efficiency at $\beta_t$, the Bayesian designs behave similarly and contrast with the utility-neutral design.

Table 4: Relative D_P- and V_P-efficiencies of the orthogonally blocked 2^(6−2) fractional factorial design and the Bayesian optimal designs to the D_B- and V_B-optimal designs, respectively. The efficiencies are obtained at the true parameter vector β^t = [−2, −0.5, 0, 0, 0, 0]′.

Rel. eff.    2^(6−2)    D_B     V_B
D_P          44%        100%    92%
V_P          40%        95%     100%

We further compare the estimation and prediction performance of the utility-neutral and Bayesian designs at the true parameter β^t with a simulation study. Based on β^t we simulated 100 datasets with choices of 200 respondents for each of the fractional factorial and D_B- and V_B-optimal designs. We subsequently estimated the parameter values for each dataset. In Figures 3a, 3b and 3c we plotted the 100 estimates for β^t_1 = −2 against the 100 estimates for β^t_2 = −0.5 for the fractional factorial design and the D_B- and V_B-optimal designs, respectively. In this way, we obtain additional information on the correlation between the estimates for β^t_1 and β^t_2. From Figure 3a, we clearly observe that a substantial number of the estimates from the fractional factorial design are far away from their true values. Moreover, the estimates for β^t_1 and β^t_2 are strongly correlated. This means that if β^t_1 is poorly estimated, β^t_2 is poorly estimated as well. Not surprisingly, the estimates from the D_B-optimal design in Figure 3b are all very precise, but some from the V_B-optimal design in Figure 3c are less precise. For these two designs, the estimates are almost uncorrelated.

Figure 4 shows the box plots with 100 predicted probabilities based on the 100 estimates for β^t_3, β^t_4, β^t_5 and β^t_6 for each of the three designs. Since these four coefficients have a true value of zero, the predicted probabilities should ideally be 0.5. We thus only consider the last four attributes to study the variability around the predicted probability of 0.5. Profiles described by these four attributes are referred to as partial profiles as they only include a subset of the attributes. Because of the zero parameter values the predicted probabilities can be calculated for any partial profile in any choice set with two partial profiles composed of the last four attributes.
In the choice set we used, one partial alternative has all four attributes at the first level and the other alternative has all four attributes at the second level. We computed the predicted probabilities for the latter alternative.

Figure 3: Scatter plots showing the correlation between 100 estimates for β^t_1 = −2 and β^t_2 = −0.5. (a) Orthogonally blocked 2^(6−2) fractional factorial design; (b) D_B-optimal design; (c) V_B-optimal design.
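The simulation procedure described above (simulate choices at β^t, re-estimate the multinomial logit model, then compute predicted probabilities for the partial profiles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The paired design `D` is a hypothetical placeholder rather than one of the paper's designs, the effects-type coding (level 1 as +1, level 2 as −1) is assumed, and the Newton-Raphson estimator is a generic choice.

```python
import numpy as np

rng = np.random.default_rng(1)
beta_t = np.array([-2.0, -0.5, 0.0, 0.0, 0.0, 0.0])  # true parameter vector from the text

# Hypothetical placeholder design (NOT a design from the paper): 8 paired choice sets.
# D[s, j] = +1 or -1 where the two profiles differ on attribute j, 0 where they share a level.
D = np.array([[1, -1,  1,  0,  0,  0],
              [1, -1,  0,  1,  0,  0],
              [1, -1,  0,  0,  1,  0],
              [1, -1,  0,  0,  0,  1],
              [0,  1,  1,  1,  0,  0],
              [0,  1,  0,  0,  1,  1],
              [0,  1,  1,  0,  1,  0],
              [1, -1, -1, -1, -1, -1]], dtype=float)
first = np.where(D == 0, 1.0, D)       # profile 1 of each set (assumed coding: +1 / -1)
second = np.where(D == 0, 1.0, -D)     # profile 2 of each set
X = np.stack([first, second], axis=1)  # shape (choice sets, alternatives, parameters)

def probs(X, beta):
    """Multinomial logit choice probabilities per choice set."""
    u = X @ beta
    u = u - u.max(axis=1, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)

def simulate(X, beta, n_resp, rng):
    """Counts of respondents choosing each alternative in each choice set."""
    return np.array([rng.multinomial(n_resp, p_s) for p_s in probs(X, beta)])

def fit_mnl(X, counts, n_iter=25):
    """Maximum likelihood estimate via Newton-Raphson, started at zero."""
    beta = np.zeros(X.shape[2])
    n = counts.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p = probs(X, beta)
        grad = np.einsum('sjk,sj->k', X, counts - n * p)
        x_bar = np.einsum('sjk,sj->sk', X, p)
        X_c = X - x_bar[:, None, :]
        info = np.einsum('sjk,sj,sjl->kl', X_c, n * p, X_c)
        beta = beta + np.linalg.solve(info, grad)
    return beta

# 100 simulated datasets of 200 respondents each, as in the text.
estimates = np.array([fit_mnl(X, simulate(X, beta_t, 200, rng)) for _ in range(100)])
corr_12 = np.corrcoef(estimates[:, 0], estimates[:, 1])[0, 1]

# Predicted probability of the partial profile with attributes 3-6 at level 2
# (coded -1) against the partial profile with attributes 3-6 at level 1 (coded +1).
s = estimates[:, 2:].sum(axis=1)
pred = 1.0 / (1.0 + np.exp(2.0 * s))
print(estimates.mean(axis=0).round(2), round(corr_12, 2), pred.mean().round(3))
```

Replacing `X` with the coded design matrices of the fractional factorial, D_B-optimal and V_B-optimal designs would give scatter plots and box plots of the kind summarized in Figures 3 and 4.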

Figure 4: Box plots of 100 predicted probabilities based on 100 estimates for β^t_3 = β^t_4 = β^t_5 = β^t_6 = 0. They are shown for the orthogonally blocked 2^(6−2) fractional factorial design and the D_B- and V_B-optimal designs.

Clearly, the box plot for the orthogonally blocked 2^(6−2) fractional factorial design is substantially wider than the box plots for the D_B- and V_B-optimal designs. Also, there are outlying predicted probabilities near 0 and 1 for the fractional factorial design. Using Levene’s test for equality of variances, the significance probability is 5 × 10^−13. As a result, there is no doubt that the predictions from the fractional factorial design have a substantially higher variance than the predictions from the D_B- and V_B-optimal designs. Further, there is no significant difference between the quality of the predictions from the D_B- and V_B-optimal designs.
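For readers who want to reproduce this kind of variance comparison, the sketch below computes Levene's W statistic from absolute deviations around the group means. The three samples are synthetic stand-ins generated here purely for illustration and are not the paper's predicted probabilities. Whether the authors centered on the mean or the median is not stated, so the mean-centered version is an assumption.

```python
import numpy as np

def levene_w(*groups):
    """Levene's W statistic using absolute deviations from the group means."""
    z = [np.abs(np.asarray(g, dtype=float) - np.mean(g)) for g in groups]
    n_i = np.array([len(g) for g in z])
    z_bar_i = np.array([g.mean() for g in z])
    z_bar = np.concatenate(z).mean()
    k, n = len(z), n_i.sum()
    between = np.sum(n_i * (z_bar_i - z_bar) ** 2) / (k - 1)
    within = sum(np.sum((g - m) ** 2) for g, m in zip(z, z_bar_i)) / (n - k)
    return between / within

# Synthetic illustration only: one widely spread sample and two tight ones.
rng = np.random.default_rng(0)
wide = np.clip(rng.normal(0.5, 0.20, 100), 0.0, 1.0)
tight_1 = np.clip(rng.normal(0.5, 0.05, 100), 0.0, 1.0)
tight_2 = np.clip(rng.normal(0.5, 0.05, 100), 0.0, 1.0)
print(levene_w(wide, tight_1, tight_2))
```

Under the null hypothesis of equal variances, W is compared with an F distribution with k − 1 and N − k degrees of freedom. If SciPy is available, `scipy.stats.levene` with `center='mean'` returns both the statistic and the significance probability.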

So for a true parameter vector with one or more values reasonably large in magnitude, the relative D_P- and V_P-efficiencies, the scatter plots uncovering the correlation between the estimates and the box plots showing the prediction variances have all illustrated that the utility-neutral design has noticeably worse properties than the Bayesian designs. On the other hand, at the zero parameter vector, the utility-neutral design is the best design option. However, the utility-neutral design’s implied prior mean of all zero values indicates that none of the attributes has much impact on consumer preferences. If this assumption were true, then it would make no sense to run the experiment. Hence, the Bayesian designs should generally be favored. They provide the best estimates and predictions on average for a whole range of true parameter vectors, including true parameter values that are fairly large in magnitude.

Our conclusions so far are all based on this 2^6/2/8 design example. In a second design example, we compare the Bayesian designs with a classical design that has a completely different choice design structure. We do this to show that the characteristics we noted in the current example are not unique.

3.2

The 4^2/4/4 case: D_B- and V_B-optimal choice designs versus an orthogonally blocked full factorial design

In this second design case, we produced D_B- and V_B-optimal designs and a utility-neutral optimal design of class 4^2/4/4. Here, the profiles are configured from two attributes with 4 levels each and are arranged in 4 choice sets of size 4. As in the previous design case, the designs comprise 16 profiles. Also similar to the first case is that the number of elements, k, in the parameter vector is six. We generated the Bayesian designs under the assumption that one’s beliefs about people’s predilections are well represented by the prior parameter distribution π(β) = N(β | β_0, Σ_0) with β_0 = [−1, −1/3, 1/3, −1, −1/3, 1/3]′ and Σ_0 = 0.4^2 × I_6. As explained in Section 2, it is reasonable to equally space the parameter values in β_0 between −1 and 1 for each attribute if only two attributes are assumed. An accompanying prior variance of 0.16 thereby expresses a small amount of uncertainty. We again produced each of the D_B- and V_B-optimal designs using 1,000 tries of the adaptive algorithm of Kessels et al. (2006b). We used a systematic sample of 20 parameters for the design generation and a Monte Carlo sample of 1,000 parameters for the design evaluation.

For the realization of the utility-neutral optimal design, we generated an orthogonally blocked full 4^2 factorial design in JMP 6. Given the current choice design structure, this design is locally D-, A-, G- and V-optimal for β_P = [0, 0, 0, 0, 0, 0]′. Table 5 shows the orthogonally blocked full 4^2 factorial design and the D_B- and V_B-optimal designs. As in the preceding design case, there is no level overlap in the full factorial design, but some is present in the Bayesian designs.
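To see what this prior mean implies, one can compute the logit probability for the most extreme pair of alternatives, in the spirit of the check on the prior mean discussed in Section 2. The sketch below assumes effects-type coding, so that the utility of the fourth level of an attribute equals minus the sum of the three listed coefficients. All variable names are introduced here for illustration.

```python
import math

beta_0_attr = [-1.0, -1.0 / 3.0, 1.0 / 3.0]    # prior mean for levels 1 to 3 of one attribute
level_util = beta_0_attr + [-sum(beta_0_attr)]  # assumed effects-type coding: level 4 = minus the sum

# Most extreme choice set of size two: both attributes at their worst level
# against both attributes at their best level.
u_worst = 2.0 * min(level_util)
u_best = 2.0 * max(level_util)
p_best = math.exp(u_best) / (math.exp(u_best) + math.exp(u_worst))
print([round(u, 3) for u in level_util], round(p_best, 3))
```

Under these assumptions the level utilities are equally spaced between −1 and 1, and the best alternative is chosen with probability of roughly 0.98 at the prior mean. Whether such a probability matches one's expectations is the judgment the check on the prior mean asks for.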


Table 5: An orthogonally blocked full 4^2 factorial design used as utility-neutral optimal design and the D_B- and V_B-optimal designs for the 4^2/4/4 example.

Choice set   Alt    FullF Attr    D_B Attr    V_B Attr
                    1    2        1    2      1    2
1            I      2    2        4    1      3    3
1            II     4    4        3    2      2    1
1            III    3    3        1    3      4    2
1            IV     1    1        2    1      4    1
2            I      2    3        1    4      4    3
2            II     3    4        3    1      1    4
2            III    1    2        1    2      3    1
2            IV     4    1        2    3      2    2
3            I      1    4        3    3      1    2
3            II     2    1        2    1      1    4
3            III    3    2        1    2      3    4
3            IV     4    3        2    2      2    3
4            I      2    4        3    1      4    1
4            II     1    3        2    4      1    3
4            III    4    2        4    2      3    2
4            IV     3    1        1    3      2    4

To demonstrate that the Bayesian optimal designs should generally be preferred to the utility-neutral optimal design, that is, the orthogonally blocked full 4^2 factorial design, we again plotted two graphs with relative efficiencies for various true parameter vectors. They appear in Figure 5. Figure 5a shows the D_P-efficiencies of the full factorial design relative to the D_B-optimal design and Figure 5b shows the V_P-efficiencies of the full factorial design relative to the V_B-optimal design. The true parameter vectors go from [−1.5, −0.5, 0.5, −1.5, −0.5, 0.5]′ through the prior mean of the Bayesian designs, β_0 = [−1, −1/3, 1/3, −1, −1/3, 1/3]′, to end up again at the implied prior mean of the utility-neutral design, β_P = [0, 0, 0, 0, 0, 0]′. So each plus sign corresponds to a true parameter of the form [c, c/3, −c/3, c, c/3, −c/3]′ where c is on the interval [−1.5, 0]. Figures 5a and 5b clearly confirm our finding that the Bayesian designs substantially outperform the utility-neutral design for parameter values reasonably large in magnitude, whereas the utility-neutral design is more efficient for parameter vectors close to the zero vector. As far as D_P-efficiency is concerned, the far left-hand side of Figure 5a shows that the D_B-optimal design outperforms the utility-neutral design by approximately 35%. The efficiency gap steadily decreases until c = −0.66, where the two designs are equally efficient, after which it increases in favor of the utility-neutral design. At the zero parameter vector the utility-neutral design is 25% more efficient than the D_B-optimal design. A similar course can be observed for the relative V_P-efficiency in Figure 5b. Here, however, the efficiency gaps at the outer sides of the plot are smaller and the V_B-optimal design and the utility-neutral design are equally efficient at c = −0.84.

Figure 5: Relative local efficiencies of the orthogonally blocked full 4^2 factorial design to the Bayesian optimal designs for various true parameter vectors starting from [−1.5, −0.5, 0.5, −1.5, −0.5, 0.5]′ and proportionally moving toward [0, 0, 0, 0, 0, 0]′. (a) Relative D_P-efficiencies of the 4^2 design to the D_B-optimal design; (b) Relative V_P-efficiencies of the 4^2 design to the V_B-optimal design. In both panels the horizontal axis shows the true value for the first parameter element, from −1.5 to 0, and the vertical axis shows the relative D- or V-efficiency of the full factorial design, from 0.6 to 1.4.

The crossover points at which each of the Bayesian designs and the utility-neutral design are equally efficient are clearly larger than c = −1. This is because the starting true parameter vector at c = −1.5 does not include any zero values and thereby lies relatively far from the zero vector. We could also observe this in Figures 1a and 1b. Consequently, at the prior mean of the Bayesian designs, where c = −1, the Bayesian designs are more efficient than the utility-neutral design. We expected this result because the prior mean lies rather far from the zero vector. Recall that in Figures 1a and 1b, on the other hand, the utility-neutral design is more efficient at the prior mean of the Bayesian designs since the parameter values of −0.5 are fairly close to zero.
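The relative D_P-efficiencies discussed above can be approximated from the designs in Table 5. The sketch below is an illustration under assumptions rather than the authors' computation: it assumes effects-type coding of the four-level attributes, uses the standard multinomial logit information matrix, and takes the relative efficiency as the ratio of determinants raised to the power 1/k. The function names are hypothetical.

```python
import numpy as np

# Designs transcribed from Table 5 as (attribute 1 level, attribute 2 level);
# 4 choice sets with 4 alternatives each.
full_fact = [[(2, 2), (4, 4), (3, 3), (1, 1)], [(2, 3), (3, 4), (1, 2), (4, 1)],
             [(1, 4), (2, 1), (3, 2), (4, 3)], [(2, 4), (1, 3), (4, 2), (3, 1)]]
d_b = [[(4, 1), (3, 2), (1, 3), (2, 1)], [(1, 4), (3, 1), (1, 2), (2, 3)],
       [(3, 3), (2, 1), (1, 2), (2, 2)], [(3, 1), (2, 4), (4, 2), (1, 3)]]

# Assumed effects-type coding of a 4-level attribute: levels 1-3 are unit rows,
# level 4 is coded as (-1, -1, -1).
EFFECTS = np.vstack([np.eye(3), -np.ones(3)])

def code(design):
    """Coded design array of shape (choice sets, alternatives, parameters)."""
    return np.array([[np.concatenate([EFFECTS[a - 1], EFFECTS[b - 1]]) for a, b in cs]
                     for cs in design])

def information(X, beta):
    """Multinomial logit information matrix summed over the choice sets."""
    M = np.zeros((X.shape[2], X.shape[2]))
    for X_s in X:
        u = X_s @ beta
        p = np.exp(u - u.max())
        p = p / p.sum()
        M += X_s.T @ (np.diag(p) - np.outer(p, p)) @ X_s
    return M

def rel_d_eff(X_a, X_b, beta):
    """Local D-efficiency of design a relative to design b at beta."""
    k = len(beta)
    ratio = np.linalg.det(information(X_a, beta)) / np.linalg.det(information(X_b, beta))
    return ratio ** (1.0 / k)

def beta_path(c):
    """True parameter vector on the plotted path with first element c."""
    return np.array([c, c / 3, -c / 3, c, c / 3, -c / 3])

X_f, X_d = code(full_fact), code(d_b)
for c in (-1.5, -1.0, -0.5, 0.0):
    print(c, round(rel_d_eff(X_f, X_d, beta_path(c)), 3))
```

Values above one favor the full factorial design and values below one favor the D_B-optimal design. Because the coding is an assumption, the printed values need not match Figure 5a exactly.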

4

Conclusion

In this paper, we had two goals. First, we wanted to provide some practical recommendations on how to properly specify the prior parameter distribution for constructing Bayesian choice designs. We did this because some of the prior distributions used in the literature are internally inconsistent. Second, we wished to illustrate that Bayesian designs generally have better properties than utility-neutral designs. We therefore used two separate examples.

In the Bayesian choice design literature, we noticed that, in the absence of prior information from a previous enquiry, the specifications of the prior mean and variance sometimes conflict. One has to be careful not to take a prior mean that is too informative compared with a specific prior variance. Therefore, we established a sanity check for the prior mean. It is built around the principle that one’s expectations about the preferences for alternatives in choice sets of size two should be in line with the logit probabilities for those alternatives given the prior mean. A quick look at the choice set with the most extreme alternatives already provides considerable insight into the prior mean’s suitability. Furthermore, we advise taking a prior variance of one as an upper limit for the specification of the variances, as this already indicates a lot of uncertainty.

In the choice design literature, the Bayesian design approach competes with the linear design approach for the production of choice designs. The Bayesian approach should, however, be favored because Bayesian designs are constructed for a prior parameter distribution incorporating all prior knowledge, whereas linear or utility-neutral designs are generated under the assumption that all alternatives are equally preferred by the respondents. Utility-neutral designs can thus be seen as Bayesian designs with zero prior mean and zero prior variance. Note that Bayesian designs are utility-neutral designs even if the prior variance around the zero prior mean is very small rather than exactly zero. However, if one believes in these specifications behind utility-neutral designs, then it would make no sense to run the experiment. A zero prior mean is only justified if it is accompanied by a large prior variance to identify the situation where one is completely without intuition about people’s preferences. In that case, Bayesian designs differ from utility-neutral designs.
Not surprisingly therefore, our study of two design examples showed that Bayesian designs substantially outperform utility-neutral designs whenever some true parameter values are reasonably large in magnitude, whereas utility-neutral designs are more efficient for true parameter vectors close to the zero vector. As one generally conducts an experiment when one anticipates a number of important attribute levels, and thus a number of fairly large parameter values, Bayesian designs should clearly be preferred.

References

Burgess, L. and Street, D. J. (2003). Optimal designs for 2^k choice experiments, Communications in Statistics – Theory and Methods 32: 2185–2206.
Burgess, L. and Street, D. J. (2005). Optimal designs for choice experiments with asymmetric attributes, Journal of Statistical Planning and Inference 134: 288–301.
Cook, R. D. and Nachtsheim, C. J. (1980). A comparison of algorithms for constructing exact D-optimal designs, Technometrics 22: 315–324.
Fedorov, V. V. (1972). Theory of Optimal Experiments, New York: Academic Press.
Huber, J. and Zwerina, K. (1996). The importance of utility balance in efficient choice designs, Journal of Marketing Research 33: 307–317.
Kessels, R., Goos, P. and Vandebroek, M. (2006a). A comparison of criteria to design efficient choice experiments, Journal of Marketing Research 43: 409–419.
Kessels, R., Jones, B., Goos, P. and Vandebroek, M. (2006b). An efficient algorithm for constructing Bayesian optimal choice designs, working paper, Katholieke Universiteit Leuven, Belgium.
Kuhfeld, W. F. and Tobias, R. D. (2005). Large factorial designs for product engineering and marketing research applications, Technometrics 47: 132–141.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior, in Frontiers in Econometrics, Zarembka, P., ed. New York: Academic Press, 105–142.
Meyer, R. K. and Nachtsheim, C. J. (1995). The coordinate-exchange algorithm for constructing exact optimal experimental designs, Technometrics 37: 60–69.
Sándor, Z. and Wedel, M. (2001). Designing conjoint choice experiments using managers’ prior beliefs, Journal of Marketing Research 38: 430–444.
Sándor, Z. and Wedel, M. (2005). Heterogeneous conjoint choice designs, Journal of Marketing Research 42: 210–218.
Street, D. J., Bunch, D. S. and Moore, B. J. (2001). Optimal designs for 2^k paired comparison experiments, Communications in Statistics – Theory and Methods 30: 2149–2171.
Street, D. J. and Burgess, L. (2004). Optimal and near-optimal pairs for the estimation of effects in 2-level choice experiments, Journal of Statistical Planning and Inference 118: 185–199.
Street, D. J., Burgess, L. and Louviere, J. J. (2005). Quick and easy choice sets: constructing optimal and nearly optimal stated choice experiments, International Journal of Research in Marketing 22: 459–470.
