Conjoint Analysis for Evaluating Parameterized Gamut Mapping Algorithms

758 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010


Peter Zolliker, Member, IEEE, Zofia Baranczuk, Iris Sprow, and Joachim Giesen, Member, IEEE

Abstract: We show that conjoint analysis, a popular multi-attribute preference assessment technique used in market research, is a well-suited tool to evaluate a multitude of gamut mapping algorithms simultaneously. Our analysis is based on data from psycho-visual tests conducted in a laboratory and in a web environment. Conjoint analysis allows us to quantify the contribution of every single parameter value to the perceived value of the algorithm; it also allows us to test the influence of additional parameters like gamut size or color shifts. We show that conjoint analysis can be individualized to images or observers if enough data is available. Especially promising in this respect is the combination of individual and population data.

Index Terms: Conjoint analysis, gamut mapping, psychometrics, web-based testing.

I. INTRODUCTION

The rendering of a color image to device limitations, also called gamut mapping, is fundamental for digital color reproduction. Despite being a classical topic (for an overview, see Morovic [1]), gamut mapping is still an active area of research. In recent years, research on gamut mapping algorithms (GMAs) has focused on image dependence [2], [3] and spatial mapping algorithms [4]–[7]. A very important part of the development of GMAs is their evaluation. Here human perception is the ultimate judge that determines which of the competing algorithms is the most effective. Psychometric scaling is a common method to measure perceived image quality and image differences [8]. An overview of the underlying psychophysics is given by Falmagne [9]. The quality of gamut mapping algorithms is typically measured with psycho-visual tests that involve paired comparisons. In a paired comparison, an observer is shown an original image and two images obtained from different mapping algorithms. The observer has to identify the mapped image perceived to represent the original better. In order to improve the quality and comparability among studies, the technical committee of the CIE published guidelines [10] on how to conduct psycho-visual tests assessing the quality of GMAs. In recent studies, these guidelines are commonly used as a basis for psycho-visual studies on gamut mapping algorithms [4], [5], [11], [12].

Manuscript received May 20, 2009; revised September 24, 2009. First published December 22, 2009; current version published February 18, 2010. This work was supported by the Hasler Stiftung Proj. # 2200. The associate editor coordinating the review of this manuscript and approving it for publication was Mr. Vishal Monga.

P. Zolliker, Z. Baranczuk, and I. Sprow are with the Media Technology Laboratory, Swiss Federal Laboratories for Materials Science and Testing, Überlandstrasse 129, CH-8600 Dübendorf, Switzerland (e-mail: [email protected]; [email protected]; [email protected]).

J. Giesen is with the Friedrich-Schiller-University, Ernst-Abbe-Platz 2, D-07743 Jena, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2009.2038833

We use psycho-visual tests not only to compare a few final GMAs but also in the development stage of mapping algorithms. Our approach builds on the insight that gamut mapping can be seen as a highly parameterized problem. There are many, sometimes competing, parameters relevant for gamut mapping: first of all, the preservation of hue, lightness, and saturation, but also the preservation of spatial image information such as local contrast and smooth gradients. Also, in the realization of GMAs, we have a choice of working color space, mapping direction, compression type (clipping, linear, or nonlinear compression), and source gamut description. We use psycho-visual tests (paired comparisons) to determine an optimized parameter setting. The data elicitation phase of our test is the same as in traditional psycho-visual tests conducted to compare different gamut mapping algorithms. In particular, the number of paired comparisons per observer is not larger, and the number of observers need not be significantly larger, although the potential number of mapping algorithms that can be compared is much larger. The difference from traditional psycho-visual tests comparing GMAs lies in the way we analyze the elicited data. We use conjoint analysis, which essentially fits a linear model [13] to the data by assigning a part-worth value to each parameter level. The quality value of a parameter setting (besides the algorithm’s parameters, this can include additional parameters like gamut size) is then the sum of the part-worth values of the parameter levels used. The number of potential parameter settings that can be compared using conjoint analysis is determined by the number of levels tested for each parameter, i.e., it is the product of these numbers, which can be quite large. In addition to quality values of a parameter setting, we are also interested in extending and testing the underlying model, including parameter interdependencies, choice models, and the influence of individual images and observers.

We should point out that we are not the first to systematically include observer experiments in the development of GMAs; see, for example, the work done by Kang et al. [14]. Multivariate analysis techniques have also been used in image processing to gauge the importance of parameters; see, for example, the book by Keelan [15], where the scaling between different parameters is ensured by using “Just Noticeable Differences” (JNDs).

This paper is organized as follows. In the next section, “Methodology”, we review and further develop a conjoint analysis technique used in [13] and described in [16] by extending Thurstone’s Law of Comparative Judgment, which is often used to analyze paired comparison data, to the multi-attribute case. In the section “Experiments”, the tested algorithms and the psycho-visual experiments are described. In the section “Results and Discussion”, we present and discuss the user studies with respect to the applicability of conjoint analysis in image quality studies. Special emphasis is put on the potential of including individual results from observers and images. Preliminary results of this study were published in two recent conference papers [17], [18].

1057-7149/$26.00 © 2010 IEEE

II. METHODOLOGY

A. Discrete Choice Models

Given a set $S$ of $n$ stimuli, e.g., GMAs, and choice data of the form $a \succ b$ ($a$ is preferred over $b$) with $a, b \in S$, we know the frequency $F_{ab}$ ($F$-matrix) of stimulus $a$ being preferred over stimulus $b$ (the number of times $a$ is preferred over $b$). We consider the proportion ($P$-matrix)

$$P_{ab} = \frac{F_{ab} + c}{F_{ab} + F_{ba} + 2c}$$

of stimulus $a$ being preferred over stimulus $b$ as an indirect measure for the distance of the “qualities” of $a$ and of $b$, respectively.$^1$

Discrete choice models build on the assumption that the observed choices are outcomes of random trials: confronted with the two options $a, b \in S$, an observer assigns quality values

$$u_a = v_a + \epsilon_a \quad \text{and} \quad u_b = v_b + \epsilon_b,$$

respectively, to the stimuli, where the error terms $\epsilon_a$ and $\epsilon_b$ are drawn independently from the same distribution. The observer then prefers the stimulus with the larger quality value. Hence, the probability $p_{ab}$ that $a$ is preferred over $b$ is given as

$$p_{ab} = \Pr[u_a > u_b] = \Pr[\epsilon_b - \epsilon_a < v_a - v_b].$$

Here, we discuss discrete choice models that differ in the choice of distribution for $\epsilon_a$. We consider normal distributions (Thurstone’s (probit) model [19]) and Gumbel distributions (Bradley-Terry’s (logit) model [20]). In both cases, the qualities $v_a$ can be computed by a least squares approach from the probability $p_{ab}$, which can be estimated by the proportion $P_{ab}$.
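As a minimal sketch of this first step, the following Python snippet builds a proportion matrix from a frequency matrix. The function name `proportion_matrix`, the example counts, and the value `c = 0.5` are illustrative assumptions, as is the symmetric additive form of the bias correction, $P_{ab} = (F_{ab} + c)/(F_{ab} + F_{ba} + 2c)$.

```python
def proportion_matrix(F, c=0.5):
    """Bias-corrected proportions P[a][b] of stimulus a being preferred over b.

    Assumed form: P[a][b] = (F[a][b] + c) / (F[a][b] + F[b][a] + 2c).
    The diagonal is set to 0.5 by convention.
    """
    n = len(F)
    P = [[0.5] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                P[a][b] = (F[a][b] + c) / (F[a][b] + F[b][a] + 2 * c)
    return P


# Made-up frequency matrix: F[a][b] = number of times a was preferred over b.
F = [[0, 8, 10],
     [2, 0, 7],
     [0, 3, 0]]
P = proportion_matrix(F)
```

With such a correction, a pair with a zero count still yields a proportion strictly between 0 and 1, so the inverse distribution functions applied later stay finite.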

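1) Thurstone’s Model: In Thurstone’s model [19], the error terms $\epsilon_a$ are drawn from a normal distribution $N(0, \sigma^2)$ with expectation 0 and variance $\sigma^2$. Here we consider Thurstone’s case V model, where the variances for all stimuli are assumed to be equal. The difference $\epsilon_b - \epsilon_a$ is also normally distributed with expectation 0 and variance $2\sigma^2$ and, thus,

$$p_{ab} = \Pr[\epsilon_b - \epsilon_a < v_a - v_b] = \Phi\!\left(\frac{v_a - v_b}{\sqrt{2}\,\sigma}\right),$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$

This is equivalent to

$$v_a - v_b = \sqrt{2}\,\sigma\,\Phi^{-1}(p_{ab}). \tag{1}$$

Using the proportion $P_{ab}$ of $a$ being preferred over $b$, we set$^2$

$$D_{ab} = \sqrt{2}\,\sigma\,\Phi^{-1}(P_{ab}).$$

$^1$ We introduced the bias correction $c$ in order to eliminate numerical problems for pairs of items that have zero entries in the frequency matrix. In the data analysis of this paper we used $c =$ […]. For a discussion of different bias correction formulas, see Engeldrum [8].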
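The probit transformation of Thurstone's case V model can be sketched in a few lines of Python, assuming the relation $D_{ab} = \sqrt{2}\,\sigma\,\Phi^{-1}(P_{ab})$. The function name `thurstone_differences` and the example proportions are made up for illustration; `statistics.NormalDist` from the standard library supplies the inverse normal CDF.

```python
from math import sqrt
from statistics import NormalDist


def thurstone_differences(P, sigma=1.0):
    """Return D[a][b] = sqrt(2) * sigma * Phi^{-1}(P[a][b]); the diagonal is 0."""
    inv_cdf = NormalDist().inv_cdf  # inverse CDF of the standard normal
    n = len(P)
    return [[0.0 if a == b else sqrt(2) * sigma * inv_cdf(P[a][b])
             for b in range(n)] for a in range(n)]


# Made-up proportions with P[a][b] + P[b][a] = 1.
P = [[0.50, 0.75, 0.90],
     [0.25, 0.50, 0.70],
     [0.10, 0.30, 0.50]]
D = thurstone_differences(P)
```

Note that `inv_cdf` requires proportions strictly between 0 and 1, which is one reason a bias correction of the raw frequencies is useful.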

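2) Bradley-Terry’s Model: In Bradley-Terry’s model [20], the error terms $\epsilon_a$ are drawn from a standard Gumbel distribution, i.e., the distribution with location parameter $\mu = 0$ and scale parameter $\beta = 1$. Since the difference of two independent Gumbel distributed random variables is logistically distributed, we have

$$p_{ab} = \frac{1}{1 + e^{-(v_a - v_b)}} = \frac{e^{v_a}}{e^{v_a} + e^{v_b}}.$$

This implies

$$\frac{p_{ab}}{1 - p_{ab}} = e^{v_a - v_b},$$

which is equivalent to

$$v_a - v_b = \ln\frac{p_{ab}}{1 - p_{ab}}. \tag{2}$$

As we did for Thurstone’s model, we set

$$D_{ab} = \ln\frac{P_{ab}}{1 - P_{ab}}.$$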
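For comparison, a minimal sketch of the logit transformation of the Bradley-Terry model, assuming $D_{ab} = \ln\bigl(P_{ab}/(1 - P_{ab})\bigr)$. The function name `bradley_terry_differences` and the example proportions are illustrative assumptions.

```python
from math import log


def bradley_terry_differences(P):
    """Return D[a][b] = ln(P[a][b] / (1 - P[a][b])); the diagonal is 0."""
    n = len(P)
    return [[0.0 if a == b else log(P[a][b] / (1.0 - P[a][b]))
             for b in range(n)] for a in range(n)]


# Made-up proportions with P[a][b] + P[b][a] = 1.
P = [[0.50, 0.75, 0.90],
     [0.25, 0.50, 0.70],
     [0.10, 0.30, 0.50]]
D = bradley_terry_differences(P)
```

Because $P_{ba} = 1 - P_{ab}$, the resulting matrix is antisymmetric, just as in the probit case.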

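3) Computing Quality Values Using Least Squares: From both Thurstone’s and Bradley-Terry’s model, we get an estimate $D_{ab}$ for the difference of the quality values $v_a$ and $v_b$. We estimate the $v_a$’s as the best approximation for the $D_{ab}$’s (all equally weighted) in a least squares sense. Let $\hat{D}_{ab}$ be the approximation

$$\hat{D}_{ab} = v_a - v_b.$$

We want to minimize the residual

$$r(v) = \sum_{a, b \in S} \left(D_{ab} - \hat{D}_{ab}\right)^2 = \sum_{a, b \in S} \left(D_{ab} - (v_a - v_b)\right)^2.$$

A necessary condition for the minimum of the residual is that all partial derivatives vanish, which gives (using the antisymmetry $D_{ab} = -D_{ba}$)

$$\frac{\partial r}{\partial v_a} = -4 \sum_{b \in S} \left(D_{ab} - (v_a - v_b)\right) = 0.$$

Hence

$$n\, v_a = \sum_{b \in S} D_{ab} + \sum_{b \in S} v_b,$$

where $n = |S|$ is the number of stimuli. If we normalize by setting

$$\sum_{b \in S} v_b = 0,$$

then the values that minimize the residual are given by

$$v_a = \frac{1}{n} \sum_{b \in S} D_{ab}.$$

$^2$ Note that, for both models (Thurstone and Bradley-Terry), the $D$-matrix is antisymmetric, thus $D_{ab} = -D_{ba}$.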
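Under the zero-sum normalization, the least squares solution for a complete, antisymmetric difference matrix reduces to row means. The short Python sketch below illustrates this; the function name `quality_values` and the example matrix (built from the assumed true values 1, 0, -1) are hypothetical.

```python
def quality_values(D):
    """Least squares quality values for a complete antisymmetric matrix D.

    With the normalization sum(v) = 0, v[a] is the mean of row a of D.
    """
    n = len(D)
    return [sum(row) / n for row in D]


# D[a][b] = v[a] - v[b] for the assumed true values v = (1, 0, -1).
D = [[0.0, 1.0, 2.0],
     [-1.0, 0.0, 1.0],
     [-2.0, -1.0, 0.0]]
v = quality_values(D)
```

For noise-free differences the row means recover the underlying values exactly; with noisy data they are the best equally weighted fit.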

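4) Extension for Incomplete Frequency Matrices: In the following, we give a probabilistic interpretation of the least squares approach and extend it to incomplete frequency matrices. We assume that the entries $D_{ab}$ of the $D$-matrix are noisy measurements of the true differences $v_a - v_b$. If we assume that the noise term $\epsilon$ is independent of the pair $(a, b)$, then we have

$$D = Xv + \epsilon,$$

where $D$ is the vector of the observed $D_{ab}$, $v = (v_a)_{a \in S}$ is the vector of quality values, and $X$ is the matrix (with rows indexed by the compared pairs $(a, b)$ and columns indexed by the elements in $S$) that has the entries

$$X_{(a,b),s} = \begin{cases} 1 & \text{if } s = a, \\ -1 & \text{if } s = b, \\ 0 & \text{otherwise.} \end{cases}$$

If we further assume that $\epsilon$ is normally distributed with mean 0 and variance $\sigma_\epsilon^2$, then the likelihood function for the quality values $v$ has the form

$$L(v) \propto \prod_{(a,b)} \exp\!\left(-\frac{\left(D_{ab} - (v_a - v_b)\right)^2}{2\sigma_\epsilon^2}\right).$$

This function (or the logarithm of it) is maximized at the $v$ that maximizes

$$-\sum_{(a,b)} \left(D_{ab} - (v_a - v_b)\right)^2,$$

i.e., at the $v$ that minimizes the residual function of the last subsection, under the additional constraint $\sum_{a \in S} v_a = 0$.

The assumption that the variance of the error term $\epsilon$ is independent of the pair $(a, b)$ is generally not true, in particular if not all pairs were compared equally often. We can actually estimate this variance $\sigma_{D_{ab}}^2$ for every pair $(a, b)$ using error analysis (see the next subsection). We can normalize the least squares approach to accommodate the difference in variance by multiplying with the following diagonal matrix:

$$W = \operatorname{diag}\!\left(\frac{1}{\sigma_{D_{ab}}}\right).$$

That is, we assume

$$WD = WXv + \epsilon',$$

where $\epsilon'$ is normally distributed with mean 0 and variance $\sigma'^2$ (independent of the pair $(a, b)$). The maximum (log-)likelihood solution is then given as

$$\hat{v} = \arg\min_{v} \left\| WD - WXv \right\|^2.$$

In order to enforce our constraint $\sum_{a \in S} v_a = 0$, we add an equation with this constraint. We define $X'$, $D'$, and $W'$ as the following:

$$X' = \begin{pmatrix} X \\ 1 \;\cdots\; 1 \end{pmatrix}, \qquad D' = \begin{pmatrix} D \\ 0 \end{pmatrix},$$

and

$$W' = \begin{pmatrix} W & 0 \\ 0 & 1/w \end{pmatrix},$$

where $w$ defines the weight of that equation and can be interpreted as the uncertainty of the constraint. The final equation to compute $v$ is

$$v = \left(X'^T W'^T W' X'\right)^{-1} X'^T W'^T W' D'.$$

Let us notice that the solution for $v$ is independent of the choice of $w$. The strength of this linear regression method is that it is also applicable to incomplete paired comparisons. If unit weights are used in $W$, the solution corresponds to the incomplete matrix solution to Thurstone case V, published by Morrissey [21], as well as Gulliksen [22].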
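The regression for incomplete comparisons can be sketched without external libraries by forming the normal equations directly. The sketch below assumes that each compared pair contributes one row with weight $1/\sigma_{ab}$ and that the zero-sum constraint is added as one extra row $(1, \dots, 1)$ with weight $1/w$; the function names `solve_linear` and `weighted_quality_values` and the example data are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def weighted_quality_values(pairs, n, w=1.0):
    """Weighted least squares on the compared pairs only.

    pairs: list of (a, b, D_ab, sigma_ab); n: number of stimuli;
    w: assumed uncertainty of the constraint sum(v) = 0 (row weight 1/w).
    """
    # Normal equations; start with the contribution of the constraint row.
    A = [[1.0 / w ** 2] * n for _ in range(n)]
    rhs = [0.0] * n  # the constraint target is 0
    for a, b, d, s in pairs:
        g = 1.0 / s ** 2  # squared weight of this pair
        A[a][a] += g
        A[b][b] += g
        A[a][b] -= g
        A[b][a] -= g
        rhs[a] += g * d
        rhs[b] -= g * d
    return solve_linear(A, rhs)


# Stimuli 0 and 2 were never compared directly (incomplete design).
pairs = [(0, 1, 1.0, 0.5), (1, 2, 1.0, 0.5)]
v = weighted_quality_values(pairs, n=3)
```

In this sketch the result does not change when `w` is varied, because the constraint only fixes the otherwise free additive offset of the quality values. The comparison graph must be connected for the system to be solvable.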

5) Error Analysis: Reliable error analysis of the Thurstone case is not trivial, as discussed, for example, by Montag [23]. We performed error analysis in two ways: analytically, using error propagation, and by statistical error sampling.

6) Analytic Error Analysis:

Error of Proportion Matrix Elements $P_{ab}$: We treat every comparison of stimulus $a$ and stimulus $b$ as an independent Bernoulli trial with success probability $p_{ab}$. Here preferring $a$ over $b$ is considered success. We estimate $p_{ab}$ by $P_{ab}$. Under our assumption, $P_{ab}$ converges to $p_{ab}$ when the number of trials goes to infinity. For a finite number of trials, the standard deviation of the proportion estimated from the Bernoulli trials is

$$\sigma_{P_{ab}} = \sqrt{\frac{P_{ab}\,(1 - P_{ab})}{N_{ab}}},$$

where $N_{ab}$ is the number of comparisons between stimuli $a$ and $b$.

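Error of Scale Values $D_{ab}$: To compute the errors of the quality values, we use error propagation. Errors for the entries of the $D$-matrix are computed using the formula for $D_{ab}$ employed in the respective model (Thurstone or Bradley-Terry). For the Thurstone case, this yields

$$\sigma_{D_{ab}} = \frac{\sqrt{2}\,\sigma\,\sigma_{P_{ab}}}{\varphi\!\left(\Phi^{-1}(P_{ab})\right)}, \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

and for Bradley-Terry

$$\sigma_{D_{ab}} = \frac{\sigma_{P_{ab}}}{P_{ab}\,(1 - P_{ab})},$$

where $\sigma_{P_{ab}}$ is the standard deviation of the proportion $P_{ab}$.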

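Error of Quality Values $v_a$: If we have an equally filled proportion matrix, the errors of the quality values $v_a$ are computed using the formula

$$\sigma_{v_a} = \frac{1}{n} \sqrt{\sum_{b \in S} \sigma_{D_{ab}}^2}.$$

In the case of linear regression for incomplete $D$-matrices, we can estimate the error for the scale values using the diagonal elements of the matrix

$$C = \sigma'^2 \left(X'^T W'^T W' X'\right)^{-1},$$

where $X'$ and $W'$ are the design and weight matrices of the regression, extended by the constraint equation with weight $1/w$. Here $\sigma'^2$ is the variance of the weighted noise terms, which are all equal to one because all equations were weighted with their estimated error. The estimated errors for the individual scale values are

$$\sigma_{v_a} = \sqrt{C_{aa} - \frac{w^2}{n^2}}.$$

The second term compensates for the assumed uncertainty $w$ of the constraint. Note that the estimate for $\sigma_{v_a}$ is independent of the choice of $w$. If $w$ is chosen small enough, the second term can be ignored.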
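The first two propagation steps can be illustrated with a short Python sketch. It assumes the binomial standard error for a proportion and first-order error propagation through the probit and logit transformations; the function names are hypothetical, and `sigma` is the Thurstone scale parameter (set to 1 by default).

```python
from math import sqrt
from statistics import NormalDist


def sigma_P(p, n_ab):
    """Standard error of a proportion estimated from n_ab Bernoulli trials."""
    return sqrt(p * (1.0 - p) / n_ab)


def sigma_D_thurstone(p, n_ab, sigma=1.0):
    """First-order error of sqrt(2) * sigma * Phi^{-1}(p)."""
    nd = NormalDist()
    density = nd.pdf(nd.inv_cdf(p))  # derivative of Phi^{-1} is 1 / density
    return sqrt(2) * sigma * sigma_P(p, n_ab) / density


def sigma_D_bradley_terry(p, n_ab):
    """First-order error of ln(p / (1 - p)); its derivative is 1 / (p (1 - p))."""
    return sigma_P(p, n_ab) / (p * (1.0 - p))


# Example: 100 comparisons of a pair with an observed proportion of 0.5.
err_p = sigma_P(0.5, 100)
err_probit = sigma_D_thurstone(0.5, 100)
err_logit = sigma_D_bradley_terry(0.5, 100)
```

Both propagated errors grow quickly as the proportion approaches 0 or 1, which is where the transformations are steepest.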

7) Sampling Error Analysis: The theoretical error estimations rely on several model assumptions, in particular the independence of preference choices. For comparison, we also use a sampling error analysis by dividing the choice data randomly into two groups. For both groups, we computed the score values individually and estimated errors from the differences of the values obtained from both groups. This process was repeated several times, and the results were averaged to increase the accuracy of the error estimation. Furthermore, we tested the dependency of the error estimation on individual observers or images. To test observer dependencies, we used a biased sampling strategy, dividing the observers into two random groups and attributing the choice data of an observer to the corresponding group. Similarly, the image dependency was tested by dividing the images into two random groups and attributing all choice data on an image to the corresponding group.
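A split-half procedure of this kind might look as follows in Python. This is only a sketch under stated assumptions: the toy score `win_rates` stands in for the actual quality value computation, and the conversion from half-sample differences to an error estimate (the factor 1/2) assumes independent choices; neither is taken from the paper.

```python
import random


def win_rates(choices, n):
    """Toy score (an assumption): fraction of comparisons won by each stimulus."""
    wins, total = [0] * n, [0] * n
    for winner, loser in choices:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return [wins[a] / total[a] if total[a] else 0.5 for a in range(n)]


def split_half_errors(choices, n, score=win_rates, repeats=50, seed=0):
    """Estimate score errors from repeated random splits of the choice data."""
    rng = random.Random(seed)
    sum_sq = [0.0] * n
    for _ in range(repeats):
        shuffled = choices[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        v1 = score(shuffled[:half], n)
        v2 = score(shuffled[half:], n)
        for a in range(n):
            sum_sq[a] += (v1[a] - v2[a]) ** 2
    # Each half holds half the data, so Var(v1 - v2) is roughly four times
    # the variance of a full-data score; hence the factor 1/2.
    return [0.5 * (s / repeats) ** 0.5 for s in sum_sq]


# Made-up choice data as (winner, loser) pairs.
choices = [(0, 1)] * 30 + [(1, 0)] * 10 + [(1, 2)] * 25 + [(2, 1)] * 15
errors = split_half_errors(choices, n=3)
```

To probe observer or image dependencies, the same routine could be applied after grouping the choices by observer or image and splitting the groups instead of the individual choices.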

B. Conjoint Analysis

We now assume that the set $S$ of stimuli has a structure, namely we assume it is a parameterized domain. We call a domain parameterized if it is given as a Cartesian product

$$S = A_1 \times \dots \times A_m$$

of parameter sets $A_1, \dots, A_m$. Each element of $S$ is a vector $(a_1, \dots, a_m)$, where $a_i \in A_i$. The elements $a_i \in A_i$ are called parameter levels. One goal of conjoint analysis is to determine how much each parameter level contributed to the observed outcome of a (preferential) choice measurement; this contribution is called the part-worth value of the parameter level. As for the discrete choice models, we assume that we are given a set of choice data on the stimulus set $S$. We want to estimate the part-worth values of all the parameter levels from the choice data.

In [13], the least squares approach described for discrete choice models is extended to the multiparameter (conjoint) case. The extension entails applying the least squares method for each parameter using Thurstone’s model, which provides an initial set of part-worth values. Rescaling these values makes the scales of the different parameters comparable. The overall value of a stimulus in $S$ (in our case an incarnation of our master gamut mapping algorithm) is then obtained by summing up the rescaled part-worth values of the parameter levels present in the stimulus. In the following, we provide more details on the rescaling approach.

1) Rescaling for Thurstone’s Model: Note that Thurstone’s model has a free parameter $\sigma$ that we always (without loss of generality) set to 1. Now let $v_{i1}, \dots, v_{in_i}$ be the quality values computed using Thurstone’s model for every parameter $A_i$ with levels $a_{i1}, \dots, a_{in_i}$. To get the quality value of a stimulus in $S$, we sum up all the quality values (part-worths) of the parameter levels present in the stimulus, i.e., we assume a linear model. For the linear model to be meaningful, however, the quality values for different parameters need to be on comparable scales. To make the scales comparable, we normalize them by the following procedure. For any parameter $A_i$, the error terms of the quality values $v_{ij}$ are normally distributed with variance $\sigma_i^2$ and expected value 0, and the quality values themselves are drawn from another normal distribution with expected value 0 and variance $\tau_i^2$. Hence, observed quality values for the levels of parameter $A_i$ are drawn from a normal distribution with variance $\sigma_i^2 + \tau_i^2$ and expected value 0 (as the convolution of two normal distributions with expected value 0 and variances $\sigma_i^2$ and $\tau_i^2$, respectively). The value $\sigma_i$ is the same for all levels $j$ and will be chosen such that the quality values for different parameters are comparable. Since we compute quality values from choices on the stimulus level, by our assumption the stimulus quality values are all drawn from the same normal distribution, i.e., all the $\sigma_i^2 + \tau_i^2$ are equal. Hence, the value $\sigma_i^2 + \tau_i^2$ is independent of $i$. Now we want to find a rescale value $s_i$ such that $s_i^2\,(\sigma_i^2 + \tau_i^2) = \text{const}$, where the constant does not depend on $i$. Without loss of generality, we can set the constant to 1, which yields $s_i = 1/\sqrt{\sigma_i^2 + \tau_i^2}$ for $i = 1, \dots, m$, or $s_i = 1/\sqrt{1 + \tau_i^2}$ if we assume $\sigma_i = 1$ (for the computation on the parameter level we have to fix the value of $\sigma_i$ anyway, so we can set it to 1 without loss of generality). In the following, we fix the parameter $A_i$ and drop it from the index. Using an estimator for the variance, we can estimate $\tau^2$ from the quality values computed from Thurstone’s model scaled by […], where $v_j$ is the quality value of level $j$, as follows:$^3$

3Note that � �� , where � �� is normally distributedwith zero mean and variance �� , and also note that we compute the � setting� � � and normalizing by �� .


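where $F_{jk}$ is the frequency with which the $j$th level is preferred over the $k$th level. Plugging the resulting estimate for the variance into the expression for the rescale value, we get […].

Now for the fixed parameter $A_i$, we rescale the quality values $v_{ij}$ by the value of $s_i$ as defined above. The normalized quality values of the parameter levels are our part-worths that we assume to contribute linearly to the quality of a stimulus, i.e., the quality value of a stimulus $a = (a_1, \dots, a_m)$ is

$$v(a) = \sum_{i=1}^{m} s_i\, v_i(a_i),$$

where $v_i(a_i)$ is the quality value computed for level $a_i$ of parameter $A_i$ and $s_i$ is the rescale value of that parameter. This is the sum of the part-worths of the parameter levels present in the stimulus.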
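The additive model can be illustrated with a small Python sketch that sums part-worths and ranks every element of the Cartesian product of levels. All parameter names, level names, and part-worth values below are made up for illustration and are not results from the studies.

```python
from itertools import product

# Hypothetical rescaled part-worths; names and numbers are made up.
part_worths = {
    "gamut_size": {"S1": -0.50, "S2": 0.00, "S3": 0.50},
    "compression": {"C1": -0.25, "C2": 0.25},
    "details": {"D0": -0.125, "D1": 0.125},
}


def stimulus_quality(stimulus, part_worths):
    """Sum the part-worths of the levels present in the stimulus (linear model)."""
    return sum(part_worths[param][level] for param, level in stimulus.items())


# Rank every combination of levels, i.e., every element of the Cartesian product.
params = list(part_worths)
ranking = sorted(
    (dict(zip(params, levels))
     for levels in product(*(part_worths[p] for p in params))),
    key=lambda s: stimulus_quality(s, part_worths),
    reverse=True,
)
best = ranking[0]
```

Only the part-worths (here 7 numbers) have to be estimated from the choice data, while the number of stimuli that can be ranked (here 12) grows as the product of the level counts.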

C. Testing the Model

1) Mosteller’s Test: Mosteller’s test is used to test the assumption on the parameter level that the quality values are uncorrelated, equally distributed variables (either normally distributed in the case of Thurstone’s model, or Gumbel distributed in the case of Bradley-Terry’s model). A description of Mosteller’s test can be found in [8] or in the original article [24]. Here we only summarize it briefly. The goal of Mosteller’s test is to compare the computed quality values $v_a$, or more precisely their differences $v_a - v_b$, to the observed proportions $P_{ab}$. We use the respective distribution function, i.e., either (1) in the case of Thurstone’s model or (2) in the case of Bradley-Terry’s model, to compute probabilities $p'_{ab}$ from the differences $v_a - v_b$. Then we transform both $P_{ab}$ and $p'_{ab}$ into angles $\theta_{ab}$ and $\theta'_{ab}$, respectively, using the arcsine transformation given by

$$\theta_{ab} = \arcsin(2P_{ab} - 1)$$

and

$$\theta'_{ab} = \arcsin(2p'_{ab} - 1).$$

The arcsine transformation converts binomially distributed frequencies into asymptotically normally distributed variables with variance $1/N_{ab}$, where $N_{ab}$ is the number of choices between stimulus $a$ and stimulus $b$. Our hypothesis is that $\theta_{ab}$ is normally distributed with expectation $\theta'_{ab}$ and variance $1/N_{ab}$ for all $a, b$. As a test statistic, we use

$$\chi^2 = \sum_{a < b} N_{ab}\, \left(\theta_{ab} - \theta'_{ab}\right)^2.$$

If the hypothesis is true, then the test statistic is approximately $\chi^2$-distributed with $(n-1)(n-2)/2$ degrees of freedom, where $n$ is the number of items compared. Thus, at level $\alpha$, we have to compare our test statistic to the $(1-\alpha)$-quantile of the $\chi^2$-distribution with $(n-1)(n-2)/2$ degrees of freedom.
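The test statistic can be sketched in Python as follows. The sketch assumes Thurstone's case V model, a complete comparison matrix, and the arcsine transformation in the form $\theta = \arcsin(2P - 1)$, whose asymptotic variance is $1/N$; other forms of the transformation exist and change the variance factor. The function name and all example numbers are hypothetical.

```python
from math import asin
from statistics import NormalDist


def mosteller_statistic(P, N, v, sigma=1.0):
    """Chi-square statistic and degrees of freedom for a complete matrix."""
    cdf = NormalDist().cdf
    n = len(v)
    chi2 = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            # Model probability from the fitted difference (Thurstone case V).
            p_model = cdf((v[a] - v[b]) / (2 ** 0.5 * sigma))
            theta = asin(2.0 * P[a][b] - 1.0)
            theta_model = asin(2.0 * p_model - 1.0)
            chi2 += N[a][b] * (theta - theta_model) ** 2  # variance is 1 / N_ab
    dof = (n - 1) * (n - 2) // 2
    return chi2, dof


# Made-up fitted values, comparison counts, and observed proportions.
v = [0.5, 0.0, -0.5]
N = [[0, 40, 40], [40, 0, 40], [40, 40, 0]]
P = [[0.50, 0.70, 0.80],
     [0.30, 0.50, 0.65],
     [0.20, 0.35, 0.50]]
chi2, dof = mosteller_statistic(P, N, v)
```

The returned statistic would then be compared with the chosen quantile of the chi-square distribution with `dof` degrees of freedom, for example from a statistical table.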

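2) Equivalence of Data Sets: Note that a $\chi^2$ test similar to the one used in Mosteller’s test can be used to test for significant differences between frequency matrices for different viewing scenarios. Let $P^{(1)}$ and $P^{(2)}$ be proportion matrices on different data sets obtained from the same population of observers. As data sets with different numbers of judgments are compared, their variances have to be adjusted. The difference of independent normal distributions with variances $1/N^{(1)}_{ab}$ and $1/N^{(2)}_{ab}$, respectively, is a normal distribution with variance $1/N^{(1)}_{ab} + 1/N^{(2)}_{ab}$. Our hypothesis is that both distributions have the same mean. As the test statistic, we use

$$\chi^2 = \sum_{a < b} \frac{\left(\theta^{(1)}_{ab} - \theta^{(2)}_{ab}\right)^2}{1/N^{(1)}_{ab} + 1/N^{(2)}_{ab}},$$

where

$$\theta^{(1)}_{ab} = \arcsin\!\left(2P^{(1)}_{ab} - 1\right)$$

and

$$\theta^{(2)}_{ab} = \arcsin\!\left(2P^{(2)}_{ab} - 1\right)$$

are the arcsine transformations, and $N^{(1)}_{ab}$ and $N^{(2)}_{ab}$ are the numbers of judgments for the pair $(a, b)$ in the two data sets. It should be noted that the entries of all proportion matrices are considered together. The number of degrees of freedom is now the number of elements in the sum.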
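A comparison of two data sets along these lines could be sketched as follows, again assuming the arcsine transformation $\theta = \arcsin(2P - 1)$ with variance $1/N$ and summing over each pair once. The function name and the tiny two-stimulus example are made up.

```python
from math import asin


def equivalence_statistic(P1, N1, P2, N2):
    """Chi-square statistic and degrees of freedom for two sets of proportions."""
    n = len(P1)
    chi2, dof = 0.0, 0
    for a in range(n):
        for b in range(a + 1, n):
            if N1[a][b] == 0 or N2[a][b] == 0:
                continue  # pair not judged in both data sets
            t1 = asin(2.0 * P1[a][b] - 1.0)
            t2 = asin(2.0 * P2[a][b] - 1.0)
            chi2 += (t1 - t2) ** 2 / (1.0 / N1[a][b] + 1.0 / N2[a][b])
            dof += 1  # one degree of freedom per element of the sum
    return chi2, dof


# Made-up proportions and judgment counts for two viewing scenarios.
P_lab = [[0.5, 0.50], [0.50, 0.5]]
P_web = [[0.5, 0.75], [0.25, 0.5]]
N_lab = [[0, 2], [2, 0]]
N_web = [[0, 2], [2, 0]]
chi2, dof = equivalence_statistic(P_lab, N_lab, P_web, N_web)
```

Pairs that were not judged in both data sets are skipped, so the degrees of freedom count only the terms that actually enter the sum.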

3) Linearity Assumption in Conjoint Analysis: Here we describe how to test the linearity assumption that we make for conjoint analysis. Let $A_i$ and $A_j$ be two parameters, let $A_{ij} = A_i \times A_j$ be the parameter that results from combining $A_i$ and $A_j$, and let the pairs $(k, l)$ be its levels. We compute quality values for the levels of $A_{ij}$ in two different ways. First, for every level $(k, l)$ with $k \in A_i$ and $l \in A_j$, we add up the comparable quality values for $k$ and $l$ that we compute as described before. Let $v_{kl}$ be the resulting scale values. Second, we apply Thurstone’s method directly to the combined parameter $A_{ij}$ and make the resulting quality values comparable with the quality values of all levels of the parameters different from $A_i$ and $A_j$. This results in quality values $v'_{kl}$. If additivity holds, then we expect that $v_{kl} = v'_{kl}$. Thus, our hypothesis is that $v_{kl} = v'_{kl}$ for all $(k, l)$. As the test statistic, we use

$$\chi^2 = \sum_{k, l} \frac{\left(v_{kl} - v'_{kl}\right)^2}{\sigma_{kl}^2 + \sigma'^2_{kl}},$$

where $\sigma_{kl}$ and $\sigma'_{kl}$ are computed by error propagation from the errors of the observed frequencies. If the hypothesis is true, then the test statistic is approximately $\chi^2$-distributed with $\nu$ degrees of freedom. The hypothesis is rejected at a significance level of $\alpha$ if $\chi^2 > \chi^2_{1-\alpha,\nu}$, where $\chi^2_{1-\alpha,\nu}$ is the $(1-\alpha)$-quantile of the $\chi^2$-distribution with $\nu$ degrees of freedom.
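The additivity check compares two sets of quality values for the combined levels, weighting each squared difference by the combined error variance. A minimal Python sketch is given below; the function name, the level labels, and all numbers are hypothetical.

```python
def linearity_statistic(v_sum, v_joint, err_sum, err_joint):
    """Sum of squared differences, each divided by the combined error variance."""
    return sum(
        (v_sum[k] - v_joint[k]) ** 2 / (err_sum[k] ** 2 + err_joint[k] ** 2)
        for k in v_sum
    )


# Made-up values for the four levels of a combined two-by-two parameter.
levels = [("C1", "D0"), ("C1", "D1"), ("C2", "D0"), ("C2", "D1")]
v_sum = dict(zip(levels, [-0.50, 0.00, 0.00, 0.50]))    # added part-worths
v_joint = dict(zip(levels, [-0.25, 0.00, 0.00, 0.50]))  # fitted directly
err_sum = dict.fromkeys(levels, 0.3)
err_joint = dict.fromkeys(levels, 0.4)
chi2 = linearity_statistic(v_sum, v_joint, err_sum, err_joint)
```

A large value of the statistic relative to the chosen chi-square quantile would indicate an interaction between the two parameters that the additive model does not capture.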

4) Errors: Theoretical Versus Experimental: Another method to test the model assumptions is to compare theoretical and experimental errors.

D. Measuring Model Performance: Cross Validation

Cross validation is a method to compare different model assumptions. We partition our choice data into $k$ random subsamples of the same size. Out of the $k$ subsamples, only $k-1$ subsamples are used to compute the model, which is then tested on the remaining subsample. The quality of a model is assessed in terms of the percentage of correct predictions of observers’ choices on the subsample not used to learn the model. Every subsample is used for validating the model once, and the overall quality measure is the average percentage of correctly predicted observer choices (also called hit rate). In order to improve the validity of the results, the hit rate can also be averaged over several repetitions of the subsampling.
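The hit rate computation can be sketched in Python as shown below. The toy model `fit_scores` (wins minus losses per stimulus) is an assumption that stands in for the conjoint model, and the function names, the fold count, and the example data are hypothetical.

```python
import random


def fit_scores(choices, n):
    """Toy model (an assumption): wins minus losses per stimulus."""
    score = [0] * n
    for winner, loser in choices:
        score[winner] += 1
        score[loser] -= 1
    return score


def cross_validated_hit_rate(choices, n, k=5, seed=0):
    """Average fraction of held-out choices predicted correctly over k folds."""
    rng = random.Random(seed)
    data = choices[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k subsamples of (nearly) equal size
    rates = []
    for i, test in enumerate(folds):
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        score = fit_scores(train, n)
        hits = sum(1 for winner, loser in test if score[winner] > score[loser])
        rates.append(hits / len(test))
    return sum(rates) / k


# Made-up, perfectly consistent choice data as (winner, loser) pairs.
choices = [(0, 1)] * 20 + [(1, 2)] * 20 + [(0, 2)] * 20
hit_rate = cross_validated_hit_rate(choices, n=3)
```

Repeating the call with different seeds and averaging the results corresponds to repeating the subsampling several times.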


III. EXPERIMENTS

A. Gamut Mapping Algorithms

The goal is to compare and assess, within a single study, the importance of parameters known to be relevant for gamut mapping. The parameters include lightness, color, saturation, and details. In our study, we consider one master algorithm with free parameters. The master algorithm is quite simple: it maps any color point in the source gamut along a line segment connecting a focal point and the color point into the destination gamut. Additionally, we consider the influence of detail enhancement and working color space. Furthermore, we want to compare the influence of those gamut mapping parameters with typical color and lightness operations on an image and with parameters of the destination gamut. In the following, we present the parameters that we have studied. We always used sRGB as source gamut, i.e., we did not consider the source gamut as a parameter.

1) Compression: Describes how our master algorithm moves a color point along the line segment. We tested different strategies: linear compression, clipping, and sigmoidal compression algorithms. In order to parameterize the sigmoidal compression, we used a weighted average of linear and nonlinear compression [25]. The scale factor is computed as […] from the following quantities:
• the distance of the focal point to the color point that needs to be mapped;
• the distance of the focal point to the source gamut boundary;
• the distance of the focal point to the destination gamut boundary;
• the weighting factor in the range […].

2) Details: Reconstructing details can improve the quality of the mapped image substantially [4]–[6]. We used a detail enhancement procedure independent of the master algorithm, but we can interpret it as a parameter of the master algorithm in the sense that we can apply detail enhancement in varying degrees to the results obtained from the master algorithm. We use the detail enhancement method described in [4] with different weighting factors. The other parameters were kept at default values.$^4$

3) Color Space: Note that our master algorithm can be applied in many color spaces; this parameter describes the choice of working color space. In our study, we used either IPT [26] or CIELAB [27] as color space.

4) Color and Lightness: Another free parameter of our master algorithm is the choice of focal point. The idea is to produce well-defined color and density shifts in the mapped image by varying the focal point. A natural choice for the focal point is close to the midpoint of the gray axis in the destination gamut. Moving the focal point on the lightness axis results in an overall lightness change of the mapped image. The amount of the lightness change of a specific pixel decreases from a maximum for the colors close to the focal point towards zero at the gamut boundary. Similarly, a shift of the focal point in the chroma plane results in a color shift of the mapped image.

$^4$ The values were taken from [4].

5) Hue: In order to study the influence of hue shifts, we studied images for which the colors of all pixels were shifted in hue by a defined angle prior to applying the master algorithm.

6) Gamut Size: To gauge the importance of the destination gamut, we also tested a gamut parameter. It is not actually a free parameter of our master algorithm, but we included it because it allows us to estimate the relative importance of the destination device capabilities compared to the free parameters of the master algorithm. We tested four different destination gamuts. The smallest was ISO-Newspaper, the largest ISO-Coated. The remaining gamuts were created from these two as weighted averages.

7) Gamut Shift: This parameter describes a shift of the destination gamut in the working color space.

8) Gamut Rotation: Another parameter that we considered is a rotation of the destination gamut in the working color space.

B. Setup of the Studies

Two conjoint studies (here referred to as Study 1 and Study 2) were designed to gauge the influence of the aforementioned parameters and their levels. In the context of conjoint analysis, we denote the parameters as attributes and the parameter settings as levels. After a preliminary evaluation of the results for Study 1, we realized that the influence of two of the attributes (Gamut Shift and Gamut Rotation) was marginal. Thus, in Study 2, those two attributes were replaced by new attributes (Color/Lightness and Hue/Color Space).

In order to keep the number of possible combinations manageable, some of the parameters were combined into a single attribute, and the number of considered levels was reduced to a reasonable size. Lightness and color were combined into one attribute with only six levels. Because the main differences between the LAB and IPT color spaces are hue conservation issues, the parameters Color Space and Hue were also combined into one attribute. In Study 1, only neighboring Gamut Size levels were used in the comparison. This restriction was removed in Study 2 in order to better test the distribution assumption in the evaluation model.

The attributes and levels used for the two studies are summarized in Table I. For both studies, every image had 1536 possible mapping combinations. The five attributes had a total of 22 levels.
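The stated totals can be checked with a few lines of Python. In the sketch below, only the level counts for gamut size (four destination gamuts) and color/lightness (six levels) are taken from the text; the counts of four levels for each of the remaining three attributes are an assumption, chosen because they reproduce both the product of 1536 combinations and the sum of 22 levels.

```python
from math import prod

# Levels per attribute (Study 2 naming). Gamut size and color/lightness are
# stated in the text; the other three counts are assumptions.
levels = {
    "Gamut Size": 4,
    "Color/Lightness": 6,
    "Compression": 4,       # assumed
    "Details": 4,           # assumed
    "Hue/Color Space": 4,   # assumed
}

n_combinations = prod(levels.values())  # size of the Cartesian product
n_levels = sum(levels.values())         # number of part-worths to estimate
```

The contrast between the two numbers is the practical appeal of the approach: far fewer part-worths than parameter combinations have to be estimated per image.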

C. Test Setup

In this section, we describe how we collected paired comparison data in psycho-visual experiments to analyze our master gamut mapping algorithm. In every paired comparison, we presented an original image and two images mapped by different incarnations of our master algorithm on an LCD screen. The original image was presented in the upper half of the screen and the two mapped images side by side below the original.

The two mappings were chosen at random from our parameter space. For Study 1, we had the constraint that gamut size levels in the comparisons are consecutive levels in the natural order, since larger differences in gamut size essentially determine the choice.

TABLE I
ATTRIBUTES AND THEIR LEVELS USED IN STUDY 1 AND STUDY 2

TABLE II
TEST SETS USED IN THIS PAPER

The observers who participated in our test had to choose the mapped image that reproduces the original better. For their choices, the observers used a mouse to click on the corresponding image. If no difference could be seen, the original had to be selected in order to avoid a forced choice.

1) Test Sets: The test data was collected in two different setups: a laboratory setup and a web setup. The key properties of the data sets used in this study are summarized in Table II. More details of the setup were published in a conference paper [18]. Here, we give only a summary.

2) Laboratory Setup: For the lab test, we used LCD displays. A 22-inch Eizo CG 241W-BK monitor calibrated to show sRGB colors was used to display test images. The ambient illumination measured in the middle of the switched-off monitor was 40 lx. Monitor flaps around the screen prevented flare. The monitor’s background was set to a neutral gray.

3) Web Setup: For the Internet-based part, we had to consider a variety of viewing conditions and displays compared to those from the laboratory setup (concerning brightness, size, resolution, white point, and color gamut). Therefore, additional information was collected from the web study participants concerning their employed system (ambient illumination, display type and size, Internet browser, and operating system). We used JPEG images with very low compression and a maximal width or height of 400 pixels, which resulted in about 150 KB per image. The resulting test pages were verified to be presentable on common operating systems, browsers, and even small laptop displays.

Fig. 1. Part-worths for all parameter levels. The light bars (red) show results of Study 1 and the filled bars (blue) those of Study 2. Error bars show one estimated standard deviation.

4) Observers: Three user groups were considered in this experiment: lab, web, and cross-link. Observers in the lab user group were recruited from staff of our institutes and from participants of a symposium who were mostly color experts. Each observer had passed the Ishihara test for color deficiency. To recruit observers for the web user group, the Internet test was posted on our institute home page. Students, color specialists, and other people were invited to participate via e-mail and Internet user groups. In the cross-link study, the same observers participated in both environments (identified with a user ID). The study was carried out by students from the Swiss Federal Institute of Technology Zürich and by staff of our institute. The number of participants in each study is given in Table II.

5) Test Images: The image set included the obligatory “Ski” image that is specified by the CIE 156:2004 guidelines and additional ISO images. A wide range of scenes, including 99 different images, was used in the experiment in order to get good average results and to be able to study the influence of different images on the psycho-visual results. Most of them were taken from royalty-free libraries as well as from private stock.

IV. RESULTS AND DISCUSSION

A. Importance of Parameters

First, we present the computed part-worths for all different parameters individually. The results for both studies are shown in Fig. 1. The comparison of the part-worths allows us to answer questions like: What is the relative importance of the different parameters? Which levels of the parameters are most preferred?

TABLE III: IMPORTANCE OF PARAMETERS FOR BOTH STUDIES (SCALED SUCH THAT SUM IS 1). THE ENTRIES ARE SORTED BY THE LAST COLUMN.

Second, we look at the importance of parameters, which describes how much each parameter contributes to the quality value on the stimulus level, i.e., the combination for all parameters. For this, we use the standard deviation of the part-worths within the parameters. Note that the computed importance depends on the levels chosen for the parameters, e.g., if we choose levels for a parameter that can hardly be distinguished, then the importance of this parameter will be low, though it could be high for a different choice of levels. Hence, the choice of levels is an important task in conjoint analysis. The importance of the different parameters is shown in Table III.
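The importance measure described above can be illustrated with a minimal sketch. It assumes that "standard deviation of the part-worths within the parameters" means the population standard deviation over the levels of one parameter, and that the scaling divides by the sum over all parameters. The function name, the level labels and the numbers are hypothetical and are not results of the study.

```python
from statistics import pstdev


def parameter_importance(part_worths):
    """Return the importance of each parameter, scaled so that the sum is 1.

    part_worths maps a parameter name to a dict {level name: part-worth}.
    The spread of a parameter is the standard deviation of its part-worths.
    """
    # Using the population standard deviation is an assumption of this sketch.
    spread = {
        name: pstdev(list(levels.values()))
        for name, levels in part_worths.items()
    }
    total = sum(spread.values())
    if total == 0:
        raise ValueError("all part-worths are constant; importance is undefined")
    return {name: value / total for name, value in spread.items()}


# Hypothetical part-worths for illustration only (not values from the study).
example_part_worths = {
    "Gamut Size": {"S1": -0.6, "S2": 0.0, "S3": 0.6},
    "Compression": {"C1": -0.3, "C2": 0.0, "C3": 0.3},
    "Gamut Shift": {"Sh-": 0.0, "Sh0": 0.0},
}

importance = parameter_importance(example_part_worths)
for name, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

A parameter whose levels all receive the same part-worth gets importance zero, which mirrors the remark that indistinguishable levels lead to a low importance.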

In Table III, we also show the standard deviation of a global color distance, which serves as a first-order measure of the perceived image distance between original and mapped image. The distance is taken between the transformed image and its original, averaged over the images. The standard deviation was calculated from the distance values for different levels of a specific parameter, taking default levels of all other parameters (S3, C3, ShL0/L0, IPT-R0/IPT-H0, D1). Note that in general the importance of the parameters correlates with the average difference. An exception is the Details parameter, which has a very small standard deviation despite its relative importance. This is not surprising, as local contrast conservation cannot be measured by a global color distance measure.

Gamut size is the most important parameter in both studies, but it is not the only deciding factor. Compression, Details, and in the case of Study 2 also Color/Lightness can all contribute to the quality as much as the difference between two consecutive gamut size levels.

Clipping emerges as the best compression method. Linear compression is not well suited. Sigmoidal compression performs better the closer it is to clipping. This result shows that saturation is an important factor for respondents, which is in agreement with many gamut mapping studies in the literature [1].

About equally important as compression is detail conservation. The higher the weighting factor, the more it is preferred. Surprisingly, this even holds for an exaggerated detail enhancement factor of 1.5. A factor of reconstructs small details of the original image except for colors close to the gamut boundary and, due to the edge-preserving filter, also for colors close to an edge.

According to the computed part-worths, the preferred color space is IPT in Study 1 and CIELAB in Study 2. This is rather unexpected. The advantage of IPT is that it better preserves hue, especially in blue regions. On the other hand, CIELAB may have advantages over IPT, because most gamut mapping algorithms and their optimizations (e.g., choice of focus point) were elaborated in CIELAB. One possible reason for our conflicting result could be that the hue advantage is relevant mainly in Study 1. Study 2 has explicit color changes larger than the expected hue shift in the CIELAB space, and for images with a color cast the hue advantage may not be important. In fact, a partial evaluation of the data in Study 2 disregarding the color shift level (Col1, Col2, Col3) shows an increased part-worth of IPT compared to CIELAB. However, in view of the rather small part-worths of the color space parameters compared to the other parameters, we cannot rule out that some systematic shortcomings of our conjoint model are the reason for the result.

TABLE IV: MOSTELLER'S TEST PER PARAMETER FOR GAUSSIAN AND LOGISTIC DISTRIBUTION

For Color/Lightness the most preferred level is L- followed by L0. For the default value of the focus point in the destination gamut we used the mid point between black and white point of the smallest gamut. Because the mid point of the source gamut differs from this value, a neutral gray with the default attribute L0 is mapped to a lighter color than in the original. The results of our study show that in general darker images (level L-) were preferred, for which the mapped neutral gray is closer to that of the original. This indicates that the mid-gray tone should be mapped close to its original, independent of the lightness of the destination black and white. As expected, the color changes clearly have a negative influence on the perceived quality. A color change due to a focus shift causes a decrease in quality of the same order as the difference between sigmoidal compression and clipping, or the difference between two successive detail enhancement factors. In a similar manner, hue changes in either direction cause a quality decrease, but the magnitude of the studied hue changes (given in radians) is only about half of that of the studied color changes.

We do not try to interpret the results for the levels of Gamut Shift and Gamut Rotation. Their part-worth values are small anyway.

B. Testing the Model

1) Mosteller's Test: We made the assumption, on the parameter level, that the part-worth values are uncorrelated normally distributed variables with equal variances. We tested this assumption using Mosteller's test. A description of Mosteller's test can be found in [8] or [24]. Results are presented in Table IV.
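As a rough illustration of the kind of check involved, the following sketch computes the classical Mosteller statistic for a complete, balanced paired-comparison matrix under the Gaussian (Thurstone Case V) model. It is not the authors' implementation: the function name, the clamping of extreme proportions and the example counts are assumptions made here, and the application to part-worths on the parameter level in the paper may differ in detail. The statistic compares arcsine-transformed observed and fitted proportions and is referred to a chi-square distribution with (k-1)(k-2)/2 degrees of freedom.

```python
from math import asin, sqrt
from statistics import NormalDist


def mosteller_statistic(wins, n):
    """Return (chi2, dof) for a k x k matrix of paired-comparison counts.

    wins[i][j] is how often item i was preferred over item j,
    out of n comparisons for every pair (balanced design assumed).
    """
    k = len(wins)
    norm = NormalDist()

    def proportion(i, j):
        # Clamp to avoid infinite z-scores for unanimous pairs (assumption).
        p = wins[i][j] / n
        return min(max(p, 0.5 / n), 1.0 - 0.5 / n)

    # Case V least-squares scale values: row means of the z-scores (z_ii = 0).
    scale = [
        sum(norm.inv_cdf(proportion(i, j)) for j in range(k) if j != i) / k
        for i in range(k)
    ]

    chi2 = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            theta_observed = asin(sqrt(proportion(i, j)))
            theta_fitted = asin(sqrt(norm.cdf(scale[i] - scale[j])))
            # The arcsine transform has variance 1 / (4 n) in radians.
            chi2 += 4.0 * n * (theta_observed - theta_fitted) ** 2

    dof = (k - 1) * (k - 2) // 2
    return chi2, dof


# Hypothetical counts, 100 comparisons per pair (not data from the study).
consistent = [[0, 70, 85], [30, 0, 70], [15, 30, 0]]
circular = [[0, 90, 10], [10, 0, 90], [90, 10, 0]]

chi2_ok, dof_ok = mosteller_statistic(consistent, 100)
chi2_bad, dof_bad = mosteller_statistic(circular, 100)
print(f"consistent: chi2 = {chi2_ok:.4f}, dof = {dof_ok}")
print(f"circular:   chi2 = {chi2_bad:.4f}, dof = {dof_bad}")
```

The first matrix is nearly consistent with a one-dimensional scale and yields a small statistic; the second contains a circular preference pattern that no scale can reproduce and yields a large one. Replacing the Gaussian by a logistic CDF in the fit would give the logistic variant of the check.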

Most parameters passed the test at the chosen significance level. Only the Compression parameter in Study 1 and the Gamut Size parameter in Study 2 show significant deviations. One possible reason could be that a distribution with a wider tail (than the Gaussian distribution) is a better fit to the data. Indeed, using a logistic distribution in the evaluation gives better, but not perfect, results. Note that we could not apply Mosteller's test to the gamut parameter in Study 1, as the frequency matrix for the gamut parameter does not have enough entries (only specific pairs of gamut levels have been compared).

TABLE V: TEST FOR COMPARISON OF TEST SETS

TABLE VI: VALUES OF LINEARITY TEST FOR STUDY 1. SIGNIFICANT DEVIATIONS ARE SHOWN IN BOLD. LAST COLUMN SHOWS CHANGES IN HIT RATE. LARGEST HIT RATE GAINS ARE SHOWN IN BOLD.

TABLE VII: HIT RATES USING GAUSSIAN AND LOGISTIC DISTRIBUTIONS FOR THE TWO STUDIES

Fig. 2. Part-worths of details as a function of compression.

2) Equivalence of Data Sets: For Study 1, each data set from the three observer groups was analyzed separately: the two laboratory data sets that were collected once at a symposium and once in the lab, then the control study where the same observer performed the test once over the Internet and once in the laboratory. Surprisingly, the three sub-tests yield similar results. For Study 2, the lab and the Internet data were also analyzed separately and compared, and showed no significant difference. We tested the hypothesis that the results of the studies cannot be distinguished. The results, summarized in Table V, support this hypothesis on the basis of our data. The collection of this large data set using the Internet allows us to draw more precise conclusions about our model parameters. More detailed results can be found in [18].

Fig. 3. Sorted scale values for algorithms for Study 1 and Study 2 compared with model curves using Gaussian and logistic distributions.

3) Linearity: The linearity assumption was tested for every attribute pair. The test results for Study 1 are given in Table VI. The values for most attribute pairs did not indicate a deviation from linearity. Two combinations, Compression-Details and Gamut Size-Gamut Shift, show clearly significant deviations. Those two combinations also provide the largest increase in hit rate when combined attribute levels are used instead of the individual ones.

A detailed inspection of the combined Compression-Details results shows that the gain resulting from Details for clipping is only about half as large as the other gains (see Fig. 2). A possible explanation is the fact that the colors have to be mapped back into the gamut after detail reconstruction. When using clipping, many more colors are affected by this mapping compared to the other compression parameters. The nonlinearity in Gamut Size-Gamut Shift can be characterized as an increase of the part-worth for Sh- at the cost of Sh0 with increasing gamut size.

For Study 2, no significant deviation from linearity could be detected and the hit rate could not be increased by any attribute pair. This is presumably due to the limited size of the data set compared to Study 1.

4) Distribution Function: First, we qualitatively verify the assumption that the distribution of the sorted scale values of possible parameter combinations does not have gaps between successive quality values, i.e., that no attribute is dominant over the other attributes. This is visualized in Fig. 3.

In a second step, we can experimentally estimate the average cumulative distribution function: Histograms are collected on all judgments based on their estimated psycho-visual distance. From them, the probability that an observer's judgment agrees with the modeled quality distance can be computed and compared to the cumulative distribution function, see Fig. 4.

Fig. 4. Cumulative distribution function. Logistic versus Gaussian CDF compared to experimental data.

Fig. 5. Error estimation. Average error of attribute levels for theoretical error estimation and three types of experimental error estimations.

The conjoint analysis and the hit rate determination were performed for the Gaussian and the logistic distribution function. The hit rate turned out to be very similar, with a slight advantage for the Gaussian distribution for Study 1 and an advantage for the logistic distribution for Study 2.
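The two distribution functions and the hit rate can be sketched as follows. This is a minimal illustration under stated assumptions: the hit rate is taken to be the fraction of paired comparisons in which the observer chose the stimulus with the higher modeled scale value, ties are counted as misses, and the logistic distribution is scaled to unit variance. The function names and the example numbers are hypothetical.

```python
from math import pi, sqrt, tanh
from statistics import NormalDist

# Scale giving the logistic distribution unit variance (an assumption here).
LOGISTIC_SCALE = sqrt(3.0) / pi


def gaussian_cdf(distance):
    """Modeled probability of agreement under a standard Gaussian."""
    return NormalDist().cdf(distance)


def logistic_cdf(distance, scale=LOGISTIC_SCALE):
    """Modeled probability of agreement under a logistic distribution."""
    # Equivalent to 1 / (1 + exp(-distance / scale)), but overflow-safe.
    return 0.5 * (1.0 + tanh(distance / (2.0 * scale)))


def hit_rate(scale_values, choices):
    """Fraction of choices in which the preferred item has the higher scale value.

    choices is a list of (preferred, rejected) pairs of item names.
    """
    hits = sum(
        1
        for preferred, rejected in choices
        if scale_values[preferred] > scale_values[rejected]
    )
    return hits / len(choices)


# Hypothetical scale values and observer choices (not data from the study).
example_scale = {"A": 0.8, "B": 0.1, "C": -0.9}
example_choices = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]

print("hit rate:", hit_rate(example_scale, example_choices))
for d in (0.5, 1.0, 3.0):
    print(d, round(gaussian_cdf(d), 4), round(logistic_cdf(d), 4))
```

For a fixed set of scale values the hit rate does not depend on the CDF, since both are monotone; any difference between the two distributions enters through the scale values that are fitted with them. At large distances the logistic CDF stays further from 1 than the Gaussian one, which is the wider tail discussed in the text.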

Even if there is evidence from the Mosteller test that the logistic distribution can explain frequencies at large psycho-visual distances better, the logistic distribution does not increase the hit rate significantly. The Gaussian distribution may be more appropriate at shorter distances. In view of the very similar part-worth values for both distributions, we did not investigate further to find a better distribution function, which could be a convolution of a Gaussian function with a logistic function (see footnote 5). The influence of the choice of distribution functions was already discussed in earlier works [28], [29].

5) Error Analysis: In Fig. 5, we show the comparison of the theoretical error with three types of experimental errors. Since the data set for Study 1 is larger than the data set for Study 2, the estimated error is smaller. For both studies, we did not notice a significant difference between the experimental error computed by randomly dividing the paired comparisons into two groups and the error calculated by linear regression. However, the experimental error computed by randomly dividing the images into two groups is significantly larger in both studies. For Study 2, the experimental error computed by randomly dividing the observers into two groups is also significantly larger. This suggests that it is worth the effort to develop gamut mapping algorithms based on individual image properties and even to personalize gamut mapping algorithms for user groups.

Footnote 5: Note that the rescaling in the conjoint analysis was derived assuming normal distributions; thus, the rescaling for the logistic distribution is only approximate.

Fig. 6. Hit rate as a function of individualization for Study 1.

Fig. 7. Hit rate as a function of individualization for Study 2.

C. Individualization

We show here that the analysis of the data for individual images and individual observers has the potential to better predict the individual choices. For Study 1, we could increase the hit rate from 81.4% to 82.4%. More interesting is the fact that the hit rate can be further increased to 83.5% by mixing the general scale with the individual scale. The results are shown in Figs. 6 and 7.

For Study 2, the individualization on its own does not increase the hit rate, but the maximum hit rate is obtained by mixing results from individual images and general results at a rate of about 20% to 80%. For individual observers, we always get a lower hit rate, but mixing allows a marginal increase too.
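The mixing of a general scale with an individual scale can be sketched as a convex combination of the two sets of scale values, followed by a scan over the mixing weight. This is an assumption about the form of the mixing, which is not spelled out in this section; the function names and example values are hypothetical. Here the weight is the share given to the individual scale, so the ratio quoted above would correspond to a weight of about 0.2 if the order "individual to general" is read literally.

```python
def mix_scales(general, individual, weight):
    """Blend scale values: (1 - weight) * general + weight * individual.

    Items missing from the individual scale fall back to the general value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return {
        item: (1.0 - weight) * value + weight * individual.get(item, value)
        for item, value in general.items()
    }


def hit_rate(scale_values, choices):
    """Fraction of (preferred, rejected) pairs predicted by the scale values."""
    hits = sum(1 for won, lost in choices if scale_values[won] > scale_values[lost])
    return hits / len(choices)


def hit_rate_curve(general, individual, choices, steps=10):
    """Hit rate as a function of the degree of individualization."""
    curve = []
    for i in range(steps + 1):
        weight = i / steps
        mixed = mix_scales(general, individual, weight)
        curve.append((weight, hit_rate(mixed, choices)))
    return curve


# Hypothetical scales and choices for one image (not data from the study).
general_scale = {"A": 1.0, "B": 0.0}
individual_scale = {"A": 0.0, "B": 1.0}
choices = [("A", "B"), ("A", "B"), ("B", "A")]

mixed = mix_scales(general_scale, individual_scale, 0.2)
curve = hit_rate_curve(general_scale, individual_scale, choices)
for weight, rate in curve:
    print(f"weight {weight:.1f}: hit rate {rate:.3f}")
```

In practice the individual scale would be estimated from the comparisons of a single image or observer, and the hit rate would have to be evaluated on data not used for fitting to avoid an optimistic bias.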

The obtainable hit rate as a function of individualization depends on several factors: the number of observers, the number of images, and how much observers and images differ from the average population. In Study 1, the average number of individual tests for an image was about 300 and, thus, sufficiently large for the evaluation to be better on individual images than on the whole population. Individualization on observers, however, did not increase the hit rate, because the average number of tests per individual observer was only about 45. In Study 2, the average number of individual tests per image or observer was on the order of 60 to 80. This is reflected by the very similar hit rate curves as a function of individualization. Note that the gain in hit rates with individualization (Figs. 6 and 7) correlates well with the behavior of the different experimental error estimations (Fig. 5).

V. CONCLUSION

We showed that conjoint analysis can be a useful and efficient method to gauge the importance of gamut mapping parameters for the perceived visual image quality. Its strength is that it allows determining the relative importance of different parameters in one study. Combining a web test with a laboratory test gives access to a large number of observers via the Internet, while still making it possible to confirm the results with test data collected under well-known viewing conditions.

Hit rate analysis turned out to be a good tool to analyze in which direction (combination of parameters, individualization for observers and images, and type of distribution functions) the underlying model can be improved.

In particular, the individualization for images bears the potential for improving gamut mapping algorithms. The question arises how a parameterized gamut mapping algorithm can be optimized depending on the image content without the need for large user studies. Good image quality measures estimating the visual distance of an original to the mapped image could play an important role in this optimizing step.

ACKNOWLEDGMENT

The authors would like to thank everyone who participated in their psycho-visual studies.

REFERENCES

[1] J. Morovic, Colour Gamut Mapping, ISBN 0470030321. New York: Wiley, 2008.

[2] H. Kotera and R. Saito, "Compact description of 3-d image gamut by r-image method," J. Electron. Imag., vol. 12, no. 4, pp. 660–668, Oct. 2003.

[3] J. Giesen, E. Schuberth, K. Simon, and P. Zolliker, "Image-dependent gamut mapping as optimization problem," IEEE Trans. Image Process., vol. 16, no. 10, pp. 2401–2410, Oct. 2007.

[4] P. Zolliker and K. Simon, "Retaining local image information in gamut mapping algorithms," IEEE Trans. Image Process., vol. 16, no. 3, pp. 664–672, Mar. 2007.

[5] N. Bonnier, F. Schmitt, M. Hull, and C. Leynadier, "Spatial and color adaptive gamut mapping algorithms," in Proc. Color Imaging X: Processing, Hardcopy and Applications, Scottsdale, AZ, 2007, vol. 15, pp. 267–272.

[6] I. Farup, C. Gatta, and A. Rizzi, "A multiscale framework for spatial gamut mapping," IEEE Trans. Image Process., vol. 16, no. 10, pp. 2423–2435, Oct. 2007.

[7] R. Kimmel, D. Shaked, M. Elad, and I. Sobel, "Space-dependent color gamut mapping: A variational approach," IEEE Trans. Image Process., vol. 14, pp. 796–803, 2005.

[8] P. G. Engeldrum, Psychometric Scaling, A Toolkit for Imaging Systems Development. Winchester, MA: Imcotek, 2000.

[9] J.-C. Falmagne, Elements of Psychophysical Theory. Oxford, U.K.: Oxford Univ. Press, 2001.

[10] "Central bureau of the CIE, Vienna," CIE Publication 156: Guidelines for the Evaluation of Gamut Mapping Algorithms, 2004.

[11] F. Dugay, I. Farup, and J. Y. Hardeberg, "Perceptual evaluation of color gamut mapping algorithms," Color Res. Appl., vol. 33, no. 6, pp. 470–476, 2008.

[12] J. Morovic and Y. Wang, "A multi-resolution, full-colour spatial gamut mapping algorithm," in Proc. 11th Color Imaging Conf., Society for Imaging Science and Technology, 2003, vol. 11, pp. 282–287.

[13] J. Giesen, K. Mueller, E. Schuberth, L. Wang, and P. Zolliker, "Conjoint analysis to measure the perceived quality in volume rendering," IEEE Trans. Vis. Comput. Graph., vol. 13, no. 11, pp. 1664–1671, Nov. 2007.

[14] B.-H. Kang, M.-S. Cho, J. Morovic, and M. R. Luo, "Gamut compression algorithm development on the basis of observer experimental data," in Proc. 8th Color Imaging Conf., Nov. 2000, vol. 8, pp. 268–272.

[15] B. Keelan, Handbook of Image Quality. New York: Marcel Dekker, 2002.

[16] B. Taneva, J. Giesen, P. Zolliker, and K. Mueller, "Choice based conjoint analysis: Discrete choice models vs. direct regression," presented at the 1st ECML/PKDD-Workshop on Preference Learning (PL), Sep. 2008.

[17] Z. Baranczuk, P. Zolliker, I. Sprow, and J. Giesen, "Conjoint analysis of parametrized gamut mapping algorithms," in Proc. 16th Color Imaging Conf., Nov. 2008, pp. 38–43.

[18] I. Sprow, Z. Baranczuk, T. Stamm, and P. Zolliker, Web-Based Psychometric Evaluation of Image Quality. Bellingham, WA: SPIE, 2009, vol. 7242, p. 72420A.

[19] L. Thurstone, "A law of comparative judgement," Psychol. Rev., pp. 273–286, 1927.

[20] R. A. Bradley and M. E. Terry, "Rank analysis of incomplete block designs, I. The method of paired comparisons," Biometrika, vol. 39, pp. 324–345, 1952.

[21] J. H. Morrissey, "New method for the assignment of psychometric scale values from incomplete paired comparisons," J. Opt. Soc. Amer., vol. 45, no. 5, pp. 373–378, 1955.

[22] H. Gulliksen, "A least squares solution for paired comparisons with incomplete data," Psychometrika, vol. 21, no. 2, pp. 125–134, 1956.

[23] E. D. Montag, "Empirical formula for creating error bars for the method of paired comparison," J. Electron. Imag., vol. 15, no. 1, pp. 0105021–3, 2006.

[24] F. Mosteller, "Remarks on the method of paired comparisons: III. A test of significance for paired comparisons when equal standard deviations and equal correlations are assumed," Psychometrika, vol. 16, p. 203, 1951.

[25] P. Zolliker and K. Simon, "Continuity of gamut mapping algorithms," J. Electron. Imag., vol. 15, no. 1, p. 13004, Mar. 2006.

[26] F. Ebner and M. D. Fairchild, "Development and testing of a color space (IPT) with improved hue uniformity," in Proc. 6th IS&T/SID Color Imaging Conf., Scottsdale, AZ, 1998, pp. 8–13.

[27] "Central bureau of the CIE, Vienna," CIE Publication 116: Industrial color difference evaluation, 1995.

[28] R. H. Hohle, "An empirical evaluation and comparison of two models for discriminability scales," J. Math. Psych., vol. 3, pp. 174–183, 1966.

[29] J. E. Jackson and M. Fleckenstein, "An evaluation of some statistical techniques used in the analysis of paired comparison data," Biometrics, vol. 13, pp. 51–64, 1957.

Peter Zolliker (M'06) received the degree in physics from the Swiss Federal Institute of Technology, Zürich, and the Ph.D. degree in crystallography from the University of Geneva, Switzerland, in 1987.

From 1987 to 1988, he was a Postdoctoral Fellow at the Brookhaven National Laboratory. From 1989 to 2002, he was a member of the R&D team at Gretag Imaging. In 2003, he joined the Swiss Federal Laboratories for Materials Testing and Research, where his research is focused on digital imaging, color management, image quality, and psychophysics.

Zofia Baranczuk received the B.S. degree in computer science and the M.S. degree in mathematics from the Warsaw University, Poland. She is currently pursuing the doctorate degree.

She is engaged in psycho-visual tests and gamut mapping.

Iris Sprow received the degree in imaging and photographic technology from the Rochester Institute of Technology in 2005 and the M.Sc. degree in digital color imaging from the London College of Communication in 2009.

She joined the Media Technology group at EMPA Duebendorf, Switzerland, in 2005, where her work is focused on subjective image quality evaluation.

Joachim Giesen (M'09) received the Ph.D. degree in computer science from ETH Zürich, Switzerland.

Afterward, he spent time as a postdoctorate and researcher at The Ohio State University, ETH Zürich, and the Max Planck Institut Informatik. Since 2008, he has been a Professor of computer science at Friedrich-Schiller-Universitaet, Jena, Germany.
