
3D VIDEO QUALITY ASSESSMENT WITH MULTI-SCALE SUBJECTIVE METHOD

Valentin Kulyk1, Samira Tavakoli4, Mats Folkesson1, Kjell Brunnström2,3, Kun Wang2,3, Narciso Garcia4

1. Ericsson Research, Sweden 2. Acreo Swedish ICT AB, Sweden 3. Mid Sweden University, Sweden

4. Universidad Politécnica de Madrid, Spain

ABSTRACT
The stereoscopic 3D viewing experience has been studied quite intensively recently; however, the subjective test methods for assessing it have not yet been standardized. It has become clear that the 3D viewing experience cannot easily be described by just one scale. This paper describes a cross-laboratory study, employing the same test setup and materials, in which a multi-scale rating method was used. One of the laboratories used the scales Depth Naturalness, Visual Comfort and Video Quality, while the other used 3D Realism, Depth Quantity and Video Quality. The results show that the Quality of Experience (QoE) of 3D video is a multidimensional concept, and that more than one scale is therefore needed in many cases for the subjective assessment of 3D video QoE.

Index Terms— 3D video quality, QoE, subjective experiment

1. INTRODUCTION
With the success of stereoscopic 3D video in cinemas, there has been an industrial push to bring 3D into people’s homes. However, this has not been straightforward, for many reasons: e.g., a lack of distribution standards, a lack of content and a number of still unsolved user experience issues. The introduction of stereoscopic 3D video has brought new experiences to viewers, but it may also induce new problems such as visual discomfort [1]. Subjective assessment can help us understand these new aspects of 3D video and their impact on the Quality of Experience (QoE). It is therefore of high importance not only to be able to conduct subjective tests properly, but also to understand which 3D video quality properties or aspects are taken into account by observers during quality assessment in a subjective test.

The stereoscopic 3D viewing experience has been studied quite intensively recently [2], but the subjective test methods have not yet been standardized. It has become clear that the 3D viewing experience cannot easily be described by just one scale [6]. The purpose of this study was to evaluate a subjective test methodology for the assessment of stereoscopic 3D video, to reveal the meaning of the 3D voting scales, and to validate the applicability of a multi-scale voting procedure.

Two similar subjective experiments were conducted at different labs using three voting scales and the same test setup, including a mixed set of 2D and 3D videos. The “Video Quality” scale was common to both labs, which made the cross-lab validation possible.

The paper is organized as follows: in Sec. 2, the experiment method, setups, video sets and conditions are described. The results are reported in Sec. 3 and discussed in Sec. 4, before the work is concluded in Sec. 5.

2. SUBJECTIVE EXPERIMENT

2.1 Experimental setup and method
Two subjective experiments were conducted at two laboratories in Sweden based on the same video set: one at Ericsson Research (Lab1) and one at Acreo Swedish ICT AB (Lab2). In order to limit environmental bias, both labs strove for similar ambient illumination and used the same 3D display settings at both locations. A 46” Hyundai S465D display with a polarized line-interleaved pattern was used in each laboratory for displaying the 3D videos, together with polarized eyeglasses from RealD. The display was positioned at four times the display height (4H) from the viewer, because the black horizontal stripes caused by the interleaving were visible at a 3H viewing distance to a subject with a visual acuity of 20/20 or better.

The display was left in factory default mode, except that the brightness of the screen was set to 90%, i.e., a peak level of 205 cd/m². The refresh rate was set to 50 Hz. The default settings of the Hyundai S465D display are: brightness 70%, contrast 80% and color 50%. The light reduction by the passive 3D eyeglasses was measured at ~40% per eye. The room lighting was switched off in Lab1. The ambient light levels, measured horizontally towards the screen at the 4H viewing distance, were ~23 lux in Lab1 and ~20 lux in Lab2.

At Lab1, 22 naïve viewers, not working professionally with 3D video, participated in the experiment; two subjects performed the test at the same time, sitting beside each other with a shielding screen in between. At Lab2, 25 naïve viewers participated, and only one subject performed the test at a time. Before the test execution, the test subjects’ vision was tested for visual acuity (Snellen) and stereo acuity (Randot). One subject from each lab was rejected and removed from the final analysis due to poor results in the stereo vision test. Both labs used three voting scales for evaluating the 3D video QoE. Lab1 used “Depth Naturalness” (DN), “Video Quality” (VQ1) and “Visual Discomfort” (VD), with a five-level category scale in combination with a continuous sliding bar, as Figure 1 shows. Lab2 used “3D Realism” (3DR), “Depth Quantity” (DQ) and “Video Quality” (VQ2), with discrete five-level category scale options, as Figure 2 shows.

2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), 978-1-4799-0738-0/13/$31.00 ©2013 IEEE

Figure 1: Voting interface used in Lab1

Figure 2: Voting interface used in Lab2
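Lab1's continuous slider (Figure 1) yields a position along the bar rather than a discrete category. The paper does not state how slider positions were converted for comparison with Lab2's discrete votes; a minimal sketch, assuming a normalized position in [0, 1] mapped linearly onto the 1–5 range:

```python
# Hypothetical conversion of a continuous slider position to the 1..5 scale.
# The normalized [0, 1] position and the linear mapping are assumptions;
# the paper does not describe the actual conversion.
def slider_to_scale(position, lo=1.0, hi=5.0):
    if not 0.0 <= position <= 1.0:
        raise ValueError("slider position must lie in [0, 1]")
    return lo + position * (hi - lo)

print(slider_to_scale(0.5))  # midpoint of the bar -> 3.0, middle of the scale
```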

Regarding the “Visual Discomfort” scale used in Lab1, the test results were transformed and are used as “Visual Comfort” (VC) in the remainder of this document, for comparison with the other scales.
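The transformation from Visual Discomfort to Visual Comfort is not spelled out in the paper; a plausible sketch, assuming votes on a 1–5 categorical scale are simply mirrored (VC = 6 − VD):

```python
# Mirror five-level "Visual Discomfort" votes onto a "Visual Comfort" scale.
# The mapping VC = 6 - VD is an assumption; the paper only says the results
# were transformed.
def discomfort_to_comfort(vd_votes):
    for v in vd_votes:
        if not 1 <= v <= 5:
            raise ValueError(f"vote {v} is outside the 1..5 scale")
    return [6 - v for v in vd_votes]

print(discomfort_to_comfort([1, 2, 5]))  # -> [5, 4, 1]
```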

Regarding the scales used in Lab2, 3DR represents the subjects’ vote on the experience of the 3D depth reconstruction presented in the test video, which is conceptually equivalent to the DN scale in Lab1. DQ indicates the amount of depth perceived in the scene representation, including both monocular and binocular cues. Finally, VQ2 refers to the pictorial quality of the test sequence, considering color, resolution, contrast, etc., similarly to VQ1. Before the experiments, the observers were instructed and shown training sequences so that they understood all these concepts and the test procedure.

2.2 Test conditions
13 source stereoscopic video sequences (SRC), chosen from one documentary and three movies while avoiding scene changes, were divided into three content types:

Content 1 – recorded with a still camera, containing a small amount of motion (standing/sitting people).
Content 2 – recorded with a still camera, containing a moderate amount of motion.
Content 3 – recorded using a zoom and/or moving camera, containing a moderate/large amount of motion.

All content was 1920×1080 progressive (1080p) at 24/1.001 fps, played at 25 fps in the test. The 1280×720 progressive (720p) videos were up-scaled to 1080p. All videos were displayed at 1920×540 per eye on the screen.

One purpose of the experiments was to verify whether 3D- and 2D-related properties can be assessed in the same subjective test and provide consistent results. Therefore, several video processing scenarios [8], called Hypothetical Reference Circuits (HRC) in VQEG terminology, were used to create the Processed Video Sequences (PVS), which are listed in Table 1 and classified here into three groups:

2D – uncompressed and compressed 2D video, in full resolution and anamorphic;

3D – uncompressed 3D videos with different levels of 3D quality;

3Denc – compressed 3D videos at different bitrates, in Side-by-Side (SbS) format.

All videos were prepared with a row-interleaving process off-line for presentation on the Hyundai 3D display in 2D mode.
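Such off-line row interleaving can be sketched as follows. This is a generic illustration, not the authors' actual tool; it assumes the left view occupies the even display rows, which is display-specific:

```python
import numpy as np

# Generic row interleaving for a line-interleaved polarized 3D display.
# Assumption: the left view goes to even display rows (display-specific).
def row_interleave(left, right):
    """left, right: HxWx3 frames of identical shape -> interleaved frame."""
    assert left.shape == right.shape
    out = np.empty_like(left)
    out[0::2] = left[0::2]    # even rows carry the left view
    out[1::2] = right[1::2]   # odd rows carry the right view
    return out

left = np.zeros((1080, 1920, 3), dtype=np.uint8)       # dummy black view
right = np.full((1080, 1920, 3), 255, dtype=np.uint8)  # dummy white view
frame = row_interleave(left, right)
# Each eye then effectively receives 1920x540 lines, matching the setup above.
```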

Table 1: List of all HRCs. The encoding bitrates considered in this test were r01, r02, r03, r04 and r05, corresponding to 1.25, 1.5, 1.75, 2.0 and 2.5 Mbps, respectively.

HRC Nr.  Test condition                                                HRC code  HRC group
1        Uncompressed 2D, content 1                                    2D1       2D
2        Uncompressed 2D, content 2                                    2D2       2D
3        Uncompressed 2D, content 3                                    2D3       2D
4        Uncompressed 2D, all content types                            2D4       2D
5        Uncompressed anamorphic 2D, 720p                              2D5       2D
6        Compressed anamorphic 2D, 720p at r04 (left view of 3D)       2D6       2D
7        Compressed 2D, 720p at r02                                    2D7       2D
8        Uncompressed 3D, content 1                                    3D1       3D
9        Uncompressed 3D, content 2                                    3D2       3D
10       Uncompressed 3D, content 3                                    3D3       3D
11       Uncompressed 3D, all content types                            3D4       3D
12       Uncompressed 3D, 720p SbS                                     3D5       3D
13       Simulated 3D (2D-to-3D conversion by geometrical distortion)  3D6       3D
14       Simulated 3D (uneven depth in vertical direction)             3D7       3D
15       Simulated 3D (temporal mismatch between left & right views)   3D8       3D
16       3D, 720p SbS, compressed at r01                               3Denc1    3Denc
17       3D, 720p SbS, compressed at r02                               3Denc2    3Denc
18       3D, 720p SbS, compressed at r03                               3Denc3    3Denc
19       3D, 720p SbS, compressed at r04                               3Denc4    3Denc
20       3D, 720p SbS, compressed at r05                               3Denc5    3Denc


3. RESULTS

3.1 Uncompressed and compressed 2D video HRCs
Figure 3 shows the Mean Opinion Scores (MOS) of the 2D-video-related HRCs (from left to right): the three content types (“2D1”, “2D2” and “2D3”), the average MOS over all content types (“2D4”), uncompressed anamorphic 720p (“2D5”), the same anamorphic 720p encoded at bitrate r04 (“2D6”), and full 720p encoded at bitrate r02, where r02 < r04 (“2D7”). One can observe that the 3D-related MOS (DN, 3DR and DQ) are low, indicating that the subjects had no difficulty detecting the 2D HRCs. The 3DR MOS is the lowest and almost constant, reflecting the 2D nature of the HRCs even better than DN and DQ do. DN and DQ are slightly higher for content types with larger amounts of motion. The Video Quality is high for all content types, and hence also for the average (“2D4” in Figure 3), but falls for the HRCs that had lower resolution and were encoded (“2D5/6/7”). The Visual Comfort is judged to be almost equally high for all HRCs in this group. The test subjects thus managed to use the 3DR and VC voting scales as dimensions different from VQ, DN and DQ. DN and DQ seem to reflect the fact that monocular depth cues are present in 2D video, so some experience of depth is to be expected.
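The MOS values plotted in Figures 3–5 are per-condition averages of the subjects' votes. The paper does not list its formula; a minimal sketch of the standard computation, assuming a normal-approximation 95% confidence interval:

```python
import math

# Per-condition Mean Opinion Score with a 95% confidence interval.
# The normal approximation (1.96 * s / sqrt(n)) is a common convention,
# assumed here; the paper does not state its exact formula.
def mos_ci(votes):
    n = len(votes)
    mean = sum(votes) / n
    var = sum((v - mean) ** 2 for v in votes) / (n - 1)  # sample variance
    ci95 = 1.96 * math.sqrt(var / n)
    return mean, ci95

mos, ci = mos_ci([4, 5, 4, 3, 4, 5, 4, 4])  # invented votes for one PVS
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```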

3.2 Uncompressed 3D video HRCs
Figure 4 shows the results for the following uncompressed 3D video HRCs: the MOS for each content type (“3D1”, “3D2” and “3D3”), the average MOS for all content types (“3D4”), 720p SbS video (“3D5”), the 3D simulation (“3D6”), the depth distortion simulation (“3D7”) and the temporal mismatch between the left and right views (“3D8”).

The MOS on all scales are fairly high for the three content types, with one note: for the content with more motion (“3D2” and “3D3”) the DQ is slightly higher, while the VC is lower than for “3D1”, which indicates that the amount of motion in a video scene can have an impact on the subjective judgments. Down-sampling the video resolution decreases VQ, DN and DQ but preserves a high VC MOS (“3D5”).

The simulated depth degradations were reflected differently by the different scales in each HRC. The simulated 3D “3D6” (looking almost like 2D displayed behind the screen surface) was reflected in lower DN, 3DR and DQ MOS. Its 3DR MOS is the lowest, similar to the 3DR MOS for the 2D HRCs, indicating a poor 3D experience. At the same time, the VC is judged to be high for the “3D6” HRC.

The gradient depth distortion “3D7” was reflected in decreased DN, 3DR and VC MOS, while the VQ was judged to be good. This demonstrates that the 3D video QoE is multidimensional for “3D7” and is rated differently on different scales. The lowest Visual Comfort MOS was observed for the temporal mismatch HRC “3D8”. The quality of this HRC was also reflected in decreased DN and 3DR, similarly to the other simulated-3D HRCs. The DQ is judged to be high for “3D7” and “3D8”, indicating that a moderate amount of depth was perceived, while the VC MOS reveals a discomfort problem for these HRCs.

It should be noted that the VQ was assessed slightly differently for “3D8” in Lab1 and Lab2. The subjects in Lab1 judged the VQ to be degraded for the “3D8” HRC, similarly to the decreased video resolution in the “3D5” HRC; however, no similar decrease in VQ MOS is observed in the Lab2 results. There is no clear explanation for this discrepancy. One possible reason could be differences between the labs in the environmental setup or in the instruction/training part before the assessment sessions.

Figure 3: MOS for 2D video. Figure 4: MOS for uncompressed 3D video. Figure 5: MOS for compressed 3D video.

3.3 Compressed 3D video HRCs
The following compressed 3D video HRCs are presented in Figure 5: SbS 3D videos encoded at different bitrates (where r01 is the lowest and r05 is the highest) and uncompressed 720p SbS “3D5”. One can observe that the Video Quality results from the two labs are very similar. The 3D-quality-related results, the 3DR and DQ MOS, are almost constant across all HRCs except “3D5”. The VC follows the VQ MOS, increasing for higher bitrates. One can say that VQ and VC provide almost the same information about the QoE. At the same time, the DN indicates that compression also affects the depth perception. Thus, high video quality enhances the 3D perception, as one can expect, since clear pictures make it easier for stereopsis to work.

3.4 Voting scales correlation in each lab
The Pearson correlations between the different scales used by the respective laboratories were calculated for the HRC groups, and the results are shown in Figure 6.

Figure 6: Pearson correlation coefficients (absolute values) between the different scales for each lab over HRCs MOS

For Lab1, the Pearson correlation for the 2D HRC group is rather high for DN-VQ and VQ-VC but low for DN-VC. In Lab2, only the DQ-VQ scales are highly correlated. The high DN-VQ and DQ-VQ correlations indicate that the observers gave high VQ scores to 2D PVSs when a large amount of depth was judged. One possible explanation is that higher VQ results in a clearer picture, which enhances the perception of the pictorial depth cues present in 2D. Furthermore, the higher DQ can also benefit from a clearer picture, which makes it easier for the human visual system to stereo-match corresponding points between the views. Heynderickx and Kaptein [7] found that the perceived amount of detail is higher in 3D than in 2D images of the same spatial resolution. There is a small negative correlation for the 3DR-VQ combination (PCC = −0.14), and almost no correlation for the 3DR-DQ pair.
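The scale correlations of Figure 6 are plain Pearson coefficients between per-HRC MOS vectors; a sketch with invented MOS values (not the paper's data):

```python
import math

# Pearson's r between the per-HRC MOS vectors of two scales, as in Figure 6.
# The MOS values below are invented for illustration, not the paper's data.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

vq = [4.2, 3.9, 4.1, 4.0, 3.2, 2.8, 3.0]  # hypothetical VQ MOS per 2D HRC
dn = [2.1, 2.3, 2.5, 2.3, 1.9, 1.7, 1.8]  # hypothetical DN MOS per 2D HRC
print(f"PCC(DN, VQ) = {pearson(vq, dn):.2f}")
```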

For the 3D encoded videos in Lab1, the correlations between the scales are high, though lower for DN-VC, indicating that all three scales reflected the same QoE properties. The VQ MOS values correlate highly with both the DN and the VC MOS values in Lab1, more strongly than the VC-DN correlation. Regarding the scales used in Lab2, relatively large correlations between 3DR and VQ as well as between 3DR and DQ can be seen, while the correlation for DQ-VQ is much lower.

Figure 7: Correlation over PVS MOS for the 3D-quality-related scales: (a) Lab1, (b) Lab2

For uncompressed 3D video, the correlation between all scales in Lab1 is low (though higher between the VQ and VC scales), which indicates that the subjects separated the DN, VQ and VC quality concepts. The influence of the simulated depth degradations on both DN and VC can also be seen in Figure 7(a), which shows the correlation over the PVS MOS results in Lab1.

The lowest ratings due to simulated depth degradations were obtained for the temporal-mismatch simulated 3D, “3D8”, which received the lowest DN and VC ratings, as shown in the figure. Similarly, the impact of the depth degradations can be seen in the 3DR MOS results in Lab2, showing itself as a high correlation for the 3DR-DQ combination. That is, when the depth degradations are perceived as worse, they affect both the impression of realism and the quantity of depth.

The high positive 3DR-DQ correlation should mean that a large DQ results in a high 3DR and vice versa. This held for the 3Denc HRCs to a larger extent than for the 3D HRCs, which can be explained by the “3D8” HRC MOS, where the DQ was high while the 3DR was low. This is presented in more detail in Figure 7(b). The correlations for the 3DR-VQ (2D and 3D HRCs) and DQ-VQ (2D and 3Denc HRCs) scales are rather low, which indicates that the observers used them independently of each other within the respective HRC groups.

3.5 Cross-lab comparison
A repeated measures ANOVA was performed, where the scales and the PVSs were treated as within-subjects factors and the laboratory as a between-subjects factor, followed by a Tukey HSD post-hoc test. It revealed that the Video Quality scales in the two labs were not statistically significantly different (p = 0.8). Depth Naturalness and 3D Realism were not statistically significantly different at the 95% confidence level, but close (p = 0.1). Depth Quantity was not significantly different from Depth Naturalness (p = 0.8) or Video Quality (p = 0.06), but was different from 3D Realism (p = 0.002). Visual Comfort was not statistically significantly different from the Video Quality scales (p = 0.8 for Lab1 and p = 0.2 for Lab2), but was different from the other scales (p = 0.0001).

Table 2 reports the correlations between scales, calculated for both the PVS and the HRC MOS results. Being more aggregated, the correlations calculated on the HRC MOS are higher than those on the PVS MOS for all pairs.

Table 2: Pearson correlation coefficient comparison

Scales     PCC (PVS MOS)   PCC (HRC MOS)
VQ1, VQ2   0.913           0.96
DN, 3DR    0.905           0.97
DN, DQ     0.68            0.90
VC, DQ     −0.53           −0.61
VC, 3DR    −0.24           −0.38
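The gap between the PVS-level and HRC-level columns of Table 2 comes from aggregation: averaging the PVS MOS within an HRC cancels per-content noise. A deterministic toy illustration (synthetic numbers, not the paper's data), in which the per-content offsets cancel exactly at the HRC level:

```python
import math

# Toy illustration of Table 2's PVS-vs-HRC correlation gap: averaging the
# three contents of each HRC cancels the (synthetic) per-content offsets,
# so the HRC-level correlation is higher. All numbers are invented.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

hrc_quality = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # one "true" MOS per HRC
pvs_a = [q + d for q in hrc_quality for d in (0.4, -0.4, 0.0)]  # scale A
pvs_b = [q + d for q in hrc_quality for d in (-0.4, 0.4, 0.0)]  # scale B
hrc_a = [sum(pvs_a[i:i + 3]) / 3 for i in range(0, len(pvs_a), 3)]
hrc_b = [sum(pvs_b[i:i + 3]) / 3 for i in range(0, len(pvs_b), 3)]
print(f"PCC over PVS MOS: {pearson(pvs_a, pvs_b):.2f}")
print(f"PCC over HRC MOS: {pearson(hrc_a, hrc_b):.2f}")
```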

Figure 8 shows scatter plots of the PVS MOS between the different scales used in the two laboratories. According to Figure 8(a), the correlation between the video quality scales in the two laboratories was about 91% (see Table 2). Some PVSs behaved a bit differently from the others; these all belong to the 3Denc HRC group. The Pearson correlation for DN-3DR is also about 91%, as shown in Figure 8(b). According to Figure 8(c), the correlation for the DN-DQ combination is lower than for the corresponding pair, 3DR-DQ, shown previously in Figure 7(b). Nevertheless, similar behavior can be seen in both figures. The behavior of the VC-DQ combination is shown in Figure 8(d), which indicates a negative and not very high correlation between the scales. The low correlation for the VC-3DR pair is shown in Figure 8(e). Here, behavior similar to the VC-DN combination in Figure 7(a) can also be seen. The VC-VQ2 pair had very low correlation, as presented in Figure 8(f). One can also observe from Figures 8(e) and 8(f) that the PVS MOS are clustered into two PVS groups with different correlations. For example, Figure 8(e) reveals that all 2D PVSs (2D1–2D6) had similar VC and 3DR, i.e., high VC but low 3DR. Figure 8(f) also reveals that some PVSs had similar VQ but different VC, while others had correlated VQ and VC.

4. DISCUSSION
One of the goals of this study was to investigate the need for more than one scale in subjective 3D video quality assessment. The experiment results revealed that using only the video quality scale is sufficient when the main target of the assessment is evaluating compression artifacts, typical for 2D video, which is confirmed by [9]. However, if the experiment targets more than one quality dimension, comprising 2D and 3D videos with a combination of different distortion types, different scales should be used in order to capture the experience of viewing the 3D videos in the evaluation.

At least one 3D-related and one Video Quality related scale are needed to detect the combination of good video quality and poor depth conditions among all HRCs. Apart from that, it can be crucial to study the importance of the voting scales and the conceptual relations between them, since they capture different aspects of the 3D video QoE. In certain cases the impact of some scales can be ignored; e.g., when none of the depth-related concepts are targets of the experiment, the impact of Video Quality on Depth Quantity or Depth Naturalness can be ignored. On the other hand, some quality concepts are revealed only by specific voting scales; e.g., Visual Comfort reflected the depth degradations in “3D8”. It should be noted that depth degradation can sometimes be revealed on the Video Quality scale as well. The same HRC applied to different content types is also perceived differently, which can be captured on the Depth Quantity and Visual Comfort scales.

Figure 8: Cross-lab comparison, panels (a)–(f)

It has also been observed that compression artifacts, which are usually considered video-quality-related degradations, can also affect the 3D QoE by creating disparities not typical for the presented 3D scene. In other words, video compression has an impact on several dimensions of the QoE of 3D video. Another observation is the usefulness of 2D and simulated 3D as anchor HRCs for 3D QoE assessment, making it easier to compare different tests conducted by different labs. One should always expect low MOS on the 3D-related scales and high MOS for the visual comfort experience from such anchors.

Considering the cross-lab comparison results, a slight difference between the results of the related scales was observed. Apart from the use of different scales, one possible reason for this difference could be how the test subjects at each lab were instructed and how they weighted the scales. Another reason could be the difference between the continuous and discrete scales, although both were labeled with five levels. The 91% correlation between VQ1 and VQ2 over the PVS MOS could possibly be higher if additional post-screening were applied.
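One common form of post-screening rejects subjects whose votes deviate too far from the panel consensus. The sketch below is a simplified heuristic, not the exact ITU-R BT.500 procedure the authors may have had in mind; the margin and outlier-fraction thresholds are illustrative assumptions:

```python
# Simplified subject post-screening: flag subjects whose votes deviate from
# the panel MOS by more than a margin on too many PVSs. This is a heuristic
# sketch, not the exact ITU-R BT.500 procedure; both thresholds are
# illustrative assumptions.
def screen_subjects(votes, margin=1.5, max_outlier_frac=0.2):
    """votes: dict subject -> list of votes, same PVS order for everyone."""
    n_pvs = len(next(iter(votes.values())))
    mos = [sum(v[i] for v in votes.values()) / len(votes)
           for i in range(n_pvs)]
    kept = []
    for subject, v in votes.items():
        n_outliers = sum(abs(a - m) > margin for a, m in zip(v, mos))
        if n_outliers / n_pvs <= max_outlier_frac:
            kept.append(subject)
    return kept, mos

panel = {"s1": [4, 5, 3, 4, 2], "s2": [4, 4, 3, 5, 2], "s3": [1, 1, 5, 1, 5]}
kept, mos = screen_subjects(panel)
print(kept)  # -> ['s1', 's2']  (s3 disagrees with the panel consensus)
```

Note that the deviant subject's own votes inflate the panel MOS they are compared against; iterative re-screening after removal would tighten this.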

5. CONCLUSION
The QoE of 3D video is a multidimensional concept, and therefore more than one scale is needed for its subjective assessment. Taking this into account, a multi-scale rating method was used in a subjective experiment conducted at two different labs to analyze whether 3D- and 2D-related properties can be evaluated in the same subjective test with consistent results. The joint analysis of the data collected at each lab showed that the test subjects managed well to use three scales at a time.

The experiment also validated that, depending on the HRC groups used in a test, different voting scales should be used. In fact, at least one 3D-related and one Video Quality related scale are needed to distinguish 2D and 3D HRCs from the others and provide consistent results. Conversely, using a single Video Quality scale can be enough if the evaluation of the 2D features of the video is the main target of the assessment. Furthermore, the importance of using a Visual Comfort scale was revealed when depth-related distortions were present.

Both 3D Realism and Depth Naturalness proved to be useful concepts for the assessment of 3D video QoE. While the former reflected poor depth conditions better, the latter better reflected the impact of video compression on the 3D QoE. This indicates that these concepts were understood and used differently by the test observers. The test results for the compressed 3D videos also indicated that compression has an impact on several dimensions of the QoE of 3D video.

6. ACKNOWLEDGEMENTS
The work at Acreo has been supported by VINNOVA (the Swedish Governmental Agency for Innovation Systems), which is hereby gratefully acknowledged. We would also like to thank Philip Corriveau for improving the language of the manuscript.

7. REFERENCES

[1] Lambooij, M., IJsselsteijn, W., Fortuin, M., and Heynderickx, I., "Visual Discomfort and Visual Fatigue of Stereoscopic Displays: A Review", Journal of Imaging Science and Technology 53, 030201-1-030201-14 (2009)

[2] Xing, L., You, J., Ebrahimi, T., and Perkis, A., "Assessment of Stereoscopic Crosstalk Perception", IEEE Transactions on Multimedia 14, 326-337 (2012)

[3] Chen, W., Fournier, J., Barkowsky, M., Le Callet, P., “New Requirements of Subjective Video Quality Assessment Methodologies for 3DTV”, Fifth International Workshop on Video Processing and Quality Metrics (VPQM), In Proc. VPQM 2010, p.107.

[4] Wang, K., Barkowsky, M., Brunnström, K., Sjöström, M., Cousseau, R., and Le Callet, P., "Perceived 3D TV Transmission Quality Assessment: Multi-Laboratory Results Using Absolute Category Rating on Quality of Experience Scale", IEEE Transactions on Broadcasting, vol. PP, no. 99 (2012)

[5] Patterson, R., "Review paper: Human Factors of Stereo Displays: An update", Journal of the Society for Information Display 17, 987-996 (2009)

[6] Kaptein, R., Kuijsters, A., Lambooij, M., IJsselsteijn, W. A., and Heynderickx, I., "Performance evaluation of 3D-TV systems", Proc. SPIE 6808, Image Quality and System Performance V, 680819, (2008)

[7] Heynderickx, I., and Kaptein, R. "Perception of detail in 3D images." In Proc. SPIE, vol. 7242, p. 72420W. 2009.

[8] Kulyk, V., Folkesson, M., “Subjective assessment of 3D video quality” ITU-T Study Group 9 meeting, January 2013, contribution C0008.

[9] Brunnström, K., Ananth, I. V., Hedberg, C., Wang, K., Andrén, B., and Barkowsky, M., "Comparison between different rating scales for 3D TV", SID Symposium Digest of Technical Papers, Proc. of SID Display Week 2013, May 21-24, 2013, Vancouver, Canada, paper 36.4 (2013)
