
DirectX Video Acceleration Specification of Off-Host VLD Mode for MPEG-4 Part 2 Video Decoding

Gary J. Sullivan and Yongjun Wu

Microsoft Corporation

March 2011

Applies to:

DirectX Video Acceleration

Summary: Defines extensions to DirectX Video Acceleration (DXVA) to support variable-length decoding (VLD) modes for MPEG-4 Part 2 video.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred.

Microsoft does not make any representation or warranty regarding specifications in this document or any product or item developed based on these specifications. Microsoft disclaims all express and implied warranties, including but not limited to the implied warranties of merchantability, fitness for a particular purpose and freedom from infringement. Without limiting the generality of the foregoing, Microsoft does not make any warranty of any kind that any item developed based on these specifications, or any portion of a specification, will not infringe any copyright, patent, trade secret or other intellectual property right of any person or entity in any country. It is your responsibility to seek licenses for such intellectual property rights where appropriate. Microsoft shall not be liable for any damages arising out of or in connection with the use of these specifications, including liability for lost profit, business interruption, or any other damages whatsoever.

Some states do not allow the exclusion or limitation of liability or consequential or incidental damages; the above limitation may not apply to you.

© 2011 Microsoft Corporation. All rights reserved.

Microsoft, MS-DOS, Windows, Windows Media, Windows NT, Windows Server, Windows Vista, Active Directory, ActiveSync, ActiveX, Direct3D, DirectDraw, DirectInput, DirectMusic, DirectPlay, DirectShow, DirectSound, DirectX, Expression, FrontPage, HighMAT, Internet Explorer, JScript, Microsoft Press, MSN, NetShow, Outlook, PlaysForSure logo, PowerPoint, SideShow, Visual Basic, Visual C++, Visual InterDev, Visual J++, Visual Studio, WebTV, Win32, and Win32s are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Contents

1. Introduction
1.1 Referenced Specifications and Referenced Sample Software
1.2 General Design Considerations
1.3 Deblocking and Deringing Filters
1.3.1 Deblocking filter
1.3.2 Deringing filter
1.4 Support for Off-Host VLD Operation Only
1.5 Uncompressed Surface Memory Requirements
1.6 Picture Data
1.7 Buffer Type
1.8 DXVA Decoding Operations
1.9 Status Reporting
2. Configuration Parameters
2.1 Syntax
2.2 Semantics
3. Picture Parameters Data Structure
3.2 Syntax
3.3 Semantics
4. Quantization Matrix Data Structure
4.1 Syntax
4.2 Semantics
5. Slice Control Data Structure
5.1 Syntax
5.2 Semantics
6. Status Report Data Structure
6.1 Syntax
6.2 Semantics
7. Restricted-Mode Profiles
7.1 DXVA_ModeMPEG4pt2_VLD_Simple Profile
7.1 DXVA_ModeMPEG4pt2_VLD_AdvSimple_NoGMC Profile
7.2 DXVA_ModeMPEG4pt2_VLD_AdvSimple_GMC Profile
8. IDirectXVideoDecoder Operation
8.1 Structure of BeginFrame, Execute, and EndFrame Calls
Annex A: Buffer Copies and Post-processing
For More Information


1. Introduction

This draft specification defines extensions to the DirectX® Video Acceleration (DXVA) 2.0 API/DDI to support MPEG-4 Part 2 video decoding, including support of the MPEG-4 Part 2 Simple and Advanced Simple Profiles.

This specification assumes that the reader is familiar with the MPEG-4 Part 2 video coding specification and with the basic design of DXVA.

DXVA consists of a DDI for display drivers and an API for software decoders. Version 1.0 of DXVA is supported in Windows 2000 and later versions. Version 2.0 is available starting in Windows Vista. Considering the passage of time and the increasing prevalence of DXVA 2.0 support, this document specifies the DXVA 2.0 operation for MPEG-4 Part 2 video decoding. We do not plan to specify MPEG-4 Part 2 video decoding in the DXVA 1.0 context.

In DXVA, some decoding operations are implemented by the graphics hardware driver and GPU. This set of functionality is termed the accelerator. Other decoding operations are implemented by user-mode application software, called the host decoder or software decoder. Processing performed by the accelerator is called off-host processing. Typically the accelerator uses the GPU to speed up some operations. When the accelerator performs a decoding operation, the host decoder sends buffers of data to the accelerator that contain the information that is needed to perform the operation.

Unless stated otherwise in this specification, DXVA operations in the accelerator must be stateless, and must not make assumptions about sequential operation or internal-memory state dependencies.

Note: In this document, the term "shall" describes behavior that is required by the specification. The term "should" describes behavior that is encouraged but not required. The term "note" refers to observations about implications of the specification.

Questions or comments about this specification may be sent to [email protected].

1.1 Referenced Specifications and Referenced Sample Software

The referenced MPEG-4 Part 2 video coding standard (2004 edition) is specified in ISO/IEC 14496-2:2004, Information technology – Coding of audio-visual objects – Part 2: Visual, available at http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=39259.

Associated standard reference software is available at http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html.

This specification also contains some references to the closely related ITU-T Recommendation H.263 standard (2005 edition), as specified in ITU-T Recommendation H.263 (2005), Video coding for low bit rate communication. That standard is available at http://www.itu.int/rec/T-REC-H.263.

1.2 General Design Considerations

Section 1 of this specification provides an overview of the DXVA design for MPEG-4 Part 2 video decoding. It is intended as background information, and might be helpful in understanding the following sections. In the case of conflicts, later sections of this document override this section.

The initial design is intended to be sufficient for decoding bitstreams of the Simple and Advanced Simple Profiles. Supporting other profiles would require incorporating additional features into the design, which are not considered in this document.

1.3 Deblocking and Deringing Filters

Annex F of the MPEG-4 Part 2 specification defines post-processing techniques that consist of a deblocking filter and a deringing filter. One or both filters can be enabled as needed as an out-of-loop post-processing operation. If an accelerator supports deblocking filtering or deringing filtering, these filtering operations shall conform to the specifications in subclauses F.3.1 and F.3.2 of the MPEG-4 Part 2 specification. Accelerators should support these filtering operations.

NOTE – As previously stated, the term "should" describes guidelines that are encouraged but are not mandatory requirements.

1.3.1 Deblocking filter

The deblocking filter operations are performed across 8x8 block edges at the decoder as a post-processing operation. Both luminance and chrominance data are filtered. For additional details, refer to subclause F.3.1 of the MPEG-4 Part 2 specification.

1.3.2 Deringing filter

The deringing filter consists of three sub-processes: threshold determination, index acquisition, and adaptive smoothing. This filter is applied to pixels on an 8x8 block basis. More specifically, 10x10-pixel areas are processed to produce each 8x8 filtered block region. For additional details, refer to subclause F.3.2 of the MPEG-4 Part 2 specification.
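The first two sub-processes can be illustrated in C for a single 8x8 block. This is a simplified sketch under stated assumptions, not a conforming implementation: the function names are hypothetical, `area` is assumed to hold the 10x10 neighborhood centered on the 8x8 block (a one-pixel border on every side), the threshold is taken as the rounded midpoint of the block's maximum and minimum, and any additional threshold adjustment that subclause F.3.2 may apply (for example across the luminance blocks of a macroblock) is omitted. Adaptive smoothing itself is left out; the sketch only marks which pixels would be smoothing candidates, namely those whose 3x3 binary neighborhood is uniform.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold determination (simplified): midpoint of the maximum and
 * minimum values of the 8x8 block, which occupies rows/columns 1..8
 * of the 10x10 area. */
static int dering_threshold(uint8_t area[10][10])
{
    int max = 0, min = 255;
    for (int y = 1; y <= 8; y++) {
        for (int x = 1; x <= 8; x++) {
            if (area[y][x] > max) max = area[y][x];
            if (area[y][x] < min) min = area[y][x];
        }
    }
    return (max + min + 1) / 2;
}

/* Index acquisition: binary index is 1 where the pixel is at or above
 * the threshold. The whole 10x10 area is indexed because the 3x3 test
 * below reaches one pixel beyond the 8x8 block. */
static void dering_binary_index(uint8_t area[10][10], int thr,
                                uint8_t bin[10][10])
{
    for (int y = 0; y < 10; y++)
        for (int x = 0; x < 10; x++)
            bin[y][x] = (uint8_t)(area[y][x] >= thr);
}

/* A block pixel (y, x in 1..8) is a smoothing candidate when all nine
 * binary indices of its 3x3 neighborhood are equal (all 0 or all 1). */
static int dering_is_smoothing_candidate(uint8_t bin[10][10], int y, int x)
{
    int sum = 0;
    assert(y >= 1 && y <= 8 && x >= 1 && x <= 8);
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
            sum += bin[y + dy][x + dx];
    return sum == 0 || sum == 9;
}
```

Pixels next to a strong edge get a mixed 3x3 neighborhood and are therefore left unsmoothed in this sketch, which is the intent of the index-based design: ringing in flat regions is smoothed while real edges are preserved.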

1.4 Support for Off-Host VLD Operation Only

Previous DXVA specifications have defined modes of DXVA operation other than off-host VLD operation, such as the DXVA_ModeH264_MoComp_NoFGT and DXVA_ModeH264_IDCT_NoFGT profiles for H.264/AVC, and the DXVA_ModeWMV9_PostProc and DXVA_ModeVC1_IDCT profiles for WMV9/VC-1. Over time, industry interest in supporting such modes seems to have waned. We therefore do not plan to specify such modes for MPEG-4 Part 2 video decoding. This document specifies only off-host VLD operation.

1.5 Uncompressed Surface Memory Requirements

This section describes the minimum number of uncompressed surfaces required for VLD-mode DXVA decoding of MPEG-4 Part 2. If the accelerator supports post-processing by deblocking or deringing, a minimum of four uncompressed surfaces is required:

Two surfaces that contain the originally reconstructed pictures, used as references.
A surface used by the accelerator during the decoding process.
A surface for an out-of-loop post-processed picture that is waiting to be displayed, because of the reordering requirements associated with B pictures.

Drivers are encouraged to support more than this number for better performance.

When neither deblocking nor deringing is supported by the accelerator, a minimum of three uncompressed surfaces is required:

Two surfaces that contain the originally reconstructed pictures, used as references.
A surface used by the accelerator during the decoding process.

Drivers are encouraged to support more than this number for better performance.

Any real decoder application will probably need more surfaces than these minimums, because color conversion, resizing, de-interlacing, and display take place at the same time as decoding.
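The two minimums above can be captured in a small helper. This is a sketch only: the function name is hypothetical, and it assumes the accelerator's post-processing support is read from ConfigDecoderSpecific as described in section 2.2 (bit 0 for deblocking, bit 1 for deringing).

```c
#include <assert.h>

/* Minimum uncompressed surface count for VLD-mode MPEG-4 Part 2
 * decoding, following section 1.5. `config_decoder_specific` carries
 * the post-processing support bits (bit 0: deblocking, bit 1:
 * deringing). Drivers may, and are encouraged to, allocate more. */
static unsigned min_uncompressed_surfaces(unsigned config_decoder_specific)
{
    unsigned surfaces;

    assert((config_decoder_specific & ~3u) == 0); /* other bits must be 0 */

    surfaces = 2   /* two reconstructed reference pictures */
             + 1;  /* surface being decoded by the accelerator */

    if (config_decoder_specific & 3u)
        surfaces += 1; /* post-processed picture held for B-picture reordering */

    return surfaces;
}
```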

1.6 Picture Data

The following data must be sent for each picture. For more information, see section 3, Picture Parameters Data Structure.

short_video_header, which indicates whether the abbreviated header format is used for the picture. The short header format is compatible with the ITU-T Rec. H.263 baseline format.

vop_width and vop_height, which indicate the width and height of the decoded picture.

interlaced. When this flag is 1, it indicates that the video object planes (VOPs) associated with the video object layer (VOL) may use frame/field adaptive coding. When this flag is 0, the VOPs associated with the VOL do not use frame/field adaptive coding.

sprite_enable, no_of_sprite_warping_points, sprite_warping_accuracy, and warping_mv_code, which indicate whether global motion compensation (GMC) is enabled, and the related GMC parameters.

quant_type and quantization matrices, which indicate the quantization method that is used and, if necessary, the corresponding quantization matrices.

wDecodedPictureIndex and wDeblockedPictureIndex, which identify the destination surfaces.

wForwardRefPictureIndex and wBackwardRefPictureIndex, which identify surfaces that contain reference pictures for use in inter-picture prediction processing.

quarter_sample, which indicates whether motion compensation accuracy is half pixel or quarter pixel.

data_partitioned, which indicates whether the motion vector data is separated from the texture data (that is, DCT coefficients) in the bitstream.

reversible_vlc, which indicates whether the reversible variable-length tables are used when decoding transform coefficient values.

vop_coding_type and picture_coding_type, which identify whether a VOP is an intra-coded VOP (I), predictive-coded VOP (P), bidirectionally predictive-coded VOP (B), or sprite-coded VOP (S).

vop_coded, which indicates whether any subsequent data exists for the picture.

vop_rounding_type, which indicates the value of the rounding_control parameter that is used for pixel value interpolation in motion compensation for P- and S(GMC)-VOPs.

intra_dc_vlc_thr, which specifies a threshold value of the quantization scaling factor that is used to switch between two VLC tables for the coding of Intra DC coefficients.

vop_quant, which specifies the quantization scaling factor.

vop_fcode_forward and vop_fcode_backward, which are used for the decoding of motion vectors.



1.7 Buffer Type

The host software decoder sends the following DXVA buffers to the accelerator in the off-host VLD profile:

One picture parameters buffer.
One quantization matrix buffer.
One or more slice/picture control buffers.
One or more bitstream data buffers.

These buffer types are defined in prior DXVA specifications, but new data structures are defined here for MPEG-4 Part 2 video decoding. The sequence of operations is described in section 1.8.

1.8 DXVA Decoding Operations

The basic sequence of operations for DXVA decoding consists of the following calls by the host software decoder. In DXVA 2.0, they are part of the IDirectXVideoDecoder interface.

1. BeginFrame. Signals the start of one or more decoding operations by the accelerator, which will cause the accelerator to write data into an uncompressed surface buffer.

2. BeginFrame. The decoder calls this method again if deblocking or deringing is performed.

3. Execute. The decoder calls Execute one or more times, sending one or more compressed data buffers to the accelerator and specifying the operations to perform on the buffers. The accelerator may return status information from the call. In DXVA 2.0, the command can be specified in the Function member of the optional DXVA2_DecodeExtensionData structure passed to the IDirectXVideoDecoder::Execute method. However, in most cases the command is implied by the type of buffer.

4. EndFrame. Signals that the host software decoder has sent all the data needed for the corresponding BeginFrame call.

5. EndFrame. If the decoder called BeginFrame again in step 2 for post-processing, it calls EndFrame again.
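As an illustration of the ordering only, the sketch below generates the call sequence listed above. It does not call the real IDirectXVideoDecoder methods; the DxvaCall enumeration and build_call_sequence function are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three IDirectXVideoDecoder calls. */
typedef enum { CALL_BEGIN_FRAME, CALL_EXECUTE, CALL_END_FRAME } DxvaCall;

/* Writes the call order of steps 1 to 5 into `out` and returns the
 * number of calls. `post_processing` is nonzero when deblocking or
 * deringing is performed (steps 2 and 5); `num_execute` is the number
 * of Execute calls in step 3 (one or more). */
static size_t build_call_sequence(int post_processing, size_t num_execute,
                                  DxvaCall *out, size_t capacity)
{
    size_t frames = post_processing ? 2 : 1;
    size_t n = 0;

    assert(num_execute >= 1);
    assert(capacity >= 2 * frames + num_execute);

    for (size_t i = 0; i < frames; i++)      /* steps 1 and 2 */
        out[n++] = CALL_BEGIN_FRAME;
    for (size_t i = 0; i < num_execute; i++) /* step 3 */
        out[n++] = CALL_EXECUTE;
    for (size_t i = 0; i < frames; i++)      /* steps 4 and 5 */
        out[n++] = CALL_END_FRAME;
    return n;
}
```

For example, build_call_sequence(1, 2, seq, 8) produces BeginFrame, BeginFrame, Execute, Execute, EndFrame, EndFrame.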

For MPEG-4 Part 2 video decoding, the data passed to the Execute method includes a destination index, which indicates which uncompressed surface buffer is affected by the operation. The host software decoder can call Execute multiple times between each pair of BeginFrame and EndFrame calls.

During the BeginFrame/EndFrame sequence, the accelerator will in some cases access uncompressed surfaces other than the surface being written to. For example, decoding a picture may require data from one or more previously decoded pictures, for use as a reference for inter-picture prediction. If the host software decoder issues a command that requires writing to a buffer, and then issues a command that requires reading from the same buffer, it is the accelerator's responsibility to serialize these operations. In other words, the accelerator must complete a previous write operation before it starts a subsequent read operation on the same buffer.

The DXVA design for MPEG-4 Part 2 video decoding restricts the sequence of buffer types that can be sent to the accelerator. For compressed picture decoding with off-host parsing (that is, the VLD profile), the host software decoder sends the following buffers:

One picture parameters buffer.
One quantization matrix buffer.
One or more slice control buffers.
One or more bitstream data buffers.

For status reporting, the host decoder does not pass any compressed buffers to the accelerator. Instead, it provides an output data buffer where the accelerator writes status information (see sections 1.9 and 6).

Two values of bDXVA_Func are defined:

Value   Description
1       Compressed picture decoding with off-host parsing.
7       Request for status report.

If dwFunction is present, it shall contain exactly one of the two values that are listed here. Function 7 is described in the next section.

Between a single pair of BeginFrame and EndFrame calls, the host software decoder can send one or more sets of buffers with bDXVA_Func equal to 1 for off-host parsing. The total quantity of data in any bitstream data buffer (and the quantity of data reported by the host software decoder) shall be an integer multiple of 128 bytes.
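A host decoder can satisfy the 128-byte rule by rounding each bitstream buffer size up and padding the tail. The sketch below assumes zero-byte padding, which this section does not itself specify; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rounds a bitstream payload size up to the next multiple of 128 bytes. */
static size_t dxva_padded_size(size_t payload_bytes)
{
    return (payload_bytes + 127u) & ~(size_t)127u;
}

/* Zero-fills `buf` from `payload_bytes` up to the padded size and returns
 * the size to report to the accelerator, or 0 if `capacity` is too small.
 * Zero padding is an assumption of this sketch. */
static size_t dxva_pad_bitstream(unsigned char *buf, size_t capacity,
                                 size_t payload_bytes)
{
    size_t padded = dxva_padded_size(payload_bytes);

    assert(buf != NULL);
    if (padded > capacity)
        return 0;
    memset(buf + payload_bytes, 0, padded - payload_bytes);
    return padded;
}
```

With this helper, a 130-byte payload is reported as 256 bytes, and a payload that is already a multiple of 128 bytes is left unchanged.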

Whenever the host software decoder calls Execute to pass a set of compressed buffers to the accelerator, the private output data pointer shall be NULL. When the NumCompBuffers member of the DXVA2_DecodeExecuteParams structure is greater than zero, pPrivateOutputData shall be NULL and PrivateOutputDataSize shall be zero. (Alternatively, the pExtensionData member of the DXVA2_DecodeExecuteParams structure can be NULL.)
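The rule above, together with the status-report usage described in section 1.9, can be expressed as a check on the Execute parameters. The two structures in this sketch are simplified, hypothetical stand-ins that model only the members named in this section; they are not the real DXVA2_DecodeExecuteParams and DXVA2_DecodeExtensionData layouts, and execute_params_valid is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the extension data (not the real layout). */
typedef struct {
    unsigned Function;              /* 1 = decode, 7 = status report */
    void    *pPrivateOutputData;
    unsigned PrivateOutputDataSize;
} ExtensionDataSketch;

/* Simplified stand-in for the Execute parameters (not the real layout). */
typedef struct {
    unsigned             NumCompBuffers;
    ExtensionDataSketch *pExtensionData;  /* may be NULL */
} ExecuteParamsSketch;

/* Returns 1 if the parameters follow the rules of sections 1.8 and 1.9. */
static int execute_params_valid(const ExecuteParamsSketch *p)
{
    const ExtensionDataSketch *ext;

    assert(p != NULL);
    ext = p->pExtensionData;

    if (p->NumCompBuffers > 0) {
        /* Decoding call: no private output buffer is allowed. */
        if (ext == NULL)
            return 1;
        return ext->Function == 1 && ext->pPrivateOutputData == NULL &&
               ext->PrivateOutputDataSize == 0;
    }

    /* Status report: no compressed buffers, but an output buffer. */
    return ext != NULL && ext->Function == 7 &&
           ext->pPrivateOutputData != NULL && ext->PrivateOutputDataSize > 0;
}
```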

1.9 Status Reporting

After the host software decoder calls EndFrame for the uncompressed destination surfaces, it may call Execute with bDXVA_Func = 7 to get a status report. The host software decoder does not pass any compressed buffers to the accelerator in this call. Instead, the decoder provides a private output data buffer where the accelerator will write status information.

In DXVA 2.0, the host software decoder provides the output data buffer as follows: The decoder sets the pPrivateOutputData member of the DXVA2_DecodeExtensionData structure (referenced by the pExtensionData member of the DXVA2_DecodeExecuteParams structure) to point to the buffer. The PrivateOutputDataSize member specifies the maximum amount of data that the accelerator should write to the buffer. The value of PrivateOutputDataSize shall be an integer multiple of sizeof(DXVA_Status_VC1).

When the accelerator receives the Execute call for status reporting, it should not stall operation to wait for any prior operations to be completed. Instead, it should immediately provide the available status information for all operations that have completed since the previous request for a status report, up to the maximum amount requested.

Immediately after the Execute call returns, the host software decoder can read the status report information from the buffer. The status report data structure is described in section 6.
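On the accelerator side, the no-stall rule amounts to copying out whatever reports are already complete, bounded by PrivateOutputDataSize. In the sketch below, the record type and the queue are hypothetical placeholders (the real status structure is defined in section 6); the queue is assumed to contain only pictures whose decoding has finished, so nothing ever waits on work in progress.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder status record; the real layout is defined in section 6. */
typedef struct {
    unsigned short StatusReportFeedbackNumber; /* assumed to echo the picture's value */
    unsigned char  status_code;                /* hypothetical field */
} StatusRecordSketch;

/* Hypothetical queue of reports for completed, not-yet-reported pictures. */
typedef struct {
    StatusRecordSketch items[32];
    size_t count;
} CompletedQueueSketch;

/* Copies as many completed reports as fit in `out_size` bytes, oldest
 * first, without waiting for pictures still being decoded. Reports that
 * do not fit stay queued for the next request. Returns records written. */
static size_t report_status(CompletedQueueSketch *q, void *out, size_t out_size)
{
    size_t max_records, n;

    assert(q != NULL && out != NULL);
    /* PrivateOutputDataSize is a multiple of the record size. */
    assert(out_size % sizeof(StatusRecordSketch) == 0);

    max_records = out_size / sizeof(StatusRecordSketch);
    n = q->count < max_records ? q->count : max_records;

    memcpy(out, q->items, n * sizeof(StatusRecordSketch));
    memmove(q->items, q->items + n,
            (q->count - n) * sizeof(StatusRecordSketch));
    q->count -= n;
    return n;
}
```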

2. Configuration Parameters

This section describes the configuration parameters for MPEG-4 Part 2 video decoding according to this specification.



2.1 Syntax

In DXVA 2.0, configuration uses the DXVA2_ConfigPictureDecode structure. This syntax structure is documented in the DXVA 2.0 documentation (http://msdn.microsoft.com/en-us/library/ms694823(VS.85).aspx).

2.2 Semantics

The ordinary semantics of this data structure apply for MPEG-4 Part 2 video decoding according to this specification. Details of the usage in this context follow.

ConfigBitstreamRaw

Shall be 1, because only off-host VLD parsing profiles are supported by this specification.

ConfigMBcontrolRasterOrder

Shall be 0, because ConfigBitstreamRaw is always equal to 1.

ConfigResidDiffHost

Shall be 0.

ConfigSpatialResid8

Shall be 0, because ConfigResidDiffHost is always equal to 0.

ConfigResid8Subtraction

Shall be 0, because ConfigSpatialResid8 is always equal to 0.

ConfigSpatialHost8or9Clipping


ConfigSpatialResidInterleaved


ConfigIntraResidUnsigned


ConfigResidDiffAccelerator

Shall be 0.

ConfigHostInverseScan

Shall be 0, because ConfigResidDiffAccelerator is always equal to 0.

ConfigSpecificIDCT


Config4GroupedCoefs


ConfigDecoderSpecific

Contains information about the recommended levels of deblocking and deringing. Starting with the least significant bit (LSB), the following bits are used:

Bit 0 specifies the out-of-loop deblocking algorithm:
0b: No out-of-loop deblocking support.
1b: Deblocking support as defined in subclause F.3.1 of the MPEG-4 Part 2 specification.

Bit 1 specifies the out-of-loop deringing algorithm:
0b: No deringing support.
1b: Deringing support as defined in subclause F.3.2 of the MPEG-4 Part 2 specification.

Support of the deblocking and deringing features by an accelerator is optional. The remaining bits of ConfigDecoderSpecific shall always be set to 0.

In DXVA 2.0, the accelerator provides an initial list of supported configurations in decreasing order of preference, and the host software decoder picks one entry from this list. The host software decoder recognizes four values of ConfigDecoderSpecific: 00b, 01b, 10b, and 11b.

If the accelerator supports deblocking filtering or deringing filtering, the supported filtering shall conform to the specifications in subclauses F.3.1 and F.3.2 of the MPEG-4 Part 2 specification.

3. Picture Parameters Data Structure

The DXVA_PicParams_MPEG4_PART2 structure provides the picture-level parameters of a compressed picture for MPEG-4 Part 2 video decoding. This structure is used when bDXVA_Func is 1 and the buffer type is DXVA2_PictureParametersBufferType.

3.2 Syntax

typedef struct _DXVA_PicParams_MPEG4_PART2 {
    UCHAR  short_video_header;
    UCHAR  vop_coding_type;
    UCHAR  vop_quant;
    WORD   wDecodedPictureIndex;
    WORD   wDeblockedPictureIndex;
    WORD   wForwardRefPictureIndex;
    WORD   wBackwardRefPictureIndex;
    USHORT vop_time_increment_resolution;
    UINT   TRB[2];
    UINT   TRD[2];
    union {
        struct {
            USHORT unPicPostProc : 2;
            USHORT interlaced : 1;
            USHORT quant_type : 1;
            USHORT quarter_sample : 1;
            USHORT resync_marker_disable : 1;
            USHORT data_partitioned : 1;
            USHORT reversible_vlc : 1;
            USHORT reduced_resolution_vop_enable : 1;
            USHORT vop_coded : 1;
            USHORT vop_rounding_type : 1;
            USHORT intra_dc_vlc_thr : 3;
            USHORT top_field_first : 1;
            USHORT alternate_vertical_scan_flag : 1;
        };
        USHORT wPicFlagBitFields;
    };
    UCHAR  profile_and_level_indication;
    UCHAR  video_object_layer_verid;
    WORD   vop_width;
    WORD   vop_height;
    union {
        struct {
            USHORT sprite_enable : 2;
            USHORT no_of_sprite_warping_points : 6;
            USHORT sprite_warping_accuracy : 2;
        };
        USHORT wSpriteBitFields;
    };
    SHORT  warping_mv[4][2];
    union {
        struct {
            UCHAR vop_fcode_forward : 3;
            UCHAR vop_fcode_backward : 3;
        };
        UCHAR wFcodeBitFields;
    };
    USHORT StatusReportFeedbackNumber;
    USHORT Reserved16BitsA;
    USHORT Reserved16BitsB;
} DXVA_PicParams_MPEG4_PART2, *LPDXVA_PicParams_MPEG4_PART2;
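The sketch below shows one way a host decoder might fill this structure for a short-header (H.263 baseline) I-VOP with no post-processing. It repeats the declaration so that it compiles on its own, with stand-in typedefs for the Windows type names when <windows.h> is not available. The helper name fill_short_header_intra is hypothetical, and zeroing every member that the semantics in section 3.3 do not set for this case is a choice of this sketch.

```c
#include <assert.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins for the Windows data types used by the declaration. */
typedef unsigned char  UCHAR;
typedef unsigned short WORD;
typedef unsigned short USHORT;
typedef unsigned int   UINT;
typedef short          SHORT;
#endif

typedef struct _DXVA_PicParams_MPEG4_PART2 {
    UCHAR short_video_header, vop_coding_type, vop_quant;
    WORD wDecodedPictureIndex, wDeblockedPictureIndex;
    WORD wForwardRefPictureIndex, wBackwardRefPictureIndex;
    USHORT vop_time_increment_resolution;
    UINT TRB[2];
    UINT TRD[2];
    union {
        struct {
            USHORT unPicPostProc : 2, interlaced : 1, quant_type : 1,
                   quarter_sample : 1, resync_marker_disable : 1,
                   data_partitioned : 1, reversible_vlc : 1,
                   reduced_resolution_vop_enable : 1, vop_coded : 1,
                   vop_rounding_type : 1, intra_dc_vlc_thr : 3,
                   top_field_first : 1, alternate_vertical_scan_flag : 1;
        };
        USHORT wPicFlagBitFields;
    };
    UCHAR profile_and_level_indication, video_object_layer_verid;
    WORD vop_width, vop_height;
    union {
        struct {
            USHORT sprite_enable : 2, no_of_sprite_warping_points : 6,
                   sprite_warping_accuracy : 2;
        };
        USHORT wSpriteBitFields;
    };
    SHORT warping_mv[4][2];
    union {
        struct { UCHAR vop_fcode_forward : 3, vop_fcode_backward : 3; };
        UCHAR wFcodeBitFields;
    };
    USHORT StatusReportFeedbackNumber, Reserved16BitsA, Reserved16BitsB;
} DXVA_PicParams_MPEG4_PART2, *LPDXVA_PicParams_MPEG4_PART2;

/* Hypothetical helper: parameters for a short-header I-VOP. */
static void fill_short_header_intra(DXVA_PicParams_MPEG4_PART2 *pp,
                                    WORD width, WORD height,
                                    UCHAR vop_quant, WORD dest_index)
{
    assert(pp != NULL);
    assert(vop_quant >= 1 && vop_quant <= 31);  /* 8-bit video only */

    /* interlaced, sprite_enable, quant_type, vop_time_increment_resolution,
     * TRB and TRD shall all be 0 when short_video_header is 1. */
    memset(pp, 0, sizeof *pp);

    pp->short_video_header = 1;
    pp->vop_coding_type = 0;                 /* I-VOP */
    pp->vop_quant = vop_quant;
    pp->wDecodedPictureIndex = dest_index;
    pp->wDeblockedPictureIndex = dest_index; /* unPicPostProc is 0 */
    pp->wForwardRefPictureIndex = 0xFFFF;    /* intra: no reference */
    pp->wBackwardRefPictureIndex = 0xFFFF;
    pp->vop_width = width;
    pp->vop_height = height;
}
```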

3.3 Semantics

If the short_video_header member is 1, the following members of the DXVA_PicParams_MPEG4_PART2 structure are important to the accelerator: short_video_header, vop_coding_type, vop_quant, wDecodedPictureIndex, wForwardRefPictureIndex, unPicPostProc, vop_width, and vop_height. The accelerator may ignore the remaining structure members, unless otherwise stated.

short_video_header

May be 0 or 1. Set to 1 when the abbreviated header format is used, which provides compatibility with the baseline form of ITU-T Recommendation H.263, as specified in the MPEG-4 Part 2 specification. Otherwise, this member is equal to 0.

vop_coding_type

If short_video_header is 1, this structure member may be 0 or 1:

Value   Description
0       Intra-coded VOP (I).
1       Predictive-coded VOP (P).

If short_video_header is 0, this structure member may be 0, 1, or 2:

Value   Description
0       Intra-coded VOP (I).
1       Predictive-coded VOP (P).
2       Bidirectionally predictive-coded VOP (B).



The value shall not be 3, because the Simple Profile and Advanced Simple Profile do not use the Sprite (S)-VOP visual tool, as specified in Table 9-1 and Table 9-4 of the MPEG-4 Part 2 specification.

vop_quant

Specifies the absolute value of the quantizer scale factor to be used for inverse quantization, until updated by a subsequent dquant, dbquant, or quant_scale. Because the supported bit depth in DXVA of MPEG-4 Part 2 video decoding is always 8 when using this specification, this structure member may only contain values from 1 to 31, inclusive.

wDecodedPictureIndex

Specifies the index of the destination frame buffer for the decoded picture.

wDeblockedPictureIndex

Specifies the index of the destination frame buffer for the deblocked or deringed output picture. If both deblocking and deringing are unsupported (bits 0 and 1 of the ConfigDecoderSpecific member of DXVA2_ConfigPictureDecode are zero), or if deblocking and deringing are not necessary for the decoded picture (unPicPostProc is 0), the value of this member may be the same as wDecodedPictureIndex.

wForwardRefPictureIndex

The picture to be used as a reference picture for forward prediction of the current picture. The value is given as the index of the frame buffer for the reference picture. The value must differ from wDecodedPictureIndex. If vop_coding_type is 0 (intra-coded VOP), the value must be 0xFFFF.

wBackwardRefPictureIndex

The picture to be used as a reference picture for backward prediction of the current picture. The value is given as the index of the frame buffer for the reference picture. If backward reference motion prediction is used, the value must differ from wDecodedPictureIndex. This member has no meaning and must be 0xFFFF if vop_coding_type equals 0 for an intra-coded VOP (I) or equals 1 for a predictive-coded VOP (P).
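The constraints on vop_coding_type, vop_quant, and the two reference indices can be gathered into one consistency check. In this sketch the function name and the NO_REFERENCE macro are hypothetical, and treating a P- or B-VOP without a forward reference as invalid is an assumption of the sketch rather than an explicit rule in the text above.

```c
#include <assert.h>

#define NO_REFERENCE 0xFFFFu  /* hypothetical name for the 0xFFFF value */

/* Returns 1 if vop_coding_type, vop_quant and the reference indices are
 * mutually consistent according to section 3.3, 0 otherwise. */
static int pic_params_consistent(unsigned short_video_header,
                                 unsigned vop_coding_type, unsigned vop_quant,
                                 unsigned decoded, unsigned fwd, unsigned bwd)
{
    assert(decoded != NO_REFERENCE); /* the target must be a real surface */

    /* I = 0, P = 1, B = 2; no B with the short header; 3 is never valid. */
    if (vop_coding_type > (short_video_header ? 1u : 2u))
        return 0;
    if (vop_quant < 1 || vop_quant > 31)      /* 8-bit video only */
        return 0;

    if (vop_coding_type == 0)                 /* I-VOP: no references */
        return fwd == NO_REFERENCE && bwd == NO_REFERENCE;

    /* Assumption: P- and B-VOPs need a forward reference distinct
     * from the destination surface. */
    if (fwd == NO_REFERENCE || fwd == decoded)
        return 0;

    if (vop_coding_type == 1)                 /* P-VOP: no backward reference */
        return bwd == NO_REFERENCE;

    /* B-VOP: a backward reference, if used, differs from the target. */
    return bwd != decoded;
}
```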

vop_time_increment_resolution

A 16-bit unsigned integer that indicates the number of evenly spaced subintervals, called ticks, within one modulo time. One modulo time represents the fixed interval of one second. The value zero is forbidden. This parameter may not be necessary for picture decoding in any of the profiles defined in this DXVA specification, but might be needed if the accelerator parses the parameters in the Video Object Plane (VOP) header, such as vop_time_increment. When short_video_header is equal to 1, this parameter shall be equal to 0.
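An accelerator that parses the VOP header needs vop_time_increment_resolution to know how many bits vop_time_increment occupies. Based on our reading of the MPEG-4 Part 2 VOL header semantics, that is the minimum number of bits able to represent vop_time_increment_resolution - 1, with a minimum of one bit; the sketch below assumes exactly that, and its function name is hypothetical.

```c
#include <assert.h>

/* Number of bits used to code vop_time_increment in a VOP header,
 * assumed to be the minimum number of bits that can represent
 * vop_time_increment_resolution - 1, and never less than 1. */
static unsigned vop_time_increment_bits(unsigned short resolution)
{
    unsigned value, bits = 0;

    assert(resolution != 0);      /* the value zero is forbidden */
    value = resolution - 1u;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits != 0 ? bits : 1u;
}
```

For example, a resolution of 30000 ticks per second gives 15 bits, and a resolution of 25 gives 5 bits.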



TRB, TRD

TRB contains the difference between the temporal reference of the current B-VOP and that of the previous reference VOP. TRD contains the difference between the temporal references of the temporally next reference VOP and the temporally previous reference VOP, with B-VOPs or skipped VOPs in between. The temporal references TRB[i] and TRD[i], where i equals 0 for the top field of the B-VOP and 1 for the bottom field of the B-VOP, are distances in time expressed in field periods. Figure 7-48 in section 7.7.2.2 of the MPEG-4 Part 2 specification shows how they are defined for the case where i is 0 (top field of the B-VOP). When the current VOP is not a B-VOP, the accelerator may ignore these arrays. If short_video_header is 1 (and B-VOPs are therefore not supported), both arrays shall be equal to 0. For a progressive B-VOP, only TRB[0] and TRD[0] shall be used, and the accelerator shall ignore TRB[1] and TRD[1].
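TRB and TRD matter chiefly for direct-mode macroblocks in B-VOPs, where the forward and backward vectors are scaled from the co-located vector of the next reference VOP. The sketch below follows our reading of the MPEG-4 Part 2 direct-mode equations for a progressive B-VOP, so only TRB[0] and TRD[0] are used; the interlaced (field) case is not covered, and the type and function names are hypothetical. The standard's integer division truncates toward zero, which matches the C `/` operator.

```c
#include <assert.h>

typedef struct { int x, y; } MotionVector;

/* One vector component: mv is the co-located vector from the next
 * reference VOP, mvd the delta vector coded in the B-VOP.
 *   MVF = (TRB * MV) / TRD + MVD
 *   MVB = (MVD == 0) ? ((TRB - TRD) * MV) / TRD : MVF - MV        */
static void direct_mode_component(int mv, int mvd, int trb, int trd,
                                  int *mvf, int *mvb)
{
    assert(trd != 0);
    *mvf = (trb * mv) / trd + mvd;
    *mvb = (mvd == 0) ? ((trb - trd) * mv) / trd : *mvf - mv;
}

/* Progressive B-VOP: the accelerator uses only TRB[0] and TRD[0]. */
static void direct_mode_vectors(MotionVector mv, MotionVector mvd,
                                const unsigned trb[2], const unsigned trd[2],
                                MotionVector *fwd, MotionVector *bwd)
{
    direct_mode_component(mv.x, mvd.x, (int)trb[0], (int)trd[0],
                          &fwd->x, &bwd->x);
    direct_mode_component(mv.y, mvd.y, (int)trb[0], (int)trd[0],
                          &fwd->y, &bwd->y);
}
```

For instance, with MV = 8, MVD = 0, TRB[0] = 1 and TRD[0] = 2, the forward component is 4 and the backward component is -4: the B-VOP lies halfway between its two references, so the co-located motion is split evenly.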

unPicPostProc

Specifies the use of deblocking or deringing. The allowed values of unPicPostProc

are constrained by the value of the ConfigDecoderSpecific member of the

DXVA2_ConfigPictureDecode structure.

Starting with the least significant bit (LSB), bits 0 and 1 specify the deblocking or

deringing filtering to be applied. The bits are interpreted as follows:

Bit 0: deblocking filter specified in subclause F.3.1 of the MPEG-4 Part 2

specification.

Bit 1: deringing filter specified in subclause F.3.2 of the MPEG-4 Part 2

specification.

The following table shows the valid values of this parameter:

Value Description

00b No post-filtering.

01b Deblocking filter, as specified in subclause F.3.1 of the MPEG-4 Part 2 specification.

10b Deringing filter, as specified in subclause F.3.2 of the MPEG-4 Part 2 specification.

11b Deblocking filter and deringing filter.
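Because the two bits of unPicPostProc are independent, an accelerator can test each filter separately. A minimal sketch of that bit interpretation; the macro and function names are hypothetical and are not defined in any DXVA header.

```c
#include <stdbool.h>

/* Hypothetical masks for the unPicPostProc bits described above. */
#define MPEG4PT2_POSTPROC_DEBLOCK 0x1u /* bit 0: deblocking, subclause F.3.1 */
#define MPEG4PT2_POSTPROC_DERING  0x2u /* bit 1: deringing, subclause F.3.2 */

static bool postproc_deblocking(unsigned unPicPostProc)
{
    return (unPicPostProc & MPEG4PT2_POSTPROC_DEBLOCK) != 0;
}

static bool postproc_deringing(unsigned unPicPostProc)
{
    return (unPicPostProc & MPEG4PT2_POSTPROC_DERING) != 0;
}

/* True when any out-of-loop post-filter is requested (01b, 10b, or 11b). */
static bool postproc_any(unsigned unPicPostProc)
{
    return (unPicPostProc &
            (MPEG4PT2_POSTPROC_DEBLOCK | MPEG4PT2_POSTPROC_DERING)) != 0;
}
```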

interlaced

If 1, indicates that the VOPs associated with the VOL may be coded by using

adaptive frame/field coding. Otherwise, if equal to 0, the VOPs associated with the

VOL do not use adaptive frame/field coding. If short_video_header is 1, interlaced

shall be 0, as specified in Table 6-28 of the MPEG-4 Part 2 specification.

sprite_enable

To support Simple and Advanced Simple Profiles, if video_object_layer_verid is

0001b, sprite_enable is a one-bit flag. If the flag is 1, static (basic or low latency)

sprite coding is used. Otherwise, if the flag is 0, static sprite coding is not used.

If video_object_layer_verid is 0101b, sprite_enable is a two-bit unsigned integer

that indicates the usage of static sprite coding or global motion compensation

(GMC). Table 6-19 of the MPEG-4 Part 2 specification specifies the semantics of

this value.

If short_video_header is 1, sprite_enable shall be 0.
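The meaning of sprite_enable therefore depends on video_object_layer_verid. The sketch below maps both cases to one hypothetical enumeration; the two-bit values follow our reading of Table 6-19 of the MPEG-4 Part 2 specification (00b not used, 01b static, 10b GMC, 11b reserved) and should be verified against that table.

```c
/* Hypothetical enumeration (not a DXVA type). */
typedef enum {
    SPRITE_NOT_USED,
    SPRITE_STATIC,   /* static sprite (basic or low latency) */
    SPRITE_GMC,      /* global motion compensation */
    SPRITE_RESERVED  /* reserved, invalid, or unsupported verid */
} SpriteMode;

static SpriteMode interpret_sprite_enable(unsigned video_object_layer_verid,
                                          unsigned sprite_enable)
{
    if (video_object_layer_verid == 0x1) { /* 0001b: one-bit flag */
        switch (sprite_enable) {
        case 0: return SPRITE_NOT_USED;
        case 1: return SPRITE_STATIC;
        default: return SPRITE_RESERVED;
        }
    }
    if (video_object_layer_verid == 0x5) { /* 0101b: two-bit value */
        switch (sprite_enable) {
        case 0: return SPRITE_NOT_USED;
        case 1: return SPRITE_STATIC;
        case 2: return SPRITE_GMC;
        default: return SPRITE_RESERVED;
        }
    }
    return SPRITE_RESERVED; /* verid not supported by this specification */
}
```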



quant_type

If 1, the first inverse quantization method is used for inverse quantization of the DCT coefficients. If 0, the second inverse quantization method is used. Both inverse quantization methods are described in subclause 7.4.4 of the MPEG-4 Part 2 specification. For the first inverse quantization method, two matrices are used: one for intra blocks and the other for non-intra blocks. If short_video_header is 1,

quant_type shall be 0, as specified in Table 6-28 of the MPEG-4 Part 2

specification.
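The second method does not use quantization matrices at all, which is why it needs no custom matrix data. A sketch of that method for one coefficient other than the intra DC coefficient, as we read subclause 7.4.4 of MPEG-4 Part 2; the function name is hypothetical, and the saturation step that the standard applies afterwards is omitted.

```c
#include <stdlib.h>

/* Second inverse quantization method (quant_type == 0), one coefficient,
 * excluding intra DC. qf is the decoded quantized level. */
static int inverse_quant_method2(int qf, int quantiser_scale)
{
    if (qf == 0)
        return 0;
    int magnitude = (2 * abs(qf) + 1) * quantiser_scale;
    if ((quantiser_scale & 1) == 0)
        magnitude -= 1; /* even quantiser_scale */
    return qf < 0 ? -magnitude : magnitude;
}
```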

quarter_sample

If 0, half-sample mode shall be used for motion compensation. If 1, quarter-sample

mode shall be used for motion compensation of the luminance component. If

short_video_header is 1, quarter_sample shall be 0.

resync_marker_disable

If 1, there are no resync_marker syntax elements in coded VOPs. This flag is only

useful for optimizing the accelerator operation. The accelerator can successfully

perform decoding without considering this flag. If short_video_header is 1,

resync_marker_disable shall be 1, as specified in Table 6-28 of the MPEG-4 Part

2 specification.

data_partitioned

If 1, the macroblock data is rearranged. Specifically, motion vector data is separated from the texture data (that is, DCT coefficients). If short_video_header is 1,

data_partitioned shall be 0, as specified in Table 6-28 of the MPEG-4 Part 2

specification.

reversible_vlc

If 1, the reversible variable-length tables (Tables B-23, B-24, and B-25 in the MPEG-

4 Part 2 specification) should be used when decoding DCT coefficients. These

tables can be used only when the data_partitioned flag is enabled. The

reversible_vlc flag shall be 0 for B-VOPs. Use of an escape sequence (Tables B-

24 and B-25 in the MPEG-4 Part 2 specification) for encoding the combinations

listed in Table B-23 is prohibited. If short_video_header is 1, reversible_vlc shall

be 0, as specified in Table 6-28 of the MPEG-4 Part 2 specification.

reduced_resolution_vop_enable

If 1, the reduced resolution VOP tool is enabled. If video_object_layer_verid is

0001b, and therefore reduced_resolution_vop_enable is not transmitted, this flag

takes a default value of zero. This flag is not used for the picture decoding process of any profile defined in this DXVA specification, but might be needed if the accelerator parses VOP headers. If short_video_header is equal to 1,

reduced_resolution_vop_enable shall be 0.

vop_coded

If 0, no subsequent data exists for the VOP. In this case, the following decoding

rules apply: Because video_object_layer_shape always equals "rectangular," the

luminance and chrominance planes of the reconstructed VOP shall be filled with the

forward reference VOP, as defined in subclause 7.6.7 of the MPEG-4 Part 2

specification. If short_video_header is 1, vop_coded shall be 1, as specified in

Table 6-28 of the MPEG-4 Part 2 specification.



vop_rounding_type

Specifies the value of the rounding_control parameter, which is used for pixel value interpolation in motion compensation for P-VOPs and S(GMC)-VOPs. If this flag is 0, the value of rounding_control is 0; if this flag is 1, the value of rounding_control is 1.

When the corresponding syntax element is not present in the VOP header, the value

of rounding_control is set to 0, and the value of vop_rounding_type shall be 0. If

short_video_header is 1, vop_rounding_type shall be 0, as specified in Table 6-

28 of the MPEG-4 Part 2 specification.
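rounding_control changes only the rounding offset of half-sample interpolation. A sketch, as we read the half-sample formulas of MPEG-4 Part 2, where a, b, c, and d are neighbouring integer-position samples; the function names are hypothetical.

```c
#include <stdint.h>

/* Horizontal or vertical half-sample position between a and b. */
static uint8_t half_sample_1d(uint8_t a, uint8_t b, unsigned rounding_control)
{
    return (uint8_t)((a + b + 1u - rounding_control) >> 1);
}

/* Centre half-sample position between four neighbours. */
static uint8_t half_sample_2d(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                              unsigned rounding_control)
{
    return (uint8_t)((a + b + c + d + 2u - rounding_control) >> 2);
}
```

With rounding_control equal to 1, ties round down instead of up, which lets encoders alternate vop_rounding_type to limit drift.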

intra_dc_vlc_thr

Specifies a threshold value of the quantizer scale used to switch between two VLCs for coding of intra DC coefficients, as per Table 6-25 of the MPEG-4 Part 2 specification. If short_video_header is 1, intra_dc_vlc_thr shall be 0.
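The threshold can be applied with a small lookup table. A sketch of our reading of Table 6-25 (value 0: always the intra DC VLC; values 1 to 6: switch to the intra AC VLC at running QP of 13, 15, 17, 19, 21, or 23 and above; value 7: always the intra AC VLC). The function name and the handling of out-of-range values are illustrative choices.

```c
#include <stdbool.h>

/* Returns true when intra DC coefficients use the intra DC VLC, false when
 * they are coded with the intra AC VLC. running_qp is the running quantizer
 * value as defined by MPEG-4 Part 2 (1..31). */
static bool use_intra_dc_vlc(unsigned intra_dc_vlc_thr, unsigned running_qp)
{
    /* 32 exceeds any 5-bit QP (always DC VLC); 0 is never exceeded. */
    static const unsigned switch_qp[8] = { 32, 13, 15, 17, 19, 21, 23, 0 };
    if (intra_dc_vlc_thr > 7)
        return false; /* invalid value; illustrative choice */
    return running_qp < switch_qp[intra_dc_vlc_thr];
}
```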

top_field_first

If 1, indicates that the top field (that is, the field that contains the top line) of the reconstructed VOP is the first field to be output by the decoding process and displayed. When 0, the bottom field of the reconstructed VOP is the first field to be displayed. This flag might be used for internal optimizations in the accelerator. For example, some post-processing might be performed after the decoding of both fields is completed. If short_video_header is 1, top_field_first shall be 0.

alternate_vertical_scan_flag

The value 1 indicates that alternate vertical scan for interlaced VOPs will be used. If

short_video_header is 1, alternate_vertical_scan_flag shall be 0.

profile_and_level_indication

Specifies the profile and level identification. Only Simple and Advanced Simple

Profiles are supported by this specification.

video_object_layer_verid

Specifies the version number of the video object layer. The meaning is defined in

Table 6-13 of the MPEG-4 Part 2 specification. If both the visual_object_verid and

video_object_layer_verid syntax elements are present in the bitstream, the value of

video_object_layer_verid supersedes the value of visual_object_verid. If the

corresponding syntax element is not present in the bitstream, the value of this

structure member shall be equal to the value of the visual_object_verid syntax

element in the bitstream. Because only Simple and Advanced Simple Profiles are

supported by this specification, the only supported values for

video_object_layer_verid are 0001b and 0101b. If short_video_header is 1, this

parameter shall be 0001b.
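The rules above for deriving video_object_layer_verid can be expressed as a short host-side helper. A sketch; the function names and the *_present flag are hypothetical stand-ins for whatever the host parser records.

```c
#include <stdbool.h>

/* Derive the video_object_layer_verid structure member. */
static unsigned effective_vol_verid(bool vol_verid_present, unsigned vol_verid,
                                    unsigned visual_object_verid,
                                    bool short_video_header)
{
    if (short_video_header)
        return 0x1; /* 0001b is required with short headers */
    /* The VOL value supersedes visual_object_verid when present. */
    return vol_verid_present ? vol_verid : visual_object_verid;
}

/* Only 0001b and 0101b are supported by this specification. */
static bool vol_verid_supported(unsigned verid)
{
    return verid == 0x1 || verid == 0x5;
}
```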

vop_width

Specifies the width of the displayable part of the luminance component, in pixel

units. The width of the encoded luminance component of VOPs in macroblocks is

(vop_width + 15) / 16. The displayable part is left-aligned in the encoded VOPs.

The value shall not be zero. If short_video_header is 1, the value of vop_width

shall be set according to the value of the source_format syntax element in the video

bitstream, as specified in Table 6-29 of the MPEG-4 Part 2 specification.

vop_height

Specifies the height of the displayable part of the luminance component, in pixel

units. The height of the encoded luminance component of VOPs in macroblocks is

(vop_height + 15) / 16. The displayable part is top-aligned in the encoded VOPs.

The value shall not be zero. If short_video_header is 1, the value of vop_height

shall be set according to the value of the source_format syntax element in the video

bitstream, as specified in Table 6-29 of the MPEG-4 Part 2 specification.
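The macroblock counts and the short-header picture sizes can be computed as follows. A sketch; the source_format mapping reflects our reading of Table 6-29 of MPEG-4 Part 2 (001b sub-QCIF through 101b 16CIF), and the function names are hypothetical.

```c
#include <stdbool.h>

/* Encoded size in macroblocks, per vop_width/vop_height above. */
static unsigned size_in_mbs(unsigned pixels)
{
    return (pixels + 15u) / 16u;
}

/* short_video_header == 1: luminance size implied by source_format.
 * Returns false for forbidden or reserved values. */
static bool short_header_dimensions(unsigned source_format,
                                    unsigned *vop_width, unsigned *vop_height)
{
    switch (source_format) {
    case 1: *vop_width = 128;  *vop_height = 96;   return true; /* sub-QCIF */
    case 2: *vop_width = 176;  *vop_height = 144;  return true; /* QCIF */
    case 3: *vop_width = 352;  *vop_height = 288;  return true; /* CIF */
    case 4: *vop_width = 704;  *vop_height = 576;  return true; /* 4CIF */
    case 5: *vop_width = 1408; *vop_height = 1152; return true; /* 16CIF */
    default: return false;
    }
}
```

For example, a 720 x 480 VOL is coded as 45 x 30 macroblocks, and a displayable height of 1080 is coded in 68 macroblock rows.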



no_of_sprite_warping_points

Specifies the number of points used in sprite warping. If the value is 0 and

sprite_enable is set to “static” or “GMC,” warping is an identity operation (stationary

sprite) and no coordinates are coded. If the value is 1, 2 or 3, an affine transform is

used. The value 1 is treated as a special case distinct from the usage associated

with values 2 or 3. Table 6-20 of the MPEG-4 Part 2 specification lists the various

choices. If sprite_enable is 0, no_of_sprite_warping_points shall be 0.

Note The value 4 is disallowed by the MPEG-4 Part 2 specification when

sprite_enable equals “GMC.”

sprite_warping_accuracy

Specifies the quantization accuracy of motion vectors used in the warping process

for sprites and GMC. Table 6-21 of the MPEG-4 Part 2 specification specifies the

meaning of various code words. If sprite_enable is 0, sprite_warping_accuracy

shall be 0.
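sprite_warping_accuracy selects a power-of-two sample accuracy for the warping vectors. A sketch of our reading of Table 6-21 (0: 1/2 sample, 1: 1/4, 2: 1/8, 3: 1/16), returning the denominator; the function name is hypothetical.

```c
/* Returns 2, 4, 8, or 16, or 0 for an out-of-range value. */
static unsigned warping_accuracy_denominator(unsigned sprite_warping_accuracy)
{
    if (sprite_warping_accuracy > 3)
        return 0;
    return 2u << sprite_warping_accuracy;
}
```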

warping_mv

Specifies differential motion vectors for implied warping. The number of differential

motion vectors for implied warping may be 0, 1, 2 or 3, as specified by

no_of_sprite_warping_points. The value of warping_mv[i][0] specifies du[i], and

the value of warping_mv[i][1] specifies dv[i]. Entries in this array that are not

represented in the bitstream shall be equal to 0.

vop_fcode_forward

Corresponds to the MPEG-4 Part 2 syntax element of the same name. The value

can range from 1 to 7. The value zero is forbidden. This parameter is used in the

decoding of motion vectors. If the corresponding syntax element is not represented

in the bitstream, vop_fcode_forward shall be 1.

vop_fcode_backward

Corresponds to the MPEG-4 Part 2 syntax element of the same name. The value

can range from 1 to 7. The value zero is forbidden. This parameter is used in the

decoding of motion vectors. If the corresponding syntax element is not represented

in the bitstream, vop_fcode_backward shall be 1.
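In motion vector decoding, the fcode values set the legal vector range. A sketch, as we read MPEG-4 Part 2 (r_size = fcode - 1, f = 1 << r_size, range [-32f, 32f - 1] in motion vector units); the function name is hypothetical.

```c
/* Returns 1 and writes the range for a valid fcode (1..7), else returns 0. */
static int mv_range_for_fcode(unsigned fcode, int *low, int *high)
{
    if (fcode < 1 || fcode > 7)
        return 0; /* zero is forbidden; larger values are out of range */
    int r_size = (int)fcode - 1;
    int f = 1 << r_size;
    *low = -32 * f;
    *high = 32 * f - 1;
    return 1;
}
```

The default value of 1 therefore corresponds to the smallest range, [-32, 31].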

StatusReportFeedbackNumber

Arbitrary number set by the host software decoder to use as a tag in the status

report feedback data. The value should not equal 0, and should be different in each

call to Execute. For more information, see section 6, Status Report Data Structure.
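A host can satisfy the "non-zero and different in each call" guidance with a wrapping counter that skips 0. A sketch; the helper is hypothetical, and the 16-bit width is an assumption chosen to match the USHORT StatusReportFeedbackNumber member of DXVA_Status_VC1 in section 6.

```c
#include <stdint.h>

/* Returns the next non-zero tag, wrapping from 0xFFFF back to 1. */
static uint16_t next_feedback_number(uint16_t *counter)
{
    (*counter)++;
    if (*counter == 0)
        *counter = 1; /* 0 should not be used as a tag */
    return *counter;
}
```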

Reserved16BitsA

This structure member has no meaning. The value shall be set to 0 by the host

software decoder, and accelerators shall ignore the value.

Reserved16BitsB

This structure member has no meaning. The value shall be set to 0 by the host

software decoder, and accelerators shall ignore the value.

4. Quantization Matrix Data Structure

The quantization matrix data structure is used when bDXVA_Func is 1 and the buffer

type is DXVA2_InverseQuantizationMatrixBufferType.



4.1 Syntax

The DXVA_QmatrixData structure, as specified for other DXVA usage cases, contains

inverse-quantization matrix data for off-host bitstream decoding of the compressed video

picture. For convenience, the form of this data structure is shown in the following code:

typedef struct _DXVA_QmatrixData {
    BYTE bNewQmatrix[4]; // intra Y, inter Y, intra chroma, inter chroma
    WORD Qmatrix[4][64];
} DXVA_QmatrixData, *LPDXVA_QmatrixData;

4.2 Semantics

The semantics of the DXVA_QmatrixData data structure are the same as specified for

other DXVA usage cases, except as follows.

The quantization matrix data structure shall be sent explicitly by the host software

decoder for every picture. The value of bNewQmatrix[i] shall be equal to 1 for i = 0 and

1, even if the associated Qmatrix[i][j] entries contain values that are available by default

in the MPEG-4 Part 2 video coding specification. (This ensures stateless operation.)

Because the Simple and Advanced Simple profiles of MPEG-4 Part 2 do not support

custom chroma quantization matrices, the value of bNewQmatrix[i] shall be equal to 0

for i = 2 and 3 within the DXVA_QmatrixData structure.

If quant_type is 1, the first inverse quantization method is used for inverse quantization

of the DCT coefficients. In this method, custom quantization matrices are supported.

If quant_type is 0, the second inverse quantization method is used for inverse

quantization of the DCT coefficients. In this method, custom quantization matrices are

not supported. Therefore, when quant_type is 0, the value of Qmatrix[i][j] shall be 16

for i = 0 and 1 and j = [0…63], and the accelerator may ignore these values.
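The per-picture rules above can be captured in one fill routine. A sketch that uses a local mirror of DXVA_QmatrixData with fixed-width types so that it compiles without the DXVA headers; the mirror type and function are hypothetical, real code would use the declaration from the DXVA headers, and the order of the 64 entries is whatever the DXVA_QmatrixData semantics define.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of DXVA_QmatrixData (BYTE -> uint8_t, WORD -> uint16_t). */
typedef struct {
    uint8_t  bNewQmatrix[4]; /* intra Y, inter Y, intra chroma, inter chroma */
    uint16_t Qmatrix[4][64];
} QmatrixDataSketch;

/* intra/inter: luma matrices in effect for the picture; not read when
 * quant_type == 0. */
static void fill_qmatrix(QmatrixDataSketch *q, int quant_type,
                         const uint16_t intra[64], const uint16_t inter[64])
{
    memset(q, 0, sizeof *q);
    q->bNewQmatrix[0] = 1; /* always sent, for stateless operation */
    q->bNewQmatrix[1] = 1;
    /* bNewQmatrix[2] and [3] stay 0: no custom chroma matrices. */

    for (int j = 0; j < 64; j++) {
        if (quant_type == 0) {
            q->Qmatrix[0][j] = 16; /* second method: no custom matrices */
            q->Qmatrix[1][j] = 16;
        } else {
            q->Qmatrix[0][j] = intra[j];
            q->Qmatrix[1][j] = inter[j];
        }
    }
}
```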

Note Hypothetically, this structure could have been included in the picture parameters

data structure, but DXVA already defines a buffer type for quantization matrices, and this

method is specified for design consistency.

5. Slice Control Data Structure

The DXVA_SliceInfo structure is used when bDXVA_Func is 1 and the buffer type is

DXVA2_SliceControlBufferType. The slice control buffer is accompanied by a raw

bitstream data buffer. The total quantity of data in the bitstream buffer, and the quantity

of data reported by the host software decoder, shall be an integer multiple of 128 bytes.
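The 128-byte rule means the host rounds the reported bitstream size up and fills the tail. A sketch; zero-filling the padding is an assumption made here for illustration, not a requirement stated in this section, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Round a byte count up to the next multiple of 128. */
static size_t padded_bitstream_size(size_t bytes)
{
    return (bytes + 127u) & ~(size_t)127u;
}

/* Zero-fill the buffer tail up to the padded size. Returns the size to
 * report, or 0 if the buffer capacity is too small. */
static size_t pad_bitstream(unsigned char *buf, size_t capacity, size_t bytes)
{
    size_t padded = padded_bitstream_size(bytes);
    if (padded > capacity)
        return 0;
    memset(buf + bytes, 0, padded - bytes);
    return padded;
}
```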

5.1 Syntax

The DXVA_SliceInfo data structure, as specified for other DXVA usage cases, is sent

by the host software decoder to the accelerator to convey slice control data. For

convenience, the form of this data structure is shown in the following code:

typedef struct _DXVA_SliceInfo {
    WORD  wHorizontalPosition;
    WORD  wVerticalPosition;
    DWORD dwSliceBitsInBuffer;
    DWORD dwSliceDataLocation;
    BYTE  bStartCodeBitOffset;
    BYTE  bReservedBits;
    WORD  wMBbitOffset;
    WORD  wNumberMBsInSlice;
    WORD  wQuantizerScaleCode;
    WORD  wBadSliceChopping;
} DXVA_SliceInfo, *LPDXVA_SliceInfo;

5.2 Semantics

The semantics of the DXVA_SliceInfo data structure are the same as specified for other

DXVA usage cases.

6. Status Report Data Structure

The DXVA_Status_VC1 data structure, as specified for other DXVA usage cases, is

sent by the accelerator to the host software decoder to convey decoding status

information.

This structure is used when bDXVA_Func is 7. The status reporting command does not

use a compressed buffer. Instead, the host software decoder provides a buffer as

private output data. For more information, see section 1.9, Status Reporting.

The status information command should be asynchronous to the decoding process. The

host software decoder should not wait to receive status information on a process before

it starts another process. After the host software decoder receives a status report for a

particular operation, the accelerator shall discard that information and not report it again.

(That is, the results of each particular operation shall not be reported to the host

software decoder multiple times.) Accelerators shall be able to provide status

information for every buffer for every operation that is performed.

Accelerators are required to store at least 512 DXVA_Status_VC1 structures internally,

pending status requests from the host software decoder. An accelerator may (and

should) exceed this storage capacity. If the accelerator discards reporting information, it

should discard the oldest data first. The accelerator should provide status reports in

approximately reverse temporal order of when the operations were completed. That is,

status reports for the most recently completed operations should appear earlier in the list

of status report data structures.

Note As previously stated, the term should describes guidelines that are encouraged

but are not mandatory requirements.
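The storage and ordering guidance above maps naturally onto a ring buffer that overwrites the oldest entry and reports newest first. A sketch of accelerator-side bookkeeping; the mirror struct, queue type, and functions are hypothetical, and a real driver may use a larger capacity.

```c
#include <stdint.h>

/* Local mirror of the DXVA_Status_VC1 fields, for illustration only. */
typedef struct {
    uint16_t StatusReportFeedbackNumber;
    uint16_t wDecodedPictureIndex;
    uint16_t wDeblockedPictureIndex;
    uint8_t  bPicStructure, bBufType, bStatus, bReserved8Bits;
    uint16_t wNumMbsAffected;
} StatusSketch;

#define STATUS_CAPACITY 512 /* required minimum; more is allowed */

typedef struct {
    StatusSketch entries[STATUS_CAPACITY];
    unsigned next;  /* slot for the next completed operation */
    unsigned count; /* number of unreported entries */
} StatusQueue;

/* Record a completed operation; when full, the oldest entry is overwritten. */
static void status_push(StatusQueue *q, StatusSketch s)
{
    q->entries[q->next] = s;
    q->next = (q->next + 1) % STATUS_CAPACITY;
    if (q->count < STATUS_CAPACITY)
        q->count++;
}

/* Copy up to max unreported entries, most recent first, then discard them
 * so that no operation is reported twice. Returns the number copied. */
static unsigned status_report(StatusQueue *q, StatusSketch *out, unsigned max)
{
    unsigned n = q->count < max ? q->count : max;
    for (unsigned i = 0; i < n; i++)
        out[i] = q->entries[(q->next + STATUS_CAPACITY - 1 - i) % STATUS_CAPACITY];
    /* The reported entries were the newest; free their slots. */
    q->next = (q->next + STATUS_CAPACITY - n) % STATUS_CAPACITY;
    q->count -= n;
    return n;
}
```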

6.1 Syntax

The DXVA_Status_VC1 data structure, as specified for other DXVA usage cases, is

sent by the accelerator to the host software decoder to convey decoding status

information. For convenience, the form of this data structure is shown in the following

code:

typedef struct _DXVA_Status_VC1 {
    USHORT StatusReportFeedbackNumber;
    WORD   wDecodedPictureIndex;
    WORD   wDeblockedPictureIndex;
    UCHAR  bPicStructure;
    UCHAR  bBufType;
    UCHAR  bStatus;
    UCHAR  bReserved8Bits;
    USHORT wNumMbsAffected;
} DXVA_Status_VC1, *LPDXVA_Status_VC1;

6.2 Semantics

The semantics of the DXVA_Status_VC1 data structure are the same as specified for

other DXVA usage cases.

7. Restricted-Mode Profiles

The following restricted-mode profiles for DXVA operation for MPEG-4 Part 2 video

decoding are defined.

7.1 DXVA_ModeMPEG4pt2_VLD_Simple Profile

This profile supports the features necessary for a decoder that conforms to the MPEG-4

Part 2 Simple profile. In this profile, the accelerator performs bitstream parsing, inverse

quantization scaling, inverse transform processing, motion compensation, and out-of-

loop deblocking and deringing (if supported by the accelerator).

All data buffers shall contain only data that is consistent with the constraints specified for

the Simple Profile of the MPEG-4 Part 2 specification.

Note The video object layer with short headers (H.263 baseline) is supported in this

profile, because it is supported within the MPEG-4 Part 2 Simple Profile.

Note Global motion compensation (GMC) is not supported in this profile, because it is

not supported within the MPEG-4 Part 2 Simple Profile.

The GUID for this profile is currently not defined in the Windows SDK. To use this

profile, use the following declaration:

// {EFD64D74-C9E8-41D7-A5E9-E9B0E39FA319}

DEFINE_GUID(DXVA_ModeMPEG4pt2_VLD_Simple,
    0xefd64d74, 0xc9e8, 0x41d7, 0xa5, 0xe9, 0xe9, 0xb0, 0xe3, 0x9f, 0xa3, 0x19);

7.2 DXVA_ModeMPEG4pt2_VLD_AdvSimple_NoGMC Profile

This profile supports the features necessary for a decoder that conforms to the MPEG-4 Part 2 Simple and Advanced Simple profiles, except that this profile does not support global motion compensation (GMC). In this profile, the accelerator performs bitstream parsing, inverse quantization scaling, inverse transform processing, motion compensation, and out-of-loop deblocking and deringing (if supported by the accelerator).

In this profile, all data buffers shall contain only data that is consistent with the constraints specified for the Simple or Advanced Simple Profiles of the MPEG-4 Part 2 specification.

In this profile, sprite_enable shall be equal to 0.

Note The video object layer with short headers (H.263 baseline) is supported in this profile, because it is supported within the MPEG-4 Part 2 Simple and Advanced Simple Profiles.

Note Global motion compensation (GMC) is not supported in this profile.

The requirements for supporting this profile are a superset of those needed to support DXVA_ModeMPEG4pt2_VLD_Simple. Therefore, if an accelerator driver supports this profile, it should also indicate support for the DXVA_ModeMPEG4pt2_VLD_Simple profile.

// {ED418A9F-010D-4EDA-9AE3-9A65358D8D2E}
DEFINE_GUID(DXVA_ModeMPEG4pt2_VLD_AdvSimple_NoGMC,
    0xed418a9f, 0x10d, 0x4eda, 0x9a, 0xe3, 0x9a, 0x65, 0x35, 0x8d, 0x8d, 0x2e);

7.3 DXVA_ModeMPEG4pt2_VLD_AdvSimple_GMC Profile

This profile supports the features necessary for a decoder that conforms to the MPEG-4
Part 2 Simple and Advanced Simple profiles, and the video object layer with short

headers, with the support of GMC. In this profile, the accelerator performs bitstream

parsing, inverse quantization scaling, inverse transform processing, motion

compensation, and out-of-loop deblocking and deringing (if supported by the

accelerator).

In this profile, all data buffers shall contain only data that is consistent with the

constraints specified for the Simple or Advanced Simple Profiles of the MPEG-4 Part 2

specification.

Note The video object layer with short headers (H.263 baseline) is supported in this
profile, because it is supported within the MPEG-4 Part 2 Simple and Advanced Simple

Profiles.

Note Global motion compensation (GMC) is supported in this profile.

The requirements for supporting this profile are a superset of both the

DXVA_ModeMPEG4pt2_VLD_Simple profile and the

DXVA_ModeMPEG4pt2_VLD_AdvSimple_NoGMC profile. Therefore, if an accelerator

driver supports this profile, it should also indicate support for the other two profiles.



// {AB998B5B-4258-44A9-9FEB-94E597A6BAAE}

DEFINE_GUID(DXVA_ModeMPEG4pt2_VLD_AdvSimple_GMC,
    0xab998b5b, 0x4258, 0x44a9, 0x9f, 0xeb, 0x94, 0xe5, 0x97, 0xa6, 0xba, 0xae);

8. IDirectXVideoDecoder Operation

The host software decoder uses the IDirectXVideoDecoder interface in essentially the same way as in previous DXVA designs, except as identified here.

8.1 Structure of BeginFrame, Execute, and EndFrame Calls

When out-of-loop post-processing of deblocking or deringing is used, the host software decoder calls BeginFrame multiple times for each decoded picture, so that it can provide surface information for both the decoded and the post-processed surfaces. The host software decoder may typically use the following calls to decode a picture:

1. The decoder calls BeginFrame with the index of the in-loop decoded output

surface.

2. If out-of-loop post-processing of deblocking or deringing is to be performed, the

decoder calls BeginFrame with the index of the post-processed output surface.

3. The decoder calls Execute with a picture parameters buffer

(DXVA_PictureParameters structure) that indicates the decoded and post-

processing surface indexes.

4. The decoder calls Execute one or more times with buffers that contain

macroblock and residual data.

5. The decoder calls EndFrame with the index of the in-loop decoded output

surface.

6. If out-of-loop post-processing of deblocking or deringing is to be performed, the

decoder calls EndFrame with the index of the post-processed output surface.

In addition, the decoder may make paired calls to BeginFrame and EndFrame for some

surface, without intervening calls to Execute. These calls may be made outside any pair

of BeginFrame/EndFrame calls for some other surface, but may not be interspersed

with such calls.
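The ordering of the six steps above can be illustrated without a real device by recording the planned calls. A sketch; CallKind, Call, and plan_picture_calls are hypothetical and do not reflect the actual IDirectXVideoDecoder method signatures, which take surfaces and execute-parameter structures rather than bare indexes.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { CALL_BEGIN_FRAME, CALL_EXECUTE, CALL_END_FRAME } CallKind;

typedef struct {
    CallKind kind;
    unsigned surface; /* surface index for BeginFrame/EndFrame; 0 for Execute */
} Call;

/* Writes the planned call sequence for one picture into trace (which must
 * hold at least 5 + data_execute_calls entries) and returns its length.
 * data_execute_calls is expected to be at least 1 (step 4). */
static size_t plan_picture_calls(Call *trace, unsigned decoded_index,
                                 unsigned postproc_index, bool postproc,
                                 unsigned data_execute_calls)
{
    size_t n = 0;
    trace[n++] = (Call){ CALL_BEGIN_FRAME, decoded_index };        /* 1 */
    if (postproc)
        trace[n++] = (Call){ CALL_BEGIN_FRAME, postproc_index };   /* 2 */
    trace[n++] = (Call){ CALL_EXECUTE, 0 };        /* 3: picture parameters */
    for (unsigned i = 0; i < data_execute_calls; i++)
        trace[n++] = (Call){ CALL_EXECUTE, 0 };                    /* 4 */
    trace[n++] = (Call){ CALL_END_FRAME, decoded_index };          /* 5 */
    if (postproc)
        trace[n++] = (Call){ CALL_END_FRAME, postproc_index };     /* 6 */
    return n;
}
```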

Annex A: Buffer Copies and Post-processing

MPEG-4 Part 2 video decoding can produce two distinct picture outputs: an in-loop

picture for predicting subsequent pictures, and an out-of-loop, post-processed picture for

display with deblocking and/or deringing.

The wDecodedPictureIndex member of the DXVA_PictureParameters structure

contains the index of the destination surface for the picture after decoding and in-loop

filtering. The wDeblockedPictureIndex member of the structure contains the index of

the destination surface for the post-processed picture.

When no post-processing is invoked, the same output is specified to be written to the

two output destinations. Whenever the same output data should be written to both

output destinations, software decoders shall set the value of wDeblockedPictureIndex

equal to wDecodedPictureIndex, to make accelerator implementation easier and to avoid an additional buffer copy and write operation.

When post-processing is invoked, because B pictures are not used as references for

decoding other pictures in the support of Simple and Advanced Simple Profiles, there is

no need to store the value of the decoded picture before post-processing is performed.

In that case, the host shall set wDecodedPictureIndex to the same value as

wDeblockedPictureIndex for decoding a B picture.

With regard to I and P picture decoding when post-processing is invoked, the host shall

set two different indexes for wDecodedPictureIndex and wDeblockedPictureIndex,

because the data written to each surface will not be identical. The surface at

wDecodedPictureIndex will contain the picture before any out-of-loop post-processing

and wDeblockedPictureIndex will contain the picture that results after out-of-loop post-

processing is applied.
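The index rules of this annex reduce to a single decision on the host side. A sketch; the function name and parameters are hypothetical, and the caller is assumed to supply a post-processing surface index that differs from the decoded one.

```c
#include <stdbool.h>

/* Returns the wDeblockedPictureIndex to use for a picture whose
 * wDecodedPictureIndex is decoded_index. */
static unsigned choose_deblocked_index(bool is_b_vop, bool postproc,
                                       unsigned decoded_index,
                                       unsigned postproc_index)
{
    /* No post-processing, or a B picture that is never used as a
     * reference: both outputs share one surface. */
    if (!postproc || is_b_vop)
        return decoded_index;
    /* I or P picture with post-processing: distinct surfaces. */
    return postproc_index;
}
```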

For More Information

DXVA 1.0 specification: http://go.microsoft.com/fwlink/?LinkId=93647

DXVA interfaces:

http://go.microsoft.com/fwlink/?LinkId=93647




DXVA configuration: http://go.microsoft.com/fwlink/?LinkId=210877

Windows Driver Kit (WDK) documentation for DXVA:

http://msdn.microsoft.com/en-us/library/ff55386.aspx

DXVA 2.0: http://go.microsoft.com/fwlink/?LinkId=94771


