Understanding and implementing JPEG2000 compression for long-form EFP acquisition

John R Naylor and David Bancroft, Thomson, Beaverton, OR

While the concept of digital compression is now universally recognised, there is still widespread concern and confusion about which codec is appropriate at which stage of production and delivery - and what degree of compression is acceptable.

Compression, incidentally, is not just an electronic process. The human eye has approximately 130 million photo-receptors but only one million nerve connections to the brain. Given that more than 50% of the human brain is involved in visual processing, we are pretty good at compression and decompression ourselves!

Returning to digital technology, in an ideal world the production workflow (at least) would not use compression at all. Unfortunately, “technically ideal” uncompressed HD requires a data rate of close to 1.5Gb/s [1]. While this can be handled in some special circumstances, it is simply too much data for most applications. In particular, recording and replaying such data rates in real time is very challenging.

Conversely, applying lossy compression, by its very definition, affects the quality of the image: you are throwing away some of the data. Those who develop compression schemes have to find the balance between the amount of data that is discarded and the impact on image quality.

That balance is just one decision to be made when developing or selecting a compression scheme.

[1] 1920 x 1080 x 10 bits x (1 x Y + ½ x Cr + ½ x Cb) x 30 frames per second = 1,244,160,000 bits per second of payload data, plus audio, control and network overhead
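The footnote arithmetic can be checked in a few lines. This is a sketch of the payload calculation only. The function name is illustrative, and, as the footnote says, audio, control and network overhead are excluded, as is the blanking that a real serial interface also carries.

```python
def uncompressed_payload_bps(width, height, bits_per_sample, samples_per_pixel, fps):
    """Active-picture payload in bits per second.

    Excludes audio, control and network overhead, and the blanking
    intervals a real serial interface also carries.
    """
    return width * height * bits_per_sample * samples_per_pixel * fps


# 4:2:2: one luma (Y) sample per pixel plus Cr and Cb at half rate each.
SAMPLES_PER_PIXEL_422 = 1 + 0.5 + 0.5

rate = uncompressed_payload_bps(1920, 1080, 10, SAMPLES_PER_PIXEL_422, 30)
print(f"{rate:,.0f} bits per second")  # 1,244,160,000 bits per second
```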

Another is its computational complexity. It goes without saying that a video codec has to operate in real time, certainly in playback - but what sort of processor power is required to achieve that?

MPEG-2 and, more recently, H.264/MPEG-4 AVC are particularly suited to delivery because they are asymmetrical in their computational demands. Encoding calls for considerable processing power, especially if high compression ratios are required, whereas decoding is simple enough to be performed by a low-cost chip embedded in a consumer-priced set-top box or receiver.

But that asymmetry is not necessarily appropriate for professional applications, where comparable processing power can be built into each device. There the requirements depend more on quality, or on other factors such as ease of editing.

So there is no single answer to the compression question. In fact there are at least four, as can be seen in figure 1. The four stages of capture, edit, playout and delivery all have different requirements, and therefore logically should use different codecs.

This paper considers only the production workflow, focusing on the capture stage and how it interacts with the edit stage.

Figure 1 - The Four Stages of the Production Workflow

CAPTURE

“Garbage in, garbage out” is a long-established principle in computing, and it applies equally well to video production. If the image in the camera is poor then nothing you do downstream is going to improve it.

Even considering something to be “good enough” is a dangerous assumption. Today’s recordings are tomorrow’s archives. Outside the United States, many production companies are now shooting in high definition. This is not because the programme will be broadcast in HD immediately, but because it future-proofs the content.

The capture stage is not a place where picture quality should be traded off lightly. Yet that has been common practice, and compromised solutions continue to be championed.

The SMPTE standard 274M encodes HD video at a resolution of 1920 x 1080 pixels, usually using 10 bits per sample and 4:2:2 colour sampling. Note that, while regarded as the gold standard, this encoding itself includes a form of compression: the colour-difference signals are sampled horizontally at half the rate of the luminance. This closely mimics the operation of human vision, and is generally regarded as visually lossless.

Some capture systems introduce further compromises, including:

• a sub-sampled raster, where only 1440 pixels horizontally are captured, rather than 1920 pixels, throwing away 25% of the available information

• more aggressively sub-sampled colours, processing 4:2:0 (another 25% loss) or worse; and/or,

• truncated samples, using only 8 bits rather than 10 bits - another 20% of the signal gone forever.

Camera systems using legacy codecs such as MPEG-2 and the DV family have typically made at least one - and sometimes all three - of these additional compromises, all of which can accumulate to a potential overall information loss of 55% before you even get to a codec.

[Figure 1 detail: Capture (camcorder or archive), 50-100 Mb/s; Edit (NLE edit codec), 100-300 Mb/s; Playout (play-out server), 50-100 Mb/s; Emission (emission encoder), 1-20 Mb/s. Requirements shown across the stages: picture quality, coding efficiency, software decoding, ease of editing, multi-generation transparency, simplicity, single-generation picture quality, interoperability with the emission codec, stream-ability, playlist friendliness and minimum bit-rate.]
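The 55% figure arises because the three compromises in the list above compound multiplicatively rather than adding up. The following minimal sketch takes the percentages quoted there and counts 4:2:0 as 1.5 samples per pixel, compared with 2 for 4:2:2.

```python
# Share of the information kept by each legacy compromise, relative to
# 1920-pixel, 4:2:2, 10-bit capture.
raster = 1440 / 1920                          # sub-sampled raster: keeps 75%
chroma = (1 + 0.25 + 0.25) / (1 + 0.5 + 0.5)  # 4:2:0 vs 4:2:2 samples per pixel: keeps 75%
bit_depth = 8 / 10                            # 8-bit instead of 10-bit samples: keeps 80%

retained = raster * chroma * bit_depth        # losses compound, so multiply
print(f"kept {retained:.0%}, lost {1 - retained:.0%}")  # kept 45%, lost 55%
```

Simply adding 25% + 25% + 20% would overstate the loss at 70%, because each compromise removes a fraction of what the previous one left.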

These compromises were initially made because of the restricted recording bandwidth available, particularly on compact tape formats for use in camcorders. As media technology has developed, the spotlight has shifted to the record time available: there is always pressure to capture more on a single tape or other unit of media.

The use of advanced codecs such as JPEG2000 and H.264/AVC changes the balance slightly. They provide higher compression efficiency, which can either improve quality at a given data rate, or in some cases provide lower bitrates for an equivalent quality. These new formats are, however, computationally complex.

This raises an important issue. Computational complexity can be tackled in two ways. You can throw processor power at it, which makes the device power-hungry, with all the problems that entails, or you can develop a dedicated chipset to implement the coding in hardware.

For those of us in the broadcast industry, it is rare to have the opportunity to develop such an application-specific chip, because the volumes are simply not there to make it an economic proposition. JPEG2000, though, is not just a broadcast codec. Indeed, it was deliberately developed to have a wide range of applications and therefore a much greater volume. Those applications are likely to be in consumer devices every bit as much as professional environments, so the economies of scale in developing dedicated silicon are readily available.

Analog Devices, for example, is now on its third generation of JPEG2000 coding chips, the ADV212, and is finding applications in CCTV and surveillance systems, and in digital still cameras, as well as in broadcast quality camcorders.

In developing the Infinity digital media camcorder, the Thomson design team made the decision to offer the user a choice of codecs: MPEG-2 and DV, as well as JPEG2000 using ADV212 chips. The legacy codecs are provided to allow our customers to migrate in an orderly fashion from existing workflows, in the recognition that JPEG2000 delivers better images.

We see JPEG2000 as the codec for the future, particularly in HD, and we will devote the balance of this paper to outlining the practical benefits that JPEG2000 brings.

JPEG2000

The original JPEG format was developed more than 20 years ago. As its name suggests (JPEG stands for Joint Photographic Experts Group), the format’s original application was as a digital photography standard. It is only thanks to determined video engineers that it also became a motion standard. When JPEG set out to create a new standard for the 21st century, moving images were an integral part of the design from the outset.

JPEG and MPEG each depend on a technique of dividing each image into blocks, then applying a discrete cosine transform (DCT) to each block to achieve the compression required. This has the advantage of making relatively modest processing demands at the encoding stage, but it has a major disadvantage: when the compression is pressed hard, the coding produces discontinuities at the block borders. This is the blocking we have all seen in heavily compressed pictures. Because it manifests itself as perfectly horizontal and vertical straight lines, which are virtually unknown in nature, it is immediately and objectionably visible.
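A one-dimensional toy example shows why the block borders become visible. The sketch below is an illustration, not a real codec. It splits a smooth ramp into 8-sample blocks, applies an orthonormal DCT to each block, keeps only the DC coefficient (the most extreme quantisation possible) and reconstructs. Each block collapses to its mean, so the only discontinuities in the output fall exactly on the block boundaries. The function name is hypothetical.

```python
import math


def dct_dc_only(signal, block=8):
    """Blockwise orthonormal DCT-II keeping only the DC (k = 0) coefficient.

    With every AC coefficient quantised to zero, the inverse transform of
    each block is a flat run at the block mean.
    """
    out = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        n = len(chunk)
        dc = sum(chunk) / math.sqrt(n)       # forward DCT-II, k = 0, orthonormal
        out.extend([dc / math.sqrt(n)] * n)  # inverse with only k = 0 retained
    return out


ramp = [float(x) for x in range(32)]         # a smooth gradient with no edges
recon = dct_dc_only(ramp)
jumps = [i for i in range(1, len(recon)) if abs(recon[i] - recon[i - 1]) > 1e-9]
print(jumps)  # [8, 16, 24]: every discontinuity sits on a block border
```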

Many implementations of MPEG for professional video also use temporal compression to get the bitrate down. A group of pictures (GoP) will include one I-frame with full picture information, and the rest of the group will be coded as differences predicted, directly or indirectly, from that I-frame.

This approach makes editing a nightmare, because the inescapable fact is that, in a group of pictures, some frames will look better than others, and if you need to cut on one of the not-so-good pictures you have a real problem. Another issue is that, should there be a data corruption in an I-frame, it will potentially be visible on every frame in the GoP.

Taking advantage of advances in the science of image coding, the developers of JPEG2000 have been able to tackle both these issues.

First, rather than DCT, JPEG2000 uses wavelet transforms to process the image and reduce the amount of data. Do not worry: we will not delve deeply into the mathematics of transforms here. The point we need to underline is that, because wavelet transforms code both frequency and location information at the same time, there is no need to divide the image up into blocks: images up to 4k resolution can be processed in one pass.

The simple way to think of a wavelet transform is as a digital filter which is passed over the image, separating high and low frequencies, horizontally and vertically. After the transform you end up with four elements: wavelet coefficients representing high horizontal and high vertical frequencies (conventionally described as HH), high horizontal and low vertical frequencies (HL), low horizontal and high vertical frequencies (LH), and low horizontal and low vertical frequencies (LL).
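A one-level wavelet split into LL, HL, LH and HH can be sketched with the simplest wavelet, the Haar. This is a stand-in: JPEG2000 itself specifies the reversible 5/3 and irreversible 9/7 filters, not Haar. The function name and the averaging normalisation are illustrative choices. Each output band is half the width and half the height of the input, and LL is simply a reduced-size version of the picture.

```python
def haar_2d_level(img):
    """One level of a 2-D Haar wavelet transform (averaging normalisation).

    img is a list of rows with even width and height. Returns (LL, HL, LH, HH),
    each half the width and half the height of the input.
    """
    h, w = len(img), len(img[0])

    def empty():
        return [[0.0] * (w // 2) for _ in range(h // 2)]

    ll, hl, lh, hh = empty(), empty(), empty(), empty()
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]          # top-left, top-right
            c, d = img[y + 1][x], img[y + 1][x + 1]  # bottom-left, bottom-right
            i, j = y // 2, x // 2
            ll[i][j] = (a + b + c + d) / 4  # low horizontal, low vertical
            hl[i][j] = (a - b + c - d) / 4  # high horizontal, low vertical
            lh[i][j] = (a + b - c - d) / 4  # low horizontal, high vertical
            hh[i][j] = (a - b - c + d) / 4  # high horizontal, high vertical
    return ll, hl, lh, hh


# Vertical stripes one pixel wide: pure horizontal detail.
stripes = [[0, 8, 0, 8] for _ in range(4)]
ll, hl, lh, hh = haar_2d_level(stripes)
print(ll)  # [[4.0, 4.0], [4.0, 4.0]]
print(hl)  # [[-4.0, -4.0], [-4.0, -4.0]]
```

The stripes produce energy only in LL and HL. A flat image would leave all three detail bands at zero, and those near-zero coefficients are the ones that compress well.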

As you can see from figure 3, the LL element is a half-size, half-resolution version of the original image. The wavelet transform can be applied multiple times. Figure 4 shows the result of a second pass on the LL element of the first pass, with the very top left now a quarter-size (in each direction) version of the original.

Figure 2 – original image [2]

Figure 3 – first wavelet pass, with LL top left, HL top right, LH bottom left and HH bottom right

Figure 4 – second wavelet pass, showing the original now at one quarter resolution

[2] Figures 2 to 4 are reproduced from Digital Video Compression by Peter Symes with the permission of the author.

Part of the “intelligence” in video compression (and this applies to all coding schemes) is that coefficients representing only high frequencies may be quantised more coarsely without significant change to the perceived image quality. Each additional pass splits more of the image into high-frequency coefficients of this kind, so this iterative process produces a signal that is progressively easier to compress.
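Coarser quantisation of high-frequency coefficients can be illustrated with a dead-zone uniform scalar quantiser, the type of quantiser JPEG2000 Part 1 specifies for its irreversible path. The step sizes in the example are hypothetical, since a real encoder chooses them per subband, and the function names are illustrative. Small HH coefficients collapse to zero, which is where much of the saving comes from.

```python
import math


def quantise(coeff, step):
    """Dead-zone uniform scalar quantiser: sign(c) * floor(|c| / step)."""
    return int(math.copysign(math.floor(abs(coeff) / step), coeff))


def dequantise(index, step, r=0.5):
    """Reconstruct at offset r within the quantisation interval (0.5 = midpoint)."""
    if index == 0:
        return 0.0
    return math.copysign((abs(index) + r) * step, index)


# Hypothetical step sizes: fine for LL, coarse for HH.
ll_coeffs = [120.3, 98.6]
hh_coeffs = [0.7, -3.2, 12.5, -0.4]
print([quantise(c, 1.0) for c in ll_coeffs])  # [120, 98]
print([quantise(c, 8.0) for c in hh_coeffs])  # [0, 0, 1, 0]
```

The zero bin is twice as wide as the others, because floor(|c| / step) maps everything in (-step, step) to zero. That dead zone is what discards low-energy detail.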

That is critical to the success of JPEG2000, and it also leads to a side benefit that is of enormous importance in practical professional video workflows. Each wavelet pass includes a copy of the original at half the width and half the height of the previous pass. So when compressing a 1920 x 1080 image, wavelet compression automatically generates a 960 x 540 version, then a 480 x 270 version, and so on.

In the decode process you can choose to pull out the LL version at any stage. So you have a thumbnail version of any image without any additional processing. In video terms, you have a browse version created as an integral part of the compression scheme itself.

So if you are shooting HD on location and want to start editing on a laptop, you simply work on a 960 x 540 version, or even a 480 x 270 version. It does not take any more processing to create this low-resolution version: in fact it takes less processing! And because the browse version is forever embedded in the full-resolution signal, the two can never be separated, so you are always working on the right content at the right moment, potentially right up to the moment of delivery.
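Because each wavelet pass halves both dimensions of the LL element, the browse sizes available from a 1920 x 1080 picture can be listed directly. In this small sketch the function name is illustrative, and odd sizes are rounded up, which is a simplifying assumption for an image whose origin is at zero.

```python
def ll_sizes(width, height, passes):
    """Width and height of the LL element after each wavelet pass.

    Each pass halves both dimensions; odd sizes are rounded up (assumes the
    image starts at the origin).
    """
    sizes = [(width, height)]
    for _ in range(passes):
        width, height = (width + 1) // 2, (height + 1) // 2
        sizes.append((width, height))
    return sizes


print(ll_sizes(1920, 1080, 3))
# [(1920, 1080), (960, 540), (480, 270), (240, 135)]
```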

In the Infinity camcorder, and in the Thomson Grass Valley Aurora and EDIUS nonlinear editors, the JPEG2000 data is layered at the time of encoding, as can be seen in figure 5.

Figure 5 – JPEG2000 Layering Scheme for Infinity Content Format

Layer 1 is a 6.25Mb/s stream for browse quality: quarter-resolution HD or half-resolution SD. Layer 2, together with layer 1, totals 25Mb/s and provides half-resolution HD or acceptable SD. Layer 3 brings the full signal (in Infinity it is selectable between 50, 75 and 100Mb/s) for full quality.
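Because the layers are cumulative, the bitrate each layer adds on top of the layers below it can be derived from the totals quoted for layers 1 to 3. The following sketch uses those figures; the function name is illustrative.

```python
def layer_increments(full_rate_mbps):
    """Extra Mb/s carried by each layer, given cumulative totals of
    6.25 (layer 1), 25 (layers 1+2) and the selected full rate (layers 1-3)."""
    cumulative = [6.25, 25.0, float(full_rate_mbps)]
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


for full in (50, 75, 100):
    print(full, layer_increments(full))
# 50 [6.25, 18.75, 25.0]
# 75 [6.25, 18.75, 50.0]
# 100 [6.25, 18.75, 75.0]
```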

One important benefit of the ready access to these lower bitrate versions of the identical signal is that they can be used in editing on platforms with limited power. That means, for example, that journalists can edit stories in the field using a laptop editor.

That is not the only way to implement the progressive nature of wavelet compression within JPEG2000. In news, for instance, it might be important to get the full size picture home as soon as possible, so you could re-organise the bitstream so that a soft but full resolution image is delivered and decoded first, getting progressively sharper as more data is decoded.
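JPEG2000 calls these bitstream arrangements progression orders. Two of the five defined in the standard are LRCP (quality layer first, then resolution, component and position) and RLCP (resolution first). The sketch below is simplified to one tile, one component and one precinct, so only the quality layer and the resolution level vary. The function name is illustrative. LRCP delivers a soft full-size picture first, as in the news example, while RLCP delivers complete small pictures first.

```python
from itertools import product


def packet_order(order, layers, resolutions):
    """(layer, resolution) packets in delivery order for a simplified codestream
    with a single tile, component and precinct."""
    if order == "LRCP":  # layer outermost: a soft full-size picture arrives first
        return [(layer, res) for layer, res in product(range(layers), range(resolutions))]
    if order == "RLCP":  # resolution outermost: small complete pictures arrive first
        return [(layer, res) for res, layer in product(range(resolutions), range(layers))]
    raise ValueError(f"unsupported progression order: {order}")


print(packet_order("LRCP", 2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(packet_order("RLCP", 2, 3))  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```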

IMAGE QUALITY

The clever things that can be done with the data structure of JPEG2000 are of course important, but are meaningless if the codec does not deliver good quality.

Subjective and objective testing by Thomson and others tends to confirm that visually satisfying reproduction is achieved in JPEG2000 when the bitrate of the compressed signal corresponds to between one and two bits per pixel in the original image.

In developing the Infinity digital media camcorder, the design decision was taken to offer the user high definition JPEG2000 settings of 100Mb/s, 75Mb/s and 50Mb/s. In 29.97 frames per second video these represent 1.6 bits per pixel [3], 1.2 bits per pixel [4] and 0.8 bits per pixel [5] respectively. 100Mb/s delivers excellent quality and 75Mb/s good HD quality. While 50Mb/s should be regarded as marginal, it may be acceptable in some applications as a trade-off between quality and recording time.

For 25 frames per second markets, the yields are 1.9 bits per pixel at 100Mb/s [6], 1.4 bits per pixel at 75Mb/s [7] and 0.96 bits per pixel at 50Mb/s [8], even closer to the ideal.

[3] 100 x 1,000,000 / (1920 x 1080 x 29.97) = 1.609

[4] 75 x 1,000,000 / (1920 x 1080 x 29.97) = 1.207

[5] 50 x 1,000,000 / (1920 x 1080 x 29.97) = 0.805

[6] 100 x 1,000,000 / (1920 x 1080 x 25) = 1.929

[7] 75 x 1,000,000 / (1920 x 1080 x 25) = 1.447

[8] 50 x 1,000,000 / (1920 x 1080 x 25) = 0.9645
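The footnote figures can be reproduced with a one-line formula. This sketch treats 1 Mb/s as 1,000,000 bits per second and counts luminance pixels only, as the footnotes do. The function name is illustrative.

```python
def bits_per_pixel(rate_mbps, fps, width=1920, height=1080):
    """Compressed bits available per luminance pixel per frame."""
    return rate_mbps * 1_000_000 / (width * height * fps)


for fps in (29.97, 25):
    print(fps, [round(bits_per_pixel(rate, fps), 3) for rate in (100, 75, 50)])
# 29.97 [1.609, 1.207, 0.805]
# 25 [1.929, 1.447, 0.965]
```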

As already noted, the layered structure adopted for the Infinity camcorder also includes a “quarter HD” stream at 25Mb/s and a “quarter quarter HD” stream at 6.25Mb/s, which are created as part of the inherent multipass wavelet compression of JPEG2000.

Image distortion introduced by excessive wavelet compression tends to take the form of a softening of detail or slight smearing, which is psycho-visually much less obtrusive than blocking. The complete absence of macroblocking ensures that JPEG2000 compression degrades under pressure much more gracefully.

This also has an impact downstream. Blocking artefacts from the acquisition compression can stress the transmission encoder, so even if MPEG artefacts from acquisition are not visually disturbing on the studio monitor, they may cause a significant reduction in the quality of experience once the material is compressed for delivery.

In comparison, any softening from JPEG2000’s wavelet compression process would not cause problems for the transmission encoder. Indeed, any slight softening of the image would actually reduce the load on the downstream MPEG compression.

Other manufacturers have also exploited the fact that wavelet compression does not generate offensive artefacts when challenged. The new entrant into the digital cinematography market, RED Digital Cinema Camera Company, has based its proprietary REDCODE™ [9] on wavelet compression. As already noted, JPEG2000 is also suited to other video applications such as surveillance, and has been implemented by Ikegami among others. At Thomson, we employ wavelet compression, in a scheme based on JPEG2000, in our digital HD wireless camera systems.

[9] REDCODE is a trademark of the RED Digital Cinema Camera Company

Perhaps the best assurance that JPEG2000 delivers excellent image quality comes from the fact that the Digital Cinema Initiative, the consortium of seven top Hollywood studios, selected this codec after thorough evaluations as its standard for delivery of movies. The DCI’s stated aim was to select the standard which guaranteed the high presentation standards possible in the digital cinemas of the future.

JPEG2000 IN THE WORKFLOW

After capture, content is passed to edit workstations, and so it is important that any codec implemented in a camcorder can be used downstream.

It is important to make the point that JPEG2000 is an intraframe compression scheme: each frame is coded independently, so there are no temporal issues to make life difficult for the editor.

As we saw earlier, the requirements for a compression scheme in an editor are very different to those in a camcorder. In a fully-featured craft editor, there is a need to balance very high quality to ensure transparency after multiple passes with the requirement for a relatively simple codec which allows very many layers to be combined with complex effects.

For this reason, most edit suppliers have developed their own internal codecs to which camcorder formats are transcoded on ingest. Examples are EDIUS HQ and Apple ProRes 422. Some manufacturers have already developed a JPEG2000 input, and the others are sure to follow, particularly as it is in the interest of the editor to encourage the use of a high-quality, non-compromised, intraframe only origination format.

The latest version of the EDIUS craft editor supports JPEG2000 directly on the timeline without the need to transcode.

As already noted, the bit-layering approach inherent in JPEG2000 makes it possible to choose lower-quality layers when CPU power or disk performance is limited. For example, when editing on a laptop computer, the edit process is just as quick and convenient as on a desktop, and when the edit is finished the system conforms the high-resolution material for transfer as a file to a playout system.
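Choosing a layer to match the available throughput can be sketched as picking the deepest layer whose cumulative rate still fits. The helper and the idea of a single sustained-throughput budget are assumptions made for illustration. They do not describe how EDIUS or Aurora actually make the decision. The layer totals are those given for the Infinity format (6.25Mb/s, 25Mb/s, then the full recording rate).

```python
def deepest_layer(budget_mbps, full_rate_mbps):
    """Hypothetical helper: the highest layer whose cumulative rate fits the
    sustained throughput budget, or None if even layer 1 does not fit."""
    # Cumulative totals in Mb/s; layer 3 is whichever full rate was recorded.
    totals = {1: 6.25, 2: 25.0, 3: float(full_rate_mbps)}
    fitting = [layer for layer, total in totals.items() if total <= budget_mbps]
    return max(fitting) if fitting else None


print(deepest_layer(30, 100))   # 2: half-resolution proxy
print(deepest_layer(120, 100))  # 3: full quality
```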

The Thomson Grass Valley Aurora news editor – which provides basic editing functionality on a journalist’s workstation – can also access this JPEG2000 embedded proxy and work directly on it, including jog, shuttle, and sub-clip selection, again with the decisions made conformed on the full resolution material before delivery.

CONCLUSION

As a codec for acquisition, JPEG2000 offers a number of important advantages:

• Full resolution, full colour capability;

• Superior compression performance, with freedom from artefacts and the complete absence of blocking;

• Lossless and lossy compression, with visually satisfying performance at practical bitrates of 100Mb/s or less;

• Graceful degradation under severe pressure;

• Intraframe coding for freedom from temporal constraints;

• Multiple image resolutions created as part of the compression process; and,

• Progressive transmission by pixel and resolution accuracy.

Implementing this codec in a camcorder is a challenge, as it is computationally complex. However, because JPEG2000 is an open standard developed to be applicable to a broad range of markets, ASIC manufacturers see enough demand from other industries to reach the high volumes needed to justify developing dedicated encoding chips. This has allowed manufacturers of relatively specialised equipment, such as broadcast camcorders, to build systems which are within the accepted norms of power consumption, weight, operational practicality and affordability.

JPEG2000 is seen as the codec to meet many of the challenges and requirements we face in the acquisition, production and archiving of master content for both broadcast and digital cinema applications. Today’s users will migrate towards it as they introduce new products like the Infinity digital media camcorder alongside legacy equipment.

The benefits are enormous as JPEG2000 provides a cost-effective, full HD resolution compression solution for camcorders, and also one that offers increased flexibility for editing. Furthermore, it is scalable to even higher resolutions and visually lossless performance should that be needed in other applications – making it a key component in future-proofed solutions.
