Monday 15 February 2010

High Definition, High Resolution and Electronic Cinematography

For an assessment of my research, please see the blog entitled Time and Resolution: Experiments in High Definition Image Making, which outlines my work and current findings. Another set of ideas I have been working on, concerning how colour is represented, can be found at The Concept of Colour Space from the Practitioner’s Standpoint. You can find other papers of mine at academia.edu.

‘New sensors scheduled to deliver in the coming year from both Red and Arri are even more promising. Despite my nostalgic love for film, I fear it may soon be a fond memory on all but the most specialized productions.’ Johnathan Flack, cinematographer, Cinematographers Mailing List, 18 January 2010.

I have been meaning to summarise what has happened in High Definition technologies for a while now, but it was only when I had to answer a mistaken notion in a recent document under peer review that I realised the general understanding (among academics at least) is still several years out of date. People often talk in terms of High Definition, yet this term alone creates huge confusion about what is actually going on.

One of the two reasons HD was called HD was to combat film’s hold on image generation for feature films. Sony and various other electronics corporations needed to rebrand what they were doing in order to challenge companies such as Kodak, which dominated the production of professional and consumer materials in this area. The other reason the word ‘High’ was used was that the new format was to replace standard definition equipment in one of the industry’s new-broom manoeuvres, which allowed them to sell a great deal more product.

I want to draw a distinction that we shouldn’t waver from: High Definition is proprietary in form - that is, it comprises data packaged in a particular file system, generally with a pixel count of 1920 x 1080 or the other popular system of 1280 x 720, under a certain amount of compression that makes it handleable by the proprietor’s other equipment. In other words, when you use High Definition you are buying into a group agreement about a set of easily handleable functions that capture an image at one end and display that image within certain tolerances at the exhibition end. A system of compression best suited to streaming video, or to compressing for DVD or Blu-ray, is that of creating Groups of Pictures - say a set of 7 - where the first frame and the last have a set of common reference points and the frames in between throw away that information and only update the elements that change. This system is particularly bad for depicting motion, as practically everything changes in each frame, yet the tolerance of the system is set to limit information - that’s a contradiction. A long 15-picture GoP structure is therefore less qualitative than a short Group of Pictures.
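
As a rough illustration of the Group of Pictures idea (a sketch only, not any manufacturer’s actual codec - the threshold and GoP length below are invented for the example), the following Python fragment stores a full picture at the start of each group and, for the frames in between, keeps only the pixels that change by more than a set tolerance:

import numpy as np

GOP_LENGTH = 7  # a short GoP for this example; a long GoP might be 15

def encode_gop(frames, threshold=4):
    """Store the first frame of each group whole, then only changed pixels."""
    encoded = []
    reference = None
    for i, frame in enumerate(frames):
        if i % GOP_LENGTH == 0 or reference is None:
            encoded.append(("I", frame.copy()))       # full reference picture
            reference = frame.copy()
        else:
            diff = frame.astype(int) - reference.astype(int)
            changed = np.abs(diff) > threshold        # the tolerance limits information
            encoded.append(("P", changed, frame[changed]))
            reference[changed] = frame[changed]       # update the running reference
    return encoded

# Fast motion means nearly every pixel crosses the threshold, so the
# in-between records grow almost as large as full frames - which is
# exactly why this scheme struggles with movement.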

On the other hand, High Resolution technologies, such as most Electronic Cinematography forms, do use a package, but this is for convenience; its intent is to retain as much of the data, or information, generated by a photosite as possible. Photosites are the locations where light is turned into data. Pixels are not photosites - a pixel can be an aggregation of the information generated by several photosites, or representative of just one. The point is that a pixel is really a value which may be generated by one or more contributing elements.
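
A minimal sketch of that distinction, assuming purely for illustration a sensor whose photosite readings are simply binned two by two into one pixel value (real cameras demosaic colour-filtered photosites far more cleverly):

import numpy as np

def photosites_to_pixels(photosites):
    """Aggregate each 2x2 block of raw photosite readings into one pixel value."""
    h, w = photosites.shape
    blocks = photosites[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))    # one pixel per 2x2 group of photosites

raw = np.random.randint(0, 4096, size=(8, 8))   # 12-bit-style sensor readings
pixels = photosites_to_pixels(raw)              # 4 x 4 pixels from 64 photosites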

Compression still exists in this form, but it is limited, because the central thrust of this technology is to retain as much data about tonal latitude, resolution and colour from the natural world as possible. Ownership also exists in this form - Red’s R3D file system is owned by them and no one else. And it is compressed - maybe to one tenth of the data that was available to the sensor - but, being wavelet driven as opposed to DCT driven (DCT being the Discrete Cosine Transform, the rival to the Wavelet Transform), it is less destructive of the data, and information can more easily be retrieved or simulated from this form of compression.

Data cameras utilise logarithmic (log) rather than linear colour space. ‘Linear’ is a product of preparing an image to travel through older analogue technologies (phosphor screens) and therefore has limitations placed upon it to work in that domain. Linear utilised 8-bit colour, and nowadays we are looking for 14- or even 16-bit systems. Log maintains a higher level of data but tends not to look so good on contemporary displays - so we introduce ‘look-up tables’ (LUTs) to treat the monitor with, to make sure the image looks right on set.
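
A hedged sketch of the idea, using a made-up log curve rather than any camera manufacturer’s actual transfer function: the camera stores scene values logarithmically, and a simple one-dimensional look-up table converts the log codes into something watchable on a monitor.

import numpy as np

def to_log(linear, black=0.001):
    """Encode linear scene light on a made-up logarithmic scale (0..1)."""
    linear = np.clip(linear, black, 1.0)
    return (np.log10(linear) - np.log10(black)) / -np.log10(black)

# A 1D look-up table: for each of 1024 possible log code values,
# precompute a display value (here just a crude 2.2 gamma for the monitor).
display_lut = np.linspace(0.0, 1.0, 1024) ** (1 / 2.2)

def apply_lut(log_image):
    """Map log code values to display values via the precomputed table."""
    idx = np.clip((log_image * 1023).astype(int), 0, 1023)
    return display_lut[idx]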

Lastly - data recording means that changes are made to the image after the data is recorded, unlike in proprietary systems where treatments are made at the time of recording. Sony was famous for throwing away around 500 pixels per line in its HDCAM system, which recorded only 1440 of the 1920 pixels it was trying to represent.
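
Purely as illustration (a rough sketch, not Sony’s actual filtering), the effect is like squeezing each 1920-sample line down to 1440 samples before recording, then stretching it back out for display:

import numpy as np

def subsample_line(line_1920):
    """Resample a 1920-sample line to 1440 samples (3:4) for recording,
    then interpolate it back to 1920 for display."""
    x_full = np.arange(1920)
    x_sub = np.linspace(0, 1919, 1440)
    recorded = np.interp(x_sub, x_full, line_1920)    # only 1440 values are kept
    displayed = np.interp(x_full, x_sub, recorded)    # stretched back for display
    return recorded, displayed

line = np.random.rand(1920)
recorded, displayed = subsample_line(line)
# 'displayed' has 1920 samples again, but 480 per line were never recorded.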

For a masterly description of the discussion around colour space, see Douglas Bankston’s article in American Cinematographer, The Colour Space Conundrum, or my own paper on colour space: The Concept of Colour Space from the Practitioner’s Standpoint.

So - since beginning my fellowship in September 2007 there has been a fundamental change in the thinking around high definition technology.

Before this time, development of the technology was in the hands of large corporations who employed specialists in various areas to create systems that handled the data, or information, needed to assemble 25 frames per second into a moving image stream.

Opinion about what was good and what was bad, correct or inappropriate to the process was derived from a cultural attitude assembled through 100 years of corporate development. Research labs, product labs, customer testing and so on assembled and produced the products that industry and then consumers had access to and were encouraged to demand.

From early experiments by John Logie Baird, Vladimir Zworykin and Philo T Farnsworth, modern analogue technologies developed, finally via Bing Crosby’s backing of the Ampex Corporation, into what we now know as standard definition analogue broadcast TV in all of its variants: PAL with 625 lines and 25 frames per second (due to the 50 Hz electrical system), the French 819-line system and later SECAM with 625 lines, NTSC with 525 lines and 30 frames per second (due to the 60 Hz electrical system) and a colour signal prone to phase errors, Brazilian PAL with 525 lines and so on, all of which led towards early analogue HD systems such as NHK’s 1125-line system and the Philips-backed 1250-line HD-MAC system.

Due to the persistence of vision factor - where in early film 18 frames per second needed to be held in the gate whilst a circular wheel with two or three slits in it spun, allowing two or three flashes per frame (i.e. 18 frames flashed three times gave 54 flashes per second) - analogue video needed its 25-frame structure split into two fields to achieve enough flashes per second. PAL, with 25 frames split into two fields of 312.5 lines, flashed 50 times a second to create persistence of vision, and NTSC flashed 60 times a second. Splitting the frame into two fields is called interlacing, and the resulting resolution is half the line structure: 625 lines is 312.5 lines in resolution.
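
A minimal sketch of interlacing, for illustration only: a 25 frames per second picture is split into two fields of alternating lines, shown one after the other to give 50 flashes a second.

import numpy as np

def split_into_fields(frame):
    """Split one progressive frame into its two interlaced fields.
    Each field carries only half the frame's vertical information."""
    top_field = frame[0::2]       # lines 0, 2, 4, ...
    bottom_field = frame[1::2]    # lines 1, 3, 5, ...
    return top_field, bottom_field

frame = np.random.rand(576, 720)            # PAL-style digital frame
top, bottom = split_into_fields(frame)      # 288 + 288 lines, shown 1/50 s apart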

The reason that television as a form developed the idea of interlace (besides the persistence of vision factor) was that the lowest link in the chain was the display. Cathode Ray Tubes (CRTs) worked by having an electron gun fire electrons to excite phosphor on the surface of the screen. Magnets pulled the beam left to right (because we are in a society that reads in that direction, unlike traditional Japanese or Chinese), then the beam switched off, another set of magnets pulled the beam fractionally down, and the beam was switched on and pulled left to right again. Glass technology was limited to a certain size, and electron beams could only sweep so fast before their accuracy was lost. 625 lines was close to the limit of these technologies at the time of invention.

Meanwhile early digital video was developing into Digital Betacam (Digi Beta), where the incoming digital signal was remediated into something akin to the old line systems - 625 lines in Europe and 525 lines in Japan and America. However, there were no lines, only pixels: 720 x 576 in Europe and 720 x 480 in NTSC countries. Anamorphisation was used to take the European digital system into a 16:9 aspect ratio, as it was naturally 4:3 originally. Cathode ray tubes were still the main display system, and so early digital video used interlace to display the image.
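
As a back-of-envelope check (the exact broadcast figures vary slightly depending on the active picture width used), anamorphisation works by giving each stored pixel a non-square display shape:

# 720 x 576 storage used for both 4:3 and 16:9 pictures (PAL-style numbers)
width, height = 720, 576
storage_aspect = width / height               # 1.25

par_4_3 = (4 / 3) / storage_aspect            # ~1.07: nearly square pixels
par_16_9 = (16 / 9) / storage_aspect          # ~1.42: each pixel displayed wider

print(round(par_4_3, 3), round(par_16_9, 3))  # 1.067 1.422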

Having established a pixel count, the way forward into ‘High Definition’ was established. HD was so called because the manufacturers were interested in creating a product to rival film and thereby enter the film marketplace - they wanted a system with more kudos than the old analogue terms suggested. Sony still calls its HD system Cine Alta. Due to the financial might of America and Japan, and to electrical system frequency (50 or 60 cycles a second), a relationship developed - as it had already done in the analogue calculations that derived first 405 then 625 lines in the European system and 525 in the US and Japanese systems - between that frequency and the ‘natural’ number of pixels that might easily be recorded and then displayed. From capture through to display is, after all, a chain where the lowest level of information at any point is the determining factor in what the viewer sees - this is the modulation transfer function of the system: in effect, its weakest link in the chain.
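
A hedged sketch of that weakest-link idea: if each stage of the chain is treated as passing on only a fraction of the detail offered to it, the system response is roughly the product of the stages, so no stage can give back what an earlier one has lost. The stage names and figures below are invented for illustration.

# Invented contrast-transfer figures (0..1) at one spatial frequency
chain = {
    "lens": 0.85,
    "sensor": 0.80,
    "compression": 0.60,   # the weak link in this made-up example
    "display": 0.90,
}

system_response = 1.0
for stage, response in chain.items():
    system_response *= response

print(round(system_response, 3))   # ~0.367: worse than any single stage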

So we come to the technology that was prevalent prior to 2007. Having won the frequency battle (50 or 60 cycles per second), Japanese and American manufacturers offered proprietary systems of image capture and display that were sympathetic to the outgoing interlaced paradigm. This meant that we still lacked a reliable standard between manufacturers, because on a financial level, to the victor go the spoils. This was seen in the early analogue battle between Sony and JVC, who were proposing Betamax and VHS respectively. At that time it became clear that the best system might not necessarily win (by best I do not mean most reliable, just best in the sense of image representation - it is arguable that Betamax was more fragile on a mechanical level).

What was evident from the start was that film recorded a progressive image and needed many flashes to create persistence of vision; digital video might do the same, but the forms of display were still developing - so interlace persists - and it must be remembered that a 1080i picture is only 540 lines in resolution!

At that time computerisation and storage were not as developed as they are now, and manufacturers recognised that, though they basically had a standard they could comply with, much data had to be thrown away to record an image on tape. The argument about whether 1920 x 1080 or 1280 x 720 was the right standard was still being had - where, due to simple maths, the lower pixel-count image could carry a greater recording of colour information than the higher one (and the 720-line progressive system was fundamentally higher in effective resolution than the prevalent 1080i signal - it is only today that we have 1080p in such amounts).
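
The simple maths can be sketched like this, using a made-up recorder data rate purely for illustration: at a fixed data rate, fewer pixels leaves more bits per pixel available for tone and colour.

data_rate_bits_per_s = 140e6      # a made-up ~140 Mbit/s tape recorder

def bits_per_pixel(width, height, frames_per_s):
    """Bits available per pixel at a fixed recording data rate."""
    return data_rate_bits_per_s / (width * height * frames_per_s)

print(round(bits_per_pixel(1920, 1080, 25), 2))  # ~2.7 bits per pixel
print(round(bits_per_pixel(1280, 720, 25), 2))   # ~6.08 bits per pixel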

In recording an image the issue is always how fast your data capture, transfer and recording are (until we pass the barrier where we can capture and record more data than the eye and brain can see). For me it’s a question of plumbing - how big are your pipes and your tanks? The manufacturers were using systems of compression and decompression - that is, trying to keep only as much data as was necessary to fabricate an image that was seemingly high definition.
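
To put the plumbing into numbers - a rough, uncompressed calculation; real cameras use Bayer-pattern sensors and compression, so actual figures differ:

width, height, fps = 1920, 1080, 25
bits_per_sample = 10
samples_per_pixel = 3             # R, G, B (ignoring Bayer-pattern capture)

bits_per_second = width * height * fps * bits_per_sample * samples_per_pixel
gigabytes_per_minute = bits_per_second * 60 / 8 / 1e9

print(round(bits_per_second / 1e6), "Mbit/s")        # ~1555 Mbit/s uncompressed
print(round(gigabytes_per_minute, 1), "GB per minute")  # ~11.7 GB per minute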

It would pay to look at the difference between the Discrete Cosine Transform and the approach that is now becoming the compression system to use - wavelet transforms, whose roots go back to Fourier’s 1807 innovation. I refer you here to astrophysicist Amara Graps’ work on wavelets: http://www.amara.com/current/wavelet.html

As Amara says: ‘The fundamental idea behind wavelets is to analyze according to scale. Indeed, some researchers in the wavelet field feel that, by using wavelets, one is adopting a whole new mindset or perspective in processing data.’
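
A minimal sketch of analysing according to scale, using the simplest wavelet (the Haar) on a one-dimensional signal: the picture is repeatedly split into a coarse approximation and the detail at each scale, and compression amounts to keeping the coarse part plus only the most significant details.

import numpy as np

def haar_step(signal):
    """One level of the Haar wavelet: pairwise averages (coarse scale)
    and pairwise differences (fine detail)."""
    evens, odds = signal[0::2], signal[1::2]
    return (evens + odds) / 2.0, (evens - odds) / 2.0

def haar_decompose(signal, levels):
    """Peel off detail at successively coarser scales."""
    details = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details          # coarse picture plus detail at each scale

signal = np.sin(np.linspace(0, 8 * np.pi, 64)) + 0.1 * np.random.randn(64)
approx, details = haar_decompose(signal, levels=3)
# Discarding the smallest detail coefficients softens the signal gently,
# rather than breaking it into hard blocks.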

Then there’s the issue of recording Groups of Pictures, or GoPs, which seek to record only the changes in the pictures as they proceed rather than the whole picture, and this eventuates in artefacts - the blocking into squares that the trained eye finds so painful. This is the system of compression beloved of the internet, DVDs, Blu-ray, streaming and so on.

If you have a long GoP of 15 frames, where the image is fully refreshed only every 15 frames, it stands to reason that it carries less resolution than a short GoP of 7 (in a nutshell, Sony at 15 and Panasonic at 7). This is the basic system in streaming across the internet, which is fine in that form (for a while), but there’s no place for it in image capture any more.

Then Red came along to make the data argument public. It’s not that Red were the first; others had proposed systems that did not throw away data, or rather recorded information in packets of data that were less lossy than the proprietary systems (HDCAM, for instance, throws away around 500 pixels from every line, just like that, so that there are only 1440 left to compress and record).

So - the basic change that has taken place since 2007 is a cultural one: image compression is no longer accepted as a de facto stance in image capture and display; instead, all of the data is captured so that choices can be made at varying points along the pipeline about what is kept and what is thrown away.