Photo mogrification: lossy -> lossless

Today I manually backed up nearly 600 images from the gallery to the cloud. This is good, but in the process I realized that many of those images are lossy JPEG files, which lose a little quality every time they are edited and saved again.


Archival scan – PNG 25,354,428 bytes. ©2003 WSaewyc

The basic theory of photo digitization is this: you can store a really good version of an image in either of two kinds of format. A lossy format is relatively small, and therefore fast and cheap to move over the internet. A lossless format is relatively large and expensive to move over the internet, but it is easier to alter later and will not degrade in quality from repeated editing and re-saving.

Archival scan – JPEG 4,651,815 bytes. ©2003 WSaewyc

Degrading in quality, in the lossy format, is very much like the copy machine problem: you can make a facsimile of the original, but it will never be quite as good as the original, and a copy of the copy will be even less good, and so on. As an example, if I see a nice JPEG[O en.WP] image on a website and grab a copy of it, it was probably resized for the internet, and even if it was saved at 600 dpi the resized version is, because of the compression method, not quite as nice as the original. And if I crop the image slightly (say, removing the white borders) and save it again as a JPEG, the result will be slightly less accurate even though every remaining pixel should be unchanged. That is because every save after an edit or alteration recompresses the image, and even saving at a higher quality setting cannot bring back what was already discarded; it only reduces the exactness a little further.
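The copy-of-a-copy effect can be sketched with a toy model. To be clear, this is not JPEG: `lossy_save` is a made-up stand-in that simply averages each block of four pixels, and the "image" is a single invented row of 64 grey values. The lossless path goes through DEFLATE (the compression PNG uses) via Python's `zlib` module. Each generation crops one pixel off the left, as in the border-trimming example, and saves again.

```python
import zlib

BLOCK = 4  # toy block size (an assumption); real JPEG works on 8x8 pixel blocks


def lossy_save(pixels):
    """Toy lossy codec: replace every block with its rounded average."""
    out = []
    for i in range(0, len(pixels), BLOCK):
        block = pixels[i:i + BLOCK]
        avg = round(sum(block) / len(block))
        out.extend([avg] * len(block))
    return out


def lossless_save(pixels):
    """Lossless round trip: compress and decompress with DEFLATE."""
    return list(zlib.decompress(zlib.compress(bytes(pixels))))


def mean_error(a, b):
    """Average absolute difference between two equal-length pixel rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generations(original, save, count):
    """Crop one pixel off the left and re-save, `count` times.

    Returns the error after each generation, measured against the
    original cropped by the same amount (what a perfect copy would be).
    """
    current = list(original)
    errors = []
    for n in range(1, count + 1):
        current = save(current[1:])
        errors.append(mean_error(current, original[n:]))
    return errors


# An invented row of 64 grey values standing in for a photo.
original = [(i * 37) % 256 for i in range(64)]

lossy_errors = generations(original, lossy_save, 5)
lossless_errors = generations(original, lossless_save, 5)

print("lossy:   ", lossy_errors)
print("lossless:", lossless_errors)
```

In this toy the lossless path reports zero error at every generation, while the lossy path never gets back to the original pixels once the first save has happened. Real codecs are far more sophisticated, so the numbers mean nothing in themselves; only the pattern matters.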

For almost all uses of an image other than high resolution printing this is not noticeable by the average person. But there is one particular application where the lossy format is particularly problematic: archives.

If you are working with a fragile, unique object (an old family photo, for example), you want your scan to be the last time the object ever needs to be exposed to bright light. For that to be true you need to expose it once and capture a very high resolution digital copy, stored in a manner that preserves it for as long as is reasonably foreseeable. That means avoiding a lossy format.

The other point of this is ease of future use and manipulation. By far the most universally useful lossless format is TIFF[O en.WP]. Unfortunately, it was long encumbered by patents (on its LZW compression option; these have since expired), is very complex due to its extensibility, and produces the largest files except in certain applications such as bi-level images. (Bi-level means strictly black or white pixels, which is not the same as black-and-white photography; that is really greyscale, with 256 or even thousands of shades of grey.)

The next best thing for archival use is PNG[RFC en.WP], which is lossless, compresses very well, and has an extremely popular open source library (libpng) that is very well supported across many software markets and operating systems. A key issue is ancillary chunks, which allow PNG files to carry extensive metadata. That makes the format very useful in graphic editing settings, but many of these chunks are proprietary extensions not supported by all libraries. An archival image should retain them, but a copy being shared on the web or sent to the printer should have them trimmed out. (This has actually been causing me an issue with the cloud software.)
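For the curious, here is a minimal sketch of what trimming ancillary chunks looks like, using only the Python standard library. It relies on the layout the PNG specification describes: an 8-byte signature followed by chunks made of a 4-byte length, a 4-byte type, the data, and a 4-byte CRC, where a chunk type starting with a lowercase letter is ancillary. The function names and the default `keep` list are my own assumptions. The keep list exists because some ancillary chunks (transparency and colour information) do change how the picture looks, so dropping them blindly is not safe for every image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Ancillary chunks that affect rendering; kept by default (an assumption,
# adjust to taste).
RENDERING_CHUNKS = (b"tRNS", b"gAMA", b"cHRM", b"sRGB", b"iCCP")


def iter_chunks(data):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC


def is_ancillary(ctype):
    """Ancillary chunk types start with a lowercase letter (bit 5 set)."""
    return bool(ctype[0] & 0x20)


def make_chunk(ctype, payload):
    """Serialize one chunk, recomputing its CRC over type + payload."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def strip_ancillary(data, keep=RENDERING_CHUNKS):
    """Return a copy of the PNG without ancillary chunks, except those in `keep`."""
    out = [PNG_SIGNATURE]
    for ctype, payload in iter_chunks(data):
        if not is_ancillary(ctype) or ctype in keep:
            out.append(make_chunk(ctype, payload))
    return b"".join(out)
```

Running `[t for t, _ in iter_chunks(open(path, "rb").read())]` on a file (with `path` being whichever PNG you want to inspect) lists its chunk types, which is a quick way to see what extra metadata an editor has tucked in. The archival copy keeps everything; only the copy that goes out to the web or the printer would pass through `strip_ancillary`.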

And the point of this whole monologue is to explain why I am mogrifying[O R en.WP] all lossy formats in the family photo collection to PNG. This cannot restore anything the JPEG compression has already discarded, and the files will be larger, but the conversion itself should not lose anything further: the PNG stores exactly the pixels the JPEG decodes to. Better to freeze the quality where it stands now than to let it erode a little more with every future edit. I have every hope that some, even many, of the photos in our collection will still be shared amongst our relations and descendants 130 years from now, just as we have physical images now from 130 years ago.
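A batch conversion like this can be planned with a short script. The sketch below only uses the standard library to find JPEG files that do not yet have a PNG sibling; the actual conversion is handed to ImageMagick, and I am assuming here that version 7's `magick` command is on the PATH (on ImageMagick 6 the equivalent command is `convert`). The function names are hypothetical, and the originals are left in place rather than deleted.

```python
import subprocess
from pathlib import Path

LOSSY_SUFFIXES = {".jpg", ".jpeg"}


def plan_conversions(root):
    """Return (source, target) pairs for each JPEG under `root` with no PNG sibling."""
    plan = []
    for src in sorted(Path(root).rglob("*")):
        if src.is_file() and src.suffix.lower() in LOSSY_SUFFIXES:
            dst = src.with_suffix(".png")
            if not dst.exists():
                plan.append((src, dst))
    return plan


def convert_command(src, dst):
    """Build the command line; assumes ImageMagick 7's `magick` is installed."""
    return ["magick", str(src), str(dst)]


def convert_all(root, dry_run=True):
    """Print the planned commands, or run them when dry_run is False."""
    for src, dst in plan_conversions(root):
        cmd = convert_command(src, dst)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `convert_all("photos")` (with `photos` standing in for the collection's folder) prints what would be done without touching anything, which seems a sensible first step before converting a few hundred irreplaceable files.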