Evaluation of PNGX ideas

Kirinn Bunnylin

This document describes a variety of trivial additions that could be made to the PNG image format, improving its compression performance without compromising its simplicity, while seeking to maintain backward compatibility.

ZStandard compression

Zstd is well known to compress better and run faster than DEFLATE. It has become widely adopted and appears to have a tolerable license grant, making it acceptable for use in free software.

Adding zstd as compression method 1 should be an easy win. This doesn't affect image pre- or post-processing, or the filtering step; the only change needed is to dispatch the bytestream to a different compression library when the compression method value is 1.
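As a sketch, the dispatch could look like this in Python. The function names are hypothetical; zlib stands in for PNG's DEFLATE path, and the zstd branch assumes the third-party zstandard package, imported lazily so the DEFLATE path works without it.

```python
import zlib

def compress_idat(data: bytes, method: int) -> bytes:
    # Hypothetical dispatch on the compression-method value:
    # 0 = DEFLATE (baseline PNG), 1 = zstd (the proposed PNGX addition).
    if method == 0:
        return zlib.compress(data)
    if method == 1:
        import zstandard  # third-party; pip install zstandard
        return zstandard.ZstdCompressor().compress(data)
    raise ValueError(f"unknown compression method {method}")

def decompress_idat(data: bytes, method: int) -> bytes:
    if method == 0:
        return zlib.decompress(data)
    if method == 1:
        import zstandard
        return zstandard.ZstdDecompressor().decompress(data)
    raise ValueError(f"unknown compression method {method}")
```

Everything before and after this call (filtering, chunking) stays exactly as in baseline PNG.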

Other compression algorithms could be considered, but in order to retain simplicity and maintainability, only one new algorithm should actually be defined for PNGX. Zstd seems like the most obvious step after DEFLATE.

Dictionary support

Zstd's dictionary support could greatly improve storage of a collection of images, such as a game's assets. However, since this entails managing an extra shared dictionary file, it would complicate the image format too much. A game developer may still want to add it in a custom implementation.

New filters 5..15: Difference from more locations

The original filter set only has difference predictors for the byte to the left and the byte directly above, an average of the two, and the Paeth predictor. Basic filters like these are cheap to add, so we may as well use the full predictor set from WebP's lossless mode. The WebP license has a patent grant intended to allow free use and modification by anyone not patent-litigating against them, so I would cautiously consider these acceptable for free software.

Two additional filters for images with ordered dithering could be the byte four positions to the left, and the byte four rows above. If the four-rows-above source falls outside the image, the encoder should not choose this filter; if a decoder encounters it there anyway, it acts as a filter 0 direct copy. The four-to-the-left filter should copy from the byte directly above for the leftmost four bytes, which becomes a direct copy when used for the first four pixels of the image.
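A minimal Python sketch of these two dithering-oriented predictors, with the edge fallbacks as proposed above (the function names are my own):

```python
def predict_left4(prev_row, cur_row, x):
    # "Four to the left" predictor: use the byte 4 positions to the left.
    # For the leftmost 4 bytes, fall back to the byte directly above;
    # on the first row (prev_row is None) that degrades to predicting 0.
    if x >= 4:
        return cur_row[x - 4]
    if prev_row is not None:
        return prev_row[x]
    return 0

def predict_up4(rows_above, x):
    # "Four above" predictor: use the byte 4 rows up in the same column.
    # Near the top edge it acts as filter 0, i.e. predicts 0.
    if len(rows_above) >= 4:
        return rows_above[-4][x]
    return 0
```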

New filters:

Averages and clamps are mainly useful for images whose colors follow a natural gradient order, i.e. truecolor or grayscale. For indexed-color images, it may be best to simply skip the average filters when encoding.

New filter 16: Median adaptive predictor (MAP or MED)

A median filter is a powerful addition alongside the existing Paeth filter. It was described by Martucci in 1990, so it is definitely free to use by now, and it is computationally comparable to Paeth. It is the same as the JPEG-LS nonlinear predictor.

The neighboring bytes are notated here with compass directions: w is the byte to the left, n the byte above, and nw the byte above-left.

if nw >= max(w, n) then P := min(w, n)
else if nw <= min(w, n) then P := max(w, n)
else P := w + n - nw
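The pseudocode above translates directly to Python; a sketch:

```python
def med_predict(w: int, n: int, nw: int) -> int:
    # Median adaptive predictor (Martucci 1990; the JPEG-LS MED predictor).
    # w = byte to the left, n = byte above, nw = byte above-left.
    if nw >= max(w, n):
        return min(w, n)
    if nw <= min(w, n):
        return max(w, n)
    return w + n - nw
```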

This could be useful in all color modes, including indexed.

New filter 17: Gradient adjusted predictor (GAP)

This predictor is used in the CALIC codec (Wu and Memon, 1997), so it is surely free to use by now.

It has 7 source bytes instead of the 3 used by MAP. The neighboring bytes are defined as steps in compass directions: w, nw, n, ne, ww, nn, and nne. If any such byte is outside image bounds, substitute 0.

Let dh := abs(w - ww) + abs(n - nw) + abs(n - ne)
Let dv := abs(w - nw) + abs(n - nn) + abs(ne - nne)
Let D := dv - dh

if D > 80 then P := w
else if D < -80 then P := n
else begin
  P := (w + n) / 2 + (ne - nw) / 4
  if D > 32 then P := (P + w) / 2
  else if D > 8 then P := (3 * P + w) / 4
  else if D < -32 then P := (P + n) / 2
  else if D < -8 then P := (3 * P + n) / 4
end
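The same logic as a Python sketch. One caveat: Python's // floors toward negative infinity, while the pseudocode's division is assumed to behave the same way; with truncating division, negative intermediate values would round slightly differently.

```python
def gap_predict(w, n, nw, ne, ww, nn, nne):
    # Gradient adjusted predictor from CALIC (Wu & Memon 1997), following
    # the thresholds in the pseudocode above; out-of-bounds bytes are 0.
    dh = abs(w - ww) + abs(n - nw) + abs(n - ne)  # horizontal gradient
    dv = abs(w - nw) + abs(n - nn) + abs(ne - nne)  # vertical gradient
    d = dv - dh
    if d > 80:      # sharp horizontal edge: predict from the left
        return w
    if d < -80:     # sharp vertical edge: predict from above
        return n
    p = (w + n) // 2 + (ne - nw) // 4
    if d > 32:
        p = (p + w) // 2
    elif d > 8:
        p = (3 * p + w) // 4
    elif d < -32:
        p = (p + n) // 2
    elif d < -8:
        p = (3 * p + n) // 4
    return p
```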

Since this makes guesses based on gradients, it works best on truecolor and grayscale images, where colors are in natural gradient order. Indexed-color images can have such a palette order too, but usually don't. You could analyse how gradient-like the palette is and decide based on that whether to test GAP during encoding, but it's easier to never use GAP when encoding indexed-color images.

New filter 18: Gradient edge detection (GED)

GED, published by Avramovic and Reljin in 2010, is positioned halfway between MAP and GAP in complexity. It's not clear to me whether it is patented, so it can't be used without further investigation.

This uses 5 source bytes, again notated with compass directions: w, ww, n, nn, and nw. T is a configurable threshold value, with a suggested default of 8.

Let gh := abs(ww - w) + abs(nw - n)
Let gv := abs(nn - n) + abs(nw - w)
Let G := gv - gh

if G > T then P := w
else if G < -T then P := n
else P := 3 * (w + n) / 8 + (nw + ww + nn) / 12
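A Python sketch of the predictor, subject to the patent caveat above; as with GAP, the divisions are assumed to behave like Python's flooring //.

```python
def ged_predict(w, n, nw, ww, nn, threshold=8):
    # Gradient edge detection predictor (Avramovic & Reljin 2010),
    # with the suggested default threshold T = 8.
    gh = abs(ww - w) + abs(nw - n)  # horizontal gradient
    gv = abs(nn - n) + abs(nw - w)  # vertical gradient
    g = gv - gh
    if g > threshold:       # strong vertical change: predict from the left
        return w
    if g < -threshold:      # strong horizontal change: predict from above
        return n
    return 3 * (w + n) // 8 + (nw + ww + nn) // 12
```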

New filter: 0x80 Subtract green

Another simple transformation from WebP, this subtracts green from the red and blue bytes when encoding, leaving alpha and green itself unchanged. When decoding, green is added back. Since green correlates somewhat with luma, and luma somewhat correlates with all colors, this operation partially eliminates brightness gradients from the image for improved compressibility.

This only makes sense in truecolor mode, and may combine well with another filter applied afterward during encoding. (When decoding, green is added back after defiltering.) I propose adding a Subtract Green bit as the MSB of the row predictor byte.
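A sketch of the transform pair in Python, assuming 8-bit RGBA byte order (an assumption for illustration; the channel layout would follow the PNG color type). Subtraction wraps modulo 256, as in WebP.

```python
def subtract_green(rgba: bytes) -> bytes:
    # Encoding side: store red and blue as (channel - green) mod 256;
    # green and alpha are left unchanged.
    out = bytearray(rgba)
    for i in range(0, len(out), 4):
        g = out[i + 1]
        out[i] = (out[i] - g) & 0xFF        # red
        out[i + 2] = (out[i + 2] - g) & 0xFF  # blue
    return bytes(out)

def add_green(rgba: bytes) -> bytes:
    # Decoding side: the exact inverse, applied after defiltering.
    out = bytearray(rgba)
    for i in range(0, len(out), 4):
        g = out[i + 1]
        out[i] = (out[i] + g) & 0xFF
        out[i + 2] = (out[i + 2] + g) & 0xFF
    return bytes(out)
```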

New filter: 0x40 Nibble Flip

Indexed-color images, particularly at 4bpp or less, often use ordered dithering, variations of a checkerboard pattern. PNG does a fair job compressing this, but if the pattern can be simplified, it may work even better.

I propose adding a Nibble Flip bit as the 2nd MSB of the row predictor byte. If set, the predicted pixels on the row are nibble-flipped after calculation (likely applied to the whole row in one pass for efficiency, after all pixels on the row have been predicted). This does increase encoding complexity somewhat, but the Nibble Flip only makes sense with indexed-color images at 4bpp or less, so truecolor and 8bpp encoding would be unaffected.

It's only called Nibble Flip because it sounds funny; what it actually does is swap the pixel pairs within each byte. At 4bpp, swap the top and bottom nibbles; at 2bpp, swap the 2-bit halves within each nibble; at 1bpp, swap all adjacent bit pairs.

I assume this has already been invented and possibly patented elsewhere, so caution is advised before using this.
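A Python sketch of the row transform (flip_row is a hypothetical name). Note it is its own inverse: applying it twice restores the original row, so the same code serves encoding and decoding.

```python
def flip_row(row: bytes, bpp: int) -> bytes:
    # Swap the pixel pairs within each byte, per bit depth:
    # 4 bpp: swap the top and bottom nibbles;
    # 2 bpp: swap the 2-bit halves within each nibble;
    # 1 bpp: swap each adjacent bit pair.
    out = bytearray(len(row))
    for i, b in enumerate(row):
        if bpp == 4:
            out[i] = ((b & 0x0F) << 4) | (b >> 4)
        elif bpp == 2:
            out[i] = ((b & 0x33) << 2) | ((b & 0xCC) >> 2)
        elif bpp == 1:
            out[i] = ((b & 0x55) << 1) | ((b & 0xAA) >> 1)
        else:
            raise ValueError("nibble flip only applies at 1, 2, or 4 bpp")
    return bytes(out)
```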

Predictor for the predictor

Instead of storing a predictor byte at the start of each row, the active predictor could be derived by evaluating what would have been the best predictor for the last byte or several bytes. Start with filter 0; then, for each byte after the first pixel, keep the active filter if it predicted the last pixel correctly, and otherwise switch to whichever filter predicted the most pixels correctly over the last window of interest. Basically any window size can be used without performance impact, with likely sizes ranging from 1 up to the distance from the start of the row. If multiple predictors tie for best, use the lowest-numbered one.
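A rough Python sketch of the idea, assuming the per-filter predictions for the row have been precomputed; the fixed-size window and all names here are illustrative, not part of any spec.

```python
def choose_filters(row_bytes, predictions, window=8):
    # Derives the active filter per byte from past performance instead of
    # a stored per-row filter byte. predictions[f][i] is what filter f
    # would have predicted for byte i. Returns the filter used at each
    # position; a decoder replaying the same rule stays in sync.
    nfilters = len(predictions)
    active = 0  # start with filter 0
    chosen = []
    for i, actual in enumerate(row_bytes):
        chosen.append(active)
        if predictions[active][i] != actual:
            # Active filter missed: switch to whichever filter predicted
            # the most bytes correctly over the recent window. Ties go to
            # the lowest-numbered filter via scores.index(max(...)).
            lo = max(0, i - window + 1)
            scores = [sum(predictions[f][j] == row_bytes[j]
                          for j in range(lo, i + 1))
                      for f in range(nfilters)]
            active = scores.index(max(scores))
    return chosen
```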

This would break backward compatibility, requiring non-trivial encoder and decoder changes, so it's out of scope for PNGX. There are also some pathological cases where this would perform worse than the best per-row predictor. But this would remove the need to test different row predictor combinations when aiming for best compression, to some degree actually simplifying the encoder.

I assume this has already been invented and possibly patented elsewhere, so caution is advised before using this.

Sort palette indexes

When a prediction is missed, the compression algorithm has to encode the difference between the prediction and real value. If the palette is randomly distributed, those difference values are likely to be large as well. Sorting the palette may allow minimising difference values.

The simplest approach is to count how many pixels of each indexed color are present in the image, then sort the palette with the most frequent colors at the bottom (the lowest indexes). A possible further improvement would be to sort each color as close as possible to the colors it most commonly appears next to in the image.
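A sketch of the frequency sort in Python for 8bpp indexed data; the function name is my own, and the pixel data must be remapped along with the palette so the image is unchanged.

```python
from collections import Counter

def sort_palette_by_frequency(pixels, palette):
    # Reorders the palette so the most frequent colors get the lowest
    # indexes, and remaps the pixel data to match. pixels is a sequence
    # of 8 bpp palette indexes; palette is a list of color tuples.
    counts = Counter(pixels)
    # Most frequent first; ties keep their original relative order.
    order = sorted(range(len(palette)), key=lambda i: -counts[i])
    remap = {old: new for new, old in enumerate(order)}
    new_palette = [palette[old] for old in order]
    new_pixels = bytes(remap[p] for p in pixels)
    return new_pixels, new_palette
```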

Keep a table of all colors * all colors. For each byte X in the image, where the byte to its left is A and the byte directly above is B, add 1 to table[X][A] and table[X][B]. For the top row, ignore B; for the leftmost byte of each row, ignore A. At the end of the image, use the table to sort the palette.
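The table-building pass might look like this in Python (a sketch; neighbor_table is a hypothetical name):

```python
def neighbor_table(rows, ncolors):
    # Builds the all-colors x all-colors adjacency table described above:
    # for each pixel X, tally its left neighbor A and upper neighbor B.
    # rows is a list of byte rows of 8 bpp palette indexes.
    table = [[0] * ncolors for _ in range(ncolors)]
    for y, row in enumerate(rows):
        for x, pix in enumerate(row):
            if x > 0:
                table[pix][row[x - 1]] += 1      # left neighbor A
            if y > 0:
                table[pix][rows[y - 1][x]] += 1  # upper neighbor B
    return table
```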

Using the table could entail calculating, for each color X, a weighted average of its neighbors' current palette indexes, with each neighbor weighted by its share of all of X's recorded neighbors, then re-sorting the palette by those averages. Repeat this process at least once, and probably a few times, and the palette should settle close to an optimal order.

PNG's bytewise filtering means palette reordering is only likely to be useful for 8bpp indexed images.

Since palette order can be important, it is not safe to rearrange the palette in every image, so this can't be part of PNGX. However, game developers could use this as a custom addition, since they know which images allow palette reordering. Encoding may slow down significantly, since the image must be iterated an extra time and the final sort involves a lot of arithmetic, but decoding needs no changes and suffers no slowdown.

I assume this has already been invented and possibly patented elsewhere, so caution is advised before using this.