Ideas and algorithms

I'll present some of my odder ideas here. They're mostly of a mathematical nature or deal with programming tricks, so if you're not a hacker you may not get much out of these. I'm bound to make some errors now and then, and half of these ideas will never work due to some factor I haven't taken into consideration. Also, the ones that do work have probably already been thought of by others before I ever wrote these.

Superb interpolation technique:
Differential Cubic Interpolation
Compression ideas:
Audio Compression using Rebuilding
Video Compression with Multiple Delta Layers
Audio Compression using Gradients

Simple audio filters:
Extremely simple Digital Low-Pass Filter
Another simple Digital Filter-Booster, Extended


Frameskipping and framelimiting
Sprite blitting

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BRIEFLY: The interpolation methods traditionally considered the best do not
         actually give optimal results when accuracy is desired. Although
         some of my earlier techniques provide extremely accurate
         interpolation, those have certain shortcomings. Differential Cubic
         Interpolation is, I believe, conceptually perfectly accurate.

There are two main families of two-dimensional interpolation techniques:
those that only consider the two values to interpolate between, and those
that also consider some surround values to provide continuity in the curve.

Linear interpolation is the primary representative of the first family.
Another one is Cosine interpolation, which to my understanding scales half of
a cosine curve to form a smooth curve between two points. These methods are
simple to implement, but not necessarily accurate. Consider a sine wave
sampled at a low frequency. Linear interpolation creates an edge at each
sampled value. Cosine interpolation adds a little weird oscillation around
the proper sine.

Cubic or Gaussian interpolations represent the second family well. By taking
into account some surrounding values, the interpolated curve can avoid
creating edges or most other disturbing artifacts. I can think of two
downsides to using these: they are slightly more complex than methods of the
first family, and they introduce a certain kind of inaccuracy that does not
exist in the first family.

If you imagine the sampled sine wave interpolated using the latter methods,
you'd probably find the end result very close to original if the sampling
rate is at least a dozen hertz. However, if your original wave has a sharp
peak that is preserved in the sample, these interpolation methods will create
a new peak a little before or after the sampled value and at
a different amplitude.

While the difference may not be dramatic, it exists. The real peak might of
course not be at the sampled peak location, but without complex frequency
reconstruction information we do not know at which side and exactly how far
from the sample peak the original peak was. Since the original peak could've
been anywhere around the sampled value, it is most desirable to have the
interpolated curve peak at the exact sample location to meet the highest
probability of being right.

The idea behind Differential Cubic Interpolation is to control the gradients
of the curve between two points to force peaks in their places. If at every
peak sample the curve's gradient is zero, the interpolated peak falls at the
desired place.

The interpolation process has two steps, which complicates matters a bit and
makes this method a bit slower than regular cubic interpolation. The first
step is to decide on the gradients to use for the beginning and ending
points. The second step is to plug the numbers into the formula and do the
curve.

The formula is simply a combination of a cubic curve and its derivative.
        y = ax^3 + bx^2 + cx
        g = 3ax^2 + 2bx + c

So, you have two points: x1,y1 and x2,y2. You want to interpolate with
a function where n is in the range [x1..x2].

       ( (  g1 + g2 - 2dy / dx         3dy / dx - 2g1 - g2  )          )
f(n) = ( ( -------------------- * n + --------------------- ) * n + g1 ) * n + y1
       ( (          dx^2                        dx          )          )

where g1 is the desired gradient at the beginning point
      g2 is the desired gradient at the ending point
      dx = x2 - x1
      dy = y2 - y1

The function is here optimised for the fewest number of operations, so it's
pleasant to use by programmers.

I'm not the first to think of this - Cubic Hermite (Catmull-Rom) spline interpolation
gives identical results to mine, though it assumes the unit interval dx has
been normalized to 0..1 while my function lets you change that value as well.

Now, deciding the gradients is a bit nasty from a computer's point of view.
I haven't yet thought of a straightforward mathematical way to calculate
gradients that wouldn't result in peak displacements, so some conditional
jumps are required. The spline article mentioned above lists a few different
ways to calculate the tangents, but apparently none of them respect the goal
of retaining exact peak locations.

To select a gradient, follow these rules:
If the values on both sides of a point are on the same side of the point, it
is a peak and must use a gradient of zero.
Otherwise the gradient should be some sort of linear gradient between the
previous and next points. Or possibly, the gradient from the previous to the
current point, or from the current to the next, whichever is closer to zero.

To make it simpler, you could of course just stick in a zero for every
gradient, but then you might as well use cosine interpolation as the result
would look pretty much like it.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BRIEFLY: It might be possible to break down a sound to its components, only
         store those, and then reconstruct the sound using those. This might
         not rival any modern audio compression schemes and the reconstructed
         audio stream might sound horrible anyway. I haven't been able to
         test this idea so I don't know if it's any good.

From my high school physics lessons I remember that every sound can be broken
down to its frequency components. They say it's also possible to
mathematically reconstruct any sound from a bunch of sine waves. So I wonder,
why not save a bunch of sine wave parameters instead of direct audio data? If
you break the sound down to enough frequencies and update the parameters at
a high enough refresh rate, the reconstructed sound should sound
approximately similar to the original, even though the actual wave would look
noticeably different.

Take some nice wavelet and slip that through some function to separate the
frequencies. Perhaps 512 or 1024 frequency steps would be enough, and the
refresh rate might be 10-20 times a second, if you then interpolate the
frequency amplitude values upon playback. You could use one of my cutesy
interpolation techniques for that.

For every frequency step you'll need to save one amplitude value, for which
perhaps 8 bits is sufficient due to the logarhitmic nature of sound loudness.
When playing back the sound, the 8 bit value could be raised to the second
power.

Reconstructing the wavelet is done by mixing together a number of sine waves.
These could be viewed as channels in module music, maybe, except each channel
produces only a continuous sine wave at a fixed frequency and varying volume.
The only thing changing upon playback are the volumes of the channels. Each
channel should maybe also have a volume multiplier for easy bass&treble
boosting. Each wave should be mixed together multiplied by
(channel count / maximum output amplitude), maybe, or whatever you need for
clipping and controlling the output. Playing back sound would be very easy
and quick to compute this way, only the compressing would take computing
power.

512*10 would yield 5120 bytes a second, 1024*20 would yield 20480 bytes
a second, in stereo double that. To compare with MP3, in kilobits per second
these would be 40 and 160 kbps, in stereo 80 and 320 kbps. The size might be
reduced by saving the amplitude values as a bitstream where each frequency
step has three value sizes, either 0 bits (no change), 4 bits (+/- 1..7 from
current value) or 8 (full value). If some frequencies are not used as much as
others the file size would drop. For a regular music piece this might mean
a reduction of a quarter.

It should be noted I had a theory on breaking a wavelet apart without using
complex Fourier Transformations, but turns out that didn't work. As it is, I
haven't managed to break a sound in frequencies myself and thus haven't been
able to test this compression theory. Also, I'm not sure exactly how many
sine waves we absolutely need for a recognizable reconstruction. I keep
wondering how the frequencies between the measured steps are calculated in...

That aside, I would suppose 512 frequency steps should be quite sufficient,
and a refresh rate of 20 hertz should maybe be a bit higher. The frequencies
should possibly be calculated from 20 Hz - 40 kHz. Of course one might be led
to ponder how other wave types such as triangle and square waves would
compress with this.

Relay further thoughts to a most curious me.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BRIEFLY: A possible way to compress a digital video stream using deltas on
         multiple layers. I don't know much about the best video compression
         codecs today, but at least some time ago there were two major kinds
         of them. MPEG apparently uses some sort of motion estimation. The
         other kind compresses by storing only changes from the previous
         frame. My method belongs to the latter breed.

Let's pretend you have a video stream. A bunch of sequential video frames
that you want to compress. We have some basic information on those as we
start, such as the resolution, the framerate, the length, and the color
depth.

I consider compression time irrelevant, so this scheme is aimed at fast
decompression. Probably doesn't beat Smacker video though. Most movies have
true colors, 24 bits per pixel. In my opinion some of those bits can safely
be scrapped without the viewer noticing anything, especially since all pre-
and post-compression work on the video stream is of course done in the
highest possible bit depth, and a little anti-aliasing on the decompressed
image can restore some smoothness to the colors.

This is how a string of frames is compressed, then. You have frame A, the one
you're showing currently, and frame B, the one next in turn. Divide both
whole frames into nine or maybe sixteen about equal-sized squares. Compare
each square in B to the corresponding square in A and calculate the average
color values of both. That is, add together each color component of every
pixel in the square and divide that by the area of the square. If we're
working with RGB, you'll get the average red, green and blue intensities
within that ninth or sixteenth of the frame. If there is noticeable change in
the average, then shift the color component in whole square A toward the new
value. The color component shifts for each square are saved in the compressed
stream using a signed few bits, once for every frame. This quite handily
takes care of fades and larger changes in the movie.

Next divide both whole frames into smaller blocks. 8x8 pixels seems to be
a standard size, but you could easily use 4x4 or 16x16 depending on your
resolution. Do a similar average color component comparison on every little
block and if there is noticeable change in the averages, shift the block in
frame A toward the new average. Every block should maybe get half the amount
of bits for the delta the movie's color depth is, when writing those in the
compression bitstream.

Note that this scheme allows for a bitrate to be defined, as well as lossless
compression. If a bitrate is defined, then keep track of how many bits you've
allocated during the following steps. For lossless compression you can use as
many bits as you ever want to produce deltas for an identical frame.

If you want to compress losslessly, this step can be skipped. Go through
every block again and calculate the average absolute change of each block
from A to B. The absolute change are the color component deltas added
together. In RGB you'd do a |dR|+|dG|+|dB| on every pixel. Store these values
in a table somewhere. This way you get a table showing which blocks have the
most changes and thus need bits most desperately. You'll of course need to
sort the table before using it.

Now proceed through the blocks using the table and start allocating them bits
as long as you have available more than ( the remaining number of blocks
multiplied by the minimum number of bits needed for each ) bits. Calculate
how many bits each block needs for storing a lossless delta for each bit in
it, to change from your already altered frame A to the new frame B. Then
apply the color component adjustments on frame A and save those in the
bitstream. Keep doing this until you can only allocate the minimum number of
bits for every pixel left, then sweep through the rest of the blocks using
that. This is to ensure that every block gets at least a little change for
every frame, if necessary.

Once you're done with this, your frame A has been changed noticeably and now
resembles frame B, and if enough bits were available, is identical to it.
Also, you've stored the instructions for doing the same changes in the
compression bitstream. Proceed to load the next frame in your movie as your
new frame B, and start comparing again.

If the viewer wants to start watching the movie from halfway, it would be
inconvenient to have to decompress the whole thing from the start. That's why
we have keyframes. A keyframe is placed in the compression stream now and
then, for example once every five seconds, and is basically a somehow
compressed direct bitmap. You could possibly only save a quarter of the image
in every keyframe, if they keyframes are placed near each other, that way the
bitmaps take only a quarter as much space and the movie will start looking
perfectly normal after three keyframes have passed.

You could also use a YUV color system instead of RGB and get somewhat tighter
compression, however YUV by its nature doesn't allow lossless compression so
RGB should be left as an option. Possibly the bitstream could be Huffman
encoded for further reduction in size but I don't know how that works. Also,
I suggest including special command support for every frame, for example for
showing subtitles by the decompressor program instead of being compressed
along with the rest of the video.

Decompression is easy. You keep your last frame in memory and apply the
deltas on that from the compression stream. First the larger ninth or
sixteenth deltas, then the smaller blocks, then on single-pixel level.
Perform scaling and anti-aliasing if desired and voila: one decompressed
video playing back feasting your eyes.

I have no idea how effective this codec would be.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BRIEFLY: Modern audio compression techniques splice the sound into pieces and
         use psychoacoustic models to drop some parts, with vectors and
         codebooks for compression. Ogg Vorbis can do very good stereo at
         bitrates of even below 100kbps. My method is tremendously simpler to
         understand and apply, taking a totally different approach. The
         quality and size, however, may be infeasible.

Just the other day I was looking at an explanation of how sampling affects
a natural wave. It included an image of a nice round wave, and some sampling
points. Connecting the points linearry gave a result not entirely like the
original wave. Using cubic interpolation was better, and the connected sample
points almost matched the original wave. A realization dawned:

The main mismatches were at places where the original wave had a gradient of
zero. The peak was often displaced and reformed. As it is, the locations of
the peaks are the most important, since they actually define the frequencies
in the wave. Having distinctive peaks every 1000 frames would mean there is
a sine wave going there at a steady frequency. So suppose the sampling is
done with sufficient frequency, and then the wave is stored using only the
peak points? Differential Cubic Interpolation, above, is made for giving
natural curvature in expanding such cases.

Some pride themselves on breaking and rebuilding the wavelet by performing
evaluations, discardings, and predictions based on its derived values. I push
aside the origin of the wavelet and just view it as a cute wavy line, as it
appears to those not as proficient in sound engineering. Sometimes taking
a fresh approach can bring useful innovations into the field. Let's consider
further to see if this particular innovation is of any use whatsoever...

Here are some illuminating images.
Here you see a sound wave in real life.
Notice the sensuous round curves.
Now the sound is sampled. Due to the
low sampling rate much of the wave is lost,
and really we'd use a higher rate, but this
is for the sake of demonstrational simplicity.
Here is the wavelet shown with linear
interpolation. This is what we need to compress.
We only have 13 dots that are connected
in this example.
Let's drop less necessary dots and only
connect the remaining ones using DC
interpolation. We have 7 dots left, most of
them being turning points. Notice the
differences and similarities with the
original and the sampled wavelets.
The above images of course do not tell the whole truth. They were made using
MS Paint so the interpolation isn't exact. At decent sampling rates less
complex sounds actually can be crawling with runs of dozens of removable
samples between turning points. If only the sampled value of each turning
point was necessary to store, such sounds could gain an instant size decrease
of 95% with practically no loss of quality!

More complex sounds enjoy more turning points, but still even at worst
average there are around six to eight samples that can be skipped. One minor
setback is that we won't get off by just saving the sample values. This could
be done by simply resampling a sound at a lower rate, of course, but then the
resulting wave would lose more and more of the intricacies of the original,
as you can see in the images above where samples aren't taken too frequently.

Since the peaks or turning points really matter, the distance between each
needs to be stored as well, in terms of the original wavelet. If there are
peaks at 0, 15, 19, 24 and 29, we should store the peak values along with the
numbers 14, 3, 4, and 4. Since at least one frame needs to pass, the value
can well be decreased by one for storing. An average peak distance might be
eight frames, so four bits would be well enough to store most distances.

Also, wavelets rarely use the whole 16-bit width available. If both the
accuracy and the bit width are expendable, we can at best even halve the
space that each stored sample value takes. Since the decompression is done at
the full dynamic range, the perceived loss of quality is not nearly as
dramatic as if you just cut half of the accuracy directly out. The actual
stored sample value should be saved in a more or less logarhitmic way, too.
Most sample values gather around the middle, zero. Therefore giving more
accuracy for this area at the expense of the edges may be desirable. This is
just my intuitive guess, though, so some testing would need to be done to be
sure.

What if a peak or turning point is not found in the desired range? Or what if
there is a small sine vibration in a steadily rising wave, but not enough to
give a zero-gradient at any point? Since the DC interpolation is capable of
smoothly drawing a line with any given gradients at both ends, it might be
wise to store some other important frames as well. How to decide when to
save?

First scan through your desired range, for example 16 frames. Calculate the
gradient of each point up to the first zero, or until the range runs out. The
gradient is easy to calculate by taking the sample values of the following
and previous frames for a delta y, and dividing that by delta x, in this case
a constant two frames. With sufficient sample frequency this is an adequate
method. Of course, the gradient is non-zero only if the previous and next
frames are not on the same side of the target frame.

Next scan from the start up to the zero, or if no zero was found, the
whole range. If there is a notable change in the gradient at some point, like
from 0.1 to 6.9, and the frame has more than one frame in between the start
and end of the scanned range, it should be stored instead of the zero.

When storing a frame that does not have a gradient of zero, either because
there was too much change in the area, or just because no suitable frame
existed in the desired range, the actual local gradient needs to be stored
as well. Most of the time this will be zero so a single 0-bit is enough per
frame. If a real value is needed, then a fixed-point number can be used. For
accuracy I'd suggest eight or nine bits. One bit for the sign, four for the
fractional part and a few for the integer part. The real part should
represent linear fractional values from 0 to 1, but the integer part should
be logarhitmic since the larger numbers rapidly lose significance and thus
need less accuracy.

In order to accommodate changes in the wavelet, dynamic bitrates are a must.
Suppose one part of the wavelet is overamplified action music that uses the
whole 16-bit range. Another part has moody silent strings that only use
a quarter. Obviously compressing the whole wavelet using only a single
bitrate will either give lousy quality for the action music, or waste space
for the silent strings. Therefore it makes sense to store the wavelet in
blocks of a constant, or even varying, size.

Using variable-size blocks would require scanning ahead in the wavestream and
deciding the most advantageous point to change. This should be done
carefully, as constant accommodation to the wavelet would require constant
changes and storing the changes produces some overhead as well. For now I'll
go with constant-size blocks, but variable-size is certainly an option.

Here, then, is the summary of how to compress audio for DC decompression:
1. Read wave frames to memory until you have a whole block
2. Calculate a high-resolution gradient of each frame in the block
3. Locate all peaks and other frames with notable gradients
4. Calculate an average bit value for storing most frame distance values
5. Adjust the list of frames to store to fit the distance value bitsize now
6. Compress all the wavelet sample values to an appropriate bit depth
7. Round the gradients of chosen frames to the lower bit resolution
8. Store the block: Frame Distance bitrate, Sample bitdepth, then each frame.
   Each frame contains: Sample value, local gradient, and distance to next.

I believe that compressing the distance values separately in a string for
each block would also be advantageous. Because peaks in a wavelet come at
definite frequencies, there is some repetition to be taken advantage of.
Using a form of LZW, perhaps as used by GZip, might further cut the distance
value storage by a third.

All this taken in account, how good is this compression method? Using the
suggested bit sizes and estimated distance between peaks or notable changes
in gradients, let's see... one out of eight frames stored. Maybe one out of
sixteen stored frames needs a separate gradient value. Average storage bit
depth could be 10 bits, and the distance to next frame five bits, though
further compressed. The ratio between eight original wavelet frames and
compressed ones would be... 128 to 11+5/3 bits. Multiply by 16 for the
separate gradient value to get 2048 to 200 or so. This multiplied by 344.5
gives about the number of bits used to store a single second of a mono
wavelet at 44100 Hz...

Reasonably good-quality mono sound would eat a bit below 70kbps. This may or
may not be a tie with MP3. I'm sure further improvements are possible for my
compression scheme, that might bump it past MP3, maybe even to match Ogg
quality. However, a test application would need to be written to hear exactly
how the sound changes when using my compression, and whether the quality
suffers more than I can forehear. Also I do not have a nice way to store
stereo sound, although I suppose subtracting one channel from the other and
then storing the other in this same way would give decent results.

One possible improvement for size and quality might lie in the distinction of
high, and medium to low frequencies. DC interpolation shows its best sides
for the lowest frequencies, which would compress very well using this method.
For the highest frequencies, though, it might make more sense to filter them
out and save in a different way. Say, from 10kHz upwards might count as high
frequencies. The downsampling method described in the previous article could
be perfect for separating the two. Whatever is left could be saved using this
method, for DC decompression. The high frequencies could possibly be stored
in a manner similar to the third article of this page, keeping only the
approximate amplitudes of the high frequencies, and upon decompression adding
them to the DC interpolated wave.

This gradient-based compression technique is inherently more simple to
understand and apply than either MP3 or Ogg Vorbis. I can't say I'm very
familiar with existing possibly superior compression methods, so odds are
something like this already has been invented.

Although modern personal computer users will no doubt demand quality before
speed and size, I believe this method might have some real use for handhelds
and other devices with more limited resources. This method is easy to program
without using floating point numbers. It's also possible to revise the DC
interpolation when decompressing into a modified linear interpolation which
might sound almost as good. The interpolation is the heaviest part of
decompression so changing it would allow for warp speed audio output!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BRIEFLY: Very simple way to digitally cut out high frequencies from a wavelet.
         This technique uses what I affectionally call 'a lagger' to slow
         down the changes in the wave, effectively filtering out high
         frequencies while low ones pass unmolested.

I found myself wondering one day, how do those frequency filters and boosters
work. An obvious answer is, use a FFT to splice the stream and then apply
your favorite multipliers on the spliced bits before mixing them together
again. Trouble is, Fourier Transformations are relatively slow and complex.
(ie. I don't get how exactly they work.) So I came up with a simple method of
cutting down your high frequencies.

This whole theory works on the digital representation of a wavelet: Sampled
values in a stream of audio. It might not be useful at all for analogic
devices. The stream of audio, a wavelet, can be thought of as being composed
of a number of sine waves of various frequencies. If I understand correctly,
a Fourier Transform searches for peaks of sine waves. This could possibly be
done by using a derivative of the wavelet, showing the amount of change in
each frame. My method views the wavelet with a similar approach, not
concentrating on the actual sample values, but rather the changes from frame
to frame.

On the simplest level, it works like this. Instead of sending each frame of
the wavelet to your audio output, you keep track of the sample values and
send your tracking counter to the audio output instead. For a non-filtered
sound, allow the tracking variable which I'll call X from now on to move
freely up and down following your wavelet. The sample value of each frame can
be called Y.

Once you want to start filtering, you feed the change between the frames into
some nice function and move X using the result of that function. Consider
having harmonic triangle waves at 110, 220, 440, 880 and on, in a wavelet
playing at 11kHz. The highest triangle frequency is hence 3520 Hz. Then you
define the filter function to be: if dY > (quarter of max amplitude) then
counter = counter + quarter of max amp - square root of quarter of max amp + dY to some nice fractional power

If you have a 16 bit wavelet, the maximum amplitude is 65535.
counter = counter + 16384 - sqr(16384) + dY^0.8

This would allow a maximum unfiltered change of 16384 during one frame, which
means the highest thing you could get out unfiltered and at maximum amplitude
would be a 1378 Hz triangle wave. The highest undistorted sine wave would be
something lower, I can't calculate what. Now see what happens to our harmonic
triangle waves above 1378 Hz. We've got 1760 Hz. This would require a delta
of about a third of the maximum amplitude, some 21845. Instead our function
allows it a movement of 19217 per frame. This does alter the dynamics
a little but hey, what were you expecting from a cheap, easy solution? :p

This would mean a 1617 Hz triangle wave if we were in fact trying to reach
every turning point of the original wavelet. Good thing we aren't. In fact
what we'll get is a somewhat displaced, hopefully 1760 Hz triangle wave whose
amplitude has dropped from full 65536 to 57651, 88% of maximum.

Her big brother, 3520 Hz, desires a framey change of 41850. Instead, he is
allowed 21237. Calculate this: allowed movement / desired movement * max amp.
He'll hopefully end up a happily displaced but otherwise intact triangle wave
at an amplitude of 33257, 51% of maximum.

The function of course needs to be adjusted for the frequency of the wavelet.
You'll only want to allow half as much movement per frame for 44100 Hz as you
would for 22050 Hz.

Don't be intimidated by the maths presented above. Most of that is spent on
converting the wave to understandable hertz values. All you need to do, is
1) Prepare the X sample value pointer
2) Start reading the wavelet
3) dY = next value - previous value
4) Add f(dY) to X
5) Write X to audio output
6) Move to next sampled value, hop back to 3 if more wavelet to go

One thing to note: To make sure you get no distortion you'd need to calculate
dY using only the original wavelet, not X. However, this would quickly throw
X way off and might cause some undesired clipping in the sound. But if you
calculate dY = next value - X, then the resulting wave will be displaced
somewhat, but will follow the original wavelet more faithfully. Perhaps some
sort of hybrid would be cool, for example by using the no distortion method
only when it moves X toward the new sample value, and otherwise using the
displacing method.

Easy, isn't it! The only trouble arises with the nasty fractional exponent.
It might be a bit annoying to code and is the single slowest thing in the
loop.

Of course I haven't tested this out either, for whatever reason. It may be
that the resulting filtered output sounds horribly distorted, but at least
it's good for a few laughs in that case.

Side note: It might be possible to do a high frequency booster using
a different function, which would move X more than dY if dY is higher than
some arbitrary value. Also, you could possibly boost low frequencies by
making a copy of the low-pass filtered wavelet and mixing that together with
the original with some balancing multiplier. Any frequency that wouldn't be
filtered smaller would get instant 2x gain.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BRIEFLY: Another simple way of filtering out high frequencies or boosting low
         ones in a digitally sampled wavelet. Or, in fact, with some
         creative combining, any adjustment of bass and treble in a wavelet.

Alright, so the previously described 'lagger' method may or may not alter the
dynamics of the sound to be filtered a bit more than desired. In any case,
there is another way to remove high frequencies, familiar to any of us.

A wavelet is comprised of a number of digital samples, for example recorded
from a live source by storing the pressure of air (sound wave) a number of
times per second. A popular number is 44100 samples a second. It sounds very
close to the real thing already. But what happens when you downsample a wave
to, say, 11025 samples a second? Using proper interpolation you get the same
wave with frequencies only up to 5512.5 Hz and some sound artifacts that
hopefully aren't too huge.

And what happens if you simply resample this with good interpolation back to
44100 Hz? Why, the frequencies stay away, while the actual form of the wave
remains basically the same.

Or downsample to 200 frames per second. Use proper averaging interpolation to
even out the wave, and you'll find yourself with a wavelet that at best can
contain frequencies of up to 100 Hz. Then resample this with DC interpolation
to make it natural, and mix in with the original. Frequencies at 100 Hz and
below are boosted.

Like this isn't enough, you can also do the above, but not mix the filtered
wave together with the original. Instead, since the basic form of the
filtered wave remains the same, you can subtract it from the original. For
simpleness, just negate the wavelet by multiplying with -1, then mix as
above. In doing this you cancel out the frequencies below 100 Hz. Who'd want
to do that, you ask, what's a song without bass? But use the first filtered
wave that only went down to 11025 samples a second. Muah! A high-frequency
filter. :p

Or give 2x gain for the original wavelet, then subtract the wave that had its
frequencies above 5512.5 Hz cut out. This boosts the cut frequencies by 2x.

The advantage of this method is, again, ease of implementation and relative
computational effectiveness. Also it should be noted that the frequency
cut-off is quite precise, which is not always a good thing. For some special
mixing effects it can be very cool, but for just adding a bit more bass you
may desire a filter-booster that has a progressive cut-off. Not unlike the
lagger filter, coincidentally.

Addendum: What I describe here is basically a moving average low-pass filter.
Brilliant, but a bit late for the game.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Briefly: Frame skipping and frame limiting can be done the right way or any
         of a number of wrong ways. Tying program state updates into the
         display's refresh rate is a common offense. Here are some thoughts
         and pseudocode suggesting a superior frame control method.

In anything except user-input dependent scenarios, a program needs to update
the screen constantly and smoothly. 3D engines come to mind first when the
term FPS is mentioned, but sprite-based, tile-based, even real-time ASCII
graphical engines need screen refresh timing. Neglecting to program in
a frame limiter and trusting that no computer will ever be fast enough to
make the game unplayable has been shown to be very annoying. Likewise leaving
out frame skipping and trusting that no computer is slow enough to make the
game unplayable.

There is a temptation to tie screen updates to the screen refresh rate,
particularly among console programmers. This is bad practice, in part due to
the legacy of two different video standards around the world: NTSC and PAL.
Porting a program between the two will then require recoding screen updates,
mainly resulting in European console gamers having to wait a few extra
months while the outsourced porting programmers scratch their heads.

The simplest way to implement a frame limiter is to add a forced delay
between all updates. Say, 20 microseconds before a screen buffer switch or
full screen blit. This guarantees the program will never run faster than at
50 FPS, and will often run slower than that. This doesn't take into account
the time spent calculating what to draw and the drawing itself, times that
might vary from frame to frame. But, it's better than nothing!

    FPS = 50
    MsecsPerFrame = 1000 / FPS
    Repeat
      Update program state
      Draw the frame into a buffer
      Sleep (MsecsPerFrame msec)
      Display the buffer
    Until Oblivion

An advanced frame update system could remember the system timer count at the
previous frame, and only sleep however long is needed on top of that to have
passed the desired 20 microseconds between frame displays.

    FPS = 50
    MsecsPerFrame = 1000 / FPS
    PreviousTicks = SystemClock
    Repeat
      Update program state
      Draw the frame into a buffer
      CurrentTicks = SystemClock
      TimeSpent = CurrentTicks - PreviousTicks
      TimeLeft = MsecsPerFrame - TimeSpent
      if TimeLeft > 0 then Sleep (TimeLeft msec)
      Display the buffer
      PreviousTicks = CurrentTicks
    Until Fun

That already runs quite nicely, and should be perfectly smooth if you can
guarantee the program never needs more time to update its state and draw the
frame than the 20 microseconds you want used per frame. On multitasking or
otherwise unreliable operating systems, unfortunately, you can never
guarantee this unless you can get away with running with real-time priority.
If the program does go overtime, everything will happen in slow motion, even
with the Sleep function skipped. This brings us to frame skipping.

When the program is running late, it should skip ahead by a frame or few to
catch up. The pseudocode below does this by updating the program state an
extra time for every frame to be skipped. Obviously you'd use integer
division instead of floating point, unless you had something creative in
mind. It also makes sense to define a maximum amount of frames to allow
skipping past; there may be an OS hiccup that prevents a screen update for
two seconds, and it's then better to continue where you left off rather than
two seconds further into the action, possibly having missed critical user
input during that time.

    SkipFrames = 0
    MaxSkip = 4
    FPS = 50
    MsecsPerFrame = 1000 / FPS
    PreviousTicks = SystemClock
    Repeat
      if SkipFrames > MaxSkip then SkipFrames = MaxSkip
      While SkipFrames > 0
        Update program state
        SkipFrames = SkipFrames - 1
      Update program state once more
      Draw the frame into a buffer
      CurrentTicks = SystemClock
      TimeSpent = CurrentTicks - PreviousTicks
      TimeLeft = MsecsPerFrame - TimeSpent
      if TimeLeft > 0
        then Sleep (TimeLeft msec)
        else SkipFrames = -TimeLeft / MsecsPerFrame
      Display the buffer
      PreviousTicks = CurrentTicks
    Until Profitable

It's good, but not perfect yet. This implementation is still tied to the
desired screen refresh rate, since program state is updated exactly at that
same frequency. Changing the desired FPS results in a different amount of
state updates per second. A cheap way around this would be to up the FPS and
hardcode the state update procedure to run at that rate. For example, 300
state updates per second smoothly divides into 50 and 60 screen updates per
second with constant frame skipping. However, this may be computationally
inefficient.

A proper way around this problem would be to make the state update procedure
capable of internally processing however many milliseconds needed in a single
call. Unfortunately this may mean a complete rewrite of the existing system.
If building from the ground up, this is the desired method to use, however.
It requires making animations run on a millisecond basis instead of a set
FPS basis. Mob/Actor/Thing energy systems such as used in Angband and many
other roguelikes are very easily converted; instead of giving a steady 10
energy to everything per update, the energy given depends on the amount of
time passed since the previous update.

    MaxSkip = 100 msecs
    FPS = 50
    MsecsPerFrame = 1000 / FPS
    PreviousTicks = SystemClock
    Repeat
      CurrentTicks = SystemClock
      TimeSpent = CurrentTicks - PreviousTicks
      if TimeSpent > MaxSkip then TimeSpent = MaxSkip
      Update program state (TimeSpent)
      Draw the frame into a buffer
      TimeLeft = MsecsPerFrame - TimeSpent
      if TimeLeft > 0 then Sleep (TimeLeft msec)
      Display the buffer
      PreviousTicks = CurrentTicks
    Until Yesterday

The beauty of this approach is that it tries to run at exactly the desired
framerate, but can skip arbitrary amounts of time between frames. The state
update procedure is hopefully efficiently programmed. More interestingly, it
is possible to remove the FPS setting entirely, as state updates no longer
rely on a constant FPS number. The program will run with optimal smoothness,
no matter what FPS the display device is operating at. Add a vertical retrace
wait to avoid calculating a dozen updates for each actually seen frame.

    MaxSkip = 100 msecs
    PreviousTicks = SystemClock
    Repeat
      CurrentTicks = SystemClock
      TimeSpent = CurrentTicks - PreviousTicks
      if TimeSpent > MaxSkip then TimeSpent = MaxSkip
      Update program state (TimeSpent)
      Draw the frame into a buffer
      Wait for vertical retrace...
      Display the buffer
      PreviousTicks = CurrentTicks
    Until End

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Briefly: Sprite blitting can be relatively simple and fast, if you're the
         kind of person who likes using hardware acceleration. However,
         should you prefer software rendering, using a packed bitmap may be
         much more efficient than an unpacked one.

Two-dimensional bitmaps have been a staple of computer graphics for ages.
They are simple to use, since you just copy the bitmap into the output buffer
raster row by raster row. This is fast and works well for solid rectangular
images.

Someone came up with the idea of showing a number of different bitmaps one
after another to create animation. These are sprites. At its simplest, you
just had two different frames and alternated between drawing them. However,
this introduced an additional complication: animated things weren't always
solidly rectangular. Transparency was needed to avoid everything being
box-like.

The early Sierra AGI adventures used a palette of 16 colors. There sprites
could pick one color to not be drawn. However, copying a bitmap onto the
output buffer then required a conditional jump for every pixel of the sprite
to decide whether to draw or not.

A modern 24-bit RGB sprite bitmap would probably just include an alpha
channel, becoming a 32-bit sprite. These can be generally be blitted with
hardware acceleration, very fast.

The fastest theoretical way of drawing a sprite would be to hardcode it in
straight machine code. Then, you can use direct offset additions to skip
transparent pixels and move down scanlines, and can use 8-, 16-, 32- or even
64-bit MOV instructions to draw the non-transparent pixels.

This approach has two problems: first, for sanity, you would need a function
that takes a bitmap and produces the machine code to draw it, taking into
account output buffer width and possibly bitdepth. Few programmers know how
to write code that can produce its own executable code - essentially, writing
a small compiler. Also, this approach is not at all portable.

The second problem is clipping. A vanilla bitmap is easy enough to clip, but
here you would need to have an extra routine that is called if clipping is
needed, that either uses the original bitmap or else hacks the drawing
machine code to exclude pixels outside the viewport. This is mainly an
implementation problem, however, as sprites spend 99% of the time not
clipped. You would also have to use the painter's algorithm when blitting,
since fitting a Z-buffer or something in this system could be difficult.

Perhaps a good alternative, then, would be sprites stored with RLE-compressed
transparency. A main loop checks if the next run of pixels is transparent,
and only proceeds into the drawing code if it is not. I would estimate
a 5-15% speed increase over a more traditional sprite blitting function.
I implemented a new tile rendering in-line assembly routine in MoonVideo
using the presented principle, and rendering went from 7.5 milliseconds down
to 6 milliseconds, though some of that is thanks to another optimization.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-