Ideas and algorithms
I've independently thought up some algorithms, most of which have already been invented before. These tend to be the less useful ideas; the actually useful ones I've turned into programs.
- Differential Cubic Interpolation
- Audio Compression using Rebuilding
- Video Compression with Multiple Delta Layers
- Audio Compression using Gradients
- Extremely simple Digital Low-Pass Filter
- Another simple Digital Filter-Booster, Extended
- Frameskipping and framelimiting
- Sprite blitting
BRIEFLY: The interpolation methods traditionally considered the best do not
give optimal results where peak accuracy is desired. This one aligns
curve peaks more precisely. This turns out to be one way to describe
a Cubic Hermite Spline.
There are two main families of two-dimensional interpolation techniques:
those that only consider the two values to interpolate between, and those
that also consider some surround values to provide continuity in the curve.
Linear interpolation is the primary representative of the first family.
Another one is Cosine interpolation, which to my understanding scales half of
a cosine curve to form a smooth curve between two points. These methods are
simple to implement, but not necessarily accurate. Consider a sine wave
sampled at a low frequency. Linear interpolation creates an edge at each
sampled value. Cosine interpolation adds a little weird oscillation around
the proper sine.
Cubic or Gaussian interpolations are in the second family. By taking
into account some surrounding values, the interpolated curve can avoid
creating edges or most other disturbing artifacts. Two downsides present
themselves: they are slightly more complex than methods of the first family,
and they shift curve peaks.
If you imagine the sampled sine wave interpolated using the latter methods,
you'd probably find the end result very close to original if the sampling
rate is at least a dozen hertz. However, if your original wave has a sharp
peak that is preserved in the sample, these interpolation methods will create
a new peak a little before or after the sampled value and at
a different amplitude.
While the difference may not be dramatic, it exists. The real peak might of
course not be at the sampled peak location, but without complex frequency
reconstruction information we do not know at which side and exactly how far
from the sample peak the original peak was. Since the original peak could've
been anywhere around the sampled value, it is most desirable to have the
interpolated curve peak at the exact sample location to meet the highest
probability of being right.
The idea behind Differential Cubic Interpolation is to control the gradients
of the curve between two points to force peaks in their places. If at every
peak sample the curve's gradient is zero, the interpolated peak falls at the
The interpolation process has two steps, which complicates matters a bit and
makes this method a bit slower than regular cubic interpolation. The first
step is to decide on the gradients to use for the beginning and ending
points. The second step is to plug the numbers into the formula and do the
The formula is simply a combination of a cubic curve and its derivative.
y = ax^3 + bx^2 + cx
g = 3ax^2 + 2bx + c
So, you have two points: x1,y1 and x2,y2. You want to interpolate with
a function where n is in the range [x1..x2].
( ( g1 + g2 - 2dy / dx 3dy / dx - 2g1 - g2 ) )
f(n) = ( ( -------------------- * n + --------------------- ) * n + g1 ) * n + y1
( ( dx^2 dx ) )
where g1 is the desired gradient at the beginning point
g2 is the desired gradient at the ending point
dx = x2 - x1
dy = y2 - y1
The function is here optimised for the fewest number of operations, so it's
pleasant to use by programmers.
Now, deciding the gradients is a bit nasty from a computer's point of view.
I haven't thought of a straightforward mathematical way to calculate
gradients that wouldn't result in peak displacements, so some conditional
jumps are required. The wikipedia article on Cubic Hermite Splines lists a few
different ways to calculate the tangents, but apparently none of them respect
the goal of retaining exact peak locations.
To select a gradient, follow these rules:
If the values on both sides of a point are on the same side of the point, it
is a peak and must use a gradient of zero.
Otherwise the gradient should be some sort of linear gradient between the
previous and next points. Or possibly, the gradient from the previous to the
current point, or from the current to the next, whichever is closer to zero.
To make it simpler, you could of course just stick in a zero for every
gradient, but then you might as well use cosine interpolation as the result
would look pretty much like it.
BRIEFLY: It might be possible to break down a sound to its components, only
store those, and then reconstruct the sound using those. This might
not rival any modern audio compression schemes and the reconstructed
audio stream might sound horrible anyway.
From my high school physics lessons I remember that every sound can be broken
down to its frequency components. They say it's also possible to
mathematically reconstruct any sound from a bunch of sine waves. So I wonder,
why not save a bunch of sine wave parameters instead of direct audio data? If
you break the sound down to enough frequencies and update the parameters at
a high enough refresh rate, the reconstructed sound should sound
approximately similar to the original, even though the actual wave would look
Take some nice wavelet and slip that through some function to separate the
frequencies. Perhaps 512 or 1024 frequency steps would be enough, and the
refresh rate might be 10-20 times a second, if you then interpolate the
frequency amplitude values upon playback.
For every frequency step you'll need to save one amplitude value, for which
perhaps 8 bits is sufficient due to the logarhitmic nature of sound loudness.
When playing back the sound, the 8 bit value could be raised to the second
Reconstructing the wavelet is done by mixing together a number of sine waves.
These could be viewed as channels in module music, maybe, except each channel
produces only a continuous sine wave at a fixed frequency and varying volume.
The only thing changing upon playback are the volumes of the channels. Each
channel should maybe also have a volume multiplier for easy bass&treble
boosting. Each wave should be mixed together multiplied by
(channel count / maximum output amplitude), maybe, or whatever you need for
clipping and controlling the output. Playing back sound would be very easy
and quick to compute this way, only the compressing would take computing
512*10 would yield 5120 bytes a second, 1024*20 would yield 20480 bytes
a second, in stereo double that. To compare with MP3, in kilobits per second
these would be 40 and 160 kbps, in stereo 80 and 320 kbps. The size might be
reduced by saving the amplitude values as a bitstream where each frequency
step has three value sizes, either 0 bits (no change), 4 bits (+/- 1..7 from
current value) or 8 (full value). If some frequencies are not used as much as
others the file size would drop. For a regular music piece this might mean
a reduction of a quarter.
I would suppose 512 frequency steps should be quite sufficient, and a refresh
rate of 20 hertz should maybe be a bit higher. The frequencies should possibly
be calculated from 20 Hz - 40 kHz. Of course one might be led to ponder how
other wave types such as triangle and square waves would compress with this.
BRIEFLY: A possible way to compress a digital video stream using deltas on
multiple layers. I don't know much about the best video compression
codecs today, but at least some time ago there were two major kinds
of them. MPEG apparently uses some sort of motion estimation. The
other kind compresses by storing only changes from the previous
frame. This method belongs to the latter breed.
Let's pretend you have a video stream. A bunch of sequential video frames
that you want to compress. We have some basic information on those as we
start, such as the resolution, the framerate, the length, and the color
I consider compression time irrelevant, so this scheme is aimed at fast
decompression. Most movies have true colors, 24 bits per pixel. In my opinion
some of those bits can safely be scrapped without the viewer noticing anything,
especially since all pre- and post-compression work on the video stream is of
course done in the highest possible bit depth, and a little anti-aliasing on
the decompressed image can restore some smoothness to the colors.
This is how a string of frames is compressed, then. You have frame A, the one
you're showing currently, and frame B, the one next in turn. Divide both
whole frames into nine or maybe sixteen about equal-sized squares. Compare
each square in B to the corresponding square in A and calculate the average
color values of both. That is, add together each color component of every
pixel in the square and divide that by the area of the square. If we're
working with RGB, you'll get the average red, green and blue intensities
within that ninth or sixteenth of the frame. If there is noticeable change in
the average, then shift the color component in whole square A toward the new
value. The color component shifts for each square are saved in the compressed
stream using a signed few bits, once for every frame. This quite handily
takes care of fades and larger changes in the movie.
Next divide both whole frames into smaller blocks. 8x8 pixels seems to be
a standard size, but you could easily use 4x4 or 16x16 depending on your
resolution. Do a similar average color component comparison on every little
block and if there is noticeable change in the averages, shift the block in
frame A toward the new average. Every block should maybe get half the amount
of bits for the delta the movie's color depth is, when writing those in the
Note that this scheme allows for a bitrate to be defined, as well as lossless
compression. If a bitrate is defined, then keep track of how many bits you've
allocated during the following steps. For lossless compression you can use as
many bits as you ever want to produce deltas for an identical frame.
If you want to compress losslessly, this step can be skipped. Go through
every block again and calculate the average absolute change of each block
from A to B. The absolute change are the color component deltas added
together. In RGB you'd do a |dR|+|dG|+|dB| on every pixel. Store these values
in a table somewhere. This way you get a table showing which blocks have the
most changes and thus need bits most desperately. You'll of course need to
sort the table before using it.
Now proceed through the blocks using the table and start allocating them bits
as long as you have available more than ( the remaining number of blocks
multiplied by the minimum number of bits needed for each ) bits. Calculate
how many bits each block needs for storing a lossless delta for each bit in
it, to change from your already altered frame A to the new frame B. Then
apply the color component adjustments on frame A and save those in the
bitstream. Keep doing this until you can only allocate the minimum number of
bits for every pixel left, then sweep through the rest of the blocks using
that. This is to ensure that every block gets at least a little change for
every frame, if necessary.
Once you're done with this, your frame A has been changed noticeably and now
resembles frame B, and if enough bits were available, is identical to it.
Also, you've stored the instructions for doing the same changes in the
compression bitstream. Proceed to load the next frame in your movie as your
new frame B, and start comparing again.
If the viewer wants to start watching the movie from halfway, it would be
inconvenient to have to decompress the whole thing from the start. The normal
solution for this are keyframes. A keyframe is placed in the compression stream
now and then, for example once every five seconds, and is basically a somehow
compressed direct bitmap. You could possibly only save a quarter of the image
in every keyframe, if they keyframes are placed near each other, that way the
bitmaps take only a quarter as much space and the movie will start looking
perfectly normal after three keyframes have passed.
You could also use a YUV color system instead of RGB and get somewhat tighter
compression, however YUV by its nature doesn't allow lossless compression so
RGB should be left as an option. Possibly the bitstream could be Huffman-
encoded for further reduction in size but I don't know how that works. Also,
I suggest including special command support for every frame, for example for
showing subtitles by the decompressor program instead of being compressed
along with the rest of the video.
Decompression is easy. You keep your last frame in memory and apply the
deltas on that from the compression stream. First the larger ninth or
sixteenth deltas, then the smaller blocks, then on single-pixel level.
Perform scaling and anti-aliasing if desired and voila: one decompressed
video playing back feasting your eyes.
I have no idea how effective this codec would be.
BRIEFLY: Modern audio compression techniques splice the sound into pieces and
use psychoacoustic models to drop some parts, with vectors and
codebooks for compression. Ogg Vorbis can do very good stereo at
bitrates of even below 100kbps. This method is tremendously simpler
to understand and apply, taking a totally different approach. The
quality and size, however, may be infeasible.
Just the other day I was looking at an explanation of how sampling affects
a natural wave. It included an image of a nice round wave, and some sampling
points. Connecting the points linearry gave a result not entirely like the
original wave. Using cubic interpolation was better, and the connected sample
points almost matched the original wave. A realization dawned:
The main mismatches were at places where the original wave had a gradient of
zero. The peak was often displaced and reformed. As it is, the locations of
the peaks are the most important, since they actually define the frequencies
in the wave. Having distinctive peaks every 1000 frames would mean there is
a sine wave going there at a steady frequency. So suppose the sampling is
done with sufficient frequency, and then the wave is stored using only the
peak points? Differential Cubic Interpolation, above, is made for giving
natural curvature in expanding such cases.
Here are some illuminating images.
|Here you see a sound wave in real life.|
Notice the sensuous round curves.
|Now the sound is sampled. Due to the|
low sampling rate much of the wave is lost,
and really we'd use a higher rate, but this
is for the sake of demonstrational simplicity.
|Here is the wavelet shown with linear|
interpolation. This is what we need to compress.
We only have 13 dots that are connected
in this example.
|Let's drop less necessary dots and only|
connect the remaining ones using DC
interpolation. We have 7 dots left, most of
them being turning points. Notice the
differences and similarities with the
original and the sampled wavelets.
The above figures are only approximate, made using MS Paint so the interpolation isn't exact. At decent sampling rates less complex sounds actually can be crawling with runs of dozens of removable samples between turning points. If only the sampled value of each turning point was necessary to store, such sounds could gain an instant size decrease of 95% with practically no loss of quality! More complex sounds enjoy more turning points, but still even at worst average there are around six to eight samples that can be skipped. One minor setback is that we won't get off by just saving the sample values. This could be done by simply resampling a sound at a lower rate, of course, but then the resulting wave would lose more and more of the intricacies of the original, as you can see in the images above where samples aren't taken too frequently. Since the peaks or turning points really matter, the distance between each needs to be stored as well, in terms of the original wavelet. If there are peaks at 0, 15, 19, 24 and 29, we should store the peak values along with the numbers 14, 3, 4, and 4. Since at least one frame needs to pass, the value can well be decreased by one for storing. An average peak distance might be eight frames, so four bits would be well enough to store most distances. Also, wavelets rarely use the whole 16-bit width available. If both the accuracy and the bit width are expendable, we can at best even halve the space that each stored sample value takes. Since the decompression is done at the full dynamic range, the perceived loss of quality is not nearly as dramatic as if you just cut half of the accuracy directly out. The actual stored sample value should be saved in a more or less logarhitmic way, too. Most sample values gather around the middle, zero. Therefore giving more accuracy for this area at the expense of the edges may be desirable. This is just my guess, though, so some testing would need to be done to be sure. What if a peak or turning point is not found in the desired range? Or what if there is a small sine vibration in a steadily rising wave, but not enough to give a zero-gradient at any point? Since the DC interpolation is capable of smoothly drawing a line with any given gradients at both ends, it might be wise to store some other important frames as well. How to decide when to save? First scan through your desired range, for example 16 frames. Calculate the gradient of each point up to the first zero, or until the range runs out. The gradient is easy to calculate by taking the sample values of the following and previous frames for a delta y, and dividing that by delta x, in this case a constant two frames. With sufficient sample frequency this is an adequate method. Of course, the gradient is non-zero only if the previous and next frames are not on the same side of the target frame. Next scan from the start up to the zero, or if no zero was found, the whole range. If there is a notable change in the gradient at some point, like from 0.1 to 6.9, and the frame has more than one frame in between the start and end of the scanned range, it should be stored instead of the zero. When storing a frame that does not have a gradient of zero, either because there was too much change in the area, or just because no suitable frame existed in the desired range, the actual local gradient needs to be stored as well. Most of the time this will be zero so a single 0-bit is enough per frame. If a real value is needed, then a fixed-point number can be used. For accuracy I'd suggest eight or nine bits. One bit for the sign, four for the fractional part and a few for the integer part. The real part should represent linear fractional values from 0 to 1, but the integer part should be logarhitmic since the larger numbers rapidly lose significance and thus need less accuracy. In order to accommodate changes in the wavelet, dynamic bitrates are a must. Suppose one part of the wavelet is overamplified action music that uses the whole 16-bit range. Another part has moody silent strings that only use a quarter. Obviously compressing the whole wavelet using only a single bitrate will either give lousy quality for the action music, or waste space for the silent strings. Therefore it makes sense to store the wavelet in blocks of a constant, or even varying, size. Using variable-size blocks would require scanning ahead in the wavestream and deciding the most advantageous point to change. This should be done carefully, as constant accommodation to the wavelet would require constant changes and storing the changes produces some overhead as well. For now I'll go with constant-size blocks, but variable-size is certainly an option. Here, then, is the summary of how to compress audio for DC decompression: 1. Read wave frames to memory until you have a whole block 2. Calculate a high-resolution gradient of each frame in the block 3. Locate all peaks and other frames with notable gradients 4. Calculate an average bit value for storing most frame distance values 5. Adjust the list of frames to store to fit the distance value bitsize now 6. Compress all the wavelet sample values to an appropriate bit depth 7. Round the gradients of chosen frames to the lower bit resolution 8. Store the block: Frame Distance bitrate, Sample bitdepth, then each frame. Each frame contains: Sample value, local gradient, and distance to next. I believe that compressing the distance values separately in a string for each block would also be advantageous. Because peaks in a wavelet come at definite frequencies, there is some repetition to be taken advantage of. Using a form of LZW, perhaps as used by GZip, might further cut the distance value storage by a third. All this taken in account, how good is this compression method? Using the suggested bit sizes and estimated distance between peaks or notable changes in gradients, let's see... one out of eight frames stored. Maybe one out of sixteen stored frames needs a separate gradient value. Average storage bit depth could be 10 bits, and the distance to next frame five bits, though further compressed. The ratio between eight original wavelet frames and compressed ones would be... 128 to 11+5/3 bits. Multiply by 16 for the separate gradient value to get 2048 to 200 or so. This multiplied by 344.5 gives about the number of bits used to store a single second of a mono wavelet at 44100 Hz... Reasonably good-quality mono sound would eat a bit below 70kbps. This may or may not be a tie with MP3. I'm sure further improvements are possible for my compression scheme, that might bump it past MP3, maybe even to match Ogg quality. However, a test application would need to be written to hear exactly how the sound changes when using my compression, and whether the quality suffers more than I can forehear. Also I do not have a nice way to store stereo sound, although I suppose the standard method of subtracting one channel from the other and storing the difference would give decent results. One possible improvement for size and quality might lie in the distinction of high, and medium to low frequencies. DC interpolation shows its best sides for the lowest frequencies, which would compress very well using this method. For the highest frequencies, though, it might make more sense to filter them out and save in a different way. Say, from 10kHz upwards might count as high frequencies. The downsampling method described in the previous article could be perfect for separating the two. Whatever is left could be saved using this method, for DC decompression. The high frequencies could possibly be stored in a manner similar to the third article of this page, keeping only the approximate amplitudes of the high frequencies, and upon decompression adding them to the DC interpolated wave. This gradient-based compression technique is inherently more simple to understand and apply than either MP3 or Ogg Vorbis. I can't say I'm very familiar with existing possibly superior compression methods, so odds are something like this already has been invented. Although modern personal computer users will no doubt demand quality before speed and size, I believe this method might have some real use for handhelds and other devices with more limited resources. This method is easy to program without using floating point numbers. It's also possible to revise the DC interpolation when decompressing into a modified linear interpolation which might sound almost as good. The interpolation is the heaviest part of decompression so changing it would allow for warp speed audio output! -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- BRIEFLY: Very simple way to digitally cut out high frequencies from a wavelet. This technique uses what I affectionally call 'a lagger' to slow down the changes in the wave, effectively filtering out high frequencies while low ones pass unmolested. I believe this method is basically describing a Resonant Low-pass Filter. I was wondering, how do those frequency filters and boosters work. An obvious answer is, use an FFT to splice the stream and then apply your favorite multipliers on the spliced bits before mixing them together again. Trouble is, Fourier Transformations are relatively slow and complex. (ie. I don't get how exactly they work.) So here's a different method for filtering. This whole theory works on the digital representation of a wavelet: Sampled values in a stream of audio. It might not be useful at all for analogic devices. The stream of audio, a wavelet, can be thought of as being composed of a number of sine waves of various frequencies. If I understand correctly, a Fourier Transform searches for peaks of sine waves. This could possibly be done by using a derivative of the wavelet, showing the amount of change in each frame. My method views the wavelet with a similar approach, not concentrating on the actual sample values, but rather the changes from frame to frame. On the simplest level, it works like this. Instead of sending each frame of the wavelet to your audio output, you keep track of the sample values and send your tracking counter to the audio output instead. For a non-filtered sound, allow the tracking variable which I'll call X from now on to move freely up and down following your wavelet. The sample value of each frame can be called Y. Once you want to start filtering, you feed the change between the frames into some nice function and move X using the result of that function. Consider having harmonic triangle waves at 110, 220, 440, 880 and on, in a wavelet playing at 11kHz. The highest triangle frequency is hence 3520 Hz. Then you define the filter function to be: if dY > (quarter of max amplitude) then counter = counter + quarter of max amp - square root of quarter of max amp + dY to some nice fractional power If you have a 16 bit wavelet, the maximum amplitude is 65535. counter = counter + 16384 - sqr(16384) + dY^0.8 This would allow a maximum unfiltered change of 16384 during one frame, which means the highest thing you could get out unfiltered and at maximum amplitude would be a 1378 Hz triangle wave. The highest undistorted sine wave would be something lower, I can't calculate what. Now see what happens to our harmonic triangle waves above 1378 Hz. We've got 1760 Hz. This would require a delta of about a third of the maximum amplitude, some 21845. Instead our function allows it a movement of 19217 per frame. This does alter the dynamics a little but hey, what were you expecting from a cheap, easy solution? :p This would mean a 1617 Hz triangle wave if we were in fact trying to reach every turning point of the original wavelet. Good thing we aren't. In fact what we'll get is a somewhat displaced, hopefully 1760 Hz triangle wave whose amplitude has dropped from full 65536 to 57651, 88% of maximum. Her big brother, 3520 Hz, desires a framey change of 41850. Instead, he is allowed 21237. Calculate this: allowed movement / desired movement * max amp. He'll hopefully end up a happily displaced but otherwise intact triangle wave at an amplitude of 33257, 51% of maximum. The function of course needs to be adjusted for the frequency of the wavelet. You'll only want to allow half as much movement per frame for 44100 Hz as you would for 22050 Hz. Don't be intimidated by the maths presented above. Most of that is spent on converting the wave to understandable hertz values. All you need to do, is 1) Prepare the X sample value pointer 2) Start reading the wavelet 3) dY = next value - previous value 4) Add f(dY) to X 5) Write X to audio output 6) Move to next sampled value, hop back to 3 if more wavelet to go One thing to note: To make sure you get no distortion you'd need to calculate dY using only the original wavelet, not X. However, this would quickly throw X way off and might cause some undesired clipping in the sound. But if you calculate dY = next value - X, then the resulting wave will be displaced somewhat, but will follow the original wavelet more faithfully. Perhaps some sort of hybrid would be cool, for example by using the no distortion method only when it moves X toward the new sample value, and otherwise using the displacing method. Easy, isn't it! The only trouble arises with the nasty fractional exponent. It might be a bit annoying to code and is the single slowest thing in the loop. Of course I haven't tested this out either, for whatever reason. It may be that the resulting filtered output sounds horribly distorted, but at least it's good for a few laughs in that case. Side note: It might be possible to do a high frequency booster using a different function, which would move X more than dY if dY is higher than some arbitrary value. Also, you could possibly boost low frequencies by making a copy of the low-pass filtered wavelet and mixing that together with the original with some balancing multiplier. Any frequency that wouldn't be filtered smaller would get instant 2x gain. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- BRIEFLY: Another simple way of filtering out high frequencies or boosting low ones in a digitally sampled wavelet. Or, in fact, with some creative combining, any adjustment of bass and treble in a wavelet. I've verified this method does indeed work, and has been historically used in electronics engineering when an FFT wasn't ideal for whatever reason. Stacked Moving Averages are a cheap approximation of full Gaussian filtering, which has the advantage over FFT of preserving the original wave shape more accurately. Alright, so the previously described 'lagger' method may or may not alter the dynamics of the sound to be filtered a bit more than desired. In any case, there is another way to remove high frequencies, familiar to any of us. A wavelet is comprised of a number of digital samples, for example recorded from a live source by storing the pressure of air (sound wave) a number of times per second. A popular number is 44100 samples a second. It sounds very close to the real thing already. But what happens when you downsample a wave to, say, 11025 samples a second? Using proper interpolation you get the same wave with frequencies only up to 5512.5 Hz and some sound artifacts that hopefully aren't too huge. And what happens if you simply resample this with good interpolation back to 44100 Hz? Why, the frequencies stay away, while the actual form of the wave remains basically the same. Or downsample to 200 frames per second. Use proper averaging interpolation to even out the wave, and you'll find yourself with a wavelet that at best can contain frequencies of up to 100 Hz. Then resample this with DC interpolation to make it natural, and mix in with the original. Frequencies at 100 Hz and below are boosted. Like this isn't enough, you can also do the above, but not mix the filtered wave together with the original. Instead, since the basic form of the filtered wave remains the same, you can subtract it from the original. For simpleness, just negate the wavelet by multiplying with -1, then mix as above. In doing this you cancel out the frequencies below 100 Hz. Who'd want to do that, you ask, what's a song without bass? But use the first filtered wave that only went down to 11025 samples a second: a high-frequency filter. Or give 2x gain for the original wavelet, then subtract the wave that had its frequencies above 5512.5 Hz cut out. This boosts the cut frequencies by 2x. The advantage of this method is, again, ease of implementation and relative computational effectiveness. Also it should be noted that the frequency cut-off is quite precise, which is not always a good thing. For some special mixing effects it can be very cool, but for just adding a bit more bass you may desire a filter-booster that has a progressive cut-off. Not unlike the lagger filter, coincidentally. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Briefly: Frame skipping and frame limiting can be done the right way or any of a number of wrong ways. Tying program state updates into the display's refresh rate is a common offense. Here are some thoughts and pseudocode suggesting a superior frame control method. SuperSakura uses a frame limiter designed like this. In anything except user-input dependent scenarios, a program needs to update the screen constantly and smoothly. 3D engines come to mind first when the term FPS is mentioned, but sprite-based, tile-based, even real-time ASCII graphical engines need screen refresh timing. Neglecting to program in a frame limiter and trusting that no computer will ever be fast enough to make the game unplayable has been shown to be very annoying. Likewise leaving out frame skipping and trusting that no computer is slow enough to make the game unplayable. There is a temptation to tie screen updates to the screen refresh rate, particularly among console programmers. This is bad practice, in part due to the legacy of two different video standards around the world: NTSC and PAL. Porting a program between the two will then require recoding screen updates, mainly resulting in European console gamers having to wait a few extra months while the outsourced porting programmers scratch their heads. The simplest way to implement a frame limiter is to add a forced delay between all updates. Say, 20 milliseconds before a screen buffer switch or full screen blit. This guarantees the program will never run faster than at 50 FPS, and will often run slower than that. This doesn't take into account the time spent calculating what to draw and the drawing itself, times that might vary from frame to frame. But, it's better than nothing! FPS = 50 MsecsPerFrame = 1000 / FPS Repeat Update program state Draw the frame into a buffer Sleep (MsecsPerFrame msec) Display the buffer Until Oblivion An advanced frame update system could remember the system timer count at the previous frame, and only sleep however long is needed on top of that to have passed the desired 20 milliseconds between frame displays. FPS = 50 MsecsPerFrame = 1000 / FPS PreviousTicks = SystemClock Repeat Update program state Draw the frame into a buffer CurrentTicks = SystemClock TimeSpent = CurrentTicks - PreviousTicks TimeLeft = MsecsPerFrame - TimeSpent if TimeLeft > 0 then Sleep (TimeLeft msec) Display the buffer PreviousTicks = CurrentTicks Until Fun That already runs quite nicely, and should be perfectly smooth if you can guarantee the program never needs more time to update its state and draw the frame than the 20 milliseconds you want used per frame. On multitasking or otherwise unreliable operating systems, unfortunately, you can never guarantee this unless you can get away with running with real-time priority. If the program does go overtime, everything will happen in slow motion, even with the Sleep function skipped. This brings us to frame skipping. When the program is running late, it should skip ahead by a frame or few to catch up. The pseudocode below does this by updating the program state an extra time for every frame to be skipped. Obviously you'd use integer division instead of floating point, unless you had something creative in mind. It also makes sense to define a maximum amount of frames to allow skipping past; there may be an OS hiccup that prevents a screen update for two seconds, and it's then better to continue where you left off rather than two seconds further into the action, possibly having missed critical user input during that time. SkipFrames = 0 MaxSkip = 4 FPS = 50 MsecsPerFrame = 1000 / FPS PreviousTicks = SystemClock Repeat if SkipFrames > MaxSkip then SkipFrames = MaxSkip While SkipFrames > 0 Update program state SkipFrames = SkipFrames - 1 Update program state once more Draw the frame into a buffer CurrentTicks = SystemClock TimeSpent = CurrentTicks - PreviousTicks TimeLeft = MsecsPerFrame - TimeSpent if TimeLeft > 0 then Sleep (TimeLeft msec) else SkipFrames = -TimeLeft / MsecsPerFrame Display the buffer PreviousTicks = CurrentTicks Until Profitable It's good, but not perfect yet. This implementation is still tied to the desired screen refresh rate, since program state is updated exactly at that same frequency. Changing the desired FPS results in a different amount of state updates per second. A cheap way around this would be to up the FPS and hardcode the state update procedure to run at that rate. For example, 300 state updates per second smoothly divides into 50 and 60 screen updates per second with constant frame skipping. However, this may be computationally inefficient. A proper way around this problem would be to make the state update procedure capable of internally processing however many milliseconds needed in a single call. Unfortunately this may mean a complete rewrite of the existing system. If building from the ground up, this is the desired method to use, however. It requires making animations run on a millisecond basis instead of a set FPS basis. Mob/Actor/Thing energy systems such as used in Angband and many other roguelikes are very easily converted; instead of giving a steady 10 energy to everything per update, the energy given depends on the amount of time passed since the previous update. MaxSkip = 100 msecs FPS = 50 MsecsPerFrame = 1000 / FPS PreviousTicks = SystemClock Repeat CurrentTicks = SystemClock TimeSpent = CurrentTicks - PreviousTicks if TimeSpent > MaxSkip then TimeSpent = MaxSkip Update program state (TimeSpent) Draw the frame into a buffer TimeLeft = MsecsPerFrame - TimeSpent if TimeLeft > 0 then Sleep (TimeLeft msec) Display the buffer PreviousTicks = CurrentTicks Until No More The beauty of this approach is that it tries to run at exactly the desired framerate, but can skip arbitrary amounts of time between frames. The state update procedure is hopefully efficiently programmed. More interestingly, it is possible to remove the FPS setting entirely, as state updates no longer rely on a constant FPS number. The program will run with optimal smoothness, no matter what FPS the display device is operating at. Add a vertical retrace wait to avoid calculating a dozen updates for each actually seen frame. MaxSkip = 100 msecs PreviousTicks = SystemClock Repeat CurrentTicks = SystemClock TimeSpent = CurrentTicks - PreviousTicks if TimeSpent > MaxSkip then TimeSpent = MaxSkip Update program state (TimeSpent) Draw the frame into a buffer Wait for vertical retrace... Display the buffer PreviousTicks = CurrentTicks Until End -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Briefly: Sprite blitting can be relatively simple and fast, if you're the kind of person who likes using hardware acceleration. However, should you prefer software rendering, using a packed bitmap may be much more efficient than an unpacked one. Two-dimensional bitmaps have been a staple of computer graphics for ages. They are simple to use, since you just copy the bitmap into the output buffer raster row by raster row. This is fast and works well for solid rectangular images. Someone came up with the idea of showing a number of different bitmaps one after another to create animation. These are sprites. At its simplest, you just had two different frames and alternated between drawing them. However, this introduced an additional complication: animated things weren't always solidly rectangular. Transparency was needed to avoid everything being box-like. The early Sierra AGI adventures used a palette of 16 colors. There sprites could pick one color to not be drawn. However, copying a bitmap onto the output buffer then required a conditional jump for every pixel of the sprite to decide whether to draw or not. (Or a pair of bitmasks, I suppose.) A modern 24-bit RGB sprite bitmap would probably just include an alpha channel, becoming a 32-bit sprite. These can be generally be blitted with hardware acceleration, very fast. The fastest theoretical way of drawing a sprite would be to hardcode it in straight machine code. Then, you can use direct offset additions to skip transparent pixels and move down scanlines, and can use 8-, 16-, 32- or even 64-bit MOV instructions to draw the non-transparent pixels. This approach has two problems: first, for sanity, you would need a function that takes a bitmap and produces the machine code to draw it, taking into account output buffer width and possibly bitdepth. Few programmers know how to write code that can produce its own executable code - essentially, writing a small compiler. Also, this approach is not at all portable. The second problem is clipping. A vanilla bitmap is easy enough to clip, but here you would need to have an extra routine that is called if clipping is needed, that either uses the original bitmap or else hacks the drawing machine code to exclude pixels outside the viewport. This is mainly an implementation problem, however, as sprites spend 99% of the time not clipped. You would also have to use the painter's algorithm when blitting, since fitting a Z-buffer or something in this system could be difficult. Perhaps a good alternative, then, would be sprites stored with RLE-compressed transparency. A main loop checks if the next run of pixels is transparent, and only proceeds into the drawing code if it is not. I would estimate a 5-15% speed increase over a more traditional sprite blitting function. I implemented a new tile rendering in-line assembly routine in MoonVideo using the presented principle, and rendering went from 7.5 milliseconds down to 6 milliseconds, though some of that is thanks to another optimization. -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-