Locality is an interesting to think about and exploit when designing algorithms for image processing. For normal data processing a simple MRU cache is a common solution, taking advantage of similarities as you linearly traverse a data set. When you start thinking about image processing however, locality is suddenly two-dimensional. Considering the pixel above you is equally as advantageous as considering the pixel to your left.
For example, consider turning a 32 bit-per-pixel image into a paletted one. Determining the closest match in the colour palette for each pixel can be an expensive task. Caching recently computed values can help, but this falls apart if you are processing large images (say 1600x1200) linearly. By the time you process pixel (x,y), the computed value from (x, y-1) has long since been removed from the cache, and there is a fairly good chance that final palette index will be the same. A simple change in processing order can make all the difference. By processing the image in MxM blocks (Choose M to suit your cache size) and iterating through the y direction first we can take advantage of the bi-linear nature of locality for images, greatly improving cache efficiency and, as a result, processing time.
Techniques like this are really important when designing software, especially graphics software, for embedded devices.
Higher end graphical features such as Gaussian filters and anti-aliasing can add a lot of pop and polish to your UI, but are often too computationally expensive for deployment on embedded devices, where every CPU cycle counts. By being creative with implementation decisions and carefully choosing your optimizations, you can greatly reduce the cost associated with highly desirable features, allowing you to provide a crisper, nicer looking application.
-- Chris K