When I was a kid in the 1980s, there was the C64. Many may remember that little machine with its roughly 1 MHz 6510 CPU (a variant of the 6502) and its notorious "38911 BASIC BYTES FREE" start-up message. Boy, what small resources! And we all programmed in interpreted BASIC, which made programs run at about a tenth of the speed they would have reached in native machine code: an effective 0.1 MHz. You could literally watch the computer execute each statement.
Where are we now? Dual-core, quad-core, 3 GHz, 4 GHz. Applications no longer run in shared RAM but in a world of separate kingdoms called virtual address spaces, jealously guarded by a memory management unit. 32-bit operating systems give each process 2 to 4 gigabytes of address space, and 64-bit operating systems gazillions more. Processors are so powerful that they can be shared among several operating systems running simultaneously under a hypervisor such as VMware.
So why do I bring up this comparison? To illustrate an important point: in typical applications, CPU time is so rarely the bottleneck that it hardly matters anymore. There are notable exceptions: any multi-dimensional information processing (pictures, movies, huge nested loops, huge databases, deep recursion), or, in computer science lingo, algorithms of O(N^2) complexity and above. But for everything else, I think programmers should reconsider what they learned during their professional upbringing.
In the dark ages before lightning-fast processors, optimizing CPU usage really mattered. You could save seconds, minutes or even hours by optimizing code: throwing away an unnecessary function call here, using a funky assembler instruction there. For application software, those days are gone. Sure, applications still compete for CPU time in the kernel scheduler, but next to the massive bottleneck of disk access, the CPU pales in comparison. That's why the megahertz race in processor advertising finally subsided: nobody could understand anymore why they needed all that speed to type letters in Microsoft Word.
Think of a typical application that does disk I/O, either from a local drive or, worse, from a network share. A single access may take 100 ms to deliver data. One hundred milliseconds! Do you know what your CPU did in all that time, all those nanoseconds? If the scheduler had nothing else for it to do, it sat idle for literally hundreds of millions of clock cycles.
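To put that wait in perspective, here is a small Python sketch of the arithmetic, comparing the C64 with a modern processor. It counts raw clock cycles only and ignores multiple cores, instructions per cycle and power-saving clock changes; the function name `cycles_during` is just an illustration.

```python
def cycles_during(latency_ms: int, clock_hz: int) -> int:
    """Number of clock cycles that elapse while waiting latency_ms milliseconds."""
    return latency_ms * clock_hz // 1000

# One 100 ms disk or network access, measured against two clock rates.
for name, hz in [("C64, ~1 MHz", 1_000_000), ("modern CPU, 3 GHz", 3_000_000_000)]:
    print(f"{name}: {cycles_during(100, hz):,} cycles")
```

This prints 100,000 cycles for the C64 and 300,000,000 cycles for the 3 GHz machine: the same wait costs a modern CPU three thousand times as many cycles.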
Hard drives and network interfaces have taken over the role of the "slow parts" in computers. That is why optimizing file and network access is the key to optimizing software these days. Sure, you can still run short of CPU if you program recklessly, but far more delays happen because an application has only one thread of execution, and that thread is blocked waiting for an external event. The worst delays happen when the operating system itself competes for hard disk access, that is, when it has to use swap space. Swap space is RAM of absolute last resort, not something you want in the normal operation of a PC.
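A minimal sketch of the single-thread problem, in Python. The function `fake_io` is hypothetical: it stands in for a disk or network read by simply sleeping for 100 ms. Doing five such reads one after another on one thread takes about half a second, while handing them to a thread pool lets the waits overlap, so the total comes close to the time of a single read. Note that this overlaps waiting, not computation; it does nothing for CPU-bound code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(request_id: int, latency_s: float = 0.1) -> int:
    """Hypothetical stand-in for a disk or network read: it only waits."""
    time.sleep(latency_s)  # the calling thread blocks; the CPU is free meanwhile
    return request_id

def run_sequential(n: int) -> float:
    """Issue n reads one after another on a single thread; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(n):
        fake_io(i)
    return time.perf_counter() - start

def run_concurrent(n: int) -> float:
    """Issue n reads on a pool of n threads so their waits overlap; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(fake_io, range(n)))  # consume results so all reads finish
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5
    print(f"1 thread:  {run_sequential(n):.2f} s")
    print(f"{n} threads: {run_concurrent(n):.2f} s")
```

In a real program the pool would wrap actual file or socket reads; the point is only that a thread blocked on I/O need not hold up the rest of the application.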
If your application runs slowly on a customer system, be on the lookout for RAM shortage, badly fragmented hard drives and slow networks.
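On Linux, a quick first check for RAM shortage is `/proc/meminfo`. The sketch below assumes a Linux system whose kernel reports the `MemAvailable` field; the 10% threshold and the function names are arbitrary choices for illustration. Swap in use only hints that memory was tight at some point; ongoing paging activity, as reported by a tool such as `vmstat`, is the stronger signal.

```python
import os

def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: numeric value} (mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def memory_warnings(info: dict[str, int], min_available_fraction: float = 0.1) -> list[str]:
    """Return human-readable hints of memory pressure (threshold is an assumption)."""
    warnings = []
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", 0)
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    if total and available / total < min_available_fraction:
        warnings.append(f"low RAM: only {available / total:.0%} available")
    if swap_used > 0:
        warnings.append(f"swap in use: {swap_used} kB")
    return warnings

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(memory_warnings(parse_meminfo(f.read())) or "no obvious memory pressure")
```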
This brings me to one of the things I learned in many years of working with computers: to make a computer faster, add RAM, not necessarily processor speed. A 500 MHz CPU can feel fast when it has enough RAM and a good hard drive interface, while many gigahertz-class machines are crippled by slow hard disk access and by far too little RAM, filled with all sorts of pre-installed software the user never uninstalls.