For medium-scale applications, it is more cost-effective to plug an acceleration board into a desktop to achieve better floating-point performance than to migrate to a PC cluster or a commercial supercomputer. At SC2006, there are at least four such techniques:
A GPU is a dedicated ASIC for multimedia and gaming applications, with an architecture optimized for the texture/render pipeline. General-purpose GPU (GPGPU) computing relies on vendors’ APIs to boost floating-point performance. PeakStream unleashes the power of ATI GPUs via ATI's proprietary interface; support for other platforms is in development. RapidMind trades off performance for portability by using the OpenGL interface.
CELL may be the first general-purpose CPU designed for multimedia applications. Besides Sony’s PS3, some commercial products are already available on the market, for example the acceleration board from Mercury.
The ClearSpeed acceleration board made a big buzz at SC2005. Both its computing capacity and its power consumption are quite impressive.
FPGA-based reconfigurable computing
This exotic technology has been around for a few years. The main barrier to its popularity is the steep learning curve software developers face when implementing functionality in a Hardware Description Language (HDL). The high-level programming languages offered as an alternative lack either expressiveness for parallelism or performance.
The main barrier for the co-processor architecture is memory bandwidth: PCI Express is still the bottleneck for moving data between the CPU and the acceleration board. Maybe one day AMD will embed the GPU in the CPU to replace the floating-point unit, if they can figure out the power consumption and manufacturing issues.
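To make the bandwidth argument concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name `offload_speedup` and all of the numbers (a 10 GFLOPS CPU, a 100 GFLOPS board, a 4 GB/s link) are assumptions chosen for illustration, not measurements of any product mentioned above. It also assumes that transfer and computation do not overlap.

```python
def offload_speedup(flops, bytes_moved, cpu_gflops, board_gflops, link_gb_per_s):
    """Estimate the speedup from offloading a kernel to an acceleration board.

    All inputs are hypothetical. The model assumes the data must cross the
    bus before and after the computation, with no overlap between the two.
    """
    cpu_time = flops / (cpu_gflops * 1e9)            # run on the host CPU
    board_time = flops / (board_gflops * 1e9)        # run on the board
    transfer_time = bytes_moved / (link_gb_per_s * 1e9)  # bus traffic
    return cpu_time / (board_time + transfer_time)


# Assumed hardware figures, for illustration only.
CPU_GFLOPS, BOARD_GFLOPS, LINK_GB_PER_S = 10, 100, 4

# Vector add of n single-precision floats: n flops, 12n bytes
# (two inputs and one output, 4 bytes each).
n = 1_000_000
vector_add = offload_speedup(n, 12 * n, CPU_GFLOPS, BOARD_GFLOPS, LINK_GB_PER_S)

# Dense n x n matrix multiply: 2n^3 flops, 12n^2 bytes.
m = 1000
matmul = offload_speedup(2 * m**3, 12 * m**2, CPU_GFLOPS, BOARD_GFLOPS, LINK_GB_PER_S)

print(f"vector add speedup: {vector_add:.2f}")  # well below 1: offload loses
print(f"matmul speedup:     {matmul:.2f}")      # well above 1: offload wins
```

Under these assumptions, the vector add is slower on the board than on the CPU because almost all of the time goes to the bus, while the matrix multiply does enough arithmetic per byte moved to come out well ahead. That is the sense in which the bus, not the board, decides which kernels are worth offloading.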
Another challenge comes from programming languages and libraries. Designers of new programming languages face a dilemma: how can we hide low-level detail to make the language more expressive and intuitive for the programmer, while still exploiting low-level features to enhance performance? We must trade off between the two, but where is the turning point?