Although we mainly want to discuss real, marketable systems rather than experimental, special-purpose, or even speculative machines, we want to include a section on systems that are in an advanced stage of development and have a fair chance of reaching the market. For inclusion in section 3 we set the rule that a system described there should be on the market within 6 months of its announcement. The systems described in this section will in all probability appear within one year of the publication of this report. However, some vendors do not want to disclose any specific data on their new machines until they actually begin to ship them. We respect the wishes of such vendors (it is generally wise not to stretch the expectations of potential customers too long) and will not disclose such information.
Below we discuss systems that may lead to commercial systems introduced on the market somewhere between a little more than half a year and a year from now. The commercial systems that result will sometimes deviate significantly from the original research models, depending on the way the development is done (the approaches in Japan and the USA differ considerably in this respect) and on the user group that is targeted.
A development that has proven to be significant is the introduction of Intel's IA-64 Itanium processor family. Four vendors are already offering Itanium 2-based systems at the moment, and it is known that HP will end the marketing of its Alpha and PA-RISC based systems in favour of the Itanium processor family. At the same time SGI will stop the further development of MIPS processor based machines. This means that in a few years only AMD, IBM, Intel, and Sun will produce RISC-like processors for HPC systems. This will make the HPC system field much less diverse and interesting. On the other hand, the shock that was caused in the USA by the advent of the Japanese Earth Simulator system may help in refuelling the funding of alternative processor and computer architecture research. Indeed, some initiatives in that direction are already under way, but these will not bear really new results within one or two years (except maybe with the IBM BlueGene, see below).
At the end of 2002 the next-generation vector processor, the X1, from Cray Inc.
was ready to ship. It built on the technology found in the Cray SV-1s.
Cray widely publicises a roadmap of future systems extending to around 2010. It
remains to be seen how much can be realised; however, at least 2 of these
systems are certain to reach the market in 2005. The first is the Cray X1e,
which is nothing more than the present X1 system with the processor clock
frequency raised from 800 MHz to 1.2 GHz. The other is the Cray Strider, a
commercialised version of the AMD Opteron-based Red Storm machine (11,648
processors) that Cray is presently building for Sandia Laboratories. There is
much interest in this type of system because of the cheap basic processor and
the fast network based on AMD's HyperTransport and Cray's SeaStar router ASIC.
Further away lies the Black Widow, a follow-on to the X1e, scheduled for 2006.
Recently plans for a new type of system, code-named “Rainier”, have been
disclosed. In this system the inter-processor network is the central part, and
nodes of different types can be mixed in this network: vector type (Black
Widow), scalar type (AMD or other), and FPGAs/DSPs for special functions. An
upgraded form of the Red Storm network seems a very good basis for such a
system, but its realisation is still some time away and undoubtedly some
technical challenges will have to be met.
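As a rough illustration of what the X1e clock increase mentioned above could mean, the sketch below scales a peak figure linearly with clock frequency. The function name is hypothetical, and the assumption that the work done per clock cycle stays unchanged is ours, not a statement from Cray.

```python
def scaled_peak(base_peak: float, base_clock_mhz: float, new_clock_mhz: float) -> float:
    """Scale a peak performance figure linearly with clock frequency.

    Assumption: the number of operations per cycle is unchanged, so peak
    performance is proportional to the clock frequency.
    """
    if base_clock_mhz <= 0 or new_clock_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    return base_peak * new_clock_mhz / base_clock_mhz


# Relative peak of the X1e (1.2 GHz) with respect to the X1 (800 MHz),
# with the X1 peak normalised to 1.0.
relative_peak = scaled_peak(1.0, 800.0, 1200.0)
print(f"relative peak: {relative_peak:.2f}x")
```

Under this assumption the peak rises by a factor of 1.5; sustained performance will depend on memory and network behaviour, which this sketch ignores.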
HP and Intel will have a great influence in the next few years with their Itanium processor family. A dual-core processor based on the Itanium is already on (or beyond) the drawing board and will hit the market in a year or two. As dual-core processors usually perform relatively worse than their single-core equivalents, the performance improvement will not be spectacular. The system architecture will be much more important. A diversification of the processors themselves may also help to boost performance. Because of HP/Intel's experience with VLIW processors (which the Itanium essentially is), one might expect that research will go in the direction of processors with even longer instruction words, possibly including specialised devices for high-level operations like FFTs or sparse matrix-vector multiplies as well. When and how such improvements would turn up in future systems is, however, speculative. It will certainly not happen within the next two years. As yet no radically different system architectures are known to be on HP's drawing boards. Instead HP may try to penetrate further into the cluster field, where it has already installed some large Itanium-based systems.
IBM has been working for some years on its BlueGene systems, of which the first models, the BlueGene/L, will be installed within a few months (see the BlueGene/L). Follow-ups are planned: the BlueGene/P with a peak speed of 1 Pflop/s and the BlueGene/Q with a peak speed of 3 Pflop/s. These systems are hardly meant for the average HPC user, but they may help in finding suitable architectural features for systems for the general market.
Of course the development of the POWERx processors will also make its mark: the POWER5 processor has the usual technology-related advantages over its predecessor, and it is now a subject of research how to couple 8 of them such that a virtual vector processor with a peak speed of 60-80 Gflop/s can be made. This approach is called ViVA (Virtual Vector Architecture). It is reminiscent of Hitachi's SR811000 processors (which are also POWER5 processors) or the MSP processors in the Cray X1. This path will take some years to complete even after the POWER5 processor has become available, and it will extend to the next generation(s) of the POWERx.
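The ViVA figures above imply a per-processor peak that can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the function name is hypothetical, and it assumes that each of the 8 coupled processors contributes equally to the aggregate peak.

```python
def per_processor_peak(aggregate_gflops: float, n_processors: int) -> float:
    """Peak per processor in Gflop/s, assuming equal contributions (an assumption)."""
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    return aggregate_gflops / n_processors


# Quoted aggregate peak range for a virtual vector processor built from 8 CPUs.
low = per_processor_peak(60.0, 8)
high = per_processor_peak(80.0, 8)
print(f"{low:.1f}-{high:.1f} Gflop/s per processor")
```

Under that assumption each processor would need a peak of 7.5-10 Gflop/s.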
Last year it became known that SGI will stop producing its MIPS-based systems. Therefore, the difference it would like to make with respect to other vendors that also offer Itanium-based systems will have to lie in the macro-architecture of its systems. Improvements can be realised in the speed of the network (NUMAlink3 to NUMAlink4 and beyond) and in systems with a large number of processors in a single system image (SSI). In that respect SGI has a track record with its MIPS-based Origin 3000 systems, which may be extended to its future Altix x000 systems, where at present SSIs of 256 processors are realised.
Further in the future SGI seems to have plans that are more or less similar to Cray's Rainier project: coupling heterogeneous processor sets by its proprietary network, in this case a successor of the NUMAlink4 network architecture. Development of such systems is very cost-intensive, so it remains to be seen whether such plans will get beyond the stage of intentions.
Like Cray and IBM, Sun has been awarded a grant from DARPA to develop so-called high-productivity systems in DARPA's HPCS program. Until now Sun has concentrated on developing heavily multi-threaded processors, the first product being the Niagara chip and the next the still more multi-threaded Rock processor. The first implementation of the Niagara processor is about ready, although not product-ready. The chip harbours 8 CPU cores, each of which is 4-way multi-threaded. Systems based on the Rock processor would not be available before 2008. For the Niagara processor, too, no system architecture has been detailed yet, which makes the future of these rather experimental systems somewhat speculative. For its near-future mainstream high-end systems Sun will therefore rely on Fujitsu's SPARC64 processors.
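To make the degree of multi-threading on the Niagara chip concrete, the short sketch below multiplies out the core and thread counts quoted above. The function name is hypothetical and serves only as an illustration.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Number of hardware thread contexts on one chip."""
    if cores <= 0 or threads_per_core <= 0:
        raise ValueError("counts must be positive")
    return cores * threads_per_core


# Niagara as described in the text: 8 cores, each 4-way multi-threaded.
niagara_threads = hardware_threads(8, 4)
print(f"hardware threads per chip: {niagara_threads}")
```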