
The Silicon Graphics Origin series.

Machine type:                   RISC-based distributed-memory multi-processor
Models:                         Origin 200, Origin 2000
Operating system:               IRIX (SGI's Unix variant)
Connection structure:           Crossbar, hypercube (see remarks)
Compilers:                      Fortran 77, Fortran 90, C, C++, Pascal
Vendor's information Web page:  www.sgi.com/Products/hardware/servers/index.html
Year of introduction:           1996

System parameters:

Model                       Origin 200      Origin 2000
Clock cycle                 5.4 ns          5.12 ns
Theor. peak performance
  Per proc. (64-bit)        370 Mflop/s     390 Mflop/s
  Maximum (64-bit)          1.48 Gflop/s    49.9 Gflop/s
Main memory
  Memory/node               <= 1 GB         <= 1 GB
  Memory/maximal            <= 4 GB         <= 256 GB
Communication bandwidth
  Aggregate peak            3.1 GB/s        99.8 GB/s
  Bisectional               1.6 GB/s        82 GB/s
No. of processors           1-4             2-128
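
The peak figures in the table appear to follow directly from the clock cycle. The sketch below reproduces that arithmetic under the assumption that each processor delivers two floating-point results per clock cycle; this factor is not stated in the table but is consistent with the Origin 200 per-processor value and with both system maxima. The function names are illustrative only.

```python
def peak_mflops(clock_cycle_ns, flops_per_cycle=2):
    """Theoretical per-processor peak in Mflop/s.

    flops_per_cycle=2 is an assumption (one add plus one multiply
    per cycle), chosen because it matches the table's figures.
    """
    clock_mhz = 1000.0 / clock_cycle_ns  # 1 / (cycle time in ns) -> MHz
    return flops_per_cycle * clock_mhz


def system_peak_gflops(clock_cycle_ns, n_procs, flops_per_cycle=2):
    """Theoretical peak of a full configuration in Gflop/s."""
    return n_procs * peak_mflops(clock_cycle_ns, flops_per_cycle) / 1000.0


if __name__ == "__main__":
    # Clock cycles and maximum processor counts are taken from the table.
    for model, cycle_ns, procs in [("Origin 200", 5.4, 4),
                                   ("Origin 2000", 5.12, 128)]:
        print(model,
              round(peak_mflops(cycle_ns)), "Mflop/s per processor,",
              round(system_peak_gflops(cycle_ns, procs), 2), "Gflop/s maximum")
```

Because the clock cycles in the table are rounded, the computed values agree with the tabulated ones only to within about one percent.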

Remarks:

The Origin 2000 is the newest high-end parallel server marketed by SGI. The basic processor is the MIPS R10000, and up to 128 processors can be configured in a system. The interconnection is somewhat hybrid: four CPUs on two node cards can communicate directly with each other's memory partitions via the hub, a 4-ported non-blocking crossbar. Hubs can in turn be coupled to other hubs in a hypercube fashion.
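
To illustrate what coupling "in a hypercube fashion" implies, the following is a minimal sketch of a generic binary hypercube, not of SGI's actual routing hardware. It assumes that hubs are numbered 0 to 2^d - 1 and that two hubs are linked when their numbers differ in exactly one bit; both the numbering and the function names are hypothetical.

```python
def hypercube_neighbours(node, dim):
    """Return the nodes directly linked to `node` in a dim-dimensional hypercube.

    Flipping each of the `dim` address bits in turn yields one neighbour.
    """
    return [node ^ (1 << bit) for bit in range(dim)]


def hop_count(a, b):
    """Minimum number of links between nodes a and b.

    This is the Hamming distance: the number of address bits that differ.
    """
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    dim = 3  # illustrative: 8 nodes
    for node in range(2 ** dim):
        print(node, hypercube_neighbours(node, dim))
    print("hops from 0 to 7:", hop_count(0, 7))
```

In such a topology every node has d links and the longest route between two nodes takes d hops, so the diameter grows only logarithmically with the number of nodes.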

The structure of the machine makes it somewhat difficult to classify: SGI prefers to call it a shared-memory non-uniform memory access (NUMA) system. The memory is physically distributed over the node boards, but the system has a single system image. Because of the structure of the system, the bisectional bandwidth remains constant from 4 processors on: 82 GB/s. This is a large improvement over the earlier PowerChallenge systems, which possessed a 1.2 GB/s bus.

The Origin 200 is a smaller configuration that uses the same crossbar as the Origin 2000 but does not need the hypercube connections used in the latter. Effectively, it is an SMP system because access to the memory modules is uniform. For the same reason, the bisectional bandwidth is identical to the point-to-point bandwidth: 1.6 GB/s.

Parallelisation is done either automatically by the (Fortran or C) compiler or explicitly by the user, mainly through directives. All synchronisation, etc., has to be done via memory, which can cause a fairly large parallelisation overhead. A message-passing model is also supported on the Origin through SGI's optimised versions of PVM and MPI. Programs implemented in this way may well run very efficiently on the system.

A nice feature of the new system is that it can migrate processes to the nodes that hold the data these processes request, which minimises the overhead of transferring data across the machine. The technique is reminiscent of the systems of the late Kendall Square Research, although in those systems the data were moved to the active process. SGI claims that non-local memory references take on average about 3 times longer than local memory references.
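
The effect of the quoted factor of 3 can be made concrete with a very simple model. The sketch below assumes that the average access time is just the weighted mean of local and non-local reference times and ignores caches, contention and the cost of migration itself; the function name and the sample fractions are illustrative, not measurements.

```python
def average_access_time(local_time, remote_fraction, remote_ratio=3.0):
    """Mean memory access time when a fraction of references is non-local.

    remote_ratio=3.0 reflects SGI's claim that non-local references take
    about 3 times longer than local ones. The weighted-mean model itself
    is an assumption made for this sketch.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie between 0 and 1")
    local_part = (1.0 - remote_fraction) * local_time
    remote_part = remote_fraction * remote_ratio * local_time
    return local_part + remote_part


if __name__ == "__main__":
    # Access time relative to a purely local reference (local_time = 1).
    for fraction in (0.0, 0.1, 0.25, 0.5, 1.0):
        print(fraction, average_access_time(1.0, fraction))
```

Under this model, reducing the non-local fraction through process migration lowers the mean access time linearly, which is why placing processes near their data pays off.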

Measured Performances: In [2] a speed of 10.9 Gflop/s out of a peak of 12.5 Gflop/s was measured on a system with 32 processors, an efficiency of 87%. Furthermore, an MPI version of a matrix-vector multiplication from the EuroBen benchmark ([16]) attained a speed of 7.2 Gflop/s, an efficiency of 60%.
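
The efficiency quoted for the first measurement is simply the measured speed divided by the peak speed. A minimal sketch of that calculation follows, using only the 32-processor figures given above; the function names are illustrative.

```python
def efficiency(measured_gflops, peak_gflops):
    """Fraction of the theoretical peak that was actually attained."""
    return measured_gflops / peak_gflops


def per_processor_gflops(measured_gflops, n_procs):
    """Average sustained speed per processor in Gflop/s."""
    return measured_gflops / n_procs


if __name__ == "__main__":
    # Figures from the 32-processor measurement reported in [2].
    measured, peak, procs = 10.9, 12.5, 32
    print("efficiency: {:.0%}".format(efficiency(measured, peak)))
    print("per processor: {:.3f} Gflop/s".format(
        per_processor_gflops(measured, procs)))
```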






Aad van der Steen
Mon Feb 16 12:50:47 MET 1998