I have just returned from the Second SIAM Conference on Parallel Processing for Scientific Computing in Norfolk, Virginia. There I heard about 1,000 processor systems, 4,000 processor systems, and even a proposed 1,000,000 processor system. Since I wonder if such systems are the best way to do general purpose, scientific computing, I am making the following offer. I will pay $100 to the first person to demonstrate a speed-up of at least 200 on a general purpose, MIMD computer used for scientific computing. This offer will be withdrawn at 11:59 PM on 31 December 1995. Some definitions are in order. Speed-up: The time taken to run an application on one processor using the best sequential algorithm divided by the time to run the same application on N processors using the best parallel algorithm. General purpose: The parallel system should be able to run a wide variety of applications. For the purposes of this test, the machine must run 3 different programs that demonstrate its ability to solve different problems. I suggest a large, dynamic structures calculation, a trans-sonic fluid flow past a complex barrier, and a large problem in econometric modelling. These are only suggestions; I will be quite flexible in the choice of the problems. Several people have volunteered to adjudicate disputes in the selection of the applications. Application: The problems run for this test must be complete applications, no computational kernels allowed. They must contain all input, data transfers from host to parallel processors, and all output. The problems chosen should be the kind of job that a working scientist or engineer would submit as a batch job to a large supercomputer. In addition, I am arbitrarily disqualifying all problems that Cleve Moler calls "embarrassingly parallel". These include signal processing with multiple independent data streams, the computation of the Mandelbrot set, etc. There are some rather obvious ground rules. Simulations of the hardware are not permitted. I am looking for a demonstration on a running piece of hardware. The same problem should be run on the sequential and parallel processors. It is not fair to cripple the sequential processor. For example, if your operating system uses 99% of one processor, the single processor run will spend all its time in the operating system. Super-linear speed-up as the number of processors is increased is evidence of this problem. It is not fair to memory starve the single processor run. If you have 100K words of memory on each of 1,000 processors, and you run a 10 MW problem, it is not fair for the sequential run to be made on a 100 KW processor. After all, we are not interested in seeing the impact of additional memory; we want to see how much the extra CPUs help. It may not be possible to follow all these additional rules. For example, you can't be expected to build a single processor with lots of extra memory or write a new operating system just for this test. However, you should do your best to make the demonstration fair. A third party has agreed to adjudicate any scaling disputes. Anyone claiming this prize will be expected to present his or her results at a suitable conference. At the end of the presentation I will have the audience vote on whether the demonstration was successful. Remember, the purpose of the bet is to force developers to demonstrate the utility of massively parallel systems, not to show they can find holes in the rules. Alan Karp na.karp@su-score.arpa ------- From GOLUB@SU-SCORE.ARPA Tue Dec 17 18:41:39 1985 Received: from SU-SCORE.ARPA (su-score.arpa.ARPA) by anl-mcs.ARPA (4.12/4.9) id AA09204; Tue, 17 Dec 85 18:41:05 cst Return-Path: Received: from su-navajo.arpa by SU-SCORE.ARPA with TCP; Mon 16 Dec 85 08:33:16-PST Received: from arthur.Purdue.EDU (purdue.edu) by su-navajo.arpa with Sendmail; Mon, 16 Dec 85 08:32:01 pst Date: Mon, 16 Dec 85 11:32:23 EST From: "John Rice" Message-Id: <8512161632.AA04135@arthur.Purdue.EDU> Received: by arthur.Purdue.EDU; Mon, 16 Dec 85 11:32:23 EST To: golub@su-navajo.ARPA Subject: Reply to Karp Cc: bajaj@Purdue.EDU, jrr@Purdue.EDU Resent-Date: Tue 17 Dec 85 16:28:32-PST Resent-From: Gene Golub (415/497-3124) Resent-To: :; Resent-Message-Id: <12167943241.14.GOLUB@SU-SCORE.ARPA> Status: R Dear Gene, December 16, 1985 I attach below a reply to the recent "prize" announcement forwarded from Alan Karp. He has touched on an interesting problem but I think the "contest" should be made more precise in its goals and procedures. Thanks for considering this reply for wider distribution John Rice ---------------------------------------------------------------------- Alan Karp has offered a prize of $100 for a demonstration of a speed-up of 200 or more for a complete application run on a general purpose MIMD machine. This prize reflects the high interest in this important scientific question; however, the context of the prize needs to be made more precise. The following points come to our minds on a first reading of the rules. 1. BEST ALGORITHM. If no restriction is placed on the algorithm class, then the best is invariably to "print out" the answer. Thus the question would revolve about parallelism in output devices which is probably not Karp's intent. If reasonable restrictions are placed on the algorithm class, then the best algorithm (either sequential or parallel) is unknown for almost any interesting application. If "best" is replaced by "commonly used", then one allows some terrible algorithms. Note that given algorithm X, it is highly probable that there is an "expert" with "impeccable credentials" who will testify that X is best (another can be found who will testify that X is worst). Karp probably intends to accept algorithms which are "widely accepted as good"; in any case, this term needs to be clarified. Note that best (or even good) algorithms depend on the application size, the number N of MIMD processors, and the architecture, among other things. 2. APPLICATION. It seems that an application is expected to be a large complex computation. The cost to create a "good", "production quality" program of 10,000-100,000 lines is the order of $1 million. Few people or agencies can afford to create two such programs just to prove the point of this contest. 3. EMBARRASSINGLY PARALLEL. A working definition of an embarrassingly parallel application is one which can be sped-up by a factor of 199 or more by an MIMD machine. Then the scientific questions of interest is how many applications are embarrassingly parallel and this contest is irrelevant. A reasonable definition of the excluded problems should be given. Is the method of lines excluded? If not, consider 3D parabolic problem with a 100x100x100 spatial grid and a reasonably complex (say, 500-1000 operations to evaluate) f(x,y,z,u,ux,uy,uz,uxx,uyy,uzz) term. 4. WHAT IS AN ANSWER? For simple linear algebra problems one would insist on both programs producing identical numbers (modulo round-off?). For real applications, the term answer is not yet well defined. For example, I can "solve" a PDE by producing A. A table of values on a 500x500 grid that are accurate to .001. B. A table of values on a 13x11 grid that are accurate to .000001. C. The 2500 coefficients of a power series that is accurate to .0002. D. A plot of the solution that is accurate to within the eye's ability to distinguish solutions. Are these equally acceptable? Are they equally acceptable in all accuracies? Are all .01? It is almost certain that the best sequential and parallel algorithm for solving a particular PDE (or any application involving one) will not produce identical numbers -- or numbers that should be identical. 5. WHAT IS "CRIPPLING"? The "proper" balance between CPU power, memory and I/O capacity depends significantly on the architecture; there will be large differences in the balance between a uniprocessor and a 1000 processor machine. In practice, however, we do not care about the proper balance; we care about the actual balance. Does the contest involve "properly" balanced machines or actual machines? A 1000 node NCUBE machine has 128 megabytes of memory; is it fair to compare it to a Cray with 8 megawords of memory? And, if it is unfair, to which is it unfair? C. Bajaj and J. Rice bajaj@purdue jrr@purdue