Relatively little attention was paid in the early days of parallel computers to debugging the resulting parallel programs. We developed our approaches by trial and error during our various experiments in CP, and debugging was never a major research project in CP.
In this section, we shall consider some of the history and current technology of parallel debugging, as developed by CP.
Method 1. Source Scrutiny The way one worked on the early CP machines was to compile the target code, download it to the nodes, and wait. If everything worked perfectly, results might come back. Under any other circumstances, nothing would come out. The only real way to debug was to stare at the source code.
The basic problem was that while the communication routines discussed in the previous chapter were adequate (and in some sense ideal) for the task of algorithm development, they lacked a lot in terms of debugging support. In order to ``see'' the value of a variable inside the nodes, one had to package it up into a message and then send it to the host machine. Similarly, the host code had to be modified to receive this message at the right time and format it for the user's inspection. Even then only node 0 could perform this task directly, and all the other nodes had to somehow get their data to node 0 before it could be displayed.
Given the complexity of this task it is hardly surprising that users typically stared at their source code rather than attempt it. Ironically this procedure actually tended to introduce new bugs in the process of detecting the old ones because incorrect synchronization of the messages in nodes and host would lead to the machine hanging, probably somewhere in the new debugging code rather than the location that one was trying to debug. After several hours of fooling around, one would make the painful discovery that the debugging code itself was wrong and would have to start once more.
Method 2. Serial Channels In building the first CP hypercubes, each node had been given a serial RS-232 channel. No one quite knew why this had been done, but it was pointed out that by attaching some kind of terminal, or even a PC, it might be possible to send ``print'' statements out of the back of one or more nodes.
This was quickly achieved but proved less than the dramatic improvement one would have hoped. The interface was really slow and only capable of printing simple integer values. Furthermore, one had to use it while sitting in the machine room and it was necessary to attach the serial cable from the debugging terminal to the node to be debugged-an extremely hazardous process that could cause other cables to become loose.
A modification of the process that should probably have pointed us in the right direction immediately was when the MS-DOS program DEBUG was modified for this interface. Finally, we could actually insert breakpoints in the target node code and examine memory!
Unfortunately, this too failed to become popular because of the extremely low level at which it operated. Memory locations had to be specified in hexadecimal and code could only be viewed as assembly language instructions.
A final blow to this method was that our machines still operated in ``single-user'' mode-that is, only a single user could be using the system at any one time. As a result, it was intolerable for a single individual to ``have'' the machine for a couple of hours while struggling with the DEBUG program while others were waiting.
Method 3. Cubix As has been described in the previous section on communication systems, the advent of the Cubix programming style brought a significant improvement to the life of the parallel code developer. For the first time, any node could print out its data values, not using some obscure and arcane functions but with normal printf and WRITE statements. To this extent, debugging parallel programs really did become as simple as debugging sequential ones.
Using this method took us out of the stone age: Each user would generate huge data files containing the values of all the important data and then go to work with a pocket calculator to see what went wrong.
Method 4. Help from the Manufacturer The most significant advance in debugging technology, however, came with the first nCUBE machine. This system embodied two important advances:
The ``real'' kernel was a mixed blessing. As has been pointed out previously, we didn't really need most of its resources and resented the fact that the kernel imposed a message latency almost ten times that of the basic hardware. On the other hand, it supported real debugging capabilities.
Unfortunately, the system software supplied with the nCUBE hadn't made much more progress in this direction than we had with our ``DEBUG'' program. The debugger expected to see addresses in hex and displayed code as assembly instructions. Single stepping was only possible at the level of a single machine instruction.
Method 5. The Node Debugger: ndb Our initial attempt to get something useful out of the nCUBE's debugging potential was something called ``bdb'' that communicated with nCUBE's own debugger through a pipe and attempted to provide a slightly more friendly user interface. In particular, it allowed stack frames to be unrolled and also showed the names of functions rather than the absolute addresses. It was extremely popular.
As a result of this experience, we decided to build a full-blown, user-friendly, parallel programming debugger, finally resulting in the CP and now ParaSoft tool known as ``ndb,'' the ``node debugger.''