ccm_checkin

CCM barrier with timeout and deadlock detection

Routine:

ccm_checkin

Purpose:

Block at a point in the program and wait up to a given maximum number of seconds. Signal a timeout if not all processes have reached that point before the time expires. Checkin points can be given a label. If two tasks call ccm_checkin with different labels at the same time, a deadlock is signalled.

Minimal calling sequence:

call ccm_checkin()

Required Arguments:

NONE

Call with all Optional Arguments:

call ccm_checkin(wait,name,the_err)
wait :: real, intent(in)
Seconds to wait before signalling a timeout at this point. The default is 10 seconds.
name :: character(len=*), intent(in)
A label for this checkin point. If two processes call ccm_checkin with different labels at the same time, a deadlock is signalled. The label defaults to "ccm_undefined".
the_err :: integer, intent(out)
Error code: 0 on success, nonzero on either a timeout or a deadlock.
See Specifying Optional Arguments for the syntax of optional arguments.
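
The optional arguments can also be passed by keyword, so that only some of them are set. The following is a minimal sketch, assuming the ccm module provides an explicit interface whose dummy argument names match the names listed above (wait, name, the_err). The program name and the label "setup_done" are made up for illustration.

```fortran
program ccm_checkin_keywords
    use ccm
    implicit none
    integer :: my_id, num_nodes, the_err
    call ccm_init(my_id, num_nodes)
    ! Keyword form (assumed to be accepted by the ccm interface):
    ! set the label and the error code only; wait keeps its default.
    call ccm_checkin(name="setup_done", the_err=the_err)
    if (the_err /= 0) then
        write(*,*) "checkin failed on task ", my_id, " code ", the_err
    end if
    call ccm_close()
end program ccm_checkin_keywords
```

Passing the_err lets the caller decide what to do when a checkin fails. Without it, the only visible sign of a failure is the warning text printed by the module.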

Example:


program ccm_checkin_x1
    use ccm
    implicit none
    integer my_id,num_nodes,the_err
    call ccm_init(my_id,num_nodes)
    call ccm_checkin(10.0,"hello",the_err)
    if(the_err .ne. 0)then
        write(*,*)"error in checkin ",the_err
    else
        write(*,*)" checkin ok"
    endif
! the next line causes a timeout because only task 0 checks in
    if(my_id .eq. 0)call ccm_checkin(5.0)
    call ccm_barrier()
! the next lines should cause a deadlock that is detected
    if(my_id .eq. 1)then
       call ccm_checkin(10.0,"one")
    else
       call ccm_checkin(10.0,"the_rest")
    endif
    call ccm_close()
end program

Example output on 4 processors:


[ccm_host:~/ccm/source]% ccm_checkin_x1
  checkin ok
  checkin ok
  checkin ok
  checkin ok
 Warning from collective communications module
  routine: ccm_checkin
  process   0  timed out at   11295391.0220000  called with label ccm_undefined
 
 
 
 Warning from collective communications module
  routine: ccm_checkin
  deadlock detected at   11295391.0380000  process   0  and   1
  waiting for label one but got  the_rest
 
 
 Warning from collective communications module
  routine: ccm_checkin
  process   1  timed out at   11295401.0380000  called with label one
 
 Warning from collective communications module
  routine: ccm_checkin
  process   2  timed out at   11295401.0500000  called with label the_rest
 
 Warning from collective communications module
  routine: ccm_checkin
  process   3  timed out at   11295401.0650000  called with label the_rest
   [ccm_host:~/ccm/source] % 

The call to ccm_init initializes the communication package. The ccm_checkin with label "hello" passes without a problem on all four tasks. Because the second ccm_checkin is called only by task 0, it times out after 5 seconds. Next, ccm_checkin is called with label "one" on task 1 and label "the_rest" on the other tasks. Task 0 detects the deadlock and returns. The other tasks time out after 10 seconds, waiting for process 0.
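
To avoid the deadlock shown in the example, every task should reach the same checkin point with the same label, even when the tasks took different branches beforehand. The following is a hedged sketch of that pattern, not a prescribed usage. The program name, the label "branch_done", and the messages are hypothetical, and the sketch assumes only the calling sequence documented above.

```fortran
program ccm_checkin_matched
    use ccm
    implicit none
    integer :: my_id, num_nodes, the_err
    call ccm_init(my_id, num_nodes)
    ! Tasks may do different work before the checkin point.
    if (my_id == 1) then
        write(*,*) "task 1 doing its own work"
    else
        write(*,*) "task ", my_id, " doing the shared work"
    end if
    ! Every task checks in at the same point with the same label,
    ! so no label mismatch (deadlock) should be reported.
    call ccm_checkin(10.0, "branch_done", the_err)
    ! A nonzero code means a timeout or a deadlock; the documentation
    ! above does not say how to tell the two apart from the code alone.
    if (the_err /= 0) then
        write(*,*) "task ", my_id, ": checkin failed, code ", the_err
    end if
    call ccm_close()
end program ccm_checkin_matched
```

Moving the checkin out of the branches keeps the labels identical by construction, which is simpler than keeping separate label strings in sync.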



Error conditions:

