Minutes of the Message Passing Interface Forum
Dallas, Texas
August 11 - 13, 1993

The MPI Forum met August 11-13, 1993, at the Bristol Suites Hotel in North
Dallas.  This was the eighth meeting of the MPIF and the sixth regular
working meeting in Dallas.  There were both general meetings of the
committee as a whole and meetings of several of the subcommittees.  This
meeting included the first reading of the Communication Contexts,
Environmental Management and Subset chapters, the second reading of the
Process Topologies chapter, and formal consideration of various topics in
the Point-to-point and Collective Communication chapters.

There were a substantial number of formal votes taken at this meeting as
well as a few straw votes.  All of the votes are recorded in these minutes
(and can be found by searching for VOTE) and have also been published in
summary form to the mpi-core mailing list.

These minutes were written by Bob Knighten (knighten@ssd.intel.com) and
Rusty Lusk (lusk@anl.gov).

These minutes are quite long.  If you want to see the important topics you
can search for --- and this will quickly lead to each topic (and a few
other things).

The basic documents used at this meeting are:

 + DRAFT Document for a Standard Message-Passing Interface (August 10, 1993)
 + MPI Environmental Management section

Attendees:
---------

Robert G. Babb II       U. of Denver                 babb@cs.du.edu
Doreen Cheng            NASA/Ames                    dcheng@nas.nasa.gov
Lyndon Clarke           University of Edinburgh      lyndon@epcc.ed.ac.uk
James Cownie            Meiko                        jim@meiko.co.uk
Jack Dongarra           UT/ORNL                      dongarra@cs.utk.edu
Anne C. Elster          Cornell University           elster@cs.cornell.edu
Jim Feeney              IBM Endicott                 feenyj@vnet.endicott.ibm.com
Al Geist                ORNL                         gst@ornl.gov
Ian Glendinning         University of Southampton    igl@ecs.soton.ac.uk
Brian K. Grant          LLNL                         bkg@llnl.gov
Adam Greenberg          TMC                          moose@think.com
Leslie Hart             NOAA/FSL                     hart@fsl.noaa.gov
Don Heller              Shell Development            heller@shell.com
Tom Henderson           NOAA/FSL                     hender@fsl.noaa.gov
Alex Ho                 IBM Almaden                  wkh@almaden.ibm.com
Gary Howell             Florida Tech                 howell@zach.fit.edu
Steven Huss-Lederman    SRC                          lederman@super.org
Bob Knighten            Intel SSD                    knighten@ssd.intel.com
Rik Littlefield         PNL                          rj_littlefield@pnl.gov
Rusty Lusk              ANL                          lusk@mcs.anl.gov
Peter Madams            nCube                        pmadams@ncube.com
Dan Nessett             LLNL                         nessett@llnl.gov
Steve Otto              Oregon Graduate Institute    otto@cse.ogi.edu
Peter Pacheco           U. of San Francisco          peter@sun.math.usfca.edu
Anthony Skjellum        Mississippi State U.         tony@cs.msstate.edu
Marc Snir               IBM, T.J. Watson             snir@watson.ibm.com
Alan Sussman            University of Maryland       als@cs.umd.edu
Bob Tomlinson           LANL                         bob@lanl.gov
Eric T. Van de Velde    CalTech                      evdv@ama.caltech.edu
David Walker            ORNL                         walker@msr.epm.ornl.gov
Joel Williamson         Convex Computer              joelw@convex.com

Wednesday, August 11
--------- ---------

-------------------------------------------------------------------------------
General Meeting
-------------------------------------------------------------------------------

Jack Dongarra opened the meeting by presenting the agenda that was previously
sent out by David Walker.
AGENDA
------

Wednesday
  1:00 -  2:00   Subcommittee meetings
  2:00 -  4:00   Point-to-point communications    Snir
  4:00 -  5:00   Collective communications        Geist
  6:00 -  7:30   Dinner
  7:30 - 10:00   Subcommittee meetings

Thursday
  9:00 - 12:00   Context                          Skjellum
 12:00 -  1:30   Lunch
  1:30 -  2:30   Context
  3:00 -  4:00   Subset                           Huss
  4:00 -  6:00   Topology                         Huss
  6:00 -  8:00   Dinner
  8:00 - 10:00   Subcommittees

Friday
  9:00 - 10:30   Environment                      Lusk
 10:30 - 12:00   Language                         Lusk

Status of Readings

 sec\date |  May  June  August  September
 ---------+--------------------------------------------------------------
 p-p      |  2
 coll     |  1  2  2
 profile  |  1  2
 context  |  1  2
 topology |  1  2
 subset   |  1  2  2
 lang     |  1  2
 env      |  1  2

The next meeting will be September 22-24.  It will again be here in Dallas.

Started at 2:10

-------------------------------------------------------------------------------
Report From the Point-to-Point Communication Subcommittee
-------------------------------------------------------------------------------

Marc Snir presided.

Marc reorganized the chapter to make it more readable.  He also added the
material in Section 4.13 (Derived datatypes) in line with the straw vote at
the last meeting.  In response to a question, Marc noted that he welcomes
editorial comments, and asks that they be sent to him in e-mail.

Derived datatypes (4.13)
------------------------

Marc began by describing the ideas behind "Derived datatypes".

What is the relation of this to Fortran 90 data types?  Largely orthogonal.

Organization count: 20

STRAW VOTE: Should "Derived datatypes" be added to MPI?
----------
Yes: 22   No: 0   Abstain: 0

Introduction (4.13.0)
---------------------

VOTE: Approve 4.13.0 (Introduction)?
----
Yes: 19   No: 0   Abstain: 1

Datatype constructors (4.13.1)
------------------------------

Marc gave brief descriptions of the five functions in this section and
contrasted them with the earlier buffer construction functions.

VOTE: Approve 4.13.1 (Datatype constructors)
----
Yes: 20   No: 0   Abstain: 0

Additional functions (4.13.2)
-----------------------------

It was noted that the sentence starting at line 33 on p. 77 is wrong and
contradicts what follows.  Marc agreed and will repair this.

There was some discussion of exactly what is passed in a message using a
datatype that contains gaps.  There was no disagreement and this will be
clarified.

The fact that MPI_ADDRESS has an integer OUT parameter that provides the
byte address of a location is a problem on some architectures; this was
briefly discussed.  One proposal was to use a suitable
implementation-specific definition of the return type.

There was also a discussion of MPI_TYPE_COMMIT, primarily to better
understand what was intended here.  The principal confusion was that
MPI_TYPE_COMMIT has to do with completing the definition of the type, NOT
committing data.

There is an alternative form of MPI_TYPE_COMMIT with only one parameter
(type).  One would then need to commit before communication; the type can
still be used in constructors after the commit.  Yet another alternative is
a lazy commit, i.e. a datatype object becomes committed at first use in a
communication.  There are several issues in considering these alternatives:
ease of use for the programmer (lazy commit is easy); use of resources (the
original allows reclaiming resources as soon as they are not being used);
and the cost of using a datatype buffer in a communication.

Adam Greenberg proposed yet another option - provide an optional function,
MPI_TYPE_COMPILE, which can be used for efficiency but otherwise there is
lazy commit.  The objection to this was efficiency of the communication.
Adam's counter was that he expected this to be in the noise of the general
overhead of communication.

Marc noted that we need to write up full proposals and have a more focused
discussion.
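For illustration, here is a minimal sketch of how a derived datatype might be
built, committed with the one-parameter form of MPI_TYPE_COMMIT discussed
above, and then used in a send.  The constructor name MPI_TYPE_HVECTOR, the
type name MPI_DATATYPE, and the exact argument lists are assumptions made for
the purpose of the example, not the draft binding:

    /* Sketch only: send one column of an n x m C array of doubles as a
       single message, using an "hvector"-style derived datatype.          */

    MPI_DATATYPE column;

    /* n blocks of 1 double each, successive blocks m*sizeof(double)
       bytes apart - the strided pattern a derived datatype can describe.  */
    MPI_TYPE_HVECTOR(n, 1, m * sizeof(double), MPI_DOUBLE, &column);

    /* One-parameter commit: completes the definition of the type (it does
       NOT commit any data); required before the type is used in a send.   */
    MPI_TYPE_COMMIT(&column);

    MPI_SEND(&a[0][col], 1, column, dest, tag, comm);

    /* Resources associated with the datatype can be reclaimed once it is
       no longer needed.                                                   */
    MPI_TYPE_FREE(&column);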
VOTE: Approve 4.13.2 (Additional functions) EXCEPT MPI_TYPE_COMMIT and
----  MPI_TYPE_FREE?
Yes: 16   No: 0   Abstain: 2

Use of general datatypes in communication (4.13.3)
--------------------------------------------------

This section specifies how datatypes are used in send and receive, including
how type matching works.  In particular, datatypes match if they are
structurally equivalent (i.e. the type signatures are the same).

Is there some function that gives a count (of some suitable sort) of elements
in a send when using datatypes?  {{{I am confused?}}}

VOTE: Have a query function that returns the number of top-most elements
----  received?
Yes: 14   No: 0   Abstain: 4

VOTE: Have ONLY a query function that returns the count of top-most level
----  elements in a receive.
Yes:      No:      Abstain:

VOTE: Approve 4.13.3 (Use of general datatypes in communication) as amended.
----
Yes: 15   No: 0   Abstain: 4

Examples (4.13.4)
-----------------

There are no requirements here and so no vote was taken.

Correct use of addresses (4.13.5)
---------------------------------

This section is about dealing with limits in using addresses on systems
which do not have a flat address space.

VOTE: Approve section 4.13.5 (Correct use of addresses)?
----
Yes: 18   No: 0   Abstain: 1

Message data (4.2.1)
--------------------

Marc asked if the table in this section needs to be expanded to include all
of the native C data types.  YES!

Marc reviewed all of the changes (other than order) that he made in this
chapter.

There is more detail in the discussion of the semantics of point-to-point
communication.  He has added Progress (some guaranteed) and Fairness (none
guaranteed) requirements.  {{{Where?}}}  Null can be used when nothing is
needed.

On p. 61 there is a new function, MPI_TESTALL, as suggested by David Walker.
It is included for reasons of completeness and symmetry.
{{{See Discussion on p. 62}}}

VOTE: Allow null pointers in an array of pointers (with the system required
----  to do the right thing)?
Yes: 20   No: 0   Abstain: 1

On p. 63 there is a new function, MPI_PROBE_COUNT, which uses a datatype to
interpret the result of a probe and get a count.

On p. 64 there is a typo in MPI_PROBE.  There should not be a datatype
parameter.

On p. 65, the name of MPI_IS_CANCELLED has been changed to
MPI_TEST_CANCELLED.

The time arrived to make a decision on SENDRECV (section 4.11).  Adam
Greenberg suggested that there should be two tag arguments (one for sending
and one for receiving) rather than only one.  One effect of this is that a
wild card can now be used in the receive_tag.

VOTE: Have two tag arguments (send_tag and receive_tag) in MPI_EXCHANGE?
----
Yes: 6   No: 3   Abstain: 10

Section 4.12 (Null processes) is a proposal responding to the suggestion
from Jon Flower that was supported in a straw vote at the last meeting.
There was a substantial discussion of the utility and cost of MPI_PROCNULL.
Steve Huss-Lederman, as a proxy for Rolf Hempel, argued for the value of
this in use with process topologies.  Various people argued that there would
be a universal overhead if MPI_PROCNULL were allowed as a
source/destination.  Three positions emerged: allow it as a legitimate
source/destination everywhere; allow it only in send/receive/exchange; never
allow it as a source/destination.

VOTE: (1) Allow MPI_PROCNULL as a source or destination for all
----      communication operations.
      (2) Allow MPI_PROCNULL only for MPI_SENDRECV and MPI_EXCHANGE.
      (3) Never allow MPI_PROCNULL as a source or destination in
          communication operations.

  (3)  Yes: 3    No: 9   Abstain: 8
  (2)  Yes: 11   No: 5   Abstain: 6
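To illustrate the topology argument made above, here is a sketch of the shift
idiom that MPI_PROCNULL is meant to simplify.  The MPI_SENDRECV argument list
shown is schematic (variable names and argument order are assumptions), not
the draft binding:

    /* Sketch only: a one-dimensional shift to the right in which the
       boundary processes simply pass MPI_PROCNULL, so every process can
       execute the identical call with no special-case code.              */

    int left  = (rank > 0)          ? rank - 1 : MPI_PROCNULL;
    int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROCNULL;

    /* The intent is that a send to or a receive from MPI_PROCNULL simply
       completes without transferring any data.                           */
    MPI_SENDRECV(sendbuf, n, MPI_DOUBLE, right, tag,
                 recvbuf, n, MPI_DOUBLE, left,  tag, comm, &status);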
Section 4.14 (Universal communication functions) includes one new
convenience function, MPI_COMM_INIT.  This was added by Marc Snir to make it
part of the base functions in terms of which all other functions can be
defined.  An alternative approach is to put section 4.14 in an Annex and NOT
require these functions as part of MPI.

VOTE: Approve Chapter 4 (Point to Point Communication) as amended.
----
Yes: 18   No: 0   Abstain: 1

Note that all that remains to consider in Chapter 4 are the type_commit and
free functions.

-------------------------------------------------------------------------------
Report From the Collective Communication Subcommittee
-------------------------------------------------------------------------------

Al Geist presided.

This was a continuation of the second reading of this chapter that was begun
at the last meeting.  The number of changes was small.  The formats of the
buffer arguments have all been changed to agree with those in the
Point-to-point Communication chapter.  The material on user_reduce has been
rewritten to provide two variants, one assuming commutativity of the user
operation and one not.

Steve Huss-Lederman asked whether there are ANY collective operations that
are guaranteed to give the same result for repeated runs with the same
initial conditions.  Al Geist replied "Broadcast.  Next question."  A long
discussion ensued.  As in previous discussions of this topic, there was a
wide spectrum of opinions expressed.  Some insisted that reproducibility, at
least in a debug mode, is required.  Others insisted this is a quality of
implementation issue.  Other opinions expressed included that this is
outside the scope of MPI; that an implementation must document the extent to
which reproducibility is available; etc.

In line with the Discussion paragraph on p. 95, it was agreed that the
completely general ALLTOALL is not needed.

Rik Littlefield noted that there is a useful kind of reduce that is missing,
a "scatter-reduce" that does a reduction of sections of an array to an array
of processes.  He will write up a proposal and distribute it to the
collective communication mailing list.

Al Geist noted the change mentioned in the paragraph labeled Missing on
p. 99.  There was some confusion, so Al Geist promised to add an example to
clarify this.

Jim Cownie pointed out that it is important that implementors provide an
implementation guide that specifies which of the collective operations that
may/may not be synchronizing actually are synchronizing.  Adam Greenberg
countered that users must assume that these operations are not synchronizing
and therefore such a document serves no purpose.  Adam Greenberg also
expressed unease with section 5.6.

STRAW VOTE: Should MPI require documentation of the implementation
----------  variations in synchronization properties of collective
            operations?
Yes: 18   No: 2   Abstain: 2

Eric Van de Velde asked why two versions of user_reduce were being provided.
A brief review of the arguments of last time ensued.
VOTE: Approve Chapter 5 (Collective Communications)
----
Yes: 21   No: 0   Abstain: 0

-----------------------------------------------------------------------------
The group adjourned for dinner at 6:10pm
=============================================================================

Thursday, August 12, 1993
-------- ---------------

-------------------------------------------------------------------------------
Report From the Communication Contexts Subcommittee
-------------------------------------------------------------------------------

Anthony Skjellum presented.

Groups, Contexts and Communicators (Chapter 3) - First Reading
--------------------------------------------------------------

Introduction (3.1)
------------------

There was an objection to the use of the terms intra-communication and
inter-communication without definition.  This is editorial and will be
addressed outside the meeting.  There are no requirements in this section,
so no vote was taken.

Context (3.2)
-------------

There was confusion about the term "hypertag".  This too is editorial.
There are no requirements in this section, so no vote was taken.

Groups (3.3)
------------

Predefined Groups (3.3.1)
-------------------------

It was proposed to change the wording describing MPI_GROUP_ALL to be "all
processes at the moment of process creation" and to add another group,
MPI_GROUP_SIBLING, which is "all processes with the same program text."
Don Heller asked about system-defined server processes and the like.  It was
agreed to modify the wording to include these.

Jim Cownie noted that there is a problem with the notion of HOST because the
host would have to have many different versions of MPI_GROUP_HOST.  As an
alternative Jim proposed that the host should be accessible via some
constant or function that would give the rank of the host in MPI_ALL.

After various proposals for additional predefined groups {MPI_GROUP_PEER
"all processes except the host (if there is one)", MPI_GROUP_PARENT "parent
of all children spawned"} it was proposed that this section be revised to
say something like "There are no predefined groups.  The effect of
predefined groups is gotten by using the groups associated with the
predefined communicators."

Adam Greenberg asked that a vote on this section NOT be taken until after
the decision on which communicators are predefined.

Communicators (3.4)
-------------------

Predefined Communicators (3.4.1)
--------------------------------

  MPI_COMM_ALL
  MPI_COMM_SIBLING
  MPI_COMM_PEER
  MPI_COMM_PARENT
  MPI_COMM_SELF

After dealing for a time with the complexity and lack of clarity of this
situation, various alternatives were offered.  The simplest was to have only
MPI_COMM_ALL.

Dan Nessett

Organization count: 22

VOTE: Revise sections 3.3 and 3.4 as follows:
----
  (1) There are no predefined groups.
  (2) The only predefined communicators are MPI_COMM_ALL and MPI_COMM_PEER.
  (3) There is a predefined MPI_HOST_RANK which gives the rank of the host
      in the ALL group.  It is MPI_UNDEFINED if there is no host.
Yes: 17   No: 2   Abstain: 3

VOTE: Approve sections 3.3 and 3.4 as amended.
----
Yes: 18   No: 0   Abstain: 3

Group Management (3.5)
----------------------

Local Operations (3.5.1)
------------------------

There was some discussion of how MPI manages the coordination between
various groups (always by relation to the ALL group), of which of the
inquiry functions in this section properly belong in the environmental
sections (none), and general confusion about what the various functions do.
No changes were made.

VOTE: Approve section 3.5.1 (Local Operations)?
----
Yes: 19   No: 0   Abstain: 3

Local Group Constructors (3.5.2)
--------------------------------

Most of the discussion had to do with clarification and editorial
corrections.

Eric Van de Velde asked for a function to reorder the ranks in a group.
After some discussion as to exactly what is needed and why, it was noted
that MPI_LOCAL_SUBGROUP provides the desired function.  It was agreed to add
a remark to this effect.

It was noted that all of the functions in section 3.5 need descriptions
rather than just names and parameters.

Marc Snir asked for a clarifying note that of the functions in this section
only MPI_LOCAL_SUBGROUP and MPI_LOCAL_SUBGROUP_RANGES can change the ranks.
This has as a side effect that the ranges must not overlap.  This was
agreed.

Lyndon Clarke asked for a clarification that the effect of
MPI_LOCAL_SUBGROUP_RANGES is as though the ranges were expanded to a list of
ranks and MPI_LOCAL_SUBGROUP were called with these ranks.  There should be
a similar statement for the relation between MPI_LOCAL_SUBGROUP_EXCL_RANGES
and MPI_LOCAL_EXCL_SUBGROUP.  This was agreed as well.

VOTE: Approve section 3.5.2 (Local Group Constructors) as clarified?
----
Yes: 17   No: 0   Abstain: 5

Collective Group Constructors (3.5.3)
-------------------------------------

The phrase "a stable sort is used to determine rank order" on line 23 of
p. 18 will be changed to say that in the event of ties the rank in the comm
group will be used to determine the rank in new_group.

After discussion of the meaning of MPI_COLL_SUBGROUP it was proposed to have
instead:

  MPI_COLL_SUBCOMM(comm, key, color, new_comm)

which will then appear in section 3.7.3.  The effect on section 3.5.3 is
that it would simply say that there are no collective group constructors.

VOTE: Approve section 3.5.3 (Collective Group Constructors) as amended?
----
Yes: 18   No: 0   Abstain: 3

--- break 10:30 - 11 ---

Sections 3.6 and 3.7 (pp. 18-21) were handled by giving a
function-by-function discussion followed by an overview of a "tuning"
proposal by Marc Snir.

Operations on Contexts (3.6)
----------------------------

Local Operations (3.6.1)
------------------------

Collective Operations (3.6.2)
-----------------------------

In MPI_CONTEXTS_ALLOC, the len parameter is removed.  The "void *" in the
descriptions is removed and better words will be provided.

The idea of quiescence that was prominent in the context proposal at the
last meeting has largely disappeared.  The manner of dealing with the
problem that quiescence was designed to solve is discussed at length on
p. 19.

A discussion of the relation of point-to-point and collective communication
was prompted by a dispute between Marc Snir and Jim Cownie.  Jim made the
point that the collective communication routines can be written using the
point-to-point communication routines and the material in the context
chapter.  It was noted that there is one collective communication routine in
the context chapter - MPI_CONTEXTS_ALLOC - and some system magic must ensure
that this works correctly.  Marc Snir noted that the system can provide
similar magic throughout for optimization purposes.

Operations on Communicators (3.7)
---------------------------------

Local Communicator Operations (3.7.1)
-------------------------------------

There were no issues in this section.

Local Constructors (3.7.2)
--------------------------

There is an MPI_COMM_BIND function missing.  It was accidentally deleted and
will be added.
The form is:

  MPI_COMM_BIND(group, context, new_comm)
    IN  group
    IN  context
    OUT new_comm

The details will be provided in the next draft.

The name of the function MPI_COMM_UNBIND will be changed to MPI_COMM_FREE
(and the function of this name in the point-to-point chapter will be
renamed).  The frequent reference to MPI_COMM_GROUP(comm) will be changed to
MPI_COMM_GROUP(comm, group).

Collective Communicator Constructors (3.7.3)
--------------------------------------------

The one collective operation for communicators is MPI_COMM_MAKE.

Adam Greenberg noted that as currently written every member of the group
associated with sync_comm gets comm_new, which has ...

At this point Marc Snir presented the following proposal.

An out of context proposal
  - The only use of context is for local creation of communicators.
  - The result can be achieved without an explicit context object (some loss
    of safety).
  - Either case needs rules for coordinated context allocation.

Communication context
  - specified by communicator
  - can be "preallocated" and then locally bound to a communicator.

MPI_CONTEXTS_ALLOC(comm, n) - preallocates n contexts and "caches" them with
  comm.  {This can be called repeatedly and adds the number of contexts
  specified on each call.  This is a collective operation.}

MPI_CONTEXTS_FREE(comm, n) - releases up to n preallocated contexts.  {This
  can be called repeatedly.  It is a local operation.}

MPI_COMM_CONTEXT(comm, n) - queries the number of available preallocated
  contexts.  {This is a local operation.}

MPI_COMM_DUP(comm, new_comm) - duplicates a communicator.  Uses a locally
  cached context if available, otherwise this is a collective operation.
  {It is erroneous if some but not all have a locally cached context
  available.  Note that new_comm does NOT have any cached context.}

MPI_COMM_LDUP(comm, new_comm) - duplicates a communicator.  Uses a locally
  cached context and returns NULL if none is available.

MPI_COMM_MAKE(sync_comm, comm_group, comm_new)
MPI_COMM_LMAKE(sync_comm, comm_group, comm_new) - both of these create a new
  communicator associated with comm_group, which is a subgroup of the group
  of sync_comm.  This must be called by all members of the group of
  sync_comm.

Some unease was expressed about operations being sometimes collective and
sometimes not.  Safety?  In response Marc noted that there could also be
MPI_COMM_GDUP which would always do a collective operation.

Correctness rule
----------------

All processes in a comm group must execute the same sequence of calls to
MPI_CONTEXTS_ALLOC, MPI_CONTEXTS_FREE, MPI_COMM_xDUP, MPI_COMM_xMAKE with
comm as argument.
  - simple to state
  - same as for collective communication
  - too conservative?

Note: This is compatible with the existence of static preallocated contexts.

At this point Lyndon Clarke, having been waving his hand in the air for
several minutes, stood on his chair to try and get Marc to address his
question.  After a brief discussion between Adam, Marc and Tony, Lyndon was
allowed to speak.

Lyndon Clarke noted that there is no way for the system to check for proper
usage of arguments, so this offers no additional security compared with
earlier proposals.  Others noted that this did provide some additional
safety, but it is hard to make a direct comparison.

Steve Huss asked if anyone on the context subcommittee wanted to keep the
current material.  Tony said no, but that would likely not be true if Mark
Sears were here.

STRAW VOTE: Make this into a full proposal to replace the current 3.7.2 &
----------  3.7.3?
Yes: 25   No: 0   Abstain: 2
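As a way of seeing how the proposal would be used, here is a minimal sketch.
The function names are those used in the presentation above; the C argument
types, variable names, and error behaviour are assumptions:

    /* Collective over comm's group: every process preallocates two
       contexts and caches them with comm.                                */
    MPI_CONTEXTS_ALLOC(comm, 2);

    /* Later, e.g. inside a library routine: a purely local duplication
       that consumes one cached context, so no communication is needed.
       (Per the slide, MPI_COMM_LDUP returns NULL in lib_comm if no cached
       context is available.)                                             */
    MPI_COMM_LDUP(comm, &lib_comm);

    /* ... the library communicates safely within lib_comm ... */

    MPI_COMM_FREE(&lib_comm);

    /* Local: release one of the remaining preallocated contexts.  Under
       the correctness rule above, all processes in comm's group must make
       the same sequence of these calls on comm.                          */
    MPI_CONTEXTS_FREE(comm, 1);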
-- break for lunch 12:10 - 1:40 --

Cacheing (3.9)
--------------

Rik Littlefield presented this material.

Attribute Cacheing

Function: Safely attach arbitrary information to groups (and communicators).

Purpose: Allow modules to retain or exchange group-specific information
WITHOUT complicated calling sequences or correctness rules for use of the
module.

Examples

Basic Capabilities:
  - Attributes are local.
  - An attribute value can be a pointer to an arbitrary structure.
  - Attributes are referenced by a key value obtained from MPI.
  - Attributes can be defined and retrieved.
  - A destructor routine is called when the group (communicator) is freed.
  - A propagation routine is called when the group (communicator) is
    duplicated.

Functions:

  MPI_GET_ATTRIBUTE_KEY(OUT keyval)

  MPI_GROUP_ATTR(IN group, OUT attr_set_handle)

  MPI_COMM_ATTR(IN comm, OUT attr_set_handle)

  MPI_SET_ATTRIBUTE(IN attr_set_handle, IN keyval, IN attribute_value,
                    IN *attribute_destructor_routine,
                    IN *attribute_propagation_routine)

  MPI_TEST_ATTRIBUTE(IN attr_set_handle, IN keyval, OUT attribute_value,
                     OUT result_status)

  attribute_destructor_routine(IN attribute_value)

  attribute_propagation_routine(IN attribute_value, ......)

This list is an updated and organized variant of the text.  In particular
the two routines, MPI_ATTRIBUTE_ALLOC and MPI_ATTRIBUTE_FREE, have been
eliminated.

What is the rationale for these functions altogether?  They provide a method
for managing resources associated with groups and communicators.  For
example, this provides facilities to implement the topology facilities on
top of MPI.  Rik observed that this allows one to effectively extend MPI,
e.g. to provide a user-written collective operation that can be safely used
with MPI and which looks like an MPI routine.

Marc Snir asked for a routine to change the value of attributes without
having to provide the destructor and propagation routines.  There was a
question whether this introduced a degree of insecurity.  Jim Cownie noted
that one might well want an attribute with null destructor and propagation
routines.  Such a reset routine will be provided in the next draft of this
chapter.

Do we need attributes for both groups and communicators?  Why not just on
groups?  This would allow elimination of attribute handles.  There do seem
to be situations where it is needed on communicators, not just on groups.

Adam noted that this proposal puts a resource burden on the system, so he
asked about the possibility of providing only a single system slot with the
remainder of the storage provided by the user.  Adam is concerned about the
admixture of resources at both user and system level.  {{{I'm confused}}}

Tony proposed adding MPI_ATTRIBUTE_ALLOC and MPI_ATTRIBUTE_FREE back.  It
was claimed that providing attribute allocate and free routines together
with a callback mechanism associated with the free is sufficient to provide
all of the functions in section 3.9.  Various people countered that this
would introduce new problems of coordination and safety.  In particular each
library might have independent attribute mechanisms and this would require
using multiple callbacks on each call of free.  It was noted that this is
very similar to the problem that was solved in X by using a register of
callback functions.  That can provide a model for this group to use.

STRAW VOTE: Do we want a cacheing mechanism?
----------
Yes: 14   No: 4   Abstain: 7

How would topology need this?  Steve Huss, as virtual Rolf Hempel, noted
that topologies need a variety of information (e.g. dimensions,
periodicities) that needs to be associated with groups (and communicators?).
As topologies are a part of MPI, a general cacheing mechanism is not
required.  But without it there are likely to be conflicts between topology
and user-written libraries.

VOTE: Approve section 3.9 (Cacheing) as amended in the presentation?
----
Yes: 8   No: 7   Abstain: 6
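For concreteness, here is a sketch of how a library module might use the
attribute functions presented above to cache its own per-communicator state.
The helper routines (create_my_state, my_state_destructor,
my_state_propagator), the type name MPI_COMMUNICATOR, and the C argument
types are hypothetical; the calling sequence follows the slide, not a
settled binding:

    static int my_key = -1;            /* key value shared by the module   */

    void my_lib_setup(MPI_COMMUNICATOR comm)
    {
        void *attr_set;
        void *state = create_my_state(comm);    /* module's private data   */

        if (my_key < 0)
            MPI_GET_ATTRIBUTE_KEY(&my_key);      /* obtain a key from MPI   */

        MPI_COMM_ATTR(comm, &attr_set);          /* attribute set of comm   */
        MPI_SET_ATTRIBUTE(attr_set, my_key, state,
                          my_state_destructor,   /* run when comm is freed  */
                          my_state_propagator);  /* run when comm is dup'ed */
    }

    void my_lib_op(MPI_COMMUNICATOR comm)
    {
        void *attr_set, *state;
        int   found;

        MPI_COMM_ATTR(comm, &attr_set);
        MPI_TEST_ATTRIBUTE(attr_set, my_key, &state, &found);
        /* ... if found, use the cached state ... */
    }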
--------
break 2:45 - 3:10

Adam Greenberg asked about having a User's Guide meeting tonight.  There are
enough interested to have a meeting here after dinner.
--------

Introduction to Inter-Communication (3.8)
-----------------------------------------

Tom Henderson presented on Inter-communication.

      ( A ) <=> ( B ) <=> ( C )
        /         /
     Arank     Brank

  Arank: ----- send(..., Brank, tag, commAB)
  Brank: ----- recv(..., Arank, tag, commAB, ...)

We want to be able to send from a process in one group to a process in
another group using the rank in the target group.

ALTERNATIVES
  - The local group has access to the remote group, and there is a rank
    translation in some common ancestor.
  - The user maintains tables and communicators for group-pairs.

{{{SLIDES HANDED OUT}}}

STRAW VOTE: Hear the full proposal?
----------
Yes: 25   No: 1   Abstain: 4

Basic concepts
  Local Group            Remote Group
  local group leader     remote group leader
  Peer-group that contains both group leaders.  {Not to be confused with
  MPI_COMM_PEER introduced earlier today.}

- All members of both groups must call MPI_COMM_PEER_MAKE.

- What is the reason for the tag?  It serves as the identifier for this
  particular inter-communicator.

- Joel Williamson asked why do this rather than just working in MPI_ALL?
  So that one can use names that are convenient for the problem at hand.
  This is also suitable for generalization to a dynamic situation rather
  than the static situation that is now in MPI.

- Why does everyone call PEER_MAKE?  To get a common communicator.

- Discussion slide: Add an IN argument, my_leader_rank, to
  MPI_COMM_PEER_MAKE().  This allows later addition of dynamic process
  creation.  The IN argument peer_comm need only be valid in the local group
  leader.  Only the group leaders need to be members of peer_comm.

- LOOSELY-SYNCHRONOUS INTER-COMMUNICATOR CONSTRUCTOR
- SYNCHRONIZATION PROPERTIES OF MPI_COMM_PEER_MAKE_START() AND
  MPI_COMM_PEER_MAKE_FINISH()
- COMMUNICATOR STATUS (convenience function - make go away)
- Synchronization issue: Can a process using an inter-communicator send a
  message using that inter-communicator as soon as it has the
  inter-communicator?  Something needs to be said.
- INTER-COMMUNICATOR SUPPORT
- EXAMPLE 1
- "UNDER THE HOOD"
- INTRA-COMMUNICATION / INTER-COMMUNICATION
- POSSIBLE IMPLEMENTATION OF MPI_COMM_PEER_MAKE()
- POSSIBLE IMPLEMENTATION OF MPI_COMM_PEER_MAKE_START()
- POSSIBLE IMPLEMENTATION OF MPI_COMM_PEER_MAKE_FINISH()

- What is the comparison with using the common ancestor approach?  To do
  this one would create a union group.

STRAW VOTE: Have inter-communication?
----------
Yes: 14   No: 3   Abstain: 9

VOTE: Approve section 3.8 (Introduction to Inter-Communication) as amended
----  but minus the name service?
Yes: 8   No: 2   Abstain: 10

- ?
- EXAMPLE 2
- EXAMPLE 3
- EXAMPLE 4
- Al Geist noted all of the examples should be expanded to show at least one
  message being sent!
- POSSIBLE IMPLEMENTATION OF MPI_COMM_NAME_MAKE()
- POSSIBLE IMPLEMENTATION OF MPI_COMM_NAME_MAKE_START()
- POSSIBLE IMPLEMENTATION OF MPI_COMM_NAME_MAKE_FINISH()

VOTE: Approve the name service material in section 3.8 (as amended)?
----
Yes: 8   No: 1   Abstain: 10

-------------------------------------------------------------------------------
Report From the Process Topology Subcommittee
-------------------------------------------------------------------------------

Process Topology - Second Reading
---------------------------------

Steve Huss presented.
{He put on a pair of Birkenstocks to emphasize that he was acting as a
virtual Rolf Hempel.}

A couple of proposals that were made verbally at the last meeting were not
written up and so have not been included.

Introduction (6.1)
------------------

VOTE: Approve section 6.1 (Introduction)
----
Yes: 13   No: 0   Abstain: 5

Virtual Topologies (6.2)
------------------------

Terminology - change the name to "process topologies" (Eric Van de Velde) or
"application topologies" (David Walker).

VOTE: Approve section 6.2 (Virtual Topologies)?
----
Yes: 10   No: 0   Abstain: 6

Embedding in MPI (6.3)
----------------------

There were various editorial remarks which Steve recorded to relay to Rolf
Hempel.

VOTE: Approve section 6.3 (Embedding in MPI)?
----
Yes: 9   No: 0   Abstain: 8

Overview of the proposed MPI functions (6.4)
--------------------------------------------

The initial part of 6.4 (before 6.4.1) will go away.

VOTE: Specify that we use row major order?
----
Yes: 4   No: 0   Abstain: 13

Steve Huss pointed out that the extent to which these functions are global
functions is not specified.  Lyndon Clarke offered the amendment that they
be specified as collective.

There was a discussion of the group parameters in these functions.  Steve
agreed to propose to Rolf that they be systematically replaced by
communicators.

VOTE: Replace all group parameters throughout by communicators?
----
Yes: 8   No: 1   Abstain: 7

VOTE: MPI_MAP_CART and MPI_MAP_GRAPH shall be global routines?
----
Yes: 4   No: 2   Abstain: 12

The large number of abstentions in the recent votes led to a discussion of
the value of topology and also to a discussion of our procedures.  There was
no strong interest in discussing whether to include topology.  Neither was
there any strong interest in changing procedures.

VOTE: Postpone the second reading of the topology chapter until the next
----  meeting?
Yes: 3   No: 12   Abstain: 2

VOTE: Approve chapter 6 as amended?
----
Yes: 12   No: 3   Abstain: 2

-------------------------------------------------------------------------------
Report from the Subset Subcommittee
-------------------------------------------------------------------------------

Steve Huss presided.  This was NOT a second reading.

Jim Cownie argued that the profiling material should be included in the
subset because it provides important facilities and the cost of providing it
in an initial implementation is not large.  Several people agreed with this,
so there was a quick vote.

STRAW VOTE: Include profiling in the subset?
----------
Yes: 24   No: 0   Abstain: 1

A discussion of the parts of environmental management and inquiry to be
included led to an agreement that this should be deferred until the
presentation on that material.

There was nothing to be said about language binding - there will be F77 and
C bindings.

STRAW VOTE: Exclude topology from the subset?
----------
Yes: 21   No: 2   Abstain: 3

It was agreed that the list for collective communication in the document is
OK.

In considering the point-to-point functions, the list in the document
includes MPI_SENDRECV but excludes MPI_EXCHANGE.  It was generally agreed
that this is sensible.

Steve Otto argued for including hvec-type functions in the subset because of
common usage.  In considering this and the issue of derived datatypes,
several possibilities were considered.  The one that got general support is
to include all of the material in section 4.13.

It was noted that data conversion is not a subset issue - heterogeneous
systems have to have it; homogeneous systems do not need it.
--- break for dinner at 6:05 ---

Subcommittee meetings - starting about 8:30
  Subset     - immediately after dinner
  User Guide - after subset meeting
  Context    - after subset meeting

=============================================================================

Friday, August 13, 1993
------ --------- ----

-------------------------------------------------------------------------------
Report from the Environmental Management Subcommittee
-------------------------------------------------------------------------------

Rusty Lusk presided.

Rusty began by handing out a new version:

  Environmental Management and Inquiry
    1  Initialization
    2  Environmental query
    3  Others

Initialization and Termination

Current draft:
  MPI_INIT()      "idempotent"
  MPI_FINALIZE()  "last MPI call"

Discussion:
  How does a library know whether to call MPI_FINALIZE?
  Is MPI_FINALIZE optional?
  MPI_INIT requires some state not attached to any object; why not a
  communicator?

A proposed amendment:
  MPI_INIT(old_comm, new_comm)          {if old_comm is null, gets the first
                                         communicator}
  MPI_FINALIZE(current_comm, old_comm)
  Nests MPI invocations
  Attaches MPI state to a communicator

STRAW VOTE: We shall provide a mechanism that allows a library written using
----------  MPI to be called from either within or without MPI?
Yes: 11   No: 8   Abstain: 3

Steve Huss asked what happens if two libraries are invoked using different
numbers of processors - then what is the ALL group?

Jim Cownie offered a very simple proposal - all processes must call MPI_INIT
at the start and all processes must call MPI_FINALIZE at the end.  Note that
the picture is that by a vendor-provided miracle the MPI system is started,
and only after this can MPI_INIT be called.  This is likely to be a global
barrier.

STRAW VOTE: MPI_INIT and MPI_FINALIZE must be called exactly once in each
----------  process.
Yes: 18   No: 0   Abstain: 5

Organization count: 20

Lots of continuing discussion.  It was agreed that, in the context of the
straw vote, any program that violates the requirement is erroneous.

VOTE: Have an MPI_INITIALIZED flag?
----
Yes: 16   No: 1   Abstain: 2

VOTE: MPI_INIT and MPI_FINALIZE must be called exactly once in each process.
----  MPI_INIT is a global operation.  It must be called before any other
      MPI routine.  MPI_FINALIZE is the last MPI call.
Yes: 16   No: 1   Abstain: 3

Rusty offered a proposal for MPI_ABORT(error_code), which terminates every
process in the ALL group.

VOTE: Have MPI_ABORT?
----
Yes: 13   No: 3   Abstain: 2

MPI-Specific (1.1)  [Section numbers from chapter handed out at meeting]
------------------

Why are there communicator arguments in MPI_NumCommunicator?  Rusty did not
know.  No one had a convincing argument.

VOTE: Remove communicator arguments from MPI_ValidTags and
----  MPI_NumCommunicator?
Yes: 14   No: 0   Abstain: 5

There was a fair amount of confusion about the intent and value of the
MPI_BufferParams routine.  In particular, various alternative proposals were
mentioned.  Rik Littlefield has proposed that the user be able to specify
buffering capability.

STRAW VOTE: Should there be some way of asking the system about buffering?
----------
Yes: 15   No: 2   Abstain: 7

STRAW VOTE: Should there be some way of telling the system about buffer
----------  requirements?
Yes: 6   No: 3   Abstain: 15

Rik will provide a proposal at the next meeting.

VOTE: Remove MPI_IOmode?
----
Yes: 18   No: 1   Abstain: 2

Discussion of MPI_Errormode?
There was again uncertainty about the communicator argument, leading to:

VOTE: Remove the communicator argument from MPI_Errormode?
----
Yes: 7   No: 4   Abstain: 8

After further discussion it was realized that while something of this sort
is desirable, there is much more detail (e.g. how error handlers are
established) that is essential before accepting this function.

STRAW VOTE: Should there be a facility to set and query the error mode?
----------
Yes: 18   No: 0   Abstain: 0

There was quick agreement that MPI_Has_Nonblocking and MPI_Has_Heterogeneous
are not useful.

VOTE: Remove MPI_Has_Nonblocking and MPI_Has_Heterogeneous?
----
Yes: 9   No: 0   Abstain: 4

STRAW VOTE: Have functions to inspect the receive queue and other
----------  interesting internal structures?
Yes: 12   No: 3   Abstain: 2

Anne Elster again asked for MPI_LOAD_INFO.  She was proposing a modified
version that had less time-specific information, but no written proposal was
available at the meeting.  A concrete proposal will be seen at the next
meeting.

VOTE: Remove sections 1.2 (Parallel programming) and 1.3 (non-MPI) except
----  keep for MPI_Ge
Yes: 8   No: 3   Abstain: 2

VOTE: Accept the MPI Environmental Management chapter as amended.
----
Yes: 8   No: 2   Abstain: 3

------------------------------------------------------------------------------
MPI Sound Bites
Jim Cownie

  Where's David?
  Oh No!
  Don't think about that one too much
  What's the question again?
  Those in favor of going to the bar?
  Shall we accept the chapter as eviscerated?
------------------------------------------------------------------------------

-------------------------------------------------------------------------------
Report from the Language Binding Subcommittee
-------------------------------------------------------------------------------

Rusty Lusk presided.

Language Binding

7.1-7.4 will go into another chapter (MPI Terms and Conventions).
7.5 will go into an Appendix.

We need to:
  1. Update and read 7.1-7.4.
  2. Decide on principles for binding presentation.
  3. Decide on the format of definitions in the chapters.
  4. Decide on the format and order of definitions in the appendix.
  5. Choose a procedure for agreeing on names of C functions, Fortran
     subroutines, named constants, and types.
  6. Enforce consistency.

1. Modifications to 7.1-7.4 (see draft)

2. Presenting the bindings
   a. Named constants (MPI_SUCCESS)
   b. Named types (MPI_COMMUNICATOR)
   c. Functions and arguments
      i.  C - use ANSI C style
      ii. Fortran - use prototypes and declarations
   d. Consistency of formal argument names
   e. IN arguments before OUT arguments
   f. Return code is the last argument in FORTRAN
   g. Others?

3. Is the current 7.5 OK modulo name updating?

Jim Cownie argued that the principle that "all C functions should return an
error code" should be relaxed for those functions that would best be
implemented using macros.

VOTE: Accept the chapter and annex with the modifications outlined by Rusty.
----
Yes: 12   No: 0   Abstain: 1

Format

In the chapters:
  Current Format     <- match C binding in order and number of arguments
  + Fortran Binding  <- match appendix
  + C Binding        <- match appendix

In the appendix:
  Sort by appearance order?
  Alphabetically within chapter?   <-- this one chosen
  Alphabetically?
  Keep the appendix after using it for a consistency check?  (Note the
  alphabetical index.)

As a technical issue, Steve Otto would like to have the bindings appear in
the chapter source, but only appear printed in the appendices.

There was general agreement that in the appendix the functions should appear
alphabetically within each chapter.

------------------------------------------------------------------------------
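A minimal illustration of the presentation conventions above, applied to a
hypothetical routine.  MPI_EXAMPLE_RANK is not a proposed MPI function; the
constant MPI_SUCCESS and the type name MPI_COMMUNICATOR are taken from items
2a and 2b:

    /* C binding: ANSI C prototype, IN arguments before OUT arguments,
       error code (e.g. MPI_SUCCESS) returned as the function value.      */
    int MPI_Example_rank(MPI_COMMUNICATOR comm, int *rank);

    /* Fortran binding: the same arguments in the same order, with the
       return code added as the last argument.

           MPI_EXAMPLE_RANK(COMM, RANK, IERROR)
           INTEGER COMM, RANK, IERROR                                     */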