Most of the local computation is performed by the BLAS, and the communication is handled by the BLACS; the PBLAS itself is responsible only for organizing the distributed computation. A typical PBLAS subroutine locally checks the coherence and validity of its input arguments, translates these global parameters into their local equivalents, and performs the basic operations using an optimally shaped adaptive procedure. Most PBLAS routines currently assume that the data are aligned, and the alignment restrictions differ from routine to routine. For instance, some routines require that two matrices start in the same process row or column, while others require only that the block sizes be the same. In the next version of the PBLAS, some of these restrictions have been removed; the remaining ones will be reevaluated in light of user feedback.
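
The translation from global to local parameters and the two kinds of alignment restriction mentioned above can be illustrated with a minimal sketch. It assumes a one-dimensional block-cyclic distribution with 0-based indices, and it is not the PBLAS implementation: the function names (`owner_process`, `local_index`, `same_start_process`, `same_block_size`) and their parameters are hypothetical, chosen only for this example. The arithmetic is similar in spirit to the ScaLAPACK tool routines INDXG2P and INDXG2L, which use 1-based indices.

```python
def owner_process(g, nb, src, nprocs):
    """Coordinate of the process that owns 0-based global index g.

    nb     -- block size of the distribution
    src    -- coordinate of the process holding the first block
    nprocs -- number of processes along this dimension of the grid
    """
    return (src + g // nb) % nprocs


def local_index(g, nb, nprocs):
    """0-based local index of global index g on its owning process."""
    full_cycles = g // (nb * nprocs)   # complete rounds over all processes
    return full_cycles * nb + g % nb   # earlier local blocks + offset in block


def same_start_process(ia, ib, nb, src, nprocs):
    """Hypothetical check: do two operands start in the same process row
    (or column), given that they share nb, src and nprocs?"""
    return owner_process(ia, nb, src, nprocs) == owner_process(ib, nb, src, nprocs)


def same_block_size(nb_a, nb_b):
    """Hypothetical check for the weaker restriction: equal block sizes only."""
    return nb_a == nb_b
```

With a block size of 2 on 3 processes, for example, global index 7 lies in the fourth block, which wraps around to process 0 and is the second block stored there, so its local index is 3. Which of the two checks applies, and to which dimension, depends on the routine; the sketch only shows the shape such a test might take.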