The objective of this paper is to discuss hypotheses (H1) to (H6) of Ohta et al. and to reformulate their results using a more accurate model of current architectures. Their study assumes that a processor cannot simultaneously communicate the boundary data items of the tile it has just computed and perform the computations for the next tile. However, overlapping computations and communications is a facility provided by all distributed memory computers, so we relax this restriction. This simple modification has a tremendous effect on the determination of the best tile size.