Abstract
Missing data are frequently encountered in molecular phylogenetics and need to be imputed. For a distance matrix with missing distances, the least-squares approach is often used for imputing the missing values. Here I develop a method, similar to the expectation-maximization algorithm, to impute multiple missing distances in a distance matrix. I show that, for inferring the best tree and missing distances, the minimum evolution criterion is not as desirable as the least-squares criterion. I also discuss cases where the missing values cannot be uniquely determined, e.g., when a missing distance involves two sister taxa. The new method has an advantage over the existing one in that it does not assume a molecular clock. I have implemented the method in the DAMBE software, which is freely available at http://dambe.bio.uottawa.ca.