Recent developments in the MAFFT multiple sequence alignment program

K Katoh, H Toh - Briefings in bioinformatics, 2008 - academic.oup.com
Abstract
The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been, and remain, important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software now faces greater challenges in both scalability and accuracy than it did 5 years ago. As large-scale sequencing projects generate increasing amounts of sequence data, scalability is now critical in many situations. The requirement for accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs): secondary structure should be considered when constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems, in 2007 we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment, and the latter improved the accuracy of ncRNA alignment. We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.
Oxford University Press