Comparison of Different Methods for Performing Sequence Alignment
Location
CSU 255
Start Date
13-4-2004 10:30 AM
End Date
13-4-2004 12:15 PM
Student's Major
Computer Information Science
Student's College
Science, Engineering and Technology
Mentor's Name
Christophe Veltsos
Mentor's Department
Computer Information Science
Mentor's College
Science, Engineering and Technology
Second Mentor's Name
Timothy Secott
Second Mentor's Department
Biological Sciences
Second Mentor's College
Science, Engineering and Technology
Description
The fact that biological sequences can be represented as strings over a finite alphabet (A, C, G, and T for DNA) plays an important role in connecting biology to computer science, because it allows string comparison techniques from computer science to be applied to biological data. As a result, a number of applications have been developed to facilitate sequence alignment, the problem of finding the best match between two biological sequences. A best match that displays high sequence similarity may indicate an evolutionary relationship and functional similarity. However, there has been little research on how reliable and efficient these applications are, especially when comparing two sequences that might not be highly similar but could share small yet biologically significant patterns. This study compares three biological sequence comparison packages, namely WuBlast2, Fasta3, and MPsrch, which implement the Blast, FastA, and Smith-Waterman algorithms, respectively. To do so, a framework was developed to facilitate data collection and generate meaningful reports. Amino acid sequences corresponding to related proteins, as well as the DNA sequences encoding these proteins, were analyzed using equivalent parameters in each application. Initial observations show that the matches produced by the three applications vary increasingly as sequence similarity decreases. In addition, the time required to perform the search showed a pattern of exponential growth as sequence complexity increased.
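For readers unfamiliar with the Smith-Waterman algorithm implemented by MPsrch, the following is a minimal Python sketch of its dynamic-programming recurrence for local alignment. It uses a simple scoring scheme chosen only for illustration: match +2, mismatch -1, and a linear gap penalty of -1. These values are assumptions and not the parameters used in this study. The function name smith_waterman is hypothetical. Protein search tools typically score with a substitution matrix such as BLOSUM62 and use affine gap penalties instead.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best local score, aligned part of a, aligned part of b).

    Illustrative sketch only: the scoring values are assumed, not taken from the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(x: str, y: str) -> int:
        # Score for aligning two residues (no substitution matrix in this sketch)
        return match if x == y else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            # The 0 floor lets a local alignment restart anywhere
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("GATTACA", "TTAC"))  # (8, 'TTAC', 'TTAC')
    print(smith_waterman("ACGT", "AGT"))      # (5, 'ACGT', 'A-GT')
```

In the second example, the best local alignment matches A, G, and T and pays one gap penalty for C. The nested loops fill an (m+1) by (n+1) table, so the work for a single pairwise comparison grows with the product of the two sequence lengths.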