Comparison of Different Methods for Performing Sequence Alignment

Location

CSU 255

Start Date

13-4-2004 10:30 AM

End Date

13-4-2004 12:15 PM

Student's Major

Computer Information Science

Student's College

Science, Engineering and Technology

Mentor's Name

Christophe Veltsos

Mentor's Department

Computer Information Science

Mentor's College

Science, Engineering and Technology

Second Mentor's Name

Timothy Secott

Second Mentor's Department

Biological Sciences

Second Mentor's College

Science, Engineering and Technology

Description

Biological sequences can be represented as strings over a finite alphabet (A, C, G, and T for DNA), and this representation plays an important role in connecting biology to computer science: it allows the string comparison techniques of computer science to be applied to biological data. As a result, various applications have been developed to facilitate sequence alignment, the problem of finding the best match between two biological sequences. A best match, one that displays high sequence similarity, potentially hints at an evolutionary relationship and functional similarity. However, there is little research on how reliable and efficient these applications are, especially when comparing two sequences that are not highly similar but may share patterns that are small yet biologically significant. This study compares three biological sequence comparison packages, WuBlast2, Fasta3, and MPsrch, which implement the Blast, FastA, and Smith-Waterman algorithms, respectively. To do so, a framework was developed to facilitate data collection and to create meaningful reports. Amino acid sequences corresponding to related proteins, as well as the DNA sequences encoding these proteins, were analyzed with matching parameters on each application. Initial observations show that the variation between the matches produced by the three applications increases as sequence similarity decreases. In addition, the time required to perform the search showed a pattern of exponential growth as the complexity of the sequence increased.
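To illustrate the kind of string comparison the abstract refers to, the following is a minimal sketch of Smith-Waterman local alignment scoring in Python. It is not the MPsrch implementation or the framework used in the study; the function name and the scoring values (match = 2, mismatch = -1, linear gap = -1) are assumptions chosen only for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not those of any package
    named in the abstract. Only two rows of the table are kept in memory.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row (prefix of a)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two short DNA strings that share the small common pattern "ACG".
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

This sketch fills one table cell per pair of positions, so its work grows with len(a) * len(b). It reports only the score, not the aligned region itself.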

This document is currently not available here.
