Download English ISI article No. 137900
Article title
A new distributed alignment-free approach to compare whole proteomes
Article code: 137900
Publication year: 2017
Length of English article: 21 pages (PDF)
Source

Publisher: Elsevier - ScienceDirect

Journal: Theoretical Computer Science, Volume 698, 25 October 2017, Pages 100-112

Keywords
Alignment free distances; Average common substring; Mismatches; Distributed systems; Bioinformatics;

Abstract

Phylogeny inference has moved in recent years from the analysis of a single or few proteins to that of whole proteomes. However, reconstructing evolutionary trees for a large number of species from complete proteomes poses a significant computational challenge, even when relatively fast pairwise sequence comparison algorithms are used. We present a distributed approach that relies on the computation of distance measures based on maximal shared substrings within a bounded Hamming distance. The distributed system we built to implement this approach is flexible in that it supports a variety of design choices. It is based on the Spark framework and covers all the steps required by our approach, from the initial indexing of a set of FASTA sequences to the production of a report detailing the distances among these sequences, ranked according to a user-defined measure. Here we apply it to compare all proteins of selected organisms: we divide the proteins into groups and perform the comparisons within each group separately. The groups are the functionally characterized proteins, the ribosomal proteins, and the unannotated proteins. We compute the average distances within the groups and evaluate their relationship and their ability to capture the evolutionary closeness of organisms. We run experiments on selected species using a Hadoop computing cluster running Spark. The results show that the system implementing our approach is scalable and accurate.
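
To make the idea of a distance based on "maximal shared substrings within a bounded Hamming distance" concrete, the following is a minimal single-machine sketch in Python. It is not the authors' Spark implementation. The function names are hypothetical, the search is a naive quadratic scan, and the normalization follows the average common substring (ACS) formulation commonly used in the alignment-free literature, which is an assumption here because the abstract does not state the exact measure.

```python
import math


def longest_match_with_mismatches(a, i, b, k):
    """Length of the longest prefix of a[i:] that matches a substring of b
    starting at some position j with at most k mismatches (Hamming distance)."""
    best = 0
    for j in range(len(b)):
        mismatches = 0
        length = 0
        while i + length < len(a) and j + length < len(b):
            if a[i + length] != b[j + length]:
                if mismatches == k:
                    break  # mismatch budget exhausted, stop extending
                mismatches += 1
            length += 1
        best = max(best, length)
    return best


def average_match_length(a, b, k):
    """Average, over all start positions in a, of the longest k-mismatch match in b."""
    total = sum(longest_match_with_mismatches(a, i, b, k) for i in range(len(a)))
    return total / len(a)


def acs_distance(a, b, k):
    """Symmetric ACS-style distance (assumed normalization, see lead-in)."""
    def directed(x, y):
        avg = average_match_length(x, y, k)
        if avg == 0:
            return math.inf  # no shared characters at all
        # Long shared substrings give a small first term; the second term
        # corrects for the length of x.
        return math.log(len(y)) / avg - 2 * math.log(len(x)) / len(x)

    return (directed(a, b) + directed(b, a)) / 2
```

In a distributed setting, each pair of sequences (or each block of pairs) could be evaluated independently with a function like `acs_distance`, which is what makes this kind of all-against-all comparison a natural fit for a framework such as Spark.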