دانلود مقاله ISI انگلیسی شماره 79062
عنوان فارسی مقاله

# الگوریتم های سریع برای پیدا کردن نمایندگی حداقل تکرار رشته ها و درختان.

کد مقاله سال انتشار مقاله انگلیسی ترجمه فارسی تعداد کلمات
79062 2013 20 صفحه PDF سفارش دهید 14090 کلمه
خرید مقاله
پس از پرداخت، فوراً می توانید مقاله را دانلود فرمایید.
عنوان انگلیسی
Fast algorithms for finding a minimum repetition representation of strings and trees ☆
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Discrete Applied Mathematics, Volume 161, Issues 10–11, July 2013, Pages 1556–1575

کلمات کلیدی
تکرار پشت سر هم - الگوریتم رشته؛ دستور برچسب شده درختان
کلمات کلیدی انگلیسی
Tandem repeat; String algorithm; Labeled ordered trees
پیش نمایش مقاله

#### چکیده انگلیسی

A string with many repetitions can be represented compactly by replacing hh-fold contiguous repetitions of a string rr with (r)h(r)h. We present a compact representation, which we call a repetition representation (of a string) or RRS, by which a set of disjoint or nested tandem arrays can be compacted. In this paper, we study the problem of finding a minimum RRS   or MRRS, where the size of an RRS is defined by the sum of the length of component letters and the description length of the component repetitions (⋅)h(⋅)h which is defined by wR(h)wR(h) using a repetition weight function wRwR. We develop two dynamic programming-based algorithms to solve this problem: CMR, which works for any type of wRwR, and CMR-C, which is faster but can be applied to a constant wRwR only. CMR-C is an O(n2logn)O(n2logn)-time O(nlogn)O(nlogn)-space algorithm, which is more efficient in both time and space than CMR by a ((logn)/n)((logn)/n)-factor, where nn is the length of the given string. The problem of finding an MRRS for a string can be extended to that of finding a minimum repetition representation (of a tree) or MRRT for a given labeled ordered tree. For this problem, we present two algorithms, CMRT and CMRT-C, by using CMR and CMR-C, respectively, as a subroutine. As well as the theoretical analysis, we confirmed the efficiency of the proposed algorithms by experiments, which consist of the following three parts: First we demonstrated that CMR-C and CMRT-C are fast enough for large-scale data by using synthetic strings and trees, respectively. The size of an MRRS for a given string can be a measure of how compactly the string can be represented, meaning how well the string is structurally organized. This is also true of trees. To check such ability of MRRS-size, second we measured the size of an MRRS for chromosomes of nine different species. We found that all the chromosomes of the same species have a similar compression rate when realized by an MRRS. Run length encoding (RLE) was also shown to have species-specific compression rate, but species were separated more clearly by MRRS than by RLE. Third we examined the size of an MRRT for web pages of world-leading companies by using the tag trees, showing a consistency between the compression rate by an MRRT and visual web page structures.

خرید مقاله
پس از پرداخت، فوراً می توانید مقاله را دانلود فرمایید.