دانلود مقاله ISI انگلیسی شماره 23880
ترجمه فارسی عنوان مقاله

در مطالعه از ابهام و تجاری کردن بین معیارهای و ابهام در زبان های درج-حذف

عنوان انگلیسی
On the study of ambiguity and the trade-off between measures and ambiguity in insertion–deletion languages
کد مقاله سال انتشار تعداد صفحات مقاله انگلیسی
23880 2011 13 صفحه PDF
منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Nano Communication Networks, Volume 2, Issues 2–3, June–September 2011, Pages 106–118

ترجمه کلمات کلیدی
سیستم های درج - حذف - زبان های ذاتا مبهم - سیستم بدون ابهام - تعیین ناپذیری - ساختارهای زیستی مولکولی - معیارهای پیچیدگی توصیفی
کلمات کلیدی انگلیسی
Insertion–deletion systems, Inherently ambiguous languages, Unambiguous system, Undecidability, Bio-molecular structures, Descriptional complexity measures
پیش نمایش مقاله
پیش نمایش مقاله  در مطالعه از ابهام و تجاری کردن بین معیارهای و ابهام در زبان های درج-حذف

چکیده انگلیسی

Gene insertion and deletion are the operations that occur commonly in DNA processing and RNA editing. Based on these operations, a computing model has been formulated in formal language theory known as insertion–deletion systems. In this paper we study about ambiguity issues of these systems. First, we define six levels of ambiguity for insertion–deletion systems that are based on the components used in the derivation such as axiom , contexts and strings . Next, we show that there are inherently ii-ambiguous insertion–deletion languages which are jj-unambiguous for the combinations (i,j)∈{(5,4),(4,3),(4,2),(3,1),(2,1),(1,0),(0,1)}(i,j)∈{(5,4),(4,3),(4,2),(3,1),(2,1),(1,0),(0,1)}. As an application, we discuss with an example that how some of these ambiguity levels can be interpreted in gene sequences. Further, we prove an important result that the ambiguity problem of insertion–deletion systems is undecidable. Then, we define six new measures for insertion–deletion systems based on used contexts and strings. Finally, we analyze the trade-off between ambiguity levels and measures. We note that there are languages which are inherently ii-ambiguous (for i=5,4,2,0i=5,4,2,0) when a measure MM is minimal for the languages but they are ii-unambiguous otherwise.

مقدمه انگلیسی

In the past few decades, natural computing is an area being pursued with great interest. It includes evolutionary computing [2] and biologically inspired computing such as DNA computing [11] and membrane computing [9]. The developments in DNA computing inspired the study of new theoretical models in formal language theory known as sticker systems, splicing systems, Watson–Crick automata and insertion–deletion systems [1] and [11]. Insertion and deletion operations were studied first in [4] and [5] and based on these operations, insertion–deletion systems were introduced in [6]. Informally, the insertion and deletion operations in an insertion–deletion system is defined as follows: if a string ββ is inserted between two parts w1w1 and w2w2 of a string w1w2w1w2 to get w1βw2w1βw2, we call the operation as insertion, whereas if a substring αα is deleted from a string w1αw2w1αw2 to get w1w2w1w2, we call the operation as deletion. Insertion–deletion operations have relevances to some phenomena in human genetics. In Fig. 1 we show how the insertion–deletion systems are applied in the field of genetics. Consider a single strand DNA sequence S1=xuvyzS1=xuvyz, where x,u,v,y,zx,u,v,y,z are all strings. Add a single stranded DNA sequence u′w′v′u′w′v′ to the sequence xuvyzxuvyz, where u′,v′u′,v′ are the Watson–Crick complements of the strings u,vu,v and w′w′ is the complement of some string ww (see Fig. 1(a)). First, annealing will take place such that u′u′ will stick to uu and v′v′ to vv, thus we obtain the scenario as in Fig. 1(b). Next a cut by a restriction enzyme to the double stranded DNA sequence uvuv will be done (shown in Fig. 1(c) where the cut is denoted by a thick double directed arrow ⟷⟷). By adding a primer z′z′, we obtain another double stranded sequence. Finally, by melting the double stranded sequence, the two strands will be separated, thus we obtain two strings in which one is of the form S2=xuwvyzS2=xuwvyz (refer Fig. 1(d)). Thus, the string S2S2 obtained from S1S1 shows the use of insertion operation in DNA sequences.Ambiguity is considered as one of the fundamental problems in linguistics. A grammar is said to be ambiguous, if there exists more than one distinct derivation of words in the generated language. As we have seen above that the insertion–deletion system can be applied theoretically in DNA processing, the ambiguity in DNA processing (which uses the insertion–deletion system) may happen in the following manner. Let W1W2W1W2 be a DNA strand and suppose we want to insert W3W4W5W3W4W5 between W1W1 and W2W2 to obtain another DNA strand W1W3W4W5W2W1W3W4W5W2. This can be done first by inserting W3W3 between W1W1 and W2W2, followed by inserting W4W4 between W3W3 and W2W2, followed by inserting W5W5 between W4W4 and W2W2. The other sequence would be first by inserting W5W5 between W1W1 and W2W2, followed by inserting W4W4 between W1W1 and W5W5, followed by inserting W3W3 between W1W1 and W4W4. The two distinct derivations of above are given as: (1) View the MathML sourceW1W2⟹W1W3¯W2⟹W1W3W4¯W2⟹W1W3W4W5¯W2 (2) View the MathML sourceW1W2⟹W1W5¯W2⟹W1W4¯W5W2⟹W1W3¯W4W5W2 (the underlined string denotes the inserted string). This shows that ambiguity in gene sequences is also possible in the sense that starting from one sequence we are able to get another sequence in more than one way such that the intermediate sequences are different. This motivates us to define formally the ambiguity for insertion–deletion systems. Fig. 1 shows the applicability of insertion–deletion systems in gene sequences, storing of such sequences of large data can be minimized if the corresponding insertion–deletion system can be identified by means of minimal measures. Such systems are called minimal systems (with respect to the measure). In [10], the following measures are defined for insertion systems: Ax,MAx,TAx,ProdAx,MAx,TAx,Prod, Symbol . AxAx denotes the number of axioms, MAxMAx denotes the maximum length of an axiom, TAxTAx denotes total length of all axioms, ProdProd denotes the number of insertion rules and Symbol denotes the number of symbols in the insertion rules. As the insertion–deletion systems are extended models of insertion systems, the measures Ax,MAx,TAx,ProdAx,MAx,TAx,Prod are even applicable to insertion–deletion systems. In this paper, we introduce a few more descriptional complexity measures for insertion–deletion systems: TLength-Con (total length of contexts used in insertion and deletion rules), TLength-Str(total length of the strings to be inserted plus the total length of the strings to be deleted), TINS-StrCon (total length of the contexts used in insertion rules plus the total length of the strings to be inserted), TDEL-StrCon (total length of the contexts used in deletion rules plus the total length of the strings to be deleted), TINS-Str (total length of the strings to be inserted), TDEL-Str (total length of the strings to be deleted). It is preferable that a minimal system is also unambiguous as it will help to predict the gene structure without any ambiguity. Unfortunately, such a system may not exist for all languages and in such cases, a trade-off between ambiguity and measures need to be analyzed. In this paper, we identify some languages for which the trade-off has to be made. We notice that all the minimal systems for some languages turn to be ambiguous and when the minimal condition is relaxed, there exist unambiguous systems for such languages. We call these languages as pseudo inherently ambiguous languages. In this paper, we extend the work carried out in [7]. In [7], six levels of ambiguity of insertion–deletion systems have been defined and it is shown that there are inherently ii-ambiguous insertion–deletion languages which are jj-unambiguous for the combinations (i,j)∈{(5,4),(4,2),(3,1),(3,2),(2,1),(0,1)}(i,j)∈{(5,4),(4,2),(3,1),(3,2),(2,1),(0,1)}. Also, three new measures TLength-Con , TLength-Ins , TLength-Del are introduced and we discussed the trade-off between ambiguity and measures of insertion–deletion languages. More specifically, it is shown that if there are languages for which a minimal measure MM is chosen, then all the corresponding minimal grammar systems are ambiguous and when the minimality is traded off, there exist unambiguous grammar systems for those languages. This is proved for the measures ProdProd, AxAx and for the ambiguity levels 5 and 4. In this paper, we inherit all the results of [7] (with some revision) and we extend the paper with the following contributions: Theorem 4.2, Theorem 4.3 and Theorem 4.4 are proved by considering new languages which are simpler than the languages considered in [7]. A new result on ambiguity is proved for the combination (1,0)(1,0) and is reflected in Theorem 4.5. This was left as an open problem in [7] and is solved here. We show the application of these defined ambiguity levels and how they can be interpreted in gene sequences with some suitable examples (see Section 4.1). We introduce three more new measures, TLength-Str, TINS-Str and TDEL-Str, for insertion–deletion systems and discuss the trade-off results based on them (see the results in Theorem 7.2, Theorem 7.4 and Theorem 7.5). This paper is organized as follows. In Section 2, we discuss the preliminaries. In Section 3, based on the components used in the derivation, we define the various ambiguity levels (i=0,1,2,3,4,5)(i=0,1,2,3,4,5) for insertion–deletion systems. In Section 4, we show that there are inherently ii-ambiguous insertion–deletion languages which are jj-unambiguous for the combinations (i,j)∈{(5,4),(4,3),(4,2),(3,1),(2,1),(1,0),(0,1)}(i,j)∈{(5,4),(4,3),(4,2),(3,1),(2,1),(1,0),(0,1)}. We also discuss an example that shows how the ambiguity level defined for insertion–deletion systems can be interpreted and applied in gene sequences. In Section 5, we prove that there is no solution (i.e., no algorithmic procedure) to decide whether a given arbitrary insertion–deletion system is ambiguous or not. In Section 6, we introduce six new descriptional complexity measures for insertion–deletion systems. Finally, in Section 7, we analyze the trade-off between the newly defined ambiguity levels and the above measures of insertion–deletion systems.

نتیجه گیری انگلیسی

Insertion–deletion systems were defined and motivated by the way DNA strands are inserted and deleted. In this paper, we defined various ambiguity levels (i=0,1,2,3,4,5)(i=0,1,2,3,4,5) for insertion–deletion systems. We showed that there are inherently ii-ambiguous insertion–deletion languages which are jj-unambiguous for the following combinations (i,j)∈{(5,4),(4,3),(4,2),(3,1),(2,1),(1,0),(0,1)}(i,j)∈{(5,4),(4,3),(4,2),(3,1),(2,1),(1,0),(0,1)}. We have also shown that how the ambiguity levels defined for insertion–deletion systems can be interpreted in gene sequences. We proved that the ambiguity problem for insertion–deletion systems is undecidable. Further, we defined six new measures based on contexts and strings of insertion–deletion systems. We discussed the trade-off between the newly defined ambiguity levels and measures in insertion–deletion systems. We showed that there are pseudo inherently ii-ambiguous insertion–deletion languages which are ii-unambiguous. A few more trade-off results can be analyzed by considering other measures and ambiguity levels which are not analyzed in this paper is left as future work. In this paper, we considered the languages for which the difference in measures between the minimal systems and the non-minimal (unambiguous) systems is marginal; however, there can be many languages for which the difference in measures is very huge. For example, consider the language L11L11 in which 3 and 2 are replaced with 300 and 200 respectively. In this case, the difference with respect to the measure AxAx in minimal system and non-minimal system is very large. Whether the difference is bounded by a constant or by a polynomial factor for such languages seems to be an interesting research line and is left as future work.