ترجمه فارسی عنوان مقاله

روش تراز سیگنال جدید بر اساس الگوریتم ژنتیک

عنوان انگلیسی

SAGA: A novel signal alignment method based on genetic algorithm

کد مقاله	سال انتشار	تعداد صفحات مقاله انگلیسی
8149	2013	18 صفحه PDF

منبع

Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)

Journal : Information Sciences, Volume 228, 10 April 2013, Pages 113–130

ترجمه کلمات کلیدی

- تراز سیگنال - الگوریتم ژنتیک - تاب زمان

کلمات کلیدی انگلیسی

دانلود رایگان 2 صفحه اول مقاله لاتین (PDF)

پیش نمایش مقاله

چکیده انگلیسی

Signal alignment is one of the most commonly used strategies in analyzing a group of time series in order to learn the variations or common patterns across individual signals. A pairwise alignment algorithm aligns two signals by warping the time axis of the first signal so that the warped signal is “near” to the second. The majority of alignment algorithms are focused on extracting features like the locations of significant peaks or peak widths, and using those features in aligning the signals instead of raw signal. Although this approach allows fast alignments, it suffers from the risk of missing important features, leading to inaccurate alignments. In this paper, a novel Signal Alignment method based on Genetic Algorithm (SAGA) is proposed to align raw signals by first modeling the warping function with an ODE model. The parameters of the warping function are then optimized by using a genetic algorithm. The SAGA does not require feature extraction and it preserves the smoothness of the signals. The performance of the proposed method is evaluated on two sets of synthetic and real world datasets and compared to the well-known alignment algorithms. The results show that SAGA is a powerful algorithm that can compete with the others.

مقدمه انگلیسی

Time series is one of the common data types encountered in many applications. It can be obtained by periodically measuring the height of a child [19] or it can be an outcome of a sophisticated device such as a DNA sequencer [24] or a Mass Spectrometer [6]. In either case, a time series is a list of measurements obtained by observing an event over time. The variations or patterns in the data usually carry useful information about the investigated subject. For example, height measurements describe the growth rate of children or the data produced by a DNA sequencer indicate the DNA sequence of a patient under inspection. Processing a single time series and extracting useful information out of it are well-known research topics and deserve attention on their own. A simple example is to create a 4-nucleotide sequence by analyzing the time series provided by DNA sequencers [9]. However, a more interesting problem is to analyze a group of time series in order to learn the variations or common patterns across the individuals. Extracting that information about the group may provide better understanding on the investigated study. Analyzing a set of growth profiles of male and female children [27], detecting changes in peptide/protein abundances in samples [2], digital tracking of lesions [13] or comparing a group of DNA sequences to a reference [5] are such examples. Many techniques used in analysis of multiple time series assume that they are ready to make a side by side comparison. For example one may want to cluster the time series into distinct groups by assuming that the location of a peak common in all time series are at the same time point. However shifts in time series is a common phenomena in many application fields. Shifts, which are also called time drifts or retention time difference, can be highly non-linear because of imperfections of measurement device or differences in the examined subjects. Therefore it is often necessary to correct the time drifts between the time series. This involves stretching or compressing time axis of one or more time series so that the resulted series are ready to analyze. This is called an alignment. There are two basic strategies in alignment of a group of time series. In the first approach, alignment is carried out by using the features extracted from raw data. These features used to align original raw data. Conventional mutation screening fits very well to this description [15] and [16]. The raw data (signal) generated by DNA sequencers are first processed to obtain a 4-nucleotide DNA sequence (features), then these string sequences are aligned. Another application field is the alignment of chromatographic profiles which stems from Gas/Liquid Chromatograph-Mass spectrometry (MS) platform for the profiling of certain classes of small molecules in biological samples. In this type of data, the locations of prominent peaks are called feature vectors which are generally obtained by finding the points at which the first and second derivatives are zero and positive, respectively. Feature vectors can then be aligned by various procedures including Dynamic Programming (DP) [21] and fuzzy matching approach [29]. A detailed review on signal alignment based on extracted features can be found in [1]. The advantage of this approach shows itself in the name of speed since the transformation reduces data size which allows a faster analysis. However, it also has an important drawback; the risk of missing a feature, resulting in misleading conclusions in later steps. For example, during the generation of DNA sequences, a single false nucleotide assignment may lead to an erroneous diagnosis of the patient. The second strategy depends on the idea of using raw data throughout the entire comparison study in order to eliminate the risk of incorrect feature extraction of the first approach [7], [14] and [15]. The alignment of raw signals is achieved by “warping” the time axis of one or both of the signals so that the two signals are forced to be “close” to each other in the sense that the Euclidean distance between them is small. Warping is defined as nonlinear stretching and compressing the signal along the time axis. The clock analogy is very useful to further understand this phenomena. Consider two people observing the same event and recording the outcomes by using the time displayed in their individual clocks, namely A and B, as shown in Fig. 1. If there is no time difference between A and B, then the recorded observations will be identical. However, there will be differences between the signals depending on the amount of time drifts which can be modelled by a strictly monotone increasing function, which is called a “warping function”, mapping the time in A to the time in B. If one knows the warping function, then it can be used to align the two signals by applying its inverse only to the second signal. In this way, the time axis of the second signal is corrected to match that of the first. In other words, we can say that only clock B is adjusted to create a synchronization between the clocks. Likewise, if one applies the warping function to the first signal and keeps the second signal fixed, then the time axis of the first signal is corrected to match that of the second. This is equivalent to adjusting only the first clock A to create a synchronization. These two cases are examples of asymmetric alignment since only one of the signals is warped. In symmetric cases both signals are warped to create an alignment. This is equivalent to making adjustments in both clocks. The symmetric case is further explained in Section 3.Aligning a pair of signals, as in the above example, is called a “pairwise alignment”. Aligning more than two signals is called a “multiple alignment” and it can be achieved by doing a series of pairwise alignments. One of the signals can be chosen as a “reference”, then the other signals are pairwise aligned to the reference one at a time. The average of the signals can be chosen as a reference. It is also possible to create a representative signal by assuming that each time series is generated as a noisy transformation of a single signal called a “latent trace” which can be obtained by Continuous Profile Model (CPM) [12]. CPM has difficulties in aligning signals longer than 1000 points and not meaningful for just a pair of signals. Therefore it is not used for comparison purposes in the experimental part of this study. From now on, an alignment represents pairwise alignment unless otherwise stated. The stated optimization problem to the signal alignment has been a major interest in the community for nearly half a century. The vast majority of the methods developed in this era are built on Dynamic Programming (DP) [4] that lends itself to the creation of cost matrices which becomes a liability in the alignment of long signals. It also generates non-smooth warping functions resulting in sudden jumps in the alignment. Many attempts have been made to solve these problems, but none of them achieved both objectives. In this paper, we introduced a new signal alignment algorithm, Signal Alignment method based on Genetic Algorithm (SAGA) to overcome the limitations of aforementioned methods namely (1) non-smoothness of the warping functions (2) quadratic space and time complexity. In order to achieve the smoothness, the warping function is modelled by an Ordinary Differential Equation (ODE) combined with a weight function which allows rapid changes in curvature [18]. Any solution of ODE is guaranteed to be smooth and strictly increasing. A rich set of warping functions can be produced by using specially designed weight functions which can be optimized to find an optimum warping function in the course of pairwise signal alignment. Genetic Algorithm (GA) is a powerful method to solve this kind of optimization problems especially when the derivative of the fitness function is not available. To our best knowledge, the combination of GA and the ODE model has never been used to align two signals. Our method has two advantages; (1) we employ an ODE model to produce smooth and strictly increasing warping functions. This model not only results in a smooth warping function but also enables the alignment of signals without extracting features. (2) We use the genetic algorithm for estimating the model parameters of warping function, which ensures to align long signals without increasing memory requirements. Indeed, the time and space complexity of SAGA is linear. The results on synthetic data sets show that our method can effectively align signals providing smooth warping functions with low memory requirements. We also evaluate our method on real data sets to show that the proposed approach scales and provides effective alignment of long signals. The rest of the paper is organized as follows. We introduce the related work in Section 2 and define the properties of warping functions in Section 3. Details of the method are given in Section 4. We have compared the performance of our algorithm with other state of the art alignment algorithms and evaluate the performance in Section 5. Finally, we conclude with an analysis of our results and point out future work in Section 6.

نتیجه گیری انگلیسی

In the first experiment, two short signals with known warping functions are aligned to compare the performance of our method to the well known methods, DTW, COW and PTW. The SAGA is able to align those signals and successfully capture the real warping function. The second experiment is conducted to test the method on multiple alignment benchmark for which our method SAGA is verified to be very effective. In the third experiment, the methods are tested on a real data set for which all methods except PTW produced similar yet satisfactory alignments. In the fourth experiment, another real data set is used to test all methods. In the first step of this experiment, a short segment from the signal is used. In this step, the GT warping function is determined manually to verify that it is non-linear. While the SAGA, DTW and COW produce similar and successful alignments, PTW has failed again. In the second step of the last experiment, the full length signals are used. The SAGA, DTW and COW align the signals successfully while PTW has failed. In all the experiments, DTW suffers from overfitting which usually results in distortions in the aligned signals. In other words, the signals are forced to be aligned. Hence, the evaluation scores obtained by DTW are almost always below the ones produced by other methods. Another drawback of DTW is the quadratic space complexity which makes it inconvenient to work with long signals as shown in the last experiment. COW produces smoother alignments than DTW especially in the first and second experiments. In other experiments, staircase patterns caused by DTW or COW are not prominent.