Learning and classifying actions of construction workers and equipment using Bag-of-Video-Feature-Words and Bayesian network models
|Paper code||Publication year||English paper||Persian translation||Word count|
|29141||2011||12-page PDF||Available to order||Not calculated|
Publisher : Elsevier - Science Direct
Journal : Advanced Engineering Informatics, Volume 25, Issue 4, October 2011, Pages 771–782
Automated action classification of construction workers and equipment from videos is a challenging problem that has a wide range of potential applications in construction. These applications include, but are not limited to, enabling rapid construction operation analysis and ergonomic studies. This research explores the potential of an emerging action analysis framework, Bag-of-Video-Feature-Words, in learning and classifying worker and heavy equipment actions in challenging construction environments. We developed a test bed that integrates the Bag-of-Video-Feature-Words model with Bayesian learning methods for evaluating the performance of this action analysis approach and tuning the model parameters. Video data sets were created for experimental evaluations. For each video data set, a number of action models were learned from training video segments and applied to testing video segments. Compared to previous studies on construction worker and equipment action classification, this new approach can achieve good performance in recognizing multiple action categories while robustly coping with partial occlusion and changes in viewpoint and scale.
Advanced sensing and information technologies are increasingly used on construction jobsites for collecting and analyzing a variety of project information that traditionally relied on manual methods. Among these technologies, video has become an easily captured and widespread medium, serving the purposes of construction method analysis, progress tracking, and worker ergonomic studies in the construction industry. The associated demand for reducing the burden of manually retrieving information from video motivates further research in automated construction video understanding. Recent studies have focused on leveraging computer vision algorithms to automate the manual information extraction process in analyzing recorded videos. However, despite considerable progress in construction object tracking, classifying the actions of construction workers or construction equipment in single-view video, especially beyond simple categories like working and not working, remains a hurdle to reaping the full benefits of video-based analysis in method studies and worker ergonomic studies. Robust action analysis algorithms that can differentiate subtle action categories and handle scene clutter, occlusion, and viewpoint changes are essential to overcoming this hurdle.
In this paper, we aim to explore the potential of an emerging visual learning approach in classifying subtle action categories in a variety of construction video segments. By action, we mean the combination of rigid and non-rigid motions. This visual learning approach is composed of four major steps: feature detection, feature representation, feature modeling, and model learning. More specifically, it uses the 3D-Harris detector as the feature detector, local histograms as the feature representation, Bag-of-Words as the feature model, and Bayesian network models as the learning mechanism for action learning and classification.
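The feature-modeling step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' MATLAB test bed: it assumes the local descriptors around the detected interest points have already been extracted, it uses plain k-means to build the codebook (an assumption; the text above does not name the clustering method), and all function names and the toy data are hypothetical.

```python
import random

def nearest(codebook, d):
    """Index of the code word closest to descriptor d (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], d)))

def learn_codebook(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed here); returns k code words (cluster centres)."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(codebook, d)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster went empty
                codebook[j] = [sum(col) / len(g) for col in zip(*g)]
    return codebook

def word_histogram(codebook, descriptors):
    """Count how often each code word occurs in one video segment."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return hist

# Hypothetical 2-D descriptors standing in for real local histograms.
toy_descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_codebook = learn_codebook(toy_descriptors, k=2)
print(word_histogram(toy_codebook, toy_descriptors))
```

The histogram discards where and when each feature occurred and keeps only how often each code word appears, which is what makes the representation tolerant of partial occlusion but blind to spatial layout.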
For simplicity, we refer to this approach as Bag-of-Video-Feature-Words in the remainder of this paper. We developed a test bed in MATLAB to evaluate the performance of this new approach in learning and classifying action categories in construction videos. This study also aimed to tune a set of model parameters so that the model performs well in construction scenarios. Two video data sets, covering backhoe actions and worker actions in a formwork activity, were constructed from a large number of construction videos as the evaluation data sets. As the main contributions of this paper, we demonstrate that the Bag-of-Words model with local action feature representations and Bayesian learning methods has great potential to advance automated construction video understanding, as it performs well in learning subtle action categories in challenging construction videos. We also characterize the impact of the model parameters on model performance and thereby identify a set of good parameter values. The rest of the paper is organized as follows. Section 2 briefly reviews the relevant literature on computer vision-based construction video analysis and the background of action analysis. Section 3 explains the Bag-of-Video-Feature-Words model. Section 4 evaluates the performance of the Bag-of-Video-Feature-Words model on two video data sets. Section 5 concludes the paper.
English conclusion
In this study, we extended the Bag-of-Video-Feature-Words model into the construction domain. We implemented this new action learning and classification framework in MATLAB, and two construction video data sets were created for evaluating its performance. The following conclusions can be drawn from the experimental evaluations:
• The Bag-of-Video-Feature-Words model with a naïve Bayesian classifier can be extended to classify the complex actions of construction workers and equipment with good accuracy.
• Using HoG descriptors as the video feature words yields better classification accuracy and speed than using HoF descriptors.
• Classification accuracy generally improves as more code words are used, but there is little gain once the number of code words exceeds 1500.
• The naïve Bayesian classifier performs significantly better than pLSA on our construction video data sets. This may be due to the size of the training data.
The attractiveness of the bag of video feature words is that it does not require accurate foreground segmentation and is robust to partial occlusion and to changes in viewpoint, illumination, and scale. Future studies in the following directions are needed. First, the performance of this method can be further improved by adding spatial information, since it is well known that the Bag-of-Words method ignores spatial information and considers only the frequency of feature occurrence. Second, more action categories and more data need to be added to the existing video data sets. With more data, more complex Bayesian network models, or even Markov random fields, could be tested and evaluated more thoroughly. Last but not least, in addition to Bayesian learning methods, kernel-based methods such as support vector machines should also be tested on the video sets, since these are two competing families of methods frequently used in similar studies in computer vision.
Above all, since this is the first study to apply the Bag-of-Words method to construction video data sets, we hope that it can establish a baseline for comparing the performance of other algorithms.
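To make the classification step in the conclusions concrete, the following is a minimal Python sketch of a naïve Bayesian classifier over code-word histograms. It is not the authors' MATLAB implementation: it assumes a multinomial likelihood with Laplace smoothing (`alpha`), and the function names, the three-word codebook, and the labels `"dig"` and `"swing"` are hypothetical values chosen for illustration.

```python
import math
from collections import defaultdict

def train_naive_bayes(histograms, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on code-word count histograms."""
    n_words = len(histograms[0])
    counts = defaultdict(lambda: [0.0] * n_words)  # per-class code-word counts
    n_segments = defaultdict(int)                  # per-class segment counts
    for hist, label in zip(histograms, labels):
        n_segments[label] += 1
        for w, c in enumerate(hist):
            counts[label][w] += c
    model = {}
    for label, word_counts in counts.items():
        total = sum(word_counts) + alpha * n_words  # Laplace-smoothed normaliser
        log_prior = math.log(n_segments[label] / len(labels))
        log_lik = [math.log((c + alpha) / total) for c in word_counts]
        model[label] = (log_prior, log_lik)
    return model

def classify(model, hist):
    """Return the label with the highest log-posterior for one histogram."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(c * l for c, l in zip(hist, log_lik))
    return max(model, key=score)

# Hypothetical training histograms over a 3-word codebook.
train_hists = [[8, 1, 0], [7, 2, 1], [0, 2, 9], [1, 1, 8]]
train_labels = ["dig", "dig", "swing", "swing"]
model = train_naive_bayes(train_hists, train_labels)
print(classify(model, [6, 1, 1]), classify(model, [0, 1, 7]))
```

Because each class is summarised by one smoothed count vector, a classifier of this kind has few parameters to estimate, which is one plausible reason it can work with small training sets.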