Monitoring human behavior from video taken in an office environment
|Article code||Publication year||Pages in English article||Persian translation|
|28060||2001||14 pages (PDF)||29 pages (Word)|
Publisher : Elsevier - Science Direct
Journal : Image and Vision Computing, Volume 19, Issue 12, 1 October 2001, Pages 833–846
2. Related work
3. Prior knowledge
4.1. Skin detection
4.3. Scene change detection
Table 1. List of actions and recognition conditions
4.4. Action recognition
Fig. 1. State models for the example room, which contains a phone, a cabinet, and a computer terminal.
Fig. 2. State models for the example room, which contains a phone, a cabinet, and a computer terminal.
Fig. 3. Five example scene setups.
4.5. Determining key frames
Fig. 4. A person opens a cabinet while another person sits down (350 frames, 8 key frames).
Fig. 5. A person picks up the phone, talks on it, and then puts the phone down (199 frames, 4 key frames).
Fig. 6. A person picks up a box and puts it down somewhere else (199 frames, 4 key frames).
Fig. 7. A person sits down to use the terminal, then stands up and leaves (399 frames, 5 key frames).
Fig. 8. A person enters the room, picks up a flask, and leaves the room (170 frames, 3 key frames).
Fig. 9. A person enters the room and picks up a flask, but the tracking algorithm loses the flask, so an output error occurs; finally the person leaves the room (170 frames, 4 key frames).
Fig. 10. A person enters the room and picks up a phone, but the tracking algorithm loses the flask, so an output error occurs; finally the person leaves the room (325 frames, 4 key frames).
7. Conclusions and future work
In this paper, we describe a system that automatically recognizes human actions from video sequences taken of a room. These actions include entering a room, using a computer terminal, opening a cabinet, picking up a phone, etc. Our system recognizes these actions by using prior knowledge about the layout of the room. In our system, action recognition is modeled by a state machine, which consists of ‘states’ and ‘transitions’ between states. A transition from one state to another can be triggered by the position of a person, by scene change detection, or by an object being tracked. In addition to generating a textual description of recognized actions, the system is able to generate a set of key frames from video sequences, which is essentially content-based video compression. The system has been tested on several video sequences and has performed well. A representative set of results is presented in this paper. The ideas presented in this system are applicable to automated security.
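The state-machine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the state names, the `Observation` fields, and the transition rules below are all assumptions chosen for illustration, covering only entering, leaving, and phone use. Each frame contributes cues of the three kinds the abstract lists (person position, scene change detection, and tracking), and a transition may emit a recognized action.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    # Hypothetical states for a single person and a single phone.
    START = auto()
    ENTERED = auto()
    NEAR_PHONE = auto()
    ON_PHONE = auto()


@dataclass(frozen=True)
class Observation:
    """Per-frame cues; every field name here is an assumption."""
    person_present: bool = False        # e.g. from skin detection and tracking
    near_phone: bool = False            # tracked position vs. known room layout
    phone_region_changed: bool = False  # from scene change detection


def step(state, obs):
    """Return (next_state, recognized_action_or_None) for one frame."""
    if state is State.START:
        if obs.person_present:
            return State.ENTERED, "enter the scene"
        return state, None
    if not obs.person_present:
        return State.START, "leave the scene"
    if state is State.ENTERED and obs.near_phone:
        return State.NEAR_PHONE, None
    if state is State.NEAR_PHONE:
        if obs.phone_region_changed:
            return State.ON_PHONE, "pick up phone"
        if not obs.near_phone:
            return State.ENTERED, None
    if state is State.ON_PHONE and obs.phone_region_changed:
        return State.NEAR_PHONE, "put down phone"
    return state, None


def recognize(observations):
    """Run the state machine over a sequence of per-frame observations."""
    state, actions = State.START, []
    for obs in observations:
        state, action = step(state, obs)
        if action is not None:
            actions.append(action)
    return actions
```

Because the high-level model only consumes boolean cues, the low-level detectors that produce them could be swapped without touching `step`, which matches the separation between low-level techniques and the high-level model that the paper describes.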
Human action recognition has become an important topic in computer vision. One of the most obvious applications of this technology is in security. In this paper we present a system for recognizing human actions, which is geared toward automated security applications. A human action recognition system is useful in security applications for two important reasons. The first is to detect the entrance of an unauthorized individual and monitor that individual's actions. The second is to monitor the behavior of people who do belong in an area. By recognizing the actions a person performs and using context, the behavior of the person can be determined. Some behaviors are inappropriate for certain persons, for example using another person's computer while that person is out of the room, or taking objects one is not permitted to take. The ability to associate names with people in the scene would help achieve both of these goals. The system described in this paper recognizes human action in an environment for which prior knowledge is available. Three low-level computer vision techniques are used in our system: skin detection, tracking and scene change detection. All three techniques use color images. Our system is capable of recognizing the actions of multiple people in a room simultaneously. The system can recognize several actions: entering the scene, picking up a phone, putting down a phone, using a computer terminal, standing up, sitting down, opening something, closing something, picking up an object (specified as interesting in advance), putting down an object (previously picked up), leaving the scene and leaving the scene with an object. Several of the actions are fairly generic in nature. Objects that could be picked up by a person include briefcases, computer equipment, etc. Objects that can be opened and closed include cabinets, overhead compartments, doors, etc.
In addition to generating a textual description of recognized actions, our system reduces a video sequence into a smaller series of key frames, which concisely describe the important actions that have taken place in a room. Reduction of a video sequence to a series of key frames facilitates further analysis of the scene by computers or humans (such as deciding the name of the person who performed certain actions). Another advantage of key frames is the reduction of the space required to store them and the time required to transmit them. The rest of this paper is organized into six sections. Section 2 deals with related work in this area. In Section 3, we discuss the role of prior knowledge in our approach. Section 4 describes our system: three low-level computer vision techniques, the high-level method we use for action recognition, and strategies for determining key frames from video sequences. The low-level techniques are skin detection, tracking and scene change detection. All of them use color imagery and provide useful information for the action recognition, which is based on a finite state machine model. Section 5 deals with the experimental results. We have tested our ideas with several video sequences and we provide a summary of our analysis. In Section 6, we comment on the limitations of our system. Finally, in Section 7 we provide conclusions and propose some future work in this area.
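The storage saving from key frames can be made concrete with a small sketch. The selection rule used here (keep exactly the frames at which an action was recognized) is an assumption made for illustration; the paper's own strategies are discussed in its Section 4.5 and are not reproduced on this page. The function names and the frame indices in the usage example are hypothetical; the totals of 350 frames and 8 key frames come from the Fig. 4 caption.

```python
def select_key_frames(per_frame_actions):
    """Return indices of frames at which some action was recognized.

    `per_frame_actions` holds one entry per frame: an action label,
    or None when nothing was recognized (assumed representation).
    """
    return [i for i, action in enumerate(per_frame_actions) if action is not None]


def fraction_kept(num_key_frames, num_frames):
    """Fraction of the original frames that survive as key frames."""
    return num_key_frames / num_frames


# Usage with made-up frame indices on a 350-frame sequence.
actions = [None] * 350
for idx in (10, 60, 95, 140, 180, 230, 290, 340):  # hypothetical positions
    actions[idx] = "some recognized action"

keys = select_key_frames(actions)
print(len(keys), fraction_kept(len(keys), len(actions)))
```

With 8 key frames out of 350, only 8/350 of the frames (a little over 2%) need to be stored or transmitted, counting frames alone and ignoring any per-frame encoding differences.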
English conclusion
We have proposed a system for recognizing human actions such as entering, using a terminal, opening a cabinet, picking up a phone, etc. The system uses the three low-level techniques of skin detection, tracking and scene change detection. We have successfully recognized these actions in several sequences, some of which had more than one person performing actions. In the experiments shown in this paper, our field of view was limited. We have performed some experiments with a wide-angle lens (Fig. 9 and Fig. 10), which increased the field of view. This improvement, however, is still not satisfactory. It would be interesting to perform some experiments in a situation where the system could view most of a room. This would allow the system to be tested on sequences with several people in the scene performing a variety of actions simultaneously. We are experimenting with the use of multiple cameras to provide a view that encompasses most of a small room. We feel that this is one of the most important extensions that should be made to this system. Extending our system to multiple cameras is conceptually straightforward, although the implementation details may be quite complex. As mentioned in Section 3, it would be desirable for our system to incorporate some method for learning the layout of the scene automatically. Increasing the number of actions recognized by the system is another avenue for future work. Detecting the arrival of new objects into the scene is an action that will be added soon. Future work should also include implementing the system in real time and testing it on a large number of long sequences. Another interesting area for future work would be determining the identity of a person who has entered the room. This could be done off-line by processing the key frames. Finally, any advances in the three techniques that the system relies on (skin detection, color-based tracking and scene change detection) would help improve the system.
As mentioned earlier, changing the low-level techniques would not affect our high-level model for action recognition. The system presented in this paper is capable of recognizing a number of interesting human actions. Future improvements to the low-level algorithms will make the system even more robust. The addition of multiple cameras and real-time analysis of video will allow the system to monitor an entire room over an extended period of time.