Title

Monitoring human behavior from video taken in an office environment

Article code	Publication year	Pages (English article)
28060	2001	14 (PDF)

Source

Publisher: Elsevier - ScienceDirect

Journal: Image and Vision Computing, Volume 19, Issue 12, 1 October 2001, Pages 833–846

Table of contents of the Persian translation

Abstract

Keywords

1. Introduction

2. Related work

3. Prior knowledge

4. System

4.1. Skin detection

4.2. Tracking

4.3. Scene change detection

Table 1. List of actions and recognition conditions

4.4. Action recognition

Figure 1. State models for the example room, which contains a phone, a cabinet, and a computer terminal.

Figure 2. State models for the example room, which contains a phone, a cabinet, and a computer terminal.

Figure 3. Five examples of scene configurations.

4.5. Determining key frames

Figure 4. A person opens a cabinet while another person sits down (350 frames, 8 key frames).

5. Results

Figure 5. A person picks up the phone, talks on it, and then puts the phone down (199 frames, 4 key frames).

Figure 6. A person picks up a box and puts it down somewhere else (199 frames, 4 key frames).

Figure 7. A person sits down to use the terminal, then gets up and leaves (399 frames, 5 key frames).

Figure 8. A person enters the room, picks up a flask, and leaves the room (170 frames, 3 key frames).

Figure 9. A person enters the room and picks up a flask, but the tracking algorithm loses the flask, so the output contains an error; finally, the person leaves the room (170 frames, 4 key frames).

6. Limitations

Figure 10. A person enters the room and picks up a phone, but the tracking algorithm loses the flask, so the output contains an error; finally, the person leaves the room (325 frames, 4 key frames).

7. Conclusions and future work

English keywords

Video, Action recognition, Key frames, Context

English abstract

In this paper, we describe a system which automatically recognizes human actions from video sequences taken of a room. These actions include entering a room, using a computer terminal, opening a cabinet, picking up a phone, etc. Our system recognizes these actions by using prior knowledge about the layout of the room. In our system, action recognition is modeled by a state machine, which consists of ‘states’ and ‘transitions’ between states. The transitions from different states can be made based on the position of a person, scene change detection, or an object being tracked. In addition to generating a textual description of recognized actions, the system is able to generate a set of key frames from video sequences, which is essentially content-based video compression. The system has been tested on several video sequences and has performed well. A representative set of results is presented in this paper. The ideas presented in this system are applicable to automated security.
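The abstract models action recognition as a state machine whose transitions are triggered by a person's position, a detected scene change, or a tracked object. A minimal Python sketch of that idea follows. The state names, event names, and the `PersonFSM` class are hypothetical illustrations rather than the authors' implementation, and the events are assumed to be produced upstream by the low-level detectors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical transition table: (current state, event) -> (next state, action text).
# Event names stand in for the three cue types in the abstract: the person's
# position, a scene change in a region of interest, and a tracked object.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("absent", "enters_door_region"): ("standing", "enters the room"),
    ("standing", "phone_region_changed"): ("on_phone", "picks up the phone"),
    ("on_phone", "phone_region_restored"): ("standing", "puts down the phone"),
    ("standing", "head_lowered_at_terminal"): ("using_terminal", "sits down at the terminal"),
    ("using_terminal", "head_raised_at_terminal"): ("standing", "stands up"),
    ("standing", "cabinet_region_changed"): ("cabinet_open", "opens the cabinet"),
    ("cabinet_open", "cabinet_region_restored"): ("standing", "closes the cabinet"),
    ("standing", "leaves_door_region"): ("absent", "leaves the room"),
}


@dataclass
class PersonFSM:
    """One state machine per tracked person (hypothetical class, not the authors' code)."""

    person_id: int
    state: str = "absent"
    log: List[str] = field(default_factory=list)

    def feed(self, frame: int, event: str) -> bool:
        """Apply one event; return True if it caused a transition (a recognized action)."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            return False  # the event means nothing in the current state, so ignore it
        self.state, action = TRANSITIONS[key]
        self.log.append(f"frame {frame}: person {self.person_id} {action}")
        return True


if __name__ == "__main__":
    fsm = PersonFSM(person_id=1)
    for frame, event in [(10, "enters_door_region"), (40, "phone_region_changed"),
                         (90, "phone_region_restored"), (120, "leaves_door_region")]:
        fsm.feed(frame, event)
    print("\n".join(fsm.log))  # textual description of the recognized actions
```

Keeping one machine per tracked person is one way to follow several people at once; an event with no table entry for the current state is simply ignored, so spurious detections do not derail the machine.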

English introduction

Human action recognition has become an important topic in computer vision. One of the most obvious applications of this technology is in security. In this paper we present a system for recognizing human actions, which is geared toward automated security applications. A human action recognition system is useful in security applications for two important reasons. The first is to detect the entrance of an unauthorized individual and monitor that individual's actions. The second is to monitor the behavior of people who do belong in an area. By recognizing the actions a person performs and using context, the behavior of the person can be determined. Some behaviors are inappropriate for certain persons: for example, someone using another person's computer while that person is not in the room, or taking objects they are not permitted to take. The ability to associate names with people in the scene would help achieve both of these goals. The system described in this paper recognizes human action in an environment for which prior knowledge is available. Three low-level computer vision techniques are used in our system: skin detection, tracking and scene change detection. All three techniques use color images. Our system is capable of recognizing the actions of multiple people in a room simultaneously. The system can recognize several actions: entering the scene, picking up a phone, putting down a phone, using a computer terminal, standing up, sitting down, opening something, closing something, picking up an object (specified as interesting in advance), putting down an object (previously picked up), leaving the scene and leaving the scene with an object. Several of the actions are fairly generic in nature. Objects that could be picked up by a person include briefcases, computer equipment, etc. Objects that can be opened and closed include cabinets, overhead compartments, doors, etc.
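All three low-level techniques work on color images. To illustrate two of them, the sketch below pairs a fixed RGB skin-color rule with a region-based scene change test that compares a region of interest (for example, around a cabinet or phone) against a reference frame. Both are assumptions: the RGB rule is a widely used heuristic, not necessarily the paper's skin model, and `is_skin`, `skin_mask`, `region_changed`, and their thresholds are hypothetical names and values.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each in 0..255
Image = List[List[Pixel]]           # row-major grid of pixels
Region = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def is_skin(p: Pixel) -> bool:
    """Fixed RGB skin-color rule: a common heuristic, assumed here, not the paper's model."""
    r, g, b = p
    return (r > 95 and g > 40 and b > 20
            and max(p) - min(p) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(img: Image) -> List[List[bool]]:
    """Boolean mask marking the pixels classified as skin."""
    return [[is_skin(p) for p in row] for row in img]


def region_changed(ref: Image, cur: Image, region: Region,
                   pixel_thresh: int = 40, frac_thresh: float = 0.2) -> bool:
    """Hypothetical scene change test for one region of interest.

    A pixel counts as changed when the summed absolute R, G, B difference from the
    reference frame exceeds `pixel_thresh`; the region counts as changed when the
    fraction of changed pixels exceeds `frac_thresh`. Both defaults are assumptions.
    """
    top, left, bottom, right = region
    changed = total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            diff = sum(abs(a - b) for a, b in zip(ref[y][x], cur[y][x]))
            if diff > pixel_thresh:
                changed += 1
            total += 1
    return total > 0 and changed / total > frac_thresh
```

One possible use: test the cabinet region of each frame against an empty-room reference, and treat a detected change as an "opening" cue only when a skin region belonging to a tracked person is nearby.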
In addition to generating a textual description of recognized actions, our system reduces a video sequence to a smaller series of key frames that concisely describe the important actions that have taken place in a room. Reducing a video sequence to a series of key frames facilitates further analysis of the scene by computers or humans (such as determining the name of the person who performed certain actions). Key frames also reduce the space required to store the sequence and the time required to transmit it.

The rest of this paper is organized into six sections. Section 2 deals with related work in this area. In Section 3, we discuss the role of prior knowledge in our approach. Section 4 describes our system: the three low-level computer vision techniques, the high-level method we use for action recognition, and strategies for determining key frames from video sequences. The low-level techniques are skin detection, tracking and scene change detection. All of them use color imagery and provide useful information for action recognition, which is based on a finite state machine model. Section 5 presents the experimental results: we have tested our ideas on several video sequences and provide a summary of our analysis. In Section 6, we comment on the limitations of our system. Finally, in Section 7, we provide conclusions and propose some future work in this area.
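The reduction of a sequence to key frames described in the introduction can be sketched by keeping one frame per recognized action. The rule below, including merging actions that occur within `min_gap` frames of each other, is a hypothetical strategy for illustration; the paper's own key-frame strategy is described in its Section 4. The function names are likewise assumptions.

```python
from typing import List, Tuple


def select_key_frames(events: List[Tuple[int, str]], min_gap: int = 5) -> List[int]:
    """Pick one key frame per recognized action (hypothetical selection rule).

    `events` holds (frame_index, action_label) pairs from the action recognizer.
    Actions closer than `min_gap` frames to the previous key frame share it.
    """
    keys: List[int] = []
    for frame, _action in sorted(events):
        if not keys or frame - keys[-1] >= min_gap:
            keys.append(frame)
    return keys


def reduction_ratio(total_frames: int, num_key_frames: int) -> float:
    """Average number of original frames that each key frame stands in for."""
    return total_frames / num_key_frames
```

With the counts reported for Figure 4 (350 frames summarized by 8 key frames), `reduction_ratio(350, 8)` gives 43.75, so each key frame stands in for roughly 44 original frames.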

English conclusion

We have proposed a system for recognizing human actions such as entering, using a terminal, opening a cabinet, picking up a phone, etc. The system uses three low-level techniques: skin detection, tracking and scene change detection. We have successfully recognized these actions in several sequences, some of which had more than one person performing actions. In the experiments shown in this paper, our field of view was limited. We have performed some experiments with a wide-angle lens (Fig. 9 and Fig. 10), which increased the field of view; however, this improvement is still not satisfactory. It would be interesting to perform experiments in a situation where the system could view most of a room. This would allow the system to be tested on sequences with several people in the scene performing a variety of actions simultaneously. We are experimenting with the use of multiple cameras to provide a view that encompasses most of a small room. We feel that this is one of the most important extensions that should be made to this system. Extending our system to multiple cameras is conceptually straightforward, although the implementation details may be quite complex. As mentioned in Section 3, it would be desirable for our system to incorporate some method for learning the layout of the scene automatically. Increasing the number of actions recognized by the system is another obvious avenue for future work; detecting the arrival of new objects into the scene is an action that will be added soon. Future work should also include implementing the system in real time and testing it on a large number of long sequences. Another interesting area for future work would be determining the identity of a person who has entered the room. This could be done off-line by processing the key frames. Finally, any advances in the three techniques that the system relies on (skin detection, color-based tracking and scene change detection) would help improve the system.
As mentioned earlier, changing the low-level techniques would not affect our high-level model for action recognition. The system presented in this paper is capable of recognizing a number of interesting human actions. Future improvements to the low-level algorithms will make the system even more robust. The addition of multiple cameras and real-time analysis of video will allow the system to monitor an entire room over an extended period of time.
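The claim that the low-level techniques can change without affecting the high-level model amounts to a narrow interface between the two layers. A minimal sketch, assuming the low-level layer only has to report named events per frame: `CueSource`, `recognize`, and `ReplayCues` are hypothetical names, and any detector exposing an `events` method could be swapped in without touching `recognize`.

```python
from typing import Dict, List, Protocol, Tuple


class CueSource(Protocol):
    """Low-level layer: reports named events (e.g. 'enter', 'leave') for a frame."""

    def events(self, frame: int) -> List[str]: ...


def recognize(cues: CueSource, frames: range,
              transitions: Dict[Tuple[str, str], str],
              start: str = "absent") -> List[Tuple[int, str]]:
    """High-level layer: a state machine that sees only event names, never pixels.

    Returns (frame, new_state) for every transition taken.
    """
    state = start
    history: List[Tuple[int, str]] = []
    for f in frames:
        for ev in cues.events(f):
            nxt = transitions.get((state, ev))
            if nxt is not None:
                state = nxt
                history.append((f, nxt))
    return history


class ReplayCues:
    """Hypothetical stand-in detector that replays a fixed schedule of events."""

    def __init__(self, schedule: Dict[int, List[str]]) -> None:
        self.schedule = schedule

    def events(self, frame: int) -> List[str]:
        return self.schedule.get(frame, [])


if __name__ == "__main__":
    table = {("absent", "enter"): "standing", ("standing", "leave"): "absent"}
    print(recognize(ReplayCues({3: ["enter"], 7: ["leave"]}), range(10), table))
    # [(3, 'standing'), (7, 'absent')]
```

Because `recognize` depends only on the `events` method, an improved skin detector or tracker would change the object passed in, not the state machine itself.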