|کد مقاله||سال انتشار||مقاله انگلیسی||ترجمه فارسی||تعداد کلمات|
|151800||2018||30 صفحه PDF||سفارش دهید||11086 کلمه|
Publisher : Elsevier - Science Direct (الزویر - ساینس دایرکت)
Journal : Journal of Parallel and Distributed Computing, Available online 5 March 2018
This report details a course on big data analytics designed for undergraduate junior and senior computer science students. The course is heavily focused on projects and writing code for big data processing. It is designed to help students learn parallel and distributed computing frameworks and techniques commonly used in industry. The curriculum includes a progression of projects requiring increasingly sophisticated big data processing ranging from data preprocessing with Linux tools, distributed processing with Hadoop MapReduce and Spark, and database queries with Hive and Googleâs BigQuery. We discuss hardware infrastructure and experimentally evaluate the cost/benefit of an on-premise server versus Amazonâs Elastic MapReduce. Finally, we showcase outcomes of our course in terms of student engagement and anonymous student feedback.