Spring 2012 ECE 6130 Grid and Cloud Computing
Prof. Howie Huang
Monday 6:10 - 8:40 PM                                    2020 K St 23
Office Hours: Monday 3:00 - 5:00 PM or by appointment


Introduction

What are the challenges in building a large computing system?  What is Cloud Computing, and why is it drawing billions of dollars in investment from companies like Amazon, Google, and Yahoo?  ...  These are the questions that this research-oriented class will try to answer through a combination of extensive reading, writing, and, most importantly, practical experience with Clusters, Grids, and Clouds.

Each student will be expected to read and present papers, write short summaries, and lead and participate in discussions. There is no exam. Students are required to complete a semester-long research project, individually or in groups of two. Many thanks for the generous support from Amazon AWS!

 

Discussion board (jointly with CSCI 6907): http://piazza.com/class#spring2012/csci690783

 

Students are required to honor the GWU Code of Academic Integrity when completing all assignments, projects, and examinations.

 


Grading 

All assignments are due in class on Mondays.  No late submissions.

Participation:                                       10%
            Each student must be willing to participate fully in the class.  It is insufficient just to show up in class.

Homework Assignment:                    10%
            Each student is required to complete the assignment due on Feb 13th.
Reading Summaries:                         15%
            We will read about three papers each week.  Each student is required to read ALL of them ahead of the class, and write a short summary for one paper each week. A template is available on Blackboard. (Read "The Task of a Referee" by Alan Jay Smith).
Paper Presentation:                           15%
            Each student is expected to present and lead the discussion of TWO papers.  The student must email the presentation slides to the instructor by 8 AM on Friday of each week.  Late or missing submissions will result in penalties.  (Read "How to Present a Paper" by Leslie Lamport).
Course Project:                                   50%
   Initial Proposal                                          (Week 3)
   Final Proposal                                   10%  (Week 4)
   Mid-term Report                              10%  (Week 8)
   Final Report                                      30%  (Week 15)
            Projects will be done individually or in groups of two students.  Each group is required to submit a project proposal, a midterm report, and a final report.  Students are strongly encouraged to talk to the instructor often and seek help as soon as possible.  Each group will present its project in class.  It is expected that publishable results will come out of some projects.

TENTATIVE Course Schedule

Week 1
Jan 23

Introduction

- Ian Foster, What is The Grid, July 2002


 

 

 

 

Week 2
Jan 30

Architecture Virtualization

 

Corcoran 106

 

- Armbrust, M. et al., Above the Clouds: A Berkeley View of Cloud Computing, EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, 2009

Werner Vogels. 2008. Beyond Server Consolidation. Queue 6, 1 (January 2008), 20-26. DOI=10.1145/1348583.1348590 http://doi.acm.org/10.1145/1348583.1348590

http://dl.acm.org/citation.cfm?id=1348590 (short survey)

 

Form groups and email the professor

 

Week 3
Feb 6

IO

Software

Corcoran 106

 

Evangelos Kotsovinos. 2011. Virtualization: blessing or curse?. Commun. ACM 54, 1 (January 2011), 61-65. DOI=10.1145/1866739.1866754 http://doi.acm.org/10.1145/1866739.1866754

http://dl.acm.org/citation.cfm?id=1866754 (short survey)

 

Albert Greenberg, James Hamilton, David A. Maltz, and Parveen Patel. 2008. The cost of a cloud: research problems in data center networks. SIGCOMM Comput. Commun. Rev. 39, 1 (December 2008), 68-73. DOI=10.1145/1496091.1496103 http://doi.acm.org/10.1145/1496091.1496103

http://dl.acm.org/citation.cfm?id=1496103 (SHORT)

 

An Evaluation of Amazon's Grid Computing Services: EC2, S3, and SQS, Garfinkel, Harvard University.

http://simson.net/clips/academic/2007.Harvard.S3.pdf

 

Every group needs to meet with the professor this week

 

Week 4
Feb 13

Proposal Presentation

Corcoran 106

Homework due

Project proposal due

 

Feb 20

President’s Day

No Class
        

 

 

Week 5
Feb 27

Data Center Architecture

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: amazon's highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles (SOSP '07). ACM, New York, NY, USA, 205-220. DOI=10.1145/1294261.1294281 http://doi.acm.org/10.1145/1294261.1294281

http://dl.acm.org/citation.cfm?id=1294281 (Rathi)

 

Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113. DOI=10.1145/1327452.1327492 http://doi.acm.org/10.1145/1327452.1327492

http://dl.acm.org/citation.cfm?id=1327492

 

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, NY, USA, 29-43. DOI=10.1145/945445.945450 http://doi.acm.org/10.1145/945445.945450

http://dl.acm.org/citation.cfm?id=945450 (Samuel)

 

 

 

Week 6
Mar 5

Virtualization

Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. 2003. Xen and the art of virtualization. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, NY, USA, 164-177. DOI=10.1145/945445.945462 http://doi.acm.org/10.1145/945445.945462

http://dl.acm.org/citation.cfm?id=945445.945462&coll=DL&dl=GUIDE&CFID=62472787&CFTOKEN=49991587  (Liu)

 

David Wentzlaff, Charles Gruenwald, III, Nathan Beckmann, Kevin Modzelewski, Adam Belay, Lamia Youseff, Jason Miller, and Anant Agarwal. 2010. An operating system for multicore and clouds: mechanisms and implementation. In Proceedings of the 1st ACM symposium on Cloud computing (SoCC '10). ACM, New York, NY, USA, 3-14. DOI=10.1145/1807128.1807132 http://doi.acm.org/10.1145/1807128.1807132 (Bati)

 

Eric Keller, Jakub Szefer, Jennifer Rexford, and Ruby B. Lee. 2010. NoHype: virtualized cloud infrastructure without the virtualization. In Proceedings of the 37th annual international symposium on Computer architecture (ISCA '10). ACM, New York, NY, USA, 350-361. DOI=10.1145/1815961.1816010 http://doi.acm.org/10.1145/1815961.1816010 (Winter)

 

 

 

Mar 12

Spring Break

No class

 

 

Week 7
Mar 19

Resource Management I

Vijayaraghavan Soundararajan and Kinshuk Govil. 2010. Challenges in building scalable virtualized datacenter management. SIGOPS Oper. Syst. Rev. 44, 4 (December 2010), 95-102. DOI=10.1145/1899928.1899941 http://doi.acm.org/10.1145/1899928.1899941

http://dl.acm.org/citation.cfm?id=1899941 (Samuel)

 

Peter Bodik, Rean Griffith, Charles Sutton, Armando Fox, Michael I. Jordan, and David A. Patterson. 2009. Automatic exploration of datacenter performance regimes. In Proceedings of the 1st workshop on Automated control for datacenters and clouds (ACDC '09). ACM, New York, NY, USA, 1-6. DOI=10.1145/1555271.1555273 http://doi.acm.org/10.1145/1555271.1555273

http://dl.acm.org/citation.cfm?id=1555273 (SHORT) (Rode)

 

Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, and John Wilkes. 2011. CloudScale: elastic resource scaling for multi-tenant cloud systems. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SOCC '11). ACM, New York, NY, USA, Article 5, 14 pages. DOI=10.1145/2038916.2038921 http://doi.acm.org/10.1145/2038916.2038921

http://dl.acm.org/citation.cfm?id=2038916.2038921&coll=DL&dl=GUIDE&CFID=62472787&CFTOKEN=49991587 (Xu)

 

 

 

Week 8
Mar 26

Resource Management II

Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resources in Datacenters. NSDI 2011 (Fazal)

 

Timothy Wood, K. K. Ramakrishnan, Prashant Shenoy, and Jacobus van der Merwe. 2011. CloudNet: dynamic pooling of cloud resources by live WAN migration of virtual machines. In Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments (VEE '11). ACM, New York, NY, USA, 121-132. DOI=10.1145/1952682.1952699 http://doi.acm.org/10.1145/1952682.1952699 (Ratanpal)

 

Ron C. Chiang and H. Howie Huang. 2011. TRACON: interference-aware scheduling for data-intensive applications in virtualized environments. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). ACM, New York, NY, USA, Article 47, 12 pages. DOI=10.1145/2063384.2063447 http://doi.acm.org/10.1145/2063384.2063447 (Chiang)

 

 

Midterm project report due

 

Week 9
Apr 2

Storage I

Michael Vrable, Stefan Savage, and Geoffrey M. Voelker. 2009. Cumulus: Filesystem backup to the cloud. Trans. Storage 5, 4, Article 14 (December 2009), 28 pages. DOI=10.1145/1629080.1629084 http://doi.acm.org/10.1145/1629080.1629084

http://dl.acm.org/citation.cfm?id=1629084  (Liu)

 

John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. 2010. The case for RAMClouds: scalable high-performance storage entirely in DRAM. SIGOPS Oper. Syst. Rev. 43, 4 (January 2010), 92-105. DOI=10.1145/1713254.1713276 http://doi.acm.org/10.1145/1713254.1713276 (Malik)

 

Ajay Gulati, Ganesha Shanmuganathan, Irfan Ahmad, Carl Waldspurger, and Mustafa Uysal. 2011. Pesto: online storage performance management in virtualized datacenters. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SOCC '11). ACM, New York, NY, USA, Article 19, 14 pages. DOI=10.1145/2038916.2038935 http://doi.acm.org/10.1145/2038916.2038935 (Xiong)

 

 

 

Week 10
Apr 9

Storage II

Irfan Ahmad, Ajay Gulati, and Ali Mashtizadeh. 2011. vIC: interrupt coalescing for virtual machine storage device IO. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference (USENIXATC'11). USENIX Association, Berkeley, CA, USA, 4-4. (Rode)

 

Nadav Amit, Muli Ben-Yehuda, Dan Tsafrir, and Assaf Schuster. 2011. vIOMMU: efficient IOMMU emulation. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference (USENIXATC'11). USENIX Association, Berkeley, CA, USA, 6-6. (Trocchia)

 

Adding Advanced Storage Controller Functionality via Low-Overhead Virtualization (FAST12) (Lamond)

 

 

 

Week 11
Apr 16

Reliability I

Kashi Venkatesh Vishwanath and Nachiappan Nagappan. 2010. Characterizing cloud computing hardware reliability. In Proceedings of the 1st ACM symposium on Cloud computing (SoCC '10). ACM, New York, NY, USA, 193-204. DOI=10.1145/1807128.1807161 http://doi.acm.org/10.1145/1807128.1807161

http://dl.acm.org/citation.cfm?id=1807161 (Rode)

 

Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael I. Jordan. 2009. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09). ACM, New York, NY, USA, 117-132. DOI=10.1145/1629575.1629587 http://doi.acm.org/10.1145/1629575.1629587

http://dl.acm.org/citation.cfm?id=1629587 (Fazal)

 

Bianca Schroeder, Eduardo Pinheiro, and Wolf-Dietrich Weber. 2011. DRAM errors in the wild: a large-scale field study. Commun. ACM 54, 2 (February 2011), 100-107. DOI=10.1145/1897816.1897844 http://doi.acm.org/10.1145/1897816.1897844 (Malik)

 

 

 

Week 12
Apr 23

Reliability II

Xin Li, Michael C. Huang, Kai Shen, and Lingkun Chu. 2010. A realistic evaluation of memory hardware errors and software system susceptibility. In Proceedings of the 2010 USENIX conference on USENIX annual technical conference (USENIXATC'10). USENIX Association, Berkeley, CA, USA, 6-6.  (Xiong)

 

PipeCloud: Using Causality to Overcome Speed-of-Light Delays in Cloud-Based Disaster Recovery. Timothy Wood (University of Massachusetts Amherst), H. Andres Lagar-Cavilla and K.K. Ramakrishnan (AT&T Research), Prashant Shenoy (University of Massachusetts Amherst), and Jacobus Van der Merwe (AT&T Research), SOCC’11 (Lamond)

 

Fengzhe Zhang, Jin Chen, Haibo Chen, and Binyu Zang. 2011. CloudVisor: retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA, 203-216. DOI=10.1145/2043556.2043576 http://doi.acm.org/10.1145/2043556.2043576 (Rathi)

 

 

 

Week 13
Apr 30

Security and Privacy

Richard Chow, Philippe Golle, Markus Jakobsson, Elaine Shi, Jessica Staddon, Ryusuke Masuoka, and Jesus Molina. 2009. Controlling data in the cloud: outsourcing computation without outsourcing control. In Proceedings of the 2009 ACM workshop on Cloud computing security (CCSW '09). ACM, New York, NY, USA, 85-90. DOI=10.1145/1655008.1655020 http://doi.acm.org/10.1145/1655008.1655020

http://dl.acm.org/citation.cfm?id=1655020 (SHORT) (Bati)

 

Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. 2009. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM conference on Computer and communications security (CCS '09). ACM, New York, NY, USA, 199-212. DOI=10.1145/1653662.1653687 http://doi.acm.org/10.1145/1653662.1653687

http://dl.acm.org/citation.cfm?id=1653687 (Winter)

 

Gary Anthes. 2010. Security in the cloud. Commun. ACM 53, 11 (November 2010), 16-18. DOI=10.1145/1839676.1839683 http://doi.acm.org/10.1145/1839676.1839683

http://dl.acm.org/citation.cfm?id=1839683  (SHORT) (Altuwaijri)

 

 

 

Week 14
May 2

Power

The Case for Energy-Proportional Computing, Barroso and Hölzle, IEEE Computer 40(12), pp. 33-37.

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/33387.pdf (SHORT) (Altuwaijri)

 

Aman Kansal, Feng Zhao, Jie Liu, Nupur Kothari, and Arka A. Bhattacharya. 2010. Virtual machine power metering and provisioning. In Proceedings of the 1st ACM symposium on Cloud computing (SoCC '10). ACM, New York, NY, USA, 39-50. DOI=10.1145/1807128.1807136 http://doi.acm.org/10.1145/1807128.1807136

http://dl.acm.org/citation.cfm?id=1807136 (Trocchia)

 

Harold Lim, Aman Kansal, and Jie Liu. 2011. Power budgeting for virtualized data centers. In Proceedings of the 2011 USENIX conference on USENIX annual technical conference (USENIXATC'11). USENIX Association, Berkeley, CA, USA, 5-5. (Ratanpal)

 

 

 

Week 15
May 7

Final project presentation 

Corcoran 106

Final Project Report Due