
THUCloudDisk: Learn by Doing


DESCRIPTION

THUCloudDisk: Learn by Doing. Qingyun Cloud Disk (清云网盘): Learning Cloud Computing by Doing. Zhenhua Li (李振华), GreenOrbs, Cloud Computing and Future Networks Group. http://www.greenorbs.org/people/lzh/ Dec. 26th, 2013. Cloud Storage Service. Enabled by Cloud Computing & Internet Broadband. Extremely popular in recent years. SkyDrive: 200 M users. Dropbox: 100 M users.

Page 1: THUCloudDisk: Learn by Doing

THUCloudDisk: Learn by Doing

Qingyun Cloud Disk (清云网盘): Learning Cloud Computing by Doing

Zhenhua Li (李振华)
GreenOrbs, Cloud Computing and Future Networks Group
http://www.greenorbs.org/people/lzh/
Dec. 26th, 2013

Page 2: THUCloudDisk: Learn by Doing

Cloud Storage Service

Enabled by Cloud Computing & Internet Broadband

Extremely popular in recent years

- SkyDrive: 200 M users
- Dropbox: 100 M users
- Google Drive: numerous …
- Apple iCloud: countless …
- Box.com: 14 M users

Page 3: THUCloudDisk: Learn by Doing

Our Cloud Storage Research (1)

Started at the end of 2011

Black-box measurement: Traffic Overuse Problem and Computation Overuse Problem

- Focus on the most representative service, i.e. Dropbox

The resource cost that users intuitively expect << the resource cost the system actually incurs

Page 4: THUCloudDisk: Learn by Doing

Our Cloud Storage Research (2)

Solve the problems by developing middleware

- Significantly reduce the traffic/computation overuse

Zhenhua Li, et al. Efficient Batched Synchronization in Dropbox-like Cloud Storage Services. The 14th ACM/IFIP/USENIX International Middleware Conference (Middleware), Dec. 9-13, 2013, Beijing, China. (acceptance ratio: 24/128 = 18.8%)

Zhenhua Li, et al. Is the Cloud Storage Service Traffic All Necessary? Understanding the Data Sync Traffic Usage Effectiveness. In submission.

Modify the Linux kernel to thoroughly address this problem
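To make the traffic overuse concrete, here is a minimal sketch of why batching frequent small updates can help. It is not the middleware from the paper cited above: the fixed per-sync overhead (`SYNC_OVERHEAD_BYTES`), the batching window, and the update pattern are all hypothetical values chosen for illustration.

```python
# Assumed fixed cost (protocol messages, metadata) paid on every sync event.
# The value is hypothetical, chosen only to make the effect visible.
SYNC_OVERHEAD_BYTES = 30_000


def count_batches(timestamps, window):
    """Count sync events when updates are batched.

    A new batch starts when an update arrives more than `window` seconds
    after the first update of the current batch.
    """
    batches = 0
    batch_start = None
    for t in sorted(timestamps):
        if batch_start is None or t - batch_start > window:
            batches += 1
            batch_start = t
    return batches


def sync_traffic(update_sizes, sync_events):
    """Total traffic = payload bytes + one fixed overhead per sync event."""
    return sum(update_sizes) + sync_events * SYNC_OVERHEAD_BYTES


# Ten 1 KB updates, one second apart (hypothetical workload).
timestamps = list(range(10))
sizes = [1_000] * 10

naive = sync_traffic(sizes, len(sizes))  # one sync per update
batched = sync_traffic(sizes, count_batches(timestamps, window=5))

# Ratio of sync traffic to the size of the data actually changed.
print(naive / sum(sizes), batched / sum(sizes))  # 31.0 7.0
```

Under these assumptions, syncing after every update sends 31 times the changed data, while a 5-second batching window brings that down to 7 times.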

Page 5: THUCloudDisk: Learn by Doing

Drawback of Our Research

Black-box measurement and middleware solutions are far from sufficient

What happens after the data packet dives into the cloud?

“Google Drive, SkyDrive and Dropbox do have problems. But have you considered the problems from a system design/tradeoff perspective?”

Page 6: THUCloudDisk: Learn by Doing

So the THUCloudDisk project started …

We are re-developing a small-scale Dropbox from scratch

White-box measurement

Full knowledge of the system

Add any function as we like

Page 7: THUCloudDisk: Learn by Doing

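- The open-source equivalent of Amazon's cloud computing
- Known as "the Linux of cloud computing"
- Sina SAE, Weibo
- Apart from Keystone, all the other components are independent
- Be sure to follow the official tutorial, even though it is very long …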

Page 8: THUCloudDisk: Learn by Doing

http://www.thucloud.com


Page 9: THUCloudDisk: Learn by Doing

Four Potential Problems

1. Does RAID Conflict with the Cloud?
2. How to Properly Configure OpenStack Parameters?
3. When Moving Servers to the Real Data Center
4. Numerous Smartphones Aggregate Data into the Cloud

Learn by Doing

Page 10: THUCloudDisk: Learn by Doing

Problem 1


Our HP Servers have internal RAID Cards and the RAID function CANNOT be disabled

“RAID on the storage drives is not required and not recommended. Swift's disk usage pattern is the worst case possible for RAID, and performance degrades very quickly using RAID 5 or 6.”

OpenStack Swift official deployment manual

Our findings:
1) The RAID penalty does exist for Swift
2) Only for some patterns of data streams
3) Only for some kinds of RAID

Bounding their reciprocities and conflicts
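One common way to reason about why the penalty depends on the RAID level is the textbook "write penalty" model for small random writes. The sketch below uses that model only; it does not reproduce the measurements behind the findings above, and the disk count and per-disk IOPS are hypothetical.

```python
# Textbook write penalties: back-end disk I/Os needed for one small random
# front-end write. RAID 5 reads old data and old parity, then writes both
# (4 I/Os); RAID 6 does the same with two parity blocks (6 I/Os).
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def effective_write_iops(level, disks, iops_per_disk):
    """Estimate random-write IOPS of an array under the write-penalty model."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


# Hypothetical array: 8 disks at 150 random IOPS each.
for level in ("RAID0", "RAID10", "RAID5", "RAID6"):
    print(level, effective_write_iops(level, disks=8, iops_per_disk=150))
```

In this model the same eight disks deliver 1200 random-write IOPS as RAID 0 but only 300 as RAID 5 and 200 as RAID 6, which is consistent with the manual singling out RAID 5 and 6.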

Page 11: THUCloudDisk: Learn by Doing

Problem 2

How to properly configure OpenStack parameters?

    swift-ring-builder account.builder create 18 3 1
    swift-ring-builder container.builder create 18 3 1
    swift-ring-builder object.builder create 18 3 1

OpenStack Swift official deployment manual

Thierry's findings:
1) Some people discuss the parameters in their blogs
2) Sometimes the parameters are set very badly
3) But we do not know the rules …

Finding the rules of OpenStack parameters
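For orientation, the three numbers passed to `swift-ring-builder ... create` are the partition power, the replica count, and `min_part_hours`, so `18 3 1` means 2^18 partitions, 3 replicas, and at least 1 hour between moves of the same partition. The sketch below works out what that implies for a cluster. The drive count is hypothetical, and `suggested_part_power` encodes a commonly cited rule of thumb (roughly 100 partitions per drive at the cluster's maximum planned size), offered here as an assumption rather than as the rule the slide is looking for.

```python
import math


def ring_summary(part_power, replicas, min_part_hours, drives):
    """Describe what `create <part_power> <replicas> <min_part_hours>` implies."""
    partitions = 2 ** part_power
    return {
        "partitions": partitions,
        "replica_assignments": partitions * replicas,
        # Partition replicas each drive holds if they are spread evenly.
        "partition_replicas_per_drive": partitions * replicas / drives,
        "min_part_hours": min_part_hours,
    }


def suggested_part_power(max_drives, partitions_per_drive=100):
    """Rule of thumb (assumption): about 100 partitions per drive at the
    maximum planned cluster size, rounded up to the next power of two."""
    return math.ceil(math.log2(max_drives * partitions_per_drive))


# Hypothetical small cluster with 12 drives, using the values from the slide.
print(ring_summary(18, 3, 1, drives=12))
# Hypothetical cluster that will never exceed 50 drives.
print(suggested_part_power(50))  # 13
```

With only 12 drives, a partition power of 18 puts 65536 partition replicas on each drive, far above the rule-of-thumb target, which illustrates how a value copied from a tutorial can be a poor fit for a small deployment.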

Page 12: THUCloudDisk: Learn by Doing

Problem 3

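When moving servers to the CERNET data center

1. The data center blocks all ports of hosted servers by default; ports are opened only one by one, as requested
2. Servers generally cannot be rebooted, otherwise …
3. After logging in, every hosted server in the data center can only see its LAN address

Our countermeasures:
1) Learn firewalls, port scanning, and NAT (network address translation) on the spot
2) Study every sentence and every command in the official tutorial, again and again
3) When a problem cannot be solved remotely, we still have to go to the data center in person

The official OpenStack tutorial and the various guides found online use public IP addresses almost everywhere.
The official OpenStack tutorial contains many errors, yet it is the only reliable tutorial.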

Page 13: THUCloudDisk: Learn by Doing

Problem 4

Numerous smartphones aggregating data into OpenStack Swift

- We find that almost all the data are appended to certain files
- But Swift can only create or delete files
- Like what Dropbox does, we implement an rsync layer between the clients and Swift
- However, we find that most of the traffic and computation overheads are unnecessary
- This is why we are now implementing an additional "virtual" APPEND API for OpenStack Swift

"An integrated-service cloud platform suited to both static and dynamic data-change patterns"
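The gap between append-heavy workloads and a create/delete-only store can be illustrated with a toy model. The sketch below is not the project's implementation and does not use the Swift API: `CreateDeleteStore`, `naive_append`, and `VirtualAppendFile` are hypothetical names, and the "virtual" append is modeled in one plausible way, by storing each appended chunk as its own segment object and concatenating segments on read.

```python
class CreateDeleteStore:
    """Toy in-memory stand-in for a store that only creates or deletes
    whole objects. It also counts the bytes uploaded."""

    def __init__(self):
        self.objects = {}
        self.bytes_uploaded = 0

    def put(self, name, data):
        self.objects[name] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, name):
        return self.objects[name]

    def delete(self, name):
        del self.objects[name]


def naive_append(store, name, data):
    """Append by re-creating the whole object with the new bytes at the end."""
    old = store.objects.get(name, b"")
    store.put(name, old + data)


class VirtualAppendFile:
    """Hypothetical 'virtual' append: every append creates a new segment
    object, so only the new bytes are uploaded."""

    def __init__(self, store, name):
        self.store = store
        self.name = name
        self.segments = 0

    def append(self, data):
        self.store.put(f"{self.name}/seg-{self.segments:08d}", data)
        self.segments += 1

    def read(self):
        return b"".join(
            self.store.get(f"{self.name}/seg-{i:08d}")
            for i in range(self.segments)
        )


# 100 appends of 10 bytes each (hypothetical sensor log).
naive_store = CreateDeleteStore()
segment_store = CreateDeleteStore()
log = VirtualAppendFile(segment_store, "log")
for _ in range(100):
    naive_append(naive_store, "log", b"0123456789")
    log.append(b"0123456789")

print(naive_store.bytes_uploaded, segment_store.bytes_uploaded)  # 50500 1000
```

In this toy workload, re-creating the object on every append uploads 50500 bytes to store 1000 bytes of data, while the segment-based append uploads exactly 1000. A real design would also need to handle segment compaction and concurrent writers, which this sketch ignores.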

Page 14: THUCloudDisk: Learn by Doing

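What exactly is cloud computing?

Cloud computing is actually nothing at all

OpenStack is a family of Linux APIs and tools that makes life easier for server administrators and the experience smoother for users

Leafing through hundreds of pages of official OpenStack documentation, the word "Cloud" simply does not appear, but there is a definition of OpenStack:

OpenStack = Linux + ssh + ssl + MySQL + Apache + rsync + scp + xfs, glued together with Python scripts (plus a little C/C++)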

Page 15: THUCloudDisk: Learn by Doing

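What are the real benefits of cloud computing?

Skipping the thousands of abstract arguments … here is just one practical example

- October: we built the cloud platform locally
- Early November: we moved it into the CERNET data center, and the service has run stably ever since
- Early December: we suddenly discovered that some servers had been down for a long time
- Yet during that month, our users noticed nothing
- After replacing some hard disks in the failed servers, a dozen or so commands brought them back to work, and the cloud service kept running stably the whole time

Building an equally stable, fault-tolerant cloud computing system with plain Linux alone would take several top-tier Linux experts months of hard work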

Page 16: THUCloudDisk: Learn by Doing

Demo of the web front end and the client (Esc)

Page 17: THUCloudDisk: Learn by Doing

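Words for doomsday

If doomsday came and you could leave one sentence for your children

The greatest pain in this world lies not in imperfection but in being distorted. The most important progress in life is to keep freeing oneself from the distortion that language inflicts on one's character. When we truly need cloud computing: "walk on the ground, do not dance in the clouds." For real and pervasive system problems, give our academic community's own scientific, quantitative, and solid answers.

Today's cloud computing has been overly mythologized by too many people …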

Page 18: THUCloudDisk: Learn by Doing

Join us & Learn by doing!

Dr. Zhenhua Li

Dr. Jian Li

PhD students: Linsong Cheng, Zhen Lu (potential)

Master students: He Xiao, Xin Zhong, Yinlong Wang, Thierry