Cross-DC Fault-Tolerant ViewFileSystem @ Twitter

Command Line Interface

hadoop --config /etc/hadoop/revenue-dcA fs -ls /

hadoop --config /etc/hadoop/dw-dcB fs -ls /

……

Directory layout on local file system

/etc/hadoop
    revenue-dcA/
    dw-dcB/
    (each directory holds core-site.xml, hdfs-site.xml)

[Diagram: several clusters (Log processing, Revenue, Data Warehouse), each running its own MR / YARN / HDFS stack, spread across DataCenter A and DataCenter B.]

DW @ dcB

HDFS blocks

Client machine:

hadoop --config /etc/hadoop/dw-dcB fs -get /user/bob/fileB

Distributed FileSystem

fs.defaultFS -> hdfs://dw-nn-dcB -> NameNode1, NameNode2

/etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml

DW @ dcB

App paths                   HDFS paths
viewfs://dw-nn-dcB/var      hdfs://dw-tmp-dcB/var
viewfs://dw-nn-dcB/user     hdfs://dw-user-dcB/user
viewfs://dw-nn-dcB/logs     hdfs://dw-log-dcB/logs
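
The mapping above can be illustrated with a minimal sketch of client-side mount-point resolution. This is not Hadoop's ViewFileSystem implementation; the class name `MountTableSketch` and the method `resolve` are hypothetical, and the sketch assumes a simple longest-prefix match over the three mount points listed in the table.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of mount-table resolution; not Hadoop's ViewFileSystem code. */
final class MountTableSketch {
    // Mount points taken from the table: application path prefix -> backing HDFS URI.
    private final Map<String, String> links = new TreeMap<>();

    MountTableSketch() {
        links.put("/var", "hdfs://dw-tmp-dcB/var");
        links.put("/user", "hdfs://dw-user-dcB/user");
        links.put("/logs", "hdfs://dw-log-dcB/logs");
    }

    /** Returns the backing URI for an application path, or null if no mount point covers it. */
    String resolve(String appPath) {
        String best = null;
        for (String mount : links.keySet()) {
            // A mount point covers the path only on a whole path-component boundary.
            boolean covers = appPath.equals(mount) || appPath.startsWith(mount + "/");
            if (covers && (best == null || mount.length() > best.length())) {
                best = mount;
            }
        }
        if (best == null) {
            return null;
        }
        // Swap the matched prefix for the target URI and keep the rest of the path.
        return links.get(best) + appPath.substring(best.length());
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        System.out.println(table.resolve("/user/bob/fileB"));
    }
}
```

Under these assumptions, an application keeps using /user/bob/fileB while the client decides which HDFS namespace actually serves it, so namespaces can be moved by editing the mount table rather than the application.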

mountable.xml

FileSystem

ViewFileSystem (viewfs://)

Distributed FileSystem (hdfs://)

S3 FileSystem

Client machine:

hadoop --config /etc/hadoop/dw-dcB fs -get /user/bob/fileB

Distributed FileSystem: dw-user-dcB -> dw-user-NN1-dcB, dw-user-NN2-dcB

ViewFileSystem: /user -> hdfs://dw-user-dcB/user

/etc/hadoop/dw-dcB/hdfs-site.xml, core-site.xml, mountable.xml

DW @ dcB

[Diagram: Revenue, Data Warehouse, and further clusters, each with its own MR / YARN / HDFS stack, across DataCenter A and DataCenter B.]

hadoop --config /etc/hadoop/revenue-dcA fs -get /user/bob/fileA

hdfs --config /etc/hadoop/dw-dcB fsck -fs hdfs://dw-user-dc2 /

# find all "fileC" files on all clusters
for i in `ls /etc/hadoop`; do hadoop --config /etc/hadoop/$i fs -ls fileC; done

Description        $sourcePath                                       Result
global namespace   viewfs://revenue-nn-dcA/logs/dirB /logs/dirB      Source path unresolvable
Active namenode    hdfs://revenue-log-nn1-dcA/logs/dirB /logs/dirB   Not reliable due to hard-coded namenode
hftp & DNS alias   hftp://revenue-nn-dcA/logs/dirB                   hftp reliability and efficiency

viewfs://cluster-dc/

user tmp log

ClusterA

DC1

root /

scalding> fsShell("-ls /dc*/") // list all clusters
Found N items...

// no need to remember copy(From/To)Local aka get/put
scalding> fsShell("-cp /local/user/cecily/file.txt /user/cecily/hdfs_file.txt")
res6: Int = 0

Forward FileSystem API calls to all target filesystems with certain policies.

client@dc1:

FileSystem nfly = FileSystem.get(conf);
out = nfly.create("/nfly/C/user/cecily/file.txt");
out.write(....);
out.close() BEGIN
out.close() END
in = nfly.open("/nfly/C/user/cecily/file.txt");

c@dc1: /user/cecily/_nfly_tmp_file.txt -> close, rename to /user/cecily/file.txt

c@dc2: /user/cecily/_nfly_tmp_file.txt -> close, rename to /user/cecily/file.txt
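
The write path shown above (write to a temporary file on every target, then rename on close) can be sketched without a Hadoop dependency. In this minimal illustration, local temporary directories stand in for the two clusters; the class `NflyWriteSketch` and the method `writeToAll` are hypothetical names, and the code is not the nfly implementation itself. Only the `_nfly_tmp_` prefix is taken from the slide.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of a write-to-temp-then-rename fan-out; local dirs stand in for clusters. */
final class NflyWriteSketch {
    static final String TMP_PREFIX = "_nfly_tmp_";

    static void writeToAll(List<Path> replicaRoots, String relativePath, byte[] data)
            throws IOException {
        // Phase 1: write the full payload to a temporary file on every replica.
        for (Path root : replicaRoots) {
            Path target = root.resolve(relativePath);
            Files.createDirectories(target.getParent());
            Path tmp = target.resolveSibling(TMP_PREFIX + target.getFileName());
            Files.write(tmp, data);
        }
        // Phase 2 (the "close" step): rename each temporary file to its final name,
        // so readers never observe a partially written file under the final path.
        for (Path root : replicaRoots) {
            Path target = root.resolve(relativePath);
            Path tmp = target.resolveSibling(TMP_PREFIX + target.getFileName());
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dc1 = Files.createTempDirectory("c-dc1");
        Path dc2 = Files.createTempDirectory("c-dc2");
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        writeToAll(List.of(dc1, dc2), "user/cecily/file.txt", payload);
        System.out.println(Files.exists(dc1.resolve("user/cecily/file.txt")));
        System.out.println(Files.exists(dc2.resolve("user/cecily/file.txt")));
    }
}
```

The sketch ignores failure handling and any write policy; it only shows why the final path appears on each target at rename time rather than while data is still being written.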
