
MapR Tutorial Series



MapR Learning Guide

Selvaraaju Murugesan

May 6, 2017

Selvaraaju Murugesan MapR Learning Guide


Storage Pool

MapR-FS groups disks into storage pools, usually made up of two or three disks

The "Stripe Width" parameter lets you configure the number of disks per storage pool

Each node in a MapR cluster can support up to 36 storage pools

Use the mrconfig command to create, remove, and manage storage pools, disk groups, and disks


Example 1

If you have 11 disks in a node, how many storage pools will be created by default?


Example 1 Solution

4 storage pools are created by default: 3 storage pools of 3 disks each, plus 1 storage pool of 2 disks


Example 2

If you have 9 disks in a node, how many storage pools will be created by default?


Example 2 Solution

3 storage pools of 3 disks each
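The arithmetic behind both examples can be sketched in a few lines of Python. This is only an illustration, not MapR code: the function name default_storage_pools is hypothetical, the default of 3 disks per pool is taken from the examples above, and it assumes that any leftover disks form one smaller pool (as in Example 1).

```python
def default_storage_pools(disk_count, stripe_width=3):
    """Return the size of each storage pool for a node with disk_count disks.

    Assumption (from the worked examples): disks are grouped stripe_width
    at a time, and any leftover disks form one final, smaller pool.
    """
    if disk_count < 0 or stripe_width < 1:
        raise ValueError("disk_count must be >= 0 and stripe_width >= 1")
    full_pools, leftover = divmod(disk_count, stripe_width)
    pools = [stripe_width] * full_pools
    if leftover:
        pools.append(leftover)
    return pools


print(default_storage_pools(11))  # [3, 3, 3, 2]
print(default_storage_pools(9))   # [3, 3, 3]
```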


Tradeoffs

If a disk fails in a storage pool, the entire storage pool is taken offline and MapR automatically begins data replication

More disks per storage pool means more data to be replicated when a disk fails

The ideal scenario is to have 3 disks per storage pool

Use disk drives of the same size and speed within a storage pool for good performance
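To make the tradeoff concrete, here is a rough Python estimate of how much data goes offline, and so has to be replicated elsewhere, when one disk in a pool fails. It is a sketch under simplifying assumptions: the function name and the 4 TB disk size are made up for illustration, all disks in the pool are the same size, and the whole pool's contents need re-replication.

```python
def data_to_rereplicate_tb(disks_per_pool, disk_size_tb, fill_fraction=1.0):
    """Estimate the data (in TB) that goes offline with one storage pool.

    Assumption: a single disk failure takes the whole pool offline, so the
    amount scales with the number of disks in the pool.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    return disks_per_pool * disk_size_tb * fill_fraction


# Hypothetical 4 TB disks: larger pools mean more data to replicate per failure.
for disks in (2, 3, 6):
    print(disks, "disks per pool ->", data_to_rereplicate_tb(disks, 4), "TB")
```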


List of Ports

Port Number    Service

7221           CLDB web port (the CLDB service itself listens on 7222)
8443           MCS
9443           MapR Installer
8888           Hue
8047           Drill
5181           ZooKeeper
19888          JobHistoryServer web UI (the ResourceManager web UI uses 8088)


Default Settings

If a disk fails, data replication starts immediately

If a node fails, data replication starts after an hour (60 minutes)

The default node maintenance timeout is 1 hour, after which data replication starts (the timeout is configurable)

To view the configuration, use the command maprcli config load; to change it, use maprcli config save

If the CLDB heartbeat is greater than 5 seconds, an alarm is raised and must be cleared manually

A secondary CLDB on a node performs "read" operations
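The disk and node failure defaults above amount to a simple timing rule. The Python sketch below only models that rule for study purposes; the function and constant names are hypothetical and nothing here talks to a real cluster.

```python
NODE_FAILURE_DELAY_MIN = 60  # default from the slide; configurable on a cluster


def replication_started(failure, minutes_elapsed,
                        node_timeout_min=NODE_FAILURE_DELAY_MIN):
    """Return True if data replication should have started by now."""
    if failure == "disk":
        return True  # disk failure: replication starts immediately
    if failure == "node":
        return minutes_elapsed >= node_timeout_min  # node failure: after the timeout
    raise ValueError("failure must be 'disk' or 'node'")


print(replication_started("disk", 0))   # True
print(replication_started("node", 30))  # False
print(replication_started("node", 60))  # True
```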


CLDB

The name container holds the metadata for the files and directories in the volume, and the first 64 KB of each file

Data containers and the name container can have different replication factors

Data replication happens at the volume level

For high availability, run ZooKeeper on more nodes

/opt/mapr/roles

Contains the list of configured services on a given node

/opt/cores

Core files are copies of the contents of memory when certain anomalies are detected. Core files are located in /opt/cores, and the name of the file includes the name of the service that experienced an issue. When a core file is created, an alarm is raised


Zookeeper

To start ZooKeeper

service mapr-zookeeper start

To stop ZooKeeper

service mapr-zookeeper stop

To check the status of ZooKeeper

service mapr-zookeeper qstatus

ZooKeeper should always be the first service that is started


MapR Commands

To list the services on a node

maprcli service list -node <hostname>

maprcli node list -columns id,ip,svc

To list the CLDB nodes

maprcli node listcldbs

To find the CLDB master

maprcli node cldbmaster

To show the node topology

maprcli node topo
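When these commands are scripted, it can help to build the argument list in one place. The Python helper below is a minimal sketch: maprcli_args is a hypothetical name, it only assembles and prints the command lines shown above, and it does not run them. On a cluster node the resulting list could be handed to subprocess.run.

```python
import shlex


def maprcli_args(*words, **options):
    """Build an argument list such as ['maprcli', 'node', 'list', '-columns', ...]."""
    args = ["maprcli", *words]
    for name, value in options.items():
        # maprcli options are written as "-name value"
        args += ["-" + name, str(value)]
    return args


commands = [
    maprcli_args("node", "list", columns="id,ip,svc"),
    maprcli_args("node", "listcldbs"),
    maprcli_args("node", "cldbmaster"),
    maprcli_args("node", "topo"),
]
for args in commands:
    print(shlex.join(args))
```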


Cluster Permissions

Log in to the MCS (login)

This level also includes permission to use the API and command-line interface, and grants read access on the cluster and its volumes

Start and stop services (ss)

Create volumes (cv)

Edit and view Access Control Lists, or permissions (a)

Full control gives the user the ability to do everything except edit permissions (fc)


Volume Permissions

Dump or back up the volume (dump)

Mirror or restore the volume (restore)

Modify volume properties, which includes creating and deleting snapshots (m)

Delete the volume (d)

View and edit volume permissions (a)

Perform all operations except view and edit volume permissions (fc)
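The one subtle rule in the volume permission list is that full control does not include the permission to view and edit permissions. A small Python sketch of that rule follows; the function name is hypothetical, the codes are the ones listed above, and codes are compared case-insensitively.

```python
VOLUME_ACTIONS = {"dump", "restore", "m", "d", "a"}


def is_allowed(granted, action):
    """Return True if the granted permission codes permit the given action."""
    granted = {code.lower() for code in granted}
    action = action.lower()
    if action not in VOLUME_ACTIONS:
        raise ValueError("unknown volume action: " + action)
    if action in granted:
        return True
    # "fc" covers every action except viewing/editing permissions ("a")
    return "fc" in granted and action != "a"


print(is_allowed({"fc"}, "d"))       # True
print(is_allowed({"fc"}, "a"))       # False
print(is_allowed({"fc", "a"}, "a"))  # True
```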


MapR Utilities

configure.sh

Sets up a cluster node; also used to change services such as ZooKeeper, CLDB, etc.

disksetup

Formats specified disks for use by MapR storage

fsck

Used to find and fix inconsistencies in the filesystem, to make the metadata consistent on the next load of the storage pool

gfsck

Performs a scan and repair operation on a cluster, volume, or snapshot


mrconfig

Creates, removes, and manages storage pools, disk groups, and disks; also provides information about containers

mapr-support-collect.sh

Collects diagnostic information from all nodes in the cluster

mapr-support-dump.sh

Collects node and cluster-level information about the node where the script is invoked

cldbguts

Monitors the activity of the CLDB


NTP Server

All nodes should synchronize to one internal NTP server

systemctl command

ntpq command


Logs

Centralised logging

Logs are kept for 30 days by default

Symbolic links to the logs

Local logging

Logs are kept for 3 hours by default

YARN logs expire after 3 hours

The time starts after the job begins

Logs stored in /opt/mapr/logs are deleted after 10 days by default

Change the settings in the yarn-site.xml file

Retention times are given in seconds
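Because retention times are given in seconds, the defaults above have to be converted before they are compared with values in yarn-site.xml. A minimal Python sketch of the conversion (the helper names are hypothetical):

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def hours_to_seconds(hours):
    return hours * SECONDS_PER_HOUR


def days_to_seconds(days):
    return days * SECONDS_PER_DAY


print(hours_to_seconds(3))   # 10800   (local / YARN logs: 3 hours)
print(days_to_seconds(10))   # 864000  (/opt/mapr/logs: 10 days)
print(days_to_seconds(30))   # 2592000 (centralised logging: 30 days)
```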


Space Requirements

/opt -> 128GB

/tmp -> 10GB

/opt/mapr/zkdata -> 500MB

Swap space

110% of physical memory

Minimum of 24GB and maximum of 128GB

Use LVM for boot drives
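The swap space rule (110% of physical memory, bounded by 24GB and 128GB) is a clamp. The Python sketch below works in whole gigabytes; the function name and the sample memory sizes are illustrative only.

```python
def recommended_swap_gb(physical_memory_gb, percent=110,
                        minimum_gb=24, maximum_gb=128):
    """Return the swap size in GB: 110% of RAM, clamped to [24, 128]."""
    target = physical_memory_gb * percent / 100
    return min(max(target, minimum_gb), maximum_gb)


for ram_gb in (16, 100, 256):
    print(ram_gb, "GB RAM ->", recommended_swap_gb(ram_gb), "GB swap")
```

With these inputs, a 16GB node is raised to the 24GB minimum and a 256GB node is capped at the 128GB maximum.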


Volume Quota

Once the Advisory Quota is reached

An alarm is raised

Once the Hard Quota is reached

No further data is written

Only the compressed data size is counted against the volume quota
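The two quota levels can be pictured as a pair of thresholds on the compressed size of the volume. The following Python sketch models only that behaviour; the function name and the megabyte figures are hypothetical.

```python
def quota_status(compressed_used_mb, advisory_mb, hard_mb):
    """Classify volume usage against its advisory and hard quotas."""
    if compressed_used_mb >= hard_mb:
        return "hard quota reached: no further data is written"
    if compressed_used_mb >= advisory_mb:
        return "advisory quota reached: alarm raised"
    return "ok"


print(quota_status(500, advisory_mb=800, hard_mb=1000))
print(quota_status(850, advisory_mb=800, hard_mb=1000))
print(quota_status(1000, advisory_mb=800, hard_mb=1000))
```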


Pre / Post-Installation Check

Pre-installation check

Stream - CPU

Iozone - I/O speed memory (destructive write/read)

Rpctest - network speed

Post-installation check

DFSIO - I/O speed memory (MapReduce job)

RWspeedtest

TeraGen / TeraSort - MapReduce job

A TeraSort job can suggest a possible problem with a hard drive or controller


Snapshot / Mirror

Snapshots are stored at the top level of every volume (hidden directory)

Scheduled snapshots expire automatically

Mirror start -> start the mirror operation between source and destination

Mirror push -> push updates from the source volume to all mirror volumes

Mirror operation

Uses 70% of network bandwidth

Files are compressed


Role / Disk Balancer

Disk balancer

Redistributes the data across all nodes

Use the disk balancer after you have added many new nodes

% concurrent disk rebalancer - 2 to 30%

Role balancer

Evenly distributes master containers

Off by default; starts 30 minutes after the CLDB (can be configured)

Delay for active data: 120 sec - 1800 sec (2 min - 30 min)


Job Scheduler

Fair scheduler is the default

FIFO and Capacity schedulers are also available

Scheduling can be based on memory, and also on CPU

Each user has their own queue

Weights are used to set resources

The allocation file (reloaded every 10 seconds) is used to modify resource management settings

/opt/mapr/hadoop/version/etc/hadoop/fair-scheduler.xml
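As an illustration of how queue weights look in the allocation file, the Python sketch below generates a minimal fair-scheduler.xml body. The queue names and weights are invented for the example, and only the basic allocations / queue / weight elements of the Hadoop Fair Scheduler format are used; a real file usually carries more settings.

```python
import xml.etree.ElementTree as ET


def build_allocations(queue_weights):
    """Return allocation-file XML with one weighted queue per entry."""
    root = ET.Element("allocations")
    for name, weight in queue_weights.items():
        queue = ET.SubElement(root, "queue", name=name)
        ET.SubElement(queue, "weight").text = str(weight)
    return ET.tostring(root, encoding="unicode")


# Hypothetical queues: "analytics" gets twice the share of "etl".
xml_text = build_allocations({"analytics": 2.0, "etl": 1.0})
print(xml_text)
```

Since the allocation file is reloaded every 10 seconds, a change to a weight written this way would be picked up without restarting the scheduler.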
