MapR Tutorial Series

MapR Learning Guide

Selvaraaju Murugesan

May 6, 2017

Storage Pool

MapR-FS groups disks into storage pools, usually made up of two or three disks

"Stripe Width" parameter lets you configure the number of disks per storage pool

Each node in a MapR cluster can support up to 36 storage pools

Use the mrconfig command to create, remove, and manage storage pools, disk groups, and disks


Example 1

If you have 11 disks in a node, how many storage pools will be created by default?


Example 1 Solution


3 storage pools of 3 disks each

1 storage pool of 2 disks
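The grouping in Example 1 can be reproduced with a short sketch. It assumes disks are taken in order into pools of three, with any leftover disks forming a smaller final pool, which matches the solution above. The function name and the stripe_width parameter are illustrative and are not part of any MapR tool.

```python
def group_disks(num_disks, stripe_width=3):
    """Return the pool sizes when num_disks are grouped stripe_width at a time.

    Assumption: leftover disks form one smaller final pool.
    """
    if num_disks < 0 or stripe_width < 1:
        raise ValueError("num_disks must be >= 0 and stripe_width >= 1")
    full_pools, leftover = divmod(num_disks, stripe_width)
    pools = [stripe_width] * full_pools
    if leftover:
        pools.append(leftover)
    return pools


print(group_disks(11))  # [3, 3, 3, 2]
```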


Example 2



Example 2 Solution


3 storage pools of 3 disks each


Tradeoffs

If a disk fails in a storage pool, the entire storage pool is taken offline and MapR automatically begins data replication

More disks per storage pool mean more data to re-replicate when a disk fails

The ideal scenario is 3 disks per storage pool

Remember to use disk drives of the same size and speed in a storage pool for good performance
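A rough sketch of the trade-off: because the whole storage pool goes offline when one disk fails, the amount of data to re-replicate grows with the number of disks in the pool. The function name, the 4 TB disk size, and the fill fraction are assumptions chosen only for illustration.

```python
def data_to_rereplicate_tb(disks_per_pool, disk_size_tb, fill_fraction=1.0):
    """Upper bound on the data taken offline when one disk in a pool fails.

    Assumption: every disk in the pool holds disk_size_tb * fill_fraction of data.
    """
    return disks_per_pool * disk_size_tb * fill_fraction


for disks in (2, 3, 6):
    print(disks, "disks per pool ->", data_to_rereplicate_tb(disks, 4.0), "TB")
```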


List of Ports

Port Number Services

7221 CLDB

8443 MCS

9443 MapR Installer

8888 Hue

8047 Drill

5181 Zookeeper

19888 JobHistoryServer


Default Settings

If a disk fails, data replication starts immediately

If a node fails, data replication starts after an hour (60 minutes)

Node maintenance default timeout is 1 hour, after which data replication starts (the timeout is configurable)

To view the configuration, use the command maprcli config load; to change it, use maprcli config save

If the CLDB heartbeat takes longer than 5 seconds, an alarm is raised and must be cleared manually

A secondary CLDB on a node performs "read" operations


CLDB

Name container holds the metadata for the files and directories in the volume, and the first 64 KB of each file

Data container and Name container can have different replication factors

Data replication happens at volume level

For high availability, install ZooKeeper on more nodes

/opt/mapr/roles

Contains the list of configured services on a given node

/opt/cores

Core files are copies of the contents of memory when certain anomalies are detected. Core files are located in /opt/cores, and the name of the file includes the name of the service that experienced an issue. When a core file is created, an alarm is raised


Zookeeper

If you want to start zookeeper

service mapr-zookeeper start

If you want to stop zookeeper

service mapr-zookeeper stop

If you want to know the status of zookeeper

service mapr-zookeeper qstatus

ZooKeeper should always be the first service that is started


MapR Commands

To list the services on a node

maprcli service list

maprcli node list -columns id,ip,svc

To list CLDBs

maprcli node listcldbs

CLDB master

maprcli node cldbmaster

Node topology

maprcli node topo


Cluster Permissions

Log into the MCS (login)

This level also includes permissions to use the API and command-line interface, and grants read access on the cluster and its volumes

Start and stop services (SS)

Create volumes (CV)

Edit and view Access Control Lists, or permissions (A)

Full control gives the user the ability to do everything except edit permissions (FC)


Volume Permissions

Dump or back up the volume (dump)

Mirror or restore the volume (restore)

Modify volume properties, which includes creating and deleting snapshots (m)

Delete the volume (d)

View and edit volume permissions (A)

Perform all operations except view and edit volume permissions (FC)


MapR Utilities

configure.sh

To set up a cluster node

To change services such as ZooKeeper, CLDB, etc.

disksetup

formats specified disks for use by MapR storage

fsck

used to find and fix inconsistencies in the filesystem, so that the metadata is consistent on the next load of the storage pool

gfsck

performs a scan and repair operation on a cluster, volume, or snapshot


MapR Utilities

mrconfig

create, remove, and manage storage pools, disk groups, and disks; and provide information about containers

mapr-support-collect.sh

collects diagnostic information from all nodes in the cluster

mapr-support-dump.sh

collects node and cluster-level information about the node where the script is invoked

cldbguts

monitor the activity of the CLDB


NTP Server

All nodes should synchronize to one internal NTP server

systemctl command

ntpq command


Logs

Centralised logging

Logs kept for 30 days by default

symbolic links to the logs

Local logging

logs kept for 3 hours by default

YARN logs expire after 3 hours

time starts after the job begins

Logs stored in /opt/mapr/logs are deleted after 10 days by default

Change the settings in the yarn-site.xml file

Retention times are given in seconds
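Because retention times are given in seconds, the defaults quoted above need converting before they are written to a configuration file. A minimal sketch of that arithmetic follows; the helper names are illustrative, and no specific yarn-site.xml property names are implied.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def hours_to_seconds(hours):
    """Convert a retention period in hours to seconds."""
    return hours * SECONDS_PER_HOUR


def days_to_seconds(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


print(hours_to_seconds(3))  # 10800   (3-hour local log retention)
print(days_to_seconds(10))  # 864000  (10-day retention)
print(days_to_seconds(30))  # 2592000 (30-day centralised retention)
```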


Space Requirements

/opt -> 128GB

/tmp -> 10GB

/opt/mapr/zkdata -> 500MB

Swap space

110% physical memory

Minimum of 24GB and maximum of 128GB
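The swap guideline can be read as 110% of physical memory, clamped to the range 24GB to 128GB. The sketch below works under that reading, which is an interpretation of the bullets above rather than an official formula; the function name is illustrative.

```python
def recommended_swap_gb(physical_memory_gb):
    """Return 110% of physical memory, clamped to the 24GB..128GB range."""
    target = 1.1 * physical_memory_gb
    return max(24.0, min(128.0, target))


print(recommended_swap_gb(16))   # 24.0  (minimum applies)
print(recommended_swap_gb(256))  # 128.0 (maximum applies)
```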

Use LVM for boot drives


Volume Quota

Once the Advisory Quota is reached

alarm raised

Once Hard Quota is reached

no further data is written

Only compressed data size is counted against the volume quota
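The two thresholds can be summarised as a small decision sketch. The function name, the argument names, and the returned strings are illustrative; the input is the compressed size, since that is what counts against the quota.

```python
def quota_status(compressed_used_gb, advisory_quota_gb, hard_quota_gb):
    """Classify a volume's usage against its advisory and hard quotas."""
    if compressed_used_gb >= hard_quota_gb:
        return "hard quota reached: no further data is written"
    if compressed_used_gb >= advisory_quota_gb:
        return "advisory quota reached: alarm raised"
    return "within quota"


print(quota_status(50, advisory_quota_gb=80, hard_quota_gb=100))
print(quota_status(85, advisory_quota_gb=80, hard_quota_gb=100))
print(quota_status(100, advisory_quota_gb=80, hard_quota_gb=100))
```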


Pre / Post-Installation Check

Pre-installation check

Stream - CPU

Iozone - I/O speed memory (destructive write/read)

Rpctest - network speed

Post-installation check

DFSIO - I/O speed memory (mapreduce job)

RWspeedtest

TerraGen / Terrasort - mapreduce job

A Terrasort job can suggest a possible problem with a hard drive or controller


Snapshot / Mirror

Snapshots are stored at the top level of every volume (hidden directory)

Scheduled snapshots expire automatically

Mirror start -> starts the mirror operation between source & destination

Mirror push -> pushes updates from the source volume to all mirror volumes

Mirror operation uses

70% network bandwidth

files are compressed


Role / Disk Balancer

Disk balancer

redistributes the data in all nodes

use disk balancer after you have added many new nodes

% concurrent disk rebalancer - 2 to 30%

Role balancer

evenly distributes master containers

off by default; starts 30 minutes after the CLDB starts (can be configured)

Delay for active data: 120 sec to 1800 sec (2 min to 30 min)


Job Scheduler

Fair scheduler is default

FIFO & Capacity scheduler

Scheduling can be based on memory, and also on CPU

Each user has their own queue

Weights to set resources

Allocation file (reloaded every 10 seconds) to modify resource managers

/opt/mapr/hadoop/version/etc/hadoop/fair-scheduler.xml
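To illustrate how weights set resources, here is a simplified sketch that divides a memory pool between queues in proportion to their weights. It is not the Fair Scheduler's actual algorithm, which also accounts for demand and for minimum and maximum resources; the queue names, the 64 GB total, and the function name are assumptions.

```python
def shares_by_weight(weights, total_memory_gb):
    """Split total_memory_gb between queues in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one queue needs a positive weight")
    return {
        queue: total_memory_gb * weight / total_weight
        for queue, weight in weights.items()
    }


shares = shares_by_weight({"alice": 1.0, "bob": 3.0}, total_memory_gb=64)
print(shares)  # {'alice': 16.0, 'bob': 48.0}
```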

