FlexFlow Installation on AWS
Vinay Ramakrishnaiah
2020-12-11
LA-UR-20-29824
1 Logging in to AWS
Step 1: Visit https://aws.amazon.com and click the sign-in button in the top-right corner.
Step 2: Select root user and sign in to your AWS account.
Step 3: In the AWS Management Console, select EC2.
Step 4: In the EC2 console, select "Launch instance".
Step 5: You'll be redirected to the "Choose AMI" window. There, search for "deep learning" and select "Deep Learning AMI (Ubuntu 18.04) Version 36.0 - ami-0bc87a16c757a7f07". AMI IDs are region-specific, so the ID shown in your region may differ. This AMI already has most of the dependencies that FlexFlow requires.
Step 6: On the "Choose Instance Type" page, filter by "p2".
Step 7: Depending on your requirements, select p2.xlarge or another p2 instance size, then click the "Review and Launch" button at the bottom.
Step 8: You'll be asked to create a new key pair or use an existing key pair for connecting to your instance securely. If you don't have one yet, create a new key pair, save the private key file, and click "Launch instance".
Step 9: Select your instance and click "Connect".
Step 10: Follow the instructions to connect using your favorite SSH client.
1.1 Connecting to the Instance
An AWS instance is provided for the hands-on exercises. You should have received an email with your username and password (all lowercase letters):
username: <your_first_name>
password: bootcamp
SSH into the AWS instance.
ssh <username>@ec2-<ip>.<region>.compute.amazonaws.com
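The hostname in this command is the instance's public DNS name. In that name, the dots of the public IPv4 address are replaced by dashes. Instances in us-east-1 use the suffix compute-1.amazonaws.com instead of a region label. The sketch below builds the name from an IP and a region. The function name ec2_public_dns is hypothetical, and you can paste or source it into bash.

```bash
# Hypothetical helper: build an EC2 public DNS name from a public
# IPv4 address and a region name.
ec2_public_dns() {
  local ip="$1" region="$2"
  local dashed="${ip//./-}"   # 3.91.10.2 -> 3-91-10-2
  if [[ "$region" == "us-east-1" ]]; then
    # us-east-1 uses a different suffix than the other regions.
    echo "ec2-${dashed}.compute-1.amazonaws.com"
  else
    echo "ec2-${dashed}.${region}.compute.amazonaws.com"
  fi
}
```

Usage: ssh <username>@"$(ec2_public_dns 3.91.10.2 us-west-2)". You may have launched your own instance in Section 1 instead. In that case, log in as the Ubuntu AMI's default user, ubuntu, and pass the private key from Step 8 with ssh -i <key_file>.pem.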
2 Installing FlexFlow
FlexFlow has the following prerequisites:
• CUDNN
• Legion
• Protocol Buffer
• NCCL (optional)
• GASNet (optional)
Your Deep Learning Ubuntu AMI already has the following dependencies installed: CUDA, CUDNN, NCCL, and Protocol Buffer. So, we only need to install Legion to build FlexFlow on your instance.
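Before building, it can help to confirm that the tools the build relies on are reachable from your shell. This is a minimal sketch. The helper name missing_cmds is hypothetical, and the choice of commands to check is our assumption. On some images nvcc lives in /usr/local/cuda/bin without being on PATH.

```bash
# Hypothetical helper: print each named command that is not found
# on PATH, and return 1 if any of them are missing.
missing_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

Usage: missing_cmds git make nvcc nvidia-smi || echo "the commands listed above were not found"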
Step 1: Clone the repository (with submodules).
git clone --recursive https://github.com/flexflow/FlexFlow.git
cd FlexFlow/
Step 2: Export environment variables to point FlexFlow to its dependencies (make sure to use your local paths).
export FF_HOME=<FlexFlow_dir>
export CUDNN_DIR=/usr/local/cuda
export CUDA_DIR=/usr/local/cuda
export LG_RT_DIR=<FlexFlow_dir>/legion/runtime
export PROTOBUF_DIR=/home/ubuntu/anaconda3/pkgs/libprotobuf-3.13.0.1-h8b12597_0
export LD_LIBRARY_PATH=$PROTOBUF_DIR/lib:$LD_LIBRARY_PATH
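The libprotobuf version and build suffix in PROTOBUF_DIR can differ between AMI versions, and a mistyped path usually surfaces only later as a build error. The sketch below addresses both problems with two hypothetical helpers:

- find_protobuf_dir picks the first matching package directory in glob order.
- check_dirs_env reports any variable that is unset or does not name an existing directory.

Both names are our own, not part of FlexFlow.

```bash
# Hypothetical helper: check that each named environment variable is
# set and names an existing directory. Prints one line per problem and
# returns 1 if any were found.
check_dirs_env() {
  local var status=0
  for var in "$@"; do
    local val="${!var:-}"   # indirect expansion: the value of $var
    if [[ -z "$val" ]]; then
      echo "$var is not set"
      status=1
    elif [[ ! -d "$val" ]]; then
      echo "$var=$val is not a directory"
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: print the first libprotobuf-* directory (in glob
# order) under a conda pkgs directory; return 1 if there is none.
find_protobuf_dir() {
  local pkgs="$1" d
  for d in "$pkgs"/libprotobuf-*; do
    if [[ -d "$d" ]]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}
```

Usage:
export PROTOBUF_DIR="$(find_protobuf_dir /home/ubuntu/anaconda3/pkgs)"
check_dirs_env FF_HOME CUDNN_DIR CUDA_DIR LG_RT_DIR PROTOBUF_DIR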
Step 3: Specify the CUDA compute capability of your GPU, written without the decimal point (the K80 is 3.7, so 37).
export GPU_ARCH=37
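On a GPU instance other than p2, GPU_ARCH must match that GPU. The sketch below maps a GPU model name, as printed by nvidia-smi, to its compute capability without the dot. It covers only a few GPUs used by AWS instance families. The function name gpu_arch_for_name is hypothetical, and we have not checked whether FlexFlow's build supports every architecture listed.

```bash
# Hypothetical helper: map an NVIDIA GPU model name to its compute
# capability without the decimal point (the GPU_ARCH form).
gpu_arch_for_name() {
  case "$1" in
    *K80*)  echo 37 ;;   # p2 instances
    *M60*)  echo 52 ;;   # g3 instances
    *V100*) echo 70 ;;   # p3 instances
    *T4*)   echo 75 ;;   # g4dn instances
    *A100*) echo 80 ;;   # p4d instances
    *)      return 1 ;;  # unknown model: look it up manually
  esac
}
```

Usage: export GPU_ARCH="$(gpu_arch_for_name "$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)")"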
Step 4: Go to the Python directory in FlexFlow and build Python bindings.
cd python/
make -j4
Step 5: Verify your installation by running a simple MNIST example. Refer to Section 3 for a description of the flags. Note: you'll need keras-preprocessing to run this example, which can be installed using pip.
pip install keras-preprocessing
Run the MNIST example:
./flexflow_python ../examples/python/keras/seq_mnist_mlp.py \
  -ll:py 1 -ll:gpu 1 -ll:fsize 10000 -ll:zsize 5000
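The command above asks Legion for one GPU. On a larger p2 instance you can instead pass the number of GPUs that nvidia-smi -L lists, one line per GPU starting with "GPU <index>:". This is a minimal sketch. The helper name count_gpus is hypothetical, and it simply counts those lines on stdin.

```bash
# Hypothetical helper: count GPUs in `nvidia-smi -L` output on stdin.
# Each GPU appears on its own line, e.g. "GPU 0: Tesla K80 (UUID: ...)".
count_gpus() {
  grep -cE '^GPU [0-9]+:'
}
```

Usage:
ngpu="$(nvidia-smi -L | count_gpus)"
./flexflow_python ../examples/python/keras/seq_mnist_mlp.py -ll:py 1 -ll:gpu "$ngpu" -ll:fsize 10000 -ll:zsize 5000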
3 Command-Line Flags
In addition to setting runtime configurations in a FlexFlow Python/C++ program, the FlexFlow runtime also accepts command-line arguments for various runtime parameters.
FlexFlow training flags
• -e or --epochs: total number of epochs to run (default: 1)
• -b or --batch-size: global batch size in each iteration (default: 64)
• -p or --print-freq: print frequency (default: 10)
• -d or --dataset: path to the training dataset. If not set, synthetic data will be used for training
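Since -b sets the global batch size, a data-parallel run over several GPUs splits that batch among them. We assume an even split here, which is our assumption and not taken from FlexFlow's documentation. Under it, a batch size divisible by the GPU count gives equal shards. The hypothetical helper per_gpu_batch below checks that and prints each GPU's share.

```bash
# Hypothetical helper: print the per-GPU share of a global batch size,
# assuming an even data-parallel split; fail if it does not divide.
per_gpu_batch() {
  local batch="$1" gpus="$2"
  if (( gpus <= 0 || batch % gpus != 0 )); then
    echo "batch size $batch does not split evenly over $gpus GPU(s)" >&2
    return 1
  fi
  echo $(( batch / gpus ))
}
```

Usage: per_gpu_batch 64 4 && ./flexflow_python ../examples/python/keras/seq_mnist_mlp.py -e 5 -b 64 -p 10 -ll:py 1 -ll:gpu 4 -ll:fsize 10000 -ll:zsize 5000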
Legion runtime flags
• -ll:py : number of python processors (default: 0)
• -ll:gpu : number of GPUs to use on each node (default: 0)
• -ll:fsize : size of device memory on each GPU (in MB)
• -ll:zsize : size of zero-copy memory (pinned memory with direct GPU access) on each node (in MB). This is used for prefetching training images from disk
• -ll:cpu : number of data loading workers (default: 4)
• -ll:util : number of utility threads to create per process (default: 1)
• -ll:bgwork : number of background worker threads to create per process (default: 1)
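A value for -ll:fsize can be derived from the total memory of one GPU, as reported by nvidia-smi in MiB. The sketch below takes a percentage of that total to leave headroom for the CUDA context and libraries. The helper name fsize_for and the 80% figure are hypothetical choices. The sketch also treats MB and MiB as interchangeable, which is an approximation.

```bash
# Hypothetical helper: suggest a -ll:fsize value (MB) as a percentage
# of a GPU's total memory in MiB (integer arithmetic, rounds down).
fsize_for() {
  local total_mib="$1" percent="$2"
  echo $(( total_mib * percent / 100 ))
}
```

Usage:
total="$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
fsize="$(fsize_for "$total" 80)"   # then pass -ll:fsize "$fsize"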
Performance auto-tuning flags
• --search-budget or --budget: number of iterations of MCMC search (default: 0)
• --search-alpha or --alpha: hyper-parameter for the search procedure (default: 0.05)
• --export-strategy or --export: path to export the best discovered strategy (default: None)
• --import-strategy or --import: path to import a previously saved strategy (default: None)
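The export and import flags suggest a two-phase workflow. First run a search with a nonzero budget and export the best strategy. Then import that strategy in later runs instead of searching again. We assume these runs use the same program and GPU configuration. The hypothetical helper strategy_flags prints the flags for each phase. The strategy file name used in the usage line is also hypothetical.

```bash
# Hypothetical helper: print FlexFlow auto-tuning flags for two phases.
#   strategy_flags search <file> <budget>  -> search, then export
#   strategy_flags reuse  <file>           -> import a saved strategy
strategy_flags() {
  case "$1" in
    search) echo "--budget $3 --export $2" ;;
    reuse)  echo "--import $2" ;;
    *)      return 1 ;;
  esac
}
```

Usage: ./flexflow_python ../examples/python/keras/seq_mnist_mlp.py -ll:py 1 -ll:gpu 1 -ll:fsize 10000 -ll:zsize 5000 $(strategy_flags search mnist_strategy.txt 1000). The output is expanded unquoted, so the path must not contain spaces.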
4 Docker Image
A Dockerfile defining the FlexFlow container is available in the folder FlexFlow/docker, which can be used to build an image as shown in the example below.
Build an image using the Dockerfile.
cd docker
docker build -t ff:v1 .
Use the NVIDIA Container Toolkit to run the GPU-accelerated Docker container.
nvidia-docker run -ti --rm ff:v1