Custom Docker Image for AWS SageMaker

Building a Custom Image for AWS SageMaker

This post walks through building your own Docker image for training on SageMaker, so you can include your own libraries and other dependencies. The whole process takes more than an hour. I used a GCP instance, but a local machine or an EC2/Cloud9 instance works too.

Install Docker

# Add Docker's GPG key and apt repository (these commands assume Ubuntu)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker, plus the venv module used later for the build's Python dependencies
sudo apt-get update
sudo apt-get install -y python3.8-venv docker-ce docker-ce-cli containerd.io

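# Let the current user run docker without sudo
sudo groupadd -f docker   # -f: do not fail if the group already exists
sudo usermod -aG docker $USER
newgrp docker             # start a shell with the new group membership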
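
Before moving on, it may be worth confirming that Docker is usable without sudo. The following is a minimal sketch and not part of the original setup; the variable name docker_status is purely illustrative.

```shell
# Check that the docker CLI is installed and the daemon answers without sudo.
# docker_status is an illustrative variable name, not something Docker defines.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker_status="ok"
else
  docker_status="unavailable"
fi
echo "docker: $docker_status"
```

If this prints "unavailable", logging out and back in (so the group change takes effect) is usually the first thing to try.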

Install AWS CLI

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws configure # enter your access key, secret key, and region

Clone the Repo with the Changes

This repo is described in "The Repo" section at the end of this post.

git clone https://github.com/mmg10/aws_sage_custom.git
cd aws_sage_custom

Configuring Docker Repo

You may push your image to a private repo or a public repo. Note the cost difference before making a choice!

export ACCOUNT_ID={your_id}
export REGION=us-east-2
export REPOSITORY_NAME={your_repo}
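
If you are unsure of your account ID, the AWS CLI can report it for the credentials entered during aws configure. The sketch below is one way to do that; current_account_id and is_account_id are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helper: print the account ID of the currently configured credentials.
current_account_id() {
  aws sts get-caller-identity --query Account --output text
}

# Hypothetical helper: succeed only if the argument looks like a 12-digit account ID.
is_account_id() {
  case "$1" in
    *[!0-9]*|"") return 1 ;;   # reject empty strings and anything non-numeric
  esac
  [ "${#1}" -eq 12 ]
}
```

For example, export ACCOUNT_ID="$(current_account_id)" sets the variable, and is_account_id "$ACCOUNT_ID" catches a placeholder such as {your_id} that was left unedited.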

For Private Repo

aws ecr create-repository --repository-name $REPOSITORY_NAME --region $REGION
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
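
Running create-repository a second time fails because the repository already exists. If you expect to rerun these steps, a small wrapper can make the creation step repeatable. This is a sketch only; ensure_ecr_repo is a hypothetical helper name, and it treats any describe-repositories failure as "repository missing".

```shell
# Hypothetical helper: create a private ECR repository only if it does not exist yet.
# $1 = repository name, $2 = region
ensure_ecr_repo() {
  aws ecr describe-repositories --repository-names "$1" --region "$2" >/dev/null 2>&1 \
    || aws ecr create-repository --repository-name "$1" --region "$2"
}
```

It would be called as ensure_ecr_repo "$REPOSITORY_NAME" "$REGION" in place of the create-repository line above.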

For Public Repo

Do not change the region in the commands below, even if REGION is set to something else. Public repos are always in the us-east-1 region.

aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
aws ecr-public create-repository --repository-name $REPOSITORY_NAME --region us-east-1
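
Unlike a private repo, a public repo's URI includes a registry alias, so it cannot be assembled from the account ID and region alone. The sketch below looks it up with the AWS CLI; public_repo_uri is a hypothetical helper name.

```shell
# Hypothetical helper: print the URI of a public ECR repository,
# which has the form public.ecr.aws/<alias>/<name>.
# $1 = repository name
public_repo_uri() {
  aws ecr-public describe-repositories \
    --repository-names "$1" \
    --region us-east-1 \
    --query 'repositories[0].repositoryUri' \
    --output text
}
```

For example, public_repo_uri "$REPOSITORY_NAME" prints the address to use when referring to images in that repo.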

Install the Prerequisites

python3 -m venv dlc
source dlc/bin/activate
pip install pip --upgrade
pip install -r src/requirements.txt

Build the Image

Here, I am only building the training image, for both CPU and GPU.

bash src/setup.sh pytorch
python src/main.py --buildspec pytorch/buildspec.yml --framework pytorch --device_types cpu,gpu --image_types training
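
Since the build runs for a long time, a quick preflight check before starting it may save a failed run. The following is a minimal sketch, assuming the build reads the ACCOUNT_ID, REGION, and REPOSITORY_NAME variables exported earlier; require_env is a hypothetical helper and not part of the repo.

```shell
# Hypothetical helper: fail early if any named environment variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
}
```

Running require_env ACCOUNT_ID REGION REPOSITORY_NAME just before the python src/main.py command stops early with a clear message if one of them was not exported in the current shell.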

The Repo

The cloned repo is adapted from the original one. The changes are in the PyTorch image, which now has PyTorch Lightning, torchmetrics, and torchsummary installed.


Author | MMG
