Custom Docker Image for AWS SageMaker
Building a Custom Training Image for AWS SageMaker
This post walks you through building your own Docker image for training on SageMaker, so you can add the libraries you need. The build takes more than an hour. I ran it on a GCP instance, but your local machine or an EC2/Cloud9 instance works too. The commands below assume Ubuntu on x86_64.
Install Docker
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y python3.8-venv docker-ce docker-ce-cli containerd.io  # python3.8-venv is needed later for the virtual environment
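sudo groupadd docker  # may report that the group already exists; that is fine
sudo usermod -aG docker "$USER"
newgrp docker  # applies the new group in this shell (or log out and back in)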
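Before moving on, confirm that the current shell has picked up the `docker` group. Without it, `docker` commands fail with a permission error unless you run them with `sudo`. The helper below, `in_group`, is a hypothetical sketch that checks the output of `id -nG`. It is not part of Docker.

```shell
# Hypothetical helper: succeeds if the current shell belongs to group $1.
# id -nG prints the names of all groups of the current process, space-separated.
in_group() {
  case " $(id -nG) " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_group docker; then
  echo "docker group is active; 'docker run hello-world' should work without sudo"
else
  echo "docker group not active yet; run 'newgrp docker' or log out and back in"
fi
```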
Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
sudo apt-get install -y unzip  # only if unzip is not already installed
unzip awscliv2.zip
sudo ./aws/install
aws configure # enter your access key, secret key, and region
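This is a quick sanity check that the tools used in the rest of this post are on your `PATH`. `require_cmds` is a hypothetical helper name. The command list (`docker`, `aws`, `git`, `python3`) is simply what the later steps call.

```shell
# Hypothetical helper: report any command in "$@" that is not on PATH.
# Returns 0 if all are found, 1 otherwise.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The tools the remaining steps call
require_cmds docker aws git python3 || echo "install the missing tools before continuing"
```

Once `aws configure` is done, `aws sts get-caller-identity` is a simple way to confirm that the credentials work before you create any repositories.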
Clone the Repo with the Changes
The changes in this repo are described in the last section.
git clone https://github.com/mmg10/aws_sage_custom.git
cd aws_sage_custom
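After cloning, you can check that the files the later steps rely on are where the commands expect them: `src/requirements.txt`, `src/setup.sh`, `src/main.py`, and `pytorch/buildspec.yml`. This is a minimal sketch. `check_repo_layout` is a hypothetical helper, and the file list comes only from the commands in this post.

```shell
# Hypothetical helper: check that the files used later in this post exist.
# $1 = path to the cloned repo (defaults to the current directory).
check_repo_layout() {
  local dir=${1:-.} f status=0
  for f in src/requirements.txt src/setup.sh src/main.py pytorch/buildspec.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "not found: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

check_repo_layout . || echo "run the later steps from inside aws_sage_custom"
```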
Configure the Docker Repo
You can push your image to either a private or a public ECR repository. Compare the costs of the two before choosing!
export ACCOUNT_ID={your_id}  # your 12-digit AWS account ID
export REGION=us-east-2  # region for a private repo; change as needed
export REPOSITORY_NAME={your_repo}
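The placeholders `{your_id}` and `{your_repo}` are easy to forget. A small check before calling `aws` or `docker` can save you a confusing error later. This sketch uses a hypothetical helper, `check_repo_vars`. It assumes bash and relies on AWS account IDs being 12 digits. If you don't have your account ID handy, `aws sts get-caller-identity --query Account --output text` prints it once the CLI is configured.

```shell
# Hypothetical helper: catch unset variables or leftover {placeholders}.
# Requires bash (uses [[ ]] and ${!name} indirection).
check_repo_vars() {
  local status=0 name value
  # AWS account IDs are 12 digits
  if [[ ! ${ACCOUNT_ID:-} =~ ^[0-9]{12}$ ]]; then
    echo "ACCOUNT_ID should be your 12-digit AWS account ID" >&2
    status=1
  fi
  for name in REGION REPOSITORY_NAME; do
    value=${!name:-}
    if [[ -z $value || $value == *[{}]* ]]; then
      echo "$name is unset or still a {placeholder}" >&2
      status=1
    fi
  done
  return "$status"
}

check_repo_vars || echo "fix the exports above before continuing"
```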
For Private Repo
aws ecr create-repository --repository-name $REPOSITORY_NAME --region $REGION
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
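The `docker login` above authenticates against the registry host `$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com`. Images in the repository are then addressed as `<registry host>/<repository name>:<tag>`. The helpers below (hypothetical names) just build those strings. That is handy later, when you need the full image URI for a SageMaker training job.

```shell
# Hypothetical helpers that only build strings; nothing is sent to AWS.
# Registry host for a private ECR repo: $1 = account ID, $2 = region
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Full image URI: $1 = account ID, $2 = region, $3 = repository name, $4 = tag
ecr_image_uri() {
  printf '%s/%s:%s\n' "$(ecr_registry "$1" "$2")" "$3" "$4"
}
```

For example, run `ecr_image_uri "$ACCOUNT_ID" "$REGION" "$REPOSITORY_NAME" <tag>`, where the tag is whatever the build produced. `aws ecr list-images --repository-name "$REPOSITORY_NAME" --region "$REGION"` lists the tags that were actually pushed.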
For Public Repo
Do not change the region: public repos are always in us-east-1.
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
aws ecr-public create-repository --repository-name $REPOSITORY_NAME --region us-east-1
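The login step differs between the two repo types in only three ways: the CLI service (`ecr` vs `ecr-public`), the region, and the registry host. This sketch folds them into one hypothetical `ecr_login` function and encodes the rule that public repos always live in us-east-1. The helper names are my own. `ecr_login` itself needs network access and valid credentials.

```shell
# Hypothetical helpers; kind is "private" or "public".
# Region used for login: public repos always live in us-east-1.
ecr_login_region() {
  case "$1" in
    public) echo us-east-1 ;;
    private)
      [ -n "${REGION:-}" ] || { echo "set REGION first" >&2; return 1; }
      echo "$REGION" ;;
    *) echo "expected 'public' or 'private', got '$1'" >&2; return 1 ;;
  esac
}

# Registry host that docker logs in to.
ecr_login_host() {
  local region
  case "$1" in
    public) echo public.ecr.aws ;;
    private)
      region=$(ecr_login_region private) || return 1
      [ -n "${ACCOUNT_ID:-}" ] || { echo "set ACCOUNT_ID first" >&2; return 1; }
      echo "$ACCOUNT_ID.dkr.ecr.$region.amazonaws.com" ;;
    *) echo "expected 'public' or 'private', got '$1'" >&2; return 1 ;;
  esac
}

# Log in with the matching CLI service (ecr or ecr-public); needs network and credentials.
ecr_login() {
  local kind=$1 service=ecr region host
  region=$(ecr_login_region "$kind") || return 1
  host=$(ecr_login_host "$kind") || return 1
  if [ "$kind" = public ]; then
    service=ecr-public
  fi
  aws "$service" get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$host"
}

# Usage: ecr_login private   or   ecr_login public
```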
Install the Prerequisites
python3 -m venv dlc
source dlc/bin/activate
pip install pip --upgrade
pip install -r src/requirements.txt
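The `pip install` step should run inside the `dlc` environment, not the system Python. The venv `activate` script exports `VIRTUAL_ENV`, so a minimal check can look for it before installing. This sketch uses a hypothetical helper, `in_venv`, which can optionally require a specific environment directory name.

```shell
# Hypothetical helper: the venv activate script exports VIRTUAL_ENV,
# so a non-empty value means an environment is active.
# $1 (optional) = required environment directory name, e.g. dlc
in_venv() {
  [ -n "${VIRTUAL_ENV:-}" ] || return 1
  [ -z "${1:-}" ] || [ "${VIRTUAL_ENV##*/}" = "$1" ]
}

if in_venv dlc; then
  echo "installing into: $VIRTUAL_ENV"
else
  echo "dlc is not active; run 'source dlc/bin/activate' first" >&2
fi
```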
Build the Image
Here I build only the training image, for both CPU and GPU.
bash src/setup.sh pytorch
python src/main.py --buildspec pytorch/buildspec.yml --framework pytorch --device_types cpu,gpu --image_types training
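The build command has a few knobs worth keeping in one place. The sketch below wraps it in a hypothetical `build_dlc` function; only the `src/main.py` flags come from this post. Setting the equally hypothetical `DRY_RUN` variable prints the command instead of running it. Passing a single device type (e.g. `cpu`) assumes that `--device_types` accepts one value as well as the comma-separated list shown above.

```shell
# Hypothetical wrapper around the build command used in this post.
# $1 = framework (e.g. pytorch), $2 = device types, $3 = image types.
# Set DRY_RUN=1 to print the command instead of running it.
build_dlc() {
  local framework=$1 device_types=${2:-cpu,gpu} image_types=${3:-training}
  local cmd
  cmd=(python src/main.py
    --buildspec "$framework/buildspec.yml"
    --framework "$framework"
    --device_types "$device_types"
    --image_types "$image_types")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Preview the exact command used above without starting a build
DRY_RUN=1 build_dlc pytorch cpu,gpu training
```

As above, run `bash src/setup.sh pytorch` before a real (non-dry-run) build.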
The Repo
The cloned repo is adapted from the original AWS Deep Learning Containers repo (aws/deep-learning-containers). The changes are in the PyTorch image, which additionally installs PyTorch Lightning, torchmetrics, and torchsummary.
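Once an image is built, you can check that the extra libraries actually import inside it. This sketch makes a few assumptions:

- the import names are `pytorch_lightning`, `torchmetrics`, and `torchsummary`;
- `python` is on the image's `PATH`.

It overrides the image entrypoint so the command runs directly. `check_image_imports` and `DRY_RUN` are hypothetical, and `IMAGE` stands in for whatever tag or URI your build produced.

```shell
# Hypothetical helper: run python inside the image and import the given modules.
# $1 = image tag or URI from your build; remaining args = module import names.
# Set DRY_RUN=1 to print the docker command instead of running it.
check_image_imports() {
  local image=$1 modules cmd
  shift
  modules=$(IFS=,; echo "$*")   # e.g. "pytorch_lightning,torchmetrics"
  cmd=(docker run --rm --entrypoint python "$image" -c "import $modules")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}" && echo "all imports OK in $image"
  fi
}

# Example (needs Docker and the built image; IMAGE is a placeholder):
# check_image_imports "$IMAGE" pytorch_lightning torchmetrics torchsummary
```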