Deploy Your Custom Pre-Trained Model Using AWS SageMaker

Adnan Karol
Jul 28, 2024 · 6 min read

AWS SageMaker makes deploying custom machine learning models simple and efficient. This article walks you through getting your model up and running using AWS SageMaker, Amazon ECR, Amazon S3, and Docker.

Why AWS SageMaker?

AWS SageMaker provides a robust environment to deploy machine learning models, supporting various use cases and offering managed infrastructure and tools. Key benefits include:

  • Simplified model training and deployment process
  • Managed infrastructure
  • Scalability and flexibility
  • Integration with other AWS services

Now let's get started without spending more time on the theoretical aspects of SageMaker.

Prerequisites

  1. An AWS account
  2. AWS CLI configured
  3. Docker installed
  4. A Python environment with the necessary libraries
  5. SageMaker IAM policies, such as:

  • AmazonS3FullAccess
  • CloudWatchFullAccess
  • AmazonEC2ContainerRegistryFullAccess

You can tailor these policies to your needs rather than granting full access to each service.
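
For reference, here is a minimal sketch of attaching these managed policies to your SageMaker execution role with boto3. The role name is a placeholder; in production you would attach narrower, custom policies instead.

import boto3

iam = boto3.client('iam')

# Placeholder name -- replace with your actual SageMaker execution role
role_name = 'MySageMakerExecutionRole'

# AWS-managed policies granting broad access; scope these down for production
policy_arns = [
    'arn:aws:iam::aws:policy/AmazonS3FullAccess',
    'arn:aws:iam::aws:policy/CloudWatchFullAccess',
    'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess',
]

for arn in policy_arns:
    iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    print(f'Attached {arn} to {role_name}')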

Steps to Deploy your Custom ML Model

1. Train and Save the Model

Since you want to deploy your own custom model, you should have it saved as a pickle or joblib file (or any format of your preference) that can be loaded again later with ease.
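
For example, here is a minimal sketch using scikit-learn; the dataset, model type, and file path are placeholders for your own training code.

import os
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder two-feature dataset -- replace with your own training data
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Train a simple model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Save the trained model as a pickle file (joblib works just as well)
os.makedirs('model', exist_ok=True)
with open('model/model.pkl', 'wb') as f:
    pickle.dump(model, f)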

2. Upload Model and its artifacts to AWS S3

Assuming you have the AWS CLI configured (if not, configure it with the command aws configure), you can compress your model artifacts and upload them through the AWS console, or use the following Python code.

import boto3
import os
import subprocess

# Define your file paths
tar_file_path = 'path/to/your/compressed_model.tar.gz'
pickle_file_path = 'path/to/your/model.pkl'
s3_bucket = 'your-s3-bucket-name'
s3_bucket_path = f's3://{s3_bucket}'

# Step 1: Delete the existing tar file if it exists
if os.path.exists(tar_file_path):
    os.remove(tar_file_path)
    print(f'Deleted existing file: {tar_file_path}')

# Step 2: Ensure the pickle file exists at the expected location
if not os.path.exists(pickle_file_path):
    raise FileNotFoundError(f'{pickle_file_path} does not exist.')

# Step 3: Compress the model file
try:
    # Change into the model's directory (-C) so the pickle file sits at the root of the tar archive
    subprocess.run(
        ['tar', '-czvf', tar_file_path, '-C', os.path.dirname(pickle_file_path), os.path.basename(pickle_file_path)],
        check=True
    )
    print(f'Successfully compressed {pickle_file_path} to {tar_file_path}')
except subprocess.CalledProcessError as e:
    print(f'Error compressing file: {e}')

# Step 4: Upload the compressed file to S3 (the local path is reused as the S3 key)
try:
    s3_client = boto3.client('s3')
    s3_client.upload_file(tar_file_path, s3_bucket, tar_file_path)
    print(f"File uploaded successfully to s3://{s3_bucket}/{tar_file_path}")
except Exception as e:
    print(f"Error uploading file: {e}")

# Step 5: Output the S3 location of the model data
model_data = f'{s3_bucket_path}/{tar_file_path}'
print(f"Model Data S3 Location: {model_data}")

3. Create a Requirements File

Place a requirements.txt file at the root of your project that lists all the dependencies needed to run your model and inference code, for example:

numpy
pandas==2.2.2
Flask

4. Create a Docker Container

Using Docker to containerize your inference code and model dependencies ensures that your application runs smoothly and consistently across different environments. Here's a sample Dockerfile.

# Pull Python Image from Docker Hub
FROM python:3.10

# Maintainer information
LABEL maintainer="you_good_name <your_good_email@your_good_domain.com>"

# Set the working directory
WORKDIR /opt/program

# Install python3-venv
RUN apt-get update && \
apt-get install -y python3-venv && \
rm -rf /var/lib/apt/lists/*

# Create a virtual environment
RUN python3 -m venv /opt/venv

# Ensure the virtual environment is used
ENV PATH="/opt/venv/bin:$PATH"

# Upgrade pip to the latest version
RUN pip install --upgrade pip

# Copy the requirements file and install dependencies
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Set some environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

# Copy the inference script and other necessary files
COPY src/inference.py ./

# Set the entry point for the container
ENTRYPOINT ["python", "inference.py"]

5. Build Your Docker Image and Push It to AWS ECR

Now that your Dockerfile is written, it is time to build your Docker image and push it to a repository in Amazon ECR, from which SageMaker can later pull it. Make sure the Dockerfile is placed at the root of your project, then run the following commands.

# Build Docker image
docker build -t my-model-image .

# Create ECR repository (if not already created)
aws ecr create-repository --repository-name my-model-ecr --region your-region

# Authenticate Docker to ECR
aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com

# Tag Docker image
docker tag my-model-image:latest your-account-id.dkr.ecr.your-region.amazonaws.com/my-model-ecr:latest

# Push Docker image to ECR
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/my-model-ecr:latest

6. Create the Inference Script

The inference script should define the model_fn, input_fn, predict_fn, and output_fn functions and expose the /ping and /invocations routes that SageMaker calls on the container. A sample inference script is given below.

import os
import pickle
import json
import numpy as np
import logging
from flask import Flask, request, jsonify

# Configure logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

app = Flask(__name__)

def model_fn(model_dir):
    """
    Load the model from the specified directory.
    """
    model_path = os.path.join(model_dir, 'model.pkl')
    if not os.path.exists(model_path):
        raise FileNotFoundError(f'Model file does not exist: {model_path}')

    with open(model_path, 'rb') as model_file:
        model = pickle.load(model_file)

    logger.info("Model loaded successfully")
    return model

def input_fn(request_body, request_content_type='application/json'):
    """
    Process the input data from the request body.
    """
    if request_content_type == 'application/json':
        input_data = json.loads(request_body)
        features = np.array([input_data.get('feature1', 0), input_data.get('feature2', 0)]).reshape(1, -1)
        return features
    else:
        raise ValueError(f"Unsupported content type: {request_content_type}")

def predict_fn(input_data, model):
    """
    Make a prediction using the provided model and input data.
    """
    prediction = model.predict(input_data)
    return prediction

def output_fn(prediction, accept='application/json'):
    """
    Format the prediction output as specified.
    """
    response = {'prediction': int(prediction[0])}
    if accept == 'application/json':
        return json.dumps(response), accept
    else:
        raise ValueError(f"Unsupported accept type: {accept}")

# Load the model
model_dir = '/opt/ml/model'
model = model_fn(model_dir)

@app.route('/ping', methods=['GET'])
def ping():
    """
    Health check endpoint to verify if the model is loaded.
    """
    health = model is not None
    status = 200 if health else 404
    return jsonify({'status': 'Healthy' if health else 'Unhealthy'}), status

@app.route('/invocations', methods=['POST'])
def invoke():
    """
    Endpoint to process incoming requests and return predictions.
    """
    data = request.data.decode('utf-8')
    content_type = request.content_type

    # Process input data
    input_data = input_fn(data, content_type)

    # Make a prediction
    prediction = predict_fn(input_data, model)

    # Format the output
    response, content_type = output_fn(prediction, content_type)

    return response, 200, {'Content-Type': content_type}

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
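
Before deploying to SageMaker, you can optionally sanity-check the container locally. Assuming you started it with docker run -v $(pwd)/model:/opt/ml/model -p 8080:8080 my-model-image (so model.pkl is mounted where the script expects it), the sketch below pings the health route and sends a test invocation. It uses the requests library, which is only needed for this local test.

import json
import requests

base_url = 'http://localhost:8080'

# Health check -- should report 'Healthy' once the model has loaded
ping = requests.get(f'{base_url}/ping')
print(ping.status_code, ping.json())

# Test invocation with the same payload shape the endpoint will receive
payload = {'feature1': 1.0, 'feature2': 2.0}
response = requests.post(
    f'{base_url}/invocations',
    data=json.dumps(payload),
    headers={'Content-Type': 'application/json'}
)
print(response.status_code, response.json())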

7. Create the SageMaker Endpoint

import sagemaker
from sagemaker import get_execution_role
from sagemaker.model import Model

# Define the role and session
role = get_execution_role()
sagemaker_session = sagemaker.Session()

# URI of the Docker image you pushed to AWS ECR and the S3 location of your model.tar.gz
image_uri = 'your-account-id.dkr.ecr.your-region.amazonaws.com/my-model-ecr:latest'
model_data = 's3://your-s3-bucket-name/path/to/your/compressed_model.tar.gz'

# Inference script and the directory that contains it
entry_point = 'inference.py'
source_dir = 'src'

# Define the SageMaker model
model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    sagemaker_session=sagemaker_session,
    entry_point=entry_point,
    source_dir=source_dir,
    env={
        'SAGEMAKER_CONTAINER_LOG_LEVEL': '30',
        'SAGEMAKER_ENABLE_CLOUDWATCH_LOGGING': 'true'
    }
)

# Create and deploy the endpoint
try:
    predictor = model.deploy(
        instance_type="ml.m5.xlarge",
        initial_instance_count=1,
        endpoint_name="your-good-endpoint-name"
    )
except Exception as e:
    print(f"Failed to deploy endpoint: {e}")

8. Test the Endpoint

If the endpoint was created successfully, you can test it with the code below. If it fails, investigate the CloudWatch logs to debug the issue (a sketch for pulling them follows the test script).

import json
import sagemaker
from sagemaker.predictor import Predictor
from sagemaker import get_execution_role

# Initialize SageMaker session and get execution role
sagemaker_session = sagemaker.Session()
role = get_execution_role()

# Replace with your actual endpoint name
endpoint_name = "your-endpoint-name"

# Define the request body with the input features
request_body = {
    "feature1": 1.0,
    "feature2": 2.0
    # Add more features as required
}

# Create a Predictor object
predictor = Predictor(endpoint_name=endpoint_name, sagemaker_session=sagemaker_session)

# Send a request to the SageMaker endpoint
response = predictor.predict(json.dumps(request_body), initial_args={'ContentType': 'application/json'})

# Parse the response
prediction = json.loads(response)

# Print the prediction
print("Prediction:", prediction)

Summary

And we are done! You have just deployed your custom pre-trained model using AWS SageMaker.
