Build a medical imaging AI inference pipeline with MONAI Deploy on AWS

This post is cowritten with Ming (Melvin) Qin, David Bericat and Brad Genereaux from NVIDIA.
Medical imaging AI researchers and developers need a scalable, enterprise framework to build, deploy, and integrate their AI applications. AWS and NVIDIA have come together to make this vision a reality. AWS, NVIDIA, and other partners build applications and solutions to make healthcare more accessible, affordable, and efficient by accelerating cloud connectivity of enterprise imaging. MONAI Deploy is one of the key modules within MONAI (Medical Open Network for Artificial Intelligence) developed by a consortium of academic and industry leaders, including NVIDIA. AWS HealthImaging (AHI) is a HIPAA-eligible, highly scalable, performant, and cost-effective medical imagery store. We have developed a MONAI Deploy connector to AHI to integrate medical imaging AI applications with subsecond image retrieval latencies at scale powered by cloud-native APIs. The MONAI AI models and applications can be hosted on Amazon SageMaker, which is a fully managed service to deploy machine learning (ML) models at scale. SageMaker takes care of setting up and managing instances for inference and provides built-in metrics and logs for endpoints that you can use to monitor and receive alerts. It also offers a variety of NVIDIA GPU instances for ML inference, as well as multiple model deployment options with automatic scaling, including real-time inference, serverless inference, asynchronous inference, and batch transform.
In this post, we demonstrate how to deploy a MONAI Application Package (MAP) with the connector to AWS HealthImaging, using a SageMaker multi-model endpoint for real-time inference and asynchronous inference. These two options cover a majority of near-real-time medical imaging inference pipeline use cases.
Solution overview
The following diagram illustrates the solution architecture.

Prerequisites
Complete the following prerequisite steps:

Use an AWS account with one of the following Regions, where AWS HealthImaging is available: North Virginia (us-east-1), Oregon (us-west-2), Ireland (eu-west-1), and Sydney (ap-southeast-2).
Create an Amazon SageMaker Studio domain and user profile with AWS Identity and Access Management (IAM) permission to access AWS HealthImaging.
Enable the JupyterLab v3 extension and install Imjoy-jupyter-extension if you want to interactively visualize medical images in a SageMaker notebook using itkwidgets.
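The IAM permission mentioned in the prerequisites could look something like the following sketch. It only covers the read actions that a connector would plausibly need (reading image sets, metadata, and image frames). The function name and the wildcard resource are assumptions for illustration, and the import and delete steps in the notebooks would need additional permissions that are not shown here.

```python
import json


def build_ahi_read_policy(resource: str = "*") -> dict:
    """Return an IAM policy document allowing read access to AWS HealthImaging.

    The default "*" resource is a placeholder; scope it down to specific
    data store or image set ARNs in a real deployment.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "medical-imaging:GetImageSet",
                    "medical-imaging:GetImageSetMetadata",
                    "medical-imaging:GetImageFrame",
                ],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_ahi_read_policy(), indent=2))
```

The resulting JSON can be attached to the SageMaker execution role of the user profile as an inline policy.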

MAP connector to AWS HealthImaging
AWS HealthImaging imports DICOM P10 files and converts them into ImageSets, which are an optimized representation of a DICOM series. AHI provides API access to ImageSet metadata and ImageFrames. Metadata contains all DICOM attributes in a JSON document. ImageFrames are returned encoded in the High-Throughput JPEG2000 (HTJ2K) lossless format, which can be decoded extremely fast. ImageSets can be retrieved by using the AWS Command Line Interface (AWS CLI) or the AWS SDKs.
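To make the metadata and ImageFrame relationship more concrete, here is a minimal sketch that decompresses a metadata blob and lists the image frame IDs it references. The nested Study / Series / Instances / ImageFrames layout and the sample values are assumptions written by hand for illustration. With Boto3, the blob would typically come from get_image_set_metadata(...)["imageSetMetadataBlob"].read(), which is gzip-compressed JSON.

```python
import gzip
import json


def parse_metadata_blob(blob: bytes) -> dict:
    """Decompress and parse an ImageSet metadata blob (gzip-compressed JSON)."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def list_image_frames(metadata: dict) -> list:
    """Return (series_uid, sop_instance_uid, image_frame_id) tuples."""
    frames = []
    series_map = metadata.get("Study", {}).get("Series", {})
    for series_uid, series in series_map.items():
        for sop_uid, instance in series.get("Instances", {}).items():
            for frame in instance.get("ImageFrames", []):
                frames.append((series_uid, sop_uid, frame["ID"]))
    return frames


# Hand-written sample with illustrative values (not real patient data).
sample = {
    "Study": {
        "Series": {
            "1.2.3": {
                "Instances": {
                    "1.2.3.4": {"ImageFrames": [{"ID": "frame-0001"}]}
                }
            }
        }
    }
}
blob = gzip.compress(json.dumps(sample).encode("utf-8"))
frames = list_image_frames(parse_metadata_blob(blob))
print(frames)
```

Each frame ID collected this way is what a subsequent image frame request would reference to fetch the HTJ2K-encoded pixels.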
MONAI is a medical imaging AI framework that takes research breakthroughs and AI applications into clinical impact. MONAI Deploy is the processing pipeline that enables the end-to-end workflow, including packaging, testing, deploying, and running medical imaging AI applications in clinical production. It comprises the MONAI Deploy App SDK, MONAI Deploy Express, Workflow Manager, and Informatics Gateway. The MONAI Deploy App SDK provides ready-to-use algorithms and a framework to accelerate building medical imaging AI applications, as well as utility tools to package the application into a MAP container. The built-in standards-based functionalities in the app SDK allow the MAP to smoothly integrate into health IT networks, which requires the use of standards such as DICOM, HL7, and FHIR, and across data center and cloud environments. MAPs can use both predefined and customized operators for DICOM image loading, series selection, model inference, and postprocessing.
We have developed a Python module using the AWS HealthImaging Python SDK Boto3. You can pip install it and use the helper function to retrieve DICOM Service-Object Pair (SOP) instances as follows:
!pip install -q AHItoDICOMInterface
from AHItoDICOMInterface.AHItoDICOM import AHItoDICOM
helper = AHItoDICOM()
instances = helper.DICOMizeImageSet(datastore_id=datastoreId, image_set_id=next(iter(imageSetIds)))
The output SOP instances can be visualized using the interactive 3D medical image viewer itkwidgets in the following notebook. The AHItoDICOM class takes advantage of multiple processes to retrieve pixel frames from AWS HealthImaging in parallel, and decode the HTJ2K binary blobs using the Python OpenJPEG library. The ImageSetIds come from the output files of a given AWS HealthImaging import job. Given the DatastoreId and import JobId, you can retrieve the ImageSetId, which is equivalent to the DICOM series instance UID, as follows:
import json
from botocore.exceptions import ClientError
imageSetIds = {}
try:
    job_prefix = f"output/{res_createstore['datastoreId']}-DicomImport-{res_startimportjob['jobId']}"
    response = s3.head_object(Bucket=OutputBucketName, Key=f"{job_prefix}/job-output-manifest.json")
    if response['ResponseMetadata']['HTTPStatusCode'] == 200:
        data = s3.get_object(Bucket=OutputBucketName, Key=f"{job_prefix}/SUCCESS/success.ndjson")
        contents = data['Body'].read().decode("utf-8")
        for line in contents.splitlines():
            isid = json.loads(line)['importResponse']['imageSetId']
            # Assumption: count how many imported instances map to each ImageSet ID
            imageSetIds[isid] = imageSetIds.get(isid, 0) + 1
except ClientError:
    # Assumption: a missing manifest or output file leaves imageSetIds empty
    pass
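The parallel frame retrieval described earlier for the AHItoDICOM class can be pictured with the following sketch. It is not the class's actual implementation: the post says the class uses multiple processes, whereas this example swaps in a thread pool from the standard library to keep it short, and fetch_frames_parallel and fake_fetch are hypothetical names. A real fetch function would call AWS HealthImaging for one frame and decode the HTJ2K blob.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_frames_parallel(frame_ids, fetch_one, max_workers=8):
    """Fetch all frames concurrently; results keep the order of frame_ids."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, frame_ids))


def fake_fetch(frame_id):
    # Stand-in for a real frame download followed by HTJ2K decoding.
    return f"pixels-for-{frame_id}".encode("utf-8")


results = fetch_frames_parallel(["f1", "f2", "f3"], fake_fetch)
print(results)
```

Threads suit the download part because it is I/O bound. Decoding is CPU bound, so a process pool (as the post describes) may be the better fit when decoding dominates.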
With ImageSetId, you can retrieve the DICOM header metadata and image pixels separately using native AWS HealthImaging API functions. The DICOM exporter aggregates the DICOM headers and image pixels into the Pydicom dataset, which can be processed by the MAP DICOM data loader operator. Using the DICOMizeImageSet() function, we have created a connector to load image data from AWS HealthImaging, based on the MAP DICOM data loader operator:
class AHIDataLoaderOperator(Operator):
    def __init__(self, ahi_client, must_load: bool = True, *args, **kwargs):
        # ahi_client is an AHItoDICOM instance used to pull data from AWS HealthImaging
        self.ahi_client = ahi_client

    def _load_data(self, input_obj: dict):
        # input_obj carries the 'datastoreId' and 'imageSetId' keys of the request
        study_dict = {}
        series_dict = {}
        sop_instances = self.ahi_client.DICOMizeImageSet(input_obj['datastoreId'], input_obj['imageSetId'])
In the preceding code, ahi_client is an instance of the AHItoDICOM DICOM exporter class, with data retrieval functions illustrated. We have included this new data loader operator into a 3D spleen segmentation AI application created by the MONAI Deploy App SDK. You can first explore how to create and run this application on a local notebook instance, and then deploy this MAP application into SageMaker managed inference endpoints.
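The remainder of _load_data() is not shown in this post. As a rough illustration of what the study_dict and series_dict variables suggest, the following sketch groups SOP instances by study and series and orders each series by instance number. The function name is hypothetical, and SimpleNamespace objects stand in for Pydicom datasets so the example runs without extra dependencies.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_study_and_series(sop_instances):
    """Return {study_uid: {series_uid: [instances sorted by InstanceNumber]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for ds in sop_instances:
        grouped[ds.StudyInstanceUID][ds.SeriesInstanceUID].append(ds)
    result = {}
    for study_uid, series_map in grouped.items():
        result[study_uid] = {
            series_uid: sorted(items, key=lambda ds: int(ds.InstanceNumber))
            for series_uid, items in series_map.items()
        }
    return result


# Stand-ins for Pydicom datasets, using standard DICOM attribute names.
instances = [
    SimpleNamespace(StudyInstanceUID="1.2", SeriesInstanceUID="1.2.1", InstanceNumber=2),
    SimpleNamespace(StudyInstanceUID="1.2", SeriesInstanceUID="1.2.1", InstanceNumber=1),
    SimpleNamespace(StudyInstanceUID="1.2", SeriesInstanceUID="1.2.2", InstanceNumber=1),
]
grouped = group_by_study_and_series(instances)
print({study: list(series) for study, series in grouped.items()})
```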
SageMaker asynchronous inference
A SageMaker asynchronous inference endpoint is used for requests with large payload sizes (up to 1 GB), long processing times (up to 15 minutes), and near-real-time latency requirements. When there are no requests to process, this deployment option can scale the instance count down to zero for cost savings, which is ideal for medical imaging ML inference workloads. Follow the steps in the sample notebook to create and invoke the SageMaker asynchronous inference endpoint. To create an asynchronous inference endpoint, you first need to create a SageMaker model and an endpoint configuration. To create a SageMaker model, you need to load a model.tar.gz package with a defined directory structure into a Docker container. The model.tar.gz package includes a pre-trained spleen segmentation model.ts file and a customized inference script. We have used a prebuilt container with Python 3.8 and PyTorch 1.12.1 framework versions to load the model and run predictions.
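As a sketch of the endpoint configuration step, the following helpers build the keyword arguments for the SageMaker create_endpoint_config call and for an Application Auto Scaling target with a minimum capacity of zero. The helper names, the variant name, and the ml.g4dn.xlarge instance type are assumptions for illustration; the sample notebook may use different values.

```python
def build_async_endpoint_config(config_name, model_name, s3_output_path,
                                instance_type="ml.g4dn.xlarge", max_concurrent=1):
    """Build kwargs for sm_client.create_endpoint_config(**kwargs)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,  # placeholder GPU instance type
                "InitialInstanceCount": 1,
            }
        ],
        # This section is what makes the endpoint asynchronous.
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": s3_output_path},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent},
        },
    }


def build_scale_to_zero_target(endpoint_name, variant_name="AllTraffic", max_capacity=1):
    """Build kwargs for application-autoscaling register_scalable_target(**kwargs)."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # allows scaling in to zero instances when idle
        "MaxCapacity": max_capacity,
    }


config = build_async_endpoint_config("demo-config", "demo-model", "s3://my-bucket/async-output/")
target = build_scale_to_zero_target("demo-endpoint")
print(config["AsyncInferenceConfig"], target["ResourceId"])
```

Registering the scalable target alone does not trigger scaling; a scaling policy tied to a queue-backlog metric would still be needed.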
In the customized file, we instantiate an AHItoDICOM helper class from AHItoDICOMInterface and use it to create a MAP instance in the model_fn() function, and we run the MAP application on every inference request in the predict_fn() function:
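import json
import os

from app import AISpleenSegApp
from AHItoDICOMInterface.AHItoDICOM import AHItoDICOM

helper = AHItoDICOM()

def model_fn(model_dir, context):
    # Build the MAP once per container; do_run=False defers execution to predict_fn
    monai_app_instance = AISpleenSegApp(helper, do_run=False, path="/home/model-server")
    # Assumption: the MAP instance is returned so SageMaker can pass it to predict_fn
    return monai_app_instance

def predict_fn(input_data, model):
    # Assumption: the request is written to disk as JSON and the MAP is started with run()
    with open('/home/model-server/inputImageSets.json', 'w') as f:
        f.write(json.dumps(input_data))
    output_folder = "/home/model-server/output"
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
    model.run(input='/home/model-server/inputImageSets.json', output=output_folder, workdir='/home/model-server', model='/opt/ml/model/model.ts')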
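The post mentions that model.tar.gz needs a defined directory structure without spelling it out. The following sketch assumes the layout commonly used with the prebuilt SageMaker PyTorch containers: the model file at the archive root and the script under a code/ directory. The script name inference.py and the package_model helper are assumptions for illustration, and placeholder files are used so the example runs on its own.

```python
import os
import tarfile
import tempfile


def package_model(model_path, script_path, output_path):
    """Create a model.tar.gz with the model at the root and the script under code/."""
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(model_path, arcname=os.path.basename(model_path))
        tar.add(script_path, arcname="code/" + os.path.basename(script_path))
    return output_path


# Create placeholder files so the example is runnable without a real model.
workdir = tempfile.mkdtemp()
model_file = os.path.join(workdir, "model.ts")
script_file = os.path.join(workdir, "inference.py")  # assumed script name
for path in (model_file, script_file):
    with open(path, "w") as f:
        f.write("placeholder")

archive = package_model(model_file, script_file, os.path.join(workdir, "model.tar.gz"))
with tarfile.open(archive) as tar:
    members = sorted(tar.getnames())
print(members)
```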
To invoke the asynchronous endpoint, you will need to upload the request input payload to Amazon Simple Storage Service (Amazon S3), which is a JSON file specifying the AWS HealthImaging datastore ID and ImageSet ID to run inference on:
sess = sagemaker.Session()
InputLocation = sess.upload_data('inputImageSets.json', bucket=sess.default_bucket(), key_prefix=prefix, extra_args={"ContentType": "application/json"})
response = runtime_sm_client.invoke_endpoint_async(EndpointName=endpoint_name, InputLocation=InputLocation, ContentType="application/json", Accept="application/json")
output_location = response["OutputLocation"]
The output can be found in Amazon S3 as well.
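Because the asynchronous call returns before inference finishes, the result has to be fetched from the returned output location once it appears. A minimal polling sketch follows, assuming a Boto3 S3 client is passed in. The helper names and the default timeout and poll interval are assumptions, and depending on bucket permissions a missing object may surface as an access error rather than NoSuchKey.

```python
import time
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def wait_for_output(s3_client, output_location, timeout_s=900, poll_s=5):
    """Poll S3 until the inference output exists, then return its bytes."""
    bucket, key = split_s3_uri(output_location)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3_client.exceptions.NoSuchKey:
            # Output not written yet; give up once the deadline has passed.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"No output at {output_location} after {timeout_s}s")
            time.sleep(poll_s)


print(split_s3_uri("s3://my-bucket/async-output/result.out"))
```

For production use, asynchronous endpoints can also publish completion notifications, which avoids polling altogether.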
SageMaker multi-model real-time inference
SageMaker real-time inference endpoints meet interactive, low-latency requirements. This option can host multiple models in one container behind one endpoint, which is a scalable and cost-effective solution to deploying several ML models. A SageMaker multi-model endpoint uses NVIDIA Triton Inference Server with GPU to run multiple deep learning model inferences.
In this section, we walk through how to create and invoke a multi-model endpoint by adapting your own inference container, as shown in the following sample notebook. Different models can be served in a shared container on the same fleet of resources. Multi-model endpoints reduce deployment overhead and scale model inferences based on the traffic patterns to the endpoint. We used AWS developer tools including Amazon CodeCommit, Amazon CodeBuild, and Amazon CodePipeline to build the customized container for SageMaker model inference. We prepared a model handler script for the bring-your-own-container approach, in place of the customized file used in the previous example, and implemented the initialize(), preprocess(), and inference() functions:
import json
import os

from app import AISpleenSegApp
from AHItoDICOMInterface.AHItoDICOM import AHItoDICOM

class ModelHandler(object):
    def __init__(self):
        self.initialized = False
        self.shapes = None

    def initialize(self, context):
        self.initialized = True
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        gpu_id = properties.get("gpu_id")
        helper = AHItoDICOM()
        self.monai_app_instance = AISpleenSegApp(helper, do_run=False, path="/home/model-server/")

    def preprocess(self, request):
        # Parse the request body once and persist the IDs for the MAP to read
        inputStr = request[0].get("body").decode('UTF8')
        first_input = json.loads(inputStr)['inputs'][0]
        datastoreId = first_input['datastoreId']
        imageSetId = first_input['imageSetId']
        with open('/tmp/inputImageSets.json', 'w') as f:
            f.write(json.dumps({"datastoreId": datastoreId, "imageSetId": imageSetId}))
        return '/tmp/inputImageSets.json'

    def inference(self, model_input):
        # Assumption: the MAP is run with the file written by preprocess() as its input
        self.monai_app_instance.run(input=model_input, output="/home/model-server/output/", workdir="/home/model-server/", model=os.environ["model_dir"]+"/model.ts")
After the container is built and pushed to Amazon Elastic Container Registry (Amazon ECR), you can create a SageMaker model with it, plus different model packages (tar.gz files) in a given Amazon S3 path:
from time import gmtime, strftime

model_name = "DEMO-MONAIDeployModel" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = "s3://{}/{}/".format(bucket, prefix)
# ECR image URI format: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
container = "{}.dkr.ecr.{}.amazonaws.com/{}:dev".format(account_id, region, prefix)
container = {"Image": container, "ModelDataUrl": model_url, "Mode": "MultiModel"}
create_model_response = sm_client.create_model(ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container)
It’s noteworthy that the model_url here only specifies the path to a folder of tar.gz files, and you specify which model package to use for inference when you invoke the endpoint, as shown in the following code:
Payload = {"inputs": [{"datastoreId": datastoreId, "imageSetId": next(iter(imageSetIds))}]}
response = runtime_sm_client.invoke_endpoint(EndpointName=endpoint_name, ContentType="application/json", Accept="application/json", TargetModel="model.tar.gz", Body=json.dumps(Payload))
We can add more models to the existing multi-model inference endpoint without having to update the endpoint or create a new one.
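Adding a model then amounts to uploading another archive under the same S3 prefix and passing its path, relative to that prefix, as TargetModel. The following helper is a small sketch of that mapping. The function name, bucket, and archive names are hypothetical.

```python
def target_model_key(model_url, archive_uri):
    """Return the TargetModel value for an archive stored under model_url."""
    if not model_url.endswith("/"):
        model_url += "/"
    if not archive_uri.startswith(model_url):
        raise ValueError(f"{archive_uri} is not under {model_url}")
    # TargetModel is the archive path relative to the ModelDataUrl prefix.
    return archive_uri[len(model_url):]


model_url = "s3://my-bucket/monai-models/"
new_model = target_model_key(model_url, "s3://my-bucket/monai-models/liver_seg.tar.gz")
print(new_model)
```

The returned value would replace "model.tar.gz" in the invoke_endpoint call shown earlier, with no change to the endpoint itself.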
Clean up
Don’t forget to complete the Delete the hosting resources step in the lab-3 and lab-4 notebooks to delete the SageMaker inference endpoints. You should also shut down the SageMaker notebook instance to save costs. Finally, you can either call the AWS HealthImaging API function or use the AWS HealthImaging console to delete the image sets and data store created earlier:
for s in imageSetIds.keys():
    medicalimaging.deleteImageSet(datastoreId, s)
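For the hosting resources themselves, the notebooks contain the authoritative cleanup step. As a rough sketch of what such a step usually involves, the following function deletes an endpoint, its endpoint configuration, and its model through a SageMaker client. The function name and the RecordingClient test double are hypothetical; the double only records calls so the example runs without AWS credentials.

```python
def delete_hosting_resources(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint first, then its configuration, then the model."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)


class RecordingClient:
    """Test double that records method calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


client = RecordingClient()
delete_hosting_resources(client, "demo-endpoint", "demo-config", "demo-model")
print([name for name, _ in client.calls])
```

In a notebook, a real client created with boto3.client("sagemaker") would be passed in place of the test double.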
Conclusion
In this post, we showed you how to create a MAP connector to AWS HealthImaging, which is reusable in applications built with the MONAI Deploy App SDK, to integrate with and accelerate image data retrieval from a cloud-native DICOM store to medical imaging AI workloads. The MONAI Deploy SDK can be used to support hospital operations. We also demonstrated two hosting options to deploy MAP AI applications on SageMaker at scale.
Go through the example notebooks in the GitHub repository to learn more about how to deploy MONAI applications on SageMaker with medical images stored in AWS HealthImaging. To learn what AWS can do for you, contact an AWS representative.
For additional resources, refer to the following:

Medical Imaging on AWS
Introducing AWS HealthImaging — purpose-built for medical imaging at scale
AWS HealthImaging Developer Guide
Integration of on-premises medical imaging data with AWS HealthImaging

About the Authors
Ming (Melvin) Qin is an independent contributor on the Healthcare team at NVIDIA, focused on developing an AI inference application framework and platform to bring AI to medical imaging workflows. Before joining NVIDIA in 2018 as a founding member of Clara, Ming spent 15 years developing Radiology PACS and Workflow SaaS as lead engineer/architect at Stentor Inc., later acquired by Philips Healthcare to form its Enterprise Imaging.
David Bericat is a product manager for Healthcare at NVIDIA, where he leads the Project MONAI Deploy working group to bring AI from research to clinical deployments. His passion is to accelerate health innovation globally translating it to true clinical impact. Previously, David worked at Red Hat, implementing open source principles at the intersection of AI, cloud, edge computing, and IoT. His proudest moments include hiking to the Everest base camp and playing soccer for over 20 years.
Brad Genereaux is Global Lead, Healthcare Alliances at NVIDIA, where he is responsible for developer relations with a focus in medical imaging to accelerate artificial intelligence and deep learning, visualization, virtualization, and analytics solutions. Brad evangelizes the ubiquitous adoption and integration of seamless healthcare and medical imaging workflows into everyday clinical practice, with more than 20 years of experience in healthcare IT.
Gang Fu is a Healthcare Solutions Architect at AWS. He holds a PhD in Pharmaceutical Science from the University of Mississippi and has over 10 years of technology and biomedical research experience. He is passionate about technology and the impact it can make on healthcare.
JP Leger is a Senior Solutions Architect supporting academic medical centers and medical imaging workflows at AWS. He has over 20 years of expertise in software engineering, healthcare IT, and medical imaging, with extensive experience architecting systems for performance, scalability, and security in distributed deployments of large data volumes on premises, in the cloud, and hybrid with analytics and AI.
Chris Hafey is a Principal Solutions Architect at Amazon Web Services. He has over 25 years’ experience in the medical imaging industry and specializes in building scalable high-performance systems. He is the creator of the popular CornerstoneJS open source project, which powers the popular OHIF open source zero footprint viewer. He contributed to the DICOMweb specification and continues to work towards improving its performance for web-based viewing.