Automate fine-tuning of Llama 3.x models with the new visual designer …

You can now create an end-to-end workflow to train, fine tune, evaluate, register, and deploy generative AI models with the visual designer for Amazon SageMaker Pipelines. SageMaker Pipelines is a serverless workflow orchestration service purpose-built for foundation model operations (FMOps). It accelerates your generative AI journey from prototype to production because you don’t need to learn about specialized workflow frameworks to automate model development or notebook execution at scale. Data scientists and machine learning (ML) engineers use pipelines for tasks such as continuous fine-tuning of large language models (LLMs) and scheduled notebook job workflows. Pipelines can scale up to run tens of thousands of workflows in parallel and scale down automatically depending on your workload.

Whether you are new to pipelines or are an experienced user looking to streamline your generative AI workflow, this step-by-step post will demonstrate how you can use the visual designer to enhance your productivity and simplify the process of building complex AI and machine learning (AI/ML) pipelines. Specifically, you will learn how to:

Access and navigate the new visual designer in Amazon SageMaker Studio.
Create a complete AI/ML pipeline for fine-tuning an LLM using drag-and-drop functionality.
Configure the steps in the pipeline, including the new model fine-tuning, model deployment, and notebook and code execution steps.
Implement conditional logic to automate decision-making based on a model’s performance.
Register a successful model in the Amazon SageMaker Model Registry.

Llama fine-tuning pipeline overview
In this post, we will show you how to set up an automated LLM customization (fine-tuning) workflow so that the Llama 3.x models from Meta can provide a high-quality summary of SEC filings for financial applications. Fine-tuning allows you to configure LLMs to achieve improved performance on your domain-specific tasks. After fine-tuning, the Llama 3 8b model should be able to generate insightful financial summaries for its application users. But fine-tuning an LLM just once isn’t enough. You need to regularly tune the LLM to keep it up to date with the most recent real-world data, which in this case would be the latest SEC filings from companies. Instead of repeating this task manually each time new data is available (for example, once every quarter after earnings calls), you can create a Llama 3 fine-tuning workflow using SageMaker Pipelines that can be automatically triggered in the future. This will help you improve the quality of financial summaries produced by the LLM over time while ensuring accuracy, consistency, and reproducibility.
The SEC filings dataset is publicly available through an Amazon SageMaker JumpStart bucket. Here’s an overview of the steps to create the pipeline.

Fine tune a Meta Llama 3 8B model from SageMaker JumpStart using the SEC financial dataset.
Prepare the fine-tuned Llama 3 8B model for deployment to SageMaker Inference.
Deploy the fine-tuned Llama 3 8B model to SageMaker Inference.
Evaluate the performance of the fine-tuned model using the open-source Foundation Model Evaluations (fmeval) library.
Use a condition step to determine if the fine-tuned model meets your desired performance. If it does, register the fine-tuned model to the SageMaker Model Registry. If the performance of the fine-tuned model falls below the desired threshold, then the pipeline execution fails.

Prerequisites
To build this solution, you need the following prerequisites:

An AWS account that will contain all your AWS resources.
An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see Identity and Access Management for Amazon SageMaker.
Access to SageMaker Studio to access the SageMaker Pipelines visual editor. You first need to create a SageMaker domain and a user profile. See the Guide to getting set up with Amazon SageMaker.
An ml.g5.12xlarge instance for endpoint usage to deploy the model to, and an ml.g5.12xlarge training instance to fine-tune the model. You might need to request a quota increase; see Requesting a quota increase for more information.

Accessing the visual editor
Access the visual editor in the SageMaker Studio console by choosing Pipelines in the navigation pane, and then selecting Create in visual editor on the right. SageMaker pipelines are composed of a set of steps. You will see a list of step types that the visual editor supports.

At any time while following this post, you can pause your pipeline building process, save your progress, and resume later. Download the pipeline definition as a JSON file to your local environment by choosing Export at the bottom of the visual editor. Later, you can resume building the pipeline by choosing the Import button and re-uploading the JSON file.
Step #1: Fine tune the LLM
With the new editor, we introduce a convenient way to fine tune models from SageMaker JumpStart using the Fine tune step. To add the Fine tune step, drag it to the editor and then enter the following details:

In the Model (input) section select Meta-Llama-3-8B. Scroll to the bottom of the window to accept the EULA and choose Save.
The Model (output) section automatically populates with the default Amazon Simple Storage Service (Amazon S3) location. You can update the S3 URI to change the location where the model artifacts will be stored.
This example uses the default SEC dataset for training. You can also bring your own dataset by updating the Dataset (input) section.
Choose the ml.g5.12xlarge instance.
Leave the default hyperparameter settings. These can be adjusted depending on your use case.
(Optional) You can update the name of the step on the Details tab under Step display name. For this example, update the step name to Fine tune Llama 3 8B.

Step #2: Prepare the fine-tuned LLM for deployment
Before you deploy the model to an endpoint, you will create the model definition, which includes the model artifacts and Docker container needed to host the model.

Drag the Create model step to the editor.
Connect the Fine tune step to the Create model step using the visual editor.
Add the following details under the Settings tab:

Choose an IAM role with the required permissions.
Model (input): Select Step variable and then select Fine-tuning Model Artifacts.
Container: Bring your own container and enter the image URI dkr.ecr.<region_name>.amazonaws.com/djl-inference:0.28.0-lmi10.0.0-cu124 (replace <region_name> with your AWS Region) as the Location (ECR URI). This example uses a large model inference container. You can learn more about the deep learning containers that are available on GitHub.

Step #3: Deploy the fine-tuned LLM
Next, deploy the model to a real-time inference endpoint.

Drag the Deploy model (endpoint) step to the editor.
Enter a name such as llama-fine-tune for the endpoint name.
Connect this step to the Create model step using the visual editor.
In the Model (input) section, select Inherit model. Under Model name, select Step variable and the Model Name variable should be populated from the previous step. Choose Save.
Select the ml.g5.12xlarge instance as the Endpoint Type.

Step #4: Evaluate the fine-tuned LLM
After the LLM is customized and deployed on an endpoint, you want to evaluate its performance against real-world queries. To do this, you will use an Execute code step type that allows you to run the Python code that performs model evaluation using the factual knowledge evaluation from the fmeval library. The Execute code step type was introduced along with the new visual editor and provides three execution modes in which code can be run: Jupyter Notebooks, Python functions, and Shell or Python scripts. For more information about the Execute code step type, see the developer guide. In this example, you will use a Python function. The function will install the fmeval library, create a dataset to use for evaluation, and automatically test the model on its ability to reproduce facts about the real world.
Download the complete Python file, including the function and all imported libraries. The following are some code snippets of the model evaluation.
Define the LLM evaluation logic
Define a predictor to test your endpoint with a prompt:

# Set up SageMaker predictor for the specified endpoint
predictor = sagemaker.predictor.Predictor(
    endpoint_name=endpoint_name,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer()
)

# Function to test the endpoint with a sample prompt
def test_endpoint(predictor):

    # Test endpoint and convert the payload to JSON
    prompt = "Tell me about Amazon SageMaker"
    payload = {
        "inputs": prompt,
        "parameters": {
            "do_sample": True,
            "top_p": 0.9,
            "temperature": 0.8,
            "max_new_tokens": 100
        },
    }
    response = predictor.predict(payload)
    print(f'Query successful. \n\nExample: Prompt: {prompt} Model response: {response["generated_text"]}')
    output_format = '[0].generated_text'
    return output_format

output_format = test_endpoint(predictor)

Invoke your endpoint:

response = runtime.invoke_endpoint(EndpointName=endpoint_name, Body=json.dumps(payload), ContentType=content_type)
result = json.loads(response['Body'].read().decode())

Generate a dataset:

# Create an evaluation dataset in JSONL format with capital cities and their regions
capitals = [
    ("Aurillac", "Cantal"),
    ("Bamiyan", "Bamiyan Province"),
    ("Sokhumi", "Abkhazia"),
    ("Bukavu", "South Kivu"),
    ("Senftenberg", "Oberspreewald-Lausitz"),
    ("Legazpi City", "Albay"),
    ("Sukhum", "Abkhazia"),
    ("Paris", "France"),
    ("Berlin", "Germany"),
    ("Tokyo", "Japan"),
    ("Moscow", "Russia"),
    ("Madrid", "Spain"),
    ("Rome", "Italy"),
    ("Beijing", "China"),
    ("London", "United Kingdom"),
]

# Function to generate a single entry for the dataset
def generate_entry():
    city, region = random.choice(capitals)
    if random.random() < 0.2:
        alternatives = [f"{region} Province", f"{region} province", region]
        answers = f"{region}<OR>" + "<OR>".join(random.sample(alternatives, k=random.randint(1, len(alternatives))))
    else:
        answers = region
    return {
        "answers": answers,
        "knowledge_category": "Capitals",
        "question": f"{city} is the capital of"
    }

# Generate the dataset
num_entries = 15
dataset = [generate_entry() for _ in range(num_entries)]
input_file = "capitals_dataset.jsonl"
with open(input_file, "w") as f:
    for entry in dataset:
        f.write(json.dumps(entry) + "\n")

Set up and run model evaluation using fmeval:

# Set up SageMaker model runner
model_runner = SageMakerModelRunner(
    endpoint_name=endpoint_name,
    content_template=content_template,
    output="generated_text"
)

# Configure the dataset for evaluation
config = DataConfig(
    dataset_name="capitals_dataset_with_model_outputs",
    dataset_uri=output_file,
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answers",
    model_output_location="model_output"
)

# Set up and run the factual knowledge evaluation
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))
eval_output = eval_algo.evaluate(model=model_runner, dataset_config=config, prompt_template="$model_input", save=True)

# Print the evaluation results
print(json.dumps(eval_output, default=vars, indent=4))
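
The Condition step configured later in this post compares a step output named factual_knowledge against a threshold. The following sketch, which continues the snippet above, shows one way to extract that aggregate score from eval_output so the handler can return it; the EvalOutput structure comes from the fmeval library, and the exact wiring of the step output depends on your pipeline configuration.

# Sketch (continues the snippet above): extract the aggregate score so the
# handler can return it for the Condition step to compare. eval_output is a
# list of EvalOutput objects; each carries dataset-level EvalScore entries
# with .name and .value attributes.
factual_knowledge_score = eval_output[0].dataset_scores[0].value
evaluation_result = {"factual_knowledge": factual_knowledge_score}

Returning a dictionary such as evaluation_result from the handler makes the score available for the Condition step to reference.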

Upload the LLM evaluation logic
Drag a new Execute code (Run notebook or code) step onto the editor and update the display name to Evaluate model using the Details tab from the settings panel.

To configure the Execute code step settings, follow these steps in the Settings panel:

Upload the Python file evaluating_function.py containing the function.
Under Code Settings, change the Mode to Function and update the Handler to evaluating_function.py:evaluate_model. The handler value is structured by putting the file name on the left side of the colon and the handler function name on the right side: file_name.py:handler_function (see the sketch after this list).
Under Function Parameters (input), add the endpoint_name parameter for your handler with the value of the endpoint created previously (for example, llama-fine-tune).
Keep the default container and instance type settings.
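
The following minimal sketch shows how the Handler value and the Function Parameters (input) entries fit together; the body stands in for the evaluation logic shown earlier.

# evaluating_function.py -- the step invokes the function named on the right
# side of the Handler value, and each Function Parameters (input) entry is
# passed to that function as an argument.
def evaluate_model(endpoint_name):
    """Run the fmeval factual knowledge evaluation (shown earlier) against endpoint_name."""
    ...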

After configuring this step, you connect the Deploy model (endpoint) step to the Execute code step using the visual editor.
Step #5: Condition step
After you execute the model evaluation code, you drag a Condition step to the editor. The Condition step registers the fine-tuned model to the SageMaker Model Registry if the factual knowledge evaluation score exceeds the desired threshold. If the performance of the model falls below the threshold, then the model isn't added to the model registry and the pipeline execution fails.

Update the Condition step name under the Details tab to Is LLM factually correct.
Drag a Register model step and a Fail step to the editor as shown in the following GIF. You will not configure these steps until the next sections.
Return to the Condition step and add a condition under Conditions (input).

For the first String, enter factual_knowledge.
Select Greater Than as the test.
For the second String, enter 0.7. The evaluation averages a single binary metric across every prompt in the dataset, so the score is always between 0 and 1; for example, if the model answers 12 of the 15 prompts correctly, the score is 0.8, which passes the 0.7 threshold. For more information, see Factual Knowledge.

In the Conditions (output) section, for Then (execute if true), select Register model, and for Else (execute if false), select Fail.
After configuring this step, connect the Execute code step to the Condition step using the visual editor.

You will configure the Register model and Fail steps in the following sections.
Step #6: Register the model
To register your model to the SageMaker Model Registry, you need to configure the step to include the S3 URI of the model and the image URI.

Return to the Register model step that you created in the previous section in the Pipelines visual editor, and connect the Fine tune step to the Register model step. This is required to inherit the model artifacts of the fine-tuned model.
Select the step and choose Add under the Model (input) section.
Enter the image URI dkr.ecr.<region_name>.amazonaws.com/djl-inference:0.28.0-lmi10.0.0-cu124 (replace <region_name> with your Region) in the Image field. For the Model URI field, select Step variable and Fine-tuning Model Artifacts. Choose Save.
Enter a name for the Model group.

Step #7: Fail step
Select the Fail step on the canvas and enter a failure message to be displayed if the model fails to be registered to the model registry. For example: Model below evaluation threshold. Failed to register.

Save and execute the pipeline
Now that your pipeline has been constructed, choose Execute and enter a name for the execution to run the pipeline. You can then select the pipeline to view its progress. The pipeline will take 30–40 minutes to execute.

LLM customization at scale
In this example you executed the pipeline once manually from the UI. But by using the SageMaker APIs and SDK, you can trigger multiple concurrent executions of this pipeline with varying parameters (for example, different LLMs, different datasets, or different evaluation scripts) as part of your regular CI/CD processes. You don’t need to manage the capacity of the underlying infrastructure for SageMaker Pipelines because it automatically scales up or down based on the number of pipelines, number of steps in the pipelines, and number of pipeline executions in your AWS account. To learn more about the default scalability limits and to request an increase in Pipelines quotas, see Amazon SageMaker endpoints and quotas.
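
For example, the following sketch uses the AWS SDK for Python (Boto3) to start several executions of this pipeline with different dataset locations. The pipeline name, parameter name, and S3 paths are assumptions for illustration; replace them with the names you defined in the visual editor.

import boto3

# Start one pipeline execution per dataset. The pipeline name and the
# TrainingDatasetS3Uri parameter are assumed names for illustration only.
sm = boto3.client("sagemaker")

datasets = [
    "s3://amzn-s3-demo-bucket/sec-filings/2024-q1/",  # hypothetical dataset locations
    "s3://amzn-s3-demo-bucket/sec-filings/2024-q2/",
]

for dataset_uri in datasets:
    response = sm.start_pipeline_execution(
        PipelineName="llama3-finetuning-pipeline",
        PipelineExecutionDisplayName=f"finetune-{dataset_uri.rstrip('/').split('/')[-1]}",
        PipelineParameters=[
            {"Name": "TrainingDatasetS3Uri", "Value": dataset_uri},
        ],
    )
    print(response["PipelineExecutionArn"])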
Clean up
Delete the SageMaker model endpoint to avoid incurring additional charges.
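For example, here is a minimal Boto3 sketch; the endpoint name assumes you used llama-fine-tune in Step #3, and the endpoint configuration name in your account may differ.

import boto3

sm = boto3.client("sagemaker")

# Assumes the endpoint name chosen in Step #3; adjust if you used another name.
sm.delete_endpoint(EndpointName="llama-fine-tune")
sm.delete_endpoint_config(EndpointConfigName="llama-fine-tune")  # config name may differ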
Conclusion
In this post, we walked you through a solution to fine-tune a Llama 3 model using the new visual editor for Amazon SageMaker Pipelines. We introduced the fine-tuning step to fine-tune LLMs, and the Execute code step to run your own code in a pipeline step. The visual editor provides a user-friendly interface to create and manage AI/ML workflows. By using this capability, you can rapidly iterate on workflows before executing them at scale in production tens of thousands of times. For more information about this new feature, see Create and Manage Pipelines. Try it out and let us know your thoughts in the comments!

About the Authors
Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. She has a decade of experience in DevOps, infrastructure, and ML. Her areas of focus include MLOps/LLMOps, generative AI, and computer vision.
Brock Wade is a Software Engineer for Amazon SageMaker. Brock builds solutions for MLOps, LLMOps, and generative AI, with experience spanning infrastructure, DevOps, cloud services, SDKs, and UIs.
Piyush Kadam is a Product Manager for Amazon SageMaker, a fully managed service for generative AI builders. Piyush has extensive experience delivering products that help startups and enterprise customers harness the power of foundation models.

Implement Amazon SageMaker domain cross-Region disaster recovery using …

Amazon SageMaker is a cloud-based machine learning (ML) platform within the AWS ecosystem that offers developers a seamless and convenient way to build, train, and deploy ML models. Extensively used by data scientists and ML engineers across various industries, this robust tool provides high availability and uninterrupted access for its users. When working with SageMaker, your environment resides within a SageMaker domain, which encompasses critical components like Amazon Elastic File System (Amazon EFS) for storage, user profiles, and a diverse array of security configurations. This comprehensive setup enables collaborative efforts by allowing users to store, share, and access notebooks, Python files, and other essential artifacts.
In 2023, SageMaker announced the release of the new SageMaker Studio, which offers two new types of applications: JupyterLab and Code Editor. The old SageMaker Studio was renamed to SageMaker Studio Classic. Unlike other applications that share one single storage volume in SageMaker Studio Classic, each JupyterLab and Code Editor instance has its own Amazon Elastic Block Store (Amazon EBS) volume. For more information about this architecture, see New – Code Editor, based on Code-OSS VS Code Open Source now available in Amazon SageMaker Studio. Another new feature was the ability to bring your own EFS instance, which enables you to attach and detach a custom EFS instance.
A SageMaker domain exclusive to the new SageMaker Studio is composed of the following entities:

User profiles
Applications including JupyterLab, Code Editor, RStudio, Canvas, and MLflow
A variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations

As a precautionary measure, some customers may want to ensure continuous operation of SageMaker in the unlikely event of a Regional impairment of the SageMaker service. This solution leverages Amazon EFS’s built-in cross-Region replication capability to serve as a robust disaster recovery mechanism, providing continuous and uninterrupted access to your SageMaker domain data across multiple Regions. Replicating your data and resources across multiple Regions helps safeguard against Regional outages and fortifies your defenses against natural disasters or unforeseen technical failures, thereby providing business continuity and disaster recovery capabilities. This setup is particularly crucial for mission-critical and time-sensitive workloads, so data scientists and ML engineers can seamlessly continue their work without disruptive interruptions.
The solution illustrated in this post focuses on the new SageMaker Studio experience, particularly private JupyterLab and Code Editor spaces. Although the code base doesn’t include shared spaces, the solution is straightforward to extend with the same concept. In this post, we guide you through a step-by-step process to seamlessly migrate and safeguard your new SageMaker domain in Amazon SageMaker Studio from one active AWS Region to another AWS Region, including all associated user profiles and files. By using a combination of AWS services, you can implement this feature effectively, overcoming the current limitations within SageMaker.
Solution overview
In active-passive mode, the SageMaker domain infrastructure is only provisioned in the primary AWS Region. Data backup is in near real time using Amazon EFS replication. Diagram 1 illustrates this architecture.
Diagram 1:

When the primary Region is down, a new domain is launched in the secondary Region, and an AWS Step Functions workflow runs to restore data as seen in diagram 2.
Diagram 2:

In active-active mode depicted in diagram 3, the SageMaker domain infrastructure is provisioned in two AWS Regions. Data backup is in near real time using Amazon EFS replication. The data sync is completed by the Step Functions workflow, and its cadence can be on demand, scheduled, or invoked by an event.
Diagram 3:

You can find the complete code sample in the GitHub repo.
Click here to open the AWS console and follow along.
With all the benefits of upgraded SageMaker domains, we developed a fast and robust cross-Region disaster recovery solution, using Amazon EFS to back up and recover user data stored in SageMaker Studio applications. In addition, domain user profiles and their respective custom POSIX configurations are managed by a YAML file in an AWS Cloud Development Kit (AWS CDK) code base to make sure domain entities in the secondary AWS Region are identical to those in the primary AWS Region. Because user-level custom EFS instances are only configurable through programmatic API calls, creating users on the AWS Management Console is not considered in our context.
Backup
Backup is performed within the primary AWS Region. There are two types of sources: an EBS space and a custom EFS instance.
For an EBS space, a lifecycle config is attached to JupyterLab or Code Editor for the purposes of backing up files. Every time the user opens the application, the lifecycle config takes a snapshot of its EBS spaces and stores them in the custom EFS instance using an rsync command.
For a custom EFS instance, it’s automatically replicated to its read-only replica in the secondary AWS Region.
Recovery
For recovery in the secondary AWS Region, a SageMaker domain with the same user profiles and spaces is deployed, and an empty custom EFS instance is created and attached to it. Then an Amazon Elastic Container Service (Amazon ECS) task runs to copy all the backup files to the empty custom EFS instance. At the last step, a lifecycle config script runs to restore the Amazon EBS snapshots before the SageMaker space launches.
Prerequisites
Complete the following prerequisite steps:

Clone the GitHub repo to your local machine by running the following command in your terminal:

git clone git@github.com:aws-samples/sagemaker-domain-cross-region-disaster-recovery-using-custom-efs.git

Navigate to the project working directory and set up the Python virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install the required dependencies:

pip3 install -r requirements.txt

Bootstrap your AWS account and set up the AWS CDK environment in both Regions:

cdk bootstrap aws://ACCOUNT-NUMBER/PRIMARY_REGION
cdk bootstrap aws://ACCOUNT-NUMBER/SECONDARY_REGION

Synthesize the AWS CloudFormation templates by running the following code:

cdk synth

Configure the necessary arguments in the constants.py file (a sketch follows this list):

Set the primary Region in which you want to deploy the solution.
Set the secondary Region in which you want to recover the primary domain.
Replace the account ID variable with your AWS account ID.
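
The following is a minimal sketch of what constants.py might contain. The variable names are assumptions; match them to the names used in the cloned repository.

# constants.py -- sketch only; the variable names are assumptions and may
# differ from the names used in the repository.
PRIMARY_REGION = "us-east-1"      # Region where the primary SageMaker domain is deployed
SECONDARY_REGION = "us-west-2"    # Region used to recover the primary domain
ACCOUNT_ID = "111122223333"       # replace with your AWS account ID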

Deploy the solution
Complete the following steps to deploy the solution:

Deploy the primary SageMaker domain:

cdk deploy SagemakerDomainPrimaryStack-NewStudio

Deploy the secondary SageMaker domain:

cdk deploy SagemakerDomainSecondaryStack-NewStudio

Deploy the disaster recovery Step Functions workflow:

cdk deploy ECSTaskStack-NewStudio

Launch the application with the custom EFS instance attached and add files to the application’s EBS volume and custom EFS instance.

Test the solution
Complete the following steps to test the solution:

Add test files using Code Editor or JupyterLab in the primary Region.
Stop and restart the application.

This invokes the lifecycle config script to take an Amazon EBS snapshot on the application.

On the Step Functions console in the secondary Region, run the disaster recovery Step Functions workflow.

The following figure illustrates the workflow steps.

On the SageMaker console in the secondary Region, launch the same user’s SageMaker Studio.

You will find your files backed up in either Code Editor or JupyterLab.

Clean up
To avoid incurring ongoing charges, clean up the resources you created as part of this post:

Stop all Code Editor and JupyterLab Apps
Delete all cdk stacks

cdk destroy ECSTaskStack-NewStudio
cdk destroy SagemakerDomainSecondaryStack-NewStudio
cdk destroy SagemakerDomainPrimaryStack-NewStudio

Conclusion
SageMaker offers a robust and highly available ML platform, enabling data scientists and ML engineers to build, train, and deploy models efficiently. For critical use cases, implementing a comprehensive disaster recovery strategy enhances the resilience of your SageMaker domain, ensuring continuous operation in the unlikely event of regional impairment. This post presents a detailed solution for migrating and safeguarding your SageMaker domain, including user profiles and files, from one active AWS Region to another passive or active AWS Region. By using a strategic combination of AWS services, such as Amazon EFS, Step Functions, and the AWS CDK, this solution overcomes the current limitations within SageMaker Studio and provides continuous access to your valuable data and resources. Whether you choose an active-passive or active-active architecture, this solution provides a robust and resilient backup and recovery mechanism, fortifying your defenses against natural disasters, technical failures, and Regional outages. With this comprehensive guide, you can confidently safeguard your mission-critical and time-sensitive workloads, maintaining business continuity and uninterrupted access to your SageMaker domain, even in the case of unforeseen circumstances.
For more information on disaster recovery on AWS, refer to the following:

What is disaster recovery?
Disaster Recovery of Workloads on AWS: Recovery in the Cloud
AWS Elastic Disaster Recovery
Implement backup and recovery using an event-driven serverless architecture with Amazon SageMaker Studio

About the Authors
Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale generative AI and classic ML pipeline solutions. He is specialized in FMOps, LLMOps, and distributed training.
Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.
Natasha Tchir is a Cloud Consultant at the Generative AI Innovation Center, specializing in machine learning. With a strong background in ML, she now focuses on the development of generative AI proof-of-concept solutions, driving innovation and applied research within the GenAIIC.
Katherine Feng is a Cloud Consultant at AWS Professional Services within the Data and ML team. She has extensive experience building full-stack applications for AI/ML use cases and LLM-driven solutions.

Deploy a serverless web application to edit images using Amazon Bedroc …

Generative AI adoption among various industries is revolutionizing different types of applications, including image editing. Image editing is used in various sectors, such as graphic designing, marketing, and social media. Users rely on specialized tools for editing images. Building a custom solution for this task can be complex. However, by using various AWS services, you can quickly deploy a serverless solution to edit images. This approach can give your teams access to image editing foundation models (FMs) using Amazon Bedrock.
Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. Amazon Bedrock is serverless, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.
Amazon Titan Image Generator G1 is an AI FM available with Amazon Bedrock that allows you to generate an image from text, or upload and edit your own image. Some of the key features we focus on include inpainting and outpainting.
This post introduces a solution that simplifies the deployment of a web application for image editing using AWS serverless services. We use AWS Amplify, Amazon Cognito, Amazon API Gateway, AWS Lambda, and Amazon Bedrock with the Amazon Titan Image Generator G1 model to build an application to edit images using prompts. We cover the inner workings of the solution to help you understand the function of each service and how they are connected to give you a complete solution. At the time of writing this post, Amazon Titan Image Generator G1 comes in two versions; for this post, we use version 2.
Solution overview
The following diagram provides an overview and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our frontend application. A combination of API Gateway and a Lambda function is used for our backend services, and Amazon Bedrock integrates with the FM, enabling users to edit the image using prompts.

Prerequisites
You must have the following in place to complete the solution in this post:

An AWS account
FM access in Amazon Bedrock for Amazon Titan Image Generator G1 v2 in the same AWS Region where you will deploy this solution
The accompanying AWS CloudFormation template downloaded from the aws-samples GitHub repo.

Deploy solution resources using AWS CloudFormation
When you run the AWS CloudFormation template, the following resources are deployed:

Amazon Cognito resources:

User pool: CognitoUserPoolforImageEditApp
App client: ImageEditApp

Lambda resources:

Function: <Stack name>-ImageEditBackend-<auto-generated>

AWS Identity and Access Management (IAM) resources:

IAM role: <Stack name>-ImageEditBackendRole-<auto-generated>
IAM inline policy: AmazonBedrockAccess (this policy allows Lambda to invoke Amazon Bedrock FM amazon.titan-image-generator-v2:0)

API Gateway resources:

Rest API: ImageEditingAppBackendAPI
Methods:

OPTIONS – Added header mapping for CORS
POST – Lambda integration

Authorization: Through Amazon Cognito using CognitoAuthorizer

After you deploy the CloudFormation template, copy the following from the Outputs tab to be used during the deployment of Amplify:

userPoolId
userPoolClientId
invokeUrl

Deploy the Amplify application
You have to manually deploy the Amplify application using the frontend code found on GitHub. Complete the following steps:

Download the frontend code from the GitHub repo.
Unzip the downloaded file and navigate to the folder.
In the js folder, find the config.js file and replace the values of XYZ for userPoolId, userPoolClientId, and invokeUrl with the values you collected from the CloudFormation stack outputs. Set the region value based on the Region where you’re deploying the solution.

The following is an example config.js file:

window._config = {
    cognito: {
        userPoolId: 'XYZ', // e.g. us-west-2_uXboG5pAb
        userPoolClientId: 'XYZ', // e.g. 25ddkmj4v6hfsfvruhpfi7n4hv
        region: 'XYZ' // e.g. us-west-2
    },
    api: {
        invokeUrl: 'XYZ' // e.g. https://rc7nyt4tql.execute-api.us-west-2.amazonaws.com/prod
    }
};

Select all the files and compress them as shown in the following screenshot.

Make sure you zip the contents and not the top-level folder. For example, if your build output generates a folder named AWS-Amplify-Code, navigate into that folder and select all the contents, and then zip the contents.

Use the new .zip file to manually deploy the application in Amplify.

After it’s deployed, you will receive a domain that you can use in later steps to access the application.

Create a test user in the Amazon Cognito user pool.

An email address is required for this user because you will need to mark the email address as verified.

Return to the Amplify page and use the domain it automatically generated to access the application.

Use Amazon Cognito for user authentication
Amazon Cognito is an identity platform that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify the user before they can use the image editing application.
Upon accessing the Image Editing Tool URL, you will be prompted to sign in with a previously created test user. For first-time sign-ins, users will be asked to update their password. After this process, the user’s credentials are validated against the records stored in the user pool. If the credentials match, Amazon Cognito will issue a JSON Web Token (JWT). In the API payload to be sent section of the page, you will notice that the Authorization field has been updated with the newly issued JWT.
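As a sketch of what the frontend does behind the scenes, the following Python example calls the backend API with the Cognito-issued token in the Authorization header. The URL and token are placeholders, and the payload field names are taken from the walkthrough later in this post; the exact request schema may differ.

import requests  # assumes the requests library is installed

# Placeholders: use the invokeUrl from the CloudFormation outputs and the ID
# token issued by Amazon Cognito after sign-in.
invoke_url = "https://example123.execute-api.us-west-2.amazonaws.com/prod"
id_token = "<JWT issued by Amazon Cognito>"

# Field names follow the walkthrough below; the exact schema may differ.
payload = {
    "base_image": "<base64-encoded image>",
    "mask": "<base64-encoded mask>",
    "prompt": "Make the driveway clear and empty",
    "mode": "INPAINTING",
}

response = requests.post(
    invoke_url,
    headers={"Authorization": id_token},  # validated by the CognitoAuthorizer
    json=payload,
)
print(response.status_code)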
Use Lambda for backend code and Amazon Bedrock for generative AI function
The backend code is hosted on Lambda and invoked by user requests routed through API Gateway. The Lambda function processes the request payload and forwards it to Amazon Bedrock. The reply from Amazon Bedrock follows the same route as the initial request.
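The following is a simplified sketch of such a Lambda handler, not the deployed function's actual code. It assumes the payload field names shown in the walkthrough (base_image, mask, prompt) and the Titan Image Generator inpainting request shape, with the amazon.titan-image-generator-v2:0 model ID from the CloudFormation resources listed earlier.

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    # Parse the payload forwarded by API Gateway.
    body = json.loads(event["body"])

    # Build an inpainting request for the Titan Image Generator model.
    request = {
        "taskType": "INPAINTING",
        "inPaintingParams": {
            "image": body["base_image"],   # base64-encoded source image
            "maskImage": body["mask"],     # base64-encoded mask
            "text": body["prompt"],
        },
        "imageGenerationConfig": {"numberOfImages": 2},
    }

    response = bedrock.invoke_model(
        modelId="amazon.titan-image-generator-v2:0",
        body=json.dumps(request),
        contentType="application/json",
        accept="application/json",
    )
    images = json.loads(response["body"].read())["images"]  # list of base64 images

    return {"statusCode": 200, "body": json.dumps({"images": images})}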
Use API Gateway for API management
API Gateway streamlines API management, allowing developers to deploy, maintain, monitor, secure, and scale their APIs effortlessly. In our use case, API Gateway serves as the orchestrator for the application logic and provides throttling to manage the load to the backend. Without API Gateway, you would need to use the JavaScript SDK in the frontend to interact directly with the Amazon Bedrock API, bringing more work to the frontend.
Use Amplify for frontend code
Amplify offers a development environment for building secure, scalable mobile and web applications. It allows developers to focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our frontend code using the method outlined earlier in this post.
Image editing tool walkthrough
Navigate to the URL provided after you created the application in Amplify and sign in. At first login attempt, you’ll be asked to reset your password.

As you follow the steps for this tool, you will notice the API Payload to be Sent section on the right side updating dynamically, reflecting the details mentioned in the corresponding steps that follow.
Step 1: Create a mask on your image
To create a mask on your image, choose a file (JPEG, JPG, or PNG).
After the image is loaded, the frontend converts the file into base64 and the base_image value is updated.
As you select a portion of the image you want to edit, a mask will be created, and the mask value is updated with a new base64 value. You can also use the stroke size option to adjust the area you are selecting.
You now have the original image and the mask image encoded in base64. (The Amazon Titan Image Generator G1 model requires the inputs to be in base64 encoding.)
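
For reference, the base64 encoding the frontend performs can be reproduced in Python as follows; the file names are placeholders.

import base64

# Encode the source image and the mask image as base64 strings, the format
# the Amazon Titan Image Generator G1 model expects.
with open("driveway.png", "rb") as f:
    base_image = base64.b64encode(f.read()).decode("utf-8")

with open("driveway_mask.png", "rb") as f:
    mask = base64.b64encode(f.read()).decode("utf-8")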

Step 2: Write a prompt and set your options
Write a prompt that describes what you want to do with the image. For this example, we enter Make the driveway clear and empty. This is reflected in the prompt on the right.
You can choose from the following image editing options: inpainting and outpainting. The value for mode is updated depending on your selection.

Use inpainting to remove masked elements and replace them with background pixels
Use outpainting to extend the pixels of the masked image to the image boundaries

Choose Send to API to send the payload to the API gateway. This action invokes the Lambda function, which validates the received payload. If the payload is validated successfully, the Lambda function proceeds to invoke the Amazon Bedrock API for further processing.
The Amazon Bedrock API generates two image outputs in base64 format, which are transmitted back to the frontend application and rendered as visual images.

Step 3: View and download the result
The following screenshot shows the results of our test. You can download the results or provide an updated prompt to get a new output.

Testing and troubleshooting
When you initiate the Send to API action, the system performs a validation check. If required information is missing or incorrect, it will display an error notification. For instance, if you attempt to send an image to the API without providing a prompt, an error message will appear on the right side of the interface, alerting you to the missing input, as shown in the following screenshot.

Clean up
If you decide to discontinue using the Image Editing Tool, you can follow these steps to remove the Image Editing Tool, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

Delete the CloudFormation stack:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Locate the stack you created during the deployment process (you assigned a name to it).
Select the stack and choose Delete.

Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion
In this post, we explored a sample solution that you can use to deploy an image editing application by using AWS serverless services and generative AI services. We used Amazon Bedrock and an Amazon Titan FM that allows you to edit images by using prompts. By adopting this solution, you gain the advantage of using AWS managed services, so you don’t have to maintain the underlying infrastructure. Get started today by deploying this sample solution.
Additional resources
To learn more about Amazon Bedrock, see the following resources:

GitHub repo: Amazon Bedrock Workshop
Amazon Bedrock User Guide
Amazon Bedrock InvokeModel API
Workshop: Using generative AI on AWS for diverse content types

To learn more about the Amazon Titan Image Generator G1 model, see the following resources:

Amazon Titan Image Generator G1 models
Amazon Titan Image Generator Demo

About the Authors
Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.
Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.
Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.
Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Brilliant words, brilliant writing: Using AWS AI chips to quickly depl …

Many organizations are building generative AI applications powered by large language models (LLMs) to boost productivity and build differentiated experiences. These LLMs are large and complex and deploying them requires powerful computing resources and results in high inference costs. For businesses and researchers with limited resources, the high inference costs of generative AI models can be a barrier to enter the market, so more efficient and cost-effective solutions are needed. Most generative AI use cases involve human interaction, which requires AI accelerators that can deliver real time response rates with low latency. At the same time, the pace of innovation in generative AI is increasing, and it’s becoming more challenging for developers and researchers to quickly evaluate and adopt new models to keep pace with the market.
One way to get started with LLMs such as Llama and Mistral is by using Amazon Bedrock. However, customers who want to deploy LLMs in their own self-managed workflows for greater control and flexibility of underlying resources can use these LLMs optimized on top of AWS Inferentia2-powered Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances. In this blog post, we will show how to use an Amazon EC2 Inf2 instance to cost-effectively deploy multiple industry-leading LLMs on AWS Inferentia2, a purpose-built AWS AI chip, helping customers quickly test models and expose an API interface that facilitates performance benchmarking and downstream application calls at the same time.
Model introduction
There are many popular open source LLMs to choose from, and for this blog post, we will review three different use cases based on model expertise using Meta-Llama-3-8B-Instruct, Mistral-7B-instruct-v0.2, and CodeLlama-7b-instruct-hf.

| Model name | Release company | Number of parameters | Release time | Model capabilities |
| --- | --- | --- | --- | --- |
| Meta-Llama-3-8B-Instruct | Meta | 8 billion | April 2024 | Language understanding, translation, code generation, inference, chat |
| Mistral-7B-Instruct-v0.2 | Mistral AI | 7.3 billion | March 2024 | Language understanding, translation, code generation, inference, chat |
| CodeLlama-7b-Instruct-hf | Meta | 7 billion | August 2023 | Code generation, code completion, chat |

Meta-Llama-3-8B-Instruct is a popular language model, released by Meta AI in April 2024. The Llama 3 model has improved pre-training, instruction comprehension, output generation, coding, inference, and math skills. The Meta AI team says that Llama 3 has the potential to be the initiator of a new wave of innovation in AI. The Llama 3 model is available in two publicly released versions, 8B and 70B. At the time of writing, Llama 3.1 instruction-tuned models are available in 8B, 70B, and 405B versions. In this blog post, we will use the Meta-Llama-3-8B-Instruct model, but the same process can be followed for Llama 3.1 models.
Mistral-7B-instruct-v0.2, released by Mistral AI in March 2024, marks a major milestone in the development of the publicly available foundation model. With its impressive performance, efficient architecture, and wide range of features, Mistral 7B v0.2 sets a new standard for user-friendly and powerful AI tools. The model excels at tasks ranging from natural language processing to coding, making it an invaluable resource for researchers, developers, and businesses. In this blog post, we will use the Mistral-7B-instruct-v0.2 model, but the same process can be followed for the Mistral-7B-instruct-v0.3 model.
CodeLlama-7b-instruct-hf is a collection of models published by Meta AI. It is an LLM that uses text prompts to generate code. Code Llama is aimed at code tasks, making developers’ workflow faster and more efficient and lowering the learning threshold for coders. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more powerful and well-documented software.
Solution architecture
The solution uses a client-server architecture, and the client uses the HuggingFace Chat UI to provide a chat page that can be accessed on a PC or mobile device. Server-side model inference uses Hugging Face’s Text Generation Inference, an efficient LLM inference framework that runs in a Docker container. We pre-compiled the model using Hugging Face’s Optimum Neuron and uploaded the compilation results to Hugging Face Hub. We have also added a model switching mechanism to the HuggingFace Chat UI to control the loading of different models in the Text Generation Inference container through a scheduler (Scheduler).
Solution highlights

All components are deployed on a single Inf2 instance with one Inferentia2 chip (inf2.xl or inf2.8xl), and users can experience the effects of multiple models on one instance.
With the client-server architecture, users can flexibly replace either the client or the server side according to their actual needs. For example, the model can be deployed in Amazon SageMaker, and the frontend Chat UI can be deployed on the Node server. To facilitate the demonstration, we deployed both the front and back ends on the same Inf2 server.
Using a publicly available framework, users can customize frontend pages or models according to their own needs.
Using an API interface for Text Generation Inference facilitates quick access for users using the API.
Deployment using AWS CloudFormation, suitable for all types of businesses and developers within the enterprise.

Main components
The following are the main components of the solution.
Hugging Face Optimum Neuron
Optimum Neuron is an interface between the HuggingFace Transformers library and the AWS Neuron SDK. It provides a set of tools for model loading, training, and inference for single- and multiple-accelerator setups across different downstream tasks. In this article, we mainly used Optimum Neuron’s export interface. To deploy a HuggingFace Transformers model on Neuron devices, the model needs to be compiled and exported to a serialized format before inference is performed. The export interface performs ahead-of-time (AOT) compilation using the Neuron compiler (neuronx-cc), and the model is converted into a serialized and optimized TorchScript module. This is shown in the following figure.

During the compilation process, we introduced a tensor parallelism mechanism to split the weights, data, and computations between the two NeuronCores. For more compilation parameters, see Export a model to Inferentia.
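
The following sketch shows what that export can look like with the Optimum Neuron Python API. The compiler arguments and sequence length are illustrative; consult the Optimum Neuron documentation for the options that fit your model and instance.

from optimum.neuron import NeuronModelForCausalLM

# Ahead-of-time compilation for two NeuronCores (tensor parallelism), then
# serialization of the compiled artifacts for upload to the Hugging Face Hub.
compiler_args = {"num_cores": 2, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 1, "sequence_length": 4096}

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    export=True,            # compile with neuronx-cc and export the result
    **compiler_args,
    **input_shapes,
)
model.save_pretrained("llama3-8b-neuron")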
Hugging Face’s Text Generation Inference (TGI)
Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLMs. TGI provides high performance text generation services for the most popular publicly available foundation LLMs. Its main features are:

Simple launcher that provides inference services for many LLMs
Supports both generate and stream interfaces
Token stream using server-sent events (SSE)
Supports AWS Inferentia, Trainium, NVIDIA GPUs and other accelerators

HuggingFace Chat UI
HuggingFace Chat UI is an open-source chat tool built with SvelteKit and can be deployed to Cloudflare, Netlify, Node, and so on. It has the following main features:

Page can be customized
Conversation records can be stored, and chat records are stored in MongoDB
Supports operation on PC and mobile terminals
The backend can connect to Text Generation Inference and supports API interfaces such as Anthropic, Amazon SageMaker, and Cohere
Compatible with various publicly available foundation models (Llama series, Mistral/Mixtral series, Falcon, and so on).

Thanks to the page customization capabilities of the Hugging Chat UI, we’ve added a model switching function, so users can switch between different models on the same EC2 Inf2 instance.
Solution deployment

Before deploying the solution, make sure you have an inf2.xl or inf2.8xl usage quota in the us-east-1 (Virginia) or us-west-2 (Oregon) AWS Region. See the reference link for how to apply for a quota.
Sign in to the AWS Management Console and switch the Region to us-east-1 (Virginia) or us-west-2 (Oregon) in the upper right corner of the console page.
Enter CloudFormation in the service search box and choose Create stack.
Select Choose an existing template, and then select Amazon S3 URL.
If you plan to use an existing virtual private cloud (VPC), use the steps in a; if you plan to create a new VPC to deploy, use the steps in b.

Use an existing VPC.

Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_default_vpc_ubuntu22.04.yaml in the Amazon S3 URL.
Stack name: Enter the stack name.
InstanceType: select inf2.xl (lower cost) or inf2.8xl (better performance).
KeyPairName (optional): if you want to sign in to the Inf2 instance, enter the KeyPairName name.
VpcId: Select VPC.
PublicSubnetId: Select a public subnet.
VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
Choose Next, then Next again. Choose Submit.

Create a new VPC.

Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_new_vpc_ubuntu22.04.yaml in the Amazon S3 URL.
Stack name: Enter the stack name.
InstanceType: Select inf2.xl or inf2.8xl.
KeyPairName (optional): If you want to sign in to the Inf2 instance, enter the KeyPairName name.
VpcId: Leave as New.
PublicSubnetId: Leave as New.
VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.

Choose Next, and then Next again. Then choose Submit. After creating the stack, wait for the resources to be created and started (about 15 minutes). After the stack status is displayed as CREATE_COMPLETE, choose Outputs and open the URL shown as the value for the Public endpoint for the web server key (close all VPN connections and firewall programs).

User interface
After the solution is deployed, users can access the preceding URL on the PC or mobile phone. On the page, the Llama3-8B model will be loaded by default. Users can switch models in the menu settings, select the model name to be activated in the model list, and choose Activate to switch models. Switching models requires reloading the new model into the Inferentia 2 accelerator memory. This process takes about 1 minute. During this process, users can check the loading status of the new model by choosing Retrieve model status. If the status is Available, it indicates that the new model has been successfully loaded.

The effects of the different models are shown in the following figure:

The following figures show the solution in a browser on a PC:

API interface and performance testing
The solution uses a Text Generation Inference server, which supports the /generate and /generate_stream interfaces and uses port 8080 by default. You can make API calls by replacing <IP> in the following commands with the IP address of the instance deployed previously.
The /generate interface is used to return all responses to the client at once after generating all tokens on the server side.

curl <IP>:8080/generate \
    -X POST \
    -d '{"inputs": "Calculate the distance from Beijing to Shanghai"}' \
    -H 'Content-Type: application/json'

/generate_stream is used to reduce waiting delays and enhance the user experience by receiving tokens one by one when the model output length is relatively large.

curl <IP>:8080/generate_stream \
    -X POST \
    -d '{"inputs": "Write an essay on the mental health of elementary school students with no more than 300 words."}' \
    -H 'Content-Type: application/json'

Here is sample code that uses the requests interface in Python.
import requests

url = "http://<IP>:8080/generate"
headers = {"Content-Type": "application/json"}
data = {
    "inputs": "Calculate the distance from Beijing to Shanghai",
    "parameters": {
        "max_new_tokens": 200
    }
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
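
Because /generate_stream returns server-sent events, a streaming client reads the response line by line and prints tokens as they arrive. The following is a minimal Python sketch; the IP address is a placeholder.

import json

import requests

url = "http://<IP>:8080/generate_stream"
data = {
    "inputs": "Write an essay on the mental health of elementary school students with no more than 300 words.",
    "parameters": {"max_new_tokens": 300},
}

# Each SSE line has the form "data:{...}"; print each token's text as it arrives.
with requests.post(url, json=data, stream=True) as response:
    for line in response.iter_lines():
        if line and line.startswith(b"data:"):
            event = json.loads(line[len(b"data:"):])
            print(event["token"]["text"], end="", flush=True)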

Summary
In this blog post, we introduced methods and examples of deploying popular LLMs on AWS AI chips, so that users can quickly experience the productivity improvements provided by LLMs. The models deployed on the Inf2 instance have been validated by multiple users and scenarios, showing strong performance and wide applicability. AWS is continuously expanding its application scenarios and features to provide users with efficient and economical computing capabilities. See Inf2 Inference Performance to check the types and list of models supported on the Inferentia2 chip. Contact us to give feedback on your needs or ask questions about deploying LLMs on AWS AI chips.
References

Amazon EC2 Inf2 Instances
Optimum Neuron
Optimum Neuron on GitHub
Text Generation Inference
Huggingface chat-ui
Introducing Meta Llama 3
Hugging Face Meta-Llama-3-8B
Hugging Face Mistral-7B-Instruct-v0.2
Hugging Face CodeLlama-7b-Instruct-hf
llama3-8b-inf2 Demo
AWS service quotas
Locust

About the authors
Zheng Zhang is a technical expert for Amazon Web Services machine learning products, focusing on Amazon Web Services-based accelerated computing and GPU instances. He has rich experience in large-scale model training and inference acceleration in machine learning.
Bingyang Huang is a Go-To-Market Specialist of Accelerated Computing at GCR SSO GenAI team. She has experience deploying AI accelerators in customers' production environments. Outside of work, she enjoys watching films and exploring good foods.
Tian Shi is a Senior Solution Architect at Amazon Web Services. He has rich experience in cloud computing, data analysis, and machine learning and is currently dedicated to research and practice in the fields of data science, machine learning, and serverless. His translations include Machine Learning as a Service, DevOps Practices Based on Kubernetes, Practical Kubernetes Microservices, Prometheus Monitoring Practice, and CoreDNS Study Guide in the Cloud Native Era.
Chuan Xie is a Senior Solution Architect at Amazon Web Services Generative AI, responsible for the design, implementation, and optimization of generative artificial intelligence solutions based on the Amazon Cloud. River has many years of production and research experience in the communications, ecommerce, internet and other industries, and rich practical experience in data science, recommendation systems, LLM RAG, and others. He has multiple AI-related product technology invention patents.

New vs Returning Purchase in Klaviyo: How to Segment to Maximize LTV i …

It takes an average of 2 to 3 purchases for a DTC business to break even or start making a profit on the cost to acquire a customer. So, making the most of those repeat customers is where the real growth happens.

To do that, you need to be able to segment new vs. returning purchases. This allows you to send the right message to the right customer at the right time, whether they’re a first-time buyer or a loyal repeat shopper.

By digging into order metrics, you can generate detailed reports, set up personalized flows, and even identify returning visitors that might otherwise go unnoticed. 

With Customers.ai and Klaviyo, you’ll capture more return visitors and ensure those high-value customers don’t slip through the cracks—even if Klaviyo’s cookie expires.

We’ll show you exactly how to make this work, just like we’ve done for dozens of clients using Customers.ai’s anonymous visitor identification in Klaviyo.

Jump to each section below or continue reading for the full breakdown of how to supercharge your new vs. returning customer strategies with Klaviyo:

Part 1: How to Trigger Flows to Repeat Shoppers 

Part 2: How to Segment Customers in New vs Returning Purchase Klaviyo Flows 

Part 3: How to Generate Reports for New vs Returning Purchases with Klaviyo 

1. How to Trigger Flows to Repeat Shoppers 

For businesses looking to target their repeat shoppers, triggering automated flows can be a powerful way to engage and drive repeat purchases. 

There are two main ways you can set up these flows: using Klaviyo directly or leveraging the advanced capabilities of Customers.ai in combination with Klaviyo. 

Method 1: Standard Klaviyo Setup

Setting up flows in Klaviyo is pretty straightforward, but there are several steps you’ll need to follow to make sure everything is set up correctly. 

While Klaviyo’s intuitive interface makes it easy to build automated flows, especially for repeat shoppers, it’s important to ensure each step is completed properly for the best results. 

Here’s a step-by-step breakdown on how to trigger flows for repeat shoppers:

1. Create a Segment for Repeat Shoppers

Go to Lists & Segments.

Click Create List/Segment and select Segment.

Set the conditions to define repeat shoppers. For example:

Condition 1: Placed Order at least once over all time.

Condition 2: Placed Order at least twice over all time.

Add any other filters, such as date ranges, specific product categories, or order values.

Name the segment (e.g., “Repeat Shoppers”) and save it.

2. Build a Flow for Repeat Shoppers

Go to Flows.

Click Create Flow.

Choose to start from scratch or use a pre-built flow template. For a repeat shopper flow, starting from scratch is often best so you can customize it to your needs.

Name the flow (e.g., “Repeat Shopper Flow”) and click Create Flow.

3. Set the Flow Trigger

In the flow builder, click Trigger.

Select Segment as the flow trigger.

Choose the segment you just created (e.g., “Repeat Shoppers”).

You can also refine the trigger to ensure the flow only starts when they place another order or hit a specific event related to shopping.

4. Add Emails or Other Actions to the Flow

After the trigger, add emails or other actions (like SMS or conditional splits).

You might start with a thank-you email, followed by product recommendations, or even offer a loyalty discount.

Drag and drop email actions into the flow.

5. Personalize the Flow

Use dynamic content to make the emails personal for each repeat shopper.

Add product recommendations based on past purchases.

Include personalized discounts or incentives for future purchases.

Use conditional splits to show different emails based on behaviors (e.g., whether they purchased a specific product or category).

6. Set Time Delays

Add delays between emails to space out your communications. For example, send the first email right after their second purchase, and follow-up emails after 3, 7, or 14 days depending on the goal of your flow.

7. Preview and Test

Preview each email to ensure the dynamic content works.

Test the flow using Klaviyo’s Manual Flow Trigger to see how the flow behaves before sending it live.

8. Turn on the Flow

Once satisfied with the setup, toggle the flow from Draft to Live.

Method 2: Enhanced Setup with Customers.ai

While Klaviyo’s flow setup is effective, it has one major limitation—its cookies expire after 7 days. 

This means that if a shopper returns to your site after a week or isn’t logged in, Klaviyo can no longer recognize them as a return shopper, causing missed opportunities for engagement and sales.

That’s where Customers.ai steps in. 

Our Signal solution ensures you capture all of those visitors that Klaviyo misses and can put them into the right flow (or a custom one!). 

Here’s how it works:

1. Create a Segment for Return Visitors

Go to Lists & Segments.

Click Create List/Segment and select Segment.

Set the conditions to capture Customers.ai Return Visitors. 

Add any other filters, such as date ranges, specific product categories, or order values.

Name the segment (e.g., “CAI Return Visitors”) and save it.

2. Build a Flow for Repeat Shoppers

Go to Flows.

Click Create Flow.

Choose to start from scratch or use a pre-built flow template.

Name the flow (e.g., “CAI Return Shopper Flow”) and click Create Flow.

3. Set the Flow Trigger

In the flow builder, click Trigger.

Select Segment as the flow trigger.

Choose the segment you just created (e.g., “CAI Return Visitors”).

You can also refine the trigger to ensure the flow only starts when they hit a specific event.

4. Add Emails or Other Actions to the Flow

After the trigger, add emails or other actions.

You might start with a welcome back email, followed by product recommendations, or even offer a loyalty discount.

Drag and drop email actions into the flow.

As you can see, this is very similar to your traditional return shopper flow BUT you are capturing way more people!

2. How to Segment Customers in New vs Returning Purchase Klaviyo Flows

Once you have your flows set up, the possibilities for segmenting new and returning visitors are endless. 

By using Klaviyo’s powerful segmentation tools (along with Customers.ai audiences), you can go beyond the basics and create personalized experiences that resonate with your audience. 

Whether it’s tailoring your messaging, offering exclusive incentives, or re-engaging lapsed customers, understanding how to effectively segment new vs. returning shoppers can significantly boost your conversions and customer loyalty.

Let’s dive into three key strategies for segmenting customers in new vs. returning purchase flows.

A. Tailor Welcome Flows for New Customers and Welcome Back Flows for Returning Shoppers

New customers should be welcomed with an introduction to your brand, product highlights, and perhaps an exclusive discount to encourage a first purchase.

For returning customers, a “welcome back” flow that acknowledges their loyalty and rewards them for coming back can be a huge asset. 

Here’s what a flow could look like for a return visitor after 30 days of inactivity: 

Trigger:

This flow is triggered when a returning customer revisits your site or places an order after 30 days of inactivity (you can adjust the time frame based on your customer cycle).

Use Customers.ai to help recognize returning visitors even if Klaviyo’s cookies have expired.

Email 1: Welcome Back + Personal Thank You

Subject: “Welcome Back! We’ve Missed You!”

Content: Acknowledge their return with a friendly message that makes them feel appreciated. Thank them for being a loyal customer, and offer a small incentive (e.g., 10% off their next purchase). Include personalized product recommendations based on their past shopping behavior.

Send Time: Immediately after the trigger.

Email 2: Special Offer for Loyal Customers

Subject: “Your Loyalty Deserves a Reward!”

Content: Offer an exclusive deal, like early access to a new product or a loyalty program invitation. Highlight their previous purchases and suggest products they might love based on those.

Send Time: 2-3 days after the first email if no purchase is made.

Email 3: Gentle Reminder + Final Offer

Subject: “Still Here for You! Don’t Miss Your Exclusive Offer”

Content: Remind them of the offer you sent and how it’s still available. Use urgency (e.g., “Offer expires in 48 hours”) to encourage action. Reinforce their loyalty by mentioning their past support and why they’re a valued customer.

Send Time: 5-7 days after the second email if no purchase is made.

This “welcome back” flow re-engages returning customers by acknowledging their loyalty and rewarding them with personalized offers. 

B. Use Purchase History to Drive Flow Customization

Leverage customer purchase history to create targeted messaging. 

For first-time buyers, your flows should focus on educating them about your brand and showcasing popular products.

For returning customers, highlight products that complement their previous purchases or introduce them to new arrivals. Personalized recommendations based on their shopping behavior will increase engagement.

Here’s a potential flow based on a purchase from a specific category and geared towards cross-sells and upsells. 

Trigger:

This flow is triggered when a customer makes a purchase in a specific category (e.g., skincare products, home decor, electronics).

Use the purchase history to suggest complementary products that enhance their experience or fill a gap in what they’ve bought.

Email 1: Product Use Tips + Complementary Products

Subject: “Maximize Your [Product] Experience!”

Content: Provide tips on how to best use the product they purchased, enhancing their experience and showing that you care about their satisfaction. At the same time, recommend complementary products that would make their purchase even better (e.g., a protective case for electronics, a serum to go with a skincare purchase).

Send Time: 2 days after the purchase is confirmed.

Email 2: Cross-Sell Product Offer

Subject: “Complete Your Collection with [Recommended Product]”

Content: Highlight products that other customers frequently buy together with their purchase, offering a limited-time discount or bundle offer. Use their purchase history to customize this email to the specific products or category they are interested in.

Send Time: 5 days after the first email.

Email 3: Upsell to Premium Product

Subject: “Upgrade Your Experience with [Premium Product]”

Content: Encourage the customer to consider an upgrade or premium version of a related product. For example, if they bought a basic version, introduce a more advanced product that enhances their original purchase.

Send Time: 7-10 days after the second email, if no action has been taken.

This flow leverages a customer’s purchase history to deliver personalized recommendations that align with what they’ve already bought. 

Plus, by suggesting cross-sell and upsell opportunities, you can increase the average order value while keeping the content highly relevant.

C. Incorporate Conditional Splits Based on Engagement

We all know that not all customers engage the same way. In both your new and returning customer flows, include conditional splits that adjust messaging based on user behavior. 

For example, here is a conditional split flow for a new customer:

Trigger:

This flow is triggered after a customer makes their first purchase. The goal is to nurture them into becoming a repeat customer, with messaging tailored based on how they engage with the flow.

Email 1: Welcome & Thank You

Subject: “Welcome to [Brand]! Here’s What You Can Expect”

Content: A warm welcome email thanking them for their first purchase, introducing your brand story, and setting expectations (e.g., shipping details, customer support). Include a call to action to visit your site or explore other products.

Send Time: Immediately after the purchase is confirmed.

Conditional Split 1: Did They Open the First Email?

Yes: If the customer opened the email, send a follow-up with more detailed information about your products, encouraging them to explore more on your site.

Email 2A: Explore Our Bestsellers

Subject: “Customers Love These – We Think You Will Too!”

Content: Highlight popular products or new arrivals that match their first purchase. Offer a discount for a repeat order.

Send Time: 3-4 days after the first email.

No: If the customer didn’t open the email, follow up with a more attention-grabbing subject line and a clear CTA to visit your site or check out their purchase.

Email 2B: Quick Reminder! Here’s What You Missed

Subject: “Forgot Something? Here’s Your Quick Guide to [Brand]”

Content: Briefly remind them about their purchase and reinforce your brand’s key benefits or unique selling points. Keep it simple and offer an incentive (like free shipping) to encourage engagement.

Send Time: 3-4 days after the first email.

Conditional Split 2: Did They Click a Link?

Yes: If they clicked a link, send a targeted offer to drive the second purchase.

Email 3A: Exclusive Offer for New Customers

Subject: “Your Exclusive Offer Awaits!”

Content: Offer a time-sensitive discount on their next purchase to encourage a second order. Highlight products related to their first purchase.

Send Time: 3-5 days after the second email.

No: If no clicks, focus on building trust by sharing social proof, such as customer reviews or success stories.

Email 3B: See Why [Brand] Customers Keep Coming Back

Subject: “Don’t Just Take Our Word For It!”

Content: Share testimonials, reviews, or success stories to build confidence. Include a call to action to check out other products or learn more about your brand.

Send Time: 3-5 days after the second email.

Final Nurture Email

Send a final email offering support, FAQs, or guidance on how to use the product they purchased.

Email 4: Need Help or Have Questions?

Subject: “We’re Here for You!”

Content: Offer helpful resources like product tutorials, FAQs, or customer support contact info. Build a sense of customer care and encourage engagement through a final discount or product recommendation.

A returning customer flow, on the other hand, follows a similar structure, with conditional splits keyed to engagement and loyalty behavior.

Both flows use conditional splits based on customer engagement, ensuring that your messages are relevant and timely. 

By responding to their actions (or lack thereof), you can tailor the experience and nurture new customers toward a second purchase or deeper engagement with your brand.

Each of the flows outlined above can get you started with segmenting customers into new vs. returning purchase flows in Klaviyo. 

By customizing your messaging based on how customers engage, you’ll create a more personalized experience that helps turn first-time buyers into loyal, repeat customers.

3. How to Generate Reports for New vs Returning Purchases with Klaviyo

Understanding the difference between new and returning customers is really important for tracking sales and optimizing your marketing. 

New customers might need more attention, while returning customers are usually closer to making another purchase. 

Klaviyo lets you generate reports that break down these two groups, and when you combine it with Customers.ai, you can capture even more data, especially from visitors who might be missed. 

Here are a few tips along with some example reports to help keep your sales on track.

A. Track Revenue Split Between New and Returning Customers

Understanding how much of your revenue comes from first-time buyers versus repeat customers can help you allocate your marketing spend more effectively. 

If most of your sales come from new customers, you might focus more on acquisition. If returning customers drive revenue, focus on retention strategies.

To view revenue breakdown by new vs. returning customers, you will want to set up the following report:

Go to Analytics in Klaviyo and select Custom Reports.

Choose Placed Order as the primary metric.

Set up a filter for New Customers by defining a segment for people who have placed exactly 1 order.

Create another filter for Returning Customers by defining a segment for people who have placed more than 1 order.

Generate the report to see a side-by-side comparison of revenue from new vs. returning customers over a specific time period.

According to Klaviyo, there is another way to see this: go to the Metrics tab in Analytics, select Placed Order, and use the Advanced Filter options in that tab. There is a New/Returning view you can add, and you can then customize the time frame. 
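
If you prefer to work with raw data, you can approximate the same split outside Klaviyo from an export of your Placed Order events. The following is a minimal sketch in Python using pandas; the file name and column names (customer_email, order_value) are assumptions about your export and may differ in your account.

import pandas as pd

# Load an export of Placed Order events (file and column names are assumed here)
orders = pd.read_csv("placed_orders.csv")  # columns: customer_email, order_value

# Count orders per customer to separate new (1 order) from returning (2+ orders) customers
per_customer = orders.groupby("customer_email")["order_value"].agg(["count", "sum"])
new_revenue = per_customer.loc[per_customer["count"] == 1, "sum"].sum()
returning_revenue = per_customer.loc[per_customer["count"] > 1, "sum"].sum()

print(f"New customer revenue: {new_revenue:.2f}")
print(f"Returning customer revenue: {returning_revenue:.2f}")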

B. Measure Customer Lifetime Value (CLV) for Better Insights

CLV reports give you a clearer picture of how much value each customer brings over time, allowing you to see how repeat customers contribute to overall revenue growth. 

By comparing new and returning customers, you can identify which group provides more long-term value and adjust your retention strategies accordingly.

Here are 5 steps to measure customer lifetime value (CLV) in Klaviyo:

Note: You should have created your segments for new and returning customers in the previous report. Now, let’s use those segments to measure Customer Lifetime Value (CLV) and gain deeper insights into your customers’ behavior.

Step 1: Go to Analytics

Navigate to Analytics in the sidebar.

Select Customer Lifetime Value from the available report types.

Step 2: Choose the Date Range

Set a relevant date range for your report, depending on how far back you want to track CLV (e.g., last 30 days, last quarter, or the last year).

Step 3: Filter by New and Returning Customer Segments

Apply a Segment Filter for New Customers to view CLV for first-time buyers.

Create a separate report using the Returning Customers segment to view CLV for those who have made repeat purchases.

Step 4: Review and Compare Results

After generating both reports, compare the Customer Lifetime Value for new vs. returning customers.

This will show you how much revenue each segment generates over time.

Use this information to identify opportunities to increase CLV, especially for new customers, by adjusting your marketing strategies.

Step 5: Export or Save the Report

Once the report is generated, you can export it as a CSV or PDF for deeper analysis or sharing with your team.

If needed, save the report in Klaviyo to easily track CLV over time for new and returning customer segments.
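
As a rough cross-check on the in-app report, you can also approximate CLV per segment from the same order export. This is a simplified historical-CLV sketch (average total spend per customer in each segment), not Klaviyo’s predictive CLV model, and the file and column names are assumptions about your export.

import pandas as pd

orders = pd.read_csv("placed_orders.csv")  # columns: customer_email, order_value
per_customer = orders.groupby("customer_email")["order_value"].agg(["count", "sum"])

# Historical CLV approximation: average total spend per customer in each segment
new_clv = per_customer.loc[per_customer["count"] == 1, "sum"].mean()
returning_clv = per_customer.loc[per_customer["count"] > 1, "sum"].mean()

print(f"Average historical CLV (new customers): {new_clv:.2f}")
print(f"Average historical CLV (returning customers): {returning_clv:.2f}")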

C. Use Cohort Analysis to Understand Retention Trends

Cohort analysis helps you see how different customer segments (e.g., new customers in Q1 vs. Q2) behave over time. 

Comparing the behavior of new customers against returning customers can help you spot retention trends and figure out when customers are most likely to drop off or come back.

Here are the steps to set up cohort analytics in Klaviyo:

Step 1: Navigate to Analytics

In Klaviyo, go to the Analytics tab on the left-hand sidebar.

Select Cohort Analysis from the list of available report types.

Step 2: Choose Your Metric

Under the Metric dropdown, select Customer Retention.

This will track how well you’re retaining customers over time based on their first purchase and follow-up purchases.

Step 3: Define Your Time Frame

Choose a time frame for your cohort analysis (e.g., Last 30 days, Last 90 days, or a custom date range).

You can set your cohorts to be grouped by week or month, depending on how detailed you want the data to be.

Example: Group customers based on their first purchase month and track how many made repeat purchases in subsequent months.

Step 4: Apply Segment Filters

If you have already created segments for New Customers and Returning Customers, you can apply these filters to your cohorts.

Filter by New Customers to track how well you retain first-time buyers.

Filter by Returning Customers to track their repeat behavior.

Step 5: Review the Cohort Report

After generating the report, you’ll see a table or graph showing how each cohort (grouped by the time they made their first purchase) behaves over time.

This will show how many customers in each cohort return for a second, third, or subsequent purchase.

For example, you might see that 30% of new customers from January made another purchase in February, but only 15% made one by March.

Step 6: Use Insights to Improve Retention

Based on the trends you see in your cohort analysis, adjust your retention strategies:

If you notice a drop-off after a certain period, consider re-engagement campaigns or loyalty incentives.

Compare different time periods to see if any changes in your marketing or product offerings correlate with improved retention.

By setting up cohort analysis, you can track how well you’re retaining customers over time and pinpoint when customer drop-off occurs, allowing you to take action to improve retention rates.
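
If you want to reproduce a simple version of this analysis outside Klaviyo, the sketch below builds a first-purchase-month cohort retention table from an order export with pandas. The file name and column names (customer_email, order_date) are assumptions about your export.

import pandas as pd

orders = pd.read_csv("placed_orders.csv", parse_dates=["order_date"])
orders["order_month"] = orders["order_date"].dt.to_period("M")

# Each customer's cohort is the month of their first purchase
first_purchase = orders.groupby("customer_email")["order_month"].min().rename("cohort")
orders = orders.join(first_purchase, on="customer_email")
orders["months_since_first"] = (orders["order_month"] - orders["cohort"]).apply(lambda p: p.n)

# Distinct active customers per cohort and month offset, converted to retention rates
cohort_counts = (
    orders.groupby(["cohort", "months_since_first"])["customer_email"]
    .nunique()
    .unstack(fill_value=0)
)
retention = cohort_counts.div(cohort_counts[0], axis=0)
print(retention.round(2))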

Wrap Up: Unlocking the Power of New vs Returning Purchase in Klaviyo

Segmenting new and returning customers in Klaviyo is key if you want to boost engagement and make the most out of every sale. 

Whether it’s setting up personalized flows or digging into reports, these strategies help you really understand your customers and improve your marketing. 

And by adding Customers.ai into the mix, you can capture even more returning visitors that Klaviyo might miss. 

It’s an easy way to take your campaigns to the next level and keep your customers coming back again and again. Interested in learning more about how to maximize Klaviyo with Customers.ai?

Get a free Klaviyo audit and we’ll show you just how easy it is to capitalize on new and return customers.


Meta AI Releases Meta’s Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models

The discovery of new materials is crucial to addressing pressing global challenges such as climate change and advancements in next-generation computing. However, existing computational and experimental approaches face significant limitations in efficiently exploring the vast chemical space. While AI has emerged as a powerful tool for materials discovery, the lack of publicly available data and open, pre-trained models has become a major bottleneck. Density Functional Theory (DFT) calculations, essential for studying material stability and properties, are computationally expensive, restricting their utility in exploring large material search spaces.

Researchers from Meta Fundamental AI Research (FAIR) have introduced the Open Materials 2024 (OMat24) dataset, which contains over 110 million DFT calculations, making it one of the largest publicly available datasets in this domain. They also present the EquiformerV2 model, a state-of-the-art Graph Neural Network (GNN) trained on the OMat24 dataset, achieving leading results on the Matbench Discovery leaderboard. The dataset includes diverse atomic configurations sampled from both equilibrium and non-equilibrium structures. The accompanying pre-trained models are capable of predicting properties such as ground-state stability and formation energies with high accuracy, providing a robust foundation for the broader research community.

The OMat24 dataset comprises over 118 million atomic structures labeled with energies, forces, and cell stresses. These structures were generated using techniques like Boltzmann sampling, ab-initio molecular dynamics (AIMD), and relaxation of rattled structures. The dataset emphasizes non-equilibrium structures, ensuring that models trained on OMat24 are well-suited for dynamic and far-from-equilibrium properties. The elemental composition of the dataset spans much of the periodic table, with a focus on inorganic bulk materials. EquiformerV2 models, trained on OMat24 and other datasets such as MPtraj and Alexandria, have demonstrated high effectiveness. For instance, models trained with additional denoising objectives exhibited improvements in predictive performance.

When evaluated on the Matbench Discovery benchmark, the EquiformerV2 model trained using OMat24 achieved an F1 score of 0.916 and a mean absolute error (MAE) of 20 meV/atom, setting new benchmarks for predicting material stability. These results were significantly better compared to other models in the same category, highlighting the advantage of pre-training on a large, diverse dataset like OMat24. Moreover, models trained solely on the MPtraj dataset, a relatively smaller dataset, also performed well due to effective data augmentation strategies, such as denoising non-equilibrium structures (DeNS). The detailed metrics showed that OMat24 pre-trained models outperform conventional models in terms of accuracy, particularly for non-equilibrium configurations.

The introduction of the OMat24 dataset and the corresponding models represents a significant leap forward in AI-assisted materials science. The models provide the capability to predict critical properties, such as formation energies, with a high degree of accuracy, making them highly useful for accelerating materials discovery. Importantly, this open-source release allows the research community to build upon these advances, further enhancing AI’s role in addressing global challenges through new material discoveries.

The OMat24 dataset and models, available on Hugging Face, along with checkpoints for pre-trained models, provide an essential resource for AI researchers in materials science. Meta’s FAIR Chem team has made these resources available under permissive licenses, enabling broader adoption and use. Additionally, an update from the OpenCatalyst team on X can be found here, providing more context on how the models are pushing the limits of material stability prediction.
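
As a hedged illustration of how you might pull the released assets locally, the snippet below uses the Hugging Face Hub client. The repository ID is an assumption based on the announcement; verify the exact name on the Hugging Face page before using it.

from huggingface_hub import snapshot_download

# Repository ID is assumed for illustration; confirm the exact name on Hugging Face
local_dir = snapshot_download(
    repo_id="fairchem/OMAT24",
    repo_type="dataset",
    allow_patterns=["*.md", "*.json"],  # grab only metadata files for a quick first look
)
print(f"Downloaded dataset metadata to {local_dir}")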

Check out the Paper.


RealHumanEval: A Web Interface to Measure the Ability of LLMs to Assist Programmers

The growing reliance on large language models for coding support raises a significant question: how should their real-world impact on programmer productivity be assessed? Current approaches, such as static benchmarking on datasets like HumanEval, measure the correctness of generated code but cannot capture the dynamic, human-in-the-loop interaction of real programming work. With LLMs increasingly integrated into coding environments and deployed in real-time autocomplete or chat settings, evaluation needs to measure not only an LLM's ability to complete tasks but also its effect on human productivity. A more pragmatic evaluation framework is needed to establish whether these LLMs actually improve coding productivity outside the lab.

Although many LLMs are designed for programming tasks, their evaluation still depends largely on static benchmarks such as HumanEval and MBPP, which judge models on the correctness of the code they generate rather than on how well they assist human programmers. While accuracy is essential for quantitative benchmarking, the practical aspects of real-world use are generally neglected: in practice, programmers engage with LLMs continually and revise their work iteratively. Traditional approaches do not capture key metrics such as how much time programmers spend coding, how frequently they accept LLM suggestions, or the degree to which LLMs actually help solve complex problems. This gap between benchmark rankings and practical usefulness calls the generalizability of these methods into question, since they do not reflect actual LLM use, and the real productivity gain is hard to measure.

Researchers from MIT, Carnegie Mellon University, IBM Research, UC Berkeley, and Microsoft developed RealHumanEval, a groundbreaking platform designed for human-centric evaluation of LLMs in programming. It allows real-time evaluation through two modes of interaction: autocomplete suggestions and chat-based assistance. The platform records detailed user interaction logs, including which code suggestions are accepted and the time taken to complete each task. RealHumanEval goes beyond static benchmarks by focusing on human productivity metrics, which give a much clearer picture of how well LLMs perform once integrated into real-world coding workflows. This helps bridge the gap between theoretical performance and practice, providing insight into the ways LLMs help or hinder the coding process.

RealHumanEval lets users interact through both autocomplete and chat, recording several aspects of these interactions. The current evaluation tested seven different LLMs, including models from the GPT and CodeLlama families, on a set of 17 coding tasks of varying complexity. The system logged productivity metrics such as completion time per task, number of completed tasks, and how often a user accepted suggested LLM code. In total, 243 participants took part, and the collected data was analyzed to see how different LLMs affected coding efficiency. The study discusses these interactions in detail, providing insight into the effectiveness of LLMs in realistic coding environments and into the nuances of human-LLM collaboration.

RealHumanEval testing showed that models that perform better on benchmarks yield significant gains in coding productivity, above all by saving time. For example, programmers assisted by GPT-3.5 and CodeLlama-34b completed tasks 19% and 15% faster, respectively. However, the productivity gains were not uniform across all models; CodeLlama-7b, for instance, showed little clear benefit. And even though the time to complete individual tasks fell, the number of tasks completed did not change much, suggesting that LLMs speed up individual tasks but do not necessarily increase the total number of tasks finished in a given time frame. Code suggestion acceptance also varied across models, with GPT-3.5 suggestions accepted more often than the rest. These results highlight that while LLMs can foster productivity, their actual power to boost output is highly contextual.

In conclusion, RealHumanEval is a landmark testbed for LLMs in programming because it focuses on human-centered productivity metrics rather than traditional static benchmarks, offering a much-needed complementary view of how well LLMs support real-world programmers. It gives deep insight into efficiency gains and user interaction patterns, revealing the strengths and limitations of LLMs in coding environments, and it provides a foundation for future research and development in AI-assisted programming by showing how such tools can be optimized for practical use.

Check out the Paper.


Open Collective Releases Magnum/v4 Series Models From 9B to 123B Parameters

In the rapidly evolving world of AI, challenges related to scalability, performance, and accessibility remain central to the efforts of research communities and open-source advocates. Issues such as the computational demands of large-scale models, the lack of diverse model sizes for different use cases, and the need to balance accuracy with efficiency are critical obstacles. As organizations increasingly depend on AI to solve diverse problems, there is a growing need for models that are both versatile and scalable.

Open Collective has recently introduced the Magnum/v4 series, which includes models of 9B, 12B, 22B, 27B, 72B, and 123B parameters. This release marks a significant milestone for the open-source community, as it aims to create a new standard in large language models that are freely available for researchers and developers. Magnum/v4 is more than just an incremental update—it represents a full-fledged commitment to creating models that can be leveraged by those who want both breadth and depth in their AI capabilities. The diversity in sizes also reflects the broadening scope of AI development, allowing developers the flexibility to choose models based on specific requirements, whether they need compact models for edge computing or massive models for cutting-edge research. This approach fosters inclusivity in AI development, enabling even those with limited resources to access high-performing models.

Technically, the Magnum/v4 models are designed with flexibility and efficiency in mind. With parameter counts ranging from 9 billion to 123 billion, these models cater to different computational limits and use cases. For example, the smaller 9B and 12B parameter models are suitable for tasks where latency and speed are crucial, such as interactive applications or real-time inference. On the other hand, the 72B and 123B models provide the sheer power needed for more intensive natural language processing tasks, like deep content generation or complex reasoning. Furthermore, these models have been trained on a diverse dataset aimed at reducing bias and improving generalizability. They integrate advancements like efficient training optimizations, parameter sharing, and improved sparsity techniques, which contribute to a balance between computational efficiency and high-quality outputs.

The importance of the Magnum/v4 models cannot be overstated, particularly in the context of the current AI landscape. These models contribute towards democratizing access to cutting-edge AI technologies. Notably, Open Collective’s release provides a seamless solution for researchers, enthusiasts, and developers who are constrained by the availability of computational resources. Unlike proprietary models locked behind exclusive paywalls, Magnum/v4 stands out due to its open nature and adaptability, allowing experimentation without restrictive licensing. Early results demonstrate impressive gains in language understanding and generation across a variety of tasks, with benchmarks indicating that the 123B model, in particular, offers performance comparable to leading proprietary models. This represents a key achievement in the open-source domain, highlighting the potential of community-driven model development in narrowing the gap between open and closed AI ecosystems.
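
Because the weights are openly hosted, loading one of the checkpoints follows the standard Hugging Face transformers pattern. The sketch below is illustrative only: the repository ID is a placeholder, not a confirmed model card name, and should be replaced with the actual repository for the size you want.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID; substitute the real Magnum/v4 model card name
model_id = "open-collective/magnum-v4-12b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduce memory footprint on supported hardware
    device_map="auto",           # spread layers across available devices
)

prompt = "Summarize the benefits of open-weight language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))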

Open Collective’s Magnum/v4 models make powerful AI tools accessible to a wider community. By offering models from 9B to 123B parameters, they empower both small and large-scale AI projects, fostering innovation without resource constraints. As AI reshapes industries, Magnum/v4 contributes to a more inclusive, open, and collaborative future.

Check out the Model Series on Hugging Face.


Meta AI Releases Meta Spirit LM: An Open Source Multimodal Language Model Mixing Text and Speech

One of the primary challenges in developing advanced text-to-speech (TTS) systems is the lack of expressivity when transcribing and generating speech. Traditionally, large language models (LLMs) used for building TTS pipelines convert speech to text using automatic speech recognition (ASR), process it using an LLM, and then convert the output back to speech via TTS. However, this approach often leads to a loss in expressive quality, as nuances such as tone, emotion, and pitch are stripped away during the ASR process. As a result, the synthesized speech tends to sound monotonic or unnatural, unable to adequately convey emotions like excitement, anger, or surprise.

Meta AI recently released Meta Spirit LM, an innovative open-source multimodal language model capable of freely mixing text and speech. It addresses the limitations of existing TTS pipelines by integrating text and speech at the word level, allowing the model to cross modalities more seamlessly. The model was trained on both speech and text datasets using a word-level interleaving method, capturing the expressive characteristics of spoken language while maintaining the strong semantic capabilities of text-based models.

Meta Spirit LM comes in two versions: Spirit LM Base and Spirit LM Expressive. Spirit LM Base uses phonetic tokens to encode speech, allowing for efficient representation of words, while Spirit LM Expressive goes a step further by incorporating pitch and style tokens to capture details of tone, such as excitement or anger, and generate expressive speech that reflects these emotions. This makes Meta Spirit LM a powerful tool for integrating text and speech modalities to produce coherent and natural-sounding speech.

Meta Spirit LM employs a unique word-level interleaving method to train on a mix of text and speech datasets. The model’s architecture is designed to freely transition between text and speech by encoding both modalities into a single set of tokens. Spirit LM Base utilizes phonetic tokens derived from speech representations, whereas Spirit LM Expressive incorporates pitch and style tokens that add layers of expressivity, such as tone or emotional nuances.

This architecture enables Meta Spirit LM to generate more natural and contextually rich speech. The model is capable of few-shot learning for tasks across modalities, such as automatic speech recognition (ASR), text-to-speech (TTS), and speech classification. This versatility positions Meta Spirit LM as a significant improvement over traditional multimodal AI models that typically operate in isolated domains. By learning representations that span text and speech, the model can also be used for complex applications, including expressive storytelling, emotion-driven virtual assistants, and enhanced interactive dialogue systems.

The importance of Meta Spirit LM lies in its ability to freely transition between speech and text, significantly enhancing the multimodal AI experience. The Expressive version of the model (Spirit LM Expressive) goes beyond standard speech models by allowing for the preservation of sentiment and tone across different modalities. Evaluation results on the Speech-Text Sentiment Preservation (STSP) benchmark indicate that Spirit LM Expressive effectively retains emotional intent, delivering more natural and emotive outputs than standard LLMs using ASR and TTS cascades.

Another key aspect of Meta Spirit LM’s contribution is its few-shot learning capabilities across different modalities. The model has demonstrated the ability to handle cross-modal tasks, such as converting text to expressive speech, with a competitive accuracy that showcases its generalized understanding across modalities. This makes Meta Spirit LM a significant leap forward in the development of conversational agents, accessible communication tools for those with disabilities, and educational technologies that require natural, expressive dialogue. The open-source nature of the model also invites the broader research community to explore and improve upon its multimodal capabilities.

Meta Spirit LM represents a groundbreaking step towards integrating speech and text modalities in AI systems without sacrificing expressivity. Meta Spirit LM Base and Spirit LM Expressive demonstrate a powerful combination of semantic understanding and expressive speech generation by using an interleaving approach to train on speech and text datasets. Whether it’s generating emotive virtual assistants or improving conversational AI, Meta Spirit LM’s open-source approach opens the door for more innovative and expressive uses of multimodal AI technology. Meta AI’s contributions to this model are expected to inspire further research and development at the intersection of text and speech, ultimately leading to more natural and capable AI communication systems.

Check out the GitHub and Details.


Microsoft Open-Sources bitnet.cpp: A Super-Efficient 1-bit LLM Inference Framework that Runs Directly on CPUs

The rapid growth of large language models (LLMs) has brought impressive capabilities, but it has also highlighted significant challenges related to resource consumption and scalability. LLMs often require extensive GPU infrastructure and enormous amounts of power, making them costly to deploy and maintain. This has particularly limited their accessibility for smaller enterprises or individual users without access to advanced hardware. Moreover, the energy demands of these models contribute to increased carbon footprints, raising sustainability concerns. The need for an efficient, CPU-friendly solution that addresses these issues has become more pressing than ever.

Microsoft recently open-sourced bitnet.cpp, a super-efficient 1-bit LLM inference framework that runs directly on CPUs, meaning that even large 100-billion parameter models can be executed on local devices without the need for a GPU. With bitnet.cpp, users can achieve impressive speedups of up to 6.17x while also reducing energy consumption by 82.2%. By lowering the hardware requirements, this framework could potentially democratize LLMs, making them more accessible for local use cases and enabling individuals or smaller businesses to harness AI technology without the hefty costs associated with specialized hardware.

Technically, bitnet.cpp is a powerful inference framework designed to support efficient computation for 1-bit LLMs, including the BitNet b1.58 model. The framework includes a set of optimized kernels tailored to maximize the performance of these models during inference on CPUs. Current support includes ARM and x86 CPUs, with additional support for NPUs, GPUs, and mobile devices planned for future updates. Benchmarks reveal that bitnet.cpp achieves speedups of between 1.37x and 5.07x on ARM CPUs, and between 2.37x and 6.17x on x86 CPUs, depending on the size of the model. Additionally, energy consumption sees reductions ranging from 55.4% to 82.2%, making the inference process much more power efficient. The ability to achieve such performance and energy efficiency allows users to run sophisticated models at speeds comparable to human reading rates (about 5-7 tokens per second), even on a single CPU, offering a significant leap for running LLMs locally.

The importance of bitnet.cpp lies in its potential to redefine the computation paradigm for LLMs. This framework not only reduces hardware dependencies but also sets a foundation for the development of specialized software stacks and hardware that are optimized for 1-bit LLMs. By demonstrating how effective inference can be achieved with low resource requirements, bitnet.cpp paves the way for a new generation of local LLMs (LLLMs), enabling more widespread, cost-effective, and sustainable adoption. These benefits are particularly impactful for users interested in privacy, as the ability to run LLMs locally minimizes the need to send data to external servers. Additionally, Microsoft’s ongoing research and the launch of its “1-bit AI Infra” initiative aim to further industrial adoption of these models, highlighting bitnet.cpp’s role as a pivotal step toward the future of LLM efficiency.

In conclusion, bitnet.cpp represents a major leap forward in making LLM technology more accessible, efficient, and environmentally friendly. With significant speedups and reductions in energy consumption, bitnet.cpp makes it feasible to run even large models on standard CPU hardware, breaking the reliance on expensive and power-hungry GPUs. This innovation could democratize access to LLMs and promote their adoption for local use, ultimately unlocking new possibilities for individuals and industries alike. As Microsoft continues to push forward with its 1-bit LLM research and infrastructure initiatives, the potential for more scalable and sustainable AI solutions becomes increasingly promising.

Check out the GitHub.


Emergence of Intelligence in LLMs: The Role of Complexity in Rule-Based Systems

The study investigates the emergence of intelligent behavior in artificial systems by examining how the complexity of rule-based systems influences the capabilities of models trained to predict those rules. Traditionally, AI development has focused on training models using datasets that reflect human intelligence, such as language corpora or expert-annotated data. This method assumes that intelligence can only emerge from exposure to inherently intelligent data. However, this study explores an alternative theory, suggesting that intelligence might emerge from models trained on simple systems that generate complex behaviors, even if the underlying process lacks inherent intelligence.

The concept of complexity emerging from simple systems has been explored in foundational studies on cellular automata (CA), where even minimal rules can produce intricate patterns. Research by Wolfram and others demonstrated that systems operating at the edge of chaos—where order and disorder meet—exhibit higher computational capabilities. Studies have shown that complex behaviors can arise from simple rules, providing a framework for understanding how intelligence might develop from exposure to complexity rather than intelligent data alone. Recent advancements in LLMs also highlight the importance of training on complex data for the emergence of new capabilities, underscoring that both model size and the complexity of the data play a significant role in intelligence development.

Researchers from Yale, Columbia, Northwestern, and Idaho State Universities explored how complexity in rule-based systems influences the intelligence of models trained to predict these rules. Using elementary cellular automata (ECA), simple one-dimensional systems with varying degrees of complexity, they trained separate GPT-2 models on data generated by ECAs. The study revealed a strong link between the complexity of ECA rules and the models’ intelligence, demonstrated through improved performance on reasoning and chess prediction tasks. Their findings suggest that intelligence may emerge from the ability to predict complex systems, particularly those on the “edge of chaos.”

The study explored the link between system complexity and intelligence by training modified GPT-2 models on binary data generated from ECA. The ECAs were simulated over 1,000 time steps, producing sequences of binary vectors. The models were pretrained on next-token prediction for up to 10,000 epochs, using a modified architecture to handle binary inputs and outputs. Training sequences were randomly sampled, and the Adam optimizer with gradient clipping and learning rate scheduling was used to ensure efficient training. After pretraining, the models were evaluated on reasoning and chess move prediction tasks.
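
To make the data-generation setup concrete, here is a minimal sketch of an elementary cellular automaton simulator. Rule 110 is shown because it is one of the rules commonly cited for complex, “edge of chaos” behavior; the vector width is illustrative rather than the exact value used in the paper.

import numpy as np

def simulate_eca(rule: int, width: int = 64, steps: int = 1000, seed: int = 0) -> np.ndarray:
    """Simulate a 1D elementary cellular automaton and return a (steps, width) binary array."""
    rng = np.random.default_rng(seed)
    # Bit k of the rule number gives the output for neighborhood pattern k = 4*left + 2*center + right
    rule_table = np.array([(rule >> k) & 1 for k in range(8)], dtype=np.uint8)
    state = rng.integers(0, 2, size=width, dtype=np.uint8)
    history = [state.copy()]
    for _ in range(steps - 1):
        left, right = np.roll(state, 1), np.roll(state, -1)
        state = rule_table[(left << 2) | (state << 1) | right]
        history.append(state.copy())
    return np.stack(history)

# Rows of binary vectors like these would form a next-token pretraining corpus
data = simulate_eca(rule=110)
print(data.shape)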

The study examines how system complexity affects the intelligence of LLMs. Results indicate that models pretrained on more complex ECA rules perform better on tasks like reasoning and chess move prediction, but excessive complexity, such as chaotic rules, can reduce performance. Models trained on complex rules integrate past information for forecasts, as their attention patterns show. Surprisingly, models predicting the next state outperformed those predicting five steps, suggesting that complex models learn nontrivial patterns. Overall, there appears to be an optimal level of complexity that enhances model intelligence and generalization abilities.

In conclusion, the study explores how intelligence emerges in LLMs trained on ECA with varying rule complexity. The results show that models trained on rules with moderate complexity—neither too simple nor too chaotic—perform better on tasks like reasoning and chess predictions. This supports the “edge of chaos” theory, where intelligence develops in systems balancing predictability and complexity. The study suggests that models learn better by leveraging historical information in complex tasks and that intelligence may emerge from exposure to systems with just the right level of complexity.

Check out the Paper.


Train, optimize, and deploy models on edge devices using Amazon SageMaker and Qualcomm AI Hub

This post is co-written with Rodrigo Amaral, Ashwin Murthy, and Meghan Stronach from Qualcomm.
In this post, we introduce an innovative solution for end-to-end model customization and deployment at the edge using Amazon SageMaker and Qualcomm AI Hub. This seamless cloud-to-edge AI development experience will enable developers to create optimized, highly performant, and custom managed machine learning solutions where you can bring your own model (BYOM) and bring your own data (BYOD) to meet varied business requirements across industries. From real-time analytics and predictive maintenance to personalized customer experiences and autonomous systems, this approach caters to diverse needs.
We demonstrate this solution by walking you through a comprehensive step-by-step guide on how to fine-tune YOLOv8, a real-time object detection model, on Amazon Web Services (AWS) using a custom dataset. The process uses a single ml.g5.2xlarge instance (providing one NVIDIA A10G Tensor Core GPU) with SageMaker for fine-tuning. After fine-tuning, we show you how to optimize the model with Qualcomm AI Hub so that it’s ready for deployment across edge devices powered by Snapdragon and Qualcomm platforms.
Business challenge
Today, many developers use AI and machine learning (ML) models to tackle a variety of business cases, from smart identification and natural language processing (NLP) to AI assistants. While open source models offer a good starting point, they often don’t meet the specific needs of the applications being developed. This is where model customization becomes essential, allowing developers to tailor models to their unique requirements and ensure optimal performance for specific use cases.
In addition, on-device AI deployment is a game-changer for developers crafting use cases that demand immediacy, privacy, and reliability. By processing data locally, edge AI minimizes latency, ensures sensitive information stays on-device, and guarantees functionality even in poor connectivity. Developers are therefore looking for an end-to-end solution where they can not only customize the model but also optimize the model to target on-device deployment. This enables them to offer responsive, secure, and robust AI applications, delivering exceptional user experiences.
How can Amazon SageMaker and Qualcomm AI Hub help?
BYOM and BYOD offer exciting opportunities for you to customize the model of your choice, use your own dataset, and deploy it on your target edge device. Through this solution, we propose using SageMaker for model fine-tuning and Qualcomm AI Hub for edge deployments, creating a comprehensive end-to-end model deployment pipeline. This opens new possibilities for model customization and deployment, enabling developers to tailor their AI solutions to specific use cases and datasets.
SageMaker is an excellent choice for model training, because it reduces the time and cost to train and tune ML models at scale without the need to manage infrastructure. You can take advantage of the highest-performing ML compute infrastructure currently available, and SageMaker can scale infrastructure from one to thousands of GPUs. Because you pay only for what you use, you can manage your training costs more effectively. SageMaker distributed training libraries can automatically split large models and training datasets across AWS GPU instances, or you can use third-party libraries, such as DeepSpeed, Horovod, Fully Sharded Data Parallel (FSDP), or Megatron. You can train foundation models (FMs) for weeks and months without disruption by automatically monitoring and repairing training clusters.
After the model is trained, you can use Qualcomm AI Hub to optimize, validate, and deploy these customized models on hosted devices with Snapdragon and Qualcomm Technologies within minutes. Qualcomm AI Hub is a developer-centric platform designed to streamline on-device AI development and deployment. AI Hub offers automatic conversion and optimization of PyTorch or ONNX models for efficient on-device deployment using TensorFlow Lite, ONNX Runtime, or Qualcomm AI Engine Direct SDK. It also has an existing library of over 100 pre-optimized models for Qualcomm and Snapdragon platforms.
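To give a feel for this workflow, here is a minimal sketch that uses the qai_hub Python client to compile and profile a model on a cloud-hosted device. The model file name, device name, and input shape are placeholders, and the exact argument names should be verified against the current Qualcomm AI Hub documentation.

import qai_hub as hub

# Device name is a placeholder; hub.get_devices() lists the devices available to your account
device = hub.Device("Samsung Galaxy S24 (Family)")

# Compile a trained model (for example, an exported ONNX file) for the target device
compile_job = hub.submit_compile_job(
    model="yolov8_finetuned.onnx",            # hypothetical exported model file
    device=device,
    input_specs={"image": (1, 3, 640, 640)},  # assumed input name and shape
)

# Profile the compiled model on a hosted device to get latency and memory metrics
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=device,
)
print(profile_job)
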
Qualcomm AI Hub has served more than 800 companies and continues to expand its offerings in terms of models available, platforms supported, and more.
Using SageMaker and Qualcomm AI Hub together can create new opportunities for rapid iteration on model customization, providing access to powerful development tools and enabling a smooth workflow from cloud training to on-device deployment.
Solution architecture
The following diagram illustrates the solution architecture. Developers working in their local environment initiate the following steps:

Select an open source model and a dataset for model customization from the Hugging Face repository.
Pre-process the data into the format required by your model for training, then upload the processed data to Amazon Simple Storage Service (Amazon S3). Amazon S3 provides a highly scalable, durable, and secure object storage solution for your machine learning use case.
Call the SageMaker control plane API using the SageMaker Python SDK for model training (a minimal launch sketch follows this list). In response, SageMaker provisions a resilient distributed training cluster with the requested number and type of compute instances to run the model training. SageMaker also handles orchestration and monitors the infrastructure for any faults.
After the training is complete, SageMaker spins down the cluster, and you’re billed for the net training time in seconds. The final model artifact is saved to an S3 bucket.
Pull the fine-tuned model artifact from Amazon S3 to the local development environment and validate the model accuracy.
Use Qualcomm AI Hub to compile and profile the model, running it on cloud-hosted devices to deliver performance metrics ahead of downloading for deployment across edge devices.
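As a concrete illustration of step 3, the following is a minimal sketch of launching a SageMaker training job with the Python SDK. The entry point script, IAM role ARN, framework versions, hyperparameters, and S3 paths are placeholders; the accompanying notebook contains the actual training configuration.

import sagemaker
from sagemaker.pytorch import PyTorch

session = sagemaker.Session()

# Placeholder script, role, and versions; see the sample notebook for the real values
estimator = PyTorch(
    entry_point="train_yolov8.py",                        # hypothetical training script
    role="arn:aws:iam::<account-id>:role/sagemakerrole",  # role created in the prerequisites
    instance_count=1,
    instance_type="ml.g5.2xlarge",
    framework_version="2.0.1",
    py_version="py310",
    hyperparameters={"epochs": 20, "imgsz": 640},
    sagemaker_session=session,
)

# The S3 prefix holding the pre-processed dataset becomes the training input channel
estimator.fit({"training": "s3://<your-bucket>/<dataset-prefix>/"})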

Use case walk through
Imagine a leading electronics manufacturer aiming to enhance its quality control process for printed circuit boards (PCBs) by implementing an automated visual inspection system. Initially, using an open source vision model, the manufacturer collects and annotates a large dataset of PCB images, including both defective and non-defective samples.
This dataset, similar to the keremberke/pcb-defect-segmentation dataset from HuggingFace, contains annotations for common defect classes such as dry joints, incorrect installations, PCB damage, and short circuits. With SageMaker, the manufacturer trains a custom YOLOv8 model (You Only Look Once), developed by Ultralytics, to recognize these specific PCB defects. The model is then optimized for deployment at the edge using Qualcomm AI Hub, providing efficient performance on chosen platforms such as industrial cameras or handheld devices used in the production line.
This customized model significantly improves the quality control process by accurately detecting PCB defects in real-time. It reduces the need for manual inspections and minimizes the risk of defective PCBs progressing through the manufacturing process. This leads to improved product quality, increased efficiency, and substantial cost savings.
Let’s walk through this scenario with an implementation example.
Prerequisites
For this walkthrough, you should have the following:

Jupyter Notebook – The example has been tested in Visual Studio Code with Jupyter Notebook using the Python 3.11.7 environment.
An AWS account.
Create an AWS Identity and Access Management (IAM) user with the AmazonSageMakerFullAccess policy to enable you to run SageMaker APIs. Set up your security credentials for CLI.
Install AWS Command Line Interface (AWS CLI) and use aws configure to set up your IAM credentials securely.
Create a role with the name sagemakerrole to be assumed by SageMaker. Add the AmazonS3FullAccess managed policy to give SageMaker access to your S3 buckets.
Make sure your account has the SageMaker Training resource type limit for ml.g5.2xlarge increased to 1 using the Service Quotas console.
Follow the get started instructions to install the necessary Qualcomm AI Hub library and set up your unique API token for Qualcomm AI Hub.
Use the following command to clone the GitHub repository with the assets for this use case. This repository consists of a notebook that references training assets.

$ git clone https://github.com/aws-samples/sm-qai-hub-examples.git
$ cd sm-qai-hub-examples/yolo

The sm-qai-hub-examples/yolo directory contains all the training scripts that you need to deploy this sample.
Next, you will run the sagemaker_qai_hub_finetuning.ipynb notebook to fine-tune the YOLOv8 model on SageMaker and deploy it on the edge using AI Hub. See the notebook for more details on each step. In the following sections, we walk you through the key components of fine-tuning the model.
Step 1: Access the model and data

Begin by installing the necessary packages in your Python environment. At the top of the notebook, include the following code snippet, which uses Python’s pip package manager to install the required packages in your local runtime environment.

%pip install -Uq sagemaker==2.232.0 ultralytics==8.2.100 datasets==2.18.0

Import the necessary libraries for the project. Specifically, import the Dataset class from the Hugging Face datasets library and the YOLO class from the ultralytics library. These libraries are crucial for your work, because they provide the tools you need to access and manipulate the dataset and work with the YOLO object detection model.

from datasets import Dataset

from ultralytics import YOLO

Step 2: Pre-process and upload data to S3
To fine-tune your YOLOv8 model for detecting PCB defects, you will use the keremberke/pcb-defect-segmentation dataset from Hugging Face. This dataset includes 189 images of chip defects (train: 128 images, validation: 25 images and test: 36 images). These defects are annotated in COCO format.
YOLOv8 doesn’t recognize these classes out of the box, so you will map YOLOv8’s logits to identify these classes during model fine-tuning, as shown in the following image.

Begin by downloading the dataset from Hugging Face to the local disk and converting it to the required YOLO dataset structure using the utility function CreateYoloHFDataset. This structure ensures that the YOLO API correctly loads and processes the images and labels during the training phase.

dataset_name = "keremberke/pcb-defect-segmentation"
dataset_labels = [
    'dry_joint',
    'incorrect_installation',
    'pcb_damage',
    'short_circuit'
]

data = CreateYoloHFDataset(
    hf_dataset_name=dataset_name,
    labels_names=dataset_labels
)

Upload the dataset to Amazon S3. This step is crucial because the dataset stored in S3 will serve as the input data channel for the SageMaker training job. SageMaker will efficiently manage the process of distributing this data across the training cluster, allowing each node to access the necessary information for model training.

uploaded_s3_uri = sagemaker.s3.S3Uploader.upload(
    local_path=data_path,
    desired_s3_uri=f"s3://{s3_bucket}/qualcomm-aihub…"
)

Alternatively, you can use your own custom dataset (non-Hugging Face) to fine-tune the YOLOv8 model, as long as the dataset complies with the YOLOv8 dataset format.
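If you go this route, the YOLO format expects a small dataset YAML file that points to your image folders and lists the class names. The following minimal sketch shows one way to generate such a file; the folder layout is illustrative and should be adapted to your own dataset.

import yaml  # requires PyYAML

# Illustrative layout: images split into train/valid/test folders under one root
dataset_config = {
    "path": "./pcb-defect-dataset",   # root folder of the converted dataset
    "train": "train/images",
    "val": "valid/images",
    "test": "test/images",
    "names": {
        0: "dry_joint",
        1: "incorrect_installation",
        2: "pcb_damage",
        3: "short_circuit",
    },
}

with open("dataset.yaml", "w") as f:
    yaml.safe_dump(dataset_config, f, sort_keys=False)

Each image must have a matching label text file with one line per object in the form class x_center y_center width height, with coordinates normalized to the range 0 to 1.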
Step 3: Fine-tune your YOLOv8 model
3.1: Review the training script
You’re now prepared to fine-tune the model using the model.train method from the Ultralytics YOLO library.
We’ve prepared a script called train_yolov8.py that will perform the following tasks. Let’s quickly review the key points in this script before you launch the training job.

First, the script loads a YOLOv8 model from the Ultralytics library:

model = YOLO(args.yolov8_model)

Next, it uses the train method to run fine-tuning, which feeds the training data to the model, adjusts its parameters, and optimizes its ability to accurately predict object classes and locations in images:

tuned_model = model.train(
    data=dataset_yaml,
    batch=args.batch_size,
    imgsz=args.img_size,
    epochs=args.epochs,
)

After the model is trained, the script runs inference to test the model output and saves the model artifacts to a local folder that SageMaker maps to Amazon S3:

results = model.predict(
    data=dataset_yaml,
    imgsz=args.img_size,
    batch=args.batch_size
)

model.save("<model_name>.pt")

3.2: Launch the training
You’re now ready to launch the training. You will use the SageMaker PyTorch training estimator to initiate training. The estimator simplifies the training process by automating several of the key tasks in this example:

The SageMaker estimator spins up a training cluster of one ml.g5.2xlarge instance. SageMaker handles the setup and management of these compute instances, which reduces the total cost of ownership.
The estimator also uses one of the pre-built containers managed by SageMaker—PyTorch, which includes an optimized compiled version of the PyTorch framework along with its required dependencies and GPU-specific libraries for accelerated computations.

The estimator.fit() method initiates the training process with the specified input data channels. Following is the code used to launch the training job along with the necessary parameters.

estimator = PyTorch(
    entry_point='train_yolov8.py',
    source_dir='scripts',
    role=role,
    instance_count=instance_count,
    instance_type=instance_type,
    image_uri=training_image_uri,
    hyperparameters=hyperparameters,
    base_job_name="yolov8-finetuning",
    output_path=f"s3://{s3_bucket}/…"
)

estimator.fit(
    {
        'training': sagemaker.inputs.TrainingInput(
            s3_data=uploaded_s3_uri,
            distribution='FullyReplicated',
            s3_data_type='S3Prefix'
        )
    }
)

You can track a SageMaker training job by monitoring its status using the AWS Management Console, AWS CLI, or AWS SDKs. To determine when the job is completed, check for the Completed status or set up Amazon CloudWatch alarms to notify you when the job transitions to the Completed state.
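For example, the following Boto3 sketch polls the training job status; it assumes the estimator from the previous step is available and otherwise uses placeholder values.

import boto3

sm_client = boto3.client("sagemaker")

# The job name is available from the estimator after estimator.fit() is called
job_name = estimator.latest_training_job.job_name

status = sm_client.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
print(f"Training job {job_name} is {status}")  # InProgress, Completed, Failed, Stopping, or Stopped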
Steps 4 and 5: Save, download, and validate the trained model
The training process generates model artifacts that are saved to the Amazon S3 location specified in output_path. This example uses the download_tar_and_untar utility to download the model to a local drive.
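The utility is included in the sample repository; conceptually, it does something like the following sketch, where the bucket, key, and local paths are placeholders (the actual key comes from the estimator's output_path).

import tarfile
import boto3

s3 = boto3.client("s3")

# Placeholder values for illustration only
bucket = "my-sagemaker-bucket"
key = "yolov8-finetuning/output/model.tar.gz"
local_archive = "model.tar.gz"

s3.download_file(bucket, key, local_archive)
with tarfile.open(local_archive) as tar:
    tar.extractall(path="model_artifacts")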

Run inference on this model and visually validate how closely the ground truth and model prediction bounding boxes align on test images. The following code shows how to generate an image mosaic using a custom utility function, draw_bounding_boxes, which overlays an image with ground truth and model classifications along with a confidence value for each class prediction.

image_mosaics = []
for i, _key in enumerate(image_label_pairs):
    img_path, lbl_path = image_label_pairs[_key]["image_path"], image_label_pairs[_key]["label_path"]
    result = model([img_path], save=False)
    image_with_boxes = draw_bounding_boxes(
        yolo_result=result[0],
        ground_truth=open(lbl_path).read().splitlines(),
        confidence_threshold=0.2
    )
    image_mosaics.append(np.array(image_with_boxes))

From the preceding image mosaic, you can observe two distinct sets of bounding boxes: the cyan boxes indicate human annotations of defects on the PCB image, while the red boxes represent the model’s predictions of defects. Along with the predicted class, you can also see the confidence value for each prediction, which reflects the quality of the YOLOv8 model’s output.
After fine-tuning, YOLOv8 begins to accurately predict the PCB defect classes present in the custom dataset, even though it hadn't encountered these classes during model pretraining. Additionally, the predicted bounding boxes are closely aligned with the ground truth, with confidence scores greater than or equal to 0.5 in most cases. You can further improve the model's performance without hyperparameter guesswork by using a SageMaker hyperparameter tuning job.
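As a rough sketch of what such a tuning job could look like, the following code wraps the same estimator in a HyperparameterTuner. The metric name, regular expression, and hyperparameter ranges are assumptions and must match what train_yolov8.py actually accepts and logs.

from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="mAP50",  # assumed metric emitted by the training script
    metric_definitions=[{"Name": "mAP50", "Regex": "mAP50: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "batch-size": IntegerParameter(8, 32),          # assumed hyperparameter names
        "learning-rate": ContinuousParameter(1e-4, 1e-2),
    },
    objective_type="Maximize",
    max_jobs=6,
    max_parallel_jobs=2,
)

tuner.fit({"training": sagemaker.inputs.TrainingInput(s3_data=uploaded_s3_uri)})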
Step 6: Run the model on a real device with Qualcomm AI Hub
Now that you've validated the fine-tuned model in PyTorch, you want to run the model on a real device.
Qualcomm AI Hub enables you to do the following:

Compile and optimize the PyTorch model into a format that can be run on a device
Run the compiled model on a device with a Snapdragon processor hosted in AWS device farm
Verify on-device model accuracy
Measure on-device model latency

To run the model:

Compile the model.

The first step is converting the PyTorch model into a format that can run on the device.
This example uses a Windows laptop powered by the Snapdragon X Elite processor. This device uses the ONNX model format, which you will configure during compilation.
As you get started, you can see a list of all the devices supported on Qualcomm AI Hub by running qai-hub list-devices.
See Compiling Models to learn more about compilation on Qualcomm AI Hub.
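The compile call that follows expects a traced PyTorch model and a target device handle. A minimal sketch of how these could be prepared is shown below; the input shape, device name, and model name are assumptions for this example (run qai-hub list-devices to confirm the exact device name).

import torch
import qai_hub as hub

# Example input matching the image size used during fine-tuning (assumption)
model_input = torch.rand(1, 3, 640, 640)

# Trace the underlying PyTorch module of the fine-tuned Ultralytics YOLO model
torch_model = model.model
torch_model.eval()
traced_model = torch.jit.trace(torch_model, model_input)

# Cloud-hosted target device and job name (assumed values)
target_device = hub.Device("Snapdragon X Elite CRD")
model_name = "yolov8_pcb_defects"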

compile_job = hub.submit_compile_job(
    model=traced_model,
    input_specs={"image": (model_input.shape, "float32")},
    device=target_device,
    name=model_name,
    options="--target_runtime onnx"
)

Run inference on the model on a real device

Run the compiled model on a real cloud-hosted device with Snapdragon using the same model input you verified locally with PyTorch.
See Running Inference to learn more about on-device inference on Qualcomm AI Hub.

inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    inputs={"image": [model_input.numpy()]},
    device=target_device,
    name=model_name,
)
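To check on-device accuracy, you can pull the outputs back and compare them with the local PyTorch results. The following hedged sketch assumes that local_output holds the traced model's output for the same model_input used during the earlier local validation.

import numpy as np

# Download the on-device outputs produced by the inference job
on_device_output = inference_job.download_output_data()  # dict: output name -> list of arrays

# Compare the first on-device output tensor against the local PyTorch output
device_array = np.asarray(next(iter(on_device_output.values()))[0])
print("Max absolute difference vs. local run:", np.abs(device_array - local_output).max())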

Profile the model on a real device.

Profiling measures the latency of the model when run on a device. It reports the minimum value over 100 invocations of the model to best isolate model inference time from other processes on the device.
See Profiling Models to learn more about profiling on Qualcomm AI Hub.

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=target_device,
    name=model_name,
)

Deploy the compiled model to your device

Run the command below to download the compiled model.
The compiled model can be used in conjunction with the AI Hub sample application hosted here. This application uses the model to run object detection on a Windows laptop powered by Snapdragon that you have available locally.

compile_job.download_target_model()

Conclusion
Model customization with your own data through Amazon SageMaker—with over 250 models available on SageMaker JumpStart—is an addition to the existing features of Qualcomm AI Hub, which include BYOM and access to a growing library of over 100 pre-optimized models. Together, these features create a rich environment for developers aiming to build and deploy customized on-device AI models across Snapdragon and Qualcomm platforms.
The collaboration between Amazon SageMaker and Qualcomm AI Hub will help enhance the user experience and streamline machine learning workflows, enabling more efficient model development and deployment across any application at the edge. With this effort, Qualcomm Technologies and AWS are empowering their users to create more personalized, context-aware, and privacy-focused AI experiences.
To learn more, visit Qualcomm AI Hub and Amazon SageMaker. For queries and updates, join the Qualcomm AI Hub community on Slack.
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. or its subsidiaries

About the authors
Rodrigo Amaral currently serves as the Lead for Qualcomm AI Hub Marketing at Qualcomm Technologies, Inc. In this role, he spearheads go-to-market strategies, product marketing, and developer activities, with a focus on AI and ML for edge devices. He brings almost a decade of experience in AI, complemented by a strong background in business. Rodrigo holds a BA in Business and a Master’s degree in International Management.
Ashwin Murthy is a Machine Learning Engineer working on Qualcomm AI Hub. He works on adding new models to the public AI Hub Models collection, with a special focus on quantized models. He previously worked on machine learning at Meta and Groq.
Meghan Stronach is a PM on Qualcomm AI Hub. She works to support our external community and customers, delivering new features across Qualcomm AI Hub and enabling adoption of ML on device. Born and raised in the Toronto area, she graduated from the University of Waterloo in Management Engineering and has spent her time at companies of various sizes.
Kanwaljit Khurmi is a Principal Generative AI/ML Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.
Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.
Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide Go-To-Market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost-optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.

Using Amazon Q Business with AWS HealthScribe to gain insights from pa …

With the advent of generative AI and machine learning, new opportunities for enhancement became available for different industries and processes. During re:Invent 2023, we launched AWS HealthScribe, a HIPAA eligible service that empowers healthcare software vendors to build their clinical applications to use speech recognition and generative AI to automatically create preliminary clinician documentation. In addition to AWS HealthScribe, we also launched Amazon Q Business, a generative AI-powered assistant that can perform functions such as answer questions, provide summaries, generate content, and securely complete tasks based on data and information that are in your enterprise systems.
AWS HealthScribe combines speech recognition and generative AI trained specifically for healthcare documentation to accelerate clinical documentation and enhance the consultation experience.
Key features of AWS HealthScribe include:

Rich consultation transcripts with word-level timestamps.
Speaker role identification (clinician or patient).
Transcript segmentation into relevant sections such as subjective, objective, assessment, and plan.
Summarized clinical notes for sections such as chief complaint, history of present illness, assessment, and plan.
Evidence mapping that references the original transcript for each sentence in the AI-generated notes.
Extraction of structured medical terms for entries such as conditions, medications, and treatments.

AWS HealthScribe provides a suite of AI-powered features to streamline clinical documentation while maintaining security and privacy. It doesn’t retain audio or output text, and users have control over data storage with encryption in transit and at rest.
With Amazon Q Business, we provide a new generative AI-powered assistant designed specifically for business and workplace use cases. It can be customized and integrated with an organization’s data, systems, and repositories. Amazon Q allows users to have conversations, help solve problems, generate content, gain insights, and take actions through its AI capabilities. Amazon Q offers user-based pricing plans tailored to how the product is used. It can adapt interactions based on individual user identities, roles, and permissions within the organization. Importantly, AWS never uses customer content from Amazon Q to train its underlying AI models, making sure that company information remains private and secure.
In this blog post, we’ll show you how AWS HealthScribe and Amazon Q Business together analyze patient consultations to provide summaries and trends from clinician conversations, simplifying documentation workflows. This automation and use of machine learning on clinician-patient interactions with AWS HealthScribe and Amazon Q can help improve patient outcomes by enhancing communication, leading to more personalized care for patients and increased efficiency for clinicians.
Benefits and use cases
Gaining insight from patient-clinician interactions alongside a chatbot can help in a variety of ways such as:

Enhanced communication: In analyzing consultations, clinicians using AWS HealthScribe can more readily identify patterns and trends in large patient datasets, which can help improve communication between clinicians and patients. An example would be a clinician understanding common trends in their patient’s symptoms that they can then consider for new consultations.
Personalized care: Using machine learning, clinicians can tailor their care to individual patients by analyzing the specific needs and concerns of each patient. This can lead to more personalized and effective care.
Streamlined workflows: Clinicians can use machine learning to help streamline their workflows by automating tasks such as appointment scheduling and consultation summarization. This can give clinicians more time to focus on providing high-quality care to their patients. An example would be using clinician summaries together with agentic workflows to perform these tasks on a routine basis.

Architecture diagram

In the architecture diagram we present for this demo, two user workflows are shown. To kick off the process, a clinician uploads the recording of a consultation to Amazon Simple Storage Service (Amazon S3). This audio file is then ingested by AWS HealthScribe and used to analyze consultation conversations. AWS HealthScribe then outputs two files, which are also stored on Amazon S3. In the second workflow, an authenticated user logs in via AWS IAM Identity Center to an Amazon Q web front end hosted by Amazon Q Business. In this scenario, Amazon Q Business is given the output Amazon S3 bucket as the data source for use in its web app.
Prerequisites

AWS IAM Identity Center will be used as the SAML 2.0-compliant identity provider (IdP). You’ll need to enable an IAM Identity Center instance. Under this instance, be sure to provision a user with a valid email address because this will be the user you will use to sign in to Amazon Q Business. For more details, see Configure user access with the default IAM Identity Center directory.
Amazon Simple Storage Service (Amazon S3) buckets that will be the input and output buckets for the clinician-patient conversations and AWS HealthScribe.

Implementation
To start using AWS HealthScribe you must first start a transcription job that takes a source audio file and outputs summary and transcription JSON files with the analyzed conversation. You’ll then connect these output files to Amazon Q.
Creating the AWS HealthScribe job

In the AWS HealthScribe console, choose Transcription jobs in the navigation pane, and then choose Create job to get started.
Enter a name for the job—in this example, we use FatigueConsult—and select the S3 bucket where the audio file of the clinician-patient conversation is stored.
Next, use the S3 URI search field to find and point the transcription job to the Amazon S3 bucket you want the output files to be saved to. Maintain the default options for audio settings, customization, and content removal.
Create a new AWS Identity and Access Management (IAM) role for AWS HealthScribe to use for access to the S3 input and output buckets by choosing Create an IAM role. In our example, we entered HealthScribeRole as the Role name. To complete the job creation, choose Create job.
This will take a few minutes to finish. When it’s complete, you will see the status change from In Progress to Complete and can inspect the results by selecting the job name.
AWS HealthScribe will create two files: a word-for-word transcript of the conversation with the suffix /transcript.json and a summary of the conversation with the suffix /summary.json. This summary uses the underlying power of generative AI to highlight key topics in the conversation, extract medical terminology, and more.
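If you prefer to script this step instead of using the console, the following Boto3 sketch shows roughly how an equivalent HealthScribe job could be started; the job name, S3 URIs, and role ARN are placeholders.

import boto3

transcribe = boto3.client("transcribe")

transcribe.start_medical_scribe_job(
    MedicalScribeJobName="FatigueConsult",
    Media={"MediaFileUri": "s3://my-input-bucket/consultations/fatigue-consult.wav"},
    OutputBucketName="my-output-bucket",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/HealthScribeRole",
    Settings={
        "ShowSpeakerLabels": True,
        "MaxSpeakerLabels": 2,  # clinician and patient
    },
)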

In this workflow, AWS HealthScribe analyzes the patient-clinician conversation audio to:

Transcribe the consultation
Identify speaker roles (for example, clinician and patient)
Segment the transcript (for example, small talk, visit flow management, assessment, and treatment plan)
Extract medical terms (for example, medication name and medical condition name)
Summarize notes for key sections of the clinical document (for example, history of present illness and treatment plan)
Create evidence mapping (linking every sentence in the AI-generated note with corresponding transcript dialogues).

Connecting an AWS HealthScribe job to Amazon Q
To use Amazon Q with the summarized notes and transcripts from AWS HealthScribe, we need to first create an Amazon Q Business application and set the data source as the S3 bucket where the output files were stored in the HealthScribe jobs workflow. This allows Amazon Q to index the files and give users the ability to ask questions of the data.

In the Amazon Q Business console, choose Get Started, then choose Create Application.
Enter a name for your application and select Create and use a new service-linked role (SLR).
Choose Create when you’re ready to select a data source.
In the Add data source pane select Amazon S3.
To configure the S3 bucket with Amazon Q, enter a name for the data source. In our example we use my-s3-bucket.
Next, locate the S3 bucket with the JSON outputs from HealthScribe using the Browse S3 button. Select Full sync for the sync mode and select a cadence of your preference. Once you complete these steps, Amazon Q Business will run a full sync of the objects in your S3 bucket and be ready for use.
In the main applications dashboard, navigate to the URL under Web experience URL. This is how you will access the Amazon Q web front end to interact with the assistant.

After a user signs in to the web experience, they can start asking questions directly in the chat box as shown in the sample frontend that follows.
Sample frontend workflow
With the AWS HealthScribe results integrated into Amazon Q Business, users can go to the web experience to gain insights from their patient conversations. For example, you can use Q to determine information such as trends in patient symptoms, which medications patients are taking, and so on, as shown in the following figures.
The workflow starts with a question and answer about issues patients had, as shown in the following figure. In this example, a clinician asks what the symptoms were of patients who complained of stomach pain. Q responds with common symptoms, like bloating and bowel problems, from the data it has access to. The answers generated cite the source files from Amazon S3 that led to its summary and can be inspected by choosing Sources.
In the following example, a clinician asks what medications patients with knee pain are taking. Using our sample data of various consultations for knee pain, Q tells us patients are taking over-the-counter ibuprofen, but that it often does not provide patients relief.
This application can also help clinicians understand common trends in their patient data, such as asking what the common symptoms are for patients with chest pain.
In the final example for this post, a clinician asks Q if there are common symptoms for patients complaining of knee and elbow pain. Q responds that both sets of patients describe their pain being exacerbated by movement, but that it cannot conclusively point to any common symptoms across both consultation types. In this case Amazon Q is correctly using source data to prevent a hallucination from occurring.
Considerations
The UI for Amazon Q has limited customization. At the time of writing this post, the Amazon Q frontend cannot be embedded in other tools. Supported customization of the web experience includes the addition of a title and subtitle, adding a welcome message, and displaying sample prompts. For updates on web experience customizations, see Customizing an Amazon Q Business web experience. If this kind of customization is critical to your application and business needs, you can explore custom large language model chatbot designs using Amazon Bedrock or Amazon SageMaker.
AWS HealthScribe uses conversational and generative AI to transcribe patient-clinician conversations and generate clinical notes. The results produced by AWS HealthScribe are probabilistic and might not always be accurate because of various factors, including audio quality, background noise, speaker clarity, the complexity of medical terminology, and context-specific language nuances. AWS HealthScribe is designed to be used in an assistive role for clinicians and medical scribes rather than as a substitute for their clinical expertise. As such, AWS HealthScribe output should not be employed to fully automate clinical documentation workflows, but rather to provide additional assistance to clinicians or medical scribes in their documentation process. Make sure that your application provides a workflow for reviewing the clinical notes produced by AWS HealthScribe and establishes the expectation that human review is needed before clinical notes are finalized.
Amazon Q Business uses machine learning models that generate predictions based on patterns in data, and generate insights and recommendations from your content. Outputs are probabilistic and should be evaluated for accuracy as appropriate for your use case, including by employing human review of the output. You and your users are responsible for all decisions made, advice given, actions taken, and failures to take action based on your use of these features.
This proof-of-concept can be extrapolated to create a patient-facing application as well, with the notion that a patient can review their own conversations with physicians and be given access to their medical records and consultation notes in a way that makes it easy for them to ask questions of the trends and data for their own medical history.
AWS HealthScribe is only available for English-US language at this time in the US East (N. Virginia) Region. Amazon Q Business is only available in US East (N. Virginia) and US West (Oregon).
Clean up
To ensure that you don’t continue to accrue charges from this solution, you must complete the following clean-up steps.
AWS HealthScribe
Navigate to the AWS HealthScribe console and choose Transcription jobs. Select the HealthScribe jobs you want to clean up and choose Delete at the top right corner of the console page.
Amazon S3
To clean up your Amazon S3 resources, navigate to the Amazon S3 console and choose the buckets that you used or created while going through this post. To empty the buckets, follow the instructions for Emptying a bucket. After you empty a bucket, you can delete the bucket itself.
Amazon Q Business
To delete your Amazon Q Business application, follow the instructions on Managing Amazon Q Business applications.
Conclusion
In this post, we discussed how you can use AWS HealthScribe with Amazon Q Business to create a chatbot to quickly gain insights into patient-clinician conversations. To learn more, reach out to your AWS account team or check out the links that follow.

AWS HealthScribe
AWS HealthScribe documentation
Amazon Q Business
What is Amazon Q Business?

About the Authors
Laura Salinas is a Startup Solution Architect supporting customers whose core business involves machine learning. She is passionate about guiding her customers on their cloud journey and finding solutions that help them innovate. Outside of work she loves boxing, watching the latest movie at the theater and playing competitive dodgeball.
Tiffany Chen is a Solutions Architect on the CSC team at AWS. She has supported AWS customers with their deployment workloads and currently works with Enterprise customers to build well-architected and cost-optimized solutions. In her spare time, she enjoys traveling, gardening, baking, and watching basketball.
Art Tuazon is a Partner Solutions Architect focused on enabling AWS Partners through technical best practices and is passionate about helping customers build on AWS. In her free time, she enjoys running and cooking.
Winnie Chen is a Solutions Architect at AWS supporting enterprise greenfield customers, focusing on the financial services industry. She has helped customers migrate and build their infrastructure on AWS. In her free time, she enjoys traveling and spending time outdoors through activities like hiking, biking and rock climbing.

Use Amazon SageMaker Studio with a custom file system in Amazon EFS

Amazon SageMaker Studio is the latest web-based experience for running end-to-end machine learning (ML) workflows. SageMaker Studio offers a suite of integrated development environments (IDEs), which include JupyterLab, Code Editor, and RStudio. Data scientists and ML engineers can spin up SageMaker Studio private and shared spaces, which are used to manage the storage and resource needs of the JupyterLab and Code Editor applications, enable stopping the applications when not in use to save on compute costs, and resume the work from where they stopped.
The storage resources for SageMaker Studio spaces are Amazon Elastic Block Store (Amazon EBS) volumes, which offer low-latency access to user data like notebooks, sample data, or Python/Conda virtual environments. However, there are several scenarios where using a distributed file system shared across private JupyterLab and Code Editor spaces is convenient, which is enabled by configuring an Amazon Elastic File System (Amazon EFS) file system in SageMaker Studio. Amazon EFS provides a scalable fully managed elastic NFS file system for AWS compute instances.
Amazon SageMaker supports automatically mounting a folder in an EFS volume for each user in a domain. Using this folder, users can share data between their own private spaces. However, users can’t share data with other users in the domain; they only have access to their own folder user-default-efs in the $HOME directory of the SageMaker Studio application.
In this post, we explore three distinct scenarios that demonstrate the versatility of integrating custom Amazon EFS with SageMaker Studio.
For further information on configuring Amazon EFS in SageMaker Studio, refer to Attaching a custom file system to a domain or user profile.
Solution overview
In the first scenario, an AWS infrastructure admin wants to set up an EFS file system that can be shared across the private spaces of a given user profile in SageMaker Studio. This means that each user within the domain will have their own private space on the EFS file system, allowing them to store and access their own data and files. The automation described in this post enables new team members joining the data science team to quickly set up their private space on the EFS file system and access the necessary resources to start contributing to the ongoing project.
The following diagram illustrates this architecture.

This scenario offers the following benefits:

Individual data storage and analysis – Users can store their personal datasets, models, and other files in their private spaces, allowing them to work on their own projects independently. Segregation is made by their user profile.
Centralized data management – The administrator can manage the EFS file system centrally, maintaining data security, backup, and direct access for all users. By setting up an EFS file system with a private space, users can effortlessly track and maintain their work.
Cross-instance file sharing – Users can access their files from multiple SageMaker Studio spaces, because the EFS file system provides a persistent storage solution.

The second scenario is related to the creation of a single EFS directory that is shared across all the spaces of a given SageMaker Studio domain. This means that all users within the domain can access and use the same shared directory on the EFS file system, allowing for better collaboration and centralized data management (for example, to share common artifacts). This is a more generic use case, because there is no specific segregated folder for each user profile.
The following diagram illustrates this architecture.

This scenario offers the following benefits:

Shared project directories – Suppose the data science team is working on a large-scale project that requires collaboration among multiple team members. By setting up a shared EFS directory at project level, the team can collaborate on the same projects by accessing and working on files in the shared directory. The data science team can, for example, use the shared EFS directory to store their Jupyter notebooks, analysis scripts, and other project-related files.
Simplified file management – Users don’t need to manage their own private file storage, because they can rely on the shared directory for their file-related needs.
Improved data governance and security – The shared EFS directory, being centrally managed by the AWS infrastructure admin, can provide improved data governance and security. The admin can implement access controls and other data management policies to maintain the integrity and security of the shared resources.

The third scenario explores the configuration of an EFS file system that can be shared across multiple SageMaker Studio domains within the same VPC. This allows users from different domains to access and work with the same set of files and data, enabling cross-domain collaboration and centralized data management.
The following diagram illustrates this architecture.

This scenario offers the following benefits:

Enterprise-level data science collaboration – Imagine a large organization with multiple data science teams working on various projects across different departments or business units. By setting up a shared EFS file system accessible across the organization’s SageMaker Studio domains, these teams can collaborate on cross-functional projects, share artifacts, and use a centralized data repository for their work.
Shared infrastructure and resources – The EFS file system can be used as a shared resource across multiple SageMaker Studio domains, promoting efficiency and cost-effectiveness.
Scalable data storage – As the number of users or domains increases, the EFS file system automatically scales to accommodate the growing storage and access requirements.
Data governance – The shared EFS file system, being managed centrally, can be subject to stricter data governance policies, access controls, and compliance requirements. This can help the organization meet regulatory and security standards while still enabling cross-domain collaboration and data sharing.

Prerequisites
This post provides an AWS CloudFormation template to deploy the main resources for the solution. In addition to this, the solution expects that the AWS account in which the template is deployed already has the following configuration and resources:

You should have a SageMaker Studio domain. Refer to Quick setup to Amazon SageMaker for instructions to set up a domain with default settings.
You should have an AWS CloudTrail log file that logs the SageMaker API CreateUserProfile. Refer to Creating a trail for your AWS account for additional information.
The CloudFormation resources are deployed in a virtual private cloud (VPC). Make sure the selected VPC allows outbound traffic through a NAT gateway and has proper routing and Amazon Simple Storage Service (Amazon S3) endpoint access, which AWS CloudFormation requires. Refer to How do I troubleshoot custom resource failures in CloudFormation? for additional information.
The CloudFormation template deploys an AWS Lambda function in a VPC. If the access to AWS services in the selected VPC is restricted using AWS PrivateLink, make sure the Lambda security group can connect to the interface VPC endpoints for SageMaker (API), Amazon EFS, and Amazon Elastic Compute Cloud (Amazon EC2). Refer to Connecting inbound interface VPC endpoints for Lambda for additional information.
You should have the necessary AWS Identity and Access Management permissions to deploy the CloudFormation template in your account.

Refer to Attaching a custom file system to a domain or user profile for additional prerequisites.
Configure an EFS directory shared across private spaces of a given user profile
In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, creating a private file system directory for each user. We can distinguish two use cases:

Create new SageMaker Studio user profiles – A new team member joins a preexisting SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces
Use preexisting SageMaker Studio user profiles – A team member is already working on a specific SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces

The solution provided in this post focuses on the first use case. We discuss how to adapt the solution for preexisting SageMaker Studio domain user profiles later in this post.
The following diagram illustrates the high-level architecture of the solution.

In this solution, we use CloudTrail, Amazon EventBridge, and Lambda to automatically create a private EFS directory when a new SageMaker Studio user profile is created. The high-level steps to set up this architecture are as follows:

Create an EventBridge rule that invokes the Lambda function when the creation of a new SageMaker user profile is logged in CloudTrail (an example event pattern is shown after this list).
Create an EFS file system with an access point for the Lambda function and with a mount target in every Availability Zone where the SageMaker Studio domain is located.
Use a Lambda function to create a private EFS directory with the required POSIX permissions for the profile. The function will also update the profile with the new file system configuration.
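As a reference for step 1, the following Boto3 sketch creates a rule whose event pattern matches the CreateUserProfile API call recorded by CloudTrail; the rule name is a placeholder, and you would still need to add the Lambda function as a target with put_targets and grant EventBridge permission to invoke it.

import json
import boto3

events = boto3.client("events")

# Match the CreateUserProfile API call recorded by CloudTrail
event_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["sagemaker.amazonaws.com"],
        "eventName": ["CreateUserProfile"],
    },
}

events.put_rule(
    Name="sagemaker-user-profile-created",  # placeholder rule name
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)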

Deploy the solution using AWS CloudFormation
To use the solution, you can deploy the infrastructure using the following CloudFormation template. This template deploys three main resources in your account: Amazon EFS resources (file system, access points, mount targets), an EventBridge rule, and a Lambda function.
Refer to Create a stack from the CloudFormation console for additional information. The input parameters for this template are:

SageMakerDomainId – The SageMaker Studio domain ID that will be associated with the EFS file system.
SageMakerStudioVpc – The VPC associated with the SageMaker Studio domain.
SageMakerStudioSubnetId – One or multiple subnets associated with the SageMaker Studio domain. The template deploys its resources in these subnets.
SageMakerStudioSecurityGroupId – The security group associated with the SageMaker Studio domain. The template configures the Lambda function with this security group.

Amazon EFS resources
After you deploy the template, navigate to the Amazon EFS console and confirm that the EFS file system has been created. The file system has a mount target in every Availability Zone that your SageMaker domain connects to.
Note that each mount target uses the EC2 security group that SageMaker created in your AWS account when you first created the domain, which allows NFS traffic on port 2049. The provided template automatically retrieves this security group when it is first deployed, using a Lambda-backed custom resource.

You can also observe that the file system has an EFS access point. This access point grants root access on the file system for the Lambda function that will create the directories for the SageMaker Studio user profiles.

EventBridge rule
The second main resource is an EventBridge rule invoked when a new SageMaker Studio user profile is created. Its target is the Lambda function that creates the folder in the EFS file system and updates the profile that was just created. The input of the Lambda function is the matched event, from which you can get the SageMaker Studio domain ID and the SageMaker user profile name.

Lambda function
Lastly, the template creates a Lambda function that creates a directory in the EFS file system with the required POSIX permissions for the user profile and updates the user profile with the new file system configuration.
At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

UID – The POSIX user ID. The default is 200001. A valid range is a minimum value of 10000 and maximum value of 4000000.
GID – The POSIX group ID. The default is 1001. A valid range is a minimum value of 1001 and maximum value of 4000000.

The Lambda function runs in the same VPC as the EFS file system, with the previously created file system and access point attached.

Adapt the solution for preexisting SageMaker Studio domain user profiles
We can reuse the previous solution for scenarios in which the domain already has user profiles created. For that, you can create an additional Lambda function in Python that lists all the user profiles for the given SageMaker Studio domain and creates a dedicated EFS directory for each user profile.
The Lambda function should be in the same VPC as the EFS file system, with the previously created file system and access point attached. You need to add the efs_id and domain_id values as environment variables for the function.
You can include the following code as part of this new Lambda function and run it manually:

import json
import subprocess
import boto3
import os

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Get EFS and Domain ID
    file_system = os.environ['efs_id']
    domain_id = os.environ['domain_id']

    # Get Domain user profiles
    list_user_profiles_response = sm_client.list_user_profiles(
        DomainIdEquals=domain_id
    )
    domain_users = list_user_profiles_response["UserProfiles"]

    # Create directories for each user
    for user in domain_users:

        user_profile_name = user["UserProfileName"]

        # Permissions
        repository = f'/mnt/efs/{user_profile_name}'
        subprocess.call(['mkdir', repository])
        subprocess.call(['chown', '200001:1001', repository])

        # Update SageMaker user
        response = sm_client.update_user_profile(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            UserSettings={
                'CustomFileSystemConfigs': [
                    {
                        'EFSFileSystemConfig': {
                            'FileSystemId': file_system,
                            'FileSystemPath': f'/{user_profile_name}'
                        }
                    }
                ]
            }
        )

Configure an EFS directory shared across all spaces of a given domain
In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, using the same file system directory for all the users.
To achieve this, in addition to the prerequisites described earlier in this post, you need to complete the following steps.
Create the EFS file system
The file system needs to be in the same VPC as the SageMaker Studio domain. Refer to Creating EFS file systems for additional information.

Add mount targets to the EFS file system
Before SageMaker Studio can access the new EFS file system, the file system must have a mount target in each of the subnets associated with the domain. For more information about assigning mount targets to subnets, see Managing mount targets. You can get the subnets associated to the domain on the SageMaker Studio console under Network. You need to create a mount target for each subnet.

Additionally, for each mount target, you must add the security group that SageMaker created in your AWS account when you created the SageMaker Studio domain. The security group name has the format security-group-for-inbound-nfs-domain-id.
The following screenshot shows an example of an EFS file system with two mount targets for a SageMaker Studio domain associated to two subnets. Note the security group associated to both mount targets.
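If you want to script this step, the following Boto3 sketch creates one mount target per domain subnet; the file system ID, subnet IDs, and security group ID are placeholders, and the security group is the SageMaker-created security-group-for-inbound-nfs-domain-id group described above.

import boto3

efs = boto3.client("efs")

file_system_id = "fs-0123456789abcdef0"              # placeholder
nfs_security_group_id = "sg-0123456789abcdef0"       # security-group-for-inbound-nfs-<domain-id>
domain_subnet_ids = ["subnet-aaaa1111", "subnet-bbbb2222"]  # subnets of the Studio domain

for subnet_id in domain_subnet_ids:
    efs.create_mount_target(
        FileSystemId=file_system_id,
        SubnetId=subnet_id,
        SecurityGroups=[nfs_security_group_id],
    )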

Create an EFS access point
The Lambda function accesses the EFS file system as root using this access point. See Creating access points for additional information.
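A minimal Boto3 sketch of creating such an access point follows; the file system ID is a placeholder, and the root POSIX user reflects the root access described above.

import boto3

efs = boto3.client("efs")

efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",   # placeholder
    PosixUser={"Uid": 0, "Gid": 0},        # root access for the Lambda function
    RootDirectory={"Path": "/"},
    Tags=[{"Key": "Name", "Value": "studio-efs-admin-access-point"}],
)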

Create a new Lambda function
Define a new Lambda function with the name LambdaManageEFSUsers. This function updates the default space settings of the SageMaker Studio domain, configuring the file system settings to use a specific EFS file system shared repository path. This configuration is automatically applied to all spaces within the domain.
The Lambda function is in the same VPC as the EFS file system, with the previously created file system and access point attached. Additionally, you need to add efs_id and domain_id as environment variables for the function.
At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

UID – The POSIX user ID. The default is 200001.
GID – The POSIX group ID. The default is 1001.

The function updates the default space settings of the SageMaker Studio domain, configuring the EFS file system to be used by all users. See the following code:

import json
import subprocess
import boto3
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    domain_id = os.environ['domain_id']

    # EFS directory name
    repository_name = 'shared_repository'
    repository = f'/mnt/efs/{repository_name}'

    # Create the new directory and set its permissions
    try:
        subprocess.call(['mkdir', '-p', repository])
        subprocess.call(['chown', '200001:1001', repository])
    except:
        print("Repository already created")

    # Update the SageMaker domain to enable access to the new directory
    response = sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            'CustomFileSystemConfigs': [
                {
                    'EFSFileSystemConfig': {
                        'FileSystemId': file_system,
                        'FileSystemPath': f'/{repository_name}'
                    }
                }
            ]
        }
    )
    logger.info(f"Updated Studio Domain {domain_id} and EFS {file_system}")
    return {
        'statusCode': 200,
        'body': json.dumps(f"Created dir and modified permissions for Studio Domain {domain_id}")
    }

The execution role of the Lambda function needs to have permissions to update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Configure an EFS directory shared across multiple domains under the same VPC
In this scenario, an administrator wants to provision an EFS file system for all users of multiple SageMaker Studio domains, using the same file system directory for all the users. The idea in this case is to assign the same EFS file system to all users of all domains that are within the same VPC. To test the solution, the account should ideally have two SageMaker Studio domains inside the same VPC and subnets.
Create the EFS file system, add mount targets, and create an access point
Complete the steps in the previous section to set up your file system, mount targets, and access point.
Create a new Lambda function
Define a Lambda function called LambdaManageEFSUsers. This function is responsible for automating the configuration of SageMaker Studio domains to use a shared EFS file system within a specific VPC. This can be useful for organizations that want to provide a centralized storage solution for their ML projects across multiple SageMaker Studio domains. See the following code:

import json
import subprocess
import boto3
import os
import sys

import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    env_vpc_id = os.environ['vpc_id']

    # Domain ID from the invoking event (informational; every domain in the VPC is updated below)
    event_domain_id = event.get("domain_id")

    # Shared EFS directory
    repository_name = 'shared_repository'
    repository = f'/mnt/efs/{repository_name}'
    domains = []

    # List all SageMaker domains in the specified VPC
    response = sm_client.list_domains()
    all_domains = response['Domains']
    for domain in all_domains:
        domain_id = domain["DomainId"]
        data = sm_client.describe_domain(DomainId=domain_id)
        domain_vpc_id = data['VpcId']
        if domain_vpc_id == env_vpc_id:
            domains.append(domain_id)

    # Create the directory and set its permissions
    try:
        subprocess.call(['mkdir', '-p', repository])
        subprocess.call(['chown', '200001:1001', repository])
    except:
        print("Repository already created")

    # Update every SageMaker domain in the VPC
    if len(domains) > 0:
        for domain_id in domains:
            response = sm_client.update_domain(
                DomainId=domain_id,
                DefaultUserSettings={
                    'CustomFileSystemConfigs': [
                        {
                            'EFSFileSystemConfig': {
                                'FileSystemId': file_system,
                                'FileSystemPath': f'/{repository_name}'
                            }
                        }
                    ]
                }
            )

        logger.info(f"Updated Studio for Domains {domains} and EFS {file_system}")
        return {
            'statusCode': 200,
            'body': json.dumps(f"Created dir and modified permissions for Domains {domains}")
        }

    else:
        return {
            'statusCode': 400,
            'body': json.dumps(f"No SageMaker domains found in the configured VPC {env_vpc_id}")
        }

The execution role of the Lambda function needs to have permissions to describe and update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeDomain",
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Clean up
To clean up the solution you implemented and avoid further costs, delete the CloudFormation stack you deployed in your AWS account. When you delete the stack, you also delete the EFS file system and its storage. For additional information, refer to Delete a stack from the CloudFormation console.
Conclusion
In this post, we have explored three scenarios demonstrating the versatility of integrating Amazon EFS with SageMaker Studio. These scenarios highlight how Amazon EFS can provide a scalable, secure, and collaborative data storage solution for data science teams.
The first scenario focused on configuring an EFS directory with private spaces for individual user profiles, allowing users to store and access their own data while the administrator manages the EFS file system centrally.
The second scenario showcased a shared EFS directory across all spaces within a SageMaker Studio domain, enabling better collaboration and centralized data management.
The third scenario explored an EFS file system shared across multiple SageMaker Studio domains, empowering enterprise-level data science collaboration and promoting efficient use of shared resources.
By implementing these Amazon EFS integration scenarios, organizations can unlock the full potential of their data science teams, improve data governance, and enhance the overall efficiency of their data-driven initiatives. The integration of Amazon EFS with SageMaker Studio provides a versatile platform for data science teams to thrive in the evolving landscape of ML and AI.

About the Authors
Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads, to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.
Itziar Molina Fernandez is an AI/ML Consultant in the AWS Professional Services team. In her role, she works with customers building large-scale machine learning platforms and generative AI use cases on AWS. In her free time, she enjoys exploring new places.
Matteo Amadei is a Data Scientist Consultant in the AWS Professional Services team. He uses his expertise in artificial intelligence and advanced analytics to extract valuable insights and drive meaningful business outcomes for customers. He has worked on a wide range of projects spanning NLP, computer vision, and generative AI. He also has experience with building end-to-end MLOps pipelines to productionize analytical models. In his free time, Matteo enjoys traveling and reading.
Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Summarize call transcriptions securely with Amazon Transcribe and Amaz …

Given the volume of meetings, interviews, and customer interactions in modern business environments, audio recordings play a crucial role in capturing valuable information. Manually transcribing and summarizing these recordings can be a time-consuming and tedious task. Fortunately, advancements in generative AI and automatic speech recognition (ASR) have paved the way for automated solutions that can streamline this process.
Customer service representatives receive a high volume of calls each day. Previously, calls were recorded and manually reviewed later for compliance, regulations, and company policies. Call recordings had to be transcribed, summarized, and then redacted for personal identifiable information (PII) before analyzing calls, resulting in delayed access to insights.
Redacting PII is a critical practice in security for several reasons. Maintaining the privacy and protection of individuals’ personal information is not only a matter of ethical responsibility, but also a legal requirement. In this post, we show you how to use Amazon Transcribe to get near real-time transcriptions of calls sent to Amazon Bedrock for summarization and sensitive data redaction. We’ll walk through an architecture that uses AWS Step Functions to orchestrate the process, providing seamless integration and efficient processing.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading model providers such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. You can use  Amazon Bedrock Guardrails to redact sensitive information such as PII found in the generated call transcription summaries. Clean, summarized transcripts are then sent to analysts. This provides quicker access to call trends while protecting customer privacy.
Solution overview
The architecture of this solution is designed to be scalable, efficient, and compliant with privacy regulations. It includes the following key components:

Recording – An audio file, such as a meeting or support call, to be transcribed and summarized
Step Functions workflow – Coordinates the transcription and summarization process
Amazon Transcribe – Converts audio recordings into text
Amazon Bedrock – Summarizes the transcription and removes PII
Amazon SNS – Delivers the summary to the designated recipient
Recipient – Receives the summarized, PII-redacted transcript

The following diagram shows the architecture overview.

The workflow orchestrated by Step Functions is as follows:

An audio recording is provided as an input to the Step Functions workflow. This could be done manually or automatically depending on the specific use case and integration requirements.
The workflow invokes Amazon Transcribe, which converts the multi-speaker audio recording into a textual, speaker-partitioned transcription. Amazon Transcribe uses advanced speech recognition algorithms and machine learning (ML) models to accurately partition speakers and transcribe the audio, handling various accents, background noise, and other challenges (a minimal API sketch follows this list).
The transcription output from Amazon Transcribe is then passed to Anthropic’s Claude 3 Haiku model on Amazon Bedrock through AWS Lambda. This model was chosen because it has relatively lower latency and cost than other models. The model first summarizes the transcript according to its summary instructions, and then the summarized output (the model response) is evaluated by Amazon Bedrock Guardrails to redact PII. To learn how it blocks harmful content, refer to How Amazon Bedrock Guardrails works. The instructions and transcript are both passed to the model as context.
The output from Amazon Bedrock is stored in Amazon Simple Storage Service (Amazon S3) and sent to the designated recipient using Amazon Simple Notification Service (Amazon SNS). Amazon SNS supports various delivery channels, including email, SMS, and mobile push notifications, making sure that the summary reaches the intended recipient in a timely and reliable manner

The recipient can then review the concise summary, quickly grasping the key points and insights from the original audio recording. Additionally, sensitive information has been redacted, maintaining privacy and compliance with relevant regulations.
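To make this step concrete, the following is a minimal sketch of how a workflow task might start the transcription job with speaker partitioning enabled, whether invoked from a Lambda function or a direct SDK integration. The bucket, key prefix, job name, and language code are assumptions for illustration; the deployed solution may structure this call differently.

import boto3

transcribe = boto3.client("transcribe")

def start_transcription(bucket: str, key: str, job_name: str) -> None:
    """Start a transcription job with speaker partitioning for a recording stored in Amazon S3."""
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        LanguageCode="en-US",                      # adjust, or use IdentifyLanguage=True instead
        OutputBucketName=bucket,                   # write the transcript back to the assets bucket
        OutputKey=f"transcripts/{job_name}.json",
        Settings={
            "ShowSpeakerLabels": True,             # partition (diarize) the speakers
            "MaxSpeakerLabels": 10,
        },
    )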
The following diagram shows the Step Functions workflow.

Prerequisites
Follow these steps before starting:

Amazon Bedrock users need to request access to models before they’re available for use; this is a one-time action. For this solution, you need to enable access to Anthropic’s Claude 3 Haiku model on Amazon Bedrock. For more information, refer to Access Amazon Bedrock foundation models. Deployment, as described in this post, is currently supported only in the US West (Oregon) us-west-2 AWS Region. You can explore other models if desired, but you might need some customization to deploy to alternative Regions with different model availability (such as us-east-1, which hosts Anthropic’s Claude 3.5 Sonnet). Make sure you consider model quality, speed, and cost tradeoffs before choosing a model.
Create a guardrail for PII redaction and configure filters to block or mask sensitive information. This option can be found on the Amazon Bedrock console on the Add sensitive information filters page when creating a guardrail. To learn how to configure filters for other use cases, refer to Remove PII from conversations by using sensitive information filters. If you prefer to script this step, see the sketch that follows this list.
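The following minimal sketch shows one way to create such a guardrail with Boto3. The guardrail name, the PII entity types, and the masking actions are illustrative assumptions; adjust them to your compliance requirements.

import boto3

bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime

# Create a guardrail that masks common PII entity types in model responses.
# The name, entity types, and actions below are illustrative.
guardrail = bedrock.create_guardrail(
    name="call-summary-pii-guardrail",
    description="Masks PII in generated call summaries",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
        ]
    },
    blockedInputMessaging="Sorry, this request cannot be processed.",
    blockedOutputsMessaging="Sorry, this response was blocked by the guardrail.",
)

# Publish a numbered version so it can be referenced later (for example, guardrailVersion="1").
bedrock.create_guardrail_version(guardrailIdentifier=guardrail["guardrailId"])
print(guardrail["guardrailId"])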

Deploy solution resources
To deploy the solution, download an AWS CloudFormation template to automatically provision the necessary resources in your AWS account. The template sets up the following components:

A Step Functions workflow
Lambda functions
An SNS topic
An S3 bucket
AWS Key Management Service (AWS KMS) keys for data encryption and decryption

By using this template, you can quickly deploy the sample solution with minimal manual configuration. The template requires the following parameters (a sketch of supplying them programmatically follows the list):

Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary.
Guardrail ID – This is the ID of your recently created guardrail, which can be found on the Amazon Bedrock console on the guardrail’s Guardrail overview page.
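If you prefer to launch the stack from a script rather than the console, the same parameters can be passed programmatically. In the following sketch, the template file name and the parameter keys are hypothetical placeholders; use the keys declared in the template you downloaded.

import boto3

cfn = boto3.client("cloudformation")

with open("summary-generator.yaml") as f:   # hypothetical template file name
    template_body = f.read()

cfn.create_stack(
    StackName="summary-generator",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates IAM roles
    Parameters=[
        # Parameter keys below are placeholders; match them to the template's declared parameters.
        {"ParameterKey": "EmailAddress", "ParameterValue": "analyst@example.com"},
        {"ParameterKey": "SummaryInstructions", "ParameterValue": "Summarize the key points and action items from this call transcript."},
        {"ParameterKey": "GuardrailID", "ParameterValue": "your-guardrail-id"},
    ],
)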

The Summary instructions are read into your Lambda function as an environment variable.

 
import os

# Read the summary instructions from the Lambda function's environment variables.
# You can pass a fallback prompt as the second argument to os.getenv if desired.
SUMMARY_INSTRUCTIONS = os.getenv('SUMMARY_INSTRUCTIONS')
 
These instructions are then included in the payload sent to Anthropic’s Claude 3 Haiku model. The following excerpt shows how the instructions and transcript are passed to the model.
 
# Create the payload to provide to the Anthropic model.
# The summary instructions and the transcript are combined into a single user message.
user_message = {"role": "user", "content": f"{SUMMARY_INSTRUCTIONS}{transcript}"}
messages = [user_message]
response = generate_message(bedrock_client, 'anthropic.claude-3-haiku-20240307-v1:0', "", messages, 1000)
 
The generate_message() function contains the invocation to Amazon Bedrock with the guardrail ID and other relevant parameters.
 
import json

def generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens):
    # Build the request body in the Anthropic Messages API format.
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }
    )
    print(f'Invoking model: {model_id}')

    # Invoke the model with the guardrail applied to the generated response.
    # BEDROCK_GUARDRAIL_ID is read from the Lambda function's environment variables.
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        guardrailIdentifier=BEDROCK_GUARDRAIL_ID,
        guardrailVersion="1",
        trace="ENABLED")
    response_body = json.loads(response.get('body').read())
    print(f'response: {response_body}')
    return response_body
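The response_body returned above follows the Anthropic Messages API format, so the guardrail-redacted summary can be extracted along these lines before it’s written to Amazon S3 and published to Amazon SNS:

# The Messages API returns a list of content blocks; the summary is in the first text block.
summary = response_body["content"][0]["text"]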

Deploy the solution
After you deploy the resources using AWS CloudFormation, complete these steps:

Add a Lambda layer.

Although AWS Lambda regularly updates the version of Boto3 it includes, at the time of writing this post the bundled version was 1.34.126, and Amazon Bedrock Guardrails support requires version 1.34.90 or later. To make sure the function uses a sufficiently recent Boto3, we’ll add a Lambda layer that provides an updated version. You can follow the official developer guide on how to add a Lambda layer.
There are different ways to create a Lambda layer. A simple method is to follow the steps outlined in Packaging the layer content, which references a sample application repo. You should be able to replace requests==2.31.0 in the requirements.txt content with boto3, which will install the latest available version, and then create the layer.
To add the layer to Lambda, make sure that the parameters specified in Creating the layer match the deployed Lambda function; that is, update compatible-architectures to x86_64.
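If you prefer to publish and attach the layer with a script instead of the console, the following sketch assumes you have already built a boto3-layer.zip containing a python/ directory with the newer Boto3; the layer name, runtime, and function name are placeholders.

import boto3

lambda_client = boto3.client("lambda")

# Publish the layer from the prebuilt zip (contains python/boto3, python/botocore, and so on).
with open("boto3-layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="updated-boto3",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.12"],      # match the deployed function's runtime
        CompatibleArchitectures=["x86_64"],
    )

# Attach the new layer version to the summarization function (name is a placeholder).
# Note: this call replaces any layers currently attached to the function.
lambda_client.update_function_configuration(
    FunctionName="summary-generator-function",
    Layers=[layer["LayerVersionArn"]],
)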

Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack.
On the AWS CloudFormation console, find the stack you just created.
On the stack’s Outputs tab, look for the value associated with AssetBucketName. It will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
On the Amazon S3 console, find your S3 assets bucket.

This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

Upload your recording to the recordings folder in Amazon S3

Uploading a recording will automatically trigger the AWS Step Functions state machine. For this example, we use a sample team meeting recording.
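You can also upload a recording programmatically. The following sketch assumes a local MP3 file and uses the AssetBucketName value from your stack’s outputs as the bucket name.

import boto3

s3 = boto3.client("s3")

# Uploading into the recordings/ prefix triggers the Step Functions state machine.
s3.upload_file(
    Filename="team-meeting.mp3",                          # local recording file
    Bucket="summary-generator-assetbucket-xxxxxxxxxxxxx", # your AssetBucketName output value
    Key="recordings/team-meeting.mp3",
)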

On the AWS Step Functions console, find the summary-generator state machine. Choose the name of the state machine run with the status Running.

Here, you can watch the progress of the state machine as it processes the recording. After it reaches its Success state, you should receive an emailed summary of the recording. Alternatively, you can navigate to the S3 assets bucket and view the transcript there in the transcripts folder.
Expand the solution
Now that you have a working solution, here are some potential ideas to customize the solution for your specific use cases:

Try altering the process to fit your available source content and desired outputs:

For situations where transcripts are available, create an alternate AWS Step Functions workflow to ingest existing text-based or PDF-based transcriptions
Instead of using Amazon SNS to notify recipients through email, you can use it to send the output to a different endpoint, such as a team collaboration site or the team’s chat channel

Try changing the summary instructions provided to Amazon Bedrock through the AWS CloudFormation stack parameter to produce outputs specific to your use case. The following are some examples:

When summarizing a company’s earnings call, you could have the model focus on potential promising opportunities, areas of concern, and things that you should continue to monitor
If you’re using the model to summarize a course lecture, it could identify upcoming assignments, summarize key concepts, list facts, and filter out small talk from the recording

For the same recording, create different summaries for different audiences:

Engineers’ summaries focus on design decisions, technical challenges, and upcoming deliverables
Project managers’ summaries focus on timelines, costs, deliverables, and action items
Project sponsors get a brief update on project status and escalations
For longer recordings, try generating summaries for different levels of interest and time commitment. For example, create a single-sentence, single-paragraph, single-page, or in-depth summary. In addition to the prompt, you might want to adjust the max_tokens value passed to the model to accommodate different content lengths (see the sketch after this list).
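As a sketch of the multi-audience idea above, and assuming the bedrock_client and transcript variables from the earlier Lambda excerpt, you can call generate_message() with different instructions and token budgets per audience. The prompts and limits below are illustrative.

# Illustrative audience-specific prompts and token budgets.
audience_prompts = {
    "engineers": ("Summarize design decisions, technical challenges, and upcoming deliverables:\n", 1000),
    "project_managers": ("Summarize timelines, costs, deliverables, and action items:\n", 700),
    "sponsors": ("Give a brief status update and list any escalations:\n", 300),
}

summaries = {}
for audience, (instructions, max_tokens) in audience_prompts.items():
    messages = [{"role": "user", "content": f"{instructions}{transcript}"}]
    response_body = generate_message(
        bedrock_client, "anthropic.claude-3-haiku-20240307-v1:0", "", messages, max_tokens
    )
    summaries[audience] = response_body["content"][0]["text"]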

Clean up
Clean up the resources you created for this solution to avoid incurring costs. You can use an AWS SDK, the AWS Command Line Interface (AWS CLI), or the console.

Delete the Amazon Bedrock guardrail and the Lambda layer you created
Delete the CloudFormation stack

To use the console, follow these steps:

On the Amazon Bedrock console, in the navigation menu, select Guardrails. Choose your guardrail, then select Delete.
On the AWS Lambda console, in the navigation menu, select Layers. Choose your layer, then select Delete.
On the AWS CloudFormation console, in the navigation menu, select Stacks. Choose the stack you created, then select Delete.

Deleting the stack won’t delete the associated S3 bucket. If you no longer require the recordings or transcripts, you can delete the bucket separately. Amazon Transcribe is designed to automatically delete transcription jobs after 90 days. However, you can opt to manually delete these jobs before the 90-day retention period expires.
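If you no longer need the recordings or transcripts, the following sketch empties and deletes the assets bucket with Boto3; substitute your AssetBucketName value, and note that this permanently deletes the bucket’s contents.

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("summary-generator-assetbucket-xxxxxxxxxxxxx")  # your AssetBucketName output value

# Delete all objects (and any object versions), then the bucket itself.
bucket.objects.all().delete()
bucket.object_versions.all().delete()
bucket.delete()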
Conclusion
As businesses turn to data as a foundation for decision-making, having the ability to efficiently extract insights from audio recordings is invaluable. By using the power of generative AI with Amazon Bedrock and Amazon Transcribe, your organization can create concise summaries of audio recordings while maintaining privacy and compliance. The proposed architecture demonstrates how AWS services can be orchestrated using AWS Step Functions to streamline and automate complex workflows, enabling organizations to focus on their core business activities.
This solution not only saves time and effort, but also makes sure that sensitive information is redacted, mitigating potential risks and promoting compliance with data protection regulations. As organizations continue to generate and process large volumes of audio data, solutions like this will become increasingly important for gaining insights, making informed decisions, and maintaining a competitive edge.

About the authors
Yash Yamsanwar is a Machine Learning Architect at Amazon Web Services (AWS). He is responsible for designing high-performance, scalable machine learning infrastructure that optimizes the full lifecycle of machine learning models, from training to deployment. Yash collaborates closely with ML research teams to push the boundaries of what is possible with LLMs and other cutting-edge machine learning technologies.
Sawyer Hirt is a Solutions Architect at AWS, specializing in AI/ML and cloud architectures, with a passion for helping businesses leverage cutting-edge technologies to overcome complex challenges. His expertise lies in designing and optimizing ML workflows, enhancing system performance, and making advanced AI solutions more accessible and cost-effective, with a particular focus on Generative AI. Outside of work, Sawyer enjoys traveling, spending time with family, and staying current with the latest developments in cloud computing and artificial intelligence.