Researchers at Stanford University Introduce ‘pyvene’: An Open-Source Python Library that Supports Intervention-Based Research on Machine Learning Models

Understanding and manipulating neural models is essential in the evolving field of AI. This necessity stems from various applications, from refining models for enhanced robustness to unraveling their decision-making processes for greater interpretability. Amidst this backdrop, a Stanford University research team has introduced “pyvene,” an open-source Python library that facilitates intricate interventions on PyTorch models. pyvene is designed to overcome the limitations posed by existing tools, which often lack flexibility, extensibility, and user-friendliness.

At the heart of pyvene’s innovation is its configuration-based approach to interventions. This method departs from traditional, code-executed interventions, offering a more intuitive and adaptable way to manipulate model states. The library handles various intervention types, including static and trainable interventions, accommodating a wide range of research needs. One of its standout features is support for complex intervention schemes, such as sequential and parallel interventions, and the ability to apply interventions at various stages of a model’s decoding process. This versatility makes pyvene an invaluable asset for generative model research, where the dynamics of output generation are of particular interest.
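pyvene’s own configuration objects are best learned from its documentation; purely to ground the idea of a “static” activation intervention, the following library-agnostic PyTorch sketch (this is not pyvene’s API, and the model and hooked layer are arbitrary choices) overwrites part of a hidden representation during a forward pass:

# Library-agnostic illustration only: a static intervention that zeroes out a
# slice of a GPT-2 block's hidden states via a plain PyTorch forward hook.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

def zero_out_first_neurons(module, inputs, output):
    # A GPT-2 block returns a tuple; the hidden states come first.
    hidden = output[0]
    hidden[..., :8] = 0.0  # intervene on the first 8 hidden dimensions
    return (hidden,) + output[1:]

# Attach the intervention to block 5 and run an intervened forward pass.
handle = model.h[5].register_forward_hook(zero_out_first_neurons)
tokens = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    intervened_states = model(**tokens).last_hidden_state
handle.remove()

pyvene expresses this kind of manipulation declaratively through intervention configurations, and layers trainable interventions and multi-pass schemes on top of it.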

Delving deeper into pyvene’s capabilities, the research demonstrates the library’s efficacy through compelling case studies focused on model interpretability. The team illustrates pyvene’s potential to uncover the mechanisms underlying model predictions by employing causal abstraction and knowledge localization techniques. This endeavor showcases the library’s utility in practical research scenarios and highlights its contribution to making AI models more transparent and understandable.

The Stanford team’s research rigorously tests pyvene across various neural architectures, illustrating its broad applicability. For instance, the library successfully facilitates interventions on models ranging from simple feed-forward networks to complex, multi-modal architectures. This adaptability is further showcased in the library’s support for interventions that involve altering activations across multiple forward passes of a model, a challenging task for many existing tools.

Performance and results derived from using pyvene are notably impressive. The library has been instrumental in identifying and manipulating specific components of neural models, thereby enabling a more nuanced understanding of model behavior. In one of the case studies, pyvene was used to localize gender in neural model representations, achieving an accuracy of 100% in gendered pronoun prediction tasks. This high level of precision underscores the library’s effectiveness in facilitating targeted interventions and extracting meaningful insights from complex models.

As the Stanford University research team continues to refine and expand pyvene’s capabilities, they underscore the library’s potential for fostering innovation in AI research. The introduction of pyvene marks a significant step in understanding and improving neural models. By offering a versatile, user-friendly tool for conducting interventions, the team addresses the limitations of existing resources and opens new pathways for exploration and discovery in artificial intelligence. As pyvene gains traction within the research community, it promises to catalyze further advancements, contributing to developing more robust, interpretable, and effective AI systems.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post Researchers at Stanford University Introduce ‘pyvene’: An Open-Source Python Library that Supports Intervention-Based Research on Machine Learning Models appeared first on MarkTechPost.

Apple Announces MM1: A Family of Multimodal LLMs Up To 30B Parameters that are SoTA in Pre-Training Metrics and Perform Competitively after Fine-Tuning

Recent research has focused on crafting advanced Multimodal Large Language Models (MLLMs) that seamlessly integrate the complexities of visual and textual data. By delving into the minutiae of architectural design, data selection, and methodological transparency, this line of work pushes the boundaries of what MLLMs can achieve and supports future explorations. It is particularly notable for its comprehensive approach to dissecting the various components that contribute to the success of these models, shedding light on the pivotal roles played by image encoders, vision-language connectors, and the strategic amalgamation of diverse data types.

The researchers at Apple built MM1, a family of cutting-edge multimodal models with up to 30 billion parameters. They have taken a path of openness and detailed documentation, providing valuable insights into constructing MLLMs. Their meticulous documentation covers everything from the choice of image encoders to the intricacies of connecting visual data with linguistic elements, offering a clear roadmap for building more effective and transparent models.

One of the study’s key revelations is the significant impact of carefully chosen pre-training data on the model’s performance. The researchers discovered that a judicious mix of image-caption pairs, interleaved image-text documents, and text-only data is essential for achieving superior results, particularly in few-shot learning scenarios. This finding highlights the importance of diversity in training data, which enables models to generalize better across different tasks and settings.

The suite of MM1 models represents a significant leap forward, capable of achieving competitive performance across a wide array of benchmarks. What sets MM1 apart is its sheer scale and its architectural innovations, including dense models and mixture-of-experts variants. These models demonstrate the effectiveness of the researchers’ approach, combining large-scale pre-training with strategic data selection to enhance the model’s learning capabilities.

Key Takeaways from the research include:

Researchers from Apple led a comprehensive study on MLLMs, focusing on architectural and data selection strategies.

Transparency and detailed documentation were prioritized to facilitate future research.

A balanced mix of diverse pre-training data was crucial for model performance.

MM1, a new family of models with up to 30 billion parameters, was introduced, showcasing superior performance across benchmarks.

The study’s findings emphasize the significance of methodological choices in advancing MLLM development.

In conclusion, this research represents a significant advancement in the field of MLLMs, offering new insights into the optimal construction of these complex models. By highlighting the importance of transparency, detailed documentation, and strategic data selection, the study paves the way for future innovations. The introduction of MM1 underscores the potential of well-designed MLLMs to set new standards in multimodal understanding. The principles and findings outlined in this study can help unlock the full potential of multimodal language models.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Apple Announces MM1: A Family of Multimodal LLMs Up To 30B Parameters that are SoTA in Pre-Training Metrics and Perform Competitively after Fine-Tuning appeared first on MarkTechPost.

Meet Rerankers: A Lightweight Python Library to Provide a Unified Way to Use Various Reranking Methods

In the world of information retrieval, one common challenge is document reranking. It’s a powerful technique but can be intimidating for those unfamiliar with it. Trying a new reranking approach often means reworking your entire pipeline, even if the goal remains the same. This complexity can discourage experimentation and innovation in the field.

Existing solutions to this problem are limited. Many reranking methods are implemented in separate libraries, each with its own quirks and dependencies. This fragmented landscape makes it difficult for users to seamlessly integrate different reranking models into their workflows.

Meet Rerankers, a lightweight library designed to address these challenges. With rerankers, users can effortlessly incorporate various reranking methods into their pipelines using a unified API. The library aims to simplify the process of trying out different reranking models, empowering users to experiment freely without fear of disrupting their existing workflows.

One of the key advantages of rerankers is its simplicity. The library provides a straightforward API with just a handful of calls to learn. Regardless of the reranking model being used, users can expect consistent input/output formats, making it easy to swap between different methods.
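As a rough usage sketch (the model names and argument spellings here follow the project’s README at the time of writing and may change between versions):

# Load a reranker and score documents against a query through the unified API.
from rerankers import Reranker

ranker = Reranker("cross-encoder")  # or e.g. "t5", "colbert", "cohere", ...

results = ranker.rank(
    query="What does the rerankers library do?",
    docs=[
        "rerankers provides a unified API over many reranking methods.",
        "Bananas are a good source of potassium.",
    ],
)
print(results)  # ranked documents with relevance scores, best first

Swapping in a different method is then mostly a matter of changing the string passed to Reranker, while the calling code stays the same.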

Despite its lightweight nature, rerankers gives access to strong reranking performance, improving the relevance and ordering of search results. Whether it wraps standard SentenceTransformer cross-encoders, T5-based pointwise rankers, or APIs like Cohere and Jina, the library consistently delivers reliable results.

In conclusion, rerankers offers a user-friendly solution to the challenges of document reranking. By providing a unified API and simplifying the integration process, it empowers users to explore and experiment with different reranking methods with ease. Rerankers’ focus on simplicity, flexibility, and performance makes it a catalyst for innovation in information retrieval.

Installation is very simple. The core package ships with just two dependencies, tqdm and pydantic, so as to avoid any conflict with your current environment. You may then install only the dependencies required by the models you want to try out:

# Core package only, will require other dependencies already installed
pip install rerankers

# All transformers-based approaches (cross-encoders, t5, colbert)
pip install "rerankers[transformers]"

# RankGPT
pip install "rerankers[gpt]"

# API-based rerankers (Cohere, Jina, soon MixedBread)
pip install "rerankers[api]"

# All of the above
pip install "rerankers[all]"

As the library’s author, Benjamin Clavié (@bclavie), put it when announcing the project on March 14, 2024: “Document reranking is powerful, but daunting to get started with. Moreover, trying a new approach requires modifying your pipeline, even though it does the same thing! Introducing rerankers: a lightweight library to provide a unified way to use various reranking methods.”

The post Meet Rerankers: A Lightweight Python Library to Provide a Unified Way to Use Various Reranking Methods appeared first on MarkTechPost.

Zhejiang University Researchers Propose Fuyou: A Low-Cost Deep Learning Training Framework that Enables Efficient 100B Huge Model Fine-Tuning on a Low-End Server with a Low-End GPU and Limited CPU Memory Capacity

The advent of large language models (LLMs) has sparked a revolution in natural language processing, captivating the world with capabilities that stem from the massive number of parameters they utilize. These LLMs, epitomized by the transformative power of dense transformer models, have not only broken records in accuracy but have also become indispensable assets in data management tasks. Recently, the size of dense transformer models has grown from 1.5B (GPT-2) to 540B (PaLM), underscoring how quickly these models have scaled.

While the potential of LLMs is undeniable, a critical challenge arises from their immense parameter counts, which overwhelm even the most powerful GPUs, currently peaking at 80GB of memory. When conducting stochastic gradient descent-based optimization, that memory is insufficient to accommodate the parameters together with their associated optimizer states. To host such a huge model, one can aggregate device memory from multiple GPUs, yet it takes 32 NVIDIA A100 GPUs to fit a model with 100 billion parameters for fine-tuning. This approach introduces prohibitive costs for most academic researchers, who rarely have the budget for many high-end GPU servers.

Researchers from Zhejiang University proposed Fuyou, a low-cost training framework that enables efficient fine-tuning of 100B-scale models on a low-end server with a low-end GPU and limited CPU memory capacity. It is implemented on PyTorch, a popular deep-learning framework. Compared with systems like ZeRO-Infinity, Fuyou can fine-tune GPT-3 175B on a consumer RTX 4090 GPU with high GPU utilization, a setting in which ZeRO-Infinity fails to fine-tune at all.

The focus lies on integrating SSD-CPU communication as a pivotal optimization dimension, strategically harmonizing computation and data swapping to unlock the full potential of GPU utilization. This endeavor unfolds through three pioneering innovations, the first of which is sketched conceptually after the list:

 A synchronous out-of-core CPU optimizer that overlaps with backward propagation to maximize GPU utilization.

A GPU-CPU-SSD fully-pipelined activation swapping mechanism to allow for a significantly larger model fine-tuning.

An automatic activation swapping management to automatically determine the optimal amount of swapping activations to minimize the epoch time.
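Purely as a conceptual sketch (this is not Fuyou’s code, and the function and variable names are invented for illustration), the first idea can be pictured in PyTorch as handing each parameter’s gradient to a CPU-side optimizer thread as soon as backward produces it, rather than waiting for the whole backward pass to finish:

# Toy illustration of overlapping CPU optimizer work with GPU backward propagation:
# per-parameter updates driven by gradient hooks and a background CPU thread.
import queue
import threading

import torch.nn as nn

def attach_overlapped_cpu_sgd(model: nn.Module, lr: float = 1e-5):
    work_q: queue.Queue = queue.Queue()
    # CPU master copy of every parameter; the background thread updates these.
    cpu_masters = {p: p.detach().cpu().clone() for p in model.parameters()}

    def cpu_worker():
        while True:
            item = work_q.get()
            if item is None:                              # sentinel: iteration done
                break
            param, grad_cpu = item
            cpu_masters[param].add_(grad_cpu, alpha=-lr)  # SGD step on the CPU
            param.data.copy_(cpu_masters[param])          # push the update back to the GPU

    worker = threading.Thread(target=cpu_worker, daemon=True)
    worker.start()

    def make_hook(p):
        def hook(grad):
            # Ship this gradient to the CPU thread while backward keeps running on the GPU.
            work_q.put((p, grad.detach().cpu()))
            return grad
        return hook

    for p in model.parameters():
        p.register_hook(make_hook(p))

    return work_q, worker

# Usage for a single iteration:
#   work_q, worker = attach_overlapped_cpu_sgd(model)
#   loss.backward(); work_q.put(None); worker.join()

In the real system, optimizer states and activations additionally spill to NVMe SSDs through the fully pipelined swapping mechanism described above, which is what allows a 175B-parameter model to be fine-tuned alongside a single commodity GPU.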

In the dynamic realm of model fine-tuning, Fuyou emerges as a powerhouse, delivering exceptional performance whether on the cutting-edge A100-80GB or the formidable RTX 4090 in a commodity server. When fine-tuning a GPT-3 175B model, Fuyou achieves 87 TFLOPS on the 4090 and 172 TFLOPS on the A100-80GB. It also reaches up to 3.47× the TFLOPS of ZeRO-Infinity when fine-tuning a GPT-3 13B model. To evaluate whether cheap SSDs improve training throughput cost-effectively, Fuyou is compared with Megatron-LM on DGX-2 nodes using tensor parallelism, with throughput measured against the total price of the GPUs and SSDs in a server; Fuyou achieves at most 1.70× the cost-effectiveness of Megatron-LM.

In conclusion, this paper proposed Fuyou, a low-cost training framework that enables efficient fine-tuning of 100B-scale models on a low-end server with a low-end GPU and limited CPU memory capacity. It is implemented on PyTorch and achieves 87 and 172 TFLOPS when fine-tuning GPT-3 175B on the RTX 4090 and A100-80GB, respectively. Besides, it reaches up to 3.42× and 6.73× the TFLOPS of ZeRO-Infinity and Colossal-AI, respectively, when fine-tuning GPT-3 13B. Also, Fuyou achieves at most 1.70× the cost-effectiveness of Megatron-LM.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post Zhejiang University Researchers Propose Fuyou: A Low-Cost Deep Learning Training Framework that Enables Efficient 100B Huge Model Fine-Tuning on a Low-End Server with a Low-End GPU and Limited CPU Memory Capacity appeared first on MarkTechPost.

Meet Ragas: A Python-based Machine Learning Framework that Helps to Evaluate Your Retrieval Augmented Generation (RAG) Pipelines

In language models, there’s a sophisticated technique known as Retrieval Augmented Generation (RAG). This approach enhances the language model’s understanding by fetching relevant information from external data sources. However, a significant challenge arises when developers try to assess how well their RAG systems perform. Without a straightforward way to measure effectiveness, it is hard to know whether the external data truly benefits the language model or merely complicates its responses.

There are tools and frameworks designed to build these advanced RAG pipelines, enabling the integration of external data into language models. These resources are invaluable for developers looking to enhance their systems, but they fall short when it comes to evaluation. Determining the quality of a language model’s output is more complex once it is augmented with external data. Existing tools primarily focus on the setup and operational aspects of RAG systems, leaving a gap in the evaluation phase.

Ragas is a machine learning framework designed to fill this gap, offering a comprehensive way to evaluate RAG pipelines. It provides developers with the latest research-based tools to assess the generated text’s quality, including how relevant and faithful the information is to the original query. By integrating Ragas into their continuous integration/continuous deployment (CI/CD) pipelines, developers can continuously monitor and ensure their RAG systems perform as expected.

Ragas showcases its capabilities through critical metrics, such as context precision, faithfulness, and answer relevancy. These metrics offer tangible insights into how well the RAG system is performing. For example, context precision measures how accurately the external data retrieved relates to the query. Faithfulness checks how closely the language model’s responses align with the truth of the retrieved data. Lastly, answer relevancy assesses how relevant the provided answers are to the original questions. These metrics provide a comprehensive overview of an RAG system’s performance.
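As a rough sketch of how such an evaluation is wired up (the metric imports and dataset column names follow the Ragas documentation at the time of writing and have shifted slightly across versions; the LLM-based metrics also expect an LLM backend, such as an OpenAI API key, to be configured; the sample data is purely illustrative):

# Evaluate a toy RAG sample with the three metrics discussed above.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

samples = {
    "question": ["Who wrote Pride and Prejudice?"],
    "contexts": [["Pride and Prejudice is an 1813 novel by Jane Austen."]],
    "answer": ["Pride and Prejudice was written by Jane Austen."],
    "ground_truth": ["Jane Austen"],
}

results = evaluate(
    Dataset.from_dict(samples),
    metrics=[context_precision, faithfulness, answer_relevancy],
)
print(results)  # per-metric scores between 0 and 1

Because each metric is a score between 0 and 1, the results are easy to track over time in the CI/CD pipelines mentioned above.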

In conclusion, Ragas is a crucial tool for developers working with Retrieval Augmented Generation systems. By addressing the previously unmet need for practical evaluation, Ragas enables developers to quantify the performance of their RAG pipelines accurately. This not only helps in refining the systems but also ensures that the integration of external data genuinely enhances the language model’s capabilities. With Ragas, developers can now navigate the complex landscape of RAG systems with a clearer understanding of their performance, leading to more informed improvements and, ultimately, more powerful and accurate language models.
The post Meet Ragas: A Python-based Machine Learning Framework that Helps to Evaluate Your Retrieval Augmented Generation (RAG) Pipelines appeared first on MarkTechPost.

Meet Motion Mamba: A Novel Machine Learning Framework Designed for Efficient and Extended Sequence Motion Generation

The quest to replicate human motion digitally has long captivated researchers, spanning applications from video games and animation to robotics. This pursuit demands an intricate understanding of the nuances that define human movement, challenging scientists to devise models that can mimic and predict complex behaviors with precision. While groundbreaking in their time, existing approaches often grapple with the limitations imposed by computational complexity and an inability to accurately capture the fluidity of human motion over extended sequences.

Recent advancements include state space models (SSMs), which have shown significant promise for motion prediction. These models, particularly the Mamba variant, manage long sequences more effectively than their predecessors without the burden of excessive computational demands. However, applying SSMs to motion generation has its challenges. The primary obstacle lies in adapting these models to fully grasp the detailed choreography of human movement, which requires both precision in moment-to-moment transitions and the capacity to maintain the integrity of motion over time.

Researchers from Monash University, The Australian National University, Mohamed bin Zayed University of Artificial Intelligence, and Carnegie Mellon University have collaboratively introduced Motion Mamba to address the discussed challenges. This model stands out for its innovative approach to motion generation. The Motion Mamba framework integrates two parts:

Hierarchical Temporal Mamba (HTM) block

Bidirectional Spatial Mamba (BSM) block

These parts are designed to navigate temporal and spatial motion data complexities. The HTM block excels in analyzing temporal aspects, employing a hierarchical scanning mechanism that discerns intricate movement patterns across time. On the other hand, the BSM block focuses on spatial data, processing information in both forward and reverse directions to ensure a comprehensive understanding of motion at any given instant.

Motion Mamba achieves up to 50% better FID (Fréchet Inception Distance) scores than existing methods. This improvement highlights its ability to generate high-quality, realistic human motion sequences. Motion Mamba’s design also allows up to four times faster processing, enabling real-time motion generation without sacrificing quality.

In conclusion,  the research can be summarized in the following points:

Exploration of SSMs in digital human motion replication highlights their efficiency and accuracy in predicting complex behaviors.

The Mamba model, a variant of SSMs, is particularly noted for its effectiveness in handling long sequences with reduced computational demands.

Motion Mamba integrates Hierarchical Temporal Mamba (HTM) and Bidirectional Spatial Mamba (BSM) blocks for improved temporal and spatial motion analysis.

Significant performance gains are observed with Motion Mamba, achieving up to 50% better FID scores and four times faster processing than existing methods.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post Meet Motion Mamba: A Novel Machine Learning Framework Designed for Efficient and Extended Sequence Motion Generation appeared first on MarkTechPost.

Transform one-on-one customer interactions: Build speech-capable order …

In today’s landscape of one-on-one customer interactions for placing orders, the prevailing practice continues to rely on human attendants, even in settings like drive-thru coffee shops and fast-food establishments. This traditional approach poses several challenges: it heavily depends on manual processes, struggles to efficiently scale with increasing customer demands, introduces the potential for human errors, and operates within specific hours of availability. Additionally, in competitive markets, businesses adhering solely to manual processes might find it challenging to deliver efficient and competitive service. Despite technological advancements, the human-centric model remains deeply ingrained in order processing, leading to these limitations.
The prospect of utilizing technology for one-on-one order processing assistance has been available for some time. However, existing solutions can often fall into two categories: rule-based systems that demand substantial time and effort for setup and upkeep, or rigid systems that lack the flexibility required for human-like interactions with customers. As a result, businesses and organizations face challenges in swiftly and efficiently implementing such solutions. Fortunately, with the advent of generative AI and large language models (LLMs), it’s now possible to create automated systems that can handle natural language efficiently, and with an accelerated on-ramping timeline.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. In addition to Amazon Bedrock, you can use other AWS services like Amazon SageMaker JumpStart and Amazon Lex to create fully automated and easily adaptable generative AI order processing agents.
In this post, we show you how to build a speech-capable order processing agent using Amazon Lex, Amazon Bedrock, and AWS Lambda.
Solution overview
The following diagram illustrates our solution architecture.

The workflow consists of the following steps:

A customer places the order using Amazon Lex.
The Amazon Lex bot interprets the customer’s intents and triggers a DialogCodeHook.
A Lambda function pulls the appropriate prompt template from the Lambda layer and formats model prompts by adding the customer input in the associated prompt template.
The RequestValidation prompt verifies the order against the menu items, lets the customer know via Amazon Lex if they ask for something that isn’t on the menu, and provides recommendations. The prompt also performs a preliminary validation for order completeness.
The ObjectCreator prompt converts the natural language requests into a data structure (JSON format).
The customer validator Lambda function verifies the required attributes for the order and confirms if all necessary information is present to process the order.
A customer Lambda function takes the data structure as an input for processing the order and passes the order total back to the orchestrating Lambda function.
The orchestrating Lambda function calls the Amazon Bedrock LLM endpoint to generate a final order summary including the order total from the customer database system (for example, Amazon DynamoDB).
The order summary is communicated back to the customer via Amazon Lex. After the customer confirms the order, the order will be processed.

Prerequisites
This post assumes that you have an active AWS account and familiarity with the following concepts and services:

Generative AI
Amazon Bedrock
Anthropic Claude V2
Amazon DynamoDB
AWS Lambda
Amazon Lex
Amazon Simple Storage Service (Amazon S3)

Also, in order to access Amazon Bedrock from the Lambda functions, you need to make sure the Lambda runtime has the following libraries:

boto3>=1.28.57
awscli>=1.29.57
botocore>=1.31.57

This can be done with a Lambda layer or by using a specific AMI with the required libraries.
Furthermore, these libraries are required when calling the Amazon Bedrock API from Amazon SageMaker Studio. This can be done by running a cell with the following code:

%pip install --no-build-isolation --force-reinstall \
"boto3>=1.28.57" \
"awscli>=1.29.57" \
"botocore>=1.31.57"

Finally, you create the following policy and later attach it to any role accessing Amazon Bedrock:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": "bedrock:*",
            "Resource": "*"
        }
    ]
}

Create a DynamoDB table
In our specific scenario, we’ve created a DynamoDB table as our customer database system, but you could also use Amazon Relational Database Service (Amazon RDS). Complete the following steps to provision your DynamoDB table, or customize the settings as needed for your use case; a programmatic Boto3 alternative is shown after the steps:

On the DynamoDB console, choose Tables in the navigation pane.
Choose Create table.

For Table name, enter a name (for example, ItemDetails).
For Partition key, enter a key (for this post, we use Item).
For Sort key, enter a key (for this post, we use Size).
Choose Create table.
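If you prefer to provision the table programmatically instead of using the console, an equivalent Boto3 call (using on-demand billing here, which you can swap for provisioned capacity) looks like the following:

# Create the same table, partition key, and sort key as in the console steps above.
import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='ItemDetails',
    KeySchema=[
        {'AttributeName': 'Item', 'KeyType': 'HASH'},   # partition key
        {'AttributeName': 'Size', 'KeyType': 'RANGE'},  # sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'Item', 'AttributeType': 'S'},
        {'AttributeName': 'Size', 'AttributeType': 'S'},
    ],
    BillingMode='PAY_PER_REQUEST',
)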

Now you can load the data into the DynamoDB table. For this post, we use a CSV file. You can load the data to the DynamoDB table using Python code in a SageMaker notebook.
First, we need to set up a profile named dev.

Open a new terminal in SageMaker Studio and run the following command:

aws configure --profile dev

This command will prompt you to enter your AWS access key ID, secret access key, default AWS Region, and output format.

Return to the SageMaker notebook and write a Python code to set up a connection to DynamoDB using the Boto3 library in Python. This code snippet creates a session using a specific AWS profile named dev and then creates a DynamoDB client using that session. The following is the code sample to load the data:

%pip install boto3
import boto3
import csv

# Create a session using a profile named 'dev'
session = boto3.Session(profile_name='dev')

# Create a DynamoDB resource using the session
dynamodb = session.resource('dynamodb')

# Specify your DynamoDB table name
table_name = 'your_table_name'
table = dynamodb.Table(table_name)

# Specify the path to your CSV file
csv_file_path = 'path/to/your/file.csv'

# Read CSV file and put items into DynamoDB
with open(csv_file_path, 'r', encoding='utf-8-sig') as csvfile:
    csvreader = csv.reader(csvfile)

    # Skip the header row
    next(csvreader, None)

    for row in csvreader:
        # Extract values from the CSV row
        item = {
            'Item': row[0],  # Adjust the index based on your CSV structure
            'Size': row[1],
            'Price': row[2]
        }

        # Put item into DynamoDB
        response = table.put_item(Item=item)

        print(f"Item added: {response}")

print(f"CSV data has been loaded into the DynamoDB table: {table_name}")

Alternatively, you can use NoSQL Workbench or other tools to quickly load the data to your DynamoDB table.
The following is a screenshot after the sample data is inserted into the table.
Create templates in a SageMaker notebook using the Amazon Bedrock invocation API
To create our prompt template for this use case, we use Amazon Bedrock. You can access Amazon Bedrock from the AWS Management Console and via API invocations. In our case, we access Amazon Bedrock via API from the convenience of a SageMaker Studio notebook to create not only our prompt template, but our complete API invocation code that we can later use on our Lambda function.

On the SageMaker console, access an existing SageMaker Studio domain or create a new one to access Amazon Bedrock from a SageMaker notebook.

After you create the SageMaker domain and user, choose the user and choose Launch and Studio. This will open a JupyterLab environment.
When the JupyterLab environment is ready, open a new notebook and begin importing the necessary libraries.

There are many FMs available via the Amazon Bedrock Python SDK. In this case, we use Claude V2, a powerful foundational model developed by Anthropic.
The order processing agent needs a few different templates. This can change depending on the use case, but we have designed a general workflow that can apply to multiple settings. For this use case, the Amazon Bedrock LLM template will accomplish the following:

Validate the customer intent
Validate the request
Create the order data structure
Pass a summary of the order to the customer

To invoke the model, create a bedrock-runtime object from Boto3.

#Model api request parameters
modelId = 'anthropic.claude-v2' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'

import boto3
import json
bedrock = boto3.client(service_name='bedrock-runtime')

Let’s start by working on the intent validator prompt template. This is an iterative process, but thanks to Anthropic’s prompt engineering guide, you can quickly create a prompt that can accomplish the task.

Create the first prompt template along with a utility function that will help prepare the body for the API invocations.
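The utility function itself isn’t reproduced here, but a minimal sketch of what it could look like follows. The name prompt_bedrock matches the call sites in the orchestration Lambda function later in this post; the body is an assumption and reuses the bedrock client and request parameters defined above:

def prompt_bedrock(formatted_template):
    # The templates below are stored as JSON-serialized strings, so parse first.
    body = json.loads(formatted_template)
    response = bedrock.invoke_model(
        body=json.dumps(body),
        modelId=modelId,
        accept=accept,
        contentType=contentType,
    )
    response_body = json.loads(response['body'].read())
    return response_body['completion']  # Claude v2 returns the generated text here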

The following is the code for prompt_template_intent_validator.txt:

“{“prompt”: “Human: I will give you some instructions to complete my request.\n<instructions>Given the Conversation between Human and Assistant, you need to identify the intent that the human wants to accomplish and respond appropriately. The valid intents are: Greeting,Place Order, Complain, Speak to Someone. Always put your response to the Human within the Response tags. Also add an XML tag to your output identifying the human intent.\nHere are some examples:\n<example><Conversation> H: hi there.\n\nA: Hi, how can I help you today?\n\nH: Yes. I would like a medium mocha please</Conversation>\n\nA:<intent>Place Order</intent><Response>\nGot it.</Response></example>\n<example><Conversation> H: hello\n\nA: Hi, how can I help you today?\n\nH: my coffee does not taste well can you please re-make it?</Conversation>\n\nA:<intent>Complain</intent><Response>\nOh, I am sorry to hear that. Let me get someone to help you.</Response></example>\n<example><Conversation> H: hi\n\nA: Hi, how can I help you today?\n\nH: I would like to speak to someone else please</Conversation>\n\nA:<intent>Speak to Someone</intent><Response>\nSure, let me get someone to help you.</Response></example>\n<example><Conversation> H: howdy\n\nA: Hi, how can I help you today?\n\nH:can I get a large americano with sugar and 2 mochas with no whipped cream</Conversation>\n\nA:<intent>Place Order</intent><Response>\nSure thing! Please give me a moment.</Response></example>\n<example><Conversation> H: hi\n\n</Conversation>\n\nA:<intent>Greeting</intent><Response>\nHi there, how can I help you today?</Response></example>\n</instructions>\n\nPlease complete this request according to the instructions and examples provided above:<request><Conversation>REPLACEME</Conversation></request>\n\nAssistant:\n”, “max_tokens_to_sample”: 250, “temperature”: 1, “top_k”: 250, “top_p”: 0.75, “stop_sequences”: [“\n\nHuman:”, “\n\nhuman:”, “\n\nCustomer:”, “\n\ncustomer:”]}”

Save this template into a file in order to upload to Amazon S3 and call from the Lambda function when needed. Save the templates as JSON serialized strings in a text file. The previous screenshot shows the code sample to accomplish this as well.
Repeat the same steps with the other templates.

The following are some screenshots of the other templates and the results when calling Amazon Bedrock with some of them.
The following is the code for prompt_template_request_validator.txt:

“{“prompt”: “Human: I will give you some instructions to complete my request.\n<instructions>Given the context do the following steps: 1. verify that the items in the input are valid. If customer provided an invalid item, recommend replacing it with a valid one. 2. verify that the customer has provided all the information marked as required. If the customer missed a required information, ask the customer for that information. 3. When the order is complete, provide a summary of the order and ask for confirmation always using this phrase: ‘is this correct?’ 4. If the customer confirms the order, Do not ask for confirmation again, just say the phrase inside the brackets [Great, Give me a moment while I try to process your order]</instructions>\n<context>\nThe VALID MENU ITEMS are: [latte, frappe, mocha, espresso, cappuccino, romano, americano].\nThe VALID OPTIONS are: [splenda, stevia, raw sugar, honey, whipped cream, sugar, oat milk, soy milk, regular milk, skimmed milk, whole milk, 2 percent milk, almond milk].\nThe required information is: size. Size can be: small, medium, large.\nHere are some examples: <example>H: I would like a medium latte with 1 Splenda and a small romano with no sugar please.\n\nA: <Validation>:\nThe Human is ordering a medium latte with one splenda. Latte is a valid menu item and splenda is a valid option. The Human is also ordering a small romano with no sugar. Romano is a valid menu item.</Validation>\n<Response>\nOk, I got: \n\t-Medium Latte with 1 Splenda and.\n\t-Small Romano with no Sugar.\nIs this correct?</Response>\n\nH: yep.\n\nA:\n<Response>\nGreat, Give me a moment while I try to process your order</example>\n\n<example>H: I would like a cappuccino and a mocha please.\n\nA: <Validation>:\nThe Human is ordering a cappuccino and a mocha. Both are valid menu items. The Human did not provide the size for the cappuccino. The human did not provide the size for the mocha. I will ask the Human for the required missing information.</Validation>\n<Response>\nSure thing, but can you please let me know the size for the Cappuccino and the size for the Mocha? We have Small, Medium, or Large.</Response></example>\n\n<example>H: I would like a small cappuccino and a large lemonade please.\n\nA: <Validation>:\nThe Human is ordering a small cappuccino and a large lemonade. Cappuccino is a valid menu item. Lemonade is not a valid menu item. I will suggest the Human a replacement from our valid menu items.</Validation>\n<Response>\nSorry, we don’t have Lemonades, would you like to order something else instead? Perhaps a Frappe or a Latte?</Response></example>\n\n<example>H: Can I get a medium frappuccino with sugar please?\n\nA: <Validation>:\n The Human is ordering a Frappuccino. Frappuccino is not a valid menu item. I will suggest a replacement from the valid menu items in my context.</Validation>\n<Response>\nI am so sorry, but Frappuccino is not in our menu, do you want a frappe or a cappuccino instead? perhaps something else?</Response></example>\n\n<example>H: I want two large americanos and a small latte please.\n\nA: <Validation>:\n The Human is ordering 2 Large Americanos, and a Small Latte. Americano is a valid menu item. 
Latte is a valid menu item.</Validation>\n<Response>\nOk, I got: \n\t-2 Large Americanos and.\n\t-Small Latte.\nIs this correct?</Response>\n\nH: looks correct, yes.\n\nA:\n<Response>\nGreat, Give me a moment while I try to process your order.</Response></example>\n\n</Context>\n\nPlease complete this request according to the instructions and examples provided above:<request>REPLACEME</request>\n\nAssistant:\n”, “max_tokens_to_sample”: 250, “temperature”: 0.3, “top_k”: 250, “top_p”: 0.75, “stop_sequences”: [“\n\nHuman:”, “\n\nhuman:”, “\n\nCustomer:”, “\n\ncustomer:”]}”

The following is our response from Amazon Bedrock using this template.
The following is the code for prompt_template_object_creator.txt:

“{“prompt”: “Human: I will give you some instructions to complete my request.\n<instructions>Given the Conversation between Human and Assistant, you need to create a json object in Response with the appropriate attributes.\nHere are some examples:\n<example><Conversation> H: I want a latte.\n\nA:\nCan I have the size?\n\nH: Medium.\n\nA: So, a medium latte.\nIs this Correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\”1\”:{\”item\”:\”latte\”,\”size\”:\”medium\”,\”addOns\”:[]}}</Response></example>\n<example><Conversation> H: I want a large frappe and 2 small americanos with sugar.\n\nA: Okay, let me confirm:\n\n1 large frappe\n\n2 small americanos with sugar\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\”1\”:{\”item\”:\”frappe\”,\”size\”:\”large\”,\”addOns\”:[]},\”2\”:{\”item\”:\”americano\”,\”size\”:\”small\”,\”addOns\”:[\”sugar\”]},\”3\”:{\”item\”:\”americano\”,\”size\”:\”small\”,\”addOns\”:[\”sugar\”]}}</Response>\n</example>\n<example><Conversation> H: I want a medium americano.\n\nA: Okay, let me confirm:\n\n1 medium americano\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\”1\”:{\”item\”:\”americano\”,\”size\”:\”medium\”,\”addOns\”:[]}}</Response></example>\n<example><Conversation> H: I want a large latte with oatmilk.\n\nA: Okay, let me confirm:\n\nLarge latte with oatmilk\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\”1\”:{\”item\”:\”latte\”,\”size\”:\”large\”,\”addOns\”:[\”oatmilk\”]}}</Response></example>\n<example><Conversation> H: I want a small mocha with no whipped cream please.\n\nA: Okay, let me confirm:\n\nSmall mocha with no whipped cream\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\”1\”:{\”item\”:\”mocha\”,\”size\”:\”small\”,\”addOns\”:[\”no whipped cream\”]}}</Response>\n\n</example></instructions>\n\nPlease complete this request according to the instructions and examples provided above:<request><Conversation>REPLACEME</Conversation></request>\n\nAssistant:\n”, “max_tokens_to_sample”: 250, “temperature”: 0.3, “top_k”: 250, “top_p”: 0.75, “stop_sequences”: [“\n\nHuman:”, “\n\nhuman:”, “\n\nCustomer:”, “\n\ncustomer:”]}”

The following is the code for prompt_template_order_summary.txt:

“{“prompt”: “Human: I will give you some instructions to complete my request.\n<instructions>Given the Conversation between Human and Assistant, you need to create a summary of the order with bullet points and include the order total.\nHere are some examples:\n<example><Conversation> H: I want a large frappe and 2 small americanos with sugar.\n\nA: Okay, let me confirm:\n\n1 large frappe\n\n2 small americanos with sugar\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>10.50</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\n1 large frappe\n\n2 small americanos with sugar.\nYour Order total is $10.50</Response></example>\n<example><Conversation> H: I want a medium americano.\n\nA: Okay, let me confirm:\n\n1 medium americano\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>3.50</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\n1 medium americano.\nYour Order total is $3.50</Response></example>\n<example><Conversation> H: I want a large latte with oat milk.\n\nA: Okay, let me confirm:\n\nLarge latte with oat milk\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>6.75</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\nLarge latte with oat milk.\nYour Order total is $6.75</Response></example>\n<example><Conversation> H: I want a small mocha with no whipped cream please.\n\nA: Okay, let me confirm:\n\nSmall mocha with no whipped cream\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>4.25</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\nSmall mocha with no whipped cream.\nYour Order total is $6.75</Response>\n\n</example>\n</instructions>\n\nPlease complete this request according to the instructions and examples provided above:<request><Conversation>REPLACEME</Conversation>\n\n<OrderTotal>REPLACETOTAL</OrderTotal></request>\n\nAssistant:\n”, “max_tokens_to_sample”: 250, “temperature”: 0.3, “top_k”: 250, “top_p”: 0.75, “stop_sequences”: [“\n\nHuman:”, “\n\nhuman:”, “\n\nCustomer:”, “\n\ncustomer:”, “[Conversation]”]}”

As you can see, we have used our prompt templates to validate menu items, identify missing required information, create a data structure, and summarize the order. The foundational models available on Amazon Bedrock are very powerful, so you could accomplish even more tasks via these templates.
You have completed engineering the prompts and saved the templates to text files. You can now begin creating the Amazon Lex bot and the associated Lambda functions.
Create a Lambda layer with the prompt templates
Complete the following steps to create your Lambda layer:

In SageMaker Studio, create a new folder with a subfolder named python.
Copy your prompt files to the python folder.

You can add the ZIP library to your notebook instance by running the following command.

!conda install -y -c conda-forge zip

Now, run the following command to create the ZIP file for uploading to the Lambda layer.

!zip -r prompt_templates_layer.zip prompt_templates_layer/.

After you create the ZIP file, you can download the file. Go to Lambda, create a new layer by uploading the file directly or by uploading to Amazon S3 first.
Then attach this new layer to the orchestration Lambda function.

Now your prompt template files are locally stored in your Lambda runtime environment. This will speed up the process during your bot runs.
Create a Lambda layer with the required libraries
Complete the following steps to create your Lambda layer with the required libraries:

Open an AWS Cloud9 instance environment and create a folder with a subfolder called python.
Open a terminal inside the python folder.
Run the following commands from the terminal:

pip install "boto3>=1.28.57" -t .
pip install "awscli>=1.29.57" -t .
pip install "botocore>=1.31.57" -t .

Run cd .. and position yourself inside your new folder where you also have the python subfolder.
Run the following command:

zip -r lambda-layer.zip .

After you create the ZIP file, you can download the file. Go to Lambda, create a new layer by uploading the file directly or by uploading to Amazon S3 first.
Then attach this new layer to the orchestration Lambda function.

Create the bot in Amazon Lex v2
For this use case, we build an Amazon Lex bot that can provide an input/output interface for the architecture in order to call Amazon Bedrock using voice or text from any interface. Because the LLM will handle the conversation piece of this order processing agent, and Lambda will orchestrate the workflow, you can create a bot with three intents and no slots.

On the Amazon Lex console, create a new bot with the method Create a blank bot.

Now you can add an intent with any appropriate initial utterance for the end-users to start the conversation with the bot. We use simple greetings and add an initial bot response so end-users can provide their requests. When creating the bot, make sure to use a Lambda code hook with the intents; this will trigger a Lambda function that will orchestrate the workflow between the customer, Amazon Lex, and the LLM.

Add your first intent, which triggers the workflow and uses the intent validation prompt template to call Amazon Bedrock and identify what the customer is trying to accomplish. Add a few simple utterances for end-users to start the conversation.

You don’t need to use any slots or initial reading in any of the bot intents. In fact, you don’t need to add utterances to the second or third intents. That is because the LLM will guide Lambda throughout the process.

Add a confirmation prompt. You can customize this message in the Lambda function later.

Under Code hooks, select Use a Lambda function for initialization and validation.

Create a second intent with no utterance and no initial response. This is the PlaceOrder intent.

When the LLM identifies that the customer is trying to place an order, the Lambda function will trigger this intent and validate the customer request against the menu, and make sure that no required information is missing. Remember that all of this is on the prompt templates, so you can adapt this workflow for any use case by changing the prompt templates.

Don’t add any slots, but add a confirmation prompt and decline response.

Select Use a Lambda function for initialization and validation.

Create a third intent named ProcessOrder with no sample utterances and no slots.
Add an initial response, a confirmation prompt, and a decline response.

After the LLM has validated the customer request, the Lambda function triggers the third and last intent to process the order. Here, Lambda will use the object creator template to generate the order JSON data structure to query the DynamoDB table, and then use the order summary template to summarize the whole order along with the total so Amazon Lex can pass it to the customer.

Select Use a Lambda function for initialization and validation. This can use any Lambda function to process the order after the customer has given the final confirmation.

After you create all three intents, go to the Visual builder for the ValidateIntent, add a go-to intent step, and connect the output of the positive confirmation to that step.
After you add the go-to intent, edit it and choose the PlaceOrder intent as the intent name.

Similarly, go to the Visual builder for the PlaceOrder intent and connect the output of the positive confirmation to the ProcessOrder go-to intent. No editing is required for the ProcessOrder intent.
You now need to create the Lambda function that orchestrates Amazon Lex and calls the DynamoDB table, as detailed in the following section.

Create a Lambda function to orchestrate the Amazon Lex bot
You can now build the Lambda function that orchestrates the Amazon Lex bot and workflow. Complete the following steps:

Create a Lambda function with the standard execution policy and let Lambda create a role for you.
In the code window of your function, add a few utility functions that will help: format the prompts by adding the Lex context to the template, call the Amazon Bedrock LLM API, extract the desired text from the responses, and more. See the following code:

import json
import re
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

bedrock = boto3.client(service_name=’bedrock-runtime’)
def CreatingCustomPromptFromLambdaLayer(object_key,replace_items):

folder_path = ‘/opt/order_processing_agent_prompt_templates/python/’
try:
file_path = folder_path + object_key
with open(file_path, “r”) as file1:
raw_template = file1.read()
# Modify the template with the custom input prompt
#template[‘inputs’][0].insert(1, {“role”: “user”, “content”: ‘### Input:n’ + user_request})
for key,value in replace_items.items():
value = json.dumps(json.dumps(value).replace(‘”‘,”)).replace(‘”‘,”)
raw_template = raw_template.replace(key,value)
modified_prompt = raw_template

return modified_prompt
except Exception as e:
return {
‘statusCode’: 500,
‘body’: f’An error occurred: {str(e)}’
}
def CreatingCustomPrompt(object_key,replace_items):
logger.debug(‘replace_items is: {}’.format(replace_items))
#retrieve user request from intent_request
#we first propmt the model with current order

bucket_name = ‘your-bucket-name’

#object_key = ‘prompt_template_order_processing.txt’
try:
s3 = boto3.client(‘s3’)
# Retrieve the existing template from S3
response = s3.get_object(Bucket=bucket_name, Key=object_key)
raw_template = response[‘Body’].read().decode(‘utf-8’)
raw_template = json.loads(raw_template)
logger.debug(‘raw template is {}’.format(raw_template))
#template_json = json.loads(raw_template)
#logger.debug(‘template_json is {}’.format(template_json))
#template = json.dumps(template_json)
#logger.debug(‘template is {}’.format(template))

# Modify the template with the custom input prompt
#template[‘inputs’][0].insert(1, {“role”: “user”, “content”: ‘### Input:n’ + user_request})
for key,value in replace_items.items():
raw_template = raw_template.replace(key,value)
logger.debug(“Replacing: {} nwith: {}”.format(key,value))
modified_prompt = json.dumps(raw_template)
logger.debug(“Modified template: {}”.format(modified_prompt))
logger.debug(“Modified template type is: {}”.format(print(type(modified_prompt))))

#modified_template_json = json.loads(modified_prompt)
#logger.debug(“Modified template json: {}”.format(modified_template_json))

return modified_prompt
except Exception as e:
return {
‘statusCode’: 500,
‘body’: f’An error occurred: {str(e)}’
}

def validate_intent(intent_request):
logger.debug(‘starting validate_intent: {}’.format(intent_request))
#retrieve user request from intent_request
user_request = ‘Human: ‘ + intent_request[‘inputTranscript’].lower()
#getting current context variable
current_session_attributes = intent_request[‘sessionState’][‘sessionAttributes’]
if len(current_session_attributes) > 0:
full_context = current_session_attributes[‘fullContext’] + ‘\n\n’ + user_request
dialog_context = current_session_attributes[‘dialogContext’] + ‘\n\n’ + user_request
else:
full_context = user_request
dialog_context = user_request
#Preparing validation prompt by adding context to prompt template
object_key = ‘prompt_template_intent_validator.txt’
#replace_items = {“REPLACEME”:full_context}
#replace_items = {“REPLACEME”:dialog_context}
replace_items = {“REPLACEME”:dialog_context}
#validation_prompt = CreatingCustomPrompt(object_key,replace_items)
validation_prompt = CreatingCustomPromptFromLambdaLayer(object_key,replace_items)

#Prompting model for request validation
intent_validation_completion = prompt_bedrock(validation_prompt)
intent_validation_completion = re.sub(r'[“]’,”,intent_validation_completion)

#extracting response from response completion and removing some special characters
validation_response = extract_response(intent_validation_completion)
validation_intent = extract_intent(intent_validation_completion)

#business logic depending on intents
if validation_intent == ‘Place Order’:
return validate_request(intent_request)
elif validation_intent in [‘Complain’,’Speak to Someone’]:
##adding session attributes to keep current context
full_context = full_context + ‘\n\n’ + intent_validation_completion
dialog_context = dialog_context + ‘\n\nAssistant: ‘ + validation_response
intent_request[‘sessionState’][‘sessionAttributes’][‘fullContext’] = full_context
intent_request[‘sessionState’][‘sessionAttributes’][‘dialogContext’] = dialog_context
intent_request[‘sessionState’][‘sessionAttributes’][‘customerIntent’] = validation_intent
return close(intent_request[‘sessionState’][‘sessionAttributes’],intent_request[‘sessionState’][‘intent’][‘name’],’Fulfilled’,’Close’,validation_response)
if validation_intent == ‘Greeting’:
##adding session attributes to keep current context
full_context = full_context + ‘\n\n’ + intent_validation_completion
dialog_context = dialog_context + ‘\n\nAssistant: ‘ + validation_response
intent_request[‘sessionState’][‘sessionAttributes’][‘fullContext’] = full_context
intent_request[‘sessionState’][‘sessionAttributes’][‘dialogContext’] = dialog_context
intent_request[‘sessionState’][‘sessionAttributes’][‘customerIntent’] = validation_intent
return close(intent_request[‘sessionState’][‘sessionAttributes’],intent_request[‘sessionState’][‘intent’][‘name’],’InProgress’,’ConfirmIntent’,validation_response)

def validate_request(intent_request):
logger.debug(‘starting validate_request: {}’.format(intent_request))
#retrieve user request from intent_request
user_request = ‘Human: ‘ + intent_request[‘inputTranscript’].lower()
#getting current context variable
current_session_attributes = intent_request[‘sessionState’][‘sessionAttributes’]
if len(current_session_attributes) > 0:
full_context = current_session_attributes[‘fullContext’] + ‘\n\n’ + user_request
dialog_context = current_session_attributes[‘dialogContext’] + ‘\n\n’ + user_request
else:
full_context = user_request
dialog_context = user_request

#Preparing validation prompt by adding context to prompt template
object_key = ‘prompt_template_request_validator.txt’
replace_items = {“REPLACEME”:dialog_context}
#validation_prompt = CreatingCustomPrompt(object_key,replace_items)
validation_prompt = CreatingCustomPromptFromLambdaLayer(object_key,replace_items)

#Prompting model for request validation
request_validation_completion = prompt_bedrock(validation_prompt)
request_validation_completion = re.sub(r'[“]’,”,request_validation_completion)

#extracting response from response completion and removing some special characters
validation_response = extract_response(request_validation_completion)

##adding session attributes to keep current context
full_context = full_context + ‘\n\n’ + request_validation_completion
dialog_context = dialog_context + ‘\n\nAssistant: ‘ + validation_response
intent_request[‘sessionState’][‘sessionAttributes’][‘fullContext’] = full_context
intent_request[‘sessionState’][‘sessionAttributes’][‘dialogContext’] = dialog_context

return close(intent_request[‘sessionState’][‘sessionAttributes’],’PlaceOrder’,’InProgress’,’ConfirmIntent’,validation_response)

def process_order(intent_request):
    logger.debug('starting process_order: {}'.format(intent_request))

    # Retrieve the user request from intent_request
    user_request = 'Human: ' + intent_request['inputTranscript'].lower()
    # Getting the current context variable
    current_session_attributes = intent_request['sessionState']['sessionAttributes']
    if len(current_session_attributes) > 0:
        full_context = current_session_attributes['fullContext'] + '\n\n' + user_request
        dialog_context = current_session_attributes['dialogContext'] + '\n\n' + user_request
    else:
        full_context = user_request
        dialog_context = user_request

    # Preparing the object creator prompt by adding context to the prompt template
    object_key = 'prompt_template_object_creator.txt'
    replace_items = {"REPLACEME": dialog_context}
    #object_creator_prompt = CreatingCustomPrompt(object_key,replace_items)
    object_creator_prompt = CreatingCustomPromptFromLambdaLayer(object_key, replace_items)

    # Prompting the model for object creation
    object_creation_completion = prompt_bedrock(object_creator_prompt)
    # Extracting the response from the completion
    object_creation_response = extract_response(object_creation_completion)
    inputParams = json.loads(object_creation_response)
    # Double-encode so the child functions receive a JSON string they can json.loads()
    inputParams = json.dumps(json.dumps(inputParams))
    logger.debug('inputParams is: {}'.format(inputParams))

    client = boto3.client('lambda')
    # Invokes the order validator Lambda to check that all required attributes are present
    response = client.invoke(FunctionName='arn:aws:lambda:us-east-1:<AccountNumber>:function:aws-blog-order-validator', InvocationType='RequestResponse', Payload=inputParams)
    responseFromChild = json.load(response['Payload'])
    validationResult = responseFromChild['statusCode']
    if validationResult == 205:
        order_validation_error = responseFromChild['validator_response']
        return close(intent_request['sessionState']['sessionAttributes'], 'PlaceOrder', 'InProgress', 'ConfirmIntent', order_validation_error)

    # Invokes the order processing Lambda to query the DynamoDB table and return the order total
    response = client.invoke(FunctionName='arn:aws:lambda:us-east-1:<AccountNumber>:function:aws-blog-order-processing', InvocationType='RequestResponse', Payload=inputParams)
    responseFromChild = json.load(response['Payload'])
    orderTotal = responseFromChild['body']

    # Prompting the model to summarize the order along with the order total
    object_key = 'prompt_template_order_summary.txt'
    replace_items = {"REPLACEME": dialog_context, "REPLACETOTAL": orderTotal}
    #order_summary_prompt = CreatingCustomPrompt(object_key,replace_items)
    order_summary_prompt = CreatingCustomPromptFromLambdaLayer(object_key, replace_items)
    order_summary_completion = prompt_bedrock(order_summary_prompt)
    # Extracting the response from the completion
    order_summary_response = extract_response(order_summary_completion)
    order_summary_response = order_summary_response + '. Shall I finalize processing your order?'

    # Adding session attributes to keep the current context
    full_context = full_context + '\n\n' + order_summary_completion
    dialog_context = dialog_context + '\n\nAssistant: ' + order_summary_response
    intent_request['sessionState']['sessionAttributes']['fullContext'] = full_context
    intent_request['sessionState']['sessionAttributes']['dialogContext'] = dialog_context

    return close(intent_request['sessionState']['sessionAttributes'], 'ProcessOrder', 'InProgress', 'ConfirmIntent', order_summary_response)

""" --- Main handler and workflow functions --- """

def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.
    """
    logger.debug('event is: {}'.format(event))

    return dispatch(event)

def dispatch(intent_request):
    """
    Called when the user specifies an intent for this bot. If the intent is not valid, raises an exception with the intent name.
    """
    logger.debug('intent_request is: {}'.format(intent_request))
    intent_name = intent_request['sessionState']['intent']['name']
    confirmation_state = intent_request['sessionState']['intent']['confirmationState']

    # Dispatch to your bot's intent handlers
    if intent_name == 'ValidateIntent' and confirmation_state == 'None':
        return validate_intent(intent_request)
    if intent_name == 'PlaceOrder' and confirmation_state == 'None':
        return validate_request(intent_request)
    elif intent_name == 'PlaceOrder' and confirmation_state == 'Confirmed':
        return process_order(intent_request)
    elif intent_name == 'PlaceOrder' and confirmation_state == 'Denied':
        return close(intent_request['sessionState']['sessionAttributes'], intent_request['sessionState']['intent']['name'], 'Fulfilled', 'Close', 'Got it. Let me know if I can help you with something else.')
    elif intent_name == 'PlaceOrder' and confirmation_state not in ['Denied', 'Confirmed', 'None']:
        logger.debug('exiting intent {} here'.format(intent_request['sessionState']['intent']['name']))
        return close(intent_request['sessionState']['sessionAttributes'], intent_request['sessionState']['intent']['name'], 'Fulfilled', 'Close', 'Sorry. I am having trouble completing the request. Let me get someone to help you.')
    elif intent_name == 'ProcessOrder' and confirmation_state == 'None':
        return validate_request(intent_request)
    elif intent_name == 'ProcessOrder' and confirmation_state == 'Confirmed':
        return close(intent_request['sessionState']['sessionAttributes'], intent_request['sessionState']['intent']['name'], 'Fulfilled', 'Close', 'Perfect! Your order has been processed. Please proceed to payment.')
    elif intent_name == 'ProcessOrder' and confirmation_state == 'Denied':
        return close(intent_request['sessionState']['sessionAttributes'], intent_request['sessionState']['intent']['name'], 'Fulfilled', 'Close', 'Got it. Let me know if I can help you with something else.')
    elif intent_name == 'ProcessOrder' and confirmation_state not in ['Denied', 'Confirmed', 'None']:
        logger.debug('exiting intent {} here'.format(intent_request['sessionState']['intent']['name']))
        return close(intent_request['sessionState']['sessionAttributes'], intent_request['sessionState']['intent']['name'], 'Fulfilled', 'Close', 'Sorry. I am having trouble completing the request. Let me get someone to help you.')

    raise Exception('Intent with name ' + intent_name + ' not supported')

def prompt_bedrock(formatted_template):
    logger.debug('prompt bedrock input is: {}'.format(formatted_template))
    # The rendered template is expected to be a JSON string; validate and re-serialize it
    # so invoke_model receives a JSON-encoded request body
    body = json.dumps(json.loads(formatted_template))

    modelId = 'anthropic.claude-v2'  # change this to use a different version from the model provider
    accept = 'application/json'
    contentType = 'application/json'

    response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    response_completion = response_body.get('completion')
    logger.debug('response is: {}'.format(response_completion))

    #print_ww(response_body.get('completion'))
    #print(response_body.get('results')[0].get('outputText'))
    return response_completion
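#--- Example only (not part of the original function): the prompt template files are
#--- assumed to render to an Anthropic Claude v2 request body along these lines,
#--- since prompt_bedrock() reads the 'completion' field from the response body.
#
# example_formatted_template = json.dumps({
#     "prompt": "\n\nHuman: <instructions and dialog context>\n\nAssistant:",
#     "max_tokens_to_sample": 300,
#     "temperature": 0.5
# })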

# Function to extract the text between the <Response> and </Response> tags in the model completion
def extract_response(response_completion):
    if '<Response>' in response_completion:
        customer_response = response_completion.replace('<Response>', '||').replace('</Response>', '').split('||')[1]
        logger.debug('modified response is: {}'.format(response_completion))
        return customer_response
    else:
        logger.debug('modified response is: {}'.format(response_completion))
        return response_completion

# Function to extract the text between the <intent> and </intent> tags in the model completion
def extract_intent(response_completion):
    if '<intent>' in response_completion:
        customer_intent = response_completion.replace('<intent>', '||').replace('</intent>', '||').split('||')[1]
        return customer_intent
    else:
        # No intent tags found; return the completion unchanged
        return response_completion

def close(session_attributes, intent, fulfillment_state, action_type, message):
    # This function prepares the response in the appropriate format for Lex V2
    response = {
        "sessionState": {
            "sessionAttributes": session_attributes,
            "dialogAction": {
                "type": action_type
            },
            "intent": {
                "name": intent,
                "state": fulfillment_state
            },
        },
        "messages": [
            {
                "contentType": "PlainText",
                "content": message,
            }
        ],
    }
    return response

Attach the Lambda layer you created earlier to this function.
Additionally, attach the Lambda layer that contains the prompt templates you created.
In the Lambda execution role, attach the policy to access Amazon Bedrock, which was created earlier.

The Lambda execution role should have the following permissions.
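At a minimum, the role needs permission to invoke Amazon Bedrock models in addition to the basic Lambda logging permissions. The following is a minimal sketch of attaching such an inline policy with boto3; the role name and policy name are placeholders, and in production you should scope the Resource down to the specific model ARN:

import json
import boto3

iam = boto3.client('iam')

# Inline policy granting the orchestration function permission to invoke Bedrock models
bedrock_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*"
        }
    ]
}

iam.put_role_policy(
    RoleName='your-orchestration-lambda-role',   # placeholder execution role name
    PolicyName='bedrock-invoke-access',
    PolicyDocument=json.dumps(bedrock_policy)
)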
Attach the Orchestration Lambda function to the Amazon Lex bot

After you create the function in the previous section, return to the Amazon Lex console and navigate to your bot.
Under Languages in the navigation pane, choose English.
For Source, choose the orchestration Lambda function you created for your order processing bot.
For Lambda function version or alias, choose $LATEST.
Choose Save.

Create assisting Lambda functions
Complete the following steps to create additional Lambda functions:

Create a Lambda function to query the DynamoDB table that you created earlier:

import json
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

# Initialize the DynamoDB client
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('your-table-name')

def calculate_grand_total(input_data):
    # Initialize the total price
    total_price = 0

    try:
        # Loop through each item in the input JSON
        for item_id, item_data in input_data.items():
            item_name = item_data['item'].lower()  # Convert item name to lowercase
            item_size = item_data['size'].lower()  # Convert item size to lowercase

            # Query the DynamoDB table for the item based on Item and Size
            response = table.get_item(
                Key={'Item': item_name,
                     'Size': item_size}
            )

            # Check if the item was found in the table
            if 'Item' in response:
                item = response['Item']
                price = float(item['Price'])
                total_price += price  # Add the item's price to the total

        return total_price
    except Exception as e:
        raise Exception('An error occurred: {}'.format(str(e)))

def lambda_handler(event, context):
    try:
        # Parse the input JSON from the Lambda event (passed as a JSON-encoded string)
        input_json = json.loads(event)

        # Calculate the grand total
        grand_total = calculate_grand_total(input_json)

        # Return the grand total in the response
        return {'statusCode': 200, 'body': json.dumps(grand_total)}
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps('An error occurred: {}'.format(str(e)))
        }

Navigate to the Configuration tab in the Lambda function and choose Permissions.
Attach a resource-based policy statement allowing the order processing Lambda function to invoke this function.

Navigate to the IAM execution role for this Lambda function and add a policy to access the DynamoDB table.
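For reference, the resource-based policy can also be added programmatically. The following is a minimal sketch using boto3; the statement ID and the orchestration function's execution role ARN are placeholders:

import boto3

lambda_client = boto3.client('lambda')

# Allow the orchestration function's execution role to invoke this helper function
lambda_client.add_permission(
    FunctionName='aws-blog-order-processing',
    StatementId='AllowInvokeFromOrchestrationLambda',
    Action='lambda:InvokeFunction',
    Principal='arn:aws:iam::<AccountNumber>:role/your-orchestration-lambda-role'
)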

Create another Lambda function to validate if all required attributes were passed from the customer. In the following example, we validate if the size attribute is captured for an order:

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    # Define customer orders from the input event (passed as a JSON-encoded string)
    customer_orders = json.loads(event)

    # Initialize collections for error messages
    order_errors = {}
    missing_size = []
    error_messages = []

    # Iterate through each order in customer_orders
    for order_id, order in customer_orders.items():
        if "size" not in order or order["size"] == "":
            missing_size.append(order['item'])
            order_errors['size'] = missing_size

    if order_errors:
        items_missing_size = order_errors['size']
        error_message = f"Could you please provide the size for the following items: {', '.join(items_missing_size)}?"
        error_messages.append(error_message)

    # Prepare the response message
    if error_messages:
        response_message = "\n".join(error_messages)
        return {
            'statusCode': 205,
            'validator_response': response_message
        }
    else:
        response_message = "Order is validated successfully"
        return {
            'statusCode': 200,
            'validator_response': response_message
        }

Navigate to the Configuration tab in the Lambda function and choose Permissions.
Attach a resource-based policy statement allowing the order processing Lambda function to invoke this function.

Test the solution
Now we can test the solution with example orders that customers place via Amazon Lex.
For our first example, the customer asks for a frappuccino, which is not on the menu. The model validates the request with the help of the order validator template and suggests recommendations based on the menu. After the customer confirms their order, they are notified of the order total and order summary. The order is processed based on the customer's final confirmation.

In our next example, the customer orders a large cappuccino and then modifies the size from large to medium. The model captures all necessary changes and asks the customer to confirm the order. The model presents the order total and order summary, and processes the order based on the customer's final confirmation.

For our final example, the customer placed an order for multiple items and the size is missing for a couple of items. The model and Lambda function will verify if all required attributes are present to process the order and then ask the customer to provide the missing information. After the customer provides the missing information (in this case, the size of the coffee), they’re shown the order total and order summary. The order will be processed based on the customer’s final confirmation.

LLM limitations
LLM outputs are stochastic by nature, which means the results from our LLM can vary in format or even contain untruthful content (hallucinations). Therefore, developers need to rely on robust error-handling logic throughout their code to handle these scenarios and avoid a degraded end-user experience.
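For example, a defensive wrapper around the model call can retry once and fall back to a graceful message when the completion cannot be parsed. This is a minimal sketch that reuses the prompt_bedrock and extract_response helpers shown earlier; the function name and retry policy are illustrative only:

import json
import logging

logger = logging.getLogger()

def safe_model_json(prompt, max_attempts=2):
    # Retry the model call and return None if the output never parses into valid JSON
    for attempt in range(max_attempts):
        try:
            completion = prompt_bedrock(prompt)
            return json.loads(extract_response(completion))
        except (ValueError, KeyError) as err:
            logger.warning('Attempt %s could not parse model output: %s', attempt + 1, err)
    return None  # caller should respond with a fallback such as handing off to an agent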
Clean up
If you no longer need this solution, you can delete the following resources:

Lambda functions
Amazon Lex bot
DynamoDB table
S3 bucket

Additionally, shut down the SageMaker Studio instance if the application is no longer required.
Cost assessment
For pricing information for the main services used by this solution, see the following:

Amazon Bedrock Pricing
Amazon DynamoDB Pricing
AWS Lambda Pricing
Amazon Lex Pricing
Amazon S3 Pricing

Note that you can use Claude v2 without the need for provisioning, so overall costs remain at a minimum. To further reduce costs, you can configure the DynamoDB table with the on-demand setting.
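For example, an existing table can be switched to on-demand capacity with a single update call; the table name below is the same placeholder used earlier in this post:

import boto3

dynamodb = boto3.client('dynamodb')

# Switch the table to on-demand (pay-per-request) capacity
dynamodb.update_table(
    TableName='your-table-name',
    BillingMode='PAY_PER_REQUEST'
)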
Conclusion
This post demonstrated how to build a speech-enabled AI order processing agent using Amazon Lex, Amazon Bedrock, and other AWS services. We showed how prompt engineering with a powerful generative AI model like Claude can enable robust natural language understanding and conversation flows for order processing without the need for extensive training data.
The solution architecture uses serverless components like Lambda, Amazon S3, and DynamoDB to enable a flexible and scalable implementation. Storing the prompt templates in Amazon S3 allows you to customize the solution for different use cases.
Next steps could include expanding the agent’s capabilities to handle a wider range of customer requests and edge cases. The prompt templates provide a way to iteratively improve the agent’s skills. Additional customizations could involve integrating the order data with backend systems like inventory, CRM, or POS. Lastly, the agent could be made available across various customer touchpoints like mobile apps, drive-thru, kiosks, and more using the multi-channel capabilities of Amazon Lex.
To learn more, refer to the following related resources:

Deploying and managing multi-channel bots:

Deploying bots
Deploying an Amazon Lex Bot in Mobile Applications

Prompt engineering for Claude and other models:

Introduction to prompt design
Prompt Engineering Guide
Prompt engineering for foundation models

Serverless architectural patterns for scalable AI assistants:

AWS Architecture Blog

About the Authors
Moumita Dutta is a Partner Solutions Architect at Amazon Web Services. In her role, she collaborates closely with partners to develop scalable and reusable assets that streamline cloud deployments and enhance operational efficiency. She is a member of the AI/ML community and a generative AI expert at AWS. In her leisure time, she enjoys gardening and cycling.
Fernando Lammoglia is a Partner Solutions Architect at Amazon Web Services, working closely with AWS partners in spearheading the development and adoption of cutting-edge AI solutions across business units. He is a strategic leader with expertise in cloud architecture, generative AI, machine learning, and data analytics, and specializes in executing go-to-market strategies and delivering impactful AI solutions aligned with organizational goals. In his free time, he loves to spend time with his family and travel to other countries.
Mitul Patel is a Senior Solutions Architect at Amazon Web Services. In his role as a cloud technology enabler, he works with customers to understand their goals and challenges, and provides prescriptive guidance to achieve their objectives with AWS offerings. He is a member of the AI/ML community and a generative AI ambassador at AWS. In his free time, he enjoys hiking and playing soccer.

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMake …

This post is co-written with Chaoyang He, Al Nevarez and Salman Avestimehr from FedML.
Many organizations are implementing machine learning (ML) to enhance their business decision-making through automation and the use of large distributed datasets. With increased access to data, ML has the potential to provide unparalleled business insights and opportunities. However, the sharing of raw, non-sanitized sensitive information across different locations poses significant security and privacy risks, especially in regulated industries such as healthcare.
To address this issue, federated learning (FL) is a decentralized and collaborative ML training technique that offers data privacy while maintaining accuracy and fidelity. Unlike traditional ML training, FL training occurs within an isolated client location using an independent secure session. The client only shares its output model parameters with a centralized server, known as the training coordinator or aggregation server, and not the actual data used to train the model. This approach alleviates many data privacy concerns while enabling effective collaboration on model training.
Although FL is a step towards achieving better data privacy and security, it’s not a guaranteed solution. Insecure networks lacking access control and encryption can still expose sensitive information to attackers. Additionally, locally trained information can expose private data if reconstructed through an inference attack. To mitigate these risks, the FL model uses personalized training algorithms and effective masking and parameterization before sharing information with the training coordinator. Strong network controls at local and centralized locations can further reduce inference and exfiltration risks.
In this post, we share an FL approach using FedML, Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker to improve patient outcomes while addressing data privacy and security concerns.
The need for federated learning in healthcare
Healthcare relies heavily on distributed data sources to make accurate predictions and assessments about patient care. Limiting the available data sources to protect privacy negatively affects result accuracy and, ultimately, the quality of patient care. Therefore, ML creates challenges for AWS customers who need to ensure privacy and security across distributed entities without compromising patient outcomes.
Healthcare organizations must navigate strict compliance regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, while implementing FL solutions. Ensuring data privacy, security, and compliance becomes even more critical in healthcare, requiring robust encryption, access controls, auditing mechanisms, and secure communication protocols. Additionally, healthcare datasets often contain complex and heterogeneous data types, making data standardization and interoperability a challenge in FL settings.
Use case overview
The use case outlined in this post involves heart disease data held by different organizations, on which an ML model runs classification algorithms to predict heart disease in a patient. Because this data resides across organizations, we use federated learning to collate the findings.
The Heart Disease dataset from the University of California Irvine’s Machine Learning Repository is a widely used dataset for cardiovascular research and predictive modeling. It consists of 303 samples, each representing a patient, and contains a combination of clinical and demographic attributes, as well as the presence or absence of heart disease.
This multivariate dataset has 76 attributes of patient information, of which 14 are most commonly used for developing and evaluating ML algorithms that predict the presence of heart disease.
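For orientation, the commonly used subset can be loaded with pandas as follows. This is a sketch only: the local file name is an assumption (the processed Cleveland subset of the dataset), and the last column, num, encodes the diagnosis:

import pandas as pd

# The 13 commonly used clinical/demographic features plus the diagnosis column (num)
columns = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]

# Assumes the processed Cleveland file has been downloaded locally
df = pd.read_csv("processed.cleveland.data", names=columns, na_values="?")
df["target"] = (df["num"] > 0).astype(int)  # binarize: any heart disease vs. none
print(df.shape)  # approximately (303, 15)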
FedML framework
There is a wide selection of FL frameworks, but we decided to use the FedML framework for this use case because it is open source and supports several FL paradigms. FedML provides a popular open source library, MLOps platform, and application ecosystem for FL. These facilitate the development and deployment of FL solutions. It provides a comprehensive suite of tools, libraries, and algorithms that enable researchers and practitioners to implement and experiment with FL algorithms in a distributed environment. FedML addresses the challenges of data privacy, communication, and model aggregation in FL, offering a user-friendly interface and customizable components. With its focus on collaboration and knowledge sharing, FedML aims to accelerate the adoption of FL and drive innovation in this emerging field. The FedML framework is model agnostic, including recently added support for large language models (LLMs). For more information, refer to Releasing FedLLM: Build Your Own Large Language Models on Proprietary Data using the FedML Platform.
FedML Octopus
System hierarchy and heterogeneity are key challenges in real-life FL use cases, where different data silos may have different infrastructure with CPUs and GPUs. In such scenarios, you can use FedML Octopus.
FedML Octopus is the industrial-grade platform of cross-silo FL for cross-organization and cross-account training. Coupled with FedML MLOps, it enables developers or organizations to conduct open collaboration from anywhere at any scale in a secure manner. FedML Octopus runs a distributed training paradigm inside each data silo and uses synchronous or asynchronous training.
FedML MLOps
FedML MLOps enables local development of code that can later be deployed anywhere using FedML frameworks. Before initiating training, you must create a FedML account, as well as create and upload the server and client packages in FedML Octopus. For more details, refer to the steps in Introducing FedML Octopus: scaling federated learning into production with simplified MLOps.
Solution overview
We deploy FedML into multiple EKS clusters integrated with SageMaker for experiment tracking. We use Amazon EKS Blueprints for Terraform to deploy the required infrastructure. EKS Blueprints helps compose complete EKS clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. With EKS Blueprints, the configuration for the desired state of EKS environment, such as the control plane, worker nodes, and Kubernetes add-ons, is described as an infrastructure as code (IaC) blueprint. After a blueprint is configured, it can be used to create consistent environments across multiple AWS accounts and Regions using continuous deployment automation.
The content shared in this post reflects real-life situations and experiences, but it’s important to note that the deployment of these situations in different locations may vary. Although we utilize a single AWS account with separate VPCs, it’s crucial to understand that individual circumstances and configurations may differ. Therefore, the information provided should be used as a general guide and may require adaptation based on specific requirements and local conditions.
The following diagram illustrates our solution architecture.

In addition to the tracking provided by FedML MLOps for each training run, we use Amazon SageMaker Experiments to track the performance of each client model and the centralized (aggregator) model.
SageMaker Experiments is a capability of SageMaker that lets you create, manage, analyze, and compare your ML experiments. By recording experiment details, parameters, and results, researchers can accurately reproduce and validate their work. It allows for effective comparison and analysis of different approaches, leading to informed decision-making. Additionally, tracking experiments facilitates iterative improvement by providing insights into the progression of models and enabling researchers to learn from previous iterations, ultimately accelerating the development of more effective solutions.
We send the following to SageMaker Experiments for each run:

Model evaluation metrics – Training loss and Area Under the Curve (AUC)
Hyperparameters – Epoch, learning rate, batch size, optimizer, and weight decay

Prerequisites
To follow along with this post, you should have the following prerequisites:

An AWS account
Local access to the AWS Command Line Interface (AWS CLI) or usage of AWS CloudShell
Terraform
kubectl
A FedML account ID

Deploy the solution
To begin, clone the repository hosting the sample code locally:

git clone git@ssh.gitlab.aws.dev:west-ml-sa/fl_fedml.ai.git

Then deploy the use case infrastructure using the following commands:

terraform init
terraform apply

The Terraform template may take 20–30 minutes to fully deploy. After it’s deployed, follow the steps in the next sections to run the FL application.
Create an MLOps deployment package
As described in the FedML documentation, we need to create the client and server packages, which the MLOps platform will distribute to the server and clients to begin training.
To create these packages, run the following script found in the root directory:

. ./build_mlops_pkg.sh

This creates the respective packages in the following directory under the project root:

mlops/dist-packages

Upload the packages to the FedML MLOps platform
Complete the following steps to upload the packages:

On the FedML UI, choose My Applications in the navigation pane.
Choose New Application.
Upload the client and server packages from your workstation.
You can also adjust the hyperparameters or create new ones.

Trigger federated training
To run federated training, complete the following steps:

On the FedML UI, choose Project List in the navigation pane.
Choose Create a new project.
Enter a group name and a project name, then choose OK.
Choose the newly created project and choose Create new run to trigger a training run.
Select the edge client devices and the central aggregator server for this training run.
Choose the application that you created in the previous steps.
Update any of the hyperparameters or use the default settings.
Choose Start to start training.
Choose the Training Status tab and wait for the training run to complete. You can also navigate to the tabs available.
When training is complete, choose the System tab to see the training time durations on your edge servers and aggregation events.

View results and experiment details
When the training is complete, you can view the results using FedML and SageMaker.
On the FedML UI, on the Models tab, you can see the aggregator and client model. You can also download these models from the website.
You can also log in to Amazon SageMaker Studio and choose Experiments in the navigation pane.
The following screenshot shows the logged experiments.

Experiment tracking code
In this section, we explore the code that integrates SageMaker experiment tracking with the FL framework training.
In an editor of your choice, open the following folder to see how SageMaker experiment tracking code is injected into the training code:

cd fl_fedml.ai/

For tracking the training, we create a SageMaker experiment with parameters and metrics logged using the log_parameters and log_metric calls, as outlined in the following code sample.
An entry in the config/fedml_config.yaml file declares the experiment prefix, which is referenced in the code to create unique experiment names: sm_experiment_name: “fed-heart-disease”. You can update this to any value of your choice.
For example, see the following code for the heart_disease_trainer.py, which is used by each client to train the model on their own dataset:

# Add this code before the for loop on epochs
# We are passing the experiment prefix and client rank from the config
# to the function to create a unique name
# (assumes: from sagemaker.utils import unique_name_from_base)
experiment_name = unique_name_from_base(args.sm_experiment_name + "-client-" + str(args.rank))
print(f"Sagemaker Experiment Name: {experiment_name}")

For each client run, the experiment details are tracked using the following code in heart_disease_trainer.py:

# Create an experiment and start a new run
# (assumes: from sagemaker.session import Session and from sagemaker.experiments.run import Run)
with Run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=Session()) as run:
    run.log_parameters(
        {
            "Train Data Size": str(len(train_data.dataset)),
            "device": "cpu",
            "center": args.rank,
            "learning-rate": args.lr,
            "batch-size": args.batch_size,
            "client-optimizer": args.client_optimizer,
            "weight-decay": args.weight_decay,
        }
    )
    run.log_metric(name="Validation:AUC", value=epoch_auc)
    run.log_metric(name="Training:Loss", value=epoch_loss)

Similarly, you can use the code in heart_disease_aggregator.py to run a test on local data after updating the model weights. The details are logged after each communication run with the clients.

# Create an experiment and start a new run
with Run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=Session()) as run:
    run.log_parameters(
        {
            "Train Data Size": str(len(test_data_local_dict[i])),
            "device": "cpu",
            "round": i,
            "learning-rate": args.lr,
            "batch-size": args.batch_size,
            "client-optimizer": args.client_optimizer,
            "weight-decay": args.weight_decay,
        }
    )
    run.log_metric(name="Test:AUC", value=test_auc_metrics)
    run.log_metric(name="Test:Loss", value=test_loss_metrics)

Clean up
When you're done with the solution, make sure to clean up the resources you used to ensure efficient resource utilization and cost management, and to avoid unnecessary expenses. Actively tidying up the environment, such as deleting unused instances, stopping unnecessary services, and removing temporary data, contributes to a clean and organized infrastructure. You can use the following code to clean up your resources:

terraform destroy -target=module.m_fedml_edge_server.module.eks_blueprints_kubernetes_addons -auto-approve
terraform destroy -target=module.m_fedml_edge_client_1.module.eks_blueprints_kubernetes_addons -auto-approve
terraform destroy -target=module.m_fedml_edge_client_2.module.eks_blueprints_kubernetes_addons -auto-approve

terraform destroy -target=module.m_fedml_edge_client_1.module.eks -auto-approve
terraform destroy -target=module.m_fedml_edge_client_2.module.eks -auto-approve
terraform destroy -target=module.m_fedml_edge_server.module.eks -auto-approve

terraform destroy

Summary
By using Amazon EKS as the infrastructure and FedML as the framework for FL, we are able to provide a scalable and managed environment for training and deploying shared models while respecting data privacy. With the decentralized nature of FL, organizations can collaborate securely, unlock the potential of distributed data, and improve ML models without compromising data privacy.
As always, AWS welcomes your feedback. Please leave your thoughts and questions in the comments section.

About the Authors
Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.
Arnab Sinha is a Senior Solutions Architect for AWS, acting as Field CTO to help organizations design and build scalable solutions supporting business outcomes across data center migrations, digital transformation and application modernization, big data, and machine learning. He has supported customers across a variety of industries, including energy, retail, manufacturing, healthcare, and life sciences. Arnab holds all AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Arnab was a technology leader and previously held architect and engineering leadership roles.
Prachi Kulkarni is a Senior Solutions Architect at AWS. Her specialization is machine learning, and she is actively working on designing solutions using various AWS ML, big data, and analytics offerings. Prachi has experience in multiple domains, including healthcare, benefits, retail, and education, and has worked in a range of positions in product engineering and architecture, management, and customer success.
Tamer Sherif is a Principal Solutions Architect at AWS, with a diverse background in the technology and enterprise consulting services realm, spanning over 17 years as a Solutions Architect. With a focus on infrastructure, Tamer’s expertise covers a broad spectrum of industry verticals, including commercial, healthcare, automotive, public sector, manufacturing, oil and gas, media services, and more. His proficiency extends to various domains, such as cloud architecture, edge computing, networking, storage, virtualization, business productivity, and technical leadership.
Hans Nesbitt is a Senior Solutions Architect at AWS based out of Southern California. He works with customers across the western US to craft highly scalable, flexible, and resilient cloud architectures. In his spare time, he enjoys spending time with his family, cooking, and playing guitar.
Chaoyang He is Co-founder and CTO of FedML, Inc., a startup building a community for open and collaborative AI from anywhere at any scale. His research focuses on distributed and federated machine learning algorithms, systems, and applications. He received his PhD in Computer Science from the University of Southern California.
Al Nevarez is Director of Product Management at FedML. Before FedML, he was a group product manager at Google, and a senior manager of data science at LinkedIn. He has several data product-related patents, and he studied engineering at Stanford University.
Salman Avestimehr is Co-founder and CEO of FedML. He has been a Dean’s Professor at USC, Director of the USC-Amazon Center on Trustworthy AI, and an Amazon Scholar in Alexa AI. He is an expert on federated and decentralized machine learning, information theory, security, and privacy. He is a Fellow of IEEE and received his PhD in EECS from UC Berkeley.
Samir Lad is an accomplished enterprise technologist with AWS who works closely with customers’ C-level executives. As a former C-suite executive who has driven transformations across multiple Fortune 100 companies, Samir shares his invaluable experiences to help his clients succeed in their own transformation journey.
Stephen Kraemer is a Board and CxO advisor and former executive at AWS. Stephen advocates culture and leadership as the foundations of success. He considers security and innovation to be the drivers of cloud transformation that enable highly competitive, data-driven organizations.

Enable data sharing through federated learning: A policy approach for …

This is a guest blog post written by Nitin Kumar, a Lead Data Scientist at T and T Consulting Services, Inc.
In this post, we discuss the value and potential impact of federated learning in the healthcare field. This approach can help heart stroke patients, doctors, and researchers with faster diagnosis, enriched decision-making, and more informed, inclusive research work on stroke-related health issues, using a cloud-native approach with AWS services for lightweight lift and straightforward adoption.
Diagnosis challenges with heart strokes
Statistics from the Centers for Disease Control and Prevention (CDC) show that each year in the US, more than 795,000 people suffer from their first stroke, and about 25% of them experience recurrent attacks. It is the number five cause of death according to the American Stroke Association and a leading cause of disability in the US. Therefore, it’s crucial to have prompt diagnosis and treatment to reduce brain damage and other complications in acute stroke patients.
CTs and MRIs are the gold standard in imaging technologies for classifying different sub-types of strokes and are crucial during preliminary assessment of patients, determining the root cause, and treatment. One critical challenge here, especially in the case of acute stroke, is the time of imaging diagnosis, which on average ranges from 30 minutes up to an hour and can be much longer depending on emergency department crowding.
Doctors and medical staff need quick and accurate image diagnosis to evaluate a patient’s condition and propose treatment options. In Dr. Werner Vogels’s own words at AWS re:Invent 2023, “every second that a person has a stroke counts.” Stroke victims can lose around 1.9 billion neurons every second they are not being treated.
Medical data restrictions
You can use machine learning (ML) to assist doctors and researchers in diagnosis tasks, thereby speeding up the process. However, the datasets needed to build the ML models and give reliable results are sitting in silos across different healthcare systems and organizations. This isolated legacy data has the potential for massive impact if accumulated. So why hasn't it been used yet?
There are multiple challenges when working with medical domain datasets and building ML solutions, including patient privacy, security of personal data, and certain bureaucratic and policy restrictions. Additionally, research institutions have been tightening their data sharing practices. These obstacles also prevent international research teams from working together on diverse and rich datasets, which could save lives and prevent disabilities that can result from heart strokes, among other benefits.
Policies and regulations like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the California Consumer Privacy Act (CCPA) put guardrails on sharing data from the medical domain, especially patient data. Additionally, the datasets at individual institutes, organizations, and hospitals are often too small, are unbalanced, or have biased distributions, leading to model generalization constraints.
Federated learning: An introduction
Federated learning (FL) is a decentralized form of ML—a dynamic engineering approach. In this decentralized ML approach, the ML model is shared between organizations for training on proprietary data subsets, unlike traditional centralized ML training, where the model generally trains on aggregated datasets. The data stays protected behind the organization’s firewalls or VPC, while the model with its metadata is shared.
In the training phase, a global FL model is disseminated and synchronized between unit organizations for training on individual datasets, and a local trained model is returned. The final global model is available to use to make predictions for everyone among the participants, and can also be used as a base for further training to build local custom models for participating organizations. It can further be extended to benefit other institutes. This approach can significantly reduce the cybersecurity requirements for data in transit by removing the need for data to transit outside of the organization’s boundaries at all.
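To make this exchange concrete, the following sketch shows the classic federated averaging (FedAvg) step an aggregation server might perform on the parameters returned by the clients. It is an illustration only, not the implementation of any specific FL framework; the layer names and sample counts are hypothetical:

import numpy as np

def fedavg(client_weights, client_sample_counts):
    """Weighted average of per-client model parameters (dict of layer name -> ndarray)."""
    total = sum(client_sample_counts)
    global_weights = {}
    for layer in client_weights[0]:
        global_weights[layer] = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sample_counts)
        )
    return global_weights

# Example with two hypothetical clients contributing a single layer each
clients = [{"dense": np.array([0.2, 0.4])}, {"dense": np.array([0.6, 0.8])}]
global_model = fedavg(clients, client_sample_counts=[100, 300])
print(global_model["dense"])  # weighted toward the larger client: [0.5, 0.7]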
The following diagram illustrates an example architecture.

In the following sections, we discuss how federated learning can help.
Federated learning to save the day (and save lives)
For good artificial intelligence (AI), you need good data.
Legacy systems, which are frequently found in the federal domain, pose significant data processing challenges before you can derive any intelligence from them or merge them with newer datasets. This is an obstacle in providing valuable intelligence to leaders, and it can lead to inaccurate decision-making because legacy data is sometimes far more valuable than a newer, smaller dataset. You want to resolve this bottleneck effectively, without months or even years of manual consolidation and integration effort (including cumbersome mapping processes) for legacy and newer datasets sitting across hospitals and institutes. The legacy data is quite valuable because it holds important contextual information needed for accurate decision-making and well-informed model training, leading to reliable AI in the real world. The duration of data collection reveals long-term variations and patterns in the dataset that would otherwise go undetected and lead to biased and ill-informed predictions.
Breaking down these data silos to unite the untapped potential of the scattered data can save and transform many lives. It can also accelerate the research related to secondary health issues arising from heart strokes. This solution can help you share insights from data isolated between institutes due to policy and other reasons, whether you are a hospital, a research institute, or other health data-focused organizations. It can enable informed decisions on research direction and diagnosis. Additionally, it results in a centralized repository of intelligence via a secure, private, and global knowledge base.

Federated learning has many benefits in general and specifically for medical data settings.
Security and Privacy features:

Keeps sensitive data away from the internet and still uses it for ML, and harnesses its intelligence with differential privacy
Enables you to build, train, and deploy unbiased and robust models across not just machines but also networks, without any data security hazards
Overcomes the hurdles with multiple vendors managing the data
Eliminates the need for cross-site data sharing and global governance
Preserves privacy with differential privacy and offers secure multi-party computation with local training

Performance Improvements:

Addresses the small sample size problem in the medical imaging space and costly labeling processes
Balances the distribution of the data
Enables you to incorporate most traditional ML and deep learning (DL) methods
Uses pooled image sets to help improve statistical power, overcoming the sample size limitation of individual institutions

Resilience Benefits:

If any one party decides to leave, it won’t hinder the training
A new hospital or institute can join at any time; it’s not reliant on any specific dataset with any node organization
There is no need for extensive data engineering pipelines for the legacy data scattered across widespread geographical locations

These features can help bring the walls down between institutions hosting isolated datasets on similar domains. The solution can become a force multiplier by harnessing the unified powers of distributed datasets and improving efficiency by radically transforming the scalability aspect without the heavy infrastructure lift. This approach helps ML reach its full potential, becoming proficient at the clinical level and not just research.
Federated learning has comparable performance to regular ML, as shown in the following experiment by NVIDIA Clara (on the Medical Model ARchive (MMAR) using the BraTS 2018 dataset). Here, FL achieved segmentation performance comparable to training with centralized data: over 80% after approximately 600 epochs while training a multi-modal, multi-class brain tumor segmentation task.
Federated learning has been tested recently in a few medical sub-fields for use cases including patient similarity learning, patient representation learning, phenotyping, and predictive modeling.
Application blueprint: Federated learning makes it possible and straightforward
To get started with FL, you can choose from many high-quality datasets. For example, datasets with brain images include ABIDE (Autism Brain Imaging Data Exchange initiative), ADNI (Alzheimer’s Disease Neuroimaging Initiative), RSNA (Radiological Society of North America) Brain CT, BraTS (Multimodal Brain Tumor Image Segmentation Benchmark) updated regularly for the Brain Tumor Segmentation Challenge under UPenn (University of Pennsylvania), UK BioBank (covered in the following NIH paper), and IXI. Similarly for heart images, you can choose from several publicly available options, including ACDC (Automatic Cardiac Diagnosis Challenge), which is a cardiac MRI assessment dataset with full annotation mentioned by the National Library of Medicine in the following paper, and M&M (Multi-Center, Multi-Vendor, and Multi-Disease) Cardiac Segmentation Challenge mentioned in the following IEEE paper.
The following images show a probabilistic lesion overlap map for the primary lesions from the ATLAS R1.1 dataset. (Strokes are one of the most common causes of brain lesions according to Cleveland Clinic.)

For Electronic Health Records (EHR) data, a few datasets are available that follow the Fast Healthcare Interoperability Resources (FHIR) standard. This standard helps you build straightforward pilots by removing certain challenges with heterogeneous, non-normalized datasets, allowing for seamless and secure exchange, sharing, and integration of datasets. FHIR enables maximum interoperability. Dataset examples include MIMIC-IV (Medical Information Mart for Intensive Care). Other good-quality datasets that aren't currently FHIR but can be easily converted include Centers for Medicare & Medicaid Services (CMS) Public Use Files (PUF) and the eICU Collaborative Research Database from MIT (Massachusetts Institute of Technology). There are also other resources becoming available that offer FHIR-based datasets.
The lifecycle for implementing FL can include the following steps: task initialization, selection, configuration, model training, client/server communication, scheduling and optimization, versioning, testing, deployment, and termination. There are many time-intensive steps that go into preparing medical imaging data for traditional ML, as described in the following paper. Domain knowledge might be needed in some scenarios to preprocess raw patient data, especially due to its sensitive and private nature. These can be consolidated and sometimes eliminated for FL, saving crucial time for training and providing faster results.
Implementation
FL tools and libraries have grown with widespread support, making it straightforward to use FL without a heavy overhead lift. There are a lot of good resources and framework options available to get started. You can refer to the following extensive list of the most popular frameworks and tools in the FL domain, including PySyft, FedML, Flower, OpenFL, FATE, TensorFlow Federated, and NVFlare. It provides a beginner’s list of projects to get started quickly and build upon.
You can implement a cloud-native approach with Amazon SageMaker that seamlessly works with AWS VPC peering, keeping each node’s training in a private subnet in their respective VPC and enabling communication via private IPv4 addresses. Furthermore, model hosting on Amazon SageMaker JumpStart can help by exposing the endpoint API without sharing model weights.
Amazon Elastic Compute Cloud (Amazon EC2) resources also take away potential compute challenges associated with on-premises hardware. You can implement the FL client and servers on AWS with SageMaker notebooks and Amazon Simple Storage Service (Amazon S3), maintain regulated access to the data and model with AWS Identity and Access Management (IAM) roles, and use AWS Security Token Service (AWS STS) for client-side security. You can also build your own custom system for FL using Amazon EC2.
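For example, once the aggregated model is hosted behind a SageMaker endpoint, a participating site can request predictions without ever receiving the model weights. The following is a minimal sketch, assuming a hypothetical endpoint name and a JSON-serializable feature payload:

import json
import boto3

runtime = boto3.client('sagemaker-runtime')

payload = {"features": [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]}  # hypothetical input
response = runtime.invoke_endpoint(
    EndpointName='fl-global-heart-model',   # hypothetical endpoint name
    ContentType='application/json',
    Body=json.dumps(payload)
)
prediction = json.loads(response['Body'].read())
print(prediction)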
For a detailed overview of implementing FL with the Flower framework on SageMaker, and a discussion of its difference from distributed training, refer to Machine learning with decentralized training data using federated learning on Amazon SageMaker.
The following figures illustrate the architecture of transfer learning in FL.

Addressing FL data challenges
Federated learning comes with its own data challenges, including privacy and security, but they are straightforward to address. First, you need to address the data heterogeneity problem with medical imaging data arising from data being stored across different sites and participating organizations, known as a domain shift problem (also referred to as client shift in an FL system), as highlighted by Guan and Liu in the following paper. This can lead to a difference in convergence of the global model.
Other components for consideration include ensuring data quality and uniformity at the source, incorporating expert knowledge into the learning process to inspire confidence in the system among medical professionals, and achieving model precision. For more information about some of the potential challenges you may face during implementation, refer to the following paper.
AWS helps you resolve these challenges with features like the flexible compute of Amazon EC2 and pre-built Docker images in SageMaker for straightforward deployment. You can resolve client-side problems like unbalanced data and computation resources for each node organization. You can address server-side learning problems like poisoning attacks from malicious parties with Amazon Virtual Private Cloud (Amazon VPC), security groups, and other security standards, preventing client corruption and implementing AWS anomaly detection services.
AWS also helps in addressing real-world implementation challenges, which can include integration challenges, compatibility issues with current or legacy hospital systems, and user adoption hurdles, by offering flexible, easy-to-use, and effortless lift tech solutions.
With AWS services, you can enable large-scale FL-based research and clinical implementation and deployment, which can consist of various sites across the world.
Recent policies on interoperability highlight the need for federated learning
Many laws recently passed by the government include a focus on data interoperability, bolstering the need for cross-organizational interoperability of data for intelligence. This can be fulfilled by using FL, including frameworks like the TEFCA (Trusted Exchange Framework and Common Agreement) and the expanded USCDI (United States Core Data for Interoperability).
The proposed idea also contributes towards the CDC’s capture and distribution initiative CDC Moving Forward. The following quote from the GovCIO article Data Sharing and AI Top Federal Health Agency Priorities in 2024 also echoes a similar theme: “These capabilities can also support the public in an equitable way, meeting patients where they are and unlocking critical access to these services. Much of this work comes down to the data.”
This can help medical institutes and agencies around the country (and across the globe) with data silos. They can benefit from seamless and secure integration and data interoperability, making medical data usable for impactful ML-based predictions and pattern recognition. You can start with images, but the approach is applicable to all EHR as well. The goal is to find the best approach for data stakeholders, with a cloud-native pipeline to normalize and standardize the data or directly use it for FL.
Let’s explore an example use case. Heart stroke imaging data and scans are scattered around the country and the world, sitting in isolated silos in institutes, universities, and hospitals, and separated by bureaucratic, geographical, and political boundaries. There is no single aggregated source and no easy way for medical professionals (non-programmers) to extract insights from it. At the same time, it’s not feasible to train ML and DL models on this data, which could help medical professionals make faster, more accurate decisions in critical times when heart scans can take hours to come in while the patient’s life could be hanging in the balance.
Other known use cases include POTS (Purchasing Online Tracking System) at NIH (National Institutes of Health) and cybersecurity for scattered and tiered intelligence solution needs at COMCOMs/MAJCOMs locations around the globe.
Conclusion
Federated learning holds great promise for legacy healthcare data analytics and intelligence. It’s straightforward to implement a cloud-native solution with AWS services, and FL is especially helpful for medical organizations with legacy data and technical challenges. FL can have a potential impact on the entire treatment cycle, and now even more so with the focus on data interoperability from large federal organizations and government leaders.
This solution can help you avoid reinventing the wheel and use the latest technology to take a leap from legacy systems and be at the forefront in this ever-evolving world of AI. You can also become a leader for best practices and an efficient approach to data interoperability within and across agencies and institutes in the health domain and beyond. If you are an institute or agency with data silos scattered around the country, you can benefit from this seamless and secure integration.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post. It is each customer's responsibility to determine whether they are subject to HIPAA, and if so, how best to comply with HIPAA and its implementing regulations. Before using AWS in connection with protected health information, customers must enter into an AWS Business Associate Addendum (BAA) and follow its configuration requirements.

About the Author

Nitin Kumar (MS, CMU) is a Lead Data Scientist at T and T Consulting Services, Inc. He has extensive experience with R&D prototyping, health informatics, public sector data, and data interoperability. He applies his knowledge of cutting-edge research methods to the federal sector to deliver innovative technical papers, POCs, and MVPs. He has worked with multiple federal agencies to advance their data and AI goals. Nitin’s other focus areas include natural language processing (NLP), data pipelines, and generative AI.

How to Use Website Visitor Data in Your Client Campaigns

As we rapidly wave goodbye to third-party cookies and attribution data, marketing and advertising agencies find themselves in a rush to pivot and adapt. 

After all, agencies are tasked with one thing – driving results – and with less data to work with and without the ability to truly understand the impact of campaigns, driving results becomes much harder. 

The good news is, there’s no need for panic. While this shift may feel monumental (and let’s be honest, it is), it does present marketers a chance to revolutionize their approach to client campaigns across the board—from email and digital ads to social media and direct mail. 

The key? 

Your website visitor data. 

Whether it’s the rich potential of first-party data or your website analytics data, there is still plenty we can do to ensure campaigns are successful and clients are happy.

Understanding Website Visitor Data

At the core of any successful marketing strategy is a deep understanding of our audience. 

Unfortunately, many of the privacy changes made by the tech platforms have taken away much of our audience data (looking at you GA4). 

That doesn't mean we don't have data to work with though. In fact, it means we need to take a look at the data we do have to understand how we can paint a complete picture of our website visitors. 

Let’s look at three types of website data:

First-Party Data. First-party data includes information directly collected from your audience, such as names, email addresses, preferences, and behaviors on your website. This data is gold in the marketing world, allowing for unparalleled personalization and targeting.

Website Analytics. Analytics data can help shed light on broader patterns, such as which pages are most visited, the average time spent on the site, and the journey users take through your content. Analytics data helps identify what captures interest and drives action.

CRM Data. While technically 'first-party data', CRM data can provide a historical view of customer interactions, purchases, and feedback, enabling a 360-degree view of the customer journey. For that reason, it's worth calling out on its own. 

How to Capture First-Party Data

While we could create a whole blog post around this (see here, here, and here), we think it's important to emphasize in this post because first-party data truly is a lifesaver when it comes to overcoming the loss of third-party cookies (even Google agrees). 

So we aren’t going to go into every method of collecting first-party data but we do want to look at a few.

Website Visitor Identification

We are obviously partial to this as it’s the core of our product but it’s also a game-changer for marketers when it comes to collecting data.

With a website visitor identification tool, you can collect the names, email addresses, and phone numbers of your potential customers without them ever having to fill out a form.

And with a solution like Customers.ai that enriches the contact data, you can get information on demographics, business insights (company, revenue, LinkedIn profile), and more, giving you a huge amount of details on your website visitors. 

Interested in trying a website visitor identification tool for yourself? 

We have a free trial and you can be up and running in 90 seconds.


Exit Intent or Promo Popups

Positioned to activate when visitors indicate they’re about to depart, exit-intent pop-ups serve as a prime chance to gather first-party data.

By providing special offers, attractive discounts, or compelling content, popups encourage users to divulge their personal information, giving you real-time data on your website visitors.

Lead Magnets or Newsletters

We've all seen them – the reports we had to fill out our personal information to download, the pages we had to add our email addresses to in order to read the full content, and the newsletter we practically had to promise our firstborn to receive. 

But look, they work!

Your visitors get something they want and in return, you get first-party data. 

At the end of the day, there are any number of creative ways to capture first-party data. The key is making it as easy and as seamless as possible for your visitors.

How to Use Website Visitor Data in Your Client Campaigns

All right. We know what type of data we are looking for and we know how to capture the data. Let’s look at a few ways we can use our website visitor data in client campaigns.

Leverage First-Party Data for Personalization

Personalization is key to connecting with your audience. It’s also what they want!

Customers want personalization in their emails, they want targeted ads, they want product recommendations that make sense for them, and they want businesses to speak to them in a way that resonates and doesn’t waste their time. 

First-party data is how we do this!

Let’s look at an example of just how this can be implemented. 

A woman visits a fitness website and browses several articles on running but doesn’t make a purchase. 

Using a website visitor identification solution like Customers.ai, we can identify who the user is (name, email, etc.) and what pages she visited. 

With our first-party data in hand, we can then add her to our retargeting audience and build a campaign around those who have visited articles about running but have not purchased.

Now what if she instead visited the running shoes page, added the shoes to the cart, but then abandoned before giving us her info?

Well, because the visitor identification tool already gave us her name and email, and we know what shoes she looked at before visiting the cart page, we can send a highly personalized email reminding her about the shoes she left behind. 

That is the power of first-party data. 

Identify Messaging Effectiveness with Analytics Data 

While platforms like Google and Facebook have taken away much of the analytics data we used to have access to, the analytics we do have are still helpful when it comes to understanding the effectiveness of content and campaigns. 

By examining patterns, such as which blog posts drive the most engagement or what path users typically follow before purchasing, agencies can tailor their strategies to match user behavior and understand which messaging resonates best with each client’s audience.

Let’s look at another example using the same fitness site as above. 

Using analytics, the agency sees that a high percentage of visitors who come to the site through a blog post about minimalist running shoes make a purchase.

They also can see that the minimalist running shoe category page has a much lower conversion rate. 

By comparing the messaging on the blog to the messaging on the category page, they can see that the pages are not cohesive. The category page uses less descriptive language and does not showcase the value of the shoes the way the blog does.

The agency is able to make a recommendation on how to update the category page and improve the overall conversion rate.

Understanding what resonates with customers is tricky, especially from an agency perspective where you don’t necessarily have access to customer support or the CRM.

By using analytics, we can better understand what’s working and what’s not.

Segment Audiences for Targeted Campaigns

Effective segmentation is crucial for delivering relevant content. For example, segmenting users based on their engagement level—such as new visitors, frequent browsers, and past purchasers—allows for campaigns that speak directly to where they are in the customer journey.

This is also where we can integrate our CRM data. 

Let’s look at another example, this time using a B2B software company. 

A B2B software provider uses CRM data to segment its audience based on industry, job role, and previous engagement with the website. 

For high-intent visitors, they create tailored email campaigns for each segment, offering relevant whitepapers and case studies, which leads to a higher open rate and more qualified leads.

For visitors at the top of the funnel, they create custom audience segments that can be used in their LinkedIn campaigns. 

The more segmented you can get, the better. 

As an agency, getting access to CRM data is always challenging but when positioned the right way and with case studies in hand, you can show your clients exactly how effective that data can be.

Making Visitor Data Work for Your Clients

The importance of website visitor data in crafting targeted, personalized marketing campaigns cannot be overstated. 

By effectively leveraging first-party data, analytics, and CRM insights, marketing agencies can significantly enhance their client campaigns, driving better results and building stronger customer relationships.

Look, while none of this is really all that groundbreaking, it is a new way of thinking about how we use the data we have, and agencies that adapt now are going to be in a better place and ultimately have happier clients. 

Remember, the key to success is adaptability, personalization, and a relentless focus on the data that matters most – the insights directly from your audience.


Important Next Steps

See what targeted outbound marketing is all about. Capture and engage your first 500 website visitor leads with Customers.ai X-Ray website visitor identification for free.

Talk and learn about sales outreach automation with other growth enthusiasts. Join Customers.ai Island, our Facebook group of 40K marketers and entrepreneurs who are ready to support you.

Advance your marketing performance with Sales Outreach School, a free tutorial and training area for sales pros and marketers.

The post How to Use Website Visitor Data in Your Client Campaigns appeared first on Customers.ai.

Meta AI Introduces Branch-Train-MiX (BTX): A Simple Continued Pretrain …

In the landscape of artificial intelligence, developing Large Language Models (LLMs) has been a cornerstone for various applications that can range from natural language processing to code generation. The relentless pursuit of advancing these models has introduced new methodologies aimed at refining their capabilities and efficiency. 

Training LLMs traditionally entails a considerable allocation of computational resources and data, often resulting in a steep trade-off between breadth and depth of knowledge. The challenge of efficiently scaling their abilities becomes increasingly pronounced. Previous training paradigms have usually led to a bottleneck, where the addition of specialized expertise is met with diminishing returns on investment in terms of computational resources and training time.

Recent methodologies have addressed this issue by segmenting the training process, focusing on developing domain-specific expertise within the models. However, these segmented training processes have faced their own challenges, particularly in balancing specialized training with the maintenance of a model’s general capabilities. Integrating specialized knowledge often comes at the expense of a model’s adaptability and efficiency, creating a gap in the quest for a versatile and scalable LLM.

Researchers from FAIR at Meta introduce Branch-Train-MiX (BTX), a pioneering strategy at the confluence of parallel training and the Mixture-of-Experts (MoE) model. BTX distinguishes itself by initiating parallel training for domain-specific experts. This is followed by a strategic amalgamation of these experts into a unified MoE framework to enhance the model’s overall efficacy and versatility.

The BTX methodology is characterized by its innovative approach to integrating domain expertise into a cohesive model. By first branching out into parallel training pathways, the method allows for focused expertise development in individual domains. These parallel paths increase efficiency and prevent the dilution of specialized knowledge. The subsequent phase of the process involves meticulously integrating these domain-specific models into a singular MoE model through parameter merging and fine-tuning. This integrated model can then leverage specialized knowledge across various domains while maintaining its foundational capabilities.
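
To make the merging step more concrete, the following is a minimal, illustrative PyTorch sketch of how feed-forward blocks taken from separately trained domain experts might be placed behind a trainable router to form a single Mixture-of-Experts layer. This is not the authors’ implementation; the module shapes, the three-expert setup, and the top-1 routing choice are assumptions made purely for illustration.

# Illustrative sketch only: combining separately trained domain-expert
# feed-forward blocks into one Mixture-of-Experts layer with a learned router.
# Shapes, expert count, and top-1 routing are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertFFN(nn.Module):
    """Stand-in for a feed-forward block taken from one domain-specific model."""
    def __init__(self, d_model=64, d_hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.net(x)

class MoEFromExperts(nn.Module):
    """Wrap pre-trained expert FFNs behind a trainable router (top-1 routing)."""
    def __init__(self, experts):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        d_model = experts[0].net[0].in_features
        self.router = nn.Linear(d_model, len(experts))  # trained during fine-tuning

    def forward(self, x):                               # x: (batch, d_model)
        gate = F.softmax(self.router(x), dim=-1)        # routing probabilities
        top1 = gate.argmax(dim=-1)                      # pick one expert per input
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

# Pretend these came from, say, math, code, and wiki branches trained in parallel.
experts = [ExpertFFN() for _ in range(3)]
moe = MoEFromExperts(experts)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])

The full method also merges the models’ remaining parameters and then fine-tunes the combined network, as described above; this sketch covers only the expert-plus-router piece.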

The efficacy of the BTX model was tested across a broad spectrum of benchmarks, showcasing its ability to retain and enhance performance in specialized domains. This was achieved with impressive efficiency, minimizing the additional computational demands typically associated with such enhancements. The BTX method’s performance underscores its potential as a scalable and adaptable approach to LLM training, presenting a significant advancement in the field.

This research encapsulates a significant stride towards optimizing the training of LLMs, offering a glimpse into the future of artificial intelligence development. The BTX method represents a nuanced approach to enhancing the depth and breadth of LLM capabilities, marking a pivotal shift towards more efficient, scalable, and adaptable training paradigms.

In conclusion, some key takeaways from the research include:

Innovative Training Approach: The BTX strategy introduces a novel LLM enhancement method through parallel training and integration into a Mixture-of-Experts model, emphasizing efficiency and domain-specific enhancement.

Enhanced Model Performance: Demonstrated superior performance in domain-specific benchmarks while maintaining general capabilities, showcasing an optimal balance between specialization and adaptability.

Optimal Efficiency: Achieved significant enhancements without the proportional increase in computational demand, illustrating the method’s efficiency and scalability.

The post Meta AI Introduces Branch-Train-MiX (BTX): A Simple Continued Pretraining Method to Improve an LLM’s Capabilities appeared first on MarkTechPost.

This AI Research Discusses Personalized Audiobook Recommendations at S …

Spotify, which is well-known for its vast collection of music and talk shows, has expanded its services to include audiobooks to serve a wider range of users. However, this extension comes with certain limitations, especially with regard to customized recommendations. Since audiobooks were originally sold for a price and cannot just be browsed before being purchased, precise and pertinent suggestions are much more important than they are for music and podcasts. 

The challenge of handling sparse data is also present when incorporating a new content type into an already-existing platform. Moreover, due to the enormous volume of content being recommended to millions of individuals, a system that can respond quickly and scale efficiently is required.

In order to address this, a team of researchers has focused on users’ current musical and podcast interests and has presented a new recommendation engine known as 2T-HGNN. Using a Two Tower (2T) architecture and components of Heterogeneous Graph Neural Networks (HGNNs), this system reveals intricate links between items with minimal latency and complexity. 

Decoupling users from the HGNN graph is a crucial tactic that has been used to enable a more in-depth study of item relationships. A multi-link neighbor sampler has also been introduced that improves the effectiveness of the recommendation process. The HGNN model’s computational complexity is greatly decreased by these calculated decisions in conjunction with the 2T component.
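
As a rough sketch of the Two Tower side of such a stack (not Spotify’s implementation), the snippet below encodes users and audiobooks with separate towers, scores them with a dot product, and trains with in-batch negatives; the feature dimensions, tower depths, and loss choice are all assumptions.

# Illustrative Two Tower retrieval sketch (not Spotify's 2T-HGNN implementation).
# Feature dimensions, tower depths, and the in-batch softmax loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def tower(in_dim, out_dim=32):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

class TwoTower(nn.Module):
    def __init__(self, user_dim=100, item_dim=80):
        super().__init__()
        self.user_tower = tower(user_dim)   # encodes user listening features
        self.item_tower = tower(item_dim)   # encodes audiobook (item) features

    def forward(self, user_feats, item_feats):
        u = F.normalize(self.user_tower(user_feats), dim=-1)
        v = F.normalize(self.item_tower(item_feats), dim=-1)
        return u @ v.T                      # similarity of every user to every item

model = TwoTower()
users, items = torch.randn(16, 100), torch.randn(16, 80)
scores = model(users, items)
# In-batch negatives: the matching item for user i is item i; the rest are negatives.
loss = F.cross_entropy(scores, torch.arange(16))
loss.backward()

In the combined stack described here, item representations produced by the HGNN could plausibly serve as part of the item-tower input, which is one way the two components complement each other.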

Extensive experiments with millions of users have validated the effectiveness of the methodology, exhibiting a notable enhancement in the caliber of customized suggestions. The strategy has resulted in a noteworthy 23% rise in streaming rates and a 46% increase in the rate at which customers are starting new audiobooks. 

The team has summarized their primary contributions as follows.

Examining the Design of Audiobook Recommendation Systems – Extensive research has been conducted on creating a large-scale audiobook recommendation system. Analyzing user consumption patterns allows for a better understanding of consumer preferences for audiobooks, especially in relation to podcasts, which are known for their conversational style.

Integrating Modular Architecture – A modular design has been suggested that easily incorporates audiobook content into already-in-use recommendation systems. In this architecture, a Two Tower (2T) model and a Heterogeneous Graph Neural Network (HGNN) have been combined into a single stack. While the 2T model easily learns user preferences for audiobooks across all user types, including cold-start users, the HGNN captures long-range, subtle item relations.

Resolving the Imbalance in Data Distribution – An innovative edge sampler has been incorporated into the HGNN to address imbalances in data distribution. The user-audiobook predictions have been generated by integrating weak signals in the user representation. 

Comprehensive Assessment – The 2T-HGNN model has been proven to be efficient and effective through extensive offline trials, consistently outperforming other approaches. Millions of people participating in A/B testing have shown notable gains, such as a 23% rise in audiobook stream rates and a 46% spike in the number of users beginning new audiobooks. 

In conclusion, by utilizing user preferences, sophisticated graph-based methods, and effective computational methodologies, this unique recommendation system tackles the difficulties presented by the integration of audiobooks into the Spotify platform. By doing this, the user experience for audiobooks can be improved while also making a positive impact on the broader richness of the digital audio landscape.

The post This AI Research Discusses Personalized Audiobook Recommendations at Spotify Using Graph Neural Networks and Introduces a New Recommendation Engine Called 2T-HGNN appeared first on MarkTechPost.

Meet Devin: The World’s First Fully Autonomous AI Software Engineer

The groundbreaking development in artificial intelligence has led to the creation of Devin by Cognition AI, the world’s first autonomous AI software engineer. This unique creation marks a significant leap forward in software engineering, setting a new benchmark in the SWE-Bench coding benchmark. Devin’s impressive accomplishments include passing practical engineering interviews at leading AI companies and fulfilling actual job roles on Upwork, showcasing its potential to revolutionize the field.

The earlier approaches, often bogged down by repetitive tasks and the intricacies of debugging, have been ripe for transformation. Devin emerges as a pioneering solution, embodying AI’s full potential to tackle complex engineering challenges autonomously.

Existing tools in AI-assisted programming have offered valuable support by simplifying aspects like code completion and problem identification. However, they lack a holistic approach to software development. Devin, with its advanced capabilities, bridges this gap. It operates autonomously and is equipped with developer tools such as a shell, code editor, and browser embedded in a secure computing environment. This autonomy enables Devin to efficiently plan, execute, and collaborate on software engineering tasks.

The innovation behind Devin is deeply rooted in advancements in AI technology, specifically designed to handle complex, long-term engineering tasks. This includes learning new technologies, building and deploying applications from scratch, autonomously debugging, and even training and fine-tuning other AI models. Without human intervention, the capability to perform these tasks independently represents a monumental shift in how software development projects are approached, highlighting Devin’s potential to reshape the industry.

Devin’s efficacy has been rigorously tested on the SWE-Bench benchmark, a challenging test that requires resolving real-world GitHub issues in open-source projects. Devin’s performance is nothing short of revolutionary. It resolved 13.86% of these issues unassisted, a significant improvement over the previous state-of-the-art, which managed 1.96% unassisted and 4.80% assisted. This achievement proves Devin’s capability and highlights its potential to redefine the landscape of software engineering.

In summarizing Devin‘s impact, it’s evident that this AI software engineer is not just a tool but a transformational force in software development. By automating complex tasks, Devin not only elevates productivity but also paves the way for new software creation and maintenance innovations. This pioneering development signifies the dawn of a new era in software engineering, where AI engineers like Devin play a crucial role in shaping the future of technology.

Today we’re excited to introduce Devin, the first AI software engineer. Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. Devin is… pic.twitter.com/ladBicxEat — Cognition (@cognition_labs) March 12, 2024

The post Meet Devin: The World’s First Fully Autonomous AI Software Engineer appeared first on MarkTechPost.

The journey of PGA TOUR’s generative AI virtual assistant, from conc …

This is a guest post co-written with Scott Gutterman from the PGA TOUR.
Generative artificial intelligence (generative AI) has enabled new possibilities for building intelligent systems. Recent improvements in generative AI-based large language models (LLMs) have enabled their use in a variety of applications surrounding information retrieval. Given the data sources, LLMs provided tools that allowed us to build a Q&A chatbot in weeks, rather than the years it may have taken previously, and likely with worse performance. We formulated a Retrieval Augmented Generation (RAG) solution that would allow the PGA TOUR to create a prototype for a future fan engagement platform that could make its data accessible to fans in an interactive, conversational format.
Using structured data to answer questions requires a way to effectively extract data that’s relevant to a user’s query. We formulated a text-to-SQL approach whereby a user’s natural language query is converted to a SQL statement using an LLM. The SQL is run by Amazon Athena to return the relevant data. This data is again provided to an LLM, which is asked to answer the user’s query given the data.
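In code, that round trip might look roughly like the following sketch; the helper names, prompts, table schema, S3 output location, and simplified error handling are illustrative assumptions rather than the production pipeline.

# Rough sketch of the text-to-SQL round trip (model choice, prompts, schema,
# S3 output location, and error handling are all simplified assumptions).
import json, time, boto3

bedrock = boto3.client("bedrock-runtime")
athena = boto3.client("athena")

def ask_claude(prompt, max_tokens=500):
    body = json.dumps({"prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
                       "max_tokens_to_sample": max_tokens})
    resp = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(resp["body"].read())["completion"]

def run_athena(sql, database="golf_demo", output="s3://my-demo-bucket/athena/"):
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output})["QueryExecutionId"]
    while True:  # poll until the query finishes (no failure handling in this sketch)
        state = athena.get_query_execution(
            QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state not in ("QUEUED", "RUNNING"):
            break
        time.sleep(1)
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

question = "How many birdies did player X make last year?"
sql = ask_claude("Given a table shots(player, year, hole, score, par, is_birdie), "
                 f"write a SQL query to answer: {question}. Return only SQL.")
rows = run_athena(sql)
answer = ask_claude(f"Using this query result: {rows}\nAnswer the question: {question}")
print(answer)
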
Using text data requires an index that can be used to search and provide relevant context to an LLM to answer a user query. To enable quick information retrieval, we use Amazon Kendra as the index for these documents. When users ask questions, our virtual assistant rapidly searches through the Amazon Kendra index to find relevant information. Amazon Kendra uses natural language processing (NLP) to understand user queries and find the most relevant documents. The relevant information is then provided to the LLM for final response generation. Our final solution is a combination of these text-to-SQL and text-RAG approaches.
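A stripped-down sketch of that retrieval step is shown below; the index ID, prompt wording, and use of the Kendra Retrieve API are assumptions for illustration, not the exact production code.

# Minimal illustration of the Kendra-backed RAG step (index ID, prompt wording,
# and result handling are assumptions; the production flow adds more guardrails).
import json, boto3

kendra = boto3.client("kendra")
bedrock = boto3.client("bedrock-runtime")

def answer_from_docs(question, index_id="REPLACE-WITH-KENDRA-INDEX-ID"):
    # Retrieve the most relevant passages for the user's question.
    results = kendra.retrieve(IndexId=index_id, QueryText=question)
    context = "\n\n".join(item["Content"] for item in results["ResultItems"][:5])

    # Ask the model to answer using only the retrieved context.
    prompt = (f"\n\nHuman: Answer the question using only the context below. "
              f"If the answer is not in the context, say you don't know.\n"
              f"<context>\n{context}\n</context>\nQuestion: {question}\n\nAssistant:")
    body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 400})
    resp = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(resp["body"].read())["completion"]
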
In this post, we highlight how the AWS Generative AI Innovation Center collaborated with AWS Professional Services and the PGA TOUR to develop a prototype virtual assistant using Amazon Bedrock that could enable fans to extract information about any event, player, hole, or shot-level details in a seamless, interactive manner. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Development: Getting the data ready
As with any data-driven project, performance will only ever be as good as the data. We processed the data so that the LLM could effectively query and retrieve relevant data.
For the tabular competition data, we focused on a subset of data relevant to the greatest number of user queries and labelled the columns intuitively, so that they would be easier for LLMs to understand. We also created some auxiliary columns to help the LLM understand concepts it might otherwise struggle with. For example, if a golfer shoots one shot less than par (such as holing out in 3 shots on a par 4 or in 4 shots on a par 5), it is commonly called a birdie. If a user asks, “How many birdies did player X make last year?”, just having the score and par in the table is not sufficient. As a result, we added columns to indicate common golf terms, such as bogey, birdie, and eagle. In addition, we linked the competition data with a separate video collection by joining a column for a video_id, which allows our app to pull the video associated with a particular shot in the competition data. We also enabled joining text data to the tabular data, for example by adding biographies for each player as a text column. The following figure shows the step-by-step procedure of how a query is processed in the text-to-SQL pipeline. The numbers indicate the series of steps taken to answer a query.
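As a toy illustration of such derived columns (with invented data and column names rather than the TOUR’s actual schema), the golf terms can be computed from score and par along these lines:

# Toy example of adding golf-term helper columns (data and column names invented).
import pandas as pd

shots = pd.DataFrame({
    "player": ["Player X", "Player X", "Player Y"],
    "hole_par": [4, 5, 3],
    "hole_score": [3, 7, 3],
})
diff = shots["hole_score"] - shots["hole_par"]
shots["is_birdie"] = diff == -1   # one under par
shots["is_eagle"] = diff == -2    # two under par
shots["is_bogey"] = diff == 1     # one over par
print(shots)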

In the following figure we demonstrate our end-to-end pipeline. We use AWS Lambda as our orchestration function, responsible for interacting with the various data sources and LLMs and for error correction based on the user query. Steps 1–8 are similar to what is shown in the preceding figure. There are slight changes for the unstructured data, which we discuss next.

Text data requires unique processing steps that chunk (or segment) long documents into parts digestible by the LLM, while maintaining topic coherence. We experimented with several approaches and settled on a page-level chunking scheme that aligned well with the format of the Media Guides. We used Amazon Kendra, which is a managed service that takes care of indexing documents, without requiring specification of embeddings, while providing an easy API for retrieval. The following figure illustrates this architecture.

The unified, scalable pipeline we developed allows the PGA TOUR to scale to its full history of data, some of which goes back to the 1800s. It enables future applications that can use live, on-course context to create rich real-time experiences.
Development: Evaluating LLMs and developing generative AI applications
We carefully tested and evaluated the first- and third-party LLMs available in Amazon Bedrock to choose the model best suited for our pipeline and use case. We selected Anthropic’s Claude v2 and Claude Instant on Amazon Bedrock. For the final structured and unstructured data pipeline, we observed that Anthropic’s Claude v2 on Amazon Bedrock generated better overall results.
Prompting is a critical aspect of getting LLMs to output text as desired. We spent considerable time experimenting with different prompts for each of the tasks. For example, for the text-to-SQL pipeline we had several fallback prompts, with increasing specificity and gradually simplified table schemas. If a SQL query was invalid and resulted in an error from Athena, we developed an error correction prompt that would pass the error and incorrect SQL to the LLM and ask it to fix it. The final prompt in the text-to-SQL pipeline asks the LLM to take the Athena output, which can be provided in Markdown or CSV format, and provide an answer to the user. For the unstructured text, we developed general prompts to use the context retrieved from Amazon Kendra to answer the user question. The prompt included instructions to use only the information retrieved from Amazon Kendra and not rely on data from the LLM pre-training.
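The error-correction idea can be sketched as a simple retry loop; the retry count and prompt wording are assumptions, and ask_claude and run_athena refer to the illustrative helpers sketched earlier (assuming run_athena raises when Athena reports a failed query).

# Sketch of the SQL error-correction loop (retry count and prompt wording assumed;
# ask_claude and run_athena are the illustrative helpers sketched earlier).
def sql_with_error_correction(question, schema_hint, max_attempts=3):
    sql = ask_claude(f"{schema_hint}\nWrite a SQL query to answer: {question}. "
                     f"Return only SQL.")
    for _ in range(max_attempts):
        try:
            return run_athena(sql)
        except Exception as err:  # e.g. an Athena InvalidRequestException
            # Feed the failing SQL and the error back to the model and ask for a fix.
            sql = ask_claude(f"This SQL failed:\n{sql}\nError: {err}\n"
                             f"Return a corrected SQL query only.")
    raise RuntimeError("Could not produce a valid SQL query")
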
Latency is often a concern with generative AI applications, and that is also the case here. It is especially a concern for text-to-SQL, which requires an initial SQL generation LLM invocation, followed by a response generation LLM invocation. If we’re using a large LLM, such as Anthropic’s Claude V2, this effectively doubles the latency of a single LLM invocation. We experimented with several configurations of large and smaller LLMs to evaluate run time as well as correctness. The following table shows an example for one question, demonstrating the latency as well as the responses generated with Anthropic’s Claude V2 and Claude Instant on Amazon Bedrock.

Prototype
In our application, we used a Lambda function to orchestrate the prompt choices and communication between Amazon Athena, Amazon Kendra, and Amazon Bedrock. The virtual assistant features a conversational interface, built with React and the Cloudscape Design System, that uses text and videos to engage with users.
Our virtual assistant can field a wide range of questions about PGA TOUR events, players, statistics, history, and more. For example, when asked about Tony Finau’s longest drives at the Shriners Children’s Open (one of the Signature events of the PGA TOUR), the assistant can query structured data to pull his shot level data. It can extract the exact distance in yards, the round number and the hole during which he achieved this record using the text-to-SQL pipeline. The assistant then interprets this numerical data to generate a final response. The following table contains some example responses.

Question: What was the longest drive hit by Tony Finau in the Shriners Children’s Open?
Response: The longest drive hit by Tony Finau in the Shriners Childrens Open was 382 yards, which he hit during the first round on hole number 4 in 2018.

Question: What was the longest drive hit by Collin Morikawa in the Shriners Children’s Open?
Response: Collin Morikawa’s longest drive at the Shriners Childrens Open was 334 yards. This occurred in 2019 during the first round on hole number 15.

Question: Did anyone make an ace at the 2022 Shriners Children’s Open?
Response: Yes, Adam Hadwin made a hole-in-one on hole 14 during round 3 of the 2022 Shriners Children’s Open.

The following explainer video highlights a few examples of interacting with the virtual assistant.

In initial testing, our PGA TOUR virtual assistant has shown great promise in improving fan experiences. By blending AI technologies like text-to-SQL, semantic search, and natural language generation, the assistant delivers informative, engaging responses. Fans are empowered to effortlessly access data and narratives that were previously hard to find.
What does the future hold?
As we continue development, we will expand the range of questions our virtual assistant can handle. This will require extensive testing, through collaboration between AWS and the PGA TOUR. Over time, we aim to evolve the assistant into a personalized, omni-channel experience accessible across web, mobile, and voice interfaces.
The establishment of a cloud-based generative AI assistant lets the PGA TOUR present its vast data source to multiple internal and external stakeholders. As the sports generative AI landscape evolves, it enables the creation of new content. For example, you can use AI and machine learning (ML) to surface content fans want to see as they’re watching an event, or as production teams are looking for shots from previous tournaments that match a current event. For example, if Max Homa is getting ready to take his final shot at the PGA TOUR Championship from a spot 20 feet from the pin, the PGA TOUR can use AI and ML to identify and present clips, with AI-generated commentary, of him attempting a similar shot five times previously. This kind of access and data allows a production team to immediately add value to the broadcast or allow a fan to customize the type of data that they want to see.

“The PGA TOUR is the industry leader in using cutting-edge technology to improve the fan experience. AI is at the forefront of our technology stack, where it is enabling us to create a more engaging and interactive environment for fans. This is the beginning of our generative AI journey in collaboration with the AWS Generative AI Innovation Center for a transformational end-to-end customer experience. We are working to leverage Amazon Bedrock and our proprietary data to create an interactive experience for PGA TOUR fans to find information of interest about an event, player, stats, or other content in an interactive fashion.” – Scott Gutterman, SVP of Broadcast and Digital Properties at PGA TOUR.

Conclusion
The project we discussed in this post exemplifies how structured and unstructured data sources can be fused using AI to create next-generation virtual assistants. For sports organizations, this technology enables more immersive fan engagement and unlocks internal efficiencies. The data intelligence we surface helps PGA TOUR stakeholders like players, coaches, officials, partners, and media make informed decisions faster. Beyond sports, our methodology can be replicated across any industry. The same principles apply to building assistants that engage customers, employees, students, patients, and other end-users. With thoughtful design and testing, virtually any organization can benefit from an AI system that contextualizes their structured databases, documents, images, videos, and other content.
If you’re interested in implementing similar functionality, consider using Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock as an alternative, fully AWS-managed solution. This approach can provide intelligent automation and data search capabilities through customizable agents, potentially making user interactions with applications more natural, efficient, and effective.

About the authors
Scott Gutterman is the SVP of Digital Operations for the PGA TOUR. He is responsible for the TOUR’s overall digital operations, product development and is driving their GenAI strategy.
Ahsan Ali is an Applied Scientist at the Amazon Generative AI Innovation Center, where he works with customers from different domains to solve their urgent and expensive problems using Generative AI.
Tahin Syed is an Applied Scientist with the Amazon Generative AI Innovation Center, where he works with customers to help realize business outcomes with generative AI solutions. Outside of work, he enjoys trying new food, traveling, and teaching taekwondo.
Grace Lang is an Associate Data & ML engineer with AWS Professional Services. Driven by a passion for overcoming tough challenges, Grace helps customers achieve their goals by developing machine learning powered solutions.
Jae Lee is a Senior Engagement Manager in ProServe’s M&E vertical. She leads and delivers complex engagements, exhibits strong problem solving skill sets, manages stakeholder expectations, and curates executive level presentations. She enjoys working on projects focused on sports, generative AI, and customer experience.
Karn Chahar is a Security Consultant with the shared delivery team at AWS. He is a technology enthusiast who enjoys working with customers to solve their security challenges and to improve their security posture in the cloud.
Mike Amjadi is a Data & ML Engineer with AWS ProServe focused on enabling customers to maximize value from data. He specializes in designing, building, and optimizing data pipelines following well-architected principles. Mike is passionate about using technology to solve problems and is committed to delivering the best results for our customers.
Vrushali Sawant is a Front End Engineer with Proserve. She is highly skilled in creating responsive websites. She loves working with customers, understanding their requirements and providing them with scalable, easy to adopt UI/UX solutions.
Neelam Patel is a Customer Solutions Manager at AWS, leading key Generative AI and cloud modernization initiatives. Neelam works with key executives and technology owners to address their cloud transformation challenges and helps customers maximize the benefits of cloud adoption. She has an MBA from Warwick Business School, UK and a Bachelors in Computer Engineering, India.
Dr. Murali Baktha is Global Golf Solution Architect at AWS, spearheads pivotal initiatives involving Generative AI, data analytics and cutting-edge cloud technologies. Murali works with key executives and technology owners to understand customer’s business challenges and designs solutions to address those challenges. He has an MBA in Finance from UConn and a doctorate from Iowa State University.
Mehdi Noor is an Applied Science Manager at the Generative AI Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of generative AI, turning potential challenges into opportunities for rapid experimentation and innovation by focusing on scalable, measurable, and impactful uses of advanced AI technologies, and streamlining the path to production.

Enhance code review and approval efficiency with generative AI using A …

In the world of software development, code review and approval are important processes for ensuring the quality, security, and functionality of the software being developed. However, managers tasked with overseeing these critical processes often face numerous challenges, such as the following:

Lack of technical expertise – Managers may not have an in-depth technical understanding of the programming language used or may not have been involved in software engineering for an extended period. This results in a knowledge gap that can make it difficult for them to accurately assess the impact and soundness of the proposed code changes.
Time constraints – Code review and approval can be a time-consuming process, especially in larger or more complex projects. Managers need to balance the thoroughness of the review against the pressure to meet project timelines.
Volume of change requests – Dealing with a high volume of change requests is a common challenge for managers, especially if they’re overseeing multiple teams and projects. Similar to the challenge of time constraints, managers need to be able to handle those requests efficiently so as to not hold back project progress.
Manual effort – Code review requires manual effort by the managers, and the lack of automation can make it difficult to scale the process.
Documentation – Proper documentation of the code review and approval process is important for transparency and accountability.

With the rise of generative artificial intelligence (AI), managers can now harness this transformative technology and integrate it with the AWS suite of deployment tools and services to streamline the review and approval process in a manner not previously possible. In this post, we explore a solution that offers an integrated end-to-end deployment workflow that incorporates automated change analysis and summarization together with approval workflow functionality. We use Amazon Bedrock, a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage any infrastructure.
Solution overview
The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

A developer pushes new code changes to their code repository (such as AWS CodeCommit), which automatically triggers the start of an AWS CodePipeline deployment.
The application code goes through a code building process, performs vulnerability scans, and conducts unit tests using your preferred tools.
AWS CodeBuild retrieves the repository and performs a git show command to extract the code differences between the current commit version and the previous commit version. This produces a line-by-line output that indicates the code changes made in this release.
CodeBuild saves the output to an Amazon DynamoDB table with additional reference information:

CodePipeline run ID
AWS Region
CodePipeline name
CodeBuild build number
Date and time
Status

Amazon DynamoDB Streams captures the data modifications made to the table.
An AWS Lambda function is triggered by the DynamoDB stream to process the record captured.
The function invokes the Anthropic Claude v2 model on Amazon Bedrock via the Amazon Bedrock InvokeModel API call. The code differences, together with a prompt, are provided as input to the model for analysis, and a summary of code changes is returned as output.
The output from the model is saved back to the same DynamoDB table.
The manager is notified via Amazon Simple Email Service (Amazon SES) of the summary of code changes and that their approval is required for the deployment.
The manager reviews the email and provides their decision (either approve or reject) together with any review comments via the CodePipeline console.
The approval decision and review comments are captured by Amazon EventBridge, which triggers a Lambda function to save them back to DynamoDB.
If approved, the pipeline deploys the application code using your preferred tools. If rejected, the workflow ends and the deployment does not proceed further.

In the following sections, you deploy the solution and verify the end-to-end workflow.
Prerequisites
To follow the instructions in this solution, you need the following prerequisites:

An AWS account with an AWS Identity and Access Management (IAM) user who has permissions to AWS CloudFormation, CodePipeline, CodeCommit, CodeBuild, DynamoDB, Lambda, Amazon Bedrock, Amazon SES, EventBridge, and IAM
Model access to Anthropic Claude on Amazon Bedrock

Deploy the solution
To deploy the solution, complete the following steps:

Choose Launch Stack to launch a CloudFormation stack in us-east-1:
For EmailAddress, enter an email address that you have access to. The summary of code changes will be sent to this email address.
For modelId, leave as the default anthropic.claude-v2, which is the Anthropic Claude v2 model.

Deploying the template will take about 4 minutes.

When you receive an email from Amazon SES to verify your email address, choose the link provided to authorize your email address.
You’ll receive an email titled “Summary of Changes” for the initial commit of the sample repository into CodeCommit.
On the AWS CloudFormation console, navigate to the Outputs tab of the deployed stack.
Copy the value of RepoCloneURL. You need this to access the sample code repository.

Test the solution
You can test the workflow end to end by taking on the role of a developer and pushing some code changes. Sample code has been prepared for you in CodeCommit. To access the CodeCommit repository, enter the following commands in your IDE:

git clone <replace_with_value_of_RepoCloneURL>
cd my-sample-project
ls

You will find the following directory structure for an AWS Cloud Development Kit (AWS CDK) application that creates a Lambda function to perform a bubble sort on a string of integers. The Lambda function is accessible via a publicly available URL.

.
├── README.md
├── app.py
├── cdk.json
├── lambda
│   └── index.py
├── my_sample_project
│   ├── __init__.py
│   └── my_sample_project_stack.py
├── requirements-dev.txt
├── requirements.txt
└── source.bat

You make three changes to the application code.

To enhance the function to support both the bubble sort and quick sort algorithms (taking in a parameter that selects which algorithm to use, and returning both the algorithm used and the sorted array in the output), replace the entire content of lambda/index.py with the following code:

# function to perform bubble sort on an array of integers
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(len(arr)-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr

# function to perform quick sort on an array of integers
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        less = [i for i in arr[1:] if i <= pivot]
        greater = [i for i in arr[1:] if i > pivot]
        return quick_sort(less) + [pivot] + quick_sort(greater)

# lambda handler
def lambda_handler(event, context):
    try:
        algorithm = event['queryStringParameters']['algorithm']
        numbers = event['queryStringParameters']['numbers']
        arr = [int(x) for x in numbers.split(',')]
        if algorithm == 'bubble':
            arr = bubble_sort(arr)
        elif algorithm == 'quick':
            arr = quick_sort(arr)
        else:
            arr = bubble_sort(arr)

        return {
            'statusCode': 200,
            'body': {
                'algorithm': algorithm,
                'numbers': arr
            }
        }
    except:
        return {
            'statusCode': 200,
            'body': {
                'algorithm': 'bubble or quick',
                'numbers': 'integers separated by commas'
            }
        }

To reduce the timeout setting of the function from 10 minutes to 5 seconds (because we don’t expect the function to run longer than a few seconds), update line 47 in my_sample_project/my_sample_project_stack.py as follows:

timeout=Duration.seconds(5),

To restrict the invocation of the function using IAM for added security, update line 56 in my_sample_project/my_sample_project_stack.py as follows:

auth_type=_lambda.FunctionUrlAuthType.AWS_IAM

Push the code changes by entering the following commands:

git commit -am 'added new changes for release v1.1'
git push

This starts the CodePipeline deployment workflow from Steps 1–9 as outlined in the solution overview. When invoking the Amazon Bedrock model, we provided the following prompt:

Human: Review the following "git show" output enclosed within <gitshow> tags detailing code changes, and analyze their implications.
Assess the code changes made and provide a concise summary of the modifications as well as the potential consequences they might have on the code’s functionality.
<gitshow>
{code_change}
</gitshow>

Assistant:
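
As an illustration only (not the exact code deployed by the CloudFormation stack), the Lambda function could fill this template and call the model through the Bedrock InvokeModel API roughly as follows; the function name and response handling are assumptions.

# Illustrative only: how the prompt above could be sent to Claude v2 on Amazon
# Bedrock from the Lambda function (function name and response handling assumed).
import json, boto3

bedrock = boto3.client("bedrock-runtime")

def summarize_diff(code_change: str) -> str:
    prompt = (
        "\n\nHuman: Review the following \"git show\" output enclosed within "
        "<gitshow> tags detailing code changes, and analyze their implications.\n"
        "Assess the code changes made and provide a concise summary of the "
        "modifications as well as the potential consequences they might have on "
        "the code's functionality.\n"
        f"<gitshow>\n{code_change}\n</gitshow>\n\nAssistant:"
    )
    body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 1000})
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"]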

Within a few minutes, you will receive an email informing you that a deployment pipeline is pending your approval, along with the list of code changes made and the summary of those changes generated by the model. The following is an example of the output:

Based on the diff, the following main changes were made:

1. Two sorting algorithms were added – bubble sort and quick sort.
2. The lambda handler was updated to take an ‘algorithm’ query parameter to determine which sorting algorithm to use. By default it uses bubble sort if no algorithm is specified.
3. The lambda handler now returns the sorting algorithm used along with the sorted numbers in the response body.
4. The lambda timeout was reduced from 10 mins to 5 seconds.
5. The function URL authentication was changed from none to AWS IAM, so only authenticated users can invoke the URL.

Overall, this adds support for different sorting algorithms, returns more metadata in the response, reduces timeout duration, and tightens security around URL access. The main functional change is the addition of the sorting algorithms, which provides more flexibility in how the numbers are sorted. The other changes improve various non-functional attributes of the lambda function.

Finally, you take on the role of an approver to review and approve (or reject) the deployment. In your email, there is a hyperlink that will bring you to the CodePipeline console for you to input your review comments and approve the deployment.

If approved, the pipeline will proceed to the next step, which deploys the application. Otherwise, the pipeline ends. For the purpose of this test, the Lambda function will not actually be deployed because there are no deployment steps defined in the pipeline.
Additional considerations
The following are some additional considerations when implementing this solution:

Different models will produce different results, so you should conduct experiments with different foundation models and different prompts for your use case to achieve the desired results.
The analyses provided are not meant to replace human judgement. You should be mindful of potential hallucinations when working with generative AI, and use the analysis only as a tool to assist and speed up code review.

Clean up
To clean up the created resources, go to the AWS CloudFormation console and delete the CloudFormation stack.
Conclusion
This post explores the challenges faced by managers in the code review process, and introduces the use of generative AI as an augmented tool to accelerate the approval process. The proposed solution integrates the use of Amazon Bedrock in a typical deployment workflow, and provides guidance on deploying the solution in your environment. Through this implementation, managers can now take advantage of the assistive power of generative AI and navigate these challenges with ease and efficiency.
Try out this implementation and let us know your thoughts in the comments.

About the Author
Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan spends most of his free time with his family and getting bossed around by his 3-year-old daughter. You can find Xan on LinkedIn.