Connect the Amazon Q Business generative AI coding companion to your GitHub (Cloud) instance

Incorporating generative artificial intelligence (AI) into your development lifecycle can offer several benefits. For example, using an AI-based coding companion such as Amazon Q Developer can boost development productivity by up to 30 percent. Additionally, reducing the developer context switching that stems from frequent interactions with many different development tools can also increase developer productivity. In this post, we show you how development teams can quickly obtain answers based on the knowledge distributed across your development environment using generative AI.
GitHub (Cloud) is a popular development platform that helps teams build, scale, and deliver software used by more than 100 million developers and over 4 million organizations worldwide. GitHub helps developers host and manage Git repositories, collaborate on code, track issues, and automate workflows through features such as pull requests, code reviews, and continuous integration and deployment (CI/CD) pipelines.
Amazon Q Business is a fully managed, generative AI–powered assistant designed to enhance enterprise operations. You can tailor it to specific business needs by connecting to company data, information, and systems using over 40 built-in connectors.
You can connect your GitHub (Cloud) instance to Amazon Q Business using an out-of-the-box connector to provide a natural language interface to help your team analyze the repositories, commits, issues, and pull requests contained in your GitHub (Cloud) organization. After establishing the connection and synchronizing data, your teams can use Amazon Q Business to perform natural language queries in the supported GitHub (Cloud) data entities, streamlining access to this information.
Overview of solution
To create an Amazon Q Business application to connect to your GitHub repositories using AWS IAM Identity Center and AWS Secrets Manager, follow these high-level steps:

Create an Amazon Q Business application
Perform sync
Run sample queries to test the solution

The following diagram shows the solution architecture.

In this post, we show how developers and other relevant users can use the Amazon Q Business web experience to perform natural language–based Q&A over the indexed information reflective of the associated access control lists (ACLs). For this post, we set up a dedicated GitHub (Cloud) organization with four repositories and two teams—review and development. Two of the repositories are private and are only accessible to the members of the review team. The remaining two repositories are public and are accessible to all members and teams.
Prerequisites
To implement the solution, make sure you have the following prerequisites in place:

Have an AWS account with privileges necessary to administer Amazon Q Business
Have access to the AWS region in which Amazon Q Business is available (Supported regions)
Enable the IAM Identity Center and add a user (Guide to enable IAM Identity Center, Guide to add user)
Have a GitHub account with an organization and repositories (Guide to create organization)
Have a GitHub access token classic (Guide to create access tokens, Permissions needed for tokens)

Create, sync, and test an Amazon Q Business application with IAM Identity Center
To create the Amazon Q Business application, you need to select the retriever, connect the data sources, and add groups and users.
Create application

On the AWS Management Console, search for Amazon Q Business in the search bar, then select Amazon Q Business.

On the Amazon Q Business landing page, choose Get started.

On the Amazon Q Business Applications screen, at the bottom, choose Create application.

Under Create application, provide the required values. For example, in Application name, enter anycompany-git-application. For Service access, select Create and use a new service-linked role (SLR). Under Application connected to IAM Identity Center, note the ARN for the associated IAM Identity Center instance. Choose Create.

Select retriever
Under Select retriever, in Retrievers, select Use native retriever. Under Index provisioning, enter “1.”
Amazon Q Business pricing is based on the chosen document index capacity. You can choose up to 50 capacity units as part of index provisioning. Each unit can contain up to 20,000 documents or 200 MB, whichever comes first. You can adjust this number as needed for your use case.
Choose Next at the bottom of the screen.

Connect data sources

Under Connect data sources, in the search field under All, enter “GitHub” and select the plus sign to the right of the GitHub selection. Choose Next to configure the data source.

You can use the following examples to create a default configuration with file type exclusions to bypass crawling common image and stylesheet files.

Enter anycompany-git-datasource for both the Data source name and the Description.

In the GitHub organization name field, enter your GitHub organization name. Under Authentication, provide a new access token or select an existing access token stored in AWS Secrets Manager.

Under IAM role, select Create a new service role and enter the role name under Role name for the data source.

Define Sync scope by selecting the desired repositories and content types to be synced.

Complete the Additional configuration and Sync mode.

Use this optional section to specify file names, types, or file paths with regex patterns to refine the sync scope. Also use the Sync mode setting to define which types of content changes to sync when your data source content changes.

For the purposes of this post, under Sync run schedule, select Run on demand under Frequency so you can manually invoke the sync process. Other options for automated periodic sync runs are also supported. In the Field Mappings section, keep the default settings. After you complete the retriever creation, you can modify field mappings and add custom field attributes. You can access field mapping by editing the data source.

Add groups and users
There are two users we will use for testing: one with full permissions on all the repositories in the GitHub (Cloud) organization, and a second user with permission only on one specific repository.

Choose Add groups and users.

Select Assign existing users and groups. This will show you the option to select the users from the IAM Identity Center and add them to this Amazon Q Business application. Choose Next.

Search for the username or name and select the user from the listed options. Repeat for all of the users you wish to test with.

Assign the desired subscription to the added users.

For Web experience service access, use the default value of Create and use a new service role. Choose Create Application and wait for the application creation process to complete.

Perform sync
To sync your new Amazon Q Business application with your desired data sources, follow these steps:

Select the newly created data source under Data sources and choose Sync now.

Depending on the number of supported data entities in the source GitHub (Cloud) organization, the sync process might take several minutes to complete.

Once the sync is complete, choose the data source name to show the sync history, including the number of objects scanned, added, deleted, modified, and failed. You can also access the associated Amazon CloudWatch logs to inspect the sync process and failed objects.

To access the Amazon Q Business application, select Web experience settings and choose Deployed URL. A new tab will open and ask you for sign-in details. Provide the details of the user you created earlier and choose Sign in.
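If you prefer to manage syncs from code rather than the console, the following is a minimal boto3 sketch; the application, index, and data source IDs are placeholders for the resources created earlier, and response field names may vary slightly by SDK version.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

# Trigger an on-demand sync for the GitHub (Cloud) data source
qbusiness.start_data_source_sync_job(
    applicationId="<application-id>",
    indexId="<index-id>",
    dataSourceId="<data-source-id>",
)

# Check the sync history for status and counts
jobs = qbusiness.list_data_source_sync_jobs(
    applicationId="<application-id>",
    indexId="<index-id>",
    dataSourceId="<data-source-id>",
)
for job in jobs["history"]:
    print(job["executionId"], job["status"])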

Run sample queries to test the solution
You should now see the home screen of Amazon Q Business, including the associated web experience. Now we can ask questions in natural language and Amazon Q Business will provide answers based on the information indexed from your GitHub (Cloud) organization.

To begin, enter a natural language question in the Enter a prompt field.

You can ask questions about the information from the synced GitHub (Cloud) data entities. For example, you can enter, “Tell me how to start a new Serverless application from scratch?” and obtain a response based on the information from the associated repository README.md file.

Because you are logged in as the first user and mapped to a GitHub (Cloud) user belonging to the review team, you should also be able to ask questions about the contents of private repositories accessible by the members of that team.

As shown in the following screenshot, you can ask questions about the private repository called aws-s3-object-management and obtain the response based on the README.md in that repository.

However, when you attempt to ask the same question when logged in as the second user, which has no access to the associated GitHub (Cloud) repository, Amazon Q Business will provide an ACL-filtered response.

Troubleshooting and frequently asked questions
1. Why isn’t Amazon Q Business answering any of my questions?
If you are not getting answers to your questions from Amazon Q Business, verify the following:

Permissions – document ACLs indexed by Amazon Q Business may not allow you to query certain data entities as demonstrated in our example. If this is the case, please reach out to your GitHub (Cloud) administrator to verify that your user has access to the restricted documents and repeat the sync process.
Data connector sync – a failed data source sync may prevent the documents from being indexed, meaning that Amazon Q Business would be unable to answer questions about the documents that failed to sync. Please refer to the official documentation to troubleshoot data source connectors.

2. My connector is unable to sync.
Please refer to the official documentation to troubleshoot data source connectors. Please also verify that all of the required prerequisites for connecting Amazon Q Business to GitHub (Cloud) are in place.
3. I updated the contents of my data source but Amazon Q Business answers using old data.
Verifying the sync status and sync schedule frequency for your GitHub (Cloud) data connector should reveal when the last sync ran successfully. It could be that your data connector sync run schedule is set to run on demand or has not yet been triggered for its next periodic run. If the sync is set to run on demand, it will need to be manually triggered.
4. How can I know if the reason I don’t see answers is due to ACLs?
If different users are getting different answers to the same questions, including differences in source attribution with citation, it is likely that the chat responses are being filtered based on user document access level represented via associated ACLs.
5. How can I sync documents without ACLs?
Access control list (ACL) crawling is on by default and can’t be turned off.
Cleanup
To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q Business application:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select the application you created.
On the Actions menu, choose Delete.
Delete the AWS Identity and Access Management (IAM) roles created for the application and data retriever. You can identify the IAM roles used by the created Amazon Q Business application and data retriever by inspecting the associated configuration using the AWS console or AWS Command Line Interface (AWS CLI).
If you created an IAM Identity Center instance for this walkthrough, delete it.

Conclusion
In this post, we walked through the steps to connect your GitHub (Cloud) organization to Amazon Q Business using the out-of-the-box GitHub (Cloud) connector. We demonstrated how to create an Amazon Q Business application integrated with AWS IAM Identity Center as the identity provider. We then configured the GitHub (Cloud) connector to crawl and index supported data entities such as repositories, commits, issues, pull requests, and associated metadata from your GitHub (Cloud) organization. We showed how to perform natural language queries over the indexed GitHub (Cloud) data using the AI-powered chat interface provided by Amazon Q Business. Finally, we covered how Amazon Q Business applies ACLs associated with the indexed documents to provide permissions-filtered responses.
Beyond the web-based chat experience, Amazon Q Business offers a Chat API to create custom conversational interfaces tailored to your specific use cases. You can also use the associated API operations using the AWS CLI or AWS SDK to manage Amazon Q Business applications, retriever, sync, and user configurations.
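As a hedged illustration of that Chat API, the following boto3 sketch sends a single question to the application; the application ID is a placeholder, and depending on your identity configuration you may need identity-aware credentials rather than the default credential chain.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.chat_sync(
    applicationId="<your-application-id>",
    userMessage="Tell me how to start a new Serverless application from scratch?",
)

print(response["systemMessage"])  # the generated answer
for source in response.get("sourceAttributions", []):
    print(source.get("title"), source.get("url"))  # cited GitHub documents

The response also includes a conversationId that you can pass on subsequent calls to continue the same conversation.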
By integrating Amazon Q Business with your GitHub (Cloud) organization, development teams can streamline access to information scattered across repositories, issues, and pull requests. The natural language interface powered by generative AI reduces context switching and can provide timely answers in a conversational manner.
To learn more about Amazon Q connector for GitHub (Cloud), refer to Connecting GitHub (Cloud) to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.

About the Authors

Maxim Chernyshev is a Senior Solutions Architect working with mining, energy, and industrial customers at AWS. Based in Perth, Western Australia, Maxim helps customers devise solutions to complex and novel problems using a broad range of applicable AWS services and features. Maxim is passionate about industrial Internet of Things (IoT), scalable IT/OT convergence, and cyber security.

Manjunath Arakere is a Senior Solutions Architect on the Worldwide Public Sector team at AWS, based in Atlanta, Georgia. He works with public sector partners to design and scale well-architected solutions and supports their cloud migrations and modernization initiatives. Manjunath specializes in migration, modernization, and serverless technology.

Mira Andhale is a Software Development Engineer on the Amazon Q and Amazon Kendra engineering team. She works on the Amazon Q connector design, development, integration and test operations.

Elevate customer experience through an intelligent email automation solution using Amazon Bedrock

Organizations spend a lot of resources, effort, and money on running their customer care operations to answer customer questions and provide solutions. Your customers may ask questions through various channels, such as email, chat, or phone, and deploying a workforce to answer those queries can be resource intensive, time-consuming, and unproductive if the answers to those questions are repetitive.
Although your organization might have the data assets for customer queries and answers, you may still struggle to implement an automated process to reply to your customers. Challenges might include unstructured data, different languages, and a lack of expertise in artificial intelligence (AI) and machine learning (ML) technologies.
In this post, we show you how to overcome such challenges by using Amazon Bedrock to automate email responses to customer queries. With our solution, you can identify the intent of customer emails and send an automated response if the intent matches your existing knowledge base or data sources. If the intent doesn’t have a match, the email goes to the support team for a manual response.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock offers a serverless experience so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.
The following are some common customer intents when contacting customer care:

Transaction status (for example, status of a money transfer)
Password reset
Promo code or discount
Hours of operation
Find an agent location
Report fraud
Unlock account
Close account

Agents for Amazon Bedrock can help you perform classification and entity detection on emails for these intents. For this solution, we show how to classify customer emails for the first three intents. You can also use Agents for Amazon Bedrock to detect key information from emails, so you can automate your business processes with some actions. For example, you can use Agents for Amazon Bedrock to automate the reply to a customer request with specific information related to that query.
Moreover, Agents for Amazon Bedrock can serve as an intelligent conversational interface, facilitating seamless interactions with both internal team members and external clients, efficiently addressing inquiries and implementing desired actions. Currently, Agents for Amazon Bedrock supports Anthropic Claude models and the Amazon Titan Text G1 – Premier model on Amazon Bedrock.
Solution overview
To build our customer email response flow, we use the following services:

Agents for Amazon Bedrock
Amazon DynamoDB
AWS Lambda
Amazon Simple Email Service (Amazon SES)
Amazon Simple Notification Service (Amazon SNS)
Amazon WorkMail

Although we illustrate this use case using WorkMail, you can use another email tool that allows integration with serverless functions or webhooks to accomplish similar email automation workflows. Agents for Amazon Bedrock enables you to build and configure autonomous agents in your application. An agent helps your end-users complete actions based on organization data and user input. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. In addition, agents automatically call APIs to take actions and invoke knowledge bases to supplement information for these actions. Developers can save weeks of development effort by integrating agents to accelerate the delivery of generative AI applications. For this use case, we use the Anthropic Claude 3 Sonnet model.
When you create your agent, you enter details to tell the agent what it should do and how it should interact with users. The instructions replace the $instructions$ placeholder in the orchestration prompt template.
The following is an example of instructions we used for our use cases:

“You are a classification and entity recognition agent.

Task 1: Classify the given text into one of the following categories: “Transfer Status”, “Password Reset”, or “Promo Code”. Return only the category without additional text.

Task 2: If the classified category is “Transfer Status”, find the 10-digit entity “money_transfer_id” (example: “MTN1234567”) in the text. Call the “GetTransferStatus” action, passing the money_transfer_id as an argument, to retrieve the transfer status.

Task 3: Write an email reply for the customer based on the received text, the classified category, and the transfer status (if applicable). Include the money_transfer_id in the reply if the category is “Transfer Status”.

Task 4: Use the email signature “Best regards, Intelligent Corp” at the end of the email reply.”

An action group defines actions that the agent can help the user perform. For example, you could define an action group called GetTransferStatus with an OpenAPI schema and Lambda function attached to it. Agents for Amazon Bedrock takes care of constructing the API based on the OpenAPI schema and fulfills actions using the Lambda function to get the status from the DynamoDB money_transfer_status table.
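To make this concrete, here is a minimal sketch of what the GetTransferStatus action group Lambda function could look like; the table name, key attribute, and parameter name are assumptions based on the description above, not the exact code from the sample repository.

import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("money_transfer_status")  # assumed table name from the post

def lambda_handler(event, context):
    # Agents for Amazon Bedrock pass OpenAPI parameters as a list of name/value pairs
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    transfer_id = params.get("money_transfer_id")

    item = table.get_item(Key={"money_transfer_id": transfer_id}).get("Item", {})
    status = item.get("status", "UNKNOWN")

    # Response shape expected by an action group defined with an API schema
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {
                    "body": json.dumps({"money_transfer_id": transfer_id, "status": status})
                }
            },
        },
    }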
The following architecture diagram highlights the end-to-end solution.

The solution workflow includes the following steps:

A customer initiates the process by sending an email to the dedicated customer support email address created within WorkMail.
Upon receiving the email, WorkMail invokes a Lambda function, setting the subsequent workflow in motion.
The Lambda function seamlessly relays the email content to Agents for Amazon Bedrock for further processing.
The agent employs the natural language processing capabilities of Anthropic Claude 3 Sonnet to understand the email’s content classification based on the predefined agent instruction configuration. If relevant entities are detected within the email, such as a money transfer ID, the agent invokes a Lambda function to retrieve the corresponding payment status.
If the email classification doesn’t pertain to a money transfer inquiry, the agent generates an appropriate email response (for example, password reset instructions) and calls a Lambda function to facilitate the response delivery.
For inquiries related to money transfer status, the agent action group Lambda function queries the DynamoDB table to fetch the relevant status information based on the provided transfer ID and relays the response back to the agent.
With the retrieved information, the agent crafts a tailored email response for the customer and invokes a Lambda function to initiate the delivery process.
The Lambda function uses Amazon SES to send the email response, providing the email body, subject, and customer’s email address.
Amazon SES delivers the email message to the customer’s inbox, providing seamless communication.
In scenarios where the agent can’t discern the customer’s intent accurately, it escalates the issue by pushing the message to an SNS topic. This mechanism allows a subscribed ticketing system to receive the notification and create a support ticket for further investigation and resolution.
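The delivery step of the workflow above relies on Amazon SES to send the reply. The following is a minimal sketch of that call, assuming a verified sender identity; the addresses shown are placeholders.

import boto3

ses = boto3.client("ses", region_name="us-east-1")

def send_reply(to_address: str, subject: str, body: str) -> None:
    # Send the agent-generated reply; the sender must be a verified SES identity
    ses.send_email(
        Source="support@example.com",  # assumed verified sender identity
        Destination={"ToAddresses": [to_address]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )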

Prerequisites
Refer to the README.md file in the GitHub repo to make sure you meet the prerequisites to deploy this solution.
Deploy the solution
The solution is comprised of three AWS Cloud Development Kit (AWS CDK) stacks:

WorkmailOrgUserStack – Creates the WorkMail account with domain, user, and inbox access
BedrockAgentCreation – Creates the Amazon Bedrock agent, agent action group, OpenAPI schema, S3 bucket, DynamoDB table, and agent group Lambda function for getting the transfer status from DynamoDB
EmailAutomationWorkflowStack – Creates the classification Lambda function that interacts with the agent and integration Lambda function, which is integrated with WorkMail

To deploy the solution, you also perform some manual configurations using the AWS Management Console.
For full instructions, refer to the README.md file in the GitHub repo.
Test the solution
To test the solution, send an email from your personal email to the support email created as part of the AWS CDK deployment (for this post, we use support@vgs-workmail-org.awsapps.com). We use the following three intents in our sample data for custom classification training:

MONEYTRANSFER – The customer wants to know the status of a money transfer
PASSRESET – The customer has a login, account locked, or password request
PROMOCODE – The customer wants to know about a discount or promo code available for a money transfer

The following screenshots show a sample customer email requesting the status of a money transfer, the email received in a WorkMail inbox, and the response from the agent regarding the customer query.

If the customer email isn’t classified, the content of the email is forwarded to an SNS topic, and whoever is subscribed to the topic receives the email content as a message. The following screenshots show an example customer email and the agent response. We subscribed to this SNS topic with the email address that we passed in the human_workflow_email parameter during the deployment.
Clean up
To avoid incurring ongoing costs, delete the resources you created as part of this solution when you’re done. For instructions, refer to the README.md file.
Conclusion
In this post, you learned how to configure an intelligent email automation solution using Agents for Amazon Bedrock, WorkMail, Lambda, DynamoDB, Amazon SNS, and Amazon SES. This solution can provide the following benefits:

Improved email response time
Improved customer satisfaction
Cost savings regarding time and resources
Ability to focus on key customer issues

You can expand this solution to other areas in your business and to other industries. Also, you can use this solution to build a self-service chatbot by deploying the BedrockAgentCreation stack to answer customer or internal user queries using Agents for Amazon Bedrock.
As next steps, check out Agents for Amazon Bedrock to start using its features. Follow Amazon Bedrock on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Bedrock.

About the Author
Godwin Sahayaraj Vincent is an Enterprise Solutions Architect at AWS who is passionate about Machine Learning and providing guidance to customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play cricket with his friends and tennis with his three kids.
Ramesh Kumar Venkatraman is a Senior Solutions Architect at AWS who is passionate about Generative AI, Containers and Databases. He works with AWS customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play with his two kids and follows cricket.

Getting started with cross-region inference in Amazon Bedrock

With the advent of generative AI solutions, a paradigm shift is underway across industries, driven by organizations embracing foundation models to unlock unprecedented opportunities. Amazon Bedrock has emerged as the preferred choice for numerous customers seeking to innovate and launch generative AI applications, leading to an exponential surge in demand for model inference capabilities. Amazon Bedrock customers aim to scale their worldwide applications to accommodate growth, and require additional burst capacity to handle unexpected surges in traffic. Currently, users might have to engineer their applications to handle traffic spikes by using service quotas from multiple Regions, implementing complex techniques such as client-side load balancing between the AWS Regions where Amazon Bedrock is supported. However, this dynamic nature of demand is difficult to predict, increases operational overhead, introduces potential points of failure, and might hinder businesses from achieving true global resilience and continuous service availability.
Today, we are happy to announce the general availability of cross-region inference, a powerful feature allowing automatic cross-region inference routing for requests coming to Amazon Bedrock. This offers developers using on-demand inference mode a seamless solution for managing optimal availability, performance, and resiliency while handling incoming traffic spikes for applications powered by Amazon Bedrock. By opting in, developers no longer have to spend time and effort predicting demand fluctuations. Instead, cross-region inference dynamically routes traffic across multiple regions, ensuring optimal availability for each request and smoother performance during high-usage periods. Moreover, this capability prioritizes the connected Amazon Bedrock API source/primary region when possible, helping to minimize latency and improve responsiveness. As a result, customers can enhance their applications’ reliability, performance, and efficiency.
Let us dig deeper into this feature where we will cover:

Key features and benefits of cross-region inference
Getting started with cross-region inference
Code samples for defining and leveraging this feature
How to think about migrating to cross-region inference
Key considerations
Best Practices to follow for this feature
Conclusion

Let’s dig in!
Key features and benefits
One of the critical requirements from our customers is the ability to manage bursts and spiky traffic patterns across a variety of generative AI workloads and disparate request shapes. Some of the key features of cross-region inference include:

Capacity from multiple AWS Regions, allowing generative AI workloads to scale with demand
Compatibility with the existing Amazon Bedrock API
No additional routing or data transfer cost; you pay the same price per token as in your source/primary region
Greater resilience to traffic bursts, so users can focus on their core workloads and the application logic powered by Amazon Bedrock
A choice of pre-configured AWS Region sets tailored to your needs

The following image helps explain how this feature works. Amazon Bedrock makes a real-time decision for every request made via cross-region inference. When a request arrives, a capacity check is performed in the region where the request originated. If there is enough capacity, the request is fulfilled there; otherwise, a second check determines a secondary region that has capacity to take the request, the request is re-routed to that region, and the results are returned to the caller. Previously, this ability to perform capacity checks was not available to customers, so they had to manually check every region of choice after receiving an error and then re-route. Furthermore, a typical custom re-routing implementation might be based on a round-robin mechanism with no insight into the available capacity of a region. With this new capability, Amazon Bedrock takes all aspects of traffic and capacity into account in real time and makes the decision on behalf of customers in a fully managed manner, without any extra cost.

A few points to be aware of:

AWS network backbone is used for data transfer between regions instead of internet or VPC peering, resulting in secure and reliable execution.
The feature will try to serve the request from your primary region first. It will route requests to other regions in case of heavy traffic or bottlenecks, and load balance the requests.
You can access a select list of models via cross-region inference, which are essentially region agnostic models made available across the entire region-set. You will be able to use a subset of models available in Amazon Bedrock from anywhere inside the region-set even if the model is not available in your primary region.
You can use this feature in the Amazon Bedrock model invocation APIs (InvokeModel and Converse API).
You can choose whether to use Foundation Models directly via their respective model identifier or use the model via the cross-region inference mechanism. Any inferences performed via this feature will consider on-demand capacity from all of its pre-configured regions to maximize availability.
Additional latency is incurred when re-routing happens; in our testing, it has been a double-digit millisecond addition.
All terms applicable to the use of a particular model, including any end user license agreement, still apply when using cross-region inference.
When using this feature, your throughput can reach up to double the allocated quotas in the region that the inference profile is in. The increase in throughput only applies to invocations performed via inference profiles; the regular quota still applies if you opt for an in-region model invocation request. To see quotas for on-demand throughput, refer to the Runtime quotas section in Quotas for Amazon Bedrock, use the Service Quotas console, or query them programmatically as sketched after this list.
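A hedged sketch of that programmatic quota check, assuming the bedrock service code and an illustrative quota-name filter:

import boto3

quotas = boto3.client("service-quotas", region_name="us-east-1")

# Quota names vary by model and Region, so treat the filter below as illustrative
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="bedrock"):
    for quota in page["Quotas"]:
        if "on-demand" in quota["QuotaName"].lower():
            print(quota["QuotaName"], quota["Value"])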

Definition of a secondary region
Let us dive deep into a few important aspects:

What is a secondary region? As part of this launch, you can select either a US Model or EU Model, each of which will include 2-3 preset regions from these geographical locations.
Which models are included? As part of this launch, we will have Claude 3 family of models (Haiku, Sonnet, Opus) and Claude 3.5 Sonnet made available.
Can we use PrivateLink? Yes, you will be able to leverage your private links and ensure traffic flows via your VPC with this feature.
Can we use Provisioned Throughput with this feature as well? Currently, this feature will not apply to Provisioned Throughput and can be used for on-demand inference only.
When does the workload traffic get re-routed? Cross-region inference will first try to service your request via the primary region (region of the connected Amazon Bedrock endpoint). As the traffic patterns spike up and Amazon Bedrock detects potential delays, the traffic will shift pro-actively to the secondary region and get serviced from those regions.
Where would the logs be for cross-region inference? The logs and invocations will still be in the primary region and account where the request originates from. Amazon Bedrock will output indicators on the logs which will show which region actually serviced the request.
Here is an example of what the traffic patterns can look like (map not to scale).

A customer with a workload in eu-west-1 (Ireland) may choose both eu-west-3 (Paris) and eu-central-1 (Frankfurt) as a pair of secondary regions, or a workload in us-east-1 (Northern Virginia) may choose us-west-2 (Oregon) as a single secondary region, or vice versa. This would keep all inference traffic within the United States of America or European Union.
Security and architecture of cross-region inference
The following diagram shows the high-level architecture for a cross-region inference request:

The operational flow starts with an inference request coming to a primary region for an on-demand baseline model. Capacity evaluations are made on the primary region and the secondary region list, creating a region capacity list in capacity order. The region with the most available capacity, in this case eu-central-1 (Frankfurt), is selected as the next target. The request is re-routed to Frankfurt using the AWS backbone network, ensuring that all traffic remains within the AWS network. The request bypasses the standard API entry point for the Amazon Bedrock service in the secondary region and goes directly to the Runtime inference service, where the response is returned to the primary region over the AWS backbone and then returned to the caller as for a normal inference request. If processing in the chosen region fails for any reason, the next region in the region capacity list with the next-highest available capacity is tried, eu-west-1 (Ireland) in this example, followed by eu-west-3 (Paris), until all configured regions have been attempted. If no region in the secondary region list can handle the inference request, the API will return the standard “throttled” response.
Networking and data logging
The AWS-to-AWS traffic flows, such as Region-to-Region (inclusive of Edge Locations and Direct Connect paths), will always traverse AWS-owned and operated backbone paths. This not only reduces threats, such as common exploits and DDoS attacks, but also ensures that all internal AWS-to-AWS traffic uses only trusted network paths. This is combined with inter-Region and intra-Region path encryption and routing policy enforcement mechanisms, all of which use AWS secure facilities. This combination of enforcement mechanisms helps ensure that AWS-to-AWS traffic will never use non-encrypted or untrusted paths, such as the internet, and hence as a result all cross-region inference requests will remain on the AWS backbone at all times.
Log entries will continue to be made in the original source region for both Amazon CloudWatch and AWS CloudTrail, and there will be no additional logs in the re-routed region. To indicate that re-routing happened, the related entry in AWS CloudTrail will also include the following additional data; it is only added if the request was processed in a re-routed region.

<requestRoutedToRegion>
    us-east-1
</requestRoutedToRegion>

During an inference request, Amazon Bedrock does not log or otherwise store any of a customer’s prompts or model responses. This is still true if cross-region inference re-routes a query from a primary region to a secondary region for processing – that secondary region does not store any data related to the inference request, and no Amazon CloudWatch or AWS CloudTrail logs are stored in that secondary region.
Identity and Access Management
AWS Identity and Access Management (IAM) is key to securely managing your identities and access to AWS services and resources. Before you can use cross-region inference, check that your role has access to the cross-region inference API actions. For more details, see here. An example policy, which allows the caller to use cross-region inference with the InvokeModel* APIs for any model in the us-east-1 and us-west-2 Regions, is as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel*"],
      "Resource": [
        "arn:aws:bedrock:us-east-1:<account_id>:inference-profile/*",
        "arn:aws:bedrock:us-east-1::foundation-model/*",
        "arn:aws:bedrock:us-west-2::foundation-model/*"
      ]
    }
  ]
}

Getting started with Cross-region inference
To get started with cross-region inference, you make use of inference profiles in Amazon Bedrock. An inference profile for a model configures model ARNs from the respective AWS Regions and abstracts them behind a unified model identifier (both ID and ARN). Simply by using this new inference profile identifier with the InvokeModel or Converse API, you can use the cross-region inference feature.
Here are the steps to start using cross-region inference with the help of inference profiles:

List inference profiles – You can list the inference profiles available in your Region either by signing in to the Amazon Bedrock console or by using the API.

Console

From the left-hand pane, select “Cross-region Inference”
You can explore different inference profiles available for your region(s).
Copy the inference profile ID and use it in your application, as described in the section below

API

It is also possible to list the inference profiles available in your Region via the boto3 SDK or AWS CLI.

aws bedrock list-inference-profiles

You can observe how different inference profiles have been configured for various geographic locations comprising multiple AWS Regions. For example, inference profiles with the us. prefix are configured for AWS Regions in the US, whereas those with the eu. prefix are configured for Regions in the European Union (EU).

Modify Your Application

Update your application to use the inference profile ID/ARN from console or from the API response as modelId in your requests via InvokeModel or Converse
This new inference profile will automatically manage inference throttling and re-route your request(s) across multiple AWS Regions (as per configuration) during peak utilization bursts.

Monitor and Adjust

Use Amazon CloudWatch to monitor your inference traffic and latency across regions.
Adjust the use of inference profile vs FMs directly based on your observed traffic patterns and performance requirements.
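For the monitoring step above, the following is a hedged boto3 sketch that pulls hourly invocation counts from CloudWatch; the AWS/Bedrock namespace and Invocations metric are the standard Amazon Bedrock runtime metrics, and whether the ModelId dimension carries the inference profile ID is an assumption worth verifying in your account.

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "us.anthropic.claude-3-5-sonnet-20240620-v1:0"}],
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=3600,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])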

Code example to leverage Inference Profiles
Using an inference profile is similar to using a foundation model in Amazon Bedrock with the InvokeModel or Converse API; the only difference in the modelId is the addition of a prefix such as us. or eu.
Foundation Model

import boto3

# Create the runtime client in your primary Region (us-east-1 shown as an example)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Call the foundation model directly via its model ID
modelId = "anthropic.claude-3-5-sonnet-20240620-v1:0"
bedrock_runtime.converse(
    modelId=modelId,
    system=[{"text": "You are an AI assistant."}],
    messages=[{
        "role": "user",
        "content": [{"text": "Tell me about Amazon Bedrock."}]
    }]
)

Inference Profile

# Call the same model via the cross-region inference profile (eu. prefix shown),
# reusing the bedrock_runtime client created above
modelId = "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"
bedrock_runtime.converse(
    modelId=modelId,
    system=[{"text": "You are an AI assistant."}],
    messages=[{
        "role": "user",
        "content": [{"text": "Tell me about Amazon Bedrock."}]
    }]
)

Deep Dive
While it is straightforward to start using inference profiles, you first need to know which inference profiles are available in your Region. Start with the list of inference profiles and observe the models available for this feature. This is done through the AWS CLI or SDK.

import boto3

bedrock_client = boto3.client("bedrock", region_name="us-east-1")
bedrock_client.list_inference_profiles()

You can expect an output similar to the one below:

{
    "inferenceProfileSummaries": [
        {
            "inferenceProfileName": "US Anthropic Claude 3.5 Sonnet",
            "models": [
                {
                    "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
                },
                {
                    "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
                }
            ],
            "description": "Routes requests to Anthropic Claude 3.5 Sonnet in us-east-1 and us-west-2",
            "createdAt": "2024-XX-XXT00:00:00Z",
            "updatedAt": "2024-XX-XXT00:00:00Z",
            "inferenceProfileArn": "arn:aws:bedrock:us-east-1:<account_id>:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0",
            "inferenceProfileId": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
            "status": "ACTIVE",
            "type": "SYSTEM_DEFINED"
        },
        ...
    ]
}

The difference between ARN for a foundation model available via Amazon Bedrock and the inference profile can be observed as:
Foundation Model: arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0
Inference Profile: arn:aws:bedrock:us-east-1:<account_id>:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0
Choose the configured inference profile and start sending inference requests to your model’s endpoint as usual. Amazon Bedrock will automatically route and scale the requests across the configured Regions as needed. You can use either the ARN or the ID of an inference profile with the Converse API, but only the inference profile ID with the InvokeModel API (an InvokeModel sketch follows the Converse example below). It is also important to note which models are supported by the Converse API.

import boto3

primary_region = "<primary-region-name>"  # for example us-east-1 or eu-central-1
bedrock_runtime = boto3.client("bedrock-runtime", region_name=primary_region)
inferenceProfileId = "<regional-prefix>.anthropic.claude-3-5-sonnet-20240620-v1:0"

# Example with Converse API
system_prompt = "You are an expert on AWS AI services."
input_message = "Tell me about AI service for Foundation Models"
response = bedrock_runtime.converse(
    modelId=inferenceProfileId,
    system=[{"text": system_prompt}],
    messages=[{
        "role": "user",
        "content": [{"text": input_message}]
    }]
)

print(response["output"]["message"]["content"])
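And a hedged sketch of the same inference profile used with the InvokeModel API, using the Anthropic Messages request format supported on Amazon Bedrock; the Region and profile ID are illustrative.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# The inference profile ID is passed as the modelId
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Tell me about Amazon Bedrock."}]}
    ],
}

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])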

In the code sample above, you must specify <primary-region-name>, such as a US Region (us-east-1, us-west-2) or an EU Region (eu-central-1, eu-west-1, eu-west-3). The <regional-prefix> will then be us or eu accordingly.
Adapting your applications to use inference profiles for your Amazon Bedrock FMs is quick and easy with the steps above. No significant code changes are required on the client side. Amazon Bedrock handles the cross-region inference transparently. Monitor CloudTrail logs to check whether your request was automatically re-routed to another region, as described in the section above.
How to think about adopting the new cross-region inference feature?
When considering the adoption of this new capability, it’s essential to carefully evaluate your application requirements, traffic patterns, and existing infrastructure. Here’s a step-by-step approach to help you plan and adopt cross-region inference:

Assess your current workload and traffic patterns. Analyze your existing generative AI workloads and identify those that experience significant traffic bursts or have high availability requirements. Review current traffic patterns, including peak loads, geographical distribution, and any seasonal or cyclical variations.
Evaluate the potential benefits of cross-region inference. Consider the potential advantages of leveraging cross-region inference, such as increased burst capacity, improved availability, and better performance for global users. Estimate the potential cost savings by not having to implement a custom logic of your own and pay for data transfer (as well as different token pricing for models) or efficiency gains by off-loading multiple regional deployments into a single, fully-managed distributed solution.
Plan and execute the migration. Update your application code to use the inference profile ID/ARN instead of individual foundation model IDs, following the provided code sample above. Test your application thoroughly in a non-production environment, simulating various traffic patterns and failure scenarios. Monitor your application’s performance, latency, and cost during the migration process, and make adjustments as needed.
Develop new applications with cross-region inference in mind. For new application development, consider designing with cross-region inference as the foundation, leveraging inference profiles from the start. Incorporate best practices for high availability, resilience, and global performance into your application architecture.

Key Considerations
Impact on Current Generative AI Workloads
Inference profiles are designed to be compatible with existing Amazon Bedrock APIs, such as InvokeModel and Converse. Also, any third-party or open source tool that uses these APIs, such as LangChain, can be used with inference profiles. This means that you can seamlessly integrate inference profiles into your existing workloads without the need for significant code changes. Simply update your application to use the inference profile ARN instead of individual model IDs, and Amazon Bedrock will handle the cross-region routing transparently.
Impact on Pricing
The feature comes with no additional cost to you. You pay the same price per token as for individual models in your primary/source region. There is no additional cost associated with cross-region inference, including the failover capabilities provided by this feature. This covers management, data transfer, encryption, network usage, and potential differences in price per million tokens per model.
Regulations, Compliance, and Data Residency
Although none of the customer data is stored in either the primary or secondary region(s) when using cross-region inference, it’s important to consider that your inference data will be processed and transmitted beyond your primary region. If you have strict data residency or compliance requirements, you should carefully evaluate whether cross-region inference aligns with your policies and regulations.
Conclusion
In this blog, we introduced the latest feature from Amazon Bedrock, cross-region inference via inference profiles, offered a peek into how it operates, and dived into some of the how-tos and points for consideration. This feature empowers developers to enhance the reliability, performance, and efficiency of their applications, without the need to spend time and effort building complex resiliency structures. It is now generally available in US and EU for supported models.

About the authors
Talha Chattha is a Generative AI Specialist Solutions Architect at Amazon Web Services, based in Stockholm. Talha helps establish practices to ease the path to production for Gen AI workloads. Talha is an expert in Amazon Bedrock and supports customers across entire EMEA. He holds passion about meta-agents, scalable on-demand inference, advanced RAG solutions and cost optimized prompt engineering with LLMs. When not shaping the future of AI, he explores the scenic European landscapes and delicious cuisines. Connect with Talha at LinkedIn using /in/talha-chattha/.
Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on the serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.
Sumit Kumar is a Principal Product Manager, Technical at AWS Bedrock team, based in Seattle. He has 12+ years of product management experience across a variety of domains and is passionate about AI/ML. Outside of work, Sumit loves to travel and enjoys playing cricket and Lawn-Tennis.
Dr. Andrew Kane is an AWS Principal WW Tech Lead (AI Language Services) based out of London. He focuses on the AWS Language and Vision AI services, helping our customers architect multiple AI services into a single use-case driven solution. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors.

Building automations to accelerate remediation of AWS Security Hub control findings using Amazon Bedrock

Several factors can make remediating security findings challenging. First, the sheer volume and complexity of findings can overwhelm security teams, leading to delays in addressing critical issues. Findings often require a deep understanding of AWS services and configurations and require many cycles for validation, making it more difficult for less experienced teams to remediate issues effectively. Some findings might require coordination across multiple teams or departments, leading to communication challenges and delays in implementing fixes. Finally, the dynamic nature of cloud environments means that new security findings can appear rapidly and constantly, requiring a more effective and scalable solution to remediate findings.
In this post, we will harness the power of generative artificial intelligence (AI) and Amazon Bedrock to help organizations simplify and effectively manage remediations of AWS Security Hub control findings. By using Agents for Amazon Bedrock with action groups and Knowledge Bases for Amazon Bedrock, you can now create automations with AWS Systems Manager Automation (for services that support automations with AWS Systems Manager) and deploy them into AWS accounts. Thus, by following a programmatic continuous integration and continuous delivery (CI/CD) approach, you can scale better and remediate security findings promptly.
Solution overview
This solution follows prescriptive guidance for automating remediation for AWS Security Hub standard findings. Before delving into the deployment, let’s review the key steps of the solution architecture, as shown in the following figure.

Figure 1: AWS Security Hub control remediation using Amazon Bedrock and AWS Systems Manager

A SecOps user uses the Agents for Amazon Bedrock chat console to enter their requests; you can also invoke the agent programmatically, as sketched after these steps. For instance, they might specify “Generate an automation for remediating the finding, database migration service replication instances should not be public.” Optionally, if you’re already aggregating findings in Security Hub, you can export them to an Amazon Simple Storage Service (Amazon S3) bucket and still use our solution for remediation.
On receiving the request, the agent invokes the large language model (LLM) with the provided context from a knowledge base. The knowledge base contains an Amazon S3 data source with AWS documentation. The data is converted into embeddings using the Amazon Titan Embeddings G1 model and stored in an Amazon OpenSearch vector database.
Next, the agent passes the information to an action group that invokes an AWS Lambda function. The Lambda function is used to generate the Systems Manager automation document.
The output from the Lambda function is published to an AWS CodeCommit repository.
Next, the user validates the template file that is generated as an automation for a particular service. In this case, the user will navigate to the AWS Database Migration Service (DMS) folder and validate the template file. Once the file has been validated, the user places the template file into a new deploy folder in the repo.
This launches AWS CodePipeline to invoke a build job using AWS CodeBuild. Validation actions are run on the template.
An Amazon Simple Notification Service (Amazon SNS) notification is sent to the SecOps user to approve changes for deployment.
Once changes are approved, a CloudFormation template that creates an SSM automation document is generated.

If an execution role is provided, the SSM automation document is deployed across the specified workload accounts via an AWS CloudFormation stack set.
If an execution role is not provided, the SSM automation document is deployed only to the current account.

The SSM automation document is executed to remediate the finding.
The user navigates to the AWS Security Hub service via the AWS Management Console and validates the compliance status of the control (for example, DMS.1).
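If you want to drive the same interaction from code instead of the chat console, the following is a minimal boto3 sketch; the agent ID, alias ID, and prompt are placeholders.

import uuid
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="<agent-id>",
    agentAliasId="<agent-alias-id>",
    sessionId=str(uuid.uuid4()),
    inputText=(
        "Generate an automation for remediating the finding: database migration "
        "service replication instances should not be public."
    ),
)

# The completion is returned as an event stream of chunks
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)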

In this post, we focus on remediation of two example security findings:

Amazon S3 general purpose buckets should require requests to use SSL
AWS Database Migration Service replication instances should not be public

The example findings demonstrate the two potential paths the actions group can take for remediation. It also showcases the capabilities of action groups with Retrieval Augmented Generation (RAG) and how you can use Knowledge Bases for Amazon Bedrock to automate security remediation.
For the first finding, AWS has an existing Systems Manager runbook to remediate the S3.5 finding. The solution uses the existing runbook (through a knowledge base) and renders an AWS CloudFormation template as automation.
The second finding has no AWS provided runbook or playbook. The solution will generate a CloudFormation template that creates an AWS Systems Manager document to remediate the finding.
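For illustration, a generated document like this could also be run directly through the Systems Manager Automation API; the document name and role ARN below are placeholders, and in the described pipeline this execution is driven by CloudFormation and stack sets rather than a direct API call.

import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Start the generated automation and check its status
execution = ssm.start_automation_execution(
    DocumentName="<generated-remediation-document>",
    Parameters={"AutomationAssumeRole": ["arn:aws:iam::<account-id>:role/<execution-role>"]},
)

status = ssm.get_automation_execution(
    AutomationExecutionId=execution["AutomationExecutionId"]
)["AutomationExecution"]["AutomationExecutionStatus"]
print(status)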
Prerequisites
The following prerequisites are needed before you can deploy the solution.

An AWS account with the necessary permissions to access and configure the required services in a specific AWS Region (AWS Security Hub, Amazon S3, AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, AWS Systems Manager, AWS Lambda, Amazon OpenSearch service).
Access to Anthropic Claude 3 Sonnet LLM model granted in the AWS account.
AWS Config is enabled in the account. Ensure that the configuration recorder is configured to record all resources in your AWS account.
Security Hub is enabled in the account. Integrate other AWS security services, such as AWS Config to aggregate their findings in Security Hub.
Understanding of general key terms:

Amazon Bedrock agent
Prompt
Knowledge base

Deployment steps
There are five main steps to deploy the solution.
Step 1: Configure a knowledge base
Configuring a knowledge base enables your Amazon Bedrock agents to access a repository of information for AWS account provisioning. Follow these steps to set up your knowledge base.
Prepare the data sources:

Create an S3 bucket that will store the knowledge base data sources, such as KnowledgeBaseDataSource-<AccountId>.
Define the data source. For this solution, we’re using three AWS documentation guides in PDF that cover all AWS provided automations through runbooks or playbooks. Upload the files from the data-source folder in the Git repository to the S3 bucket created in the previous step.

Create the knowledge base:

Access the Amazon Bedrock console. Sign in and go directly to the Knowledge Base section.
Name your knowledge base. Choose a clear and descriptive name that reflects the purpose of your knowledge base, such as AWSAutomationRunbooksPlaybooks.
Select an AWS Identity and Access Management (IAM) role. Assign a preconfigured IAM role with the necessary permissions. It’s typically best to let Amazon Bedrock create this role for you to ensure it has the correct permissions.
Choose the default embeddings model. The Amazon Titan Embeddings G1 is a text model that is preconfigured and ready to use, simplifying the process.
Choose the Quick create a new vector store. Allow Amazon Bedrock to create and manage the vector store for you in OpenSearch Service.
Review and finalize. Double-check all entered information for accuracy. Pay special attention to the S3 bucket URI and IAM role details.

Note: After successful creation, copy the knowledge base ID because you will need to reference it in the next step.
Sync the data source:

Select the newly created knowledge base.
In the Data source section, choose Sync to begin data ingestion.
When data ingestion completes, a green success banner appears if it is successful.

Step 2: Configure the Amazon Bedrock agent

Open the Amazon Bedrock console, select Agents in the left navigation panel, then choose Create Agent.
Enter agent details including an agent name and description (optional).
Under Agent resource role section, select Create and use a new service role. This IAM service role gives your agent access to required services, such as Lambda.
In the Select model section, choose Anthropic and Claude 3 Sonnet.
To automate remediation of Security Hub findings using Amazon Bedrock agents, attach the following instruction to the agent: “You are an AWS security expert, tasked to help customer remediate security related findings. Inform the customer what your objective is. Gather relevant information such as finding ID or finding title so that you can perform your task. With the information given, you will attempt to find an automated remediation of the finding and provide it to the customer as IaC.”
Select the newly created agent and note the Agent ARN in the Agent Overview section. You will need to provide it as a parameter in the next step. A programmatic sketch of creating an equivalent agent follows this list.

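For teams that prefer to script this step, the following is a minimal sketch (not the post’s exact configuration) that creates an equivalent agent with the boto3 bedrock-agent client; the agent name and service role ARN are placeholder assumptions.

import boto3

# Minimal sketch: create the Amazon Bedrock agent programmatically.
# The agent name and role ARN below are placeholders, not values from this post.
bedrock_agent = boto3.client("bedrock-agent")

instruction = (
    "You are an AWS security expert, tasked to help customer remediate security related findings. "
    "Inform the customer what your objective is. Gather relevant information such as finding ID or "
    "finding title so that you can perform your task. With the information given, you will attempt to "
    "find an automated remediation of the finding and provide it to the customer as IaC."
)

response = bedrock_agent.create_agent(
    agentName="securityhub-remediation-agent",  # placeholder name
    agentResourceRoleArn="arn:aws:iam::<account-id>:role/<agent-service-role>",  # placeholder role
    foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",
    instruction=instruction,
)
print(response["agent"]["agentArn"])  # note the ARN for the CDK parameters in Step 3
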
Step 3: Deploy the CDK project

Download the CDK project repository containing the solution’s infrastructure code. You can find the code in the GitHub repository.
To work with a new project, create and activate a virtual environment. This allows the project’s dependencies to be installed locally in the project folder, instead of globally. Create a new virtual environment: python -m venv .venv. Activate the environment: source .venv/bin/activate
Install dependencies from requirements.txt: pip install -r requirements.txt
Before deploying the solution, you need to bootstrap your AWS environment for CDK. Run the following command to bootstrap your environment: cdk bootstrap aws://<your-aws-account-id>/<your-aws-region>
Navigate to the downloaded CDK project directory and open the cdk.json file. Update the following parameters in the file:

KB_ID: Provide the ID of the Amazon Bedrock knowledge base you set up manually in the prerequisites.
BEDROCK_AGENT_ARN: The Amazon Resource Name (ARN) of the Amazon Bedrock agent created in Step 2. See the sketch after this list for one way to look up these values programmatically.
NOTIFICATION_EMAILS: Enter an email address for pipeline approval notifications.
CFN_EXEC_ROLE_NAME: (Optional) IAM role that will be used by CloudFormation to deploy templates into the workload accounts.
WORKLOAD_ACCOUNTS: (Optional) Specify a space-separated list of AWS account IDs where the CloudFormation templates will be deployed, for example “<account-id-1> <account-id-2>”.

Run the following command to synthesize the CDK app and generate the CloudFormation template: cdk synth
Finally, deploy the solution to your AWS environment using the following command: cdk deploy --all. This command deploys all the necessary resources, including the Lambda function, the CodeCommit repository, the CodePipeline pipeline, and the Amazon SNS topic.
After the deployment is complete, verify that all the resources were created successfully. You can check the outputs of the CDK deployment to find the necessary information, such as the CodeCommit repository URL, Lambda function name, and the Amazon SNS topic ARN.
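
If you don’t have the knowledge base ID or agent ARN handy, the following sketch shows one way to look them up with the boto3 bedrock-agent client before editing cdk.json; the name filters are assumptions based on the example names used earlier in this post.

import boto3

# Sketch: look up the knowledge base ID and agent ARN needed for cdk.json.
# The names being matched are the example names used earlier in this post.
bedrock_agent = boto3.client("bedrock-agent")

kb_id = next(
    kb["knowledgeBaseId"]
    for kb in bedrock_agent.list_knowledge_bases()["knowledgeBaseSummaries"]
    if kb["name"] == "AWSAutomationRunbooksPlaybooks"
)

agent_id = next(
    agent["agentId"]
    for agent in bedrock_agent.list_agents()["agentSummaries"]
    if agent["agentName"] == "securityhub-remediation-agent"  # placeholder name from the earlier sketch
)
agent_arn = bedrock_agent.get_agent(agentId=agent_id)["agent"]["agentArn"]

print("KB_ID:", kb_id)
print("BEDROCK_AGENT_ARN:", agent_arn)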

Step 4: Configure the agent action groups
Create an action group linked to the Lambda function that was created in the CDK app. This action group is launched by the agent after the user inputs the Security Hub finding ID or finding title, and it outputs a CloudFormation template to the CodeCommit repository.
Step 5: Add the action groups to the agent

Enter securityhubremediation as the Action group name and Security Hub Remediations as the Description.
Under Action group type, select Define with API schemas.
For Action group invocation, choose Select an existing Lambda function.
From the dropdown, select the Lambda function that was created in Step 3.
In Action group schema, choose Select an existing API schema. Provide the Amazon S3 URI of the schema with the API description, structure, and parameters for the action group. The APIs manage the logic for receiving user inputs and launching the Lambda function that generates the remediation template. For more information, see Action group OpenAPI schemas.

Note: For this solution, openapischema.json is provided in the Git repository. Upload the JSON file to the S3 bucket created in Step 1 and reference its S3 URI when selecting the API schema in this step. A programmatic sketch of the action group creation follows.
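
For reference, here is a minimal sketch of the same action group created through the boto3 bedrock-agent client; the agent ID, Lambda ARN, and bucket name are placeholders you replace with your own values.

import boto3

# Sketch: create the action group via the API instead of the console.
# Agent ID, Lambda ARN, and bucket name are placeholders.
bedrock_agent = boto3.client("bedrock-agent")

response = bedrock_agent.create_agent_action_group(
    agentId="<your-agent-id>",
    agentVersion="DRAFT",  # action groups are added to the agent's working draft
    actionGroupName="securityhubremediation",
    description="Security Hub Remediations",
    actionGroupExecutor={"lambda": "arn:aws:lambda:<region>:<account-id>:function:<cdk-created-function>"},
    apiSchema={"s3": {"s3BucketName": "<bucket-from-step-1>", "s3ObjectKey": "openapischema.json"}},
)
print(response["agentActionGroup"]["actionGroupId"])
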
Testing
To validate the solution, follow these steps:
Step 1: Sign in to the AWS Security Hub console.

Select a Security Hub Finding.
To test the solution, look for a finding that has a status of FAILED.
Copy the finding title, for example “Database Migration Service replication instance should not be public”. This is shown in Figure 2.

Figure 2: AWS Security Hub finding title

Step 2: Sign in to the Amazon Bedrock console.

Select the agent.

As you begin to interact with the agent, it will ask you for a Security Hub finding title to remediate.
Enter a Security Hub finding title. For example, “Database migration service replication instances should not be public”.

Review the resulting CloudFormation template published to the CodeCommit repository provisioned as part of the deployment.

If a finding already has an AWS remediation runbook available, the agent will output its details. That is, it will not create a new runbook. When automation through a Systems Manager runbook isn’t possible, the agent will output a message similar to “Unable to automate remediation for this finding.” An example Bedrock Agent interaction is shown in Figure 3.

Figure 3: An example Bedrock agent interaction

Step 3: For the new runbooks, validate the template file and parameters

Check if the template requires any parameters to be passed.
If required, create a new parameter file with the following naming convention:

<Bedrock_Generated_Template_Name>-params.json
For example: DatabaseMigrationServicereplicationinstanceshouldnotbepublic-params.json

Step 4: Stage files for deployment

Create a new folder named deploy in the CodeCommit repository.
Create a new folder path deploy/parameters/ in the CodeCommit repository.
Upload the YAML template file to the newly created deploy folder.
Upload the params JSON file to deploy/parameters.
The structure of the deploy folder should be as follows:

deploy
├── <Bedrock_Generated_Template_Name>.yaml
└── parameters
    └── <Bedrock_Generated_Template_Name>-params.json

Note: Bedrock_Generated_Template_Name refers to the name of the YAML file output by Amazon Bedrock. Committing the files invokes the pipeline; a sketch of committing them programmatically follows Figure 4. An example Bedrock generated YAML file is shown in Figure 4.

Figure 4: An example Bedrock generated YAML file
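
The following is a minimal sketch of staging the generated template and its parameter file in the CodeCommit repository with boto3; the repository name, branch, and file names are placeholders.

import boto3

# Sketch: commit the generated template and its parameter file to the deploy folder.
# Repository, branch, and file names are placeholders.
codecommit = boto3.client("codecommit")
repo, branch = "<cdk-created-repository>", "main"

def put_file(local_path, repo_path):
    # The parent commit ID is required once the branch has at least one commit.
    parent_id = codecommit.get_branch(repositoryName=repo, branchName=branch)["branch"]["commitId"]
    with open(local_path, "rb") as f:
        codecommit.put_file(
            repositoryName=repo,
            branchName=branch,
            filePath=repo_path,
            fileContent=f.read(),
            parentCommitId=parent_id,
            commitMessage=f"Stage {repo_path} for deployment",
        )

put_file("<Bedrock_Generated_Template_Name>.yaml", "deploy/<Bedrock_Generated_Template_Name>.yaml")
put_file("<Bedrock_Generated_Template_Name>-params.json",
         "deploy/parameters/<Bedrock_Generated_Template_Name>-params.json")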

Step 5: Approve the pipeline

An email will be sent through Amazon SNS during the manual approval stage. Approve the pipeline to continue the build.
Systems Manager automation will be built using CloudFormation in the workload account.

Step 6: Validate compliance status

Sign in to the Security Hub console and validate the compliance status of the finding ID or title.
Verify that the compliance status has been updated to reflect the successful remediation of the security issue. This is shown in Figure 5.

Figure 5: Validation of successful remediation of an AWS Security Hub control finding

Cleanup
To avoid unnecessary charges, delete the resources created during testing. To delete the resources, perform the following steps:

Delete the knowledge base

Open the Amazon Bedrock console.
From the left navigation pane, choose Knowledge base.
To delete a source, either choose the radio button next to the source and select Delete or choose the Name of the source and then select Delete in the top right corner of the details page.
Review the warnings for deleting a knowledge base. If you accept these conditions, enter “delete” in the input box and choose Delete to confirm.
Empty and delete the S3 bucket data source for the knowledge base.

Delete the agent

In the Amazon Bedrock console, choose Agents from the navigation pane.
Select the radio button next to the agent to delete.
A modal window will pop up warning you about the consequences of deletion. Enter delete in the input box and choose Delete to confirm.
A blue banner will inform you that the agent is being deleted. When deletion is complete, a green success banner will appear.

Delete all the other resources

Use cdk destroy --all to delete the app and all stacks associated with it.

Conclusion
The integration of generative AI for remediating security findings is an effective approach, allowing SecOps teams to scale better and remediate findings in a timely manner. Using the generative AI capabilities of Amazon Bedrock alongside AWS services such as AWS Security Hub and automation, a capability of AWS Systems Manager, allows organizations to quickly remediate security findings by building automations that align with best practices while minimizing development effort. This approach not only streamlines security operations but also embeds a CI/CD approach for remediating security findings.
The solution in this post equips you with a practical pattern of AWS Security Hub and AWS Systems Manager integrated with Amazon Bedrock, deployment code, and instructions to help remediate security findings efficiently and securely according to AWS best practices.
Ready to start remediating security findings with generative AI in Amazon Bedrock? Begin by exploring the Amazon Bedrock User Guide to understand how you can use Amazon Bedrock in your organization. For further assistance and expertise, consider using AWS Professional Services to help you accelerate remediating AWS Security Hub findings and maximize the benefits of Amazon Bedrock.

About the Authors
Shiva Vaidyanathan is a Principal Cloud Architect at AWS. He provides technical guidance for customers, ensuring their success on AWS. His primary areas of expertise include migrations, security, and generative AI, and he works toward making AWS cloud adoption simpler for everyone. Prior to joining AWS, he worked on several NSF-funded research initiatives on performing secure computing in public cloud infrastructures. He holds an MS in Computer Science from Rutgers University and an MS in Electrical Engineering from New York University.
Huzaifa Zainuddin is a Senior Cloud Infrastructure Architect at AWS, specializing in designing, deploying, and scaling cloud solutions for a diverse range of clients. With a deep expertise in cloud infrastructure and a passion for leveraging the latest AWS technologies, he is eager to help customers embrace generative AI by building innovative automations that drive operational efficiency. Outside of work, Huzaifa enjoys traveling, cycling, and exploring the evolving landscape of AI.

Get the most from Amazon Titan Text Premier

Amazon Titan Text Premier, the latest addition to the Amazon Titan family of large language models (LLMs), is now generally available in Amazon Bedrock. Amazon Titan Text Premier is an advanced, high performance, and cost-effective LLM engineered to deliver superior performance for enterprise-grade text generation applications, including optimized performance for Retrieval Augmented Generation (RAG) and agents. The model is built from the ground up following safe, secure, and trustworthy responsible AI practices and excels in delivering exceptional generative artificial intelligence (AI) text capabilities at scale.
Exclusive to Amazon Bedrock, Amazon Titan Text Premier supports a wide range of text-related tasks, including summarization, text generation, classification, question-answering, and information extraction. This new model offers optimized performance for key features such as RAG on Knowledge Bases for Amazon Bedrock and function calling on Agents for Amazon Bedrock. Such integrations enable advanced applications like building interactive AI assistants that use your APIs and interact with your documents.
Why choose Amazon Titan Text Premier?
As of today, the Amazon Titan family of models for text generation allows for context windows from 4K to 32K and a rich set of capabilities around free text and code generation, API orchestration, RAG, and Agent based applications. An overview of these Amazon Titan models is shown in the following table.

Model | Availability | Context window | Languages | Functionality | Customized fine-tuning
Amazon Titan Text Lite | GA | 4K | English | Code, rich text | Yes
Amazon Titan Text Express | GA (English) | 8K | Multilingual (100+ languages) | Code, rich text, API orchestration | Yes
Amazon Titan Text Premier | GA | 32K | English | Enterprise text generation applications, RAG, agents | Yes (preview)

Amazon Titan Text Premier is an LLM designed for enterprise-grade applications. It is optimized for performance and cost-effectiveness, with a maximum context length of 32,000 tokens. Amazon Titan Text Premier enables the development of custom agents for tasks such as text summarization, generation, classification, question-answering, and information extraction. It also supports the creation of interactive AI assistants that can integrate with existing APIs and data sources. As of today, Amazon Titan Text Premier is also customizable with your own datasets for even better performance with your specific use cases. In our own internal tests, fine-tuned Amazon Titan Text Premier models on various tasks related to instruction tuning and domain adaptation yielded superior results compared to the Amazon Titan Text Premier model baseline, as well as other fine-tuned models. To try out model customization for Amazon Titan Text Premier, contact your AWS account team. By using the capabilities of Amazon Titan Text Premier, organizations can streamline workflows and enhance their operations and customer experiences through advanced language AI.
As highlighted in the AWS News Blog launch post, Amazon Titan Text Premier has demonstrated strong performance on a range of public benchmarks that assess critical enterprise-relevant capabilities. Notably, Amazon Titan Text Premier achieved a score of 92.6% on the HellaSwag benchmark, which evaluates common-sense reasoning, outperforming competitor models. Additionally, Amazon Titan Text Premier showed strong results on reading comprehension (89.7% on RACE-H) and multi-step reasoning (77.9 F1 score on the DROP benchmark). Across diverse tasks like instruction following, questions spanning 57 subjects, and BIG-Bench Hard, Amazon Titan Text Premier has consistently delivered comparable performance to other providers, highlighting its broad intelligence and utility for enterprise applications. However, we encourage our customers to benchmark the model’s performance on their own specific datasets and use cases because actual performance may vary. Conducting thorough testing and evaluation is crucial to ensure the model meets the unique requirements of each enterprise.
How do you get started with Amazon Titan Text Premier?
Amazon Titan Text Premier is generally available in Amazon Bedrock in the US East (N. Virginia) AWS Region.
To enable access to Amazon Titan Text Premier, on the Amazon Bedrock console, choose Model access on the bottom left pane. On the Model access overview page, choose Manage model access in the upper right corner and enable access to Amazon Titan Text Premier.
With Amazon Titan Text Premier available through the Amazon Bedrock serverless experience, you can easily access the model using a single API and without managing any infrastructure. You can use the model either through the Amazon Bedrock REST API or the AWS SDK using the InvokeModel API or Converse API. In the code example below, we define a simple function call_titan, which uses the boto3 bedrock-runtime client to invoke the Amazon Titan Text Premier model.

import logging
import json
import boto3
from botocore.exceptions import ClientError

# Configure logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def call_titan(prompt,
               model_id='amazon.titan-text-premier-v1:0',
               max_token_count=1024,
               temperature=0.7,
               top_p=0.9):
    """
    Generate text using the Amazon Titan Text Premier model on demand.

    Args:
        prompt (str): The prompt to be used.
        model_id (str): The model ID to use. We are using 'amazon.titan-text-premier-v1:0' for this example.
        max_token_count (int): Maximum number of tokens to generate. Default is 1024.
        temperature (float): Randomness parameter. Default is 0.7.
        top_p (float): Nucleus sampling probability threshold. Default is 0.9.

    Returns:
        response (dict): The response from the model, or None if a client error occurred.
    """
    logger.info("Generating text with Amazon Titan Text Premier model %s", model_id)
    try:
        # Initialize the Bedrock runtime client
        bedrock = boto3.client(service_name='bedrock-runtime')
        accept = "application/json"
        content_type = "application/json"

        # Prepare the request body
        request_body = {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_token_count,
                "stopSequences": [],
                "temperature": temperature,
                "topP": top_p
            }
        }
        body = json.dumps(request_body)

        # Invoke the model
        response = bedrock.invoke_model(
            body=body, modelId=model_id, accept=accept, contentType=content_type
        )
        response_body = json.loads(response.get("body").read())
        return response_body
    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        return None


# Example usage
# response = call_titan("Your prompt goes here")

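As noted above, you can also use the Converse API. The following is a minimal sketch of an equivalent call through the bedrock-runtime converse operation; the prompt and inference parameters are illustrative only.

import boto3

# Minimal sketch: call Amazon Titan Text Premier through the Converse API.
# The prompt and inference parameters below are illustrative only.
bedrock_runtime = boto3.client("bedrock-runtime")

response = bedrock_runtime.converse(
    modelId="amazon.titan-text-premier-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize the benefits of RAG in two sentences."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.7, "topP": 0.9},
)
print(response["output"]["message"]["content"][0]["text"])
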
With a maximum context length of 32K tokens, Amazon Titan Text Premier has been specifically optimized for enterprise use cases, such as building RAG and agent-based applications with Knowledge Bases for Amazon Bedrock and Agents for Amazon Bedrock. The training data includes examples for tasks like summarization, Q&A, and conversational chat, and the model has been optimized for integration with Knowledge Bases for Amazon Bedrock and Agents for Amazon Bedrock. This optimization includes training the model to handle the nuances of these features, such as their specific prompt formats.
Sample RAG and agent based application using Knowledge Bases for Amazon Bedrock and Agents for Amazon Bedrock
Amazon Titan Text Premier offers high-quality RAG through integration with Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for RAG. You can now choose Amazon Titan Text Premier with Knowledge Bases for Amazon Bedrock to implement question-answering and summarization tasks over your company’s proprietary data.
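
As a brief illustration of this integration, the following sketch calls the RetrieveAndGenerate API with Amazon Titan Text Premier as the generation model; the knowledge base ID is a placeholder, and the Region in the model ARN is assumed to be us-east-1.

import boto3

# Sketch: RAG over a knowledge base with Amazon Titan Text Premier as the generator.
# The knowledge base ID is a placeholder for your own knowledge base.
client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "What is codistillation?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<your-knowledge-base-id>",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-premier-v1:0",
        },
    },
)
print(response["output"]["text"])
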
Evaluating high-quality RAG system on research papers with Amazon Titan Text Premier using Knowledge Bases for Amazon Bedrock
To demonstrate how Amazon Titan Text Premier can be used in RAG based applications, we ingested recent research papers (which are linked in the resources section) and articles related to LLMs to construct a knowledge base using Amazon Bedrock and Amazon OpenSearch Serverless. Learn more about how you can do this on your own here. This collection (see the references section for the full list) of papers and articles covers a wide range of topics, including benchmarking tools, distributed training techniques, surveys of LLMs, prompt engineering methods, scaling laws, quantization approaches, security considerations, self-improvement algorithms, and efficient training procedures. As LLMs continue to progress rapidly and find widespread use, it is crucial to have a comprehensive and up-to-date knowledge repository that can facilitate informed decision-making, foster innovation, and enable responsible development of these powerful AI systems. By grounding the answers from a RAG model on this Amazon Bedrock knowledge base, we can ensure that the responses are backed by authoritative and relevant research, enhancing their accuracy, trustworthiness, and potential for real-world impact.
The following video showcases the capabilities of Knowledge Bases for Amazon Bedrock when used with Amazon Titan Text Premier, which was constructed using the research papers and articles we discussed earlier. When models available on Amazon Bedrock, such as Amazon Titan Text Premier, are asked about research on avocados or more relevant research about AI training methods, they can confidently answer without using any sources. In this particular example, the answers may even be wrong. The video shows how Knowledge Bases for Amazon Bedrock and Amazon Titan Text Premier can be used to ground answers based on recent research. With this setup, when asked, “What does recent research have to say about the health benefits of eating avocados?” the system correctly acknowledges that it does not have access to information related to this query within its knowledge base, which focuses on LLMs and related areas. However, when prompted with “What is codistillation?” the system provides a detailed response grounded in the information found in the source chunks displayed.

This demonstration effectively illustrates the knowledge-grounded nature of Knowledge Bases for Amazon Bedrock and its ability to provide accurate and well-substantiated responses based on the curated research content when used with models like Amazon Titan Text Premier. By grounding the responses in authoritative sources, the system ensures that the information provided is reliable, up-to-date, and relevant to the specific domain of LLMs and related areas. Amazon Bedrock also allows users to edit retriever and model parameters and instructions in the prompt template to further customize how sources are used and how answers are generated, as shown in the following screenshot. This approach not only enhances the credibility of the responses but also promotes transparency by explicitly displaying the source material that informs the system’s output.

Build a human resources (HR) assistant with Amazon Titan Text Premier using Knowledge Bases for Amazon Bedrock and Agents for Amazon Bedrock
The following video describes the workflow and architecture of creating an assistant with Amazon Titan Text Premier.

The workflow consists of the following steps:

Input query – Users provide natural language inputs to the agent.
Preprocessing step – During preprocessing, the agent validates, contextualizes, and categorizes user input. The user input (or task) is interpreted by the agent using the chat history, instructions, and underlying FM specified during agent creation. The agent’s instructions are descriptive guidelines outlining the agent’s intended actions. Also, you can configure advanced prompts, which allow you to boost your agent’s precision by employing more detailed configurations and offering manually selected examples for few-shot prompting. This method allows you to enhance the model’s performance by providing labeled examples associated with a particular task.
Action groups – Action groups are a set of APIs and corresponding business logic whose OpenAPI schema is defined as JSON files stored in Amazon Simple Storage Service (Amazon S3). The schema allows the agent to reason around the function of each API. Each action group can specify one or more API paths whose business logic is run through the AWS Lambda function associated with the action group.

In this sample application, the agent has multiple actions associated within an action group, such as looking up and updating the data around the employee’s time off in an Amazon Athena table, sending Slack and Outlook messages to teammates, generating images using Amazon Titan Image Generator, and making a knowledge base query to get the relevant details.

Knowledge Bases for Amazon Bedrock look up as an action – Knowledge Bases for Amazon Bedrock provides fully managed RAG to supply the agent with access to your data. You first configure the knowledge base by specifying a description that instructs the agent when to use your knowledge base. Then, you point the knowledge base to your Amazon S3 data source. Finally, you specify an embedding model and choose to use your existing vector store or allow Amazon Bedrock to create the vector store on your behalf. Once configured, each data source sync creates vector embeddings of your data, which the agent can use to return information to the user or augment subsequent FM prompts.

In this sample application, we use Amazon Titan Text Embeddings as an embedding model along with the default OpenSearch Serverless vector database to store our embedding. The knowledge base contains the employer’s relevant HR documents, such as parental leave policy, vacation policy, payment slips and more.

Orchestration – During orchestration, the agent develops a rationale with the logical steps of which action group API invocations and knowledge base queries are needed to generate an observation that can be used to augment the base prompt for the underlying FM. This ReAct style of prompting serves as the input for activating the FM, which then anticipates the most optimal sequence of actions to complete the user’s task.

In this sample application, the agent processes the employee’s query, breaks it down into a series of subtasks, determines the proper sequence of steps, and finally executes the appropriate actions and knowledge searches on the fly.

Postprocessing – Once all orchestration iterations are complete, the agent curates a final response. Postprocessing is disabled by default.

The following sections demonstrate test calls on the HR assistant application.
Using Knowledge Bases for Amazon Bedrock
In this test call, the assistant makes a knowledge base call to fetch the relevant information from the documents about HR policies to answer the query, “Can you get me some details about parental leave policy?” The following screenshot shows the prompt query and the reply.

Knowledge Bases for Amazon Bedrock call with GetTimeOffBalance action call and UpdateTimeOffBalance action call
In this test call, the assistant needs to answer the query, “My partner and I are expecting a baby on July 1. Can I take 2 weeks off?” It makes a knowledge base call to fetch the relevant information from the documents and answer questions based on the results. This is followed by making the GetTimeOffBalance action call to check for the available vacation time off balance. In the next query, we ask the assistant to update the database with appropriate values by asking,
“Yeah, let’s go ahead and request time off for 2 weeks from July 1–14, 2024.”

Amazon Titan Image Generator action call
In this test call, the assistant makes a call to Amazon Titan Image Generator through Agents for Amazon Bedrock actions to generate the corresponding image based on the input query, “Generate a cartoon image of a newborn child with parents.” The following screenshot shows the query and the response, including the generated image.

Amazon Simple Notification Service (Amazon SNS) email sending action
In this test call, the assistant makes a call to the emailSender action through Amazon SNS to send an email message, using the query, “Send an email to my team telling them that I will be away for 2 weeks starting July 1.” The following screenshot shows the exchange.

The following screenshot shows the response email.

Slack integration
You can set up the Slack message API similarly using Slack webhooks and integrate it as one of the actions in Amazon Bedrock. For a demo, view the 90-second YouTube video and refer to GitHub for the code repository.
Agent responses might vary across runs, so make sure to optimize your prompts to make them robust for other use cases.
Conclusion
In this post, we introduced the new Amazon Titan Text Premier model, specifically optimized for enterprise use cases, such as building RAG and agent-based applications. Such integrations enable advanced applications like building interactive AI assistants that use enterprise APIs and interact with your proprietary documents. Now that you know more about Amazon Titan Text Premier and its integrations with Knowledge Bases for Amazon Bedrock and Agents for Amazon Bedrock, we can’t wait to see what you all build with this model.
To learn more about the Amazon Titan family of models, visit the Amazon Titan product page. For pricing details, review Amazon Bedrock pricing. For more examples to get started, check out the Amazon Bedrock workshop repository and Amazon Bedrock samples repository.

About the authors
Anupam Dewan is a Senior Solutions Architect with a passion for generative AI and its applications in real life. He and his team enable Amazon builders who build customer-facing applications using generative AI. He lives in the Seattle area, and outside of work he loves hiking and enjoying nature.
Shreyas Subramanian is a Principal Data Scientist who helps customers solve their business challenges with machine learning on the AWS platform. Shreyas has a background in large-scale optimization and machine learning, and in the use of machine learning and reinforcement learning to accelerate optimization tasks.

GenASL: Generative AI-powered American Sign Language avatars

In today’s world, effective communication is essential for fostering inclusivity and breaking down barriers. However, for individuals who rely on visual communication methods like American Sign Language (ASL), traditional communication tools often fall short. That’s where GenASL comes in. GenASL is a generative artificial intelligence (AI)-powered solution that translates speech or text into expressive ASL avatar animations, bridging the gap between spoken and written language and sign language.
The rise of foundation models (FMs), and the fascinating world of generative AI that we live in, is incredibly exciting and opens doors to imagine and build what wasn’t previously possible. AWS makes it possible for organizations of all sizes and developers of all skill levels to build and scale generative AI applications with security, privacy, and responsible AI.
In this post, we dive into the architecture and implementation details of GenASL, which uses AWS generative AI capabilities to create human-like ASL avatar videos.
Solution overview
The GenASL solution comprises several AWS services working together to enable seamless translation from speech or text to ASL avatar animations. Users can input audio, video, or text into GenASL, which generates an ASL avatar video that interprets the provided data. The solution uses AWS AI and machine learning (AI/ML) services, including Amazon Transcribe, Amazon SageMaker, Amazon Bedrock, and FMs.
The following diagram shows a high-level overview of the architecture.

The workflow includes the following steps:

An Amazon Elastic Compute Cloud (Amazon EC2) instance initiates a batch process to create ASL avatars from a video dataset consisting of over 8,000 poses using RTMPose, a real-time multi-person pose estimation toolkit based on MMPose.
AWS Amplify distributes the GenASL web app consisting of HTML, JavaScript, and CSS to users’ mobile devices.
An Amazon Cognito identity pool grants temporary access to the Amazon Simple Storage Service (Amazon S3) bucket.
Users upload audio, video, or text to the S3 bucket using the AWS SDK through the web app.
The GenASL web app invokes the backend services by sending the S3 object key in the payload to an API hosted on Amazon API Gateway.
API Gateway instantiates an AWS Step Functions state machine. The state machine orchestrates the AI/ML services Amazon Transcribe and Amazon Bedrock and the NoSQL data store Amazon DynamoDB using AWS Lambda functions.
The Step Functions workflow generates a pre-signed URL of the ASL avatar video for the corresponding audio file.
A pre-signed URL for the video file stored in Amazon S3 is sent back to the user’s browser through API Gateway asynchronously through polling. The user’s mobile device plays the video file using the pre-signed URL.

As shown in the following figure, speech or text is converted to an ASL gloss, which is then used to produce an ASL video.

Let’s dive into the implementation details of each component.
Batch process
The ASL Lexicon Video Dataset (ASLLVD) consists of multiple synchronized videos showing the signing from different angles of more than 3,300 ASL signs in citation form, each produced by 1–6 native ASL signers. Linguistic annotations include gloss labels, sign start and end time codes, start and end handshape labels for both hands, and morphological and articulatory classifications of sign type. For compound signs, the dataset includes annotations for each morpheme. To facilitate computer vision-based sign language recognition, the dataset also includes numeric ID labels for sign variants, video sequences in uncompressed raw format, and camera calibration sequences.
We store the input dataset in an S3 bucket (video dataset) and use RTMPose, a PyTorch-based open source pose estimation toolkit, to generate the ASL avatar videos. MMPose is a member of the OpenMMLab project and contains a rich set of algorithms for 2D multi-person human pose estimation, 2D hand pose estimation, 2D face landmark detection, and 133-keypoint whole-body human pose estimation.
The EC2 instance initiates the batch process that stores the ASL avatar videos in another S3 bucket (ASL avatars) for every ASL gloss and stores the ASL gloss and its corresponding ASL avatar video’s S3 key in the DynamoDB table.
Backend
The backend process has three steps: process the input audio to English text, translate the English text to an ASL gloss, and generate an ASL avatar video from the ASL gloss. This API layer is fronted by API Gateway, which allows the user to authenticate, monitor, and throttle the API request. Because API Gateway has a timeout of 29 seconds, this asynchronous solution uses polling. Whenever the API gets a request to generate the sign video, it invokes a Step Functions workflow and then returns the Step Functions runtime URL back to the frontend application. The Step Functions workflow has three steps:

Convert the audio input to English text using Amazon Transcribe, an automatic speech-to-text AI service that uses deep learning for speech recognition. Amazon Transcribe is a fully managed and continuously training service designed to handle a wide range of speech and acoustic characteristics, including variations in volume, pitch, and speaking rate.
Translate the English text to an ASL gloss using Amazon Bedrock, which is used to build and scale generative AI applications using FMs. Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. We used Anthropic Claude 3 Sonnet on Amazon Bedrock to create the ASL gloss.
Generate the ASL avatar video from the ASL gloss. Using the ASL gloss created in the translation layer, we look up the corresponding ASL sign from the DynamoDB table. If the gloss is not available in the GenASL database, the logic falls back to fingerspelling each letter. The Lookup ASL Avatar Lambda function stitches the videos together, generates a temporary video, uploads it to the S3 bucket, creates a pre-signed URL, and sends the pre-signed URLs for both the sign video and the avatar video back to the frontend through polling. The frontend plays the video in a loop. A sketch of the gloss lookup and fingerspelling fallback follows this list.

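The following is a minimal sketch of that lookup logic, assuming a DynamoDB table keyed on the gloss with the avatar video’s S3 key stored as an attribute; the table and attribute names are assumptions rather than the repository’s actual schema.

import boto3

# Sketch of the gloss lookup with fingerspelling fallback (step 3 above).
# Table name and attribute names are assumptions, not the repo's actual schema.
table = boto3.resource("dynamodb").Table("asl-gloss-to-video")  # placeholder table name

def lookup_sign_videos(gloss_tokens):
    """Return the S3 keys of sign videos for a list of ASL gloss tokens."""
    s3_keys = []
    for token in gloss_tokens:
        item = table.get_item(Key={"gloss": token}).get("Item")
        if item:
            s3_keys.append(item["video_s3_key"])
        else:
            # Fall back to fingerspelling: look up one video per letter.
            for letter in token:
                letter_item = table.get_item(Key={"gloss": letter.upper()}).get("Item")
                if letter_item:
                    s3_keys.append(letter_item["video_s3_key"])
    return s3_keys

# Example: keys = lookup_sign_videos(["HELLO", "WORLD"])
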
Frontend
The frontend application is built using Amplify, a framework that allows you to build, develop, and deploy full stack applications, including mobile and web applications. You can add the authentication to a frontend Amplify app using the Amplify command Add Auth, which generates the sign-up and sign-in pages, as well as the backend and the Amazon Cognito identity pools. During the audio file upload to Amazon S3, the frontend connects with Amazon S3 using the temporary identity provided by the Amazon Cognito identity pool.
Best practices
The following are best practices for creating the ASL avatar video application.
API design
API Gateway supports a maximum timeout of 29 seconds. Additionally, it’s a best practice to not build synchronous APIs for long-running processes. Therefore, we built an asynchronous API consisting of two stages by allowing the client to poll a REST resource to check the status of its request. We implemented this pattern using API Gateway and Step Functions. In the first stage, the S3 key and bucket name are sent to an API endpoint that delegates the request to a Step Functions workflow and sends a response back with the execution ARN. In the second stage, the API checks the status of the workflow run based on the ARN provided as an input to this API endpoint. If the ASL avatar is successfully created, this API returns the pre-signed URL. Otherwise, it sends a RUNNING status and the frontend waits for a couple of seconds, and then calls the second API endpoint again. This step is repeated until the API returns the pre-signed URL to the caller.
Step Functions supports direct optimized integration with Amazon Bedrock, so we don’t need to have a Lambda function in the middle to create the ASL gloss. We can call the Amazon Bedrock API directly from the Step Functions workflow to save on Lambda compute cost.
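
The following is a minimal sketch of this two-stage pattern using the boto3 Step Functions client; the handler names, payload shapes, and state machine ARN are illustrative assumptions rather than the solution’s actual code.

import json
import boto3

# Sketch of the two-stage asynchronous API pattern described above.
# Handler names, payload shapes, and the state machine ARN are illustrative assumptions.
sfn = boto3.client("stepfunctions")

def start_handler(event, context):
    """Stage 1: kick off the Step Functions workflow and return its execution ARN."""
    execution = sfn.start_execution(
        stateMachineArn="<state-machine-arn>",
        input=json.dumps({"bucket": event["bucket"], "key": event["key"]}),
    )
    return {"executionArn": execution["executionArn"]}

def status_handler(event, context):
    """Stage 2: the client polls this endpoint with the execution ARN."""
    execution = sfn.describe_execution(executionArn=event["executionArn"])
    if execution["status"] == "SUCCEEDED":
        # The workflow output is assumed to contain the pre-signed URL of the avatar video.
        return {"status": "SUCCEEDED", "presignedUrl": json.loads(execution["output"]).get("presignedUrl")}
    return {"status": execution["status"]}  # for example RUNNING or FAILED
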
DevOps
From a DevOps perspective, the frontend uses Amplify to build and deploy, and the backend uses AWS Serverless Application Model (AWS SAM) to build, package, and deploy the serverless applications. We used Amazon CloudWatch to build a dashboard that captures metrics, including the number of API invocations (the number of ASL avatar videos generated), the average response time to create the video, and error metrics, to create a good user experience by tracking failures and alerting the DevOps team appropriately.
Prompt engineering
We provided a prompt to convert English text to an ASL gloss along with the input text message to the Amazon Bedrock API to invoke Anthropic Claude. We use the few-shot prompting technique by providing a few examples to produce an accurate ASL gloss.
The code sample is available in the accompanying GitHub repository.
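
The repository contains the actual prompt; the following is only an illustrative sketch of what a few-shot English-to-gloss prompt might look like, with simplified example pairs that are not linguistically authoritative.

# Hedged sketch of a few-shot prompt for English-to-ASL-gloss translation.
# The example pairs are illustrative only and are not the repository's actual prompt.
FEW_SHOT_PROMPT = """You are an expert ASL interpreter. Convert the English sentence into ASL gloss.

English: How are you?
ASL gloss: HOW YOU

English: My name is Sam.
ASL gloss: MY NAME S-A-M

English: {sentence}
ASL gloss:"""

def build_gloss_prompt(sentence: str) -> str:
    """Return the few-shot prompt for the given English sentence."""
    return FEW_SHOT_PROMPT.format(sentence=sentence)

# Example: prompt = build_gloss_prompt("I am going to the store tomorrow.")
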
Prerequisites
Before you begin, make sure you have the following set up:

Docker – Make sure Docker is installed and running on your machine. Docker is required for containerized application development and deployment. You can download and install Docker from Docker’s official website.
AWS SAM CLI – Install the AWS SAM CLI. This tool is essential for building and deploying serverless applications. For instructions, refer to Install the AWS SAM CLI.
Amplify CLI – Install the Amplify CLI. The Amplify CLI is a powerful toolchain for simplifying serverless web and mobile development. For instructions, refer to Set up Amplify CLI.
Windows-based EC2 instance – Make sure you have access to a Windows-based EC2 instance to run the batch process. This instance will be used for various tasks such as video processing and data preparation. For instructions, refer to Launch an instance.
FFmpeg – The video processing step in the GenASL solution requires a functioning installation of FFmpeg, a multimedia framework used by this solution to split and join video files. For instructions to install FFmpeg on the Windows EC2 instance, refer to Download FFmpeg.

Set up the solution
This section provides steps to deploy an ASL avatar generator using AWS services. We outline the steps for cloning the repository, processing data, deploying the backend, and setting up the frontend.

Clone the GitHub repository using the following command:

git clone https://github.com/aws-samples/genai-asl-avatar-generator.git

Follow the instructions in the dataprep folder to initialize the database:

Modify genai-asl-avatar-generator/dataprep/config.ini with information specific to your environment:

[DEFAULT]
s3_bucket= <your S3 bucket>
s3_prefix= <files will be generated in this prefix within your S3 bucket>
table_name=<dynamodb table name>
region=<your preferred AWS region>

Set up your environment by installing the required Python packages:

cd genai-asl-avatar-generator/dataprep
chmod +x env_setup.cmd
./env_setup.cmd

Prepare the sign video annotation file for each processing run:

python prep_metadata.py

Download the sign videos, segment them, and store them in Amazon S3:

python create_sign_videos.py

Generate avatar videos:

python create_pose_videos.py

Use the following command to deploy the backend application:

cd genai-asl-avatar-generator/backend
sam deploy --guided

Set up the frontend:

Initialize your Amplify environment:

amplify init

Modify the frontend configuration to point to the backend API:

Open frontend/amplify/backend/function/Audio2Sign/index.py.
Modify the stateMachineArn variable to have the state machine ARN shown in the output generated from the backend deployment.

Add hosting to the Amplify project:

amplify add hosting

In the prompt, choose Amazon CloudFront and S3 and choose the bucket to host the GenASL application.
Install the relevant packages by running the following command:

npm install --force

Deploy the Amplify project:

amplify publish

Run the solution
After you deploy the Amplify project using the amplify publish command, an Amazon CloudFront URL will be returned. You can use this URL to access the GenASL demo application. With the application open, you can register a new user and test the ASL avatar generation functionality.
Clean up
To avoid incurring costs, clean up the resources you created for this application when you no longer need them.

Remove all the frontend resources created by Amplify using the following command:

amplify delete

Remove all the backend resources created by AWS SAM using the following command:

sam delete

Clean up resources used by the batch process.

If you created a new EC2 instance for running the batch process, delete the instance using the Amazon EC2 console.
If you reused an existing EC2 instance, delete the project folder recursively to clean up all the resources:

rm -rf genai-asl-avatar-generator

Empty and delete the S3 bucket using the following commands:

aws s3 rm s3://<bucket-name> --recursive
aws s3 rb s3://<bucket-name> --force

Next steps
Although GenASL has achieved its initial goals, we’re working to expand its capabilities with advancements like 3D pose estimation, blending techniques, and bi-directional translation between ASL and spoken languages:

3D pose estimation – The GenASL application is currently generating a 2D avatar. We plan to convert the GenASL solution to create 3D avatars using the 3D pose estimation algorithms supported by MMPose. With that approach, we can create thousands of 3D keypoints. Using Stable Diffusion image generation capabilities, we can create realistic, human-like avatars in real-world settings.
Blending techniques – When you view the videos generated by the GenASL application, you may see frame skipping. There are some frame drops when we stitch the video together, resulting in a sudden change in the motion. To fix that, we can use a technique called blending. We are working on incorporating currently available partner solutions in order to create the intermediate frames to blend in and create smoother videos.
Bi-directional – The GenASL application currently converts audio to an ASL video. We also need a solution from the ASL video back to English audio, which can be done by navigating in the reverse direction. To do that, we can record a real-time sign video, take the video frame-by-frame, and send that through pose estimation algorithms. Next, we collect and combine the keypoints and search against the keypoints database to get the ASL gloss and convert that back to text. Using Amazon Polly, we can convert the text back to audio.

Conclusion
By combining speech-to-text, machine translation, text-to-video generation, and AWS AI/ML services, the GenASL solution creates expressive ASL avatar animations, fostering inclusive and effective communication. This post provided an overview of the GenASL architecture and implementation details. As generative AI continues to evolve, we can create groundbreaking applications that enhance accessibility and inclusivity for all.

About the Authors
Alak Eswaradass is a Senior Solutions Architect at AWS based in Chicago, Illinois. She is passionate about helping customers architect solutions utilizing AWS cloud technologies to solve business challenges. She is enthusiastic about leveraging cutting-edge technologies like Generative AI to drive innovation in cloud architectures. When she’s not working, Alak enjoys spending time with her daughters and exploring the outdoors with her dogs.
Suresh Poopandi is a Principal Solutions Architect at AWS, based in Chicago, Illinois, helping Healthcare Life Science customers with their cloud journey by providing architectures utilizing AWS services to achieve their business goals. He is passionate about building home automation and AI/ML solutions.
Rob Koch is a tech enthusiast who thrives on steering projects from their initial spark to successful fruition. He is a Principal at Slalom Build in Seattle, an AWS Data Hero, and co-chair of the CNCF Deaf and Hard of Hearing Working Group. His expertise in architecting event-driven systems is firmly rooted in the belief that data should be harnessed in real time. Rob relishes the challenge of examining existing systems and mapping the journey toward an event-driven architecture.

Improving RLHF (Reinforcement Learning from Human Feedback) with Criti …

Language models have gained prominence in reinforcement learning from human feedback (RLHF), but current reward modeling approaches face challenges in accurately capturing human preferences. Traditional reward models, trained as simple classifiers, struggle to perform explicit reasoning about response quality, limiting their effectiveness in guiding LLM behavior. The primary issue lies in their inability to generate reasoning traces, forcing all evaluations to occur implicitly within a single forward pass. This constraint hinders the model’s capacity to assess the nuances of human preferences thoroughly. While alternative approaches like the LLM-as-a-Judge framework have attempted to address this limitation, they generally underperform classic reward models in pairwise preference classification tasks, highlighting the need for a more effective method.

Researchers have attempted various approaches to address the challenges in reward modeling for language models. Ranking models like Bradley-Terry and Plackett-Luce have been employed, but they struggle with intransitive preferences. Some studies directly model the probability of one response being preferred over another, while others focus on modeling rewards across multiple objectives. Recent work has proposed maintaining and training the language model head as a form of regularization.

Critique-based feedback methods have also been explored, with some utilizing self-generated critiques to improve generation quality or serve as preference signals. However, these approaches differ from efforts to train better reward models when human preference data is available. Some researchers have investigated using oracle critiques or human-labeled critique preferences to teach language models to critique effectively.

The LLM-as-a-Judge framework, which uses a grading rubric to evaluate responses, shares similarities with critique-based methods but focuses on evaluation rather than revision. While this approach produces chain-of-thought reasoning, it generally underperforms classic reward models in pairwise preference classification tasks.

Researchers from Databricks, MIT, and the University of California, San Diego present Critique-out-Loud (CLoud) reward models, which represent a unique approach to improving language model performance in reinforcement learning from human feedback. These models generate a detailed critique of how well an assistant’s response answers a user’s query before producing a scalar reward for the response quality. This process combines the strengths of classic reward models and the LLM-as-a-Judge framework.

CLoud reward models are trained using a preference dataset containing prompts, responses, and oracle critiques. The training process involves supervised fine-tuning on oracle critiques for critique generation and the Bradley-Terry preference model for scalar reward production. To enhance performance, the researchers explore multi-sample inference techniques, particularly self-consistency, which involves sampling multiple critique-reward predictions and marginalizing across critiques for a more accurate reward estimate.

This innovative approach aims to unify reward models and LLM-as-a-Judge methods, potentially leading to significant improvements in pairwise preference classification accuracy and win rates in various benchmarks. The researchers also investigate key design choices, such as on-policy versus off-policy training, and the benefits of self-consistency over critiques to optimize reward modeling performance.

CLoud reward models extend classic reward models by incorporating a language modeling head alongside the base model and reward head. The training process involves supervised fine-tuning on oracle critiques, replacing these with self-generated critiques, and then training the reward head on the self-generated critiques. This approach minimizes the distribution shift between training and inference. The model uses modified loss functions, including a Bradley-Terry model loss and a critique-supervised fine-tuning loss. To enhance performance, CLoud models can employ self-consistency during inference, sampling multiple critiques for a prompt-response pair and averaging their predicted rewards for a final estimate.

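To make the objective concrete, the following is a hedged PyTorch-style sketch of the two losses described above: a critique supervised fine-tuning loss plus a Bradley-Terry preference loss on the scalar rewards. The weighting term and tensor shapes are assumptions for illustration, and the paper’s exact formulation may differ.

import torch
import torch.nn.functional as F

def cloud_loss(critique_logits, critique_targets, reward_chosen, reward_rejected, lm_weight=1.0):
    """Sketch of the CLoud training objective: critique SFT loss + Bradley-Terry reward loss.

    critique_logits: (batch, seq_len, vocab) logits from the language modeling head
    critique_targets: (batch, seq_len) token ids of the oracle or self-generated critique
    reward_chosen, reward_rejected: (batch,) scalar rewards for the preferred / rejected responses
    lm_weight: weight on the critique loss (an assumption, not the paper's value)
    """
    # Critique supervised fine-tuning loss (standard next-token cross-entropy).
    sft_loss = F.cross_entropy(
        critique_logits.reshape(-1, critique_logits.size(-1)),
        critique_targets.reshape(-1),
        ignore_index=-100,  # positions outside the critique are masked out
    )
    # Bradley-Terry loss: the chosen response should receive a higher reward.
    bt_loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
    return bt_loss + lm_weight * sft_loss
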
The researchers evaluated CLoud reward models against classic reward models using two key metrics: pairwise preference classification accuracy and Best-of-N (BoN) win rate. For pairwise preference classification, they used the RewardBench evaluation suite, which includes categories like Chat, Chat-Hard, Safety, and Reasoning. The BoN win rate was assessed using ArenaHard, an open-ended generation benchmark.

CLoud reward models significantly outperformed classic reward models in pairwise preference classification across all categories on RewardBench, for both 8B and 70B model scales. This led to a substantial increase in average accuracy for CLoud models.

In the BoN evaluation on ArenaHard, CLoud models demonstrated a Pareto improvement over classic models, producing equal or significantly higher win rates. For Best-of-16, CLoud improved the win rate by 1.84 and 0.89 percentage points for 8B and 70B models, respectively. These results suggest that CLoud reward models offer superior performance in guiding language model behavior compared to classic reward models.

This study introduces CLoud reward models, which represent a significant advancement in preference modeling for language models. By preserving language modeling capabilities alongside a scalar reward head, these models explicitly reason about response quality through critique generation. This approach demonstrates substantial improvements over classic reward models in pairwise preference modeling accuracy and Best-of-N decoding performance. Self-consistency decoding proved beneficial for reasoning tasks, particularly those with short reasoning horizons. By unifying language generation with preference modeling, CLoud reward models establish a new paradigm that opens avenues for improving reward models through variable inference computing, laying the groundwork for more sophisticated and effective preference modeling in language model development.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Improving RLHF (Reinforcement Learning from Human Feedback) with Critique-Generated Reward Models appeared first on MarkTechPost.

Revolutionizing Medical Training with AI- This AI Paper Unveils MEDCO: …

The rapid integration of AI technologies in medical education has revealed significant limitations in existing educational tools. Current AI-assisted systems primarily support solitary learning and are unable to replicate the interactive, multidisciplinary, and collaborative nature of real-world medical training. This deficiency poses a significant challenge, as effective medical education requires students to develop proficient question-asking skills, engage in peer discussions, and collaborate across various medical specialties. Overcoming this challenge is crucial to ensure that medical students are adequately prepared for real-world clinical settings, where the ability to navigate complex patient interactions and multidisciplinary teams is essential for accurate diagnosis and effective treatment.

Current AI-driven educational tools largely rely on single-agent chatbots designed to simulate medical scenarios by interacting with students in a limited, role-specific capacity. While these systems can automate specific tasks, such as providing diagnostic suggestions or conducting medical examinations, they fall short in promoting the development of essential clinical skills. The solitary nature of these tools means they do not facilitate peer discussions or collaborative learning, both of which are vital for a deep understanding of complex medical cases. Additionally, these models often require extensive computational resources and large datasets, which makes them impractical for real-time application in dynamic educational environments. Such limitations prevent these tools from fully replicating the intricacies of real-world medical training, thus impeding their overall effectiveness in medical education.

A team of researchers from The Chinese University of Hong Kong and The University of Hong Kong proposes MEDCO (Medical Education COpilots), a novel multi-agent system designed to emulate the complexities of real-world medical training environments. MEDCO features three core agents: an agentic patient, an expert doctor, and a radiologist, all of whom work together to create a multi-modal, interactive learning environment. This approach allows students to practice critical skills such as effective question-asking, engage in multidisciplinary collaborations, and participate in peer discussions, providing a comprehensive learning experience that mirrors real clinical settings. MEDCO’s design marks a significant advancement in AI-driven medical education by offering a more effective, efficient, and accurate training solution than existing methods.

MEDCO operates through three key stages: agent initialization, learning, and practicing scenarios. In the agent initialization phase, three agents are introduced: the agentic patient, who simulates a variety of symptoms and health conditions; the agentic medical expert, who evaluates student diagnoses and offers feedback; and the agentic doctor, who assists in interdisciplinary cases. The learning phase involves the student interacting with the patient and radiologist to develop a diagnosis, with the expert agent providing feedback that is stored in the student’s learning memory for future reference. In the practicing phase, students apply their stored knowledge to new cases, allowing for continuous improvement in diagnostic skills. The system is evaluated using the MVME dataset, which consists of 506 high-quality Chinese medical records and demonstrates substantial improvements in diagnostic accuracy and learning efficiency.

The effectiveness of MEDCO is evidenced by significant improvements in the diagnostic performance of medical students simulated by language models like GPT-3.5. Evaluated using Holistic Diagnostic Evaluation (HDE), Semantic Embedding-based Matching Assessment (SEMA), and Coarse And Specific Code Assessment for Diagnostic Evaluation (CASCADE), MEDCO consistently enhanced student performance across all metrics. For example, after training with MEDCO, students showed considerable improvement in the Medical Examination section, with scores increasing from 1.785 to 2.575 after engaging in peer discussions. SEMA and CASCADE metrics further validated the system’s effectiveness, particularly in recall and F1-score, indicating that MEDCO supports a deeper understanding of medical cases. Students trained with MEDCO achieved an average HDE score of 2.299 following peer discussions, surpassing the 2.283 score of advanced models like Claude3.5-Sonnet. This result highlights MEDCO’s capability to significantly enhance learning outcomes.

In conclusion, MEDCO represents a groundbreaking advancement in AI-assisted medical education by effectively replicating the complexities of real-world clinical training. By introducing a multi-agent framework that supports interactive and multidisciplinary learning, MEDCO addresses the critical challenges of existing educational tools. The proposed method offers a more comprehensive and accurate training experience, as demonstrated by substantial improvements in diagnostic performance. MEDCO has the potential to revolutionize medical education, better prepare students for real-world scenarios, and advance the field of AI in medical training.

Check out the Paper. All credit for this research goes to the researchers of this project.
The post Revolutionizing Medical Training with AI- This AI Paper Unveils MEDCO: Medical Education Copilots Based on a Multi-Agent Framework appeared first on MarkTechPost.

Training-Free Graph Neural Networks (TFGNNs) with Labels as Features (LaF) for Superior Transductive Learning

Advanced machine learning models called Graph Neural Networks (GNNs) process and analyze graph-structured data. They have proven quite successful in a number of applications, including recommender systems, question answering, and chemical modeling. Transductive node classification is a typical problem for GNNs, where the goal is to predict the labels of certain nodes in a graph based on the known labels of other nodes. This task arises in fields like social network analysis, e-commerce, and document classification.

Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) are two of the different varieties of GNNs that have demonstrated exceptional effectiveness in transductive node classification. However, the high computational cost of GNNs poses a significant obstacle to their deployment, particularly when working with large graphs like social networks or the World Wide Web, which can have billions of nodes. 

To overcome this, researchers have created methods for accelerating GNN computation, but these all have limitations, such as requiring many training iterations or substantial processing power. The idea of training-free Graph Neural Networks (TFGNNs) has been presented as a solution to these problems. For transductive node classification, TFGNNs use the concept of “labels as features” (LaF), in which node labels are utilized as features. By using label information from nearby nodes, this technique enables GNNs to produce node embeddings that are more informative than those based on node attributes alone.

With LaF, the model can perform well even in the absence of a conventional training procedure. In contrast to traditional GNNs, which usually need extensive training to function at their best, TFGNNs can be used immediately after initialization, with training remaining optional for further gains.
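
As a rough illustration of the LaF idea, the sketch below appends one-hot training labels (zeroed out for unlabeled nodes) to the node feature matrix and then propagates them over a normalized adjacency matrix without any learned parameters. The exact construction is an assumption for exposition, not the paper's formulation.

import numpy as np

def labels_as_features(X, y, train_mask, num_classes):
    # X: (num_nodes, d) node features; y: integer labels (placeholder values are fine for
    # non-training nodes, since they are masked out below); train_mask: which labels may be used.
    one_hot = np.zeros((X.shape[0], num_classes))
    one_hot[np.arange(X.shape[0]), y] = 1.0
    one_hot[~train_mask] = 0.0  # unlabeled/test nodes contribute no label information
    return np.concatenate([X, one_hot], axis=1)

def propagate(A_norm, H, num_layers=2):
    # Parameter-free message passing: neighbors' combined feature-plus-label signals are
    # averaged, so label information from nearby nodes reaches unlabeled nodes without training.
    for _ in range(num_layers):
        H = A_norm @ H
    return H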

Experimental studies strongly support the effectiveness of TFGNNs. When tested in a training-free setting, TFGNNs consistently beat traditional GNNs, which need extensive training to reach comparable results. When optional training is used, TFGNNs converge substantially faster than conventional models, requiring far fewer iterations to reach optimal performance. This efficiency and versatility make TFGNNs a very attractive option for a variety of graph-based applications, especially in situations where rapid deployment and low computational budgets are crucial.

The team has summarized their primary contributions as follows.

This research discusses the use of “labels as features” (LaF) for transductive learning, a technique that has not been well studied but offers substantial advantages.

The study formally demonstrates that LaF increases the expressive power of GNNs, strengthening their capacity to represent intricate relationships in graph data.

Training-free graph neural networks (TFGNNs) are introduced as a transformative approach that can function well even without training.

Experimental findings demonstrate the efficiency of TFGNNs, confirming that they outperform current GNN models in a training-free setting.

Check out the Paper and GitHub.

Turing-Complete-RAG (TC-RAG): A Breakthrough Framework Enhancing Accuracy and Reliability in Medical LLMs Through Dynamic State Management and Adaptive Retrieval

The field of large language models (LLMs) has rapidly evolved, particularly in specialized domains like medicine, where accuracy and reliability are crucial. In healthcare, these models promise to significantly enhance diagnostic accuracy, treatment planning, and the allocation of medical resources. However, the challenges inherent in managing the system state and avoiding errors within these models remain significant. Addressing these issues ensures that LLMs can be effectively and safely integrated into medical practice. As LLMs are tasked with processing increasingly complex queries, the need for mechanisms that can dynamically control and monitor the retrieval process becomes even more apparent. This need is particularly pressing in high-stakes medical scenarios, where the consequences of errors can be severe.

One of the primary issues facing medical LLMs is achieving accurate and reliable performance on highly specialized queries. Despite advancements, current models frequently struggle with issues such as hallucinations (where the model generates incorrect information), outdated knowledge, and the accumulation of erroneous data. These problems stem from the lack of robust mechanisms to control and monitor the retrieval process. Without such mechanisms, LLMs can produce unreliable conclusions, which is particularly problematic in the medical field, where incorrect information can lead to serious consequences. Moreover, the challenge is compounded by the dynamic nature of medical knowledge, which requires systems that can adapt and update continuously.

Various methods have been developed to address these challenges, with Retrieval-Augmented Generation (RAG) being one of the more promising approaches. RAG enhances LLM performance by integrating external knowledge bases and providing the models with up-to-date and relevant information during content generation. However, these methods often fall short because they do not incorporate system state variables, which are essential for adaptive control and for ensuring that the retrieval process converges on accurate and reliable results. A mechanism to manage these state variables is necessary to maintain the effectiveness of RAG, particularly in the medical domain, where decisions often require intricate, multi-step reasoning and the ability to adapt dynamically to new information.

Researchers from Peking University, Zhongnan University of Economics and Law, University of Chinese Academy of Science, and University of Electronic Science and Technology of China have introduced a novel Turing-Complete-RAG (TC-RAG) framework. This system is designed to address the shortcomings of traditional RAG methods by incorporating a Turing Complete approach to manage state variables dynamically. This innovation allows the system to control and halt the retrieval process effectively, preventing the accumulation of erroneous knowledge. By leveraging a memory stack system with adaptive retrieval and reasoning capabilities, TC-RAG ensures that the retrieval process reliably converges on an optimal conclusion, even in complex medical scenarios.

The TC-RAG system employs a sophisticated memory stack that monitors and manages the retrieval process through actions like push and pop, which are integral to its adaptive retrieval and reasoning capabilities. This stack-based approach allows the system to selectively remove irrelevant or harmful information, thereby avoiding the accumulation of errors. By maintaining a dynamic and responsive memory system, TC-RAG enhances the LLM’s ability to plan and reason effectively, similar to how medical professionals approach complex cases. The system’s ability to adapt to the evolving context of a query and make real-time decisions based on the current state of knowledge marks a significant improvement over existing methods.
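
A minimal sketch of such a stack-managed retrieval loop is shown below. The retrieve, assess, and generate callables stand in for the retriever, a relevance/sufficiency check, and the LLM, and the halting rule is an illustrative assumption rather than the paper's exact operators.

class MemoryStack:
    # Illustrative stack for retrieved evidence and intermediate reasoning (not the paper's code).
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        # Drop the most recent item, e.g. when it is judged irrelevant or harmful.
        return self.items.pop() if self.items else None

    def context(self):
        return "\n".join(self.items)

def answer_query(query, retrieve, assess, generate, max_steps=5):
    # retrieve/assess/generate are stand-ins for the retriever, a relevance/sufficiency
    # check, and the LLM call; all three are assumptions for this sketch.
    memory = MemoryStack()
    for _ in range(max_steps):
        memory.push(retrieve(query, memory.context()))
        verdict = assess(query, memory.context())
        if verdict == "irrelevant":
            memory.pop()      # discard the last retrieval instead of accumulating errors
        elif verdict == "sufficient":
            break             # halt once the current state is judged conclusive
    return generate(query, memory.context())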

In rigorous evaluations of real-world medical datasets, TC-RAG demonstrated a notable improvement in accuracy over traditional methods. The system outperformed baseline models across various metrics, including Exact Match (EM) and BLEU-4 scores, showing an average performance gain of up to 7.20%. For instance, on the MMCU-Medical dataset, TC-RAG achieved EM scores as high as 89.61%, and BLEU-4 scores reached 53.04%. These results underscore the effectiveness of TC-RAG’s approach to managing system state and memory, making it a powerful tool for medical analysis and decision-making. The system’s ability to dynamically manage and update its knowledge base ensures that it remains relevant and accurate, even as medical knowledge evolves.

In conclusion, by addressing key challenges such as retrieval accuracy, system state management, and the avoidance of erroneous knowledge, TC-RAG offers a robust solution for enhancing the reliability and effectiveness of medical LLMs. The system’s innovative use of a Turing Complete approach to manage state variables dynamically, and its ability to adapt to complex medical queries, set it apart from existing methods. As demonstrated by its superior performance in rigorous evaluations, TC-RAG has the potential to become an invaluable tool in the healthcare industry, providing accurate and reliable support for medical professionals in making critical decisions.

Check out the Paper.

Contrastive Learning from AI Revisions (CLAIR): A Novel Approach to Address Underspecification in AI Model Alignment with Anchored Preference Optimization (APO)

Artificial intelligence (AI) development, particularly in large language models (LLMs), focuses on aligning these models with human preferences to enhance their effectiveness and safety. This alignment is critical in refining AI interactions with users, ensuring that the responses generated are accurate and aligned with human expectations and values. Achieving this requires a combination of preference data, which informs the model of desirable outcomes, and alignment objectives that guide the training process. These elements are crucial for improving the model’s performance and ability to meet user expectations.

A significant challenge in AI model alignment lies in the issue of underspecification, where the relationship between preference data and training objectives is not clearly defined. This lack of clarity can lead to suboptimal performance, as the model may struggle to learn effectively from the provided data. Underspecification occurs when preference pairs used to train the model contain differences that are irrelevant to the desired outcome. These spurious differences complicate the learning process, making it difficult for the model to focus on the aspects that truly matter. Current alignment methods often fail to adequately account for the relationship between the model’s performance and the preference data, potentially leading to a degradation in the model’s capabilities.

Existing methods for aligning LLMs, such as those relying on contrastive learning objectives and preference pair datasets, have made significant strides but still fall short. These methods typically involve generating two outputs from the model and using a judge, either another AI model or a human, to select the preferred output. However, this approach can lead to inconsistent preference signals, as the criteria for choosing the preferred response are not always clear or consistent. This inconsistency in the learning signal can hinder the model’s ability to improve during training, because the model does not always receive clear guidance on how to adjust its outputs to align better with human preferences.

Researchers from Ghent University – imec, Stanford University, and Contextual AI have introduced two innovative methods to address these challenges: Contrastive Learning from AI Revisions (CLAIR) and Anchored Preference Optimization (APO). CLAIR is a novel data-creation method designed to generate minimally contrasting preference pairs by slightly revising a model’s output to create a preferred response. This method ensures that the contrast between the winning and losing outputs is minimal but meaningful, providing a more precise learning signal for the model. On the other hand, APO is a family of alignment objectives that offer greater control over the training process. By explicitly accounting for the relationship between the model and the preference data, APO ensures that the alignment process is more stable and effective.

The CLAIR method operates by first generating a losing output from the target model, then using a stronger model, such as GPT-4-turbo, to revise this output into a winning one. This revision process is designed to make only minimal changes, ensuring that the contrast between the two outputs is focused on the most relevant aspects. This approach differs significantly from traditional methods, which might rely on a judge to select the preferred output from two independently generated responses. By creating preference pairs with minimal yet meaningful contrasts, CLAIR provides a clearer and more effective learning signal for the model during training.
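
The data-creation recipe can be sketched roughly as follows; target_generate and reviser_revise stand in for the model being aligned and the stronger reviser (such as GPT-4-turbo), and the revision prompt is an assumption for illustration.

def build_clair_pair(prompt, target_generate, reviser_revise):
    # Create one minimally contrasting preference pair (illustrative sketch).
    losing = target_generate(prompt)                # output from the model being aligned
    revision_instruction = (
        "Minimally revise the response so it better answers the prompt. "
        "Change only what is necessary.\n"
        f"Prompt: {prompt}\nResponse: {losing}"
    )
    winning = reviser_revise(revision_instruction)  # slightly improved version of the same output
    # The two responses differ only where the revision touched them,
    # which gives the alignment objective a focused learning signal.
    return {"prompt": prompt, "chosen": winning, "rejected": losing}

def build_clair_dataset(prompts, target_generate, reviser_revise):
    return [build_clair_pair(p, target_generate, reviser_revise) for p in prompts]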

Anchored Preference Optimization (APO) complements CLAIR by offering fine-grained control over the alignment process. APO adjusts the likelihood of winning or losing outputs based on the model’s performance relative to the preference data. For example, the APO-zero variant increases the probability of winning outputs while decreasing the likelihood of losing ones, which is particularly useful when the model’s outputs are generally less desirable than the winning outputs. Conversely, APO-down decreases the likelihood of winning and losing outputs, which can be beneficial when the model’s outputs are already better than the preferred responses. This level of control allows researchers to tailor the alignment process more closely to the specific needs of the model and the data.

The effectiveness of CLAIR and APO was demonstrated by aligning the Llama-3-8B-Instruct model using a variety of datasets and alignment objectives. The results were significant: CLAIR, combined with the APO-zero objective, led to a 7.65% improvement in performance on the MixEval-Hard benchmark, which measures model accuracy across a range of complex queries. This improvement represents a substantial step towards closing the performance gap between Llama-3-8B-Instruct and GPT-4-turbo, reducing the difference by 45%. These results highlight the importance of minimally contrasting preference pairs and tailored alignment objectives in improving AI model performance.

In conclusion, CLAIR and APO offer a more effective approach to aligning LLMs with human preferences, addressing the challenges of underspecification and providing more precise control over the training process. Their success in improving the performance of the Llama-3-8B-Instruct model underscores their potential to enhance the alignment process for AI models more broadly.

Check out the Paper, Model, and GitHub.

Llama3 Just Got Ears! Llama3-s v0.2: A New Multimodal Checkpoint with Improved Speech Understanding

Understanding spoken language for large language models (LLMs) is crucial for creating more natural and intuitive interactions with machines. While traditional models excel at text-based tasks, they struggle with comprehending human speech, limiting their potential in real-world applications like voice assistants, customer service, and accessibility tools. Enhancing speech understanding can improve interactions between humans and machines, particularly in scenarios that demand real-time processing.

Homebrew Research introduces Llama3-s v0.2 to address the challenge of understanding spoken language in natural language processing. Current language models predominantly focus on text, with limited capabilities in processing spoken language. Existing speech understanding models often falter in scenarios involving complex accents, background noise, or extended audio inputs. 

Llama3-s v0.2 builds on the foundation of the Llama 3.1 language model, introducing significant enhancements specifically designed to improve speech understanding. The model utilizes a pre-trained audio encoder (like WhisperVQ) to convert spoken audio into numerical representations that the language model can process. This multimodal training approach, which integrates text and audio inputs, allows Llama3-s v0.2 to learn the relationship between spoken language and its textual representation efficiently. Furthermore, the model employs semantic tokens, abstract representations of word meanings, to improve its understanding of the underlying content of speech.

Llama3-s v0.2 enhances its speech understanding capabilities through a two-stage training process. In the first stage, the model is pre-trained on real speech data using the MLS-10k dataset, which includes 10 hours of unlabeled, multilingual human speech. This pre-training enhances the model’s ability to generalize across semantic tokens. In the second stage, the model undergoes instruct tuning with a mixture of synthetic data, using WhisperVQ to semantically encode the speech data. This approach helps the model learn from a combination of speech instruction prompts and transcription prompts. Llama3-s v0.2 demonstrates promising results, outperforming existing models on multiple benchmarks, including the ALPACA-Audio and AudioBench evaluations. Llama3-s v0.2 achieved an average score of 3.53 on the ALPACA-Audio evaluation, which appears to beat SALMONN, Qwen-Audio, and WavLLM. Despite its advancements, the model still faces limitations, such as sensitivity to background noise and difficulties with extended audio inputs.
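
Conceptually, inference follows the pattern sketched below, where encode_audio_to_semantic_tokens stands in for a WhisperVQ-style encoder and chat_model for the fine-tuned Llama backbone; the special-token format and both callables are assumptions, not the project's actual API.

def answer_spoken_request(audio_waveform, encode_audio_to_semantic_tokens, chat_model):
    # encode_audio_to_semantic_tokens: waveform -> list of discrete semantic token ids
    # chat_model: text prompt -> text response
    # Both callables, and the special-token format, are placeholders for this sketch.
    sound_tokens = encode_audio_to_semantic_tokens(audio_waveform)
    prompt = (
        "<sound>"
        + "".join(f"<s_{t}>" for t in sound_tokens)
        + "</sound>\nAnswer the spoken request."
    )
    # The discrete audio tokens travel through the same text interface as ordinary tokens,
    # so the language model handles speech and text uniformly.
    return chat_model(prompt)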

In conclusion, Llama3-s v0.2 represents a significant step forward in the development of multimodal language models capable of understanding spoken language. By integrating audio and text inputs and employing advanced semantic tokenization, the model overcomes the limitations faced by traditional language models in speech understanding. The experiments demonstrated by Llama3-s v0.2 open up new possibilities for real-world applications, making technology more accessible and user-friendly.

Check out the Details.

AI21 Labs Released Jamba 1.5 Family of Open Models: Jamba 1.5 Mini and Jamba 1.5 Large Redefining Long-Context AI with Unmatched Speed, Quality, and Multilingual Capabilities for Global Enterprises

AI21 Labs has made a significant stride in the AI landscape by releasing the Jamba 1.5 family of open models, comprising Jamba 1.5 Mini and Jamba 1.5 Large. These models, built on the novel SSM-Transformer architecture, represent a breakthrough in AI technology, particularly in handling long-context tasks. AI21 Labs aims to democratize access to these powerful models by releasing them under the Jamba Open Model License, encouraging widespread experimentation and innovation.

Key Features of the Jamba 1.5 Models

One of the standout features of the Jamba 1.5 models is their ability to handle exceptionally long contexts. They boast an effective context window of 256K tokens, the longest in the market for open models. This feature is critical for enterprise applications requiring the analysis and summarization of lengthy documents. The models also excel in agentic and Retrieval-Augmented Generation (RAG) workflows, enhancing both the quality and efficiency of these processes.

Regarding speed, the Jamba 1.5 models are up to 2.5 times faster on long contexts than their competitors, and they maintain superior performance across all context lengths within their size class. This speed advantage is crucial for enterprises that need rapid turnaround times for tasks such as customer support or large-scale data processing.

The quality of the Jamba 1.5 models is another area where they outshine their peers. Jamba 1.5 Mini has been recognized as the strongest open model in its size class, achieving a score of 46.1 on the Arena Hard benchmark, outperforming larger models like Mixtral 8x22B and Command-R+. Jamba 1.5 Large goes even further, scoring 65.4, which surpasses leading models such as Llama 3.1 70B and 405B. This high-quality performance across different benchmarks highlights the robustness of the Jamba 1.5 models in delivering reliable and accurate results.

Multilingual Support and Developer Readiness

In addition to their technical prowess, the Jamba 1.5 models are designed with multilingual support, catering to languages such as Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew. This makes them versatile tools for global enterprises operating in diverse linguistic environments.

For developers, Jamba 1.5 models offer native support for structured JSON output, function calling, document object digestion, and citation generation. These features make the models adaptable to various development needs, enabling seamless integration into existing workflows.

Deployment and Efficiency

AI21 Labs has ensured that the Jamba 1.5 models are accessible and deployable across multiple platforms. They are available for immediate download on Hugging Face and are supported by major cloud providers, including Google Cloud Vertex AI, Microsoft Azure, and NVIDIA NIM. The models are expected to be available soon on additional platforms such as Amazon Bedrock, Databricks Marketplace, Snowflake Cortex, and others, making them easily deployable in various environments, including on-premises and virtual private clouds.
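
For teams starting from Hugging Face, loading the checkpoint should follow the standard causal-LM pattern in the transformers library; the repository id and generation settings below are assumptions based on AI21's listing, so check the model card for the exact requirements (library version, GPU memory, and so on).

# Hedged example: loading Jamba 1.5 Mini with the Hugging Face transformers library.
# The repository id and generation settings are assumptions; consult the model card
# for the exact requirements (recent transformers release, GPU memory, and so on).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the key obligations in the following contract excerpt:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))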

Another critical advantage of the Jamba 1.5 models is their resource efficiency. Built on a hybrid architecture that combines the strengths of Transformer and Mamba architectures, these models offer a lower memory footprint, allowing enterprises to handle extensive context lengths on a single GPU. AI21 Labs’ novel quantization technique, ExpertsInt8, further enhances this efficiency by optimizing model performance without compromising quality.

Conclusion

The release of the Jamba 1.5 family by AI21 Labs marks a significant advancement in long-context handling. These models set new benchmarks in speed, quality, and efficiency and democratize access to cutting-edge AI technology through their open model license. As enterprises continue to seek AI solutions that deliver real-world value, the Jamba 1.5 models stand out as powerful tools capable of meeting the demands of complex, large-scale applications. Their availability across multiple platforms and support for multilingual environments further enhance their appeal, making them a versatile choice for developers and businesses.

Check out Jamba 1.5 Mini, Jamba 1.5 Large, and the Details.

Processing 2-Hour Videos Seamlessly: This AI Paper Unveils LONGVILA, Advancing Long-Context Visual Language Models for Long Videos

The main challenge in developing advanced visual language models (VLMs) lies in enabling these models to effectively process and understand long video sequences that contain extensive contextual information. Long-context understanding is crucial for applications such as detailed video analysis, autonomous systems, and real-world AI implementations where tasks require the comprehension of complex, multi-modal inputs over extended periods. However, current models are limited in their ability to handle long sequences, which restricts their performance and usability in tasks requiring deep contextual analysis. This challenge is significant because overcoming it would unlock the potential for AI systems to perform more sophisticated tasks in real time and across various domains.

Existing methods designed to handle long-context vision-language tasks often encounter scalability and efficiency issues. Approaches such as Ring-Style Sequence Parallelism and Megatron-LM have extended context length in language models but struggle when applied to multi-modal tasks that involve both visual and textual data. These methods are hindered by their computational demands, making them impractical for real-time applications or tasks requiring the processing of very long sequences. Additionally, most visual language models are optimized for short contexts, limiting their effectiveness for longer video sequences. These constraints prevent AI models from achieving the necessary performance levels in tasks that demand extended context understanding, such as video summarization and long-form video captioning.

A team of researchers from NVIDIA, MIT, UC Berkeley, and UT Austin proposes LongVILA, an innovative approach that offers a full-stack solution for long-context visual language models. LongVILA introduces the Multi-Modal Sequence Parallelism (MM-SP) system, which significantly enhances the efficiency of long-context training and inference by enabling models to process sequences up to 2 million tokens in length using 256 GPUs. This system is more efficient than existing methods, achieving a 2.1× – 5.7× speedup compared to Ring-Style Sequence Parallelism and a 1.1× – 1.4× improvement over Megatron-LM. The novelty of LongVILA lies in its ability to scale context length while seamlessly integrating with frameworks like Hugging Face Transformers. The five-stage training pipeline further enhances the model’s capabilities, focusing on multi-modal alignment, large-scale pre-training, context extension, and supervised fine-tuning, leading to substantial performance improvements on long video tasks.

The foundation of LongVILA is the MM-SP system, designed to handle the training and inference of long-context VLMs by distributing computational loads across multiple GPUs. The system employs a two-stage sharding strategy that ensures balanced processing of both the image encoder and the language modeling stages. This strategy is crucial for efficiently handling the diverse data types involved in multi-modal tasks, particularly when processing extremely long video sequences. The training pipeline is composed of five stages: multi-modal alignment, large-scale pre-training, short-supervised fine-tuning, context extension, and long-supervised fine-tuning. Each stage incrementally extends the model’s capability from handling short contexts to processing long video sequences with up to 1024 frames. A new dataset was also developed for long video instruction-following, comprising 15,292 videos, each around 10 minutes long, to support the final supervised fine-tuning stage.
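
As a rough mental model of sequence parallelism (not the MM-SP implementation itself), the sketch below splits one long interleaved token sequence into contiguous chunks, one per GPU rank; the real system additionally balances the image-encoder and language-model stages and overlaps communication.

def shard_multimodal_sequence(tokens, world_size):
    # Split one long interleaved (visual + text) token sequence into contiguous chunks,
    # one per GPU rank. Purely illustrative of distributing the sequence dimension;
    # the real MM-SP system also balances the image-encoder and language-model stages
    # and overlaps communication between ranks.
    chunk_size = (len(tokens) + world_size - 1) // world_size
    return [(rank, tokens[rank * chunk_size:(rank + 1) * chunk_size])
            for rank in range(world_size)]

# Example: a 2M-token sample spread over 256 ranks leaves roughly 7,800 tokens per GPU.
shards = shard_multimodal_sequence(list(range(2_000_000)), world_size=256)
print(len(shards), len(shards[0][1]))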

The LongVILA approach achieves substantial improvements in handling long video tasks, particularly in its ability to process extended sequences with high accuracy. The model demonstrated a significant 99.5% accuracy when processing videos with a context length of 274,000 tokens, far exceeding the capabilities of previous models that were limited to shorter sequences. Additionally, LongVILA-8B consistently outperforms existing state-of-the-art models on benchmarks for video tasks of varying lengths, showcasing its superior ability to manage and analyze long video content effectively. The performance gains achieved by LongVILA highlight its efficiency and scalability, making it a leading solution for tasks that require deep contextual understanding over extended sequences.

In conclusion, LongVILA represents a significant advancement in the field of AI, particularly for tasks requiring long-context understanding in multi-modal settings. By offering a comprehensive solution that includes a novel sequence parallelism system, a multi-stage training pipeline, and specialized datasets, LongVILA effectively addresses the critical challenge of processing long video sequences. This method not only improves the scalability and efficiency of visual language models but also sets a new standard for performance in long video tasks, marking a substantial contribution to the advancement of AI research.

Check out the Paper and GitHub.

This AI Paper by National University of Singapore Introduces A Comprehensive Survey of Language Models for Tabular Data Analysis

Tabular data, which dominates many domains such as healthcare, finance, and the social sciences, organizes structured features into rows and columns, making it well suited to data management and analysis. However, the diversity of tabular data, spanning numerical, categorical, and textual features, brings substantial challenges to attaining robust and accurate predictive performance. A further challenge in effectively modeling and analyzing this type of data is the complexity of the relationships it contains, particularly dependencies between rows and columns.

The main challenge in analyzing tabular data is handling its heterogeneous structure. Traditional machine learning models fall short when it comes to capturing the complex relationships inside tabular datasets, especially large and complex ones, and they struggle to generalize well across the diverse data types and interdependencies of tabular data. The challenge is compounded by the need for high predictive accuracy and robustness, especially in critical applications such as healthcare, where decisions informed by data analysis can be highly consequential.

Different methods have been applied to overcome these challenges of modeling tabular data. Early techniques relied largely on conventional machine learning, most of which needed extensive feature engineering to capture the subtleties of the data; their known weakness lay in an inability to scale with the size and complexity of the input dataset. More recently, techniques from NLP have been adapted for tabular data, with transformer-based architectures increasingly implemented. These methods started by training transformers from scratch on tabular data, but this required huge amounts of training data and came with significant scalability issues. Against this backdrop, researchers began using pre-trained language models (PLMs) like BERT, which required less data and provided better predictive performance.

Researchers from the National University of Singapore provided a comprehensive survey of the various language modeling techniques developed for tabular data. The review systematizes the classification of the literature and identifies a shift from traditional machine learning models to advanced methods using state-of-the-art LLMs like GPT and LLaMA. The survey emphasizes the evolution of these models, showing how LLMs have transformed the field and pushed it toward more sophisticated applications in tabular data modeling. This work fills a gap in the literature by providing a detailed taxonomy of tabular data structures, key datasets, and the various modeling techniques.

The methodology proposed by the research team categorizes tabular data into two major categories: 1D and 2D. 1D tabular data usually involves a single table, with the main work happening at the row level; this setting is simpler but very important for tasks like classification and regression. In contrast, 2D tabular data consists of multiple related tables, requiring more complex modeling techniques for tasks such as table retrieval and table question answering. The researchers delve into different strategies for turning tabular data into forms that a language model can consume, including flattening tables into sequences, processing data row by row, and integrating the information into prompts. Through these methods, language models gain a deeper understanding of tabular data and stronger processing abilities, leading to more reliable predictive outcomes.
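
One common serialization strategy, flattening a row into text and embedding it in a prompt, can be sketched as follows; the prompt template and helper names are illustrative assumptions rather than anything prescribed by the survey.

def row_to_text(row):
    # Flatten one table row (a dict of column -> value) into a readable fragment.
    return ", ".join(f"{col} is {val}" for col, val in row.items())

def build_prediction_prompt(row, target_column, task_description):
    serialized = row_to_text({k: v for k, v in row.items() if k != target_column})
    return (
        f"{task_description}\n"
        f"Record: {serialized}.\n"
        f"Question: what is the most likely value of '{target_column}'?\n"
        "Answer:"
    )

# Toy healthcare-style record used purely for illustration.
patient = {"age": 63, "blood_pressure": "150/95", "smoker": "yes", "risk_level": None}
print(build_prediction_prompt(patient, "risk_level",
                              "You are predicting patient risk from tabular records."))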

The research shows how capable large language models are across most tabular data tasks. These models have demonstrated marked improvements in understanding and processing complex data structures on tasks such as Table Question Answering and Table Semantic Parsing. The authors illustrate how LLMs raise accuracy and efficiency across these tasks by exploiting pre-trained knowledge and advanced attention mechanisms, setting new standards for tabular data modeling across many applications.

In conclusion, the research underscores the potential of NLP techniques, and large language models in particular, to reshape tabular data analysis. By systematically reviewing and categorizing existing methods, the researchers have proposed a clear roadmap for future developments in this area. The surveyed methodologies address the intrinsic challenges of tabular data and open up new, more advanced applications that remain relevant and effective even as data complexity rises.

Check out the Paper.