NVIDIA AI Open Sources Dynamo: An Open-Source Inference Library for Accelerating and Scaling AI Reasoning Models in AI Factories

The rapid advancement of artificial intelligence (AI) has led to the development of complex models capable of understanding and generating human-like text. Deploying these large language models (LLMs) in real-world applications presents significant challenges, particularly in optimizing performance and managing computational resources efficiently.

Challenges in Scaling AI Reasoning Models

As AI models grow in complexity, their deployment demands increase, especially during the inference phase, the stage where models generate outputs based on new data. Key challenges include:

Resource Allocation: Balancing computational loads across extensive GPU clusters to prevent bottlenecks and underutilization is complex.

Latency Reduction: Ensuring rapid response times is critical for user satisfaction, necessitating low-latency inference processes.

Cost Management: The substantial computational requirements of LLMs can lead to escalating operational costs, making cost-effective solutions essential.

Introducing NVIDIA Dynamo

In response to these challenges, NVIDIA has introduced Dynamo, an open-source inference library designed to accelerate and scale AI reasoning models efficiently and cost-effectively. As the successor to the NVIDIA Triton Inference Server, Dynamo offers a modular framework tailored for distributed environments, enabling seamless scaling of inference workloads across large GPU fleets.

Technical Innovations and Benefits

Dynamo incorporates several key innovations that collectively enhance inference performance:

Disaggregated Serving: This approach separates the context (prefill) and generation (decode) phases of LLM inference, allocating them to distinct GPUs. By allowing each phase to be optimized independently, disaggregated serving improves resource utilization and increases the number of inference requests served per GPU.

GPU Resource Planner: Dynamo's planning engine dynamically adjusts GPU allocation in response to fluctuating user demand, preventing over- or under-provisioning and ensuring optimal performance.

Smart Router: This component efficiently directs incoming inference requests across large GPU fleets, minimizing costly recomputations by routing each request to workers that already hold the relevant KV (key-value) cache from prior requests (a conceptual sketch follows this list).

Low-Latency Communication Library (NIXL): NIXL accelerates data transfer between GPUs and across diverse memory and storage types, reducing inference response times and simplifying data exchange complexities.

KV Cache Manager: By offloading less frequently accessed inference data to more cost-effective memory and storage devices, Dynamo reduces overall inference costs without impacting user experience.
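
To make the Smart Router idea concrete, the following is a purely conceptual Python sketch, not the Dynamo API: route each request to the worker that already holds the longest cached prefix of the prompt in its KV cache, falling back to the least-loaded worker.

from dataclasses import dataclass, field

@dataclass
class Worker:
    # Hypothetical worker state: prompt prefixes already resident in this worker's KV cache, plus its queue depth
    name: str
    cached_prefixes: set = field(default_factory=set)
    queue_depth: int = 0

def route_request(prompt_tokens, workers):
    """Send the request to the worker holding the longest cached prefix; break ties by lowest load."""
    def cached_prefix_len(worker):
        for n in range(len(prompt_tokens), 0, -1):
            if tuple(prompt_tokens[:n]) in worker.cached_prefixes:
                return n
        return 0
    return max(workers, key=lambda w: (cached_prefix_len(w), -w.queue_depth))

# Example: the second worker already cached the prompt's first three tokens, so it wins
workers = [Worker("gpu-0"), Worker("gpu-1", cached_prefixes={(1, 2, 3)})]
print(route_request([1, 2, 3, 4], workers).name)  # -> gpu-1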

Performance Insights

Dynamo's impact on inference performance is substantial. When serving the open-source DeepSeek-R1 671B reasoning model on NVIDIA GB200 NVL72, Dynamo increased throughput, measured in tokens per second per GPU, by up to 30 times. Additionally, serving the Llama 70B model on NVIDIA Hopper resulted in more than a twofold increase in throughput.

These enhancements enable AI service providers to serve more inference requests per GPU, accelerate response times, and reduce operational costs, thereby maximizing returns on their accelerated compute investments.

Conclusion

NVIDIA Dynamo represents a significant advancement in the deployment of AI reasoning models, addressing critical challenges in scaling, efficiency, and cost-effectiveness. Its open-source nature and compatibility with major AI inference backends, including PyTorch, SGLang, NVIDIA TensorRT-LLM, and vLLM, empower enterprises, startups, and researchers to optimize AI model serving across disaggregated inference environments. By leveraging Dynamo’s innovative features, organizations can enhance their AI capabilities, delivering faster and more efficient AI services to meet the growing demands of modern applications.

Check out the Technical details and GitHub Page. All credit for this research goes to the researchers of this project.

The post NVIDIA AI Open Sources Dynamo: An Open-Source Inference Library for Accelerating and Scaling AI Reasoning Models in AI Factories appeared first on MarkTechPost.

A Step-by-Step Guide to Building a Semantic Search Engine with Sentence Transformers, FAISS, and all-MiniLM-L6-v2

Semantic search goes beyond traditional keyword matching by understanding the contextual meaning of search queries. Instead of simply matching exact words, semantic search systems capture the intent and contextual definition of the query and return relevant results even when they don’t contain the same keywords.

In this tutorial, we’ll implement a semantic search system using Sentence Transformers, a powerful library built on top of Hugging Face’s Transformers that provides pre-trained models specifically optimized for generating sentence embeddings. These embeddings are numerical representations of text that capture semantic meaning, allowing us to find similar content through vector similarity. We’ll create a practical application: a semantic search engine for a collection of scientific abstracts that can answer research queries with relevant papers, even when the terminology differs between the query and relevant documents.

First, let’s install the necessary libraries in our Colab notebook:

!pip install sentence-transformers faiss-cpu numpy pandas matplotlib datasets

Now, let’s import the libraries we’ll need:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
import faiss
from typing import List, Dict, Tuple
import time
import re
import torch

For our demonstration, we’ll use a collection of scientific paper abstracts. Let’s create a small dataset of abstracts from various fields:

abstracts = [
    {
        "id": 1,
        "title": "Deep Learning for Natural Language Processing",
        "abstract": "This paper explores recent advances in deep learning models for natural language processing tasks. We review transformer architectures including BERT, GPT, and T5, and analyze their performance on various benchmarks including question answering, sentiment analysis, and text classification."
    },
    {
        "id": 2,
        "title": "Climate Change Impact on Marine Ecosystems",
        "abstract": "Rising ocean temperatures and acidification are severely impacting coral reefs and marine biodiversity. This study presents data collected over a 10-year period, demonstrating accelerated decline in reef ecosystems and proposing conservation strategies to mitigate further damage."
    },
    {
        "id": 3,
        "title": "Advancements in mRNA Vaccine Technology",
        "abstract": "The development of mRNA vaccines represents a breakthrough in immunization technology. This review discusses the mechanism of action, stability improvements, and clinical efficacy of mRNA platforms, with special attention to their rapid deployment during the COVID-19 pandemic."
    },
    {
        "id": 4,
        "title": "Quantum Computing Algorithms for Optimization Problems",
        "abstract": "Quantum computing offers potential speedups for solving complex optimization problems. This paper presents quantum algorithms for combinatorial optimization and compares their theoretical performance with classical methods on problems including traveling salesman and maximum cut."
    },
    {
        "id": 5,
        "title": "Sustainable Urban Planning Frameworks",
        "abstract": "This research proposes frameworks for sustainable urban development that integrate renewable energy systems, efficient public transportation networks, and green infrastructure. Case studies from five cities demonstrate reductions in carbon emissions and improvements in quality of life metrics."
    },
    {
        "id": 6,
        "title": "Neural Networks for Computer Vision",
        "abstract": "Convolutional neural networks have revolutionized computer vision tasks. This paper examines recent architectural innovations including residual connections, attention mechanisms, and vision transformers, evaluating their performance on image classification, object detection, and segmentation benchmarks."
    },
    {
        "id": 7,
        "title": "Blockchain Applications in Supply Chain Management",
        "abstract": "Blockchain technology enables transparent and secure tracking of goods throughout supply chains. This study analyzes implementations across food, pharmaceutical, and retail industries, quantifying improvements in traceability, reduction in counterfeit products, and enhanced consumer trust."
    },
    {
        "id": 8,
        "title": "Genetic Factors in Autoimmune Disorders",
        "abstract": "This research identifies key genetic markers associated with increased susceptibility to autoimmune conditions. Through genome-wide association studies of 15,000 patients, we identified novel variants that influence immune system regulation and may serve as targets for personalized therapeutic approaches."
    },
    {
        "id": 9,
        "title": "Reinforcement Learning for Robotic Control Systems",
        "abstract": "Deep reinforcement learning enables robots to learn complex manipulation tasks through trial and error. This paper presents a framework that combines model-based planning with policy gradient methods to achieve sample-efficient learning of dexterous manipulation skills."
    },
    {
        "id": 10,
        "title": "Microplastic Pollution in Freshwater Systems",
        "abstract": "This study quantifies microplastic contamination across 30 freshwater lakes and rivers, identifying primary sources and transport mechanisms. Results indicate correlation between population density and contamination levels, with implications for water treatment policies and plastic waste management."
    }
]

papers_df = pd.DataFrame(abstracts)
print(f"Dataset loaded with {len(papers_df)} scientific papers")
papers_df[["id", "title"]]

Now we’ll load a pre-trained Sentence Transformer model from Hugging Face. We’ll use the all-MiniLM-L6-v2 model, which provides a good balance between performance and speed:

model_name = 'all-MiniLM-L6-v2'
model = SentenceTransformer(model_name)
print(f"Loaded model: {model_name}")

Next, we’ll convert our text abstracts into dense vector embeddings:

documents = papers_df['abstract'].tolist()
document_embeddings = model.encode(documents, show_progress_bar=True)

print(f"Generated {len(document_embeddings)} embeddings with dimension {document_embeddings.shape[1]}")

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search. We’ll use it to index our document embeddings:

dimension = document_embeddings.shape[1]

index = faiss.IndexFlatL2(dimension)
index.add(np.array(document_embeddings).astype('float32'))

print(f"Created FAISS index with {index.ntotal} vectors")
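
As an optional variation that is not part of the original tutorial, you can rank by cosine similarity directly by normalizing the embeddings and using an inner-product index; a minimal sketch:

embeddings_ip = np.array(document_embeddings).astype('float32').copy()
faiss.normalize_L2(embeddings_ip)        # in-place L2 normalization so inner product equals cosine similarity
index_ip = faiss.IndexFlatIP(dimension)  # inner-product (cosine) index
index_ip.add(embeddings_ip)

With this index, the scores returned by index_ip.search on a normalized query are cosine similarities, so no distance-to-similarity conversion is needed.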

Now let’s implement a function that takes a query, converts it to an embedding, and retrieves the most similar documents:

def semantic_search(query: str, top_k: int = 3) -> List[Dict]:
    """
    Search for documents similar to query

    Args:
        query: Text to search for
        top_k: Number of results to return

    Returns:
        List of dictionaries containing document info and similarity score
    """
    query_embedding = model.encode([query])

    distances, indices = index.search(np.array(query_embedding).astype('float32'), top_k)

    results = []
    for i, idx in enumerate(indices[0]):
        results.append({
            'id': papers_df.iloc[idx]['id'],
            'title': papers_df.iloc[idx]['title'],
            'abstract': papers_df.iloc[idx]['abstract'],
            'similarity_score': 1 - distances[0][i] / 2
        })

    return results
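
A quick note on the similarity_score formula: the all-MiniLM-L6-v2 pipeline ends with a normalization step, so its embeddings are unit length, and IndexFlatL2 returns squared Euclidean distances. For unit vectors q and d, ||q - d||^2 = 2 - 2*cos(q, d), so cos(q, d) = 1 - ||q - d||^2 / 2, which is exactly the 1 - distances[0][i] / 2 used above. If you swap in a model whose embeddings are not normalized, this shortcut no longer corresponds to cosine similarity.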

Let’s test our semantic search with various queries that demonstrate its ability to understand meaning beyond exact keywords:

test_queries = [
    "How do transformers work in natural language processing?",
    "What are the effects of global warming on ocean life?",
    "Tell me about COVID vaccine development",
    "Latest algorithms in quantum computing",
    "How can cities reduce their carbon footprint?"
]

for query in test_queries:
    print("\n" + "="*80)
    print(f"Query: {query}")
    print("="*80)

    results = semantic_search(query, top_k=3)

    for i, result in enumerate(results):
        print(f"\nResult #{i+1} (Score: {result['similarity_score']:.4f}):")
        print(f"Title: {result['title']}")
        print(f"Abstract snippet: {result['abstract'][:150]}...")

Let’s visualize the document embeddings to see how they cluster by topic:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
reduced_embeddings = pca.fit_transform(document_embeddings)

plt.figure(figsize=(12, 8))
plt.scatter(reduced_embeddings[:, 0], reduced_embeddings[:, 1], s=100, alpha=0.7)

for i, (x, y) in enumerate(reduced_embeddings):
    plt.annotate(papers_df.iloc[i]['title'][:20] + "...",
                 (x, y),
                 fontsize=9,
                 alpha=0.8)

plt.title('Document Embeddings Visualization (PCA)')
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

Let’s create a more interactive search interface:

from IPython.display import display, HTML, clear_output
import ipywidgets as widgets

def run_search(query_text):
    clear_output(wait=True)

    display(HTML(f"<h3>Query: {query_text}</h3>"))

    start_time = time.time()
    results = semantic_search(query_text, top_k=5)
    search_time = time.time() - start_time

    display(HTML(f"<p>Found {len(results)} results in {search_time:.4f} seconds</p>"))

    for i, result in enumerate(results):
        html = f"""
        <div style="margin-bottom: 20px; padding: 15px; border: 1px solid #ddd; border-radius: 5px;">
            <h4>{i+1}. {result['title']} <span style="color: #007bff;">(Score: {result['similarity_score']:.4f})</span></h4>
            <p>{result['abstract']}</p>
        </div>
        """
        display(HTML(html))

search_box = widgets.Text(
    value='',
    placeholder='Type your search query here...',
    description='Search:',
    layout=widgets.Layout(width='70%')
)

search_button = widgets.Button(
    description='Search',
    button_style='primary',
    tooltip='Click to search'
)

def on_button_clicked(b):
    run_search(search_box.value)

search_button.on_click(on_button_clicked)

display(widgets.HBox([search_box, search_button]))

In this tutorial, we’ve built a complete semantic search system using Sentence Transformers. This system can understand the meaning behind user queries and return relevant documents even when there isn’t exact keyword matching. We’ve seen how embedding-based search provides more intelligent results than traditional methods.

Here is the Colab Notebook.

The post A Step-by-Step Guide to Building a Semantic Search Engine with Sentence Transformers, FAISS, and all-MiniLM-L6-v2 appeared first on MarkTechPost.

Build a generative AI enabled virtual IT troubleshooting assistant using Amazon Q Business

Today’s organizations face a critical challenge with the fragmentation of vital information across multiple environments. As businesses increasingly rely on diverse project management and IT service management (ITSM) tools such as ServiceNow, Atlassian Jira and Confluence, employees find themselves navigating a complex web of systems to access crucial data.
This isolated approach leads to several challenges for IT leaders, developers, program managers, and new employees. For example:

Inefficiency: Employees need to access multiple systems independently to gather data insights and remediation steps during incident troubleshooting
Lack of integration: Information is isolated across different environments, making it difficult to get a holistic view of ITSM activities
Time-consuming: Searching for relevant information across multiple systems is time-consuming and reduces productivity
Potential for inconsistency: Using multiple systems increases the risk of inconsistent data and processes across the organization.

Amazon Q Business is a fully managed, generative artificial intelligence (AI) powered assistant that can address challenges such as inefficient, inconsistent information access within an organization by providing 24/7 support tailored to individual needs. It handles a wide range of tasks such as answering questions, providing summaries, generating content, and completing tasks based on data in your organization. Amazon Q Business offers over 40 data source connectors that connect to your enterprise data sources and help you create a generative AI solution with minimal configuration. Amazon Q Business also supports over 50 actions across popular business applications and platforms. Additionally, Amazon Q Business offers enterprise-grade data security, privacy, and built-in guardrails that you can configure.
This blog post explores an innovative solution that harnesses the power of generative AI to bring value to your organization and ITSM tools with Amazon Q Business.
Solution overview
The solution architecture shown in the following figure demonstrates how to build a virtual IT troubleshooting assistant by integrating with multiple data sources such as Atlassian Jira, Confluence, and ServiceNow. This solution helps streamline information retrieval, enhance collaboration, and significantly boost overall operational efficiency, offering a glimpse into the future of intelligent enterprise information management.

This solution integrates with ITSM tools such as ServiceNow Online and project management software such as Atlassian Jira and Confluence using the Amazon Q Business data source connectors. You can use a data source connector to combine data from different places into a central index for your Amazon Q Business application. For this demonstration, we use the Amazon Q Business native index and retriever. We also configure an application environment and grant access to users to interact with an application environment using AWS IAM Identity Center for user management. Then, we provision subscriptions for IAM Identity Center users and groups.
Authorized users interact with the application environment through a web experience. You can share the web experience endpoint URL with your users so they can open the URL and authenticate themselves to start chatting with the generative AI application powered by Amazon Q Business.
Deployment
Start by setting up the architecture and data needed for the demonstration.

1. We've provided an AWS CloudFormation template in our GitHub repository that you can use to set up the environment for this demonstration. If you don't have existing Atlassian Jira, Confluence, and ServiceNow accounts, follow these steps to create trial accounts for the demonstration.
2. Once step 1 is complete, open the AWS Management Console for Amazon Q Business. On the Applications tab, open your application to see the data sources. See Best practices for data source connector configuration in Amazon Q Business to understand best practices.
3. To improve retrieved results and customize the end user chat experience, use Amazon Q to map document attributes from your data sources to fields in your Amazon Q index. Choose the Atlassian Jira, Confluence Cloud, and ServiceNow Online links to learn more about their document attributes and field mappings. Select the data source to edit its configurations under Actions. Select the appropriate fields that you think would be important for your search needs. Repeat the process for all of the data sources. The following figure is an example of some of the Atlassian Jira project field mappings that we selected.
4. Sync mode lets you choose how to update your index when your data source content changes, and Sync run schedule sets how often Amazon Q Business synchronizes your index with the data source. For this demonstration, we set the Sync mode to Full Sync and the Frequency to Run on demand. Update Sync mode with your changes and choose Sync Now to start syncing data sources (a minimal API sketch follows these steps). When you initiate a sync, Amazon Q will crawl the data source to extract relevant documents, then sync them to the Amazon Q index, making them searchable.
5. After syncing data sources, you can configure the metadata controls in Amazon Q Business. An Amazon Q Business index has fields that you can map your document attributes to. After the index fields are mapped to document attributes and are search-enabled, admins can use the index fields to boost results from specific sources, and end users can use them to filter and scope their chat results to specific data. Boosting chat responses based on document attributes helps you rank sources that are more authoritative higher than other sources in your application environment. See Boosting chat responses using metadata boosting to learn more about metadata boosting and metadata controls. The following figure is an example of some of the metadata controls that we selected.
6. For the purposes of the demonstration, use the Amazon Q Business web experience. Select your application under Applications and then select the Deployed URL link in the web experience settings.
7. Enter the same username, password, and multi-factor authentication (MFA) code for the user that you created previously in IAM Identity Center to sign in to the Amazon Q Business web experience generative AI assistant.
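
If you prefer to start the sync from code rather than the console, a minimal sketch using the AWS SDK for Python (boto3) is shown below. The IDs are placeholders, and the call maps to the Amazon Q Business StartDataSourceSyncJob API; verify parameter names against the current boto3 documentation:

import boto3

# Placeholder IDs -- replace with the values from your own Amazon Q Business application
APPLICATION_ID = "your-application-id"
INDEX_ID = "your-index-id"
DATA_SOURCE_ID = "your-data-source-id"

qbusiness = boto3.client("qbusiness")

# Kick off an on-demand sync of one data source (equivalent to choosing Sync Now in the console)
response = qbusiness.start_data_source_sync_job(
    applicationId=APPLICATION_ID,
    indexId=INDEX_ID,
    dataSourceId=DATA_SOURCE_ID,
)
print(f"Started sync job: {response['executionId']}")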

Demonstration
Now that you’ve signed in to the Amazon Q Business web experience generative AI assistant (shown in the previous figure), let’s try some natural language queries.
IT leaders: You’re an IT leader and your team is working on a critical project that needs to hit the market quickly. You can now ask questions in natural language to Amazon Q Business to get answers based on your company data.

Developers: Developers who want to know information such as the tasks that are assigned to them, specific task details, or issues in a particular subsegment can now get these questions answered from Amazon Q Business without necessarily signing in to either Atlassian Jira or Confluence.

Project and program managers: Project and program managers can monitor the activities or developments in their projects or programs from Amazon Q Business without having to contact various teams to get individual status updates.

New employees or business users: A newly hired employee who’s looking for information to get started on a project or a business user who needs tech support can use the generative AI assistant to get the information and support they need.

Benefits and outcomes
From the demonstrations, you saw that various users, whether they are leaders, managers, developers, or business users, can benefit from using a generative AI solution like our virtual IT assistant built using Amazon Q Business. It removes the undifferentiated heavy lifting of having to navigate multiple solutions and cross-reference multiple items and data points to get answers. Amazon Q Business can use generative AI to provide responses with actionable insights in just a few seconds. Now, let's dive deeper into some of the additional benefits that this solution provides.

Increased efficiency: Centralized access to information from ServiceNow, Atlassian Jira, and Confluence saves time and reduces the need to switch between multiple systems.
Enhanced decision-making: Comprehensive data insights from multiple systems leads to better-informed decisions in incident management and problem-solving for various users across the organization.
Faster incident resolution: Quick access to enterprise data sources and knowledge and AI-assisted remediation steps can significantly reduce mean time to resolutions (MTTR) for cases with elevated priorities.
Improved knowledge management: Access to Confluence’s architectural documents and other knowledge bases such as ServiceNow’s Knowledge Articles promotes better knowledge sharing across the organization. Users can now get responses based on information from multiple systems.
Seamless integration and enhanced user experience: Better integration between ITSM processes, project management, and software development streamlines operations. This is helpful for organizations and teams that incorporate agile methodologies.
Cost savings: Reduction in time spent searching for information and resolving incidents can lead to significant cost savings in IT operations.
Scalability: Amazon Q Business can grow with the organization, accommodating future needs and additional data sources as required. Organizations can create more Amazon Q Business applications and share purpose-built Amazon Q Business apps within their organizations to manage repetitive tasks.

Clean up
After completing your exploration of the virtual IT troubleshooting assistant, delete the CloudFormation stack from your AWS account. This action terminates all resources created during deployment of this demonstration and prevents unnecessary costs from accruing in your AWS account.
Conclusion
By integrating Amazon Q Business with enterprise systems, you can create a powerful virtual IT assistant that streamlines information access and improves productivity. The solution presented in this post demonstrates the power of combining AI capabilities with existing enterprise systems to create powerful unified ITSM solutions and more efficient and user-friendly experiences.
We provide the sample virtual IT assistant using an Amazon Q Business solution as open source—use it as a starting point for your own solution and help us make it better by contributing fixes and features through GitHub pull requests. Visit the GitHub repository to explore the code, choose Watch to be notified of new releases, and check the README for the latest documentation updates.
Learn more:

Amazon Q Business
Generative AI on AWS

For expert assistance, AWS Professional Services, AWS Generative AI partner solutions, and AWS Generative AI Competency Partners are here to help.
We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the GitHub repository.

About the Authors
Jasmine Rasheed Syed is a Senior Customer Solutions manager at AWS, focused on accelerating time to value for the customers on their cloud journey by adopting best practices and mechanisms to transform their business at scale. Jasmine is a seasoned, results-oriented leader with 20+ years of progressive experience in Insurance, Retail & CPG, with an exemplary track record spanning Business Development, Cloud/Digital Transformation, Delivery, Operational & Process Excellence, and Executive Management.
Suprakash Dutta is a Sr. Solutions Architect at Amazon Web Services. He focuses on digital transformation strategy, application modernization and migration, data analytics, and machine learning. He is part of the AI/ML community at AWS and designs generative AI and Intelligent Document Processing (IDP) solutions.
Joshua Amah is a Partner Solutions Architect at Amazon Web Services, specializing in supporting SI partners with a focus on AI/ML and generative AI technologies. He is passionate about guiding AWS Partners in using cutting-edge technologies and best practices to build innovative solutions that meet customer needs. Joshua provides architectural guidance and strategic recommendations for both new and existing workloads.
Brad King is an Enterprise Account Executive at Amazon Web Services specializing in translating complex technical concepts into business value and making sure that clients achieve their digital transformation goals efficiently and effectively through long term partnerships.
Joseph Mart is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS). His core competence and interests lie in machine learning applications and generative AI. Joseph is a technology addict who enjoys guiding AWS customers on architecting their workload in the AWS Cloud. In his spare time, he loves playing soccer and visiting nature.

Process formulas and charts with Anthropic's Claude on Amazon Bedrock

Research papers and engineering documents often contain a wealth of information in the form of mathematical formulas, charts, and graphs. Navigating these unstructured documents to find relevant information can be a tedious and time-consuming task, especially when dealing with large volumes of data. However, by using Anthropic’s Claude on Amazon Bedrock, researchers and engineers can now automate the indexing and tagging of these technical documents. This enables the efficient processing of content, including scientific formulas and data visualizations, and the population of Amazon Bedrock Knowledge Bases with appropriate metadata.
Amazon Bedrock is a fully managed service that provides a single API to access and use various high-performing foundation models (FMs) from leading AI companies. It offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI practices. Anthropic’s Claude 3 Sonnet offers best-in-class vision capabilities compared to other leading models. It can accurately transcribe text from imperfect images—a core capability for retail, logistics, and financial services, where AI might glean more insights from an image, graphic, or illustration than from text alone. The latest of Anthropic’s Claude models demonstrate a strong aptitude for understanding a wide range of visual formats, including photos, charts, graphs and technical diagrams. With Anthropic’s Claude, you can extract more insights from documents, process web UIs and diverse product documentation, generate image catalog metadata, and more.
In this post, we explore how you can use these multi-modal generative AI models to streamline the management of technical documents. By extracting and structuring the key information from the source materials, the models can create a searchable knowledge base that allows you to quickly locate the data, formulas, and visualizations you need to support your work. With the document content organized in a knowledge base, researchers and engineers can use advanced search capabilities to surface the most relevant information for their specific needs. This can significantly accelerate research and development workflows, because professionals no longer have to manually sift through large volumes of unstructured data to find the references they need.
Solution overview
This solution demonstrates the transformative potential of multi-modal generative AI when applied to the challenges faced by scientific and engineering communities. By automating the indexing and tagging of technical documents, these powerful models can enable more efficient knowledge management and accelerate innovation across a variety of industries.
In addition to Anthropic’s Claude on Amazon Bedrock, the solution uses the following services:

Amazon SageMaker JupyterLab – The SageMaker JupyterLab application is a web-based interactive development environment (IDE) for notebooks, code, and data. JupyterLab's flexible and extensive interface can be used to configure and arrange machine learning (ML) workflows. We use JupyterLab to run the code for processing formulas and charts.
Amazon Simple Storage Service (Amazon S3) – Amazon S3 is an object storage service built to store and protect any amount of data. We use Amazon S3 to store sample documents that are used in this solution.
AWS Lambda – AWS Lambda is a compute service that runs code in response to triggers such as changes in data, changes in application state, or user actions. Because services such as Amazon S3 and Amazon Simple Notification Service (Amazon SNS) can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.

The solution workflow contains the following steps:

Split the PDF into individual pages and save them as PNG files.
For each page:

Extract the original text.
Render the formulas in LaTeX.
Generate a semantic description of each formula.
Generate an explanation of each formula.
Generate a semantic description of each graph.
Generate an interpretation for each graph.
Generate metadata for the page.

Generate metadata for the full document.
Upload the content and metadata to Amazon S3.
Create an Amazon Bedrock knowledge base.

The following diagram illustrates this workflow.

Prerequisites

If you’re new to AWS, you first need to create and set up an AWS account.
Additionally, in your account under Amazon Bedrock, request access to anthropic.claude-3-5-sonnet-20241022-v2:0 if you don’t have it already.

Deploy the solution
Complete the following steps to set up the solution:

Launch the AWS CloudFormation template by choosing Launch Stack (this creates the stack in the us-east-1 AWS Region):

When the stack deployment is complete, open the Amazon SageMaker AI console.
Choose Notebooks in the navigation pane.
Locate the notebook claude-scientific-docs-notebook and choose Open JupyterLab.

In the notebook, navigate to notebooks/process_scientific_docs.ipynb.

Choose conda_python3 as the kernel, then choose Select.

Walk through the sample code.

Explanation of the notebook code
In this section, we walk through the notebook code.
Load data
We use example research papers from arXiv to demonstrate the capability outlined here. arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
We download the documents and store them under a samples folder locally. Multi-modal generative AI models work well with text extraction from image files, so we start by converting the PDF to a collection of images, one for each page.
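
One common way to do this conversion, if you are not using the notebook's own helper, is the pdf2image library (which wraps poppler); a minimal sketch with a placeholder PDF path:

from pdf2image import convert_from_path  # requires poppler to be installed

# Hypothetical path -- replace with the paper you downloaded into the samples folder
pdf_path = "./samples/2003.10304/2003.10304.pdf"

# Render each PDF page as an image and save it as a PNG file
pages = convert_from_path(pdf_path, dpi=200)
for page_number, page_image in enumerate(pages):
    page_image.save(f"./samples/2003.10304/page_{page_number}.png", "PNG")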
Get metadata from formulas
After the image documents are available, you can use Anthropic’s Claude to extract formulas and metadata with the Amazon Bedrock Converse API. Additionally, you can use the Amazon Bedrock Converse API to obtain an explanation of the extracted formulas in plain language. By combining the formula and metadata extraction capabilities of Anthropic’s Claude with the conversational abilities of the Amazon Bedrock Converse API, you can create a comprehensive solution for processing and understanding the information contained within the image documents.
We start with the following example PNG file.

We use the following request prompt:

from IPython.display import Image, Markdown, display  # display helpers (if not already imported earlier in the notebook)

sample_prompt = """

Evaluate this page line by line.
For each line, if it is a formula, convert this math expression to latex format.
Next describe the formula in plain language Be sure to enclose Latex formulas in double dollar sign for example: $$ <math expression> $$ Use markdown syntax to format your output
"""

file = "./samples/2003.10304/page_2.png"

display(Image(filename=file, width=600))
# stream_conversation is a helper defined earlier in the notebook that calls the Amazon Bedrock Converse API
output, result = stream_conversation(message=sample_prompt, file_paths=[file])
response_text = result["content"]
display(Markdown(response_text))
print(output)

We get the following response, which shows the extracted formula converted to LaTeX format and described in plain language, enclosed in double dollar signs.

Get metadata from charts
Another useful capability of multi-modal generative AI models is the ability to interpret graphs and generate summaries and metadata. The following is an example of how you can obtain metadata of the charts and graphs using simple natural language conversation with models. We use the following graph.

We provide the following request:

sample_prompt = f"""
You are a data scientist expert who has perfect vision and pay a lot of attention to details.
interpret the graph on this page
provide the answer in markdown format"""

file = "./samples/2003.10304/page_5.png"

display(Image(filename=file, width=600))
output, result = stream_conversation(message=sample_prompt, file_paths=[file])
response_text = result["content"]
display(Markdown(response_text))
print(output)

The response returned provides its interpretation of the graph explaining the color-coded lines and suggesting that overall, the DSC model is performing well on the training data, achieving a high Dice coefficient of around 0.98. However, the lower and fluctuating validation Dice coefficient indicates potential overfitting and room for improvement in the model’s generalization performance.

Generate metadata
Using natural language processing, you can generate metadata for the paper to aid in searchability.
We use the following request:

sample_prompt = f"""
Generate a metadata json object for this research paper.

{{
    "title": "",
    "authors": [],
    "institutions": [],
    "topics": [],
    "funding-sources": [],
    "algorithms": [],
    "data_sets": []
}}
"""

file = "./samples/2003.10304/page_0.png"

We get the following response, a metadata JSON object describing the paper.

{
    "title": "Attention U-Net Based Adversarial Architectures for Chest X-ray Lung Segmentation",
    "authors": ["Gusztáv Gaál", "Balázs Maga", "András Lukács"],
    "institutions": ["AI Research Group, Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary"],
    "topics": ["Chest X-ray segmentation", "Medical imaging", "Deep learning", "Computer-aided detection", "Lung segmentation"],
    "funding-sources": [],
    "algorithms": ["U-Net", "Adversarial architectures", "Fully Convolutional Neural Networks (FCN)", "Mask R-CNN"],
    "data_sets": ["JSRT dataset"]
}

Use your extracted data in a knowledge base
Now that we’ve prepared our data with formulas, analyzed charts, and metadata, we will create an Amazon Bedrock knowledge base. This will make the information searchable and enable question-answering capabilities.
Prepare your Amazon Bedrock knowledge base
To create a knowledge base, first upload the processed files and metadata to Amazon S3:

markdown_file_key = "2003.10304/kb/2003.10304.md"

s3.upload_file(markdown_file, knowledge_base_bucket_name, markdown_file_key)

print(f"File {markdown_file} uploaded successfully.")

metadata_file_key = "2003.10304/kb/2003.10304.md.metadata.json"

s3.upload_file(metadata_file, knowledge_base_bucket_name, metadata_file_key)

print(f"File {metadata_file} uploaded successfully.")

When your files have finished uploading, complete the following steps:

Create an Amazon Bedrock knowledge base.
Create an Amazon S3 data source for your knowledge base, and specify hierarchical chunking as the chunking strategy.

Hierarchical chunking involves organizing information into nested structures of child and parent chunks.
The hierarchical structure allows for faster and more targeted retrieval of relevant information, first by performing semantic search on the child chunks and then returning the parent chunk during retrieval. By replacing the child chunks with the parent chunk, we provide large and comprehensive context to the FM.
Hierarchical chunking is best suited for complex documents that have a nested or hierarchical structure, such as technical manuals, legal documents, or academic papers with complex formatting and nested tables.
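
If you create the data source programmatically rather than on the console, the chunking strategy is specified in the vector ingestion configuration. The following is an illustrative sketch with the AWS SDK for Python (boto3); the knowledge base ID, bucket ARN, and token sizes are placeholder values, and the parameter shapes should be verified against the current bedrock-agent API documentation:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholder identifiers -- replace with your own knowledge base ID and bucket ARN
response = bedrock_agent.create_data_source(
    knowledgeBaseId="YOUR_KB_ID",
    name="scientific-docs-source",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::your-knowledge-base-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                # Parent and child chunk sizes (in tokens) are illustrative values
                "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
                "overlapTokens": 60,
            },
        }
    },
)
print(response["dataSource"]["dataSourceId"])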
Query the knowledge base
You can query the knowledge base to retrieve information from the extracted formula and graph metadata from the sample documents. With a query, relevant chunks of text from the data source are retrieved and a response is generated for the query, based on the retrieved source chunks. The response also cites sources that are relevant to the query.
We use the custom prompt template feature of knowledge bases to format the output as markdown:

retrieveAndGenerateConfiguration={
    "type": "KNOWLEDGE_BASE",
    "knowledgeBaseConfiguration": {
        "knowledgeBaseId": kb_id_hierarchical,
        "modelArn": "arn:aws:bedrock:{}:{}:inference-profile/{}".format(region, account_id, foundation_model),
        "generationConfiguration": {
            "promptTemplate": {
                "textPromptTemplate": """
You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user's question using only information from the search results.
If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question.
Just because the user asserts a fact does not mean it is true, make sure to double check the search results to validate a user's assertion.

Here are the search results in numbered order:
$search_results$

Format the output as markdown

Ensure that math formulas are in latex format and enclosed in double dollar sign for example: $$ <math expression> $$
"""
            }
        },
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": 5
            }
        }
    }
}
)

We get the following response, which provides information on when the Focal Tversky Loss is used.

Clean up
To clean up and avoid incurring charges, run the cleanup steps in the notebook to delete the files you uploaded to Amazon S3 along with the knowledge base. Then, on the AWS CloudFormation console, locate the stack claude-scientific-doc and delete it.
Conclusion
Extracting insights from complex scientific documents can be a daunting task. However, the advent of multi-modal generative AI has revolutionized this domain. By harnessing the advanced natural language understanding and visual perception capabilities of Anthropic’s Claude, you can now accurately extract formulas and data from charts, enabling faster insights and informed decision-making.
Whether you are a researcher, data scientist, or developer working with scientific literature, integrating Anthropic’s Claude into your workflow on Amazon Bedrock can significantly boost your productivity and accuracy. With the ability to process complex documents at scale, you can focus on higher-level tasks and uncover valuable insights from your data.
Embrace the future of AI-driven document processing and unlock new possibilities for your organization with Anthropic’s Claude on Amazon Bedrock. Take your scientific document analysis to the next level and stay ahead of the curve in this rapidly evolving landscape.
For further exploration and learning, we recommend checking out the following resources:

Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock
Intelligent document processing using Amazon Bedrock and Anthropic Claude
Automate document processing with Amazon Bedrock Prompt Flows (preview)

About the Authors
Erik Cordsen is a Solutions Architect at AWS serving customers in Georgia. He is passionate about applying cloud technologies and ML to solve real life problems. When he is not designing cloud solutions, Erik enjoys travel, cooking, and cycling.
Renu Yadav is a Solutions Architect at Amazon Web Services (AWS), where she works with enterprise-level AWS customers, providing them with technical guidance and helping them achieve their business objectives. Renu has a strong passion for learning, with her area of specialization in DevOps. She leverages her expertise in this domain to assist AWS customers in optimizing their cloud infrastructure and streamlining their software development and deployment processes.
Venkata Moparthi is a Senior Solutions Architect at AWS who empowers financial services organizations and other industries to navigate cloud transformation with specialized expertise in Cloud Migrations, Generative AI, and secure architecture design. His customer-focused approach combines technical innovation with practical implementation, helping businesses accelerate digital initiatives and achieve strategic outcomes through tailored AWS solutions that maximize cloud potential.

Automate IT operations with Amazon Bedrock Agents

IT operations teams face the challenge of providing smooth functioning of critical systems while managing a high volume of incidents filed by end-users. Manual intervention in incident management can be time-consuming and error prone because it relies on repetitive tasks, human judgment, and potential communication gaps. Using generative AI for IT operations offers a transformative solution that helps automate incident detection, diagnosis, and remediation, enhancing operational efficiency.
AI for IT operations (AIOps) is the application of AI and machine learning (ML) technologies to automate and enhance IT operations. AIOps helps IT teams manage and monitor large-scale systems by automatically detecting, diagnosing, and resolving incidents in real time. It combines data from various sources—such as logs, metrics, and events—to analyze system behavior, identify anomalies, and recommend or execute automated remediation actions. By reducing manual intervention, AIOps improves operational efficiency, accelerates incident resolution, and minimizes downtime.
This post presents a comprehensive AIOps solution that combines various AWS services such as Amazon Bedrock, AWS Lambda, and Amazon CloudWatch to create an AI assistant for effective incident management. This solution also uses Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents. The solution uses the power of Amazon Bedrock to enable the deployment of intelligent agents capable of monitoring IT systems, analyzing logs and metrics, and invoking automated remediation processes.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through a single API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage the infrastructure. Amazon Bedrock Knowledge Bases is a fully managed capability with built-in session context management and source attribution that helps you implement the entire Retrieval Augmented Generation (RAG) workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows. Amazon Bedrock Agents is a fully managed capability that makes it straightforward for developers to create generative AI-based applications that can complete complex tasks for a wide range of use cases and deliver up-to-date answers based on proprietary knowledge sources.
Generative AI is rapidly transforming businesses and unlocking new possibilities across industries. This post highlights the transformative impact of large language models (LLMs). With the ability to encode human expertise and communicate in natural language, generative AI can help augment human capabilities and allow organizations to harness knowledge at scale.
Challenges in IT operations with runbooks
Runbooks are detailed, step-by-step guides that outline the processes, procedures, and tasks needed to complete specific operations, typically in IT and systems administration. They are commonly used to document repetitive tasks, troubleshooting steps, and routine maintenance. By standardizing responses to issues and facilitating consistency in task execution, runbooks help teams improve operational efficiency and streamline workflows. Most organizations rely on runbooks to simplify complex processes, making it straightforward for teams to handle routine operations and respond effectively to system issues. For organizations, managing hundreds of runbooks, monitoring their status, keeping track of failures, and setting up the right alerting can become difficult. This creates visibility gaps for IT teams. When you have multiple runbooks for various processes, managing the dependencies and run order between them can become complex and tedious. It’s challenging to handle failure scenarios and make sure everything runs in the right sequence.
The following are some of the challenges that most organizations face with manual IT operations:

Manual diagnosis through run logs and metrics
Runbook dependency and sequence mapping
No automated remediation processes
No real-time visibility into runbook progress

Solution overview
Amazon Bedrock is the foundation of this solution, empowering intelligent agents to monitor IT systems, analyze data, and automate remediation. Sample AWS Cloud Development Kit (AWS CDK) code is provided to deploy the solution. The AIOps solution provides an AI assistant using Amazon Bedrock Agents to help with operations automation and runbook execution.
The following architecture diagram explains the overall flow of this solution.

The agent uses Anthropic’s Claude LLM available on Amazon Bedrock as one of the FMs to analyze incident details and retrieve relevant information from the knowledge base, a curated collection of runbooks and best practices. This equips the agent with business-specific context, making sure responses are precise and backed by data from Amazon Bedrock Knowledge Bases. Based on the analysis, the agent dynamically generates a runbook tailored to the specific incident and invokes appropriate remediation actions, such as creating snapshots, restarting instances, scaling resources, or running custom workflows.
Amazon Bedrock Knowledge Bases create an Amazon OpenSearch Serverless vector search collection to store and index incident data, runbooks, and run logs, enabling efficient search and retrieval of information. Lambda functions are employed to run specific actions, such as sending notifications, invoking API calls, or invoking automated workflows. The solution also integrates with Amazon Simple Email Service (Amazon SES) for timely notifications to stakeholders.
The solution workflow consists of the following steps:

Existing runbooks in various formats (such as Word documents, PDFs, or text files) are uploaded to Amazon Simple Storage Service (Amazon S3).
Amazon Bedrock Knowledge Bases converts these documents into vector embeddings using a selected embedding model, configured as part of the knowledge base setup.
These vector embeddings are stored in OpenSearch Serverless for efficient retrieval, also configured during the knowledge base setup.
Agents and action groups are then set up with the required APIs and prompts for handling different scenarios.
The OpenAPI specification defines which APIs need to be called, along with their input parameters and expected output, allowing Amazon Bedrock Agents to make informed decisions (an illustrative schema follows this list).
When a user prompt is received, Amazon Bedrock Agents uses RAG, action groups, and the OpenAPI specification to determine the appropriate API calls. If more details are needed, the agent prompts the user for additional information.
Amazon Bedrock Agents can iterate and call multiple functions as needed until the task is successfully completed.
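
As an illustration of what such a specification might look like (the actual schema lives in the GitHub repo and will differ), here is a minimal, hypothetical OpenAPI definition for a single action that checks CloudWatch alarms, expressed as a Python dictionary that can be serialized to JSON:

import json

# Hypothetical action schema -- the real specification in the repository will differ
openapi_spec = {
    "openapi": "3.0.0",
    "info": {"title": "AIOps remediation actions", "version": "1.0.0"},
    "paths": {
        "/check-alarms": {
            "get": {
                "summary": "List CloudWatch alarms currently in the ALARM state",
                "description": "Returns active alarms so the agent can decide which runbook to execute.",
                "operationId": "checkAlarms",
                "responses": {
                    "200": {
                        "description": "Active alarms",
                        "content": {
                            "application/json": {
                                "schema": {"type": "array", "items": {"type": "string"}}
                            }
                        },
                    }
                },
            }
        }
    },
}

print(json.dumps(openapi_spec, indent=2))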

Prerequisites
To implement this AIOps solution, you need an active AWS account and basic knowledge of the AWS CDK and the following AWS services:

Amazon Bedrock
Amazon CloudWatch
AWS Lambda
Amazon OpenSearch Serverless
Amazon SES
Amazon S3

Additionally, you need to provision the required infrastructure components, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (Amazon EBS) volumes, and other resources specific to your IT operations environment.
Build the RAG pipeline with OpenSearch Serverless
This solution uses a RAG pipeline to find relevant content and best practices from operations runbooks to generate responses. The RAG approach helps make sure the agent generates responses that are grounded in factual documentation, which avoids hallucinations. The relevant matches from the knowledge base guide Anthropic’s Claude 3 Haiku model so it focuses on the relevant information. The RAG process is powered by Amazon Bedrock Knowledge Bases, which stores information that the Amazon Bedrock agent can access and use. For this use case, our knowledge base contains existing runbooks from the organization with step-by-step procedures to resolve different operational issues on AWS resources.
The pipeline has the following key tasks:

Ingest documents in an S3 bucket – The first step ingests existing runbooks into an S3 bucket to create a searchable index with the help of OpenSearch Serverless.
Monitor infrastructure health using CloudWatch – An Amazon Bedrock action group is used to invoke Lambda functions to get CloudWatch metrics and alerts for EC2 instances from an AWS account. These specific checks are then used as Anthropic’s Claude 3 Haiku model inputs to form a health status overview of the account.
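
To make the second task concrete, here is a minimal sketch (not the repository's actual function) of a Lambda handler that an action group could invoke to list EC2-related CloudWatch alarms currently in the ALARM state:

import boto3

cloudwatch = boto3.client("cloudwatch")

def lambda_handler(event, context):
    # Fetch alarms that are currently firing; the agent uses this as input for its health overview
    response = cloudwatch.describe_alarms(StateValue="ALARM")
    alarms = [
        {
            "name": alarm["AlarmName"],
            "metric": alarm.get("MetricName"),
            "namespace": alarm.get("Namespace"),
            "reason": alarm.get("StateReason"),
        }
        for alarm in response["MetricAlarms"]
        if alarm.get("Namespace") == "AWS/EC2"  # keep only EC2 alarms for this sketch
    ]
    return {"statusCode": 200, "alarms": alarms}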

Configure Amazon Bedrock Agents
Amazon Bedrock Agents augment the user request with the right information from Amazon Bedrock Knowledge Bases to generate an accurate response. For this use case, our knowledge base contains existing runbooks from the organization with step-by-step procedures to resolve different operational issues on AWS resources.
By configuring the appropriate action groups and populating the knowledge base with relevant data, you can tailor the Amazon Bedrock agent to assist with specific tasks or domains and provide accurate and helpful responses within its intended scopes.
Amazon Bedrock agents empower Anthropic’s Claude 3 Haiku to use tools, overcoming LLM limitations like knowledge cutoffs and hallucinations, for enhanced task completion through API calls and other external interactions.
The agent’s workflow is to check for resource alerts using an API, then if found, fetch and execute the relevant runbook’s steps (for example, create snapshots, restart instances, and send emails).
The overall system enables automated detection and remediation of operational issues on AWS while enforcing adherence to documented procedures through the runbook approach.
To set up this solution using Amazon Bedrock Agents, refer to the GitHub repo that provisions the following resources. Make sure to verify the AWS Identity and Access Management (IAM) permissions and follow IAM best practices while deploying the code. It is advised to apply least-privilege permissions for IAM policies.

S3 bucket
Amazon Bedrock agent
Action group
Amazon Bedrock agent IAM role
Amazon Bedrock agent action group
Lambda function
Lambda service policy permission
Lambda IAM role

Benefits
With this solution, organizations can automate their operations and save a lot of time. The automation is also less prone to errors compared to manual execution. It offers the following additional benefits:

Reduced manual intervention – Automating incident detection, diagnosis, and remediation helps minimize human involvement, reducing the likelihood of errors, delays, and inconsistencies that often arise from manual processes.
Increased operational efficiency – By using generative AI, the solution speeds up incident resolution and optimizes operational workflows. The automation of tasks such as runbook execution, resource monitoring, and remediation allows IT teams to focus on more strategic initiatives.
Scalability – As organizations grow, managing IT operations manually becomes increasingly complex. Automating operations using generative AI can scale with the business, managing more incidents, runbooks, and infrastructure without requiring proportional increases in personnel.

Clean up
To avoid incurring unnecessary costs, it’s recommended to delete the resources created during the implementation of this solution when not in use. You can do this by deleting the AWS CloudFormation stacks deployed as part of the solution, or manually deleting the resources on the AWS Management Console or using the AWS Command Line Interface (AWS CLI).
Conclusion
The AIOps pipeline presented in this post empowers IT operations teams to streamline incident management processes, reduce manual interventions, and enhance operational efficiency. With the power of AWS services, organizations can automate incident detection, diagnosis, and remediation, enabling faster incident resolution and minimizing downtime.
Through the integration of Amazon Bedrock, Anthropic’s Claude on Amazon Bedrock, Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, and other supporting services, this solution provides real-time visibility into incidents, automated runbook generation, and dynamic remediation actions. Additionally, the solution provides timely notifications and seamless collaboration between AI agents and human operators, fostering a more proactive and efficient approach to IT operations.
Generative AI is rapidly transforming how businesses can take advantage of cloud technologies with ease. This solution using Amazon Bedrock demonstrates the immense potential of generative AI models to enhance human capabilities. By providing developers expert guidance grounded in AWS best practices, this AI assistant enables DevOps teams to review and optimize cloud architecture across AWS accounts.
Try out the solution yourself and leave any feedback or questions in the comments.

About the Authors
Upendra V is a Sr. Solutions Architect at Amazon Web Services, specializing in Generative AI and cloud solutions. He helps enterprise customers design and deploy production-ready Generative AI workloads, implement Large Language Models (LLMs) and Agentic AI systems, and optimize cloud deployments. With expertise in cloud adoption and machine learning, he enables organizations to build and scale AI-driven applications efficiently.
Deepak Dixit is a Solutions Architect at Amazon Web Services, specializing in Generative AI and cloud solutions. He helps enterprises architect scalable AI/ML workloads, implement Large Language Models (LLMs), and optimize cloud-native applications.

NVIDIA AI Just Open Sourced Canary 1B and 180M Flash – Multilingual …

In the realm of artificial intelligence, multilingual speech recognition and translation have become essential tools for facilitating global communication. However, developing models that can accurately transcribe and translate multiple languages in real-time presents significant challenges. These challenges include managing diverse linguistic nuances, maintaining high accuracy, ensuring low latency, and deploying models efficiently across various devices.​

To address these challenges, NVIDIA AI has open-sourced two models: Canary 1B Flash and Canary 180M Flash. These models are designed for multilingual speech recognition and translation, supporting languages such as English, German, French, and Spanish. Released under the permissive CC-BY-4.0 license, these models are available for commercial use, encouraging innovation within the AI community.​

Technically, both models utilize an encoder-decoder architecture. The encoder is based on FastConformer, which efficiently processes audio features, while the Transformer Decoder handles text generation. Task-specific tokens, including <target language>, <task>, <toggle timestamps>, and <toggle PnC> (punctuation and capitalization), guide the model’s output. The Canary 1B Flash model comprises 32 encoder layers and 4 decoder layers, totaling 883 million parameters, whereas the Canary 180M Flash model consists of 17 encoder layers and 4 decoder layers, amounting to 182 million parameters. This design ensures scalability and adaptability to various languages and tasks. ​

Performance metrics indicate that the Canary 1B Flash model achieves an inference speed exceeding 1000 RTFx on open ASR leaderboard datasets, enabling real-time processing. In English automatic speech recognition (ASR) tasks, it attains a word error rate (WER) of 1.48% on the Librispeech Clean dataset and 2.87% on the Librispeech Other dataset. For multilingual ASR, the model achieves WERs of 4.36% for German, 2.69% for Spanish, and 4.47% for French on the MLS test set. In automatic speech translation (AST) tasks, the model demonstrates robust performance with BLEU scores of 32.27 for English to German, 22.6 for English to Spanish, and 41.22 for English to French on the FLEURS test set. ​

Data as of March 20, 2025

The smaller Canary 180M Flash model also delivers impressive results, with an inference speed surpassing 1200 RTFx. It achieves a WER of 1.87% on the Librispeech Clean dataset and 3.83% on the Librispeech Other dataset for English ASR. For multilingual ASR, the model records WERs of 4.81% for German, 3.17% for Spanish, and 4.75% for French on the MLS test set. In AST tasks, it achieves BLEU scores of 28.18 for English to German, 20.47 for English to Spanish, and 36.66 for English to French on the FLEURS test set. ​

Both models support word-level and segment-level timestamping, enhancing their utility in applications requiring precise alignment between audio and text. Their compact sizes make them suitable for on-device deployment, enabling offline processing and reducing dependency on cloud services. Moreover, their robustness leads to fewer hallucinations during translation tasks, ensuring more reliable outputs. The open-source release under the CC-BY-4.0 license encourages commercial utilization and further development by the community.​
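For readers who want to experiment with the checkpoints, the following is a minimal sketch of loading Canary 1B Flash through the NVIDIA NeMo toolkit, the framework these models are published for. The Hugging Face model ID, the keyword arguments, and the output format of transcribe are assumptions that can vary across NeMo versions, so treat this as an illustration rather than canonical usage.

# Minimal sketch (assumes NeMo is installed, e.g. pip install "nemo_toolkit[asr]")
from nemo.collections.asr.models import EncDecMultiTaskModel

# Load the pretrained checkpoint (assumed Hugging Face ID: nvidia/canary-1b-flash)
canary = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b-flash")

# Transcribe English audio; the keyword arguments below are assumed to map to the
# task tokens described above (target language, task, punctuation and capitalization).
results = canary.transcribe(
    ["sample_audio.wav"],   # 16 kHz mono WAV files
    batch_size=4,
    source_lang="en",       # language of the input speech
    target_lang="en",       # same language -> ASR; a different language -> translation
    pnc="yes",              # punctuation and capitalization
)
# Depending on the NeMo version, each result is either a plain string or a hypothesis object.
print(results[0])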

In conclusion, NVIDIA’s open-sourcing of the Canary 1B and 180M Flash models represents a significant advancement in multilingual speech recognition and translation. Their high accuracy, real-time processing capabilities, and adaptability for on-device deployment address many existing challenges in the field. By making these models publicly available, NVIDIA not only demonstrates its commitment to advancing AI research but also empowers developers and organizations to build more inclusive and efficient communication tools.

Check out the Canary 1B Model and Canary 180M Flash. All credit for this research goes to the researchers of this project.

The post NVIDIA AI Just Open Sourced Canary 1B and 180M Flash – Multilingual Speech Recognition and Translation Models appeared first on MarkTechPost.

Microsoft AI Introduces Claimify: A Novel LLM-based Claim-Extraction M …

The widespread adoption of Large Language Models (LLMs) has significantly changed the landscape of content creation and consumption. However, it has also introduced critical challenges regarding accuracy and factual reliability. The content generated by LLMs often includes claims that lack proper verification, potentially leading to misinformation. Therefore, accurately extracting claims from these outputs for effective fact-checking has become essential, albeit challenging due to inherent ambiguities and context dependencies.

Microsoft AI Research has recently developed Claimify, an advanced claim-extraction method based on LLMs, specifically designed to enhance accuracy, comprehensiveness, and context-awareness in extracting claims from LLM outputs. Claimify addresses the limitations of existing methods by explicitly dealing with ambiguity. Unlike other approaches, it identifies sentences with multiple possible interpretations and only proceeds with claim extraction when the intended meaning is clearly determined within the given context. This careful approach ensures higher accuracy and reliability, particularly benefiting subsequent fact-checking efforts.

From a technical standpoint, Claimify employs a structured pipeline comprising three key stages: Selection, Disambiguation, and Decomposition. During the Selection stage, Claimify leverages LLMs to identify sentences that contain verifiable information, filtering out those without factual content. In the Disambiguation stage, it uniquely focuses on detecting and resolving ambiguities, such as unclear references or multiple plausible interpretations. Claims are extracted only if ambiguities can be confidently resolved. The final stage, Decomposition, involves converting each clarified sentence into precise, context-independent claims. This structured process enhances both the accuracy and completeness of the resulting claims.
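Claimify is a method rather than a released library, so the following is only a hypothetical sketch of how its Selection, Disambiguation, and Decomposition stages could be wired around a generic LLM call. The call_llm helper and the prompt wording are illustrative placeholders, not Microsoft's implementation.

from typing import Callable, List

def claimify_pipeline(answer: str, question: str, call_llm: Callable[[str], str]) -> List[str]:
    """Hypothetical three-stage claim extraction loosely following Claimify's design."""
    claims: List[str] = []
    for sentence in answer.split(". "):
        # Stage 1: Selection - keep only sentences that contain verifiable factual content.
        verdict = call_llm(
            f"Does this sentence contain verifiable factual content? Answer yes or no.\n{sentence}"
        )
        if verdict.strip().lower() != "yes":
            continue
        # Stage 2: Disambiguation - resolve references using the question as context,
        # and skip the sentence if the intended meaning cannot be confidently determined.
        resolved = call_llm(
            "Given the question below, rewrite the sentence so it is unambiguous, "
            "or reply UNRESOLVABLE if multiple interpretations remain plausible.\n"
            f"Question: {question}\nSentence: {sentence}"
        ).strip()
        if resolved == "UNRESOLVABLE":
            continue
        # Stage 3: Decomposition - split the clarified sentence into atomic,
        # context-independent claims, one per line.
        decomposed = call_llm(
            f"Break the following sentence into simple, self-contained factual claims, one per line:\n{resolved}"
        )
        claims.extend(line.strip() for line in decomposed.splitlines() if line.strip())
    return claims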

In evaluations using the BingCheck dataset—which covers a broad range of topics and complex LLM-generated responses—Claimify demonstrated notable improvements over previous methods. It achieved a high entailment rate of 99%, indicating a strong consistency between the extracted claims and the original content. Regarding coverage, Claimify captured 87.6% of verifiable content while maintaining a high precision rate of 96.7%, outperforming comparable approaches. Its systematic approach to decontextualization also ensured that essential contextual details were retained, resulting in better-grounded claims compared to prior methods.

Overall, Claimify represents a meaningful advancement in the automated extraction of reliable claims from LLM-generated content. By methodically addressing ambiguity and contextuality through a structured and careful evaluation framework, Claimify establishes a new standard for accuracy and reliability. As reliance on LLM-produced content continues to grow, tools like Claimify will play an increasingly crucial role in ensuring the trustworthiness and factual integrity of this content.

Check out the Paper and Technical details. All credit for this research goes to the researchers of this project.

The post Microsoft AI Introduces Claimify: A Novel LLM-based Claim-Extraction Method that Outperforms Prior Solutions to Produce More Accurate, Comprehensive, and Substantiated Claims from LLM Outputs appeared first on MarkTechPost.

A Coding Implementation to Build a Document Search Agent (DocSearchAge …

In today’s information-rich world, finding relevant documents quickly is crucial. Traditional keyword-based search systems often fall short when dealing with semantic meaning. This tutorial demonstrates how to build a powerful document search engine using:

Hugging Face’s embedding models to convert text into rich vector representations

Chroma DB as our vector database for efficient similarity search

Sentence transformers for high-quality text embeddings

This implementation enables semantic search capabilities – finding documents based on meaning rather than just keyword matching. By the end of this tutorial, you’ll have a working document search engine that can:

Process and embed text documents

Store these embeddings efficiently

Retrieve the most semantically similar documents to any query

Handle a variety of document types and search needs

Follow the detailed steps below in sequence to implement DocSearchAgent.

First, we need to install the necessary libraries. 

!pip install chromadb sentence-transformers langchain datasets

Let’s start by importing the libraries we’ll use:

import os
import numpy as np
import pandas as pd
from datasets import load_dataset
import chromadb
from chromadb.utils import embedding_functions
from sentence_transformers import SentenceTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
import time

For this tutorial, we’ll use a subset of Wikipedia articles from the Hugging Face datasets library. This gives us a diverse set of documents to work with.

dataset = load_dataset("wikipedia", "20220301.en", split="train[:1000]")
print(f"Loaded {len(dataset)} Wikipedia articles")

documents = []
for i, article in enumerate(dataset):
    doc = {
        "id": f"doc_{i}",
        "title": article["title"],
        "text": article["text"],
        "url": article["url"]
    }
    documents.append(doc)

df = pd.DataFrame(documents)
df.head(3)

Now, let’s split our documents into smaller chunks for more granular searching:

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)

chunks = []
chunk_ids = []
chunk_sources = []

for i, doc in enumerate(documents):
    doc_chunks = text_splitter.split_text(doc["text"])
    chunks.extend(doc_chunks)
    chunk_ids.extend([f"chunk_{i}_{j}" for j in range(len(doc_chunks))])
    chunk_sources.extend([doc["title"]] * len(doc_chunks))

print(f"Created {len(chunks)} chunks from {len(documents)} documents")

We’ll use a pre-trained sentence transformer model from Hugging Face to create our embeddings:

model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedding_model = SentenceTransformer(model_name)

sample_text = "This is a sample text to test our embedding model."
sample_embedding = embedding_model.encode(sample_text)
print(f"Embedding dimension: {len(sample_embedding)}")

Now, let’s set up Chroma DB, a lightweight vector database perfect for our search engine:

chroma_client = chromadb.Client()

embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)

collection = chroma_client.create_collection(
    name="document_search",
    embedding_function=embedding_function
)

batch_size = 100
for i in range(0, len(chunks), batch_size):
    end_idx = min(i + batch_size, len(chunks))

    batch_ids = chunk_ids[i:end_idx]
    batch_chunks = chunks[i:end_idx]
    batch_sources = chunk_sources[i:end_idx]

    collection.add(
        ids=batch_ids,
        documents=batch_chunks,
        metadatas=[{"source": source} for source in batch_sources]
    )

    print(f"Added batch {i//batch_size + 1}/{(len(chunks)-1)//batch_size + 1} to the collection")

print(f"Total documents in collection: {collection.count()}")

Now comes the exciting part – searching through our documents:

def search_documents(query, n_results=5):
    """
    Search for documents similar to the query.

    Args:
        query (str): The search query
        n_results (int): Number of results to return

    Returns:
        dict: Search results
    """
    start_time = time.time()

    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )

    end_time = time.time()
    search_time = end_time - start_time

    print(f"Search completed in {search_time:.4f} seconds")
    return results

queries = [
    "What are the effects of climate change?",
    "History of artificial intelligence",
    "Space exploration missions"
]

for query in queries:
    print(f"\nQuery: {query}")
    results = search_documents(query)

    for i, (doc, metadata) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
        print(f"\nResult {i+1} from {metadata['source']}:")
        print(f"{doc[:200]}...")

Let’s create a simple function to provide a better user experience:

def interactive_search():
    """
    Interactive search interface for the document search engine.
    """
    while True:
        query = input("\nEnter your search query (or 'quit' to exit): ")

        if query.lower() == 'quit':
            print("Exiting search interface...")
            break

        n_results = int(input("How many results would you like? "))

        results = search_documents(query, n_results)

        print(f"\nFound {len(results['documents'][0])} results for '{query}':")

        for i, (doc, metadata, distance) in enumerate(zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        )):
            relevance = 1 - distance
            print(f"\n--- Result {i+1} ---")
            print(f"Source: {metadata['source']}")
            print(f"Relevance: {relevance:.2f}")
            print(f"Excerpt: {doc[:300]}...")
            print("-" * 50)

interactive_search()

Let’s add the ability to filter our search results by metadata:

def filtered_search(query, filter_source=None, n_results=5):
    """
    Search with optional filtering by source.

    Args:
        query (str): The search query
        filter_source (str): Optional source to filter by
        n_results (int): Number of results to return

    Returns:
        dict: Search results
    """
    where_clause = {"source": filter_source} if filter_source else None

    results = collection.query(
        query_texts=[query],
        n_results=n_results,
        where=where_clause
    )

    return results

unique_sources = list(set(chunk_sources))
print(f"Available sources for filtering: {len(unique_sources)}")
print(unique_sources[:5])

if len(unique_sources) > 0:
    filter_source = unique_sources[0]
    query = "main concepts and principles"

    print(f"\nFiltered search for '{query}' in source '{filter_source}':")
    results = filtered_search(query, filter_source=filter_source)

    for i, doc in enumerate(results['documents'][0]):
        print(f"\nResult {i+1}:")
        print(f"{doc[:200]}...")

In conclusion, we demonstrate how to build a semantic document search engine using Hugging Face embedding models and ChromaDB. The system retrieves documents based on meaning rather than just keywords by transforming text into vector representations. The implementation processes Wikipedia articles, chunks them for granularity, embeds them using sentence transformers, and stores them in a vector database for efficient retrieval. The final product features interactive searching, metadata filtering, and relevance ranking.

Here is the Colab Notebook.

The post A Coding Implementation to Build a Document Search Agent (DocSearchAgent) with Hugging Face, ChromaDB, and Langchain appeared first on MarkTechPost.

Streamline AWS resource troubleshooting with Amazon Bedrock Agents and …

As AWS environments grow in complexity, troubleshooting issues with resources can become a daunting task. Manually investigating and resolving problems can be time-consuming and error-prone, especially when dealing with intricate systems. Fortunately, AWS provides a powerful tool called AWS Support Automation Workflows (SAW), which is a collection of curated AWS Systems Manager self-service automation runbooks. These runbooks are created by AWS Support Engineering with best practices learned from solving customer issues. They enable AWS customers to troubleshoot, diagnose, and remediate common issues with their AWS resources.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Because Amazon Bedrock is serverless, you don’t have to manage infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.
In this post, we explore how to use the power of Amazon Bedrock Agents and AWS Support Automation Workflows to create an intelligent agent capable of troubleshooting issues with AWS resources.
Solution overview
Although the solution is versatile and can be adapted to use a variety of AWS Support Automation Workflows, we focus on a specific example: troubleshooting an Amazon Elastic Kubernetes Service (Amazon EKS) worker node that failed to join a cluster. The following diagram provides a high-level overview of troubleshooting agents with Amazon Bedrock.

Our solution is built around the following key components that work together to provide a seamless and efficient troubleshooting experience:

Amazon Bedrock Agents – Amazon Bedrock Agents acts as the intelligent interface between users and AWS Support Automation Workflows. It processes natural language queries to understand the issue context and manages conversation flow to gather required information. The agent uses Anthropic’s Claude 3.5 Sonnet model for advanced reasoning and response generation, enabling natural interactions throughout the troubleshooting process.
Amazon Bedrock agent action groups – These action groups define the structured API operations that the Amazon Bedrock agent can invoke. Using OpenAPI specifications, they define the interface between the agent and AWS Lambda functions, specifying the available operations, required parameters, and expected responses. Each action group contains the API schema that tells the agent how to properly format requests and interpret responses when interacting with Lambda functions.
Lambda function – The Lambda function acts as the integration layer between the Amazon Bedrock agent and AWS Support Automation Workflows. It validates input parameters from the agent and initiates the appropriate SAW runbook execution. It monitors the automation progress while processing the technical output into a structured format. When the workflow is complete, it returns formatted results back to the agent for user presentation.
IAM role – The AWS Identity and Access Management (IAM) role provides the Lambda function with the necessary permissions to execute AWS Support Automation Workflows and interact with required AWS services. This role follows the principle of least privilege to maintain security best practices.
AWS Support Automation Workflows – These pre-built diagnostic runbooks are developed by AWS Support Engineering. The workflows execute comprehensive system checks based on AWS best practices in a standardized, repeatable manner. They cover a wide range of AWS services and common issues, encapsulating AWS Support’s extensive troubleshooting expertise.

The following steps outline the workflow of our solution:

Users start by describing their AWS resource issue in natural language through the Amazon Bedrock chat console. For example, “Why isn’t my EKS worker node joining the cluster?”
The Amazon Bedrock agent analyzes the user’s question and matches it to the appropriate action defined in its OpenAPI schema. If essential information is missing, such as a cluster name or instance ID, the agent engages in a natural conversation to gather the required parameters. This makes sure that necessary data is collected before proceeding with the troubleshooting workflow.
The Lambda function receives the validated request and triggers the corresponding AWS Support Automation Workflow. These SAW runbooks contain comprehensive diagnostic checks developed by AWS Support Engineering to identify common issues and their root causes. The checks run automatically without requiring user intervention.
The SAW runbook systematically executes its diagnostic checks and compiles the findings. These results, including identified issues and configuration problems, are structured in JSON format and returned to the Lambda function.
The Amazon Bedrock agent processes the diagnostic results using chain of thought (CoT) reasoning, based on the ReAct (synergizing reasoning and acting) technique. This enables the agent to analyze the technical findings, identify root causes, generate clear explanations, and provide step-by-step remediation guidance.

During the reasoning phase, the user can view the agent's reasoning steps.
Troubleshooting examples
Let’s take a closer look at a common issue we mentioned earlier and how our agent can assist in troubleshooting it.
EKS worker node failed to join EKS cluster
When an EKS worker node fails to join an EKS cluster, our Amazon Bedrock agent can be invoked with the relevant information: cluster name and worker node ID. The agent will execute the corresponding AWS Support Automation Workflow, which will perform checks like verifying the worker node’s IAM role permissions and verifying the necessary network connectivity.
The automation workflow runs all the checks. The Amazon Bedrock agent then ingests the troubleshooting results, explains the root cause of the issue to the user, and suggests remediation steps based on the AWSSupport-TroubleshootEKSWorkerNode output, such as updating the worker node's IAM role or resolving network configuration issues, so the user can take the necessary actions to resolve the problem.
OpenAPI example
When you create an action group in Amazon Bedrock, you must define the parameters that the agent needs to invoke from the user. You can also define API operations that the agent can invoke using these parameters. To define the API operations, we will create an OpenAPI schema in JSON:
"Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post": {
    "properties": {
        "cluster_name": {
            "type": "string",
            "title": "Cluster Name",
            "description": "The name of the EKS cluster"
        },
        "worker_id": {
            "type": "string",
            "title": "Worker Id",
            "description": "The ID of the worker node"
        }
    },
    "type": "object",
    "required": [
        "cluster_name",
        "worker_id"
    ],
    "title": "Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post"
}

The schema consists of the following components:

Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post – This is the name of the schema, which corresponds to the request body for the troubleshoot-eks-worker-node POST endpoint.
Properties – This section defines the properties (fields) of the schema:

"cluster_name" – This property represents the name of the EKS cluster. It is a string type and has a title and description.
"worker_id" – This property represents the ID of the worker node. It is also a string type and has a title and description.

Type – This property specifies that the schema is an "object" type, meaning it is a collection of key-value pairs.
Required – This property lists the required fields for the schema, which in this case are "cluster_name" and "worker_id". These fields must be provided in the request body.
Title – This property provides a human-readable title for the schema, which can be used for documentation purposes.

The OpenAPI schema defines the structure of the request body. To learn more, see Define OpenAPI schemas for your agent’s action groups in Amazon Bedrock and OpenAPI specification.
Lambda function code
Now let’s explore the Lambda function code:
@app.post("/troubleshoot-eks-worker-node")
@tracer.capture_method
def troubleshoot_eks_worker_node(
    cluster_name: Annotated[str, Body(description="The name of the EKS cluster")],
    worker_id: Annotated[str, Body(description="The ID of the worker node")]
) -> dict:
    """
    Troubleshoot EKS worker node that failed to join the cluster.

    Args:
        cluster_name (str): The name of the EKS cluster.
        worker_id (str): The ID of the worker node.

    Returns:
        dict: The output of the Automation execution.
    """
    return execute_automation(
        automation_name='AWSSupport-TroubleshootEKSWorkerNode',
        parameters={
            'ClusterName': [cluster_name],
            'WorkerID': [worker_id]
        },
        execution_mode='TroubleshootWorkerNode'
    )

The code consists of the following components:

@app.post("/troubleshoot-eks-worker-node") – This decorator sets up a route for a POST request to the /troubleshoot-eks-worker-node endpoint and registers the function as its handler.
@tracer.capture_method – This decorator is used for tracing or monitoring purposes, likely as part of an application performance monitoring (APM) tool. It captures information about the execution of the function, such as the duration, errors, and other metrics.
cluster_name: Annotated[str, Body(description="The name of the EKS cluster")] – This parameter specifies that cluster_name is a string and is expected to be passed in the request body. The Body annotation indicates that the parameter should be extracted from the request body, and the description provides a brief explanation of what it represents.
worker_id: Annotated[str, Body(description="The ID of the worker node")] – This parameter specifies that worker_id is a string and is also expected to be passed in the request body.
-> dict – This is the return type of the function: a dictionary containing the output of the Automation execution, which is returned in the response body.

To link a new SAW runbook in the Lambda function, you can follow the same template.
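The execute_automation helper called above isn't shown in this excerpt; the following is a minimal sketch of what such a helper could look like using the AWS Systems Manager Automation APIs that back SAW runbooks. The polling interval, return shape, and handling of execution_mode are illustrative assumptions, not the repository's exact code.

import time
import boto3

ssm = boto3.client("ssm")

def execute_automation(automation_name: str, parameters: dict, execution_mode: str) -> dict:
    """Start a SAW runbook (an SSM automation document) and wait for its result."""
    # Start the Systems Manager automation execution.
    response = ssm.start_automation_execution(
        DocumentName=automation_name,
        Parameters=parameters,
    )
    execution_id = response["AutomationExecutionId"]

    # Poll until the automation reaches a terminal state.
    while True:
        execution = ssm.get_automation_execution(AutomationExecutionId=execution_id)["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in ("Success", "Failed", "Cancelled", "TimedOut"):
            break
        time.sleep(10)

    # Return a structured result the agent can summarize for the user.
    return {
        "execution_id": execution_id,
        "status": status,
        "outputs": execution.get("Outputs", {}),
        "mode": execution_mode,  # assumption: kept only to label the result
    }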
Prerequisites
Make sure you have the following prerequisites:

An AWS account
Access for Anthropic’s Claude 3.5 Sonnet model enabled in Amazon Bedrock
Your credentials configured in the AWS Command Line Interface (AWS CLI)
Node.js and npm installed
The AWS Cloud Development Kit (AWS CDK) 143.0

Deploy the solution
Complete the following steps to deploy the solution:

Clone the GitHub repository and go to the root of your downloaded repository folder:

$ git clone https://github.com/aws-samples/sample-bedrock-agent-for-troubleshooting-aws-resources.git
$ cd sample-bedrock-agent-for-troubleshooting-aws-resources

Install local dependencies:

$ npm install

Sign in to your AWS account using the AWS CLI by configuring your credential file (replace <PROFILE_NAME> with the profile name of your deployment AWS account):

$ export AWS_PROFILE=<PROFILE_NAME>

Bootstrap the AWS CDK environment (this is a one-time activity and is not needed if your AWS account is already bootstrapped):

$ cdk bootstrap

Run the script to replace the placeholders for your AWS account and AWS Region in the config files:

$ cdk deploy --all
Test the agent
Navigate to the Amazon Bedrock Agents console in your Region and find your deployed agent. You will find the agent ID in the cdk deploy command output.
You can now interact with the agent and test troubleshooting a worker node not joining an EKS cluster. The following are some example questions:

I want to troubleshoot why my Amazon EKS worker node is not joining the cluster. Can you help me?
Why this instance <instance_ID> is not able to join the EKS cluster <Cluster_Name>?

The following screenshot shows the console view of the agent.

The agent understood the question and mapped it to the right action group. It also spotted that the required parameters were missing from the user prompt, and came back with a follow-up question requesting the Amazon Elastic Compute Cloud (Amazon EC2) instance ID and EKS cluster name.

We can see the agent's thought process in trace step 1, where it determines that it is ready to call the right Lambda function and the right API path.

With the results coming back from the runbook, the agent reviews the troubleshooting outcome. It goes through the information and writes up a solution with instructions for the user to follow.
In the answer provided, the agent was able to spot all the issues and turn them into solution steps. We can also see the agent mentioning the right details, such as the IAM policy and the required tag.
Clean up
When implementing Amazon Bedrock Agents, there are no additional charges for resource construction. However, costs are incurred for embedding model and text model invocations on Amazon Bedrock, with charges based on the pricing of each FM used. In this use case, you will also incur costs for Lambda invocations.
To avoid incurring future charges, delete the resources created by the AWS CDK. From the root of your repository folder, run the following command:
$ npm run cdk destroy --all
Conclusion
Amazon Bedrock Agents and AWS Support Automation Workflows are powerful tools that, when combined, can revolutionize AWS resource troubleshooting. In this post, we explored a serverless application built with the AWS CDK that demonstrates how these technologies can be integrated to create an intelligent troubleshooting agent. By defining action groups within the Amazon Bedrock agent and associating them with specific scenarios and automation workflows, we’ve developed a highly efficient process for diagnosing and resolving issues such as Amazon EKS worker node failures.
Our solution showcases the potential for automating complex troubleshooting tasks, saving time and streamlining operations. Powered by Anthropic’s Claude 3.5 Sonnet, the agent demonstrates improved understanding and responding in languages other than English, such as French, Japanese, and Spanish, making it accessible to global teams while maintaining its technical accuracy and effectiveness. The intelligent agent quickly identifies root causes and provides actionable insights, while automatically executing relevant AWS Support Automation Workflows. This approach not only minimizes downtime, but also scales effectively to accommodate various AWS services and use cases, making it a versatile foundation for organizations looking to enhance their AWS infrastructure management.
Explore the AWS Support Automation Workflow for additional use cases and consider using this solution as a starting point for building more comprehensive troubleshooting agents tailored to your organization’s needs. To learn more about using agents to orchestrate workflows, see Automate tasks in your application using conversational agents. For details about using guardrails to safeguard your generative AI applications, refer to Stop harmful content in models using Amazon Bedrock Guardrails.
Happy coding!
Acknowledgements
The authors thank all the reviewers for their valuable feedback.

About the Authors
Wael Dimassi is a Technical Account Manager at AWS, building on his 7-year background as a Machine Learning specialist. He enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them.
Marwen Benzarti is a Senior Cloud Support Engineer at AWS Support where he specializes in Infrastructure as Code. With over 4 years at AWS and 2 years of previous experience as a DevOps engineer, Marwen works closely with customers to implement AWS best practices and troubleshoot complex technical challenges. Outside of work, he enjoys playing both competitive multiplayer and immersive story-driven video games.

Create generative AI agents that interact with your companies’ syste …

Today, we are announcing the general availability of Amazon Bedrock in Amazon SageMaker Unified Studio.
Companies of all sizes face mounting pressure to operate efficiently as they manage growing volumes of data, systems, and customer interactions. Manual processes and fragmented information sources can create bottlenecks and slow decision-making, limiting teams from focusing on higher-value work. Generative AI agents offer a powerful solution by automatically interfacing with company systems, executing tasks, and delivering instant insights, helping organizations scale operations without scaling complexity.
Amazon Bedrock in SageMaker Unified Studio addresses these challenges by providing a unified service for building AI-driven solutions that centralize customer data and enable natural language interactions. It integrates with existing applications and includes key Amazon Bedrock features like foundation models (FMs), prompts, knowledge bases, agents, flows, evaluation, and guardrails. Users can access these AI capabilities through their organization’s single sign-on (SSO), collaborate with team members, and refine AI applications without needing AWS Management Console access.
Generative AI-powered agents for automated workflows
Amazon Bedrock in SageMaker Unified Studio allows you to create and deploy generative AI agents that integrate with organizational applications, databases, and third-party systems, enabling natural language interactions across the entire technology stack. The chat agent bridges complex information systems and user-friendly communication. By using Amazon Bedrock functions and Amazon Bedrock Knowledge Bases, the agent can connect with data sources like JIRA APIs for real-time project status tracking, retrieve customer information, update project tasks, and manage preferences.
Sales and marketing teams can quickly access customer information and their meeting preferences, and project managers can efficiently manage JIRA tasks and timelines. This streamlined process enhances productivity and customer interactions across the organization.
The following diagram illustrates the generative AI agent solution workflow.

Solution overview
Amazon Bedrock provides a governed collaborative environment to build and share generative AI applications within SageMaker Unified Studio. Let’s look at an example solution for implementing a customer management agent:

An agentic chat can be built with Amazon Bedrock chat applications, and integrated with functions that can be quickly built with other AWS services such as AWS Lambda and Amazon API Gateway.
SageMaker Unified Studio, using Amazon DataZone, provides a comprehensive data management solution through its integrated services. Organization administrators can control member access to Amazon Bedrock models and features, maintaining secure identity management and granular access control.

Before we dive deep into the deployment of the AI agent, let’s walk through the key steps of the architecture, as shown in the following diagram.

The workflow is as follows:

The user logs into SageMaker Unified Studio using their organization’s SSO from AWS IAM Identity Center. Then the user interacts with the chat application using natural language.
The Amazon Bedrock chat application uses a function to retrieve JIRA status and customer information from the database through the endpoint using API Gateway.
The chat application authenticates with API Gateway to securely access the endpoint with the random API key from AWS Secrets Manager, and triggers the Lambda function based on the user’s request.
The Lambda function performs the actions by calling the JIRA API or database with the required parameters provided by the agent (a minimal sketch of this integration follows this list). The agent has the capability to:

Provide a brief customer overview.
List recent customer interactions.
Retrieve the meeting preferences for a customer.
Retrieve open JIRA tickets for a project.
Update the due date for a JIRA ticket.
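The following is a minimal sketch, under stated assumptions, of how the Lambda function's JIRA integration could look. The secret name (JIRA_SECRET_NAME), the secret's JSON layout, and the JQL query are illustrative placeholders; the actual implementation in the downloadable code may differ.

import base64
import json
import os
import urllib.parse
import urllib.request

import boto3

secrets_client = boto3.client("secretsmanager")

def list_open_tickets(project_id: str) -> list:
    """Return open JIRA issues for a project via the JIRA Cloud REST API."""
    # Assumption: one secret stores JIRA_URL, JIRA_USER_NAME, and JIRA_API_KEY as a JSON string.
    secret = json.loads(
        secrets_client.get_secret_value(SecretId=os.environ["JIRA_SECRET_NAME"])["SecretString"]
    )
    auth = base64.b64encode(
        f"{secret['JIRA_USER_NAME']}:{secret['JIRA_API_KEY']}".encode()
    ).decode()

    # Query open issues for the requested project (JQL is an illustrative choice).
    jql = f"project = {project_id} AND statusCategory != Done"
    request = urllib.request.Request(
        f"{secret['JIRA_URL']}/rest/api/3/search?jql={urllib.parse.quote(jql)}",
        headers={"Authorization": f"Basic {auth}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        issues = json.loads(response.read())["issues"]

    # Shape the result so the chat agent can render it as a readable numbered list.
    return [{"key": issue["key"], "summary": issue["fields"]["summary"]} for issue in issues]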

Prerequisites
You need the following prerequisites to follow along with this solution implementation:

An AWS account
User access to Amazon Bedrock in SageMaker Unified Studio
Model access to Amazon Nova Pro on Amazon Bedrock in a supported AWS Region
A JIRA application, JIRA URL, and a JIRA API token to your account

We assume you are familiar with fundamental serverless constructs on AWS, such as API Gateway, Lambda functions, and IAM Identity Center. We don’t focus on defining these services in this post, but we do use them to show use cases for the new Amazon Bedrock features within SageMaker Unified Studio.
Deploy the solution
Complete the following deployment steps:

Download the code from GitHub.
Get the value of JIRA_API_KEY_ARN, JIRA_URL, and JIRA_USER_NAME for the Lambda function.
Use the following AWS CloudFormation template, and refer to Create a stack from the CloudFormation console to launch the stack in your preferred AWS Region.
After the stack is deployed, note down the API Gateway URL value from the CloudFormation Outputs tab (ApiInvokeURL).
On the Secrets Manager console, find the secrets for JIRA_API_KEY_ARN, JIRA_URL, and JIRA_USER_NAME.
Choose Retrieve secret and copy the variables from Step 2 to the secret plaintext string.
Sign in to SageMaker Unified Studio using your organization’s SSO.

Create a new project
Complete the following steps to create a new project:

On the SageMaker Unified Studio landing page, create a new project.
Give the project a name (for example, crm-agent).
Choose Generative AI application development profile and continue.
Use the default settings and continue.
Review and choose Create project to confirm.

Build the chat agent application
Complete the following steps to build the chat agent application:

Under the New section located to the right of the crm-agent project landing page, choose Chat agent.

It has a list of configurations for your agent application.

Under the model section, choose a desired FM supported by Amazon Bedrock. For this crm-agent, we choose Amazon Nova Pro.
In the system prompt section, add the following prompt. Optionally, you could add examples of user input and model responses to improve it.

You are a customer relationship management agent tasked with helping a sales person plan their work with customers. You are provided with an API endpoint. This endpoint can provide information like company overview, company interaction history (meeting times and notes), company meeting preferences (meeting type, day of week, and time of day). You can also query Jira tasks and update their timeline. After receiving a response, clean it up into a readable format. If the output is a numbered list, format it as such with newline characters and numbers.

In the Functions section, choose Create a new function.
Give the function a name, such as crm_agent_calling.
For Function schema, use the OpenAPI definition from the GitHub repo.

For Authentication method, choose API Keys (Max. 2 Keys) and enter the following details:

For Key sent in, choose Header.
For Key name, enter x-api-key.
For Key value, enter the API key stored in Secrets Manager.

In the API servers section, input the endpoint URL.
Choose Create to finish the function creation.
In the Functions section of the chat agent application, choose the function you created and choose Save to finish the application creation.

Example interactions
In this section, we explore two example interactions.
Use case 1: CRM analyst can retrieve customer details stored in the database with natural language.
For this use case, we ask the following questions in the chat application:

Give me a brief overview of customer C-jkl101112.
List the last 2 recent interactions for customer C-def456.
What communication method does customer C-mno131415 prefer?
Recommend optimal time and contact channel to reach out to C-ghi789 based on their preferences and our last interaction.

The response from the chat application is shown in the following screenshot. The agent successfully retrieves the customer’s information from the database. It understands the user’s question and queries the database to find corresponding answers.

Use case 2: Project managers can list and update the JIRA ticket.
In this use case, we ask the following questions:

What are the open JIRA Tasks for project id CRM?
Please update JIRA Task CRM-3 to 1 week out.

The response from the chat application is shown in the following screenshot. Similar to the previous use case, the agent accesses the JIRA board and fetches the JIRA project information. It provides a list of open JIRA tasks and updates the timeline of the task following the user’s request.

Clean up
To avoid incurring additional costs, complete the following steps:

Delete the CloudFormation stack.
Delete the function component in Amazon Bedrock.
Delete the chat agent application in Amazon Bedrock.
Delete the domains in SageMaker Unified Studio.

Cost
Amazon Bedrock in SageMaker Unified Studio doesn’t incur separate charges, but you will be charged for the individual AWS services and resources utilized within the service. You only pay for the Amazon Bedrock resources you use, without minimum fees or upfront commitments.
If you need further assistance with pricing calculations or have questions about optimizing costs for your specific use case, please reach out to AWS Support or consult with your account manager.
Conclusion
In this post, we demonstrated how to use Amazon Bedrock in SageMaker Unified Studio to build a generative AI application to integrate with an existing endpoint and database.
The generative AI features of Amazon Bedrock transform how organizations build and deploy AI solutions by enabling rapid agent prototyping and deployment. Teams can swiftly create, test, and launch chat agent applications, accelerating the implementation of AI solutions that automate complex tasks and enhance decision-making capabilities. The solution’s scalability and flexibility allow organizations to seamlessly integrate advanced AI capabilities into existing applications, databases, and third-party systems.
Through a unified chat interface, agents can handle project management, data retrieval, and workflow automation—significantly reducing manual effort while enhancing user experience. By making advanced AI capabilities more accessible and user-friendly, Amazon Bedrock in SageMaker Unified Studio empowers organizations to achieve new levels of productivity and customer satisfaction in today’s competitive landscape.
Try out Amazon Bedrock in SageMaker Unified Studio for your own use case, and share your questions in the comments.

About the Authors
Jady Liu is a Senior AI/ML Solutions Architect on the AWS GenAI Labs team based in Los Angeles, CA. With over a decade of experience in the technology sector, she has worked across diverse technologies and held multiple roles. Passionate about generative AI, she collaborates with major clients across industries to achieve their business goals by developing scalable, resilient, and cost-effective generative AI solutions on AWS. Outside of work, she enjoys traveling to explore wineries and distilleries.
Justin Ossai is a GenAI Labs Specialist Solutions Architect based in Dallas, TX. He is a highly passionate IT professional with over 15 years of technology experience. He has designed and implemented solutions with on-premises and cloud-based infrastructure for small and enterprise companies.

Asure’s approach to enhancing their call center experience using gen …

Asure, a company of over 600 employees, is a leading provider of cloud-based workforce management solutions designed to help small and midsized businesses streamline payroll and human resources (HR) operations and ensure compliance. Their offerings include a comprehensive suite of human capital management (HCM) solutions for payroll and tax, HR compliance services, time tracking, 401(k) plans, and more.
Asure anticipated that generative AI could aid contact center leaders to understand their team’s support performance, identify gaps and pain points in their products, and recognize the most effective strategies for training customer support representatives using call transcripts. The Asure team was manually analyzing thousands of call transcripts to uncover themes and trends, a process that lacked scalability. The overarching goal of this engagement was to improve upon this manual approach. Failing to adopt a more automated approach could have potentially led to decreased customer satisfaction scores and, consequently, a loss in future revenue. Therefore, it was valuable to provide Asure a post-call analytics pipeline capable of providing beneficial insights, thereby enhancing the overall customer support experience and driving business growth.
Asure recognized the potential of generative AI to further enhance the user experience and better understand the needs of the customer and wanted to find a partner to help realize it.
Pat Goepel, chairman and CEO of Asure, shares,

“In collaboration with the AWS Generative AI Innovation Center, we are utilizing Amazon Bedrock, Amazon Comprehend, and Amazon Q in QuickSight to understand trends in our own customer interactions, prioritize items for product development, and detect issues sooner so that we can be even more proactive in our support for our customers. Our partnership with AWS and our commitment to be early adopters of innovative technologies like Amazon Bedrock underscore our dedication to making advanced HCM technology accessible for businesses of any size.”
“We are thrilled to partner with AWS on this groundbreaking generative AI project. The robust AWS infrastructure and advanced AI capabilities provide the perfect foundation for us to innovate and push the boundaries of what’s possible. This collaboration will enable us to deliver cutting-edge solutions that not only meet but exceed our customers’ expectations. Together, we are poised to transform the landscape of AI-driven technology and create unprecedented value for our clients.”
—Yasmine Rodriguez, CTO of Asure.
“As we embarked on our journey at Asure to integrate generative AI into our solutions, finding the right partner was crucial. Being able to partner with the Gen AI Innovation Center at AWS brings not only technical expertise with AI but the experience of developing solutions at scale. This collaboration confirms that our AI solutions are not just innovative but also resilient. Together, we believe that we can harness the power of AI to drive efficiency, enhance customer experiences, and stay ahead in a rapidly evolving market.”
—John Canada, VP of Engineering at Asure.

In this post, we explore why Asure used the Amazon Web Services (AWS) post-call analytics (PCA) pipeline that generated insights across call centers at scale with the advanced capabilities of generative AI-powered services such as Amazon Bedrock and Amazon Q in QuickSight. Asure chose this approach because it provided in-depth consumer analytics, categorized call transcripts around common themes, and empowered contact center leaders to use natural language to answer queries. This ultimately allowed Asure to provide its customers with improvements in product and customer experiences.
Solution overview
At a high level, the solution consists of first converting audio into transcripts using Amazon Transcribe and generating and evaluating summary fields for each transcript using Amazon Bedrock. In addition, Q&A can be done at a single call level using Amazon Bedrock or for many calls using Amazon Q in QuickSight. In the rest of this section, we describe these components and the services used in greater detail.
We added upon the existing PCA solution with the following services:

Amazon Bedrock
Amazon Q in QuickSight

Customer service and call center operations are highly dynamic, with evolving customer expectations, market trends, and technological advancements reshaping the industry at a rapid pace. Staying ahead in this competitive landscape demands agile, scalable, and intelligent solutions that can adapt to changing demands.
In this context, Amazon Bedrock emerges as an exceptional choice for developing a generative AI-powered solution to analyze customer service call transcripts. This fully managed service provides access to cutting-edge foundation models (FMs) from leading AI providers, enabling the seamless integration of state-of-the-art language models tailored for text analysis tasks. Amazon Bedrock offers fine-tuning capabilities that allow you to customize these pre-trained models using proprietary call transcript data, facilitating high accuracy and relevance without the need for extensive machine learning (ML) expertise. Moreover, Amazon Bedrock offers integration with other AWS services like Amazon SageMaker, which streamlines the deployment process, and its scalable architecture makes sure the solution can adapt to increasing call volumes effortlessly.
With robust security measures, data privacy safeguards, and a cost-effective pay-as-you-go model, Amazon Bedrock offers a secure, flexible, and cost-efficient service to harness generative AI’s potential in enhancing customer service analytics, ultimately leading to improved customer experiences and operational efficiencies.
Furthermore, by integrating a knowledge base containing organizational data, policies, and domain-specific information, the generative AI models can deliver more contextual, accurate, and relevant insights from the call transcripts. This knowledge base allows the models to understand and respond based on the company’s unique terminology, products, and processes, enabling deeper analysis and more actionable intelligence from customer interactions.
In this use case, Amazon Bedrock is used for both generation of summary fields for sample call transcripts and evaluation of these summary fields against a ground truth dataset. Its value comes from its simple integration into existing pipelines and various evaluation frameworks. Amazon Bedrock also allows you to choose various models for different use cases, making it an obvious choice for the solution due to its flexibility. Using Amazon Bedrock allows for iteration of the solution using knowledge bases for simple storage and access of call transcripts as well as guardrails for building responsible AI applications.
Amazon Bedrock
Amazon Bedrock is a fully managed service that makes FMs available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage the infrastructure.
Amazon Q in QuickSight
Amazon Q in QuickSight is a generative AI assistant that accelerates decision-making and enhances business productivity with generative business intelligence (BI) capabilities.
The original PCA solution includes the following services:

AWS Lambda
Amazon Simple Storage Service (Amazon S3)
Amazon CloudFront
Amazon Athena
Amazon Comprehend
Amazon Transcribe
Amazon Cognito

The solution consisted of the following components:

Call metadata generation – After the file ingestion step, when a transcript has been generated for each call using Amazon Transcribe, Anthropic's Claude 3 Haiku FM in Amazon Bedrock is used to generate call-related metadata. This includes a summary, the category, the root cause, and other high-level fields generated from the call transcript. This is orchestrated using AWS Step Functions.
Individual call Q&A – For questions requiring a specific call, such as, “How did the customer react in call ID X,” Anthropic’s Claude Haiku is used to power a Q&A assistant located in a CloudFront application. This is powered by the web app portion of the architecture diagram (provided in the next section).
Aggregate call Q&A – To answer questions spanning multiple calls, such as "What are the most common issues detected?", Amazon Q in QuickSight is used to enhance the Agent Assist interface. This step is shown in the architecture by business analysts interacting with QuickSight through natural language in the storage and visualization step.

To learn more about the architectural components of the PCA solution, including file ingestion, insight extraction, storage and visualization, and web application components, refer to Post call analytics for your contact center with Amazon language AI services.
Architecture
The following diagram illustrates the solution architecture. The evaluation framework, call metadata generation, and Amazon Q in QuickSight were new components introduced from the original PCA solution.

Ragas and a human-in-the-loop UI (as described in the customer blogpost with Tealium) were used to evaluate the metadata generation and individual call Q&A portions. Ragas is an open source evaluation framework that helps evaluate FM-generated text.
The high-level takeaways from this work are the following:

Anthropic's Claude 3 Haiku successfully took in a call transcript and generated its summary, root cause, whether the issue was resolved, whether the call was a callback, and next steps for the customer and agent (the generative AI-powered fields). When using Anthropic's Claude 3 Haiku as opposed to Anthropic's Claude Instant, there was a reduction in latency. With chain-of-thought reasoning, there was an increase in overall quality (which covers how factual, understandable, and relevant responses are on a 1–5 scale, described in more detail later in this post) as measured by subject matter experts (SMEs). With the use of Amazon Bedrock, various models can be chosen based on different use cases, illustrating its flexibility in this application.
Amazon Q in QuickSight proved to be a powerful analytical tool in understanding and generating relevant insights from data through intuitive chart and table visualizations. It can perform simple calculations whenever necessary while also facilitating deep dives into issues and exploring data from multiple perspectives, demonstrating great value in insight generation.
The human-in-the-loop UI plus Ragas metrics proved effective for evaluating the outputs of FMs used throughout the pipeline. In particular, answer correctness, answer relevance, faithfulness, and summarization metrics (alignment and coverage scores) were used to evaluate the call metadata generation and individual call Q&A components using Amazon Bedrock. The flexibility of Amazon Bedrock across FMs allowed many types of models to be tested for generating evaluation metrics, including Anthropic's Claude 3.5 Sonnet and Anthropic's Claude 3 Haiku.

Call metadata generation
The call metadata generation pipeline consisted of converting an audio file to a call transcript in a JSON format using Amazon Transcribe and then generating key information for each transcript using Amazon Bedrock and Amazon Comprehend. The following diagram shows a subset of the preceding architecture diagram that demonstrates this.

The original PCA post linked previously shows how Amazon Transcribe and Amazon Comprehend are used in the metadata generation pipeline.
The call transcript input that was outputted from the Amazon Transcribe step of the Step Functions workflow followed the format in the following code example:

{
    call_id: <call id>,
    agent_id: <agent_id>,
    customer_id: <customer_id>,
    transcript: """
        Agent: <Agent message>.
        Customer: <Customer message>
        Agent: <Agent message>.
        Customer: <Customer message>
        Agent: <Agent message>.
        Customer: <Customer message>
        ...
    """
}

Metadata was generated using Amazon Bedrock. Specifically, it extracted the summary, root cause, topic, and next steps, and answered key questions such as if the call was a callback and if the issue was ultimately resolved.
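As a rough sketch of this step, the snippet below shows how a Step Functions task's Lambda code could send a transcript to Anthropic's Claude 3 Haiku through the Amazon Bedrock Converse API and request the metadata fields as JSON. The prompt wording and output schema here are assumptions for illustration; Asure's production prompts live in DynamoDB, as described next.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def generate_call_metadata(transcript: str) -> dict:
    """Ask Claude 3 Haiku on Amazon Bedrock for call-level metadata as JSON."""
    # Illustrative prompt; the real prompts are stored in and loaded from DynamoDB.
    prompt = (
        "You analyze customer support call transcripts. Return only JSON with the keys: "
        "summary, topic, root_cause, issue_resolved (yes/no), callback (yes/no), next_steps.\n\n"
        f"Transcript:\n{transcript}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    # The model is instructed to return JSON only, so parse the first text block.
    return json.loads(response["output"]["message"]["content"][0]["text"])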
Prompts were stored in Amazon DynamoDB, allowing Asure to quickly modify prompts or add new generative AI-powered fields based on future enhancements. The following screenshot shows how prompts can be modified through DynamoDB.

Individual call Q&A
The chat assistant powered by Anthropic’s Claude 3 Haiku was used to answer natural language queries on a single transcript. This assistant, the call metadata values generated from the previous section, and sentiments generated from Amazon Comprehend were displayed in an application hosted by CloudFront.
The user of the final chat assistant can modify the prompt in DynamoDB. The following screenshot shows the general prompt for an individual call Q&A.

The UI hosted by CloudFront allows an agent or supervisor to analyze a specific call to extract additional details. The following screenshot shows the insights Asure gleaned for a sample customer service call.

The following screenshot shows the chat assistant, which exists in the same webpage.

Evaluation framework
This section outlines components of the evaluation framework used. It ultimately allows Asure to highlight important metrics for their use case and provides visibility into the generative AI application’s strengths and weaknesses. This was done using automated quantitative metrics provided by Ragas, DeepEval, and traditional ML metrics as well as human-in-the-loop evaluation done by SMEs.
Quantitative metrics
The results of the generated call metadata and individual call Q&A were evaluated using quantitative metrics provided by Ragas (answer correctness, answer relevance, and faithfulness) and DeepEval (alignment and coverage), both powered by FMs from Amazon Bedrock. Amazon Bedrock’s straightforward integration with external libraries allowed it to be configured as the judge model within these existing libraries. In addition, traditional ML metrics were used for “Yes/No” answers. The following are the metrics used for different components of the solution:

Call metadata generation – This included the following:

Summary – Alignment and coverage (find a description of these metrics in the DeepEval repository) and answer correctness
Issue resolved, callback – F1-score and accuracy
Topic, next steps, root cause – Answer correctness, answer relevance, and faithfulness

Individual call Q&A – Answer correctness, answer relevance, and faithfulness
Human in the loop – Both individual call Q&A and call metadata generation used human-in-the-loop metrics

For a description of answer correctness, answer relevance, and faithfulness, refer to the customer blog post with Tealium.
The use of Amazon Bedrock in the evaluation framework allowed different models to be used for different purposes. For example, Anthropic’s Claude 3.5 Sonnet was used to generate DeepEval metrics, whereas Anthropic’s Claude 3 Haiku (with its low latency) was ideal for Ragas.
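As an illustration of this setup, the following is a minimal sketch of scoring a generated answer with Ragas metrics backed by Amazon Bedrock models through LangChain wrappers; library interfaces change between versions, so the class and argument names here (based on recent ragas and langchain-aws releases) should be treated as assumptions.

from datasets import Dataset
from langchain_aws import ChatBedrock, BedrockEmbeddings
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_relevancy, faithfulness

# Bedrock-backed judge model and embeddings (model IDs may differ by Region).
eval_llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")
eval_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0")

# One toy sample: question, generated answer, retrieved context, and ground truth.
samples = Dataset.from_dict({
    "question": ["Was the customer's issue resolved on this call?"],
    "answer": ["Yes, the agent reset the account and the customer confirmed access."],
    "contexts": [["Agent: I have reset your account. Customer: It works now, thank you."]],
    "ground_truth": ["Yes, the issue was resolved during the call."],
})

scores = evaluate(
    samples,
    metrics=[answer_correctness, answer_relevancy, faithfulness],
    llm=eval_llm,
    embeddings=eval_embeddings,
)
print(scores)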
Human in the loop
The human-in-the-loop UI is described in the Human-in-the-Loop section of the customer blog post with Tealium. To use it to evaluate this solution, some changes had to be made:

There is a choice for the user to analyze one of the generated metadata fields for a call (such as a summary) or a specific Q&A pair.
The user can bring in two model outputs for comparison. This can include outputs from the same FM with different prompts, outputs from different FMs with the same prompt, or outputs from different FMs with different prompts.
Additional checks for fluency, coherence, creativity, toxicity, relevance, completeness, and overall quality were added, where the user rates the model output on each of these metrics using a 0–4 scale.

The following screenshots show the UI.

The human-in-the-loop system establishes a feedback mechanism between domain experts and Amazon Bedrock outputs. This in turn leads to improved generative AI applications and, ultimately, greater customer trust in such systems.
To demo the human-in-the-loop UI, follow the instructions in the GitHub repo.
Natural language Q&A using Amazon Q in QuickSight
QuickSight, integrated with Amazon Q, enabled Asure to use natural language queries for comprehensive customer analytics. By interpreting queries on sentiments, call volumes, issue resolutions, and agent performance, the service delivered data-driven visualizations. This empowered Asure to quickly identify pain points, optimize operations, and deliver exceptional customer experiences through a streamlined, scalable analytics solution tailored for call center operations.
Integrate Amazon Q in QuickSight with the PCA solution
The Amazon Q in QuickSight integration was done by following three high-level steps:

Create a dataset on QuickSight.
Create a topic on QuickSight from the dataset.
Query using natural language.

Create a dataset on QuickSight
We used Athena as the data source, which queries data from Amazon S3. QuickSight can be configured with multiple data sources (for more information, refer to Supported data sources). For this use case, we used the data generated from the PCA pipeline as the data source for further analytics and natural language queries in Amazon Q in QuickSight. The PCA pipeline stores data in Amazon S3, which can be queried with Athena, an interactive query service that allows you to analyze data directly in Amazon S3 using standard SQL.

On the QuickSight console, choose Datasets in the navigation pane.
Choose Create new.
Choose Athena as the data source and input the particular catalog, database, and table that Amazon Q in QuickSight will reference.

Confirm the dataset was created successfully and proceed to the next step.
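The same setup can also be scripted. The following is a rough sketch using the QuickSight APIs in boto3; the account ID, Athena workgroup, database, table, and column names are placeholders and would need to match the PCA pipeline’s actual catalog.

import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")
ACCOUNT_ID = "123456789012"  # placeholder AWS account ID

# Register Athena as a QuickSight data source.
quicksight.create_data_source(
    AwsAccountId=ACCOUNT_ID,
    DataSourceId="pca-athena",
    Name="pca-athena",
    Type="ATHENA",
    DataSourceParameters={"AthenaParameters": {"WorkGroup": "primary"}},
)

# Create a dataset over the PCA table (schema and columns are illustrative).
quicksight.create_data_set(
    AwsAccountId=ACCOUNT_ID,
    DataSetId="pca-call-metadata",
    Name="pca-call-metadata",
    ImportMode="DIRECT_QUERY",
    PhysicalTableMap={
        "calls": {
            "RelationalTable": {
                "DataSourceArn": f"arn:aws:quicksight:us-east-1:{ACCOUNT_ID}:datasource/pca-athena",
                "Catalog": "AwsDataCatalog",
                "Schema": "pca_database",
                "Name": "call_metadata",
                "InputColumns": [
                    {"Name": "call_id", "Type": "STRING"},
                    {"Name": "summary", "Type": "STRING"},
                    {"Name": "root_cause", "Type": "STRING"},
                ],
            }
        }
    },
)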

Create a topic on Amazon QuickSight from the dataset
Topics in QuickSight, powered by the Amazon Q integration, let users perform natural language queries on their data. This feature allows for intuitive data exploration and analysis by posing questions in plain language, alleviating the need for complex SQL queries or specialized technical skills. Before setting up a topic, make sure that the users have Pro-level access. To set up a topic, follow these steps:

On the QuickSight console, choose Topics in the navigation pane.
Choose New topic.
Enter a name for the topic and choose the data source created.
Choose the created topic and then choose Open Q&A to start querying in natural language.

Query using natural language
We performed intuitive natural language queries to gain actionable insights into customer analytics. This capability allows users to analyze sentiments, call volumes, issue resolutions, and agent performance through conversational queries, enabling data-driven decision-making, operational optimization, and enhanced customer experiences within a scalable, call center-tailored analytics solution. Examples of the simple natural language queries “Which customer had positive sentiments and a complex query?” and “What are the most common issues and which agents dealt with them?” are shown in the following screenshots.

These capabilities are helpful when business leaders want to dive deep into a particular issue, empowering them to make informed decisions.
Success metrics
The primary success metric for this solution is improved employee productivity, achieved primarily by quickly understanding customer interactions from calls to uncover themes and trends while also identifying gaps and pain points in their products. Before the engagement, analysts were taking 14 days to manually go through each call transcript to retrieve insights. After the engagement, Asure observed how Amazon Bedrock and Amazon Q in QuickSight could reduce this time to minutes, or even seconds, to obtain both insights queried directly from all stored call transcripts and visualizations that can be used for report generation.
In the pipeline, Anthropic’s Claude 3 Haiku was used to obtain initial call metadata fields (such as summary, root cause, next steps, and sentiments) that were stored in Amazon S3 and queried through Athena. This allowed each call transcript to be queried using natural language from Amazon Q in QuickSight, letting business analysts answer high-level questions about issues, themes, and customer and agent insights in seconds.
Pat Goepel, chairman and CEO of Asure, shares,

“In collaboration with the AWS Generative AI Innovation Center, we have improved upon a post-call analytics solution to help us identify and prioritize features that will be the most impactful for our customers. We are utilizing Amazon Bedrock, Amazon Comprehend, and Amazon Q in QuickSight to understand trends in our own customer interactions, prioritize items for product development, and detect issues sooner so that we can be even more proactive in our support for our customers. Our partnership with AWS and our commitment to be early adopters of innovative technologies like Amazon Bedrock underscore our dedication to making advanced HCM technology accessible for businesses of any size.”

Takeaways
We had the following takeaways:

Enabling chain-of-thought reasoning and specific assistant prompts for each prompt in the call metadata generation component, and invoking it with Anthropic’s Claude 3 Haiku, improved metadata generation for each transcript. The flexibility of Amazon Bedrock in the use of various FMs allowed full experimentation with many types of models with minimal changes, making it possible to match the model to each use case.
Ragas metrics, particularly faithfulness, answer correctness, and answer relevance, were used to evaluate call metadata generation and individual Q&A. However, summarization required different metrics (alignment and coverage), which don’t require ground truth summaries, so DeepEval was used to calculate summarization metrics. Overall, the ease of integrating Amazon Bedrock allowed it to power the calculation of quantitative metrics with minimal changes to the evaluation libraries. This also allowed the use of different types of models for different evaluation libraries.
The human-in-the-loop approach can be used by SMEs to further evaluate Amazon Bedrock outputs. There is an opportunity to improve an Amazon Bedrock FM based on this feedback, but this was out of scope for this engagement.
The post-call analytics workflow, with the use of Amazon Bedrock, can be iterated upon in the future using features such as Amazon Bedrock Knowledge Bases to perform Q&A over a specific number of call transcripts as well as Amazon Bedrock Guardrails to detect harmful and hallucinated responses while also creating more responsible AI applications.
Amazon Q in QuickSight was able to answer natural language questions on customer analytics, root cause, and agent analytics, but some questions required reframing to get meaningful responses.
Data fields within Amazon Q in QuickSight needed to be defined properly and synonyms needed to be added to make Amazon Q more robust with natural language queries.

Security best practices
We recommend the following security guidelines for building secure applications on AWS:

Building secure machine learning environments with Amazon SageMaker
Control root access to a SageMaker notebook instance
Security in Amazon S3
Data protection in Amazon Cognito

Conclusion
In this post, we showcased how Asure used the PCA solution powered by Amazon Bedrock and Amazon Q in QuickSight to generate consumer and agent insights both at individual and aggregate levels. Specific insights included those centered around a common theme or issue. With these services, Asure was able to improve employee productivity to generate these insights in minutes instead of weeks.
This is one of the many ways builders can deliver great solutions using Amazon Bedrock and Amazon Q in QuickSight. To learn more, refer to Amazon Bedrock and Amazon Q in QuickSight.

About the Authors
Suren Gunturu is a Data Scientist working in the Generative AI Innovation Center, where he works with various AWS customers to solve high-value business problems. He specializes in building ML pipelines using large language models, primarily through Amazon Bedrock and other AWS Cloud services.
Avinash Yadav is a Deep Learning Architect at the Generative AI Innovation Center, where he designs and implements cutting-edge GenAI solutions for diverse enterprise needs. He specializes in building ML pipelines using large language models, with expertise in cloud architecture, Infrastructure as Code (IaC), and automation. His focus lies in creating scalable, end-to-end applications that leverage the power of deep learning and cloud technologies.
John Canada is the VP of Engineering at Asure Software, where he leverages his experience in building innovative, reliable, and performant solutions and his passion for AI/ML to lead a talented team dedicated to using Machine Learning to enhance the capabilities of Asure’s software and meet the evolving needs of businesses.
Yasmine Rodriguez Wakim is the Chief Technology Officer at Asure Software. She is an innovative Software Architect & Product Leader with deep expertise in creating payroll, tax, and workforce software development. As a results-driven tech strategist, she builds and leads technology vision to deliver efficient, reliable, and customer-centric software that optimizes business operations through automation.
Vidya Sagar Ravipati is a Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

From innovation to impact: How AWS and NVIDIA enable real-world genera …

As we gather for NVIDIA GTC, organizations of all sizes are at a pivotal moment in their AI journey. The question is no longer whether to adopt generative AI, but how to move from promising pilots to production-ready systems that deliver real business value. The organizations that figure this out first will have a significant competitive advantage—and we’re already seeing compelling examples of what’s possible.
Consider Hippocratic AI’s work to develop AI-powered clinical assistants to support healthcare teams as doctors, nurses, and other clinicians face unprecedented levels of burnout. During a recent hurricane in Florida, their system called 100,000 patients in a day to check on medications and provide preventative healthcare guidance–the kind of coordinated outreach that would be nearly impossible to achieve manually. They aren’t just building another chatbot; they are reimagining healthcare delivery at scale.
Production-ready AI like this requires more than just cutting-edge models or powerful GPUs. In my decade working with customers’ data journeys, I’ve seen that an organization’s most valuable asset is its domain-specific data and expertise. And now leading our data and AI go-to-market, I hear customers consistently emphasize what they need to transform their domain advantage into AI success: infrastructure and services they can trust—with performance, cost-efficiency, security, and flexibility—all delivered at scale. When the stakes are high, success requires not just cutting-edge technology, but the ability to operationalize it at scale—a challenge that AWS has consistently solved for customers. As the world’s most comprehensive and broadly adopted cloud, our partnership with NVIDIA’s pioneering accelerated computing platform for generative AI amplifies this capability. It’s inspiring to see how, together, we’re enabling customers across industries to confidently move AI into production.
In this post, I will share some of these customers’ remarkable journeys, offering practical insights for any organization looking to harness the power of generative AI.
Transforming content creation with generative AI

Content creation represents one of the most visible and immediate applications of generative AI today. Adobe, a pioneer that has shaped creative workflows for over four decades, has moved with remarkable speed to integrate generative AI across its flagship products, helping millions of creators work in entirely new ways.
Adobe’s approach to generative AI infrastructure exemplifies what their VP of Generative AI, Alexandru Costin, calls an “AI superhighway”—a sophisticated technical foundation that enables rapid iteration of AI models and seamless integration into their creative applications. The success of their Firefly family of generative AI models, integrated across flagship products like Photoshop, demonstrates the power of this approach. For their AI training and inference workloads, Adobe uses NVIDIA GPU-accelerated Amazon Elastic Compute Cloud (Amazon EC2) P5en (NVIDIA H200 GPUs), P5 (NVIDIA H100 GPUs), P4de (NVIDIA A100 GPUs), and G5 (NVIDIA A10G GPUs) instances. They also use NVIDIA software such as NVIDIA TensorRT and NVIDIA Triton Inference Server for faster, scalable inference. Adobe needed maximum flexibility to build their AI infrastructure, and AWS provided the complete stack of services needed—from Amazon FSx for Lustre for high-performance storage, to Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration, to Elastic Fabric Adapter (EFA) for high-throughput networking—to create a production environment that could reliably serve millions of creative professionals.
Key takeaway
If you’re building and managing your own AI pipelines, Adobe’s success highlights a critical insight: although GPU-accelerated compute often gets the spotlight in AI infrastructure discussions, what’s equally important is the NVIDIA software stack along with the foundation of orchestration, storage, and networking services that enable production-ready AI. Their results speak for themselves—Adobe achieved a 20-fold scale-up in model training while maintaining the enterprise-grade performance and reliability their customers expect.
Pioneering new AI applications from the ground up

Throughout my career, I’ve been particularly energized by startups that take on audacious challenges—those that aren’t just building incremental improvements but are fundamentally reimagining how things work. Perplexity exemplifies this spirit. They’ve taken on a technology most of us now take for granted: search. It’s the kind of ambitious mission that excites me, not just because of its bold vision, but because of the incredible technical challenges it presents. When you’re processing 340 million queries monthly and serving over 1,500 organizations, transforming search isn’t just about having great ideas—it’s about building robust, scalable systems that can deliver consistent performance in production.
Perplexity’s innovative approach earned them membership in both AWS Activate and NVIDIA Inception—flagship programs designed to accelerate startup innovation and success. These programs provided them with the resources, technical guidance, and support needed to build at scale. They were one of the early adopters of Amazon SageMaker HyperPod, and continue to use its distributed training capabilities to accelerate model training time by up to 40%. They use a highly optimized inference stack built with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server to serve both their search application and pplx-api, their public API service that gives developers access to their proprietary models. The results speak for themselves—their inference stack achieves up to 3.1 times lower latency compared to other platforms. Both their training and inference workloads run on NVIDIA GPU-accelerated EC2 P5 instances, delivering the performance and reliability needed to operate at scale. To give their users even more flexibility, Perplexity complements their own models with services such as Amazon Bedrock, and provides access to additional state-of-the-art models in their API. Amazon Bedrock offers ease of use and reliability, which are crucial for their team—as they note, it allows them to effectively maintain the reliability and latency their product demands.
What I find particularly compelling about Perplexity’s journey is their commitment to technical excellence, exemplified by their work optimizing GPU memory transfer with EFA networking. The team achieved 97.1% of the theoretical maximum bandwidth of 3200 Gbps and open sourced their innovations, enabling other organizations to benefit from their learnings.
For those interested in the technical details, I encourage you to read their fascinating post Journey to 3200 Gbps: High-Performance GPU Memory Transfer on AWS Sagemaker Hyperpod.
Key takeaway
For organizations with complex AI workloads and specific performance requirements, Perplexity’s approach offers a valuable lesson. Sometimes, the path to production-ready AI isn’t about choosing between self-hosted infrastructure and managed services—it’s about strategically combining both. This hybrid strategy can deliver both exceptional performance (evidenced by Perplexity’s 3.1 times lower latency) and the flexibility to evolve.
Transforming enterprise workflows with AI
Enterprise workflows represent the backbone of business operations—and they’re a crucial proving ground for AI’s ability to deliver immediate business value. ServiceNow, which terms itself the AI platform for business transformation, is rapidly integrating AI to reimagine core business processes at scale.
ServiceNow’s innovative AI solutions showcase their vision for enterprise-specific AI optimization. As Srinivas Sunkara, ServiceNow’s Vice President, explains, their approach focuses on deep AI integration with technology workflows, core business processes, and CRM systems—areas where traditional large language models (LLMs) often lack domain-specific knowledge. To train generative AI models at enterprise scale, ServiceNow uses NVIDIA DGX Cloud on AWS. Their architecture combines high-performance FSx for Lustre storage with NVIDIA GPU clusters for training, and NVIDIA Triton Inference Server handles production deployment. This robust technology platform allows ServiceNow to focus on domain-specific AI development and customer value rather than infrastructure management.
Key takeaway
ServiceNow offers an important lesson about enterprise AI adoption: while foundation models (FMs) provide powerful general capabilities, the greatest business value often comes from optimizing models for specific enterprise use cases and workflows. In many cases, it’s precisely this deliberate specialization that transforms AI from an interesting technology into a true business accelerator.
Scaling AI across enterprise applications
Cisco’s Webex team’s journey with generative AI exemplifies how large organizations can methodically transform their applications while maintaining enterprise standards for reliability and efficiency. With a comprehensive suite of telecommunications applications serving customers globally, they needed an approach that would allow them to incorporate LLMs across their portfolio—from AI assistants to speech recognition—without compromising performance or increasing operational complexity.
The Webex team’s key insight was to separate their models from their applications. Previously, they had embedded AI models into the container images for applications running on Amazon EKS, but as their models grew in sophistication and size, this approach became increasingly inefficient. By migrating their LLMs to Amazon SageMaker AI and using NVIDIA Triton Inference Server, they created a clean architectural break between their relatively lean applications and the underlying models, which require more substantial compute resources. This separation allows applications and models to scale independently, significantly reducing development cycle time and increasing resource utilization. The team deployed dozens of models on SageMaker AI endpoints, using Triton Inference Server’s model concurrency capabilities to scale globally across AWS data centers.
The results validate Cisco’s methodical approach to AI transformation. By separating applications from models, their development teams can now fix bugs, perform tests, and add features to applications much faster, without having to manage large models in their workstation memory. The architecture also enables significant cost optimization—applications remain available during off-peak hours for reliability, and model endpoints can scale down when not needed, all without impacting application performance. Looking ahead, the team is evaluating Amazon Bedrock to further improve their price-performance, demonstrating how thoughtful architecture decisions create a foundation for continuous optimization.
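To illustrate the decoupling pattern in general terms (this is not Cisco’s code), a lean application can call a separately managed SageMaker AI endpoint, so the application and the model scale independently; the endpoint name and payload format below are assumptions.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def ask_model(prompt: str, endpoint_name: str = "example-llm-endpoint") -> str:
    # The application stays lightweight: it only serializes a request and
    # calls the model endpoint, which is deployed and scaled on its own.
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,          # placeholder endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    return response["Body"].read().decode("utf-8")

print(ask_model("Summarize the action items from this meeting transcript: ..."))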
Key takeaway
For enterprises with large application portfolios looking to integrate AI at scale, Cisco’s methodical approach offers an important lesson: separating LLMs from applications creates a cleaner architectural boundary that improves both development velocity and cost optimization. By treating models and applications as independent components, Cisco significantly improved development cycle time while reducing costs through more efficient resource utilization.
Building mission-critical AI for healthcare

Earlier, we highlighted how Hippocratic AI reached 100,000 patients during a crisis. Behind this achievement lies a story of rigorous engineering for safety and reliability—essential in healthcare where stakes are extraordinarily high.
Hippocratic AI’s approach to this challenge is both innovative and rigorous. They’ve developed what they call a “constellation architecture”—a sophisticated system of over 20 specialized models working in concert, each focused on specific safety aspects like prescription adherence, lab analysis, and over-the-counter medication guidance. This distributed approach to safety means they have to train multiple models, requiring management of significant computational resources. That’s why they use SageMaker HyperPod for their training infrastructure, using Amazon FSx and Amazon Simple Storage Service (Amazon S3) for high-speed storage access to NVIDIA GPUs, while Grafana and Prometheus provide the comprehensive monitoring needed to provide optimal GPU utilization. They build upon NVIDIA’s low-latency inference stack, and are enhancing conversational AI capabilities using NVIDIA Riva models for speech recognition and text-to-speech translation, and are also using NVIDIA NIM microservices to deploy these models. Given the sensitive nature of healthcare data and HIPAA compliance requirements, they’ve implemented a sophisticated multi-account, multi-cluster strategy on AWS—running production inference workloads with patient data on completely separate accounts and clusters from their development and training environments. This careful attention to both security and performance allows them to handle thousands of patient interactions while maintaining precise control over clinical safety and accuracy.
The impact of Hippocratic AI’s work extends far beyond technical achievements. Their AI-powered clinical assistants address critical healthcare workforce burnout by handling burdensome administrative tasks—from pre-operative preparation to post-discharge follow-ups. For example, during weather emergencies, their system can rapidly assess heat risks and coordinate transport for vulnerable patients—the kind of comprehensive care that would be too burdensome and resource-intensive to coordinate manually at scale.
Key takeaway
For organizations building AI solutions for complex, regulated, and high-stakes environments, Hippocratic AI’s constellation architecture reinforces what we’ve consistently emphasized: there’s rarely a one-size-fits-all model for every use case. Just as Amazon Bedrock offers a choice of models to meet diverse needs, Hippocratic AI’s approach of combining over 20 specialized models—each focused on specific safety aspects—demonstrates how a thoughtfully designed ensemble can achieve both precision and scale.
Conclusion
As the technology partners enabling these and countless other customer innovations, AWS and NVIDIA’s long-standing collaboration continues to evolve to meet the demands of the generative AI era. Our partnership, which began over 14 years ago with the world’s first GPU cloud instance, has grown to offer the industry’s widest range of NVIDIA accelerated computing solutions and software services for optimizing AI deployments. Through initiatives like Project Ceiba—one of the world’s fastest AI supercomputers hosted exclusively on AWS using NVIDIA DGX Cloud for NVIDIA’s own research and development use—we continue to push the boundaries of what’s possible.
As all the examples we’ve covered illustrate, it isn’t just about the technology we build together—it’s how organizations of all sizes are using these capabilities to transform their industries and create new possibilities. These stories ultimately reveal something more fundamental: when we make powerful AI capabilities accessible and reliable, people find remarkable ways to use them to solve meaningful problems. That’s the true promise of our partnership with NVIDIA—enabling innovators to create positive change at scale. I’m excited to continue inventing and partnering with NVIDIA and can’t wait to see what our mutual customers are going to do next.
Resources
Check out the following resources to learn more about our partnership with NVIDIA and generative AI on AWS:

Learn about the AWS and NVIDIA partnership
Explore generative AI on AWS
Cost-effectively access NVIDIA GPUs across several new AWS Regions with Amazon EC2 Capacity Blocks for ML
Get started with Amazon SageMaker HyperPod for generative AI model development
Build and scale generative AI applications with Amazon Bedrock

About the Author
Rahul Pathak is Vice President, Data and AI GTM at AWS, where he leads the global go-to-market and specialist teams who are helping customers create differentiated value with AWS’s AI capabilities such as Amazon Bedrock, Amazon Q, Amazon SageMaker, and Amazon EC2, and data services such as Amazon S3, AWS Glue, and Amazon Redshift. Rahul believes that generative AI will transform virtually every single customer experience and that data is a key differentiator for customers as they build AI applications. Prior to his current role, he was Vice President, Relational Database Engines, where he led Amazon Aurora, Redshift, and DSQL. During his 13+ years at AWS, Rahul has been focused on launching, building, and growing managed database and analytics services, all aimed at making it easy for customers to get value from their data. Rahul has over twenty years of experience in technology and has co-founded two companies, one focused on analytics and the other on IP geolocation. He holds a degree in Computer Science from MIT and an Executive MBA from the University of Washington.

Amazon Q Business now available in Europe (Ireland) AWS Region

Today, we are excited to announce that Amazon Q Business—a fully managed, generative AI-powered assistant that you can configure to answer questions, provide summaries, and generate content based on your enterprise data—is now generally available in the Europe (Ireland) AWS Region.
Since its launch, Amazon Q Business has been helping customers find information, gain insight, and take action at work. The general availability of Amazon Q Business in the Europe (Ireland) Region will help customers across Ireland and the EU transform how their employees work and access information, while meeting data security and privacy requirements.
AWS customers and partners innovate using Amazon Q Business in Europe
Organizations across the EU are using Amazon Q Business for a wide variety of use cases, including answering questions about company data, summarizing documents, and providing business insights.
Katya Dunets, the AWS Lead Sales Engineer for Adastra, noted,

Adastra stands at the forefront of technological innovation, specializing in artificial intelligence, data, cloud, digital, and governance services. Our team was facing the daunting challenge of sifting through hundreds of documents on SharePoint, searching for content and information critical for market research and RFP generation. This process was not only time-consuming but also impeded our agility and responsiveness. Recognizing the need for a transformative solution, we turned to Amazon Q Business for its prowess in answering queries, summarizing documents, generating content, and executing tasks, coupled with its direct SharePoint integration. Amazon Q Business became the catalyst for unprecedented efficiency within Adastra, dramatically streamlining document retrieval, enhancing cross-team collaboration through shared insights from past projects, and accelerating our RFP development process by 70%. Amazon Q Business has not only facilitated a smoother exchange of knowledge within our teams but has also empowered us to maintain our competitive edge by focusing on innovation rather than manual tasks. Adastra’s journey with Amazon Q exemplifies our commitment to harnessing cutting-edge technology to better serve both our clients and their customers.

AllCloud is a cloud solutions provider specializing in cloud stack, infrastructure, platform, and Software-as-a-Service. Their CTO, Peter Nebel stated,

“AllCloud faces the common challenge of information sprawl. Critical knowledge for sales and delivery teams is scattered across various tools—Salesforce for customer and marketing data, Google Drive for documents, Bamboo for HR and internal information, and Confluence for internal wikis. This fragmented approach wastes valuable time as employees hunt and peck for the information they need, hindering productivity and potentially impacting client satisfaction. Amazon Q Business provides AllCloud a solution to increase productivity by streamlining information access. By leveraging Amazon Q’s natural language search capabilities, AllCloud can empower its personnel with a central hub to find answers to their questions across all their existing information sources. This drives efficiency and accuracy by eliminating the need for time-consuming searches across multiple platforms and ensures all teams have access to the most up-to-date information. Amazon Q will significantly accelerate productivity, across all lines of business, allowing AllCloud’s teams to focus on delivering exceptional service to their clients.”

Lars Ritter, Senior Manager at Woodmark Consulting, noted,

“Amazon Bedrock and Amazon Q Business have been game-changers for Woodmark. Employees struggled with time-consuming searches across various siloed systems, leading to reduced productivity and slower operations. To solve for the inefficient retrieval of corporate knowledge from unstructured data sources we turned to Amazon Bedrock and Amazon Q Business for help. With this innovative solution, Woodmark has been able to revolutionize data accessibility, empowering our teams to effortlessly retrieve insights using simple natural language queries, and to make informed decisions without relying on specialized data teams, which was not feasible before. These solutions have dramatically increased efficiency, fostered a data-driven culture, and positioned us for scalable growth, driving our organization toward unparalleled success.”

Scott Kumono, Product Manager for Kinectus at Siemens Healthineers, adds,

“Amazon Q Business has enhanced the delivery of service and clinical support for our ultrasound customers. Previously, finding specific information meant sifting through a 1,000-page manual or waiting for customer support to respond. Now, customers have instant access to answers and specifications right at their fingertips, using Kinectus Remote Service. With Amazon Q Business we were able to significantly reduce manual work and wait times to find the right information, allowing our customers to focus on what really matters – patient care.”

Till Gloger, Head of Digital Production Platform Region Americas at Volkswagen Group of America, states,

“Volkswagen innovates not only on its products, but also on how to boost employee productivity and increase production throughput. Volkswagen is testing the use of Amazon Q to streamline employee workflows by potentially integrating it with existing processes. This integration has the possibility to help employees save time during the assembly process, reducing some processes from minutes to seconds, ultimately leading to more throughput.”

Pricing
With Amazon Q Business, enterprise customers pay for user subscriptions and index capacity. For more details, see Amazon Q Business pricing.
Get started with Amazon Q Business today
To get started with Amazon Q Business, users first need to configure an application environment and create a knowledge base using over 40 data source connectors that index documents (for example, text, PDF, images, and tables). Organizations then set up user authentication through AWS IAM Identity Center or other SAML-based identity providers like Okta, Ping Identity, and Microsoft Entra ID. After configuring access permissions, application users can navigate to their organization’s Amazon Q Business web interface using their credentials to begin interacting with Q Business and the data they have access to. Q Business enables natural language interactions where users can ask questions and receive answers based on their indexed documents, uploaded content, and world knowledge, which may include getting details, generating content, or surfacing insights. Users can access Amazon Q Business through multiple channels, including web applications, Slack, Microsoft Teams, Microsoft 365 for Word and Outlook, or browser extensions for generative AI assistance directly where they work. Additionally, customers can securely share their data with verified independent software vendors (ISVs) like Asana, Miro, PagerDuty, and Zoom using the data accessors feature, which maintains security and compliance while respecting user-level permissions.
Learn more about how to get started with Amazon Q Business here. Read about other Amazon Q Business customers’ success stories here. Certain Amazon Q Business features already available in US East (N. Virginia) and US West (Oregon), including Q Apps, Q Actions, and audio/video file support, will become available in Europe (Ireland) soon.

About the Authors
Jose Navarro is an AI/ML Specialist Solutions Architect at AWS, based in Spain. Jose helps AWS customers—from small startups to large enterprises—architect and take their end-to-end machine learning use cases to production.
Morgan Dutton is a Senior Technical Program Manager at AWS, Amazon Q Business based in Seattle.
Eva Pagneux is a Principal Product Manager at AWS, Amazon Q Business, based in San Francisco.
Wesleigh Roeca is a Senior Worldwide Gen AI/ML Specialist at AWS, Amazon Q Business, based in Santa Monica.

Intelligent healthcare assistants: Empowering stakeholders with person …

Large language models (LLMs) have revolutionized the field of natural language processing, enabling machines to understand and generate human-like text with remarkable accuracy. However, despite their impressive language capabilities, LLMs are inherently limited by the data they were trained on. Their knowledge is static and does not update after training, which becomes problematic when dealing with dynamic and constantly evolving domains like healthcare.
The healthcare industry is a complex, ever-changing landscape with a vast and rapidly growing body of knowledge. Medical research, clinical practices, and treatment guidelines are constantly being updated, rendering even the most advanced LLMs quickly outdated. Additionally, patient data, including electronic health records (EHRs), diagnostic reports, and medical histories, are highly personalized and unique to each individual. Relying solely on an LLM’s pre-trained knowledge is insufficient for providing accurate and personalized healthcare recommendations.
Furthermore, healthcare decisions often require integrating information from multiple sources, such as medical literature, clinical databases, and patient records. LLMs lack the ability to seamlessly access and synthesize data from these diverse and distributed sources. This limits their potential to provide comprehensive and well-informed insights for healthcare applications.
Overcoming these challenges is crucial for using the full potential of LLMs in the healthcare domain. Patients, healthcare providers, and researchers require intelligent agents that can provide up-to-date, personalized, and context-aware support, drawing from the latest medical knowledge and individual patient data.
Enter LLM function calling, a powerful capability that addresses these challenges by allowing LLMs to interact with external functions or APIs, enabling them to access and use additional data sources or computational capabilities beyond their pre-trained knowledge. By combining the language understanding and generation abilities of LLMs with external data sources and services, LLM function calling opens up a world of possibilities for intelligent healthcare agents.
In this blog post, we will explore how Mistral LLM on Amazon Bedrock can address these challenges and enable the development of intelligent healthcare agents with LLM function calling capabilities, while maintaining robust data security and privacy through Amazon Bedrock Guardrails.
Healthcare agents equipped with LLM function calling can serve as intelligent assistants for various stakeholders, including patients, healthcare providers, and researchers. They can assist patients by answering medical questions, interpreting test results, and providing personalized health advice based on their medical history and current conditions. For healthcare providers, these agents can help with tasks such as summarizing patient records, suggesting potential diagnoses or treatment plans, and staying up to date with the latest medical research. Additionally, researchers can use LLM function calling to analyze vast amounts of scientific literature, identify patterns and insights, and accelerate discoveries in areas such as drug development or disease prevention.
Benefits of LLM function calling
LLM function calling offers several advantages for enterprise applications, including enhanced decision-making, improved efficiency, personalized experiences, and scalability. By combining the language understanding capabilities of LLMs with external data sources and computational resources, enterprises can make more informed and data-driven decisions, automate and streamline various tasks, provide tailored recommendations and experiences for individual users or customers, and handle large volumes of data and process multiple requests concurrently.
Potential use cases for LLM function calling in the healthcare domain include patient triage, medical question answering, and personalized treatment recommendations. LLM-powered agents can assist in triaging patients by analyzing their symptoms, medical history, and risk factors, and providing initial assessments or recommendations for seeking appropriate care. Patients and healthcare providers can receive accurate and up-to-date answers to medical questions by using LLMs’ ability to understand natural language queries and access relevant medical knowledge from various data sources. Additionally, by integrating with electronic health records (EHRs) and clinical decision support systems, LLM function calling can provide personalized treatment recommendations tailored to individual patients’ medical histories, conditions, and preferences.
Amazon Bedrock supports a variety of foundation models. In this post, we will be exploring how to perform function calling using Mistral from Amazon Bedrock. Mistral supports function calling, which allows agents to invoke external functions or APIs from within a conversation flow. This capability enables agents to retrieve data, perform calculations, or use external services to enhance their conversational abilities. Function calling in Mistral is achieved through the use of specific function call blocks that define the external function to be invoked and handle the response or output.
Solution overview
LLM function calling typically involves integrating an LLM model with an external API or function that provides access to additional data sources or computational capabilities. The LLM model acts as an interface, processing natural language inputs and generating responses based on its pre-trained knowledge and the information obtained from the external functions or APIs. The architecture typically consists of the LLM model, a function or API integration layer, and external data sources and services.
Healthcare agents can integrate LLM models and call external functions or APIs through a series of steps: natural language input processing, self-correction, chain of thought, function or API calling through an integration layer, data integration and processing, and persona adoption. The agent receives natural language input, processes it through the LLM model, calls relevant external functions or APIs if additional data or computations are required, combines the LLM model’s output with the external data or results, and provides a comprehensive response to the user.

High-level architecture: Healthcare assistant

The architecture for the Healthcare Agent is shown in the preceding figure and is as follows:

Consumers interact with the system through Amazon API Gateway.
AWS Lambda orchestrator, along with tool configuration and prompts, handles orchestration and invokes the Mistral model on Amazon Bedrock.
Agent function calling allows agents to invoke Lambda functions to retrieve data, perform computations, or use external services.
Task-specific Lambda functions, such as insurance, claims, and pre-fill functions, handle individual tasks.
Conversation history is stored, member information is kept in a member database (MemberDB), and a knowledge base holds the static documents used by the agent.
AWS CloudTrail, AWS Identity and Access Management (IAM), and Amazon CloudWatch provide auditing, access control, and monitoring for data security.
AWS Glue, Amazon SageMaker, and Amazon Simple Storage Service (Amazon S3) facilitate data processing.

A sample code using function calling through the Mistral LLM can be found at mistral-on-aws.
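As a complement to that repository, here is a minimal, self-contained sketch of exposing a Lambda-backed tool to a Mistral model through the Amazon Bedrock Converse API and detecting when the model requests a call; the tool name, schema, and model ID are illustrative assumptions.

import boto3

bedrock = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_member_claims",  # hypothetical Lambda-backed function
            "description": "Retrieve recent insurance claims for a member ID.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"member_id": {"type": "string"}},
                "required": ["member_id"],
            }},
        }
    }]
}

response = bedrock.converse(
    modelId="mistral.mistral-large-2402-v1:0",  # model ID may vary by Region/version
    messages=[{"role": "user",
               "content": [{"text": "What claims has member M-1234 filed this year?"}]}],
    toolConfig=tool_config,
)

if response["stopReason"] == "tool_use":
    # The orchestrator would now invoke the matching Lambda function with these
    # arguments and return its output to the model as a toolResult message.
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block:
            print(block["toolUse"]["name"], block["toolUse"]["input"])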
Security and privacy considerations
Data privacy and security are of utmost importance in the healthcare sector because of the sensitive nature of personal health information (PHI) and the potential consequences of data breaches or unauthorized access. Compliance with regulations such as HIPAA and GDPR is crucial for healthcare organizations handling patient data. To maintain robust data protection and regulatory compliance, healthcare organizations can use Amazon Bedrock Guardrails, a comprehensive set of security and privacy controls provided by Amazon Web Services (AWS).
Amazon Bedrock Guardrails offers a multi-layered approach to data security, including encryption at rest and in transit, access controls, audit logging, ground truth validation and incident response mechanisms. It also provides advanced security features such as data residency controls, which allow organizations to specify the geographic regions where their data can be stored and processed, maintaining compliance with local data privacy laws.
When using LLM function calling in the healthcare domain, it’s essential to implement robust security measures and follow best practices for handling sensitive patient information. Amazon Bedrock Guardrails can play a crucial role in this regard by helping to provide a secure foundation for deploying and operating healthcare applications and services that use LLM capabilities.
Some key security measures enabled by Amazon Bedrock Guardrails are:

Data encryption: Patient data processed by LLM functions can be encrypted at rest and in transit, making sure that sensitive information remains secure even in the event of unauthorized access or data breaches.
Access controls: Amazon Bedrock Guardrails enables granular access controls, allowing healthcare organizations to define and enforce strict permissions for who can access, modify, or process patient data through LLM functions.
Secure data storage: Patient data can be stored in secure, encrypted storage services such as Amazon S3 or Amazon Elastic File System (Amazon EFS), making sure that sensitive information remains protected even when at rest.
Anonymization and pseudonymization: Healthcare organizations can use Amazon Bedrock Guardrails to implement data anonymization and pseudonymization techniques, making sure that patient data used for training or testing LLM models doesn’t contain personally identifiable information (PII).
Audit logging and monitoring: Comprehensive audit logging and monitoring capabilities provided by Amazon Bedrock Guardrails enable healthcare organizations to track and monitor all access and usage of patient data by LLM functions, enabling timely detection and response to potential security incidents.
Regular security audits and assessments: Amazon Bedrock Guardrails facilitates regular security audits and assessments, making sure that the healthcare organization’s data protection measures remain up-to-date and effective in the face of evolving security threats and regulatory requirements.

By using Amazon Bedrock Guardrails, healthcare organizations can confidently deploy LLM function calling in their applications and services, maintaining robust data security, privacy protection, and regulatory compliance while enabling the transformative benefits of AI-powered healthcare assistants.
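For example, a guardrail that masks PII and blocks denied topics can be attached to every model call. In the sketch below (using the Converse API), the guardrail identifier and version are placeholders for a guardrail configured separately in Amazon Bedrock.

import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="mistral.mistral-large-2402-v1:0",  # illustrative model ID
    messages=[{"role": "user",
               "content": [{"text": "Summarize this patient's latest lab results."}]}],
    guardrailConfig={
        "guardrailIdentifier": "example-guardrail-id",  # placeholder guardrail ID
        "guardrailVersion": "1",
        "trace": "enabled",  # return which guardrail policies were applied
    },
)
print(response["output"]["message"]["content"][0]["text"])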
Case studies and real-world examples
3M Health Information Systems is collaborating with AWS to accelerate AI innovation in clinical documentation by using AWS machine learning (ML) services, compute power, and LLM capabilities. This collaboration aims to enhance 3M’s natural language processing (NLP) and ambient clinical voice technologies, enabling intelligent healthcare agents to capture and document patient encounters more efficiently and accurately. These agents, powered by LLMs, can understand and process natural language inputs from healthcare providers, such as spoken notes or queries, and use LLM function calling to access and integrate relevant medical data from EHRs, knowledge bases, and other data sources. By combining 3M’s domain expertise with AWS ML and LLM capabilities, the companies can improve clinical documentation workflows, reduce administrative burdens for healthcare providers, and ultimately enhance patient care through more accurate and comprehensive documentation.
GE Healthcare developed Edison, a secure intelligence solution running on AWS, to ingest and analyze data from medical devices and hospital information systems. This solution uses AWS analytics, ML, and Internet of Things (IoT) services to generate insights and analytics that can be delivered through intelligent healthcare agents powered by LLMs. These agents, equipped with LLM function calling capabilities, can seamlessly access and integrate the insights and analytics generated by Edison, enabling them to assist healthcare providers in improving operational efficiency, enhancing patient outcomes, and supporting the development of new smart medical devices. By using LLM function calling to retrieve and process relevant data from Edison, the agents can provide healthcare providers with data-driven recommendations and personalized support, ultimately enabling better patient care and more effective healthcare delivery.
Future trends and developments
Future advancements in LLM function calling for healthcare might include more advanced natural language processing capabilities, such as improved context understanding, multi-turn conversational abilities, and better handling of ambiguity and nuances in medical language. Additionally, the integration of LLM models with other AI technologies, such as computer vision and speech recognition, could enable multimodal interactions and analysis of various medical data formats.
Emerging technologies such as multimodal models, which can process and generate text, images, and other data formats simultaneously, could enhance LLM function calling in healthcare by enabling more comprehensive analysis and visualization of medical data. Personalized language models, trained on individual patient data, could provide even more tailored and accurate responses. Federated learning techniques, which allow model training on decentralized data while preserving privacy, could address data-sharing challenges in healthcare.
These advancements and emerging technologies could shape the future of healthcare agents by making them more intelligent, adaptive, and personalized. Agents could seamlessly integrate multimodal data, such as medical images and lab reports, into their analysis and recommendations. They could also continuously learn and adapt to individual patients’ preferences and health conditions, providing truly personalized care. Additionally, federated learning could enable collaborative model development while maintaining data privacy, fostering innovation and knowledge sharing across healthcare organizations.
Conclusion
LLM function calling has the potential to revolutionize the healthcare industry by enabling intelligent agents that can understand natural language, access and integrate various data sources, and provide personalized recommendations and insights. By combining the language understanding capabilities of LLMs with external data sources and computational resources, healthcare organizations can enhance decision-making, improve operational efficiency, and deliver superior patient experiences. However, addressing data privacy and security concerns is crucial for the successful adoption of this technology in the healthcare domain.
As the healthcare industry continues to embrace digital transformation, we encourage readers to explore and experiment with LLM function calling in their respective domains. By using this technology, healthcare organizations can unlock new possibilities for improving patient care, advancing medical research, and streamlining operations. With a focus on innovation, collaboration, and responsible implementation, the healthcare industry can harness the power of LLM function calling to create a more efficient, personalized, and data-driven future. AWS can help organizations use LLM function calling and build intelligent healthcare assistants through its AI/ML services, including Amazon Bedrock, Amazon Lex, and Lambda, while maintaining robust security and compliance using Amazon Bedrock Guardrails. To learn more, see AWS for Healthcare & Life Sciences.

About the Authors
Laks Sundararajan is a seasoned Enterprise Architect helping companies reset, transform and modernize their IT, digital, cloud, data and insight strategies. A proven leader with significant expertise around Generative AI, Digital, Cloud and Data/Analytics Transformation, Laks is a Sr. Solutions Architect with Healthcare and Life Sciences (HCLS).
Subha Venugopal is a Senior Solutions Architect at AWS with over 15 years of experience in the technology and healthcare sectors. Specializing in digital transformation, platform modernization, and AI/ML, she leads AWS Healthcare and Life Sciences initiatives. Subha is dedicated to enabling equitable healthcare access and is passionate about mentoring the next generation of professionals.

Cohere Released Command A: A 111B Parameter AI Model with 256K Context …

LLMs are widely used for conversational AI, content generation, and enterprise automation. However, balancing performance with computational efficiency is a key challenge in this field. Many state-of-the-art models require extensive hardware resources, making them impractical for smaller enterprises. The demand for cost-effective AI solutions has led researchers to develop models that deliver high performance with lower computational requirements.

Training and deploying AI models present hurdles for researchers and businesses. Large-scale models require substantial computational power, making them costly to maintain. Also, AI models must handle multilingual tasks, ensure high instruction-following accuracy, and support enterprise applications such as data analysis, automation, and coding. Current market solutions, while effective, often demand infrastructure beyond the reach of many enterprises. The challenge is to optimize AI models for processing efficiency without compromising accuracy or functionality.

Several AI models currently dominate the market, including GPT-4o and DeepSeek-V3. These models excel in natural language processing and generation but require high-end hardware, sometimes needing up to 32 GPUs to operate effectively. While they provide advanced capabilities in text generation, multilingual support, and coding, their hardware dependencies limit accessibility. Some models also struggle with enterprise-level instruction-following accuracy and tool integration. Businesses need AI solutions that maintain competitive performance while minimizing infrastructure and deployment costs. This demand has driven efforts to optimize language models to function with minimal hardware requirements.

Researchers from Cohere introduced Command A, a high-performance AI model designed specifically for enterprise applications that demand maximum efficiency. Unlike conventional models that require large computational resources, Command A operates on just two GPUs while maintaining competitive performance. The model comprises 111 billion parameters and supports a context length of 256K tokens, making it well suited to enterprise workloads that involve long-form document processing. Its ability to handle business-critical agentic and multilingual tasks efficiently sets it apart from its predecessors. The model has been optimized to deliver high-quality text generation while reducing operational costs, making it a cost-effective alternative for businesses looking to apply AI across a range of applications.

The underlying technology of Command A is structured around an optimized transformer architecture, which includes three layers of sliding window attention, each with a window size of 4096 tokens. This mechanism enhances local context modeling, allowing the model to retain important details across extended text inputs. A fourth layer incorporates global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence. The model’s supervised fine-tuning and preference training further refine its ability to align responses with human expectations regarding accuracy, safety, and helpfulness. Also, Command A supports 23 languages, making it one of the most versatile AI models for businesses with global operations. Its chat capabilities are preconfigured for interactive behavior, enabling seamless conversational AI applications.
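
To make the layer pattern concrete, here is a minimal sketch of how the two kinds of attention masks differ, assuming the described arrangement repeats in blocks of four layers (three sliding-window layers followed by one global-attention layer). The window size comes from the article; everything else is an illustrative assumption, not Cohere's implementation.

```python
import torch

WINDOW = 4096  # sliding-window size described in the article

def attention_mask(seq_len: int, layer_idx: int, device: str = "cpu") -> torch.Tensor:
    """Boolean attention mask (True = may attend). All layers are causal;
    the first three layers of each block of four are additionally limited to
    a 4096-token sliding window, while every fourth layer attends globally."""
    pos = torch.arange(seq_len, device=device)
    causal = pos[None, :] <= pos[:, None]              # query i may see keys j <= i
    if layer_idx % 4 == 3:                             # global-attention layer
        return causal
    within_window = (pos[:, None] - pos[None, :]) < WINDOW
    return causal & within_window                      # sliding-window layer

# Example: how many keys are visible to the last query position in an 8192-token input.
for layer_idx in (0, 3):
    mask = attention_mask(8192, layer_idx)
    print(layer_idx, int(mask[8191].sum()))            # 4096 for local layers, 8192 for the global layer
```

A boolean mask of this shape is exactly what torch.nn.functional.scaled_dot_product_attention accepts as attn_mask, which is why the local layers scale linearly with sequence length while the periodic global layer preserves unrestricted token interactions.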

Performance evaluations indicate that Command A competes favorably with leading AI models such as GPT-4o and DeepSeek-V3 across various enterprise-focused benchmarks. The model achieves a token generation rate of 156 tokens per second, 1.75 times higher than GPT-4o and 2.4 times higher than DeepSeek-V3, making it one of the most efficient models available. Regarding cost efficiency, private deployments of Command A are up to 50% cheaper than API-based alternatives, significantly reducing the financial burden on businesses. Command A also excels in instruction-following tasks, SQL-based queries, and retrieval-augmented generation (RAG) applications. It has demonstrated high accuracy in real-world enterprise data evaluations, outperforming its competitors in multilingual business use cases.

In a direct comparison of enterprise task performance, human evaluation results show that Command A consistently outperforms its competitors in fluency, faithfulness, and response utility. The model’s enterprise-ready capabilities include robust retrieval-augmented generation with verifiable citations, advanced agentic tool use, and high-level security measures to protect sensitive business data. Its multilingual capabilities extend beyond simple translation, demonstrating superior proficiency in responding accurately in region-specific dialects. For instance, evaluations of Arabic dialects, including Egyptian, Saudi, Syrian, and Moroccan Arabic, revealed that Command A delivered more precise and contextually appropriate responses than leading AI models. These results emphasize its strong applicability in global enterprise environments where language diversity is crucial.
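
As a sketch of what grounded generation with verifiable citations might look like in practice, the snippet below uses Cohere's v2 Python SDK. The model id, sample documents, and response fields follow Cohere's public API conventions as of this writing and are assumptions to verify against the current documentation.

```python
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")

# Illustrative documents; in a real deployment these would come from a retriever.
docs = [
    {"id": "policy-1", "data": "Refunds are processed within 5 business days."},
    {"id": "policy-2", "data": "Enterprise customers receive a dedicated support channel."},
]

response = co.chat(
    model="command-a-03-2025",   # assumed API model id
    messages=[{"role": "user", "content": "How long do refunds take?"}],
    documents=docs,
)

print(response.message.content[0].text)          # grounded answer
for citation in response.message.citations or []:
    print(citation)                              # answer spans linked back to source document ids
```

The citations tie specific spans of the generated answer back to the supplied document ids, which is what makes the output auditable in enterprise retrieval workflows.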

Several key takeaways from the research include:

Command A operates on just two GPUs, significantly reducing computational costs while maintaining high performance.

With 111 billion parameters, the model is optimized for enterprise-scale applications that require extensive text processing.

The model supports a 256K context length, enabling it to process longer enterprise documents more effectively than competing models.

Command A is trained on 23 languages, ensuring high accuracy and contextual relevance for global businesses.

It achieves 156 tokens per second, 1.75x higher than GPT-4o and 2.4x higher than DeepSeek-V3.

The model consistently outperforms competitors in real-world enterprise evaluations, excelling in SQL, agentic, and tool-based tasks.

Advanced RAG capabilities with verifiable citations make it highly suitable for enterprise information retrieval applications.

Private deployments of Command A can be up to 50% cheaper than API-based models.

The model includes enterprise-grade security features, ensuring safe handling of sensitive business data.

The model demonstrates high proficiency in regional dialects, making it well suited to businesses operating in linguistically diverse regions.
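
For readers who want to try the released checkpoint linked below, here is a minimal loading sketch using Hugging Face transformers. The repo id is an assumption (confirm it on the model card), and a 111B-parameter model in bf16 still requires multiple high-memory GPUs, which device_map="auto" shards across automatically.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-a-03-2025"  # assumed repo id; check the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # still needs several high-memory GPUs at 111B parameters
    device_map="auto",            # shard layers across the available devices
)

messages = [{"role": "user", "content": "Summarize this quarter's sales trends in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```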

Check out the Model on Hugging Face. All credit for this research goes to the researchers of this project.

The post Cohere Released Command A: A 111B Parameter AI Model with 256K Context Length, 23-Language Support, and 50% Cost Reduction for Enterprises appeared first on MarkTechPost.