NVIDIA XGBoost 3.0: Training Terabyte-Scale Datasets with Grace Hopper Superchip

NVIDIA has unveiled a major milestone in scalable machine learning: XGBoost 3.0, now able to train gradient-boosted decision tree (GBDT) models on datasets from gigabytes up to 1 terabyte (TB) on a single GH200 Grace Hopper Superchip. The breakthrough enables companies to process immense datasets for applications like fraud detection, credit risk modeling, and algorithmic trading, simplifying the once-complex process of scaling machine learning (ML) pipelines.

Breaking Terabyte Barriers

At the heart of this advancement is the new External-Memory Quantile DMatrix in XGBoost 3.0. Traditionally, GPU training was limited by available GPU memory, capping achievable dataset size or forcing teams to adopt complex multi-node frameworks. The new release leverages the Grace Hopper Superchip's coherent memory architecture and ultra-fast 900 GB/s NVLink-C2C bandwidth. This enables direct streaming of pre-binned, compressed data from host RAM into the GPU, overcoming the bottlenecks and memory constraints that previously demanded servers with massive RAM or large GPU clusters.

Real-World Gains: Speed, Simplicity, and Cost Savings

Institutions like the Royal Bank of Canada (RBC) have reported up to 16x speed boosts and a 94% reduction in total cost of ownership (TCO) for model training by moving their predictive analytics pipelines to GPU-powered XGBoost. This leap in efficiency is crucial for workflows with constant model tuning and rapidly changing data volumes, allowing banks and enterprises to optimize features faster and scale as data grows.

How It Works: External Memory Meets XGBoost

The new external-memory approach introduces several innovations:

External-Memory Quantile DMatrix: Pre-bins every feature into quantile buckets, keeps data compressed in host RAM, and streams it as needed, maintaining accuracy while reducing GPU memory load.

Scalability on a Single Chip: One GH200 Superchip, with 80GB HBM3 GPU RAM plus 480GB LPDDR5X system RAM, can now handle a full TB-scale dataset—tasks formerly possible only across multi-GPU clusters.

Simpler Integration: For data science teams using RAPIDS, activating the new method is a straightforward drop-in, requiring minimal code changes.

Technical Best Practices

Use grow_policy='depthwise' for tree construction for the best performance with external memory.

Run with CUDA 12.8+ and an HMM-enabled driver for full Grace Hopper support.

Data shape matters: the number of rows (labels) is the main limiter for scaling; at a given row count, wider or narrower tables deliver comparable performance on the GPU.
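To make the drop-in integration and these best practices concrete, the following is a minimal sketch of how the external-memory path is typically wired up, assuming XGBoost 3.0 with CUDA support. The batch loader, batch sizes, and cache location are hypothetical placeholders, not the exact code from the release.

import os
import numpy as np
import xgboost

def load_batch(seed: int):
    # Stand-in for reading one pre-partitioned shard from disk (hypothetical data).
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((100_000, 50), dtype=np.float32)
    y = rng.integers(0, 2, size=100_000).astype(np.float32)
    return X, y

class BatchIterator(xgboost.DataIter):
    """Feeds batches to XGBoost so the full dataset can stay in host RAM."""
    def __init__(self, n_batches: int):
        self._n_batches = n_batches
        self._it = 0
        super().__init__(cache_prefix=os.path.join(".", "xgb_cache"))

    def next(self, input_data):
        if self._it == self._n_batches:
            return False  # no batches left in this epoch
        X, y = load_batch(self._it)
        input_data(data=X, label=y)
        self._it += 1
        return True

    def reset(self):
        self._it = 0

it = BatchIterator(n_batches=4)
Xy = xgboost.ExtMemQuantileDMatrix(it, max_bin=256)  # features pre-binned into quantile buckets
booster = xgboost.train(
    {"tree_method": "hist", "device": "cuda", "grow_policy": "depthwise"},
    Xy,
    num_boost_round=100,
)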

Upgrades

Other highlights in XGBoost 3.0 include:

Experimental support for distributed external memory across GPU clusters.

Reduced memory requirements and initialization time, notably for mostly-dense data.

Support for categorical features, quantile regression, and SHAP explainability in external-memory mode.

Industry Impact

By bringing terabyte-scale GBDT training to a single chip, NVIDIA democratizes access to massive-scale machine learning for financial institutions and other enterprises, paving the way for faster iteration, lower cost, and reduced IT complexity.

XGBoost 3.0 and the Grace Hopper Superchip together mark a major leap forward in scalable, accelerated machine learning.


A Coding Implementation to Advanced LangGraph Multi-Agent Research Pipeline for Automated Insights Generation

We build an advanced LangGraph multi-agent system that leverages Google's free-tier Gemini model for end-to-end research workflows. In this tutorial, we start by installing the necessary libraries (LangGraph, LangChain-Google-GenAI, and LangChain-Core), then walk through defining a structured state, simulating research and analysis tools, and wiring up three specialized agents: Research, Analysis, and Report. Along the way, we show how to simulate web searches, perform data analysis, and orchestrate messages between agents to produce a polished executive report.

!pip install -q langgraph langchain-google-genai langchain-core

import os
from typing import TypedDict, Annotated, List, Dict, Any
from langgraph.graph import StateGraph, END
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
import operator
import json

os.environ["GOOGLE_API_KEY"] = "Use Your Own API Key"

class AgentState(TypedDict):
    messages: Annotated[List[BaseMessage], operator.add]
    current_agent: str
    research_data: dict
    analysis_complete: bool
    final_report: str

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.7)

We install the LangGraph and LangChain-Google-GenAI packages and import the core modules we need to orchestrate our multi-agent workflow. We set our Google API key, define the AgentState TypedDict to structure messages and workflow state, and initialize the Gemini-1.5-Flash model with a 0.7 temperature for balanced responses.

def simulate_web_search(query: str) -> str:
    """Simulated web search - replace with a real API in production"""
    return f"Search results for '{query}': Found relevant information about {query} including recent developments, expert opinions, and statistical data."

def simulate_data_analysis(data: str) -> str:
    """Simulated data analysis tool"""
    return "Analysis complete: Key insights from the data include emerging trends, statistical patterns, and actionable recommendations."

def research_agent(state: AgentState) -> AgentState:
    """Agent that researches a given topic"""
    messages = state["messages"]
    last_message = messages[-1].content

    search_results = simulate_web_search(last_message)

    prompt = f"""You are a research agent. Based on the query: "{last_message}"

Here are the search results: {search_results}

Conduct thorough research and gather relevant information. Provide structured findings with:
1. Key facts and data points
2. Current trends and developments
3. Expert opinions and insights
4. Relevant statistics

Be comprehensive and analytical in your research summary."""

    response = llm.invoke([HumanMessage(content=prompt)])

    research_data = {
        "topic": last_message,
        "findings": response.content,
        "search_results": search_results,
        "sources": ["academic_papers", "industry_reports", "expert_analyses"],
        "confidence": 0.88,
        "timestamp": "2024-research-session"
    }

    return {
        "messages": state["messages"] + [AIMessage(content=f"Research completed on '{last_message}': {response.content}")],
        "current_agent": "analysis",
        "research_data": research_data,
        "analysis_complete": False,
        "final_report": ""
    }

We define simulate_web_search and simulate_data_analysis as placeholder tools that mock retrieving and analyzing information, then implement research_agent to invoke these simulations, prompt Gemini for a structured research summary, and update our workflow state with the findings. We encapsulate the entire research phase in a single function that advances the agent to the analysis stage once the simulated search and structured LLM output are complete.

def analysis_agent(state: AgentState) -> AgentState:
    """Agent that analyzes research data and extracts insights"""
    research_data = state["research_data"]

    analysis_results = simulate_data_analysis(research_data.get('findings', ''))

    prompt = f"""You are an analysis agent. Analyze this research data in depth:

Topic: {research_data.get('topic', 'Unknown')}
Research Findings: {research_data.get('findings', 'No findings')}
Analysis Results: {analysis_results}

Provide deep insights including:
1. Pattern identification and trend analysis
2. Comparative analysis with industry standards
3. Risk assessment and opportunities
4. Strategic implications
5. Actionable recommendations with priority levels

Be analytical and provide evidence-based insights."""

    response = llm.invoke([HumanMessage(content=prompt)])

    return {
        "messages": state["messages"] + [AIMessage(content=f"Analysis completed: {response.content}")],
        "current_agent": "report",
        "research_data": state["research_data"],
        "analysis_complete": True,
        "final_report": ""
    }

def report_agent(state: AgentState) -> AgentState:
    """Agent that generates final comprehensive reports"""
    research_data = state["research_data"]

    analysis_message = None
    for msg in reversed(state["messages"]):
        if isinstance(msg, AIMessage) and "Analysis completed:" in msg.content:
            analysis_message = msg.content.replace("Analysis completed: ", "")
            break

    prompt = f"""You are a professional report generation agent. Create a comprehensive executive report based on:

Research Topic: {research_data.get('topic')}
Research Findings: {research_data.get('findings')}
Analysis Results: {analysis_message or 'Analysis pending'}

Generate a well-structured, professional report with these sections:

## EXECUTIVE SUMMARY
## KEY RESEARCH FINDINGS
[Detail the most important discoveries and data points]

## ANALYTICAL INSIGHTS
[Present deep analysis, patterns, and trends identified]

## STRATEGIC RECOMMENDATIONS
[Provide actionable recommendations with priority levels]

## RISK ASSESSMENT & OPPORTUNITIES
[Identify potential risks and opportunities]

## CONCLUSION & NEXT STEPS
[Summarize and suggest follow-up actions]

Make the report professional, data-driven, and actionable."""

    response = llm.invoke([HumanMessage(content=prompt)])

    return {
        "messages": state["messages"] + [AIMessage(content=f"FINAL REPORT GENERATED:\n\n{response.content}")],
        "current_agent": "complete",
        "research_data": state["research_data"],
        "analysis_complete": True,
        "final_report": response.content
    }

We implement analysis_agent to take the simulated research findings, run them through our mock data analysis tool, prompt Gemini to produce in-depth insights and strategic recommendations, and then transition the workflow to the report stage. We build report_agent to extract the latest analysis and craft a structured executive report via Gemini, with sections ranging from summary to next steps. We then mark the workflow as complete by storing the final report in the state.

def should_continue(state: AgentState) -> str:
    """Determine which agent should run next based on current state"""
    # Each agent writes the *next* agent's name into current_agent,
    # so route straight to that node, or end once the report is complete.
    current_agent = state.get("current_agent", "research")

    if current_agent == "analysis":
        return "analysis"
    elif current_agent == "report":
        return "report"
    else:
        return END

workflow = StateGraph(AgentState)

workflow.add_node("research", research_agent)
workflow.add_node("analysis", analysis_agent)
workflow.add_node("report", report_agent)

workflow.add_conditional_edges(
    "research",
    should_continue,
    {"analysis": "analysis", END: END}
)

workflow.add_conditional_edges(
    "analysis",
    should_continue,
    {"report": "report", END: END}
)

workflow.add_conditional_edges(
    "report",
    should_continue,
    {END: END}
)

workflow.set_entry_point("research")

app = workflow.compile()

def run_research_assistant(query: str):
    """Run the complete research workflow"""
    initial_state = {
        "messages": [HumanMessage(content=query)],
        "current_agent": "research",
        "research_data": {},
        "analysis_complete": False,
        "final_report": ""
    }

    print(f"Starting Multi-Agent Research on: '{query}'")
    print("=" * 60)

    current_state = initial_state

    print("Research Agent: Gathering information...")
    current_state = research_agent(current_state)
    print("Research phase completed!\n")

    print("Analysis Agent: Analyzing findings...")
    current_state = analysis_agent(current_state)
    print("Analysis phase completed!\n")

    print("Report Agent: Generating comprehensive report...")
    final_state = report_agent(current_state)
    print("Report generation completed!\n")

    print("=" * 60)
    print("MULTI-AGENT WORKFLOW COMPLETED SUCCESSFULLY!")
    print("=" * 60)

    final_report = final_state['final_report']
    print("\nCOMPREHENSIVE RESEARCH REPORT:\n")
    print(final_report)

    return final_state

We construct a StateGraph, add our three agents as nodes with conditional edges dictated by should_continue, set the entry point to "research," and compile the graph into an executable workflow. We then define run_research_assistant() to initialize the state, sequentially invoke each agent (research, analysis, and report), print status updates, and return the final report.
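Because the graph is compiled, we can also execute it through LangGraph's invoke API instead of calling each agent by hand; the following is a minimal sketch of that alternative. Note that, as written in this tutorial, each node returns the full message list while the messages channel uses operator.add, so messages accumulate duplicates when run through the graph; returning only newly created messages from each node avoids that.

# Alternative: run the compiled LangGraph directly (minimal sketch).
graph_state = app.invoke({
    "messages": [HumanMessage(content="Impact of renewable energy on global markets")],
    "current_agent": "research",
    "research_data": {},
    "analysis_complete": False,
    "final_report": ""
})
print(graph_state["final_report"])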

if __name__ == "__main__":
    print("Advanced LangGraph Multi-Agent System Ready!")
    print("Remember to set your GOOGLE_API_KEY!")

    example_queries = [
        "Impact of renewable energy on global markets",
        "Future of remote work post-pandemic"
    ]

    print("\nExample queries you can try:")
    for i, query in enumerate(example_queries, 1):
        print(f"  {i}. {query}")

    print("\nUsage: run_research_assistant('Your research question here')")

    result = run_research_assistant("What are emerging trends in sustainable technology?")

We define the entry point that kicks off our multi-agent system, displaying a readiness message, example queries, and reminding us to set the Google API key. We showcase sample prompts to demonstrate how to interact with the research assistant and then execute a test run on “emerging trends in sustainable technology,” printing the end-to-end workflow output.

In conclusion, we reflect on how this modular setup empowers us to rapidly prototype complex workflows. Each agent encapsulates a distinct phase of intelligence gathering, interpretation, and delivery, allowing us to swap in real APIs or extend the pipeline with new tools as our needs evolve. We encourage you to experiment with custom tools, adjust the state structure, and explore alternate LLMs. This framework is designed to grow with your research and product goals. As we iterate, we continually refine our agents’ prompts and capabilities, ensuring that our multi-agent system remains both robust and adaptable to any domain.
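To make the "swap in real APIs" point concrete, here is one way simulate_web_search could be replaced with an HTTP-backed tool. This is only a sketch: the endpoint, query parameters, authorization header, and response fields below are hypothetical placeholders, not any specific provider's API.

import requests  # third-party; pip install requests

def web_search(query: str) -> str:
    """Drop-in replacement for simulate_web_search backed by a real search API.
    The endpoint, parameters, and response shape here are hypothetical."""
    resp = requests.get(
        "https://api.example-search.com/v1/search",  # hypothetical endpoint
        params={"q": query, "num_results": 5},
        headers={"Authorization": "Bearer YOUR_SEARCH_API_KEY"},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return "\n".join(f"- {r.get('title', '')}: {r.get('snippet', '')}" for r in results)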


The DIVA logistics agent, powered by Amazon Bedrock

DTDC is India’s leading integrated express logistics provider, operating the largest network of customer access points in the country. DTDC’s technology-driven logistics solutions cater to a wide range of customers across diverse industry verticals, making them a trusted partner in delivering excellence.
DTDC Express Limited receives over 400,000 customer queries each month, ranging from tracking requests to serviceability checks and shipping rates. With such a high volume of shipments, their existing logistics agent, DIVA, operated on a rigid, guided workflow, forcing users to follow a structured path rather than engage in natural, dynamic conversations. The lack of flexibility resulted in an increased burden on customer support teams, longer resolution times, and a poor customer experience.
DTDC was looking for a more flexible, intelligent assistant—one that could understand context, manage complex queries, and improve efficiency while reducing reliance on human agents. To achieve a better customer experience, DTDC decided to enhance DIVA with generative AI using Amazon Bedrock.
ShellKode is an AWS Partner and a born-in-the-cloud company specializing in modernization, security, data, generative AI, and machine learning (ML). With a mission to drive transformative growth, ShellKode empowers businesses through state-of-the-art technology solutions that address complex challenges and unlock new opportunities. Using deep industry expertise, they deliver tailored strategies that foster innovation, efficiency, and long-term success in an evolving digital landscape.
In this post, we discuss how DTDC and ShellKode used Amazon Bedrock to build DIVA 2.0, a generative AI-powered logistics agent.
Solution overview
To address the limitations of the existing logistics agent, ShellKode built an advanced agentic assistant using Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, and an API integration layer.
When customers interact with DIVA 2.0, they experience a seamless, conversational interface that understands and responds to their queries naturally. Whether tracking a package, checking shipping rates, or inquiring about service availability, users can ask questions in their own words without following a rigid script. DIVA 2.0’s enhanced AI capabilities allow it to understand context, manage complex requests, and provide accurate, personalized responses, significantly improving the overall customer experience and reducing the need for human intervention. The following high-level architecture diagram illustrates the application flow and the solution architecture with AWS services.

The DTDC logistics agent is designed using a modular and scalable architecture to provide seamless integration and high performance. This streamlined workflow demonstrates how a generative AI-powered serverless logistics agent, built with AWS App Runner, Amazon Bedrock Agents, AWS Lambda, and a vector-based knowledge base, intelligently and efficiently handles user queries ranging from tracking requests to serviceability checks and shipping rates.
The logistics agent is hosted as a static website using Amazon CloudFront and Amazon Simple Storage Service (Amazon S3). The logistics agent is integrated with the DTDC website, which provides an intuitive and user-friendly interface for end-user interactions (see the following screenshot).

An end-user accesses the logistics agent through the DTDC website and submits queries in natural language, such as tracking shipments, checking service availability, calculating shipping rates, and FAQs. The user requests are processed by App Runner, which helps run the web application (including API services, backend web services, and websites) on AWS. App Runner hosts multiple API services, such as the Amazon Bedrock Agents API and the Dashboard API, and invokes the Amazon Bedrock Agents API based on the user request.
Amazon Bedrock is a fully managed service that offers a choice of industry-leading foundation models (FMs) along with a broad set of capabilities to build generative AI applications, simplifying development with security, privacy, and responsible AI. With Amazon Bedrock, your content is not used to improve the base models and is not shared with any model providers. Amazon Bedrock Guardrails provides configurable safeguards to help safely build generative AI applications at scale. To learn more, see Build safe and responsible generative AI applications with guardrails. AWS Identity and Access Management (IAM) helps administrators securely control who can be authenticated and authorized to use Amazon Bedrock resources.
The Amazon Bedrock agents are configured in Amazon Bedrock. An Amazon Bedrock agent receives the request and interprets the user’s intent using its natural language understanding capabilities. Based on the interpreted intent, the agent triggers an appropriate Lambda function, such as:

Tracking consignments
Pricing information
Location serviceability check
Support ticket creation

The triggered Lambda function calls the following client APIs, retrieves the relevant data, and returns the response to the agent:

Tracking System API – Retrieves real-time status and provides updates on consignment shipment tracking
Delivery Franchise Location API – Checks the service availability to deliver the parcels between the locations
Pricing System API – Calculates the shipping rates based on shipment details provided by the user
Customer Care API – Creates a support ticket for the end-users

The agent passes the response to the large language model (LLM), in this case Anthropic’s Claude 3.0 on Amazon Bedrock, which understands the context of the retrieved data, processes it, and generates a meaningful response for the user.
The knowledge base contains web-scraped content from the DTDC website, internal support documentation, FAQs, and operational data, enabling real-time updates and accurate responses. The knowledge base contents are stored as vector embeddings in Amazon OpenSearch Service, providing quick and relevant responses. For general queries, the logistics agent fetches information from Amazon Bedrock Knowledge Bases, providing accuracy and relevance. Using semantic similarity search, relevant chunks of information are retrieved from the knowledge base based on the user’s query, which Amazon Bedrock then uses to generate a context-aware response. If no relevant data is found in the knowledge base, a fallback response (preconfigured in the Amazon Bedrock prompt) is returned, indicating that the system couldn’t assist with the request.
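For readers who want to experiment with the same retrieval pattern outside of the agent, Amazon Bedrock exposes a retrieve-and-generate API against a knowledge base. The snippet below is a minimal sketch using boto3; the Region, knowledge base ID, query text, and model ARN are placeholders, not DTDC's actual configuration.

import boto3

# Minimal knowledge-base lookup sketch (all IDs below are placeholders).
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "Which documents are needed to ship a parcel internationally?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(response["output"]["text"])  # grounded answer generated from retrieved chunks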
The logistics agent queries and associated responses are stored in Amazon Relational Database Service (Amazon RDS) for PostgreSQL for enhanced scalability and relational data handling. App Runner initiates the Dashboard API call to update the queries and associated responses in Amazon RDS. We discuss this in more detail in the following section.
Throughout the process, Amazon CloudWatch Logs captures key events such as intent recognition, Lambda invocations, API responses, and fallback triggers for auditing and system monitoring. AWS CloudTrail records and monitors activity in the AWS account, including actions taken by users, roles, or AWS services. It logs these events, which can be used for operational auditing, governance, and compliance.
Amazon GuardDuty is a threat detection service that continuously monitors, analyzes, and processes AWS data sources and logs in your AWS environment. GuardDuty uses threat intelligence feeds, such as lists of malicious IP addresses and domains, file hashes, and ML models to identify suspicious and potentially malicious activity in the AWS environment.
Logistics agent dashboard
The following high-level architecture diagram illustrates the logistics agent dashboard, which captures the end-user interactions and its associated responses.

The logistics agent dashboard is hosted as a static website using CloudFront and Amazon S3. Dashboard access is allowed only for the DTDC admin team.
The dashboard is populated through API calls using Amazon API Gateway with Lambda as a backend, which retrieves the dashboard data from Amazon RDS for PostgreSQL.
The dashboard provides real-time insights into the logistics agent performance, including accuracy, unresolved queries, query categories, session statistics, and user interaction data (see the following screenshot). It provides actionable insights with features such as heat maps, pie charts, and session logs. Real-time data is logged and analyzed on the dashboard, enabling continuous improvement and quick issue resolution.

Solution challenges and benefits
When implementing DIVA 2.0, DTDC and ShellKode faced several significant challenges. Integrating real-time data from multiple legacy systems was crucial for providing accurate, up-to-date information on tracking, rates, and serviceability. This was likely addressed through the robust API integration capabilities of Amazon Bedrock Agents. Another major hurdle was training the AI to understand complex logistics terminology and multi-step queries, which was overcome by using Amazon Bedrock LLMs and Amazon Bedrock Knowledge Bases, fine-tuned with industry-specific data. The team also had to navigate the delicate process of transitioning from the old rigid DIVA system while maintaining service continuity and preserving historical data, potentially employing a phased approach with parallel systems. Finally, scaling the solution to handle over 400,000 monthly queries while maintaining performance was a significant challenge, addressed by using the cloud infrastructure of Amazon Bedrock Agents for optimal scalability and performance. These challenges underscore the complexity of upgrading to an AI-powered system in a high-volume, data-intensive industry like logistics, and highlight how AWS solutions provided the necessary tools to overcome these obstacles.

DTDC realized the following benefits from powering the logistics agent with generative AI using Amazon Bedrock:

Enhanced conversations and real-time data access with customer support agents – Powered by Amazon Bedrock Agents, the solution improves natural language understanding, enabling more fluid and engaging conversations. With multi-step reasoning, it can handle a broader range of queries with greater accuracy. Additionally, by integrating seamlessly with DTDC’s API layer, the logistics agent provides real-time access to vital information, such as tracking shipments, service availability, and calculating shipping rates. The combination of advanced conversational capabilities and real-time data provides fast, accurate, and contextually relevant responses.
Intelligent data processing and accurate FAQ responses – For complex queries, the logistics agent uses LLM technology to process raw data and deliver structured, tailored responses. This makes sure users get clear, actionable insights. For frequently asked questions, the logistics agent uses Amazon Bedrock Knowledge Bases to deliver precise answers without requiring human support, reducing wait times and enhancing the overall user experience.
Reduced live agent dependency and continuous improvement – Although the logistics agent hasn't eliminated the need for customer support, the number of queries handled by the customer support team has been reduced by 51.4%. The system provides valuable insights into key performance metrics like peak query times, unresolved issues, and overall engagement through integrated real-time analytics, helping refine and improve the assistant's capabilities over time.

Results
The generative AI-powered logistics agent has reduced the burden on customer support teams and shortened resolution times, resulting in better customer experience:

Powered by Amazon Bedrock, DIVA 2.0 understands queries in natural language and supports dynamic conversations with a response accuracy of 93%
Based on the last 3 months of dashboard metrics data, they observed the following:

71% of the inquiries were related to consignments (256,048), whereas 29.5% were general inquiries (107,132)
51.4% of consignment inquiries (131,530) didn’t result in a support ticket, whereas 48.6% (124,518) led to new support ticket creation
Of the inquiries that resulted in tickets, 40% started with the customer support center before moving to the AI assistant, whereas 60% began with the assistant before involving the customer support center

DIVA 2.0 has reduced the number of queries handled by the customer support team by 51.4%. DTDC’s support team can now focus on more critical issues, improving overall efficiency.
Summary
This post demonstrated how Amazon Bedrock can transform a traditional chatbot into a generative AI-powered logistics agent that provides a better customer experience through dynamic conversation. For businesses facing similar challenges, this solution offers a blueprint for modernizing your AI assistant while maintaining compliance with industry standards.
To learn more about this AWS solution, contact AWS for further assistance. AWS can provide detailed information about implementation, pricing, and how to tailor the solution to your specific business needs.

About the authors
Rishi Sareen – Chief Information Officer (CIO), DTDC is a seasoned technology leader with over two decades of experience in driving digital transformation, enterprise IT strategy, and innovation across the logistics and supply chain sector. He specializes in building agile, AI-driven, and secure technology ecosystems that enhance operational efficiency and customer experience. Rishi leads initiatives spanning system modernization, data intelligence, automation, cybersecurity, cloud, and artificial intelligence. He is deeply committed to aligning technology with business outcomes while fostering a culture of continuous improvement and purposeful innovation. A strong advocate for people-centric leadership, Rishi places high emphasis on nurturing talent, building high-performing teams, and mentoring future-ready technology leaders who can thrive in dynamic, AI-powered environments. Known for his strategic vision and disciplined execution, he has led large-scale digital initiatives and transformation programs that deliver lasting business impact.
Arunraja Karthick – Head – IT Services & Security (CISO), DTDC is a strategic IT and cybersecurity leader with over 15 years of experience driving enterprise-scale digital transformation. As the Head of IT Services & Security (CISO) at DTDC Express Limited, he leads the organization’s core IT, cloud, and security programs—transforming legacy environments into agile, secure, and cloud-native ecosystems. Under his leadership, DTDC has adopted a hybrid cloud architecture spanning AWS, GCP, and on-prem colocation, with a vision to enable dynamic workload mobility and vendor-neutral scalability. Arunraja has led critical modernization efforts, including the migration of key business applications to microservices and containerized platforms, while ensuring high availability and regulatory compliance. Known for his deep technical insight and execution discipline, he has implemented enterprise-wide cybersecurity frameworks—from Email DLP, Mobile Device Management, and Conditional Access to Hybrid WAF and advanced SOC operations. He has also championed secure access transformation through Zero Trust-aligned Secure WebVPN, redefining how internal users access corporate apps. Arunraja’s leadership is grounded in platform thinking, automation, and a user-first mindset. His recent initiatives include the enterprise rollout of GenAI copilots for customer experience and operations, as well as unified policy-based DLP and content control mechanisms across endpoints and cloud. Recognized as an Influential Technology Leader, Arunraja continues to challenge conventional IT boundaries—aligning security, agility, and innovation to power business evolution.
Bakrudeen K, an AWS Ambassador, leads the AI/ML practice at ShellKode, focusing on driving innovation in artificial intelligence, especially generative AI. He plays a key role in building teams and delivering advanced AI solutions, agentic assistants, and other next-generation technologies. Bakrudeen has made notable contributions to AI/ML research and development. In 2023 and 2024, he received the Generative AI Consulting Excellence Partner Award at the AI Conclave and the Social Impact Partner of the Year Award for Generative AI at AWS re:Invent 2024, both on behalf of ShellKode, reflecting the team's strong commitment to innovation and impact in the AI space.
Suresh Kanniappan is a Solutions Architect at AWS, handling Automotive, Manufacturing and Logistics enterprises in India. He is passionate about cloud security and Industry solutions that can solve real world problems. Prior to AWS, he worked for AWS customers and partners in consulting, migration and solution architecture roles for over 14 years.
Sid Chandilya is a Sr. Customer Relations Manager at AWS, responsible for tech-led business transformation with Automotive, Manufacturing and Logistics enterprises in India. Sid is particularly passionate about challenging the status quo, building a joint "Think Big" vision with customer CXOs and leveraging AI-infused tech to accelerate outcomes. He is known for his deep understanding of industry imperatives (working backward from the customer) and translating business pain points into tech solutions.

Automate enterprise workflows by integrating Salesforce Agentforce with Amazon Bedrock Agents

AI agents are rapidly transforming enterprise operations. Although a single agent can perform specific tasks effectively, complex business processes often span multiple systems, requiring data retrieval, analysis, decision-making, and action execution across different systems. With multi-agent collaboration, specialized AI agents can work together to automate intricate workflows.
This post explores a practical collaboration, integrating Salesforce Agentforce with Amazon Bedrock Agents and Amazon Redshift, to automate enterprise workflows.
Multi-agent collaboration in Enterprise AI
Enterprise environments today are complex, featuring diverse technologies across multiple systems. Salesforce and AWS provide distinct advantages to customers. Many organizations already maintain significant infrastructure on AWS, including data, AI, and various business applications such as ERP, finance, supply chain, HRMS, and workforce management systems. Agentforce delivers powerful AI-driven agent capabilities that are grounded in enterprise context and data. While Salesforce provides a rich source of trusted business data, customers increasingly need agents that can access and act on information across multiple systems. By integrating AWS-powered AI services into Agentforce, organizations can orchestrate intelligent agents that operate across Salesforce and AWS, unlocking the strengths of both.
Agentforce and Amazon Bedrock Agents can work together in flexible ways, leveraging the unique strengths of both platforms to deliver smarter, more comprehensive AI workflows. Example collaboration models include:

Agentforce as the primary orchestrator:

Manages end-to-end customer-oriented workflows
Delegates specialized tasks to Amazon Bedrock Agents as needed through custom actions
Coordinates access to external data and services across systems

This integration creates a more powerful solution that maximizes the benefits of both Salesforce and AWS, so you can achieve better business outcomes through enhanced AI capabilities and cross-system functionality.
Agentforce overview
Agentforce brings digital labor to every employee, department, and business process, augmenting teams and elevating customer experiences. It works seamlessly with your existing applications, data, and business logic to take meaningful action across the enterprise. And because it's built on the trusted Salesforce platform, your data stays secure, governed, and in your control. With Agentforce, you can:

Deploy prebuilt agents designed for specific roles, industries, or use cases
Enable agents to take action with existing workflows, code, and APIs
Connect your agents to enterprise data securely
Deliver accurate and grounded outcomes through the Atlas Reasoning Engine

Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases overview
Amazon Bedrock is a fully managed AWS service offering access to high-performing foundation models (FMs) from various AI companies through a single API. In this post, we discuss the following features:

Amazon Bedrock Agents – Managed AI agents use FMs to understand user requests, break down complex tasks into steps, maintain conversation context, and orchestrate actions. They can interact with company systems and data sources through APIs (configured through action groups) and access information through knowledge bases. You provide instructions in natural language, select an FM, and configure data sources and tools (APIs), and Amazon Bedrock handles the orchestration.
Amazon Bedrock Knowledge Bases – This capability enables agents to perform Retrieval Augmented Generation (RAG) using your company’s private data sources. You connect the knowledge base to your data hosted in AWS, such as in Amazon Simple Storage Service (Amazon S3) or Amazon Redshift, and it automatically handles the vectorization and retrieval process. When asked a question or given a task, the agent can query the knowledge base to find relevant information, providing more accurate, context-aware responses and decisions without needing to retrain the underlying FM.

Agentforce and Amazon Bedrock Agent integration patterns
Agentforce can call Amazon Bedrock agents in different ways, allowing flexibility to build different architectures. The following diagram illustrates synchronous and asynchronous patterns.

For a synchronous or request-reply interaction, Agentforce uses custom agent actions facilitated by External Services, Apex Invocable Methods, or Flow to call an Amazon Bedrock agent. The authentication to AWS is facilitated using named credentials. Named credentials are designed to securely manage authentication details for external services integrated with Salesforce. They alleviate the need to hardcode sensitive information like user names and passwords, minimizing the risk of exposure and potential data breaches. This separation of credentials from the application code can significantly enhance security posture. Named credentials streamline integration by providing a centralized and consistent method for handling authentication, reducing complexity and potential errors. You can use Salesforce Private Connect to provide a secure private connection with AWS using AWS PrivateLink. Refer to Private Integration Between Salesforce and Amazon API Gateway for additional details.

For asynchronous calls, Agentforce uses Salesforce Event Relay and Flow with Amazon EventBridge to call an Amazon Bedrock agent.

In this post, we discuss the synchronous call pattern. We encourage you to explore Salesforce Event Relay with EventBridge to build event-driven agentic AI workflows. Agentforce also offers the Agent API, which makes it straightforward to call an Agentforce agent from an Amazon Bedrock agent, using EventBridge API destinations, for bi-directional agentic AI workflows.
Solution overview
To illustrate the multi-agent collaboration between Agentforce and AWS, we use the following architecture, which provides access to Internet of Things (IoT) sensor data to the Agentforce agent and handles potentially erroneous sensor readings using a multi-agent approach.

The example workflow consists of the following steps:

Coral Cloud has equipped its rooms with smart air conditioners and temperature sensors. These IoT devices capture critical information, such as room temperature and error codes, and store it in Coral Cloud's AWS database in Amazon Redshift.
The Agentforce agent calls an Amazon Bedrock agent through the Agent Wrapper API with questions such as "What is the temperature in room 123?" to answer customer questions related to the comfort of the room. This API is implemented as an AWS Lambda function, acting as the entry point in the AWS Cloud.
The Amazon Bedrock agent, upon receiving the request, needs context. It queries its configured knowledge base by generating the necessary SQL query.
The knowledge base is connected to a Redshift database containing historical sensor data or contextual information (like the sensor’s thresholds and maintenance history). It retrieves relevant information based on the agent’s query and responds back with an answer.
With the initial data and the context from the knowledge base, the Amazon Bedrock agent uses its underlying FM and natural language instructions to decide the appropriate action. In this scenario, detecting an error prompts it to create a case when it receives erroneous readings from a sensor.
The action group contains the Agentforce Agent Wrapper Lambda function. The Amazon Bedrock agent securely passes the necessary details (like which sensor or room needs a case) to this function.
The Agentforce Agent Wrapper Lambda function acts as an adapter. It translates the request from the Amazon Bedrock agent into the specific format required by the Agentforce service's API or interface.
The Lambda function calls Agentforce, instructing it to create a case associated with the contact or account linked to the sensor that sent the erroneous reading.
Agentforce uses its internal logic (agent, topics, and actions) to create or escalate the case within Salesforce.

This workflow demonstrates how Amazon Bedrock Agents orchestrates tasks, using Amazon Bedrock Knowledge Bases for context and action groups (through Lambda) to interact with Agentforce to complete the end-to-end process.
Prerequisites
Before building this architecture, make sure you have the following:

AWS account – An active AWS account with permissions to use Amazon Bedrock, Lambda, Amazon Redshift, AWS Identity and Access Management (IAM), and API Gateway.
Amazon Bedrock access – Access to Amazon Bedrock Agents and to Anthropic’s Claude 3.5 Haiku v1 enabled in your chosen AWS Region.
Redshift resources – An operational Redshift cluster or Amazon Redshift Serverless endpoint. The relevant tables containing sensor data (historical readings, sensor thresholds, and maintenance history) must be created and populated.
Agentforce system – Access to and understanding of the Agentforce system, including how to configure it. You can sign up for a developer edition with Agentforce and Data Cloud.
Lambda knowledge – Familiarity with creating, deploying, and managing Lambda functions (using Python).
IAM roles and policies – Understanding of how to create IAM roles with the necessary permissions for Amazon Bedrock Agents, Lambda functions (to call Amazon Bedrock, Amazon Redshift, and the Agentforce API), and Amazon Bedrock Knowledge Bases.

Prepare Amazon Redshift data
Make sure your data is structured and available in your Redshift instance. Note the database name, credentials, and table and column names.
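As an illustration only, the following sketch creates a hypothetical sensor-readings table through the Amazon Redshift Data API with boto3. The schema name, table layout, cluster identifier, and database user are placeholders to adapt to your own environment, not the exact schema used in this post.

import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# Hypothetical table for IoT readings; adjust names and types to your data.
ddl = """
CREATE TABLE IF NOT EXISTS knowledgebase.sensor_readings (
    device_id     VARCHAR(32),
    room_number   VARCHAR(16),
    temperature   DECIMAL(5, 2),
    error_code    VARCHAR(16),
    reading_time  TIMESTAMP
);
"""

redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder cluster name
    Database="dev",
    DbUser="awsuser",                         # placeholder database user
    Sql=ddl,
)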
Create IAM roles
For this post, we create two IAM roles:

custom_AmazonBedrockExecutionRoleForAgents:

Attach the following AWS managed policies to the role:

AmazonBedrockFullAccess
AmazonRedshiftDataFullAccess

In the trust relationship, provide the following trust policy (provide your AWS account ID):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonBedrockAgentBedrockFoundationModelPolicyProd",
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOUR_ACCOUNT_ID"
                }
            }
        }
    ]
}

custom_AWSLambdaExecutionRole:

Attach the following AWS managed policies to the role:

AmazonBedrockFullAccess
AWSLambdaBasicExecutionRole

In the trust relationship, provide the following trust policy (provide your AWS account ID):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOUR_ACCOUNT_ID"
                }
            }
        }
    ]
}

Create an Amazon Bedrock knowledge base
Complete the following steps to create an Amazon Bedrock knowledge base:

On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane.
Choose Create, and then choose Knowledge Base with structured data store.

On the Provide Knowledge Base details page, provide the following information:

Enter a name and optional description.
For Query engine, select Amazon Redshift.
For IAM permissions, select Use an existing service role and choose custom_AmazonBedrockExecutionRoleForAgents.
Choose Next.

For Query engine connection details, select Redshift provisioned and choose your cluster.
For Authentication, select IAM Role.
For Storage configuration, select Amazon Redshift database and Redshift database list.
On the Configure query engine page, provide the following information:
Provide table and column descriptions. The following is an example.
Choose Create Knowledge Base.

After you create the knowledge base, open the Redshift query editor and grant permissions for the role to access Redshift tables by running the following queries:

CREATE USER "IAMR:custom_AmazonBedrockExecutionRoleForAgents" WITH PASSWORD DISABLE;

GRANT SELECT ON ALL TABLES IN SCHEMA dev.knowledgebase TO "IAMR:custom_AmazonBedrockExecutionRoleForAgents";

GRANT USAGE ON SCHEMA dev.knowledgebase TO "IAMR:custom_AmazonBedrockExecutionRoleForAgents";

For more information, refer to set up your query engine and permissions for creating a knowledge base with structured data store.

Choose Sync to sync the query engine.

Make sure the status shows as Complete before moving to the next steps.

When the sync is complete, choose Test Knowledge Base.
Select Retrieval and response generation: data sources and model and choose Claude 3.5 Haiku for the model.
Enter a question about your data and make sure you get a valid answer.

Create an Amazon Bedrock agent
Complete the following steps to create an Amazon Bedrock agent:

On the Amazon Bedrock console, choose Agents in the navigation pane.
Choose Create agent.
On the Agent details page, provide the following information:

Enter a name and optional description.
For Agent resource role, select Use an existing service role and choose custom_AmazonBedrockExecutionRoleForAgents.

Provide detailed instructions for your agent. The following is an example:

You are an IoT device monitoring and alerting agent.
You have access to the structured data containing reading, maintenance, threshold data for IoT devices.
You answer questions about device reading, maintenance schedule and thresholds.
You can also create case via Agentforce.
When you receive comma separated values parse them as device_id, temperature, voltage, connectivity and error_code.
First check if the temperature is less than min temperature, more than max temperature and connectivity is more than the connectivity threshold for the product associated with the device id.
If there is an error code, send information to agentforce to create case. The information sent to agentforce should include device readings such as device id, error code.
It should also include the threshold values related to the product associated with the device such as min temperature, max temperature and connectivity,
In response to your call to agentforce just return the summary of the information provided with all the attributes provided.
Do not omit any information in the response. Do not include the word escalated in agent.

Choose Save to save the agent.
Add the knowledge base you created in the previous step to this agent.

Provide detailed instructions about the knowledge base for the agent.

Choose Save and then choose Prepare the agent.
Test the agent by asking a question (in the following example, we ask about sensor readings).

Choose Create alias.
On the Create alias page, provide the following information:

Enter an alias name and optional description.
For Associate version, select Create a new version and associate it to this alias.
For Select throughput, select On-demand.
Choose Create alias.

Note down the agent ID and alias ID, which you will use in subsequent steps.

Create a Lambda function
Complete the following steps to create a Lambda function to receive requests from Agentforce:

On the Lambda console, choose Functions in the navigation pane.
Choose Create function.
Configure the function with the following logic to receive requests through API Gateway and call Amazon Bedrock agents:

import boto3
import uuid
import json
import pprint
import traceback
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

bedrock_agent_runtime_client = boto3.client(
    service_name="bedrock-agent-runtime",
    region_name="REGION_NAME",  # replace with the Region name from your account
)

def lambda_handler(event, context):
    logger.info(event)
    body = event["body"]
    input_text = json.loads(body)["inputText"]
    agent_id = "XXXXXXXX"       # replace with the agent ID from your account
    agent_alias_id = "XXXXXXX"  # replace with the alias ID from your account

    response = call_agent(input_text, agent_id, agent_alias_id)
    print("response : ")
    print(response)

    return {
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Headers": "*",
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Methods": "*"
        },
        "statusCode": 200,
        "body": json.dumps({"outputText": response})
    }

def call_agent(inputText, agentId, agentAliasId):
    session_id = str(uuid.uuid1())  # random session identifier
    enable_trace = False
    end_session = False
    try:
        agent_response = bedrock_agent_runtime_client.invoke_agent(
            inputText=inputText,
            agentId=agentId,
            agentAliasId=agentAliasId,
            sessionId=session_id,
            enableTrace=enable_trace,
            endSession=end_session
        )
        logger.info("Agent raw response:")
        pprint.pprint(agent_response)
        if "completion" not in agent_response:
            raise ValueError("Missing 'completion' in agent response")
        # The completion is an event stream; accumulate the streamed text chunks
        completion_text = ""
        for event in agent_response["completion"]:
            chunk = event.get("chunk")
            if chunk:
                completion_text += chunk.get("bytes").decode()
        if completion_text:
            return completion_text
        return "No response chunk was returned by the agent."
    except Exception as e:
        print(traceback.format_exc())
        return f"Error: {str(e)}"
Define the necessary IAM permissions by assigning custom_AWSLambdaExecutionRole.

Create a REST API
Complete the following steps to create a REST API in API Gateway:

On the API Gateway console, create a REST API with proxy integration.

Enable the API Key Required setting to protect the API from unauthenticated access.

Configure the usage plan and API key. For more details, see Set up API keys for REST APIs in API Gateway.
Deploy the API.
Note down the Invoke URL to use in subsequent steps.
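Before wiring up Salesforce, it can help to smoke test the deployed endpoint directly. The following is a minimal sketch using the Python requests library; the invoke URL, resource path, and API key are placeholders from your own API Gateway deployment.

import requests  # third-party; pip install requests

invoke_url = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/proxy"  # placeholder
api_key = "YOUR_API_KEY"  # placeholder usage-plan key

resp = requests.post(
    invoke_url,
    headers={"x-api-key": api_key, "Content-Type": "application/json"},
    json={"inputText": "What is the latest reading for sensor with device id CITDEV003?"},
    timeout=30,
)
print(resp.status_code)
print(resp.json().get("outputText"))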

Create named credentials in Salesforce
Now that you have created an Amazon Bedrock agent with an API Gateway endpoint and Lambda wrapper, let’s complete the configuration on the Salesforce side. Complete the following steps:

Log in to Salesforce.
Navigate to Setup, Security, Named Credentials.
On the External Credentials tab, choose New.

Provide the following information:

Enter a label and name.
For Authentication Protocol, choose Custom.
Choose Save.

Open the External Credentials entry to provide additional details:

Under Principals, create a new principal and provide the parameter name and value.

Under Custom Headers, create a new entry and provide a name and value.
Choose Save.

Now you can grant access to the agent user to access these credentials.

Navigate to Setup, Users, User Profile, Enabled External Credential Principal Access and add the external credential principal you created to the allow list.

Choose New to create a named credentials entry.
Provide details such as label, name, the URL of the API Gateway endpoint, and authentication protocol, then choose Save.

You can optionally use Salesforce Private Connect with PrivateLink to provide a secure private connection with AWS. This allows critical data to flow from the Salesforce environment to AWS without using the public internet.
Add an external service in Salesforce
Complete the following steps to add an external service in Salesforce:

In Salesforce, navigate to Setup, Integrations, External Services and choose Add an External Service.
For Select an API source, choose From API Specification.

On the Edit an External Service page, provide the following information:

Enter a name and optional description.
For Service Schema, choose Upload from local.
For Select a Named Credential, choose the named credential you created.

Upload an Open API specification for the API Gateway endpoint. See the following example:

openapi: 3.0.0
info:
  title: Bedrock Agent Wrapper API
  version: 1.0.0
  description: Bedrock Agent Wrapper API
paths:
  /proxy:
    post:
      operationId: call-bedrock-agent
      summary: Call Bedrock Agent
      description: Call Bedrock Agent
      requestBody:
        description: input
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/input'
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/output'
        '500':
          description: Server error
components:
  schemas:
    input:
      type: object
      properties:
        inputText:
          type: string
        agentId:
          type: string
        agentAlias:
          type: string
    output:
      type: object
      properties:
        outputText:
          type: string

Choose Save and Next.
Enable the operation to make it available for Agentforce to invoke.
Choose Finish.

Create an Agentforce agent action to use the external service
Complete the following steps to create an Agentforce agent action:

In Salesforce, navigate to Setup, Agentforce, Einstein Generative AI, Agentforce Studio, Agentforce Assets.
On the Actions tab, choose New Agent Action.
Under Connect to an existing action, provide the following information:

For Reference Action Type, choose API.
For Reference Action Category, choose External Services.
For Reference Action, choose the Call Bedrock Agent action that you configured.
Enter an agent action label and API name.
Choose Next.

Provide the following information to complete the agent action configuration:

For Agent Action Instructions, enter Call Bedrock Agent to get the information about device readings, sensor readings, maintenance or threshold information.
For Loading Text, enter Calling Bedrock Agent.
Under Input, for Body, enter Provide the input in the input Text field.
Under Outputs, for 200, enter Successful response.

Save the agent action.

Configure the Agentforce agent to use the agent action
Complete the following steps to configure the Agentforce agent to use the agent action:

In Salesforce, navigate to Setup, Agentforce, Einstein Generative AI, Agentforce Studio, Agentforce Agents and open the agent in Agent Builder.
Create a new topic.
On the Topic Configuration tab, provide the following information:

For Name, enter Device Information.
For Classification Description, enter This topic handles inquiries related to device and sensor information, including reading, maintenance, and threshold.
For Scope, enter Your job is only to provide information about device readings, sensor readings, device maintenance, sensor maintenance, and threshold. Do not attempt to address issues outside of providing device information.
For Instructions, enter the following:

If a user asks for device readings or sensor readings, provide the information.
If a user asks for device maintenance or sensor maintenance, provide the information.
When searching for device information, include the device or sensor id and any relevant keywords in your search query.

On the This Topic’s Actions tab, choose New and Add from Asset Library.

Choose the Call Bedrock Agent action.

Activate the agent and enter a question, such as “What is the latest reading for sensor with device id CITDEV003.”

The agent will indicate that it is calling the Amazon Bedrock agent, as shown in the following screenshot.

The agent will fetch the information using the Amazon Bedrock agent from the associated knowledge base.
Clean up
To avoid additional costs, delete the resources that you created when you no longer need them:

Delete the Amazon Bedrock knowledge base:

On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane.
Select the knowledge base you created and choose Delete.

Delete the Amazon Bedrock agent:

On the Amazon Bedrock console, choose Agents in the navigation pane.
Select the agent you created and choose Delete.

Delete the Lambda function:

On the Lambda console, choose Functions in the navigation pane.
Select the function you created and choose Delete.

Delete the REST API:

On the API Gateway console, choose APIs in the navigation pane.
Select the REST API you created and choose Delete.

Conclusion
In this post, we described an architecture that demonstrates the power of combining AI services on AWS with Agentforce. By using Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases for contextual understanding through RAG, and Lambda functions and API Gateway to bridge interactions with Agentforce, businesses can build sophisticated, automated workflows. As AI capabilities continue to grow, such collaborative multi-agent systems will become increasingly central to enterprise automation strategies. In an upcoming post, we will show you how to build the asynchronous integration pattern from Agentforce to Amazon Bedrock using Salesforce Event Relay.
To get started, see Become an Agentblazer Innovator and refer to How Amazon Bedrock Agents works.

About the authors
Yogesh Dhimate is a Sr. Partner Solutions Architect at AWS, leading technology partnership with Salesforce. Prior to joining AWS, Yogesh worked with leading companies including Salesforce driving their industry solution initiatives. With over 20 years of experience in product management and solutions architecture Yogesh brings unique perspective in cloud computing and artificial intelligence.
Kranthi Pullagurla has more than 20 years of experience in application integration and cloud migrations across multiple cloud providers. He works with AWS Partners to build solutions on AWS that our joint customers can use. Prior to joining AWS, Kranthi was a strategic advisor at MuleSoft (now Salesforce). Kranthi has experience advising C-level customer executives on their digital transformation journey in the cloud.
Shitij Agarwal is a Partner Solutions Architect at AWS. He creates joint solutions with strategic ISV partners to deliver value to customers. When not at work, he is busy exploring New York city and the hiking trails that surround it, and going on bike rides.
Ross Belmont is a Senior Director of Product Management at Salesforce covering Platform Data Services. He has more than 15 years of experience with the Salesforce ecosystem.
Sharda Rao is a Senior Director of Product Management at Salesforce covering Agentforce go-to-market strategy.
Hunter Reh is an AI Architect at Salesforce and a passionate builder who has developed over 100 agents since the launch of Agentforce. Outside of work, he enjoys exploring new trails on his bike or getting lost in a great book.

How Amazon Bedrock powers next-generation account planning at AWS

At AWS, our sales teams create customer-focused documents called account plans to deeply understand each AWS customer’s unique goals and challenges, helping account teams provide tailored guidance and support that accelerates customer success on AWS. As our business has expanded, the account planning process has become more intricate, requiring detailed analysis, reviews, and cross-team alignment to deliver meaningful value to customers. This complexity, combined with the manual review effort involved, has led to significant operational overhead. To address this challenge, we launched Account Plan Pulse in January 2025, a generative AI tool designed to streamline and enhance the account planning process. Implementing Pulse delivered a 37% improvement in plan quality year-over-year, while decreasing the overall time to complete, review, and approve plans by 52%.
In this post, we share how we built Pulse using Amazon Bedrock to reduce review time and provide actionable account plan summaries for ease of collaboration and consumption, helping AWS sales teams better serve our customers. Amazon Bedrock is a comprehensive, secure, and flexible service for building generative AI applications and agents. It connects you to leading foundation models (FMs), services to deploy and operate agents, and tools for fine-tuning, safeguarding, and optimizing models, along with knowledge bases to connect applications to your latest data so that you have everything you need to quickly move from experimentation to real-world deployment.
Challenges with increasing scale and complexity
As AWS continued to grow and evolve, our account planning processes needed to adapt to meet increasing scale and complexity. Before enterprise-ready large language models (LLMs) became available through Amazon Bedrock, we explored rule-based document processing to evaluate account plans, which proved inadequate for handling nuanced content and growing document volumes. By 2024, three critical challenges had emerged:

Disparate plan quality and format – With teams operating across numerous AWS Regions and serving customers in diverse industries, account plans naturally developed variations in structure, detail, and format. This inconsistency made it difficult to make sure critical customer needs were described effectively and consistently. Additionally, the evaluation of account plan quality was inherently subjective, relying heavily on human judgment to assess each plan’s depth, strategic alignment, and customer focus.
Resource-intensive review process – The quality assessment process relied on manual reviews by sales leadership. Though thorough, these reviews consumed valuable time that could otherwise be devoted to strategic customer engagements. As our business scaled, this approach created bottlenecks in plan approval and implementation.
Knowledge silos – We identified untapped potential for cross-team collaboration. Developing methods to extract and share knowledge would transform individual account plans into collective best practices to better serve our customers.

Solution overview
To address these challenges, we designed Pulse, a generative AI solution that uses Amazon Bedrock to analyze and improve account plans. The following diagram illustrates the solution workflow.

The workflow consists of the following steps:

Account plan narrative content is pulled from our CRM system on a scheduled basis through an asynchronous batch processing pipeline.
The data flows through a series of processing stages:

Preprocessing to structure and normalize the data and generate metadata.
LLM inference to analyze content and generate insights.
Validation to confirm quality and compliance.

Results are stored securely for reporting and dashboard visualization.

We’ve integrated Pulse directly with existing sales workflows to maximize user adoption and have established feedback loops that continuously refine performance. The following diagram shows the solution architecture.

In the following sections, we explore the key components of the solution in more detail.
Ingestion
We implement a batch processing pipeline that extracts account plans from our CRM system into Amazon Simple Storage Service (Amazon S3) buckets. A scheduler triggers this pipeline on a regular cadence, facilitating continuous analysis of the most current information.
Preprocessing
Considering the dynamic nature of account plans, they are processed in daily snapshots, with only updated plans included in each run. Preprocessing is conducted at two layers: an extract, transform, and load (ETL) flow layer to organize required files to be processed, and just before model calls as part of input validation. This approach, using the plan’s last modified date, is crucial for avoiding multiple runs on the same content. The preprocessing pipeline handles the daily scheduled job that reads account plan data stored as Parquet files in Amazon S3, extracts text content from HTML fields, and generates structured metadata for each document. To optimize processing efficiency, the system compares document timestamps to process only recently modified plans, significantly reducing computational overhead and costs. The processed text content and metadata are then transformed into a standardized format and stored back to Amazon S3 as Parquet files, creating a clean dataset ready for LLM analysis.
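To make the preprocessing flow more concrete, the following is a hedged sketch of the daily snapshot step described above. The column names, S3 paths, and HTML field are illustrative assumptions rather than the production schema, and reading Parquet directly from Amazon S3 with pandas assumes an s3fs-style filesystem is available.

# Hedged sketch of the daily preprocessing step (column names, S3 paths, and
# the HTML narrative field are assumptions, not the production schema).
from datetime import datetime, timedelta, timezone

import pandas as pd
from bs4 import BeautifulSoup

RAW_PATH = "s3://example-bucket/account-plans/raw/plans.parquet"      # placeholder
CLEAN_PATH = "s3://example-bucket/account-plans/clean/plans.parquet"  # placeholder

def preprocess_daily_snapshot() -> pd.DataFrame:
    plans = pd.read_parquet(RAW_PATH)

    # Only process plans modified since the previous daily run
    cutoff = datetime.now(timezone.utc) - timedelta(days=1)
    recent = plans[pd.to_datetime(plans["last_modified"], utc=True) >= cutoff].copy()

    # Extract plain text from the HTML narrative field
    recent["narrative_text"] = recent["plan_html"].fillna("").map(
        lambda html: BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    )

    # Generate lightweight metadata for downstream LLM analysis
    recent["word_count"] = recent["narrative_text"].str.split().str.len()
    recent["processed_at"] = datetime.now(timezone.utc)

    recent.to_parquet(CLEAN_PATH, index=False)
    return recent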
Analysis with Amazon Bedrock
The core of our solution uses Amazon Bedrock, which provides a variety of model choices and control, data customization, safety and guardrails, cost optimization, and orchestration. We use the Amazon Bedrock FMs to perform two key functions:

Account plan evaluation – Pulse evaluates plans against 10 business-critical categories, creating a standardized Account Plan Readiness Index. This automated evaluation identifies improvement areas with specific improvement recommendations.
Actionable insights – Amazon Bedrock extracts and synthesizes patterns across plans, identifying customer strategic focus and market trends that might otherwise remain isolated in individual documents.

We implement these capabilities through asynchronous batch processing, where evaluation and summarization workloads operate independently. The evaluation process runs each account through 27 specific questions with tailored control prompts, and the summarization process generates topical overviews for straightforward consumption and knowledge sharing.
For this implementation, we use structured output prompting with schema constraints to provide consistent formatting that integrates with our reporting tools.
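The following is a minimal sketch of what a schema-constrained evaluation call against Amazon Bedrock could look like using the Converse API. The model ID, evaluation category, and JSON schema are illustrative assumptions; the actual prompts, categories, and control logic used by Pulse are not shown here.

# Hedged sketch: asking a Bedrock model to score one evaluation category and
# return schema-constrained JSON. Model ID, category, and schema are placeholders.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

SCHEMA_HINT = {
    "score": "integer 0-100",
    "evidence": "short quote from the plan",
    "recommendation": "one specific improvement",
}

def evaluate_category(plan_text: str, category: str = "customer_goals") -> dict:
    prompt = (
        f"Evaluate the following account plan on the category '{category}'. "
        f"Respond ONLY with JSON matching this schema: {json.dumps(SCHEMA_HINT)}\n\n"
        f"ACCOUNT PLAN:\n{plan_text[:8000]}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model choice
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])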
Validation
Our validation framework includes the following components:

Input and output validations are critical as part of the OWASP Top 10 for Large Language Model Applications. Input validation introduces the necessary guardrails and prompt checks, and output validation makes sure the results are structured and constrained to expected responses.
Automated quality and compliance checks against established business rules.
Additional review for outputs that don’t meet quality thresholds.
A feedback mechanism that improves system accuracy over time.

Storage and visualization
The solution includes the following storage and visualization components:

Amazon S3 provides secure storage for all processed account plans and insights.
A daily run cadence refreshes insights and enables progress tracking.
Interactive dashboards offer both executive summaries and detailed plan views.

Engineering for production: Building reliable AI evaluations
When transitioning Pulse from prototype to production, we implemented a robust engineering framework to address three critical AI-specific challenges. First, the non-deterministic nature of LLMs meant identical inputs could produce varying outputs, potentially compromising evaluation consistency. Second, account plans naturally evolve throughout the year with customer relationships, making static evaluation methods insufficient. Third, different AWS teams prioritize different aspects of account plans based on specific customer industry and business needs, requiring flexible evaluation criteria.
To maintain evaluation reliability, we developed a statistical framework using Coefficient of Variation (CoV) analysis across multiple model runs on account plan inputs. The goal is to use the CoV as a correction factor to address the data dispersion, which we achieved by calculating the overall CoV at the evaluated question level. With this approach, we can scientifically measure and stabilize output variability, establish clear thresholds for selective manual reviews, and detect performance shifts requiring recalibration. Account plans falling within confidence thresholds proceed automatically in the system, and those outside established thresholds are flagged for manual review.
We complemented this with a dynamic threshold weighting system that aligns evaluations with organizational priorities by assigning different weights to criteria based on business impact. This customizes thresholds across different account types, for example applying different evaluation parameters to enterprise accounts versus mid-market accounts. These business thresholds undergo periodic review with sales leadership and adjustment based on feedback, so our AI evaluations remain relevant while maintaining quality and saving valuable time.
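The following is a small sketch of how a CoV-based consistency check like the one described above might flag a plan for manual review. The number of runs, the threshold value, and the data layout are assumptions for illustration only.

# Hedged sketch of a coefficient-of-variation (CoV) consistency check.
# The threshold, run count, and score layout are illustrative assumptions.
import statistics

COV_THRESHOLD = 0.15  # illustrative cutoff for automatic acceptance

def needs_manual_review(scores_per_question: dict[str, list[float]]) -> bool:
    """scores_per_question maps an evaluation question ID to the scores
    produced by repeated model runs on the same account plan."""
    covs = []
    for question_id, scores in scores_per_question.items():
        mean = statistics.mean(scores)
        if mean == 0:
            continue
        covs.append(statistics.stdev(scores) / mean)
    overall_cov = statistics.mean(covs) if covs else 0.0
    # Plans whose outputs vary too much across runs fall back to human review
    return overall_cov > COV_THRESHOLD

# Example: three runs over two evaluation questions
print(needs_manual_review({"q1": [78, 80, 76], "q2": [55, 60, 40]}))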
Conclusion
In this post, we shared how Pulse, powered by Amazon Bedrock, has transformed the account planning process for AWS sales teams. Through automated reviews and structured validation, Pulse streamlines quality assessments and breaks down knowledge silos by surfacing actionable customer intelligence across our global organization. This helps our sales teams spend less time on reviews and more time making data-driven decisions for strategic customer engagements.
Looking ahead, we’re excited to enhance Pulse’s capabilities to measure account plan execution by connecting strategic planning with sales activities and customer outcomes. By analyzing account plan narratives, we aim to identify and act on new opportunities, creating deeper insights into how strategic planning drives customer success on AWS.
We plan to continue adopting new Amazon Bedrock capabilities to make our processes more robust. By building flows to orchestrate our workflows, adding Amazon Bedrock Guardrails, and introducing agentic frameworks such as Strands Agents and Amazon Bedrock AgentCore, we can make the pipeline more dynamic in the future.
To learn more about Amazon Bedrock, refer to the Amazon Bedrock User Guide, Amazon Bedrock Workshop: AWS Code Samples, AWS Workshops, and Using generative AI on AWS for diverse content types. For the latest news on AWS, see What’s New with AWS?

About the authors
Karnika Sharma is a Senior Product Manager in the AWS Sales, Marketing, and Global Services (SMGS) org, where she works on empowering the global sales organization to accelerate customer growth with AWS. She’s passionate about bridging machine learning and AI innovation with real-world impact, building solutions that serve both business goals and broader societal needs. Outside of work, she finds joy in plein air sketching, biking, board games, and traveling.
Dayo Oguntoyinbo is a Sr. Data Scientist with the AWS Sales, Marketing, and Global Services (SMGS) organization. He helps both AWS internal teams and external customers take advantage of the power of AI/ML technologies and solutions. Dayo brings over 12 years of cross-industry experience. He specializes in reproducible and full-lifecycle AI/ML, including generative AI solutions, with a focus on delivering measurable business impacts. He holds an MSc (Tech) in Communication Engineering. Dayo is passionate about advancing generative AI/ML technologies to drive real-world impact.
Mihir Gadgil is a Senior Data Engineer in the AWS Sales, Marketing, and Global Services (SMGS) org, specializing in enterprise-scale data solutions and generative AI applications. With 9+ years of experience and a Master’s in Information Technology & Management, he focuses on building robust data pipelines, complex data modeling, and ETL/ELT processes. His expertise drives business transformation through innovative data engineering solutions and advanced analytics capabilities.
Carlos Chinchilla is a Solutions Architect at Amazon Web Services (AWS), where he works with customers across EMEA to implement AI and machine learning solutions. With a background in telecommunications engineering from the Technical University of Madrid, he focuses on building AI-powered applications using both open source frameworks and AWS services. His work includes developing AI assistants, machine learning pipelines, and helping organizations use cloud technologies for innovation.
Sofian Hamiti is a technology leader with over 10 years of experience building AI solutions and leading high-performing teams to maximize customer outcomes. He is passionate about empowering diverse talent to drive global impact and achieve their career aspirations.
Sujit Narapareddy, Head of Data & Analytics at AWS Global Sales, is a technology leader driving global enterprise transformation. He leads data product and platform teams that power AWS’s Go-to-Market through AI-augmented analytics and intelligent automation. With a proven track record in enterprise solutions, he has transformed sales productivity, data governance, and operational excellence. Previously at JPMorgan Chase Business Banking, he shaped next-generation FinTech capabilities through data innovation.

Model Context Protocol (MCP) FAQs: Everything You Need to Know in 2025

The Model Context Protocol (MCP) has rapidly become a foundational standard for connecting large language models (LLMs) and other AI applications with the systems and data they need to be genuinely useful. In 2025, MCP is widely adopted, reshaping how enterprises, developers, and end-users experience AI-powered automation, knowledge retrieval, and real-time decision making. Below is a comprehensive, technical FAQ-style guide to MCP as of August 2025.

What Is the Model Context Protocol (MCP)?

MCP is an open, standardized protocol for secure, structured communication between AI models (such as Claude, GPT-4, and others) and external tools, services, and data sources. Think of it as a universal connector—like USB-C for AI—enabling models to access databases, APIs, file systems, business tools, and more, all through a common language. Developed by Anthropic and released as open-source in November 2024, MCP was designed to replace the fragmented landscape of custom integrations, making it easier, safer, and more scalable to connect AI to real-world systems.

Why Does MCP Matter in 2025?

Eliminates Integration Silos: Before MCP, every new data source or tool required its own custom connector. This was costly, slow, and created interoperability headaches—the so-called “NxM integration problem”.

Enhances Model Performance: By providing real-time, contextually relevant data, MCP allows AI models to answer questions, write code, analyze documents, and automate workflows with far greater accuracy and relevance.

Enables Agentic AI: MCP powers “agentic” AI systems that can autonomously interact with multiple systems, retrieve the latest information, and even take actions (e.g., update a database, send a Slack message, retrieve a file).

Supports Enterprise Adoption: Major tech players like Microsoft, Google, and OpenAI now support MCP, and adoption is surging—some estimates suggest 90% of organizations will use MCP by the end of 2025.

Drives Market Growth: The MCP ecosystem is expanding rapidly, with the market projected to grow from $1.2 billion in 2022 to $4.5 billion in 2025.

How Does MCP Work?

MCP uses a client-server architecture inspired by the Language Server Protocol (LSP), with JSON-RPC 2.0 as the underlying message format. Here’s how it works at a technical level:

Host Application: The user-facing AI application (e.g., Claude Desktop, an AI-enhanced IDE).

MCP Client: Embedded in the host app, it translates user requests into MCP protocol messages and manages connections to MCP servers.

MCP Server: Exposes specific capabilities (e.g., access to a database, a code repository, a business tool). Servers can be local (via STDIO) or remote (via HTTP+SSE).

Transport Layer: Communication happens over standard protocols (STDIO for local, HTTP+SSE for remote), with all messages in JSON-RPC 2.0 format.

Authorization: Recent MCP spec updates (June 2025) clarify how to handle secure, role-based access to MCP servers.

Example Flow: A user asks their AI assistant, “What’s the latest revenue figure?” The MCP client in the app sends a request to the MCP server connected to the company’s finance system. The server retrieves the actual, up-to-date number (not a stale training data guess) and returns it to the model, which then answers the user.
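Because MCP messages are JSON-RPC 2.0, the exchange in the example above might look roughly like the following on the wire. The tool name and argument are hypothetical; the envelope follows the general shape of an MCP tools/call request, and the exact schema is defined in the MCP specification.

# Illustrative only: building the JSON-RPC 2.0 request an MCP client might send
# to call a hypothetical "get_latest_revenue" tool on a finance MCP server.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_latest_revenue",              # hypothetical tool on the server
        "arguments": {"fiscal_quarter": "latest"},
    },
}
print(json.dumps(request, indent=2))

# A successful response carries the result back in the same envelope:
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Q2 revenue: $4.2M"}]},
}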

Who Creates and Maintains MCP Servers?

Developers and Organizations: Anyone can build an MCP server to expose their data or tools to AI applications. Anthropic provides SDKs, documentation, and a growing open-source repository of reference servers (e.g., for GitHub, Postgres, Google Drive).

Ecosystem Growth: Early adopters include Block, Apollo, Zed, Replit, Codeium, and Sourcegraph. These companies use MCP to let their AI agents access live data and execute real functions.

Official Registry: Plans are underway for a centralized MCP server registry, making it easier to discover and integrate available servers.

What Are the Key Benefits of MCP?

Standardization: One protocol for all integrations, reducing development overhead.

Real-Time Data Access: AI models fetch the latest information, not just training data.

Secure, Role-Based Access: Granular permissions and authorization controls.

Scalability: Easily add new data sources or tools without rebuilding integrations.

Performance Gains: Some companies report up to 30% efficiency gains and 25% fewer errors.

Open Ecosystem: Open-source, vendor-neutral, and supported by major AI providers.

What Are the Technical Components of MCP?

Base Protocol: Core JSON-RPC message types for requests, responses, notifications.

SDKs: Libraries for building MCP clients and servers in various languages.

Local and Remote Modes: STDIO for local integrations, HTTP+SSE for remote.

Authorization Spec: Defines how to authenticate and authorize access to MCP servers.

Sampling (Future): Planned feature for servers to request completions from LLMs, enabling AI-to-AI collaboration.
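To make the components listed above concrete, the following is a hedged sketch of a minimal MCP server built with the FastMCP helper from the open-source Python SDK, exposing one tool over STDIO. The revenue-lookup tool is hypothetical, and the SDK details may differ across versions, so treat this as an illustration rather than a reference implementation.

# Hedged sketch of a minimal MCP server using the Python SDK's FastMCP helper
# (pip install mcp). The revenue lookup tool is hypothetical and stubbed.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("finance-demo")

@mcp.tool()
def get_latest_revenue(region: str = "global") -> str:
    """Return the latest revenue figure for a region (stubbed for illustration)."""
    return f"Latest {region} revenue: $4.2M"

if __name__ == "__main__":
    # Serve over STDIO so a local MCP client (e.g., a desktop AI app) can connect
    mcp.run()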

What Are Common Use Cases for MCP in 2025?

Enterprise Knowledge Assistants: Chatbots that answer questions using the latest company documents, databases, and tools.

Developer Tools: AI-powered IDEs that can query codebases, run tests, and deploy changes directly.

Business Automation: Agents that handle customer support, procurement, or analytics by interfacing with multiple business systems.

Personal Productivity: AI assistants that manage calendars, emails, and files across different platforms.

Industry-Specific AI: Healthcare, finance, and education applications that require secure, real-time access to sensitive or regulated data.

What Are the Challenges and Limitations?

Security and Compliance: As MCP adoption grows, ensuring secure, compliant access to sensitive data is a top priority.

Maturity: The protocol is still evolving, with some features (like sampling) not yet widely supported.

Learning Curve: Developers new to MCP need to understand its architecture and JSON-RPC messaging.

Legacy System Integration: Not all older systems have MCP servers available yet, though the ecosystem is expanding rapidly.

FAQ Quick Reference

Is MCP open source? Yes, fully open-source and developed by Anthropic.

Which companies support MCP? Major players include Anthropic, Microsoft, OpenAI, Google, Block, Apollo, and many SaaS/platform providers.

Does MCP replace APIs? No, it standardizes how AI models interact with APIs and other systems—APIs still exist, but MCP provides a unified way to connect them to AI.

How do I get started with MCP? Begin with the official specification, SDKs, and open-source server examples from Anthropic.

Is MCP secure? The protocol includes authorization controls, but implementation security depends on how organizations configure their servers.

Summary

The Model Context Protocol is the backbone of modern AI integration in 2025. By standardizing how AI models access and interact with the world’s data and tools, MCP unlocks new levels of productivity, accuracy, and automation. Enterprises, developers, and end-users all benefit from a more connected, capable, and efficient AI ecosystem—one that’s only just beginning to reveal its full potential.
The post Model Context Protocol (MCP) FAQs: Everything You Need to Know in 2025 appeared first on MarkTechPost.

This AI Paper Introduces C3: A Bilingual Benchmark Dataset and Evaluat …

Spoken Dialogue Models (SDMs) are at the frontier of conversational AI, enabling seamless spoken interactions between humans and machines. Yet, as SDMs become integral to digital assistants, smart devices, and customer service bots, evaluating their true ability to handle the real-world intricacies of human dialogue remains a significant challenge. A new research paper from China introduces C3, a benchmark that directly addresses this gap, providing a comprehensive, bilingual evaluation suite for SDMs that emphasizes the unique difficulties inherent in spoken conversations.

The Unexplored Complexity of Spoken Dialogue

While text-based Large Language Models (LLMs) have benefited from extensive benchmarking, spoken dialogues present a distinct set of challenges:

Phonological Ambiguity: Variations in intonation, stress, pauses, and homophones can entirely alter meaning, especially across languages with tonal elements such as Chinese.

Semantic Ambiguity: Words and sentences with multiple meanings (lexical and syntactic ambiguity) demand careful disambiguation.

Omission and Coreference: Speakers often omit words or use pronouns, relying on context for understanding—a recurring challenge for AI models.

Multi-turn Interaction: Natural dialogue isn’t one-shot; understanding often accumulates over several conversational turns, requiring robust memory and coherent history tracking.

Existing benchmarks for SDMs are often limited to a single language, restricted to single-turn dialogues, and rarely address ambiguity or context-dependency, leaving large evaluation gaps.

C3 Benchmark: Dataset Design and Scope

C3—“A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations”—introduces:

1,079 instances across English and Chinese, intentionally spanning five key phenomena:

Phonological Ambiguity

Semantic Ambiguity

Omission

Coreference

Multi-turn Interaction

Audio-text paired samples enabling true spoken dialogue evaluation (with 1,586 pairs due to multi-turn settings).

Careful manual quality controls: Audio is regenerated or human-voiced to ensure uniform timbre and remove background noise.

Task-oriented instructions crafted for each type of phenomenon, urging SDMs to detect, interpret, resolve, and generate appropriately.

Balanced coverage of both languages, with Chinese examples emphasizing tone and unique referential structures not present in English.

Evaluation Methodology: LLM-as-a-Judge and Human Alignment

The research team introduces an innovative LLM-based automatic evaluation method—using strong LLMs (GPT-4o, DeepSeek-R1) to judge SDM responses, with results closely correlating with independent human evaluation (Pearson and Spearman > 0.87, p < 0.001).

Automatic Evaluation: For most tasks, output audio is transcribed and compared to reference answers by the LLM. For phenomena solely discernible in audio (e.g., intonation), humans annotate responses.

Task-specific Metrics: For omission and coreference, both detection and resolution accuracy are measured.

Reliability Testing: Multiple human raters and robust statistical validation confirm that automatic and human judges are highly consistent.
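The reliability check described above boils down to correlating the automatic judge's scores with human ratings. The following is a small sketch of that computation using SciPy; the score arrays are made up for illustration, not taken from the paper.

# Hedged sketch of the judge-vs-human alignment check the paper reports
# (Pearson and Spearman > 0.87). The score arrays below are illustrative.
from scipy.stats import pearsonr, spearmanr

llm_judge_scores = [0.9, 0.4, 0.7, 0.2, 0.8, 0.6, 0.3, 0.95]
human_scores     = [1.0, 0.5, 0.7, 0.1, 0.9, 0.5, 0.3, 0.90]

pearson_r, pearson_p = pearsonr(llm_judge_scores, human_scores)
spearman_rho, spearman_p = spearmanr(llm_judge_scores, human_scores)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.4f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.4f})")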

Benchmark Results: Model Performance and Key Findings

Results from evaluating six state-of-the-art end-to-end SDMs across English and Chinese reveal:

GPT-4o-Audio-Preview: 55.68% top score (English), 29.45% top score (Chinese)

Qwen2.5-Omni: 51.91% top score (English), 40.08% top score (Chinese)

Analysis by Phenomena:

Ambiguity is Tougher than Context-Dependency: SDMs score significantly lower on phonological and semantic ambiguity than on omission, coreference, or multi-turn tasks—especially in Chinese, where semantic ambiguity drops below 4% accuracy.

Language Matters: All SDMs perform better on English than Chinese in most categories. The gap persists even among models designed for both languages.

Model Variation: Some models (like Qwen2.5-Omni) excel at multi-turn and context tracking, while others (like GPT-4o-Audio-Preview) dominate ambiguity resolution in English.

Omission and Coreference: Detection is usually easier than resolution/completion—demonstrating that recognizing a problem is distinct from addressing it.

Implications for Future Research

C3 conclusively demonstrates that:

Current SDMs are far from human-level in challenging conversational phenomena.

Language-specific features (especially tonal and referential aspects of Chinese) require tailored modeling and evaluation.

Benchmarking must move beyond single-turn, ambiguity-free settings.

The open-source nature of C3, along with its robust bilingual design, provides the foundation for the next wave of SDMs, enabling researchers and engineers to isolate and improve on the most challenging aspects of spoken AI.

Conclusion

The C3 benchmark marks an important advancement in evaluating SDMs, pushing conversations beyond simple scripts toward the genuine messiness of human interaction. By carefully exposing models to phonological, semantic, and contextual complexity in both English and Chinese, C3 lays the groundwork for future systems that can truly understand—and participate in—complex spoken dialogue.

Check out the Paper and GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post This AI Paper Introduces C3: A Bilingual Benchmark Dataset and Evaluation Framework for Complex Spoken Dialogue Modeling appeared first on MarkTechPost.

A Coding Implementation to Build a Self-Adaptive Goal-Oriented AI Agen …

In this tutorial, we dive into building an advanced AI agent system based on the SAGE framework, Self-Adaptive Goal-oriented Execution, using Google’s Gemini API. We walk through each core component of the framework: Self-Assessment, Adaptive Planning, Goal-oriented Execution, and Experience Integration. By combining these, we aim to create an intelligent, self-improving agent that can deconstruct a high-level goal, plan its steps, execute tasks methodically, and learn from its outcomes. This hands-on walkthrough helps us understand the underlying architecture and also demonstrates how to orchestrate complex decision-making using real-time AI generation. Check out the FULL CODES here.

import google.generativeai as genai
import json
import time
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, asdict
from enum import Enum

class TaskStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
We start by importing the necessary libraries, including google.generativeai for interacting with the Gemini model, and Python modules like json, time, and dataclasses for task management. We define a TaskStatus enum to help us track the progress of each task as pending, in progress, completed, or failed. Check out the FULL CODES here.

@dataclass
class Task:
    id: str
    description: str
    priority: int
    status: TaskStatus = TaskStatus.PENDING
    dependencies: List[str] = None
    result: Optional[str] = None

    def __post_init__(self):
        if self.dependencies is None:
            self.dependencies = []

class SAGEAgent:
    """Self-Adaptive Goal-oriented Execution AI Agent"""

    def __init__(self, api_key: str, model_name: str = "gemini-1.5-flash"):
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel(model_name)
        self.memory = []
        self.tasks = {}
        self.context = {}
        self.iteration_count = 0

    def self_assess(self, goal: str, context: Dict[str, Any]) -> Dict[str, Any]:
        """S: Self-Assessment - Evaluate current state and capabilities"""
        assessment_prompt = f"""
        You are an AI agent conducting self-assessment. Respond ONLY with valid JSON, no additional text.

        GOAL: {goal}
        CONTEXT: {json.dumps(context, indent=2)}
        TASKS_PROCESSED: {len(self.tasks)}

        Provide assessment as JSON with these exact keys:
        {{
            "progress_score": <number 0-100>,
            "resources": ["list of available resources"],
            "gaps": ["list of knowledge gaps"],
            "risks": ["list of potential risks"],
            "recommendations": ["list of next steps"]
        }}
        """

        response = self.model.generate_content(assessment_prompt)
        try:
            text = response.text.strip()
            # Strip a markdown code fence if the model wraps its JSON in one
            if text.startswith('```'):
                text = text.split('```')[1]
                if text.startswith('json'):
                    text = text[4:]
            text = text.strip()
            return json.loads(text)
        except Exception as e:
            print(f"Assessment parsing error: {e}")
            return {
                "progress_score": 25,
                "resources": ["AI capabilities", "Internet knowledge"],
                "gaps": ["Specific domain expertise", "Real-time data"],
                "risks": ["Information accuracy", "Scope complexity"],
                "recommendations": ["Break down into smaller tasks", "Focus on research first"]
            }

    def adaptive_plan(self, goal: str, assessment: Dict[str, Any]) -> List[Task]:
        """A: Adaptive Planning - Create dynamic, context-aware task decomposition"""
        planning_prompt = f"""
        You are an AI task planner. Respond ONLY with valid JSON array, no additional text.

        MAIN_GOAL: {goal}
        ASSESSMENT: {json.dumps(assessment, indent=2)}

        Create 3-4 actionable tasks as JSON array:
        [
            {{
                "id": "task_1",
                "description": "Clear, specific task description",
                "priority": 5,
                "dependencies": []
            }},
            {{
                "id": "task_2",
                "description": "Another specific task",
                "priority": 4,
                "dependencies": ["task_1"]
            }}
        ]

        Each task must have: id (string), description (string), priority (1-5), dependencies (array of strings)
        """

        response = self.model.generate_content(planning_prompt)
        try:
            text = response.text.strip()
            if text.startswith('```'):
                text = text.split('```')[1]
                if text.startswith('json'):
                    text = text[4:]
            text = text.strip()

            task_data = json.loads(text)
            tasks = []
            for i, task_info in enumerate(task_data):
                task = Task(
                    id=task_info.get('id', f'task_{i+1}'),
                    description=task_info.get('description', 'Undefined task'),
                    priority=task_info.get('priority', 3),
                    dependencies=task_info.get('dependencies', [])
                )
                tasks.append(task)
            return tasks
        except Exception as e:
            print(f"Planning parsing error: {e}")
            return [
                Task(id="research_1", description="Research sustainable urban gardening basics", priority=5),
                Task(id="research_2", description="Identify space-efficient growing methods", priority=4),
                Task(id="compile_1", description="Organize findings into structured guide", priority=3, dependencies=["research_1", "research_2"])
            ]

    def execute_goal_oriented(self, task: Task) -> str:
        """G: Goal-oriented Execution - Execute specific task with focused attention"""
        execution_prompt = f"""
        GOAL-ORIENTED EXECUTION:
        Task: {task.description}
        Priority: {task.priority}
        Context: {json.dumps(self.context, indent=2)}

        Execute this task step-by-step:
        1. Break down the task into concrete actions
        2. Execute each action methodically
        3. Validate results at each step
        4. Provide comprehensive output

        Focus on practical, actionable results. Be specific and thorough.
        """

        response = self.model.generate_content(execution_prompt)
        return response.text.strip()

    def integrate_experience(self, task: Task, result: str, success: bool) -> Dict[str, Any]:
        """E: Experience Integration - Learn from outcomes and update knowledge"""
        integration_prompt = f"""
        You are learning from task execution. Respond ONLY with valid JSON, no additional text.

        TASK: {task.description}
        RESULT: {result[:200]}...
        SUCCESS: {success}

        Provide learning insights as JSON:
        {{
            "learnings": ["key insight 1", "key insight 2"],
            "patterns": ["pattern observed 1", "pattern observed 2"],
            "adjustments": ["adjustment for future 1", "adjustment for future 2"],
            "confidence_boost": <number -10 to 10>
        }}
        """

        response = self.model.generate_content(integration_prompt)
        try:
            text = response.text.strip()
            if text.startswith('```'):
                text = text.split('```')[1]
                if text.startswith('json'):
                    text = text[4:]
            text = text.strip()

            experience = json.loads(text)
            experience['task_id'] = task.id
            experience['timestamp'] = time.time()
            self.memory.append(experience)
            return experience
        except Exception as e:
            print(f"Experience parsing error: {e}")
            experience = {
                "learnings": [f"Completed task: {task.description}"],
                "patterns": ["Task execution follows planned approach"],
                "adjustments": ["Continue systematic approach"],
                "confidence_boost": 5 if success else -2,
                "task_id": task.id,
                "timestamp": time.time()
            }
            self.memory.append(experience)
            return experience

    def execute_sage_cycle(self, goal: str, max_iterations: int = 3) -> Dict[str, Any]:
        """Execute complete SAGE cycle for goal achievement"""
        print(f"Starting SAGE cycle for goal: {goal}")
        results = {"goal": goal, "iterations": [], "final_status": "unknown"}

        for iteration in range(max_iterations):
            self.iteration_count += 1
            print(f"\nSAGE Iteration {iteration + 1}")

            print("Self-Assessment...")
            assessment = self.self_assess(goal, self.context)
            print(f"Progress Score: {assessment.get('progress_score', 0)}/100")

            print("Adaptive Planning...")
            tasks = self.adaptive_plan(goal, assessment)
            print(f"Generated {len(tasks)} tasks")

            print("Goal-oriented Execution...")
            iteration_results = []

            for task in sorted(tasks, key=lambda x: x.priority, reverse=True):
                if self._dependencies_met(task):
                    print(f"  Executing: {task.description}")
                    task.status = TaskStatus.IN_PROGRESS

                    try:
                        result = self.execute_goal_oriented(task)
                        task.result = result
                        task.status = TaskStatus.COMPLETED
                        success = True
                        print(f"  Completed: {task.id}")
                    except Exception as e:
                        task.status = TaskStatus.FAILED
                        task.result = f"Error: {str(e)}"
                        success = False
                        print(f"  Failed: {task.id}")

                    experience = self.integrate_experience(task, task.result, success)

                    self.tasks[task.id] = task
                    iteration_results.append({
                        "task": asdict(task),
                        "experience": experience
                    })

            self._update_context(iteration_results)

            results["iterations"].append({
                "iteration": iteration + 1,
                "assessment": assessment,
                "tasks_generated": len(tasks),
                # asdict() preserves the TaskStatus enum, so compare with the enum member
                "tasks_completed": len([r for r in iteration_results if r["task"]["status"] == TaskStatus.COMPLETED]),
                "results": iteration_results
            })

            if assessment.get('progress_score', 0) >= 90:
                results["final_status"] = "achieved"
                print("Goal achieved!")
                break

        if results["final_status"] == "unknown":
            results["final_status"] = "in_progress"

        return results

    def _dependencies_met(self, task: Task) -> bool:
        """Check if task dependencies are satisfied"""
        for dep_id in task.dependencies:
            if dep_id not in self.tasks or self.tasks[dep_id].status != TaskStatus.COMPLETED:
                return False
        return True

    def _update_context(self, results: List[Dict[str, Any]]):
        """Update agent context based on execution results"""
        # asdict() preserves the TaskStatus enum, so compare with the enum member
        completed_tasks = [r for r in results if r["task"]["status"] == TaskStatus.COMPLETED]
        self.context.update({
            "completed_tasks": len(completed_tasks),
            "total_tasks": len(self.tasks),
            "success_rate": len(completed_tasks) / len(results) if results else 0,
            "last_update": time.time()
        })

We define a Task data class to encapsulate each unit of work, including its ID, description, priority, and dependencies. Then, we build the SAGEAgent class, which serves as the brain of our framework. It orchestrates the full cycle, self-assessing progress, planning adaptive tasks, executing each task with focus, and learning from outcomes to improve performance in future iterations. Check out the FULL CODES here.

if __name__ == "__main__":
    API_KEY = "Use Your Own API Key Here"

    try:
        agent = SAGEAgent(API_KEY, model_name="gemini-1.5-flash")

        goal = "Research and create a comprehensive guide on sustainable urban gardening practices"

        results = agent.execute_sage_cycle(goal, max_iterations=2)

        print("\n" + "=" * 50)
        print("SAGE EXECUTION SUMMARY")
        print("=" * 50)
        print(f"Goal: {results['goal']}")
        print(f"Status: {results['final_status']}")
        print(f"Iterations: {len(results['iterations'])}")

        for i, iteration in enumerate(results['iterations'], 1):
            print(f"\nIteration {i}:")
            print(f"  Assessment Score: {iteration['assessment'].get('progress_score', 0)}/100")
            print(f"  Tasks Generated: {iteration['tasks_generated']}")
            print(f"  Tasks Completed: {iteration['tasks_completed']}")

        print("\nAgent Memory Entries:", len(agent.memory))
        print("Total Tasks Processed:", len(agent.tasks))

    except Exception as e:
        print(f"Demo requires valid Gemini API key. Error: {e}")
        print("Get your free API key from: https://makersuite.google.com/app/apikey")

We wrap up the tutorial by initializing the SAGEAgent with our Gemini API key and defining a sample goal on sustainable urban gardening. We then execute the full SAGE cycle and print a detailed summary, including progress scores, task counts, and memory insights, allowing us to evaluate how effectively our agent performed across iterations.

In conclusion, we successfully implemented and ran a complete SAGE cycle with our Gemini-powered agent. We observe how the system assesses its progress, dynamically generates actionable tasks, executes them with precision, and refines its strategy through learned experience. This modular design empowers us to extend the framework further for more complex, multi-agent environments or domain-specific applications.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post A Coding Implementation to Build a Self-Adaptive Goal-Oriented AI Agent Using Google Gemini and the SAGE Framework appeared first on MarkTechPost.

Pioneering AI workflows at scale: A deep dive into Asana AI Studio and …

Organizations today face a critical challenge: managing an ever-increasing volume of tasks and information across multiple systems. Although traditional task management tools help organize work, they often fall short in delivering the intelligence needed for truly efficient operations.
Today, we’re excited to announce the integration of Asana AI Studio with Amazon Q index, bringing generative AI directly into your daily workflows. This dynamic combination helps teams work smarter by powering AI workflows at scale with data from everyday applications. The result is seamless automation for key use cases like project intake, campaign management, and product launches, transforming how teams work and deliver results.
In this post, we explore how Asana AI Studio and Amazon Q index transform enterprise efficiency through intelligent workflow automation and enhanced data accessibility. We start by examining the core capabilities of this powerful integration. Then we dive deep into the technical implementation and a step-by-step process of how to set up the Amazon Q Business data accessor and configure the Asana admin console. Throughout the post, we cover essential topics including security considerations, access controls, and best practices for maximizing the value of this integration.
Whether you’re looking to streamline operations, improve decision-making, or break down data silos, this comprehensive guide will show you how to harness the full potential of Asana AI Studio and Amazon Q index in your enterprise environment.
The power of integrating Asana AI Studio with Amazon Q index
Asana AI Studio represents a breakthrough in no-code automation, helping teams create and deploy AI-powered workflows that streamline operations and minimize routine busywork. As the leading work management service for human and AI coordination, Asana offers enterprise-grade solutions that break down traditional data silos and enhance collaboration through AI-driven automation and insights.
Amazon Q index for independent software vendors (ISVs) provides seamless integration of generative AI applications with enterprise data and metadata through an Amazon Q index, so you can search across your application data alongside other enterprise content. This integration capability makes sure ISVs can provide their customers with a unified search experience while maintaining strict security, access controls, and ownership over their data.
With the refined indexing of Amazon Q combined with Asana’s comprehensive work management data, organizations can transform how they harness the power of their data.
Through the Amazon Q Business data accessor capability, Asana can securely tap into and analyze information from diverse business applications, converting previously isolated data into meaningful business intelligence. This unlocks key use cases like project intake, campaign management, and product launches, all while maintaining data security.
The following video demonstrates this solution in action, as a team member uses Asana to quickly access project information, create automated workflows, and receive AI-powered recommendations for task optimization across connected services.

How the integration benefits enterprises
Organizations can transform scattered information into real business outcomes with the combined power of Asana AI Studio and Amazon Q index:

Connect applications in minutes – Use cross-application data quickly and confidently with one-time setup and secure permissions
Build integrated workflows – Design AI-powered workflows that unify teams and applications, with no code required
Scale AI company-wide safely and securely – Access real-time data and insights where your teams already work while the system intelligently maintains strict security protocols and existing access control list (ACL) permissions to help keep sensitive information protected

Spencer Herrick, Principal AI Product Manager at Asana, shares:

“At Asana, we’re committed to helping teams move faster with clarity and confidence. Integrating Amazon Q index into our AI infrastructure is a powerful step forward—it allows us to unify knowledge scattered across enterprise systems and surface highly relevant insights directly in workflows. This not only accelerates decision-making but also improves the quality of execution across teams. AWS has been an exceptional partner throughout this process, and we’re thrilled by the early results and the opportunity to continue innovating on top of this foundation.”

Solution overview
The Amazon Q Business data accessor functions as a secure gateway, linking enterprise tools to the Amazon Q index. This component offers organizations a protected channel through which Asana AI Studio can retrieve information from their Amazon Q index. This enables Asana’s AI features to provide relevant answers to user queries by incorporating data from various connected systems.
The result is an AI-powered experience that combines Asana’s work management capabilities with your organization’s broader information landscape, drawing on enterprise data across multiple systems. When a user asks a question through smart chat or triggers an AI Studio workflow, Asana’s AI orchestrator processes the query, checking Asana’s native data for relevant information and, through the Amazon Q Business data accessor, searching the Amazon Q index hosted in the customer’s AWS environment. The index can aggregate data from various enterprise systems such as Google Workspace, Microsoft 365, and Salesforce using built-in Amazon Q Business connectors. The orchestrator then uses generative AI to provide users with contextual, actionable insights directly within Asana. With this integration, teams can quickly access project documentation, communication history, and other critical information, enhancing productivity and decision-making across every aspect of work management.
The following diagram illustrates the overall solution integrating Asana AI Studio and Amazon Q index in the customer’s environment.

Prerequisites
To use the Amazon Q index integration with Asana’s AI features, you must have the following in place:

An AWS account with the necessary permissions and service access
An active Amazon Q Business setup, configured with AWS IAM Identity Center for user authentication
Asana’s AI features enabled
Super admin rights within your Asana workspace

The Amazon Q Business data accessor facilitates a smooth connection between Asana’s AI tools and your Amazon Q index. To implement this integration, you must perform some initial setup in both the Amazon Q Business environment and your Asana admin console. This typically involves configuring authentication methods, specifying which data sources to include, and setting appropriate access controls.
For a comprehensive guide on setting up Amazon Q Business, refer to Innovate on enterprise data with generative AI & Amazon Q Business application. For Asana-specific steps, you can find detailed instructions in the Asana Help Center for Amazon Q index, but we also cover key steps in this post.
Add Asana as a data accessor in Amazon Q Business
Complete the following steps to add Asana as an Amazon Q Business data accessor:

On the Amazon Q Business console, choose Applications in the navigation pane.
Open the application you want to add a data accessor to.
Choose Data accessors in the navigation pane.

Choose Add data accessor.
For Data accessors, choose Asana.

For External ID, enter the external ID provided to you by Asana.

In Asana, the tenant ID (or external ID) in the Amazon Q Business data accessor is called the domain ID. To retrieve the Asana domain ID, choose your account profile in Asana and choose Admin Console. Choose Settings, and scroll to Domain settings to retrieve the ID.

Choose Create trusted token issuer.

For Trusted token issuer name, enter the name of the trusted token issuer, then choose Create trusted token issuer.

For Data source access, select Allow access to all data sources.
For User access, select All users with application access, or select Define access based on user and groups for more granular access control.
Choose Add data accessor.

After you add Asana as a data accessor successfully, a pop-up window will appear with data accessor configuration details. Share the data accessor configuration details with Asana, because it uses them to make a secure connection from the Asana AI Studio system through the Amazon Q index SearchRelevantContent API to retrieve content from your Amazon Q index.
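For illustration, the following is a hedged sketch of what a SearchRelevantContent call against an Amazon Q index could look like. In practice the call is made by Asana using credentials obtained through the trusted token issuer flow, not with local AWS credentials as shown here, and the parameter names are simplified from the public API reference, so treat them as assumptions.

# Hedged illustration of the Amazon Q Business SearchRelevantContent call a data
# accessor makes against a customer's index. Parameter names are simplified and
# the cross-account authentication flow is omitted.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.search_relevant_content(
    applicationId="<amazon-q-business-application-id>",               # placeholder
    contentSource={"retriever": {"retrieverId": "<retriever-id>"}},   # placeholder
    queryText="What is the launch plan for Project Atlas?",
    maxResults=5,
)

for item in response.get("relevantContent", []):
    print(item.get("documentTitle"), "->", item.get("content", "")[:120])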

Connect an Amazon Q index to the Asana admin console
Connecting an Amazon Q index to Asana requires super admin permission. Complete the following steps to connect:

Log in to the Asana admin console
Choose Settings, Asana AI, and Data connectors.
Choose Link account.
Fill in the required fields with information gathered from the Amazon Q Business console.
Choose Verify connection.

After an Amazon Q index is connected to Asana, the option Data to use from connected apps will appear in the rule builder settings within AI Studio. You can now start running workflow rules with enriched context from third-party data sources like Google Drive, Microsoft Outlook, and Salesforce, as shown in the following screenshot.

You can also use Asana smart chat in your Asana workspace and start asking questions. While interacting, smart chat will automatically search for relevant information across Asana and the connected third-party data sources like Google Drive using Amazon Q index to provide you with instant answers, as shown in the following screenshot.

Clean up
When you’re done using this solution, clean up the resources you created:

On your Asana admin page, navigate to Settings, Asana AI, Data connectors, and disconnect Amazon Q.

On the Amazon Q Business console, delete the Asana data accessor from the Data accessors page. Deleting the data accessor removes users’ permissions and access to it.
Delete the Amazon Q Business application that you created as a prerequisite on the Applications page. Deleting the Amazon Q Business application will remove the associated index and data source connectors and avoid incurring additional costs.

Conclusion
This new Asana and Amazon Q index integration marks a significant advancement in intelligent work management. By merging Asana’s robust work management system with Amazon Q index’s comprehensive search capabilities, organizations can now effortlessly access and use information previously scattered across multiple systems. AI Studio users can craft more intelligent workflows by using external data through the new Data to use from connected apps option in the rule builder, and smart chat users can benefit from automatically cited responses that draw from both Asana and connected third-party sources. All of this is achieved while maintaining security and data ownership, effectively breaking down information silos that often hinder collaboration and efficiency.
Whether you’re aiming to increase efficiency, optimize workflows, or make more informed decisions, the Asana and Amazon Q index integration provides the necessary tools to transform how your organization works. We encourage you to explore the Amazon Q Business console and Asana’s AI features documentation to unlock the full potential of your connected workplace. By embracing this powerful integration, you’re not just improving your work management, you’re setting the stage for a more intelligent and efficient future for your team.

About the Authors
Spencer Herrick is Principal AI Product Manager at Asana. He leads the development of AI-first products that help organizations seamlessly integrate new AI capabilities into collaborative work management. His journey into AI began not in Silicon Valley, but at the Large Hadron Collider in Switzerland, where he worked as an experimental physicist. Since then, Spencer has built and led innovative product teams at Grammarly, Uber, and Postmates, consistently pushing the boundaries of what’s possible at the intersection of AI, personalization, and user experience.
Sourabh Banerjee is a Senior WW GenAI specialist at AWS with over 10 years of experience in leading product and GTM strategy. Sourabh leads global strategy and execution for helping customers and partners unlock business productivity through Amazon Q Business across industries.
Chinmayee Rane is a Generative AI Specialist Solutions Architect at AWS, with a core focus on generative AI. She helps ISVs accelerate the adoption of generative AI by designing scalable and impactful solutions. With a strong background in applied mathematics and machine learning, she specializes in intelligent document processing and AI-driven innovation. Outside of work, she enjoys salsa and bachata dancing.
Bobby Williams is a Senior Solutions Architect at AWS. He has decades of experience designing, building, and supporting enterprise software solutions that scale globally. He works on solutions across industry verticals and horizontals and is driven to create a delightful experience for every customer.
Akhilesh Amara is a Software Development Engineer on the Amazon Q team based in Seattle, WA. He is contributing to the development and enhancement of intelligent and innovative AI tools.
Chanki Nathani is a Sr. Partner Solutions Architect at AWS, focusing on business applications. With over 10 years of experience, he helps ISVs build and scale their SaaS solutions on AWS, specializing in serverless architectures and generative AI implementations. Chanki helps organizations use AWS AI/ML services, including Amazon Bedrock and Amazon Q Business, to transform their applications and enhance user experiences.

Responsible AI for the payments industry – Part 1

The payments industry stands at the forefront of digital transformation, with artificial intelligence (AI) rapidly becoming a cornerstone technology that powers a variety of solutions, from fraud detection to customer service. According to the following Number Analytics report, digital payment transactions are projected to exceed $15 trillion globally by 2027. Generative AI has expanded the scope and urgency of responsible AI in payments, introducing new considerations around content generation, conversational interfaces, and other complex dimensions. As financial institutions and payment solutions providers increasingly adopt AI solutions to enhance efficiency, improve security, and deliver personalized experiences, the responsible implementation of these technologies becomes paramount. According to the following McKinsey report, AI could add an estimated $13 trillion to the global economy by 2030, representing about a 16% increase in cumulative GDP compared with today. This translates to approximately 1.2% additional GDP growth per year through 2030.
AI in payments helps drive technological advancement and strengthens building trust. When customers entrust their financial data and transactions to payment systems, they expect convenience and security, additionally fairness, transparency, and respect for their privacy. AWS recognizes the critical demands facing payment services and solution providers, offering frameworks that can help executives and AI practitioners transform responsible AI into a potential competitive advantage. The following Accenture report has additional statistics and data about responsible AI.
This post explores the unique challenges facing the payments industry in scaling AI adoption, the regulatory considerations that shape implementation decisions, and practical approaches to applying responsible AI principles. In Part 2, we provide practical implementation strategies to operationalize responsible AI within your payment systems.
Payment industry challenges
The payments industry presents a unique landscape for AI implementation, where the stakes are high and the potential impact on individuals is significant. Payment technologies directly impact consumers’ financial transactions and merchant options, making responsible AI practices an important consideration and a critical necessity.
The payments landscape—encompassing consumers, merchants, payment networks, issuers, banks, and payment processors—faces several challenges when implementing AI solutions:

Data classification and privacy – Payment data is among the most sensitive information. In addition to financial details, it also includes patterns that can reveal personal behaviors, preferences, and life circumstances. Due to various regulations, AI systems that process this data are required to maintain the highest standards of privacy protection and data security.
Real-time processing requirements – Payment systems often require split-second decisions, such as approving a transaction, flagging potential fraud, or routing payments. Production AI systems seek to deliver high standards for accuracy, latency, and cost while maintaining security and minimizing friction. This is important because failed transactions or incorrect decisions might result in poor customer experience or other financial loss.
Global operational context – Payment providers often operate across jurisdictions with varying regulatory frameworks and standards. These include India’s Unified Payments Interface (UPI), Brazil’s PIX instant payment system, the United States’ FedNow and Real-Time Payments (RTP) networks, and the European Union’s Payment Services Directive (PSD2) and Single Euro Payments Area (SEPA) regulations. AI systems should be adaptable enough to function appropriately across these diverse contexts while adhering to consistent responsible standards.
Financial inclusion imperatives – The payment industry seeks to expand access to financial services for its customers. It’s important to design AI systems that promote inclusive financial access by mitigating bias and discriminatory outcomes. Responsible AI considerations can help create equitable opportunities while delivering frictionless experiences for diverse communities.
Regulatory landscape – The payments industry navigates one of the economy’s most stringent regulatory environments, with AI implementation adding new layers of compliance requirements:

Global regulatory frameworks – From the EU’s General Data Protection Regulation (GDPR) and the upcoming EU AI Act to the Consumer Financial Protection Bureau (CFPB) guidelines in the US, payment solution providers navigate disparate global requirements, presenting a unique challenge for scaling AI usage across the globe.
Explainability requirements – Regulators increasingly demand that financial institutions be able to explain AI-driven decisions, especially those that impact consumers directly, such as decisions made by multimodal AI that combines biometric, behavioral, and contextual authentication.
Anti-discrimination mandates – Financial regulations in many jurisdictions explicitly prohibit discriminatory practices. AI systems should be designed and monitored to help prevent inadvertent bias in decisions related to payment approvals and comply with fair lending laws.
Model risk management – Regulatory guidance such as the Federal Reserve’s SR 11-7 on model risk management in the US requires financial institutions to validate models, including AI systems, and maintain robust governance processes around their development, implementation, and ongoing monitoring.

The regulatory landscape for AI in financial services continues to evolve rapidly. Payment providers strive to stay abreast of changes and maintain flexible systems that can adapt to new requirements.
Core principles of responsible AI
In the following sections, we review how responsible AI considerations can be applied in the payment industry. The core principles include controllability, privacy and security, safety, fairness, veracity and robustness, explainability, transparency, and governance, as illustrated in the following figure.

Controllability
Controllability refers to the extent to which an AI system behaves as designed, without deviating from its functional objectives and constraints. Controllability promotes practices that keep AI systems within designed limits while maintaining human control. This principle requires robust human oversight mechanisms, allowing for intervention, modification, and fine-grained control over AI-driven financial processes. In practice, this means creating sophisticated review workflows, establishing clear human-in-the-loop protocols for high-stakes financial decisions, and maintaining the ability to override or modify AI recommendations when necessary.
In the payment industry, you can apply controllability in the following ways:

Create human review workflows for high-value or unusual transactions using Amazon Augmented AI (Amazon A2I). For more details, see Automate digitization of transactional documents with human oversight using Amazon Textract and Amazon A2I.
Develop override mechanisms for AI-generated fraud alerts. One possible approach could be implementing a human-in-the-loop system. For an example implementation, refer to Implement human-in-the-loop confirmation with Amazon Bedrock Agents.
Establish clear protocols to flag and escalate AI-related decisions that impact customer financial health. This defines a clear path to follow when discrepancies or anomalies occur, as illustrated in the sketch after this list.
Implement configurable AI systems that can be adjusted based on specific institutional policies. This helps keep AI systems agile and flexible as policies evolve, so model behavior can be steered accordingly.
Design user interfaces (UIs) in which users can provide context or challenge AI-driven decisions.
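The following minimal Python sketch illustrates the kind of escalation logic described above: routing high-value or low-confidence AI decisions to human review. The thresholds and the routing function are hypothetical placeholders rather than a prescribed AWS implementation; in practice the human review step could be backed by a service such as Amazon A2I.

# Illustrative only: thresholds and function names are hypothetical.
def route_decision(transaction_amount: float, model_confidence: float,
                   high_value_threshold: float = 10_000.0,
                   confidence_floor: float = 0.85) -> str:
    """Return 'auto' when the AI decision can stand on its own, else 'human_review'."""
    if transaction_amount >= high_value_threshold:
        return "human_review"  # high-stakes decisions always get human oversight
    if model_confidence < confidence_floor:
        return "human_review"  # uncertain predictions are escalated
    return "auto"

print(route_decision(15_000, 0.97))  # -> human_review (high value)
print(route_decision(120, 0.92))     # -> auto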

Privacy and security: Protecting consumer information
Given the sensitive nature of financial data, privacy and security represent a critical consideration in AI-driven payment systems. A multi-layered protection strategy might include advanced encryption protocols, rigorous data minimization techniques, and comprehensive safeguards for personally identifiable information (PII). Compliance with global data protection regulations is both a legal requirement and a fundamental commitment to protecting individuals’ most sensitive financial information.
In the payment industry, you can maintain privacy and security with the following methods:

Implement advanced encryption for all transaction data. Use AWS Key Management Service (AWS KMS) for encrypting data at rest and Transport Layer Security (TLS) for encrypting data in transit, as shown in the sketch after this list.
Implement environment segmentation for a multi-layered protection strategy.
Apply differential privacy techniques. For more details, see How differential privacy helps unlock insights without revealing data at the individual-level.
Create anonymized datasets for AI training that can’t be traced back to individual customers.
Develop secure multi-factor authentication systems powered by AI. This can be done using multi-factor authentication for IAM.
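As a minimal sketch of the encryption guidance in the first item above, the following code writes a payment record to Amazon S3 with SSE-KMS encryption; the bucket name and KMS key alias are hypothetical placeholders, and boto3 uses TLS endpoints by default for data in transit.

import boto3

s3 = boto3.client("s3")

# Encrypt a payment record at rest with a customer-managed KMS key (SSE-KMS).
s3.put_object(
    Bucket="example-payments-data",             # hypothetical bucket name
    Key="transactions/2025/08/txn-0001.json",
    Body=b'{"amount": 42.50, "currency": "USD"}',
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-payments-key",   # hypothetical key alias
)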

Safety: Mitigating potential risks
Safety in AI-driven payment systems focuses on proactively identifying and mitigating potential risks. This involves developing comprehensive risk assessment frameworks (such as NIST AI Risk Management Framework, which provides structured approaches to govern, map, measure, and manage AI risks), implementing advanced guardrails to help prevent unintended system behaviors, and creating fail-safe mechanisms that protect both payment solutions providers and users from potential AI-related vulnerabilities. The goal is to create AI systems that work well and are fundamentally reliable and trustworthy.
In the payment industry, you can implement safety measures as follows:

Develop guardrails to help prevent unauthorized transaction patterns. One possible way is using Amazon Bedrock Guardrails, as shown in the sketch after this list. For an example solution, see Implement model-independent safety measures with Amazon Bedrock Guardrails.
Create AI systems that can detect and help prevent potential financial fraud in real time.
Implement multi-layered risk assessment models for complex financial products. One possible method is using an Amazon SageMaker inference pipeline.
Design fail-safe mechanisms that can halt AI decision-making during anomalous conditions. This can be done by architecting the system to detect anomalous behavior, flag it, and, where appropriate, add a human in the loop for those transactions.
Implement red teaming and perform penetration testing to identify potential system vulnerabilities before they can be exploited.
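The following minimal sketch shows how a guardrail created in Amazon Bedrock might be attached to a model invocation through the Converse API; the guardrail ID, version, and model ID are hypothetical placeholders and would need to match resources in your account.

import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Explain why my card payment was declined."}],
    }],
    guardrailConfig={
        "guardrailIdentifier": "example-guardrail-id",  # hypothetical guardrail
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])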

Fairness: Detecting and mitigating bias
To create a more inclusive financial landscape and promote demographic parity, fairness should be a key consideration in payments. Financial institutions are required to rigorously examine their AI systems to mitigate potential bias or discriminatory outcomes across demographic groups. This means algorithms and training data for applications such as credit scoring, loan approval, or fraud detection should be carefully calibrated and meticulously assessed for biases.
In the payment industry, you can implement fairness in the following ways:

Assess models and data for the presence and utilization of attributes such as gender, race, or socioeconomic background to promote demographic parity. Tools such as Amazon Bedrock Evaluations or Amazon SageMaker Clarify can help assess bias in data and model output (see the sketch after this list).
Implement observability, monitoring, and alerts using AWS services like Amazon CloudWatch to support regulatory compliance and help verify non-discriminatory outcomes across customer demographics.
Evaluate data used for model training for biases using tools like SageMaker Clarify to correct and mitigate disparities.
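As an illustration of the demographic parity checks these tools automate, the following self-contained sketch computes approval rates and a disparate impact ratio on synthetic data; the column names and values are hypothetical.

import pandas as pd

# Synthetic decisions grouped by a hypothetical protected-style attribute.
df = pd.DataFrame({
    "approved":    [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    "home_region": ["A", "A", "A", "B", "B", "B", "B", "A", "B", "A"],
})

approval_rates = df.groupby("home_region")["approved"].mean()
disparate_impact = approval_rates.min() / approval_rates.max()

print(approval_rates)
print(f"Disparate impact ratio: {disparate_impact:.2f}")  # ratios well below 1.0 warrant review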

These guidelines can be applied for various payment applications and processes, including fraud detection, loan approval, financial risk assessment, credit scoring, and more.
Veracity and robustness: Promoting accuracy and reliability
Truthful and accurate system output is an important consideration for AI in payment systems. By continuously validating AI models, organizations can make sure that financial predictions, risk assessments, and transaction analyses maintain consistent accuracy over time. To achieve robustness, AI systems must maintain performance across diverse scenarios, handle unexpected inputs, and adapt to changing financial landscapes without compromising accuracy or reliability.
In the payment industry, you can apply robustness through the following methods:

Create AI models that maintain accuracy across diverse economic conditions.
Implement rigorous testing protocols that simulate various financial scenarios. For examples of test tools, refer to Test automation.
Create cross-validation mechanisms to verify AI model predictions. SageMaker provides built-in cross-validation capabilities, experiment tracking, and continuous model monitoring, and AWS Step Functions orchestrates complex validation workflows across multiple methods. For critical predictions, Amazon A2I enables human-in-the-loop validation.
Use Retrieval Augmented Generation (RAG) and Amazon Bedrock Knowledge Bases to improve the accuracy of AI-powered payment decision systems and reduce the risk of hallucinations, as shown in the following sketch.
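A minimal sketch of a RAG call against an Amazon Bedrock knowledge base follows; the knowledge base ID and model ARN are hypothetical placeholders, and the exact request shape should be confirmed against the current Bedrock documentation.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our chargeback policy for disputed card payments?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "EXAMPLEKBID",  # hypothetical knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(response["output"]["text"])            # grounded answer
for citation in response.get("citations", []):
    print(citation)                          # source passages used for the answer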

Explainability: Making complex decisions understandable
Explainability bridges the gap between complex AI algorithms and human understanding. In payments, this means developing AI systems that can articulate the reasoning behind their decisions in clear, understandable terms. Whether explaining a risk calculation, a fraud detection flag, or a transaction recommendation, AI should provide insights that are meaningful and accessible to both end users and financial professionals, depending on the business use case.
In the payment industry, you can implement explainability as follows:

Generate consumer-friendly reports that break down complex financial algorithms.
Create interactive tools so users can explore the factors behind their financial assessments.
Develop visualization tools that demonstrate how AI arrives at specific financial recommendations.
Provide regulatory compliance-aligned documentation that explains AI model methodologies.
Design multilevel explanation systems that cater to both technical and non-technical audiences.

Transparency: Articulating the decision-making process
Transparency refers to providing clear, accessible, and meaningful information that helps stakeholders understand the system’s capabilities, limitations, and potential impacts. Transparency transforms AI from an opaque black box into a human-understandable, communicative system. In the payments sector, this principle demands that AI-powered financial decisions be both accurate and explicable. Financial institutions should be able to demonstrate how credit limits are determined, why a transaction might be flagged, or how a financial risk assessment is calculated.
In the payment industry, you can promote transparency in the following ways:

Create interactive dashboards that break down how AI calculates transaction risks. You can use services like Amazon QuickSight to build interactive dashboards and data stories. You can use SageMaker to produce feature importance summaries or SHAP (SHapley Additive exPlanations) reports that quantify how much each input feature contributes to a model’s prediction for a specific instance (see the sketch after this list).
Offer real-time notifications that explain why a transaction was flagged or declined. You can send notifications using Amazon Simple Notification Service (Amazon SNS).
Develop customer-facing tools that help users understand the factors influencing their credit scores. AI agents can provide interactive feedback about the factors involved and deliver more details to users. You can build these AI agents using Amazon Bedrock.
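The following sketch shows per-prediction SHAP attributions for a simple, synthetic transaction-risk model; the features and data are illustrative only, and in a production setting SageMaker Clarify can generate comparable reports.

import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # synthetic features: amount, merchant_risk, velocity
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])   # attribution for one flagged transaction

for name, value in zip(["amount", "merchant_risk", "velocity"], shap_values[0]):
    print(f"{name}: {value:+.3f}")           # sign shows push toward approve or decline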

Governance: Establishing oversight and accountability
Governance establishes the framework for responsible AI implementation and ongoing monitoring and management. In payments, this means creating clear structures for AI oversight, defining roles and responsibilities, and establishing processes for regular review and intervention when necessary. Effective governance makes sure AI systems operate within established responsible AI boundaries while maintaining alignment with organizational values and regulatory requirements.
In the payment industry, you can apply governance as follows:

Implement cross-functional AI review boards with representation from legal, compliance, and ethics teams.
Establish clear escalation paths for AI-related decisions that require human judgment.
Develop comprehensive documentation of AI system capabilities, limitations, and risk profiles.
Create regular audit schedules to evaluate AI performance against responsible AI dimensions.
Design feedback mechanisms that incorporate stakeholder input into AI governance processes.
Maintain version control and change management protocols for AI model updates.

Conclusion
As we’ve explored throughout this guide, responsible AI in the payments industry represents both a strategic imperative and a competitive advantage. By embracing the core principles of controllability, privacy, safety, fairness, veracity, explainability, transparency, and governance, payment providers can build AI systems that enhance efficiency and security while fostering trust with customers and regulators. In an industry where financial data sensitivity and real-time decision-making intersect with global regulatory frameworks, those who prioritize responsible AI practices will be better positioned to navigate challenges while delivering innovative solutions. We invite you to assess your organization’s current AI implementation against these principles and refer to Part 2 of this series, where we provide practical implementation strategies to operationalize responsible AI within your payment systems.
As the payments landscape continues to evolve, organizations that establish responsible AI as a core competency will mitigate risks and build stronger customer relationships based on trust and transparency. In an industry where trust is the ultimate currency, responsible AI is both the right choice and an important business imperative.
To learn more about responsible AI, refer to the AWS Responsible Use of AI Guide.

About the authors
Neelam Koshiya is a Principal Applied AI Architect (GenAI specialist) at AWS. With a background in software engineering, she moved organically into an architecture role. Her current focus is helping enterprise customers with their ML and generative AI journeys for strategic business outcomes. She likes to build content and mechanisms that scale to larger audiences. She is passionate about innovation and inclusion. In her spare time, she enjoys reading and being outdoors.
Ana Gosseen is a Solutions Architect at AWS who partners with independent software vendors in the public sector space. She leverages her background in data management and information sciences to guide organizations through technology modernization journeys, with a particular focus on generative AI implementation. She is passionate about driving innovation in the public sector while championing responsible AI adoption. She spends her free time exploring the outdoors with her family and dog, and pursuing her passion for reading.

Responsible AI for the payments industry – Part 2

In Part 1 of our series, we explored the foundational concepts of responsible AI in the payments industry. In this post, we discuss the practical implementation of responsible AI frameworks.
The need for responsible AI
The implementation of responsible AI is not a passive exercise, but a dynamic process of reimagining how technology can serve a customer’s needs. With a holistic approach that extends beyond the traditional boundaries of technology, responsibility, law, and customer experience, AI can become a powerful, transparent, and trustworthy partner in financial decision-making. Responsible AI is not an additional layer but a core architectural principle that influences every stage of product development. This means redesigning development processes to include responsibility assessment checkpoints. Bias testing becomes as critical as functional testing. In addition to technical specifications, documentation now requires comprehensive explanations of decision-making processes. Accountability is built into the system’s core, with clear mechanisms for tracking and addressing potential responsibility challenges. Tenets of responsible AI should be thought of as part of the product management and application development lifecycle, as highlighted in the following figure:

In the following sections, we provide several recommendations for responsible AI.
The Responsible AI Committee
Consider establishing a Responsible AI Committee for your financial institution. This cross-functional body can serve as a central hub for AI governance, bringing together experts from various disciplines to guide AI innovation and support alignment with responsible AI practices.
Cross-functional oversight: Dismantling organizational boundaries
Traditional organizational structures can create barriers that fragment technological development. Cross-functional oversight breaks down these silos, creating integrated workflows that promote responsible considerations in the AI development process.
This approach might require reenvisioning how different departments collaborate. Doing so can help you integrate compliance as part of a larger AI development process, rather than as final checkpoints. In this setting, legal teams have an opportunity to be strategic partners, and customer experience professionals become translators between technological capabilities and human needs.
The result is a holistic approach where responsible considerations are not added retrospectively but are fundamental to the design process. Every AI system becomes a collaborative creation, refined through multiple lenses of expertise.
Policy documentation: Transforming principles into operational excellence
Policy documentation can help promote frameworks that guide technological innovation. These documents serve as comprehensive blueprints that translate abstract principles into actionable guidelines.
An effective AI policy articulates an organization’s approach to technological development, establishing clear principles around data usage, transparency, fairness, and human-centric design. These policies can also reflect an organization’s commitment to responsible innovation.
Responsible AI as organizational leadership
By creating responsibly grounded, adaptive AI systems, financial institutions can transform technology from a potentially disruptive force into a powerful tool for creating more inclusive, transparent, and trustworthy financial systems. Responsible AI is a continuous journey of innovation, reflection, and commitment to creating technology that helps humans achieve their objectives.
Global collaborative landscape
The landscape of responsible AI in financial services is rapidly evolving, driven by a network of organizations, regulators, and industry leaders committed to making technological innovation responsible, transparent, and socially accountable. From non-profit initiatives like the Responsible AI Institute to industry consortiums such as the Veritas Consortium led by the Monetary Authority of Singapore, these organizations are developing comprehensive frameworks, governance models, and best practices that move past traditional compliance mechanisms, creating holistic approaches to AI implementation that prioritize fairness, accountability, and human-centric design.
This emerging landscape represents a fundamental shift in innovation, with regulators, tech companies, academia, and industry working together to establish AI standards while driving innovation. By developing detailed methodologies for assessing AI solutions, creating open-source governance frameworks, and establishing dedicated committees, these initiatives are mitigating risks and actively shaping a future where AI serves as a powerful, trustworthy tool that enhances financial services while protecting individual rights and societal interests. The collective goal is to make sure AI technologies in payments are developed and deployed with a commitment to transparency, fairness, and responsible considerations. Organizations can establish dedicated mechanisms for monitoring global developments, participating in industry working groups, and maintaining ongoing dialogues with regulatory bodies, academic researchers, and responsible AI experts to make sure their AI strategies remain at the forefront of responsible technological innovation.
AI lifecycle phases
The following figure illustrates the different phases in the AI lifecycle, comprising design, development, deployment, and operation.

In the following sections, we discuss these phases in more detail.
Design phase
The design phase establishes the foundation for AI systems. In this phase, AI builders should consider assessing risks through frameworks like the NIST AI Risk Management Framework. This includes documenting and narrowly defining use cases, stakeholders, risks, and mitigation strategies while recognizing AI’s probabilistic nature, technical limitations, confidence levels, and human review requirements.
In payments and financial services, risk assessment can help identify harmful events for use cases such as fraud detection, transaction authentication, and credit decisioning systems. For example, in use cases where binary outcomes are generated, the design should carefully balance false positives that could block legitimate transactions against false negatives that allow fraudulent ones. Financial regulators often require explainability in automated decisioning processes affecting consumers, adding another layer of complexity to the design considerations.
The following figure shows an example of a decision boundary visualization, or classification boundary plot. It’s a type of scatter plot that displays the training data points (as colored dots) and the decision regions created by different machine learning (ML) classifiers (as colored background regions). This visualization technique is commonly used in ML to compare how different algorithms partition the feature space and make classification decisions. Similar plots can help with responsible AI by making algorithmic decision-making transparent and interpretable, helping stakeholders visually understand how different models create boundaries and how outcomes can potentially differ between groups.
Additionally, the following visualization compares the performance of various ML algorithms.
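A minimal sketch of such a decision boundary comparison on synthetic two-dimensional data is shown below; the dataset and classifiers are illustrative placeholders rather than a payment-specific model.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, clf in zip(axes, [LogisticRegression(), RandomForestClassifier(random_state=0)]):
    clf.fit(X, y)
    zz = clf.predict_proba(grid)[:, 1].reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)       # decision regions
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)  # training points
    ax.set_title(type(clf).__name__)
plt.tight_layout()
plt.show()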

Development phase
The development phase involves collecting and curating training and testing data, building system components, and adapting AI systems into functional applications through an iterative process. Builders define explainability requirements based on risk levels, develop metrics and test plans, and promote data representativeness across demographics and geographies.
Payment AI systems specifically require highly representative training data spanning transaction types, merchant categories, geographic regions, and spending patterns. Data security is paramount, requiring secure storage measures to protect the data. Testing should incorporate diverse scenarios, such as unusual transaction patterns, and performance assessment should use multiple datasets and metrics. Development also includes implementing fairness measures to mitigate bias in credit decisions or fraud flagging, with comprehensive adversarial testing (also known as red teaming) to identify vulnerabilities that could enable financial crime. Adversarial testing is a security evaluation method that actively attempts to break or exploit vulnerabilities in a system, particularly in AI and ML models, by simulating attacks to identify weaknesses and improve the system’s robustness and security. This proactive approach helps uncover potential flaws that might otherwise be exploited by malicious actors. The following screenshot illustrates experiment tracking and a training loss plot in Amazon SageMaker Studio.
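The following self-contained sketch illustrates one simple form of the robustness testing described above: perturbing inputs and measuring how often a fraud-style classifier changes its prediction. The model and data are synthetic placeholders, not a complete red-teaming methodology.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                # synthetic transaction features
y = (X[:, 0] - X[:, 3] > 0).astype(int)       # synthetic fraud labels
model = LogisticRegression().fit(X, y)

baseline = model.predict(X)
perturbed = model.predict(X + rng.normal(scale=0.2, size=X.shape))  # small input perturbations
flip_rate = np.mean(baseline != perturbed)
print(f"Prediction flip rate under perturbation: {flip_rate:.1%}")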
Deployment phase
In the deployment phase, AI systems move into production environments with careful consideration for confidence indicators and human review processes. Before live deployment, systems should undergo testing in operational environments with attention to localization needs across different regions.
In payment applications, deployers are encouraged to validate performance, monitor concept drift as user behavior changes over time, and maintain version control with documented rollback processes to address unexpected issues during updates. Deployment includes establishing clear thresholds for human intervention, particularly for high-value transactions or unusual activity patterns that fall outside normal parameters, with localization for different markets’ payment behaviors and regulatory requirements.
The following graph is an example of using Amazon SageMaker Model Monitor to monitor data and model drift.
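As a conceptual companion to SageMaker Model Monitor, the following sketch computes a population stability index (PSI) between training-time and live transaction amounts; the distributions and the 0.2 threshold are illustrative assumptions.

import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare two distributions; larger values indicate more drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

train_amounts = np.random.default_rng(0).lognormal(3.0, 1.0, 10_000)  # training distribution
live_amounts = np.random.default_rng(1).lognormal(3.3, 1.1, 10_000)   # shifted live distribution

psi = population_stability_index(train_amounts, live_amounts)
print(f"PSI = {psi:.3f}")  # a common rule of thumb treats PSI > 0.2 as significant drift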

Operation phase
The operation phase covers ongoing system management after deployment. System owners should notify users about AI interactions, consider providing opt-out options, and maintain accessibility for the intended users. This phase establishes feedback mechanisms, through in-system tools or third-party outreach, for continuous testing and improvement.
The operation phase for payment AI systems includes transparent communication with customers about AI-driven decisions affecting their accounts. Continuous monitoring tracks concept drift as payment patterns evolve with new technologies, merchants, or consumer behaviors. Feedback mechanisms capture both customer complaints and successful fraud prevention cases to refine models. Safeguarding mechanisms like guardrails enhance safety by constraining inputs or outputs within predefined boundaries, ranging from simple word filters to sophisticated model-based protections.
The following are practical recommendations:

Performance monitoring – Advanced monitoring frameworks track technical efficiency and nuanced indicators of fairness, transparency, and potential systemic biases. These systems create a continuous feedback loop, helping organizations detect and address potential issues before they become significant problems.
Feedback mechanisms – Feedback in responsible AI is a sophisticated, multi-channel system. Rather than merely collecting data, these mechanisms create dynamic, responsive systems that can adapt in real time. By establishing comprehensive feedback channels—from internal stakeholders, customers, regulators, and independent reviewers—organizations can create AI systems that are technologically sophisticated and responsive to human needs.
Model retraining – Regular, structured model retraining processes help make sure AI systems remain aligned with changing economic landscapes, emerging regulatory requirements, and evolving societal norms. This approach requires developing adaptive learning capabilities that can intelligently adjust to new data sources, changing contexts, and emerging technological capabilities.

Conclusion
The responsible use of AI in the payments industry represents a significant challenge and an extraordinary opportunity. By implementing robust governance frameworks, promoting fairness, maintaining transparency, protecting privacy, and committing to continuous improvement, payment providers can harness the power of AI while upholding the highest standards of responsibility and compliance.
AWS is committed to supporting payment industry stakeholders on this journey through comprehensive tools, frameworks, and best practices for responsible AI implementation. By partnering with AWS, organizations can expect to accelerate their AI adoption while aligning with regulatory requirements and customer expectations.
As the payments landscape continues to evolve, organizations that establish responsible AI as a core competency will mitigate risks and build stronger customer relationships based on trust and transparency. For more details, refer to the Accenture report on responsible AI. In an industry built on a foundation of trust, responsible AI is both the right choice and an important driver of business success.
To learn more about responsible AI, refer to the AWS Responsible Use of AI Guide.

About the authors
Neelam Koshiya is a Principal Applied AI Architect (GenAI specialist) at AWS. With a background in software engineering, she moved organically into an architecture role. Her current focus is helping enterprise customers with their ML and generative AI journeys for strategic business outcomes. She likes to build content and mechanisms that scale to larger audiences. She is passionate about innovation and inclusion. In her spare time, she enjoys reading and being outdoors.
Ana Gosseen is a Solutions Architect at AWS who partners with independent software vendors in the public sector space. She leverages her background in data management and information sciences to guide organizations through technology modernization journeys, with a particular focus on generative AI implementation. She is passionate about driving innovation in the public sector while championing responsible AI adoption. She spends her free time exploring the outdoors with her family and dog, and pursuing her passion for reading.

OpenAI Just Released the Hottest Open-Weight LLMs: gpt-oss-120B (Runs …

OpenAI has just sent seismic waves through the AI world: for the first time since GPT-2 hit the scene in 2019, the company is releasing not one, but TWO open-weight language models. Meet gpt-oss-120b and gpt-oss-20b—models that anyone can download, inspect, fine-tune, and run on their own hardware. This launch doesn’t just shift the AI landscape; it detonates a new era of transparency, customization, and raw computational power for researchers, developers, and enthusiasts everywhere.

Why Is This Release a Big Deal?

OpenAI has long cultivated a reputation for both jaw-dropping model capabilities and a fortress-like approach to proprietary tech. That changed on August 5, 2025. These new models are distributed under the permissive Apache 2.0 license, making them open for commercial and experimental use. The difference? Instead of hiding behind cloud APIs, anyone can now put OpenAI-grade models under their microscope—or put them directly to work on problems at the edge, in enterprise, or even on consumer devices.

Meet the Models: Technical Marvels with Real-World Muscle

gpt-oss-120B

Size: 117 billion parameters (with 5.1 billion active parameters per token, thanks to Mixture-of-Experts tech)

Performance: Punches at the level of OpenAI’s o4-mini (or better) in real-world benchmarks.

Hardware: Runs on a single high-end GPU—think Nvidia H100, or 80GB-class cards. No server farm required.

Reasoning: Features chain-of-thought and agentic capabilities—ideal for research automation, technical writing, code generation, and more.

Customization: Supports configurable “reasoning effort” (low, medium, high), so you can dial up power when needed or save resources when you don’t.

Context: Handles up to a massive 128,000 tokens—enough text to read entire books at a time.

Fine-Tuning: Built for easy customization and local/private inference—no rate limits, full data privacy, and total deployment control.

gpt-oss-20B

Size: 21 billion parameters (with 3.6 billion active parameters per token, also Mixture-of-Experts).

Performance: Sits squarely between o3-mini and o4-mini in reasoning tasks—on par with the best “small” models available.

Hardware: Runs on consumer-grade laptops—with just 16GB RAM or equivalent, it’s the most powerful open-weight reasoning model you can fit on a phone or local PC.

Mobile Ready: Specifically optimized to deliver low-latency, private on-device AI for smartphones (including Qualcomm Snapdragon support), edge devices, and any scenario needing local inference minus the cloud.

Agentic Powers: Like its big sibling, 20B can use APIs, generate structured outputs, and execute Python code on demand.

Technical Details: Mixture-of-Experts and MXFP4 Quantization

Both models use a Mixture-of-Experts (MoE) architecture, only activating a handful of “expert” subnetworks per token. The result? Enormous parameter counts with modest memory usage and lightning-fast inference—perfect for today’s high-performance consumer and enterprise hardware.

Add to that native MXFP4 quantization, shrinking model memory footprints without sacrificing accuracy. The 120B model fits snugly onto a single advanced GPU; the 20B model can run comfortably on laptops, desktops, and even mobile hardware.

Real-World Impact: Tools for Enterprise, Developers, and Hobbyists

For Enterprises: On-premises deployment for data privacy and compliance. No more black-box cloud AI: financial, healthcare, and legal sectors can now own and secure every bit of their LLM workflow.

For Developers: Freedom to tinker, fine-tune, and extend. No API limits, no SaaS bills, just pure, customizable AI with full control over latency or cost.

For the Community: Models are already available on Hugging Face, Ollama, and more—go from download to deployment in minutes.
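As a rough sketch of what local inference might look like with Hugging Face Transformers (assuming the models are published under IDs such as openai/gpt-oss-20b, a recent Transformers release, and enough local memory):

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",   # assumed Hugging Face model ID
    torch_dtype="auto",
    device_map="auto",            # requires the accelerate package
)

messages = [{"role": "user",
             "content": "Summarize the trade-offs of Mixture-of-Experts language models."}]
output = generator(messages, max_new_tokens=256)
print(output[0]["generated_text"])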

How Does GPT-OSS Stack Up?

Here’s the kicker: gpt-oss-120B is the first freely available open-weight model that matches the performance of top-tier commercial models like o4-mini. The 20B variant not only bridges the performance gap for on-device AI but will likely accelerate innovation and push boundaries on what’s possible with local LLMs.

The Future Is Open (Again)

OpenAI’s GPT-OSS isn’t just a release; it’s a clarion call. By making state-of-the-art reasoning, tool use, and agentic capabilities available for anyone to inspect and deploy, OpenAI throws open the door to an entire community of makers, researchers, and enterprises—not just to use, but to build on, iterate, and evolve.

Check out the gpt-oss-120B, gpt-oss-20B and Technical Blog. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post OpenAI Just Released the Hottest Open-Weight LLMs: gpt-oss-120B (Runs on a High-End Laptop) and gpt-oss-20B (Runs on a Phone) appeared first on MarkTechPost.

Anthropic AI Introduces Persona Vectors to Monitor and Control Persona …

LLMs are deployed through conversational interfaces that present helpful, harmless, and honest assistant personas. However, they fail to maintain consistent personality traits throughout the training and deployment phases. LLMs show dramatic and unpredictable persona shifts when exposed to different prompting strategies or contextual inputs. The training process can also cause unintended personality shifts, as seen when modifications to RLHF unintentionally create overly sycophantic behaviors in GPT-4o, leading to validation of harmful content and reinforcement of negative emotions. This highlights weaknesses in current LLM deployment practices and emphasizes the urgent need for reliable tools to detect and prevent harmful persona shifts.

Related works like linear probing techniques extract interpretable directions for behaviors like entity recognition, sycophancy, and refusal patterns by creating contrastive sample pairs and computing activation differences. However, these methods struggle with unexpected generalization during finetuning, where training on narrow domain examples can cause broader misalignment through emergent shifts along meaningful linear directions. Current prediction and control methods, including gradient-based analysis for identifying harmful training samples, sparse autoencoder ablation techniques, and directional feature removal during training, show limited effectiveness in preventing unwanted behavioral changes.

A team of researchers from Anthropic, UT Austin, Constellation, Truthful AI, and UC Berkeley present an approach to address persona instability in LLMs through persona vectors in activation space. The method extracts directions corresponding to specific personality traits, such as evil behavior, sycophancy, and hallucination propensity, using an automated pipeline that requires only natural-language descriptions of target traits. The researchers show that intended and unintended personality shifts after finetuning strongly correlate with movements along persona vectors, offering opportunities for intervention via post-hoc correction or preventative steering methods. They also show that finetuning-induced persona shifts can be predicted before finetuning, identifying problematic training data at both the dataset and individual sample levels.

To monitor persona shifts during finetuning, two kinds of datasets are constructed. The first is trait-eliciting datasets that contain explicit examples of malicious responses, sycophantic behaviors, and fabricated information. The second is “emergent misalignment-like” (“EM-like”) datasets, which contain narrow domain-specific issues such as incorrect medical advice, flawed political arguments, invalid math problems, and vulnerable code. To detect behavioral shifts during finetuning, the researchers extract average hidden states at the last prompt token across evaluation sets and compute the difference before and after finetuning to obtain activation shift vectors. These shift vectors are then projected onto previously extracted persona directions to measure finetuning-induced changes along specific trait dimensions.
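The following is a conceptual sketch, not Anthropic's code, of the two core operations described above: deriving a persona direction as a mean activation difference between contrastive responses, and projecting a finetuning-induced activation shift onto that direction. All activations here are synthetic.

import numpy as np

hidden_dim = 512
rng = np.random.default_rng(0)

acts_trait = rng.normal(size=(100, hidden_dim)) + 0.5   # activations from trait-expressing responses
acts_baseline = rng.normal(size=(100, hidden_dim))      # activations from neutral responses

persona_vector = acts_trait.mean(axis=0) - acts_baseline.mean(axis=0)
persona_vector /= np.linalg.norm(persona_vector)

# Hypothetical activation shift between the base and finetuned model on the same prompts.
shift = 0.8 * persona_vector + rng.normal(scale=0.1, size=hidden_dim)
projection = float(shift @ persona_vector)   # large positive values would predict a persona shift
print(f"Projection onto persona direction: {projection:.2f}")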

Dataset-level projection difference metrics show a strong correlation with trait expression after finetuning, allowing early detection of training datasets that may trigger unwanted persona characteristics. It proves more effective than raw projection methods in predicting trait shifts, as it considers the base model’s natural response patterns to specific prompts. Sample-level detection achieves high separability between problematic and control samples across trait-eliciting datasets (Evil II, Sycophantic II, Hallucination II) and “EM-like” datasets (Opinion Mistake II). The persona directions identify individual training samples that induce persona shifts with fine-grained precision, outperforming traditional data filtering methods and providing broad coverage across trait-eliciting content and domain-specific errors.

In conclusion, researchers introduced an automated pipeline that extracts persona vectors from natural-language trait descriptions, providing tools for monitoring and controlling personality shifts across deployment, training, and pre-training phases in LLMs. Future research directions include characterizing the complete persona space dimensionality, identifying natural persona bases, exploring correlations between persona vectors and trait co-expression patterns, and investigating limitations of linear methods for certain personality traits. This study builds a foundational understanding of persona dynamics in models and offers practical frameworks for creating more reliable and controllable language model systems.

Check out the Paper, Technical Blog and GitHub Page. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post Anthropic AI Introduces Persona Vectors to Monitor and Control Personality Shifts in LLMs appeared first on MarkTechPost.

Building a Multi-Agent Conversational AI Framework with Microsoft Auto …

In this tutorial, we explore how to integrate Microsoft AutoGen with Google’s free Gemini API using LiteLLM, enabling us to build a powerful, multi-agent conversational AI framework that runs seamlessly on Google Colab. We walk through the process of setting up the environment, configuring Gemini for compatibility with AutoGen, and building specialized teams of agents for research, business analysis, and software development tasks. By combining the strengths of structured agent roles and real-time LLM-powered collaboration, we create a versatile system that can execute complex workflows autonomously. Check out the Full Codes here.

!pip install pyautogen google-generativeai litellm

import os
import json
import asyncio
from typing import Dict, List, Any, Optional, Callable
from datetime import datetime
import logging

import autogen
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

import google.generativeai as genai
import litellm

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

We begin by installing the necessary libraries, AutoGen, LiteLLM, and Google Generative AI, to enable multi-agent orchestration with Gemini models. Then, we import essential modules and set up logging to monitor our framework’s execution. This prepares our environment for building intelligent agent interactions. Check out the Full Codes here.

class GeminiAutoGenFramework:
“””
Complete AutoGen framework using free Gemini API
Supports multi-agent conversations, code execution, and retrieval
“””

def __init__(self, gemini_api_key: str):
“””Initialize with Gemini API key”””
self.gemini_api_key = gemini_api_key
self.setup_gemini_config()
self.agents: Dict[str, autogen.Agent] = {}
self.group_chats: Dict[str, GroupChat] = {}

def setup_gemini_config(self):
“””Configure Gemini for AutoGen”””
os.environ[“GOOGLE_API_KEY”] = self.gemini_api_key

self.llm_config = {
“config_list”: [
{
“model”: “gemini/gemini-1.5-flash”,
“api_key”: self.gemini_api_key,
“api_type”: “google”,
“temperature”: 0.7,
“max_tokens”: 4096,
}
],
“timeout”: 120,
“cache_seed”: 42,
}

self.llm_config_pro = {
“config_list”: [
{
“model”: “gemini/gemini-1.5-pro”,
“api_key”: self.gemini_api_key,
“api_type”: “google”,
“temperature”: 0.5,
“max_tokens”: 8192,
}
],
“timeout”: 180,
“cache_seed”: 42,
}

def create_assistant_agent(self, name: str, system_message: str,
use_pro_model: bool = False) -> AssistantAgent:
“””Create a specialized assistant agent”””
config = self.llm_config_pro if use_pro_model else self.llm_config

agent = AssistantAgent(
name=name,
system_message=system_message,
llm_config=config,
human_input_mode=”NEVER”,
max_consecutive_auto_reply=10,
code_execution_config=False,
)

self.agents[name] = agent
return agent

def create_user_proxy(self, name: str = “UserProxy”,
enable_code_execution: bool = True) -> UserProxyAgent:
“””Create user proxy agent with optional code execution”””

code_config = {
“work_dir”: “autogen_workspace”,
“use_docker”: False,
“timeout”: 60,
“last_n_messages”: 3,
} if enable_code_execution else False

agent = UserProxyAgent(
name=name,
human_input_mode=”TERMINATE”,
max_consecutive_auto_reply=0,
is_termination_msg=lambda x: x.get(“content”, “”).rstrip().endswith(“TERMINATE”),
code_execution_config=code_config,
system_message=”””A human admin. Interact with the agents to solve tasks.
Reply TERMINATE when the task is solved.”””
)

self.agents[name] = agent
return agent

def create_research_team(self) -> Dict[str, autogen.Agent]:
“””Create a research-focused agent team”””

researcher = self.create_assistant_agent(
name=”Researcher”,
system_message=”””You are a Senior Research Analyst. Your role is to:
1. Gather and analyze information on given topics
2. Identify key trends, patterns, and insights
3. Provide comprehensive research summaries
4. Cite sources and maintain objectivity

Always structure your research with clear sections and bullet points.
Be thorough but concise.”””
)

analyst = self.create_assistant_agent(
name=”DataAnalyst”,
system_message=”””You are a Data Analysis Expert. Your role is to:
1. Analyze quantitative data and statistics
2. Create data visualizations and charts
3. Identify patterns and correlations
4. Provide statistical insights and interpretations

Use Python code when needed for calculations and visualizations.
Always explain your analytical approach.”””
)

writer = self.create_assistant_agent(
name=”Writer”,
system_message=”””You are a Technical Writer and Content Strategist. Your role is to:
1. Transform research and analysis into clear, engaging content
2. Create well-structured reports and articles
3. Ensure content is accessible to the target audience
4. Maintain professional tone and accuracy

Structure content with clear headings, bullet points, and conclusions.”””
)

executor = self.create_user_proxy(“CodeExecutor”, enable_code_execution=True)

return {
“researcher”: researcher,
“analyst”: analyst,
“writer”: writer,
“executor”: executor
}

def create_business_team(self) -> Dict[str, autogen.Agent]:
“””Create business analysis team”””

strategist = self.create_assistant_agent(
name=”BusinessStrategist”,
system_message=”””You are a Senior Business Strategy Consultant. Your role is to:
1. Analyze business problems and opportunities
2. Develop strategic recommendations and action plans
3. Assess market dynamics and competitive landscape
4. Provide implementation roadmaps

Think systematically and consider multiple perspectives.
Always provide actionable recommendations.”””,
use_pro_model=True
)

financial_analyst = self.create_assistant_agent(
name=”FinancialAnalyst”,
system_message=”””You are a Financial Analysis Expert. Your role is to:
1. Perform financial modeling and analysis
2. Assess financial risks and opportunities
3. Calculate ROI, NPV, and other financial metrics
4. Provide budget and investment recommendations

Use quantitative analysis and provide clear financial insights.”””
)

market_researcher = self.create_assistant_agent(
name=”MarketResearcher”,
system_message=”””You are a Market Research Specialist. Your role is to:
1. Analyze market trends and consumer behavior
2. Research competitive landscape and positioning
3. Identify target markets and customer segments
4. Provide market sizing and opportunity assessment

Focus on actionable market insights and recommendations.”””
)

return {
“strategist”: strategist,
“financial_analyst”: financial_analyst,
“market_researcher”: market_researcher,
“executor”: self.create_user_proxy(“BusinessExecutor”)
}

def create_development_team(self) -> Dict[str, autogen.Agent]:
“””Create software development team”””

developer = self.create_assistant_agent(
name=”SeniorDeveloper”,
system_message=”””You are a Senior Software Developer. Your role is to:
1. Write high-quality, efficient code
2. Design software architecture and solutions
3. Debug and optimize existing code
4. Follow best practices and coding standards

Always explain your code and design decisions.
Focus on clean, maintainable solutions.”””
)

devops = self.create_assistant_agent(
name=”DevOpsEngineer”,
system_message=”””You are a DevOps Engineer. Your role is to:
1. Design deployment and infrastructure solutions
2. Automate build, test, and deployment processes
3. Monitor system performance and reliability
4. Implement security and scalability best practices

Focus on automation, reliability, and scalability.”””
)

qa_engineer = self.create_assistant_agent(
name=”QAEngineer”,
system_message=”””You are a Quality Assurance Engineer. Your role is to:
1. Design comprehensive test strategies and cases
2. Identify potential bugs and edge cases
3. Ensure code quality and performance standards
4. Validate requirements and user acceptance criteria

Be thorough and think about edge cases and failure scenarios.”””
)

return {
“developer”: developer,
“devops”: devops,
“qa_engineer”: qa_engineer,
“executor”: self.create_user_proxy(“DevExecutor”, enable_code_execution=True)
}

def create_group_chat(self, agents: List[autogen.Agent], chat_name: str,
max_round: int = 10) -> GroupChat:
“””Create group chat with specified agents”””

group_chat = GroupChat(
agents=agents,
messages=[],
max_round=max_round,
speaker_selection_method=”round_robin”,
allow_repeat_speaker=False,
)

self.group_chats[chat_name] = group_chat
return group_chat

def run_research_project(self, topic: str, max_rounds: int = 8) -> str:
“””Run a comprehensive research project”””

team = self.create_research_team()
agents_list = [team[“researcher”], team[“analyst”], team[“writer”], team[“executor”]]

group_chat = self.create_group_chat(agents_list, “research_chat”, max_rounds)
manager = GroupChatManager(
groupchat=group_chat,
llm_config=self.llm_config
)

initial_message = f”””
Research Project: {topic}

Please collaborate to produce a comprehensive research report following this workflow:
1. Researcher: Gather information and key insights about {topic}
2. DataAnalyst: Analyze any quantitative aspects and create visualizations if needed
3. Writer: Create a well-structured final report based on the research and analysis
4. CodeExecutor: Execute any code needed for analysis or visualization

The final deliverable should be a professional research report with:
– Executive summary
– Key findings and insights
– Data analysis (if applicable)
– Conclusions and recommendations

Begin the research process now.
“””

chat_result = team[“executor”].initiate_chat(
manager,
message=initial_message,
max_consecutive_auto_reply=0
)

return self._extract_final_result(chat_result)

def run_business_analysis(self, business_problem: str, max_rounds: int = 8) -> str:
“””Run business analysis project”””

team = self.create_business_team()
agents_list = [team[“strategist”], team[“financial_analyst”],
team[“market_researcher”], team[“executor”]]

group_chat = self.create_group_chat(agents_list, “business_chat”, max_rounds)
manager = GroupChatManager(
groupchat=group_chat,
llm_config=self.llm_config_pro
)

initial_message = f”””
Business Analysis Project: {business_problem}

Please collaborate to provide comprehensive business analysis following this approach:
1. BusinessStrategist: Analyze the business problem and develop strategic framework
2. FinancialAnalyst: Assess financial implications and create financial models
3. MarketResearcher: Research market context and competitive landscape
4. BusinessExecutor: Coordinate and compile final recommendations

Final deliverable should include:
– Problem analysis and root causes
– Strategic recommendations
– Financial impact assessment
– Market opportunity analysis
– Implementation roadmap

Begin the analysis now.
“””

chat_result = team[“executor”].initiate_chat(
manager,
message=initial_message,
max_consecutive_auto_reply=0
)

return self._extract_final_result(chat_result)

def run_development_project(self, project_description: str, max_rounds: int = 10) -> str:
“””Run software development project”””

team = self.create_development_team()
agents_list = [team[“developer”], team[“devops”], team[“qa_engineer”], team[“executor”]]

group_chat = self.create_group_chat(agents_list, “dev_chat”, max_rounds)
manager = GroupChatManager(
groupchat=group_chat,
llm_config=self.llm_config
)

initial_message = f”””
Development Project: {project_description}

Please collaborate to deliver a complete software solution:
1. SeniorDeveloper: Design architecture and write core code
2. DevOpsEngineer: Plan deployment and infrastructure
3. QAEngineer: Design tests and quality assurance approach
4. DevExecutor: Execute code and coordinate implementation

Deliverables should include:
– System architecture and design
– Working code implementation
– Deployment configuration
– Test cases and QA plan
– Documentation

Start development now.
“””

chat_result = team[“executor”].initiate_chat(
manager,
message=initial_message,
max_consecutive_auto_reply=0
)

return self._extract_final_result(chat_result)

def _extract_final_result(self, chat_result) -> str:
    """Extract and format the final result from the chat"""
    if hasattr(chat_result, 'chat_history'):
        messages = chat_result.chat_history
    else:
        messages = chat_result

    final_messages = []
    for msg in messages[-5:]:
        if isinstance(msg, dict) and 'content' in msg:
            final_messages.append(f"{msg.get('name', 'Agent')}: {msg['content']}")

    return "\n\n".join(final_messages)

def get_framework_stats(self) -> Dict[str, Any]:
“””Get framework statistics”””
return {
“agents”: list(self.agents.keys()),
“group_chats”: list(self.group_chats.keys()),
“llm_config”: {
“model”: self.llm_config[“config_list”][0][“model”],
“temperature”: self.llm_config[“config_list”][0][“temperature”]
},
“timestamp”: datetime.now().isoformat()
}

We define a class GeminiAutoGenFramework that serves as the core engine for our multi-agent collaboration system using the free Gemini API. Within this class, we configure the model, create specialized agents for research, business, and development tasks, and enable group conversations among them. This setup allows us to simulate real-world workflows by letting AI agents research, analyze, write, and even execute code in a coordinated and modular fashion. Check out the Full Codes here.

def demo_autogen_framework():
“””Demo the AutoGen framework”””
print(” Microsoft AutoGen + Gemini Framework Demo”)
print(“=” * 60)

GEMINI_API_KEY = “your-gemini-api-key-here”

framework = GeminiAutoGenFramework(GEMINI_API_KEY)

print(” Framework initialized successfully!”)
print(f” Stats: {json.dumps(framework.get_framework_stats(), indent=2)}”)

return framework

async def run_demo_projects(framework):
“””Run demonstration projects”””

print(“n Running Research Project…”)
research_result = framework.run_research_project(
“Impact of Generative AI on Software Development in 2025”
)
print(“Research Result (excerpt):”)
print(research_result[:500] + “…” if len(research_result) > 500 else research_result)

print(“n Running Business Analysis…”)
business_result = framework.run_business_analysis(
“A mid-sized company wants to implement AI-powered customer service. ”
“They currently have 50 support staff and handle 1000 tickets daily. ”
“Budget is $500K annually.”
)
print(“Business Analysis Result (excerpt):”)
print(business_result[:500] + “…” if len(business_result) > 500 else business_result)

print(“n Running Development Project…”)
dev_result = framework.run_development_project(
“Build a Python web scraper that extracts product information from e-commerce sites, ”
“stores data in a database, and provides a REST API for data access.”
)
print(“Development Result (excerpt):”)
print(dev_result[:500] + “…” if len(dev_result) > 500 else dev_result)

if __name__ == “__main__”:
print(“Microsoft AutoGen + Gemini Framework Ready! “)
print(“n For Google Colab, run:”)
print(“!pip install pyautogen google-generativeai litellm”)
print(“n Get your free Gemini API key:”)
print(“https://makersuite.google.com/app/apikey”)
print(“n Quick start:”)
print(“””
# Initialize framework
# framework = GeminiAutoGenFramework(“your-gemini-api-key”)

# Run research project
result = framework.run_research_project(“AI Trends 2025”)
print(result)

# Run business analysis
result = framework.run_business_analysis(“Market entry strategy for AI startup”)
print(result)

# Run development project
result = framework.run_development_project(“Build a REST API for user management”)
print(result)
“””)

We conclude our framework by incorporating a demo function that initializes the GeminiAutoGenFramework, prints system statistics, and executes three real-world project simulations: research, business analysis, and software development. This lets us validate the capabilities of our agent teams in action and provides a plug-and-play starting point for any user working in Google Colab.

In conclusion, we have a fully functional multi-agent AI system that can conduct in-depth research, analyze business scenarios, and develop software projects with minimal human intervention. We’ve seen how to orchestrate various specialized agents and how to run projects that reflect real-world use cases. This framework showcases the potential of combining Microsoft AutoGen and Gemini and also provides a reusable blueprint for building intelligent, task-oriented agent teams in our applications.

Check out the Full Codes here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
The post Building a Multi-Agent Conversational AI Framework with Microsoft AutoGen and Gemini API appeared first on MarkTechPost.

Build an AI assistant using Amazon Q Business with Amazon S3 clickable …

Organizations need user-friendly ways to build AI assistants that can reference enterprise documents while maintaining document security. This post shows how to use Amazon Q Business to create an AI assistant that provides clickable URLs to source documents stored in Amazon Simple Storage Service (Amazon S3), to support secure document access and verification. Amazon Q Business is a generative AI-powered conversational assistant that answers questions and completes tasks based on the information in your enterprise systems and enhances workforce productivity.
In this post, we demonstrate how to build an AI assistant using Amazon Q Business that responds to user requests based on your enterprise documents stored in an S3 bucket, and how users can follow the reference URLs in the AI assistant's responses to view or download the referenced documents and verify the responses, in keeping with responsible AI practices. You can follow the instructions in this post to build an AI assistant using either the provided sample dataset or your own dataset, and interact with it using the Amazon Q Business web experience and API.
Solution overview
You can build a secure AI assistant for your employees where the AI responses are based on a set of enterprise documents. You store the documents in an S3 bucket and configure the bucket as a data source, or upload the files directly to your Amazon Q Business application from the Amazon Q Business console. Authenticated users subscribed to the Amazon Q Business application can interact with your AI assistant using the Amazon Q Business web experience from their web browsers or with a custom application built by your organization. The Amazon Q Business powered AI assistant provides source attributions with each response, including clickable URLs pointing to the documents from which the response was generated. Users can use these URLs to securely access the reference documents for more information and to practice responsible AI, without needing credentials for the S3 bucket where the documents are stored; the Amazon Q Business application validates that the authenticated user is authorized before letting them view or download a document.
The following diagram shows the internal workings of Amazon S3 clickable URLs, including how the document contents are staged in an S3 bucket during ingestion, and how the workflow of the GetDocumentContent API lets the user securely view or download the document using the URL links.

An S3 bucket containing the enterprise documents to be used by the AI assistant is configured as a data source for an Amazon Q Business application. When the data source is synchronized for the first time, the Amazon Q Business S3 connector crawls the customer's bucket and ingests the documents, along with their metadata and access control lists (ACLs). During ingestion, the content of each document is stored by Amazon Q Business in a staging S3 bucket in the Amazon Q Business service account. The text extracted from the document, along with the metadata and ACLs, is ingested into an Amazon Q Business index. On subsequent data source sync operations, documents that have changed or are newly added to the customer's S3 bucket are reingested and their contents are added or updated in the staging bucket, while the contents of documents deleted from the customer's S3 bucket are removed from the staging bucket.
When you upload files directly, they are processed in a similar way: the document content is stored in the staging bucket, and the extracted text and metadata are ingested into the index.
When an authenticated user asks a question or writes a prompt to the AI assistant using the Amazon Q Business web experience or a customer-developed application, the UI layer of the application invokes the Chat or ChatSync API. The API response includes the source attributions, source reference URLs, and passages from the indexed documents that were used as context for the underlying large language model (LLM) to generate the response to the user's query. When the user chooses a reference URL pointing to a document ingested from the Amazon S3 data source or from directly uploaded files, the UI layer is required to invoke the GetDocumentContent API (labeled 1 in the preceding diagram) to obtain the contents of the document to be displayed or downloaded. The Chat, ChatSync, and GetDocumentContent APIs can only be invoked using identity-aware credentials of the authenticated user.
Upon receiving the GetDocumentContent API call, Amazon Q Business uses the user identity from the identity-aware credentials, retrieves the ACLs for the requested document, and validates that the user is authorized to access it. On successful validation, it generates a pre-signed URL for the document content object stored in the staging bucket and returns it to the UI in response to the GetDocumentContent API call (labeled 3 in the preceding diagram). If the authorization validation fails, an error is returned (labeled 2 in the preceding diagram).
The UI layer can then use the pre-signed URL to display the document content in the web browser or download it to the user's local computer. Requiring identity-aware credentials and authorization validation makes sure that only authenticated users who are authorized to access the document can view or download its content. The validity of the pre-signed URL is restricted to 5 minutes. After the pre-signed URL is made available to the user and the document content is downloaded, neither Amazon Q Business nor AWS has control over the pre-signed URL or the document content; following the shared security responsibility model, it is the customer's responsibility to secure the document further.
To get a hands-on experience of Amazon S3 clickable URLs, follow the instructions in this post to create an AI assistant using an Amazon Q Business application, with an S3 bucket configured as a data source, and upload some files to the data source. You can use the provided sample data SampleData.zip or choose a few documents of your choice. You can then use the Amazon Q Business web experience to ask a few questions about the data you ingested, and use the source reference URLs from the responses to your questions to view or download the referenced documents and validate the responses you got from the AI assistant. We also show how to use the AWS Command Line Interface (AWS CLI) to use the Amazon S3 clickable URLs feature with the Amazon Q Business API.
Considerations for using Amazon S3 clickable URLs
Consider the following when using Amazon S3 clickable URLs:

At the time of writing, the Amazon S3 clickable URLs feature is available on Amazon Q Business applications using AWS IAM Identity Center or IAM federation for user access management, and not available for Amazon Q Business applications created using anonymous mode.
If you already use an Amazon S3 data source for your Amazon Q Business application, you must perform a full sync of the data source for the Amazon S3 clickable URLs feature to be available to your users.
If you already use an Amazon Q Business web experience for your users to interact with your AI assistant, you must add the following permissions to the AWS Identity and Access Management (IAM) role for the Amazon Q Business web experience:

{
    "Sid": "QBusinessGetDocumentContentPermission",
    "Effect": "Allow",
    "Action": ["qbusiness:GetDocumentContent"],
    "Resource": [
        "arn:aws:qbusiness:{{region}}:{{source_account}}:application/{{application_id}}",
        "arn:aws:qbusiness:{{region}}:{{source_account}}:application/{{application_id}}/index/*"
    ]
}
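If you manage the web experience role with the IAM API rather than the console, one way to attach the statement above is as an inline policy. The following is a minimal sketch; the role name, Region, account ID, and application ID are placeholders you replace with your own values.

import json
import boto3

iam = boto3.client("iam")

region = "<YOUR-AWS-REGION>"
account_id = "<YOUR-AWS-ACCOUNT-ID>"
application_id = "<YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID>"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "QBusinessGetDocumentContentPermission",
            "Effect": "Allow",
            "Action": ["qbusiness:GetDocumentContent"],
            "Resource": [
                f"arn:aws:qbusiness:{region}:{account_id}:application/{application_id}",
                f"arn:aws:qbusiness:{region}:{account_id}:application/{application_id}/index/*",
            ],
        }
    ],
}

# Attach the statement as an inline policy on the web experience IAM role.
iam.put_role_policy(
    RoleName="<YOUR-WEB-EXPERIENCE-ROLE-NAME>",
    PolicyName="QBusinessGetDocumentContentPermission",
    PolicyDocument=json.dumps(policy),
)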

Prerequisites
To deploy the solution using the instructions in this post in your own AWS account, make sure that you have the following:

An AWS account
Amazon S3 and AWS IAM Identity Center permissions
Privileges to create an Amazon Q application, AWS resources, and IAM roles and policies
Basic knowledge of AWS services and the AWS CLI
Follow the steps for Setting up for Amazon Q Business if you’re using Amazon Q Business for the first time

Create your S3 bucket and upload data
Choose an AWS Region where Amazon Q Business is available, keeping in mind that you must create all the AWS resources in this example in this Region. If you already have an S3 bucket with a few documents uploaded, you can use it for this exercise. Otherwise, for instructions to prepare an S3 bucket as a data source, refer to Creating a general purpose bucket. Download and unzip SampleData.zip to your local computer. Open the S3 bucket you created on the Amazon S3 console and upload the contents of the ACME Project Space, HR Data, and IT Help folders to the S3 bucket.

The following screenshot shows the list of uploaded files.
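If you prefer to script this step, a minimal boto3 sketch follows. The bucket name, Region, and local folder path are placeholders you replace with your own, and the sample folders match the ones named above.

import os
import boto3

region = "<YOUR-AWS-REGION>"            # a Region where Amazon Q Business is available
bucket_name = "<YOUR-S3-BUCKET-NAME>"   # must be globally unique
local_data_dir = "SampleData"           # where you unzipped SampleData.zip

s3 = boto3.client("s3", region_name=region)

# us-east-1 is the only Region that rejects an explicit LocationConstraint.
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket_name)
else:
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# Upload the sample folders, keeping the folder structure as S3 key prefixes.
for folder in ["ACME Project Space", "HR Data", "IT Help"]:
    for root, _, files in os.walk(os.path.join(local_data_dir, folder)):
        for name in files:
            local_path = os.path.join(root, name)
            key = os.path.relpath(local_path, local_data_dir).replace(os.sep, "/")
            s3.upload_file(local_path, bucket_name, key)
            print("Uploaded", key)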

Create an Amazon Q Business application
Depending on your choice of user access management method, create an IAM Identity Center integrated Amazon Q Business application or an IAM federated Amazon Q Business application. At the time of writing, Amazon S3 clickable URLs are not available for Amazon Q Business applications with anonymous access.
To create an IAM Identity Center integrated Amazon Q Business application, complete the following steps:

On the Amazon Q Business console, choose Applications in the navigation pane.
Choose Create application.
For Application name, enter a unique name or use the automatically generated name.
For User access, select Authenticated access.
For Outcome, select Web experience.

For Access management method, select AWS IAM Identity Center.

If IAM Identity Center is correctly configured either in your account or in the AWS Organization to which your account belongs, and is in the same Region, you will see a message about the application being connected to the IAM Identity Center instance.

Choose the users who will have access to this application and their subscription tiers. For this post, both Q Business Pro and Q Business Lite subscription tiers will work.
Choose Create.
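If you prefer to script this step, the CreateApplication API offers a rough equivalent of the console flow above. The sketch below is only a starting point under stated assumptions: the Region and IAM Identity Center instance ARN are placeholders, the parameter names follow the CreateApplication API, and you still assign users and subscription tiers separately.

import boto3

qbusiness = boto3.client("qbusiness", region_name="<YOUR-AWS-REGION>")

# Assumption: this ARN is the IAM Identity Center instance already configured
# in your account or AWS Organization, in the same Region.
response = qbusiness.create_application(
    displayName="s3-clickable-urls-demo",
    identityCenterInstanceArn="<YOUR-IAM-IDENTITY-CENTER-INSTANCE-ARN>",
)
print("Application ID:", response["applicationId"])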

Create an index
In preparation to configure data sources, you must first create an index. Complete the following steps:

On the Amazon Q Business console, choose Applications in the navigation pane.
Open your application.
Under Enhancements in the navigation pane, choose Data sources.
Choose Add an index.

Select Create a new index.
For Index name, keep the automatically generated name.
For Index provisioning, select your preferred provisioning method. For this post, either Enterprise or Starter will work.
Leave Number of units as 1.
Choose Add an index.

The creation process takes a few minutes to complete.
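The index can also be created through the CreateIndex API. The following is a minimal sketch, assuming the application ID from the previous step and one index unit as in the console steps; the capacity parameter name is taken from the CreateIndex API and should be checked against your boto3 version.

import boto3

qbusiness = boto3.client("qbusiness", region_name="<YOUR-AWS-REGION>")

application_id = "<YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID>"

index = qbusiness.create_index(
    applicationId=application_id,
    displayName="s3-clickable-urls-index",
    capacityConfiguration={"units": 1},  # one index unit, matching the console steps
)
print("Index ID:", index["indexId"])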

Create data sources
To configure your Amazon S3 data source, complete the following steps. For more details, refer to Connecting Amazon Q Business to Amazon S3 using the console.

On the Amazon Q Business console, choose Applications in the navigation pane.
Open your application.
Under Enhancements in the navigation pane, choose Data sources.
Choose Add data source.

On the Add data source page, choose Amazon S3 as your data source.

For Data source name, enter a name.
For IAM role, choose Create a new service role.
For Role name, keep the automatically generated name.

Under Sync scope, enter the location of the S3 bucket you created earlier.

For Sync mode, select Full sync.
For Frequency, choose Run on demand.
Choose Add data source.

After the data source is created, choose Sync now to start the data source sync.

It takes a few minutes for the data source sync to complete.

The Data sources page shows the status of the data sources, as shown in the following screenshot.
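Alternatively, once the data source has been created in the console, the on-demand sync can be started and monitored programmatically. The following is a minimal sketch, assuming the placeholder IDs below are replaced with the values from your application, index, and data source.

import time
import boto3

qbusiness = boto3.client("qbusiness", region_name="<YOUR-AWS-REGION>")

application_id = "<YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID>"
index_id = "<INDEX-ID-OF-YOUR-AMAZON-Q-BUSINESS-APPLICATION>"
data_source_id = "<DATA-SOURCE-ID-OF-YOUR-S3-DATA-SOURCE>"

# Start an on-demand sync of the S3 data source.
qbusiness.start_data_source_sync_job(
    applicationId=application_id,
    indexId=index_id,
    dataSourceId=data_source_id,
)

# Poll the sync job history until no job is still running.
# Assumption: in-progress jobs report a status containing "SYNCING" or "STOPPING".
for _ in range(60):  # up to roughly 30 minutes
    history = qbusiness.list_data_source_sync_jobs(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    ).get("history", [])
    statuses = [job["status"] for job in history]
    print("Sync job statuses:", statuses or ["(none yet)"])
    if statuses and not any("SYNCING" in s or s == "STOPPING" for s in statuses):
        break
    time.sleep(30)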

Now let’s create a data source with uploaded files.

On the Data sources page, choose Add data source.
Choose Upload files.

Under Select files, choose Choose files.
Open the location where you unzipped the sample data and choose the file national_park_services_infograph.pdf.

Choose Upload to upload the file to the index.

Interact with your AI assistant
Now it’s time to test the AI assistant. In the following sections, we demonstrate how to use the Amazon Q Business web experience and the API to interact with your AI assistant.
Using Amazon Q Business web experience
Open the deployed URL of your Amazon Q Business application in a web browser window to start the web experience for your AI assistant and sign in as one of the subscribed users.

After the web experience starts, enter a prompt based on the data you indexed. If you are using the sample data provided with the post, you can use the prompt “What is the eligibility criteria for employees to receive health benefits?” as shown in the following screenshot. When you view the reference sources below the response, you will notice a download icon next to the file name, which you can use to download the file to view.

Choose the file name and choose Save to save the file to your computer.
Keep in mind that although Amazon Q Business checks the ACLs to confirm that you are authorized to access the document before downloading, anyone who has access to the computer where you download the file will be able to access the document.

Choose the download status icon in your browser and choose the open icon to open the file.

The document will open for your reference, as shown in the following screenshot.

Now let’s look at the example of a PDF document, which in this case comes from the data source containing the files you uploaded, in response to the prompt “How many parks are governed by the National Parks Service?” Because most web browsers can open a PDF file in a new tab, notice the file open icon next to the source file name; this differs from the download icon shown for the .docx file in the previous example. When you choose the file name, the document opens in a new tab.

The following screenshot shows the PDF in the new browser tab.

Using the Amazon Q Business API
In this section, we show how to use the AWS CLI to experience how clickable URLs work when using the API. To verify that an end user is authenticated and receives fine-grained authorization to their user ID and group-based resources, a subset of the Amazon Q Business APIs (Chat, ChatSync, ListConversations, ListMessages, DeleteConversation, PutFeedback, GetDocumentContent) require identity-aware AWS SigV4 credentials for the authenticated user on whose behalf the API call is being made. Use the appropriate procedure to get identity-aware credentials based on whether your Amazon Q Business application’s user access management is configured with IAM Identity Center or IAM federation. You can apply these credentials by setting environment variables on the command line where the AWS CLI is installed; for convenience, you can use AWS CloudShell.
First, use the ChatSync API to make a query to your Amazon Q Business application:

aws qbusiness chat-sync --region <YOUR-AWS-REGION> \
    --application-id <YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID> \
    --user-message "what is the eligibility criteria to receive health benefits?"

This command will get a response similar to the following:

{
    "conversationId": "<YOUR-CONVERSATION-ID>",
    "systemMessage": "Employees are eligible for health benefits if they have an appointment of more than six months (at least six months plus one day) and a time base of half-time or more. Eligible employees have 60 calendar days from the date of appointment or a permitting event to enroll in a health plan, or during an Open Enrollment period.",
    "systemMessageId": "<YOUR-SYSTEM-MESSAGE-ID>",
    "userMessageId": "<YOUR-USER-MESSAGE-ID>",
    "sourceAttributions": [
        {
            "title": "Employee+health+benefits+policy.docx",
            "snippet": "\nEmployee health benefits policy This document outlines the policy for employee health benefits. Benefit Eligibility Employees are eligible for health benefits if they have an appointment of more than six months (at least six months plus one day) and a time base of half-time or more. Eligible employees have 60 calendar days from the date of appointment or a permitting event to enroll in a health plan, or during an Open Enrollment period. For questions about your eligibility, contact your department's personnel office. Making Changes to Your Current Benefits You may make changes to your benefits during Open Enrollment, usually during September and October of each year, or based on a permitting event outside of Open Enrollment. You may not change your health benefits choice during the year unless you experience a permitting event. You must apply for any changes or enrollments within 60 calendar days of the permitting event date. For questions about permitting events, contact your department's personnel office. Permitting events or qualifying life events There are exceptions to the annual open enrollment period. These are called qualifying life events or permitting events and if you experience one or more of them, you can buy new coverage or change your existing coverage.",
            "url": "https://<YOUR-S3-BUCKET-NAME>/DemoData/hr-data/Employee%2Bhealth%2Bbenefits%2Bpolicy.docx",
            "citationNumber": 1,
            "textMessageSegments": [
                {
                    "beginOffset": 167,
                    "endOffset": 324,
                    "snippetExcerpt": {
                        "text": "benefits if they have an appointment of more than six months (at least six months plus one day) and a time base of half-time or more. Eligible employees have 60 calendar days from the date of appointment or a permitting event to enroll in a health plan, or during an Open Enrollment period"
                    }
                }
            ],
            "documentId": "s3://<YOUR-S3-BUCKET-NAME>/DemoData/hr-data/Employee+health+benefits+policy.docx",
            "indexId": "<INDEX-ID-OF-YOUR-AMAZON-Q-BUSINESS-APPLICATION>",
            "datasourceId": "<DATA-SOURCE-ID-OF-YOUR-S3-DATA-SOURCE>"
        }
    ],
    "failedAttachments": []
}

Next, use the GetDocumentContent API using the information from the source attributions in the ChatSync API response to download and display the document to the user:

aws qbusiness get-document-content --region <YOUR-AWS-REGION> \
    --application-id <YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID> \
    --document-id <THE-DOCUMENT-ID-FROM-THE-SOURCE-ATTRIBUTIONS> \
    --index-id <INDEX-ID-FROM-THE-SOURCE-ATTRIBUTIONS> \
    --data-source-id <DATA-SOURCE-ID-FROM-THE-SOURCE-ATTRIBUTIONS> \
    --output-format RAW

When Amazon Q Business receives the GetDocumentContent API call, it verifies the ACLs, when present, to confirm that the user making the call is authorized to access the document. On successful validation, it returns a short-lived pre-signed URL that you can use to download or view the document:

{
    "presignedUrl": "<PRESIGNED-URL-TO-THE-STAGED-DOCUMENT-CONTENT>",
    "mimeType": "<MIME-TYPE-OF-THE-DOCUMENT>"
}
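If you are building a custom UI layer rather than calling the AWS CLI, the same two calls can be sketched in Python with boto3. This is a minimal illustration under a few assumptions: identity-aware credentials are already configured in the environment exactly as for the CLI examples above, your boto3 release includes the GetDocumentContent operation, and the placeholder IDs and Region are replaced with your own.

import boto3
from urllib.request import urlopen

qbusiness = boto3.client("qbusiness", region_name="<YOUR-AWS-REGION>")

application_id = "<YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID>"

# 1. Ask a question; the response carries a source attribution for each citation.
chat = qbusiness.chat_sync(
    applicationId=application_id,
    userMessage="what is the eligibility criteria to receive health benefits?",
)
attribution = chat["sourceAttributions"][0]

# 2. Exchange the attribution identifiers for a short-lived pre-signed URL.
#    Assumption: the parameter names mirror the CLI flags used above.
content = qbusiness.get_document_content(
    applicationId=application_id,
    indexId=attribution["indexId"],
    dataSourceId=attribution["datasourceId"],
    documentId=attribution["documentId"],
    outputFormat="RAW",
)

# 3. Fetch the document within the pre-signed URL's 5-minute validity window.
doc_bytes = urlopen(content["presignedUrl"]).read()
print(content["mimeType"], len(doc_bytes), "bytes")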

Troubleshooting
This section discusses a few errors you might encounter as you use Amazon S3 clickable URLs for the source references in your conversations with your Amazon Q Business powered AI assistant.
Refer to Troubleshooting your Amazon S3 connector for information about error codes you might see for the Amazon S3 connector and suggested troubleshooting actions. If you encounter an HTTP status code 403 (Forbidden) error when you open your Amazon Q Business application, it means that the user is unable to access the application. To find the common causes and how to address them, refer to Troubleshooting Amazon Q Business and identity provider integration.

Full sync required – While attempting to access referenced URLs from an Amazon S3 or uploaded files data source, the user gets the following error message: “Error: This document cannot be downloaded because the raw document download feature requires a full connector sync performed after 07/02/2025. Your admin has not yet completed this full sync. Please contact your admin to request a complete sync of the data source.” This error can be resolved after performing a full sync of the Amazon S3 data source, or deleting the files from the uploaded files data source and uploading them again.
You can no longer access a document referred to in the conversation history – While browsing through conversation history, the user chooses a reference URL from an Amazon S3 data source and can’t view or download the file, with the following error: “Error: You no longer have permission to access this document. The access permissions for this document have been changed since you last accessed it. Please contact your admin if you believe you should have access to this content.” This error implies that the permissions for the document in the ACLs on the S3 bucket configured as the data source changed, so the user is no longer authorized to access the file, and the updated ACLs were ingested into the Amazon Q Business index during a data source sync. If the user believes they should have access to the document, they must contact the administrator to correct the ACLs and perform a data source sync.
The document you are trying to access no longer exists – While browsing through conversation history, the user chooses a reference URL from an Amazon S3 or uploaded files data source, and can’t view or download the file with the following error: “Error: The document you’re trying to access no longer exists in the data source. It may have been deleted or moved since it was last referenced. Please check with the admin if you need access to this document.” This error implies that the document is deleted from the S3 bucket or moved to a different location, and therefore also got deleted from the Amazon Q Business index and staging bucket for the specific document ID during a data source sync. This error will also manifest when a document from the uploaded files data source is deleted by the administrator subsequent to the conversation. If the user believes that the document should not be deleted, they should contact the administrator to attempt to restore the document and perform a data source sync.
You can’t download this document because your web experience lacks the required permissions – When the user chooses a reference URL from an Amazon S3 or uploaded files data source, they can’t view or download the file with the following error: “Error: Unable to download this document because your Web Experience lacks the required permissions. Your admin needs to update the IAM role for the Web Experience to include permissions for the GetDocumentContent API. Please contact your admin to request this IAM role update.” The administrator can attempt to resolve this error by updating the IAM role for the web experience with permissions to invoke the GetDocumentContent API, as discussed in the considerations section earlier in this post.

Clean up
To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created: the Amazon Q application, data sources, and corresponding IAM roles. Complete the following steps:

To delete the Amazon Q application, go to the Amazon Q console and, on the Applications page, select your application.
On the Actions drop-down menu, choose Delete.
To confirm deletion, enter delete in the field and choose Delete. Wait until you get the confirmation message; the process can take up to 15 minutes.
To delete the S3 bucket you created during this exercise, empty the bucket and then delete the bucket.
Delete your IAM Identity Center instance.
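If you scripted any of the setup, the cleanup can be scripted as well. The following is a minimal sketch, assuming the same placeholder IDs and bucket name used earlier; as noted above, the application deletion can take up to 15 minutes to complete.

import boto3

qbusiness = boto3.client("qbusiness", region_name="<YOUR-AWS-REGION>")

# Delete the Amazon Q Business application created for this exercise.
qbusiness.delete_application(applicationId="<YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID>")

# Empty the demo bucket, then delete it.
s3 = boto3.resource("s3", region_name="<YOUR-AWS-REGION>")
bucket = s3.Bucket("<YOUR-S3-BUCKET-NAME>")
bucket.objects.all().delete()
bucket.delete()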

Conclusion
In this post, we showed how to build an AI assistant with Amazon Q Business based on your enterprise documents stored in an S3 bucket or by directly uploading the documents to the data source. Amazon S3 clickable URLs provide a user-friendly mechanism for authenticated users to securely view or download the documents referenced in responses to users’ queries, validate accuracy, and practice responsible AI—a critical success factor for an enterprise AI assistant solution.
For more information about the Amazon Q Business S3 connector, see Discover insights from Amazon S3 with Amazon Q S3 connector.

About the authors
Abhinav Jawadekar is a Principal Solutions Architect in the Amazon Q Business service team at AWS. Abhinav works with AWS customers and partners to help them build generative AI solutions on AWS.