Advancing AI agent governance with Boomi and AWS: A unified approach t …

Just as APIs became the standard for integration, AI agents are transforming workflow automation through intelligent task coordination. AI agents are already enhancing decision-making and streamlining operations across enterprises. But as adoption accelerates, organizations face growing complexity in managing them at scale. Organizations struggle with observability and lifecycle management, finding it difficult to monitor performance and manage versions effectively. Governance and security concerns arise as these agents process sensitive data, which requires strict compliance and access controls. Perhaps most concerningly, without proper management, organizations face the risk of agent sprawl—the unchecked proliferation of AI agents leading to inefficiency and security vulnerabilities.
Boomi and AWS have collaborated to address the complexity surrounding AI agents with Agent Control Tower, an AI agent management solution developed by Boomi and tightly integrated with Amazon Bedrock. Agent Control Tower, part of the Boomi Agentstudio solution, provides the governance framework to manage this transformation, with capabilities that address both current and emerging compliance needs.
As a leader in enterprise iPaaS per Gartner’s Magic Quadrant, based on Completeness of Vision and Ability to Execute, Boomi serves over 20,000 enterprise customers, with three-quarters of these customers operating on AWS. This includes a significant presence among Fortune 500 and Global 2000 organizations across critical sectors such as healthcare, finance, technology, and manufacturing. Boomi is innovating with generative AI, with more than 2,000 customers using its AI agents. The convergence of capabilities that Boomi provides—spanning AI, integration, automation, API management, and data management—with AWS and its proven track record in reliability, security, and AI innovation creates a compelling foundation for standardized AI agent governance at scale. In this post, we share how Boomi partnered with AWS to help enterprises accelerate and scale AI adoption with confidence using Agent Control Tower.
A unified AI management solution
Built on AWS, Agent Control Tower uniquely delivers a single control plane for managing AI agents across multiple systems, including other cloud providers and on-premises environments. At its core, it offers comprehensive observability and monitoring, providing real-time performance tracking and deep visibility into agent decision-making and behavior.
The following screenshot showcases how users can view summary data across agent providers and add or manage providers.

The following screenshot shows an example of the Monitoring and Compliance dashboard.

Agent Control Tower also provides a single pane of glass for visibility into the tools used by each agent, as illustrated in the following screenshot.

Agent Control Tower provides key governance and security controls such as centralized policy enforcement and role-based access control, and enables meeting regulatory compliance with frameworks like GDPR and HIPAA. Furthermore, its lifecycle management capabilities enable automated agent discovery, version tracking, and operational control through features such as pause and resume functionality. Agent Control Tower is positioned as one of the first, if not the first, unified solutions that provides full lifecycle AI agent management with integrated governance and orchestration features. Although many vendors focus on releasing AI agents, there are few that focus on solutions for managing, deploying, and governing AI agents at scale.
The following screenshot shows an example of how users can review agent details and disable or enable an agent.

As shown in the following screenshot, users can drill down into details for each part of the agent.

Amazon Bedrock: Enabling and enhancing AI governance
Using Amazon Bedrock, organizations can implement security guardrails and content moderation while maintaining the flexibility to select and switch between AI models for optimized performance and accuracy. Organizations can create and enable access to curated knowledge bases and predefined action groups, enabling sophisticated multi-agent collaboration. Amazon Bedrock also provides comprehensive metrics and trace logs for agents to help facilitate complete transparency and accountability in agent operations. Through deep integration with Amazon Bedrock, Boomi’s Agent Control Tower enhances agent transparency and governance, offering a unified, actionable view of agent configurations and activities across environments.
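To give a concrete sense of the agent metadata such a governance layer can build on, the following is a minimal sketch (not Boomi's implementation) that enumerates Amazon Bedrock agents and their status using the boto3 bedrock-agent client; the Region and result limit are placeholder values.

import boto3

# Assumes AWS credentials with read access to Amazon Bedrock Agents; the Region is a placeholder.
client = boto3.client("bedrock-agent", region_name="us-east-1")

# List the agents registered in this account and Region.
for summary in client.list_agents(maxResults=50)["agentSummaries"]:
    # Pull configuration details (status, foundation model) for each agent.
    agent = client.get_agent(agentId=summary["agentId"])["agent"]
    print(agent["agentName"], agent["agentStatus"], agent.get("foundationModel"))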
The following diagram illustrates the Agent Control Tower architecture on AWS.

Business impact: Transforming enterprise AI operations
Consider a global manufacturer using AI agents for supply chain optimization. With Agent Control Tower, they can monitor agent performance across regions in real time, enforce consistent security policies, and enable regulatory compliance. When issues arise, they can quickly identify and resolve them while maintaining the ability to scale AI operations confidently. With this level of control and visibility, organizations can deploy AI agents more effectively while maintaining robust security and compliance standards.
Conclusion
Boomi customers have already deployed more than 33,000 agents and are seeing up to 80% less time spent on documentation and 50% faster issue resolution. With Boomi and AWS, enterprises can accelerate and scale AI adoption with confidence, backed by a product that puts visibility, governance, and security first. Discover how Agent Control Tower can help your organization manage AI agent sprawl and take advantage of scalable, compliance-aligned innovation. Take a guided tour and learn more about Boomi Agent Control Tower and Amazon Bedrock integration. Or, you can get started today with AI FastTrack.

About the authors
Deepak Chandrasekar is the VP of Software Engineering & User Experience and leads multidisciplinary teams at Boomi. He oversees flagship initiatives like Boomi’s Agent Control Tower, Task Automation, and Market Reach, while driving a cohesive and intelligent experience layer across products. Previously, Deepak held a key leadership role at Unifi Software, which was acquired by Boomi. With a passion for building scalable and intuitive AI-powered solutions, he brings a commitment to engineering excellence and responsible innovation.
Sandeep Singh is Director of Engineering at Boomi, where he leads global teams building solutions that enable enterprise integration and automation at scale. He drives initiatives like Boomi Agent Control Tower, Marketplace, and Labs, empowering partners and customers with intelligent, trusted solutions. With leadership experience at GE and Fujitsu, Sandeep brings expertise in API strategy, product engineering, and AI/ML solutions. A former solution architect, he is passionate about designing mission-critical systems and driving innovation through scalable, intelligent solutions.
Santosh Ameti is a seasoned Engineering leader in the Amazon Bedrock team and has built Agents, Evaluation, Guardrails, and Prompt Management solutions. His team continuously innovates in the agentic space, delivering one of the most secure and managed agentic solutions for enterprises.
Greg Sligh is a Senior Solutions Architect at AWS with more than 25 years of experience in software engineering, software architecture, consulting, and IT and Engineering leadership roles across multiple industries. For the majority of his career, he has focused on creating and delivering distributed, data-driven applications with particular focus on scale, performance, and resiliency. Now he helps ISVs meet their objectives across technologies, with particular focus on AI/ML.
Padma Iyer is a Senior Customer Solutions Manager at Amazon Web Services, where she specializes in supporting ISVs. With a passion for cloud transformation and financial technology, Padma works closely with ISVs to guide them through successful cloud transformations, using best practices to optimize their operations and drive business growth. Padma has over 20 years of industry experience spanning banking, tech, and consulting.

Baidu Researchers Propose AI Search Paradigm: A Multi-Agent Framework …

The Need for Cognitive and Adaptive Search Engines

Modern search systems are evolving rapidly as the demand for context-aware, adaptive information retrieval grows. With the increasing volume and complexity of user queries, particularly those requiring layered reasoning, systems are no longer limited to simple keyword matching or document ranking. Instead, they aim to mimic the cognitive behaviors humans exhibit when gathering and processing information. This transition towards a more sophisticated, collaborative approach marks a fundamental shift in how intelligent systems are designed to respond to users.

Limitations of Traditional and RAG Systems

Despite these advances, current methods still face critical limitations. Retrieval-augmented generation (RAG) systems, while useful for direct question answering, often operate in rigid pipelines. They struggle with tasks that involve conflicting information sources, contextual ambiguity, or multi-step reasoning. For example, a query that compares the ages of historical figures requires understanding, calculating, and comparing information from separate documents—tasks that demand more than simple retrieval and generation. The absence of adaptive planning and robust reasoning mechanisms often leads to shallow or incomplete answers in such cases.

The Emergence of Multi-Agent Architectures in Search

Several tools have been introduced to enhance search performance, including Learning-to-Rank systems and advanced retrieval mechanisms utilizing Large Language Models (LLMs). These frameworks incorporate features like user behavior data, semantic understanding, and heuristic models. However, even advanced RAG methods, including ReAct and RQ-RAG, primarily follow static logic, which limits their ability to effectively reconfigure plans or recover from execution failures. Their dependence on one-shot document retrieval and single-agent execution further restricts their ability to handle complex, context-dependent tasks.

Introduction of the AI Search Paradigm by Baidu

Researchers from Baidu introduced a new approach called the “AI Search Paradigm,” designed to overcome the limitations of static, single-agent models. It comprises a multi-agent framework with four key agents: Master, Planner, Executor, and Writer. Each agent is assigned a specific role within the search process. The Master coordinates the entire workflow based on the complexity of the query. The Planner structures complex tasks into sub-queries. The Executor manages tool usage and task completion. Finally, the Writer synthesizes the outputs into a coherent response. This modular architecture enables flexibility and precise task execution that traditional systems lack.

Use of Directed Acyclic Graphs for Task Planning

The framework introduces a Directed Acyclic Graph (DAG) to organize complex queries into dependent sub-tasks. The Planner chooses relevant tools from the MCP servers to address each sub-task. The Executor then invokes these tools iteratively, adjusting queries and fallback strategies when tools fail or data is insufficient. This dynamic reassignment ensures continuity and completeness. The Writer evaluates the results, filters inconsistencies, and compiles a structured response. For example, in a query asking who is older than Emperor Wu of Han and Julius Caesar, the system retrieves birthdates from different tools, performs the age calculation, and delivers the result—all in a coordinated, multi-agent process.
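As a rough illustration of the idea (not Baidu's code), the sub-task DAG for the age-comparison query can be represented as a dependency map and executed in topological order, with each node naming the tool the Executor should invoke; the tool names below are hypothetical.

from graphlib import TopologicalSorter

# Hypothetical DAG for: "Who is older, Emperor Wu of Han or Julius Caesar?"
dag = {
    "birth_wu":     {"tool": "web_search", "task": "find Emperor Wu of Han's lifespan", "deps": []},
    "birth_caesar": {"tool": "web_search", "task": "find Julius Caesar's lifespan", "deps": []},
    "compare":      {"tool": "calculator", "task": "compare the two lifespans", "deps": ["birth_wu", "birth_caesar"]},
}

# The Executor walks the DAG in dependency order, invoking one tool per sub-task
# and re-planning (for example, swapping tools) if a step fails or returns too little data.
for task_id in TopologicalSorter({k: v["deps"] for k, v in dag.items()}).static_order():
    node = dag[task_id]
    print(f"run {node['tool']!r} for {node['task']!r} after {node['deps']}")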

Qualitative Evaluations and Workflow Configurations

The performance of this new system was evaluated using several case studies and comparative workflows. Unlike traditional RAG systems, which operate in a one-shot retrieval mode, the AI Search Paradigm dynamically replans and reflects on each sub-task. The system supports three team configurations based on complexity: Writer-Only, Executor-Inclusive, and Planner-Enhanced. For the Emperor age comparison query, the Planner decomposed the task into three sub-steps and assigned tools accordingly. The final output stated that Emperor Wu of Han lived for 69 years and Julius Caesar for 56 years, indicating a 13-year difference—an output accurately synthesized across multiple sub-tasks. While the paper focused more on qualitative insights than numeric performance metrics, it demonstrated strong improvements in user satisfaction and robustness across tasks.

Conclusion: Toward Scalable, Multi-Agent Search Intelligence

In conclusion, this research presents a modular, agent-based framework that enables search systems to surpass document retrieval and emulate human-style reasoning. The AI Search Paradigm represents a significant advancement by incorporating real-time planning, dynamic execution, and coherent synthesis. It not only solves current limitations but also offers a foundation for scalable, trustworthy search solutions driven by structured collaboration between intelligent agents.

Check out the Paper. All credit for this research goes to the researchers of this project.

Baidu Open Sources ERNIE 4.5: LLM Series Scaling from 0.3B to 424B Par …

Baidu has officially open-sourced its latest ERNIE 4.5 series, a powerful family of foundation models designed for enhanced language understanding, reasoning, and generation. The release includes ten model variants ranging from compact 0.3B dense models to massive Mixture-of-Experts (MoE) architectures, with the largest variant totaling 424B parameters. These models are now freely available to the global research and developer community through Hugging Face, enabling open experimentation and broader access to cutting-edge Chinese and multilingual language technology.

Technical Overview of ERNIE 4.5 Architecture

The ERNIE 4.5 series builds on Baidu’s previous iterations of ERNIE models by introducing advanced model architectures, including both dense and sparsely activated MoE designs. The MoE variants are particularly notable for scaling parameter counts efficiently: the ERNIE 4.5-MoE-3B and ERNIE 4.5-MoE-47B variants activate only a subset of experts per input token (typically 2 of 64 experts), keeping the number of active parameters manageable while retaining model expressivity and generalization capabilities.
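The following toy sketch (not ERNIE's actual implementation) shows the routing idea behind such MoE layers: a small gating network scores all 64 experts for each token, only the top 2 run, and their outputs are combined, so the active parameter count per token stays far below the total.

import torch

num_experts, top_k, d_model = 64, 2, 1024
gate = torch.nn.Linear(d_model, num_experts)  # router / gating network
experts = torch.nn.ModuleList(torch.nn.Linear(d_model, d_model) for _ in range(num_experts))

def moe_layer(x):  # x: (tokens, d_model)
    scores = torch.softmax(gate(x), dim=-1)        # (tokens, num_experts)
    weights, idx = scores.topk(top_k, dim=-1)      # keep only the top-2 experts per token
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])    # only 2 of 64 experts execute per token
    return out

print(moe_layer(torch.randn(4, d_model)).shape)    # torch.Size([4, 1024])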

ERNIE 4.5 models are trained using a mixture of supervised fine-tuning (SFT), reinforcement learning with human feedback (RLHF), and contrastive alignment techniques. The training corpus spans 5.6 trillion tokens across diverse domains in both Chinese and English, using Baidu’s proprietary multi-stage pretraining pipeline. The resulting models demonstrate high fidelity in instruction-following, multi-turn conversation, long-form generation, and reasoning benchmarks.

Model Variants and Open-Source Release

The ERNIE 4.5 release includes the following ten variants:

Dense Models: ERNIE 4.5-0.3B, 0.5B, 1.8B, and 4B

MoE Models: ERNIE 4.5-MoE-3B, 4B, 6B, 15B, 47B, and 424B total parameters (with varying active parameters)

The MoE-47B variant, for instance, activates only 3B parameters during inference while having a total of 47B. Similarly, the 424B model—the largest ever released by Baidu—employs sparse activation strategies to make inference feasible and scalable. These models support both FP16 and INT8 quantization for efficient deployment.
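As a deployment sketch, a dense ERNIE 4.5 checkpoint published on Hugging Face could be loaded in FP16 or INT8 with the transformers library roughly as follows. The repository name is a placeholder, so check the model card for the exact ID and any trust_remote_code requirements.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "baidu/ERNIE-4.5-0.3B"  # placeholder; use the exact ID from the Hugging Face model card

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# FP16: straightforward half-precision load for GPUs with enough memory.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)

# INT8: 8-bit quantization via bitsandbytes for tighter memory budgets.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
)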

Performance Benchmarks

ERNIE 4.5 models show significant improvements on several key Chinese and multilingual NLP tasks. According to the official technical report:

On CMMLU, ERNIE 4.5 surpasses previous ERNIE versions and achieves state-of-the-art accuracy in Chinese language understanding.

On MMLU, the multilingual benchmark, ERNIE 4.5-47B demonstrates competitive performance with other leading LLMs like GPT-4 and Claude.

For long-form generation, ERNIE 4.5 achieves higher coherence and factuality scores when evaluated using Baidu’s internal metrics.

In instruction-following tasks, the models benefit from contrastive fine-tuning, showing improved alignment with user intent and reduced hallucination rates compared to earlier ERNIE versions.

Applications and Deployment

ERNIE 4.5 models are optimized for a broad range of applications:

Chatbots and Assistants: Multilingual support and instruction-following alignment make it suitable for AI assistants.

Search and Question Answering: High retrieval and generation fidelity allow for integration with RAG pipelines.

Content Generation: Long-form text and knowledge-rich content generation are improved with better factual grounding.

Code and Multimodal Extension: Although the current release focuses on text, Baidu indicates that ERNIE 4.5 is compatible with multimodal extensions.

With support for up to 128K context length in some variants, the ERNIE 4.5 family can be used in tasks requiring memory and reasoning across long documents or sessions.

Conclusion

The ERNIE 4.5 series represents a significant step in open-source AI development, offering a versatile set of models tailored for scalable, multilingual, and instruction-aligned tasks. Baidu’s decision to release models ranging from lightweight 0.3B variants to a 424B-parameter MoE model underscores its commitment to inclusive and transparent AI research. With comprehensive documentation, open availability on Hugging Face, and support for efficient deployment, ERNIE 4.5 is positioned to accelerate global advancements in natural language understanding and generation.

Check out the Paper and Models on Hugging Face. All credit for this research goes to the researchers of this project.

OMEGA: A Structured Math Benchmark to Probe the Reasoning Limits of LL …

Introduction to Generalization in Mathematical Reasoning

Large-scale language models with long chain-of-thought (CoT) reasoning, such as DeepSeek-R1, have shown strong results on Olympiad-level mathematics. However, models trained through supervised fine-tuning (SFT) or reinforcement learning (RL) depend on a limited repertoire of techniques, such as repeating known algebra rules or defaulting to coordinate geometry in diagram problems. Because these models follow learned reasoning patterns rather than exhibiting true mathematical creativity, they struggle with complex tasks that demand original insights. Moreover, current math datasets are poorly suited for analyzing the math skills that RL models can learn: large-scale corpora mix questions of varying topic and difficulty, making it hard to isolate specific reasoning skills.

Limitations of Current Mathematical Benchmarks

Current methods, such as out-of-distribution generalization, focus on handling test distributions that differ from training data, which is crucial for mathematical reasoning, physical modeling, and financial forecasting. Compositional generalization techniques aim to help models systematically combine learned skills. Researchers have created datasets through various methods to benchmark mathematical abilities, which include hiring humans to write problems like GSM8K and MinervaMath, collecting exam questions such as AIME and OlympiadBench, and scraping and filtering exam corpora like NuminaMath and BigMath. However, these approaches either lack sufficient challenge for modern LLMs or fail to provide analysis granularity.

Introducing OMEGA: A Controlled Benchmark for Reasoning Skills

Researchers from the University of California, Ai2, the University of Washington, and dmodel.ai have proposed OMEGA, a benchmark designed to evaluate three dimensions of Out-of-Distribution generalization, inspired by Boden’s typology of creativity. It creates matched training and test pairs designed to isolate specific reasoning skills across three dimensions: Exploratory, Compositional, and Transformative. OMEGA’s test and train problems are constructed using carefully engineered templates, allowing precise control over diversity, complexity, and the specific reasoning strategies required for solutions. Moreover, it employs 40 templated problem generators across six mathematical domains: arithmetic, algebra, combinatorics, number theory, geometry, and logic & puzzles.
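To make the notion of a templated generator concrete, here is an illustrative example (not one of OMEGA's actual 40 generators): a single arithmetic template whose complexity knob controls the number of operands, so matched train and test splits can be drawn at different complexity levels.

import random

def gen_chain_arithmetic(complexity: int, seed: int = 0) -> dict:
    """Illustrative template: evaluate a +/- chain whose length is set by the complexity knob."""
    rng = random.Random(seed)
    terms = [rng.randint(1, 99) for _ in range(complexity + 1)]
    ops = [rng.choice(["+", "-"]) for _ in range(complexity)]
    expr = str(terms[0]) + "".join(f" {op} {t}" for op, t in zip(ops, terms[1:]))
    return {"question": f"Compute {expr}.", "answer": eval(expr)}

# Matched splits: train on low complexity, test on higher complexity (exploratory generalization).
train = [gen_chain_arithmetic(complexity=3, seed=i) for i in range(1000)]
test = [gen_chain_arithmetic(complexity=8, seed=i) for i in range(100)]
print(train[0], test[0], sep="\n")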

Evaluation on Frontier LLMs and Reinforcement Learning Setup

Researchers evaluate four frontier models, including DeepSeek-R1, Claude-3.7-Sonnet, OpenAI-o3-mini, and OpenAI-o4-mini, across different complexity levels. For RL generalization experiments, the framework applies the GRPO algorithm on 1,000 training problems using Qwen2.5-7B-Instruct and Qwen2.5-Math-7B models. Exploratory generalization trains on restricted complexity levels and evaluates on higher complexity problems. Compositional generalization involves training models on individual skills in isolation and testing their ability to combine and apply those skills effectively. Transformational generalization trains on conventional solution approaches and evaluates performance on problems that need unconventional strategies.
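GRPO is group-relative: for each training problem, the policy samples a group of responses, and each response's advantage is its reward standardized against the group mean and standard deviation, which removes the need for a separate value model. A minimal sketch of that advantage computation (not the full training loop) follows.

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_problems, group_size) scores, e.g. 1.0 if a sampled answer is correct.
    Returns per-response advantages normalized within each problem's group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One problem, four sampled responses; only the second is correct.
print(grpo_advantages(torch.tensor([[0.0, 1.0, 0.0, 0.0]])))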

Performance Observations and Model Behavior Patterns

Reasoning LLMs tend to perform worse as problem complexity increases, often finding correct solutions early but spending too many tokens on unnecessary verification. RL applied only on low-complexity problems enhances generalization to medium-complexity problems, with larger gains on in-domain examples than out-of-distribution ones, indicating RL’s effectiveness at reinforcing familiar patterns. For instance, in the Zebra Logic domain, the base model achieves only 30% accuracy. However, RL training increased performance by 61 points on in-domain examples and 53 points on out-of-distribution examples without SFT.

Conclusion: Toward Advancing Transformational Reasoning

In conclusion, researchers introduced OMEGA, a benchmark that isolates and evaluates three axes of out-of-distribution generalization in mathematical reasoning: explorative, compositional, and transformative. The empirical study reveals three insights: (a) RL fine-tuning significantly improves performance on in-distribution and exploratory generalization tasks, (b) RL’s benefits for compositional tasks are limited, and (c) RL fails to induce genuinely new reasoning patterns. These findings highlight a fundamental limitation: RL can amplify problem-solving breadth and depth, but it falls short in enabling the creative leaps essential for transformational reasoning. Future work should explore curriculum scaffolding and meta-reasoning controllers.

Check out the Paper, Project Page and GitHub Page. All credit for this research goes to the researchers of this project.

Use Amazon SageMaker Unified Studio to build complex AI workflows usin …

Organizations face the challenge of managing data, multiple artificial intelligence and machine learning (AI/ML) tools, and workflows across different environments, which impacts productivity and governance. A unified development environment consolidates data processing, model development, and AI application deployment into a single system. This integration streamlines workflows, enhances collaboration, and accelerates AI solution development from concept to production.
The next generation of Amazon SageMaker is the center for your data, analytics, and AI. SageMaker brings together AWS AI/ML and analytics capabilities and delivers an integrated experience for analytics and AI with unified access to data. Amazon SageMaker Unified Studio is a single data and AI development environment where you can find and access your data and act on it using AWS analytics and AI/ML services, for SQL analytics, data processing, model development, and generative AI application development.
With SageMaker Unified Studio, you can efficiently build generative AI applications in a trusted and secure environment using Amazon Bedrock. You can choose from a selection of high-performing foundation models (FMs) and advanced customization and tooling such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, Amazon Bedrock Agents, and Amazon Bedrock Flows. You can rapidly tailor and deploy generative AI applications, and share with the built-in catalog for discovery.
In this post, we demonstrate how you can use SageMaker Unified Studio to create complex AI workflows using Amazon Bedrock Flows.
Solution overview
Consider FinAssist Corp, a leading financial institution developing a generative AI-powered agent support application. The solution offers the following key features:

Complaint reference system – An AI-powered system providing quick access to historical complaint data, enabling customer service representatives to efficiently handle customer follow-ups, support internal audits, and aid in training new staff.
Intelligent knowledge base – A comprehensive data source of resolved complaints that quickly retrieves relevant complaint details, resolution actions, and outcome summaries.
Streamlined workflow management – Enhanced consistency in customer communications through standardized access to past case information, supporting compliance checks and process improvement initiatives.
Flexible query capability – A straightforward interface supporting various query scenarios, from customer inquiries about past resolutions to internal reviews of complaint handling procedures.

Let’s explore how SageMaker Unified Studio and Amazon Bedrock Flows, integrated with Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents, address these challenges by creating an AI-powered complaint reference system. The following diagram illustrates the solution architecture.

The solution uses the following key components:

SageMaker Unified Studio – Provides the development environment
Flow app – Orchestrates the workflow, including:

Knowledge base queries
Prompt-based classification
Conditional routing
Agent-based response generation

The workflow processes user queries through the following steps (a plain-Python sketch of the routing appears after the list):

A user submits a complaint-related question.
The knowledge base provides relevant complaint information.
The prompt classifies whether the query is about resolution timing.
Based on that classification, the condition node takes one of the following actions:

Routes the query to an AI agent for specific resolution responses.
Returns general complaint information.

The application generates an appropriate response for the user.
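Expressed as plain Python rather than flow nodes, the routing logic amounts to the following sketch. The helper functions are illustrative stand-ins for the knowledge base, prompt, agent, and output nodes you build visually later in this post.

def query_knowledge_base(question: str) -> str:
    # Stand-in for the knowledge base node (complaints_kb).
    return "Complaint FIN-2024-002: wire transfer delayed; fee waived; closed with monetary relief."

def classify_with_prompt(question: str, kb_response: str) -> str:
    # Stand-in for the prompt node: "T" for resolution-timing questions, otherwise "F".
    return "T" if "timely" in question.lower() or "resolved" in question.lower() else "F"

def invoke_chat_agent(question: str, kb_response: str) -> str:
    # Stand-in for the chat agent node that handles resolution-timing responses.
    return f"Resolution-timing answer based on: {kb_response}"

def handle_query(question: str) -> str:
    kb_response = query_knowledge_base(question)
    if classify_with_prompt(question, kb_response) == "T":   # condition node: conditionInput == "T"
        return invoke_chat_agent(question, kb_response)      # route timing queries to the agent
    return kb_response                                       # otherwise return the KB answer directly

print(handle_query("Was complaint FIN-2024-002 resolved in a timely manner?"))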

Prerequisites
For this example, you need the following:

Access to SageMaker Unified Studio (you will need the SageMaker Unified Studio portal URL from your administrator). You can authenticate using either:

AWS Identity and Access Management (IAM) user credentials.
Single sign-on (SSO) credentials with AWS IAM Identity Center.

The IAM user or IAM Identity Center user must have appropriate permissions for:

SageMaker Unified Studio.

Amazon Bedrock (including Amazon Bedrock Flows, Amazon Bedrock Agents, Amazon Bedrock Prompt Management, and Amazon Bedrock Knowledge Bases).
For more information, refer to Identity-based policy examples.

Access to Amazon Bedrock FMs (make sure these are enabled for your account), for example:

Anthropic’s Claude 3 Haiku (for the agent).
Amazon Titan Text Embeddings (for the knowledge base).

Configure access to your Amazon Bedrock serverless models for Amazon Bedrock in SageMaker Unified Studio projects.
Sample complaint data prepared in CSV format for creating the knowledge base.

Prepare your data
We have created a sample dataset to use for Amazon Bedrock Knowledge Bases. This dataset contains information about complaints received by customer service representatives, along with the resolution details. The following is an example from the sample dataset:

complaint_id,product,sub_product,issue,sub_issue,complaint_summary,action_taken,next_steps,financial_institution,state,submitted_via,resolution_type,timely_response
FIN-2024-001,04/26/24,"Mortgage","Conventional mortgage","Payment issue","Escrow dispute","Customer disputes mortgage payment increase after recent escrow analysis","Reviewed escrow analysis, explained property tax increase impact, provided detailed payment breakdown","1. Send written explanation of escrow analysis 2. Schedule annual escrow review 3. Provide payment assistance options","Financial Institution-1","TX","Web","Closed with explanation","Yes"
FIN-2024-002,04/26/24,"Money transfer","Wire transfer","Processing delay","International transfer","Wire transfer of $10,000 delayed, customer concerned about international payment deadline","Located wire transfer in system, expedited processing, waived wire fee","1. Confirm receipt with receiving bank 2. Update customer on delivery 3. Document process improvement needs","Financial Institution-2","FL","Phone","Closed with monetary relief","No"

Create a project
In SageMaker Unified Studio, users can use projects to collaborate on various business use cases. Within projects, you can manage data assets in the SageMaker Unified Studio catalog, perform data analysis, organize workflows, develop ML models, build generative AI applications, and more.
To create a project, complete the following steps:

Open the SageMaker Unified Studio landing page using the URL from your admin.
Choose Create project.
Enter a project name and optional description.
For Project profile, choose Generative AI application development.
Choose Continue.

Complete your project configuration, then choose Create project.

Create a prompt
Let’s create a reusable prompt to capture the instructions for FMs, which we will use later while creating the flow application. For more information, see Reuse and share Amazon Bedrock prompts.

In SageMaker Unified Studio, on the Build menu, choose Prompt under Machine Learning & Generative AI.

Provide a name for the prompt.
Choose the appropriate FM (for this example, we choose Claude 3 Haiku).
For Prompt message, we enter the following:

You are a complaint analysis classifier. You will receive complaint data from a knowledge base. Analyze the {{input}} and respond with a single letter:
T: If the input contains information about complaint resolution timing, response time, or processing timeline (whether timely or delayed)
F: For all other types of complaint information
Return only ‘T’ or ‘F’ based on whether the knowledge base response is about resolution timing. Do not add any additional text or explanation – respond with just the single letter ‘T’ or ‘F’.

Choose Save.

Choose Create version.

Create a chat agent
Let’s create a chat agent to handle specific resolution responses. Complete the following steps:

In SageMaker Unified Studio, on the Build menu, choose Chat agent under Machine Learning & Generative AI.
Provide a name for the agent.
Choose the appropriate FM (for this example, we choose Claude 3 Haiku).
For Enter a system prompt, we enter the following:

You are a Financial Complaints Assistant AI. You will receive complaint information from a knowledge base and questions about resolution timing.
When responding to resolution timing queries:
1. Use the provided complaint information to confirm if it was resolved within timeline
2. For timely resolutions, provide:
– Confirmation of timely completion
– Specific actions taken (from the provided complaint data)
– Next steps that were completed
3. For delayed resolutions, provide:
– Acknowledgment of delay
– Standard compensation package:
• $75 service credit
• Priority Status upgrade for 6 months
• Service fees waived for current billing cycle
– Actions taken (from the provided complaint data)
– Contact information for follow-up: Priority Line: **************
Always reference the specific complaint details provided in your input when discussing actions taken and resolution process.

Choose Save.

After the agent is saved, choose Deploy.
For Alias name, enter demoAlias.
Choose Deploy.

Create a flow
Now that we have our prompt and agent ready, let’s create a flow that will orchestrate the complaint handling process:

In SageMaker Unified Studio, on the Build menu, choose Flow under Machine Learning & Generative AI.

Create a new flow called demo-flow.

Add a knowledge base to your flow application
Complete the following steps to add a knowledge base node to the flow:

In the navigation pane, on the Nodes tab, choose Knowledge Base.
On the Configure tab, provide the following information:

For Node name, enter a name (for example, complaints_kb).
Choose Create new Knowledge Base.

In the Create Knowledge Base pane, enter the following information:

For Name, enter a name (for example, complaints).
For Description, enter a description (for example, user complaints information).
For Add data sources, select Local file and upload the complaints.txt file.
For Embeddings model, choose Titan Text Embeddings V2.
For Vector store, choose OpenSearch Serverless.
Choose Create.

After you create the knowledge base, choose it in the flow.
In the details pane, provide the following information:
For Response generation model, choose Claude 3 Haiku.
Connect the output of the flow input node with the input of the knowledge base node.
Connect the output of the knowledge base node with the input of the flow output node.

Choose Save.

Add a prompt to your flow application
Now let’s add the prompt you created earlier to the flow:

On the Nodes tab in the Flow app builder pane, add a prompt node.
On the Configure tab for the prompt node, provide the following information:
For Node name, enter a name (for example, demo_prompt).
For Prompt, choose financeAssistantPrompt.
For Version, choose 1.
Connect the output of the knowledge base node with the input of the prompt node.
Choose Save.

Add a condition to your flow application
The condition node determines how the flow handles different types of queries. It evaluates whether a query is about resolution timing or general complaint information, enabling the flow to route the query appropriately. When a query is about resolution timing, it will be directed to the chat agent for specialized handling; otherwise, it will receive a direct response from the knowledge base. Complete the following steps to add a condition:

On the Nodes tab in the Flow app builder pane, add a condition node.
On the Configure tab for the condition node, provide the following information:

For Node name, enter a name (for example, demo_condition).
Under Conditions, for Condition, enter conditionInput == "T".
Connect the output of the prompt node with the input of the condition node.

Choose Save.

Add a chat agent to your flow application
Now let’s add the chat agent you created earlier to the flow:

On the Nodes tab in the Flow app builder pane, add the agent node.
On the Configure tab for the agent node, provide the following information:

For Node name, enter a name (for example, demo_agent).
For Chat agent, choose DemoAgent.
For Alias, choose demoAlias.

Create the following node connections:

Connect the input of the condition node (demo_condition) to the output of the prompt node (demo_prompt).
Connect the output of the condition node:

Set If condition is true to the agent node (demo_agent).
Set If condition is false to the existing flow output node (FlowOutputNode).

Connect the output of the knowledge base node (complaints_kb) to the input of the following:

The agent node (demo_agent).
The flow output node (FlowOutputNode).

Connect the output of the agent node (demo_agent) to a new flow output node named FlowOutputNode_2.

Choose Save.

Test the flow application
Now that the flow application is ready, let’s test it. On the right side of the page, choose the expand icon to open the Test pane.

In the Enter prompt text box, we can ask a few questions related to the dataset created earlier. The following screenshots show some examples.

Clean up
To clean up your resources, delete the flow, agent, prompt, knowledge base, and associated OpenSearch Serverless resources.
Conclusion
In this post, we demonstrated how to build an AI-powered complaint reference system using a flow application in SageMaker Unified Studio. By using the integrated capabilities of SageMaker Unified Studio with Amazon Bedrock features like Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and Amazon Bedrock Flows, you can rapidly develop and deploy sophisticated AI applications without extensive coding.
As you build AI workflows using SageMaker Unified Studio, remember to adhere to the AWS Shared Responsibility Model for security. Implement SageMaker Unified Studio security best practices, including proper IAM configurations and data encryption. You can also refer to Secure a generative AI assistant with OWASP Top 10 mitigation for details on how to assess the security posture of a generative AI assistant using OWASP TOP 10 mitigations for common threats. Following these guidelines helps establish robust AI applications that maintain data integrity and system protection.
To learn more, refer to Amazon Bedrock in SageMaker Unified Studio and join discussions and share your experiences in AWS Generative AI Community.
We look forward to seeing the innovative solutions you will create with these powerful new features.

About the authors
Sumeet Tripathi is an Enterprise Support Lead (TAM) at AWS in North Carolina. He has over 17 years of experience in technology across various roles. He is passionate about helping customers reduce operational challenges and friction. His focus areas are AI/ML and the Energy & Utilities segment. Outside work, he enjoys traveling with family, watching cricket, and movies.
Vishal Naik is a Sr. Solutions Architect at Amazon Web Services (AWS). He is a builder who enjoys helping customers accomplish their business needs and solve complex challenges with AWS solutions and best practices. His core area of focus includes Generative AI and Machine Learning. In his spare time, Vishal loves making short films on time travel and alternate universe themes.

Accelerating AI innovation: Scale MCP servers for enterprise workloads …

Generative AI has been moving at a rapid pace, with new tools, offerings, and models released frequently. According to Gartner, agentic AI is one of the top technology trends of 2025, and organizations are performing prototypes on how to use agents in their enterprise environment. Agents depend on tools, and each tool might have its own mechanism to send and receive information. Model Context Protocol (MCP) by Anthropic is an open source protocol that attempts to solve this challenge. It provides a protocol and communication standard that is cross-compatible with different tools, and can be used by an agentic application’s large language model (LLM) to connect to enterprise APIs or external tools using a standard mechanism. However, large enterprise organizations like financial services tend to have complex data governance and operating models, which makes it challenging to implement agents working with MCP.
One major challenge is the siloed approach in which individual teams build their own tools, leading to duplication of efforts and wasted resources. This approach slows down innovation and creates inconsistencies in integrations and enterprise design. Furthermore, managing multiple disconnected MCP tools across teams makes it difficult to scale AI initiatives effectively. These inefficiencies hinder enterprises from fully taking advantage of generative AI for tasks like post-trade processing, customer service automation, and regulatory compliance.
In this post, we present a centralized MCP server implementation using Amazon Bedrock that offers an innovative approach by providing shared access to tools and resources. With this approach, teams can focus on building AI capabilities rather than spending time developing or maintaining tools. By standardizing access to resources and tools through MCP, organizations can accelerate the development of AI agents, so teams can reach production faster. Additionally, a centralized approach provides consistency and standardization and reduces operational overhead, because the tools are managed by a dedicated team rather than across individual teams. It also enables centralized governance that enforces controlled access to MCP servers, which reduces the risk of data exfiltration and prevents unauthorized or insecure tool use across the organization.
Solution overview
The following figure illustrates a proposed solution based on a financial services use case that uses MCP servers across multiple lines of business (LoBs), such as compliance, trading, operations, and risk management. Each LoB performs distinct functions tailored to their specific business. For instance, the trading LoB focuses on trade execution, whereas the risk LoB performs risk limit checks. For performing these functions, each division provides a set of MCP servers that facilitate actions and access to relevant data within their LoBs. These servers are accessible to agents developed within the respective LoBs and can also be exposed to agents outside LoBs.

The development of MCP servers is decentralized. Each LoB is responsible for developing the servers that support their specific functions. When the development of a server is complete, it’s hosted centrally and accessible across LoBs. It takes the form of a registry or marketplace that facilitates integration of AI-driven solutions across divisions while maintaining control and governance over shared resources.
In the following sections, we explore what the solution looks like on a conceptual level.
Agentic application interaction with a central MCP server hub
The following flow diagram showcases how an agentic application built using Amazon Bedrock interacts with one of the MCP servers located in the MCP server hub.
The flow consists of the following steps (a minimal client-side sketch follows the list):

The application connects to the central MCP hub through the load balancer and requests a list of available tools from the specific MCP server. This can be fine-grained based on what servers the agentic application has access to.
The trade server responds with list of tools available, including details such as tool name, description, and required input parameters.
The agentic application invokes an Amazon Bedrock agent and provides the list of tools available.
Using this information, the agent determines what to do next based on the given task and the list of tools available to it.
The agent chooses the most suitable tool and responds with the tool name and input parameters. The control comes back to the agentic application.
The agentic application calls for the execution of the tool through the MCP server using the tool name and input parameters.
The trade MCP server executes the tool and returns the results of the execution back to the application.
The application returns the results of the tool execution back to the Amazon Bedrock agent.
The agent observes the tool execution results and determines the next step.
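Steps 1, 2, 6, and 7 of this flow are ordinary MCP client calls. The following minimal sketch uses the fastmcp client library (the same library used for the servers later in this post) against a placeholder load balancer URL to list the trade server's tools and then execute the one the Amazon Bedrock agent selected; the endpoint and input values are illustrative.

import asyncio
from fastmcp import Client

# Placeholder endpoint for the Network Load Balancer in front of the trade MCP server.
TRADE_MCP_URL = "http://internal-mcp-hub.example.com/trade/mcp"

async def main():
    async with Client(TRADE_MCP_URL) as client:
        # Steps 1-2: discover the tools exposed by the trade server (name, description, input schema).
        tools = await client.list_tools()
        print([tool.name for tool in tools])

        # Steps 6-7: once the Amazon Bedrock agent has chosen a tool, execute it with its parameters.
        result = await client.call_tool("executeTrade", {"ticker": "AMZN", "quantity": 100, "price": 186})
        print(result)

asyncio.run(main())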

Let’s dive into the technical architecture of the solution.
Architecture overview
The following diagram illustrates the architecture to host the centralized cluster of MCP servers for an LoB.

The architecture can be split into four sections:

MCP server discovery API
Agentic applications
Central MCP server hub
Tools and resources

Let’s explore each section in detail:

MCP server discovery API – This API is a dedicated endpoint for discovering various MCP servers. Different teams can call this API to find what MCP servers are available in the registry; read their description, tool, and resource details; and decide which MCP server would be the right one for their agentic application. When a new MCP server is published, it’s added to an Amazon DynamoDB table. MCP server owners are responsible for keeping the registry information up-to-date.
Agentic application – The agentic applications are hosted on AWS Fargate for Amazon Elastic Container Service (Amazon ECS) and built using Amazon Bedrock Agents. Teams can also use the newly released open source AWS Strands Agents SDK, or other agentic frameworks of choice, to build the agentic application and their own containerized solution to host the agentic application. The agentic applications access Amazon Bedrock through a secure private virtual private cloud (VPC) endpoint. It uses private VPC endpoints to access MCP servers.
Central MCP server hub – This is where the MCP servers are hosted. Access to the servers is enabled through an AWS Network Load Balancer. Technically, each server is a Docker container hosted on Amazon ECS, but you can choose your own container deployment solution. These servers can scale individually without impacting the other servers. The servers in turn connect to one or more tools using private VPC endpoints.
Tools and resources – This component holds the tools, such as databases, another application, Amazon Simple Storage Service (Amazon S3), or other tools. For enterprises, access to the tools and resources is provided only through private VPC endpoints.

Benefits of the solution
The solution offers the following key benefits:

Scalability and resilience – Because you’re using Amazon ECS on Fargate, you get scalability out of the box without managing infrastructure and handling scaling concerns. Amazon ECS automatically detects and recovers from failures by restarting failed MCP server tasks locally or reprovisioning containers, minimizing downtime. It can also redirect traffic away from unhealthy Availability Zones and rebalance tasks across healthy Availability Zones to provide uninterrupted access to the server.
Security – Access to MCP servers is secured at the network level through network controls such as PrivateLink. This makes sure the agentic application only connects to trusted MCP servers hosted by the organization, and vice versa. Each Fargate workload runs in an isolated environment. This prevents resource sharing between tasks. For application authentication and authorization, we propose using an MCP Auth Server (refer to the following GitHub repo) to hand off those tasks to a dedicated component that can scale independently.

At the time of writing, the MCP protocol doesn’t provide built-in mechanisms for user-level access control or authorization. Organizations requiring user-specific access restrictions must implement additional security layers on top of the MCP protocol. For a reference implementation, refer to the following GitHub repo.
Let’s dive deeper in the implementation of this solution.
Use case
The implementation is based on a financial services use case featuring post-trade execution. Post-trade execution refers to the processes and steps that take place after an equity buy/sell order has been placed by a customer. It involves many steps, including verifying trade details, actual transfer of assets, providing a detailed report of the execution, running fraudulent checks, and more. For simplification of the demo, we focus on the order execution step.
Although this use case is tailored to the financial industry, you can apply the architecture and the approach to other enterprise workloads as well. The entire code of this implementation is available on GitHub. We use the AWS Cloud Development Kit (AWS CDK) for Python to deploy this solution, which creates an agentic application connected to tools through the MCP server. It also creates a Streamlit UI to interact with the agentic application.
The following code snippet provides access to the MCP discovery API:

import json

import boto3

# The DynamoDB table name and CORS headers are defined elsewhere in the actual Lambda module;
# the values below are placeholders shown so the snippet is self-contained.
DDBTBL_MCP_SERVER_REGISTRY = "mcp-server-registry"
cors_headers = {"Access-Control-Allow-Origin": "*"}


def get_server_registry():
    # Initialize the DynamoDB resource and registry table
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(DDBTBL_MCP_SERVER_REGISTRY)

    try:
        # Scan the table to get all registered MCP servers
        response = table.scan()
        items = response.get("Items", [])

        # Format the items to include only id, description, and server
        formatted_items = []
        for item in items:
            formatted_item = {
                "id": item.get("id", ""),
                "description": item.get("description", ""),
                "server": item.get("server", ""),
            }
            formatted_items.append(formatted_item)

        # Return the formatted items as JSON
        return {
            "statusCode": 200,
            "headers": cors_headers,
            "body": json.dumps(formatted_items),
        }
    except Exception as e:
        # Handle any errors and surface them to the caller
        return {
            "statusCode": 500,
            "headers": cors_headers,
            "body": json.dumps({"error": str(e)}),
        }

The preceding code is invoked through an AWS Lambda function. The complete code is available in the GitHub repository. The following graphic shows the response of the discovery API.

Let’s explore a scenario where the user submits a question: “Buy 100 shares of AMZN at USD 186, to be distributed equally between accounts A31 and B12.” To execute this task, the agentic application invokes the trade-execution MCP server. The following code is the sample implementation of the MCP server for trade execution:

from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse

mcp = FastMCP("server")


@mcp.custom_route("/", methods=["GET"])
async def health_check(request: Request) -> PlainTextResponse:
    return PlainTextResponse("OK")


@mcp.tool()
async def executeTrade(ticker, quantity, price):
    """
    Execute a trade for the given ticker, quantity, and price.

    Sample input:
    {
        "ticker": "AMZN",
        "quantity": 1000,
        "price": 150.25
    }
    """
    # Simulate trade execution
    return {
        "tradeId": "T12345",
        "status": "Executed",
        "timestamp": "2025-04-09T22:58:00",
    }


@mcp.tool()
async def sendTradeDetails(tradeId):
    """
    Send trade details for the given tradeId.

    Sample input:
    {
        "tradeId": "T12345"
    }
    """
    return {
        "status": "Details Sent",
        "recipientSystem": "MiddleOffice",
        "timestamp": "2025-04-09T22:59:00",
    }


if __name__ == "__main__":
    mcp.run(host="0.0.0.0", transport="streamable-http")

The complete code is available in the following GitHub repo.
The following graphic shows the MCP server execution in action.

This is a sample implementation of the use case focusing on the deployment step. For a production scenario, we strongly recommend adding a human oversight workflow to monitor the execution and provide input at various steps of the trade execution.
Now you’re ready to deploy this solution.
Prerequisites
Prerequisites for the solution are available in the README.md of the GitHub repository.
Deploy the application
Complete the following steps to run this solution:

Navigate to the README.md file of the GitHub repository to find the instructions to deploy the solution. Follow these steps to complete deployment.

The successful deployment will exit with a message similar to the one shown in the following screenshot.

When the deployment is complete, access the Streamlit application.

You can find the Streamlit URL in the terminal output, similar to the following screenshot.

Enter the URL of the Streamlit application in a browser to open the application console.

On the application console, different sets of MCP servers are listed in the left pane under MCP Server Registry. Each set corresponds to an MCP server and includes the definition of the tools, such as the name, description, and input parameters.

In the right pane, Agentic App, a request is pre-populated: “Buy 100 shares of AMZN at USD 186, to be distributed equally between accounts A31 and B12.” This request is ready to be submitted to the agent for execution.

Choose Submit to invoke an Amazon Bedrock agent to process the request.

The agentic application will evaluate the request together with the list of tools it has access to, and iterate through a series of tool executions and evaluations to fulfill the request. You can view the trace output to see the tools that the agent used. For each tool used, you can see the values of the input parameters, followed by the corresponding results. In this case, the agent operated as follows:

The agent first used the function executeTrade with input parameters of ticker=AMZN, quantity=100, and price=186
After the trade was executed, the agent used the allocateTrade tool to allocate the trade position between the two portfolio accounts

Clean up
You will incur charges when you consume the services used in this solution. Instructions to clean up the resources are available in the README.md of the GitHub repository.
Summary
This solution offers a straightforward and enterprise-ready approach to implement MCP servers on AWS. With this centralized operating model, teams can focus on building their applications rather than maintaining the MCP servers. As enterprises continue to embrace agentic workflows, centralized MCP servers offer a practical solution for overcoming operational silos and inefficiencies. With the AWS scalable infrastructure and advanced tools like Amazon Bedrock Agents and Amazon ECS, enterprises can accelerate their journey toward smarter workflows and better customer outcomes.
Check out the GitHub repository to replicate the solution in your own AWS environment.
To learn more about how to run MCP servers on AWS, refer to the following resources:

Harness the power of MCP servers with Amazon Bedrock Agents
Unlocking the power of Model Context Protocol (MCP) on AWS
Amazon Bedrock Agents Samples GitHub repository

About the authors
Xan Huang is a Senior Solutions Architect with AWS and is based in Singapore. He works with major financial institutions to design and build secure, scalable, and highly available solutions in the cloud. Outside of work, Xan dedicates most of his free time to his family, where he lovingly takes direction from his two young daughters, aged one and four. You can find Xan on LinkedIn: https://www.linkedin.com/in/xanhuang/
Vikesh Pandey is a Principal GenAI/ML Specialist Solutions Architect at AWS, helping large financial institutions adopt and scale generative AI and ML workloads. He is the author of the book “Generative AI for financial services.” He has more than a decade of experience building enterprise-grade applications with generative AI/ML and related technologies. In his spare time, he plays an unnamed sport with his son that lies somewhere between football and rugby.

Choosing the right approach for generative AI-powered structured data …

Organizations want direct answers to their business questions without the complexity of writing SQL queries or navigating through business intelligence (BI) dashboards to extract data from structured data stores. Examples of structured data include tables, databases, and data warehouses that conform to a predefined schema. Large language model (LLM)-powered natural language query systems transform how we interact with data, so you can ask questions like “Which region has the highest revenue?” and receive immediate, insightful responses. Implementing these capabilities requires careful consideration of your specific needs—whether you need to integrate knowledge from other systems (for example, unstructured sources like documents), serve internal or external users, handle the analytical complexity of questions, or customize responses for business appropriateness, among other factors.
In this post, we discuss LLM-powered structured data query patterns in AWS. We provide a decision framework to help you select the best pattern for your specific use case.
Business challenge: Making structured data accessible
Organizations have vast amounts of structured data but struggle to make it effectively accessible to non-technical users for several reasons:

Business users lack the technical knowledge (like SQL) needed to query data
Employees rely on BI teams or data scientists for analysis, limiting self-service capabilities
Gaining insights often involves time delays that impact decision-making
Predefined dashboards constrain spontaneous exploration of data
Users might not know what questions are possible or where relevant data resides

Solution overview
An effective solution should provide the following:

A conversational interface that allows employees to query structured data sources without technical expertise
The ability to ask questions in everyday language and receive accurate, trustworthy answers
Automatic generation of visualizations and explanations to clearly communicate insights
Integration of information from different data sources (both structured and unstructured) presented in a unified manner
Ease of integration with existing investments and rapid deployment capabilities
Access restriction based on identities, roles, and permissions

In the following sections, we explore five patterns that can address these needs, highlighting the architecture, ideal use cases, benefits, considerations, and implementation resources for each approach.
Pattern 1: Direct conversational interface using an enterprise assistant
This pattern uses Amazon Q Business, a generative AI-powered assistant, to provide a chat interface on data sources with native connectors. When users ask questions in natural language, Amazon Q Business connects to the data source, interprets the question, and retrieves relevant information without requiring intermediate services. The following diagram illustrates this workflow.

This approach is ideal for internal enterprise assistants that need to answer business user-facing questions from both structured and unstructured data sources in a unified experience. For example, HR personnel can ask “What’s our parental leave policy and how many employees used it last quarter?” and receive answers drawn from both leave policy documentation and employee databases together in one interaction. With this pattern, you can benefit from the following:

Simplified connectivity through the extensive Amazon Q Business library of built-in connectors
Streamlined implementation with a single service to configure and manage
Unified search experience for accessing both structured and unstructured information
Built-in understanding of and respect for existing identities, roles, and permissions

You can define the scope of data to be pulled in the form of a SQL query. Amazon Q Business pre-indexes database content based on defined SQL queries and uses this index when responding to user questions. Similarly, you can define the sync mode and schedule to determine how often you want to update your index. Amazon Q Business does the heavy lifting of indexing the data using a Retrieval Augmented Generation (RAG) approach and then uses an LLM to generate well-written answers. For more details on how to set up Amazon Q Business with an Amazon Aurora PostgreSQL-Compatible Edition connector, see Discover insights from your Amazon Aurora PostgreSQL database using the Amazon Q Business connector. You can also refer to the complete list of supported data source connectors.
Pattern 2: Enhancing BI tool with natural language querying capabilities
This pattern uses Amazon Q in QuickSight to process natural language queries against datasets that have been previously configured in Amazon QuickSight. Users can ask questions in everyday language within the QuickSight interface and get visualized answers without writing SQL. This approach works with QuickSight (Enterprise or Q edition) and supports various data sources, including Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon Athena, and others. The architecture is depicted in the following diagram.

This pattern is well-suited for internal BI and analytics use cases. Business analysts, executives, and other employees can ask ad-hoc questions to get immediate visualized insights in the form of dashboards. For example, executives can ask questions like “What were our top 5 regions by revenue last quarter?” and immediately see responsive charts, reducing dependency on analytics teams. The benefits of this pattern are as follows:

It enables natural language queries that produce rich visualizations and charts
No coding or machine learning (ML) experience is needed—the heavy lifting like natural language interpretation and SQL generation is managed by Amazon Q in QuickSight
It integrates seamlessly within the familiar QuickSight dashboard environment

Existing QuickSight users might find this the most straightforward way to take advantage of generative AI benefits. You can optimize this pattern for higher-quality results by configuring topics like curated fields, synonyms, and expected question phrasing. This pattern will pull data only from a specific configured data source in QuickSight to produce a dashboard as an output. For more details, check out QuickSight DemoCentral to view a demo in QuickSight, see the generative BI learning dashboard, and view guided instructions to create dashboards with Amazon Q. Also refer to the list of supported data sources.
Pattern 3: Combining BI visualization with conversational AI for a seamless experience
This pattern merges BI visualization capabilities with conversational AI to create a seamless knowledge experience. By integrating Amazon Q in QuickSight with Amazon Q Business (with the QuickSight plugin enabled), organizations can provide users with a unified conversational interface that draws on both unstructured and structured data. The following diagram illustrates the architecture.

This is ideal for enterprises that want an internal AI assistant to answer a variety of questions—whether it’s a metric from a database or knowledge from a document. For example, executives can ask “What was our Q4 revenue growth?” and see visualized results drawn from data warehouses such as Amazon Redshift through QuickSight, then immediately follow up with “What is our company vacation policy?” to access HR documentation—all within the same conversation flow. This pattern offers the following benefits:

It unifies answers from structured data (databases and warehouses) and unstructured data (documents, wikis, emails) in a single application
It delivers rich visualizations alongside conversational responses in a seamless experience with real-time analysis in chat
There is no duplication of work: if your BI team has already built datasets and topics in QuickSight for analytics, you can reuse them in Amazon Q Business
It maintains conversational context when switching between data and document-based inquiries

For more details, see Query structured data from Amazon Q Business using Amazon QuickSight integration and Amazon Q Business now provides insights from your databases and data warehouses (preview).
Another variation of this pattern is recommended for BI users who want to expose unified data through rich visuals in QuickSight, as illustrated in the following diagram.

For more details, see Integrate unstructured data into Amazon QuickSight using Amazon Q Business.
Pattern 4: Building knowledge bases from structured data using managed text-to-SQL
This pattern uses Amazon Bedrock Knowledge Bases to enable structured data retrieval. The service provides a fully managed text-to-SQL module that alleviates common challenges in developing natural language query applications for structured data. This implementation uses Amazon Bedrock (Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases) along with your choice of data warehouse such as Amazon Redshift or Amazon SageMaker Lakehouse. The following diagram illustrates the workflow.

For example, a seller can use this capability embedded into an ecommerce application to ask a complex query like “Give me the top 5 products whose sales increased by 50% last year compared to the previous year, and group the results by product category.” The system automatically generates the appropriate SQL, executes it against the data sources, and delivers results or a summarized narrative. This pattern features the following benefits:

It provides fully managed text-to-SQL capabilities without requiring model training
It enables direct querying of data from the source without data movement
It supports complex analytical queries on warehouse data
It offers flexibility in foundation model (FM) selection through Amazon Bedrock
API connectivity, personalization options, and context-aware chat features make it better suited for customer-facing applications

Choose this pattern when you need a flexible, developer-oriented solution. This approach works well for applications (internal or external) where you control the UI design. Default outputs are primarily text or structured data. However, executing arbitrary SQL queries can be a security risk for text-to-SQL applications. It is recommended that you take precautions as needed, such as using restricted roles, read-only databases, and sandboxing. For more information on how to build this pattern, see Empower financial analytics by creating structured knowledge bases using Amazon Bedrock and Amazon Redshift. For a list of supported structured data stores, refer to Create a knowledge base by connecting to a structured data store.
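To make this pattern concrete, the following minimal sketch (Python with boto3) calls the Amazon Bedrock RetrieveAndGenerate API against a structured knowledge base. The knowledge base ID and model ARN are placeholders for resources you would have created beforehand, and the question is only an example; treat this as an illustration rather than a complete implementation.

import boto3

# Minimal sketch: query a structured knowledge base through the
# RetrieveAndGenerate API. The knowledge base ID and model ARN are
# placeholders for resources created separately.
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "Which region has the highest revenue?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<your-knowledge-base-id>",
            "modelArn": "<your-model-arn>",
        },
    },
)
print(response["output"]["text"])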
Pattern 5: Custom text-to-SQL implementation with flexible model selection
This pattern represents a build-your-own solution using FMs to convert natural language to SQL, execute queries on data warehouses, and return results. Choose Amazon Bedrock when you want to quickly integrate this capability without deep ML expertise—it offers a fully managed service with ready-to-use FMs through a unified API, handling infrastructure needs with pay-as-you-go pricing. Alternatively, select Amazon SageMaker AI when you require extensive model customization to meet specialized needs—it provides complete ML lifecycle tools for data scientists and ML engineers to build, train, and deploy custom models with greater control. For more information, refer to our Amazon Bedrock or Amazon SageMaker AI decision guide. The following diagram illustrates the architecture.

Use this pattern if your use case requires specific open-weight models, or you want to fine-tune models on your domain-specific data. For example, if you need highly accurate results for your query, then you can use this pattern to fine-tune models on specific schema structures, while maintaining the flexibility to integrate with existing workflows and multi-cloud environments. This pattern offers the following benefits:

It provides maximum customization in model selection, fine-tuning, and system design
It supports complex logic across multiple data sources
It offers complete control over security and deployment in your virtual private cloud (VPC)
It enables flexible interface implementation (Slack bots, custom web UIs, notebook plugins)
You can implement it for external user-facing solutions

For more information on steps to build this pattern, see Build a robust text-to-SQL solution generating complex queries, self-correcting, and querying diverse data sources.
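As a rough illustration of what the build-your-own approach involves, the following sketch uses the Amazon Bedrock Converse API to turn a question into SQL. The model ID and table schema are assumptions made for this example; in practice you would validate the generated SQL and run it with read-only, least-privilege credentials.

import boto3

# Sketch of a custom text-to-SQL step using the Amazon Bedrock Converse API.
# The model ID and schema below are illustrative assumptions.
bedrock_runtime = boto3.client("bedrock-runtime")

schema = "Table sales(region VARCHAR, revenue DECIMAL, order_date DATE)"
question = "Which region has the highest revenue?"

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": f"Schema: {schema}\nWrite one SQL query that answers: {question}\nReturn only the SQL."}],
    }],
    inferenceConfig={"temperature": 0.0, "maxTokens": 300},
)
generated_sql = response["output"]["message"]["content"][0]["text"]
print(generated_sql)  # Review and validate before executing against your warehouse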
Pattern comparison: Making the right choice
To make effective decisions, let’s compare these patterns across key criteria.
Data workload suitability
Different out-of-the-box patterns handle transactional (operational) and analytical (historical or aggregated) data with varying degrees of effectiveness. Patterns 1 and 3, which use Amazon Q Business, work with indexed data and are optimized for lookup-style queries against previously indexed content rather than real-time transactional database queries. Pattern 2, which uses Amazon Q in QuickSight, produces visual output for ad-hoc analysis of transactional data. Pattern 4, which uses Amazon Bedrock structured data retrieval, is specifically designed for analytical systems and data warehouses, excelling at complex queries on large datasets. Pattern 5 is a self-managed text-to-SQL option that can be built to support both transactional and analytical needs.
Target audience
Architectures highlighted in Patterns 1, 2, and 3 (using Amazon Q Business, Amazon Q in QuickSight, or a combination) are best suited for internal enterprise use. However, you can use Amazon QuickSight Embedded to embed data visuals, dashboards, and natural language queries into both internal and customer-facing applications. Amazon Q Business serves as an enterprise AI assistant for organizational knowledge, with subscription-based pricing tiers designed for internal employees. Pattern 4 (using Amazon Bedrock) can be used to build both internal and customer-facing applications. This is because, unlike the subscription-based model of Amazon Q Business, Amazon Bedrock provides API-driven services that alleviate per-user costs and identity management overhead for external customer scenarios. This makes it well suited for customer-facing experiences where you need to serve potentially thousands of external users. The custom LLM solutions in Pattern 5 can similarly be tailored to external application requirements.
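For the embedded scenario mentioned above, a minimal sketch with boto3 might look like the following; the account ID, user ARN, and topic ID are placeholders, and your application would load the returned URL in an embedded frame or SDK component.

import boto3

# Minimal sketch: generate an embed URL for the QuickSight Q search bar so
# natural language querying can surface inside your own application.
# The account ID, user ARN, and topic ID are placeholders.
quicksight = boto3.client("quicksight", region_name="us-east-1")

response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="111122223333",
    UserArn="arn:aws:quicksight:us-east-1:111122223333:user/default/example-user",
    SessionLifetimeInMinutes=60,
    ExperienceConfiguration={"QSearchBar": {"InitialTopicId": "<your-topic-id>"}},
)
print(response["EmbedUrl"])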
Interface and output format
Different patterns deliver answers through different interaction models:

Conversational experiences – Patterns 1 and 3 (using Amazon Q Business) provide chat-based interfaces. Pattern 4 (using Amazon Bedrock Knowledge Bases for structured data retrieval) naturally supports AI assistant integration, and Pattern 5 (a custom text-to-SQL solution) can be designed for a variety of interaction models.
Visualization-focused output – Pattern 2 (using Amazon Q in QuickSight) specializes in generating on-the-fly visualizations such as charts and tables in response to user questions.
API integration – For embedding capabilities into existing applications, Patterns 4 and 5 offer the most flexible API-based integration options.

The following figure is a comparison matrix of AWS structured data query patterns.

Conclusion
Between these patterns, your optimal choice depends on the following key factors:

Data location and characteristics – Is your data in operational databases, already in a data warehouse, or distributed across various sources?
User profile and interaction model – Are you supporting internal or external users? Do they prefer conversational or visualization-focused interfaces?
Available resources and expertise – Do you have ML specialists available, or do you need a fully managed solution?
Accuracy and governance requirements – Do you need strictly controlled semantics and curation, or is broader query flexibility acceptable with monitoring?

By understanding these patterns and their trade-offs, you can architect solutions that align with your business objectives.

About the authors
Akshara Shah is a Senior Solutions Architect at Amazon Web Services. She helps commercial customers build cloud-based generative AI services to meet their business needs. She has been designing, developing, and implementing solutions that leverage AI and ML technologies for more than 10 years. Outside of work, she loves painting, exercising and spending time with family.
Sanghwa Na is a Generative AI Specialist Solutions Architect at Amazon Web Services. Based in San Francisco, he works with customers to design and build generative AI solutions using large language models and foundation models on AWS. He focuses on helping organizations adopt AI technologies that drive real business value.

Building Advanced Multi-Agent AI Workflows by Leveraging AutoGen and S …

In this tutorial, we walk you through the seamless integration of AutoGen and Semantic Kernel with Google’s Gemini Flash model. We begin by setting up our GeminiWrapper and SemanticKernelGeminiPlugin classes to bridge the generative power of Gemini with AutoGen’s multi-agent orchestration. From there, we configure specialist agents, ranging from code reviewers to creative analysts, demonstrating how we can leverage AutoGen’s ConversableAgent API alongside Semantic Kernel’s decorated functions for text analysis, summarization, code review, and creative problem-solving. By combining AutoGen’s robust agent framework with Semantic Kernel’s function-driven approach, we create an advanced AI assistant that adapts to a variety of tasks with structured, actionable insights.

!pip install pyautogen semantic-kernel google-generativeai python-dotenv

import os
import asyncio
from typing import Dict, Any, List
import autogen
import google.generativeai as genai
from semantic_kernel import Kernel
from semantic_kernel.functions import KernelArguments
from semantic_kernel.functions.kernel_function_decorator import kernel_function

We start by installing the core dependencies: pyautogen, semantic-kernel, google-generativeai, and python-dotenv, ensuring we have all the necessary libraries for our multi-agent and semantic function setup. Then we import essential Python modules (os, asyncio, typing) along with autogen for agent orchestration, genai for Gemini API access, and the Semantic Kernel classes and decorators to define our AI functions.

GEMINI_API_KEY = "Use Your API Key Here"
genai.configure(api_key=GEMINI_API_KEY)

config_list = [
    {
        "model": "gemini-1.5-flash",
        "api_key": GEMINI_API_KEY,
        "api_type": "google",
        "api_base": "https://generativelanguage.googleapis.com/v1beta",
    }
]

We define our GEMINI_API_KEY placeholder and immediately configure the genai client so all subsequent Gemini calls are authenticated. Then we build a config_list containing the Gemini Flash model settings, model name, API key, endpoint type, and base URL, which we’ll hand off to our agents for LLM interactions.

class GeminiWrapper:
    """Wrapper for Gemini API to work with AutoGen"""

    def __init__(self, model_name="gemini-1.5-flash"):
        self.model = genai.GenerativeModel(model_name)

    def generate_response(self, prompt: str, temperature: float = 0.7) -> str:
        """Generate response using Gemini"""
        try:
            response = self.model.generate_content(
                prompt,
                generation_config=genai.types.GenerationConfig(
                    temperature=temperature,
                    max_output_tokens=2048,
                )
            )
            return response.text
        except Exception as e:
            return f"Gemini API Error: {str(e)}"

We encapsulate all Gemini Flash interactions in a GeminiWrapper class, where we initialize a GenerativeModel for our chosen model and expose a simple generate_response method. In this method, we pass the prompt and temperature into Gemini’s generate_content API (capped at 2048 tokens) and return the raw text or a formatted error.
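As a quick sanity check, and assuming the API key has been configured as above, we can exercise the wrapper on its own before wiring it into any agents:

# Standalone check of the wrapper; the prompt text is arbitrary
wrapper = GeminiWrapper()
print(wrapper.generate_response("Summarize the benefits of multi-agent systems in two sentences.", temperature=0.5))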

class SemanticKernelGeminiPlugin:
    """Semantic Kernel plugin using Gemini Flash for advanced AI operations"""

    def __init__(self):
        self.kernel = Kernel()
        self.gemini = GeminiWrapper()

    @kernel_function(name="analyze_text", description="Analyze text for sentiment and key insights")
    def analyze_text(self, text: str) -> str:
        """Analyze text using Gemini Flash"""
        prompt = f"""
        Analyze the following text comprehensively:

        Text: {text}

        Provide analysis in this format:
        - Sentiment: [positive/negative/neutral with confidence]
        - Key Themes: [main topics and concepts]
        - Insights: [important observations and patterns]
        - Recommendations: [actionable next steps]
        - Tone: [formal/informal/technical/emotional]
        """

        return self.gemini.generate_response(prompt, temperature=0.3)

    @kernel_function(name="generate_summary", description="Generate comprehensive summary")
    def generate_summary(self, content: str) -> str:
        """Generate summary using Gemini's advanced capabilities"""
        prompt = f"""
        Create a comprehensive summary of the following content:

        Content: {content}

        Provide:
        1. Executive Summary (2-3 sentences)
        2. Key Points (bullet format)
        3. Important Details
        4. Conclusion/Implications
        """

        return self.gemini.generate_response(prompt, temperature=0.4)

    @kernel_function(name="code_analysis", description="Analyze code for quality and suggestions")
    def code_analysis(self, code: str) -> str:
        """Analyze code using Gemini's code understanding"""
        prompt = f"""
        Analyze this code comprehensively:

        ```
        {code}
        ```

        Provide analysis covering:
        - Code Quality: [readability, structure, best practices]
        - Performance: [efficiency, optimization opportunities]
        - Security: [potential vulnerabilities, security best practices]
        - Maintainability: [documentation, modularity, extensibility]
        - Suggestions: [specific improvements with examples]
        """

        return self.gemini.generate_response(prompt, temperature=0.2)

    @kernel_function(name="creative_solution", description="Generate creative solutions to problems")
    def creative_solution(self, problem: str) -> str:
        """Generate creative solutions using Gemini's creative capabilities"""
        prompt = f"""
        Problem: {problem}

        Generate creative solutions:
        1. Conventional Approaches (2-3 standard solutions)
        2. Innovative Ideas (3-4 creative alternatives)
        3. Hybrid Solutions (combining different approaches)
        4. Implementation Strategy (practical steps)
        5. Potential Challenges and Mitigation
        """

        return self.gemini.generate_response(prompt, temperature=0.8)

We encapsulate our Semantic Kernel logic in the SemanticKernelGeminiPlugin, where we initialize both the Kernel and our GeminiWrapper to power custom AI functions. Using the @kernel_function decorator, we declare methods like analyze_text, generate_summary, code_analysis, and creative_solution, each of which constructs a structured prompt and delegates the heavy lifting to Gemini Flash. This plugin lets us seamlessly register and invoke advanced AI operations within our Semantic Kernel environment.
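If we want to try a single Semantic Kernel function in isolation before assembling the agents, a direct call like the following works; the sample text is arbitrary:

# Invoke one plugin function directly; the sample text is arbitrary
sk_plugin = SemanticKernelGeminiPlugin()
print(sk_plugin.analyze_text("The rollout went smoothly, and early customer feedback has been very positive."))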

class AdvancedGeminiAgent:
    """Advanced AI Agent using Gemini Flash with AutoGen and Semantic Kernel"""

    def __init__(self):
        self.sk_plugin = SemanticKernelGeminiPlugin()
        self.gemini = GeminiWrapper()
        self.setup_agents()

    def setup_agents(self):
        """Initialize AutoGen agents with Gemini Flash"""

        gemini_config = {
            "config_list": [{"model": "gemini-1.5-flash", "api_key": GEMINI_API_KEY}],
            "temperature": 0.7,
        }

        self.assistant = autogen.ConversableAgent(
            name="GeminiAssistant",
            llm_config=gemini_config,
            system_message="""You are an advanced AI assistant powered by Gemini Flash with Semantic Kernel capabilities.
            You excel at analysis, problem-solving, and creative thinking. Always provide comprehensive, actionable insights.
            Use structured responses and consider multiple perspectives.""",
            human_input_mode="NEVER",
        )

        self.code_reviewer = autogen.ConversableAgent(
            name="GeminiCodeReviewer",
            llm_config={**gemini_config, "temperature": 0.3},
            system_message="""You are a senior code reviewer powered by Gemini Flash.
            Analyze code for best practices, security, performance, and maintainability.
            Provide specific, actionable feedback with examples.""",
            human_input_mode="NEVER",
        )

        self.creative_analyst = autogen.ConversableAgent(
            name="GeminiCreativeAnalyst",
            llm_config={**gemini_config, "temperature": 0.8},
            system_message="""You are a creative problem solver and innovation expert powered by Gemini Flash.
            Generate innovative solutions, and provide fresh perspectives.
            Balance creativity with practicality.""",
            human_input_mode="NEVER",
        )

        self.data_specialist = autogen.ConversableAgent(
            name="GeminiDataSpecialist",
            llm_config={**gemini_config, "temperature": 0.4},
            system_message="""You are a data analysis expert powered by Gemini Flash.
            Provide evidence-based recommendations and statistical perspectives.""",
            human_input_mode="NEVER",
        )

        self.user_proxy = autogen.ConversableAgent(
            name="UserProxy",
            human_input_mode="NEVER",
            max_consecutive_auto_reply=2,
            is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
            llm_config=False,
        )

    def analyze_with_semantic_kernel(self, content: str, analysis_type: str) -> str:
        """Bridge function between AutoGen and Semantic Kernel with Gemini"""
        try:
            if analysis_type == "text":
                return self.sk_plugin.analyze_text(content)
            elif analysis_type == "code":
                return self.sk_plugin.code_analysis(content)
            elif analysis_type == "summary":
                return self.sk_plugin.generate_summary(content)
            elif analysis_type == "creative":
                return self.sk_plugin.creative_solution(content)
            else:
                return "Invalid analysis type. Use 'text', 'code', 'summary', or 'creative'."
        except Exception as e:
            return f"Semantic Kernel Analysis Error: {str(e)}"

    def multi_agent_collaboration(self, task: str) -> Dict[str, str]:
        """Orchestrate multi-agent collaboration using Gemini"""
        results = {}

        agents = {
            "assistant": (self.assistant, "comprehensive analysis"),
            "code_reviewer": (self.code_reviewer, "code review perspective"),
            "creative_analyst": (self.creative_analyst, "creative solutions"),
            "data_specialist": (self.data_specialist, "data-driven insights"),
        }

        for agent_name, (agent, perspective) in agents.items():
            try:
                prompt = f"Task: {task}\n\nProvide your {perspective} on this task."
                response = agent.generate_reply([{"role": "user", "content": prompt}])
                results[agent_name] = response if isinstance(response, str) else str(response)
            except Exception as e:
                results[agent_name] = f"Agent {agent_name} error: {str(e)}"

        return results

    def run_comprehensive_analysis(self, query: str) -> Dict[str, Any]:
        """Run comprehensive analysis using all Gemini-powered capabilities"""
        results = {}

        analyses = ["text", "summary", "creative"]
        for analysis_type in analyses:
            try:
                results[f"sk_{analysis_type}"] = self.analyze_with_semantic_kernel(query, analysis_type)
            except Exception as e:
                results[f"sk_{analysis_type}"] = f"Error: {str(e)}"

        try:
            results["multi_agent"] = self.multi_agent_collaboration(query)
        except Exception as e:
            results["multi_agent"] = f"Multi-agent error: {str(e)}"

        try:
            results["direct_gemini"] = self.gemini.generate_response(
                f"Provide a comprehensive analysis of: {query}", temperature=0.6
            )
        except Exception as e:
            results["direct_gemini"] = f"Direct Gemini error: {str(e)}"

        return results

We implement our end-to-end AI orchestration in the AdvancedGeminiAgent class, where we initialize our Semantic Kernel plugin and Gemini wrapper and configure a suite of specialist AutoGen agents (assistant, code reviewer, creative analyst, data specialist, and user proxy). With simple methods for Semantic Kernel bridging, multi-agent collaboration, and direct Gemini calls, we enable a seamless, comprehensive analysis pipeline for any user query.
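Before running the full demo below, we can also exercise a single path through the agent, for example the code-analysis bridge, with a small illustrative snippet:

# Exercise only the code-analysis path of the agent; the snippet is illustrative
agent = AdvancedGeminiAgent()
sample_code = "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)"
print(agent.analyze_with_semantic_kernel(sample_code, "code"))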

def main():
    """Main execution function for Google Colab with Gemini Flash"""
    print("Initializing Advanced Gemini Flash AI Agent...")
    print("Using Gemini 1.5 Flash for high-speed, cost-effective AI processing")

    try:
        agent = AdvancedGeminiAgent()
        print("Agent initialized successfully!")
    except Exception as e:
        print(f"Initialization error: {str(e)}")
        print("Make sure to set your Gemini API key!")
        return

    demo_queries = [
        "How can AI transform education in developing countries?",
        "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
        "What are the most promising renewable energy technologies for 2025?"
    ]

    print("\nRunning Gemini Flash Powered Analysis...")

    for i, query in enumerate(demo_queries, 1):
        print(f"\n{'=' * 60}")
        print(f"Demo {i}: {query}")
        print('=' * 60)

        try:
            results = agent.run_comprehensive_analysis(query)

            for key, value in results.items():
                if key == "multi_agent" and isinstance(value, dict):
                    print(f"\n{key.upper().replace('_', ' ')}:")
                    for agent_name, response in value.items():
                        print(f"  {agent_name}: {str(response)[:200]}...")
                else:
                    print(f"\n{key.upper().replace('_', ' ')}:")
                    print(f"  {str(value)[:300]}...")

        except Exception as e:
            print(f"Error in demo {i}: {str(e)}")

    print(f"\n{'=' * 60}")
    print("Gemini Flash AI Agent Demo Completed!")
    print("To use your own API key, replace the GEMINI_API_KEY placeholder defined earlier")
    print("Get your free Gemini API key at: https://makersuite.google.com/app/apikey")


if __name__ == "__main__":
    main()

Finally, we run the main function that initializes the AdvancedGeminiAgent, prints out status messages, and iterates through a set of demo queries. As we run each query, we collect and display results from semantic-kernel analyses, multi-agent collaboration, and direct Gemini responses, ensuring a clear, step-by-step showcase of our multi-agent AI workflow.

In conclusion, we showcased how AutoGen and Semantic Kernel complement each other to produce a versatile, multi-agent AI system powered by Gemini Flash. We highlighted how AutoGen simplifies the orchestration of diverse expert agents, while Semantic Kernel provides a clean, declarative layer for defining and invoking advanced AI functions. By uniting these tools in a Colab notebook, we’ve enabled rapid experimentation and prototyping of complex AI workflows without sacrificing clarity or control.

Check out the code. All credit for this research goes to the researchers of this project.

TabArena: Benchmarking Tabular Machine Learning with Reproducibility a …

Understanding the Importance of Benchmarking in Tabular ML

Machine learning on tabular data focuses on building models that learn patterns from structured datasets, typically composed of rows and columns similar to those found in spreadsheets. These datasets are used in industries ranging from healthcare to finance, where accuracy and interpretability are essential. Techniques such as gradient-boosted trees and neural networks are commonly used, and recent advances have introduced foundation models designed to handle tabular data structures. Ensuring fair and effective comparisons between these methods has become increasingly important as new models continue to emerge.

Challenges with Existing Benchmarks

One challenge in this domain is that benchmarks for evaluating models on tabular data are often outdated or flawed. Many benchmarks continue to utilize obsolete datasets with licensing issues or those that do not accurately reflect real-world tabular use cases. Furthermore, some benchmarks include data leaks or synthetic tasks, which distort model evaluation. Without active maintenance or updates, these benchmarks fail to keep pace with advances in modeling, leaving researchers and practitioners with tools that cannot reliably measure current model performance.

Limitations of Current Benchmarking Tools

Several tools have attempted to benchmark models, but they typically rely on automatic dataset selection and minimal human oversight. This introduces inconsistencies in performance evaluation due to unverified data quality, duplication, or preprocessing errors. Furthermore, many of these benchmarks utilize only default model settings and avoid extensive hyperparameter tuning or ensemble techniques. The result is a lack of reproducibility and a limited understanding of how models perform under real-world conditions. Even widely cited benchmarks often fail to specify essential implementation details or restrict their evaluations to narrow validation protocols.

Introducing TabArena: A Living Benchmarking Platform

Researchers from Amazon Web Services, University of Freiburg, INRIA Paris, Ecole Normale Supérieure, PSL Research University, PriorLabs, and the ELLIS Institute Tübingen have introduced TabArena—a continuously maintained benchmark system designed for tabular machine learning. The research introduced TabArena to function as a dynamic and evolving platform. Unlike previous benchmarks that are static and outdated soon after release, TabArena is maintained like software: versioned, community-driven, and updated based on new findings and user contributions. The system was launched with 51 carefully curated datasets and 16 well-implemented machine-learning models.

Three Pillars of TabArena’s Design

The research team constructed TabArena on three main pillars: robust model implementation, detailed hyperparameter optimization, and rigorous evaluation. All models are built using AutoGluon and adhere to a unified framework that supports preprocessing, cross-validation, metric tracking, and ensembling. Hyperparameter tuning involves evaluating up to 200 different configurations for most models, except TabICL and TabDPT, which were tested for in-context learning only. For validation, the team uses 8-fold cross-validation and applies ensembling across different runs of the same model. Foundation models, due to their complexity, are trained on merged training-validation splits as recommended by their original developers. Each benchmarking configuration is evaluated with a one-hour time limit on standard computing resources.

Performance Insights from 25 Million Model Evaluations

Performance results from TabArena are based on an extensive evaluation involving approximately 25 million model instances. The analysis showed that ensemble strategies significantly improve performance across all model types. Gradient-boosted decision trees still perform strongly, but deep-learning models with tuning and ensembling are on par with, or even better than, them. For instance, AutoGluon 1.3 achieved notable results under a 4-hour training budget. Foundation models, particularly TabPFNv2 and TabICL, demonstrated strong performance on smaller datasets thanks to their effective in-context learning capabilities, even without tuning. Ensembles combining different types of models achieved state-of-the-art performance, although not all individual models contributed equally to the final results. These findings highlight the importance of both model diversity and the effectiveness of ensemble methods.

Significance of TabArena for the ML Community

The article identifies a clear gap in reliable, current benchmarking for tabular machine learning and offers a well-structured solution. By creating TabArena, the researchers have introduced a platform that addresses critical issues of reproducibility, data curation, and performance evaluation. The method relies on detailed curation and practical validation strategies, making it a significant contribution for anyone developing or evaluating models on tabular data.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

LongWriter-Zero: A Reinforcement Learning Framework for Ultra-Long Tex …

Introduction to Ultra-Long Text Generation Challenges

Generating ultra-long texts that span thousands of words is becoming increasingly important for real-world tasks such as storytelling, legal writing, and educational materials. However, large language models still face significant challenges, including length limits and quality issues, as their outputs become increasingly longer. Common problems include incoherence, topic drift, repetition, and poor structure. Earlier methods, such as LongWriter, utilize supervised fine-tuning on synthetic data to address this issue; however, this data is costly to create, difficult to generate, and often feels unnatural. Moreover, relying on existing LLMs to create training data limits creativity, and typical training methods don’t effectively improve the overall coherence or formatting of long outputs.

Evolution of Long-Form Text Generation Methods

Recent research into long-form text generation has focused on improving coherence, personalization, and extending output length beyond 2,000 words. Early models, such as Re3 and DOC, used recursive strategies to maintain structure, while LongLaMP and others introduced personalization through reasoning-aware self-training. Suri built a large instruction-following dataset but was limited to outputs under 5,000 tokens due to reliance on back-translation. LongWriter advanced this by generating outputs of 6k–20k tokens using supervised fine-tuning and preference optimization, though it retained biases from its teacher models. On another front, RL has improved reasoning in LLMs like DeepSeek-R1 and QwQ-32B, yet RL remains underexplored for ultra-long text generation.

LongWriter-Zero: Reinforcement Learning Without Synthetic Data

Researchers from Tsinghua University and SUTD introduce LongWriter-Zero. This approach uses RL to train LLMs for ultra-long text generation, without relying on annotated or synthetic data. Starting from the Qwen2.5-32B base model, they apply RL with carefully designed reward models targeting text length, quality, and structure. Their framework draws inspiration from success in math and coding tasks, exploring three key factors: reward design, inference-time scaling, and continual pretraining. LongWriter-Zero surpasses traditional supervised fine-tuning methods, achieving state-of-the-art performance on WritingBench and Arena-Write, even outperforming 100B+ models like DeepSeek-R1.

Novel Optimization Strategy and Benchmarking

The study introduces a reinforcement learning approach to improve ultra-long text generation using LLMs. The researchers build on PPO with a method called Group Relative Policy Optimization, training a 32B parameter model on instruction-following data with a 14k-token output limit. They evaluate outputs using a new benchmark, Arena-Write, and design a reward system that balances text length, fluency, coherence, and format. A key insight is that having the model “think” before writing using intermediate reasoning steps leads to better structure and control. Further gains are achieved through pretraining on writing-heavy data, underscoring the importance of a robust, writing-focused foundation.

Results on Long-Form Generation Benchmarks

LongWriter-Zero is evaluated through a two-step process: continual pretraining on long books using 30 billion tokens, followed by reinforcement learning fine-tuning over 150 steps with “Think” prompts to encourage reasoning. It scores 8.69 on WritingBench, outperforming GPT-4o (8.16), Qwen2.5-Max (8.37), and DeepSeek-R1 (8.55), leading in five out of six domains. In Arena-Write, it attains the highest Elo score of 1447. Removing “Think” prompts or pretraining results in major performance drops, confirming their importance. The model also achieves a win rate of 98.2 percent in GPT-4.1-based comparisons, with human evaluations validating its strength in long-form writing.

Conclusion and Future Outlook on Reward Design

In conclusion, LongWriter-Zero proposes a reinforcement learning approach to ultra-long text generation, thereby avoiding the need for synthetic or labeled datasets. Built on Qwen2.5-32B and trained from scratch, it utilizes reward models that target length control, writing quality, and formatting. It achieves top scores on WritingBench (8.69) and Arena-Write (Elo 1447), outperforming GPT-4o (8.16), DeepSeek-R1 (8.55), and Qwen3-235B-A22B (Elo 1343). Human and GPT-4.1-based evaluations show win rates as high as 98.2%. However, it faces reward model hacking, such as inflating length through repetition or inserting keywords like “quantum entanglement” for higher scores. Addressing these limitations will require a better design of rewards and human-in-the-loop strategies.

Check out the Paper and Dataset Card. All credit for this research goes to the researchers of this project.

Build and deploy AI inference workflows with new enhancements to the A …

Amazon SageMaker Inference has been a popular tool for deploying advanced machine learning (ML) and generative AI models at scale. As AI applications become increasingly complex, customers want to deploy multiple models in a coordinated group that collectively process inference requests for an application. In addition, with the evolution of generative AI applications, many use cases now require inference workflows—sequences of interconnected models operating in predefined logical flows. This trend drives a growing need for more sophisticated inference offerings.
To address this need, we are introducing a new capability in the SageMaker Python SDK that revolutionizes how you build and deploy inference workflows on SageMaker. We use Amazon Search as an example to showcase how this feature helps customers build inference workflows. This new Python SDK capability provides a streamlined and simplified experience that abstracts away the underlying complexities of packaging and deploying groups of models and their collective inference logic, allowing you to focus on what matters most—your business logic and model integrations.
In this post, we provide an overview of the user experience, detailing how to set up and deploy these workflows with multiple models using the SageMaker Python SDK. We walk through examples of building complex inference workflows, deploying them to SageMaker endpoints, and invoking them for real-time inference. We also show how customers like Amazon Search plan to use SageMaker Inference workflows to provide more relevant search results to Amazon shoppers.
Whether you are building a simple two-step process or a complex, multimodal AI application, this new feature provides the tools you need to bring your vision to life. This tool aims to make it easy for developers and businesses to create and manage complex AI systems, helping them build more powerful and efficient AI applications.
In the following sections, we dive deeper into details of the SageMaker Python SDK, walk through practical examples, and showcase how this new capability can transform your AI development and deployment process.
Key improvements and user experience
The SageMaker Python SDK now includes new features for creating and managing inference workflows. These additions aim to address common challenges in developing and deploying inference workflows:

Deployment of multiple models – The core of this new experience is the deployment of multiple models as inference components within a single SageMaker endpoint. With this approach, you can create a more unified inference workflow. By consolidating multiple models into one endpoint, you can reduce the number of endpoints that need to be managed. This consolidation can also improve operational tasks, resource utilization, and potentially costs.
Workflow definition with workflow mode – The new workflow mode extends the existing Model Builder capabilities. It allows for the definition of inference workflows using Python code. Users familiar with the ModelBuilder class might find this feature to be an extension of their existing knowledge. This mode enables creating multi-step workflows, connecting models, and specifying the data flow between different models in the workflows. The goal is to reduce the complexity of managing these workflows and enable you to focus more on the logic of the resulting compound AI system.
Development and deployment options – A new deployment option has been introduced for the development phase. This feature is designed to allow for quicker deployment of workflows to development environments. The intention is to enable faster testing and refinement of workflows. This could be particularly relevant when experimenting with different configurations or adjusting models.
Invocation flexibility – The SDK now provides options for invoking individual models or entire workflows. You can choose to call a specific inference component used in a workflow or the entire workflow. This flexibility can be useful in scenarios where access to a specific model is needed, or when only a portion of the workflow needs to be executed.
Dependency management – You can use SageMaker Deep Learning Containers (DLCs) or the SageMaker distribution that comes preconfigured with various model serving libraries and tools. These are intended to serve as a starting point for common use cases.

To get started, use the SageMaker Python SDK to deploy your models as inference components. Then, use the workflow mode to create an inference workflow, represented as Python code using the container of your choice. Deploy the workflow container as another inference component on the same endpoint as the models or on a dedicated endpoint. You can run the workflow by invoking the inference component that represents the workflow. The user experience is entirely code-based, using the SageMaker Python SDK. This approach allows you to define, deploy, and manage inference workflows using SDK abstractions offered by this feature and Python programming. The workflow mode provides flexibility to specify complex sequences of model invocations and data transformations, and the option to deploy as components or endpoints caters to various scaling and integration needs.
Solution overview
The following diagram illustrates a reference architecture using the SageMaker Python SDK.

The improved SageMaker Python SDK introduces a more intuitive and flexible approach to building and deploying AI inference workflows. Let’s explore the key components and classes that make up the experience:

ModelBuilder simplifies the process of packaging individual models as inference components. It handles model loading, dependency management, and container configuration automatically.
The CustomOrchestrator class provides a standardized way to define custom inference logic that orchestrates multiple models in the workflow. Users implement the handle() method to specify this logic and can use an orchestration library or none at all (plain Python).
A single deploy() call handles the deployment of the components and workflow orchestrator.
The Python SDK supports invocation against the custom inference workflow or individual inference components.
The Python SDK supports both synchronous and streaming inference.

CustomOrchestrator is an abstract base class that serves as a template for defining custom inference orchestration logic. It standardizes the structure of entry point-based inference scripts, making it straightforward for users to create consistent and reusable code. The handle method in the class is an abstract method that users implement to define their custom orchestration logic.

from abc import ABC, abstractmethod


class CustomOrchestrator(ABC):
    """
    Templated class used to standardize the structure of an entry point based inference script.
    """

    @abstractmethod
    def handle(self, data, context=None):
        """Abstract method for defining an entry point for the model server"""
        return NotImplemented

With this templated class, users can integrate their custom workflow code and then point to it in ModelBuilder using a file path or directly using a class or method name. Together, the CustomOrchestrator and ModelBuilder classes enable a more streamlined workflow for AI inference:

Users define their custom workflow by implementing the CustomOrchestrator class.
The custom CustomOrchestrator is passed to ModelBuilder using the ModelBuilder inference_spec parameter.
ModelBuilder packages the CustomOrchestrator along with the model artifacts.
The packaged model is deployed to a SageMaker endpoint (for example, using a TorchServe container).
When invoked, the SageMaker endpoint uses the custom handle() function defined in the CustomOrchestrator to handle the input payload.

In the following sections, we provide two examples of custom workflow orchestrators implemented with plain Python code. For simplicity, the examples use two inference components.
We explore how to create a simple workflow that deploys two large language models (LLMs) on SageMaker Inference endpoints along with a simple Python orchestrator that calls the two models. We create an IT customer service workflow where one model processes the initial request and another suggests solutions. You can find the example notebook in the GitHub repo.
Prerequisites
To run the example notebooks, you need an AWS account with an AWS Identity and Access Management (IAM) role with least-privilege permissions to manage resources created. For details, refer to Create an AWS account. You might need to request a service quota increase for the corresponding SageMaker hosting instances. In this example, we host multiple models on the same SageMaker endpoint, so we use two ml.g5.24xlarge SageMaker hosting instances.
Python inference orchestration
First, let’s define our custom orchestration class that inherits from CustomOrchestrator. The workflow is structured around a custom inference entry point that handles the request data, processes it, and retrieves predictions from the configured model endpoints. See the following code:

import json

import boto3


class PythonCustomInferenceEntryPoint(CustomOrchestrator):
    def __init__(self, region_name, endpoint_name, component_names):
        self.region_name = region_name
        self.endpoint_name = endpoint_name
        self.component_names = component_names
        # SageMaker Runtime client used to invoke the inference components
        self.client = boto3.client("sagemaker-runtime", region_name=region_name)

    def preprocess(self, data):
        payload = {
            "inputs": data.decode("utf-8")
        }
        return json.dumps(payload)

    def _invoke_workflow(self, data):
        # First model (Llama) inference
        payload = self.preprocess(data)

        llama_response = self.client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=payload,
            ContentType="application/json",
            InferenceComponentName=self.component_names[0]
        )
        llama_generated_text = json.loads(llama_response.get('Body').read())['generated_text']

        # Second model (Mistral) inference
        parameters = {
            "max_new_tokens": 50
        }
        payload = {
            "inputs": llama_generated_text,
            "parameters": parameters
        }
        mistral_response = self.client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=json.dumps(payload),
            ContentType="application/json",
            InferenceComponentName=self.component_names[1]
        )
        return {"generated_text": json.loads(mistral_response.get('Body').read())['generated_text']}

    def handle(self, data, context=None):
        return self._invoke_workflow(data)

This code performs the following functions:

Defines the orchestration that sequentially calls two models using their inference component names
Processes the response from the first model before passing it to the second model
Returns the final generated response

This plain Python approach provides flexibility and control over the request-response flow, enabling seamless cascading of outputs across multiple model components.
Build and deploy the workflow
To deploy the workflow, we first create our inference components and then build the custom workflow. One inference component will host a Meta Llama 3.1 8B model, and the other will host a Mistral 7B model.

from sagemaker.serve import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
# ResourceRequirements describes the compute each inference component needs
from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

# sample_input, sample_output, llama_ic_name, and mistral_ic_name are defined
# earlier in the example notebook

# Create a ModelBuilder instance for Llama 3.1 8B.
# Pre-benchmarked ResourceRequirements will be taken from JumpStart, as Llama-3.1-8b is a supported model.
llama_model_builder = ModelBuilder(
    model="meta-textgeneration-llama-3-1-8b",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    inference_component_name=llama_ic_name,
    instance_type="ml.g5.24xlarge"
)

# Create a ModelBuilder instance for the Mistral 7B model.
mistral_mb = ModelBuilder(
    model="huggingface-llm-mistral-7b",
    schema_builder=SchemaBuilder(sample_input, sample_output),
    inference_component_name=mistral_ic_name,
    resource_requirements=ResourceRequirements(
        requests={
            "memory": 49152,
            "num_accelerators": 2,
            "copies": 1
        }
    ),
    instance_type="ml.g5.24xlarge"
)

Now we can tie everything together by creating one more ModelBuilder, to which we pass modelbuilder_list containing the ModelBuilder objects we just created for each inference component, along with the custom workflow. Then we call the build() function to prepare the workflow for deployment.

from sagemaker import Session

# Create the workflow ModelBuilder
orchestrator = ModelBuilder(
    inference_spec=PythonCustomInferenceEntryPoint(
        region_name=region,
        endpoint_name=llama_mistral_endpoint_name,
        component_names=[llama_ic_name, mistral_ic_name],
    ),
    dependencies={
        "auto": False,
        "custom": [
            "cloudpickle",
            "graphene",
            # Define other dependencies here.
        ],
    },
    sagemaker_session=Session(),
    role_arn=role,
    resource_requirements=ResourceRequirements(
        requests={
            "memory": 4096,
            "num_accelerators": 1,
            "copies": 1,
            "num_cpus": 2
        }
    ),
    name=custom_workflow_name,  # Endpoint name for your custom workflow
    schema_builder=SchemaBuilder(sample_input={"inputs": "test"}, sample_output="Test"),
    modelbuilder_list=[llama_model_builder, mistral_mb]  # Inference component ModelBuilders created earlier
)

# Call the build function to prepare the workflow for deployment
orchestrator.build()

In the preceding code snippet, you can comment out the section that defines the resource_requirements to have the custom workflow deployed on a separate endpoint instance, which can be a dedicated CPU instance to handle the custom workflow payload.
By calling the deploy() function, we deploy the custom workflow and the inference components to your desired instance type, in this example ml.g5.24xlarge. If you choose to deploy the custom workflow to a separate instance, by default, it will use the ml.c5.xlarge instance type. You can set inference_workflow_instance_type and inference_workflow_initial_instance_count to configure the instances required to host the custom workflow.

predictors = orchestrator.deploy(
    instance_type="ml.g5.24xlarge",
    initial_instance_count=1,
    accept_eula=True,  # Required for Llama3
    endpoint_name=llama_mistral_endpoint_name,
    # inference_workflow_instance_type="ml.t2.medium",  # default
    # inference_workflow_initial_instance_count=1  # default
)

Invoke the endpoint
After you deploy the workflow, you can invoke the endpoint using the predictor object:

from sagemaker.serializers import JSONSerializer

predictors[-1].serializer = JSONSerializer()
predictors[-1].predict("Tell me a story about ducks.")

You can also invoke each inference component in the deployed endpoint. For example, we can test the Llama inference component with a synchronous invocation, and Mistral with streaming:

from sagemaker.predictor import Predictor

# Create a predictor for the inference component of the Llama model
llama_predictor = Predictor(endpoint_name=llama_mistral_endpoint_name, component_name=llama_ic_name)
llama_predictor.content_type = "application/json"

# Example payload; adjust the prompt and parameters as needed
payload = {"inputs": "Tell me a story about ducks.", "parameters": {"max_new_tokens": 50}}
llama_predictor.predict(json.dumps(payload))

When handling the streaming response, we need to read each line of the output separately. The following example code demonstrates this streaming handling by checking for newline characters to separate and print each token in real time:

mistral_predictor = Predictor(endpoint_name=llama_mistral_endpoint_name, component_name=mistral_ic_name)
mistral_predictor.content_type = "application/json"

# Example prompt and generation parameters
prompt = "Tell me a story about ducks."
parameters = {"max_new_tokens": 50}

body = json.dumps({
    "inputs": prompt,
    # specify the parameters as needed
    "parameters": parameters
})

for line in mistral_predictor.predict_stream(body):
    decoded_line = line.decode('utf-8')
    if '\n' in decoded_line:
        # Split by newline to handle multiple tokens in the same line
        tokens = decoded_line.split('\n')
        for token in tokens[:-1]:  # Print all tokens except the last one with a newline
            print(token)
        # Print the last token without a newline, as it might be followed by more tokens
        print(tokens[-1], end="")
    else:
        # Print the token without a newline if it doesn't contain '\n'
        print(decoded_line, end="")

So far, we have walked through the example code to demonstrate how to build complex inference logic using Python orchestration, deploy them to SageMaker endpoints, and invoke them for real-time inference. The Python SDK automatically handles the following:

Model packaging and container configuration
Dependency management and environment setup
Endpoint creation and component coordination

Whether you’re building a simple workflow of two models or a complex multimodal application, the new SDK provides the building blocks needed to bring your inference workflows to life with minimal boilerplate code.
Customer story: Amazon Search
Amazon Search is a critical component of the Amazon shopping experience, processing an enormous volume of queries across billions of products across diverse categories. At the core of this system are sophisticated matching and ranking workflows, which determine the order and relevance of search results presented to customers. These workflows execute large deep learning models in predefined sequences, often sharing models across different workflows to improve price-performance and accuracy. This approach makes sure that whether a customer is searching for electronics, fashion items, books, or other products, they receive the most pertinent results tailored to their query.
The SageMaker Python SDK enhancement offers valuable capabilities that align well with Amazon Search’s requirements for these ranking workflows. It provides a standard interface for developing and deploying complex inference workflows crucial for effective search result ranking. The enhanced Python SDK enables efficient reuse of shared models across multiple ranking workflows while maintaining the flexibility to customize logic for specific product categories. Importantly, it allows individual models within these workflows to scale independently, providing optimal resource allocation and performance based on varying demand across different parts of the search system.
Amazon Search is exploring the broad adoption of these Python SDK enhancements across their search ranking infrastructure. This initiative aims to further refine and improve search capabilities, enabling the team to build, version, and catalog workflows that power search ranking more effectively across different product categories. The ability to share models across workflows and scale them independently offers new levels of efficiency and adaptability in managing the complex search ecosystem.
Vaclav Petricek, Sr. Manager of Applied Science at Amazon Search, highlighted the potential impact of these SageMaker Python SDK enhancements: “These capabilities represent a significant advancement in our ability to develop and deploy sophisticated inference workflows that power search matching and ranking. The flexibility to build workflows using Python, share models across workflows, and scale them independently is particularly exciting, as it opens up new possibilities for optimizing our search infrastructure and rapidly iterating on our matching and ranking algorithms as well as new AI features. Ultimately, these SageMaker Inference enhancements will allow us to more efficiently create and manage the complex algorithms powering Amazon’s search experience, enabling us to deliver even more relevant results to our customers.”
The following diagram illustrates a sample solution architecture used by Amazon Search.

Clean up
When you're done testing the models, as a best practice, delete the endpoint to save costs if it is no longer required. You can follow the cleanup section of the demo notebook or use the following code to delete the model and endpoint created by the demo:

mistral_predictor.delete_predictor()
llama_predictor.delete_predictor()
llama_predictor.delete_endpoint()
workflow_predictor.delete_predictor()

Conclusion
The new SageMaker Python SDK enhancements for inference workflows mark a significant advancement in the development and deployment of complex AI inference workflows. By abstracting the underlying complexities, these enhancements empower inference customers to focus on innovation rather than infrastructure management. This feature bridges sophisticated AI applications with the robust SageMaker infrastructure, enabling developers to use familiar Python-based tools while harnessing the powerful inference capabilities of SageMaker.
Early adopters, including Amazon Search, are already exploring how these capabilities can drive major improvements in AI-powered customer experiences across diverse industries. We invite all SageMaker users to explore this new functionality, whether you’re developing classic ML models, building generative AI applications or multi-model workflows, or tackling multi-step inference scenarios. The enhanced SDK provides the flexibility, ease of use, and scalability needed to bring your ideas to life. As AI continues to evolve, SageMaker Inference evolves with it, providing you with the tools to stay at the forefront of innovation. Start building your next-generation AI inference workflows today with the enhanced SageMaker Python SDK.

About the authors
Melanie Li, PhD, is a Senior Generative AI Specialist Solutions Architect at AWS based in Sydney, Australia, where her focus is on working with customers to build solutions leveraging state-of-the-art AI and machine learning tools. She has been actively involved in multiple Generative AI initiatives across APJ, harnessing the power of Large Language Models (LLMs). Prior to joining AWS, Dr. Li held data science roles in the financial and retail industries.
Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of Generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.
Osho Gupta is a Senior Software Developer at AWS SageMaker. He is passionate about ML infrastructure space, and is motivated to learn & advance underlying technologies that optimize Gen AI training & inference performance. In his spare time, Osho enjoys paddle boarding, hiking, traveling, and spending time with his friends & family.
Joseph Zhang is a software engineer at AWS. He started his AWS career at EC2 before eventually transitioning to SageMaker, and now works on developing GenAI-related features. Outside of work he enjoys both playing and watching sports (go Warriors!), spending time with family, and making coffee.
Gary Wang is a Software Developer at AWS SageMaker. He is passionate about AI/ML operations and building new things. In his spare time, Gary enjoys running, hiking, trying new food, and spending time with his friends and family.
James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends. You can find him on LinkedIn.
Vaclav Petricek is a Senior Applied Science Manager at Amazon Search, where he led teams that built Amazon Rufus and now leads science and engineering teams that work on the next generation of Natural Language Shopping. He is passionate about shipping AI experiences that make people’s lives better. Vaclav loves off-piste skiing, playing tennis, and backpacking with his wife and three children.
Wei Li is a Senior Software Dev Engineer in Amazon Search. She is passionate about Large Language Model training and inference technologies, and loves integrating these solutions into Search Infrastructure to enhance natural language shopping experiences. During her leisure time, she enjoys gardening, painting, and reading.
Brian Granger is a Senior Principal Technologist at Amazon Web Services and a professor of physics and data science at Cal Poly State University in San Luis Obispo, CA. He works at the intersection of UX design and engineering on tools for scientific computing, data science, machine learning, and data visualization. Brian is a co-founder and leader of Project Jupyter, co-founder of the Altair project for statistical visualization, and creator of the PyZMQ project for ZMQ-based message passing in Python. At AWS he is a technical and open source leader in the AI/ML organization. Brian also represents AWS as a board member of the PyTorch Foundation. He is a winner of the 2017 ACM Software System Award and the 2023 NASA Exceptional Public Achievement Medal for his work on Project Jupyter. He has a Ph.D. in theoretical physics from the University of Colorado.

Context extraction from image files in Amazon Q Business using LLMs

To effectively convey complex information, organizations increasingly rely on visual documentation through diagrams, charts, and technical illustrations. Although text documents are well-integrated into modern knowledge management systems, the rich information contained in diagrams, charts, technical schematics, and visual documentation often remains inaccessible to search and AI assistants. This creates significant gaps in organizational knowledge bases, forcing teams to interpret visual data manually and preventing automation systems from using critical visual information for comprehensive insights and decision-making. While Amazon Q Business already handles embedded images within documents, the custom document enrichment (CDE) feature extends these capabilities significantly by processing standalone image files (for example, JPGs and PNGs).
In this post, we look at a step-by-step implementation for using the CDE feature within an Amazon Q Business application. We walk you through an AWS Lambda function configured within CDE to process various image file types, and we showcase an example scenario of how this integration enhances the ability of Amazon Q Business to provide comprehensive insights. By following this practical guide, you can significantly expand your organization's searchable knowledge base, enabling more complete answers and insights that incorporate both textual and visual information sources.
Example scenario: Analyzing regional educational demographics
Consider a scenario where you’re working for a national educational consultancy that has charts, graphs, and demographic data across different AWS Regions stored in an Amazon Simple Storage Service (Amazon S3) bucket. The following image shows student distribution by age range across various cities using a bar chart. The insights in visualizations like this are valuable for decision-making but traditionally locked within image formats in your S3 buckets and other storage.
With Amazon Q Business and CDE, we show you how to enable natural language queries against such visualizations. For example, your team could ask questions such as “Which city has the highest number of students in the 13–15 age range?” or “Compare the student demographics between City 1 and City 4” directly through the Amazon Q Business application interface.

You can bridge this gap using the Amazon Q Business CDE feature to:

Detect and process image files during the document ingestion process
Use Amazon Bedrock with AWS Lambda to interpret the visual information
Extract structured data and insights from charts and graphs
Make this information searchable using natural language queries

Solution overview
In this solution, we walk you through how to implement a CDE-based solution for your educational demographic data visualizations. The solution empowers organizations to extract meaningful information from image files using the CDE capability of Amazon Q Business. When Amazon Q Business encounters the S3 path during ingestion, CDE rules automatically trigger a Lambda function. The Lambda function identifies the image files and calls the Amazon Bedrock API, which uses multimodal large language models (LLMs) to analyze and extract contextual information from each image. The extracted text is then seamlessly integrated into the knowledge base in Amazon Q Business. End users can then quickly search for valuable data and insights from images based on their actual context. By bridging the gap between visual content and searchable text, this solution helps organizations unlock valuable insights previously hidden within their image repositories.
The following figure shows the high-level architecture diagram used for this solution.

For this use case, we use Amazon S3 as our data source. However, this same solution is adaptable to other data source types supported by Amazon Q Business, or it can be implemented with custom data sources as needed. To complete the solution, follow these high-level implementation steps:

Create an Amazon Q Business application and sync with an S3 bucket.
Configure the Amazon Q Business application CDE for the Amazon S3 data source.
Extract context from the images.

Prerequisites
The following prerequisites are needed for implementation:

An AWS account.
At least one Amazon Q Business Pro user that has admin permissions to set up and configure Amazon Q Business. For pricing information, refer to Amazon Q Business pricing.
AWS Identity and Access Management (IAM) permissions to create and manage IAM roles and policies.
A supported data source to connect, such as an S3 bucket containing your public documents.
Access to an Amazon Bedrock LLM in the required AWS Region.

Create an Amazon Q Business application and sync with an S3 bucket
To create an Amazon Q Business application and connect it to your S3 bucket, complete the following steps. These steps provide a general overview of how to create an Amazon Q Business application and synchronize it with an S3 bucket. For more comprehensive, step-by-step guidance, follow the detailed instructions in the blog post Discover insights from Amazon S3 with Amazon Q S3 connector.

Initiate your application setup through either the AWS Management Console or AWS Command Line Interface (AWS CLI).
Create an index for your Amazon Q Business application.
Use the built-in Amazon S3 connector to link your application with documents stored in your organization’s S3 buckets.
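If you prefer to script these steps instead of using the console, a minimal sketch with the boto3 qbusiness client might look like the following. The display names and role ARN are placeholders, and the S3 connector configuration (created with create_data_source) is omitted because its schema depends on your connector setup; treat this as an outline rather than a complete recipe.

import boto3

qbusiness = boto3.client('qbusiness')

# Create the application (placeholder display name and service role ARN)
app = qbusiness.create_application(
    displayName='image-insights-app',
    roleArn='arn:aws:iam::123456789012:role/QBusinessAppRole'
)

# Create an index to hold the ingested documents
index = qbusiness.create_index(
    applicationId=app['applicationId'],
    displayName='image-insights-index'
)
# The S3 data source is then added with create_data_source, passing the
# connector configuration for your bucket (see the S3 connector documentation).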

Configure the Amazon Q Business application CDE for the Amazon S3 data source
With the CDE feature of Amazon Q Business, you can make the most of your Amazon S3 data sources, using its capabilities to modify, enhance, and filter documents during the ingestion process and make enterprise content more discoverable and valuable. When connecting Amazon Q Business to S3 repositories, you can use CDE to transform your raw data, applying modifications that significantly improve search quality and information accessibility. This functionality extends to extracting context from binary files such as images through integration with Amazon Bedrock, enabling organizations to unlock insights from previously inaccessible content formats. By implementing CDE for Amazon S3 data sources, businesses can build a more comprehensive and intelligent knowledge base that responds effectively to user queries. To configure the Amazon Q Business application CDE for the Amazon S3 data source, complete the following steps:

Select your application and navigate to Data sources.
Choose your existing Amazon S3 data source or create a new one. Verify that Audio/Video under Multi-media content configuration is not enabled.
In the data source configuration, locate the Custom Document Enrichment section.
Configure the pre-extraction rules to trigger a Lambda function when specific S3 bucket conditions are satisfied. Check the following screenshot for an example configuration.

Pre-extraction rules are executed before Amazon Q Business processes files from your S3 bucket.
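The same pre-extraction hook can also be attached programmatically. The following sketch is an assumption-heavy outline of what that call could look like with the boto3 qbusiness client: the field names follow our reading of the document enrichment API, and the ARNs, bucket name, condition key, and operator are placeholders that you should verify against the current API reference before use.

# Hypothetical IDs and ARNs; replace with your own values
qbusiness.update_data_source(
    applicationId=application_id,
    indexId=index_id,
    dataSourceId=data_source_id,
    documentEnrichmentConfiguration={
        'preExtractionHookConfiguration': {
            'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:cde-image-extractor',
            'roleArn': 'arn:aws:iam::123456789012:role/QBusinessCDERole',
            's3BucketName': 'your-cde-working-bucket',
            # Assumed condition: only invoke the hook for objects whose source URI contains ".png"
            'invocationCondition': {
                'key': '_source_uri',
                'operator': 'CONTAINS',
                'value': {'stringValue': '.png'}
            }
        }
    }
)
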
Extract context from the images
To extract insights from an image file, the Lambda function makes an Amazon Bedrock API call using Anthropic’s Claude 3.7 Sonnet model. You can modify the code to use other Amazon Bedrock models based on your use case.
Constructing the prompt is a critical piece of the code. We recommend trying various prompts to get the desired output for your use case. Amazon Bedrock also offers prompt optimization, which you can use to refine your use case-specific input.
Examine the following Lambda function code snippets, written in Python, to understand the Amazon Bedrock model setup along with a sample prompt to extract insights from an image.
In the following code snippet, we start by importing relevant Python libraries, define constants, and initialize AWS SDK for Python (Boto3) clients for Amazon S3 and Amazon Bedrock runtime. For more information, refer to the Boto3 documentation.

import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name='us-east-1'))

The prompt passed to the Amazon Bedrock model, Anthropic’s Claude 3.7 Sonnet in this case, is broken into two parts: prompt_prefix and prompt_suffix. The prompt breakdown makes it more readable and manageable. Additionally, the Amazon Bedrock prompt caching feature can be used to reduce response latency as well as input token cost. You can modify the prompt to extract information based on your specific use case as needed.

prompt_prefix = (
    """You are an expert image reader tasked with generating detailed descriptions for various """
    """types of images. These images may include technical diagrams, """
    """graphs and charts, categorization diagrams, data flow and process flow diagrams, """
    """hierarchical and timeline diagrams, infographics, """
    """screenshots and product diagrams/images from user manuals. """
    """The description of these images needs to be very detailed so that the user can ask """
    """questions based on the image, which can be answered by only looking at the descriptions """
    """that you generate.
Here is the image you need to analyze:

<image>
"""
)

prompt_suffix = """
</image>

Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of: technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots and product diagrams/images from user manuals, or other.

2. Items:
   Carefully examine the image and extract all entities, texts, and numbers present. List these elements in <image_items> tags.

3. Detailed Description:
   Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other.  Capture all the crucial details that can be used to answer any followup questions. Write this description in <image_description> tags.

4. Data Estimation (for charts and graphs only):
   If the image is a chart or graph, capture the data in the image in CSV format to be able to recreate the image from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow up questions from the chart.
   If exact values cannot be inferred, provide an estimated range for each value in <estimation> tags.
   If no data is present, respond with "No data found".

Present your analysis in the following format:

<analysis>
<image_type>
[Classify the image type here]
</image_type>

<image_items>
[List all extracted entities, texts, and numbers here]
</image_items>

<image_description>
[Provide a detailed description of the image here]
</image_description>

<data>
[If applicable, provide estimated number ranges for chart elements here]
</data>
</analysis>

Remember to be thorough and precise in your analysis. If you’re unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""

The lambda_handler function is the main entry point for the Lambda function. When CDE invokes this function, it passes the data source's information in the event object. In this case, the S3 bucket and the S3 object key are retrieved from the event object along with the file format. Further processing of the input happens only if the file_format matches the expected file types. For production-ready code, implement proper error handling for unexpected errors.

def lambda_handler(event, context):
    logger.info("Received event: %s" % json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    metadata = event.get("metadata")
    file_format = s3ObjectKey.lower().split('.')[-1]
    new_key = 'cde_output/' + s3ObjectKey + '.txt'
    if file_format in FILE_FORMATS:
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket=s3Bucket, Key=new_key, Body=afterCDE)
    return {
        "version": "v0",
        "s3ObjectKey": new_key,
        "metadataUpdates": []
    }

The generate_image_description function calls two other functions: first to construct the message that is passed to the Amazon Bedrock model and second to invoke the model. It returns the final text output extracted from the image file by the model invocation.

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    """
    Generate a description for an image stored in Amazon S3.
    Inputs:
        s3Bucket: str - Bucket containing the image
        s3ObjectKey: str - Key of the image object
        file_format: str - Image format (jpg, jpeg, or png)
    Output:
        str - Generated image description
    """
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']

The _llm_input function takes in the S3 object’s details passed as input along with the file type (png, jpg) and builds the message in the format expected by the model invoked by Amazon Bedrock.

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    s3_response = s3.get_object(Bucket=s3Bucket, Key=s3ObjectKey)
    image_content = s3_response['Body'].read()
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {
                "image": {
                    "format": file_format,
                    "source": {
                        "bytes": image_content
                    }
                }
            },
            {"text": prompt_suffix}
        ]
    }
    return [message]

The _invoke_model function calls the converse API using the Amazon Bedrock runtime client, which returns the response generated by the model. Within inferenceConfig, maxTokens limits the length of the response, and temperature=0 makes the responses more deterministic (less random).

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Call the Bedrock model with retry logic.
    Input:
        messages: List[Dict[str, Any]] - Prepared messages for the model
    Output:
        Dict[str, Any] - Model response
    """
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
        except Exception as e:
            print(e)

    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")
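The loop above retries immediately after a failure. If you want to be gentler on throttling errors, a minimal sketch with exponential backoff, reusing the MAX_RETRIES, MODEL_ID, and client objects defined earlier (the sleep values are illustrative, not part of the original solution), could look like this:

import time
from botocore.exceptions import ClientError

def _invoke_model_with_backoff(messages):
    # Retry transient Bedrock errors (for example, throttling) with exponential backoff
    for attempt in range(MAX_RETRIES):
        try:
            return bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={"maxTokens": MAX_TOKENS, "temperature": 0},
            )
        except ClientError as e:
            logger.warning("Bedrock call failed on attempt %s: %s", attempt + 1, e)
            time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts (illustrative values)
    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")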

Putting all the preceding code pieces together, the full Lambda function code is shown in the following block:

# Example Lambda function for image processing
import boto3
import logging
import json
from typing import List, Dict, Any
from botocore.config import Config

MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
MAX_TOKENS = 2000
MAX_RETRIES = 2
FILE_FORMATS = ("jpg", "jpeg", "png")

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', config=Config(read_timeout=3600, region_name='us-east-1'))

prompt_prefix = (
    """You are an expert image reader tasked with generating detailed descriptions for various """
    """types of images. These images may include technical diagrams, """
    """graphs and charts, categorization diagrams, data flow and process flow diagrams, """
    """hierarchical and timeline diagrams, infographics, """
    """screenshots and product diagrams/images from user manuals. """
    """The description of these images needs to be very detailed so that the user can ask """
    """questions based on the image, which can be answered by only looking at the descriptions """
    """that you generate.
Here is the image you need to analyze:

<image>
"""
)

prompt_suffix = """
</image>

Please follow these steps to analyze the image and generate a comprehensive description:

1. Image type: Classify the image as one of: technical diagrams, graphs and charts, categorization diagrams, data flow and process flow diagrams, hierarchical and timeline diagrams, infographics, screenshots and product diagrams/images from user manuals, or other.

2. Items:
   Carefully examine the image and extract all entities, texts, and numbers present. List these elements in <image_items> tags.

3. Detailed Description:
   Using the information from the previous steps, provide a detailed description of the image. This should include the type of diagram or chart, its main purpose, and how the various elements interact or relate to each other.  Capture all the crucial details that can be used to answer any followup questions. Write this description in <image_description> tags.

4. Data Estimation (for charts and graphs only):
   If the image is a chart or graph, capture the data in the image in CSV format to be able to recreate the image from the data. Ensure your response captures all relevant details from the chart that might be necessary to answer any follow up questions from the chart.
   If exact values cannot be inferred, provide an estimated range for each value in <estimation> tags.
   If no data is present, respond with "No data found".

Present your analysis in the following format:

<analysis>
<image_type>
[Classify the image type here]
</image_type>

<image_items>
[List all extracted entities, texts, and numbers here]
</image_items>

<image_description>
[Provide a detailed description of the image here]
</image_description>

<data>
[If applicable, provide estimated number ranges for chart elements here]
</data>
</analysis>

Remember to be thorough and precise in your analysis. If you’re unsure about any aspect of the image, state your uncertainty clearly in the relevant section.
"""

def _llm_input(s3Bucket: str, s3ObjectKey: str, file_format: str) -> List[Dict[str, Any]]:
    s3_response = s3.get_object(Bucket=s3Bucket, Key=s3ObjectKey)
    image_content = s3_response['Body'].read()
    message = {
        "role": "user",
        "content": [
            {"text": prompt_prefix},
            {
                "image": {
                    "format": file_format,
                    "source": {
                        "bytes": image_content
                    }
                }
            },
            {"text": prompt_suffix}
        ]
    }
    return [message]

def _invoke_model(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Call the Bedrock model with retry logic.
    Input:
        messages: List[Dict[str, Any]] - Prepared messages for the model
    Output:
        Dict[str, Any] - Model response
    """
    for attempt in range(MAX_RETRIES):
        try:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=messages,
                inferenceConfig={
                    "maxTokens": MAX_TOKENS,
                    "temperature": 0,
                }
            )
            return response
        except Exception as e:
            print(e)

    raise Exception(f"Failed to call model after {MAX_RETRIES} attempts")

def generate_image_description(s3Bucket: str, s3ObjectKey: str, file_format: str) -> str:
    """
    Generate a description for an image stored in Amazon S3.
    Inputs:
        s3Bucket: str - Bucket containing the image
        s3ObjectKey: str - Key of the image object
        file_format: str - Image format (jpg, jpeg, or png)
    Output:
        str - Generated image description
    """
    messages = _llm_input(s3Bucket, s3ObjectKey, file_format)
    response = _invoke_model(messages)
    return response['output']['message']['content'][0]['text']

def lambda_handler(event, context):
    logger.info("Received event: %s" % json.dumps(event))
    s3Bucket = event.get("s3Bucket")
    s3ObjectKey = event.get("s3ObjectKey")
    metadata = event.get("metadata")
    file_format = s3ObjectKey.lower().split('.')[-1]
    new_key = 'cde_output/' + s3ObjectKey + '.txt'
    if file_format in FILE_FORMATS:
        afterCDE = generate_image_description(s3Bucket, s3ObjectKey, file_format)
        s3.put_object(Bucket=s3Bucket, Key=new_key, Body=afterCDE)
    return {
        "version": "v0",
        "s3ObjectKey": new_key,
        "metadataUpdates": []
    }

We strongly recommend testing and validating code in a nonproduction environment before deploying it to production. In addition to Amazon Q pricing, this solution will incur charges for AWS Lambda and Amazon Bedrock. For more information, refer to AWS Lambda pricing and Amazon Bedrock pricing.
After the Amazon S3 data is synced with the Amazon Q index, you can prompt the Amazon Q Business application to get the extracted insights as shown in the following section.
Example prompts and results
The following question and answer pairs refer to the Student Age Distribution graph at the beginning of this post.
Q: Which City has the highest number of students in the 13-15 age range?

Q: Compare the student demographics between City 1 and City 4?

In the original graph, the bars representing student counts lacked explicit numerical labels, which could make precise data interpretation challenging. However, with Amazon Q Business and its integration capabilities, this limitation can be overcome. By using Amazon Q Business to process these visualizations with Amazon Bedrock LLMs through the CDE feature, we've enabled a more interactive and insightful analysis experience. The service effectively extracts the contextual information embedded in the graph, even when explicit labels are absent. This combination means that end users can ask questions about the visualization and receive responses based on the underlying data. Rather than being limited by what's explicitly labeled in the graph, users can now explore deeper insights through natural language queries. This capability demonstrates how Amazon Q Business transforms static visualizations into queryable knowledge assets, enhancing the value of your existing data visualizations without requiring additional formatting or preparation work.
Best practices for Amazon S3 CDE configuration
When setting up CDE for your Amazon S3 data source, consider these best practices:

Use conditional rules to only process specific file types that need transformation.
Monitor Lambda execution with Amazon CloudWatch to track processing errors and performance.
Set appropriate timeout values for your Lambda functions, especially when processing large files.
Consider incremental syncing to process only new or modified documents in your S3 bucket.
Use document attributes to track which documents have been processed by CDE.
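For example, a small guard at the top of the Lambda handler can apply the first and last recommendations by skipping unsupported file types and objects that the function itself wrote to the cde_output/ prefix during a previous sync. This is a sketch that reuses the FILE_FORMATS constant and prefix from the example code above:

def should_process(s3ObjectKey: str) -> bool:
    # Skip objects this function already produced, so re-syncs do not reprocess them
    if s3ObjectKey.startswith('cde_output/'):
        return False
    # Only process supported image types
    return s3ObjectKey.lower().split('.')[-1] in FILE_FORMATS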

Cleanup
Complete the following steps to clean up your resources:

Go to the Amazon Q Business application and select Remove and unsubscribe for users and groups.
Delete the Amazon Q Business application.
Delete the Lambda function.
Empty and delete the S3 bucket. For instructions, refer to Deleting a general purpose bucket.

Conclusion
This solution demonstrates how combining Amazon Q Business, custom document enrichment, and Amazon Bedrock can transform static visualizations into queryable knowledge assets, significantly enhancing the value of existing data visualizations without additional formatting work. By using these powerful AWS services together, organizations can bridge the gap between visual information and actionable insights, enabling users to interact with different file types in more intuitive ways.
Explore What is Amazon Q Business? and Getting started with Amazon Bedrock in the documentation to implement this solution for your specific use cases and unlock the potential of your visual data.
About the authors
Amit Chaudhary is a Senior Solutions Architect at Amazon Web Services. His focus area is AI/ML, and he helps customers with generative AI, large language models, and prompt engineering. Outside of work, Amit enjoys spending time with his family.
Nikhil Jha is a Senior Technical Account Manager at Amazon Web Services. His focus areas include AI/ML, building Generative AI resources, and analytics. In his spare time, he enjoys exploring the outdoors with his family.

Build AWS architecture diagrams using Amazon Q CLI and MCP

Creating professional AWS architecture diagrams is a fundamental task for solutions architects, developers, and technical teams. These diagrams serve as essential communication tools for stakeholders, documentation of compliance requirements, and blueprints for implementation teams. However, traditional diagramming approaches present several challenges:

Time-consuming process – Creating detailed architecture diagrams manually can take hours or even days
Steep learning curve – Learning specialized diagramming tools requires significant investment
Inconsistent styling – Maintaining visual consistency across multiple diagrams is difficult
Outdated AWS icons – Keeping up with the latest AWS service icons and best practices is challenging
Difficult maintenance – Updating diagrams as architectures evolve can become increasingly burdensome

Amazon Q Developer CLI with the Model Context Protocol (MCP) offers a streamlined approach to creating AWS architecture diagrams. By using generative AI through natural language prompts, architects can now generate professional diagrams in minutes rather than hours, while adhering to AWS best practices.
In this post, we explore how to use Amazon Q Developer CLI with the AWS Diagram MCP and the AWS Documentation MCP servers to create sophisticated architecture diagrams that follow AWS best practices. We discuss techniques for basic diagrams and real-world diagrams, with detailed examples and step-by-step instructions.
Solution overview
Amazon Q Developer CLI is a command line interface that brings the generative AI capabilities of Amazon Q directly to your terminal. Developers can interact with Amazon Q through natural language prompts, making it an invaluable tool for various development tasks.
Developed by Anthropic as an open protocol, the Model Context Protocol (MCP) provides a standardized way to connect AI models to virtually any data source or tool. Using a client-server architecture (as illustrated in the following diagram), the MCP helps developers expose their data through lightweight MCP servers while building AI applications as MCP clients that connect to these servers.
The MCP uses a client-server architecture containing the following components:

Host – A program or AI tool that requires access to data through the MCP protocol, such as Anthropic’s Claude Desktop, an integrated development environment (IDE), AWS MCP CLI, or other AI applications
Client – Protocol clients that maintain one-to-one connections with servers
Server – Lightweight programs that expose specific capabilities and tools through the standardized protocol
Data sources – Local data sources such as databases and file systems, or external systems available over the internet through APIs (web APIs) that MCP servers can connect with

As announced in April 2025, MCP enables Amazon Q Developer to connect with specialized servers that extend its capabilities beyond what’s possible with the base model alone. MCP servers act as plugins for Amazon Q, providing domain-specific knowledge and functionality. The AWS Diagram MCP server specifically enables Amazon Q to generate architecture diagrams using the Python diagrams package, with access to the complete AWS icon set and architectural best practices.
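Under the hood, the server emits code for the open source Python diagrams package. As a point of reference, a minimal hand-written example of that package looks like the following; this is only illustrative of the kind of code the server generates, not captured output:

from diagrams import Diagram
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

# Renders a simple three-node architecture to web_service.png
with Diagram("Web Service", show=False):
    ELB("load balancer") >> EC2("web server") >> RDS("database")
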
Prerequisites
To implement this solution, you must have an AWS account with appropriate permissions and follow the steps below.
Set up your environment
Before you can start creating diagrams, you need to set up your environment with Amazon Q CLI, the AWS Diagram MCP server, and AWS Documentation MCP server. This section provides detailed instructions for installation and configuration.
Install Amazon Q Developer CLI
Amazon Q Developer CLI is available as a standalone installation. Complete the following steps to install it:

Download and install Amazon Q Developer CLI. For instructions, see Using Amazon Q Developer on the command line.
Verify the installation by running the following command: q --version You should see output similar to the following: Amazon Q Developer CLI version 1.x.x
Configure Amazon Q CLI with your AWS credentials: q login
Choose the login method suitable for you:

Use for free with AWS Builder ID
Use with Pro license

Set up MCP servers
Complete the following steps to set up your MCP servers:

Install uv using the following command: pip install uv
Install Python 3.10 or newer: uv python install 3.10
Install GraphViz for your operating system.
Add the servers to your ~/.aws/amazonq/mcp.json file:

{
  "mcpServers": {
    "awslabs.aws-diagram-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-diagram-mcp-server"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "autoApprove": [],
      "disabled": false
    },
    "awslabs.aws-documentation-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "autoApprove": [],
      "disabled": false
    }
  }
}

Now, Amazon Q CLI automatically discovers MCP servers in the ~/.aws/amazonq/mcp.json file.
Understanding MCP server tools
The AWS Diagram MCP server provides several powerful tools:

list_icons – Lists available icons from the diagrams package, organized by provider and service category
get_diagram_examples – Provides example code for different types of diagrams (AWS, sequence, flow, class, and others)
generate_diagram – Creates a diagram from Python code using the diagrams package

The AWS Documentation MCP server provides the following useful tools:

search_documentation – Searches AWS documentation using the official AWS Documentation Search API
read_documentation – Fetches and converts AWS documentation pages to markdown format
recommend – Gets content recommendations for AWS documentation pages

These tools work together to help you create accurate architecture diagrams that follow AWS best practices.
Test your setup
Let’s verify that everything is working correctly by generating a simple diagram:

Start the Amazon Q CLI chat interface and verify the output shows the MCP servers being loaded and initialized: q chat
In the chat interface, enter the following prompt: Please create a diagram showing an EC2 instance in a VPC connecting to an external S3 bucket. Include essential networking components (VPC, subnets, Internet Gateway, Route Table), security elements (Security Groups, NACLs), and clearly mark the connection between EC2 and S3. Label everything appropriately concisely and indicate that all resources are in the us-east-1 region. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram.
Amazon Q CLI will ask you to trust the tool that is being used; enter t to trust it. Amazon Q CLI will then generate and display a simple diagram showing the requested architecture. Your diagram should look similar to the following screenshot, though there might be variations in layout, styling, or specific details because it's created using generative AI. The core architectural components and relationships will be represented, but the exact visual presentation might differ slightly with each generation. If you see the diagram, your environment is set up correctly. If you encounter issues, verify that Amazon Q CLI can access the MCP servers by making sure you installed the necessary tools and the servers are in the ~/.aws/amazonq/mcp.json file.

Configuration options
The AWS Diagram MCP server supports several configuration options to customize your diagramming experience:

Output directory – By default, diagrams are saved in a generated-diagrams directory in your current working directory. You can specify a different location in your prompts.
Diagram format – The default output format is PNG, but you can request other formats like SVG in your prompts.
Styling options – You can specify colors, shapes, and other styling elements in your prompts.

Now that our environment is set up, let’s create more diagrams.
Create AWS architecture diagrams
In this section, we walk through the process of creating multiple AWS architecture diagrams using Amazon Q CLI with the AWS Diagram MCP server and AWS Documentation MCP server to make sure our requirements follow best practices.
When you provide a prompt to Amazon Q CLI, the AWS Diagram and Documentation MCP servers complete the following steps:

Interpret your requirements.
Check for best practices on the AWS documentation.
Generate Python code using the diagrams package.
Execute the code to create the diagram.
Return the diagram as an image.

This process happens seamlessly, so you can focus on describing what you want rather than how to create it.
AWS architecture diagrams typically include the following components:

Nodes – AWS services and resources
Edges – Connections between nodes showing relationships or data flow
Clusters – Logical groupings of nodes, such as virtual private clouds (VPCs), subnets, and Availability Zones
Labels – Text descriptions for nodes and connections
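To make these building blocks concrete, the short hand-written snippet below uses all four: service nodes, Edge labels on the connections, and a Cluster to group the application tier. Again, this is illustrative; the MCP server writes similar code for you from your prompt.

from diagrams import Cluster, Diagram, Edge
from diagrams.aws.compute import EC2
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

with Diagram("Clustered Web App", show=False):
    lb = ELB("alb")
    with Cluster("VPC private subnets"):
        web = [EC2("web-1"), EC2("web-2")]
        db = RDS("app-db")
    # Labeled edges show the relationships between the nodes
    lb >> Edge(label="HTTP") >> web >> Edge(label="SQL") >> db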

Example 1: Create a web application architecture
Let’s create a diagram for a simple web application hosted on AWS. Enter the following prompt:
Create a diagram for a simple web application with an Application Load Balancer, two EC2 instances, and an RDS database. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram

After you enter your prompt, Amazon Q CLI will search AWS documentation for best practices using the search_documentation tool from awslabsaws_documentation_mcp_server.
Following the search of the relevant AWS documentation, it will read the documentation using the read_documentation tool from the MCP server awslabsaws_documentation_mcp_server.
Amazon Q CLI will then list the needed AWS service icons using the list_icons tool, and will use generate_diagram with awslabsaws_diagram_mcp_server.
You should receive an output with a description of the diagram created based on the prompt along with the location of where the diagram was saved.
Amazon Q CLI will generate and display the diagram.

The generated diagram shows the following key components:

An Application Load Balancer as the entry point
Two Amazon Elastic Compute Cloud (Amazon EC2) instances for the application tier
An Amazon Relational Database Service (Amazon RDS) instance for the database tier
Connections showing the flow of traffic

Example 2: Create a multi-tier architecture
Multi-tier architectures separate applications into functional layers (presentation, application, and data) to improve scalability and security. We use the following prompt to create our diagram:
Create a diagram for a three-tier web application with a presentation tier (ALB and CloudFront), application tier (ECS with Fargate), and data tier (Aurora PostgreSQL). Include VPC with public and private subnets across multiple AZs. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram.

The diagram shows the following key components:

A presentation tier in public subnets
An application tier in private subnets
A data tier in isolated private subnets
Proper security group configurations
Traffic flow between tiers

Example 3: Create a serverless architecture
We use the following prompt to create a diagram for a serverless architecture:
Create a diagram for a serverless web application using API Gateway, Lambda, DynamoDB, and S3 for static website hosting. Include Cognito for user authentication and CloudFront for content delivery. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram.

The diagram includes the following key components:

Amazon Simple Storage Service (Amazon S3) hosting static website content
Amazon CloudFront distributing content globally
Amazon API Gateway handling API requests
AWS Lambda functions implementing business logic
Amazon DynamoDB storing application data
Amazon Cognito managing user authentication

Example 4: Create a data processing diagram
We use the following prompt to create a diagram for a data processing pipeline:
Create a diagram for a data processing pipeline with components organized in clusters for data ingestion, processing, storage, and analytics. Include Kinesis, Lambda, S3, Glue, and QuickSight. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram.

The diagram organizes components into distinct clusters:

Data ingestion – Amazon Kinesis Data Streams, Amazon Data Firehose, Amazon Simple Queue Service
Data processing – Lambda functions, AWS Glue jobs
Data storage – S3 buckets, DynamoDB tables
Data analytics – AWS Glue, Amazon Athena, Amazon QuickSight

Real-world examples
Let’s explore some real-world architecture patterns and how to create diagrams for them using Amazon Q CLI with the AWS Diagram MCP server.
Ecommerce platform
Ecommerce platforms require scalable, resilient architectures to handle variable traffic and maintain high availability. We use the following prompt to create an example diagram:
Create a diagram for an e-commerce platform with microservices architecture. Include components for product catalog, shopping cart, checkout, payment processing, order management, and user authentication. Ensure the architecture follows AWS best practices for scalability and security. Check for AWS documentation to ensure it adheres to AWS best practices before you create the diagram.

The diagram includes the following key components:

API Gateway as the entry point for client applications
Microservices implemented as containers in Amazon Elastic Container Service (Amazon ECS) with AWS Fargate
RDS databases for product catalog, shopping cart, and order data
Amazon ElastiCache for product data caching and session management
Amazon Cognito for authentication
Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Notification Service (Amazon SNS) for asynchronous communication between services
CloudFront for content delivery and static assets from Amazon S3
Amazon Route 53 for DNS management
AWS WAF for web application security
AWS Lambda functions for serverless microservice implementation
AWS Secrets Manager for secure credential storage
Amazon CloudWatch for monitoring and observability

Intelligent document processing solution
We use the following prompt to create a diagram for an intelligent document processing (IDP) architecture:
Create a diagram for an intelligent document processing (IDP) application on AWS. Include components for document ingestion, OCR and text extraction, intelligent data extraction (using NLP and/or computer vision), human review and validation, and data output/integration. Ensure the architecture follows AWS best practices for scalability and security, leveraging services like S3, Lambda, Textract, Comprehend, SageMaker (for custom models, if applicable), and potentially Augmented AI (A2I). Check for AWS documentation related to intelligent document processing best practices to ensure it adheres to AWS best practices before you create the diagram.

The diagram includes the following key components:

Amazon API Gateway as the entry point for client applications, providing a secure and scalable interface
Microservices implemented as containers in ECS with Fargate, enabling flexible and scalable processing
Amazon RDS databases for product catalog, shopping cart, and order data, providing reliable structured data storage
Amazon ElastiCache for product data caching and session management, improving performance and user experience
Amazon Cognito for authentication, ensuring secure access control
Amazon Simple Queue Service and Amazon Simple Notification Service for asynchronous communication between services, enabling decoupled and resilient architecture
Amazon CloudFront for content delivery and static assets from S3, optimizing global performance
Amazon Route53 for DNS management, providing reliable routing
AWS WAF for web application security, protecting against common web exploits
AWS Lambda functions for serverless microservice implementation, offering cost-effective scaling
AWS Secrets Manager for secure credential storage, enhancing security posture
Amazon CloudWatch for monitoring and observability, providing insights into system performance and health.

Clean up
If you no longer need to use the AWS Diagram MCP and AWS Documentation MCP servers with Amazon Q CLI, you can remove them from your configuration:

Open your ~/.aws/amazonq/mcp.json file.
Remove or comment out the MCP server entries.
Save the file.

This will prevent the servers from being loaded when you start Amazon Q CLI in the future.
Conclusion
In this post, we explored how to use Amazon Q CLI with the AWS Documentation MCP and AWS Diagram MCP servers to create professional AWS architecture diagrams that adhere to AWS best practices referenced from official AWS documentation. This approach offers significant advantages over traditional diagramming methods:

Time savings – Generate complex diagrams in minutes instead of hours
Consistency – Make sure diagrams follow the same style and conventions
Best practices – Automatically incorporate AWS architectural guidelines
Iterative refinement – Quickly modify diagrams through simple prompts
Validation – Check architectures against official AWS documentation and recommendations

As you continue your journey with AWS architecture diagrams, we encourage you to deepen your knowledge by learning more about the Model Context Protocol (MCP) to understand how it enhances the capabilities of Amazon Q. When seeking inspiration for your own designs, the AWS Architecture Center offers a wealth of reference architectures that follow best practices. For creating visually consistent diagrams, be sure to visit the AWS Icons page, where you can find the complete official icon set. And to stay at the cutting edge of these tools, keep an eye on updates to the official AWS MCP Servers—they’re constantly evolving with new features to make your diagramming experience even better.

About the Authors
Joel Asante, an Austin-based Solutions Architect at Amazon Web Services (AWS), works with GovTech (Government Technology) customers. With a strong background in data science and application development, he brings deep technical expertise to creating secure and scalable cloud architectures for his customers. Joel is passionate about data analytics, machine learning, and robotics, leveraging his development experience to design innovative solutions that meet complex government requirements. He holds 13 AWS certifications and enjoys family time, fitness, and cheering for the Kansas City Chiefs and Los Angeles Lakers in his spare time.
Dunieski Otano is a Solutions Architect at Amazon Web Services based out of Miami, Florida. He works with World Wide Public Sector MNO (Multi-International Organizations) customers. His passion is Security, Machine Learning and Artificial Intelligence, and Serverless. He works with his customers to help them build and deploy high available, scalable, and secure solutions. Dunieski holds 14 AWS certifications and is an AWS Golden Jacket recipient. In his free time, you will find him spending time with his family and dog, watching a great movie, coding, or flying his drone.
Varun Jasti is a Solutions Architect at Amazon Web Services, working with AWS Partners to design and scale artificial intelligence solutions for public sector use cases to meet compliance standards. With a background in Computer Science, his work covers broad range of ML use cases primarily focusing on LLM training/inferencing and computer vision. In his spare time, he loves playing tennis and swimming.

A Coding Guide to Build a Functional Data Analysis Workflow Using Lila …

In this tutorial, we demonstrate a fully functional and modular data analysis pipeline using the Lilac library, without relying on signal processing. It combines Lilac’s dataset management capabilities with Python’s functional programming paradigm to create a clean, extensible workflow. From setting up a project and generating realistic sample data to extracting insights and exporting filtered outputs, the tutorial emphasizes reusable, testable code structures. Core functional utilities, such as pipe, map_over, and filter_by, are used to build a declarative flow, while Pandas facilitates detailed data transformations and quality analysis.

!pip install lilac[all] pandas numpy

To get started, we install the required libraries using the command !pip install lilac[all] pandas numpy. This ensures we have the full Lilac suite alongside Pandas and NumPy for smooth data handling and analysis. We should run this in our notebook before proceeding.

import json
import uuid
import pandas as pd
from pathlib import Path
from typing import List, Dict, Any, Tuple, Optional
from functools import reduce, partial
import lilac as ll

We import all the essential libraries. These include json and uuid for handling data and generating unique project names, pandas for working with data in tabular form, and Path from pathlib for managing directories. We also introduce type hints for improved function clarity and functools for functional composition patterns. Finally, we import the core Lilac library as ll to manage our datasets.

def pipe(*functions):
    """Compose functions left to right (pipe operator)"""
    return lambda x: reduce(lambda acc, f: f(acc), functions, x)

def map_over(func, iterable):
    """Functional map wrapper"""
    return list(map(func, iterable))

def filter_by(predicate, iterable):
    """Functional filter wrapper"""
    return list(filter(predicate, iterable))

def create_sample_data() -> List[Dict[str, Any]]:
    """Generate realistic sample data for analysis"""
    return [
        {"id": 1, "text": "What is machine learning?", "category": "tech", "score": 0.9, "tokens": 5},
        {"id": 2, "text": "Machine learning is AI subset", "category": "tech", "score": 0.8, "tokens": 6},
        {"id": 3, "text": "Contact support for help", "category": "support", "score": 0.7, "tokens": 4},
        {"id": 4, "text": "What is machine learning?", "category": "tech", "score": 0.9, "tokens": 5},
        {"id": 5, "text": "Deep learning neural networks", "category": "tech", "score": 0.85, "tokens": 4},
        {"id": 6, "text": "How to optimize models?", "category": "tech", "score": 0.75, "tokens": 5},
        {"id": 7, "text": "Performance tuning guide", "category": "guide", "score": 0.6, "tokens": 3},
        {"id": 8, "text": "Advanced optimization techniques", "category": "tech", "score": 0.95, "tokens": 3},
        {"id": 9, "text": "Gradient descent algorithm", "category": "tech", "score": 0.88, "tokens": 3},
        {"id": 10, "text": "Model evaluation metrics", "category": "tech", "score": 0.82, "tokens": 3},
    ]

In this section, we define reusable functional utilities. The pipe function helps us chain transformations clearly, while map_over and filter_by allow us to transform or filter iterable data functionally. Then, we create a sample dataset that mimics real-world records, featuring fields such as text, category, score, and tokens, which we will later use to demonstrate Lilac’s data curation capabilities.
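As a quick sanity check of these utilities before moving on, the following hypothetical snippet chains them over the sample records to pull out the tech-category texts:

data = create_sample_data()
tech_texts = pipe(
    lambda rows: filter_by(lambda r: r["category"] == "tech", rows),
    lambda rows: map_over(lambda r: r["text"], rows),
)(data)
print(tech_texts[:3])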

def setup_lilac_project(project_name: str) -> str:
    """Initialize Lilac project directory"""
    project_dir = f"./{project_name}-{uuid.uuid4().hex[:6]}"
    Path(project_dir).mkdir(exist_ok=True)
    ll.set_project_dir(project_dir)
    return project_dir

def create_dataset_from_data(name: str, data: List[Dict]) -> ll.Dataset:
    """Create Lilac dataset from data"""
    data_file = f"{name}.jsonl"
    with open(data_file, 'w') as f:
        for item in data:
            f.write(json.dumps(item) + '\n')

    config = ll.DatasetConfig(
        namespace="tutorial",
        name=name,
        source=ll.sources.JSONSource(filepaths=[data_file])
    )

    return ll.create_dataset(config)

With the setup_lilac_project function, we initialize a unique working directory for our Lilac project and register it using Lilac’s API. Using create_dataset_from_data, we convert our raw list of dictionaries into a .jsonl file and create a Lilac dataset by defining its configuration. This prepares the data for clean and structured analysis.

def extract_dataframe(dataset: ll.Dataset, fields: List[str]) -> pd.DataFrame:
    """Extract data as pandas DataFrame"""
    return dataset.to_pandas(fields)

def apply_functional_filters(df: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Apply various filters and return multiple filtered versions"""

    filters = {
        'high_score': lambda df: df[df['score'] >= 0.8],
        'tech_category': lambda df: df[df['category'] == 'tech'],
        'min_tokens': lambda df: df[df['tokens'] >= 4],
        'no_duplicates': lambda df: df.drop_duplicates(subset=['text'], keep='first'),
        'combined_quality': lambda df: df[(df['score'] >= 0.8) & (df['tokens'] >= 3) & (df['category'] == 'tech')]
    }

    return {name: filter_func(df.copy()) for name, filter_func in filters.items()}

We extract the dataset into a pandas DataFrame using extract_dataframe, which allows us to work with selected fields in a familiar format. Then, using apply_functional_filters, we define and apply a set of logical filters, such as high-score selection, category-based filtering, token count constraints, duplicate removal, and composite quality conditions, to generate multiple filtered views of the data, as the short sketch below illustrates.
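As a quick sanity check, the filters can also be applied directly to the raw sample records; this standalone sketch (not part of the tutorial script) assumes create_sample_data and apply_functional_filters are already defined:

import pandas as pd

# Build the DataFrame straight from the sample records to keep the sketch self-contained.
df = pd.DataFrame(create_sample_data())
views = apply_functional_filters(df)

for name, view in views.items():
    print(f"{name}: {len(view)} rows")
# Expected on the 10 sample rows: high_score=7, tech_category=8,
# min_tokens=5, no_duplicates=9, combined_quality=7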

def analyze_data_quality(df: pd.DataFrame) -> Dict[str, Any]:
    """Analyze data quality metrics"""
    return {
        'total_records': len(df),
        'unique_texts': df['text'].nunique(),
        'duplicate_rate': 1 - (df['text'].nunique() / len(df)),
        'avg_score': df['score'].mean(),
        'category_distribution': df['category'].value_counts().to_dict(),
        'score_distribution': {
            'high': len(df[df['score'] >= 0.8]),
            'medium': len(df[(df['score'] >= 0.6) & (df['score'] < 0.8)]),
            'low': len(df[df['score'] < 0.6])
        },
        'token_stats': {
            'mean': df['tokens'].mean(),
            'min': df['tokens'].min(),
            'max': df['tokens'].max()
        }
    }

def create_data_transformations() -> Dict[str, callable]:
    """Create various data transformation functions"""
    return {
        'normalize_scores': lambda df: df.assign(norm_score=df['score'] / df['score'].max()),
        'add_length_category': lambda df: df.assign(
            length_cat=pd.cut(df['tokens'], bins=[0, 3, 5, float('inf')], labels=['short', 'medium', 'long'])
        ),
        'add_quality_tier': lambda df: df.assign(
            quality_tier=pd.cut(df['score'], bins=[0, 0.6, 0.8, 1.0], labels=['low', 'medium', 'high'])
        ),
        'add_category_rank': lambda df: df.assign(
            category_rank=df.groupby('category')['score'].rank(ascending=False)
        )
    }

To evaluate the dataset quality, we use analyze_data_quality, which helps us measure key metrics like total and unique records, duplicate rates, category breakdowns, and score/token distributions. This gives us a clear picture of the dataset’s readiness and reliability. We also define transformation functions using create_data_transformations, enabling enhancements such as score normalization, token-length categorization, quality tier assignment, and intra-category ranking.
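To see how the pd.cut-based transformations bucket values, here is a standalone sketch (not from the original script). Note that pd.cut uses right-closed bins by default, so a score of exactly 0.8 falls into the 'medium' tier even though analyze_data_quality counts scores >= 0.8 as 'high':

import pandas as pd

# Standalone illustration of the binning used in create_data_transformations.
toy = pd.DataFrame({"score": [0.55, 0.6, 0.8, 0.95], "tokens": [2, 3, 5, 7]})

toy["quality_tier"] = pd.cut(toy["score"], bins=[0, 0.6, 0.8, 1.0],
                             labels=["low", "medium", "high"])
toy["length_cat"] = pd.cut(toy["tokens"], bins=[0, 3, 5, float("inf")],
                           labels=["short", "medium", "long"])

print(toy)
# Scores 0.6 and 0.8 land in 'low' and 'medium' respectively because the
# intervals are right-closed: (0, 0.6], (0.6, 0.8], (0.8, 1.0]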

def apply_transformations(df: pd.DataFrame, transform_names: List[str]) -> pd.DataFrame:
    """Apply selected transformations"""
    transformations = create_data_transformations()
    selected_transforms = [transformations[name] for name in transform_names if name in transformations]

    return pipe(*selected_transforms)(df.copy()) if selected_transforms else df

def export_filtered_data(filtered_datasets: Dict[str, pd.DataFrame], output_dir: str) -> None:
    """Export filtered datasets to files"""
    Path(output_dir).mkdir(exist_ok=True)

    for name, df in filtered_datasets.items():
        output_file = Path(output_dir) / f"{name}_filtered.jsonl"
        with open(output_file, 'w') as f:
            for _, row in df.iterrows():
                f.write(json.dumps(row.to_dict()) + '\n')
        print(f"Exported {len(df)} records to {output_file}")

Then, through apply_transformations, we selectively apply the needed transformations in a functional chain, ensuring our data is enriched and structured. Once filtered, we use export_filtered_data to write each dataset variant into a separate .jsonl file. This enables us to store subsets, such as high-quality entries or non-duplicate records, in an organized format for downstream use.

def main_analysis_pipeline():
    """Main analysis pipeline demonstrating functional approach"""

    print("Setting up Lilac project…")
    project_dir = setup_lilac_project("advanced_tutorial")

    print("Creating sample dataset…")
    sample_data = create_sample_data()
    dataset = create_dataset_from_data("sample_data", sample_data)

    print("Extracting data…")
    df = extract_dataframe(dataset, ['id', 'text', 'category', 'score', 'tokens'])

    print("Analyzing data quality…")
    quality_report = analyze_data_quality(df)
    print(f"Original data: {quality_report['total_records']} records")
    print(f"Duplicates: {quality_report['duplicate_rate']:.1%}")
    print(f"Average score: {quality_report['avg_score']:.2f}")

    print("Applying transformations…")
    transformed_df = apply_transformations(df, ['normalize_scores', 'add_length_category', 'add_quality_tier'])

    print("Applying filters…")
    filtered_datasets = apply_functional_filters(transformed_df)

    print("\nFilter Results:")
    for name, filtered_df in filtered_datasets.items():
        print(f"  {name}: {len(filtered_df)} records")

    print("Exporting filtered datasets…")
    export_filtered_data(filtered_datasets, f"{project_dir}/exports")

    print("\nTop Quality Records:")
    best_quality = filtered_datasets['combined_quality'].head(3)
    for _, row in best_quality.iterrows():
        print(f"  • {row['text']} (score: {row['score']}, category: {row['category']})")

    return {
        'original_data': df,
        'transformed_data': transformed_df,
        'filtered_data': filtered_datasets,
        'quality_report': quality_report
    }

if __name__ == "__main__":
    results = main_analysis_pipeline()
    print("\nAnalysis complete! Check the exports folder for filtered datasets.")

Finally, in the main_analysis_pipeline, we execute the full workflow, from setup to data export, showcasing how Lilac, combined with functional programming, allows us to build modular, scalable, and expressive pipelines. We even print out the top-quality entries as a quick snapshot. This function represents our full data curation loop, powered by Lilac.

In conclusion, this walkthrough provides a hands-on understanding of how to build a reproducible data pipeline that leverages Lilac's dataset abstractions and functional programming patterns for scalable, clean analysis. The pipeline covers all critical stages, including dataset creation, transformation, filtering, quality analysis, and export, offering flexibility for both experimentation and deployment. It also demonstrates how to embed meaningful metadata such as normalized scores, quality tiers, and length categories, which can be instrumental in downstream tasks like modeling or human review.
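If you want to reuse the exported subsets downstream, a minimal sketch like the following reads one of the .jsonl exports back into pandas; the path shown is illustrative only, since the project directory name includes a random six-character suffix:

import pandas as pd

# Illustrative path; your project directory will have a different random suffix.
export_path = "./advanced_tutorial-abc123/exports/combined_quality_filtered.jsonl"

curated = pd.read_json(export_path, lines=True)
print(curated[["text", "score", "quality_tier"]].head())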

Check out the code. All credit for this research goes to the researchers of this project.
The post A Coding Guide to Build a Functional Data Analysis Workflow Using Lilac for Transforming, Filtering, and Exporting Structured Insights appeared first on MarkTechPost.

UC San Diego Researchers Introduced Dex1B: A Billion-Scale Dataset for Dexterous Hand Manipulation in Robotics

Challenges in Dexterous Hand Manipulation Data Collection

Creating large-scale data for dexterous hand manipulation remains a major challenge in robotics. Although hands offer greater flexibility and richer manipulation potential than simpler tools, such as grippers, their complexity makes them difficult to control effectively. Many in the field have questioned whether dexterous hands are worth the added difficulty. The real issue, however, may be a lack of diverse, high-quality training data. Existing methods, such as human demonstrations, optimization, and reinforcement learning, offer partial solutions but have limitations. Generative models have emerged as a promising alternative; however, they often struggle with physical feasibility and tend to produce limited diversity by adhering too closely to known examples.

Evolution of Dexterous Hand Manipulation Approaches

Dexterous hand manipulation has long been central to robotics, initially driven by control-based techniques for precise multi-fingered grasping. Though these methods achieved impressive accuracy, they often struggled to generalize across varied settings. Learning-based approaches later emerged, offering greater adaptability through techniques such as pose prediction, contact maps, and intermediate representations, although they remain sensitive to data quality. Existing datasets, both synthetic and real-world, have their limits, either lacking diversity or being confined to human hand shapes.

Introduction to Dex1B Dataset

Researchers at UC San Diego have developed Dex1B, a massive dataset of one billion high-quality, diverse demonstrations for dexterous hand tasks like grasping and articulation. They combined optimization techniques with generative models, using geometric constraints for feasibility and conditioning strategies to boost diversity. Starting with a small, carefully curated dataset, they trained a generative model to scale up efficiently. A debiasing mechanism further enhanced diversity. Compared to previous datasets, such as DexGraspNet, Dex1B offers vastly more data. They also introduced DexSimple, a strong new baseline that leverages this scale to outperform past methods by 22% on grasping tasks.

Dex1B Benchmark Design and Methodology

The Dex1B benchmark is a large-scale dataset designed to evaluate two key dexterous manipulation tasks, grasping and articulation, using over one billion demonstrations across three robotic hands. Initially, a small but high-quality seed dataset is created using optimization methods. This seed data trains a generative model that produces more diverse and scalable demonstrations. To ensure success and variety, the team applies debiasing techniques and post-optimization adjustments. Tasks are completed via smooth, collision-free motion planning. The result is a richly diverse, simulation-validated dataset that enables realistic, high-volume training for complex hand-object interactions.

Insights on Multimodal Attention in Model Performance

Recent research explores the effect of combining cross-attention with self-attention in multimodal models. While self-attention facilitates understanding of relationships within a single modality, cross-attention enables the model to connect information across different modalities. The study finds that using both together improves performance, particularly in tasks that require aligning and integrating text and image features. Interestingly, cross-attention alone can sometimes outperform self-attention, especially when applied at deeper layers. This insight suggests that carefully designing how and where attention mechanisms are utilized within a model is crucial for comprehending and processing complex multimodal data.

Conclusion: Dex1B’s Impact and Future Potential

In conclusion, Dex1B is a massive synthetic dataset comprising one billion demonstrations for dexterous hand tasks, such as grasping and articulation. To generate this data efficiently, the researchers designed an iterative pipeline that combines optimization techniques with a generative model called DexSimple. Starting with an initial dataset created through optimization, DexSimple generates diverse, realistic manipulation proposals, which are then refined and quality-checked. Enhanced with geometric constraints, DexSimple significantly outperforms previous models on benchmarks like DexGraspNet. The dataset and model prove effective not only in simulation but also in real-world robotics, advancing the field of dexterous hand manipulation with scalable, high-quality data.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project.

The post UC San Diego Researchers Introduced Dex1B: A Billion-Scale Dataset for Dexterous Hand Manipulation in Robotics appeared first on MarkTechPost.