How climate tech startups are building foundation models with Amazon SageMaker HyperPod

Climate tech startups are companies that use technology and innovation to address the climate crisis, with a primary focus on either reducing greenhouse gas emissions or helping society adapt to climate change impacts. Their unifying mission is to create scalable solutions that accelerate the transition to a sustainable, low-carbon future. Solutions to the climate crisis are ever more important as climate-driven extreme weather disasters increase globally. In 2024, climate disasters caused more than $417B in damages globally, and there’s no slowing down in 2025, with LA wildfires causing more than $135B in damages in the first month of the year alone. Climate tech startups are at the forefront of building impactful solutions to the climate crisis, and they’re using generative AI to build as quickly as possible.
In this post, we show how climate tech startups are developing foundation models (FMs) that use extensive environmental datasets to tackle issues such as carbon capture, carbon-negative fuels, new materials design for microplastics destruction, and ecosystem preservation. These specialized models require advanced computational capabilities to process and analyze vast amounts of data effectively.
Amazon Web Services (AWS) provides the essential compute infrastructure to support these endeavors, offering scalable and powerful resources through Amazon SageMaker HyperPod. SageMaker HyperPod is a purpose-built infrastructure service that automates the management of large-scale AI training clusters so developers can efficiently build and train complex models such as large language models (LLMs) by automatically handling cluster provisioning, monitoring, and fault tolerance across thousands of GPUs. With SageMaker HyperPod, startups can train complex AI models on diverse environmental datasets, including satellite imagery and atmospheric measurements, with enhanced speed and efficiency. This computational backbone is vital for startups striving to create solutions that are not only innovative but also scalable and impactful.
The increasing complexity of environmental data demands robust data infrastructure and sophisticated model architectures. Integrating multimodal data, employing specialized attention mechanisms for spatial-temporal data, and using reinforcement learning are crucial for building effective climate-focused models. SageMaker HyperPod optimized GPU clustering and scalable resources help startups save time and money while meeting advanced technical requirements, which means they can focus on innovation. As climate technology demands grow, these capabilities allow startups to develop transformative environmental solutions using Amazon SageMaker HyperPod.
Trends among climate tech startups building with generative AI
Climate tech startups’ adoption of generative AI is evolving rapidly. Starting in early 2023, we saw the first wave of climate tech startups adopting generative AI to optimize operations. For example, startups such as BrainBox AI and Pendulum used Amazon Bedrock and fine-tuned existing LLMs on AWS Trainium using Amazon SageMaker to more rapidly onboard new customers through automated document ingestion and data extraction. Midway through 2023, we saw the next wave of climate tech startups building sophisticated intelligent assistants by fine-tuning existing LLMs for specific use cases. For example, NET2GRID used Amazon SageMaker to fine-tune and deploy LLMs based on Llama 7B to build EnergyAI, an assistant that provides quick, personalized responses to utility customers’ energy-related questions.
Over the last 6 months, we’ve seen a flurry of climate tech startups building FMs that address specific climate and environmental challenges. Unlike language-based models, these startups are building models based on real-world data, like weather or geospatial earth data. Whereas LLMs such as Anthropic’s Claude or Amazon Nova have hundreds of billions of parameters, climate tech startups are building smaller models with just a few billion parameters. This means these models are faster and less expensive to train. We’re seeing some emerging trends in use cases or climate challenges that startups are addressing by building FMs. Here are the top use cases, in order of popularity:

Weather – Trained on historic weather data, these models offer short-term and long-term, hyperaccurate, hyperlocal weather and climate predictions, some focusing on specific weather elements like wind, heat, or sun.
Sustainable material discovery – Trained on scientific data, these models design new sustainable materials that solve specific problems, like more efficient direct air capture sorbents to reduce the cost of carbon removal or molecules that destroy microplastics in the environment.
Natural ecosystems – Trained on a mix of data from satellites, lidar, and on-the-ground sensors, these models offer insights into natural ecosystems, biodiversity, and wildfire predictions.
Geological modeling – Trained on geological data, these models help determine the best locations for geothermal or mining operations to reduce waste and save money.

To offer a more concrete look at these trends, the following is a deep dive into how climate tech startups are building FMs on AWS.
Orbital Materials: Foundation models for sustainable material discovery
Orbital Materials has built a proprietary AI platform to design, synthesize, and test new sustainable materials. Developing new advanced materials has traditionally been a slow process of trial and error in the lab. Orbital replaces this with generative AI design, radically speeding up materials discovery and new technology commercialization. They’ve released a generative AI model called “Orb” that suggests new material designs, which the team then tests and perfects in the lab.
Orb is a diffusion model that Orbital Materials trained from scratch using SageMaker HyperPod. The first product the startup designed with Orb is a sorbent for carbon capture in direct air capture facilities. Since establishing its lab in the first quarter of 2024, Orbital has achieved a tenfold improvement in its material’s performance using its AI platform—an order of magnitude faster than traditional development and breaking new ground in carbon removal efficacy. By improving the performance of the materials, the company can help drive down the costs of carbon removal, which can enable rapid scale-up. They chose to use SageMaker HyperPod because they “like the one-stop shop for control and monitoring,” explained Jonathan Godwin, CEO of Orbital Materials. Orbital reduced the total cost of ownership (TCO) of their GPU cluster by using Amazon SageMaker HyperPod deep health checks to stress test GPU instances and swap out faulty nodes. Moreover, Orbital can use SageMaker HyperPod to automatically swap out failing nodes and restart model training from the last saved checkpoint, freeing up time for the Orbital Materials team. The SageMaker HyperPod monitoring agent continually monitors and detects potential issues, including memory exhaustion, disk failures, GPU anomalies, kernel deadlocks, container runtime issues, and out-of-memory (OOM) crashes. Based on the underlying issue, the monitoring agent either replaces or reboots the node.
With the launch of SageMaker HyperPod on Amazon Elastic Kubernetes Service (Amazon EKS), Orbital can set up a unified control plane consisting of both CPU-based workloads and GPU-accelerated tasks within the same Kubernetes cluster. This architectural approach eliminates the traditional complexity of managing separate clusters for different compute resources, significantly reducing operational overhead. Orbital can also monitor the health status of SageMaker HyperPod nodes through Amazon CloudWatch Container Insights with enhanced observability for Amazon EKS. Amazon CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from containerized applications and microservices, providing detailed insights into performance, health, and status metrics for CPU, GPU, Trainium, or Elastic Fabric Adapter (EFA) and file system up to the container level.
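To make this setup concrete, the following is a minimal sketch of provisioning a HyperPod cluster with EKS orchestration, automatic node recovery, and deep health checks through boto3. The cluster name, ARNs, instance counts, and lifecycle script location are placeholders, and the field names follow the SageMaker CreateCluster API as we understand it, so verify them against the current documentation before relying on them.

# Minimal sketch: provision a SageMaker HyperPod cluster with boto3.
# All names, ARNs, and S3 paths below are placeholders.
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-west-2")

response = sagemaker.create_cluster(
    ClusterName="climate-fm-training",  # placeholder cluster name
    # Use an existing EKS cluster as the orchestrator (placeholder ARN).
    Orchestrator={"Eks": {"ClusterArn": "arn:aws:eks:us-west-2:111122223333:cluster/demo"}},
    NodeRecovery="Automatic",  # let HyperPod replace or reboot faulty nodes
    InstanceGroups=[
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 16,
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/hyperpod-lifecycle/",
                "OnCreate": "on_create.sh",
            },
            # Deep health checks run stress tests when nodes join; check names are
            # assumptions based on the documented options.
            "OnStartDeepHealthChecks": ["InstanceStress", "InstanceConnectivity"],
        }
    ],
)
print(response["ClusterArn"])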
AWS and Orbital Materials have established a deep partnership that enables flywheel growth. The companies have entered a multiyear partnership, in which Orbital Materials builds its FMs with SageMaker HyperPod and other AWS services. In return, Orbital Materials is using AI to develop new data center decarbonization and efficiency technologies. To further spin the flywheel, Orbital will be making its market-leading open source AI model for simulating advanced materials, Orb, generally available for AWS customers by using Amazon SageMaker JumpStart and AWS Marketplace. This marks the first AI-for-materials model to be available on AWS platforms. With Orb, AWS customers working on advanced materials and technologies such as semiconductors, batteries, and electronics can access market-leading accelerated research and development (R&D) within a secure and unified cloud environment.
The architectural advantages of SageMaker HyperPod on Amazon EKS are demonstrated in the following diagram. The diagram illustrates how Orbital can establish a unified control plane that manages both CPU-based workloads and GPU-accelerated tasks within a single Kubernetes cluster. This streamlined architecture eliminates the traditional complexity of managing separate clusters for different compute resources, providing a more efficient and integrated approach to resource management. The visualization shows how this consolidated infrastructure enables Orbital to seamlessly orchestrate their diverse computational needs through a single control interface.

Hum.AI: Foundation models for earth observation
Hum.AI is building generative AI FMs that provide general intelligence of the natural world. Customers can use the platform to track and predict ecosystems and biodiversity to understand business impact and better protect the environment. For example, they work with coastal communities who use the platform and insights to restore coastal ecosystems and improve biodiversity.
Hum.AI’s foundation model looks at natural world data and learns to represent it visually. They’re training on 50 years of historic data collected by satellites, which amounts to thousands of petabytes of data. To accommodate processing this massive dataset, they chose SageMaker HyperPod for its scalable infrastructure. Through their innovative model architecture, the company achieved the ability to see underwater from space for the very first time, overcoming the historical challenges posed by water reflections.
Hum.AI’s FM architecture employs a variational autoencoder (VAE) and generative adversarial network (GAN) hybrid design, specifically optimized for satellite imagery analysis. It’s an encoder-decoder model, where the encoder transforms satellite data into a learned latent space, while the decoder reconstructs the imagery (after being processed in the latent space), maintaining consistency across different satellite sources. The discriminator network provides both adversarial training signals and learned feature-wise reconstruction metrics. This approach helps preserve important ecosystem details that would otherwise be lost with traditional pixel-based comparisons, particularly for underwater environments, where water reflections typically interfere with visibility.
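To illustrate the idea of a VAE-GAN hybrid with a feature-wise reconstruction loss, here is a conceptual PyTorch sketch. It is not Hum.AI’s implementation: the channel counts, image size (64×64, 12-band tiles), and layer sizes are arbitrary choices made only so the example runs end to end.

# Conceptual VAE-GAN sketch: encoder -> latent space -> decoder, with a
# discriminator that supplies both an adversarial signal and feature-wise
# reconstruction targets instead of a plain pixel-wise loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, channels=12, latent_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    def __init__(self, channels=12, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 16 * 16)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 128, 16, 16))

class Discriminator(nn.Module):
    def __init__(self, channels=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))

    def forward(self, x):
        f = self.features(x)  # intermediate features reused as reconstruction targets
        return self.head(f), f

def generator_step(enc, dec, disc, x):
    mu, logvar = enc(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
    x_hat = dec(z)
    logits_fake, feat_fake = disc(x_hat)
    _, feat_real = disc(x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    feat_recon = F.mse_loss(feat_fake, feat_real.detach())  # feature-wise reconstruction
    adv = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
    return kl + feat_recon + adv

x = torch.rand(2, 12, 64, 64)  # toy batch of 12-band 64x64 satellite tiles
print(generator_step(Encoder(), Decoder(), Discriminator(), x))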
Using SageMaker HyperPod to train such a complex model enables Hum.AI to efficiently process their personally curated SeeFar dataset through distributed training across multiple GPU-based instances. The model simultaneously optimizes both VAE and GAN objectives across GPUs. This, paired with the SageMaker HyperPod auto-resume feature that automatically resumes a training run from the latest checkpoint, provides training continuity, even through node failures.
Hum.AI also used the SageMaker HyperPod out-of-the-box comprehensive observability features through Amazon Managed Service for Prometheus and Amazon Managed Service for Grafana for metric tracking. For their distributed training needs, they used dashboards to monitor cluster performance, GPU metrics, network traffic, and storage operations. This extensive monitoring infrastructure enabled Hum.AI to optimize their training process and maintain high resource utilization throughout their model development.

“Our decision to use SageMaker HyperPod was simple; it was the only service out there where you can continue training through failure. We were able to train larger models faster by taking advantage of the large-scale clusters and redundancy offered by SageMaker HyperPod. We were able to execute experiments faster and iterate models at speeds that were impossible prior to SageMaker HyperPod. SageMaker HyperPod took all of the worry out of large-scale training failures. They’ve built the infrastructure to hot swap GPUs if anything goes wrong, and it saves thousands in lost progress between checkpoints. The SageMaker HyperPod team personally helped us set up and execute large training rapidly and easily.”
– Kelly Zheng, CEO of Hum.AI.

Hum.AI’s innovative approach to model training is illustrated in the following figure. The diagram showcases how their model simultaneously optimizes both VAE and GAN objectives across multiple GPUs. This distributed training strategy is complemented by the SageMaker HyperPod auto-resume feature, which automatically restarts training runs from the latest checkpoint. Together, these capabilities provide continual and efficient training, even in the face of potential node failures. The image provides a visual representation of this robust training process, highlighting the seamless integration between Hum.AI’s model architecture and SageMaker HyperPod infrastructure support.

How to save time and money building with Amazon SageMaker HyperPod
Amazon SageMaker HyperPod removes the undifferentiated heavy lifting for climate tech startups building FMs, saving them time and money. For more information on how SageMaker HyperPod’s resiliency helps save costs while training, check out Reduce ML training costs with Amazon SageMaker HyperPod.
At its core is deep infrastructure control optimized for processing complex environmental data, featuring secure access to Amazon Elastic Compute Cloud (Amazon EC2) instances and seamless integration with orchestration tools such as Slurm and Amazon EKS. This infrastructure excels at handling multimodal environmental inputs, from satellite imagery to sensor network data, through distributed training across thousands of accelerators.
The intelligent resource management available in SageMaker HyperPod is particularly valuable for climate modeling, automatically governing task priorities and resource allocation while reducing operational overhead by up to 40%. This efficiency is crucial for climate tech startups processing vast environmental datasets because the system maintains progress through checkpointing while making sure that critical climate modeling workloads receive necessary resources.
For climate tech innovators, the SageMaker HyperPod library of over 30 curated model training recipes accelerates development, allowing teams to begin training environmental models in minutes rather than weeks. The platform’s integration with Amazon EKS provides robust fault tolerance and high availability, essential for maintaining continual environmental monitoring and analysis.
SageMaker HyperPod flexible training plans are particularly beneficial for climate tech projects, allowing organizations to specify completion dates and resource requirements while automatically optimizing capacity for complex environmental data processing. The system’s ability to suggest alternative plans provides optimal resource utilization for computationally intensive climate modeling tasks.
With support for next-generation AI accelerators such as the AWS Trainium chips and comprehensive monitoring tools, SageMaker HyperPod provides climate tech startups with a sustainable and efficient foundation for developing sophisticated environmental solutions. This infrastructure enables organizations to focus on their core mission of addressing climate challenges while maintaining operational efficiency and environmental responsibility.
Practices for sustainable computing
Climate tech companies are especially aware of the importance of sustainable computing practices. One key approach is the meticulous monitoring and optimization of energy consumption during computational processes. By adopting efficient training strategies, such as reducing the number of unnecessary training iterations and employing energy-efficient algorithms, startups can significantly lower their carbon footprint.
Additionally, the integration of renewable energy sources to power data centers plays a crucial role in minimizing environmental impact. AWS is determined to make the cloud the cleanest and the most energy-efficient way to run all our customers’ infrastructure and business. We have made significant progress over the years. For example, Amazon has been the world’s largest corporate purchaser of renewable energy every year since 2020. We’ve achieved our renewable energy goal to match all the electricity consumed across our operations—including our data centers—with 100% renewable energy, and we did this 7 years ahead of our original 2030 timeline.
Companies are also turning to carbon-aware computing principles, which involve scheduling computational tasks to coincide with periods of low carbon intensity on the grid. This practice means that the energy used for computing has a lower environmental impact. Implementing these strategies not only aligns with broader sustainability goals but also promotes cost efficiency and resource conservation. As the demand for advanced computational capabilities grows, climate tech startups are becoming vigilant in their commitment to sustainable practices so that their innovations contribute positively to both technological progress and environmental stewardship.
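A carbon-aware scheduler can be as simple as deferring non-urgent training jobs until grid carbon intensity drops below a threshold. The sketch below is illustrative only: get_grid_carbon_intensity and submit_training_job are hypothetical stand-ins for a carbon-intensity data provider and your own job launcher.

import time

CARBON_THRESHOLD = 200.0  # gCO2eq/kWh, an illustrative cutoff

def get_grid_carbon_intensity(region: str) -> float:
    """Hypothetical stand-in for a grid carbon-intensity API or dataset."""
    return 180.0

def submit_training_job(job_name: str) -> None:
    """Hypothetical stand-in for launching a training job on your cluster."""
    print(f"submitted {job_name}")

def run_when_grid_is_clean(job_name: str, region: str, poll_seconds: int = 900) -> None:
    # Defer the job until carbon intensity falls below the threshold.
    while get_grid_carbon_intensity(region) > CARBON_THRESHOLD:
        time.sleep(poll_seconds)
    submit_training_job(job_name)

run_when_grid_is_clean("climate-fm-pretrain", region="us-west-2")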
Conclusion
Amazon SageMaker HyperPod is emerging as a crucial tool for climate tech startups in their quest to develop innovative solutions to pressing environmental challenges. By providing scalable, efficient, and cost-effective infrastructure for training complex multimodal and multi-model architectures, SageMaker HyperPod enables these companies to process vast amounts of environmental data and create sophisticated predictive models. From Orbital Materials’ sustainable material discovery to Hum.AI’s advanced earth observation capabilities, SageMaker HyperPod is powering breakthroughs that were previously out of reach. As climate change continues to pose urgent global challenges, the automated management of large-scale AI training clusters in SageMaker HyperPod, coupled with its fault-tolerance and cost-optimization features, allows climate tech innovators to focus on their core mission rather than infrastructure management. By using SageMaker HyperPod, climate tech startups are not only building more efficient models—they’re accelerating the development of powerful new tools in our collective effort to address the global climate crisis.

About the authors
Ilan Gleiser is a Principal GenAI Specialist at Amazon Web Services (AWS) on the WWSO Frameworks team, focusing on developing scalable artificial general intelligence architectures and optimizing foundation model training and inference. With a rich background in AI and machine learning, Ilan has published over 30 blog posts and delivered more than 100 prototypes globally over the last 5 years. Ilan holds a master’s degree in mathematical economics.
Lisbeth Kaufman is the Head of Climate Tech BD, Startups and Venture Capital at Amazon Web Services (AWS). Her mission is to help the best climate tech startups succeed and reverse the global climate crisis. Her team has technical resources, go-to-market support, and connections to help climate tech startups overcome obstacles and scale. Lisbeth worked on climate policy as an energy/environment/agriculture policy advisor in the U.S. Senate. She has a BA from Yale and an MBA from NYU Stern, where she was a Dean’s Scholar. Lisbeth helps climate tech founders with product, growth, fundraising, and making strategic connections to teams at AWS and Amazon.
Aman Shanbhag is an Associate Specialist Solutions Architect on the ML Frameworks team at Amazon Web Services (AWS), where he helps customers and partners with deploying ML training and inference solutions at scale. Before joining AWS, Aman graduated from Rice University with degrees in computer science, mathematics, and entrepreneurship.
Rohit Talluri is a Generative AI GTM Specialist at Amazon Web Services (AWS). He is partnering with top generative AI model builders, strategic customers, key AI/ML partners, and AWS Service Teams to enable the next generation of artificial intelligence, machine learning, and accelerated computing on AWS. He was previously an Enterprise Solutions Architect and the Global Solutions Lead for AWS Mergers & Acquisitions Advisory.
Ankit Anand is a Senior Foundation Models Go-To-Market (GTM) Specialist at AWS. He partners with top generative AI model builders, strategic customers, and AWS Service Teams to enable the next generation of AI/ML workloads on AWS. Ankit’s experience includes product management expertise within the financial services industry for high-frequency/low-latency trading and business development for Amazon Alexa.

Supercharge your development with Claude Code and Amazon Bedrock prompt caching

Prompt caching in Amazon Bedrock is now generally available, delivering performance and cost benefits for agentic AI applications. Coding assistants that process large codebases represent an ideal use case for prompt caching.
In this post, we’ll explore how to combine Amazon Bedrock prompt caching with Claude Code—a coding agent released by Anthropic that is now generally available. This powerful combination transforms your development workflow by reducing inference response latency and lowering input token costs. You’ll discover how this makes AI-assisted coding not just more efficient, but also more economically viable for everyday development tasks.
What is Claude Code?

Claude Code is Anthropic’s AI coding assistant powered by Claude Sonnet 4. It operates directly in your terminal, in your favorite IDEs such as VS Code and JetBrains, and in the background with the Claude Code SDK, understanding your project context and taking actions without requiring you to manually manipulate and add generated code to a project. Unlike traditional coding assistants, Claude Code can:

Write code and fix bugs spanning multiple files across your codebase
Answer questions about your code’s architecture and logic
Execute and fix tests, linting, and other commands
Search through git history, resolve merge conflicts, and create commits and PRs
Operate all of your other command line tools, like AWS CLI, Terraform, and k8s

The most compelling aspect of Claude Code is how it integrates into your existing workflow. You simply point it to your project directory and interact with it using natural language commands. Claude Code also supports Model Context Protocol (MCP), allowing you to connect external tools and data sources directly to your terminal and customize its AI capabilities with your context.
To learn more, see Claude Code tutorials and Claude Code: Best practices for agentic coding.
Amazon Bedrock prompt caching for AI-assisted development
The prompt caching feature of Amazon Bedrock dramatically reduces both response times and costs when working with large context. Here’s how it works: When prompt caching is enabled, your agentic AI application (such as Claude Code) inserts cache checkpoint markers at specific points in your prompts. Amazon Bedrock then interprets these application-defined markers and creates cache checkpoints that save the entire model state after processing the preceding text. On subsequent requests, if your prompt reuses that same prefix, the model loads the cached state instead of recomputing.
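To make the mechanism concrete, here is a minimal sketch of an explicit cache checkpoint with the Amazon Bedrock Converse API via boto3. Claude Code manages its checkpoints automatically; this only illustrates what happens under the hood. The model ID matches the one used later in this post, and the file path is a placeholder.

# Minimal sketch: cache a large shared prefix with the Converse API.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
large_context = open("my_module.py").read()  # placeholder: source you will ask about repeatedly

response = bedrock.converse(
    modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
    system=[
        {"text": "You are a coding assistant. Here is the codebase:\n" + large_context},
        {"cachePoint": {"type": "default"}},  # everything before this marker can be cached
    ],
    messages=[
        {"role": "user", "content": [{"text": "Explain what this module does."}]}
    ],
)
print(response["output"]["message"]["content"][0]["text"])
# On subsequent calls that reuse the same prefix, response["usage"] reports
# cache read/write token counts, showing how much recomputation was avoided.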
In the context of Claude Code specifically, this means the application intelligently manages these cache points when processing your codebase, allowing Claude to “remember” previously analyzed code without incurring the full computational and financial cost of reprocessing it. When you ask multiple questions about the same code or iteratively refine solutions, Claude Code leverages these cache checkpoints to deliver faster responses while dramatically reducing token consumption and associated costs.
To learn more, see documentation for Amazon Bedrock prompt caching.
Solution overview: Try Claude Code with Amazon Bedrock prompt caching
Prerequisites

An AWS account with access to Amazon Bedrock.
Appropriate AWS Identity and Access Management (IAM) roles and permissions for Amazon Bedrock.
Amazon Bedrock model access to Anthropic Claude Sonnet 4 in an AWS Region where prompt caching is currently supported, such as us-east-1 or us-west-2.
AWS Command Line Interface (AWS CLI) configured with your AWS credentials.

Prompt caching is automatically turned on for supported models and AWS Regions.
Setting up Claude Code with Claude Sonnet 4 on Amazon Bedrock
After configuring AWS CLI with your credentials, follow these steps:

In your terminal, execute the following commands:
# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Configure for Amazon Bedrock
export CLAUDE_CODE_USE_BEDROCK=1
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'
export ANTHROPIC_SMALL_FAST_MODEL='us.anthropic.claude-3-5-haiku-20241022-v1:0'

# Launch Claude Code
claude
Verify that Claude Code is running by checking for the Welcome to Claude Code! message in your terminal.

To learn more about how to configure Claude Code for Amazon Bedrock, see Connect to Amazon Bedrock.
Getting started with prompt caching
To get started, let’s experiment with a simple prompt.

In Claude Code, execute the prompt: build a basic text-based calculator
Review and respond to Claude Code’s requests:

When prompted with a question like the following, select 1. Yes to continue:

Do you want to create calculator.py?
1. Yes
2. Yes, and don’t ask again for this session (shift+tab)
3. No, and tell Claude what to do differently (esc)
Carefully review each request before approving to maintain security.

After Claude Code generates the calculator application, it will display execution instructions such as: Run the calculator with: python3 calculator.py
Test the application by executing the instructed command above. Then, follow the on-screen prompts to perform calculations.

Claude Code automatically enables prompt caching to optimize performance and costs. To monitor token usage and costs, use the /cost command. You will receive a detailed breakdown similar to this:
/cost
⎿ Total cost: $0.0827
⎿ Total duration (API): 26.3s
⎿ Total duration (wall): 42.3s
⎿ Total code changes: 62 lines added, 0 lines removed
This output provides valuable insights into your session’s resource consumption, including total cost, API processing time, wall clock time, and code modifications.
Comparing results without prompt caching
To understand the benefits of prompt caching, let’s try the same prompt without prompt caching for comparison:

In the terminal, exit Claude Code by pressing Ctrl+C.
To create a new project directory, run the command: mkdir test-disable-prompt-caching; cd test-disable-prompt-caching
Disable prompt caching by setting an environment variable: export DISABLE_PROMPT_CACHING=1
Execute claude to run Claude Code.
Verify prompt caching is disabled by checking the terminal output. You should see Prompt caching: off under the Overrides (via env) section.
Execute the prompt: build a basic text-based calculator
After completion, execute /cost to view resource usage.

You will see a higher resource consumption compared to when prompt caching is enabled, even with a simple prompt:
/cost
⎿ Total cost: $0.1029
⎿ Total duration (API): 32s
⎿ Total duration (wall): 1m 17.5s
⎿ Total code changes: 57 lines added, 0 lines removed
Without prompt caching, each interaction incurs the full cost of processing your context.
Cleanup
To re-enable prompt caching, exit Claude Code and run unset DISABLE_PROMPT_CACHING before restarting Claude. Claude Code does not incur cost when you are not using it.
Prompt caching for complex codebases and efficient iteration
When working with complex codebases, prompt caching delivers significantly greater benefits than with simple prompts. For an illustrative example, consider the initial prompt: Develop a game similar to Pac-Man. This initial prompt generates the foundational project structure and files. As you refine the application with prompts such as Implement unique chase patterns for different ghosts, the coding agent must comprehend your entire codebase to be able to make targeted changes.
Without prompt caching, you force the model to reprocess thousands of tokens representing your code structure, class relationships, and existing implementations with each iteration.
Prompt caching alleviates this redundancy by preserving your complex context, transforming your software development workflow with:

Dramatically reduced token costs for repeated interactions with the same files
Faster response times as Claude Code doesn’t need to reprocess your entire codebase
Efficient development cycles as you iterate without incurring full costs each time

Prompt caching with Model Context Protocol (MCP)
Model Context Protocol (MCP) transforms your coding experience by connecting coding agents to your specific tools and information sources. You can connect Claude Code to MCP servers that integrate with your file systems, databases, development tools, and other productivity tools. This transforms a generic coding assistant into a personalized assistant that can interact with your data and tools beyond your codebase and follow your organization’s best practices, accelerating your unique development processes and workflows.
When you build on AWS, you gain additional advantages by leveraging AWS open source MCP servers for code assistants that provide intelligent AWS documentation search, best-practice recommendations, and real-time cost visibility, analysis, and insights, without leaving your software development workflow.
Amazon Bedrock prompt caching becomes essential when working with MCP, as it preserves complex context across multiple interactions. With MCP continuously enriching your prompts with external knowledge and tools, prompt caching alleviates the need to repeatedly process this expanded context, slashing costs by up to 90% and reducing latency by up to 85%. This optimization proves particularly valuable as your MCP servers deliver increasingly sophisticated context about your unique development environment, so you can rapidly iterate through complex coding challenges while maintaining relevant context for up to 5 minutes without performance penalties or additional costs.
Considerations when deploying Claude Code to your organization
With Claude Code now generally available, many customers are considering deployment options on AWS to take advantage of its coding capabilities. For deployments, consider your foundational architecture for security and governance:
Consider leveraging AWS IAM Identity Center, formerly AWS Single Sign-On (SSO), to centrally govern identity and access to Claude Code. This helps ensure that only authorized developers have access. Additionally, it allows developers to access resources with temporary, role-based credentials, alleviating the need for static access keys and enhancing security. Prior to opening Claude Code, make sure that you configure the AWS CLI to use an IAM Identity Center profile by using aws configure sso --profile <PROFILE_NAME>. Then, log in using the created profile with aws sso login --profile <PROFILE_NAME>.
Consider implementing a generative AI gateway on AWS to track and attribute costs effectively across different teams or projects using inference profiles. For Claude Code to use a custom endpoint, configure the ANTHROPIC_BEDROCK_BASE_URL environment variable with the gateway endpoint. Note that the gateway should be a pass-through proxy; see the example implementation with LiteLLM. To learn more about AI gateway solutions, contact your AWS account team.
Consider automated configuration of default environment variables. This includes the environment variables outlined in this post, such as CLAUDE_CODE_USE_BEDROCK, ANTHROPIC_MODEL, and ANTHROPIC_SMALL_FAST_MODEL. This configures Claude Code to automatically connect to Amazon Bedrock, providing a consistent baseline for development across teams. To begin, organizations can provide developers with self-service instructions.
Consider permissions, memory, and MCP servers for your organization. Security teams can configure managed permissions for what Claude Code is and is not allowed to do, which cannot be overwritten by local configuration. In addition, you can configure memory across all projects, which allows you to auto-add common bash commands, files, workflows, and style conventions to align with your organization’s preferences. This can be done by deploying your CLAUDE.md file into an enterprise directory /<enterprise root>/CLAUDE.md or the user’s home directory ~/.claude/CLAUDE.md. Finally, we recommend that one central team configures MCP servers and checks a .mcp.json configuration into the codebase so that all users benefit.
To learn more, see Claude Code team setup documentation or contact your AWS account team.
Conclusion
In this post, you learned how Amazon Bedrock prompt caching can significantly enhance AI applications, with Claude Code’s agentic AI assistant serving as a powerful demonstration. By leveraging prompt caching, you can process large codebases more efficiently, helping to dramatically reduce costs and response times. With this technology you can have faster, more natural interactions with your code, allowing you to iterate rapidly with generative AI. You also learned about Model Context Protocol (MCP), and how the seamless integration of external tools lets you customize your AI assistant with specific context like documentation and web resources. Whether you’re tackling complex debugging, refactoring legacy systems, or developing new features, the combination of Amazon Bedrock’s prompt caching and AI coding agents like Claude Code offers a more responsive, cost-effective, and intelligent approach to software development.
Amazon Bedrock prompt caching is generally available with Claude Sonnet 4 and Claude 3.5 Haiku. To learn more, see prompt caching and Amazon Bedrock.
Anthropic Claude Code is now generally available. To learn more, see Claude Code overview and contact your AWS account team for guidance on deployment.

About the Authors
Jonathan Evans is a Worldwide Solutions Architect for Generative AI at AWS, where he helps customers leverage cutting-edge AI technologies with Anthropic’s Claude models on Amazon Bedrock, to solve complex business challenges. With a background in AI/ML engineering and hands-on experience supporting machine learning workflows in the cloud, Jonathan is passionate about making advanced AI accessible and impactful for organizations of all sizes.
Daniel Wirjo is a Solutions Architect at AWS, focused on SaaS and AI startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.
Omar Elkharbotly is a Senior Cloud Support Engineer at AWS, specializing in Data, Machine Learning, and Generative AI solutions. With extensive experience in helping customers architect and optimize their cloud-based AI/ML/GenAI workloads, Omar works closely with AWS customers to solve complex technical challenges and implement best practices across the AWS AI/ML/GenAI service portfolio. He is passionate about helping organizations leverage the full potential of cloud computing to drive innovation in generative AI and machine learning.
Gideon Teo is a FSI Solution Architect at AWS in Melbourne, where he brings specialised expertise in Amazon SageMaker and Amazon Bedrock. With a deep passion for both traditional AI/ML methodologies and the emerging field of Generative AI, he helps financial institutions leverage cutting-edge technologies to solve complex business challenges. Outside of work, he cherishes quality time with friends and family, and continuously expands his knowledge across diverse technology domains.

OpenAI Introduces Four Key Updates to Its AI Agent Framework

OpenAI has announced a set of targeted updates to its AI agent development stack, aimed at expanding platform compatibility, improving support for voice interfaces, and enhancing observability. These updates reflect a consistent progression toward building practical, controllable, and auditable AI agents that can be integrated into real-world applications across client and server environments.

1. TypeScript Support for the Agents SDK

OpenAI’s Agents SDK is now available in TypeScript, extending the existing Python implementation to developers working in JavaScript and Node.js environments. The TypeScript SDK provides parity with the Python version, including foundational components such as:

Handoffs: Mechanisms to route execution to other agents or processes.

Guardrails: Runtime checks that constrain tool behavior to defined boundaries.

Tracing: Hooks for collecting structured telemetry during agent execution.

MCP (Model Context Protocol): Protocols for passing contextual state between agent steps and tool calls.

This addition brings the SDK into alignment with modern web and cloud-native application stacks. Developers can now build and deploy agents across both frontend (browser) and backend (Node.js) contexts using a unified set of abstractions. The open documentation is available at openai-agents-js.
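For reference, the existing Python SDK (which the TypeScript release mirrors) expresses these concepts as in the minimal sketch below. The agent names and instructions are illustrative, and this is not code from the announcement; it only shows what a handoff looks like in the Agents SDK.

# Minimal sketch with the openai-agents Python SDK (requires an OpenAI API key).
from agents import Agent, Runner

billing_agent = Agent(
    name="Billing agent",
    instructions="Answer questions about invoices and payments.",
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Route billing questions to the billing agent.",
    handoffs=[billing_agent],  # handoff: delegate execution to another agent
)

result = Runner.run_sync(triage_agent, "Why was I charged twice this month?")
print(result.final_output)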

2. RealtimeAgent with Human-in-the-Loop Capabilities

OpenAI introduced a new RealtimeAgent abstraction to support latency-sensitive voice applications. RealtimeAgents extend the Agents SDK with audio input/output, stateful interactions, and interruption handling.

One of the more substantial features is human-in-the-loop (HITL) approval, allowing developers to intercept an agent’s execution at runtime, serialize its state, and require manual confirmation before continuing. This is especially relevant for applications requiring oversight, compliance checkpoints, or domain-specific validation during tool execution.

Developers can pause execution, inspect the serialized state, and resume the agent with full context retention. The workflow is described in detail in OpenAI’s HITL documentation.

3. Traceability for Realtime API Sessions

Complementing the RealtimeAgent feature, OpenAI has expanded the Traces dashboard to include support for voice agent sessions. Tracing now covers full Realtime API sessions—whether initiated via the SDK or directly through API calls.

The Traces interface allows visualization of:

Audio inputs and outputs (streamed or buffered)

Tool invocations and parameters

User interruptions and agent resumptions

This provides a consistent audit trail for both text-based and audio-first agents, simplifying debugging, quality assurance, and performance tuning across modalities. The trace format is standardized and integrates with OpenAI’s broader monitoring stack, offering visibility without requiring additional instrumentation.

Further implementation details are available in the voice agent guide at openai-agents-js/guides/voice-agents.

4. Refinements to the Speech-to-Speech Pipeline

OpenAI has also made updates to its underlying speech-to-speech model, which powers real-time audio interactions. Enhancements focus on reducing latency, improving naturalness, and handling interruptions more effectively.

While the model’s core capabilities—speech recognition, synthesis, and real-time feedback—remain in place, the refinements offer better alignment for dialog systems where responsiveness and tone variation are essential. This includes:

Lower latency streaming: More immediate turn-taking in spoken conversations.

Expressive audio generation: Improved intonation and pause modeling.

Robustness to interruptions: Agents can respond gracefully to overlapping input.

These changes align with OpenAI’s broader efforts to support embodied and conversational agents that function in dynamic, multimodal contexts.

Conclusion

Together, these four updates strengthen the foundation for building voice-enabled, traceable, and developer-friendly AI agents. By providing deeper integrations with TypeScript environments, introducing structured control points in real-time flows, and enhancing observability and speech interaction quality, OpenAI continues to move toward a more modular and interoperable agent ecosystem.

Four updates to building agents with OpenAI: Agents SDK in TypeScript, a new RealtimeAgent feature for voice agents, Traces support for the Realtime API, and improvements to our speech-to-speech model.— OpenAI Developers (@OpenAIDevs) June 3, 2025


Hugging Face Releases SmolVLA: A Compact Vision-Language-Action Model for Affordable and Efficient Robotics

Despite recent progress in robotic control via large-scale vision-language-action (VLA) models, real-world deployment remains constrained by hardware and data requirements. Most VLA models depend on transformer-based backbones with billions of parameters, resulting in significant memory and compute costs. This limits experimentation to well-resourced labs and clouds, excluding practitioners working with lower-cost hardware. Additionally, much of the current progress in VLA research remains either proprietary or based on non-reproducible methodologies, impeding open research. Finally, data heterogeneity across robotic platforms—differences in morphology, sensors, and control modes—poses a further challenge to generalizability and cross-platform learning.

Hugging Face Introduces SmolVLA: A Lightweight, Open VLA Framework

Hugging Face presents SmolVLA, a compact vision-language-action model developed for affordability and deployment efficiency. Unlike conventional VLAs, SmolVLA is trained entirely on community-collected datasets and is optimized to run on single-GPU or CPU environments. The model architecture integrates a trimmed version of a pretrained vision-language model (SmolVLM-2) and a transformer-based action expert. This structure enables efficient low-level control from natural language instructions and RGB camera inputs.

A distinguishing feature of SmolVLA is its asynchronous inference stack, which decouples action prediction from execution. This design enables low-latency control suitable for real-time applications, even in resource-constrained settings. SmolVLA is released under an open license with accompanying code, training data, and deployment tools.

Architectural Overview and Design Trade-Offs

The SmolVLA model is structured into two primary components:

Perception Module (SmolVLM-2): A pretrained compact vision-language encoder processes sequences of RGB images, sensorimotor states, and language instructions. For efficiency, the model limits visual tokens through downsampling and only uses the lower half of transformer layers, based on empirical findings that earlier layers often yield more transferable features.

Action Expert: A lightweight transformer, trained with flow matching, predicts sequences of continuous control actions. The action expert alternates between self-attention and cross-attention layers, balancing internal action coherence and conditioning on perception inputs. Causal masking is applied to enforce temporal consistency.

To reduce computational overhead, linear projections are used to align the modalities’ token dimensions. Action chunks are generated instead of single-step predictions, reducing the frequency of inference calls. The model is trained using bfloat16 precision and Torch’s JIT compilation for runtime optimization.
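As a rough illustration of what an action expert with interleaved self- and cross-attention and causal masking looks like, here is a small PyTorch sketch. It is conceptual only and not the released SmolVLA code: the dimensions, chunk length, and 7-DoF action head are arbitrary.

# Conceptual sketch: decoder blocks that mix causal self-attention over the
# action chunk with cross-attention to projected vision-language features.
import torch
import torch.nn as nn

class ActionExpertBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, actions, context, causal_mask):
        a = self.n1(actions)
        actions = actions + self.self_attn(a, a, a, attn_mask=causal_mask)[0]
        a = self.n2(actions)
        actions = actions + self.cross_attn(a, context, context)[0]
        return actions + self.ff(self.n3(actions))

chunk_len, dim = 8, 512
blocks = nn.ModuleList([ActionExpertBlock(dim) for _ in range(4)])
to_action = nn.Linear(dim, 7)                 # e.g., 7-DoF continuous control

actions = torch.randn(1, chunk_len, dim)      # noisy action tokens (flow-matching input)
context = torch.randn(1, 64, dim)             # projected vision-language features
mask = torch.triu(torch.ones(chunk_len, chunk_len, dtype=torch.bool), diagonal=1)  # causal

for blk in blocks:
    actions = blk(actions, context, mask)
print(to_action(actions).shape)               # torch.Size([1, 8, 7]): one action chunk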

Empirical Evaluation: Simulation and Real-World Performance

SmolVLA is evaluated across both simulation benchmarks (LIBERO and Meta-World) and real-world robotic tasks using low-cost SO100 and SO101 platforms. The model is trained from scratch on ~23K episodes across 481 community datasets, with task labels auto-generated using a VLM. Evaluation metrics include task-level success rates under both in-distribution and out-of-distribution conditions.

In the LIBERO benchmark, SmolVLA (0.45B) achieves an average success rate of 87.3%, closely matching or surpassing larger models such as π₀ (3.3B). In Meta-World, the model outperforms diffusion policies and smaller-scale VLAs across task difficulty levels. These results are notable considering SmolVLA’s smaller training footprint and absence of robotics-specific pretraining.

In real-world settings, SmolVLA achieves average success rates of 78.3% across pick-place, stacking, and sorting tasks—outperforming both ACT (trained from scratch) and π₀ (finetuned). Moreover, SmolVLA generalizes across robotic embodiments, maintaining performance on SO101 despite training exclusively on SO100 data.

Performance Implications of Asynchronous Inference

SmolVLA’s asynchronous inference stack improves control efficiency by overlapping prediction and execution. Compared to traditional synchronous inference, this approach reduces average task time by ~30% and doubles the number of completed actions in fixed-time scenarios. This is particularly beneficial for edge deployments where inference delays degrade real-time performance.
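The idea behind asynchronous inference can be shown with a tiny producer-consumer sketch: one thread predicts the next action chunk while the main loop is still executing the current one. The function names and timings below are illustrative stand-ins, not SmolVLA’s stack.

import queue
import threading
import time

def predict_chunk(observation):
    time.sleep(0.3)  # stand-in for model inference latency
    return [f"action_{i}" for i in range(4)]

chunks = queue.Queue(maxsize=1)  # allow prediction to run one chunk ahead

def prediction_loop(get_observation):
    while True:
        chunks.put(predict_chunk(get_observation()))  # runs ahead of execution

def execution_loop():
    step = 0
    while step < 12:
        for action in chunks.get():  # consume a chunk as soon as it is ready
            print("executing", action)
            time.sleep(0.05)         # stand-in for actuation time
            step += 1

threading.Thread(target=prediction_loop, args=(lambda: time.time(),), daemon=True).start()
execution_loop()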

Conclusion

SmolVLA demonstrates that compact, reproducible, and open-source VLA models can support competent robotic control on low-cost hardware. Through careful architectural choices—layer pruning, chunked action prediction, and asynchronous execution—SmolVLA maintains performance while significantly reducing computational demands.

The model’s open training and deployment stack, paired with real-world evaluations, offers a practical foundation for further research in efficient and accessible robot learning. Future directions include expanding cross-embodiment datasets, scaling model capacity without sacrificing latency, and exploring joint training on multimodal corpora beyond robotics data.

Check out the Paper and Model on Hugging Face. All credit for this research goes to the researchers of this project.

From Exploration Collapse to Predictable Limits: Shanghai AI Lab Proposes Entropy-Based Scaling Laws for Reinforcement Learning in LLMs

Recent advances in reasoning-centric large language models (LLMs) have expanded the scope of reinforcement learning (RL) beyond narrow, task-specific applications, enabling broader generalization and reasoning capabilities. However, this shift introduces significant challenges, particularly in scaling the training compute required for learning from experience. Unlike imitation learning through pre-training and fine-tuning, RL demands a more computationally intensive approach. A central issue is the decline in policy entropy, which affects the balance between exploiting known strategies and exploring new ones. This exploitation-exploration trade-off is fundamental in RL, and controlling policy entropy has become critical to maintaining effective exploration during training.

Existing efforts address the exploration-exploitation trade-off in RL by utilizing policy entropy. Maximum entropy RL introduces a regularization term to the reward function, promoting uncertainty in action selection and encouraging broader exploration. While this technique has been widely adopted in conventional RL algorithms, its application to LLMs remains debated. Moreover, predictability in RL for LLMs is not explored. While neural scaling laws guide LLM development, similar predictive frameworks for RL training remain limited. Existing RL methods for LLMs with verifiable rewards show promise in reasoning improvements, but lack a deep understanding of their core mechanisms.

Researchers from Shanghai AI Laboratory, Tsinghua University, UIUC, Peking University, Nanjing University, and CUHK provide an approach to address the collapse of policy entropy in RL for reasoning-centric LLMs. They established a transformation equation, R = −a·exp(H) + b, where H is entropy, R is downstream performance, and a and b are fitting coefficients. This empirical law strongly suggests that policy performance is traded off against policy entropy and is thus bottlenecked by its exhaustion. Researchers investigate entropy dynamics, and their derivation highlights that the change in policy entropy is driven by the covariance between action probability and the change in logits. They also proposed two techniques, namely Clip-Cov and KL-Cov, which clip and apply a KL penalty to tokens with high covariances, respectively.
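To see how such a law would be used in practice, the sketch below fits the coefficients a and b to logged (entropy, performance) pairs. The data points are synthetic and chosen only to be consistent with the functional form; only the equation itself comes from the paper.

import numpy as np
from scipy.optimize import curve_fit

def law(H, a, b):
    # Empirical form from the paper: R = -a * exp(H) + b
    return -a * np.exp(H) + b

H = np.array([1.6, 1.2, 0.9, 0.6, 0.4, 0.2])        # policy entropy over training (synthetic)
R = np.array([0.18, 0.31, 0.38, 0.44, 0.47, 0.49])  # downstream accuracy (synthetic)

(a, b), _ = curve_fit(law, H, R)
# With a and b fitted early in training, b - a predicts the performance
# ceiling reached when entropy is fully exhausted (H -> 0).
print(f"a={a:.3f}, b={b:.3f}, predicted ceiling at H=0: {law(0.0, a, b):.3f}")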

To investigate and validate the entropy collapse phenomenon in RL-tuned LLMs, researchers applied RL to LLMs on verifiable tasks, like math and coding, using an autoregressive generation setup where models produce token sequences based on input prompts. The study involves 11 widely adopted open-source models spanning four families: Qwen2.5, Mistral, LLaMA, and DeepSeek, with parameters ranging from 0.5B to 32B. Evaluations are performed on eight public benchmarks, including MATH500, AIME 2024, AMC, and Eurus-2-RL-Code. Moreover, RL training follows the veRL framework in a zero-shot setting, utilizing algorithms like GRPO, REINFORCE++, and PRIME to optimize policy performance while observing entropy dynamics.

The proposed Clip-Cov and KL-Cov techniques were evaluated on the Qwen2.5 models using the DAPOMATH dataset for math tasks. These methods achieve non-trivial performance gains across all benchmarks. In comparison to the GRPO baseline, these methods improve performance by 2.0% on average for the 7B model and by 6.4% for the 32B model. For example, when the baseline’s entropy reaches a plateau, the KL-Cov method still sustains an entropy level over 10 times higher. The methods can maintain a higher level of entropy throughout the training. Moreover, the methods yield more substantial gains on the larger Qwen2.5-32B model, with improvements of 15.0% and 14.6% compared to GRPO on the most challenging benchmarks, AIME24 and AIME25, respectively. 

In conclusion, researchers have overcome the challenge of policy entropy collapse in RL for reasoning-centric LLMs. The findings highlight a trade-off between performance improvement and diminished exploration, which ultimately limits further gains. Through theoretical analysis and empirical validation, researchers identify entropy dynamics as a key bottleneck and propose two effective regularization strategies—Clip-Cov and KL-Cov to manage high-covariance tokens and sustain exploration. As RL emerges as a crucial axis for scaling beyond pre-training, addressing entropy collapse becomes essential. This work provides foundational insights into the role of entropy, guiding future efforts to scale RL toward more intelligent and capable language models.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


Unlocking the power of Model Context Protocol (MCP) on AWS

We’ve witnessed remarkable advances in model capabilities as generative AI companies have invested in developing their offerings. Language models such as Anthropic’s Claude Opus 4 & Sonnet 4 and Amazon Nova on Amazon Bedrock can reason, write, and generate responses with increasing sophistication. But even as these models grow more powerful, they can only work with the information available to them.
No matter how impressive a model might be, it’s confined to the data it was trained on or what’s manually provided in its context window. It’s like having the world’s best analyst locked in a room with incomplete files—brilliant, but isolated from your organization’s most current and relevant information.
This isolation creates three critical challenges for enterprises using generative AI:

Information silos trap valuable data behind custom APIs and proprietary interfaces
Integration complexity requires building and maintaining bespoke connectors and glue code for every data source or tool provided to the language model
Scalability bottlenecks appear as organizations attempt to connect more models to more systems and tools

Sound familiar? If you’re an AI-focused developer, technical decision-maker, or solution architect working with Amazon Web Services (AWS) and language models, you’ve likely encountered these obstacles firsthand. Let’s explore how the Model Context Protocol (MCP) offers a path forward.
What is the MCP?
The MCP is an open standard that creates a universal language for AI systems to communicate with external data sources, tools, and services. Conceptually, MCP functions as a universal translator, enabling seamless dialogue between language models and the diverse systems where your valuable information resides.
Developed by Anthropic and released as an open source project, MCP addresses a fundamental challenge: how to provide AI models with consistent, secure access to the information they need, when they need it, regardless of where that information lives.

At its core, MCP implements a client-server architecture:

MCP clients are AI applications like Anthropic’s Claude Desktop or custom solutions built on Amazon Bedrock that need access to external data
MCP servers provide standardized access to specific data sources, whether that’s a GitHub repository, Slack workspace, or AWS service
Communication flow between clients and servers follows a well-defined protocol that can run locally or remotely

This architecture supports three essential primitives that form the foundation of MCP:

Tools – Functions that models can call to retrieve information or perform actions
Resources – Data that can be included in the model’s context such as database records, images, or file contents
Prompts – Templates that guide how models interact with specific tools or resources

What makes MCP especially powerful is its ability to work across both local and remote implementations. You can run MCP servers directly on your development machine for testing or deploy them as distributed services across your AWS infrastructure for enterprise-scale applications.
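To ground these primitives, here is a minimal sketch of an MCP server that exposes one tool and one resource using the FastMCP helper from the official MCP Python SDK. The DynamoDB table name, S3 bucket, and resource URI scheme are placeholders; a client such as Claude Desktop or a Bedrock-based application would launch this server over stdio or connect to it remotely.

# Minimal MCP server sketch exposing an AWS-backed tool and resource.
import boto3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("aws-data")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Look up an order record in a DynamoDB table (placeholder table name)."""
    table = boto3.resource("dynamodb").Table("orders")
    item = table.get_item(Key={"order_id": order_id}).get("Item")
    return str(item) if item else "not found"

@mcp.resource("s3://reports/{name}")
def read_report(name: str) -> str:
    """Return the contents of a report object from S3 (placeholder bucket)."""
    obj = boto3.client("s3").get_object(Bucket="reports", Key=name)
    return obj["Body"].read().decode("utf-8")

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport for local use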
Solving the M×N integration problem
Before diving deeper into the AWS specific implementation details, it’s worth understanding the fundamental integration challenge MCP solves.
Imagine you’re building AI applications that need to access multiple data sources in your organization. Without a standardized protocol, you face what we call the “M×N problem”: for M different AI applications connecting to N different data sources, you need to build and maintain M×N custom integrations.
This creates an integration matrix that quickly becomes unmanageable as your organization adds more AI applications and data sources. Each new system requires multiple custom integrations, with development teams duplicating efforts across projects. MCP transforms this M×N problem into a simpler M+N equation: with MCP, you build M clients and N servers, requiring only M+N implementations. For example, 5 AI applications and 8 data sources would need 40 custom integrations without MCP, but only 13 MCP implementations with it. The contrast between the two approaches is shown in the following diagram.

This approach draws inspiration from other successful protocols that solved similar challenges:

APIs standardized how web applications interact with the backend
Language Server Protocol (LSP) standardizes how integrated development environments (IDEs) interact with language-specific tools for coding

In the same way that these protocols revolutionized their domains, MCP is poised to transform how AI applications interact with the diverse landscape of data sources in modern enterprises.
Why MCP matters for AWS users
For AWS customers, MCP represents a particularly compelling opportunity. AWS offers hundreds of services, each with its own APIs and data formats. By adopting MCP as a standardized protocol for AI interactions, you can:

Streamline integration between Amazon Bedrock language models and AWS data services
Use existing AWS security mechanisms such as AWS Identity and Access Management (IAM) for consistent access control
Build composable, scalable AI solutions that align with AWS architectural best practices

MCP and the AWS service landscape
What makes MCP particularly powerful in the AWS context is how it can interface with the broader AWS service landscape. Imagine AI applications that can seamlessly access information from:

Amazon Simple Storage Service (Amazon S3) buckets containing documents, images, and unstructured data
Amazon DynamoDB tables storing structured business information
Amazon Relational Database Service (Amazon RDS) databases with historical transaction records
Amazon CloudWatch logs for operational intelligence
Amazon Bedrock Knowledge Bases for semantic search capabilities

MCP servers act as consistent interfaces to these diverse data sources, providing language models with a unified access pattern regardless of the underlying AWS service architecture. This alleviates the need for custom integration code for each service and enables AI systems to work with your AWS resources in a way that respects your existing security boundaries and access controls.
In the remaining sections of this post, we explore how MCP works with AWS services, examine specific implementation examples, and provide guidance for technical decision-makers considering adopting MCP in their organizations.
How MCP works with AWS services, particularly Amazon Bedrock
Now that we’ve shown the fundamental value proposition of MCP, we dive into how it integrates with AWS services, with a special focus on Amazon Bedrock. This integration creates a powerful foundation for building context-aware AI applications that can securely access your organization’s data and tools.
Amazon Bedrock and language models
Amazon Bedrock represents the strategic commitment by AWS to make foundation models (FMs) accessible, secure, and enterprise-ready. It’s a fully managed service that provides a unified API across multiple leading language models, including:

Anthropic’s Claude
Meta’s Llama
Amazon Titan and Amazon Nova

What makes Amazon Bedrock particularly compelling for enterprise deployments is its integration with the broader AWS landscape. You can run FMs with the same security, compliance, and operational tools you already use for your AWS workloads. This includes IAM for access control and CloudWatch for monitoring.
At the heart of the versatility of Amazon Bedrock is the Converse API—the interface that enables multiturn conversations with language models. The Converse API includes built-in support for what AWS calls “tool use,” allowing models to:

Recognize when they need information outside their training data
Request that information from external systems using well-defined function calls
Incorporate the returned data into their responses

This tool use capability in the Amazon Bedrock Converse API dovetails perfectly with MCP’s design, creating a natural integration point.
MCP and Amazon Bedrock integration architecture
Integrating MCP with Amazon Bedrock involves creating a bridge between the model’s ability to request information (through the Converse API) and MCP’s standardized protocol for accessing external systems.
Integration flow walkthrough
To help you understand how MCP and Amazon Bedrock work together in practice, we walk through a typical interaction flow, step-by-step:

The user initiates a query through your application interface:

“What were our Q1 sales figures for the Northwest region?”

Your application forwards the query to Amazon Bedrock through the Converse API:

# Initialize the Amazon Bedrock runtime client with your AWS credentials
import boto3

bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

# Define the query from the user
user_query = "What were our Q1 sales figures for the Northwest region?"

# available_tools contains tool definitions that match MCP server capabilities
# These will be exposed to the model through the Converse API

# Call the Converse API with the user's query and available tools
response = bedrock.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # Specify which language model to use
    messages=[{"role": "user", "content": [{"text": user_query}]}],  # Format the user's message
    toolConfig={"tools": available_tools},  # Pass the tool definitions to the model
)
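The available_tools variable in this snippet isn't defined above. One way to build it, assuming an initialized MCP client session (shown later in this walkthrough), is to list the tools the MCP server exposes and map them into the toolSpec shape the Converse API expects. The helper name below is ours, offered as a sketch rather than a definitive implementation:

# Sketch: derive Converse API tool definitions from an MCP server's tool list
# `session` is assumed to be an initialized MCP ClientSession
async def build_available_tools(session):
    tools_result = await session.list_tools()
    return [
        {
            "toolSpec": {
                "name": tool.name,
                "description": tool.description or "",
                "inputSchema": {"json": tool.inputSchema},  # JSON Schema published by the MCP server
            }
        }
        for tool in tools_result.tools
    ]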

Amazon Bedrock processes the query and determines that it needs financial data that isn’t in its training data
Amazon Bedrock returns a toolUse message, requesting access to a specific tool:

{
  "role": "assistant",             // Indicates this message is from the model
  "content": [{
    "toolUse": {                   // The model is requesting to use a tool
      "toolUseId": "tu_01234567",  // Unique identifier for this tool use request
      "name": "query_sales_data",  // Name of the tool the model wants to use
      "input": {                   // Parameters for the tool call
        "quarter": "Q1",           // The model extracted this parameter from the user query
        "region": "Northwest"      // Another parameter extracted from the user query
      }
    }
  }]
}

Your MCP client application receives this toolUse message and translates it into an MCP protocol tool call
The MCP client routes the request to the appropriate MCP server (in this case, a server connected to your financial database)
The MCP server executes the tool, retrieving the requested data from your systems:

# Call the tool through the MCP protocol
# session is the MCP client session established earlier
result = await session.call_tool(
    "query_sales_data",   # The tool name from the toolUse message
    {
        "quarter": "Q1",  # Pass through the parameters from the toolUse message
        "region": "Northwest",
    },
)
# The MCP server handles authentication, data access, and result formatting
# This abstracts away the complexity of accessing different data sources
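The session object used here is created by the MCP client before the conversation starts. A minimal sketch of that setup with the MCP Python SDK, assuming a locally launched server script named financial_server.py (an illustrative name), could look like the following:

# Sketch: establish an MCP client session against a local server over stdio
# The command and script name are illustrative placeholders
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="python",
    args=["financial_server.py"],  # the MCP server exposing query_sales_data
)

async def run_conversation():
    async with stdio_client(server_params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            # From here, the flow proceeds as described in this walkthrough:
            # list tools, call bedrock.converse(), route toolUse requests to
            # session.call_tool(), and return the results to the model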

The tool results are returned through the MCP protocol to your client application
Your application sends the results back to Amazon Bedrock as a toolResult message:

{
  "role": "user",                  // Sent as a user message, but contains tool results
  "content": [{
    "toolResult": {                // Indicates this is a result from a tool
      "toolUseId": "tu_01234567",  // Must match the ID from the original toolUse
      "content": [{
        "json": {                  // Results are formatted as JSON
          "total_sales": 12450000, // Numerical data accessible to the model
          "growth": 0.12,          // Percentage growth for analysis
          "top_products": ["Product A", "Product B", "Product C"]  // List data
        }
      }]
    }
  }]
}
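In code, this step typically means appending the assistant's toolUse message and a new user message containing the toolResult to the running conversation, then calling the Converse API again. The following is a hedged sketch; the messages list, tool_use_id, and tool_result_json variables are ours, standing in for the conversation history, the ID from the toolUse request, and the data returned by the MCP server:

# Sketch: return the MCP tool result to the model and request the final answer
messages.append(response["output"]["message"])  # the assistant's toolUse message
messages.append({
    "role": "user",
    "content": [{
        "toolResult": {
            "toolUseId": tool_use_id,  # must match the model's toolUse request
            "content": [{"json": tool_result_json}],  # data returned by the MCP server
        }
    }],
})

final_response = bedrock.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=messages,
    toolConfig={"tools": available_tools},
)
final_text = final_response["output"]["message"]["content"][0]["text"]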

Amazon Bedrock generates a final response incorporating the tool results:

“Based on the data I’ve retrieved, our Q1 sales figures for the Northwest region were $12.45 million,
representing a 12% growth compared to the previous quarter.
The top-performing products were Product A, Product B, and Product C.”

Your application returns the final response to the user

This entire process, illustrated in the following diagram, happens in seconds, giving users the impression of a seamless conversation with an AI that has direct access to their organization’s data. Behind the scenes, MCP is handling the complex work of securely routing requests to the right tools and data sources.

In the next section, we explore a practical implementation example that shows how to connect an MCP server to Amazon Bedrock Knowledge Bases, providing a blueprint for your own implementations.
Practical implementation example: Amazon Bedrock Knowledge Bases integration
As you might recall from our earlier discussion of strategic use cases, enterprise knowledge bases represent one of the most valuable applications of MCP on AWS. Now, we explore a concrete implementation of MCP that connects language models to Amazon Bedrock Knowledge Bases. The code for the MCP server is available in the AWS Labs MCP code repository on GitHub, and the client code is in the AWS Labs MCP samples directory of the same repository. This example brings to life the “universal translator” concept we introduced earlier, demonstrating how MCP can transform the way AI systems interact with enterprise knowledge repositories.
Understanding the challenge
Enterprise knowledge bases contain vast repositories of information—from documentation and policies to technical guides and product specifications. Traditional search approaches are often inadequate when users ask natural language questions, failing to understand context or identify the most relevant content.
Amazon Bedrock Knowledge Bases provide vector search capabilities that improve upon traditional keyword search, but even this approach has limitations:

Manual filter configuration requires predefined knowledge of metadata structures
Query-result mismatch occurs when users don’t use the exact terminology in the knowledge base
Relevance challenges arise when similar documents compete for attention
Context switching between searching and reasoning disrupts user experience

The MCP server we explore addresses these challenges by creating an intelligent layer between language models and knowledge bases.
Architecture overview
At a high level, our MCP server for Amazon Bedrock Knowledge Bases follows a clean, well-organized architecture that builds upon the client-server pattern we outlined previously. The server exposes two key interfaces to language models:

A knowledge bases resource that provides discovery capabilities for available knowledge bases
A query tool that enables dynamic searching across these knowledge bases

Remember the M×N integration problem we discussed earlier? This implementation provides a tangible example of how MCP solves it – creating a standardized interface between a large language model and your Amazon Bedrock Knowledge Base repositories.
Knowledge base discovery resource
The server begins with a resource that enables language models to discover available knowledge bases:

@mcp.resource(uri='resource://knowledgebases', name='KnowledgeBases', mime_type='application/json')
async def knowledgebases_resource() -> str:
    """List all available Amazon Bedrock Knowledge Bases and their data sources.

    This resource returns a mapping of knowledge base IDs to their details, including:
    - name: The human-readable name of the knowledge base
    - data_sources: A list of data sources within the knowledge base, each with:
      - id: The unique identifier of the data source
      - name: The human-readable name of the data source

    ## Example response structure:
    ```json
    {
        "kb-12345": {
            "name": "Customer Support KB",
            "data_sources": [
                {"id": "ds-abc123", "name": "Technical Documentation"},
                {"id": "ds-def456", "name": "FAQs"}
            ]
        },
        "kb-67890": {
            "name": "Product Information KB",
            "data_sources": [
                {"id": "ds-ghi789", "name": "Product Specifications"}
            ]
        }
    }
    ```

    ## How to use this information:
    1. Extract the knowledge base IDs (like "kb-12345") for use with the QueryKnowledgeBases tool
    2. Note the data source IDs if you want to filter queries to specific data sources
    3. Use the names to determine which knowledge base and data source(s) are most relevant to the user's query
    """
    return json.dumps(await discover_knowledge_bases(kb_agent_mgmt_client, kb_inclusion_tag_key))

This resource serves as both documentation and a discovery mechanism that language models can use to identify available knowledge bases before querying them.
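On the client side, an application can read this resource before issuing any queries. With the MCP Python SDK, a minimal sketch (assuming an initialized client session and that the resource is returned as text contents) might look like this:

# Sketch: read the knowledge base discovery resource from an MCP client session
import json

async def discover(session):
    result = await session.read_resource("resource://knowledgebases")
    knowledge_bases = json.loads(result.contents[0].text)  # parse the KB ID -> details mapping
    for kb_id, details in knowledge_bases.items():
        print(kb_id, details["name"])
    return knowledge_bases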
Querying knowledge bases with the MCP tool
The core functionality of this MCP server resides in its QueryKnowledgeBases tool:

@mcp.tool(name='QueryKnowledgeBases')
async def query_knowledge_bases_tool(
    query: str = Field(
        ..., description='A natural language query to search the knowledge base with'
    ),
    knowledge_base_id: str = Field(
        ...,
        description='The knowledge base ID to query. It must be a valid ID from the resource://knowledgebases MCP resource',
    ),
    number_of_results: int = Field(
        10,
        description='The number of results to return. Use smaller values for focused results and larger values for broader coverage.',
    ),
    reranking: bool = Field(
        kb_reranking_enabled,
        description='Whether to rerank the results. Useful for improving relevance and sorting. Can be globally configured with BEDROCK_KB_RERANKING_ENABLED environment variable.',
    ),
    reranking_model_name: Literal['COHERE', 'AMAZON'] = Field(
        'AMAZON',
        description="The name of the reranking model to use. Options: 'COHERE', 'AMAZON'",
    ),
    data_source_ids: Optional[List[str]] = Field(
        None,
        description='The data source IDs to filter the knowledge base by. It must be a list of valid data source IDs from the resource://knowledgebases MCP resource',
    ),
) -> str:
    """Query an Amazon Bedrock Knowledge Base using natural language.

    ## Usage Requirements
    - You MUST first use the `resource://knowledgebases` resource to get valid knowledge base IDs
    - You can query different knowledge bases or make multiple queries to the same knowledge base

    ## Query Tips
    - Use clear, specific natural language queries for best results
    - You can use this tool MULTIPLE TIMES with different queries to gather comprehensive information
    - Break complex questions into multiple focused queries
    - Consider querying for factual information and explanations separately
    """
    # Additional implementation details omitted ...

What makes this tool powerful is its flexibility in querying knowledge bases with natural language. It supports several key features:

Configurable result sizes – Adjust the number of results based on whether you need focused or comprehensive information
Optional reranking – Improve relevance using language models (such as reranking models from Amazon or Cohere)
Data source filtering – Target specific sections of the knowledge base when needed

Reranking is disabled by default in this implementation but can be quickly enabled through environment variables or direct parameter configuration.
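To illustrate these options, the following is a hedged sketch of invoking the tool through an MCP client session with a smaller result set, reranking enabled, and a data source filter. The knowledge base and data source IDs are placeholders:

# Sketch: invoke QueryKnowledgeBases with explicit tuning parameters
async def query_security_docs(session):
    return await session.call_tool(
        "QueryKnowledgeBases",
        {
            "query": "What is our password rotation policy?",
            "knowledge_base_id": "kb-12345abcde",  # placeholder knowledge base ID
            "number_of_results": 5,                # fewer, more focused results
            "reranking": True,                     # rerank for relevance
            "reranking_model_name": "AMAZON",
            "data_source_ids": ["ds-abc123"],      # restrict to one data source
        },
    )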
Enhanced relevance with reranking
A notable feature of this implementation is the ability to rerank search results using language models available through Amazon Bedrock. This capability allows the system to rescore search results based on deeper semantic understanding:

import os

# Parse the reranking enabled environment variable
kb_reranking_enabled_raw = os.getenv('BEDROCK_KB_RERANKING_ENABLED')
kb_reranking_enabled = False  # Default value is False (off)
if kb_reranking_enabled_raw is not None:
    kb_reranking_enabled_raw = kb_reranking_enabled_raw.strip().lower()
    if kb_reranking_enabled_raw in ('true', '1', 'yes', 'on'):
        kb_reranking_enabled = True

Reranking is particularly valuable for queries where semantic similarity might not be enough to determine the most relevant content. For example, when answering a specific question, the most relevant document isn’t necessarily the one with the most keyword matches, but the one that directly addresses the question being asked.
Full interaction flow
This section walks through a complete interaction flow to show how all these components work together:

The user asks a question to a language model such as Anthropic’s Claude through an application:

“What’s our quarterly IT security audit procedure?”

The language model recognizes it needs to access the knowledge base and calls the MCP tool:

{
  "toolUse": {
    "name": "QueryKnowledgeBases",
    "input": {
      "query": "What's our quarterly IT security audit procedure?",
      "knowledge_base_id": "kb-12345abcde",
      "reranking": true
    }
  }
}

The MCP server processes the request by querying the knowledge base with the specified parameters
The MCP server returns formatted results to the language model, including content, location, and relevance scores:

Applied metadata filters: department=IT, document_type=Procedure, category=Security Audit, frequency=Quarterly
 
Results have been reranked to improve relevance.
 
Result 1 [Score: 0.92]:
Source: S3 – s3://company-docs/procedures/it/security/quarterly-audit-procedure-v2.pdf
Content: The IT Security Quarterly Audit Procedure outlines the steps for conducting systematic evaluations of the organization’s information security controls. This document provides guidance on scope, methodology, reporting requirements, and remediation tracking…
Metadata:
• department: IT
• document_type: Procedure
• category: Security Audit
• frequency: Quarterly
• last_updated: 2024-11-15

The language model incorporates these results into its response to the user:

Based on our IT Security Quarterly Audit Procedure, the process involves four main phases:
 
1. Planning: Define scope, objectives, and schedule for the audit
2. Execution: Conduct the technical assessments and control evaluations
3. Reporting: Document findings, risks, and recommendations
4. Remediation: Track and verify resolution of identified issues
 
The procedure was last updated on November 15, 2024, and specifies that the Cybersecurity team leads the effort with support from IT Operations.

This interaction, illustrated in the following diagram, demonstrates the seamless fusion of language model capabilities with enterprise knowledge, enabled by the MCP protocol. The user doesn’t need to specify complex search parameters or know the structure of the knowledge base—the integration layer handles these details automatically.

Looking ahead: The MCP journey continues
As we’ve explored throughout this post, the Model Context Protocol provides a powerful framework for connecting language models to your enterprise data and tools on AWS. But this is just the beginning of the journey.
The MCP landscape is rapidly evolving, with new capabilities and implementations emerging regularly. In future posts in this series, we’ll dive deeper into advanced MCP architectures and use cases, with a particular focus on remote MCP implementation.
The introduction of the new Streamable HTTP transport layer represents a significant advancement for MCP, enabling truly enterprise-scale deployments with features such as:

Stateless server options for simplified scaling
Session ID management for request routing
Robust authentication and authorization mechanisms for secure access control
Horizontal scaling across server nodes
Enhanced resilience and fault tolerance

These capabilities will be essential as organizations move from proof-of-concept implementations to production-grade MCP deployments that serve multiple teams and use cases.
We invite you to follow this blog post series as we continue to explore how MCP and AWS services can work together to create more powerful, context-aware AI applications for your organization.
Conclusion
As language models continue to transform how we interact with technology, the ability to connect these models to enterprise data and systems becomes increasingly critical. The Model Context Protocol (MCP) offers a standardized, secure, and scalable approach to integration.
Through MCP, AWS customers can:

Establish a standardized protocol for AI-data connections
Reduce development overhead and maintenance costs
Enforce consistent security and governance policies
Create more powerful, context-aware AI experiences

The Amazon Bedrock Knowledge Bases implementation we explored demonstrates how MCP can transform simple retrieval into intelligent discovery, adding value far beyond what either component could deliver independently.
Getting started
Ready to begin your MCP journey on AWS? Here are some resources to help you get started:
Learning resources:

AWS Labs MCP client and server code repository
Model Context Protocol documentation
Amazon Bedrock Developer Guide
MCP servers repository

Implementation steps:

Identify a high-value use case where AI needs access to enterprise data
Select the appropriate MCP servers for your data sources
Set up a development environment with local MCP implementations
Integrate with Amazon Bedrock using the patterns described in this post
Deploy to production with appropriate security and scaling considerations

Remember that MCP offers a “start small, scale incrementally” approach. You can begin with a single server connecting to one data source, then expand your implementation as you validate the value and establish patterns for your organization.
We encourage you to try the MCP with AWS services today. Start with a simple implementation, perhaps connecting a language model to your documentation or code repositories, and experience firsthand the power of context-aware AI.
Share your experiences, challenges, and successes with the community. The open source nature of MCP means that your contributions—whether code, use cases, or feedback—can help shape the future of this important protocol.
In a world where AI capabilities are advancing rapidly, the difference between good and great implementations often comes down to context. With MCP and AWS, you have the tools to make sure your AI systems have the right context at the right time, unlocking their full potential for your organization.
This blog post is part of a series exploring the Model Context Protocol (MCP) on AWS. In our next installment, we’ll explore the world of agentic AI, demonstrating how to build autonomous agents using the open source Strands Agents SDK with MCP to create intelligent systems that can reason, plan, and execute complex multi-step workflows. We’ll also cover advanced implementation patterns and remote MCP architectures, and examine additional use cases for MCP.

About the authors
Aditya Addepalli is a Delivery Consultant at AWS, where he works to lead, architect, and build applications directly with customers. With a strong passion for Applied AI, he builds bespoke solutions and contributes to the ecosystem while consistently keeping himself at the edge of technology. Outside of work, you can find him meeting new people, working out, playing video games and basketball, or feeding his curiosity through personal projects.
Elie Schoppik leads live education at Anthropic as their Head of Technical Training. He has spent over a decade in technical education, working with multiple coding schools and starting one of his own. With a background in consulting, education, and software engineering, Elie brings a practical approach to teaching Software Engineering and AI. He’s shared his insights at a variety of technical conferences as well as universities including MIT, Columbia, Wharton, and UC Berkeley.
Jawhny Cooke is a Senior Anthropic Specialist Solutions Architect for Generative AI at AWS. He specializes in integrating and deploying Anthropic models on AWS infrastructure. He partners with customers and AI providers to implement production-grade generative AI solutions through Amazon Bedrock, offering expert guidance on architecture design and system implementation to maximize the potential of these advanced models.
Kenton Blacutt is an AI Consultant within the GenAI Innovation Center. He works hands-on with customers helping them solve real-world business problems with cutting edge AWS technologies, especially Amazon Q and Bedrock. In his free time, he likes to travel, experiment with new AI techniques, and run an occasional marathon.
Mani Khanuja is a Principal Generative AI Specialist Solutions Architect, author of the book Applied Machine Learning and High-Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
Nicolai van der Smagt is a Senior Specialist Solutions Architect for Generative AI at AWS, focusing on third-party model integration and deployment. He collaborates with AWS’ biggest AI partners to bring their models to Amazon Bedrock, while helping customers architect and implement production-ready generative AI solutions with these models.

Build a scalable AI assistant to help refugees using AWS

This post is co-written with Taras Tsarenko, Vitalii Bozadzhy, and Vladyslav Horbatenko.
As organizations worldwide seek to use AI for social impact, the Danish humanitarian organization Bevar Ukraine has developed a comprehensive virtual generative AI-powered assistant called Victor, aimed at addressing the pressing needs of Ukrainian refugees integrating into Danish society. This post details our technical implementation using AWS services to create a scalable, multilingual AI assistant system that provides automated assistance while maintaining data security and GDPR compliance.
Bevar Ukraine was established in 2014 and has been at the forefront of supporting Ukrainian refugees in Denmark since the full-scale war in 2022, providing assistance to over 30,000 Ukrainians with housing, job search, and integration services. The organization has also delivered more than 200 tons of humanitarian aid to Ukraine, including medical supplies, generators, and essential items for civilians affected by the war.
Background and challenges
The integration of refugees into host countries presents multiple challenges, particularly in accessing public services and navigating complex legal procedures. Traditional support systems, relying heavily on human social workers, often face scalability limitations and language barriers. Bevar Ukraine’s solution addresses these challenges through an AI-powered system that operates continuously while maintaining high standards of service quality.
Solution overview
The solution’s backbone comprises several AWS services to deliver a reliable, secure, and efficient generative AI-powered digital assistant for Ukrainian refugees. A team of three volunteer software developers built the solution within weeks.
The following diagram illustrates the solution architecture.

Amazon Elastic Compute Cloud (Amazon EC2) serves as the primary compute layer, using Spot Instances to optimize costs. Amazon Simple Storage Service (Amazon S3) provides secure storage for conversation logs and supporting documents, and Amazon Bedrock powers the core natural language processing capabilities. Bevar Ukraine uses Amazon DynamoDB for real-time data access and session management, providing low-latency responses even under high load.
In the process of implementation, we discovered that Anthropic’s Claude 3.5 large language model (LLM) is best suited to this use case because of its advanced dialogue logic and ability to maintain a human-like tone. It excels at thorough, reasoned responses and more creative content, which makes Victor’s replies more natural and engaging.
Amazon Titan Embeddings G1 – Text v1.2 excels at producing high-quality vector representations of multilingual text, enabling efficient semantic search and similarity comparisons. This is particularly valuable when Victor needs to retrieve relevant information from a large knowledge base or match users’ queries to previously seen inputs. Amazon Titan Embeddings also integrates smoothly with AWS, simplifying tasks like indexing, search, and retrieval.
In real-world interactions with Victor, some queries require short, specific answers, whereas others need creative generation or contextual understanding. By combining Anthropic’s Claude 3.5 for generation and Amazon Titan Embeddings G1 for semantic retrieval, Victor can route each query through the most appropriate pipeline, retrieving relevant context through embeddings and generating a response, resulting in more accurate and context-aware answers.
Amazon Bedrock provides a remarkable interface to call Anthropic’s Claude 3.5 and Amazon Titan Embeddings G1 (along with other models) without creating separate integrations for each provider, simplifying development and maintenance.
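As a hedged illustration of that single-interface approach (the model IDs and prompt below are common public identifiers and examples, not Bevar Ukraine’s production code), both models can be called through the same Amazon Bedrock runtime client:

# Sketch: call Titan Embeddings and Anthropic's Claude through one Bedrock client
# Model IDs are the commonly documented ones and may differ by Region or version
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list:
    # Amazon Titan Embeddings G1 - Text
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def generate(prompt: str) -> str:
    # Anthropic's Claude 3.5 Sonnet through the Converse API
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]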
For multilingual support, we used embedders that support multi-language embeddings and translated our materials using Amazon Translate. This enhances the resilience of our Retrieval Augmented Generation (RAG) system. The application is built securely using AWS services: AWS Key Management Service (AWS KMS) simplifies encrypting data within the application, and Amazon API Gateway supports the application’s REST endpoints. User authentication and authorization capabilities are supported by Amazon Cognito, which provides secure and scalable customer identity and access management (CIAM) capabilities.
The application runs on AWS infrastructure using services that are designed to be secure and scalable like Amazon S3, AWS Lambda, and DynamoDB.
Tips and recommendations
Building an AI assistant solution for refugees using Amazon Bedrock and other AWS services has provided valuable insights into creating impactful AI-powered humanitarian solutions. Through this implementation, we discovered key considerations that organizations should keep in mind when developing similar solutions. The experience highlighted the importance of balancing technical capabilities with human-centric design, providing multilingual support, maintaining data privacy, and creating scalable yet cost-effective solutions. These learnings can serve as a foundation for organizations looking to use AI and cloud technologies to support humanitarian causes, particularly in creating accessible and helpful digital assistance for displaced populations. The following are our main recommendations:

Use the Amazon Bedrock playground to test multiple LLMs side by side using the same prompt. This helps you find the model that gives the best quality, style, and tone of response for your specific use case (for example, factual accuracy vs. conversational tone).
Experiment with prompts and settings to improve responses.
Keep costs in mind; set up monitoring and budgets in AWS.
For tasks involving information retrieval or semantic search, select an embedding model while making sure to pick the appropriate settings. Pay attention to the size of the embeddings, because larger vectors can capture more meaning but might increase costs. Also, check that the model supports the languages your application requires.
If you’re using a knowledge base, use the Amazon Bedrock knowledge base playground to experiment with how content is chunked and how many passages are retrieved for each query. Finding the right number of retrieved passages can make a big difference in how clear and focused the final answers are—sometimes fewer, high-quality chunks work better than sending too much context.
To enforce safety and privacy, use Amazon Bedrock Guardrails. Guardrails can help prevent the model from leaking sensitive information, such as personal data or internal business content, and you can block harmful responses or enforce a specific tone and formatting style (see the sketch after this list).
Start with a simple prototype, test the embedding quality in your domain, and expand iteratively.
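As referenced in the guardrails tip above, the following is a minimal sketch of applying an Amazon Bedrock guardrail to a Converse API call. The guardrail ID, version, and model ID are illustrative placeholders:

# Sketch: apply an Amazon Bedrock guardrail to a Converse API call
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "How do I register my address in Denmark?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-example123",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)
print(response["output"]["message"]["content"][0]["text"])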

Integration and enhancement layer
Bevar Ukraine has extended the core AWS infrastructure with several complementary technologies:

Pinecone vector database – For efficient storage and retrieval of semantic embeddings
DSPy framework – For structured prompt engineering and optimization of Anthropic’s Claude 3.5 Sonnet responses
EasyWeek – For appointment scheduling and resource management
Telegram API – For UI delivery
Amazon Bedrock Guardrails – For security policy enforcement
Amazon Rekognition – For document verification
GitHub-based continuous integration and delivery (CI/CD) pipeline – For rapid feature deployment

Key technical insights
The implementation revealed several crucial technical considerations. The DSPy framework was crucial in optimizing and enhancing our language model prompts. By integrating additional layers of reasoning and context awareness tools, DSPy notably improved response accuracy, consistency, and depth. The team found that designing a robust knowledge base with comprehensive metadata was fundamental to the system’s effectiveness.
GDPR compliance required careful architectural decisions, including data minimization, secure storage, and clear user consent mechanisms. Cost optimization was achieved through strategic use of EC2 Spot Instances and implementation of API request throttling, resulting in significant operational savings without compromising performance.
Future enhancements
Our roadmap includes several technical improvements to enhance the system’s capabilities:

Implementing advanced context dispatching using machine learning algorithms to improve service coordination across multiple domains
Developing a sophisticated human-in-the-loop validation system for complex cases requiring expert oversight
Migrating suitable components to a serverless architecture using Lambda to optimize resource utilization and costs
Enhancing the knowledge base with advanced semantic search capabilities and automated content updates

Results
This solution, which serves hundreds of Ukrainian refugees in Denmark daily, demonstrates the potential of AWS services in creating scalable, secure, and efficient AI-powered systems for social impact. As a result, volunteers and employees of Bevar Ukraine have saved thousands of hours and, instead of answering repetitive questions from refugees, can support them in more complicated life situations. For refugees, the virtual assistant Victor is a lifeline that provides responses to the most pressing questions about public services in Denmark, and many other questions, in seconds instead of waiting for an available volunteer to help. Given the vast knowledge base Victor uses to generate responses, the quality of support has improved as well.
Conclusion
Through careful architecture design and integration of complementary technologies, we’ve created a platform that effectively addresses the challenges faced by refugees while maintaining high standards of security and data protection.
The success of this implementation provides a blueprint for similar solutions in other social service domains, potentially supporting refugees and other people in need around the world, highlighting the importance of combining robust cloud infrastructure with thoughtful system design to create meaningful social impact.

About the Authors
Taras Tsarenko is a Program Manager at Bevar Ukraine. For over a decade in the world of technology, Taras has led everything from tight-knit agile teams of 5 or more to a company of 90 people that became the best small IT company in Ukraine under 100 people in 2015. Taras is a builder who thrives at the intersection of strategy and execution, where technical expertise meets human impact, whether it’s streamlining workflows, solving complex problems, or empowering teams to create meaningful products. Taras specializes in AI-driven solutions and data engineering, leveraging technologies like machine learning and generative AI using Amazon SageMaker AI, Amazon Bedrock, Amazon OpenSearch Service, and more. Taras is an AWS Certified ML Engineer Associate.
Anton Garvanko is a Senior Analytics Sales Specialist for Europe North at AWS. As a finance professional turned salesman, Anton spent 15 years in various finance leadership roles in supply chain and logistics as well as financial services industries. Anton joined Amazon over 5 years ago and has been part of specialist sales teams focusing on business intelligence, analytics, and generative AI for over 3 years. He is passionate about connecting the worlds of finance and IT by making sure that business intelligence and analytics powered by generative AI support everyday decision-making across industries and use cases.
Vitalii Bozadzhy is a Senior Developer with extensive experience in building high-load, cloud-based solutions, specializing in Java, Golang, SWIFT, and Python. He specializes in scalable backend systems, microservice architectures designed to automate business processes, as well as building reliable and secure cloud infrastructures. Furthermore, he has experience in optimizing compute resources and building advanced solutions integrated into products. His expertise covers the full development cycle—from design and architecture to deployment and maintenance—with a strong focus on performance, fault tolerance, and innovation.
Vladyslav Horbatenko is a computer science student, Professor Assistant, and Data Scientist with a strong focus on artificial intelligence. Vladyslav began his journey with machine learning, reinforcement learning, and deep learning, and gradually became more interested in large language models (LLMs) and their potential impact. This led him to deepen his understanding of LLMs, and now he works on developing, maintaining, and improving LLM-based solutions. He contributes to innovative projects while staying up to date with the latest advancements in AI.

Enhanced diagnostics flow with LLM and Amazon Bedrock agent integratio …

Noodoe is a global leader in EV charging innovation, offering advanced solutions that empower operators to optimize their charging station operations and provide exceptional user experiences. Their universal charging stations are compatible with all EV brands and feature intuitive payment options, including credit cards and Apple Pay. Powered by the Noodoe EV OS cloud management system, the company delivers round-the-clock automated monitoring, diagnostics, and maintenance, achieving a market-leading uptime of 99.83%. With operations in over 15 countries and a robust commitment to sustainability, Noodoe is transforming the EV charging industry through cutting-edge technology and a user-first approach.
Despite its technological strengths, Noodoe has encountered key challenges in helping station operators optimize performance and select the most cost-effective electricity pricing strategies across diverse markets. Traditional systems lack the capability to efficiently process vast amounts of real-time and historical data or provide personalized, station-level recommendations. This limits operators’ ability to make timely, informed decisions—resulting in higher electricity costs, underutilized assets, and a subpar customer experience. These inefficiencies not only reduce profitability but also hinder the ability to scale in a competitive and fast-evolving EV charging landscape.
To solve this, Noodoe has integrated large language models (LLMs) through Amazon Bedrock and Amazon Bedrock Agents to deliver intelligent automation, real-time data access, and multilingual support. These AI-powered tools analyze usage patterns, station diagnostics, and external variables like weather or grid conditions to generate highly tailored pricing recommendations. By using the structured orchestration and prompt-based reasoning of Amazon Bedrock, Noodoe equips operators with actionable insights that improve margins, enhance station utilization, and allow them to offer more competitive rates to users—ultimately boosting customer satisfaction. This service is delivered through a subscription model, creating a new, scalable revenue stream for Noodoe while reinforcing its leadership and innovation in the EV infrastructure space.
In this post, we explore how Noodoe uses AI and Amazon Bedrock to optimize EV charging operations. By integrating LLMs, Noodoe enhances station diagnostics, enables dynamic pricing, and delivers multilingual support. These innovations reduce downtime, maximize efficiency, and improve sustainability. Read on to discover how AI is transforming EV charging management.
Solution overview
The Noodoe AI-enhanced diagnostics flow is built on a multi-step process that combines data collection, AI-powered analytics, and seamless translation for global accessibility, as illustrated in the following figure.

The physical charging station network currently operates over 1,000 sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. As illustrated in the following image, it uses the EV Cloud and LLMs to generate relevant recommendations following backend processing.

The following screenshot shows an example of the results in the UI.
Overview of Noodoe AI-enhanced diagnostics
The following diagram illustrates the solution data flow.

To meet the feature requirements, the system operation process includes the following steps:

Charging data is processed through the EV service before entering the database.
The charging history data and pricing data are stored in the EV database.
Amazon EventBridge Scheduler periodically triggers the EV service to perform analysis.
The EV service calls the AI service to analyze historical data and provide pricing recommendations.
The AI service collects the organized historical data to prepare the prompt template.
This information, combined with appropriate prompts, is used in conjunction with Amazon Bedrock Agents as an AI-pricing agent to extract relevant information. The AI-pricing agent analyzes this combined data to identify daily peak and off-peak periods and provide recommendations for user pricing plans.
Optionally, if translation is needed for non-English users, these results from the AI-pricing agent are further processed through another Amazon Bedrock agent for translation.
Optionally, the translation agent uses Anthropic’s Claude Sonnet 3.5 on Amazon Bedrock to get the result in the corresponding language.
Finally, the AI service collects the results in the user’s language for formatting and other processing, then inserts them into a template to create a comprehensive report that is pushed to the user’s end.

In the following section, we dive deep into these steps and the AWS services used.
Architecture of Noodoe AI-enhanced diagnostics
Noodoe faced key challenges in building a globally scalable, reliable, and cost-efficient architecture. They needed a solution that could support rapid expansion, handle high data volumes, and deliver consistent performance across AWS Regions. Addressing these requirements required careful architectural planning to provide flexibility and resilience.
The following diagram illustrates the solution architecture Noodoe built to overcome these challenges to support global growth.
The EV charging optimization platform structures the data flow across multiple AWS services, providing efficient data ingestion, processing, and AI-driven decision-making. Amazon Elastic Kubernetes Service (Amazon EKS) retrieves data from Amazon DocumentDB, processes it, and invokes Amazon Bedrock Agents for reasoning and analysis. This structured data pipeline enables optimized pricing strategies and multilingual customer interactions. By using containerized applications, event-driven workflows, and AI capabilities, the system provides scalable and flexible insights to EV station operators.
Data ingestion and processing
EV charging stations send real-time charging data to AWS IoT Core, which acts as the initial entry point for data processing. The data is then transmitted to Amazon Managed Streaming for Apache Kafka (Amazon MSK) to facilitate high-throughput, reliable streaming. From Amazon MSK, data flows into Amazon EKS, where the EV service processes it before storing the charging history and trend records in DocumentDB. This structured storage provides efficient retrieval for analysis and prediction.
AI-powered pricing analysis
To optimize pricing strategies, Amazon EventBridge triggers a pricing prediction function at regular intervals. This function retrieves historical charging data from DocumentDB and sends it, along with predefined prompts, to the Amazon Bedrock AI-pricing agent. The AI agent, powered by Anthropic’s Claude on Amazon Bedrock, evaluates station usage trends, peak and off-peak periods, and pricing inefficiencies to generate optimal pricing recommendations. Although the pricing agent doesn’t access an Amazon Bedrock knowledge base or trigger action groups, it uses preprocessing and postprocessing features to refine predictions and improve decision-making.
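As a hedged sketch of how such a pricing prediction function might invoke an Amazon Bedrock agent (the agent ID, alias ID, session ID, and prompt are placeholders, not Noodoe’s actual values):

# Sketch: invoke an Amazon Bedrock agent with summarized charging history in the prompt
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def get_pricing_recommendation(history_summary: str) -> str:
    response = agent_runtime.invoke_agent(
        agentId="AGENT123456",           # placeholder AI-pricing agent ID
        agentAliasId="ALIAS123456",      # placeholder agent alias ID
        sessionId="station-0001-daily",  # groups related requests into one session
        inputText=(
            "Analyze the following charging history and recommend peak and "
            "off-peak pricing for tomorrow:\n" + history_summary
        ),
    )
    # The agent streams its answer back as completion chunks
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode("utf-8")
    return completion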
Multilingual support and report generation
If translation is required, the pricing analysis results are forwarded to the Amazon Bedrock translate agent, which converts the insights into the operator’s preferred language. The translated and structured data is then formatted into a predefined report template and stored in a designated database for later retrieval. This provides seamless access to actionable insights across diverse markets.
UI, monitoring, and performance optimization
Operators access the system through a web-based UI, with Amazon Route 53 and Amazon CloudFront providing fast and efficient content delivery. An Application Load Balancer distributes incoming requests across multiple EKS instances, providing high availability. To optimize performance, Amazon ElastiCache accelerates data retrieval while reducing database load. Amazon CloudWatch provides system monitoring and observability, and Noodoe’s administrators use Amazon Managed Service for Prometheus and Amazon Managed Grafana for additional monitoring and visualization.
This architecture empowers Noodoe with an AI-driven, scalable, and intelligent EV charging management solution, enhancing station utilization, revenue optimization, and customer experience worldwide.
Summary
The Noodoe AI-enhanced diagnostics flow transforms EV charging operations by integrating Amazon Bedrock Agents, blending rule-based automation, real-time user input, and LLM-powered insights for smarter decision-making. Backed by a comprehensive knowledge base and streamlined APIs, the solution empowers operators to automate workflows, optimize pricing, and boost station performance at scale. Ongoing expansion of the knowledge base, workflow refinement, and real-world testing further improve efficiency and reliability. This approach has delivered a 15% increase in revenue and reduced implementation time by 10%. Continuous feedback and clear documentation equip users to effectively use AI-driven diagnostics for more intelligent charging management.
Roman Kleinerman, Vice President of Products at Noodoe, shares: “We’ve seen revenue increases of 10–25% depending on the location and number of stations, as customers use our AI solution to optimize pricing strategies.”
Noodoe is dedicated to delivering smarter, more intelligent EV charging services that benefit both end-users and operators. Currently, Noodoe operates over 1,000 charging sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. Looking ahead, the system is being enhanced to support near real-time dynamic pricing optimization by incorporating factors such as demand, grid conditions, time of day, and weather. Amazon Bedrock Agents help enable these intelligent capabilities, powering dynamic pricing, load balancing, and grid-aware routing to optimize energy distribution and guide users to the most efficient stations. Future enhancements will offer personalized charging recommendations and incentives based on user preferences, maximizing value for both customers and operators. Start building intelligent, AI-powered EV charging solutions with Amazon Bedrock.

About the Authors
Ray Wang is a Senior Solutions Architect at AWS. With 12 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and Generative AI. As a hungry go-getter, he passed all 14 AWS certificates to make his technical field not only deep but wide. He loves to read and watch sci-fi movies in his spare time.
Howard Su is a Solutions Architect at AWS. With many years of experience in software development and system operations, Howard has served in various roles including R&D, QA, and SRE, has been responsible for the architectural design of numerous large-scale systems, and has participated in several cloud migrations. After accumulating years of development and operations experience, Howard is dedicated to promoting cloud-native service technologies and advocating for DevOps.
Tony Trinh is a Senior AI/ML Specialist Solutions Architect at AWS. With 13+ years of experience in the IT industry, Tony specializes in architecting scalable, compliance-driven AI and ML solutions, particularly in generative AI, MLOps, and cloud-native data platforms. As a perpetual learner, he does research in visual language models, responsible AI, and computer vision, and is authoring a book on ML engineering. In his spare time, Tony enjoys outdoor activities, experimenting with home improvement, and exploring Melbourne’s vibrant coffee scene.

Build GraphRAG applications using Amazon Bedrock Knowledge Bases

These days, it is increasingly common for companies to adopt an AI-first strategy to stay competitive and more efficient. As generative AI adoption grows, the technology’s ability to solve problems is also improving (one example is generating a comprehensive market report). One way to simplify the growing complexity of the problems being solved is through graphs, which excel at modeling relationships and extracting meaningful insights from interconnected data and entities.
In this post, we explore how to use Graph-based Retrieval-Augmented Generation (GraphRAG) in Amazon Bedrock Knowledge Bases to build intelligent applications. Unlike traditional vector search, which retrieves documents based on similarity scores, knowledge graphs encode relationships between entities, allowing large language models (LLMs) to retrieve information with context-aware reasoning. This means that instead of only finding the most relevant document, the system can infer connections between entities and concepts, improving response accuracy and reducing hallucinations. To inspect the graph built, Graph Explorer is a great tool.
Introduction to GraphRAG
Traditional Retrieval-Augmented Generation (RAG) approaches improve generative AI by fetching relevant documents from a knowledge source, but they often struggle with context fragmentation, when relevant information is spread across multiple documents or sources.
This is where GraphRAG comes in. GraphRAG was created to enhance knowledge retrieval and reasoning by leveraging knowledge graphs, which structure information as entities and their relationships. Unlike traditional RAG methods that rely solely on vector search or keyword matching, GraphRAG enables multi-hop reasoning (logical connections between different pieces of context), better entity linking, and contextual retrieval. This makes it particularly valuable for complex document interpretation, such as legal contracts, research papers, compliance guidelines, and technical documentation.
Amazon Bedrock Knowledge Bases GraphRAG
Amazon Bedrock Knowledge Bases is a managed service for storing, retrieving, and structuring enterprise knowledge. It seamlessly integrates with the foundation models available through Amazon Bedrock, enabling AI applications to generate more informed and trustworthy responses. Amazon Bedrock Knowledge Bases now supports GraphRAG, an advanced feature that enhances traditional RAG by integrating graph-based retrieval. This allows LLMs to understand relationships between entities, facts, and concepts, making responses more contextually relevant and explainable.
How Amazon Bedrock Knowledge Bases GraphRAG works
Graphs are generated by creating a structured representation of data as nodes (entities) and edges (relationships) between those nodes. The process typically involves identifying key entities within the data, determining how these entities relate to each other, and then modeling these relationships as connections in the graph. After the traditional RAG process, Amazon Bedrock Knowledge Bases GraphRAG performs additional steps to improve the quality of the generated response:

It identifies and retrieves related graph nodes or chunk identifiers that are linked to the initially retrieved document chunks.
The system then expands on this information by traversing the graph structure, retrieving additional details about these related chunks from the vector store.
By using this enriched context, which includes relevant entities and their key connections, GraphRAG can generate more comprehensive responses.

How graphs are constructed
Imagine extracting information from unstructured data such as PDF files. In Amazon Bedrock Knowledge Bases, graphs are constructed through a process that extends traditional PDF ingestion. The system creates three types of nodes: chunk, document, and entity. The ingestion pipeline begins by splitting documents from an Amazon Simple Storage Service (Amazon S3) folder into chunks using customizable methods (you can choose between basic fixed-size chunking to more complex LLM-based chunking mechanisms). Each chunk is then embedded, and an ExtractChunkEntity step uses an LLM to identify key entities within the chunk. This information, along with the chunk’s embedding, text, and document ID, is sent to Amazon Neptune Analytics for storage. The insertion process creates interconnected nodes and edges, linking chunks to their source documents and extracted entities using the bulk load API in Amazon Neptune. The following figure illustrates this process.

Use case
Consider a company that needs to analyze a large range of documents and correlate entities that are spread across those documents to answer questions (for example, which companies has Amazon invested in or acquired in recent years?). Extracting meaningful insights from this unstructured data and connecting it with other internal and external information poses a significant challenge. To address this, the company decides to build a GraphRAG application using Amazon Bedrock Knowledge Bases, using graph databases to represent complex relationships within the data.
One business requirement for the company is to generate a comprehensive market report that provides a detailed analysis of how internal and external information are correlated with industry trends, the company’s actions, and performance metrics. By using Amazon Bedrock Knowledge Bases, the company can create a knowledge graph that represents the intricate connections between press releases, products, companies, people, financial data, external documents and industry events. The Graph Explorer tool becomes invaluable in this process, helping data scientists and analysts to visualize those connections, export relevant subgraphs, and seamlessly integrate them with the LLMs in Amazon Bedrock. After the graph is well structured, anyone in the company can ask questions in natural language using Amazon Bedrock LLMs and generate deeper insights from a knowledge base with correlated information across multiple documents and entities.
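As a hedged sketch of how such a natural language question could be asked programmatically once the knowledge base described below is created and synced (the knowledge base ID and model ARN are placeholders):

# Sketch: ask a natural language question against the GraphRAG knowledge base
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "Which companies has Amazon invested in or acquired in recent years?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123456789",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0",
        },
    },
)
print(response["output"]["text"])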
Solution overview
In this GraphRAG application using Amazon Bedrock Knowledge Bases, we’ve designed a streamlined process to transform raw documents into a rich, interconnected graph of knowledge. Here’s how it works:

Document ingestion: Users can upload documents manually to Amazon S3 or set up automatic ingestion pipelines.
Chunk, entity extraction, and embeddings generation: In the knowledge base, documents are first split into chunks using fixed size chunking or customizable methods, then embeddings are computed for each chunk. Finally, an LLM is prompted to extract key entities from each chunk, creating a GraphDocument that includes the entity list, chunk embedding, chunked text, and document ID.
Graph construction: The embeddings, along with the extracted entities and their relationships, are used to construct a knowledge graph. The constructed graph data, including nodes (entities) and edges (relationships), is automatically inserted into Amazon Neptune.
Data exploration: With the graph database populated, users can quickly explore the data using Graph Explorer. This intuitive interface allows for visual navigation of the knowledge graph, helping users understand relationships and connections within the data.
LLM-powered application: Finally, users can leverage LLMs through Amazon Bedrock to query the graph and retrieve correlated information across documents. This enables powerful, context-aware responses that draw insights from the entire corpus of ingested documents.

The following figure illustrates this solution.

Prerequisites
The example solution in this post uses datasets from the following websites:

 Amazon Press Center archive
Amazon Investor relations quarterly reports

Also, you need to:

Create an S3 bucket to store the files on AWS. In this example, we named this bucket: blog-graphrag-s3.
Download and upload the PDF and XLS files from the websites into the S3 bucket.

Build the GraphRAG application

Open the AWS Management Console for Amazon Bedrock.
In the navigation pane, under Knowledge Bases, choose Create.
Select Knowledge Base with vector store, and choose Create.
Enter a Knowledge Base name (for example, knowledge-base-graphrag-demo) and an optional description.
Select Create and use a new service role.
Select Data source as Amazon S3.
Leave everything else as default and choose Next to continue.
Enter a Data source name (for example: knowledge-base-graphrag-data-source).
Select an S3 bucket by choosing Browse S3. (If you don’t have an S3 bucket in your account, create one. Make sure to upload all the necessary files.)
After the S3 bucket is created and files are uploaded, choose blog-graphrag-s3 bucket.
Leave everything else as default and choose Next.
Choose Select model and then select an embeddings model (in this example, we chose the Titan Text Embeddings V2 model).
In the Vector database section, under Vector store creation method, select Quick create a new vector store. For the Vector store, select Amazon Neptune Analytics (GraphRAG), and choose Next to continue.
Review all the details.
Choose Create Knowledge Base after reviewing all the details.
Creating a knowledge base on Amazon Bedrock might take several minutes to complete depending on the size of the data present in the data source. You should see the status of the knowledge base as Available after it is created successfully.

Update and sync the graph with your data

Select the Data source name (in this example, knowledge-base-graphrag-data-source) to view the synchronization history.
Choose Sync to update the data source.
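You can also trigger the same synchronization programmatically, which is useful when new documents land in the S3 bucket on a schedule. The following is a minimal sketch with Boto3; the knowledge base and data source IDs are placeholders:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID",        # placeholder knowledge base ID
    dataSourceId="DATA_SOURCE_ID",  # placeholder data source ID
)

# The ingestion job chunks the documents, extracts entities, and updates the graph
print(job["ingestionJob"]["status"])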

Visualize the graph using Graph Explorer
Let’s look at the graph created by the knowledge base by navigating to the Amazon Neptune console. Make sure that you’re in the same AWS Region where you created the knowledge base.

Open the Amazon Neptune console.
In the navigation pane, choose Analytics and then Graphs.
You should see the graph created by the knowledge base.

To view the graph in Graph Explorer, you need to create a notebook by going to the Notebooks section.
You can create the notebook instance manually or by using an AWS CloudFormation template. In this post, we show you how to do it manually using the Amazon Neptune console.
To create a notebook instance:

Choose Notebooks.
Choose Create notebook.

Select Analytics as the Neptune service.
Associate the notebook with the graph you just created (in this case: bedrock-knowledge-base-imwhqu).
Select the notebook instance type.
Enter a name for the notebook instance in the Notebook name field.
Create an AWS Identity and Access Management (IAM) role and use the Neptune default configuration.
Select VPC, Subnet, and Security group.

Leave Internet access as default and choose Create notebook.

Notebook instance creation might take a few minutes. After the notebook is created, you should see the status as Ready.
To see the Graph Explorer:

Go to Actions and choose Open Graph Explorer.

By default, public connectivity is disabled for the graph database. To connect to the graph, you must either have a private graph endpoint or enable public connectivity. For this post, you will enable public connectivity for this graph.
To set up a public connection to view the graph (optional):

Go back to the graph you created earlier (under Analytics, Graphs).
Select your graph by choosing the round button to the left of the Graph Identifier.
Choose Modify.
Select the Enable public connectivity check box in the Network section.
Choose Next.
Review changes and choose Submit.

To open the Graph Explorer:

Go back to Notebooks.
After the notebook instance is created, choose the instance name (in this case, aws-neptune-analytics-neptune-analytics-demo-notebook).
Then choose Actions, and then choose Open Graph Explorer.

You should now see Graph Explorer. To see the graph, add a node to the canvas, then explore and navigate the graph.
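Beyond visual exploration, you can also query the graph directly with openCypher. The following is a minimal sketch, assuming the Boto3 neptune-graph client and the graph identifier created by the knowledge base; the query itself is only an illustration:

import json
import boto3

neptune_graph = boto3.client("neptune-graph")

result = neptune_graph.execute_query(
    graphIdentifier="bedrock-knowledge-base-imwhqu",  # your graph identifier
    queryString="MATCH (n) RETURN labels(n) AS label, count(*) AS nodes",  # count nodes by label
    language="OPEN_CYPHER",
)

# The payload is returned as a stream of JSON results
print(json.loads(result["payload"].read()))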

Playground: Working with LLMs to extract insights from the knowledge base using GraphRAG
You’re ready to test the knowledge base.

Choose the knowledge base, select a model, and choose Apply.
Choose Run after adding the prompt. In the example shown in the following screenshot, we asked “How is AWS increasing energy efficiency?”

Choose Show details to see the Source chunk.
Choose Metadata associated with this chunk to view the chunk ID, data source ID, and source URI.

In the next example, we asked a more complex question: “Which companies has Amazon invested in or acquired in recent years?”

Another way to improve the relevance of query responses is to use a reranker model. Using the reranker model in GraphRAG involves providing a query and a list of documents to be reordered based on relevance. The reranker calculates relevance scores for each document in relation to the query, improving the accuracy and pertinence of retrieved results for subsequent use in generating responses or prompts. In the Amazon Bedrock Playgrounds, you can see the results generated by the reranking model in two ways: the data ranked by the reranking model alone (the following figure), or a combination of the reranking model and the LLM to generate new insights.

To use the reranker model:

Check the availability of the reranker model.
Go to the AWS Management Console for Amazon Bedrock.
From the navigation pane, under Builder tools, choose Knowledge Bases.
Choose the knowledge base created earlier (knowledge-base-graphrag-demo).
Choose Test Knowledge Base.
Choose Configurations, expand the Reranking section, choose Select model, and select a reranker model (in this post, we choose Cohere Rerank 3.5).

Clean up
To clean up your resources, complete the following tasks:

Delete the Neptune notebooks: aws-neptune-graphrag.
Delete the Amazon Bedrock Knowledge Bases: knowledge-base-graphrag-demo.
Delete content from the Amazon S3 bucket blog-graphrag-s3.

Conclusion
Using Graph Explorer in combination with Amazon Neptune and Amazon Bedrock LLMs provides a solution for building sophisticated GraphRAG applications. Graph Explorer offers intuitive visualization and exploration of complex relationships within data, making it straightforward to understand and analyze company connections and investments. You can use Amazon Neptune graph database capabilities to set up efficient querying of interconnected data, allowing for rapid correlation of information across various entities and relationships.
By using this approach to analyze Amazon’s investment and acquisition history, we can quickly identify patterns and insights that might otherwise be overlooked. For instance, when examining questions such as “Which companies has Amazon invested in or acquired in recent years?” or “How is AWS increasing energy efficiency?”, the GraphRAG application can traverse the knowledge graph, correlating press releases, investor relations information, entities, and financial data to provide a comprehensive overview of Amazon’s strategic moves.
The integration of Amazon Bedrock LLMs further enhances the accuracy and relevance of generated results. These models can contextualize the graph data, helping you to understand the nuances in company relationships and investment trends, and supporting the generation of comprehensive market reports. This combination of graph-based knowledge and natural language processing enables more precise answers and data interpretation, going beyond basic fact retrieval to offer analysis of Amazon’s investment strategy.
In summary, the synergy between Graph Explorer, Amazon Neptune, and Amazon Bedrock LLMs creates a framework for building GraphRAG applications that can extract meaningful insights from complex datasets. This approach streamlines the process of analyzing corporate investments and creates new ways to analyze unstructured data across various industries and use cases.

About the authors
Ruan Roloff is a ProServe Cloud Architect specializing in Data & AI at AWS. During his time at AWS, he was responsible for the data journey and data product strategy of customers across a range of industries, including finance, oil and gas, manufacturing, digital natives and public sector — helping these organizations achieve multi-million dollar use cases. Outside of work, Ruan likes to assemble and disassemble things, fish on the beach with friends, play SFII, and go hiking in the woods with his family.
Sai Devisetty is a Technical Account Manager at AWS. He helps customers in the Financial Services industry with their operations in AWS. Outside of work, Sai cherishes family time and enjoys exploring new destinations.
Madhur Prashant is a Generative AI Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Streamline personalization development: How automated ML workflows acc …

Crafting unique, customized experiences that resonate with customers is a potent strategy for boosting engagement and fostering brand loyalty. However, creating dynamic personalized content is challenging and time-consuming because of the need for real-time data processing, complex algorithms for customer segmentation, and continuous optimization to adapt to shifting behaviors and preferences—all while providing scalability and accuracy. Despite these challenges, the potential rewards make personalization a worthwhile pursuit for many businesses. Amazon Personalize is a fully managed machine learning (ML) service that uses your data to generate product and content recommendations for your users. Amazon Personalize helps accelerate time-to-value with custom models that are trained on data you provide, such as your users, catalog items, and the interactions between users and items, to generate personalized content and product recommendations. You can choose from various recipes (algorithms for specific use cases) to find the ones that fit your needs, such as recommending items that a user is most likely to engage with next given their past interactions, or the next best action that a user is most likely to take.
To maintain a personalized user experience, it’s crucial to implement machine learning operations (MLOps) practices, including continuous integration, deployment, and training of your ML models. MLOps facilitates seamless integration across various ML tools and frameworks, streamlining the development process. A robust machine learning solution for maintaining personalized experiences typically includes automated pipeline construction, as well as automated configuration, training, retraining, and deployment of personalization models. While services like Amazon Personalize offer a ready-to-use recommendation engine, establishing a comprehensive MLOps lifecycle for a personalization solution remains a complex undertaking. This process involves intricate steps to make sure that models remain accurate and relevant as user behaviors and preferences evolve over time.
This blog post presents an MLOps solution that uses AWS Cloud Development Kit (AWS CDK) and services like AWS Step Functions, Amazon EventBridge and Amazon Personalize to automate provisioning resources for data preparation, model training, deployment, and monitoring for Amazon Personalize.
Features and benefits
Deploying this solution offers improved scalability and traceability and allows you to quickly set up a production-ready environment to seamlessly deliver tailored recommendations to users using Amazon Personalize. This solution:

Streamlines the creation and management of Amazon Personalize resources.
Provides greater flexibility in resource management and selective feature activation.
Enhances readability and comprehensibility of complex workflows.
Enables event-driven architecture by publishing key Amazon Personalize events, allowing real-time monitoring, and enabling automated responses and integrations with other systems.
Includes automated creation of Amazon Personalize resources, including recommenders, solutions, and solution versions.
Facilitates end-to-end workflow automation for dataset import, model training, and deployment in Amazon Personalize.
Improves organization and modularity of complex processes through nested step functions.
Provides flexible activation of specific solution components using AWS CDK.

Solution overview
This solution uses AWS CDK layer 3 constructs. Constructs are the basic building blocks of AWS CDK applications. A construct is a component within your application that represents one or more AWS CloudFormation resources and their configuration.

The solution architecture is shown in the preceding figure and includes:

An Amazon Simple Storage Service (Amazon S3) bucket is used to store interactions, users, and items datasets. In this step, you need to configure your bucket permissions so that Amazon Personalize and AWS Glue can access the datasets and input files.
AWS Glue is used to preprocess the interactions, users, and item datasets. This step helps ensure that the datasets comply with the training data requirements of Amazon Personalize. For more information, see Preparing training data for Amazon Personalize.
EventBridge is used to schedule regular updates by triggering the workflow, and to publish events related to resource provisioning. Because the Step Functions workflow is driven by the input configuration file, you use that configuration when setting up the scheduled start of Step Functions.
The Step Functions workflow manages all resource provisioning of the Amazon Personalize dataset group (including datasets, schemas, event tracker, filters, solutions, campaigns, and batch inference jobs). Step Functions provides monitoring across the solution through event logs. You can also visually track the stages of your workflow in the Step Functions console. You can adjust the input configuration file to better fit your use case by defining schemas, recipes, and inference options. The solution workflow has the following steps:

A preprocessing job that runs an AWS Glue job, if provided. This step facilitates any preprocessing of the data that might be required.
Create a dataset group, which is a container for Amazon Personalize resources.
Create a dataset import job for the datasets based on the defined S3 bucket.
Create filters that define any filtering that you want to apply on top of the recommendations.
Create an event tracker for ingesting real-time events, such as user interactions, which in turn influence the recommendations provided.
Create solutions and recommenders for creating custom resources and domain recommenders.
Create a campaign, a batch inference job, or a batch segment job for generating inferences for real-time, batch, and segmentation use cases, respectively.

If you have a batch inference use case, then recommendations that match your inputs will be output into the S3 bucket that you defined in the input configuration file.
An Amazon EventBridge event bus, where resource status notification updates are posted throughout the AWS Step Functions workflow.

Prerequisites
Before you deploy the AWS CDK stack, make sure that you have the following prerequisites in place:

Install and configure AWS Command Line Interface (AWS CLI).
Install Python 3.12 or newer.
Install Node.js 20.16.0 or newer.
Install AWS CDK 2.88.0 or newer.
Docker 27.5.1 or newer (required for AWS Lambda function bundling).

Newer versions of the AWS CLI, Python, Node.js, and the AWS CDK are generally compatible, but this solution has been tested with the versions listed.
Deploy the solution
With the prerequisites in place, use the following steps to deploy the solution:

Clone the repository to a new folder on your desktop using the following command:

git clone https://github.com/aws-samples/automate-mlops-personalize-cdk-pipeline.git

Create a Python virtual environment for development:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Define an Amazon Personalize MLOps pipeline instance PersonalizeMlOpsPipeline (see personalize_pipeline_stack.py for the complete example, which also includes different inference options). In this walkthrough, you create a custom solution with an associated campaign and batch inference job:

PersonalizeMlOpsPipeline(
    self, 'PersonalizePipelineSolution',
    pre_processing_config={
        "job_class": PreprocessingGlueJobFlow
    },
    enable_filters=True,
    enable_event_tracker=True,
    recommendation_config=[
        {
            "type": "solutions",
            "inference_options": ["campaigns", "batchInferenceJobs"]
        },
        {
            "type": "recommenders"
        }
    ]
)

Where:

'PersonalizePipelineSolution' – The name of the pipeline solution stack
pre_processing_config – Configuration for the pre-processing job to transform raw data into a format usable by Amazon Personalize. To use AWS Glue jobs for preprocessing, specify the AWS Glue job class (PreprocessingGlueJobFlow) as the value of the job_class parameter. Currently, only AWS Glue jobs are supported. You can pass the name of the AWS Glue job that you need to run as a part of the input config. This doesn’t deploy the actual AWS Glue job responsible for pre-processing the files; the actual AWS Glue job must be created outside of this solution and the name passed as an input to the state machine. A sample AWS Glue job is supplied in the accompanying repo, which shows how pre-processing can be done.
enable_filters – A Boolean value to enable dataset filters for pre-processing. When set to true, the pipeline will create the state machines needed to create filters. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
enable_event_tracker – A Boolean value to enable the Amazon Personalize event tracker. When set to true, the pipeline will create the state machines needed to create an event tracker. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
recommendation_config – Configuration options for recommendations. The two types currently supported are solutions and recommenders. Within the solutions type, you can have multiple options such as campaigns, batchInferenceJobs, and batchSegmentJobs. Based on the selected options, the corresponding state machine and components are created. In the earlier example, we used campaigns and batchInferenceJobs as the option, which means that only the campaigns and batch inference job state machines will be deployed with the AWS CDK.

After the infrastructure is deployed, you can also enable and disable certain options through the state machine input configuration file. You can use the AWS CDK code to control which components are deployed in your AWS environment, and use the input config to select which components run.
Preprocessing: As an optional step, you can use an existing AWS Glue job for preprocessing your data before feeding it into Amazon Personalize, which uses this data to generate recommendations for your end users. While this post demonstrates the process using the MovieLens dataset, you can adapt it for your own datasets or custom processing needs. To do so, navigate to the glue_job folder and modify the movie_script.py file accordingly, or create an entirely new AWS Glue job tailored to your specific requirements. This preprocessing step, though optional, can be crucial in making sure that your data is optimally formatted for Amazon Personalize to generate accurate recommendations.
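The following is a minimal sketch of the kind of transformation such a preprocessing job performs, assuming a MovieLens-style ratings.csv; an actual AWS Glue job would read from and write to the S3 locations defined in your configuration:

import pandas as pd

# Columns assumed from the public MovieLens dataset: userId, movieId, rating, timestamp
ratings = pd.read_csv("ratings.csv")

# Amazon Personalize interactions datasets expect USER_ID, ITEM_ID, and TIMESTAMP columns
interactions = ratings.rename(
    columns={"userId": "USER_ID", "movieId": "ITEM_ID", "timestamp": "TIMESTAMP"}
)[["USER_ID", "ITEM_ID", "TIMESTAMP"]]

# Write the file that the dataset import job in the input configuration points to
interactions.to_csv("interactions.csv", index=False)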

Make sure that the AWS Glue job is configured to write its output to an S3 bucket. This bucket should then be specified as an input source in the Step Functions input configuration file.
Verify that the AWS Glue service has the necessary permissions to access the S3 bucket mentioned in your script.
In the input configuration, you’ll need to provide the name of the AWS Glue job that will be executed by the main state machine workflow. It’s crucial that this specified AWS Glue job runs without any errors, because any failures could potentially cause the entire state machine execution to fail.

Package and deploy the solution with AWS CDK, allowing for the most flexibility in development:
Before you can deploy the pipeline using AWS CDK, you need to set up AWS credentials on your local machine. You can refer to Set up AWS temporary credentials for more details.

# bootstrap CDK (required once – deploys a CDK bootstrap CloudFormation stack for assets)
cdk bootstrap
# build the solution
cdk synth
# build and deploy the solution
cdk deploy

Run the pipeline
Before initiating the pipeline, create the resources that follow and document the resource names for future reference.

Set up an S3 bucket for dataset storage. If you plan to use the preprocessing step, this should be the same bucket as the output destination.
Update the S3 bucket policy to grant Amazon Personalize the necessary access permissions. See Giving Amazon Personalize access to Amazon S3 resources for policy examples.
Create an AWS Identity and Access Management (IAM) role to be used by the state machine for accessing Amazon Personalize resources.

You can find detailed instructions and policy examples in the GitHub repository.
After you’ve set up these resources, you can create the input configuration file for the Step Functions state machine. If you configure the optional AWS Glue job, it will create the input files that the pipeline requires; refer to Configure the Glue Job to create the output files for more details.
Create input configuration
This input file is crucial because it contains all the essential information needed to create and manage your Amazon Personalize resources. This input configuration JSON acts as the input to the Step Functions state machine and can contain the following top-level objects:

datasetGroup
datasets
eventTracker
filters
solutions (can contain campaigns, batchInferenceJobs and batchSegmentJobs)
recommenders

Customize the configuration file according to your specific requirements and include or exclude sections based on the Amazon Personalize artifacts that you want to create. For the dataset import jobs in the datasets section, replace AWS_ACCOUNT_ID, S3_BUCKET_NAME and IAM_ROLE_ARN with the appropriate values. The following is a snippet of the input configuration file. For a complete sample, see input_media.json.

"datasetImportJob": {
    "serviceConfig": {
        "dataSource": {
            "dataLocation": "s3://S3_BUCKET_NAME/users.csv"
        },
        "roleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/IAM_ROLE_NAME",
        "tags": [
            {
                "tagKey": "string",
                "tagValue": "string"
            }
        ],
        "importMode": "FULL",
        "jobName": "UsersDatasetImportJob"
    },
    "createNewJob": true
}
}

Likewise, if you’re using batch inference or batch segment jobs, remember to also update the BUCKET_NAME and IAM ROLE ARN in those sections. It’s important to verify that you have the required input files for batch inference stored in your S3 bucket. Adjust the file paths in your configuration to accurately reflect the location of these files within your bucket structure. This helps ensure that Amazon Personalize can access the correct data when executing these batch processes.
Adjust the AWS Glue Job name in the configuration file if you have configured it as a part of the AWS CDK stack.
See the property table for a deep dive into each property and identify whether it’s optional or required.
Execute the pipeline
You can run the pipeline using the main state machine by the name PersonalizePipelineSolution from the Step Functions console, or set up a schedule in EventBridge (find the step-by-step process in the Schedule the workflow for continued maintenance of the solution section of this post).

In the AWS Management Console for Step Functions, navigate to State machines and select the PersonalizePipelineSolution.

Choose Start Execution and enter the configuration file that you created for your use case based on the steps in the Create input configuration section.

Choose Start execution and monitor the state machine execution. In the Step Functions console, you will find a visual representation of the workflow and can track at what stage the execution is. Event logs give you insight into the progress of the stages and surface information if there are any errors. The following figure is an example of a completed workflow:

After the workflow finishes, you can view the resources in the Amazon Personalize console. For batch inference jobs specifically, you can locate the corresponding step under the Inference tasks section of the graph, and within the Custom Resources area of the Amazon Personalize console.

Get recommendations (real-time inference)
After your pipeline has completed its run successfully, you can obtain recommendations. In the example configuration, we chose to deploy campaigns as the inference option. As a result, you’ll have access to a campaign that can provide real-time recommendations.
We use the Amazon Personalize console to get recommendations. Choose Dataset groups and select your dataset group name. Choose Campaigns and select your campaign name. Enter a user ID and item IDs of your choice to test personalized ranking. You can get the user ID and item IDs from the input file in the Amazon S3 bucket you configured.
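You can retrieve the same results programmatically with the Amazon Personalize runtime API. The following is a minimal sketch with Boto3; the campaign ARN, user ID, and item IDs are placeholders:

import boto3

personalize_runtime = boto3.client("personalize-runtime")

response = personalize_runtime.get_personalized_ranking(
    campaignArn="arn:aws:personalize:REGION:ACCOUNT_ID:campaign/CAMPAIGN_NAME",  # placeholder
    userId="123",               # a user ID from your interactions dataset
    inputList=["1", "2", "3"],  # item IDs to rerank for this user
)

# Items are returned in ranked order with relevance scores
for item in response["personalizedRanking"]:
    print(item["itemId"], item.get("score"))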

Get recommendations (batch inference)
If you have configured batch inference to run, start by verifying that the batch inference step has successfully completed in the Step Functions workflow. Then, use the Amazon S3 console to navigate to the destination S3 bucket for your batch inference job. If you don’t see an output file there, verify that you’ve provided the correct path for the input file in your input configuration.

Schedule the workflow for continued maintenance of the solution
While Amazon Personalize offers automatic training for solutions through its console or SDK, allowing users to set retraining frequencies such as every three days, this MLOps workflow provides an enhanced approach. By using EventBridge schedules you gain more precise control over the timing of retraining processes. Using this method, you can specify exact dates and times for retraining executions. To implement this advanced scheduling, you can configure an EventBridge schedule to trigger the Step Functions execution, giving you finer granularity in managing your machine learning model updates.

Navigate to the Amazon EventBridge console, select EventBridge Schedule, and then choose Create schedule.

You can establish a recurring schedule for executing your entire workflow. A key benefit of this solution is the enhanced control it offers over the specific date and time you want your workflow to run. This allows for precise timing of your processes, which you can use to align the workflow execution with your operational needs or optimal data processing windows.

Select AWS Step Functions (as shown below) as your target.

Insert the input configuration file that you prepared previously as the input and choose Next.
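As an alternative to the console, the following is a minimal sketch that creates the same schedule with Boto3 and EventBridge Scheduler; the state machine ARN, scheduler role ARN, and configuration file name are placeholders:

import boto3

scheduler = boto3.client("scheduler")

# The input configuration file created earlier in this post
with open("input_media.json") as f:
    pipeline_input = f.read()

scheduler.create_schedule(
    Name="personalize-pipeline-weekly",
    ScheduleExpression="cron(0 2 ? * MON *)",  # every Monday at 02:00 UTC
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:PersonalizePipelineSolution",  # placeholder
        "RoleArn": "arn:aws:iam::ACCOUNT_ID:role/SCHEDULER_ROLE",  # role allowed to start the execution
        "Input": pipeline_input,
    },
)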

An additional step you can take is to set up a dead-letter queue with Amazon Simple Queue Service (Amazon SQS) to handle failed Step Functions executions.
Monitoring and notification
To maintain the reliability, availability, and performance of Step Functions and your solution, set up monitoring and logging. You can set up an EventBridge rule to receive notifications about events that are of interest, such as batch inference being ready in the S3 bucket. Here is how you can set that up:

Navigate to Amazon Simple Notification Service (Amazon SNS) console and create an SNS topic that will be the target for your event.

Amazon SNS supports subscription for different endpoint types such as HTTP/HTTPS, email, Lambda, SMS, and so on. For this example, use an email endpoint.

After you create the topic and the subscription, navigate to the EventBridge console and select Create Rule. Define the details associated with the event such as the name, description, and the event bus.

To set up the event rule, you’ll use the pattern form. You use this form to define the specific events that will trigger notifications. For the batch segment job completion step, you should configure the source and detail-type fields as follows:

{
    "source": ["solutions.aws.personalize"],
    "detail-type": ["Personalize BatchSegmentJob status change"]
}

Select the SNS topic as your target and proceed.
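If you prefer to define the rule and target in code, the following is a minimal sketch with Boto3; the rule name, event bus name, and SNS topic ARN are placeholders, and the SNS topic’s resource policy must allow EventBridge to publish to it:

import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="personalize-batch-segment-status",
    EventPattern=json.dumps({
        "source": ["solutions.aws.personalize"],
        "detail-type": ["Personalize BatchSegmentJob status change"],
    }),
    State="ENABLED",
    EventBusName="default",  # or the custom event bus used by the solution
)

events.put_targets(
    Rule="personalize-batch-segment-status",
    EventBusName="default",
    Targets=[{"Id": "sns-email-notification", "Arn": "arn:aws:sns:REGION:ACCOUNT_ID:TOPIC_NAME"}],  # placeholder
)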

With this procedure, you have set up an EventBridge rule that sends an email notification when the batch segment job status changes. You can also set up logic based on your use case to trigger any downstream processes, such as creation of email campaigns with the results of your inference, by choosing different targets such as Lambda.
Additionally, you can use Step Functions and Amazon Personalize monitoring through Amazon CloudWatch metrics. See Logging and Monitoring AWS Step Functions and Monitoring Amazon Personalize for more information.
Handling schema updates
Schema updates are available in Amazon Personalize for adding columns to the existing schema. Note that deleting columns from existing schemas isn’t currently supported. To update the schema, make sure that you’re modifying the schema in the input configuration passed to Step Functions. See Replacing a dataset’s schema to add new columns for more information.
Clean up
To avoid incurring additional costs, delete the resources you created during this solution walkthrough. You can clean up the solution by deleting the CloudFormation stack you deployed as part of the setup.
Using the console

Sign in to the AWS CloudFormation console.
On the Stacks page, select this solution’s installation stack.
Choose Delete.

Using the AWS CLI

$ aws cloudformation delete-stack --stack-name <installation-stack-name>

Conclusion
This MLOps solution for Amazon Personalize offers a powerful, automated approach to creating and maintaining personalized user experiences at scale. By using AWS services like AWS CDK, Step Functions, and EventBridge, the solution streamlines the entire process from data preparation through model deployment and monitoring. The flexibility of this solution means that you can customize it to fit various use cases, and integration with EventBridge keeps models up to date. Delivering exceptional personalized experiences is critical for business growth, and this solution provides an efficient way to harness the power of Amazon Personalize to improve user engagement, customer loyalty, and business results. We encourage you to explore and adapt this solution to enhance your personalization efforts and stay ahead in the competitive digital landscape.
To learn more about the capabilities discussed in this post, check out Amazon Personalize features and the Amazon Personalize Developer Guide.
Additional resources:

How VistaPrint delivers personalized product recommendations with Amazon Personalize
Amazon Personalize launches new recipes supporting larger item catalogs with lower latency
Introducing automatic training for solutions in Amazon Personalize
Build a news recommender application with Amazon Personalize
Unlock personalized experiences powered by AI using Amazon Personalize and Amazon OpenSearch Service

About the Authors
Reagan Rosario brings over a decade of technical expertise to his role as a Sr. Specialist Solutions Architect in Generative AI at AWS. Reagan transforms enterprise systems through strategic implementation of AI-powered cloud solutions, automated workflows, and innovative architecture design. His specialty lies in guiding organizations through digital evolution—preserving core business value while implementing cutting-edge generative AI capabilities that dramatically enhance operations and create new possibilities.
Nensi Hakobjanyan is a Solutions Architect at Amazon Web Services, where she supports enterprise Retail and CPG customers in designing and implementing cloud solutions. In addition to her deep expertise in cloud architecture, Nensi brings extensive experience in Machine Learning and Artificial Intelligence, helping organizations unlock the full potential of data-driven innovation. She is passionate about helping customers through digital transformation and building scalable, future-ready solutions in the cloud.

Fast-track SOP processing using Amazon Bedrock

Standard operating procedures (SOPs) are essential documents in the context of regulations and compliance. SOPs outline specific steps for various processes, making sure practices are consistent, efficient, and compliant with regulatory standards.
SOP documents typically include key sections such as the title, scope, purpose, responsibilities, procedures, documentation, citations (references), and a detailed approval and revision history. In FDA-regulated industries such as healthcare and life sciences, SOPs play a crucial role in defining manufacturing, clinical, laboratory, quality control, quality assurance, and regulatory compliance practices.
When a regulatory body like the US Food and Drug Administration (FDA) introduces changes to regulations, organizations are required to evaluate the changes against their internal SOPs. When necessary, they must update their SOPs to align with the regulation changes and maintain compliance.
In this post, we show different approaches using Amazon Bedrock to identify relationships between regulation changes and SOPs.
Challenge
In the healthcare and life sciences industry, regulatory authorities like the FDA and the European Medicines Agency (EMA) frequently update regulations across various areas, such as clinical trials, medical devices, drug development and approvals, quality risk management, systems and data management, and technology adoption. These regulatory updates often require organizations to correspondingly update their internal SOPs to align with the changes. This process is typically manual, requiring a team of subject matter experts to review the regulatory changes, screen the SOPs to identify relevance, determine the impact, and specify what needs to be updated. This manual approach adds significant overhead for companies and can result in review cycles lasting several days to months.
To address this challenge, we explore approaches that can help automate the identification of relationships between regulatory changes and SOPs. These approaches can also be extended to assess the impact of regulatory changes on an organization’s internal processes and documentation. By using automation, companies can streamline the SOP update process, reducing the time and resources required to maintain alignment with evolving regulatory requirements.
Sample data
For this post, we used SOPs published by the FDA’s Center for Biologics Evaluation and Research. These publicly available SOPs are used by the FDA staff to guide their duties.
Specifically, we focused on the following SOPs related to biologics procedures. This narrow scope allowed us to dive deeper into a specific regulatory domain within the larger healthcare and life sciences industry.

SOPP 8005: Formal Dispute Resolution Process
SOPP 8117: Issuing Tracking Numbers in Advance of Electronic Submissions in eCTD Format
SOPP 8201: Administrative Processing of Clinical Holds for Investigational New Drug Applications
SOPP 8301: Receipt and Processing of Master Files
SOPP 8403: Issuance, Reissuance, and Voluntary Revocation of Biological Products Licenses
SOPP 8507: Procedures for Responding to an Illegitimate Product Notification and Request for Termination of Notification
SOPP 8717: Required Biocompatibility Training and Toxicology Profiles for Evaluation of Medical Devices
SOPP 9151: Procedures for Handling Requests for Exemptions and Alternative Procedures for Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) Under 21 CFR Section 1271.155

In addition to the SOPs, we also used three of the FDA’s Biologics Guidance Documents to test the relationship between the regulatory documents and the SOPs.
These guidance documents describe the FDA’s policy interpretations on regulatory issues related to the biologics domain. They cover a wide range of topics, including processing, content, evaluation, approval, inspection, and enforcement of policies. The guidance documents also discuss specific products or issues relating to the design, production, labeling, promotion, manufacturing, and testing of regulated products.
We used the following specific FDA Biologics Guidance Documents for this analysis:

Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry
Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry
Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry

Approaches
A key step in assessing the impact of regulatory changes is to identify if a regulatory guidance is related to an organization’s SOPs. We used Amazon Bedrock along with Amazon Simple Storage Service (Amazon S3) to store the input dataset.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Our experiments used Anthropic’s Claude 3 Opus large language model (LLM) on Amazon Bedrock. However, you can use the broad selection of models available on Amazon Bedrock to experiment with alternative models and choose the one that best suits your specific requirements. Amazon Bedrock frequently releases updated versions of existing AI models that can be accessed and used by simply applying a configuration change, making it a highly flexible choice for deploying the latest AI capabilities.
We focused on the following approaches:

Full document match – Comparing the full text of the regulatory guidance and SOP documents
Text similarity – This approach consists of two options:

Vector embeddings – Measuring the semantic similarity between the guidance and SOP texts
Keyword search – Identifying relevant keywords and their occurrences in the documents

Taxonomy topic match – Mapping the guidance and SOP content to a taxonomic structure to identify topical relationships

This post details the approaches we explored and the learnings from our experiments.
Full document match
The following diagram illustrates the full document match architecture.

In this approach, we compared each regulatory change to every SOP by passing the full contents of the SOP and the regulatory change to the model. The goal was to identify the relationship between the regulatory change and the SOP.
The following is a sample prompt to check if an SOP is related to a regulation change:

Human:

Act as an expert who understands the Food and Drug Administration’s Center for Biologics Evaluation and Research (CBER) internal Biologics procedures, which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies).

Given a regulatory guidance change, and a document describing a CBER Biologics procedure of evaluation, research or review, tell me if the regulatory change is related to the procedure.

<regulatory_change>
{regchange}
</regulatory_change>

<biologics_procedure>
{procedure}
</biologics_procedure>

To answer this question, first, determine what the subject of the regulatory change is. For example, the subject might be something like “regulatory considerations for human cells and tissues” or “processing of clinical holds for new drug applications”.

In order for the regulatory change to be related to the biologics procedure, the procedure description must include information about the topic. In other words, you should have a specific section from the procedure that is clearly related to the regulatory change in order to say that it is related.

For example, if a procedure description mentions cells or tissues, it could be related to how they need to be handled or manipulated. Similarly, if a procedure mentions dispute resolution, it could be related to how a biologics procedure handles dispute resolution. Tell me whether the regulatory change is related to the biologics procedure. Don’t guess, and explain your thinking. If you can’t find specific elements in the procedure that are clearly related to the regulations, then the answer is that the procedure is not related to the regulatory change.

Your answer should end with the word “RELATED” or “UNRELATED”, reflecting your overall evaluation of whether the procedure description is related to the regulation change or not.
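The following is a minimal sketch of how such a prompt can be sent to Amazon Bedrock using Boto3 and the Converse API; the model identifier is a placeholder, and prompt_template stands for the prompt shown above with {regchange} and {procedure} as the substitution fields:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def is_related(reg_change_text: str, sop_text: str, prompt_template: str) -> str:
    # Substitute the regulatory change and SOP text into the prompt shown above
    prompt = prompt_template.format(regchange=reg_change_text, procedure=sop_text)
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-opus-20240229-v1:0",  # placeholder model identifier
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    answer = response["output"]["message"]["content"][0]["text"].strip()
    # The prompt asks the model to end its answer with RELATED or UNRELATED
    return "UNRELATED" if answer.upper().endswith("UNRELATED") else "RELATED"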

When we ran the full document matching approach using Amazon Bedrock across all the SOPs and the regulatory guidance documents in the dataset, the results showed accurate identification of related SOPs. For example, SOPP 9151 was correctly identified as the only SOP related to the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation change, with others being identified as unrelated:

Assistant:

SOPP9151-061306.txt:

The regulatory change is about the regulation of human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides guidance on the criteria for determining how HCT/Ps will be regulated, registration and listing requirements, donor eligibility requirements, current good tissue practice requirements, and FDA inspection and enforcement of HCT/P establishments.

The biologics procedure is specifically about handling requests for exemptions and alternative procedures for HCT/Ps under 21 CFR 1271.155. The procedure outlines the responsibilities and steps for CBER staff to process and respond to these exemption requests.

There is a clear relationship between the regulatory change and the biologics procedure. The regulatory change provides the overall framework and requirements for regulating HCT/Ps, while the biologics procedure focuses on one specific aspect within that framework – handling exemption requests from certain HCT/P requirements.

The procedure directly references and provides instructions for implementing 21 CFR 1271.155, which is part of the HCT/P regulations covered in the regulatory change document. So the biologics procedure is a detailed process that enables the agency to grant exemptions and alternatives to the HCT/P requirements established in the regulations.

In summary, the regulatory change establishes the rules for HCT/Ps, and the biologics procedure describes how the agency will process requests for exceptions to those rules. Therefore, the two documents are closely related.

RELATED

Similarly, SOPP 8005 was correctly identified as the only SOP related to the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation change, and the other SOPs were identified as unrelated.
Finally, SOP 8201 was also correctly identified as the only SOP related to the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation change.
These results demonstrate the effectiveness of the full document matching approach in accurately linking the relevant SOPs to their corresponding regulatory guidance documents.
Text similarity
The following diagram illustrates the text similarity match workflow.

In our second approach, we indexed the SOPs using either vector embeddings for semantic similarity or a keyword-based similarity approach. This allowed us to submit the contents of a regulatory change as a query and return the most similar SOP documents.
The steps involved in this text similarity approach are:

Index the SOPs:

For a vector embeddings approach, we generated vector representations of the SOP contents using an LLM to capture semantic similarities.
For a keyword-based approach, we identified the most relevant keywords in each SOP and built an index based on their occurrences.

Query the index:

For a given regulatory change, we submitted the text as a query to the SOP index.
The index then returned the most similar SOPs based on the chosen similarity metric (semantic or keyword-based).

Vector search
For the text similarity approach, we used ChromaDB, an open source vector database run in memory, to generate the vector embeddings and perform the search.
We created a collection within ChromaDB containing all the SOP documents. We then independently queried each regulation guidance document text against this SOP collection. We used the default L2 distance algorithm, where a lower distance score indicates a closer match between the query and the indexed SOP documents.
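The following is a minimal sketch of this indexing and querying flow, assuming the chromadb Python package; document loading is simplified and the placeholder strings stand in for the full SOP and guidance texts:

import chromadb

# In-memory ChromaDB instance; collections use L2 distance by default
client = chromadb.Client()
collection = client.create_collection("sop-documents")

# file name -> full SOP text (placeholders here)
sop_texts = {
    "SOPP9151-061306.txt": "...",
    "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt": "...",
}
collection.add(ids=list(sop_texts.keys()), documents=list(sop_texts.values()))

# Query with the full text of a regulatory guidance document
results = collection.query(
    query_texts=["<full text of the regulatory guidance document>"],
    n_results=8,
)

# Lower distance indicates a closer semantic match
for doc_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(doc_id, distance)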
Although the vector embedding-based text similarity approach identified the top matching SOP document in some cases, it also produced some inaccurate results.
For example, when querying with the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, SOPP 9151 was correctly identified as the top match. However, a few other unrelated SOP documents also had low distance scores, which could potentially lead to them being misidentified as relevant:

[
    [
        "SOPP9151-061306.txt",
        0.640560507774353
    ],
    [
        "SOPP-8717–Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8971723914146423
    ],
    [
        "SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt",
        0.9069873690605164
    ],
    [
        "SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt",
        1.069345235824585
    ],
    [
        "SOPP-8507V4-IllegitimateProds-Final.txt",
        1.1143898963928223
    ],
    [
        "SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt",
        1.157564640045166
    ],
    [
        "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt",
        1.2106068134307861
    ],
    [
        "SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt",
        1.3174282312393188
    ]
]

Similarly, when querying with the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation, the vector search incorrectly identified SOPP 8717 as the best match, whereas SOPP 8005, which is more directly related to formal dispute resolution, had a higher distance score:

[
    [
        "SOPP-8717–Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.848071277141571
    ],
    …
    …
]

Finally, for the regulation Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry, the vector search again identified SOPP 8717 as the top match, rather than the more relevant SOP 8201:

[
    [
        "SOPP-8717–Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8028254508972168
    ],
    …
    …
]

Keyword search
We also explored a keyword-based similarity method using the Whoosh Python search library. We first created an index of all the SOP documents using the Whoosh library. Then, for each regulation guidance document, we separately searched the index using a Whoosh query parser.
The Whoosh library returns a search score for each matched SOP document, where a higher score indicates a better match to the query.
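The following is a minimal sketch of this keyword-based flow, assuming the Whoosh package; the SOP texts and the guidance text are placeholders:

import os
from whoosh.index import create_in
from whoosh.fields import Schema, ID, TEXT
from whoosh.qparser import QueryParser, OrGroup

# file name -> full SOP text (placeholders here)
sop_texts = {"SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt": "..."}
regulation_text = "..."  # full text of the regulatory guidance document

# Build a keyword index over the SOP documents
schema = Schema(fname=ID(stored=True), content=TEXT)
os.makedirs("sop_index", exist_ok=True)
ix = create_in("sop_index", schema)
writer = ix.writer()
for fname, text in sop_texts.items():
    writer.add_document(fname=fname, content=text)
writer.commit()

# Search the index using the guidance document text as the query
with ix.searcher() as searcher:
    parser = QueryParser("content", ix.schema, group=OrGroup)  # OR semantics across keywords
    query = parser.parse(regulation_text)
    for hit in searcher.search(query, limit=8):
        print(hit["fname"], hit.score)  # higher score indicates a better match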
When searching for the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, the top matching document was incorrectly identified as SOPP 8301 instead of the more relevant SOPP 9151:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 744.420492560645
<Hit {'fname': 'SOPP-8507V4-IllegitimateProds-Final.txt'}> with score of 609.3431135327576
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 588.5899332740212
<Hit {'fname': 'SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt'}> with score of 582.278310231856
<Hit {'fname': 'SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt'}> with score of 449.11608593886564
<Hit {'fname': 'SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt'}> with score of 377.9723456561558
<Hit {'fname': 'SOPP9151-061306.txt'}> with score of 328.67972016789844
<Hit {'fname': 'SOPP-8717–Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt'}> with score of 160.6985781375766

Similarly, when searching for the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation using the keyword-based similarity approach, the Whoosh library again incorrectly identified SOPP 8301 as the top matching document, instead of the more relevant SOPP 8005:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 721.9535225922815

In contrast with the previous examples, when searching for the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation, SOP 8201 was correctly identified as the top matching document:

8 results found
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 503.3284407067554

These results indicate that although the keyword-based similarity approach can be effective in some cases, it might also struggle to accurately identify the most relevant SOPs, similar to the challenges faced with the vector embedding-based method.
Taxonomy-based topic matching
The following diagram illustrates the taxonomy-based topic matching architecture.

In our third approach, we focused on building a hierarchical taxonomy of topics associated with the subject of biologics procedures. This taxonomy-based approach aimed to capture the topical relationships between the regulatory guidance documents and the SOPs. This approach has the potential to provide a cost-effective solution, because the entire SOP or input document doesn’t need to be passed with the prompt for every query to the LLM. Performing a full text match between an SOP and a regulatory change might not be a cost-effective approach, especially as the input documents grow in size.
The key steps in this approach were:
Step 1: We constructed a multi-level taxonomy that organized the topics related to biologics procedures. The taxonomy included a hierarchy of main topics, each of which could include subtopics.
The following is an example prompt for building the taxonomy:

Human:

Act as an expert in Biologics procedures for the Food & Drug Administration’s Center for Biologics Evaluation and Research (CBER), which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies). Biologics procedures help CBER staff in regulating administration and management of biologics evaluations and reviews including but not limited to clinical studies, electronic submissions, dispute resolutions and management of biologics and related products

I want you to create a hierarchy or taxonomy of topics relating to the biologics procedures. For example, some of the topics may be related to

    – Administrative Guidances

    – Adverse Events and Product Deviation Guidances

    – Application Submission Guidances

    – Biosimilars Guidances

    – Clinical Guidances

    – CMC & GMP Guidances

    – Devices Guidances

    – Generics Guidances

    – Labeling and Promotion Guidances

    – Allergenics Guidances

    – Blood Guidances

    – Cellular and Gene Therapy Guidances

    – Tissue Guidances

    – Vaccine and Related Biological Product Guidances

    – Xenotransplantation Guidances

    – Communication

    – Marketing Applications

    – Compliance

    – Medical Devices

    – Master Files

    – Investigational New Drugs

    – Communication

    – General Information

    – Therapeutic Products

Here’s an example of how a topic might be organized:

<topic>
  <name>Master Files</name>
  <subtopic>
      <name>Drug Substance</name>
  </subtopic>

  <subtopic>
      <name>Packaging Material</name>
  </subtopic>

  <subtopic>
      <name>Preparation Information</name>
  </subtopic>

  <subtopic>
      <name>Clinical Study Data</name>
  </subtopic>

  <subtopic>
      <name>Non-Clinical Study Data</name>
  </subtopic>
</topic>

Your hierarchy should include all the topics covering the CBER’s biologics procedures, and should be sufficiently exhaustive that it covers many different types of activities and duties in different areas of biologics administration, evaluation and review.

Think step by step, using these steps:

Think about all of the important topics that enable CBER’s staff to perform their duties in administering, evaluating, reviewing and managing of biologics and related products.
Organize those topics into a hierarchy or taxonomy. There should be a single top-level item called <biologics_topics>.
Output that hierarchy in an XML format. Make sure that each item in the XML is in plain English and doesn’t use special characters like the underscore.

The following is a sample taxonomy hierarchy returned by Amazon Bedrock when using Anthropic’s Claude 3 Opus LLM:

Assistant:

<biologics_topics>

  <topic>
    <name>Administrative Guidances</name>
    <subtopic>
      <name>Forms</name>
    </subtopic>
    <subtopic> 
      <name>Meetings</name>
    </subtopic>
    <subtopic>
      <name>Dispute Resolution</name> 
    </subtopic>
  </topic>

  <topic>
    <name>Submission Guidances</name>
    <subtopic>
      <name>Investigational New Drug Applications</name>
    </subtopic>
    <subtopic>
      <name>Biologics License Applications</name>
    </subtopic>
    <subtopic> 
      <name>Supplements and Other Changes to an Approved Application</name>
    </subtopic>
    <subtopic>
      <name>Biosimilars</name>
    </subtopic>
  </topic>

  <topic> 
    <name>Clinical Guidances</name>
    <subtopic>
      <name>Clinical Trial Design</name>
    </subtopic>
    <subtopic>
      <name>Human Subject Protection</name> 
    </subtopic>
    <subtopic>
      <name>Pediatric Studies</name>
    </subtopic>
  </topic>

  <topic>
    <name>Safety Guidances</name> 
    <subtopic>
      <name>Adverse Event Reporting</name>
    </subtopic>
    <subtopic>
      <name>Safety Assessments</name>
    </subtopic>
    <subtopic>
      <name>Pharmacovigilance</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Quality Guidances</name>
    <subtopic> 
      <name>Chemistry Manufacturing and Controls</name>
    </subtopic>
    <subtopic>
      <name>Good Manufacturing Practices</name>
    </subtopic>
    <subtopic>
      <name>Analytical Methods</name>
    </subtopic>
  </topic>

  <topic>
    <name>Labeling Guidances</name>
    <subtopic>
      <name>Prescribing Information</name> 
    </subtopic>
    <subtopic>
      <name>Medication Guides</name>
    </subtopic>
    <subtopic>
      <name>Promotional Labeling and Advertising</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Specific Guidances</name>
    <subtopic>
      <name>Vaccines</name>
    </subtopic> 
    <subtopic>
      <name>Blood and Blood Components</name>
    </subtopic>
    <subtopic>
      <name>Cellular and Gene Therapies</name> 
    </subtopic>
    <subtopic>
      <name>Allergenics</name>
    </subtopic>
    <subtopic> 
      <name>Tissues</name>
    </subtopic>
    <subtopic>
      <name>Xenotransplantation Products</name>
    </subtopic>
  </topic>

  <topic>
    <name>Compliance and Enforcement</name>
    <subtopic>
      <name>Inspections</name>
    </subtopic>
    <subtopic>
      <name>Recalls</name> 
    </subtopic>
    <subtopic>
      <name>Warning Letters</name>
    </subtopic>
  </topic>

</biologics_topics>

This hierarchy covers the key areas involved in CBER’s regulation of biologics, from administrative processes, to investigational products, marketing applications, clinical development, safety monitoring, quality control, labeling, and compliance for the major categories of biological products. The structure aims to capture the breadth of guidances and procedures that CBER staff rely on to ensure the safety, purity, potency and effectiveness of biologics and related products.
Step 2: Using the taxonomy we built, we labeled each SOP with the topics that are covered in the SOP. We instructed the model to select the most specific ones from the list of topics and subtopics.
The following is an example prompt to label SOPs with taxonomy topics using Anthropic’s Claude 3 Opus model:

Human:

Act as an expert who understands Biologics procedures.

The following information describes the details of a Biologics procedure. Choose the topics covered in the description only from the following list of possible topics:

{possible_topics}

<procedure_description>

{procedure}

</procedure_description>

Explain your thinking, and end the explanation by generating a JSON string array containing all of the matching topic names.

All topic names must come from the list of possible topics. Do not return a topic unless it is found within the list of valid topics, subtopics, and sub-subtopics. 

If you generate a topic name that is not found in the list of possible topics, it is an error.

Make sure to select the most specific leaf level sub-subtopic or subtopic available from the list of possible topics, subtopics, and sub-subtopics. Be sure your list covers all of the main points of the Biologics procedure description.

The following is the example SOP topic labeling output from Amazon Bedrock:
Assistant:

SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt: [‘Administrative Guidances’, ‘Dispute Resolution’]

SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt: [‘Administrative Guidances’, ‘Submission Guidances’, ‘Investigational New Drug Applications’, ‘Biologics License Applications’]

SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt: [‘Investigational New Drug Applications’, ‘Meetings’, ‘Clinical Trial Design’, ‘Adverse Event Reporting’]

SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt: [‘Administrative Guidances’, ‘Submission Guidances’, ‘Compliance and Enforcement’]

SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt: [‘Submission Guidances’, ‘Compliance and Enforcement’, ‘Labeling Guidances’]

SOPP-8507V4-IllegitimateProds-Final.txt: [‘Compliance and Enforcement’, ‘Inspections’, ‘Recalls’]

SOPP-8717–Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt: [‘Product Quality Guidances’, ‘Submission Guidances’, ‘Administrative Guidances’]

SOPP9151-061306.txt: [‘Cellular and Gene Therapies’, ‘Inspections’, ‘Tissues’]
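
For readers who want to reproduce this labeling step, the following is a minimal sketch of how the prompt could be sent to Anthropic’s Claude 3 Opus through the Amazon Bedrock runtime API with boto3. The Region, token limit, and prompt assembly shown here are illustrative assumptions rather than the exact implementation used in this prototype.

import json
import boto3

# Sketch of invoking Anthropic's Claude 3 Opus on Amazon Bedrock with the labeling prompt;
# the Region, token limit, and prompt assembly below are illustrative assumptions.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def label_sop(procedure: str, possible_topics: str) -> str:
    prompt = (
        "Act as an expert who understands Biologics procedures.\n\n"
        "The following information describes the details of a Biologics procedure. "
        "Choose the topics covered in the description only from the following list of possible topics:\n\n"
        f"{possible_topics}\n\n"
        f"<procedure_description>\n{procedure}\n</procedure_description>\n\n"
        "Explain your thinking, and end the explanation by generating a JSON string array "
        "containing all of the matching topic names."
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-opus-20240229-v1:0",  # confirm availability in your Region
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["content"][0]["text"]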

Step 3: To find the relationships between the regulatory guidance documents and the SOPs, we followed a similar approach to label the regulatory changes with the most specific topics or subtopics from the built taxonomy.
The following is an example prompt to label regulatory guidance documents with taxonomy topics:

Human:
Act as an expert who understands Biologics procedures. The following information describes a regulatory guidance or change that affects how certain Biologics procedures are performed. Please choose the main topic covered in the change description from the following list of possible topics:
{topics_list}
<regulatory_guidance_description>
{regulatory_guidance} </regulatory_guidance_description>
Explain your thinking, and end the explanation by generating an XML item called <topic> with the relevant topic string in it. Make sure the topic is the most specific one available from the list of possible topics, subtopics, and descendants of those.
You must choose a single leaf-level XML node from the list of possible topics. The correct answer may be in a <topic> node, or <subtopic> or <subsubtopic>. Choose only one, and return the <name> of the selected node inside an XML item called <topic>.
Assistant:

Step 4: If the topic assigned to a regulatory change document matches one of the topics of an SOP, we can conclude the regulatory change is related to the SOP.
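
As a simple illustration of this matching step, the following sketch assumes the topic labels from steps 2 and 3 have already been collected into Python dictionaries (the structures and example values are illustrative):

# Topic labels produced in steps 2 and 3 (structures and values shown for illustration).
sop_topics = {
    "SOPP 8005": {"Administrative Guidances", "Dispute Resolution"},
    "SOPP 9151": {"Cellular and Gene Therapies", "Inspections", "Tissues"},
}
regulation_topic = {
    "Formal_Dispute_Resolution.txt": "Dispute Resolution",
    "Regulation_Of_HCT_Ps_Small_Entity_Compliance_Guide.txt": "Cellular and Gene Therapies",
}

# A regulatory change is related to an SOP when its assigned topic appears among the SOP's topics.
related_sops = {
    reg: [sop for sop, topics in sop_topics.items() if topic in topics]
    for reg, topic in regulation_topic.items()
}
print(related_sops)
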
The following are example outputs of regulatory guidance topic alignment using Amazon Bedrock with the Anthropic Claude 3 Opus model.
The first example uses regulatory guidance Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry. The aligned taxonomy topic is Cellular and Gene Therapies. This topic aligns with SOPP 9151.

For reg change Regs/Regulation_Of_HCT_Ps_Small_Entity_Compliance_Guide.txt:

The regulatory guidance is focused on human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides an overview of the regulations in 21 CFR Part 1271 that apply to HCT/Ps.

The key topics covered include:

Criteria for regulation solely under section 361 of the PHS Act vs as drugs, devices and/or biological products
Registration and listing requirements for HCT/P establishments
Donor eligibility requirements
Current Good Tissue Practice (CGTP) requirements
FDA inspection and enforcement related to HCT/P establishments

Looking at the list of possible topics, the most relevant high-level topic is:

<topic>

  <name>Product Specific Guidances</name>

Within that, the most applicable subtopic is:

<subtopic>

  <name>Cellular and Gene Therapies</name>

Therefore, the most specific and relevant topic for this regulatory guidance is:

<topic>Cellular and Gene Therapies</topic>

The Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulatory guidance was aligned with the Dispute Resolution topic, which was also identified as a key topic covered in SOPP 8005.

For reg change Regs/Formal_Dispute_Resolution.txt:

The regulatory guidance description is about formal dispute resolution procedures for sponsors to appeal scientific and/or medical disputes to management above the division level at CDER and CBER. This falls under the broad topic of “Administrative Guidances”, and the specific subtopic of “Dispute Resolution”.

Therefore, the most specific and appropriate topic for this guidance is:

<topic>Dispute Resolution</topic>

The Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulatory guidance was mapped to the Investigational New Drug Applications topic, which was found to be aligned with the content covered in SOPP 8201, indicating a strong relationship between the guidance and this particular SOP:
For reg change Regs/Submitting_And_Reviewing_Complete_Responses_To_Clinical_Holds.txt:
The regulatory guidance is about the process for submitting and reviewing responses to clinical holds on INDs. The key points are:

When FDA imposes a clinical hold on an IND, the study cannot proceed until the sponsor submits a complete response addressing all the clinical hold issues, and FDA notifies the sponsor they can proceed.
The guidance describes what the sponsor should include in the complete response, how to submit it, and how FDA will review and respond to it within 30 days.
It also covers procedural details like how FDA will track and measure the 30-day response timeline for PDUFA goals.

Looking at the list of possible topics, this guidance falls under:

<topic>

  <name>Submission Guidances</name>

  <subtopic>

    <name>Investigational New Drug Applications</name>

  </subtopic>

</topic>

Since it is specifically about the process for responding to clinical holds on INDs, the most relevant leaf-level topic is:

<topic>Investigational New Drug Applications</topic>

The taxonomic alignment approach was effective in accurately identifying the relationships between the regulatory changes and the SOPs in the test dataset.
Learnings
The following table summarizes our observations; SOPs that were incorrectly matched are flagged as (misidentified).

Regulation: Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry
Full text match: SOPP 9151. Taxonomy topic match: SOPP 9151. Text similarity (vector embeddings): SOPP 9151. Text similarity (keyword search): SOPP 8301 (misidentified).

Regulation: Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry
Full text match: SOPP 8005. Taxonomy topic match: SOPP 8005. Text similarity (vector embeddings): SOPP 8717 (misidentified). Text similarity (keyword search): SOPP 8201 (misidentified).

Regulation: Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry
Full text match: SOPP 8201. Taxonomy topic match: SOPP 8201. Text similarity (vector embeddings): SOPP 8717 (misidentified). Text similarity (keyword search): SOPP 8201.

The combination of the full text matching and taxonomy-based topic matching approaches, using Amazon Bedrock and the Anthropic Claude 3 Opus model, enabled accurate identification of the SOPs most closely related to the regulation guidance documents in the dataset. In contrast, the text similarity methods using vector embeddings and keyword search were less successful in correctly matching the SOPs to the relevant regulatory guidance documents.
Conclusion
In this post, we explored various approaches to quickly identify the relationships between regulatory changes and an organization’s SOPs, using Amazon Bedrock and Anthropic’s Claude 3 Opus model. The methods we evaluated included full text matching, text similarity using vector embeddings and keyword search, and a taxonomy-based topic alignment approach.
Our findings indicate that the full text matching and taxonomy-based topic matching were the most effective in accurately identifying the SOPs most closely related to the regulation guidance documents in the test dataset. In contrast, the text similarity techniques using vector embeddings and keyword search were less reliable in consistently matching the SOPs to the relevant regulatory documents.
Both the full text matching and taxonomy-based approaches can be viable options for organizations to assess the relationships between regulatory changes and their internal SOPs. The full text matching might provide more accurate results but requires providing the complete text of the SOP or input document, which could have cost implications.
The taxonomy-based approach, on the other hand, offers a structured way to map the content of the documents to a customizable topic hierarchy. Although the initial taxonomy might not be complete or fully accurate, it can be further enriched and tailored to an organization’s specific needs. If you choose a taxonomy-based approach, you can use a machine-generated starting point and then refine it to better suit your domain and use case requirements.
By adopting the taxonomy-based approach and adapting it to their specific needs, organizations can not only identify the relationships between regulatory changes and SOPs, they can also assess the potential impact of regulatory changes on their internal procedures. This can help streamline the process of screening SOPs against new regulations and fast-track the impact assessment in regulated industries like life sciences and others.
If you want to implement a similar solution in your AWS environment, reach out to your AWS account team for assistance.
I would like to acknowledge Greg Sommerville, Thomaz Silva and Murtuza Bootwala for their contributions to this blog. It couldn’t have been done without them.

About the Author
Ganesh Raam Ramadurai is a Senior Technical Program Manager at Amazon Web Services (AWS), where he leads the PACE (Prototyping and Cloud Engineering) team. He specializes in delivering innovative, AI/ML and Generative AI-driven prototypes that help AWS customers explore emerging technologies and unlock real-world business value. With a strong focus on experimentation, scalability, and impact, Ganesh works at the intersection of strategy and engineering—accelerating customer innovation and enabling transformative outcomes across industries.

The Legal Accountability of AI-Generated Deepfakes in Election Misinfo …

How Deepfakes Are Created

Generative AI models enable the creation of highly realistic fake media. Most deepfakes today are produced by training deep neural networks on real images, video or audio of a target person. The two predominant AI architectures are generative adversarial networks (GANs) and autoencoders. A GAN consists of a generator network that produces synthetic images and a discriminator network that tries to distinguish fakes from real data. Through iterative training, the generator learns to produce outputs that increasingly fool the discriminator¹. Autoencoder-based tools similarly learn to encode a target face and then decode it onto a source video. In practice, deepfake creators use accessible software: open-source tools like DeepFaceLab and FaceSwap dominate video face-swapping (one estimate suggests DeepFaceLab was used for over 95% of known deepfake videos)². Voice-cloning tools (often built on similar AI principles) can mimic a person’s speech from minutes of audio. Commercial platforms like Synthesia allow text-to-video avatars (turning typed scripts into lifelike “spokespeople”), which have already been misused in disinformation campaigns³. Even mobile apps (e.g. FaceApp, Zao) let users do basic face swaps in minutes⁴. In short, advances in GANs and related models make deepfakes cheaper and easier to generate than ever.

Diagram of a generative adversarial network (GAN): A generator network creates fake images from random input and a discriminator network distinguishes fakes from real examples. Over time the generator improves until its outputs “fool” the discriminator⁵
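
As a toy illustration of the adversarial training loop described above (and not of any specific deepfake tool), a GAN update step in PyTorch might look like the following, using a one-dimensional stand-in for image data:

import torch
import torch.nn as nn

# Toy illustration of the adversarial training loop (1-D stand-in data, not images).
latent_dim, data_dim, batch = 16, 64, 32
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(batch, data_dim)        # stand-in for real training samples
    fake = G(torch.randn(batch, latent_dim))   # generator output from random noise

    # Discriminator update: score real samples as 1 and generated samples as 0.
    d_loss = loss_fn(D(real), torch.ones(batch, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: produce samples the discriminator scores as real.
    g_loss = loss_fn(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()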

During creation, a deepfake algorithm is typically trained on a large dataset of real images or audio from the target. The more varied and high-quality the training data, the more realistic the deepfake. The output often then undergoes post-processing (color adjustments, lip-syncing refinements) to enhance believability¹. Technical defenses focus on two fronts: detection and authentication. Detection uses AI models to spot inconsistencies (blinking irregularities, audio artifacts or metadata mismatches) that betray a synthetic origin⁵. Authentication embeds markers before dissemination – for example, invisible watermarks or cryptographically signed metadata indicating authenticity⁶. The EU AI Act will soon mandate that major AI content providers embed machine-readable “watermark” signals in synthetic media⁷. However, as GAO notes, detection is an arms race – even a marked deepfake can sometimes evade notice – and labels alone don’t stop false narratives from spreading⁸⁹.

Deepfakes in Recent Elections: Examples

Deepfakes and AI-generated imagery already have made headlines in election cycles around the world. In the 2024 U.S. primary season, a digitally-altered audio robocall mimicked President Biden’s voice urging Democrats not to vote in the New Hampshire primary. The caller (“Susan Anderson”) was later fined $6 million by the FCC and indicted under existing telemarketing laws¹⁰¹¹. (Importantly, FCC rules on robocalls applied regardless of AI: the perpetrator could have used a voice actor or recording instead.) Also in 2024, former President Trump posted on social media a collage implying that pop singer Taylor Swift endorsed his campaign, using AI-generated images of Swift in “Swifties for Trump” shirts¹². The posts sparked media uproar, though analysts noted the same effect could have been achieved without AI (e.g., by photoshopping text on real images)¹². Similarly, Elon Musk’s X platform carried AI-generated clips, including a parody “Ad” depicting Vice-President Harris’s voice via an AI clone¹³.

Beyond the U.S., deepfake-like content has appeared globally. In Indonesia’s 2024 presidential election, a video surfaced on social media in which a convincingly generated image of the late President Suharto appeared to endorse the candidate of the Golkar Party. Days later, the endorsed candidate (who is Suharto’s son-in-law) won the presidency¹⁴. In Bangladesh, a viral deepfake video superimposed the face of opposition leader Rumeen Farhana onto a bikini-clad body – an incendiary fabrication designed to discredit her in the conservative Muslim-majority society¹⁵. Moldova’s pro-Western President Maia Sandu has been repeatedly targeted by AI-driven disinformation; one deepfake video falsely showed her resigning and endorsing a Russian-friendly party, apparently to sow distrust in the electoral process¹⁶. Even in Taiwan (amidst tensions with China), a TikTok clip circulated that synthetically portrayed a U.S. politician making foreign-policy statements – stoking confusion ahead of Taiwanese elections¹⁷. In Slovakia’s recent campaign, AI-generated audio mimicking the liberal party leader suggested he plotted vote-rigging and beer-price hikes – instantly spreading on social media just days before the election¹⁸. These examples show that deepfakes have touched diverse polities (from Bangladesh and Indonesia to Moldova, Slovakia, India and beyond), often aiming to undermine candidates or confuse voters¹⁵¹⁸.

Notably, many of the most viral “deepfakes” in 2024 were actually circulated as obvious memes or claims, rather than subtle deceptions. Experts observed that outright undetectable AI deepfakes were relatively rare; more common were AI-generated memes plainly shared by partisans, or cheaply doctored “cheapfakes” made with basic editing tools¹³¹⁹. For instance, social media was awash with memes of Kamala Harris in Soviet garb or of Black Americans holding Trump signs¹³, but these were typically used satirically, not meant to be secretly believed. Nonetheless, even unsophisticated fakes can sway opinion: a U.S. study found that false presidential ads (not necessarily AI-made) did change voter attitudes in swing states. In sum, deepfakes are a real and growing phenomenon in election campaigns²⁰²¹ worldwide – a trend taken seriously by voters and regulators alike.

U.S. Legal Framework and Accountability

In the U.S., creators and distributors of deepfake election misinformation face a patchwork of legal tools, but no single comprehensive federal “deepfake law.” Existing laws relevant to disinformation include statutes against impersonating government officials, campaign finance and disclosure rules (such as the Bipartisan Campaign Reform Act, which requires disclaimers on political ads), and targeted statutes covering criminal electioneering communications. In some cases ordinary laws have been stretched to fit: the New Hampshire robocall was pursued under the Telephone Consumer Protection Act and mail/telemarketing fraud provisions, resulting in the $6M fine and a criminal charge. Similarly, voice impostors can potentially violate laws against “false advertising” or “unlawful corporate communications.” However, these laws were enacted before AI, and litigators have warned that they often do not fit neatly. For example, deceptive deepfake claims not tied to a specific victim do not easily fit into defamation or privacy torts, and voter intimidation laws (which prohibit threats or coercion) leave a gap for non-threatening falsehoods about voting logistics or endorsements.

Recognizing these gaps, some courts and agencies are invoking other theories. The U.S. Department of Justice has recently charged individuals under broad fraud statutes (e.g. for a plot to impersonate an aide to swing votes in 2020), and state attorneys general have considered deepfake misinformation as interference with voting rights. Notably, the Federal Election Commission (FEC) is preparing to enforce new rules: in April 2024 it issued an advisory opinion limiting “non-candidate electioneering communications” that use falsified media, effectively requiring that political ads use only real images of the candidate. If finalized, that would make it unlawful for campaigns to pay for ads depicting a candidate saying things they never did. Similarly, the Federal Trade Commission (FTC) and Department of Justice (DOJ) have signaled that purely commercial deepfakes could violate consumer protection or election laws (for example, liability for mass false impersonation or for foreign-funded electioneering).

U.S. Legislation and Proposals

Federal lawmakers have proposed new statutes. The DEEPFAKES Accountability Act (H.R.5586 in the 118th Congress) would, among other things, impose a disclosure requirement: political ads featuring a manipulated media likeness would need clear disclaimers identifying the content as synthetic. It also increases penalties for producing false election videos or audio intended to influence the vote. While not yet enacted, supporters argue it would provide a uniform rule for all federal and state campaigns. The Brennan Center supports transparency requirements over outright bans, suggesting laws should narrowly target deceptive deepfakes in paid ads or certain categories (e.g. false claims about time/place/manner of voting) while carving out parody and news coverage.

At the state level, over 20 states have passed deepfake laws specifically for elections. For example, Florida and California forbid distributing falsified audio/visual media of candidates with intent to deceive voters (though Florida’s law exempts parody). Some states (like Texas) define “deepfake” in statutes and allow candidates to sue or revoke candidacies of violators. These measures have had mixed success: courts have struck down overly broad provisions that acted as prior restraints (e.g. Minnesota’s 2023 law was challenged for threatening injunctions against anyone “reasonably believed” to violate it). Critically, these state laws raise First Amendment issues: political speech is highly protected, so any restriction must be tightly tailored. Already, Texas and Virginia statutes are under legal review, and Elon Musk’s company has sued to block California’s law (which requires platforms to label or block deepfakes), arguing it is unconstitutional. In practice, most lawsuits have so far centered on defamation or intellectual property (for instance, a celebrity suing over a botched celebrity-deepfake video), rather than election-focused statutes.

Policy Recommendations: Balancing Integrity and Speech

Given the rapidly evolving technology, experts recommend a multi-pronged approach. Most stress transparency and disclosure as core principles. For example, the Brennan Center urges requiring any political communication that uses AI-synthesized images or voice to include a clear label. This could be a digital watermark or a visible disclaimer. Transparency has two advantages: it forces campaigns and platforms to “own” the use of AI, and it alerts audiences to treat the content with skepticism.

Outright bans on all deepfakes would likely violate free speech, but targeted bans on specific harms (e.g. automated phone calls impersonating voters, or videos claiming false polling information) may be defensible. Indeed, Florida already penalizes misuse of recordings in voter suppression. Another recommendation is limited liability: tying penalties to demonstrable intent to mislead, not to the mere act of content creation. Both U.S. federal proposals and EU law generally condition fines on the “appearance of fraud” or deception.

Technical solutions can complement laws. Watermarking original media (as encouraged by the EU AI Act) could deter the reuse of authentic images in doctored fakes. Open tools for deepfake detection – some supported by government research grants – should be deployed by fact-checkers and social platforms. Making detection datasets publicly available (e.g. the MIT OpenDATATEST) helps improve AI models to spot fakes. International cooperation is also urged: cross-border agreements on information-sharing could help trace and halt disinformation campaigns. The G7 and APEC have both recently committed to fighting election interference via AI, which may lead to joint norms or rapid response teams.

Ultimately, many analysts believe the strongest “cure” is a well-informed public: education campaigns to teach voters to question sensational media, and a robust independent press to debunk falsehoods swiftly. While the law can penalize the worst offenders, awareness and resilience in the electorate are crucial buffers against influence operations. As Georgia Tech’s Sean Parker quipped in 2019, “the real question is not if deepfakes will influence elections, but who will be empowered by the first effective one.” Thus policies should aim to deter malicious use without unduly chilling innovation or satire.

References:

2024 Deepfakes Guide and Statistics

https://www.wired.com/story/synthesia-ai-deepfakes-it-control-riparbelli/.

https://www.gao.gov/products/gao-24-107292.

https://technologyquotient.freshfields.com/post/102jb19/eu-ai-act-unpacked-8-new-rules-on-deepfakes.

https://knightcolumbia.org/blog/we-looked-at-78-election-deepfakes-political-misinformation-is-not-an-ai-problem.

https://www.npr.org/2024/12/21/nx-s1-5220301/deepfakes-memes-artificial-intelligence-elections.

https://apnews.com/article/artificial-intelligence-elections-disinformation-chatgpt-bc283e7426402f0b4baa7df280a4c3fd.

https://www.lawfaremedia.org/article/new-and-old-tools-to-tackle-deepfakes-and-election-lies-in-2024.

https://www.brennancenter.org/our-work/research-reports/regulating-ai-deepfakes-and-synthetic-media-political-arena.

https://firstamendment.mtsu.edu/article/political-deepfakes-and-elections/.

https://www.ncsl.org/technology-and-communication/deceptive-audio-or-visual-media-deepfakes-2024-legislation.

https://law.unh.edu/sites/default/files/media/2022/06/nagumotu_pp113-157.pdf.

https://dfrlab.org/2024/10/02/brazil-election-ai-research/.

https://dfrlab.org/2024/11/26/brazil-election-ai-deepfakes/.

https://freedomhouse.org/article/eu-digital-services-act-win-transparency.


Deploy Amazon SageMaker Projects with Terraform Cloud

Amazon SageMaker Projects empower data scientists to self-serve Amazon Web Services (AWS) tooling and infrastructure to organize all entities of the machine learning (ML) lifecycle, and further enable organizations to standardize and constrain the resources available to their data science teams in pre-packaged templates.
For AWS customers using Terraform to define and manage their infrastructure-as-code (IaC), the current best practice for enabling Amazon SageMaker Projects carries a dependency on AWS CloudFormation to facilitate integration between AWS Service Catalog and Terraform. This blocks enterprise customers whose IT governance prohibits the use of vendor-specific IaC such as CloudFormation from enabling SageMaker Projects.
This post outlines how you can enable SageMaker Projects with Terraform Cloud, removing the CloudFormation dependency.
AWS Service Catalog engine for Terraform Cloud
SageMaker Projects are directly mapped to AWS Service Catalog products. To obviate the use of CloudFormation, these products must be designated as Terraform products that use the AWS Service Catalog Engine (SCE) for Terraform Cloud. This module, actively maintained by HashiCorp, contains AWS-native infrastructure for integrating Service Catalog with Terraform Cloud so that your Service Catalog products are deployed using the Terraform Cloud platform.
By following the steps in this post, you can use the Service Catalog engine to deploy SageMaker Projects directly from Terraform Cloud.
Prerequisites
To successfully deploy the example, you must have the following:

An AWS account with the necessary permissions to create and manage SageMaker Projects and Service Catalog products. See the Service Catalog documentation for more information on Service Catalog permissions.
An existing Amazon SageMaker Studio domain with an associated Amazon SageMaker user profile. The SageMaker Studio domain must have SageMaker Projects enabled. See Use quick setup for Amazon SageMaker AI.
A Unix terminal with the AWS Command Line Interface (AWS CLI) and Terraform installed. See Installing or updating to the latest version of the AWS CLI and Install Terraform for more information about installation.
An existing Terraform Cloud account with the necessary permissions to create and manage workspaces. See the following tutorials to quickly create your own account:

HCP Terraform – intro and sign Up
Log In to HCP Terraform from the CLI

See Terraform teams and organizations documentation for more information about Terraform Cloud permissions.
Deployment steps

Clone the sagemaker-custom-project-templates repository from the AWS Samples GitHub to your local machine, update the submodules, and navigate to the mlops-terraform-cloud directory.

$ git clone https://github.com/aws-samples/sagemaker-custom-project-templates.git
$ cd sagemaker-custom-project-templates
$ git submodule update --init --recursive
$ cd mlops-terraform-cloud

The preceding code creates a Service Catalog portfolio, adds the SageMaker Project template as a Service Catalog product to the portfolio, allows the SageMaker Studio role to access the Service Catalog product, and adds the necessary tags to make the product visible in SageMaker Studio. See Create Custom Project Templates in the SageMaker Projects documentation for more information about this process.

Log in to your Terraform Cloud account

$ terraform login

This prompts your browser to sign in to your HCP account and generates a security token. Copy this security token and paste it back into your terminal.

Navigate to your AWS account and retrieve the SageMaker user role Amazon Resource Name (ARN) for the SageMaker user profile associated with your SageMaker Studio domain. This role is used to grant SageMaker Studio users permissions to create and manage SageMaker Projects.

In the AWS Management Console for Amazon SageMaker, choose Domains from the navigation pane
Select your studio domain
Under User Profiles, select your user profile
In the User Details, copy the ARN

Create a tfvars file with the necessary variables for the Terraform Cloud workspace

$ cp terraform.tfvars.example terraform.tfvars

Set the appropriate values in the newly created tfvars file. The following variables are required:

tfc_organization = "my-tfc-organization"
tfc_team = "aws-service-catalog"
token_rotation_interval_in_days = 30
sagemaker_user_role_arns = ["arn:aws:iam::XXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole"]

Make sure that your desired Terraform Cloud (TFC) organization has the proper entitlements and that your tfc_team is unique for this deployment. See the Terraform Organizations Overview for more information on creating organizations.

Initialize the Terraform Cloud workspace

$ terraform init

Apply the Terraform Cloud workspace

$ terraform apply

Go back to the SageMaker console using the user profile associated with the SageMaker user role ARN that you copied previously and choose Open Studio application
In the navigation pane, choose Deployments and then choose Projects
Choose Create project, select the mlops-tf-cloud-example product and then choose Next
In Project details, enter a unique name for the template and, optionally, a project description. Choose Create
In a separate tab or window, go back to your Terraform Cloud account’s Workspaces and you’ll see a workspace being provisioned directly from your SageMaker Project deployment. The naming convention of the Workspace will be <ACCOUNT_ID>-<SAGEMAKER_PROJECT_ID>

Further customization
This example can be modified to include custom Terraform in your SageMaker Project template. To do so, define your Terraform in the mlops-product/product directory. When ready to deploy, be sure to archive and compress this Terraform using the following command:

$ cd mlops-product
$ tar -czf product.tar.gz product

Cleanup
To remove the resources deployed by this example, run the following from the project directory:

$ terraform destroy

Conclusion
In this post you defined, deployed, and provisioned a SageMaker Project custom template purely in Terraform. With no dependencies on other IaC tools, you can now enable SageMaker Projects strictly within your Terraform Enterprise infrastructure.

About the author
Max Copeland is a Machine Learning Engineer for AWS, leading customer engagements spanning ML-Ops, data science, data engineering, and generative AI.

How ZURU improved the accuracy of floor plan generation by 109% using …

ZURU Tech is on a mission to change the way we build, from town houses and hospitals to office towers, schools, apartment blocks, and more. Dreamcatcher is a user-friendly platform developed by ZURU that allows users with any level of experience to collaborate in the building design and construction process. With the simple click of a button, an entire building can be ordered, manufactured and delivered to the construction site for assembly.
ZURU collaborated with the AWS Generative AI Innovation Center and AWS Professional Services to implement a more accurate text-to-floor plan generator using generative AI. With it, users can specify a description of the building they want to design using natural language. For example, instead of designing the foundation, walls, and key aspects of a building from scratch, a user could enter, “Create a house with three bedrooms, two bathrooms, and an outdoor space for entertainment.” The solution would generate a unique floor plan within the 3D design space, allowing users with a non-technical understanding of architecture and construction to create a well-designed house.
In this post, we show you why a solution using a large language model (LLM) was chosen. We explore how model selection, prompt engineering, and fine-tuning can be used to improve results. And we explain how the team made sure they could iterate quickly through an evaluation framework using key services such as Amazon Bedrock and Amazon SageMaker.
Understanding the challenge
The foundation for generating a house within Dreamcatcher’s 3D building system is to first confirm we can generate a 2D floor plan based on the user’s prompt. The ZURU team found that generating 2D floor plans, such as the one in the following image, using different machine learning (ML) techniques requires success across two key criteria.

First, the model must understand rooms, the purpose of each room, and their orientation to one another within a two-dimensional vector system. This can also be described as how well the model can adhere to the features described in a user’s prompt. Second, there is also a mathematical component to making sure rooms adhere to criteria such as specific dimensions and floor space. To be certain that they were on the right track and to allow for fast R&D iteration cycles, the ZURU team created a novel evaluation framework that measures each model’s output accuracy across these two key criteria.
The ZURU team initially looked at using generative adversarial networks (GAN) for floor plan generation, but experimentation with a GPT2 LLM had positive results based on the test framework. This reinforced the idea that an LLM-based approach could provide the required accuracy for a text-to–floor plan generator.
Improving the results
To improve on the results of the GPT2 model, we worked together and defined two further experiments. The first was a prompt engineering approach: using Anthropic’s Claude 3.5 Sonnet in Amazon Bedrock, the team was able to evaluate the impact of a leading proprietary model with contextual examples included in the prompts. The second approach focused on fine-tuning Llama 3 8B variants to evaluate the improvement in accuracy when the model weights are directly influenced using high-quality examples.
Dataset preparation and analysis
To create the initial dataset, floor plans from thousands of houses were gathered from publicly available sources and reviewed by a team of in-house architects. To streamline the review process, the ZURU team built a custom application with a simple yes/no decision mechanism similar to those found in popular social matching applications, allowing architects to quickly approve plans compatible with the ZURU building system or reject those with disqualifying features. This intuitive approach significantly accelerated ZURU’s evaluation process while maintaining clear decision criteria for each floor plan.
To further enhance this dataset, we began with careful dataset preparation, including filtering out low-quality data (30%) by evaluating the metric score of the ground truth dataset. Following this filtering mechanism, data points not achieving 100% accuracy on instruction adherence were removed from the training dataset. This data preparation technique helped to improve the efficiency and quality of the fine-tuning and prompt engineering by more than 20%.
During our exploratory data analysis we found that the dataset contained prompts that can match multiple floor plans as well as floor plans that could match multiple prompts. By moving all related prompt and floor plan combinations to the same data split (either training, validation, or testing) we were able to prevent data leakage and promote robust evaluation.
Prompt engineering approach
As part of our approach, we implemented dynamic matching for few-shot prompting that is different from traditional static sampling methods. Combining this with the implementation of prompt decomposition, we could increase the overall accuracy of the generated floor plan content.
With a dynamic few-shot prompting methodology, we retrieve the most relevant examples at run time based on the details of the input prompt from a high-quality dataset and provide this as part of the prompt to the generative AI model.
The dynamic few-shot prompting approach is further enhanced by prompt decomposition, where we break down complex tasks into smaller, more manageable components to achieve better results from language models. By decomposing queries, each component can be optimized for its specific purpose. We found that combining these methods resulted in improved relevancy in example selection and lower latency in retrieving the example data, leading to better performance and higher quality results.
Prompt engineering architecture
The workflow and architecture implemented for prototyping, shown in the following figure, demonstrate a systematic approach to AI model optimization. When a user query such as “Build me a house with three bedrooms and two bathrooms” is entered, the workflow follows these steps:

We use prompt decomposition to execute three smaller tasks that retrieve highly relevant examples that match the same features for a house that the user has requested
We use the relevant examples and inject them into the prompt to perform dynamic few-shot prompting to generate a floor plan
We use the reflection technique to ask the generative AI model to self-reflect and assess whether the generated content adheres to our requirements

Deep dive on workflow and architecture
The first step in our workflow is to understand the unique features of the house, which we can use as search criteria to find the most relevant examples in the subsequent steps. For this step, we use Amazon Bedrock, which provides a serverless API-driven endpoint for inference. From the wide range of generative AI models offered by Amazon Bedrock, we choose Mistral 7B, which provides the right balance between cost, latency, and accuracy required for this small decomposed step.
The second step is to search for the most relevant examples using the unique features we found. We use Amazon Bedrock Knowledge Bases backed by Amazon OpenSearch Serverless as a vector database to implement metadata filtering and hybrid search to retrieve the most relevant record identifiers. Amazon Simple Storage Service (Amazon S3) is used for storage of the data set, and Amazon Bedrock Knowledge Bases provides a managed solution for vectorizing and indexing the metadata into the vector database.
In the third step, we retrieve the actual floor plan data by record identifier using Amazon DynamoDB. By splitting the search and retrieval of floor plan examples into two steps, we were able to use purpose-built services with Amazon OpenSearch, allowing for low-latency search, and DynamoDB for low-latency data retrieval by key value leading to optimized performance.
After retrieving the most relevant examples for the user’s prompt, in step four we use Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet as a model with leading benchmarks in deep reasoning and mathematics to generate our new floor plan.
Finally, in step five, we implement reflection. We use Amazon Bedrock with Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock again and pass the original prompt, instructions, examples and newly generated floor plan back with a final instruction for the model to reflect and double-check its generated floor plan and correct mistakes.
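
The following is a minimal sketch of how steps one through five could be wired together with boto3. The knowledge base ID, DynamoDB table and key names, plan_id metadata attribute, prompts, and model identifiers are placeholders for illustration, not ZURU’s actual implementation.

import boto3

# Sketch of the five-step workflow; the knowledge base ID, table and key names, metadata
# attribute (plan_id), prompts, and model identifiers are illustrative placeholders.
kb_runtime = boto3.client("bedrock-agent-runtime")
bedrock = boto3.client("bedrock-runtime")
plans_table = boto3.resource("dynamodb").Table("floor-plan-examples")

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

user_query = "Build me a house with three bedrooms and two bathrooms"

# Step 1: decompose the request into searchable features with a small, fast model.
features = ask("mistral.mistral-7b-instruct-v0:2",
               f"List the key house features requested in: {user_query}")

# Step 2: hybrid search over the indexed example metadata in the knowledge base.
hits = kb_runtime.retrieve(
    knowledgeBaseId="EXAMPLEKBID",
    retrievalQuery={"text": features},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3,
                                                          "overrideSearchType": "HYBRID"}},
)["retrievalResults"]

# Step 3: fetch the full floor plan examples by record identifier from DynamoDB.
examples = [plans_table.get_item(Key={"plan_id": hit["metadata"]["plan_id"]})["Item"]
            for hit in hits]

# Step 4: dynamic few-shot generation with Claude 3.5 Sonnet.
sonnet = "anthropic.claude-3-5-sonnet-20240620-v1:0"
draft = ask(sonnet, f"Examples:\n{examples}\n\nGenerate a floor plan for: {user_query}")

# Step 5: reflection pass to self-check and correct the generated plan.
final_plan = ask(sonnet,
                 f"Review this floor plan against the request '{user_query}' and correct any mistakes:\n{draft}")
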
Fine-tuning approach
We explored two methods for optimizing LLMs for automated floorplan generation: full parameter fine-tuning and Low-Rank Adaptation (LoRA)–based fine-tuning. Full fine-tuning adjusts all LLM parameters, which requires significant memory and training time. In contrast, LoRA tunes only a small subset of parameters, reducing memory requirements and training time.
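
As an illustration of the difference, the following sketch shows how a LoRA adapter could be attached to a Llama 3.1 8B base model with the Hugging Face PEFT library; the model identifier and hyperparameters are common placeholder values, not the ones used in these experiments.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative LoRA setup; the model ID and hyperparameters are common placeholder values.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights require gradients

Because only the adapter weights are trained, the resulting checkpoint is small (tens of megabytes rather than the full model), which is consistent with the 89 MB checkpoint mentioned later in this post.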
Workflow and architecture
We implemented our workflow containing the data processing, fine-tuning, and inference and testing steps shown in the following figure, all within a SageMaker JupyterLab notebook provisioned with an ml.p4d.24xlarge instance, giving us access to NVIDIA A100 GPUs. Because we used a Jupyter notebook and ran all parts of our workflow interactively, we were able to iterate quickly and debug our experiments while maturing the training and testing scripts.

Deep dive on fine tuning workflow
One key insight from our experiments was the critical importance of dataset quality and diversity. Further to our initial dataset preparation, when fine-tuning a model, we found that carefully selecting training samples with larger diversity helped the model learn more robust representations. Additionally, although larger batch sizes generally improved performance (within memory constraints), we had to carefully balance this against computational resources (320 GB of GPU memory in an ml.p4d.24xlarge instance) and training time (ideally within 1–2 days).
We conducted several iterations to optimize performance, experimenting with various approaches including initial few-sample quick instruction fine-tuning, larger dataset fine-tuning, fine-tuning with early stopping, comparing Llama 3.1 8B and Llama 3 8B models, and varying instruction length in fine-tuning samples. Through these iterations, we found that full fine-tuning of the Llama 3.1 8B model using a curated dataset of 200,000 samples produced the best results.
The training process for full fine-tuning Llama 3.1 8B with BF16 and a microbatch size of three involved eight epochs with 30,000 steps, taking 25 hours to complete. In contrast, the LoRA approach showed significant computational efficiency, requiring only 2 hours of training time and producing an 89 MB checkpoint.
Evaluation framework
The testing framework implements an efficient evaluation methodology that optimizes resource utilization and time while maintaining statistical validity. Key components include:

A prompt deduplication system that identifies and consolidates duplicate instructions in the test dataset, reducing computational overhead and enabling faster iteration cycles for model improvement (a minimal sketch of this idea follows the list)
A distribution-based performance assessment that filters unique test cases, promotes representative sampling through statistical analysis, and projects results across the full dataset
A metric-based evaluation that implements scoring across key criteria enabling comparative analysis against both the baseline GPT2 model and other approaches.
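
The following is a minimal sketch of the deduplication and projection idea, assuming a list of test prompts plus generate and score functions; all names and the stub functions are illustrative.

from collections import Counter

# Minimal sketch: generate and score each unique prompt once, then project the result
# across the full test set by weighting each prompt by how often it occurs.
def evaluate(test_prompts, generate, score):
    counts = Counter(test_prompts)
    unique_scores = {prompt: score(prompt, generate(prompt)) for prompt in counts}
    weighted_total = sum(unique_scores[prompt] * count for prompt, count in counts.items())
    return weighted_total / sum(counts.values())

# Example usage with stub functions standing in for the model and the metric.
projected = evaluate(
    ["three bedrooms, two bathrooms"] * 5 + ["two bedrooms, one bathroom"],
    generate=lambda p: f"floor plan for {p}",
    score=lambda p, plan: 1.0 if p in plan else 0.0,
)
print(projected)
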

Results and business impact
To understand how well each approach in our experiment performed, we used the evaluation framework and compared several key metrics. For the purposes of this post, we focus on two of these key metrics. The first reflects how well the model was able to follow users’ instructions to reflect the features required in the house. The second metric looks at how well the features of the house adhered to the instructions mathematically, including dimensions, positioning, and orientation. The following image shows these results in a graph.

We found that the prompt engineering approach with Anthropic’s Claude 3.5 Sonnet as well as the full fine-tuning approach with Llama 3.1 8B increased the instruction adherence quality over the baseline GPT2 model by 109%, showing that, depending on a team’s skillsets, both approaches could be used to improve the quality of understanding an LLM when generating content such as floor plans.
When looking at mathematical correctness, our prompt engineering approach wasn’t able to create significant improvements over the baseline, but full fine-tuning was a clear winner with a 54% increase over the baseline GPT2 results.
The LoRA-based tuning approach achieved lower performance, scoring 20% less on instruction adherence and 50% less on mathematical correctness compared to full fine-tuning, demonstrating the tradeoffs that can be made between time, cost, and hardware on the one hand and model accuracy on the other.
Conclusion
ZURU Tech has set its vision on fundamentally transforming the way we design and construct buildings. In this post, we highlighted the approach to building and improving a text-to–floor plan generator based on LLMs to create a highly usable and streamlined workflow within a 3D-modeling system. We dived into advanced concepts of prompt engineering using Amazon Bedrock and detailed approaches to fine-tuning LLMs using Amazon SageMaker, showing the different tradeoffs you can make to significantly improve the accuracy of the content that is generated.
To learn more about the Generative AI Innovation Center program, get in touch with your account team.

About the Authors
Federico Di Mattia is the team leader and Product Owner of ZURU AI at ZURU Tech in Modena, Italy. With a focus on AI-driven innovation, he leads the development of Generative AI solutions that enhance business processes and drive ZURU’s growth.
Niro Amerasinghe is a Senior Solutions Architect based out of Auckland, New Zealand. With experience in architecture, product development, and engineering, he helps customers in using Amazon Web Services (AWS) to grow their businesses.
Haofei Feng is a Senior Cloud Architect at AWS with over 18 years of expertise in DevOps, IT Infrastructure, Data Analytics, and AI. He specializes in guiding organizations through cloud transformation and generative AI initiatives, designing scalable and secure GenAI solutions on AWS. Based in Sydney, Australia, when not architecting solutions for clients, he cherishes time with his family and Border Collies.
Sheldon Liu is an applied scientist, ANZ Tech Lead at the AWS Generative AI Innovation Center. He partners with enterprise customers across diverse industries to develop and implement innovative generative AI solutions, accelerating their AI adoption journey while driving significant business outcomes.
Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.
Simone Bartoli is a Machine Learning Software Engineer at ZURU Tech, in Modena, Italy. With a background in computer vision, machine learning, and full-stack web development, Simone specializes in creating innovative solutions that leverage cutting-edge technologies to enhance business processes and drive growth.
Marco Venturelli is a Senior Machine Learning Engineer at ZURU Tech in Modena, Italy. With a background in computer vision and AI, he leverages his experience to innovate with generative AI, enriching the Dreamcatcher software with smart features.
Stefano Pellegrini is a Generative AI Software Engineer at ZURU Tech in Italy. Specializing in GAN and diffusion-based image generation, he creates tailored image-generation solutions for various departments across ZURU.
Enrico Petrucci is a Machine Learning Software Engineer at ZURU Tech, based in Modena, Italy. With a strong background in machine learning and NLP tasks, he currently focuses on leveraging Generative AI and Large Language Models to develop innovative agentic systems that provide tailored solutions for specific business cases.

Going beyond AI assistants: Examples from Amazon.com reinventing indus …

Generative AI revolutionizes business operations through various applications, including conversational assistants such as Amazon’s Rufus and Amazon Seller Assistant. Additionally, some of the most impactful generative AI applications operate autonomously behind the scenes, an essential capability that empowers enterprises to transform their operations, data processing, and content creation at scale. These non-conversational implementations, often in the form of agentic workflows powered by large language models (LLMs), execute specific business objectives across industries without direct user interaction.
Non-conversational applications offer unique advantages such as higher latency tolerance, batch processing, and caching, but their autonomous nature requires stronger guardrails and exhaustive quality assurance compared to conversational applications, which benefit from real-time user feedback and supervision.
This post examines four diverse Amazon.com examples of such generative AI applications:

Amazon.com listing creation and catalog data quality improvements – Demonstrating how LLMs are helping selling partners and Amazon.com create higher-quality listings at scale
Prescription processing in Amazon Pharmacy – Showcasing implementation in a highly regulated environment and task decomposition for agentic workflows
Review highlights – Illustrating massive scale batch processing, traditional machine learning (ML) integration, use of smaller LLMs, and cost-effective solution at scale
Amazon Ads creative image and video generation – Highlighting multimodal generative AI and responsible AI practices in creative endeavors

Each case study reveals different aspects of implementing non-conversational generative AI applications, from technical architecture to operational considerations. Throughout these examples, you will learn how the comprehensive suite of AWS services, including Amazon Bedrock and Amazon SageMaker, are the key to success. Finally, we list key learnings commonly shared across these use cases.
Creating high-quality product listings on Amazon.com
Creating high-quality product listings with comprehensive details helps customers make informed purchase decisions. Traditionally, selling partners manually entered dozens of attributes per product. The new generative AI solution, launched in 2024, transforms this process by proactively acquiring product information from brand websites and other sources to improve the customer experience across numerous product categories.
Generative AI simplifies the selling partner experience by enabling information input in various formats such as URLs, product images, or spreadsheets and automatically translating this into the required structure and format. Over 900,000 selling partners have used it, with nearly 80% of generated listing drafts accepted with minimal edits. AI-generated content provides comprehensive product details that help with clarity and accuracy, which can contribute to product discoverability in customer searches.
For new listings, the workflow begins with selling partners providing initial information. The system then generates comprehensive listings using multiple information sources, including titles, descriptions, and detailed attributes. Generated listings are shared with selling partners for approval or editing.
For existing listings, the system identifies products that can be enriched with additional data.
Data integration and processing for a large variety of outputs
The Amazon team built robust connectors for internal and external sources with LLM-friendly APIs using Amazon Bedrock and other AWS services to seamlessly integrate into Amazon.com backend systems.
A key challenge is synthesizing diverse data into cohesive listings across more than 50 attributes, both textual and numerical. LLMs require specific control mechanisms and instructions to accurately interpret ecommerce concepts because they might not perform optimally with such complex, varied data. For example, LLMs might misinterpret “capacity” in a knife block as dimensions rather than number of slots, or mistake “Fit Wear” as a style description instead of a brand name. Prompt engineering and fine-tuning were extensively used to address these cases.
Generation and validation with LLMs
The generated product listings should be complete and correct. To help achieve this, the solution implements a multistep workflow using LLMs for both generation and validation of attributes. This dual-LLM approach helps prevent hallucinations, which is critical when dealing with safety hazards or technical specifications. The team developed advanced self-reflection techniques to make sure the generation and validation processes complement each other effectively.
The following figure illustrates the generation process with validation both performed by LLMs.

Figure 1. Product Listing creation workflow
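
Although the internal implementation is not public, the dual-LLM pattern itself can be sketched as a generate-then-validate loop against Amazon Bedrock; the prompts, model ID, and PASS/FAIL convention below are illustrative assumptions, not Amazon’s production system.

import boto3

# Minimal sketch of a generate-then-validate loop; prompts, model ID, and the PASS
# convention are illustrative assumptions, not Amazon's internal implementation.
bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def call_llm(prompt: str) -> str:
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

def create_listing(source_info: str, max_attempts: int = 3) -> str:
    feedback = ""
    for _ in range(max_attempts):
        # Generation pass: draft title, description, and attributes from the source material.
        draft = call_llm(
            f"Draft a product listing (title, description, key attributes) from these sources:\n"
            f"{source_info}\n{feedback}"
        )
        # Validation pass: a second LLM call checks the draft against the sources.
        verdict = call_llm(
            "Review the draft listing below against the source material. Reply PASS if every "
            "attribute is supported, otherwise list the unsupported attributes.\n\n"
            f"Sources:\n{source_info}\n\nDraft:\n{draft}"
        )
        if verdict.strip().startswith("PASS"):
            return draft
        feedback = f"Correct these issues from the previous draft: {verdict}"
    return draft  # after max attempts, route the listing to human review
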

Multi-layer quality assurance with human feedback
Human feedback is central to the solution’s quality assurance. The process includes Amazon.com experts for initial evaluation and selling partner input for acceptance or edits. This provides high-quality output and enables ongoing enhancement of AI models.
The quality assurance process includes automated testing methods combining ML-, algorithm-, or LLM-based evaluations. Failed listings undergo regeneration, and successful listings proceed to further testing. Using causal inference models, we identify underlying features affecting listing performance and opportunities for enrichment. Ultimately, listings that pass quality checks and receive selling partner acceptance are published, making sure customers receive accurate and comprehensive product information.
The following figure illustrates the workflow of going to production with testing, evaluation, and monitoring of product listing generation.

Figure 2. Product Listing testing and human in the loop workflow

Application-level system optimization for accuracy and cost
Given the high standards for accuracy and completeness, the team adopted a comprehensive experimentation approach with an automated optimization system. This system explores various combinations of LLMs, prompts, playbooks, workflows, and AI tools to iterate for higher business metrics, including cost. Through continuous evaluation and automated testing, the product listing generator effectively balances performance, cost, and efficiency while staying adaptable to new AI developments. This approach means customers benefit from high-quality product information, and selling partners have access to cutting-edge tools for creating listings efficiently.
Generative AI-powered prescription processing in Amazon Pharmacy
Building upon the human-AI hybrid workflows previously discussed in the seller listing example, Amazon Pharmacy demonstrates how these principles can be applied in a Health Insurance Portability and Accountability Act (HIPAA)-regulated industry. Having shared a conversational assistant for patient care specialists in the post Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker, we now focus on automated prescription processing, which you can read about in The life of a prescription at Amazon Pharmacy and the following research paper in Nature Magazine.
At Amazon Pharmacy, we developed an AI system built on Amazon Bedrock and SageMaker to help pharmacy technicians process medication directions more accurately and efficiently. This solution integrates human experts with LLMs in creation and validation roles to enhance precision in medication instructions for our patients.
Agentic workflow design for healthcare accuracy
The prescription processing system combines human expertise (data entry technicians and pharmacists) with AI support for direction suggestions and feedback. The workflow, shown in the following diagram, begins with a pharmacy knowledge-based preprocessor standardizing raw prescription text in Amazon DynamoDB, followed by fine-tuned small language models (SLMs) on SageMaker identifying critical components (dosage, frequency).

Figure 3. (a) Data entry technician and pharmacist workflow with two GenAI modules, (b) suggestion module workflow, and (c) flagging module workflow

The system seamlessly integrates experts such as data entry technicians and pharmacists, where generative AI complements the overall workflow with agility and accuracy to better serve our patients. A direction assembly system with safety guardrails then generates instructions that data entry technicians use to create their typed directions through the suggestion module. The flagging module flags or corrects errors and enforces further safety measures as feedback provided to the data entry technician. The technician finalizes highly accurate, safe typed directions for pharmacists, who can either provide feedback or pass the directions to the downstream service.
One highlight of the solution is its use of task decomposition, which lets engineers and scientists break the overall process into many steps, each implemented as an individual module composed of substeps. The team extensively used fine-tuned SLMs. In addition, the process employs traditional ML procedures, such as named entity recognition (NER) and estimating final confidence with regression models. Using SLMs and traditional ML in such contained, well-defined procedures significantly improved processing speed while maintaining rigorous safety standards, because appropriate guardrails are applied at specific steps.
The system comprises multiple well-defined substeps, with each subprocess operating as a specialized component working semi-autonomously yet collaboratively within the workflow toward the overall objective. This decomposed approach, with specific validations at each stage, proved more effective than end-to-end solutions while enabling the use of fine-tuned SLMs. The team used AWS Fargate to orchestrate the workflow given its current integration into existing backend systems.
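The following minimal sketch illustrates this decomposition pattern. The SageMaker endpoint name and the preprocess, ner_model, confidence_model, and guardrail helpers are hypothetical placeholders standing in for the individual modules described above.

```python
# Minimal sketch of task decomposition: small, well-defined modules chained
# together (preprocessing, a fine-tuned SLM on a SageMaker endpoint, traditional
# ML, and a guardrail step). Names are hypothetical.
import json
import boto3

smr = boto3.client("sagemaker-runtime")

def extract_components(standardized_text: str) -> dict:
    """Call a fine-tuned SLM hosted on SageMaker to identify dosage and frequency."""
    response = smr.invoke_endpoint(
        EndpointName="rx-direction-slm",               # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": standardized_text}),
    )
    return json.loads(response["Body"].read())

def process_prescription(raw_text, preprocess, ner_model, confidence_model, guardrail):
    """Each argument after raw_text is a hypothetical module in the decomposed flow."""
    text = preprocess(raw_text)                        # knowledge-based standardization
    components = extract_components(text)              # SLM step
    entities = ner_model(text)                         # traditional ML: NER
    confidence = confidence_model(components, entities)  # regression-based confidence
    suggestion = guardrail(components, confidence)     # safety rules before suggesting
    return suggestion                                   # surfaced to the technician
```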
In their product development journey, the team turned to Amazon Bedrock, which provided high-performing LLMs with ease-of-use features tailored to generative AI applications. SageMaker enabled further LLM selections, deeper customizability, and traditional ML methods. To learn more about this technique, see How task decomposition and smaller LLMs can make AI more affordable and read about the Amazon Pharmacy business case study.
Building a reliable application with guardrails and HITL
To comply with HIPAA standards and protect patient privacy, we implemented strict data governance practices alongside a hybrid approach that combines fine-tuned LLMs using Amazon Bedrock APIs with Retrieval Augmented Generation (RAG) using Amazon OpenSearch Service. This combination enables efficient knowledge retrieval while maintaining high accuracy for specific subtasks.
Managing LLM hallucinations—which is critical in healthcare—required more than just fine-tuning on large datasets. Our solution implements domain-specific guardrails built on Amazon Bedrock Guardrails, complemented by human-in-the-loop (HITL) oversight to promote system reliability.
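As an illustration of combining retrieval with guardrails, the following sketch wraps an Amazon Bedrock Converse API call with an Amazon Bedrock Guardrails configuration and a hypothetical OpenSearch retrieval helper. The model ID and guardrail identifier are placeholders, not the Amazon Pharmacy configuration.

```python
# Minimal RAG-plus-guardrails sketch. retrieve_passages, the guardrail ID, and
# the model ID are placeholders; the production system also uses fine-tuned models.
import boto3

bedrock = boto3.client("bedrock-runtime")

def answer_with_context(question: str, retrieve_passages) -> str:
    """retrieve_passages is a hypothetical Amazon OpenSearch Service retrieval helper."""
    passages = retrieve_passages(question, top_k=5)
    prompt = (
        "Answer strictly from the pharmacy reference passages below. "
        "If the answer is not present, say you do not know.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",   # example model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 300, "temperature": 0.0},
        guardrailConfig={                                    # Amazon Bedrock Guardrails
            "guardrailIdentifier": "gr-pharmacy-example",    # placeholder identifier
            "guardrailVersion": "1",
        },
    )
    return response["output"]["message"]["content"][0]["text"]
```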
The Amazon Pharmacy team continues to enhance this system through real-time pharmacist feedback and expanded prescription format capabilities. This balanced approach of innovation, domain expertise, advanced AI services, and human oversight not only improves operational efficiency, but means that the AI system properly augments healthcare professionals in delivering optimal patient care.
Generative AI-powered customer review highlights
Whereas our previous example showcased how Amazon Pharmacy integrates LLMs into real-time workflows for prescription processing, this next use case demonstrates how similar techniques—SLMs, traditional ML, and thoughtful workflow design—can be applied to offline batch inferencing at massive scale.
Amazon has introduced AI-generated customer review highlights to process over 200 million annual product reviews and ratings. This feature distills shared customer opinions into concise paragraphs highlighting positive, neutral, and negative feedback about products and their features. Shoppers can quickly grasp consensus while maintaining transparency by providing access to related customer reviews and keeping original reviews available.
The system enhances shopping decisions through an interface where customers can explore review highlights by selecting specific features (such as picture quality, remote functionality, or ease of installation for a Fire TV). Features are visually coded with green check marks for positive sentiment, orange minus signs for negative, and gray for neutral—which means shoppers can quickly identify product strengths and weaknesses based on verified purchase reviews. The following screenshot shows review highlights regarding noise level for a product.

Figure 4. Example review highlights for a product.

A recipe for cost-effective use of LLMs for offline use cases
The team developed a cost-effective hybrid architecture combining traditional ML methods with specialized SLMs. This approach assigns sentiment analysis and keyword extraction to traditional ML while using optimized SLMs for complex text generation tasks, improving both accuracy and processing efficiency. The following diagram shows traditional ML and LLMs working together in the overall workflow.

Figure 5. Use of traditional ML and LLMs in a workflow.

The feature employs SageMaker batch transform for asynchronous processing, significantly reducing costs compared to real-time endpoints. To deliver a near zero-latency experience, the solution caches extracted insights alongside existing reviews, reducing wait times and enabling simultaneous access by multiple customers without additional computation. The system processes new reviews incrementally, updating insights without reprocessing the complete dataset. For optimal performance and cost-effectiveness, the feature uses Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances for batch transform jobs, providing up to 40% better price performance than alternatives.
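For reference, a batch transform job of this kind can be launched with a single API call. The following sketch uses placeholder job, model, and bucket names; confirm Inf2 instance availability for batch transform in your Region.

```python
# Minimal sketch of an asynchronous batch transform job over newly arrived
# reviews. Job, model, and bucket names are placeholders.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_transform_job(
    TransformJobName="review-highlights-batch-001",          # placeholder job name
    ModelName="review-highlights-slm",                       # placeholder model
    TransformInput={
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/new-reviews/",      # incremental batch only
        }},
        "ContentType": "application/jsonlines",
        "SplitType": "Line",
    },
    TransformOutput={"S3OutputPath": "s3://example-bucket/highlights/"},
    TransformResources={"InstanceType": "ml.inf2.xlarge", "InstanceCount": 1},
)
```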
By following this comprehensive approach, the team effectively managed costs while handling the massive scale of reviews and products so that the solution remained both efficient and scalable.
Amazon Ads AI-powered creative image and video generation
Having explored mostly text-centric generative AI applications in previous examples, we now turn to multimodal generative AI with Amazon Ads creative content generation for sponsored ads. The solution provides both image and video generation capabilities, the details of which we share in this section. Both capabilities use Amazon Nova creative content generation models at their core.
Working backward from customer needs, a March 2023 Amazon survey revealed that nearly 75% of advertisers who struggled with campaign success cited creative content generation as their primary challenge. Many advertisers—particularly those without in-house capabilities or agency support—face significant barriers due to the expertise and costs required to produce quality visuals. The Amazon Ads solution democratizes visual content creation, making it accessible and efficient for advertisers of all sizes. The impact has been substantial: advertisers using AI-generated images in Sponsored Brands campaigns saw nearly 8% click-through rates (CTR) and submitted 88% more campaigns than non-users.
Last year, the AWS Machine Learning Blog published a post detailing the image generation solution. Since then, Amazon has adopted Amazon Nova Canvas as its foundation for creative image generation, creating professional-grade images from text or image prompts with features for text-based editing and controls for color scheme and layout adjustments.
In September 2024, the Amazon Ads team added the ability to create short-form video ads from product images. This feature uses foundation models available on Amazon Bedrock to give customers control over visual style, pacing, camera motion, rotation, and zooming through natural language, using an agentic workflow to first describe video storyboards and then generate the content for the story. The following screenshot shows an example of creative image generation for product backgrounds in Amazon Ads.

Figure 6. Ads image generation example for a product.

As discussed in the original post, responsible AI is at the center of the solution, and Amazon Nova creative models come with built-in controls to support safety and responsible AI use, including watermarking and content moderation.
The solution uses AWS Step Functions with AWS Lambda functions for serverless orchestration of both the image and video generation processes. Generated content is stored in Amazon Simple Storage Service (Amazon S3) with metadata in DynamoDB, and Amazon API Gateway provides customer access to the generation capabilities. The solution now employs Amazon Bedrock Guardrails in addition to maintaining Amazon Rekognition and Amazon Comprehend integrations at various steps for additional safety checks. The following screenshot shows AI-generated creative videos in the Amazon Ads campaign builder.

Figure 7. Ads video generation for a product
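The core generation call itself is compact. The following sketch shows a Lambda-style function invoking Amazon Nova Canvas through Amazon Bedrock and storing the result in Amazon S3; the request fields follow the public Nova Canvas documentation, the bucket name is a placeholder, and the Step Functions orchestration and moderation steps are omitted.

```python
# Minimal sketch: generate a product background image with Amazon Nova Canvas
# and store it in Amazon S3. Bucket and key are placeholders.
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
s3 = boto3.client("s3")

def generate_product_background(prompt: str, key: str) -> str:
    body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {"numberOfImages": 1, "width": 1024,
                                  "height": 1024, "cfgScale": 8.0},
    }
    response = bedrock.invoke_model(
        modelId="amazon.nova-canvas-v1:0",
        body=json.dumps(body),
    )
    image_b64 = json.loads(response["body"].read())["images"][0]
    s3.put_object(Bucket="example-ads-creatives",        # placeholder bucket
                  Key=key,
                  Body=base64.b64decode(image_b64),
                  ContentType="image/png")
    return f"s3://example-ads-creatives/{key}"
```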

Creating high-quality ad creatives at scale presented complex challenges. The generative AI model needed to produce appealing, brand-appropriate images across diverse product categories and advertising contexts while remaining accessible to advertisers regardless of technical expertise. Quality assurance and improvement are fundamental to both image and video generation capabilities. The system undergoes continual enhancement through extensive HITL processes enabled by Amazon SageMaker Ground Truth. This implementation delivers a powerful tool that transforms advertisers’ creative process, making high-quality visual content creation more accessible across diverse product categories and contexts.
This is just the beginning of Amazon Ads using generative AI to empower advertising customers to create the content they need to drive their advertising objectives. The solution demonstrates how reducing creative barriers directly increases advertising activity while maintaining high standards for responsible AI use.
Key technical learnings and discussions
Non-conversational applications benefit from higher latency tolerance, which enables batch processing and caching, but they require robust validation mechanisms and stronger guardrails because of their autonomous nature. The following insights apply to both non-conversational and conversational AI implementations:

Task decomposition and agentic workflows – Breaking complex problems into smaller components has proven valuable across implementations. This deliberate decomposition by domain experts enables specialized models for specific subtasks, as demonstrated in Amazon Pharmacy prescription processing, where fine-tuned SLMs handle discrete tasks such as dosage identification. This strategy allows for specialized agents with clear validation steps, improving reliability and simplifying maintenance. The Amazon seller listing use case exemplifies this through its multistep workflow with separate generation and validation processes. Additionally, the review highlights use case showcased cost-effective and controlled use of LLMs by delegating preprocessing and other subtasks that an LLM could otherwise handle to traditional ML.
Hybrid architectures and model selection – Combining traditional ML with LLMs provides better control and cost-effectiveness than pure LLM approaches. Traditional ML excels at well-defined tasks, as shown in the review highlights system for sentiment analysis and information extraction. Amazon teams have strategically deployed both large and small language models based on requirements, integrating RAG with fine-tuning for effective domain-specific applications like the Amazon Pharmacy implementation.
Cost optimization strategies – Amazon teams achieved efficiency through batch processing, caching mechanisms for high-volume operations, specialized instance types such as AWS Inferentia and AWS Trainium, and optimized model selection. The review highlights use case demonstrates how incremental processing reduces computational needs, and Amazon Ads used Amazon Nova foundation models (FMs) to cost-effectively generate creative content.
Quality assurance and control mechanisms – Quality control relies on domain-specific guardrails through Amazon Bedrock Guardrails and multilayered validation combining automated testing with human evaluation. Dual-LLM approaches for generation and validation help prevent hallucinations in Amazon seller listings, and self-reflection techniques improve accuracy. Amazon Nova creative FMs provide inherent responsible AI controls, complemented by continual A/B testing and performance measurement.
HITL implementation – The HITL approach spans multiple layers, from expert evaluation by pharmacists to end-user feedback from selling partners. Amazon teams established structured improvement workflows, balancing automation and human oversight based on specific domain requirements and risk profiles.
Responsible AI and compliance – Responsible AI practices include content ingestion guardrails for regulated environments and adherence to regulations such as HIPAA. Amazon teams integrated content moderation for user-facing applications, maintained transparency in review highlights by providing access to source information, and implemented data governance with monitoring to promote quality and compliance.

These patterns enable scalable, reliable, and cost-effective generative AI solutions while maintaining quality and responsibility standards. The implementations demonstrate that effective solutions require not just sophisticated models, but careful attention to architecture, operations, and governance, supported by AWS services and established practices.
Next steps
The examples from Amazon.com shared in this post illustrate how generative AI can create value beyond traditional conversational assistants. We invite you to follow these examples or create your own solution to discover how generative AI can reinvent your business or even your industry. You can visit the AWS generative AI use cases page to start the ideation process.
These examples showed that effective generative AI implementations often benefit from combining different types of models and workflows. To learn which FMs are supported by AWS services, refer to Supported foundation models in Amazon Bedrock and Amazon SageMaker JumpStart Foundation Models. We also suggest exploring Amazon Bedrock Flows, which can ease the path toward building workflows. Additionally, AWS Trainium and AWS Inferentia accelerators can provide significant cost savings in these applications.
Agentic workflows, as illustrated in our examples, have proven particularly valuable. We recommend exploring Amazon Bedrock Agents for quickly building agentic workflows.
Successful generative AI implementation extends beyond model selection—it represents a comprehensive software development process from experimentation to application monitoring. To begin building your foundation across these essential services, we invite you to explore Amazon QuickStart.
Conclusion
These examples demonstrate how generative AI extends beyond conversational assistants to drive innovation and efficiency across industries. Success comes from combining AWS services with strong engineering practices and business understanding. Ultimately, effective generative AI solutions focus on solving real business problems while maintaining high standards of quality and responsibility.
To learn more about how Amazon uses AI, refer to Artificial Intelligence in Amazon News.

About the Authors
Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect and lead GenAI Scientist Architect for Amazon.com on AWS, based in Boston, MA. He helps strategic customers adopt AWS technologies, and specifically generative AI solutions, to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and completed a postdoc in system dynamics at MIT in Cambridge, MA. He maintains his connection to academia as a research affiliate at MIT. Outside of work, Burak is a yoga enthusiast.
Emilio Maldonado is a Senior leader at Amazon responsible for Product Knowledge, focused on building systems that scale the e-commerce catalog metadata, organize all product attributes, and use GenAI to infer precise information that guides sellers and shoppers as they interact with products. He’s passionate about developing dynamic teams and forming partnerships. He holds a Bachelor of Science in C.S. from Tecnologico de Monterrey (ITESM) and an MBA from Wharton, University of Pennsylvania.
Wenchao Tong is a Sr. Principal Technologist at Amazon Ads in Palo Alto, CA, where he spearheads the development of GenAI applications for creative building and performance optimization. His work empowers customers to enhance product and brand awareness and drive sales by leveraging innovative AI technologies to improve creative performance and quality. Wenchao holds a Master’s degree in Computer Science from Tongji University. Outside of work, he enjoys hiking, board games, and spending time with his family.
Alexandre Alves is a Sr. Principal Engineer at Amazon Health Services, specializing in ML, optimization, and distributed systems. He helps deliver wellness-forward health experiences.
Puneet Sahni is a Sr. Principal Engineer at Amazon. He works on improving the data quality of all products available in the Amazon catalog. He is passionate about leveraging product data to improve our customer experiences. He has a Master’s degree in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay. Outside of work, he enjoys spending time with his young kids and traveling.
Vaughn Schermerhorn is a Director at Amazon, where he leads Shopping Discovery and Evaluation—spanning Customer Reviews, content moderation, and site navigation across Amazon’s global marketplaces. He manages a multidisciplinary organization of applied scientists, engineers, and product leaders focused on surfacing trustworthy customer insights through scalable ML models, multimodal information retrieval, and real-time system architecture. His team develops and operates large-scale distributed systems that power billions of shopping decisions daily. Vaughn holds degrees from Georgetown University and San Diego State University and has lived and worked in the U.S., Germany, and Argentina. Outside of work, he enjoys reading, travel, and time with his family.
Tarik Arici is a Principal Applied Scientist at Amazon Selection and Catalog Systems (ASCS), working on Catalog Quality Enhancement using GenAI workflows. He has a PhD in Electrical and Computer Engineering from Georgia Tech. Outside of work, Tarik enjoys swimming and biking.