Understanding the Universal Tool Calling Protocol (UTCP)

The Universal Tool Calling Protocol (UTCP) is a lightweight, secure, and scalable way for AI agents and applications to find and call tools directly, without the need for additional wrapper servers.

Key Features

Lightweight and secure – Allows tools to be accessed directly, avoiding unnecessary middle layers.

Scalable – Can support a large number of tools and providers without losing performance.

Modular design – Version 1.0.0 introduces a plugin-based core, making the protocol easier to extend, test, and package.

Built on Pydantic models – Provides simple, well-defined data structures that make implementation straightforward.

The Problem with Current Approaches

Traditional solutions for integrating tools often require:

Building and maintaining wrapper servers for every tool

Routing all traffic through a central protocol or service

Reimplementing authentication and security for each tool

Accepting additional latency and complexity

These steps add friction for developers and slow down execution.

The UTCP Solution

UTCP offers a better alternative by:

Defining a clear, language-agnostic standard for describing tools and their interfaces

Allowing agents to connect directly to tools using their native communication protocols

Providing an architecture that lets developers add:

New communication protocols (HTTP, SSE, CLI, etc.)

Alternative storage systems

Custom search strategies

All of this can be done without modifying the core library.

By eliminating the need for wrapper servers or other heavy middle layers, UTCP streamlines the way AI agents and applications connect with tools. It reduces latency and overall complexity, since requests no longer have to pass through extra infrastructure. Authentication and security become simpler as well, because UTCP allows agents to use the tool’s existing mechanisms rather than duplicating them in an intermediary service. This leaner approach also makes it easier to build, test, and maintain integrations, while naturally supporting growth as the number of tools and providers increases.

How It Works

UTCP makes tool integration simple and predictable. First, an AI agent discovers your tools by fetching a UTCP manual, which contains definitions and metadata for every capability you expose. Next, the agent learns how to call these tools by reading the manual and understanding the associated call templates. Once the definitions are clear, the agent can invoke your APIs directly using their native communication protocols. Finally, your API processes the request and returns a normal response. This process ensures seamless interoperability without extra middleware or custom translation layers.

Source: https://www.utcp.io/
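To make the discovery step concrete, here is a minimal sketch in Python of what a manual and a direct call might look like. The field names, the example weather API, and the call_tool helper are illustrative assumptions, not the normative UTCP schema.

import requests

# Hypothetical UTCP manual: a JSON document the provider serves at a discovery
# endpoint. Field names here are illustrative, not the normative UTCP schema.
manual = {
    "manual_version": "1.0.0",
    "tools": [
        {
            "name": "get_weather",
            "description": "Return current weather for a city.",
            "inputs": {"type": "object", "properties": {"city": {"type": "string"}}},
            "call_template": {            # how to reach the tool natively
                "type": "http",
                "http_method": "GET",
                "url": "https://api.example.com/weather",
            },
        }
    ],
}

def call_tool(manual, tool_name, **kwargs):
    """Invoke a tool directly over its native protocol, as described by the manual."""
    tool = next(t for t in manual["tools"] if t["name"] == tool_name)
    tmpl = tool["call_template"]
    # No wrapper server: the agent talks to the provider's existing API directly.
    resp = requests.request(tmpl["http_method"], tmpl["url"], params=kwargs)
    return resp.json()

# Example: the agent reads the manual, then calls the provider's API itself.
# print(call_tool(manual, "get_weather", city="Berlin"))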

Architecture Overview

Version 1.0 of UTCP introduces a modular, plugin-based architecture designed for scalability and flexibility. At its core are manuals, which define tools and their metadata, as well as call templates that specify how to interact with each tool over different protocols. 

The UTCP Client acts as the engine for discovering tools and executing calls. Around this core is a plugin system that supports protocol adapters, custom communication methods, tool repositories, and search strategies. This separation of concerns makes it easy to extend the system or customize it for a particular environment without altering its foundation.
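A toy sketch of the plugin idea follows; the interfaces and registry names are invented for illustration and are not the actual UTCP library API. The point is that the core only dispatches on a declared protocol type, while protocol adapters register themselves around it.

from typing import Dict, Protocol

class CommunicationProtocol(Protocol):
    """Interface a protocol plugin must satisfy (illustrative, not the real UTCP API)."""
    def call(self, call_template: dict, arguments: dict) -> dict: ...

PROTOCOL_REGISTRY: Dict[str, CommunicationProtocol] = {}

def register_protocol(name: str, impl: CommunicationProtocol) -> None:
    """Plugins register themselves; the core never hard-codes HTTP, SSE, CLI, etc."""
    PROTOCOL_REGISTRY[name] = impl

class HttpProtocol:
    def call(self, call_template: dict, arguments: dict) -> dict:
        import requests
        return requests.request(
            call_template.get("http_method", "GET"),
            call_template["url"],
            params=arguments,
        ).json()

register_protocol("http", HttpProtocol())

def execute_tool_call(call_template: dict, arguments: dict) -> dict:
    # The core only dispatches on the template's declared protocol type.
    return PROTOCOL_REGISTRY[call_template["type"]].call(call_template, arguments)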

How is UTCP different from MCP?

UTCP and MCP both help AI agents connect with external tools, but they focus on different needs. UTCP enables direct calls to APIs, CLIs, WebSockets, and other interfaces through simple JSON manuals, keeping infrastructure light and latency low. MCP provides a more structured layer, wrapping tools behind dedicated servers and standardizing communication with JSON-RPC.

Key points:

Architecture: UTCP connects agents straight to tools; MCP uses a server layer for routing.

Performance & Overhead: UTCP minimizes hops; MCP centralizes calls but adds a layer of processing.

Infrastructure: UTCP requires only manuals and a discovery endpoint, while MCP relies on servers for wrapping and routing.

Protocol Support: UTCP works across HTTP, WebSocket, CLI, SSE, and more; MCP focuses on JSON-RPC transport.

Security & Auth: UTCP uses the tool’s existing mechanisms, while MCP manages access inside its servers.

Flexibility: UTCP supports hybrid deployments through its MCP plugin, while MCP offers centralized management and monitoring.

Both approaches are useful: UTCP is ideal for lightweight, flexible integrations, while MCP suits teams wanting a standardized gateway with built-in control.

Conclusion

UTCP is a versatile solution for both tool providers and AI developers. It lets API owners, SaaS providers, and enterprise teams expose services like REST or GraphQL endpoints to AI agents in a simple, secure way. At the same time, developers building agents or applications can use UTCP to connect effortlessly with internal or external tools. By removing complexity and overhead, it streamlines integration and makes it easier for software to access powerful capabilities.


Meta AI Proposes ‘Metacognitive Reuse’: Turning LLM Chains-of-Thought into a Procedural Handbook that Cuts Tokens by 46%

Meta researchers introduced a method that compresses repeated reasoning patterns into short, named procedures—“behaviors”—and then conditions models to use them at inference or distills them via fine-tuning. The result: up to 46% fewer reasoning tokens on MATH while matching or improving accuracy, and up to 10% accuracy gains in a self-improvement setting on AIME, without changing model weights. The work frames this as procedural memory for LLMs—how to reason, not just what to recall—implemented with a curated, searchable “behavior handbook.”

https://arxiv.org/pdf/2509.13237

What problem does this solve?

Long chain-of-thought (CoT) traces repeatedly re-derive common sub-procedures (e.g., inclusion–exclusion, base conversions, geometric angle sums). That redundancy burns tokens, adds latency, and can crowd out exploration. Meta’s idea is to abstract recurring steps into concise, named behaviors (name + one-line instruction) recovered from prior traces via an LLM-driven reflection pipeline, then reuse them during future reasoning. On math benchmarks (MATH-500; AIME-24/25), this reduces output length substantially while preserving or improving solution quality.

How does the pipeline work?

Three roles, one handbook:

Metacognitive Strategist (R1-Llama-70B):

1) solves a problem to produce a trace, 2) reflects on the trace to identify generalizable steps, and 3) emits behaviors as (behavior_name → instruction) entries. These populate a behavior handbook (procedural memory).

Teacher (LLM B): generates behavior-conditioned responses used to build training corpora.

Student (LLM C): consumes behaviors in-context (inference) or is fine-tuned on behavior-conditioned data.

Retrieval is topic-based on MATH and embedding-based (BGE-M3 + FAISS) on AIME.

Prompts: The team provides explicit prompts for solution, reflection, behavior extraction, and behavior-conditioned inference (BCI). In BCI, the model is instructed to reference behaviors explicitly in its reasoning, encouraging consistently short, structured derivations.

What are the evaluation modes?

Behavior-Conditioned Inference (BCI): Retrieve K relevant behaviors and prepend them to the prompt.

Behavior-Guided Self-Improvement: Extract behaviors from a model’s own earlier attempts and feed them back as hints for revision.

Behavior-Conditioned SFT (BC-SFT): Fine-tune students on teacher outputs that already follow behavior-guided reasoning, so the behavior usage becomes parametric (no retrieval at test time).

Key results (MATH, AIME-24/25)

Token efficiency: On MATH-500, BCI reduces reasoning tokens by up to 46% versus the same model without behaviors, while matching or improving accuracy. This holds for both R1-Llama-70B and Qwen3-32B students across token budgets (2,048–16,384).

Self-improvement gains: On AIME-24, behavior-guided self-improvement beats a critique-and-revise baseline at nearly every budget, with up to 10% higher accuracy as budgets increase, indicating better test-time scaling of accuracy (not just shorter traces).

BC-SFT quality lift: Across Llama-3.1-8B-Instruct, Qwen2.5-14B-Base, Qwen2.5-32B-Instruct, and Qwen3-14B, BC-SFT consistently outperforms (accuracy) standard SFT and the original base across budgets, while remaining more token-efficient. Importantly, the advantage is not explained by an easier training corpus: teacher correctness rates in the two training sets (original vs. behavior-conditioned) are close, yet BC-SFT students generalize better on AIME-24/25.

Why does this work?

The handbook stores procedural knowledge (how-to strategies), distinct from classic RAG’s declarative knowledge (facts). By converting verbose derivations into short, reusable steps, the model skips re-derivation and reallocates compute to novel subproblems. Behavior prompts serve as structured hints that bias the decoder toward efficient, correct trajectories; BC-SFT then internalizes these trajectories so that behaviors are implicitly invoked without prompt overhead.

What’s inside a “behavior”?

Behaviors range from domain-general reasoning moves to precise mathematical tools, e.g.,

behavior_inclusion_exclusion_principle: avoid double counting by subtracting intersections;

behavior_translate_verbal_to_equation: formalize word problems systematically;

behavior_distance_from_point_to_line: apply |Ax+By+C|/√(A²+B²) for tangency checks.

During BCI, the student explicitly cites behaviors when they’re used, making traces auditable and compact.

Retrieval and cost considerations

On MATH, behaviors are retrieved by topic; on AIME, top-K behaviors are selected via BGE-M3 embeddings and FAISS. While BCI introduces extra input tokens (the behaviors), input tokens are pre-computable and non-autoregressive, and are often billed cheaper than output tokens on commercial APIs. Since BCI shrinks output tokens, the overall cost can drop while latency improves. BC-SFT eliminates retrieval at test time entirely.
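As a rough illustration of that retrieval step, the sketch below embeds behavior instructions with BGE-M3 (loaded through sentence-transformers, assuming the hub checkpoint supports it), indexes them with FAISS, and prepends the top-K hits to a behavior-conditioned prompt. The handbook entries and prompt wording are invented placeholders; the paper's exact prompts and index configuration may differ.

import faiss
from sentence_transformers import SentenceTransformer

# Toy behavior handbook: name -> one-line instruction (entries are illustrative).
handbook = {
    "behavior_inclusion_exclusion_principle": "Avoid double counting by subtracting intersections.",
    "behavior_translate_verbal_to_equation": "Formalize word problems into equations systematically.",
    "behavior_distance_from_point_to_line": "Use |Ax+By+C|/sqrt(A^2+B^2) for point-to-line distance.",
}

names = list(handbook.keys())
texts = [f"{n}: {handbook[n]}" for n in names]

# Embed behaviors once with BGE-M3 and build a FAISS index over them.
encoder = SentenceTransformer("BAAI/bge-m3")
emb = encoder.encode(texts, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])   # inner product == cosine after normalization
index.add(emb)

def behavior_conditioned_prompt(question: str, k: int = 2) -> str:
    """Retrieve the top-k behaviors for the question and prepend them to the prompt."""
    q = encoder.encode([question], normalize_embeddings=True)
    _, ids = index.search(q, k)
    hints = "\n".join(texts[i] for i in ids[0])
    return (
        "You may use the following behaviors, citing them by name when used:\n"
        f"{hints}\n\nProblem: {question}\nSolution:"
    )

# print(behavior_conditioned_prompt("Find the distance from (1,2) to the line 3x+4y-5=0"))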


Summary

Meta’s behavior-handbook approach operationalizes procedural memory for LLMs: it abstracts recurring reasoning steps into reusable “behaviors,” applies them via behavior-conditioned inference or distills them with BC-SFT, and empirically delivers up to 46% fewer reasoning tokens with accuracy that holds or improves (≈10% gains in self-correction regimes). The method is straightforward to integrate—an index, a retriever, optional fine-tuning—and surfaces auditable traces, though scaling beyond math and managing a growing behavior corpus remain open engineering problems.


IBM and ETH Zürich Researchers Unveil Analog Foundation Models to Tackle Noise in In-Memory AI Hardware

IBM researchers, together with ETH Zürich, have unveiled a new class of Analog Foundation Models (AFMs) designed to bridge the gap between large language models (LLMs) and Analog In-Memory Computing (AIMC) hardware. AIMC has long promised a radical leap in efficiency—running models with a billion parameters in a footprint small enough for embedded or edge devices—thanks to dense non-volatile memory (NVM) that combines storage and computation. But the technology’s Achilles’ heel has been noise: performing matrix-vector multiplications directly inside NVM devices yields non-deterministic errors that cripple off-the-shelf models.

Why does analog computing matter for LLMs?

Unlike GPUs or TPUs that shuttle data between memory and compute units, AIMC performs matrix-vector multiplications directly inside memory arrays. This design removes the von Neumann bottleneck and delivers massive improvements in throughput and power efficiency. Prior studies showed that combining AIMC with 3D NVM and Mixture-of-Experts (MoE) architectures could, in principle, support trillion-parameter models on compact accelerators. That could make foundation-scale AI feasible on devices well beyond data-centers.

https://arxiv.org/pdf/2505.09663

What makes Analog In-Memory Computing (AIMC) so difficult to use in practice?

The biggest barrier is noise. AIMC computations suffer from device variability, DAC/ADC quantization, and runtime fluctuations that degrade model accuracy. Unlike quantization on GPUs—where errors are deterministic and manageable—analog noise is stochastic and unpredictable. Earlier research found ways to adapt small networks like CNNs and RNNs (<100M parameters) to tolerate such noise, but LLMs with billions of parameters consistently broke down under AIMC constraints.

How do Analog Foundation Models address the noise problem?

The IBM team introduces Analog Foundation Models, which integrate hardware-aware training to prepare LLMs for analog execution. Their pipeline uses:

Noise injection during training to simulate AIMC randomness.

Iterative weight clipping to stabilize distributions within device limits.

Learned static input/output quantization ranges aligned with real hardware constraints.

Distillation from pre-trained LLMs using 20B tokens of synthetic data.

These methods, implemented with AIHWKIT-Lightning, allow models like Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct to sustain performance comparable to weight-quantized 4-bit / activation 8-bit baselines under analog noise. In evaluations across reasoning and factual benchmarks, AFMs outperformed both quantization-aware training (QAT) and post-training quantization (SpinQuant).
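To give a flavor of hardware-aware training (not IBM's exact recipe and not the AIHWKIT-Lightning API), the sketch below injects multiplicative weight noise during the forward pass to mimic analog variability and applies an iterative clipping step to keep weights in a device-like range; the noise level and clipping rule are placeholder assumptions.

import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer that simulates analog weight noise during training (illustrative only)."""
    def __init__(self, in_f, out_f, weight_noise_std=0.05):
        super().__init__()
        self.linear = nn.Linear(in_f, out_f)
        self.weight_noise_std = weight_noise_std

    def forward(self, x):
        w = self.linear.weight
        if self.training:
            # Multiplicative Gaussian noise stands in for device-to-device variability.
            w = w * (1 + torch.randn_like(w) * self.weight_noise_std)
        return nn.functional.linear(x, w, self.linear.bias)

@torch.no_grad()
def iterative_weight_clipping(model, num_std=2.5):
    """Clip each layer's weights to +/- num_std standard deviations (placeholder rule)."""
    for m in model.modules():
        if isinstance(m, NoisyLinear):
            w = m.linear.weight
            bound = num_std * w.std()
            w.clamp_(-bound, bound)

# Usage sketch: call iterative_weight_clipping(model) after each optimizer step so the
# weight distribution stays inside the range an analog device can represent.
model = nn.Sequential(NoisyLinear(512, 512), nn.ReLU(), NoisyLinear(512, 512))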

Do these models work only for analog hardware?

No. An unexpected outcome is that AFMs also perform strongly on low-precision digital hardware. Because AFMs are trained to tolerate noise and clipping, they handle simple post-training round-to-nearest (RTN) quantization better than existing methods. This makes them useful not just for AIMC accelerators, but also for commodity digital inference hardware.

Can performance scale with more compute at inference time?

Yes. The researchers tested test-time compute scaling on the MATH-500 benchmark, generating multiple answers per query and selecting the best via a reward model. AFMs showed better scaling behavior than QAT models, with accuracy gaps shrinking as more inference compute was allocated. This is consistent with AIMC’s strengths—low-power, high-throughput inference rather than training.


How does it impact the future of Analog In-Memory Computing (AIMC)?

The research team provides the first systematic demonstration that large LLMs can be adapted to AIMC hardware without catastrophic accuracy loss. While training AFMs is resource-heavy and reasoning tasks like GSM8K still show accuracy gaps, the results are a milestone. The combination of energy efficiency, robustness to noise, and cross-compatibility with digital hardware makes AFMs a promising direction for scaling foundation models beyond GPU limits.

Summary

The introduction of Analog Foundation Models marks a critical milestone for scaling LLMs beyond the limits of digital accelerators. By making models robust to the unpredictable noise of analog in-memory computing, the research team shows that AIMC can move from a theoretical promise to a practical platform. While training costs remain high and reasoning benchmarks still show gaps, this work establishes a path toward energy-efficient, large-scale models running on compact hardware, pushing foundation models closer to edge deployment.


LLM-as-a-Judge: Where Do Its Signals Break, When Do They Hold, and What Should “Evaluation” Mean?

What exactly is being measured when a judge LLM assigns a 1–5 (or pairwise) score?

Most “correctness/faithfulness/completeness” rubrics are project-specific. Without task-grounded definitions, a scalar score can drift from business outcomes (e.g., “useful marketing post” vs. “high completeness”). Surveys of LLM-as-a-judge (LAJ) note that rubric ambiguity and prompt template choices materially shift scores and human correlations.

How stable are judge decisions to prompt position and formatting?

Large controlled studies find position bias: identical candidates receive different preferences depending on order; list-wise and pairwise setups both show measurable drift (e.g., repetition stability, position consistency, preference fairness).
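A quick way to probe position bias in your own pipeline is to score each pair twice with the candidate order swapped and count how often the verdict flips. In the sketch below, judge_pairwise is a placeholder for whatever judge-LLM call you use.

def position_bias_rate(pairs, judge_pairwise):
    """
    pairs: list of (prompt, answer_a, answer_b) tuples.
    judge_pairwise: callable (prompt, first, second) -> "first" or "second"
                    (a placeholder for your actual judge-LLM call).
    Returns the fraction of pairs whose verdict flips when the order is swapped.
    """
    flips = 0
    for prompt, a, b in pairs:
        verdict_ab = judge_pairwise(prompt, a, b)
        verdict_ba = judge_pairwise(prompt, b, a)
        # A consistent judge picks the same underlying answer both times:
        # "first" in the (a, b) ordering corresponds to "second" in the (b, a) ordering.
        consistent = (verdict_ab == "first") == (verdict_ba == "second")
        flips += 0 if consistent else 1
    return flips / max(len(pairs), 1)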

Work cataloging verbosity bias shows longer responses are often favored independent of quality; several reports also describe self-preference (judges prefer text closer to their own style/policy).

Do judge scores consistently match human judgments of factuality?

Empirical results are mixed. For summary factuality, one study reported low or inconsistent correlations with humans for strong models (GPT-4, PaLM-2), with only partial signal from GPT-3.5 on certain error types.

Conversely, domain-bounded setups (e.g., explanation quality for recommenders) have reported usable agreement with careful prompt design and ensembling across heterogeneous judges.

Taken together, correlation seems task- and setup-dependent, not a general guarantee.

How robust are judge LLMs to strategic manipulation?

LLM-as-a-Judge (LAJ) pipelines are attackable. Studies show universal and transferable prompt attacks can inflate assessment scores; defenses (template hardening, sanitization, re-tokenization filters) mitigate but do not eliminate susceptibility.

Newer evaluations differentiate content-author vs. system-prompt attacks and document degradation across several families (Gemma, Llama, GPT-4, Claude) under controlled perturbations.

Is pairwise preference safer than absolute scoring?

Preference learning often favors pairwise ranking, yet recent research finds protocol choice itself introduces artifacts: pairwise judges can be more vulnerable to distractors that generator models learn to exploit; absolute (pointwise) scores avoid order bias but suffer scale drift. Reliability therefore hinges on protocol, randomization, and controls rather than a single universally superior scheme.

Could “judging” encourage overconfident model behavior?

Recent reporting on evaluation incentives argues that test-centric scoring can reward guessing and penalize abstention, shaping models toward confident hallucinations; proposals suggest scoring schemes that explicitly value calibrated uncertainty. While this is a training-time concern, it feeds back into how evaluations are designed and interpreted.

Where do generic “judge” scores fall short for production systems?

When an application has deterministic sub-steps (retrieval, routing, ranking), component metrics offer crisp targets and regression tests. Common retrieval metrics include Precision@k, Recall@k, MRR, and nDCG; these are well-defined, auditable, and comparable across runs.
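For reference, these metrics take only a few lines each; the sketch below assumes binary relevance and a single query.

import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant."""
    topk = ranked_ids[:k]
    return sum(1 for d in topk if d in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k."""
    topk = ranked_ids[:k]
    return sum(1 for d in topk if d in relevant_ids) / max(len(relevant_ids), 1)

def mrr(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant item (0 if none retrieved)."""
    for i, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked_ids[:k], start=1) if d in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant_ids), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0

# Example: print(ndcg_at_k(["d3", "d1", "d7"], {"d1", "d2"}, k=3))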

Industry guides emphasize separating retrieval and generation and aligning subsystem metrics with end goals, independent of any judge LLM.

If judge LLMs are fragile, what does “evaluation” look like in the wild?

Public engineering playbooks increasingly describe trace-first, outcome-linked evaluation: capture end-to-end traces (inputs, retrieved chunks, tool calls, prompts, responses) using OpenTelemetry GenAI semantic conventions and attach explicit outcome labels (resolved/unresolved, complaint/no-complaint). This supports longitudinal analysis, controlled experiments, and error clustering—regardless of whether any judge model is used for triage.

Tooling ecosystems (e.g., LangSmith and others) document trace/eval wiring and OTel interoperability; these are descriptions of current practice rather than endorsements of a particular vendor.

Are there domains where LLM-as-a-Judge (LAJ) seems comparatively reliable?

Some constrained tasks with tight rubrics and short outputs report better reproducibility, especially when ensembles of judges and human-anchored calibration sets are used. But cross-domain generalization remains limited, and bias/attack vectors persist.

Does LLM-as-a-Judge (LAJ) performance drift with content style, domain, or “polish”?

Beyond length and order, studies and news coverage indicate LLMs sometimes over-simplify or over-generalize scientific claims compared to domain experts—useful context when using LAJ to score technical material or safety-critical text.

Key Technical Observations

Biases are measurable (position, verbosity, self-preference) and can materially change rankings without content changes. Controls (randomization, de-biasing templates) reduce but do not eliminate effects.

Adversarial pressure matters: prompt-level attacks can systematically inflate scores; current defenses are partial.

Human agreement varies by task: factuality and long-form quality show mixed correlations; narrow domains with careful design and ensembling fare better.

Component metrics remain well-posed for deterministic steps (retrieval/routing), enabling precise regression tracking independent of judge LLMs.

Trace-based online evaluation described in industry literature (OTel GenAI) supports outcome-linked monitoring and experimentation.

Summary

In conclusion, this article does not argue against LLM-as-a-Judge but highlights the nuances, limitations, and ongoing debates around its reliability and robustness. The intention is not to dismiss its use but to frame open questions that need further exploration. Companies and research groups actively developing or deploying LLM-as-a-Judge (LAJ) pipelines are invited to share their perspectives, empirical findings, and mitigation strategies—adding valuable depth and balance to the broader conversation on evaluation in the GenAI era.


An Internet of AI Agents? Coral Protocol Introduces Coral v1: An MCP-Native Runtime and Registry for Cross-Framework AI Agents

Coral Protocol has released Coral v1 of its agent stack, aiming to standardize how developers discover, compose, and operate AI agents across heterogeneous frameworks. The release centers on an MCP-based runtime (Coral Server) that enables threaded, mention-addressed agent-to-agent messaging, a developer workflow (CLI + Studio) for orchestration and observability, and a public registry for agent discovery. Coral lists pay-per-usage payouts on Solana as “coming soon,” not generally available.

What Coral v1 Actually Ships

Coral pitches the release as letting anyone, for the first time, publish AI agents on a marketplace where the world can discover them, get paid for the agents they create, and rent agents on demand to build AI startups 10x faster.

Coral Server (runtime): Implements Model Context Protocol (MCP) primitives so agents can register, create threads, send messages, and mention other agents, enabling structured A2A coordination instead of brittle context splicing.

Coral CLI + Studio: Add remote/local agents, wire them into shared threads, and inspect thread/message telemetry for debugging and performance tuning.

Registry surface: A discovery layer to find and integrate agents. Monetization and hosted checkout are explicitly marked as “coming soon.”

Why Interoperability Matters

Agent frameworks (e.g., LangChain, CrewAI, custom stacks) don’t speak a common operational protocol, which blocks composition. Coral’s MCP threading model provides a common transport and addressing scheme, so specialized agents can coordinate without ad-hoc glue code or prompt concatenation. The Coral Protocol team emphasizes persistent threads and mention-based targeting to keep collaboration organized and low-overhead.

Reference Implementation: Anemoi on GAIA

Coral’s open implementation Anemoi demonstrates the semi-centralized pattern: a light planner + specialized workers communicating directly over Coral MCP threads. On GAIA, Anemoi reports 52.73% pass@3 using GPT-4.1-mini (planner) and GPT-4o (workers), surpassing a reproduced OWL setup at 43.63% under identical LLM/tooling. The arXiv paper and GitHub readme both document these numbers and the coordination loop (plan → execute → critique → refine).

The design reduces reliance on a single powerful planner, trims redundant token passing, and improves scalability/cost for long-horizon tasks—credible, benchmark-anchored evidence that structured A2A beats naive prompt chaining when planner capacity is limited.
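The coordination loop is easy to picture in pseudocode. The sketch below is a generic plan, execute, critique, refine cycle over a shared thread; every name in it is invented for illustration, and it is not the Coral or Anemoi API.

def solve_task(task, planner, workers, critic, max_rounds=3):
    """
    Generic semi-centralized A2A loop (illustrative; not the Coral/Anemoi API):
    a light planner drafts a plan, specialized workers execute steps while posting
    results to a shared thread, and a critic decides whether to refine or stop.
    """
    thread = [{"role": "user", "content": task}]           # shared, persistent thread
    plan = planner.plan(task)                              # planner can be a small model
    for _ in range(max_rounds):
        for step in plan.steps:
            worker = workers[step.agent_name]              # mention-addressed targeting
            result = worker.execute(step, context=thread)
            thread.append({"role": step.agent_name, "content": result})
        verdict = critic.review(task, thread)
        if verdict.accepted:
            return verdict.answer
        plan = planner.refine(plan, verdict.feedback)      # refine only what failed
    return thread[-1]["content"]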

Incentives and Marketplace Status

Coral positions a usage-based marketplace where agent authors can list agents with pricing metadata and get paid per call. As of this writing, the developer page clearly labels “Pay Per Usage / Get Paid Automatically” and “Hosted checkout” as coming soon—teams should avoid assuming GA for payouts until Coral updates availability.

Summary

Coral v1 contributes a standards-first interop runtime for multi-agent systems, plus practical tooling for discovery and observability. The Anemoi GAIA results provide empirical backing for the A2A, thread-based design under constrained planners. The marketplace narrative is compelling, but treat monetization as upcoming per Coral’s own site; build against the runtime/registry now and keep payments feature-flagged until GA.


A Coding Guide to End-to-End Robotics Learning with LeRobot: Training, Evaluating, and Visualizing Behavior Cloning Policies on PushT

In this tutorial, we walk step by step through using Hugging Face’s LeRobot library to train and evaluate a behavior-cloning policy on the PushT dataset. We begin by setting up the environment in Google Colab, installing the required dependencies, and loading the dataset through LeRobot’s unified API. We then design a compact visuomotor policy that combines a convolutional backbone with a small MLP head, allowing us to map image and state observations directly to robot actions. By training on a subset of the dataset for speed, we are able to quickly demonstrate how LeRobot enables reproducible, dataset-driven robot learning pipelines. Check out the FULL CODES here.

!pip -q install --upgrade lerobot torch torchvision timm imageio[ffmpeg]

import os, math, random, io, sys, json, pathlib, time
import torch, torch.nn as nn, torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision.utils import make_grid, save_image
import numpy as np
import imageio.v2 as imageio

try:
    from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
except Exception:
    from lerobot.datasets.lerobot_dataset import LeRobotDataset

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
SEED = 42
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)

We begin by installing the required libraries and setting up our environment for training. We import all the essential modules, configure the dataset loader, and fix the random seed to ensure reproducibility. We also detect whether we are running on a GPU or CPU, allowing our experiments to run efficiently. Check out the FULL CODES here.

REPO_ID = "lerobot/pusht"
ds = LeRobotDataset(REPO_ID)
print("Dataset length:", len(ds))

s0 = ds[0]
keys = list(s0.keys())
print("Sample keys:", keys)

def key_with(prefixes):
    for k in keys:
        for p in prefixes:
            if k.startswith(p): return k
    return None

K_IMG = key_with(["observation.image", "observation.images", "observation.rgb"])
K_STATE = key_with(["observation.state"])
K_ACT = "action"
assert K_ACT in s0, f"No 'action' key found in sample. Found: {keys}"
print("Using keys -> IMG:", K_IMG, "STATE:", K_STATE, "ACT:", K_ACT)

We load the PushT dataset with LeRobot and inspect its structure. We check the available keys, identify which ones correspond to images, states, and actions, and map them for consistent access throughout our training pipeline. Check out the FULL CODES here.

class PushTWrapper(torch.utils.data.Dataset):
    def __init__(self, base):
        self.base = base
    def __len__(self): return len(self.base)
    def __getitem__(self, i):
        x = self.base[i]
        img = x[K_IMG]
        if img.ndim == 4: img = img[-1]
        img = img.float() / 255.0 if img.dtype == torch.uint8 else img.float()
        state = x.get(K_STATE, torch.zeros(2))
        state = state.float().reshape(-1)
        act = x[K_ACT].float().reshape(-1)
        if img.shape[-2:] != (96, 96):
            img = F.interpolate(img.unsqueeze(0), size=(96, 96), mode="bilinear", align_corners=False)[0]
        return {"image": img, "state": state, "action": act}

wrapped = PushTWrapper(ds)
N = len(wrapped)
idx = list(range(N))
random.shuffle(idx)
n_train = int(0.9 * N)
train_idx, val_idx = idx[:n_train], idx[n_train:]

train_ds = Subset(wrapped, train_idx[:12000])
val_ds = Subset(wrapped, val_idx[:2000])

BATCH = 128
train_loader = DataLoader(train_ds, batch_size=BATCH, shuffle=True, num_workers=2, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size=BATCH, shuffle=False, num_workers=2, pin_memory=True)

We wrap each sample so we consistently get a normalized 96×96 image, a flattened state, and an action, picking the last frame if a temporal stack is present. We then shuffle, split into train/val, and cap sizes for fast Colab runs. Finally, we create efficient DataLoaders with batching, shuffling, and pinned memory to keep training smooth. Check out the FULL CODES here.

class SmallBackbone(nn.Module):
    def __init__(self, out=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, 2, 2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, 1, 1), nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, out), nn.ReLU(inplace=True))
    def forward(self, x): return self.head(self.conv(x))

class BCPolicy(nn.Module):
    def __init__(self, img_dim=256, state_dim=2, hidden=256, act_dim=2):
        super().__init__()
        self.backbone = SmallBackbone(img_dim)
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + state_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden // 2), nn.ReLU(inplace=True),
            nn.Linear(hidden // 2, act_dim)
        )
    def forward(self, img, state):
        z = self.backbone(img)
        if state.ndim == 1: state = state.unsqueeze(0)
        z = torch.cat([z, state], dim=-1)
        return self.mlp(z)

policy = BCPolicy().to(DEVICE)
opt = torch.optim.AdamW(policy.parameters(), lr=3e-4, weight_decay=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(DEVICE == "cuda"))

@torch.no_grad()
def evaluate():
    policy.eval()
    mse, n = 0.0, 0
    for batch in val_loader:
        img = batch["image"].to(DEVICE, non_blocking=True)
        st = batch["state"].to(DEVICE, non_blocking=True)
        act = batch["action"].to(DEVICE, non_blocking=True)
        pred = policy(img, st)
        mse += F.mse_loss(pred, act, reduction="sum").item()
        n += act.numel()
    return mse / n

def cosine_lr(step, total, base=3e-4, min_lr=3e-5):
    if step >= total: return min_lr
    cos = 0.5 * (1 + math.cos(math.pi * step / total))
    return min_lr + (base - min_lr) * cos

EPOCHS = 4
steps_total = EPOCHS * len(train_loader)
step = 0
best = float("inf")
ckpt = "/content/lerobot_pusht_bc.pt"

for epoch in range(EPOCHS):
    policy.train()
    for batch in train_loader:
        lr = cosine_lr(step, steps_total); step += 1
        for g in opt.param_groups: g["lr"] = lr

        img = batch["image"].to(DEVICE, non_blocking=True)
        st = batch["state"].to(DEVICE, non_blocking=True)
        act = batch["action"].to(DEVICE, non_blocking=True)

        opt.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(enabled=(DEVICE == "cuda")):
            pred = policy(img, st)
            loss = F.smooth_l1_loss(pred, act)
        scaler.scale(loss).backward()
        nn.utils.clip_grad_norm_(policy.parameters(), 1.0)
        scaler.step(opt); scaler.update()

    val_mse = evaluate()
    print(f"Epoch {epoch+1}/{EPOCHS} | Val MSE: {val_mse:.6f}")
    if val_mse < best:
        best = val_mse
        torch.save({"state_dict": policy.state_dict(), "val_mse": best}, ckpt)

print("Best Val MSE:", best, "| Saved:", ckpt)

We define a compact visuomotor policy: a CNN backbone extracts image features that we fuse with the robot state to predict 2-D actions. We train with AdamW, a cosine learning-rate schedule, mixed precision, and gradient clipping, while evaluating with MSE on the validation set. We checkpoint the best model by validation loss so we can reload the strongest policy later. Check out the FULL CODES here.

policy.load_state_dict(torch.load(ckpt)["state_dict"]); policy.eval()
os.makedirs("/content/vis", exist_ok=True)

def draw_arrow(imgCHW, action_xy, scale=40):
    import PIL.Image, PIL.ImageDraw
    C, H, W = imgCHW.shape
    arr = (imgCHW.clamp(0, 1).permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)
    im = PIL.Image.fromarray(arr)
    dr = PIL.ImageDraw.Draw(im)
    cx, cy = W // 2, H // 2
    dx, dy = float(action_xy[0]) * scale, float(-action_xy[1]) * scale
    dr.line((cx, cy, cx + dx, cy + dy), width=3, fill=(0, 255, 0))
    return np.array(im)

frames = []
with torch.no_grad():
    for i in range(60):
        b = wrapped[i]
        img = b["image"].unsqueeze(0).to(DEVICE)
        st = b["state"].unsqueeze(0).to(DEVICE)
        pred = policy(img, st)[0].cpu()
        frames.append(draw_arrow(b["image"], pred))
video_path = "/content/vis/pusht_pred.mp4"
imageio.mimsave(video_path, frames, fps=10)
print("Wrote", video_path)

grid = make_grid(torch.stack([wrapped[i]["image"] for i in range(16)]), nrow=8)
save_image(grid, "/content/vis/grid.png")
print("Saved grid:", "/content/vis/grid.png")

We reload the best checkpoint and switch the policy to eval so we can visualize its behavior. We overlay predicted action arrows on frames, stitch them into a short MP4, and also save a quick image grid for a snapshot view of the dataset. This lets us confirm, at a glance, what actions our model outputs on real PushT observations.

In conclusion, we see how easily LeRobot integrates data handling, policy definition, and evaluation into a single framework. By training our lightweight policy and visualizing predicted actions on PushT frames, we confirm that the library gives us a practical entry point into robot learning without needing real-world hardware. We are now equipped to extend the pipeline to more advanced models, such as diffusion or ACT policies, to experiment with different datasets, and even to share our trained policies on the Hugging Face Hub.


Google’s Sensible Agent Reframes Augmented Reality (AR) Assistance as a Coupled “what+how” Decision—So What does that Change?

Sensible Agent is an AI research framework and prototype from Google that chooses both the action an augmented reality (AR) agent should take and the interaction modality to deliver/confirm it, conditioned on real-time multimodal context (e.g., whether hands are busy, ambient noise, social setting). Rather than treating “what to suggest” and “how to ask” as separate problems, it computes them jointly to minimize friction and social awkwardness in the wild.

https://research.google/pubs/sensible-agent-a-framework-for-unobtrusive-interaction-with-proactive-ar-agent/

What interaction failure modes is it targeting?

Voice-first prompting is brittle: it’s slow under time pressure, unusable with busy hands/eyes, and awkward in public. Sensible Agent’s core bet is that a high-quality suggestion delivered through the wrong channel is effectively noise. The framework explicitly models the joint decision of (a) what the agent proposes (recommend/guide/remind/automate) and (b) how it’s presented and confirmed (visual, audio, or both; inputs via head nod/shake/tilt, gaze dwell, finger poses, short-vocabulary speech, or non-lexical conversational sounds). By binding content selection to modality feasibility and social acceptability, the system aims to lower perceived effort while preserving utility.

How is the system architected at runtime?

A prototype on an Android-class XR headset implements a pipeline with three main stages. First, context parsing fuses egocentric imagery (vision-language inference for scene/activity/familiarity) with an ambient audio classifier (YAMNet) to detect conditions like noise or conversation. Second, a proactive query generator prompts a large multimodal model with few-shot exemplars to select the action, query structure (binary / multi-choice / icon-cue), and presentation modality. Third, the interaction layer enables only those input methods compatible with the sensed I/O availability, e.g., head nod for “yes” when whispering isn’t acceptable, or gaze dwell when hands are occupied.

Where do the few-shot policies come from—designer instinct or data?

The team seeded the policy space with two studies: an expert workshop (n=12) to enumerate when proactive help is useful and which micro-inputs are socially acceptable; and a context mapping study (n=40; 960 entries) across everyday scenarios (e.g., gym, grocery, museum, commuting, cooking) where participants specified desired agent actions and chose a preferred query type and modality given the context. These mappings ground the few-shot exemplars used at runtime, shifting the choice of “what+how” from ad-hoc heuristics to data-derived patterns (e.g., multi-choice in unfamiliar environments, binary under time pressure, icon + visual in socially sensitive settings).

What concrete interaction techniques does the prototype support?

For binary confirmations, the system recognizes head nod/shake; for multi-choice, a head-tilt scheme maps left/right/back to options 1/2/3. Finger-pose gestures support numeric selection and thumbs up/down; gaze dwell triggers visual buttons where raycast pointing would be fussy; short-vocabulary speech (e.g., “yes,” “no,” “one,” “two,” “three”) provides a minimal dictation path; and non-lexical conversational sounds (“mm-hm”) cover noisy or whisper-only contexts. Crucially, the pipeline only offers modalities that are feasible under current constraints (e.g., suppress audio prompts in quiet spaces; avoid gaze dwell if the user isn’t looking at the HUD).


Does the joint decision actually reduce interaction cost?

A preliminary within-subjects user study (n=10) comparing the framework to a voice-prompt baseline across AR and 360° VR reported lower perceived interaction effort and lower intrusiveness while maintaining usability and preference. This is a small sample typical of early HCI validation; it’s directional evidence rather than product-grade proof, but it aligns with the thesis that coupling intent and modality reduces overhead.

How does the audio side work, and why YAMNet?

YAMNet is a lightweight, MobileNet-v1–based audio event classifier trained on Google’s AudioSet, predicting 521 classes. In this context it’s a practical choice to detect rough ambient conditions—speech presence, music, crowd noise—fast enough to gate audio prompts or to bias toward visual/gesture interaction when speech would be awkward or unreliable. The model’s ubiquity in TensorFlow Hub and Edge guides makes it straightforward to deploy on device.
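As a rough sketch of how such an ambient-audio gate could work, the snippet below loads YAMNet from TensorFlow Hub, scores a mono 16 kHz waveform, and suggests falling back to visual or gesture prompts when speech or noise dominates. The chosen class names and threshold are assumptions, and this is not the Sensible Agent code.

import csv
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

# YAMNet ships its class map as a CSV asset; the "display_name" column holds labels.
class_map_path = yamnet.class_map_path().numpy().decode("utf-8")
with tf.io.gfile.GFile(class_map_path) as f:
    class_names = [row["display_name"] for row in csv.DictReader(f)]

def prefer_visual_prompt(waveform_16k_mono: np.ndarray, threshold: float = 0.3) -> bool:
    """Return True if ambient audio suggests speech/noise, so the agent should prefer
    visual/gesture prompts over audio (class set and threshold are assumptions)."""
    scores, embeddings, spectrogram = yamnet(waveform_16k_mono.astype(np.float32))
    mean_scores = scores.numpy().mean(axis=0)              # average over time frames
    noisy_classes = {"Speech", "Conversation", "Music", "Crowd"}
    noisy_score = sum(mean_scores[i] for i, name in enumerate(class_names) if name in noisy_classes)
    return noisy_score > threshold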

How can you integrate it into an existing AR or mobile assistant stack?

A minimal adoption plan looks like this: (1) instrument a lightweight context parser (VLM on egocentric frames + ambient audio tags) to produce a compact state; (2) build a few-shot table of context→(action, query type, modality) mappings from internal pilots or user studies; (3) prompt an LMM to emit both the “what” and the “how” at once; (4) expose only feasible input methods per state and keep confirmations binary by default; (5) log choices and outcomes for offline policy learning. The Sensible Agent artifacts show this is feasible in WebXR/Chrome on Android-class hardware, so migrating to a native HMD runtime or even a phone-based HUD is mostly an engineering exercise.
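Steps (2) through (4) of that plan can start life as a plain lookup before any LMM is involved. The sketch below hard-codes a few context to (action, query type, modality) mappings and filters to feasible input methods; the specific mappings are invented placeholders, not the study's data.

from dataclasses import dataclass

@dataclass
class Context:
    hands_busy: bool
    noisy: bool
    socially_sensitive: bool
    familiar_place: bool

def choose_what_and_how(ctx: Context):
    """Placeholder policy table in the spirit of context -> (action, query type, modality)."""
    if not ctx.familiar_place:
        action, query = "recommend", "multi_choice"   # unfamiliar settings: offer options
    else:
        action, query = "remind", "binary"            # familiar settings: quick yes/no
    modality = "visual" if (ctx.noisy or ctx.socially_sensitive) else "audio"

    # Only expose input methods that are feasible under the sensed constraints.
    inputs = ["gaze_dwell"] if ctx.hands_busy else ["finger_pose", "gaze_dwell"]
    if not ctx.socially_sensitive:
        inputs.append("short_speech")
    if query == "binary":
        inputs.append("head_nod_shake")
    return {"action": action, "query_type": query, "modality": modality, "inputs": inputs}

# Example: busy hands in a quiet, unfamiliar museum
# print(choose_what_and_how(Context(hands_busy=True, noisy=False, socially_sensitive=True, familiar_place=False)))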

Summary

Sensible Agent operationalizes proactive AR as a coupled policy problem—selecting the action and the interaction modality in a single, context-conditioned decision—and validates the approach with a working WebXR prototype and small-N user study showing lower perceived interaction effort relative to a voice baseline. The framework’s contribution is not a product but a reproducible recipe: a dataset of context→(what/how) mappings, few-shot prompts to bind them at runtime, and low-effort input primitives that respect social and I/O constraints.


Top Computer Vision CV Blogs & News Websites (2025)

Computer vision moved fast in 2025: new multimodal backbones, larger open datasets, and tighter model–systems integration. Practitioners need sources that publish rigorously, link code and benchmarks, and track deployment patterns—not marketing posts. This list prioritizes primary research hubs, lab blogs, and production-oriented engineering outlets with consistent update cadence. Use it to monitor SOTA shifts, grab reproducible code paths, and translate papers into deployable pipelines.

Google Research (AI Blog)

Primary source for advances from Google/DeepMind teams, including vision architectures (e.g., V-MoE) and periodic research year-in-review posts across CV and multimodal. Posts typically include method summaries, figures, and links to papers/code.

Marktechpost

Consistent reporting on new computer-vision models, datasets, and benchmarks with links to papers, code, and demos. Dedicated CV category plus frequent deep-dives (e.g., DINOv3 releases and analysis). Useful for staying on top of weekly research drops without wading through raw feeds.

AI at Meta

High-signal posts with preprints and open-source drops. Recent examples include DINOv3—scaled self-supervised backbones with SOTA across dense prediction tasks—which provide technical detail and artifacts.

NVIDIA Technical Blog

Production-oriented content on VLM-powered analytics, optimized inference, and GPU pipelines. Category feed for Computer Vision includes blueprints, SDK usage, and performance guidance relevant to enterprise deployments.

arXiv cs.CV — raw research firehose

The canonical preprint feed for CV. Use the recent or new views for daily updates; taxonomy confirms scope (image processing, pattern recognition, scene understanding). Best paired with RSS + custom filters.

CVF Open Access (CVPR/ICCV/ECCV)

Final versions of main-conference papers and workshops, searchable and citable. CVPR 2025 proceedings and workshop menus are already live, making this the authoritative archive post-acceptance.

BAIR Blog (UC Berkeley)

Occasional but deep posts on frontier topics (e.g., extremely large image modeling, robotics-vision crossovers). Good for conceptual clarity directly from authors.

Stanford Blog

Technical explainers and lab roundups (e.g., SAIL at CVPR 2025) with links to papers/talks. Useful to scan emerging directions across perception, generative models, and embodied vision.

Roboflow Blog

High-frequency, implementation-focused posts (labeling, training, deployment, apps, and trend reports). Strong for practitioners who need working pipelines and edge deployments.

Hugging Face Blog

Hands-on guides (VLMs, FiftyOne integrations) and ecosystem notes across Transformers, Diffusers, and timm; good for rapid prototyping and fine-tuning CV/VLM stacks.

PyTorch Blog

Change logs, APIs, and recipes affecting CV training/inference (Transforms V2, multi-weight support, FX feature extraction). Read when upgrading training stacks.


Qwen3-ASR-Toolkit: An Advanced Open Source Python Command-Line Toolkit for Using the Qwen-ASR API Beyond the 3 Minutes/10 MB Limit

Qwen has released Qwen3-ASR-Toolkit, an MIT-licensed Python CLI that programmatically bypasses the Qwen3-ASR-Flash API’s 3-minute/10 MB per-request limit by performing VAD-aware chunking, parallel API calls, and automatic resampling/format normalization via FFmpeg. The result is stable, hour-scale transcription pipelines with configurable concurrency, context injection, and clean text post-processing. Python ≥3.8 is required; install with:

pip install qwen3-asr-toolkit

What the toolkit adds on top of the API

Long-audio handling. The toolkit slices input using voice activity detection (VAD) at natural pauses, keeping each chunk under the API’s hard duration/size caps, then merges outputs in order.

Parallel throughput. A thread pool dispatches multiple chunks concurrently to DashScope endpoints, improving wall-clock latency for hour-long inputs. You control concurrency via -j/--num-threads.

Format & rate normalization. Any common audio/video container (MP4/MOV/MKV/MP3/WAV/M4A, etc.) is converted to the API’s required mono 16 kHz before submission. Requires FFmpeg installed on PATH.

Text cleanup & context. The tool includes post-processing to reduce repetitions/hallucinations and supports context injection to bias recognition toward domain terms; the underlying API also exposes language detection and inverse text normalization (ITN) toggles.

The official Qwen3-ASR-Flash API is single-turn and enforces ≤3 min duration and ≤10 MB payloads per call. That is reasonable for interactive requests but awkward for long media. The toolkit operationalizes best practices—VAD-aware segmentation + concurrent calls—so teams can batch large archives or live capture dumps without writing orchestration from scratch.

Quick start

Install prerequisites

# System: FFmpeg must be available
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install -y ffmpeg

Install the CLI

pip install qwen3-asr-toolkit

Configure credentials

# International endpoint key
export DASHSCOPE_API_KEY="sk-…"

Run

# Basic: local video, default 4 threads
qwen3-asr -i "/path/to/lecture.mp4"

# Faster: raise parallelism and pass key explicitly (optional if env var set)
qwen3-asr -i "/path/to/podcast.wav" -j 8 -key "sk-…"

# Improve domain accuracy with context
qwen3-asr -i "/path/to/earnings_call.m4a" \
  -c "tickers, CFO name, product names, Q3 revenue guidance"

Arguments you’ll actually use: -i/--input-file (file path or http/https URL), -j/--num-threads, -c/--context, -key/--dashscope-api-key, -t/--tmp-dir, -s/--silence. Output is printed and saved as <input_basename>.txt.

Minimal pipeline architecture

1) Load local file or URL → 2) VAD to find silence boundaries → 3) Chunk under API caps → 4) Resample to 16 kHz mono → 5) Parallel submit to DashScope → 6) Aggregate segments in order → 7) Post-process text (dedupe, repetitions) → 8) Emit .txt transcript.
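If you want to prototype your own orchestration rather than use the toolkit, the same flow can be approximated as shown below. The sketch assumes an ffmpeg binary on PATH, uses fixed time windows instead of true VAD, and leaves call_asr_api as a placeholder for the actual DashScope request.

import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_SECONDS = 170  # stay under the 3-minute-per-request cap

def normalize_and_split(src_path: str, tmp_dir: str):
    """Convert any input to mono 16 kHz WAV chunks via ffmpeg (fixed windows, not true VAD)."""
    pattern = os.path.join(tmp_dir, "chunk_%04d.wav")
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-ac", "1", "-ar", "16000",
         "-f", "segment", "-segment_time", str(CHUNK_SECONDS), pattern],
        check=True, capture_output=True,
    )
    return sorted(os.path.join(tmp_dir, f) for f in os.listdir(tmp_dir) if f.endswith(".wav"))

def call_asr_api(chunk_path: str) -> str:
    """Placeholder for the real DashScope / Qwen3-ASR-Flash request for one chunk."""
    raise NotImplementedError

def transcribe(src_path: str, num_threads: int = 4) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        chunks = normalize_and_split(src_path, tmp)
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # map() preserves chunk order, so segments concatenate back correctly
            texts = list(pool.map(call_asr_api, chunks))
    return " ".join(texts)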

Summary

Qwen3-ASR-Toolkit turns Qwen3-ASR-Flash into a practical long-audio pipeline by combining VAD-based segmentation, FFmpeg normalization (mono/16 kHz), and parallel API dispatch under the 3-minute/10 MB caps. Teams get deterministic chunking, configurable throughput, and optional context/LID/ITN controls without custom orchestration. For production, pin the package version, verify region endpoints/keys, and tune thread count to your network and QPS—then pip install qwen3-asr-toolkit and ship.


Move your AI agents from proof of concept to production with Amazon Bedrock AgentCore

Building an AI agent that can handle a real-life use case in production is a complex undertaking. Although creating a proof of concept demonstrates the potential, moving to production requires addressing scalability, security, observability, and operational concerns that don’t surface in development environments.
This post explores how Amazon Bedrock AgentCore helps you transition your agentic applications from experimental proof of concept to production-ready systems. We follow the journey of a customer support agent that evolves from a simple local prototype to a comprehensive, enterprise-grade solution capable of handling multiple concurrent users while maintaining security and performance standards.
Amazon Bedrock AgentCore is a comprehensive suite of services designed to help you build, deploy, and scale agentic AI applications. If you’re new to AgentCore, we recommend exploring our existing deep-dive posts on individual services: AgentCore Runtime for secure agent deployment and scaling, AgentCore Gateway for enterprise tool development, AgentCore Identity for securing agentic AI at scale, AgentCore Memory for building context-aware agents, AgentCore Code Interpreter for code execution, AgentCore Browser Tool for web interaction, and AgentCore Observability for transparency on your agent behavior. This post demonstrates how these services work together in a real-world scenario.
The customer support agent journey
Customer support represents one of the most common and compelling use cases for agentic AI. Modern businesses handle thousands of customer inquiries daily, ranging from simple policy questions to complex technical troubleshooting. Traditional approaches often fall short: rule-based chatbots frustrate customers with rigid responses, and human-only support teams struggle with scalability and consistency. An intelligent customer support agent needs to seamlessly handle diverse scenarios: managing customer orders and accounts, looking up return policies, searching product catalogs, troubleshooting technical issues through web research, and remembering customer preferences across multiple interactions. Most importantly, it must do all this while maintaining the security and reliability standards expected in enterprise environments. Consider the typical evolution path many organizations follow when building such agents:

The proof of concept stage – Teams start with a simple local prototype that demonstrates core capabilities, such as a basic agent that can answer policy questions and search for products. This works well for demos but lacks the robustness needed for real customer interactions.
The reality check – As soon as you try to scale beyond a few test users, challenges emerge. The agent forgets previous conversations, tools become unreliable under load, there’s no way to monitor performance, and security becomes a paramount concern.
The production challenge – Moving to production requires addressing session management, secure tool sharing, observability, authentication, and building interfaces that customers actually want to use. Many promising proofs of concept stall at this stage due to the complexity of these requirements.

In this post, we address each challenge systematically. We start with a prototype agent equipped with three essential tools: return policy lookup, product information search, and web search for troubleshooting. From there, we add the capabilities needed for production deployment: persistent memory for conversation continuity and a hyper-personalized experience, centralized tool management for reliability and security, full observability for monitoring and debugging, and finally a customer-facing web interface. This progression mirrors the real-world path from proof of concept to production, demonstrating how Amazon Bedrock AgentCore services work together to solve the operational challenges that emerge as your agentic applications mature. For simplification and demonstration purposes, we consider a single-agent architecture. In real-life use cases, customer support agents are often created as multi-agent architectures and those scenarios are also supported by Amazon Bedrock AgentCore services.
Solution overview
Every production system starts with a proof of concept, and our customer support agent is no exception. In this first phase, we build a functional prototype that demonstrates the core capabilities needed for customer support. In this case, we use Strands Agents, an open source agent framework, to build the proof of concept and Anthropic’s Claude 3.7 Sonnet on Amazon Bedrock as the large language model (LLM) powering our agent. For your application, you can use another agent framework and model of your choice.
Agents rely on tools to take actions and interact with live systems. Several tools are used in customer support agents, but to keep our example simple, we focus on three core capabilities to handle the most common customer inquiries:

Return policy lookup – Customers frequently ask about return windows, conditions, and processes. Our tool provides structured policy information based on product categories, covering everything from return timeframes to refund processing and shipping policies.
Product information retrieval – Technical specifications, warranty details, and compatibility information are essential for both pre-purchase questions and troubleshooting. This tool serves as a bridge to your product catalog, delivering formatted technical details that customers can understand.
Web search for troubleshooting – Complex technical issues often require the latest solutions or community-generated fixes not found in internal documentation. Web search capability allows the agent to access the web for current troubleshooting guides and technical solutions in real time.

The tools implementation and the end-to-end code for this use case are available in our GitHub repository. In this post, we focus on the main code that connects with Amazon Bedrock AgentCore, but you can follow the end-to-end journey in the repository.
Create the agent
With the tools available, let’s create the agent. The architecture for our proof of concept will look like the following diagram.

You can find the end-to-end code for this post on the GitHub repository. For simplicity, we show only the essential parts for our end-to-end code here:

from strands import Agent, tool
from strands.models import BedrockModel

@tool
def get_return_policy(product_category: str) -> str:
    """Get return policy information for a specific product category."""
    # Returns structured policy info: windows, conditions, processes, refunds
    # check github for full code
    return {"return_window": "10 days", "conditions": ""}

@tool
def get_product_info(product_type: str) -> str:
    """Get detailed technical specifications and information for electronics products."""
    # Returns warranty, specs, features, compatibility details
    # check github for full code
    return {"product": "ThinkPad X1 Carbon", "info": "ThinkPad X1 Carbon info"}

@tool
def web_search(keywords: str, region: str = "us-en", max_results: int = 5) -> str:
    """Search the web for updated troubleshooting information."""
    # Provides access to current technical solutions and guides
    # check github for full code
    return "results from websearch"

# Initialize the Bedrock model
model = BedrockModel(
    model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    temperature=0.3
)

# Create the customer support agent
agent = Agent(
    model=model,
    tools=[
        get_product_info,
        get_return_policy,
        web_search
    ],
    system_prompt="""You are a helpful customer support assistant for an electronics company.
    Use the appropriate tools to provide accurate information and always offer additional help."""
)

Test the proof of concept
When we test our prototype with realistic customer queries, the agent demonstrates the correct tool selection and interaction with real-world systems:

# Return policy inquiry
response = agent("What's the return policy for my ThinkPad X1 Carbon?")
# Agent correctly uses get_return_policy with "laptops" category

# Technical troubleshooting
response = agent("My iPhone 14 heats up, how do I fix it?")
# Agent uses web_search to find current troubleshooting solutions

The agent works well for these individual queries, correctly mapping laptop inquiries to return policy lookups and complex technical issues to web search, providing comprehensive and actionable responses.
The proof of concept reality check
Our proof of concept successfully demonstrates that an agent can handle diverse customer support scenarios using the right combination of tools and reasoning. The agent runs perfectly on your local machine and handles queries correctly. However, this is where the proof of concept gap becomes obvious. The tools are defined as local functions in your agent code, the agent responds quickly, and everything seems production-ready. But several critical limitations become apparent the moment you think beyond single-user testing:

Memory loss between sessions – If you restart your notebook or application, the agent completely forgets previous conversations. A customer who was discussing a laptop return yesterday would need to start from scratch today, re-explaining their entire situation. This isn’t just inconvenient—it’s a poor customer experience that breaks the conversational flow that makes AI agents valuable.
Single customer limitation – Your current agent can only handle one conversation at a time. If two customers try to use your support system simultaneously, their conversations would interfere with each other, or worse, one customer might see another’s conversation history. There’s no mechanism to maintain separate conversation context for different users.
Tools embedded in code – Your tools are defined directly in the agent code. This means:

You can’t reuse these tools across different agents (sales agent, technical support agent, and so on).
Updating a tool requires changing the agent code and redeploying everything.
Different teams can’t maintain different tools independently.

No production infrastructure – The agent runs locally with no consideration for scalability, security, monitoring, and reliability.

These fundamental architectural barriers can prevent real customer deployment. Agent building teams can take months to address these issues, which delays the time to value from their work and adds significant costs to the application. This is where Amazon Bedrock AgentCore services become essential. Rather than spending months building these production capabilities from scratch, Amazon Bedrock AgentCore provides managed services that address each gap systematically.
Let’s begin our journey to production by solving the memory problem first, transforming our agent from one that forgets every conversation into one that remembers customers across conversations and can hyper-personalize conversations using Amazon Bedrock AgentCore Memory.
Add persistent memory for hyper-personalized agents
The first major limitation we identified in our proof of concept was memory loss—our agent forgot everything between sessions, forcing customers to repeat their context every time. This “goldfish agent” behavior breaks the conversational experience that makes AI agents valuable in the first place.
Amazon Bedrock AgentCore Memory solves this by providing managed, persistent memory that operates on two complementary levels:

Short-term memory – Immediate conversation context and session-based information for continuity within interactions
Long-term memory – Persistent information extracted across multiple conversations, including customer preferences, facts, and behavioral patterns

After adding Amazon Bedrock AgentCore Memory to our customer support agent, our new architecture will look like the following diagram.

Install dependencies
Before we start, let’s install our dependencies: boto3, the AgentCore SDK, and the AgentCore Starter Toolkit SDK. Those will help us quickly add Amazon Bedrock AgentCore capabilities to our agent proof of concept. See the following code:

pip install boto3 bedrock-agentcore bedrock-agentcore-starter-toolkit

Create the memory resources
Amazon Bedrock AgentCore Memory uses configurable strategies to determine what information to extract and store. For our customer support use case, we use two complementary strategies:

USER_PREFERENCE – Automatically extracts and stores customer preferences like “prefers ThinkPad laptops,” “uses Linux,” or “plays competitive FPS games.” This enables personalized recommendations across conversations.
SEMANTIC – Captures factual information using vector embeddings, such as “customer has MacBook Pro order #MB-78432” or “reported overheating issues during video editing.” This provides relevant context for troubleshooting.

See the following code:

from bedrock_agentcore.memory import MemoryClient
from bedrock_agentcore.memory.constants import StrategyType

memory_client = MemoryClient(region_name=region)

strategies = [
    {
        StrategyType.USER_PREFERENCE.value: {
            "name": "CustomerPreferences",
            "description": "Captures customer preferences and behavior",
            "namespaces": ["support/customer/{actorId}/preferences"],
        }
    },
    {
        StrategyType.SEMANTIC.value: {
            "name": "CustomerSupportSemantic",
            "description": "Stores facts from conversations",
            "namespaces": ["support/customer/{actorId}/semantic"],
        }
    },
]

# Create memory resource with both strategies
response = memory_client.create_memory_and_wait(
    name="CustomerSupportMemory",
    description="Customer support agent memory",
    strategies=strategies,
    event_expiry_days=90,
)

Integrate with Strands Agents hooks
The key to making memory work seamlessly is automation—customers shouldn’t need to think about it, and agents shouldn’t require manual memory management. Strands Agents provides a powerful hook system that lets you intercept agent lifecycle events and handle memory operations automatically. The hook system enables both built-in components and user code to react to or modify agent behavior through strongly-typed event callbacks. For our use case, we create CustomerSupportMemoryHooks to retrieve the customer context and save the support interactions:

MessageAddedEvent hook – Triggered when customers send messages, this hook automatically retrieves relevant memory context and injects it into the query. The agent receives both the customer’s question and relevant historical context without manual intervention.
AfterInvocationEvent hook – Triggered after agent responses, this hook automatically saves the interaction to memory. The conversation becomes part of the customer’s persistent history immediately.

See the following code:

class CustomerSupportMemoryHooks(HookProvider):
    def retrieve_customer_context(self, event: MessageAddedEvent):
        """Inject customer context before processing queries"""
        user_query = event.agent.messages[-1]["content"][0]["text"]

        # Retrieve relevant memories from both strategies
        all_context = []
        for context_type, namespace in self.namespaces.items():
            memories = self.client.retrieve_memories(
                memory_id=self.memory_id,
                namespace=namespace.format(actorId=self.actor_id),
                query=user_query,
                top_k=3,
            )
            # Format and add to context
            for memory in memories:
                if memory.get("content", {}).get("text"):
                    all_context.append(f"[{context_type.upper()}] {memory['content']['text']}")

        # Inject context into the user query
        if all_context:
            context_text = "\n".join(all_context)
            original_text = event.agent.messages[-1]["content"][0]["text"]
            event.agent.messages[-1]["content"][0]["text"] = f"Customer Context:\n{context_text}\n\n{original_text}"

    def save_support_interaction(self, event: AfterInvocationEvent):
        """Save interactions after agent responses"""
        # Get last customer query and agent response; check github for implementation
        customer_query = "This is a sample query"
        agent_response = "LLM gave a sample response"

        # Extract customer query and agent response
        # Save to memory for future retrieval
        self.client.create_event(
            memory_id=self.memory_id,
            actor_id=self.actor_id,
            session_id=self.session_id,
            messages=[(customer_query, "USER"), (agent_response, "ASSISTANT")]
        )

In this code, we can see that our hooks are the ones interacting with Amazon Bedrock AgentCore Memory to save and retrieve memory events.
Integrate memory with the agent
Adding memory to our existing agent requires minimal code changes; you can simply instantiate the memory hooks and pass them to the agent constructor. The agent code then only needs to connect with the memory hooks to use the full power of Amazon Bedrock AgentCore Memory. We will create a new hook for each session, which will help us handle different customer interactions. See the following code:

# Create memory hooks for this customer session
memory_hooks = CustomerSupportMemoryHooks(
    memory_id=memory_id,
    client=memory_client,
    actor_id=customer_id,
    session_id=session_id
)

# Create agent with memory capabilities
agent = Agent(
    model=model,
    hooks=[memory_hooks],
    tools=[get_product_info, get_return_policy, web_search],
    system_prompt=SYSTEM_PROMPT
)

Test the memory in action
Let’s see how memory transforms the customer experience. When we invoke the agent, it uses the memory from previous interactions to show customer interests in gaming headphones, ThinkPad laptops, and MacBook thermal issues:

# Test personalized recommendations
response = agent("Which headphones would you recommend?")
# Agent remembers: "prefers low latency for competitive FPS games"
# Response includes gaming-focused recommendations

# Test preference recall
response = agent("What is my preferred laptop brand?")
# Agent remembers: "prefers ThinkPad models" and "needs Linux compatibility"
# Response acknowledges ThinkPad preference and suggests compatible models

The transformation is immediately apparent. Instead of generic responses, the agent now provides personalized recommendations based on the customer’s stated preferences and past interactions. The customer doesn’t need to re-explain their gaming needs or Linux requirements—the agent already knows.
Benefits of Amazon Bedrock AgentCore Memory
With Amazon Bedrock AgentCore Memory integrated, our agent now delivers the following benefits:

Conversation continuity – Customers can pick up where they left off, even across different sessions or support channels
Personalized service – Recommendations and responses are tailored to individual preferences and past issues
Contextual troubleshooting – Access to previous problems and solutions enables more effective support
Seamless experience – Memory operations happen automatically without customer or agent intervention

However, we still have limitations to address. Our tools remain embedded in the agent code, preventing reuse across different support agents or teams. Security and access controls are minimal, and we still can’t handle multiple customers simultaneously in a production environment.
In the next section, we address these challenges by centralizing our tools using Amazon Bedrock AgentCore Gateway and implementing proper identity management with Amazon Bedrock AgentCore Identity, creating a scalable and secure foundation for our customer support system.
Centralize tools with Amazon Bedrock AgentCore Gateway and Amazon Bedrock AgentCore Identity
With memory solved, our next challenge is tool architecture. Currently, our tools are embedded directly in the agent code—a pattern that works for prototypes but creates significant problems at scale. When you need multiple agents (customer support, sales, technical support), each one duplicates the same tools, leading to extensive code, inconsistent behavior, and maintenance nightmares.
Amazon Bedrock AgentCore Gateway simplifies this process by centralizing tools into reusable, secure endpoints that agents can access. Combined with Amazon Bedrock AgentCore Identity for authentication, it creates an enterprise-grade tool sharing infrastructure.
We will now update our agent to use Amazon Bedrock AgentCore Gateway and Amazon Bedrock AgentCore Identity. The architecture will look like the following diagram.

In this case, we convert our web search tool to run behind the gateway and keep the return policy and product information tools local to this agent. This split makes sense because web search is a common capability that can be reused across different use cases in an organization, whereas return policy and product information lookups are capabilities typically tied to customer support services. With Amazon Bedrock AgentCore services, you can decide which capabilities to use and how to combine them. We also use two new tools that could have been developed by other teams: check warranty and get customer profile. Because those teams have already exposed these tools as AWS Lambda functions, we can use them as targets for our Amazon Bedrock AgentCore Gateway. Amazon Bedrock AgentCore Gateway also supports REST APIs as targets, so if we have an OpenAPI specification or a Smithy model, we can quickly expose those tools through the gateway as well.
Convert existing services to MCP
Amazon Bedrock AgentCore Gateway uses the Model Context Protocol (MCP) to standardize how agents access tools. Converting existing Lambda functions into MCP endpoints requires minimal changes—mainly adding tool schemas and handling the MCP context. To use this functionality, we convert our local tools to Lambda functions and create the tools schema definitions to make these functions discoverable by agents:

# Original Lambda function (simplified)
def web_search(keywords: str, region: str = "us-en", max_results: int = 5) -> str:
    # web_search functionality; check github for full code
    return "results from websearch"

def lambda_handler(event, context):
    # get_tool_name and get_named_parameter are helper functions from the repository
    if get_tool_name(event) == "web_search":
        keywords = get_named_parameter(event=event, name="keywords")

        search_result = web_search(keywords)
        return {"statusCode": 200, "body": search_result}

The following code is the tool schema definition:

{
    "name": "web_search",
    "description": "Search the web for updated information using DuckDuckGo",
    "inputSchema": {
        "type": "object",
        "properties": {
            "keywords": {
                "type": "string",
                "description": "The search query keywords"
            },
            "region": {
                "type": "string",
                "description": "The search region (e.g., us-en, uk-en, ru-ru)"
            },
            "max_results": {
                "type": "integer",
                "description": "The maximum number of results to return"
            }
        },
        "required": [
            "keywords"
        ]
    }
}

For demonstration purposes, we build a new Lambda function from scratch. In reality, organizations already have different functionalities available as REST services or Lambda functions, and this approach lets you expose existing enterprise services as agent tools without rebuilding them.
Configure security with Amazon Bedrock AgentCore Gateway and integrate with Amazon Bedrock AgentCore Identity
Amazon Bedrock AgentCore Gateway requires authentication for both inbound and outbound connections. Amazon Bedrock AgentCore Identity handles this through standard OAuth flows. After you set up an OAuth authorization configuration, you can create a new gateway and pass this configuration to it. See the following code:

# Create gateway with JWT-based authentication
auth_config = {
    "customJWTAuthorizer": {
        "allowedClients": [cognito_client_id],
        "discoveryUrl": cognito_discovery_url
    }
}

gateway_response = gateway_client.create_gateway(
    name="customersupport-gw",
    roleArn=gateway_iam_role,
    protocolType="MCP",
    authorizerType="CUSTOM_JWT",
    authorizerConfiguration=auth_config,
    description="Customer Support AgentCore Gateway"
)

For inbound authentication, agents must present a valid JSON Web Token (JWT), a compact, self-contained standard for securely transmitting information between parties, issued by an identity provider such as Amazon Cognito, Okta, or Microsoft Entra ID, before they can access Amazon Bedrock AgentCore Gateway tools.
For outbound authentication, Amazon Bedrock AgentCore Gateway can authenticate to downstream services using AWS Identity and Access Management (IAM) roles, API keys, or OAuth tokens.
For demonstration purposes, we have created an Amazon Cognito user pool with a dummy user name and password. For your use case, you should set up a proper identity provider and manage users accordingly. This configuration makes sure that only authorized agents can access specific tools and provides a full audit trail.
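As a reference, the following is a minimal sketch of how the get_token helper used later in this post could be implemented: a standard OAuth 2.0 client-credentials request against the Cognito token endpoint. The parameter names mirror those used in the next sections; the actual helper in the GitHub repository may differ.

import requests

def get_token(client_id: str, client_secret: str, scope: str, url: str) -> dict:
    """Exchange client credentials for a JWT access token (illustrative sketch)."""
    response = requests.post(
        url,  # e.g., https://<your-domain>.auth.<region>.amazoncognito.com/oauth2/token
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        },
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # contains access_token, expires_in, token_type
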
Add Lambda targets
After you set up Amazon Bedrock AgentCore Gateway, adding Lambda functions as tool targets is straightforward:

lambda_target_config = {
    "mcp": {
        "lambda": {
            "lambdaArn": lambda_function_arn,
            "toolSchema": {"inlinePayload": api_spec},
        }
    }
}

gateway_client.create_gateway_target(
    gatewayIdentifier=gateway_id,
    name="LambdaTools",
    targetConfiguration=lambda_target_config,
    credentialProviderConfigurations=[{
        "credentialProviderType": "GATEWAY_IAM_ROLE"
    }]
)

The gateway now exposes your Lambda functions as MCP tools that authorized agents can discover and use.
Integrate MCP tools with Strands Agents
Converting our agent to use centralized tools requires updating the tool configuration. We keep some tools local, such as product info and return policies specific to customer support that will likely not be reused in other use cases, and use centralized tools for shared capabilities. Because Strands Agents has a native integration for MCP tools, we can simply use the MCPClient from Strands with a streamablehttp_client. See the following code:

# Get OAuth token for gateway access
gateway_access_token = get_token(
    client_id=cognito_client_id,
    client_secret=cognito_client_secret,
    scope=auth_scope,
    url=token_url
)

# Create authenticated MCP client
mcp_client = MCPClient(
    lambda: streamablehttp_client(
        gateway_url,
        headers={"Authorization": f"Bearer {gateway_access_token['access_token']}"}
    )
)

# Combine local and MCP tools
tools = [
    get_product_info,     # Local tool (customer support specific)
    get_return_policy,    # Local tool (customer support specific)
] + mcp_client.list_tools_sync()  # Centralized tools from gateway

agent = Agent(
    model=model,
    tools=tools,
    hooks=[memory_hooks],
    system_prompt=SYSTEM_PROMPT
)

Test the enhanced agent
With the centralized tools integrated, our agent now has access to enterprise capabilities like warranty checking:

# Test web search using centralized tool
response = agent("How can I fix Lenovo ThinkPad with a blue screen?")
# Agent uses web_search from AgentCore Gateway

The agent seamlessly combines local tools with centralized ones, providing comprehensive support capabilities while maintaining security and access control.
However, we still have a significant limitation: our entire agent runs locally on our development machine. For production deployment, we need scalable infrastructure, comprehensive observability, and the ability to handle multiple concurrent users.
In the next section, we address this by deploying our agent to Amazon Bedrock AgentCore Runtime, transforming our local prototype into a production-ready system with Amazon Bedrock AgentCore Observability and automatic scaling capabilities.
Deploy to production with Amazon Bedrock AgentCore Runtime
With the tools centralized and secured, our final major hurdle is production deployment. Our agent currently runs locally on your laptop, which is ideal for experimentation but unsuitable for real customers. Production requires scalable infrastructure, comprehensive monitoring, automatic error recovery, and the ability to handle multiple concurrent users reliably.
Amazon Bedrock AgentCore Runtime transforms your local agent into a production-ready service with minimal code changes. Combined with Amazon Bedrock AgentCore Observability, it provides enterprise-grade reliability, automatic scaling, and comprehensive monitoring capabilities that operations teams need to maintain agentic applications in production.
Our architecture will look like the following diagram.

Minimal code changes for production
Converting your local agent requires adding just four lines of code:

from bedrock_agentcore.runtime import BedrockAgentCoreApp  # (1) import the runtime app

app = BedrockAgentCoreApp()  # (2) create the app

# Your existing agent code remains unchanged
model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")
memory_hooks = CustomerSupportMemoryHooks(memory_id, memory_client, actor_id, session_id)
agent = Agent(
    model=model,
    tools=[get_return_policy, get_product_info],
    system_prompt=SYSTEM_PROMPT,
    hooks=[memory_hooks]
)

@app.entrypoint  # (3) mark the invocation entrypoint
def invoke(payload):
    user_input = payload.get("prompt", "")
    response = agent(user_input)
    return response.message["content"][0]["text"]

if __name__ == "__main__":
    app.run()  # (4) start the runtime server

BedrockAgentCoreApp automatically creates an HTTP server with the required /invocations and /ping endpoints, handles proper content types and response formats, manages error handling according to AWS standards, and provides the infrastructure bridge between your agent code and Amazon Bedrock AgentCore Runtime.
Secure production deployment
Production deployment requires proper authentication and access control. Amazon Bedrock AgentCore Runtime integrates with Amazon Bedrock AgentCore Identity to provide enterprise-grade security. Using the Bedrock AgentCore Starter Toolkit, we can deploy our application using three simple steps: configure, launch, and invoke.
During the configuration step, a Dockerfile is created to guide the deployment of our agent. It contains information about the agent and its dependencies, the Amazon Bedrock AgentCore Identity configuration, and the Amazon Bedrock AgentCore Observability configuration to be used. During the launch step, AWS CodeBuild runs this Dockerfile and an Amazon Elastic Container Registry (Amazon ECR) repository is created to store the agent image and its dependencies. The Amazon Bedrock AgentCore Runtime agent is then created from the image in the ECR repository, and an endpoint is generated that applications use to invoke the agent. If your agent is configured with OAuth authentication through Amazon Bedrock AgentCore Identity, as ours will be, you also need to pass the authentication token during the agent invocation step. The following diagram illustrates this process.

The code to configure and launch our agent on Amazon Bedrock AgentCore Runtime will look as follows:

from bedrock_agentcore_starter_toolkit import Runtime

# Configure secure deployment with Cognito authentication
agentcore_runtime = Runtime()

response = agentcore_runtime.configure(
    entrypoint="lab_helpers/lab4_runtime.py",
    execution_role=execution_role_arn,
    auto_create_ecr=True,
    requirements_file="requirements.txt",
    region=region,
    agent_name="customer_support_agent",
    authorizer_configuration={
        "customJWTAuthorizer": {
            "allowedClients": [cognito_client_id],
            "discoveryUrl": cognito_discovery_url,
        }
    }
)

# Deploy to production
launch_result = agentcore_runtime.launch()

This configuration creates a secure endpoint that only accepts requests with valid JWT tokens from your identity provider (such as Amazon Cognito, Okta, or Microsoft Entra ID). For our agent, we use a dummy setup with Amazon Cognito, but your application can use an identity provider of your choosing. The deployment process automatically builds your agent into a container, creates the necessary AWS infrastructure, and establishes monitoring and logging pipelines.
Session management and isolation
One of the most critical production features for agents is proper session management. Amazon Bedrock AgentCore Runtime automatically handles session isolation, making sure different customers’ conversations don’t interfere with each other:

# Customer 1 conversation
response1 = agentcore_runtime.invoke(
    {"prompt": "My iPhone Bluetooth isn't working. What should I do?"},
    bearer_token=auth_token,
    session_id="session-customer-1"
)

# Customer 1 follow-up (maintains context)
response2 = agentcore_runtime.invoke(
    {"prompt": "I've turned Bluetooth on and off but it still doesn't work"},
    bearer_token=auth_token,
    session_id="session-customer-1"  # Same session, context preserved
)

# Customer 2 conversation (completely separate)
response3 = agentcore_runtime.invoke(
    {"prompt": "Still not working. What is going on?"},
    bearer_token=auth_token,
    session_id="session-customer-2"  # Different session, no context
)

Customer 1’s follow-up maintains full context about their iPhone Bluetooth issue, whereas Customer 2’s message (in a different session) has no context and the agent appropriately asks for more information. This automatic session isolation is crucial for production customer support scenarios.
Comprehensive observability with Amazon Bedrock AgentCore Observability
Production agents need comprehensive monitoring to diagnose issues, optimize performance, and maintain reliability. Amazon Bedrock AgentCore Observability automatically instruments your agent code and sends telemetry data to Amazon CloudWatch, where you can analyze patterns and troubleshoot issues in real time. The observability data includes session-level tracking, so you can trace individual customer session interactions and understand exactly what happened during a support interaction. You can use Amazon Bedrock AgentCore Observability with an agent of your choice, whether or not it is hosted on Amazon Bedrock AgentCore Runtime. Because Amazon Bedrock AgentCore Runtime integrates with Amazon Bedrock AgentCore Observability automatically, no extra work is needed to observe our agent.
With Amazon Bedrock AgentCore Runtime deployment, your agent is ready to be used in production. However, we still have one limitation: our agent is accessible only through SDK or API calls, requiring customers to write code or use technical tools to interact with it. For true customer-facing deployment, we need a user-friendly web interface that customers can access through their browsers.
In the following section, we demonstrate the complete journey by building a sample web application using Streamlit, providing an intuitive chat interface that can interact with our production-ready Amazon Bedrock AgentCore Runtime endpoint. The exposed endpoint maintains the security, scalability, and observability capabilities we’ve built throughout our journey from proof of concept to production. In a real-world scenario, you would integrate this endpoint with your existing customer-facing applications and UI frameworks.
Create a customer-facing UI
With our agent deployed to production, the final step is creating a customer-facing UI that customers can use to interface with the agent. Although SDK access works for developers, customers need an intuitive web interface for seamless support interactions.
To demonstrate a complete solution, we build a sample Streamlit-based web application that connects to our production-ready Amazon Bedrock AgentCore Runtime endpoint. The frontend includes secure Amazon Cognito authentication, real-time streaming responses, persistent session management, and a clean chat interface. Although we use Streamlit for rapid prototyping, enterprises would typically integrate the endpoint with their existing interface or preferred UI frameworks.
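The full application is available in the GitHub repository. As an illustration, the following is a minimal sketch of the chat loop, reusing the agentcore_runtime.invoke call shown earlier; the auth_token, response parsing, and streaming details are simplified assumptions, and a production frontend would typically call the runtime endpoint through its own backend.

import uuid
import streamlit as st

st.title("Customer Support Assistant")

# Keep one runtime session per browser session so context is preserved
if "session_id" not in st.session_state:
    st.session_state.session_id = f"session-{uuid.uuid4()}"
if "history" not in st.session_state:
    st.session_state.history = []

# Render previous turns
for role, text in st.session_state.history:
    with st.chat_message(role):
        st.write(text)

if prompt := st.chat_input("How can we help you today?"):
    st.session_state.history.append(("user", prompt))
    with st.chat_message("user"):
        st.write(prompt)

    # agentcore_runtime and auth_token come from the deployment steps above
    result = agentcore_runtime.invoke(
        {"prompt": prompt},
        bearer_token=auth_token,
        session_id=st.session_state.session_id,
    )
    answer = str(result)  # the exact response shape depends on your invoke() return value
    st.session_state.history.append(("assistant", answer))
    with st.chat_message("assistant"):
        st.write(answer)
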
The end-to-end application (shown in the following diagram) maintains full conversation context across the sessions while providing the security, scalability, and observability capabilities that we built throughout this post. The result is a complete customer support agentic system that handles everything from initial authentication to complex multi-turn troubleshooting conversations, demonstrating how Amazon Bedrock AgentCore services transform prototypes into production-ready customer applications.

Conclusion
Our journey from prototype to production demonstrates how Amazon Bedrock AgentCore services address the traditional barriers to deploying enterprise-ready agentic applications. What started as a simple local customer support chatbot transformed into a comprehensive, production-grade system capable of serving multiple concurrent users with persistent memory, secure tool sharing, comprehensive observability, and an intuitive web interface—without months of custom infrastructure development.
The transformation required minimal code changes at each step, showcasing how Amazon Bedrock AgentCore services work together to solve the operational challenges that typically stall promising proofs of concept. Memory capabilities avoid the “goldfish agent” problem, centralized tool management through Amazon Bedrock AgentCore Gateway creates a reusable infrastructure that securely serves multiple use cases, Amazon Bedrock AgentCore Runtime provides enterprise-grade deployment with automatic scaling, and Amazon Bedrock AgentCore Observability delivers the monitoring capabilities operations teams need to maintain production systems.
The following video provides an overview of AgentCore capabilities.

Ready to build your own production-ready agent? Start with our complete end-to-end tutorial, where you can follow along with the exact code and configurations we’ve explored in this post. For additional use cases and implementation patterns, explore the broader GitHub repository, and dive deeper into service capabilities and best practices in the Amazon Bedrock AgentCore documentation.

About the authors
Maira Ladeira Tanke is a Tech Lead for Agentic AI at AWS, where she enables customers on their journey to develop autonomous AI systems. With over 10 years of experience in AI/ML, Maira partners with enterprise customers to accelerate the adoption of agentic applications using Amazon Bedrock AgentCore and Strands Agents, helping organizations harness the power of foundation models to drive innovation and business transformation. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Building AI agents is 5% AI and 100% software engineering

Production-grade agents live or die on data plumbing, controls, and observability—not on model choice. The doc-to-chat pipeline below maps the concrete layers and why they matter.

What is a “doc-to-chat” pipeline?

A doc-to-chat pipeline ingests enterprise documents, standardizes them, enforces governance, indexes embeddings alongside relational features, and serves retrieval + generation behind authenticated APIs with human-in-the-loop (HITL) checkpoints. It’s the reference architecture for agentic Q&A, copilots, and workflow automation where answers must respect permissions and be audit-ready. Production implementations are variations of RAG (retrieval-augmented generation) hardened with LLM guardrails, governance, and OpenTelemetry-backed tracing.

How do you integrate cleanly with the existing stack?

Use standard service boundaries (REST/JSON, gRPC) over a storage layer your org already trusts. For tables, Iceberg gives ACID, schema evolution, partition evolution, and snapshots—critical for reproducible retrieval and backfills. For vectors, use a system that coexists with SQL filters: pgvector collocates embeddings with business keys and ACL tags in PostgreSQL; dedicated engines like Milvus handle high-QPS ANN with disaggregated storage/compute. In practice, many teams run both: SQL+pgvector for transactional joins and Milvus for heavy retrieval.

Key properties

Iceberg tables: ACID, hidden partitioning, snapshot isolation; vendor support across warehouses.

pgvector: SQL + vector similarity in one query plan for precise joins and policy enforcement.

Milvus: layered, horizontally scalable architecture for large-scale similarity search.

How do agents, humans, and workflows coordinate on one “knowledge fabric”?

Production agents require explicit coordination points where humans approve, correct, or escalate. AWS A2I provides managed HITL loops (private workforces, flow definitions) and is a concrete blueprint for gating low-confidence outputs. Frameworks like LangGraph model these human checkpoints inside agent graphs so approvals are first-class steps in the DAG, not ad hoc callbacks. Use them to gate actions like publishing summaries, filing tickets, or committing code.

Pattern: LLM → confidence/guardrail checks → HITL gate → side-effects. Persist every artifact (prompt, retrieval set, decision) for auditability and future re-runs.
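
A generic sketch of this gate (not a specific A2I or LangGraph API; all names are illustrative) might look like the following.

from dataclasses import dataclass

@dataclass
class DraftAnswer:
    text: str
    confidence: float
    retrieval_ids: list

def passes_guardrails(draft: DraftAnswer) -> bool:
    # Plug in managed or OSS guardrails here (safety, policy, PII, ...)
    return "ssn" not in draft.text.lower()

def handle(draft: DraftAnswer, confidence_threshold: float = 0.8) -> dict:
    # Persist the full artifact (prompt, retrieval set, decision) for audits and re-runs
    audit_record = {
        "text": draft.text,
        "retrieval_ids": draft.retrieval_ids,
        "confidence": draft.confidence,
    }
    if not passes_guardrails(draft) or draft.confidence < confidence_threshold:
        return {"action": "escalate_to_human", "audit": audit_record}
    return {"action": "publish", "audit": audit_record}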

How is reliability enforced before anything reaches the model?

Treat reliability as layered defenses:

Language + content guardrails: Pre-validate inputs/outputs for safety and policy. Options span managed (Bedrock Guardrails) and OSS (NeMo Guardrails, Guardrails AI; Llama Guard). Independent comparisons and a position paper catalog the trade-offs.

PII detection/redaction: Run analyzers on both source docs and model I/O. Microsoft Presidio offers recognizers and masking, with explicit caveats to combine with additional controls; a short redaction sketch follows after this list.

Access control and lineage: Enforce row-/column-level ACLs and audit across catalogs (Unity Catalog) so retrieval respects permissions; unify lineage and access policies across workspaces.

Retrieval quality gates: Evaluate RAG with reference-free metrics (faithfulness, context precision/recall) using Ragas/related tooling; block or down-rank poor contexts.
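
As a concrete example of the PII layer above, here is a short sketch using Microsoft Presidio's analyzer and anonymizer; the sample text and entity handling are illustrative, and Presidio assumes a spaCy language model is installed.

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Contact Jane Doe at jane.doe@example.com or 555-010-9999."

# Detect PII entities (names, emails, phone numbers, ...)
findings = analyzer.analyze(text=text, language="en")

# Mask the detected spans before indexing or sending to the model
redacted = anonymizer.anonymize(text=text, analyzer_results=findings)
print(redacted.text)  # e.g., "Contact <PERSON> at <EMAIL_ADDRESS> or <PHONE_NUMBER>."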

How do you scale indexing and retrieval under real traffic?

Two axes matter: ingest throughput and query concurrency.

Ingest: Normalize at the lakehouse edge; write to Iceberg for versioned snapshots, then embed asynchronously. This enables deterministic rebuilds and point-in-time re-indexing.

Vector serving: Milvus’s shared-storage, disaggregated compute architecture supports horizontal scaling with independent failure domains; use HNSW/IVF/Flat hybrids and replica sets to balance recall/latency.

SQL + vector: Keep business joins server-side (pgvector), e.g., WHERE tenant_id = ? AND acl_tag @> … ORDER BY embedding <-> :q LIMIT k. This avoids N+1 trips and respects policies.

Chunking/embedding strategy: Tune chunk size/overlap and semantic boundaries; bad chunking is the silent killer of recall.

For structured+unstructured fusion, prefer hybrid retrieval (BM25 + ANN + reranker) and store structured features next to vectors to support filters and re-ranking features at query time.
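
To make the policy-aware pgvector pattern above concrete, here is a small sketch using psycopg; the table and column names (chunks, tenant_id, acl_tags, embedding) are illustrative assumptions.

import psycopg

query_embedding = [0.01] * 1536  # produced by your embedding model

with psycopg.connect("dbname=rag") as conn:
    rows = conn.execute(
        """
        SELECT chunk_id, content
        FROM chunks
        WHERE tenant_id = %(tenant)s
          AND acl_tags @> %(required_tags)s      -- enforce permissions server-side
        ORDER BY embedding <-> %(q)s::vector     -- pgvector ANN distance
        LIMIT %(k)s
        """,
        {
            "tenant": "acme",
            "required_tags": ["finance"],
            "q": str(query_embedding),  # pgvector accepts '[x, y, ...]' text input
            "k": 5,
        },
    ).fetchall()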

How do you monitor beyond logs?

You need traces, metrics, and evaluations stitched together:

Distributed tracing: Emit OpenTelemetry spans across ingestion, retrieval, model calls, and tools; LangSmith natively ingests OTEL traces and interoperates with external APMs (Jaeger, Datadog, Elastic). This gives end-to-end timing, prompts, contexts, and costs per request.

LLM observability platforms: Compare options (LangSmith, Arize Phoenix, LangFuse, Datadog) by tracing, evals, cost tracking, and enterprise readiness. Independent roundups and matrixes are available.

Continuous evaluation: Schedule RAG evals (Ragas/DeepEval/MLflow) on canary sets and live traffic replays; track faithfulness and grounding drift over time.

Add schema profiling/mapping on ingestion to keep observability attached to data shape changes (e.g., new templates, table evolution) and to explain retrieval regressions when upstream sources shift.
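
A minimal sketch of the tracing layer, assuming an OpenTelemetry exporter (OTLP to LangSmith, Jaeger, Datadog, etc.) is configured elsewhere; retrieve and generate are stand-ins for your own retrieval and model-call functions.

from opentelemetry import trace

tracer = trace.get_tracer("doc_to_chat")

def retrieve(question):              # placeholder for your retrieval step
    return ["context-1", "context-2"]

def generate(question, contexts):    # placeholder for your model call
    return "answer"

def answer(question: str) -> str:
    with tracer.start_as_current_span("request") as span:
        span.set_attribute("question.length", len(question))
        with tracer.start_as_current_span("retrieval"):
            contexts = retrieve(question)
        with tracer.start_as_current_span("llm_call") as llm_span:
            llm_span.set_attribute("contexts.count", len(contexts))
            response = generate(question, contexts)
        return response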

Example: doc-to-chat reference flow (signals and gates)

Ingest: connectors → text extraction → normalization → Iceberg write (ACID, snapshots).

Govern: PII scan (Presidio) → redact/mask → catalog registration with ACL policies.

Index: embedding jobs → pgvector (policy-aware joins) and Milvus (high-QPS ANN).

Serve: REST/gRPC → hybrid retrieval → guardrails → LLM → tool use.

HITL: low-confidence paths route to A2I/LangGraph approval steps.

Observe: OTEL traces to LangSmith/APM + scheduled RAG evaluations.

Why is “5% AI, 100% software engineering” accurate in practice?

Most outages and trust failures in agent systems are not model regressions; they’re data quality, permissioning, retrieval decay, or missing telemetry. The controls above—ACID tables, ACL catalogs, PII guardrails, hybrid retrieval, OTEL traces, and human gates—determine whether the same base model is safe, fast, and credibly correct for your users. Invest in these first; swap models later if needed.

References:

https://iceberg.apache.org/docs/1.9.0/evolution/

https://iceberg.apache.org/docs/1.5.2/

https://docs.snowflake.com/en/user-guide/tables-iceberg

https://docs.dremio.com/current/developer/data-formats/apache-iceberg/

https://github.com/pgvector/pgvector

https://www.postgresql.org/about/news/pgvector-070-released-2852/

https://github.com/pgvector/pgvector-go

https://github.com/pgvector/pgvector-rust

https://github.com/pgvector/pgvector-java

https://milvus.io/docs/four_layers.md

https://milvus.io/docs/v2.3.x/architecture_overview.md

https://milvus.io/docs/v2.2.x/architecture.md

https://www.linkedin.com/posts/armand-ruiz_

https://docs.vespa.ai/en/tutorials/hybrid-search.html

https://www.elastic.co/what-is/hybrid-search

https://www.elastic.co/search-labs/blog/hybrid-search-elasticsearch

https://docs.cohere.com/reference/rerank

https://docs.cohere.com/docs/rerank

https://cohere.com/rerank

https://opentelemetry.io/docs/concepts/signals/traces/

https://opentelemetry.io/docs/specs/otel/logs/

https://docs.smith.langchain.com/evaluation

https://docs.smith.langchain.com/evaluation/concepts

https://docs.smith.langchain.com/reference/python/evaluation

https://docs.smith.langchain.com/observability

https://www.langchain.com/langsmith

https://arize.com/docs/phoenix

https://github.com/Arize-ai/phoenix

https://langfuse.com/docs/observability/get-started

https://langfuse.com/docs/observability/overview

https://docs.datadoghq.com/opentelemetry/

https://langchain-ai.github.io/langgraph/concepts/human_in_the_loop/

https://langchain-ai.github.io/langgraph/tutorials/get-started/4-human-in-the-loop/

https://docs.langchain.com/oss/python/langgraph/add-human-in-the-loop

https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-use-augmented-ai-a2i-human-review-loops.html

https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-start-human-loop.html

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-a2i-runtime.html

https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-monitor-humanloop-results.html

https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html

https://aws.amazon.com/bedrock/guardrails/

https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateGuardrail.html

https://docs.aws.amazon.com/bedrock/latest/userguide/agents-guardrail.html

https://docs.nvidia.com/nemo-guardrails/index.html

https://developer.nvidia.com/nemo-guardrails

https://github.com/NVIDIA/NeMo-Guardrails

https://docs.nvidia.com/nemo/guardrails/latest/user-guides/guardrails-library.html

https://guardrailsai.com/docs/

https://github.com/guardrails-ai/guardrails

https://guardrailsai.com/docs/getting_started/quickstart

https://guardrailsai.com/docs/getting_started/guardrails_server

https://pypi.org/project/guardrails-ai/

https://github.com/guardrails-ai/guardrails_pii

https://huggingface.co/meta-llama/Llama-Guard-4-12B

https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/

https://microsoft.github.io/presidio/

https://github.com/microsoft/presidio

https://github.com/microsoft/presidio-research

https://docs.databricks.com/aws/en/data-governance/unity-catalog/access-control

https://docs.databricks.com/aws/en/data-governance/unity-catalog/manage-privileges/

https://docs.databricks.com/aws/en/data-governance/unity-catalog/abac/

https://docs.ragas.io/

https://docs.ragas.io/en/stable/references/evaluate/

https://docs.ragas.io/en/latest/tutorials/rag/

https://python.langchain.com/docs/concepts/text_splitters/

https://python.langchain.com/api_reference/text_splitters/index.html

https://pypi.org/project/langchain-text-splitters/

https://milvus.io/docs/evaluation_with_deepeval.md

https://mlflow.org/docs/latest/genai/eval-monitor/

https://mlflow.org/docs/2.10.1/llms/rag/notebooks/mlflow-e2e-evaluation.html

The post Building AI agents is 5% AI and 100% software engineering appeared first on MarkTechPost.

MIT’s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators


MIT researchers (Han Lab) introduced LEGO, a compiler-like framework that takes tensor workloads (e.g., GEMM, Conv2D, attention, MTTKRP) and automatically generates synthesizable RTL for spatial accelerators—no handwritten templates. LEGO’s front end expresses workloads and dataflows in a relation-centric affine representation, builds FU (functional unit) interconnects and on-chip memory layouts for reuse, and supports fusing multiple spatial dataflows in a single design. The back end lowers to a primitive-level graph and uses linear programming and graph transforms to insert pipeline registers, rewire broadcasts, extract reduction trees, and shrink area and power. Evaluated across foundation models and classic CNNs/Transformers, LEGO’s generated hardware shows 3.2× speedup and 2.4× energy efficiency over Gemmini under matched resources.

https://hanlab.mit.edu/projects/lego

Hardware Generation without Templates

Existing flows either: (1) analyze dataflows without generating hardware, or (2) generate RTL from hand-tuned templates with fixed topologies. These approaches restrict the architecture space and struggle with modern workloads that need to switch dataflows dynamically across layers/ops (e.g., conv vs. depthwise vs. attention). LEGO directly targets any dataflow and combinations, generating both architecture and RTL from a high-level description rather than configuring a few numeric parameters in a template.

https://hanlab.mit.edu/projects/lego

Input IR: Affine, Relation-Centric Semantics (Deconstruct)

LEGO models tensor programs as loop nests with three index classes: temporal (for-loops), spatial (par-for FUs), and computation (pre-tiling iteration domain). Two affine relations drive the compiler:

Data mapping f_{I→D}: maps computation indices to tensor indices.

Dataflow mapping f_{TS→I}: maps temporal/spatial indices to computation indices.

This affine-only representation eliminates modulo/division in the core analysis, making reuse detection and address generation a linear-algebra problem. LEGO also decouples control flow from dataflow (a vector c encodes control signal propagation/delay), enabling shared control across FUs and substantially reducing control logic overhead.

Front End: FU Graph + Memory Co-Design (Architect)

The main objective is to maximize reuse and on-chip bandwidth while minimizing interconnect/mux overhead.

Interconnection synthesis. LEGO formulates reuse as solving linear systems over the affine relations to discover direct and delay (FIFO) connections between FUs. It then computes minimum-spanning arborescences (Chu-Liu/Edmonds) to keep only necessary edges (cost = FIFO depth). A BFS-based heuristic rewrites direct interconnects when multiple dataflows must co-exist, prioritizing chain reuse and nodes already fed by delay connections to cut muxes and data nodes.

Banked memory synthesis. Given the set of FUs that must read/write a tensor in the same cycle, LEGO computes bank counts per tensor dimension from the maximum index deltas (optionally dividing by GCD to reduce banks). It then instantiates data-distribution switches to route between banks and FUs, leaving FU-to-FU reuse to the interconnect.

Dataflow fusion. Interconnects for different spatial dataflows are combined into a single FU-level Architecture Description Graph (ADG); careful planning avoids naïve mux-heavy merges and yields up to ~20% energy gains compared to naïve fusion.

Back End: Compile & Optimize to RTL (Compile & Optimize)

The ADG is lowered to a Detailed Architecture Graph (DAG) of primitives (FIFOs, muxes, adders, address generators). LEGO applies several LP/graph passes:

Delay matching via LP. A linear program chooses output delays D_v to minimize the total inserted pipeline registers, Σ over edges (u, v) of (D_v − D_u − L_v) · bitwidth, meeting timing alignment with minimal storage (a toy sketch of this formulation follows after this list).

Broadcast pin rewiring. A two-stage optimization (virtual cost shaping + MST-based rewiring among destinations) converts expensive broadcasts into forward chains, enabling register sharing and lower latency; a final LP re-balances delays.

Reduction tree extraction + pin reuse. Sequential adder chains become balanced trees; a 0-1 ILP remaps reducer inputs across dataflows so fewer physical pins are required (mux instead of add). This reduces both logic depth and register count.

These passes focus on the datapath, which dominates resources (e.g., FU-array registers ≈ 40% area, 60% power), and produce ~35% area savings versus naïve generation.
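
To illustrate the delay-matching idea (not LEGO's actual implementation, which is written in C++ against HiGHS), the following toy sketch formulates the same LP for a hypothetical four-node datapath using scipy.optimize.linprog, whose "highs" method also calls HiGHS.

from scipy.optimize import linprog

nodes = ["in", "mul", "add", "out"]
latency = {"in": 0, "mul": 2, "add": 1, "out": 0}   # L_v: internal latency of node v
edges = [("in", "mul", 16), ("in", "add", 16), ("mul", "add", 16), ("add", "out", 16)]
idx = {n: i for i, n in enumerate(nodes)}

# Objective: minimize sum over edges of bitwidth * (D_v - D_u - L_v).
# The constant bitwidth * L_v terms do not change the argmin, so we only
# encode the bitwidth * (D_v - D_u) part in the cost vector.
c = [0.0] * len(nodes)
for u, v, bw in edges:
    c[idx[v]] += bw
    c[idx[u]] -= bw

# Constraint per edge: D_u + L_v <= D_v (inserted delay must be non-negative)
A_ub, b_ub = [], []
for u, v, _bw in edges:
    row = [0.0] * len(nodes)
    row[idx[u]] = 1.0
    row[idx[v]] = -1.0
    A_ub.append(row)
    b_ub.append(-float(latency[v]))

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(nodes), method="highs")
delays = {n: res.x[idx[n]] for n in nodes}
registers = {(u, v): delays[v] - delays[u] - latency[v] for u, v, _ in edges}
print(delays)     # chosen output delay D_v per node
print(registers)  # pipeline registers inserted on each edge (in cycles)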

Outcome

Setup. LEGO is implemented in C++ with HiGHS as the LP solver and emits SpinalHDL→Verilog. Evaluation covers tensor kernels and end-to-end models (AlexNet, MobileNetV2, ResNet-50, EfficientNetV2, BERT, GPT-2, CoAtNet, DDPM, Stable Diffusion, LLaMA-7B). A single LEGO-MNICOC accelerator instance is used across models; a mapper picks per-layer tiling/dataflow. Gemmini is the main baseline under matched resources (256 MACs, 256 KB on-chip buffer, 128-bit bus @ 16 GB/s).

End-to-end speed/efficiency. LEGO achieves 3.2× speedup and 2.4× energy efficiency on average vs. Gemmini. Gains stem from: (i) a fast, accurate performance model guiding mapping; (ii) dynamic spatial dataflow switching enabled by generated interconnects (e.g., depthwise conv layers choose OH–OW–IC–OC). Both designs are bandwidth-bound on GPT-2.

Resource breakdown. Example SoC-style configuration shows FU array and NoC dominate area/power, with PPUs contributing ~2–5%. This supports the decision to aggressively optimize datapaths and control reuse.

Generative models. On a larger 1024-FU configuration, LEGO sustains >80% utilization for DDPM/Stable Diffusion; LLaMA-7B remains bandwidth-limited (expected for low operational intensity).

https://hanlab.mit.edu/projects/lego

Importance for each segment

For researchers: LEGO provides a mathematically grounded path from loop-nest specifications to spatial hardware with provable LP-based optimizations. It abstracts away low-level RTL and exposes meaningful levers (tiling, spatialization, reuse patterns) for systematic exploration.

For practitioners: It is effectively hardware-as-code. You can target arbitrary dataflows and fuse them in one accelerator, letting a compiler derive interconnects, buffers, and controllers while shrinking mux/FIFO overheads. This improves energy and supports multi-op pipelines without manual template redesign.

For product leaders: By lowering the barrier to custom silicon, LEGO enables task-tuned, power-efficient edge accelerators (wearables, IoT) that keep pace with fast-moving AI stacks—the silicon adapts to the model, not the other way around. End-to-end results against a state-of-the-art generator (Gemmini) quantify the upside.

How the “Compiler for AI Chips” Works—Step-by-Step?

Deconstruct (Affine IR). Write the tensor op as loop nests; supply affine f_{I→D} (data mapping), f_{TS→I} (dataflow), and control flow vector c. This specifies what to compute and how it is spatialized, without templates.

Architect (Graph Synthesis). Solve reuse equations → FU interconnects (direct/delay) → MST/heuristics for minimal edges and fused dataflows; compute banked memory and distribution switches to satisfy concurrent accesses without conflicts.

Compile & Optimize (LP + Graph Transforms). Lower to a primitive DAG; run delay-matching LP, broadcast rewiring (MST), reduction-tree extraction, and pin-reuse ILP; perform bit-width inference and optional power gating. These passes jointly deliver ~35% area and ~28% energy savings vs. naïve codegen.

Where It Lands in the Ecosystem?

Compared with analysis tools (Timeloop/MAESTRO) and template-bound generators (Gemmini, DNA, MAGNET), LEGO is template-free, supports any dataflow and their combinations, and emits synthesizable RTL. Results show comparable or better area/power versus expert handwritten accelerators under similar dataflows and technologies, while offering one-architecture-for-many-models deployment.

Summary

LEGO operationalizes hardware generation as compilation for tensor programs: an affine front end for reuse-aware interconnect/memory synthesis and an LP-powered back end for datapath minimization. The framework’s measured 3.2× performance and 2.4× energy gains over a leading open generator, plus ~35% area reductions from back-end optimizations, position it as a practical path to application-specific AI accelerators at the edge and beyond.

Check out the Paper and Project Page.

The post MIT’s LEGO: A Compiler for AI Chips that Auto-Generates Fast, Efficient Spatial Accelerators appeared first on MarkTechPost.

Bringing AI Agents Into Any UI: The AG-UI Protocol for Real-Time, Structured Agent–Frontend Streams

AI agents are no longer just chatbots that spit out answers. They’re evolving into complex systems that can reason step by step, call APIs, update dashboards, and collaborate with humans in real time. But this raises a key question: how should agents talk to user interfaces?

Ad-hoc sockets and custom APIs can work for prototypes, but they don’t scale. Each project reinvents how to stream outputs, manage tool calls, or handle user corrections. That’s exactly the gap the AG-UI (Agent–User Interaction) Protocol aims to fill.

What AG-UI Brings to the Table

AG-UI is a streaming event protocol designed for agent-to-UI communication. Instead of returning a single blob of text, agents emit a continuous sequence of JSON events:

TEXT_MESSAGE_CONTENT for streaming responses token by token.

TOOL_CALL_START / ARGS / END for external function calls.

STATE_SNAPSHOT and STATE_DELTA for keeping UI state in sync with the backend.

Lifecycle events (RUN_STARTED, RUN_FINISHED) to frame each interaction.

All of this flows over standard transports like HTTP SSE or WebSockets, so developers don’t have to build custom protocols. The frontend subscribes once and can render partial results, update charts, and even send user corrections mid-run.
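
As an illustration, a frontend or test harness could consume such a stream with a few lines of Python; the endpoint URL, request body, and payload field names below are assumptions, while the event type names come from the protocol description above.

import json
import requests

AGENT_URL = "http://localhost:8000/awp"  # hypothetical AG-UI endpoint

with requests.post(AGENT_URL,
                   json={"messages": [{"role": "user", "content": "hi"}]},
                   stream=True) as resp:
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw or not raw.startswith("data:"):
            continue  # skip SSE keep-alives and comments
        event = json.loads(raw[len("data:"):].strip())
        etype = event.get("type")
        if etype == "TEXT_MESSAGE_CONTENT":
            print(event.get("delta", ""), end="", flush=True)   # token-by-token text
        elif etype in ("TOOL_CALL_START", "TOOL_CALL_ARGS", "TOOL_CALL_END"):
            print(f"\n[tool event] {etype}")
        elif etype in ("STATE_SNAPSHOT", "STATE_DELTA"):
            pass  # apply the snapshot/patch to local UI state here
        elif etype == "RUN_FINISHED":
            break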

This design makes AG-UI more than a messaging layer—it’s a contract between agents and UIs. Backend frameworks can evolve, UIs can change, but as long as they speak AG-UI, everything stays interoperable.

First-Party and Partner Integrations

One reason AG-UI is gaining traction is its breadth of supported integrations. Instead of leaving developers to wire everything manually, many agent frameworks already ship with AG-UI support.

Mastra (TypeScript): Native AG-UI support with strong typing, ideal for finance and data-driven copilots.

LangGraph: AG-UI integrated into orchestration workflows so every node emits structured events.

CrewAI: Multi-agent coordination exposed to UIs via AG-UI, letting users follow and guide “agent crews.”

Agno: Full-stack multi-agent systems with AG-UI-ready backends for dashboards and ops tools.

LlamaIndex: Adds interactive data retrieval workflows with live evidence streaming to UIs.

Pydantic AI: Python SDK with AG-UI baked in, plus example apps like the AG-UI Dojo.

CopilotKit: Frontend toolkit offering React components that subscribe to AG-UI streams.

Other integrations are in progress—like AWS Bedrock Agents, Google ADK, and Cloudflare Agents—which will make AG-UI accessible on major cloud platforms. Language SDKs are also expanding: Kotlin support is complete, while .NET, Go, Rust, Nim, and Java are in development.

Real-World Use Cases

Healthcare, finance, and analytics teams use AG-UI to turn critical data streams into live, context-rich interfaces: clinicians see patient vitals update without page reloads, stock traders trigger a stock-analysis agent and watch results stream inline, and analysts view a LangGraph-powered dashboard that visualizes charting plans token by token as the agent reasons.

Beyond data display, AG-UI simplifies workflow automation. Common patterns—data migration, research summarization, form-filling—are reduced to a single SSE event stream instead of custom sockets or polling loops. Because agents emit only STATE_DELTA patches, the UI refreshes just the pieces that changed, cutting bandwidth and eliminating jarring reloads. The same mechanism powers 24/7 customer-support bots that show typing indicators, tool-call progress, and final answers within one chat window, keeping users engaged throughout the interaction.

For developers, the protocol enables code-assistants and multi-agent applications with minimal glue code. Experiences that mirror GitHub Copilot—real-time suggestions streaming into editors—are built by simply listening to AG-UI events. Frameworks such as LangGraph, CrewAI, and Mastra already emit the spec’s 16 event types, so teams can swap back-end agents while the front-end remains unchanged. This decoupling speeds prototyping across domains: tax software can show optimistic deduction estimates while validation runs in the background, and a CRM page can autofill client details as an agent returns structured data to a Svelte + Tailwind UI.

AG-UI Dojo

CopilotKit has also recently introduced AG-UI Dojo, a “learning-first” suite of minimal, runnable demos that teach and validate AG-UI integrations end-to-end. Each demo includes a live preview, code, and linked docs, covering six primitives needed for production agent UIs: agentic chat (streaming + tool hooks), human-in-the-loop planning, agentic and tool-based generative UI, shared state, and predictive state updates for real-time collaboration. Teams can use the Dojo as a checklist to troubleshoot event ordering, payload shape, and UI–agent state sync before shipping, reducing integration ambiguity and debugging time.

You can play around with the Dojo online; the Dojo source code and more technical details are available in the CopilotKit blog.

Roadmap and Community Contributions

The public roadmap shows where AG-UI is heading and where developers can plug in:

SDK maturity: Ongoing investment in TypeScript and Python SDKs, with expansion into more languages.

Debugging and developer tools: Better error handling, observability, and lifecycle event clarity.

Performance and transports: Work on large payload handling and alternative streaming transports beyond SSE/WS.

Sample apps and playgrounds: The AG-UI Dojo demonstrates building blocks for UIs and is expanding with more patterns.

On the contribution side, the community has added integrations, improved SDKs, expanded documentation, and built demos. Pull requests across frameworks like Mastra, LangGraph, and Pydantic AI have come from both maintainers and external contributors. This collaborative model ensures AG-UI is shaped by real developer needs, not just spec writers.

Summary

AG-UI is emerging as the default interaction protocol for agent UIs. It standardizes the messy middle ground between agents and frontends, making applications more responsive, transparent, and maintainable.

With first-party integrations across popular frameworks, community contributions shaping the roadmap, and tooling like the AG-UI Dojo lowering the barrier to entry, the ecosystem is maturing fast.

Launch AG-UI with a single command, choose your agent framework, and be prototyping in under five minutes.

npx create-ag-ui-app@latest
# then
<pick your agent framework>

For details and patterns, see the quickstart blog: go.copilotkit.ai/ag-ui-cli-blog.

FAQs

FAQ 1: What problem does AG-UI solve?

AG-UI standardizes how agents communicate with user interfaces. Instead of ad-hoc APIs, it defines a clear event protocol for streaming text, tool calls, state updates, and lifecycle signals—making interactive UIs easier to build and maintain.

FAQ 2: Which frameworks already support AG-UI?

AG-UI has first-party integrations with Mastra, LangGraph, CrewAI, Agno, LlamaIndex, and Pydantic AI. Partner integrations include CopilotKit on the frontend. Support for AWS Bedrock Agents, Google ADK, and additional languages like .NET, Go, and Rust is in progress.

FAQ 3: How does AG-UI differ from REST APIs?

REST works for single request–response tasks. AG-UI is designed for interactive agents—it supports streaming output, incremental updates, tool usage, and user input during a run, which REST cannot handle natively.

FAQ 4: What transports does AG-UI use?

By default, AG-UI runs over HTTP Server-Sent Events (SSE). It also supports WebSockets, and the roadmap includes exploration of alternative transports for high-performance or binary data use cases.
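As a rough illustration of the SSE transport, the following Python sketch consumes a stream of AG-UI events over HTTP. The endpoint URL and request payload are hypothetical placeholders, and a production integration would normally use an official AG-UI SDK rather than hand-parsing the stream.

import json
import requests

AGENT_URL = "http://localhost:8000/awp"  # hypothetical agent endpoint that streams AG-UI events via SSE

with requests.post(
    AGENT_URL,
    json={"messages": [{"role": "user", "content": "Summarize today's pipeline"}]},
    stream=True,
) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        # SSE data lines are prefixed with "data:"; blank lines separate events
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "TEXT_MESSAGE_CONTENT":
            print(event.get("delta", ""), end="", flush=True)  # stream tokens as they arrive
        elif event.get("type") == "RUN_FINISHED":
            break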

FAQ 5: How can developers get started with AG-UI?

You can install official SDKs (TypeScript, Python) or use supported frameworks like Mastra or Pydantic AI. The AG-UI Dojo provides working examples and UI building blocks to experiment with event streams.

Thanks to the CopilotKit team for the thought leadership and resources for this article. The CopilotKit team supported us in creating this content.


Scale visual production using Stability AI Image Services in Amazon Bedrock

This post was written with Alex Gnibus of Stability AI.
Stability AI Image Services are now available in Amazon Bedrock, offering ready-to-use media editing capabilities delivered through the Amazon Bedrock API. These image editing tools expand on the capabilities of Stability AI’s Stable Diffusion 3.5 models (SD3.5) and Stable Image Core and Ultra models, which are already available in Amazon Bedrock and have set new standards in image generation.
The professional creative production process consists of multiple editing steps to get the exact output needed. With Stability AI Image Services in Amazon Bedrock, you can modify, enhance, and transform existing images without jumping between multiple systems or sending files to external services. Everything runs through the same Amazon Bedrock experience you’re already using. The business impact can be immediate for teams that produce visual content at scale.
In this post, we explore examples of how these tools enable precise creative control to accelerate professional-grade visual content.
Editing tools now available in Amazon Bedrock
Stability AI Image Services span 9 tools across two categories: Edit and Control. Each tool handles specific editing tasks that typically require specialized software or manual intervention.
Edit: Advanced capabilities for granular editing steps
The tools in the Edit category make complex editing tasks more accessible and efficient.
The suite begins with fundamental yet powerful retouching tools. The Erase Object tool, for example, removes unwanted elements from images while intelligently maintaining background consistency. The following animation showcases the Erase Object tool removing a mannequin from a product shot while preserving the background. The tool can transform a source image based on a mask image or derive the mask from the source image’s alpha channel.

The Remove Background tool automatically isolates subjects with precision. This enables the creation of clean, professional product listings with consistent backgrounds or a variety of lifestyle settings, which is a game changer for ecommerce.
The following example illustrates the removal of an image background, while preserving details of a furniture product in the foreground.

The Search and Recolor and Search and Replace tools target specific elements within images for modification. Search and Recolor changes object colors; for example, showing different colorways of a dress without new photoshoots. In the following illustration, Search and Recolor changes the color swatch on furniture.

Search and Replace can swap objects entirely, which is useful for updating seasonal elements in marketing materials or replacing products. The following is an application of Search and Replace for virtual try-on experiences.

The Inpaint tool intelligently modifies images by filling in or replacing specified areas with new content based on the content of a mask image.
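To give a sense of how these Edit tools are invoked programmatically, the following boto3 sketch sends an image to a background-removal tool through the Amazon Bedrock runtime. The model ID, request fields, and response shape shown here are assumptions for illustration only; confirm the exact identifiers and schemas in the Amazon Bedrock documentation and the sample notebook discussed later in this post.

import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

# Read and base64-encode the source image (file path is illustrative)
with open("product-shot.png", "rb") as f:
    source_image = base64.b64encode(f.read()).decode("utf-8")

request_body = {
    "image": source_image,    # assumed field name for the input image
    "output_format": "png",   # assumed output format field
}

response = bedrock.invoke_model(
    modelId="stability.remove-background-v1",  # placeholder model ID; check the Bedrock console
    body=json.dumps(request_body),
)

result = json.loads(response["body"].read())
# Assumed response shape: a list of base64-encoded result images
with open("product-shot-no-background.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))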
Control: Structural and stylistic precision
This category of tools provides precise manipulation of image structure and style through three specialized tools.
The Sketch tool transforms sketch-style renderings into photorealistic concepts. Architecture firms might use this to convert conceptual drawings into realistic visualizations, and apparel brands to turn design sketches into product mockups. The tool helps accelerate the creative production process from initial concepts to final visual execution.
In this example, the Sketch tool transforms a building architecture drawing to help real estate developers visualize the concept against a cityscape.

In another example, the Sketch tool transforms a mannequin drawing into a photorealistic model shot.

The Structure tool maintains the structural elements of input images while allowing content modification. This tool helps preserve layouts, compositions, and spatial relationships while changing subjects or styles. Creative teams can use the Structure tool to recreate scenes with different subjects or render new characters while maintaining consistent framing.
The following example demonstrates the Structure tool transforming a workshop scene into a new scene while preserving the composition and spatial relationships.

The Style Guide and Style Transfer tools help marketing teams produce new images that align with brand style and guidelines. The Style Guide tool takes artistic styles and colors from a reference style image and generates new images based on text prompts.
In the following example, the Style Guide tool takes clues from a brand’s color palette and textures and generates new images matching brand identity.

The Style Transfer tool uses visual characteristics from reference images to transform existing images, while preserving the original composition. For example, a home decor retailer can transform product imagery from modern minimalist to traditional styles without new photography. Marketing teams could create seasonal variations by applying different visual styles to existing product catalogs.
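The Control tools follow a similar request pattern, typically pairing a text prompt with one or more reference images. The sketch below outlines a hypothetical Style Transfer call; the model ID and field names are again assumptions, so verify them against the Bedrock documentation before use.

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

# Assumed request shape for a Style Transfer call (field names illustrative):
# a content image whose composition is preserved, a style reference image, and a prompt.
request_body = {
    "init_image": "<base64-encoded content image>",
    "style_image": "<base64-encoded style reference image>",
    "prompt": "traditional living room styling, warm wood tones, soft afternoon light",
}

response = bedrock.invoke_model(
    modelId="stability.style-transfer-v1",  # placeholder model ID; check the Bedrock console
    body=json.dumps(request_body),
)
result = json.loads(response["body"].read())  # assumed to contain base64-encoded image data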
Solution overview
To demonstrate Stability AI Image Services in Amazon Bedrock, let’s walk through an example using a Jupyter notebook found in the GitHub repo.
Prerequisites
To follow along, you must have the following prerequisites:

An AWS account.
AWS credentials configured for creating and accessing Amazon Bedrock and Amazon SageMaker AI resources.
An AWS Identity and Access Management (IAM) execution role for SageMaker AI, which has the AmazonSageMakerFullAccess and AmazonBedrockLimitedAccess AWS managed policies attached. For more details, see How to use SageMaker AI execution roles.
A SageMaker notebook instance.
Stability AI Image Services model access, which you can request through the Amazon Bedrock console. Refer to Access Amazon Bedrock foundation models for more details.

Create a SageMaker AI notebook instance
Complete the following steps to create a SageMaker AI notebook instance, which can be used to run the sample notebook:

On the SageMaker AI console, in the navigation pane, under Applications and IDEs, choose Notebooks.
Choose Create notebook instance.
For Notebook instance name, enter a name for your notebook instance (for example, ai-images-notebook-instance).
For Notebook Instance type, choose ml.t2.medium.
For Platform identifier, choose Amazon Linux 2.
For IAM role, choose either an existing IAM role, which has the AmazonSageMakerFullAccess and AmazonBedrockLimitedAccess policies attached, or choose Create a new role.
Note the name of the IAM role that you chose.
Leave other settings as default and choose Create notebook instance.

After a few minutes, SageMaker AI creates a notebook instance, and its status changes from Pending to InService.
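If you prefer to script this step rather than use the console, the equivalent call can be made with boto3, as in the following sketch. The role ARN and Region are placeholders to replace with your own values, and the platform identifier should be checked against the valid values in the SageMaker documentation.

import boto3

sagemaker = boto3.client("sagemaker", region_name="us-west-2")

sagemaker.create_notebook_instance(
    NotebookInstanceName="ai-images-notebook-instance",
    InstanceType="ml.t2.medium",
    PlatformIdentifier="notebook-al2-v2",  # Amazon Linux 2 platform (verify the identifier)
    RoleArn="arn:aws:iam::123456789012:role/YourSageMakerExecutionRole",  # placeholder role ARN
)

# Block until the instance reaches InService before opening JupyterLab
waiter = sagemaker.get_waiter("notebook_instance_in_service")
waiter.wait(NotebookInstanceName="ai-images-notebook-instance")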
Confirm the IAM role for the notebook instance has the necessary permissions
Complete the following steps to verify that the SageMaker AI execution role that you assigned to the notebook instance has the correct permissions:

On the IAM console, in the navigation pane, under Access management, choose Roles.
In the Roles search bar, enter the name of the SageMaker AI execution role that you used when creating the notebook instance.
Choose the IAM role.
Under Permissions policies, verify that the AWS managed policies AmazonSageMakerFullAccess and AmazonBedrockLimitedAccess are present.
(Optional) If either policy is missing, choose Add permissions, then choose Attach policies to attach the missing policy.

In the Other permissions policies search bar, enter the policy name.
Select the policy, then choose Add permissions.

Run the notebook
Complete the following steps to run the notebook:

On the SageMaker AI console, in the navigation pane, under Applications and IDEs, choose Notebooks.
Choose the newly created ai-images-notebook-instance notebook instance.
Wait for the notebook to be in InService status.
Choose the Open JupyterLab link to launch JupyterLab in a new browser tab.
On the Git menu, choose Clone a Repository.
Enter the URI https://github.com/aws-samples/stabilityai-sample-notebooks.git and select Include submodules and Download the repository.
Choose Clone.
On the File menu, choose Open from path.
Enter the following: stabilityai-sample-notebooks/stability-ai-image-services/stability-ai-image-services-sample-notebook.ipynb
Choose Open.
When prompted, choose the kernel conda_python3, then choose Select.
Run through each notebook cell to experience Stability AI Image Services in Amazon Bedrock.

Clean up
To avoid ongoing charges, stop the ai-images-notebook-instance SageMaker AI notebook instance that you created in this walkthrough:

On the SageMaker AI console, in the navigation pane, under Applications and IDEs, choose Notebooks.
Choose the ai-images-notebook-instance SageMaker AI notebook instance that you created.
Choose Actions, then choose Stop.

After a few minutes, the notebook instance transitions from Stopping to Stopped status.

Choose Actions, then Delete.

After a few seconds, SageMaker AI deletes the notebook instance.
For more details, refer to Clean up Amazon SageMaker notebook instance resources.
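If you scripted the setup, the same cleanup can be done with boto3; the following is a minimal sketch using the same instance name and Region as before.

import boto3

sagemaker = boto3.client("sagemaker", region_name="us-west-2")
name = "ai-images-notebook-instance"

sagemaker.stop_notebook_instance(NotebookInstanceName=name)
sagemaker.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

sagemaker.delete_notebook_instance(NotebookInstanceName=name)
sagemaker.get_waiter("notebook_instance_deleted").wait(NotebookInstanceName=name)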
Conclusion
The availability of Stability AI Image Services in Amazon Bedrock is an exciting step forward for visual content creation and manipulation, with particularly time-saving implications for professional creative teams at enterprises.
For example, in media and entertainment, creators can rapidly enhance scenes and create special effects, and marketing teams can generate multiple campaign variations effortlessly. Retail and ecommerce businesses can streamline product photography and digital catalog creation, and gaming developers can prototype environments more efficiently. Architecture firms can visualize design concepts instantly, and educational institutions can create more engaging visual content.
With these tools, businesses of different sizes can produce professional-grade, highly engaging visual content with efficiency and creativity. These tools can streamline operations, reduce costs, and open new creative possibilities, helping brands tell their stories more effectively and engage customers in more compelling ways.
To get started, check out Stability AI models in Amazon Bedrock and the AWS Samples GitHub repo.

About the authors
Alex Gnibus is a Product Marketing Manager at Stability AI, connecting the dots between cutting-edge research breakthroughs and practical use cases. With experience spanning from creative agencies to deep enterprise tech, Alex brings both technical expertise and an understanding of the challenges that professional creative teams can solve with generative AI.
Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area. She helps AWS Enterprise customers grow by understanding their goals and challenges and guiding them on how they can architect their applications in a cloud-based manner while making sure they are resilient and scalable. She’s passionate about machine learning technologies and environmental sustainability.
Fabio Branco is a Senior Customer Solutions Manager at Amazon Web Services (AWS) and strategic advisor helping customers achieve business transformation, drive innovation through generative AI and data solutions, and successfully navigate their cloud journeys. Prior to AWS, he held Product Management, Engineering, Consulting, and Technology Delivery roles across multiple Fortune 500 companies in industries, including retail and consumer goods, oil and gas, financial services, insurance, and aerospace and defense.
Suleman Patel is a Senior Solutions Architect at Amazon Web Services (AWS), with a special focus on machine learning and modernization. With expertise in both business and technology, Suleman helps customers design and build solutions that tackle real-world business problems. When he’s not immersed in his work, Suleman loves exploring the outdoors, taking road trips, and cooking up delicious dishes in the kitchen.

Prompting for precision with Stability AI Image Services in Amazon Bedrock

Amazon Bedrock now offers Stability AI Image Services: 9 tools that improve how businesses create and modify images. The technology extends Stable Diffusion and Stable Image models to give you precise control over image creation and editing. Clear prompts are critical—they provide art direction to the AI system. Strong prompts control specific elements like tone, texture, lighting, and composition to create the desired visual outcomes. This capability serves professional needs across product photography, concept development, and marketing campaigns.
In this post, we expand on the post Understanding prompt engineering: Unlock the creative potential of Stability AI models on AWS. We show how to effectively use advanced prompting techniques to maximize image generation quality and precision for enterprise applications using Stability AI Image Services in Amazon Bedrock.
Solution overview
Stability AI Image Services are available as APIs in Amazon Bedrock, featuring capabilities such as in-painting, style transfer, recoloring, background removal, object removal, style guide, and much more.
In the following sections, we first discuss prompt structure for maximum control of image generation, then we provide advanced techniques of prompting for stylistic guidance. Code samples can be found in the following GitHub repository.
Prerequisites
To get started with Stability AI Image Services in Amazon Bedrock, follow the instructions in Getting started with the API to complete the following prerequisites:

Set up your AWS account.
Acquire credentials to grant programmatic access.
Attach the Amazon Bedrock permission to an AWS Identity and Access Management (IAM) user or role.
Request access to the Amazon Bedrock models.

Structure prompts that maximize control
To maximize the granular capabilities of Stability AI Image Services in Amazon Bedrock, you must construct prompts that enable fine-grained control.
This section outlines best practices for building effective prompts that produce the desired output. We demonstrate how prompt structure affects results and why more structured prompts typically yield more consistent and controllable outcomes.
Choose the right prompt type for your use case
Selecting the right prompt format helps the model better understand your intent. Three primary prompt formats deliver different levels of control and readability:

Natural language maximizes readability and is best for general usage
Tag-based formats enable precise structural control and are ideal for technical applications
Hybrid formats combine natural language and the structural elements of tags to provide even more control

The following table provides examples of these three common ways to phrase your prompts. Each prompt format has its strengths depending on your goal or the interface you’re using.

Each prompt below was used to generate an image with Stable Image Ultra in Amazon Bedrock.

Basic Prompt (Natural Language): “A clean product photo of a perfume bottle on a marble countertop”. This is readable and intuitive. Great for exploration, conversational tools, and some model types. Stable Diffusion 3.5 responds best to this style.

Tag-Based Prompt: “perfume bottle, marble surface, soft light, high quality, product photo”. Used in many generation UIs or with models trained on datasets like LAION or Danbooru. Compact and good for stacking details.

Hybrid Prompt: “perfume bottle on marble counter, soft studio lighting, sharp focus, f/2.8 lens”. Best of both worlds. Add emphasis with weighting syntax to influence the model’s priorities.

Build modular prompts
Modular prompting enhances AI image generation effectiveness. This approach divides prompts into distinct components, each specifying what to draw and how it should appear. Modular structures provide several benefits: they help prevent conflicting or confusing instructions, allow for precise output control, and simplify prompt debugging. By isolating individual elements, you can quickly identify and adjust effective or ineffective parts of your prompts. This method ultimately leads to more refined and targeted AI-generated images.
The following table provides examples of modular prompt modules. Experiment with different prompt sequences for your desired outcome; for example, placing the style before the subject will give it more visual weight.

Prefix: “fashion editorial portrait of”. Sets the tone and intent for a high-fashion styled portrait.

Subject: “a woman with medium-brown skin and short coiled hair”. Describes the model’s look and surface detail to help guide facial features.

Modifiers: “wearing an asymmetrical black mesh top, metallic jewelry”. Adds stylized clothing and accessories for visual interest.

Action: “seated with her shoulders angled, eyes locked on camera, one arm lifted”. Describes body language and pose to give dynamic composition.

Environment: “bathed in intersecting beams of hard directional light through window slats”. Adds context for dramatic light play and atmosphere.

Style: “high-contrast chiaroscuro lighting, sculptural and abstract”. Informs the aesthetic and mood (shadow-driven, moody, architectural).

Camera/Lighting: “shot on 85mm, studio setup, layered shadows and light falling across face and body”. Adds technical precision and helps control realism and fidelity.

The following example illustrates how to use a modular prompt to generate the desired output.

Modular prompt (image generated with Stable Image Ultra in Amazon Bedrock): “fashion editorial portrait of a woman with medium-brown skin and short coiled hair, wearing an asymmetrical black mesh top and metallic jewelry, seated with shoulders angled and one arm lifted, eyes locked on camera, bathed in intersecting beams of hard directional light through window slats, layered shadows and highlights sculpting her face and body, high-contrast chiaroscuro lighting, abstract and bold, shot on 85mm in studio”
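One simple way to keep prompts modular in code is to store each module separately and join them at generation time. The following sketch is illustrative only; the module names mirror the table above, and reordering the list shifts visual emphasis (for example, placing style before subject).

# Modules mirror the table above; reorder them to shift visual emphasis.
modules = {
    "prefix": "fashion editorial portrait of",
    "subject": "a woman with medium-brown skin and short coiled hair",
    "modifiers": "wearing an asymmetrical black mesh top, metallic jewelry",
    "action": "seated with her shoulders angled, eyes locked on camera, one arm lifted",
    "environment": "bathed in intersecting beams of hard directional light through window slats",
    "style": "high-contrast chiaroscuro lighting, sculptural and abstract",
    "camera_lighting": "shot on 85mm, studio setup, layered shadows and light falling across face and body",
}

order = ["prefix", "subject", "modifiers", "action", "environment", "style", "camera_lighting"]
prompt = ", ".join(modules[part] for part in order)
print(prompt)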

Use negative prompts for polished output
Negative prompts improve AI output quality by removing specific visual elements. Explicitly defining what not to include in the prompt guides the model’s output, typically leading to more professional results. Negative prompts act like a retoucher’s checklist used to address aspects of an image to enhance quality and appeal. For example, “No weird hands. No blurry corners. No cartoon filters. Definitely no watermarks.” Negative prompts result in clean, confident compositions, free of distracting elements and distortions.
The following table provides examples of additional tokens that can be used in negative prompts.

Low quality or noise: blurry, lowres, jpeg artifacts, noisy

Anatomy or model issues: deformed, extra limbs, bad hands, missing fingers

Style clashes: cartoon, illustration, anime, painting

Technical errors: watermark, text, signature, overexposed

General cleanup: ugly, poorly drawn, distortion, worst quality

The following example illustrates how a well-structured negative prompt can enhance photorealism.

Without negative prompt:
Prompt: “(medium full shot) of (charming office cubicle) made of glass material, multiple colors, modern style, space-saving, upholstered seat, patina, gold trim, located in a modern garden, with sleek furniture, stylish decor, bright lighting, comfortable seating, Masterpiece, best quality, raw photo, realistic, very aesthetic, dark”

With negative prompt:
Prompt: “(medium full shot) of (charming office cubicle) made of glass material, multiple colors, modern style, space-saving, upholstered seat, patina, gold trim, located in a modern garden, with sleek furniture, stylish decor, bright lighting, comfortable seating, Masterpiece, best quality, raw photo, realistic, very aesthetic, dark”
Negative prompt: “cartoon, 3d render, cgi, oversaturated, smooth plastic textures, unreal lighting, artificial, matte surface, painterly, dreamy, glossy finish, digital art, low detail background”

Emphasize or suppress elements with prompt weighting
Prompt weighting controls the influence of individual elements in AI image generation. These numerical weights prioritize specific prompt components over others. For example, to emphasize the character over the background, you can apply a 1.8 weight to “character” (character: 1.8) and 1.1 to “background” (background: 1.1), which makes sure the model prioritizes character detail while maintaining environmental context. This targeted emphasis produces more precise outputs by minimizing competition between prompt elements and clarifying the model’s priorities.
The syntax for prompt weights is (<term>:<weight>). You can also use a shorthand such as ((<term>)), where the number of parentheses represents the weight. Values between 0.0–1.0 deemphasize the term, and values between 1.1–2.0 emphasize the term. For example:

(term:1.2): Emphasize
(term:0.8): Deemphasize
((term)): Shorthand for (term:1.2)
((((((((term)))))))): Shorthand for (term:1.8)

The following example shows how prompt weights contribute to the generated output.

Prompt with weights “editorial product photo of (a translucent gel moisturizer jar:1.4) placed on a (frosted glass pedestal:1.2), surrounded by (dewy pink flower petals:1.1), with soft (diffused lighting:1.3), subtle water droplets, shallow depth of field”

Prompt without weights “editorial product photo of a translucent gel moisturizer jar placed on a frosted glass pedestal, surrounded by dewy pink flower petals, with soft, subtle water droplets, shallow depth of field”

You can also use weights in negative prompts to reduce how strongly the model avoids something. For example, “(text:0.5), (blurry:0.2), (lowres:0.1).” This softens the model’s avoidance of text, blur, and low-resolution artifacts instead of ruling them out at full strength.
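Putting weights and negative prompts together, the following boto3 sketch sends a weighted prompt to Stable Image Ultra through the Amazon Bedrock runtime. The model ID and request fields follow the general Stability text-to-image pattern but should be treated as assumptions; verify them against the Amazon Bedrock documentation and the GitHub code samples.

import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

body = {
    # Weighted terms emphasize the jar and the lighting over secondary elements
    "prompt": (
        "editorial product photo of (a translucent gel moisturizer jar:1.4) placed on a "
        "(frosted glass pedestal:1.2), surrounded by (dewy pink flower petals:1.1), "
        "with soft (diffused lighting:1.3), subtle water droplets, shallow depth of field"
    ),
    # Reduced weights soften, rather than remove, the avoidance of these artifacts
    "negative_prompt": "(text:0.5), (blurry:0.2), (lowres:0.1), watermark, cartoon",
    "aspect_ratio": "1:1",
    "output_format": "png",
}

response = bedrock.invoke_model(
    modelId="stability.stable-image-ultra-v1:1",  # confirm the exact model ID for your Region
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
with open("moisturizer-hero.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))  # assumed base64 image in the response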
Giving specific stylistic guidance
Effective prompt writing when using Stability AI Image Services such as Style Transfer and Style Guide requires a good understanding of style matching and reference-driven prompting. These techniques help provide clear stylistic direction for both text-to-image and image-to-image creation.
Image-to-image style transfer extracts stylistic elements from an input image (control image) and uses them to guide the creation of an output image based on the prompt. Approach writing the prompt as if you’re directing a professional photographer or stylist. Focus on materials, lighting quality, and artistic intention—not just objects. For example, a well-structured prompt might read: “Close-up editorial photo of a translucent green lip gloss tube on crushed iridescent plastic, diffused colored lighting, shallow DOF, high fashion product styling.”
Style tag layering: Known aesthetic labels that align with brand identity
The art of crafting effective prompts often relies on incorporating established style tags that resonate with familiar visual languages and datasets. By strategically blending terms from recognized aesthetic categories (ranging from editorial photography and analog film to anime, cyberpunk cityscapes, and brutalist structures), creators can guide the AI toward specific visual outcomes that align with their brand identity. These style descriptors serve as powerful anchors in the prompt engineering process.

The versatility of these tags extends further through their ability to be combined and weighted, allowing for nuanced control over the final aesthetic. For instance, a skincare brand might blend the clean lines of product photography with dreamy, surreal elements, whereas a tech company could merge brutalist structure with cyberpunk elements for a distinctive visual identity. This approach to style mixing helps creators improve their outputs while maintaining clear ties to recognizable visual genres that resonate with their target audience. The key is understanding how these style tags interact and using their combinations to create unique, yet culturally relevant, visual expressions that serve specific creative or commercial objectives. The following table provides examples of prompts for a desired aesthetic.

Retro / Y2K: 2000s nostalgia, flash photography, candy tones, harsh lighting. Example use: metallic textures, thin fonts, early digital feel.

Clean modern: neutral tones, soft gradients, minimalist styling, editorial layout. Example use: great for wellness or skincare products.

Bold streetwear: urban background, oversized fit, strong pose, midday shadow. Example use: fashion photography and lifestyle ads; prioritize outfit structure and location cues.

Hyperreal surrealism: dreamcore lighting, glossy textures, cinematic DOF, surreal shadows. Example use: plays well in music, fashion, or alt-culture campaigns.

Invoke a named style as a reference
Some prompt structures benefit from invoking a named visual signature from a specific artist, especially when combined with your own stylistic phrasing or workflows, as shown in the following example.

Prompt “editorial studio portrait of a woman with glowing skin in minimalist glam makeup, high-contrast lighting, clean background, (depiction of Van Gogh style:1.3)”

The following is a more conceptual example.

Prompt “product shot of a silver hair oil bottle with soft reflections on curved chrome, (depiction of Wes Anderson style:1.2), under cold studio lighting”

These phrases function like calling on a genre; they imply choices around materials, lighting, layout, and color tonality.
Use reference images to guide style
Another useful technique is using a reference image to guide the pose, color, or composition of the output. For use cases like matching a pose from a lookbook image, transferring a color palette from a campaign still, or copying shadowplay from a photo shoot, you can extract and apply structure or style from reference images.
Stability AI Image Services support a variety of image-to-image workflows where you can use a reference image (control image) to guide the output, such as Structure, Sketch, and Style. Tools like ControlNet (a neural network architecture that adds structural conditioning to image generation), IP-Adapter (an image prompt adapter), or CLIP-based captioning also enable further control when paired with Stability AI models.
We will discuss ControlNet, IP-Adapter, and CLIP-based captioning in a subsequent post.
The following is an example of an image-to-image workflow (a code sketch follows the example):

Find a high-quality editorial reference.
Use it with a depth, canny, or seg ControlNet to lock a pose.
Style with a prompt.

Prompt “fashion editorial of a model in layered knitwear, dramatic colored lighting, strong shadows, high ISO texture”
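To make this workflow concrete, here is a rough sketch using the open source diffusers library rather than the Bedrock API. The checkpoint names, Canny thresholds, and step count are common public defaults chosen for illustration, not a prescription; adapt them to your own assets and hardware.

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Load the editorial reference and extract Canny edges to lock pose and composition
reference = np.array(Image.open("editorial-reference.jpg").convert("RGB"))
edges = cv2.Canny(reference, 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# 2. Pair a Canny ControlNet with a Stable Diffusion base model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# 3. Style with a prompt while the edge map constrains the structure
image = pipe(
    "fashion editorial of a model in layered knitwear, dramatic colored lighting, "
    "strong shadows, high ISO texture",
    image=canny_image,
    num_inference_steps=30,
).images[0]
image.save("styled-editorial.png")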

Create the right mood with lighting control
In a prompt, lighting sets tone, adds dimensionality, and mimics the language of photography. It shouldn’t just be “bright vs. dark.” Lighting is often the style itself, especially for Gen Z audiences: think TikTok, early-aughts flash, harsh backlight, and color gels. The following table provides some useful lighting style prompt terms.

High-contrast studio: hard directional light, deep shadows, controlled highlights. Example use: beauty, tech, fashion with punchy visuals.

Soft editorial: diffused light, soft shadows, ambient glow, overcast. Example use: skincare, fashion, wellness.

Colored gel lighting: blue and pink gel lighting, dramatic color shadows, rim lighting. Example use: nightlife, music-adjacent fashion, youth-forward styling.

Natural bounce: golden hour, soft natural light, sun flare, warm tones. Example use: outdoors, lifestyle, brand-friendly minimalism.

Build intent with posing and framing terms
Good posing helps products feel aspirational and digital models feel more dynamic. With AI, you must be intentional. Framing and pose cues help avoid stiffness, anatomical errors, and randomness. The following table provides some useful posing and framing prompt terms.

looking off camera: Creates candid or editorial energy. Tip: useful for lookbooks or ad pages.

hands in motion: Adds realism and fluidity. Tip: avoids awkward, static body posture.

seated with body turned: Adds depth and twist to the torso. Tip: reduces symmetry, feels natural.

shot from low angle: Power or status cue. Tip: works well for stylized streetwear or product hero shots.

Example: Putting it all together
The following example puts together what we’ve discussed in this post.

Prompt “studio portrait of a model with platinum hair in metallic cargo pants and a cropped mesh hoodie, seated with legs wide on (acrylic stairs:1.6), magenta and teal gel lighting from left and behind, dramatic contrast, shot on 50mm, streetwear editorial for Gen Z campaign” Negative prompt “blurry, extra limbs, watermark, cartoon, distorted face missing fingers, bad anatomy”

Let’s break down the preceding prompt. We direct the look of the subject (platinum hair, metallic clothes), specify their pose (seated wide-legged, confident, unposed), define the environment (acrylic stairs and studio setup, controlled, modern), state the lighting (mixed gel sources, bold stylization), designate the lens (50mm, portrait realism), and lastly detail the purpose (for Gen Z campaign, sets visual and cultural tone). Together, the prompt produces the desired result.
Best practices and troubleshooting
Prompting is rarely a one-and-done task, especially for creative use cases. Most great images come from refining an idea over multiple attempts. Consider the following methodology to iterate over your prompts (a short code sketch follows the list):

Keep a prompt log
Change one variable at a time
Save seeds and base images
Use comparison grids
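As a concrete illustration of this loop, the sketch below changes only the seed between attempts while logging each try to a CSV file. The Bedrock request fields and model ID are the same assumptions used in the earlier example and should be verified against the documentation.

import base64
import csv
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

prompt = "studio portrait of a model with platinum hair in metallic cargo pants, magenta and teal gel lighting"
negative = "blurry, extra limbs, watermark, cartoon"

with open("prompt_log.csv", "w", newline="") as log:
    writer = csv.writer(log)
    writer.writerow(["seed", "prompt", "negative_prompt", "output_file"])
    for seed in (7, 42, 1234):  # vary one thing at a time: here, only the seed
        body = {
            "prompt": prompt,
            "negative_prompt": negative,
            "seed": seed,
            "output_format": "png",
        }
        response = bedrock.invoke_model(
            modelId="stability.stable-image-ultra-v1:1",  # confirm the exact model ID
            body=json.dumps(body),
        )
        result = json.loads(response["body"].read())
        output_file = f"attempt_seed_{seed}.png"
        with open(output_file, "wb") as f:
            f.write(base64.b64decode(result["images"][0]))
        writer.writerow([seed, prompt, negative, output_file])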

Sometimes things go wrong—maybe the model ignores your prompt, or the image looks messy. These issues are common and often quick to fix, and you can get sharper, cleaner, and more intentional outputs with every adjustment. The following table provides useful tips for troubleshooting your prompts.

Style feels random. Cause: the model is confused or terms are vague. Fix: clarify the style, add weight, remove conflicts.

Face gets warped. Cause: over-styled or lacks facial cues. Fix: add portrait of or headshot, or adjust pose or lighting.

Image is too dark. Cause: lighting not defined. Fix: add softbox from left, natural light, or a time of day.

Repetitive poses. Cause: same seed or static structure. Fix: switch the seed or change the camera angle or subject action.

Lacks realism or feels “AI-ish”. Cause: wrong tone or artifacts. Fix: add negatives like cartoon, digital texture, distorted.

Conclusion
Mastering advanced prompting techniques can turn basic image generation into professional creative outputs. Stability AI Image Services in Amazon Bedrock provide precise control over visual creation and editing, helping businesses convert concepts into production-ready assets. The combination of technical expertise and creative intent can help creators achieve the precision and consistency required in professional settings. This control proves valuable across multiple applications, such as marketing campaigns, brand consistency, and product visualizations. This post demonstrated how to optimize Stability AI Image Services in Amazon Bedrock to produce high-quality imagery that aligns with your creative goals.
To implement these techniques, access Stability AI Image Services through Amazon Bedrock or explore Stability AI’s foundation models available in Amazon SageMaker JumpStart. You can also find practical code examples in our GitHub repository.

About the authors
Maxfield Hulker is the VP of Community and Business Development at Stability AI. He is a longtime leader in the generative AI space. He has helped build creator-focused platforms like Civitai and Dream Studio. Maxfield regularly publishes guides and tutorials to make advanced AI techniques more accessible.
Suleman Patel is a Senior Solutions Architect at Amazon Web Services (AWS), with a special focus on machine learning and modernization. Leveraging his expertise in both business and technology, Suleman helps customers design and build solutions that tackle real-world business problems. When he’s not immersed in his work, Suleman loves exploring the outdoors, taking road trips, and cooking up delicious dishes in the kitchen.
Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area working with generative AI model providers and helping customers optimize their generative AI workloads on AWS. She helps enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-based manner while supporting resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.
Fabio Branco is a Senior Customer Solutions Manager at Amazon Web Services (AWS) and a strategic advisor, helping customers achieve business transformation, drive innovation through generative AI and data solutions, and successfully navigate their cloud journeys. Prior to AWS, he held Product Management, Engineering, Consulting, and Technology Delivery roles across multiple Fortune 500 companies in industries, including retail and consumer goods, oil and gas, financial services, insurance, and aerospace and defense.