How to Design a Fully Interactive, Reactive, and Dynamic Terminal-Based Data Dashboard Using Textual?

In this tutorial, we build an advanced interactive dashboard using Textual, and we explore how terminal-first UI frameworks can feel as expressive and dynamic as modern web dashboards. As we write and run each snippet, we actively construct the interface piece by piece: widgets, layouts, reactive state, and event flows. Along the way we see how Textual behaves like a live UI engine right inside Google Colab. By the end, we notice how naturally we can blend tables, trees, forms, and progress indicators into a cohesive application that feels fast, clean, and responsive. Check out the FULL CODES here.

!pip install textual textual-web nest-asyncio

from textual.app import App, ComposeResult
from textual.containers import Container, Horizontal, Vertical
from textual.widgets import (
    Header, Footer, Button, DataTable, Static, Input,
    Label, ProgressBar, Tree, Select
)
from textual.reactive import reactive
from textual import on
from datetime import datetime
import random

class StatsCard(Static):
    value = reactive(0)

    def __init__(self, title: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.title = title

    def compose(self) -> ComposeResult:
        yield Label(self.title)
        yield Label(str(self.value), id="stat-value")

    def watch_value(self, new_value: int) -> None:
        if self.is_mounted:
            try:
                self.query_one("#stat-value", Label).update(str(new_value))
            except Exception:
                pass

We set up the environment and import all the necessary components to build our Textual application. As we define the StatsCard widget, we establish a reusable component that reacts to changes in value and updates itself automatically. We begin to see how Textual’s reactive system lets us create dynamic UI elements with minimal effort. Check out the FULL CODES here.
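To isolate the reactive pattern before the dashboard grows, here is a minimal, self-contained sketch, separate from the dashboard code and assuming only that Textual is installed: assigning to a reactive attribute automatically triggers the matching watch_ method, which re-renders the label.

from textual.app import App, ComposeResult
from textual.reactive import reactive
from textual.widgets import Label

class CounterApp(App):
    count = reactive(0)

    def compose(self) -> ComposeResult:
        yield Label("0", id="counter")

    def on_mount(self) -> None:
        # Bump the reactive value once per second; each change fires watch_count.
        self.set_interval(1.0, lambda: setattr(self, "count", self.count + 1))

    def watch_count(self, new_value: int) -> None:
        self.query_one("#counter", Label).update(str(new_value))

# CounterApp().run()  # uncomment to try this standalone example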

class DataDashboard(App):
    CSS = """
    Screen { background: $surface; }
    #main-container { height: 100%; padding: 1; }
    #stats-row { height: auto; margin-bottom: 1; }
    StatsCard { border: solid $primary; height: 5; padding: 1; margin-right: 1; width: 1fr; }
    #stat-value { text-style: bold; color: $accent; content-align: center middle; }
    #control-panel { height: 12; border: solid $secondary; padding: 1; margin-bottom: 1; }
    #data-section { height: 1fr; }
    #left-panel { width: 30; border: solid $secondary; padding: 1; margin-right: 1; }
    DataTable { height: 100%; border: solid $primary; }
    Input { margin: 1 0; }
    Button { margin: 1 1 1 0; }
    ProgressBar { margin: 1 0; }
    """

    BINDINGS = [
        ("d", "toggle_dark", "Toggle Dark Mode"),
        ("q", "quit", "Quit"),
        ("a", "add_row", "Add Row"),
        ("c", "clear_table", "Clear Table"),
    ]

    total_rows = reactive(0)
    total_sales = reactive(0)
    avg_rating = reactive(0.0)

We define the DataDashboard class and configure global styles, key bindings, and reactive attributes. We decide how the app should look and behave right from the top, giving us full control over themes and interactivity. This structure helps us create a polished dashboard without writing any HTML or JS. Check out the FULL CODES here.

    def compose(self) -> ComposeResult:
        yield Header(show_clock=True)

        with Container(id="main-container"):
            with Horizontal(id="stats-row"):
                yield StatsCard("Total Rows", id="card-rows")
                yield StatsCard("Total Sales", id="card-sales")
                yield StatsCard("Avg Rating", id="card-rating")

            with Vertical(id="control-panel"):
                yield Input(placeholder="Product Name", id="input-name")
                yield Select(
                    [("Electronics", "electronics"),
                     ("Books", "books"),
                     ("Clothing", "clothing")],
                    prompt="Select Category",
                    id="select-category"
                )
                with Horizontal():
                    yield Button("Add Row", variant="primary", id="btn-add")
                    yield Button("Clear Table", variant="warning", id="btn-clear")
                    yield Button("Generate Data", variant="success", id="btn-generate")
                yield ProgressBar(total=100, id="progress")

            with Horizontal(id="data-section"):
                with Container(id="left-panel"):
                    yield Label("Navigation")
                    tree = Tree("Dashboard")
                    tree.root.expand()
                    products = tree.root.add("Products", expand=True)
                    products.add_leaf("Electronics")
                    products.add_leaf("Books")
                    products.add_leaf("Clothing")
                    tree.root.add_leaf("Reports")
                    tree.root.add_leaf("Settings")
                    yield tree

                yield DataTable(id="data-table")

        yield Footer()

We compose the entire UI layout, arranging containers, cards, form inputs, buttons, a navigation tree, and a data table. As we structure these components, we watch the interface take shape exactly the way we envision it. This snippet lets us design the visual skeleton of the dashboard in a clean, declarative manner. Check out the FULL CODES here.

    def on_mount(self) -> None:
        table = self.query_one(DataTable)
        table.add_columns("ID", "Product", "Category", "Price", "Sales", "Rating")
        table.cursor_type = "row"
        self.generate_sample_data(5)
        self.set_interval(0.1, self.update_progress)

    def generate_sample_data(self, count: int = 5) -> None:
        table = self.query_one(DataTable)
        categories = ["Electronics", "Books", "Clothing"]
        products = {
            "Electronics": ["Laptop", "Phone", "Tablet", "Headphones"],
            "Books": ["Novel", "Textbook", "Magazine", "Comic"],
            "Clothing": ["Shirt", "Pants", "Jacket", "Shoes"]
        }

        for _ in range(count):
            category = random.choice(categories)
            product = random.choice(products[category])
            row_id = self.total_rows + 1
            price = round(random.uniform(10, 500), 2)
            sales = random.randint(1, 100)
            rating = round(random.uniform(1, 5), 1)

            table.add_row(
                str(row_id),
                product,
                category,
                f"${price}",
                str(sales),
                str(rating)
            )

            self.total_rows += 1
            self.total_sales += sales

        self.update_stats()

    def update_stats(self) -> None:
        self.query_one("#card-rows", StatsCard).value = self.total_rows
        self.query_one("#card-sales", StatsCard).value = self.total_sales

        if self.total_rows > 0:
            table = self.query_one(DataTable)
            # DataTable.rows yields row keys, so fetch each row's cells before
            # reading the rating column.
            total_rating = sum(
                float(table.get_row(row_key)[5]) for row_key in table.rows
            )
            self.avg_rating = round(total_rating / self.total_rows, 2)
            self.query_one("#card-rating", StatsCard).value = self.avg_rating

    def update_progress(self) -> None:
        progress = self.query_one(ProgressBar)
        progress.advance(1)
        if progress.progress >= 100:
            progress.progress = 0

We implement all the logic for generating data, computing statistics, animating progress, and updating cards. We see how quickly we can bind backend logic to frontend components using Textual’s reactive model. This step makes the dashboard feel alive as numbers update instantly and progress bars animate smoothly. Check out the FULL CODES here.

    @on(Button.Pressed, "#btn-add")
    def handle_add_button(self) -> None:
        name_input = self.query_one("#input-name", Input)
        category = self.query_one("#select-category", Select).value

        if name_input.value and category:
            table = self.query_one(DataTable)
            row_id = self.total_rows + 1
            price = round(random.uniform(10, 500), 2)
            sales = random.randint(1, 100)
            rating = round(random.uniform(1, 5), 1)

            table.add_row(
                str(row_id),
                name_input.value,
                str(category),
                f"${price}",
                str(sales),
                str(rating)
            )

            self.total_rows += 1
            self.total_sales += sales
            self.update_stats()
            name_input.value = ""

    @on(Button.Pressed, "#btn-clear")
    def handle_clear_button(self) -> None:
        table = self.query_one(DataTable)
        table.clear()
        self.total_rows = 0
        self.total_sales = 0
        self.avg_rating = 0
        self.update_stats()

    @on(Button.Pressed, "#btn-generate")
    def handle_generate_button(self) -> None:
        self.generate_sample_data(10)

    def action_toggle_dark(self) -> None:
        self.dark = not self.dark

    def action_add_row(self) -> None:
        self.handle_add_button()

    def action_clear_table(self) -> None:
        self.handle_clear_button()


if __name__ == "__main__":
    import nest_asyncio
    nest_asyncio.apply()
    app = DataDashboard()
    app.run()

We connect UI events to backend actions using button handlers, keyboard shortcuts, and app-level functions. As we run the app, we interact with a fully functional dashboard that responds instantly to every click and command. This snippet completes the application and demonstrates how easily Textual enables us to build dynamic, state-driven UIs.

In conclusion, we see the whole dashboard come together in a fully functional, interactive form that runs directly from a notebook environment. We experience firsthand how Textual lets us design terminal UIs with the structure and feel of web apps, while staying entirely in Python. This tutorial leaves us confident that we can extend this foundation, even adding charts, API feeds, and multi-page navigation, as we continue to experiment with Textual’s modern reactive UI capabilities.

Check out the FULL CODES here.
The post How to Design a Fully Interactive, Reactive, and Dynamic Terminal-Based Data Dashboard Using Textual? appeared first on MarkTechPost.

OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits

If neural networks are now making decisions everywhere from code editors to safety systems, how can we actually see the specific circuits inside that drive each behavior? OpenAI has introduced a new mechanistic interpretability research study that trains language models to use sparse internal wiring, so that model behavior can be explained using small, explicit circuits.

https://cdn.openai.com/pdf/41df8f28-d4ef-43e9-aed2-823f9393e470/circuit-sparsity-paper.pdf

Training transformers to be weight sparse

Most transformer language models are dense. Each neuron reads from and writes to many residual channels, and features are often in superposition, which makes circuit-level analysis difficult. Previous OpenAI work tried to learn sparse feature bases on top of dense models using sparse autoencoders. The new work instead changes the base model so that the transformer itself is weight sparse.

The OpenAI team trains decoder-only transformers with an architecture similar to GPT-2. After each AdamW optimizer step, they enforce a fixed sparsity level on every weight matrix and bias, including the token embeddings: only the largest-magnitude entries in each matrix are kept, and the rest are set to zero. Over training, an annealing schedule gradually drives the fraction of non-zero parameters down until the model reaches a target sparsity.

In the most extreme setting, roughly 1 in 1,000 weights is non-zero. Activations are also somewhat sparse: around 1 in 4 activations are non-zero at a typical node location. The effective connectivity graph is therefore very thin even when the model width is large, which encourages disentangled features that map cleanly onto the residual channels a circuit uses.
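The following is a minimal sketch of the kind of magnitude-based sparsification described above, applied after an optimizer step. It is illustrative only and omits the paper's annealing schedule and other details; the keep_fraction value simply mirrors the "roughly 1 in 1,000" figure.

import torch

def enforce_weight_sparsity(model: torch.nn.Module, keep_fraction: float) -> None:
    # Zero out all but the largest-magnitude entries of every parameter tensor.
    with torch.no_grad():
        for param in model.parameters():
            k = max(1, int(keep_fraction * param.numel()))
            flat = param.abs().flatten()
            threshold = flat.kthvalue(flat.numel() - k + 1).values
            param.mul_((param.abs() >= threshold).float())

# Toy usage: one AdamW step on a small layer, then re-apply the sparsity constraint.
model = torch.nn.Linear(256, 256)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 256), torch.randn(8, 256)
torch.nn.functional.mse_loss(model(x), y).backward()
optimizer.step()
enforce_weight_sparsity(model, keep_fraction=0.001)  # keep roughly 1 in 1,000 weights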


Measuring interpretability through task specific pruning

To quantify whether these models are easier to understand, the OpenAI team does not rely on qualitative examples alone. The researchers define a suite of simple algorithmic tasks based on Python next-token prediction. One example, single_double_quote, requires the model to close a Python string with the right quote character. Another, set_or_string, requires the model to choose between .add and += based on whether a variable was initialized as a set or a string.

For each task, they search for the smallest subnetwork, called a circuit, that can still perform the task up to a fixed loss threshold. The pruning is node based. A node is an MLP neuron at a specific layer, an attention head, or a residual stream channel at a specific layer. When a node is pruned, its activation is replaced by its mean over the pretraining distribution. This is mean ablation.

The search uses continuous mask parameters for each node and a Heaviside-style gate, optimized with a straight-through-estimator-like surrogate gradient. The complexity of a circuit is measured as the count of active edges between retained nodes, and the main interpretability metric is the geometric mean of edge counts across all tasks.
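A compact sketch of the node-pruning ingredients described above, under stated assumptions: a hard 0/1 gate with a sigmoid surrogate gradient (a straight-through style estimator) and mean ablation for pruned nodes. This illustrates the general technique, not the paper's exact parameterization.

import torch

def ste_gate(theta: torch.Tensor) -> torch.Tensor:
    # Hard 0/1 gate in the forward pass; gradients flow through the sigmoid surrogate.
    soft = torch.sigmoid(theta)
    hard = (theta > 0).float()
    return hard + soft - soft.detach()

def gate_with_mean_ablation(acts: torch.Tensor, theta: torch.Tensor, mean_acts: torch.Tensor) -> torch.Tensor:
    # Keep a node's activation where the gate is 1; otherwise substitute its
    # mean activation over the pretraining distribution (mean ablation).
    gate = ste_gate(theta)
    return gate * acts + (1 - gate) * mean_acts

# Toy usage: 4 nodes, a batch of activations, learnable mask parameters.
acts = torch.randn(2, 4)
mean_acts = torch.zeros(4)                     # precomputed means in practice
theta = torch.zeros(4, requires_grad=True)     # continuous mask parameters
out = gate_with_mean_ablation(acts, theta, mean_acts)
out.sum().backward()                           # gradients reach theta via the surrogate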

Example circuits in sparse transformers

On the single_double_quote task, the sparse models yield a compact and fully interpretable circuit. In an early MLP layer, one neuron behaves as a quote detector that activates on both single and double quotes. A second neuron behaves as a quote type classifier that distinguishes the two quote types. Later, an attention head uses these signals to attend back to the opening quote position and copy its type to the closing position.

In circuit graph terms, the mechanism uses 5 residual channels, 2 MLP neurons in layer 0, and 1 attention head in a later layer with a single relevant query key channel and a single value channel. If the rest of the model is ablated, this subgraph still solves the task. If these few edges are removed, the model fails on the task. The circuit is therefore both sufficient and necessary in the operational sense defined by the paper.


For more complex behaviors, such as type tracking of a variable named current inside a function body, the recovered circuits are larger and only partially understood. The research team shows an example where one attention operation writes the variable name into the token set() at the definition, and another attention operation later copies the type information from that token back into a later use of current. This still yields a relatively small circuit graph.

Key Takeaways

Weight-sparse transformers by design: OpenAI trains GPT-2 style, decoder-only transformers so that almost all weights are zero (roughly 1 in 1,000 weights is non-zero), enforcing sparsity across all weights and biases, including token embeddings, which yields thin connectivity graphs that are structurally easier to analyze.

Interpretability is measured as minimal circuit size: The work defines a benchmark of simple Python next-token tasks and, for each task, searches for the smallest subnetwork, in terms of active edges between nodes, that still reaches a fixed loss, using node-level pruning with mean ablation and straight-through-estimator style mask optimization.

Concrete, fully reverse-engineered circuits emerge: On tasks such as predicting matching quote characters, the sparse model yields a compact circuit with a few residual channels, 2 key MLP neurons, and 1 attention head that the authors can fully reverse engineer and verify as both sufficient and necessary for the behavior.

Sparsity delivers much smaller circuits at fixed capability: At matched pretraining loss levels, weight-sparse models require circuits that are roughly 16 times smaller than those recovered from dense baselines, defining a capability-interpretability frontier where increased sparsity improves interpretability while slightly reducing raw capability.

Editorial Comments

OpenAI’s work on weight sparse transformers is a pragmatic step toward making mechanistic interpretability operational. By enforcing sparsity directly in the base model, the paper turns abstract discussions of circuits into concrete graphs with measurable edge counts, clear necessity and sufficiency tests, and reproducible benchmarks on Python next token tasks. The models are small and inefficient, but the methodology is relevant for future safety audits and debugging workflows. This research treats interpretability as a first class design constraint rather than an after the fact diagnostic.

Check out the Paper, GitHub Repo and Technical details.
The post OpenAI Researchers Train Weight Sparse Transformers to Expose Interpretable Circuits appeared first on MarkTechPost.

How to Design an Advanced Multi-Agent Reasoning System with spaCy Featuring Planning, Reflection, Memory, and Knowledge Graphs

In this tutorial, we build an advanced Agentic AI system using spaCy, designed to allow multiple intelligent agents to reason, collaborate, reflect, and learn from experience. We work through the entire pipeline step by step, observing how each agent processes tasks using planning, memory, communication, and semantic reasoning. By the end, we see how the system evolves into a dynamic multi-agent architecture capable of extracting entities, interpreting context, forming reasoning chains, and constructing knowledge graphs, all while continuously improving through reflection and episodic learning. Check out the FULL CODES here.

!pip install spacy networkx matplotlib -q

import spacy
from typing import List, Dict, Any, Optional, Tuple
from dataclasses import dataclass, field
from collections import defaultdict, deque
from enum import Enum
import json
import hashlib
from datetime import datetime

class MessageType(Enum):
    REQUEST = "request"
    RESPONSE = "response"
    BROADCAST = "broadcast"
    QUERY = "query"

@dataclass
class Message:
    sender: str
    receiver: str
    msg_type: MessageType
    content: Dict[str, Any]
    timestamp: float = field(default_factory=lambda: datetime.now().timestamp())
    priority: int = 1

    def get_id(self) -> str:
        return hashlib.md5(f"{self.sender}{self.timestamp}".encode()).hexdigest()[:8]

@dataclass
class AgentTask:
    task_id: str
    task_type: str
    data: Any
    priority: int = 1
    dependencies: List[str] = field(default_factory=list)
    metadata: Dict = field(default_factory=dict)

@dataclass
class Observation:
    state: str
    action: str
    result: Any
    confidence: float
    timestamp: float = field(default_factory=lambda: datetime.now().timestamp())

class WorkingMemory:
    def __init__(self, capacity: int = 10):
        self.capacity = capacity
        self.items = deque(maxlen=capacity)
        self.attention_scores = {}

    def add(self, key: str, value: Any, attention: float = 1.0):
        self.items.append((key, value))
        self.attention_scores[key] = attention

    def recall(self, n: int = 5) -> List[Tuple[str, Any]]:
        sorted_items = sorted(self.items, key=lambda x: self.attention_scores.get(x[0], 0), reverse=True)
        return sorted_items[:n]

    def get(self, key: str) -> Optional[Any]:
        for k, v in self.items:
            if k == key:
                return v
        return None

class EpisodicMemory:
    def __init__(self):
        self.episodes = []
        self.success_patterns = defaultdict(int)

    def store(self, observation: Observation):
        self.episodes.append(observation)
        if observation.confidence > 0.7:
            pattern = f"{observation.state}→{observation.action}"
            self.success_patterns[pattern] += 1

    def query_similar(self, state: str, top_k: int = 3) -> List[Observation]:
        scored = [(obs, self._similarity(state, obs.state)) for obs in self.episodes[-50:]]
        scored.sort(key=lambda x: x[1], reverse=True)
        return [obs for obs, _ in scored[:top_k]]

    def _similarity(self, state1: str, state2: str) -> float:
        words1, words2 = set(state1.split()), set(state2.split())
        if not words1 or not words2:
            return 0.0
        return len(words1 & words2) / len(words1 | words2)

We establish all the core structures required for our agentic system. We import key libraries, define message and task formats, and build both working and episodic memory modules. As we define these foundations, we lay the groundwork for reasoning, storage, and communication. Check out the FULL CODES here.
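As a quick, illustrative check of the memory modules we just defined (it only uses the classes above), we can add a couple of items to working memory, store a high-confidence observation, and query for similar episodes:

wm = WorkingMemory(capacity=3)
wm.add("entities_ORG", ["OpenAI", "DeepMind"], attention=0.9)
wm.add("entities_GPE", ["San Francisco"], attention=0.4)
print(wm.recall(n=1))                 # highest-attention item comes back first

em = EpisodicMemory()
em.store(Observation(state="entity_extraction_50tokens",
                     action="extract_with_context",
                     result=6, confidence=0.8))
print(em.query_similar("entity_extraction_40tokens", top_k=1))
print(dict(em.success_patterns))      # high-confidence state→action patterns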

class ReflectionModule:
    def __init__(self):
        self.performance_log = []

    def reflect(self, task_type: str, confidence: float, result: Any) -> Dict[str, Any]:
        self.performance_log.append({'task': task_type, 'confidence': confidence, 'timestamp': datetime.now().timestamp()})
        recent = [p for p in self.performance_log if p['task'] == task_type][-5:]
        avg_conf = sum(p['confidence'] for p in recent) / len(recent) if recent else 0.5
        insights = {
            'performance_trend': 'improving' if confidence > avg_conf else 'declining',
            'avg_confidence': avg_conf,
            'recommendation': self._get_recommendation(confidence, avg_conf)
        }
        return insights

    def _get_recommendation(self, current: float, average: float) -> str:
        if current < 0.4:
            return "Request assistance from specialized agent"
        elif current < average:
            return "Review similar past cases for patterns"
        else:
            return "Continue with current approach"

class AdvancedAgent:
    def __init__(self, name: str, specialty: str, nlp):
        self.name = name
        self.specialty = specialty
        self.nlp = nlp
        self.working_memory = WorkingMemory()
        self.episodic_memory = EpisodicMemory()
        self.reflector = ReflectionModule()
        self.message_queue = deque()
        self.collaboration_graph = defaultdict(int)

    def plan(self, task: AgentTask) -> List[str]:
        similar = self.episodic_memory.query_similar(str(task.data))
        if similar and similar[0].confidence > 0.7:
            return [similar[0].action]
        return self._default_plan(task)

    def _default_plan(self, task: AgentTask) -> List[str]:
        return ['analyze', 'extract', 'validate']

    def send_message(self, receiver: str, msg_type: MessageType, content: Dict):
        msg = Message(self.name, receiver, msg_type, content)
        self.message_queue.append(msg)
        return msg

    def receive_message(self, message: Message):
        self.message_queue.append(message)
        self.collaboration_graph[message.sender] += 1

    def process(self, task: AgentTask) -> Dict[str, Any]:
        raise NotImplementedError

class CognitiveEntityAgent(AdvancedAgent):
    def process(self, task: AgentTask) -> Dict[str, Any]:
        doc = self.nlp(task.data)
        entities = defaultdict(list)
        entity_contexts = []
        for ent in doc.ents:
            context_start = max(0, ent.start - 5)
            context_end = min(len(doc), ent.end + 5)
            context = doc[context_start:context_end].text
            entities[ent.label_].append(ent.text)
            entity_contexts.append({'entity': ent.text, 'type': ent.label_, 'context': context, 'position': (ent.start_char, ent.end_char)})
        for ent_type, ents in entities.items():
            attention = len(ents) / len(doc.ents) if doc.ents else 0
            self.working_memory.add(f"entities_{ent_type}", ents, attention)
        confidence = min(len(entities) / 4, 1.0) if entities else 0.3
        obs = Observation(state=f"entity_extraction_{len(doc)}tokens", action="extract_with_context", result=len(entity_contexts), confidence=confidence)
        self.episodic_memory.store(obs)
        reflection = self.reflector.reflect('entity_extraction', confidence, entities)
        return {'entities': dict(entities), 'contexts': entity_contexts, 'confidence': confidence, 'reflection': reflection, 'next_actions': ['semantic_analysis', 'knowledge_graph'] if confidence > 0.5 else []}

We construct the reflection engine and the base agent class, which provides every agent with reasoning, planning, and memory capabilities. We then implement the Cognitive Entity Agent, which processes text to extract entities with context and stores meaningful observations. As we run this part, we watch the agent learn from experience while dynamically adjusting its strategy. Check out the FULL CODES here.

class SemanticReasoningAgent(AdvancedAgent):
    def process(self, task: AgentTask) -> Dict[str, Any]:
        doc = self.nlp(task.data)
        reasoning_chains = []
        for sent in doc.sents:
            chain = self._extract_reasoning_chain(sent)
            if chain:
                reasoning_chains.append(chain)
        entity_memory = self.working_memory.recall(3)
        semantic_clusters = self._cluster_by_semantics(doc)
        confidence = min(len(reasoning_chains) / 3, 1.0) if reasoning_chains else 0.4
        obs = Observation(state=f"semantic_analysis_{len(list(doc.sents))}sents", action="reason_and_cluster", result=len(reasoning_chains), confidence=confidence)
        self.episodic_memory.store(obs)
        return {'reasoning_chains': reasoning_chains, 'semantic_clusters': semantic_clusters, 'memory_context': entity_memory, 'confidence': confidence, 'next_actions': ['knowledge_integration']}

    def _extract_reasoning_chain(self, sent) -> Optional[Dict]:
        subj, verb, obj = None, None, None
        for token in sent:
            if token.dep_ == 'nsubj':
                subj = token
            elif token.pos_ == 'VERB':
                verb = token
            elif token.dep_ in ['dobj', 'attr', 'pobj']:
                obj = token
        if subj and verb and obj:
            return {'subject': subj.text, 'predicate': verb.lemma_, 'object': obj.text, 'confidence': 0.8}
        return None

    def _cluster_by_semantics(self, doc) -> List[Dict]:
        clusters = []
        nouns = [token for token in doc if token.pos_ in ['NOUN', 'PROPN']]
        visited = set()
        for noun in nouns:
            if noun.i in visited:
                continue
            cluster = [noun.text]
            visited.add(noun.i)
            for other in nouns:
                if other.i != noun.i and other.i not in visited:
                    if noun.similarity(other) > 0.5:
                        cluster.append(other.text)
                        visited.add(other.i)
            if len(cluster) > 1:
                clusters.append({'concepts': cluster, 'size': len(cluster)})
        return clusters

We design the Semantic Reasoning Agent, which analyzes sentence structures, forms reasoning chains, and groups concepts based on semantic similarity. We integrate working memory to enrich the understanding the agent builds. As we execute this, we see how the system moves from surface-level extraction to deeper inference. Check out the FULL CODES here.

class KnowledgeGraphAgent(AdvancedAgent):
    def process(self, task: AgentTask) -> Dict[str, Any]:
        doc = self.nlp(task.data)
        graph = {'nodes': set(), 'edges': []}
        for sent in doc.sents:
            entities = list(sent.ents)
            if len(entities) >= 2:
                for ent in entities:
                    graph['nodes'].add((ent.text, ent.label_))
                root = sent.root
                if root.pos_ == 'VERB':
                    for i in range(len(entities) - 1):
                        graph['edges'].append({'from': entities[i].text, 'relation': root.lemma_, 'to': entities[i+1].text, 'sentence': sent.text[:100]})
        graph['nodes'] = list(graph['nodes'])
        confidence = min(len(graph['edges']) / 5, 1.0) if graph['edges'] else 0.3
        obs = Observation(state=f"knowledge_graph_{len(graph['nodes'])}nodes", action="construct_graph", result=len(graph['edges']), confidence=confidence)
        self.episodic_memory.store(obs)
        return {'graph': graph, 'node_count': len(graph['nodes']), 'edge_count': len(graph['edges']), 'confidence': confidence, 'next_actions': []}

class MetaController:
    def __init__(self):
        self.nlp = spacy.load('en_core_web_sm')
        self.agents = {
            'cognitive_entity': CognitiveEntityAgent('CognitiveEntity', 'entity_analysis', self.nlp),
            'semantic_reasoning': SemanticReasoningAgent('SemanticReasoner', 'reasoning', self.nlp),
            'knowledge_graph': KnowledgeGraphAgent('KnowledgeBuilder', 'graph_construction', self.nlp)
        }
        self.task_history = []
        self.global_memory = WorkingMemory(capacity=20)

    def execute_with_planning(self, text: str) -> Dict[str, Any]:
        initial_task = AgentTask(task_id="task_001", task_type="cognitive_entity", data=text, metadata={'source': 'user_input'})
        results = {}
        task_queue = [initial_task]
        iterations = 0
        max_iterations = 10
        while task_queue and iterations < max_iterations:
            task = task_queue.pop(0)
            agent = self.agents.get(task.task_type)
            if not agent or task.task_type in results:
                continue
            result = agent.process(task)
            results[task.task_type] = result
            self.global_memory.add(task.task_type, result, result['confidence'])
            for next_action in result.get('next_actions', []):
                if next_action in self.agents and next_action not in results:
                    next_task = AgentTask(task_id=f"task_{iterations+1:03d}", task_type=next_action, data=text, dependencies=[task.task_id])
                    task_queue.append(next_task)
            iterations += 1
        self.task_history.append({'results': results, 'iterations': iterations, 'timestamp': datetime.now().isoformat()})
        return results

    def generate_insights(self, results: Dict[str, Any]) -> str:
        report = "=" * 70 + "\n"
        report += " ADVANCED AGENTIC AI SYSTEM - ANALYSIS REPORT\n"
        report += "=" * 70 + "\n\n"
        for agent_type, result in results.items():
            agent = self.agents[agent_type]
            report += f" {agent.name}\n"
            report += f" Specialty: {agent.specialty}\n"
            report += f" Confidence: {result['confidence']:.2%}\n"
            if 'reflection' in result:
                report += f" Performance: {result['reflection'].get('performance_trend', 'N/A')}\n"
            report += " Key Findings:\n"
            report += json.dumps({k: v for k, v in result.items() if k not in ['reflection', 'next_actions']}, indent=6) + "\n\n"
        report += " System-Level Insights:\n"
        report += f" Total iterations: {len(self.task_history)}\n"
        report += f" Active agents: {len(results)}\n"
        report += f" Global memory size: {len(self.global_memory.items)}\n"
        return report

We implement the Knowledge Graph Agent, enabling the system to connect entities through relations extracted from text. We then build the Meta-Controller, which coordinates all agents, manages planning, and handles multi-step execution. As we use this component, we watch the system behave like a true multi-agent pipeline with dynamic flow control. Check out the FULL CODES here.

if __name__ == "__main__":
    sample_text = """
    Artificial intelligence researchers at OpenAI and DeepMind are developing
    advanced language models. Sam Altman leads OpenAI in San Francisco, while
    Demis Hassabis heads DeepMind in London. These organizations collaborate
    with universities like MIT and Stanford. Their research focuses on machine
    learning, neural networks, and reinforcement learning. The breakthrough
    came when transformers revolutionized natural language processing in 2017.
    """
    controller = MetaController()
    results = controller.execute_with_planning(sample_text)
    print(controller.generate_insights(results))
    print("Advanced multi-agent analysis complete with reflection and learning!")

We run the entire agentic system end-to-end on a sample text. We execute planning, call each agent in sequence, and generate a comprehensive analysis report. As we reach this stage, we see the full power of the multi-agent architecture working together in real time.

In conclusion, we developed a comprehensive multi-agent reasoning framework that operates on real-world text using spaCy, integrating planning, learning, and memory into a cohesive workflow. We observe how each agent contributes a unique layer of understanding, and we see the Meta-Controller orchestrate them to generate rich, interpretable insights. Lastly, we recognize the flexibility and extensibility of this agentic design, and we feel confident that we can now adapt it to more complex tasks, larger datasets, or even integrate language models to further enhance the system’s intelligence.

Check out the FULL CODES here.
The post How to Design an Advanced Multi-Agent Reasoning System with spaCy Featuring Planning, Reflection, Memory, and Knowledge Graphs appeared first on MarkTechPost.

Comparing the Top 6 Agent-Native Rails for the Agentic Internet: MCP, A2A, AP2, ACP, x402, and Kite

As AI agents move from single-app copilots to autonomous systems that browse, transact, and coordinate with each other, a new infrastructure layer is emerging underneath them. This article compares six key “agent-native rails” — MCP, A2A, AP2, ACP, x402, and Kite — focusing on how they standardize tool access, inter-agent communication, payment authorization, and settlement, and what that means for engineers designing secure, commerce-capable agentic systems.

The agent stack is organized around six trending agentic ‘rails’:

MCP – standard interface for tools and data.

A2A – transport and lifecycle for agent-to-agent calls.

AP2 – trust and mandates for agent-initiated payments.

ACP – interaction model for agentic checkout and commerce flows.

x402 – HTTP-native, on-chain payment protocol for APIs and agents.

Kite – L1 + state channels for high-frequency agent payments and policy-enforced autonomy.

They are complementary, not competing: MCP and A2A wire agents to context and each other, AP2/ACP encode commercial intent, and x402/Kite handle settlement.

The 6 rails at a glance

MCP (Model Context Protocol)
Layer: Tools & data. Primary role: standard interface to tools, data sources, and prompts. Transport / substrate: JSON-RPC over stdio / process, HTTP / SSE.

A2A (Agent2Agent)
Layer: Agent mesh. Primary role: discovery and task lifecycle between agents. Transport / substrate: JSON-RPC 2.0 over HTTPS, optional SSE streams.

AP2 (Agent Payments Protocol)
Layer: Payment control plane. Primary role: verifiable mandates and roles for agent payments. Transport / substrate: protocol-agnostic over existing rails, including blockchains like Sui.

ACP (Agentic Commerce Protocol)
Layer: Commerce flows. Primary role: shared language for catalog, offers, and checkout state. Transport / substrate: protocol spec + HTTP APIs, an open standard co-developed by OpenAI and Stripe.

x402
Layer: Settlement rail. Primary role: internet-native, per-request payments for APIs and agents. Transport / substrate: HTTP 402 with on-chain stablecoins such as USDC.

Kite
Layer: L1 + state channels. Primary role: agent-centric chain with identity and streaming micropayments. Transport / substrate: L1 chain + off-chain state-channel rails for agents.

The rest of the article unpacks each rail along four axes:

Capabilities

Security posture

Ecosystem traction

OS / runtime integration trajectory

1. MCP: tool and context rail

Capabilities

The Model Context Protocol is an open protocol for connecting LLM applications to external tools and data. It defines a client–server architecture:

MCP clients (agents, IDEs, chat UIs) connect to

MCP servers that expose tools, resources, and prompts via a standardized JSON-RPC schema.

Tools are strongly typed (name + JSON schema for parameters and results) and can wrap arbitrary systems: HTTP APIs, databases, file operations, internal services, etc.

The same protocol works across transports (stdio for local processes, HTTP/SSE for remote servers), which is why multiple runtimes can consume the same MCP servers.
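To make the wire format concrete, here is a sketch of the JSON-RPC messages an MCP client sends to list and invoke tools. The method names follow the MCP specification; the tool name and arguments are hypothetical, and exact result fields can vary by protocol revision.

import json

list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "query_database",               # hypothetical tool exposed by a server
        "arguments": {"query": "SELECT 1"},     # must satisfy the tool's JSON schema
    },
}

print(json.dumps(call_tool_request, indent=2))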

Security posture

MCP is deliberately agnostic about identity and payments. Security is inherited from the host:

Servers can run locally or remotely and may have full access to files, networks, and cloud APIs.

The main risks are classic: arbitrary code execution in tools, prompt injection, over-privileged credentials, and exfiltration of sensitive data.

Security guidance from Red Hat and others focuses on:

Least-privilege credentials per MCP server.

Sandboxing tools where possible.

Strong review and signing of server configurations.

Logging and audit for tool calls.

MCP itself does not give you access control semantics like ‘this agent can call this tool only under policy P’; those are layered on by hosts and IAM systems.

Ecosystem traction

MCP moved from Anthropic-only to ecosystem standard quickly:

Anthropic launched MCP and open-sourced the spec and TypeScript schemas.

OpenAI added full MCP client support in ChatGPT Developer Mode and the platform ‘Connectors’ system.

Microsoft integrated MCP into VS Code, Visual Studio, GitHub Copilot, and Copilot for Azure, including an “Azure MCP server.”

LangChain and LangGraph ship langchain-mcp-adapters for treating MCP tools as first-class LangChain tools.

Cloudflare runs a catalog of managed remote MCP servers and exposes them via its Agents SDK.

MCP is now effectively the ‘USB-C port’ for agent tools across IDEs, browsers, cloud agents, and edge runtimes.

2. A2A: agent-to-agent protocol

Capabilities

The Agent2Agent (A2A) protocol is an open standard for inter-agent communication and task handoff. The spec defines:

A2A client – initiates tasks on behalf of a user or system.

A2A server (remote agent) – exposes a JSON-RPC endpoint that executes tasks.

Agent cards – JSON metadata at well-known paths (for example, /.well-known/agent-card.json) describing capabilities, endpoint, and auth.

Transport is standardized:

JSON-RPC 2.0 over HTTPS for requests and responses.

Optional SSE streams for long-running or streaming tasks.

This gives agents a common ‘RPC fabric’ independent of vendor or framework.
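A small sketch of the discovery step: fetch a remote agent's card from the well-known path and inspect what it advertises before opening a JSON-RPC task. The host below is hypothetical and the card's field names are illustrative; the A2A specification defines the authoritative schema.

import requests

base_url = "https://agents.example.com"   # hypothetical remote agent host

card = requests.get(f"{base_url}/.well-known/agent-card.json", timeout=10).json()
print(card.get("name"), card.get("url"))          # advertised identity and endpoint
print(card.get("capabilities"))                   # e.g. streaming support

# An A2A client would then send JSON-RPC 2.0 requests over HTTPS to the endpoint
# declared in the card, using whichever auth scheme the card advertises.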

Security posture

At the protocol layer, A2A leans on common web primitives:

HTTPS with standard auth (API keys, OAuth-like tokens, mTLS) negotiated based on agent cards.

JSON-RPC 2.0 message format; parser correctness is a concern, since bugs in JSON-RPC handling become a security vector.

Red Hat and other analyses highlight:

Keep JSON-RPC libraries patched.

Protect against replay and downgrade attacks at the HTTP / TLS layer.

Treat agent-to-agent traffic like service-mesh traffic: identity, authz, and rate-limiting matter.

The protocol does not itself decide which agents should talk; that is a policy question for the platform.

Ecosystem traction

Google introduced A2A and is driving it as an interoperability layer for agents across enterprise platforms.

The A2A open-source org maintains the reference spec and implementation.

Amazon Bedrock AgentCore Runtime now supports A2A as a first-class protocol, with documented contract requirements.

Third-party frameworks (for example, CopilotKit) are adopting A2A for cross-agent and app-agent communication.

3. AP2: payment control layer

Capabilities

Agent Payments Protocol (AP2) is Google’s open standard for agent-initiated payments. Its core problem statement: when an AI agent pays, how do we know it had permission, the payment matches user intent, and someone is clearly accountable?

AP2 introduces:

Mandates – cryptographically signed digital contracts that encode who can pay, under which limits, for what kinds of transactions.

Role separation – payer agents, merchants, issuers, networks, and wallets each have explicit protocol roles.

Rail-agnostic design – AP2 can authorize payments over cards, bank transfers, or programmable blockchains such as Sui.

The protocol is designed to compose with A2A and MCP: A2A handles the messaging, MCP connects to tools, AP2 governs the payment semantics.
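To show the mandate idea in isolation, here is a generic sketch of a signed, independently verifiable spending authorization. This is not the AP2 wire format: the field names and the Ed25519 signature scheme are assumptions chosen for illustration.

import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

user_key = Ed25519PrivateKey.generate()

mandate = {                                   # hypothetical mandate fields
    "payer_agent": "shopping-agent-01",
    "merchant": "example-store",
    "max_amount": "50.00",
    "currency": "USD",
    "expires": "2025-12-31T23:59:59Z",
}
payload = json.dumps(mandate, sort_keys=True).encode()
signature = user_key.sign(payload)

# A merchant, issuer, or network can verify the mandate against the user's public key.
user_key.public_key().verify(signature, payload)   # raises InvalidSignature if tampered
print("mandate verified")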

Security posture

Security is the main reason AP2 exists:

Mandates are signed using modern public-key cryptography and can be independently verified.

The protocol explicitly targets authorization, authenticity, and accountability: did the agent have permission, does the action match user intent, and who is liable if something goes wrong.

Ecosystem traction

AP2 is still early but already has meaningful backing:

Google announced AP2 with more than 60 organizations across ecommerce, payments, banking, and crypto as collaborators or early supporters.

Cohorts include networks like Mastercard and American Express, wallets and PSPs such as PayPal, and crypto players including Coinbase.

4. ACP: commerce interaction model

Capabilities

The Agentic Commerce Protocol (ACP), co-developed by OpenAI and Stripe, is the interaction model underlying ChatGPT Instant Checkout. It gives agents and merchants a shared language for:

Product discovery (catalog and offers).

Configuration (variants, shipping options).

Checkout state (selected item, price, shipping, terms).

Fulfillment and post-purchase status.

ACP is designed to:

Work across processors and business types without forcing backend rewrites.

Keep merchants as the merchant of record for fulfillment, returns, and support, even when the interaction starts in an agent.

Security posture

In ACP deployments:

Payments are handled by processors such as Stripe; ACP itself focuses on the structure of the commerce interaction, not on cryptography.

OpenAI’s Instant Checkout uses limited-scope payment credentials and explicit confirmation steps in the ChatGPT UI, which makes agent-initiated purchases visible to the user.

ACP does not replace anti-fraud, KYC, or PCI responsibilities; those remain with the PSPs and merchants.

Ecosystem traction

OpenAI and Stripe have open-sourced ACP and are actively recruiting merchants and platforms.

Instant Checkout is live for Etsy sellers, with Shopify merchants and additional regions coming next, and multiple press reports highlight ACP as the underlying protocol.

Salesforce has announced ACP-based integrations for its Agentforce Commerce stack.

ACP is essentially becoming the agent-side ‘checkout API’ for multiple commerce ecosystems.

5. x402: HTTP-native settlement

Capabilities

x402 is Coinbase’s open payment protocol for AI agents and APIs. It revives HTTP status code 402 Payment Required as the trigger for machine-initiated, per-request payments.

Key properties:

Instant, automatic stablecoin payments over HTTP, primarily using USDC on chains like Base.

Clients (agents, apps) can pay for API calls, content, or services without accounts or sessions by programmatically responding to 402 challenges; a minimal client-side sketch of this handshake follows the list.

Designed for both human and machine consumers, but the machine-to-machine case is explicitly emphasized.
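The sketch below walks through the 402 handshake from the client side: request a resource, read the payment requirements from the 402 response, settle, and retry with proof of payment attached. The header name, response fields, and the settle_on_chain helper are placeholders, not the protocol's actual identifiers.

import requests

def settle_on_chain(requirements: dict) -> str:
    # Hypothetical stand-in for a wallet integration that pays the quoted amount
    # in stablecoin and returns a proof string the server can verify.
    return "signed-payment-proof"

url = "https://api.example.com/paid-endpoint"     # hypothetical x402-protected API

resp = requests.get(url, timeout=10)
if resp.status_code == 402:
    requirements = resp.json()                    # e.g. amount, asset, pay-to address
    proof = settle_on_chain(requirements)
    resp = requests.get(url, headers={"X-Payment": proof}, timeout=10)

print(resp.status_code)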

Security posture

Settlement is on-chain, so the usual blockchain guarantees (and risks) apply: immutability, transparent balances, but exposure to contract bugs and key theft.

Coinbase runs the compliant infrastructure (KYT, sanctions screening, etc.) behind its managed offering.

There are no chargebacks; dispute handling must be layered at ACP/AP2 or application level.

Ecosystem traction

Coinbase and Cloudflare announced the x402 Foundation to push x402 as an open standard for internet payments, targeting both agents and human-facing APIs.

Cloudflare integrated x402 into its Agents SDK and MCP integration, so Workers and agents can offer paywalled endpoints and call x402 servers with a single wrapper.

6. Kite: agent-native L1 and state channels

Capabilities

Kite is an AI-oriented L1 chain and payment rail designed for agentic commerce. Its design centers on:

State-channel-based micropayments – agents open off-chain channels and stream tiny payments with instant finality, settling periodically on-chain (a generic sketch of this idea follows the list).

Agent-centric identity and constraints – cryptographic identity is used to bind agents and users, with protocol-level spend constraints and policy enforcement.

PoAI-oriented design – the chain is explicitly tuned for the AI-agent economy, not generic DeFi.
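The sketch below illustrates, in generic terms, why state channels suit streams of tiny agent payments: both parties update a shared balance off-chain under a hard spend cap and settle only the final state on-chain. It is not Kite's SDK or protocol; every name here is illustrative.

from dataclasses import dataclass

@dataclass
class PaymentChannel:
    deposit: float            # locked on-chain when the channel opens
    spend_cap: float          # policy limit the agent may not exceed
    spent: float = 0.0

    def pay(self, amount: float) -> None:
        # Off-chain state update: instant for the counterparty, bounded by policy.
        if self.spent + amount > min(self.spend_cap, self.deposit):
            raise ValueError("payment rejected: exceeds channel policy")
        self.spent += amount

    def settle(self) -> float:
        # On-chain settlement of the final balance; the remainder is refunded.
        return self.deposit - self.spent

channel = PaymentChannel(deposit=10.0, spend_cap=2.0)
for _ in range(20):
    channel.pay(0.05)          # a stream of tiny per-request payments
print(channel.settle())        # roughly 9.0 refunded at settlement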

Security posture

Kite inherits L1 security concerns (consensus safety, smart-contract correctness) plus state-channel specifics:

Off-chain channels must be protected against fraud (for example, outdated state publication) and key compromise.

Policy constraints are enforced at protocol level; if implemented correctly, this can significantly reduce the chance of runaway spending by agents.

Because the design is agent-specific, there is less ‘legacy baggage’ than in generalized DeFi chains, but also less battle-tested code.

Ecosystem traction

PayPal Ventures and others have publicly backed Kite as part of the agentic commerce stack.

Crypto and infra publications describe it as a complementary rail to x402, optimized for streaming, high-frequency interactions between agents.

The ecosystem is still young compared to mainstream L1s, but it is clearly positioned as an ‘AI-payments L1,’ not a general-purpose chain.

How the rails compose in real systems

A realistic agentic workflow will touch several of these rails:

Tooling and data

An IDE agent, OS agent, or backend agent connects to internal APIs, file systems, and monitoring systems via MCP servers.

Multi-agent orchestration

The primary agent delegates specialized tasks (for example, cost optimization, legal review, marketing ops) to other agents via A2A.

Commerce flow

For purchasing, the agent enters an ACP flow with a merchant: fetch catalog, configure a product, receive a priced offer, confirm checkout state.

Payment authorization

The user has previously granted an AP2 mandate to a wallet-backed payment agent, specifying limits and scope. The commerce or orchestration agent requests payment via that AP2-capable payment agent.

Settlement

Depending on the scenario, the payment agent may:

Use traditional rails (card, bank) under AP2, or

Use x402 for per-call on-chain payments to an API, or

Use Kite state channels for streaming micro-transactions between agents.

This composition preserves separation of concerns:

MCP & A2A: who talks to whom, and about what.

AP2 & ACP: how intent, consent, and liability for commerce are encoded.

x402 & Kite: how value is actually moved at low latency.

References:

Model Context Protocol – official site – https://modelcontextprotocol.io/

Anthropic: “Introducing the Model Context Protocol” – https://www.anthropic.com/news/model-context-protocol

Claude Docs: “Model Context Protocol (MCP)” – https://docs.claude.com/en/docs/mcp

OpenAI Docs: “Connectors and MCP servers” – https://platform.openai.com/docs/guides/tools-connectors-mcp

OpenAI Docs: “MCP Server Documentation” – https://platform.openai.com/docs/mcp

LangChain MCP Adapters – GitHub – https://github.com/langchain-ai/langchain-mcp-adapters

LangChain Docs: “Model Context Protocol (MCP)” – https://docs.langchain.com/oss/python/langchain/mcp

npm package: @langchain/mcp-adapters – https://www.npmjs.com/package/%40langchain/mcp-adapters

Azure AI Foundry: “Create an MCP Server with Azure AI Agent Service” – https://devblogs.microsoft.com/foundry/integrating-azure-ai-agents-mcp/

Azure AI Foundry Docs: “Connect to Model Context Protocol servers (preview)” – https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/model-context-protocol

Azure AI Foundry MCP Server – May 2025 update – https://devblogs.microsoft.com/foundry/azure-ai-foundry-mcp-server-may-2025/

Windows AI Foundry (MCP integration in Windows) – https://developer.microsoft.com/en-us/windows/ai/

The Verge: “Windows is getting support for the ‘USB-C of AI apps’” – https://www.theverge.com/news/669298/microsoft-windows-ai-foundry-mcp-support

Agent2Agent (A2A) Protocol – official specification – https://a2a-protocol.org/latest/specification/

Google Developers Blog: “Announcing the Agent2Agent Protocol (A2A)” – https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/

IBM Think: “What is A2A protocol (Agent2Agent)?” – https://www.ibm.com/think/topics/agent2agent-protocol

Amazon Bedrock: “Deploy A2A servers in AgentCore Runtime” – https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-a2a.html

Amazon Bedrock: “A2A protocol contract” – https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-a2a-protocol-contract.html

AWS News: “Amazon Bedrock AgentCore is now generally available” – https://aws.amazon.com/about-aws/whats-new/2025/10/amazon-bedrock-agentcore-available/

Google Cloud Blog: “Announcing Agent Payments Protocol (AP2)” – https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol

AP2 overview / technical details (Google / partner materials) – https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol

Coinbase x402 + AP2 launch with Google – https://www.coinbase.com/developer-platform/discover/launches/google_x402

Omni (Swedish) coverage: “Google teamar upp med betaljättar – vill låta AI-agenter shoppa åt dig” – https://omni.se/a/RzkWqO

OpenAI: “Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol” – https://openai.com/index/buy-it-in-chatgpt/

OpenAI Developer Docs: “Agentic Commerce Protocol – Get started” – https://developers.openai.com/commerce/guides/get-started/

Stripe Newsroom: “Stripe powers Instant Checkout in ChatGPT and releases the Agentic Commerce Protocol” – https://stripe.com/newsroom/news/stripe-openai-instant-checkout

TechRadar Pro: “You can now buy things through ChatGPT with a single click” – https://www.techradar.com/pro/you-can-now-buy-things-through-chatgpt-with-a-single-click-if-youre-one-of-the-lucky-ones

Reuters: “OpenAI partners with Etsy, Shopify on ChatGPT payment checkout” – https://www.reuters.com/world/americas/openai-partners-with-etsy-shopify-chatgpt-checkout-2025-09-29/

Salesforce Press Release: “Salesforce Announces Support for Agentic Commerce Protocol with Stripe and OpenAI” – https://www.salesforce.com/news/press-releases/2025/10/14/stripe-openai-agentic-commerce-protocol-announcement/

Salesforce Investor News: “Salesforce and OpenAI Partner Across Enterprise Work and Commerce” – https://investor.salesforce.com/news/news-details/2025/Salesforce-and-OpenAI-Partner-Across-Enterprise-Work-and-Commerce/default.aspx

Salesforce: Agentforce Commerce – https://www.salesforce.com/commerce/

Coinbase Developer Platform: “x402: The internet-native payment protocol” – https://www.coinbase.com/developer-platform/products/x402

Base Docs: “Building Autonomous Payment Agents with x402” – https://docs.base.org/base-app/agents/x402-agents

Cloudflare Agents Docs: “x402 · Cloudflare Agents docs” – https://developers.cloudflare.com/agents/x402/

Cloudflare Blog: “Launching the x402 Foundation with Coinbase, and support for x402 transactions” – https://blog.cloudflare.com/x402/

Cloudflare x402 tag page – https://blog.cloudflare.com/tag/x402/

Zuplo Blog: “Autonomous API & MCP Server Payments with x402” – https://zuplo.com/blog/mcp-api-payments-with-x402

Kite whitepaper: “Building Trustless Payment Infrastructure for Agentic AI” – https://gokite.ai/kite-whitepaper

Kite: “Whitepaper” – https://gokite.ai/whitepaper

Kite Docs: “Introduction & Mission” – https://docs.gokite.ai/get-started-why-kite/introduction-and-mission

PayPal Newsroom: “Kite Raises $18M in Series A Funding To Enforce Trust in the Agentic Web” – https://newsroom.paypal-corp.com/2025-09-02-Kite-Raises-18M-in-Series-A-Funding-To-Enforce-Trust-in-the-Agentic-Web

PayPal Ventures: “The state of agentic commerce and why we invested in Kite AI” – https://paypal.vc/news/news-details/2025/The-state-of-agentic-commerce-and-why-we-invested-in-Kite-AI-2025-LroAXfplpA/default.aspx

Binance Research: “Kite enables an agentic internet…” – https://www.binance.com/en-KZ/research/projects/kite

Phemex Academy: “What Is Kite (KITE)? Guide to the AI Agent Economy” – https://phemex.com/academy/what-is-kite-ai-agent-economy

Finextra: “PayPal leads funding round in agentic AI firm Kite” – https://www.finextra.com/newsarticle/46535/paypal-leads-funding-round-in-agentic-ai-firm-kite

Plug and Play Tech Center: “How Kite is Building the Infrastructure for the Agentic Internet” – https://www.plugandplaytechcenter.com/venture-capital/investment-announcements/kite-investment

PYMNTS: “PayPal Ventures-Backed Kite Nets $18M for Agentic AI” – https://www.pymnts.com/news/investment-tracker/2025/paypal-backed-kite-raises-18-million-for-agentic-web/

GlobeNewswire: “Kite announces investment from Coinbase Ventures…” – https://www.globenewswire.com/news-release/2025/10/27/3174837/0/en/Kite-announces-investment-from-Coinbase-Ventures-to-Advance-Agentic-Payments-with-the-x402-Protocol.html

Keycard – official site – https://www.keycard.ai/

Keycard: product page (alternate URL) – https://www.keycard.sh/

Help Net Security: “Keycard emerges from stealth with identity and access platform for AI agents” – https://www.helpnetsecurity.com/2025/10/22/keycard-ai-agents-identity-access-platform/

GlobeNewswire: “Keycard Launches to Solve the AI Agent Identity and Access Problem…” – https://www.globenewswire.com/news-release/2025/10/21/3170297/0/en/Keycard-Launches-to-Solve-the-AI-Agent-Identity-and-Access-Problem-With-38-Million-in-Funding-From-Andreessen-Horowitz-Boldstart-Ventures-and-Acrew-Capital.html

The post Comparing the Top 6 Agent-Native Rails for the Agentic Internet: MCP, A2A, AP2, ACP, x402, and Kite appeared first on MarkTechPost.

Build a biomedical research agent with Biomni tools and Amazon Bedrock AgentCore

This post is co-authored with the Biomni group from Stanford.
Biomedical researchers spend approximately 90% of their time manually processing massive volumes of scattered information. This is evidenced by Genentech’s challenge of processing 38 million biomedical publications in PubMed, public repositories like the Human Protein Atlas, and their internal repository of hundreds of millions of cells across hundreds of diseases. There is a rapid proliferation of specialized databases and analytical tools across different modalities including genomics, proteomics, and pathology. Researchers must stay current with the large landscape of tools, leaving less time for the hypothesis-driven work that drives breakthrough discoveries.
AI agents powered by foundation models offer a promising solution by autonomously planning, executing, and adapting complex research tasks. Stanford researchers built Biomni that exemplifies this potential. Biomni is a general-purpose biomedical AI agent that integrates 150 specialized tools, 105 software packages, and 59 databases to execute sophisticated analyses such as gene prioritization, drug repurposing, and rare disease diagnosis.
However, deploying such agents in production requires robust infrastructure capable of handling computationally intensive workflows and multiple concurrent users while maintaining security and performance standards. Amazon Bedrock AgentCore is a set of comprehensive services to deploy and operate highly capable agents using any framework or model, with enterprise-grade security and scalability.
In this post, we show you how to implement a research agent using AgentCore with access to over 30 specialized biomedical database tools from Biomni, thereby accelerating scientific discovery while maintaining enterprise-grade security and production scale. The code for this solution is available in the open-source toolkit repository of starter agents for life sciences on Amazon Web Services (AWS). The step-by-step instructions help you deploy your own tools and infrastructure, along with AgentCore components and examples.
Prototype-to-production complexity gap
Moving from a local biomedical research prototype to a production system accessible by multiple research teams requires addressing complex infrastructure challenges.
Agent deployment with enterprise security
Enterprise security challenges include OAuth-based authentication, secure tool sharing through scalable gateways, comprehensive observability for research audit trails, and automatic scaling to handle concurrent research workloads. Many promising prototypes fail to reach production because of the complexity of implementing these enterprise-grade requirements while maintaining the specialized domain expertise needed for accurate biomedical analysis.
Session-aware research context management
Biomedical research workflows often span multiple conversations and require persistent memory of previous analyses, experimental parameters, and research preferences across extended research sessions. Research agents must maintain contextual awareness of ongoing projects, remember specific protein targets, experimental conditions, and analytical preferences. All that must be done while facilitating proper session isolation between different researchers and research projects in a multi-tenant production environment.
Scalable tool gateway
Implementing a reusable tool gateway that can handle concurrent requests from research agents while providing proper authentication and consistent performance becomes critical at scale. The gateway must enable agents to discover and use tools through secure endpoints, help agents find the right tools through contextual search capabilities, and manage both inbound authentication (verifying agent identity) and outbound authentication (connecting to external biomedical databases) in a unified service. Without this architecture, research teams face authentication complexity and reliability issues that prevent effective scaling.
Solution overview
We use Strands Agents, an open source agent framework, to build a research agent with a local tool implementation for PubMed biomedical literature search. We extend the agent's capabilities by integrating Biomni database tools, providing access to over 30 specialized biomedical databases.
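As a rough sketch of what such a local PubMed tool can look like, the function below queries the public NCBI E-utilities search endpoint. The Strands Agents import path and @tool registration are assumptions, so check the SDK documentation for the exact interface.

import requests
from strands import Agent, tool   # assumed import path for the Strands Agents SDK

@tool
def search_pubmed(query: str, max_results: int = 5) -> list:
    """Return PubMed IDs (PMIDs) matching a free-text biomedical query."""
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

# The research agent combines this local tool with the Biomni tools from the gateway.
agent = Agent(tools=[search_pubmed])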
The overall architecture is shown in the following diagram.

The AgentCore Gateway service centralizes Biomni database tools as more secure, reusable endpoints with semantic search capabilities. AgentCore Memory service maintains contextual awareness across research sessions using specialized strategies for research context. Security is handled by AgentCore Identity service, which manages authentication for both users and tool access control. Deployment is streamlined with the AgentCore Runtime service, providing scalable, managed deployment with session isolation. Finally, the AgentCore Observability service enables comprehensive monitoring and auditing of research workflows that are critical for scientific reproducibility.
Step 1 – Creating tools such as the Biomni database tools using AgentCore Gateway
In real-world use cases, we need to connect agents to different data sources. Each agent might duplicate the same tools, leading to extensive code, inconsistent behavior, and maintenance nightmares. AgentCore Gateway service streamlines this process by centralizing tools into reusable, secure endpoints that agents can access. Combined with the AgentCore Identity service for authentication, AgentCore Gateway creates an enterprise-grade tool sharing infrastructure. To give more context to the agent with reusable tools, we provided access to over 30 specialized public database APIs through the Biomni tools registered on the gateway. The gateway exposes Biomni’s database tools through the Model Context Protocol (MCP), allowing the research agent to discover and invoke these tools alongside local tools like PubMed. It handles authentication, rate limiting, and error handling, providing a seamless research experience.

def create_gateway(gateway_name: str, api_spec: list) -> dict:
    # JWT authentication with Cognito
    auth_config = {
        "customJWTAuthorizer": {
            "allowedClients": [
                get_ssm_parameter("/app/researchapp/agentcore/machine_client_id")
            ],
            "discoveryUrl": get_ssm_parameter("/app/researchapp/agentcore/cognito_discovery_url"),
        }
    }

    # Enable semantic search for Biomni tools
    search_config = {"mcp": {"searchType": "SEMANTIC"}}

    # Create the gateway
    gateway = bedrock_agent_client.create_gateway(
        name=gateway_name,
        roleArn=execution_role_arn,
        protocolType="MCP",
        authorizerType="CUSTOM_JWT",
        authorizerConfiguration=auth_config,
        protocolConfiguration=search_config,
        description="My App Template AgentCore Gateway",
    )
    return gateway

We use an AWS Lambda function to host the Biomni integration code. The Lambda function is automatically configured as an MCP target in the AgentCore Gateway and exposes its available tools through the API specification (api_spec.json).
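As a rough illustration (not the actual file from the repository), a single entry in api_spec.json might look like the following, assuming the inline payload follows the gateway's tool schema of a name, a description, and a JSON Schema inputSchema:

# Hypothetical api_spec entry for one Biomni database tool (shape is illustrative)
api_spec = [
    {
        "name": "query_clinvar",
        "description": "Query NCBI ClinVar for clinically relevant genetic variants.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "gene": {"type": "string", "description": "Gene symbol, for example BRCA1"},
                "max_results": {"type": "integer", "description": "Maximum number of records to return"},
            },
            "required": ["gene"],
        },
    },
    # ...one entry per Biomni tool exposed by the Lambda function
]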
# Gateway Target Configuration
lambda_target_config = {
    "mcp": {
        "lambda": {
            "lambdaArn": get_ssm_parameter("/app/researchapp/agentcore/lambda_arn"),
            "toolSchema": {"inlinePayload": api_spec},
        }
    }
}

# Create the target
create_target_response = gateway_client.create_gateway_target(
    gatewayIdentifier=gateway_id,
    name="LambdaUsingSDK",
    description="Lambda Target using SDK",
    targetConfiguration=lambda_target_config,
    credentialProviderConfigurations=[{
        "credentialProviderType": "GATEWAY_IAM_ROLE"
    }],
)
The full list of Biomni database tools included on the gateway is shown in the following table:

| Group | Tool | Description |
| --- | --- | --- |
| Protein and structure databases | UniProt | Query the UniProt REST API for comprehensive protein sequence and functional information |
| | AlphaFold | Query the AlphaFold Database API for AI-predicted protein structure predictions |
| | InterPro | Query the InterPro REST API for protein domains, families, and functional sites |
| | PDB (Protein Data Bank) | Query the RCSB PDB database for experimentally determined protein structures |
| | STRING | Query the STRING protein interaction database for protein-protein interaction networks |
| | EMDB (Electron Microscopy Data Bank) | Query for 3D macromolecular structures determined by electron microscopy |
| Genomics and variants | ClinVar | Query NCBI’s ClinVar database for clinically relevant genetic variants and their interpretations |
| | dbSNP | Query the NCBI dbSNP database for single nucleotide polymorphisms and genetic variations |
| | gnomAD | Query gnomAD for population-scale genetic variant frequencies and annotations |
| | Ensembl | Query the Ensembl REST API for genome annotations, gene information, and comparative genomics |
| | UCSC Genome Browser | Query the UCSC Genome Browser API for genomic data and annotations |
| Expression and omics | GEO (Gene Expression Omnibus) | Query NCBI’s GEO for RNA-seq, microarray, and other gene expression datasets |
| | PRIDE | Query the PRIDE database for proteomics identifications and mass spectrometry data |
| | Reactome | Query the Reactome database for biological pathways and molecular interactions |
| Clinical and drug data | cBioPortal | Query the cBioPortal REST API for cancer genomics data and clinical information |
| | ClinicalTrials.gov | Query ClinicalTrials.gov API for information about clinical studies and trials |
| | OpenFDA | Query the OpenFDA API for FDA drug, device, and food safety data |
| | GtoPdb (Guide to PHARMACOLOGY) | Query the Guide to PHARMACOLOGY database for drug targets and pharmacological data |
| Disease and phenotype | OpenTargets | Query the OpenTargets Platform API for disease-target associations and drug discovery data |
| | Monarch Initiative | Query the Monarch Initiative API for phenotype and disease information across species |
| | GWAS Catalog | Query the GWAS Catalog API for genome-wide association study results |
| | RegulomeDB | Query the RegulomeDB database for regulatory variant annotations and functional predictions |
| Specialized databases | JASPAR | Query the JASPAR REST API for transcription factor binding site profiles and motifs |
| | WoRMS (World Register of Marine Species) | Query the WoRMS REST API for marine species taxonomic information |
| | Paleobiology Database (PBDB) | Query the PBDB API for fossil occurrence and taxonomic data |
| | MPD (Mouse Phenome Database) | Query the Mouse Phenome Database for mouse strain phenotype data |
| | Synapse | Query Synapse REST API for biomedical datasets and collaborative research data |

The following are examples of how individual tools get triggered through the MCP from our test suite:

# Protein and Structure Analysis
"Use uniprot tool to find information about human insulin protein"
# → Triggers uniprot MCP tool with protein query parameters
"Use alphafold tool for structure predictions for uniprot_id P01308"
# → Triggers alphafold MCP tool for 3D structure prediction
"Use pdb tool to find protein structures for insulin"
# → Triggers pdb MCP tool for crystallographic structures

# Genetic Variation Analysis
"Use clinvar tool to find pathogenic variants in BRCA1 gene"
# → Triggers clinvar MCP tool with gene variant parameters
"Use gnomad tool to find population frequencies for BRCA2 variants"
# → Triggers gnomad MCP tool for population genetics data

As the tool collection grows, the agent can use built-in semantic search capabilities to discover and select tools based on the task context. This improves agent performance and reduces development complexity at scale. For example, when the user asks, “tell me about HER2 variant rs1136201,” the gateway does not list all 30 or more tools back to the agent; instead, semantic search returns the n most relevant tools, such as Ensembl, GWAS Catalog, ClinVar, and dbSNP. The agent then passes this smaller subset of tools to the model, producing a more efficient and faster response.
The following graphic illustrates using AgentCore Gateway for tool search.

You can now test your deployed AgentCore gateway using the following test scripts and compare how semantic search narrows down the list of relevant tools based on the search query.
uv run tests/test_gateway.py --prompt "What tools are available?"
uv run tests/test_gateway.py --prompt "Find information about human insulin protein" --use-search
Step 2 – Strands research agent with a local tool
The following code snippet shows model initialization and the PubMed local tool, which is declared using the Strands @tool decorator. We implemented the PubMed tool in research_tools.py; it calls the PubMed APIs to enable biomedical literature search within the agent’s execution context.

PubMed Tool Creation

from agent.agent_config.tools.PubMed import PubMed

@tool(
    name="Query_pubmed",
    description=(
        "Query PubMed for relevant biomedical literature based on the user's query. "
        "This tool searches PubMed abstracts and returns relevant studies with "
        "titles, links, and summaries."
    ),
)
def query_pubmed(query: str) -> str:
    """
    Query PubMed for relevant biomedical literature based on the user's query.

    This tool searches PubMed abstracts and returns relevant studies with
    titles, links, and summaries.

    Args:
        query: The search query for PubMed literature

    Returns:
        str: Formatted results from PubMed search
    """
    pubmed = PubMed()

    print(f"\nPubMed Query: {query}\n")
    result = pubmed.run(query)
    print(f"\nPubMed Results: {result}\n")

    return result

Create the Strands research agent with the local tool and Claude Sonnet 4 Interleaved Thinking.

class ResearchAgent:
    def __init__(
        self,
        bearer_token: str,
        memory_hook: MemoryHook = None,
        session_manager: AgentCoreMemorySessionManager = None,
        bedrock_model_id: str = "us.anthropic.claude-sonnet-4-20250514-v1:0",
        # bedrock_model_id: str = "openai.gpt-oss-120b-1:0",  # Alternative
        system_prompt: str = None,
        tools: List[callable] = None,
    ):

        self.model_id = bedrock_model_id
        # For Anthropic Sonnet 4 interleaved thinking
        self.model = BedrockModel(
            model_id=self.model_id,
            additional_request_fields={
                "anthropic_beta": ["interleaved-thinking-2025-05-14"],
                "thinking": {"type": "enabled", "budget_tokens": 8000},
            },
        )

        self.system_prompt = (
            system_prompt
            if system_prompt
            else """
You are a Comprehensive Biomedical Research Agent specialized in conducting
systematic literature reviews and multi-database analyses to answer complex biomedical research
questions. Your primary mission is to synthesize evidence from both published literature
(PubMed) and real-time database queries to provide comprehensive, evidence-based insights for
pharmaceutical research, drug discovery, and clinical decision-making.

Your core capabilities include literature analysis and extracting data from 30+ specialized
biomedical databases through the Biomni gateway, enabling comprehensive data analysis. The
database tool categories include genomics and genetics, protein structure and function, pathways
and systems biology, clinical and pharmacological data, expression and omics data, and other
specialized databases.
"""
        )

In addition, we implemented citations through a structured system prompt that enforces numbered in-text citations [1], [2], [3] with standardized reference formats for both academic literature and database queries, making sure every data source is properly attributed. This allows researchers to quickly access and reference the scientific literature that supports their biomedical research queries and findings.

“””
<citation_requirements>
– ALWAYS use numbered in-text citations [1], [2], [3], etc. when referencing any data source
– Provide a numbered “References” section at the end with full source details
– For academic literature: format as “1. Author et al. Title. Journal. Year. ID: [PMID/DOI], available at: [URL]”
– For database sources: format as “1. Database Name (Tool: tool_name), Query: [query_description], Retrieved: [current_date]”
– Use numbered in-text citations throughout your response to support all claims and data points
– Each tool query and each literature source must be cited with its own unique reference number
– When tools return academic papers, cite them using the academic format with full bibliographic details
– Structure: Format each reference on a separate line with proper numbering – NO bullet points
– Present the References section as a clean numbered list, not a confusing paragraph
– Maintain sequential numbering across all reference types in a single “References” section
</citation_requirements>
“””

You can now test your agent locally:
uv run tests/test_agent_locally.py --prompt "Find information about human insulin protein"
uv run tests/test_agent_locally.py --prompt "Find information about human insulin protein" --use-search
Step 3 – Add Persistent Memory for contextual research assistance
The research agent implements the AgentCore Memory service with three strategies: semantic for factual research context, user_preference for research methodologies, and summary for session continuity. The AgentCore Memory session manager is integrated with Strands session management; it retrieves relevant context before queries and saves interactions after responses. This enables the agent to remember research preferences, ongoing projects, and domain expertise across sessions without manual context re-establishment.
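To illustrate the pattern, the following is a minimal, hypothetical sketch of a memory hook that retrieves context before a query and saves the interaction afterward. The class name, method names, and the memory client calls shown here are assumptions for illustration; the repository wires the equivalent logic into the Strands session manager and the AgentCore Memory APIs.

# Hypothetical memory hook; the real MemoryHook in the repository may differ.
class ResearchMemoryHook:
    def __init__(self, memory_client, memory_id: str, actor_id: str, session_id: str):
        self.client = memory_client      # assumed AgentCore Memory client
        self.memory_id = memory_id
        self.actor_id = actor_id         # researcher identity, keeps projects isolated
        self.session_id = session_id

    def before_query(self, user_query: str) -> str:
        # Pull semantic facts, preferences, and summaries relevant to the incoming query.
        records = self.client.retrieve_memories(
            memory_id=self.memory_id,
            namespace=f"/research/{self.actor_id}",
            query=user_query,
        )
        return "\n".join(r["content"]["text"] for r in records)

    def after_response(self, user_query: str, agent_response: str) -> None:
        # Persist the turn so later sessions can recall targets, conditions, and preferences.
        self.client.create_event(
            memory_id=self.memory_id,
            actor_id=self.actor_id,
            session_id=self.session_id,
            messages=[(user_query, "USER"), (agent_response, "ASSISTANT")],
        )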
# Test memory functionality with research conversations
python tests/test_memory.py load-conversation
python tests/test_memory.py load-prompt "My preferred response format is detailed explanations"
Step 4 – Deploy with AgentCore Runtime
To deploy our agent, we use AgentCore Runtime to configure and launch the research agent as a managed service. The deployment process configures the runtime with the agent’s main entrypoint (agent/main.py), assigns an IAM execution role for AWS service access, and supports both OAuth and IAM authentication modes. After deployment, the runtime becomes a scalable, serverless agent that can be invoked using API calls. The agent automatically handles session management, memory persistence, and tool orchestration while providing secure access to the Biomni gateway and local research tools.
agentcore configure --entrypoint agent/main.py -er arn:aws:iam::<Account-Id>:role/<Role> --name researchapp<AgentName>
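After configuration, a typical next step is to launch the runtime and invoke it with a test payload. The following commands are a sketch based on the AgentCore starter toolkit CLI; the exact flags and payload shape may differ in your version:

agentcore launch
agentcore invoke '{"prompt": "Use the uniprot tool to find information about human insulin protein"}'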
For more information about deploying with AgentCore Runtime, see Get started with AgentCore Runtime in the Amazon Bedrock AgentCore Developer Guide.
Agents in action 
The following are three representative research scenarios that showcase the agent’s capabilities across different domains: drug mechanism analysis, genetic variant investigation, and pathway exploration. For each query, the agent autonomously determines which combination of tools to use, formulates appropriate sub-queries, analyzes the returned data, and synthesizes a comprehensive research report with proper citations. The accompanying demo video shows the complete agent workflow, including tool selection, reasoning, and response generation.

Conduct a comprehensive analysis of trastuzumab (Herceptin) mechanism of action and resistance mechanisms. You’ll need:

HER2 protein structure and binding sites
Downstream signaling pathways affected
Known resistance mechanisms from clinical data
Current clinical trials investigating combination therapies
Biomarkers for treatment response prediction
Query relevant databases to provide a comprehensive research report.

Analyze the clinical significance of BRCA1 variants in breast cancer risk and treatment response. Investigate:

Population frequencies of pathogenic BRCA1 variants
Clinical significance and pathogenicity classifications
Associated cancer risks and penetrance estimates
Treatment implications (PARP inhibitors, platinum agents)
Current clinical trials for BRCA1-positive patients
Use multiple databases to provide comprehensive evidence.

The following video is a demonstration of a biomedical research agent:

Scalability and observability
One of the most critical challenges in deploying sophisticated AI agents is making sure they scale reliably while maintaining comprehensive visibility into their operations. Biomedical research workflows are inherently unpredictable—a single genomic analysis might process thousands of files, while a literature review could span millions of publications. Traditional infrastructure struggles with these dynamic workloads, particularly when handling sensitive research data that requires strict isolation between different research projects.
In this deployment, we use Amazon Bedrock AgentCore Observability to visualize each step in the agent workflow. You can use this service to inspect an agent’s execution path, audit intermediate outputs, and debug performance bottlenecks and failures. For biomedical research, this level of transparency is not just helpful—it’s essential for regulatory compliance and scientific reproducibility.
Sessions, traces, and spans form a three-tiered hierarchical relationship in the observability framework. A session contains multiple traces, with each trace representing a discrete interaction within the broader context of the session. Each trace contains multiple spans that capture fine-grained operations. The following screenshot shows the usage of one agent: number of sessions, token usage, and error rate in production.

The following screenshot shows the agents in production and their usage (number of sessions, number of invocations).

The built-in dashboards show performance bottlenecks and identify why certain interactions might fail, enabling continuous improvement and reducing the mean time to detect (MTTD) and mean time to repair (MTTR). For biomedical applications where failed analyses can delay critical research timelines, this rapid issue resolution capability makes sure that research momentum is maintained.
Future direction
While this implementation focuses on only a subset of tools, the AgentCore Gateway architecture is designed for extensibility. Research teams can seamlessly add new tools, without requiring code changes, by using MCP. Newly registered tools are automatically discoverable by agents, allowing your research infrastructure to evolve alongside rapidly changing tool sets.
For computational analysis that requires code execution, the AgentCore Code Interpreter service can be integrated into the research workflow. With AgentCore Code Interpreter the research agent can retrieve data and execute Python-based analysis using domain-specific libraries like BioPython, scikit-learn, or custom genomics packages.
Future extensions could support multiple research agents to collaborate on complex projects, with specialized agents for literature review, experimental design, data analysis, and result interpretation working together through multi-agent collaboration. Organizations can also develop specialized research agents tailored to specific therapeutic areas, disease domains, or research methodologies that share the same enterprise infrastructure and tool gateway.
Looking ahead with Biomni
“Biomni today is already useful for academic research and open exploration. But to enable real discovery—like advancing drug development—we need to move beyond prototypes and make the system enterprise-ready. Embedding Biomni into the workflows of biotech and pharma is essential to turn research potential into tangible impact.
That’s why we are excited to integrate the open-source environment with Amazon Bedrock AgentCore, bridging the gap from research to production. Looking ahead, we’re also excited about extending these capabilities with the Biomni A1 agent architecture and the Biomni-R0 model, which will unlock even more sophisticated biomedical reasoning and analysis. At the same time, Biomni will remain a thriving open-source environment, where researchers and industry teams alike can contribute tools, share workflows, and push the frontier of biomedical AI together with AgentCore.”
Conclusion
This implementation demonstrates how organizations can use Amazon Bedrock AgentCore to transform biomedical research prototypes into production-ready systems. By integrating Biomni’s comprehensive collection of over 150 specialized tools through the AgentCore Gateway service, we illustrate how teams can create enterprise-grade tool sharing infrastructure that scales across multiple research domains. By combining Biomni’s biomedical tools with the enterprise infrastructure of Amazon Bedrock AgentCore, organizations can build research agents that maintain scientific rigor while meeting production requirements for security, scalability, and observability. Biomni’s diverse tool collection—spanning genomics, proteomics, and clinical databases—exemplifies how specialized research capabilities can be centralized and shared across research teams through a secure gateway architecture.
To begin building your own biomedical research agent with Biomni tools, explore the implementation by visiting our GitHub repository for the complete code and documentation. You can follow the step-by-step implementation guide to set up your research agent with local tools, gateway integration, and Bedrock AgentCore deployment. As your needs evolve, you can extend the system with your organization’s proprietary databases and analytical tools. We encourage you to join the growing environment of life sciences AI agents and tools by sharing your extensions and improvements.

About the authors
Hasan Poonawala is a Senior AI/ML Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Hasan helps design, deploy and scale Generative AI and Machine learning applications on AWS. He has over 15 years of combined work experience in machine learning, software development and data science on the cloud. In his spare time, Hasan loves to explore nature and spend time with friends and family.
Pierre de Malliard is a Senior AI/ML Solutions Architect at Amazon Web Services and supports customers in the Healthcare and Life Sciences Industry. He is currently based in New York City.
Necibe Ahat is a Senior AI/ML Specialist Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Necibe helps customers to advance their generative AI and machine learning journey. She has a background in computer science with 15 years of industry experience helping customers ideate, design, build and deploy solutions at scale. She is a passionate inclusion and diversity advocate.
Kexin Huang is a final-year PhD student in Computer Science at Stanford University, advised by Prof. Jure Leskovec. His research applies AI to enable interpretable and deployable biomedical discoveries, addressing core challenges in multi-modal modeling, uncertainty, and reasoning. His work has appeared in Nature Medicine, Nature Biotechnology, Nature Chemical Biology, Nature Biomedical Engineering and top ML venues (NeurIPS, ICML, ICLR), earning six best paper awards. His research has been highlighted by Forbes, WIRED, and MIT Technology Review, and he has contributed to AI research at Genentech, GSK, Pfizer, IQVIA, Flatiron Health, Dana-Farber, and Rockefeller University.

Make your web apps hands-free with Amazon Nova Sonic

Graphical user interfaces have carried the torch for decades, but today’s users increasingly expect to talk to their applications. Amazon Nova Sonic is a state-of-the-art foundation model from Amazon Bedrock that helps enable this shift by providing natural, low-latency, bidirectional speech conversations over a simple streaming API. Users can collaborate with the applications through voice and embedded intelligence rather than merely operating them.
In this post, we show how we added a true voice-first experience to a reference application—the Smart Todo App—turning routine task management into a fluid, hands-free conversation.
Rethinking user interaction through collaborative AI voice agents
Important usability enhancements are often deprioritized—not because they aren’t valuable, but because they’re difficult to implement within traditional mouse-and-keyboard interfaces. Features like intelligent batch actions, personalized workflows, or voice-guided assistance are frequently debated but deferred due to UI complexity. This is about voice as an additional, general-purpose interaction mode—not a replacement for device-specific controls or an accessibility-only solution. Voice enables new interaction patterns, and it also benefits users of assistive technologies, such as screen readers, by offering an additional, inclusive way to interact with the application.
Amazon Nova Sonic goes far beyond one-shot voice commands. The model can plan multistep workflows, call backend tools, and keep context across turns so that your application can collaborate with the users.
The following table shows voice interactions from different application domains, like task management, CRM, and help desk.

| Voice interaction (example phrase) | Intent / goal | System action / behavior | Confirmation / UX |
| --- | --- | --- | --- |
| Mark all my tasks as complete. | Bulk-complete tasks | Find user’s open tasks → mark complete → archive if configured | All 12 open tasks are marked complete. |
| Create a plan for preparing the Q3 budget: break it into steps, assign owners, and set deadlines. | Create multistep workflow | Generate plan → create tasks → assign owners → set deadlines → surface review options | Plan created with 6 tasks. Notify owners? |
| Find enterprise leads in APAC with ARR over $1M and draft personalized outreach. | Build targeted prospect list and draft outreach | Query CRM → assemble filtered list → draft personalized messages for review | Drafted 24 personalized outreach messages. Review and send? |
| Prioritize all P1 tickets opened in the last 24 hours and assign them to on-call. | Triage and assign | Filter tickets → set priority → assign to on-call → log changes | 12 P1 tickets prioritized and assigned to the on-call team. |

Amazon Nova Sonic understands the intent, invokes the required APIs, and confirms the results—no forms required. This helps create an environment where productivity is multiplied and context becomes the interface. It’s not about replacing traditional UI; it’s about unlocking new capabilities through voice.
The sample application at a glance
With the Smart Todo reference application, users can create to-do lists and manage notes within those lists. The application offers a focused yet flexible interface for task tracking and note organization. With the addition of voice, the application becomes a hands-free experience that unlocks more natural and productive interactions. In Smart Todo App, users can say:

“Add a note to follow up on the project charter.”
“Archive all completed tasks.”

Behind each command are focused actions—like creating a new note, organizing content, or updating task status—executed through speech in a way that feels natural and efficient.
How Amazon Nova Sonic bidirectional APIs work
Amazon Nova Sonic implements a real-time, bidirectional streaming architecture. After a session is initiated with InvokeModelWithBidirectionalStream, audio input and model responses flow simultaneously over an open stream:

Session Start – Client sends a sessionStart event with model configuration (for example, temperature and topP).
Prompt and Content Start – Client sends structured events indicating whether upcoming data is audio, text, or tool input.
Audio Streaming – Microphone audio is streamed as base64-encoded audio input events.
Model Responses – As the model processes input, it streams the following responses asynchronously:

Automatic speech recognition (ASR) results
Tool use invocations
Text responses
Audio output for playback

Session Close – Conversations are explicitly closed by sending contentEnd, promptEnd, and sessionEnd events.

Nova Sonic Architecture Diagram

You can use this event-driven approach to interrupt the assistant (barge-in), enable multi-turn conversations, and support real-time adaptability.
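To make the event flow concrete, the following Python sketch shows the general shape of the JSON events described above. The field names are approximations drawn from this section rather than the authoritative schema; refer to the Amazon Nova Sonic documentation for the exact event structures.

import base64, json

# Approximate shape of events sent over InvokeModelWithBidirectionalStream (illustrative only)
session_start = {"event": {"sessionStart": {"inferenceConfiguration": {"temperature": 0.7, "topP": 0.9}}}}

# Structured events announce whether the upcoming content is audio, text, or tool input
content_start = {"event": {"contentStart": {"type": "AUDIO"}}}

def audio_chunk_event(pcm_bytes: bytes) -> dict:
    # Microphone audio is streamed as base64-encoded audio input events
    return {"event": {"audioInput": {"content": base64.b64encode(pcm_bytes).decode("utf-8")}}}

# Conversations are closed explicitly with contentEnd, promptEnd, and sessionEnd
closing_events = [{"event": {"contentEnd": {}}}, {"event": {"promptEnd": {}}}, {"event": {"sessionEnd": {}}}]

# Each event is serialized to JSON before being written to the open bidirectional stream
print(json.dumps(session_start))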
Solution architecture
For this solution, we use a serverless application architecture pattern, where the UI is a React single-page application integrated with backend web APIs running on server-side containers. The Smart Todo App is deployed using a scalable and security-aware AWS architecture that’s designed to support real-time voice interactions. The following image provides an architecture overview of the AWS services working together to support the bidirectional streaming needs of a voice-enabled application.

Key AWS services include:

Amazon Bedrock – Powers real-time, bidirectional speech interactions through the Amazon Nova Sonic foundation model.
Amazon CloudFront – A content delivery network (CDN) that distributes the application globally with low latency. It routes /(root) traffic to the React application hosted on an Amazon S3 bucket and /api and /novasonic traffic to the Application Load Balancer.
AWS Fargate for Amazon Elastic Container Service (Amazon ECS) – Runs the backend containerized services for WebSocket handling and REST APIs capable of supporting long-lived bidirectional streams.
Application Load Balancer (ALB) – Forwards web traffic /api (HTTPS REST API calls) to backend ECS services, handling Smart Todo App APIs, and /novasonic (WebSocket connections) to ECS services managing real-time voice streaming with Amazon Nova Sonic.
Amazon Virtual Private Cloud (Amazon VPC) – Provides network isolation and security for backend services. The Public Subnets host the Application Load Balancer (ALB) and Private Subnets host ECS Fargate tasks running WebSocket and REST APIs.
NAT Gateway – Allows Amazon ECS tasks in private subnets to more securely connect to the internet for operations like Cognito JWT token verification endpoints.
Amazon Simple Storage Service (Amazon S3) – Hosts the React frontend for user interactions.
AWS WAF – Helps protect the Application Load Balancer (ALB) from malicious traffic and enforces security rules at the application layer.
Amazon Cognito – Manages authentication and issues tokens.
Amazon DynamoDB – Stores application data such as to-do lists and notes.

The following image illustrates how the user requests are served with support for low-latency bidirectional streaming.

Request Workflow

Deploying the solution
To evaluate this solution, we provide sample code for a Smart Todo App, available in a GitHub repository.
The Smart Todo App consists of multiple independent Node.js projects, including a CDK infrastructure project, a React frontend application, and backend API services. The deployment workflow makes sure that the components are correctly built and integrated with AWS services like Amazon Cognito, Amazon DynamoDB, and Amazon Bedrock.
Prerequisites

AWS account with appropriate permissions that facilitate security best practices, including least-privilege permissions.
Docker Engine installed and running locally to build the container image.
AWS CLI configured with AWS admin credentials.
Node.js >= 20.x and npm installed.
Amazon Nova Sonic enabled in Amazon Bedrock. For more information, see Add or remove access to Amazon Bedrock foundation models.

Deployment steps

Clone the following repository:

git clone https://github.com/aws-samples/sample-amazon-q-developer-vibe-coded-projects.git
cd NovaSonicVoiceAssistant

For first-time deployment, use the following automated script:

npm run deploy:first-time

This script will:

Install the dependencies using npm (node package manager)
Build the components and container image using locally installed docker engine
Deploy the infrastructure using CDK (CDK Bootstrap ==> CDK Synth ==> CDK Deploy); a manual equivalent is sketched after this list
Update environment variables with Amazon Cognito settings
Rebuild the UI with updated environment variables
Deploy the final infrastructure (CDK Deploy)
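If you prefer to run the infrastructure steps yourself instead of using the script, they roughly correspond to the standard CDK workflow from the infra folder (a sketch; your profile and context flags may vary):

cd infra
npx cdk bootstrap   # one-time per account and Region
npx cdk synth       # synthesize the CloudFormation templates
npx cdk deploy      # deploy the stacks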

Verifying deployment
After deployment is successful, complete the following steps:

Access the Amazon CloudFront URL provided in the CDK outputs. Note: The URL shown in the image is for reference only; every deployment gets a unique URL.

Successful deployment screen shot

Create a new user by signing up using the Create Account section.

Create User and Log in

Test the voice functionality to verify the integration with Amazon Nova Sonic. The following image illustrates a conversation between the signed-in user and the Amazon Bedrock agent. The AI agent is able to invoke existing APIs, and the UI is updated in real time to reflect the agent’s actions.

Granting Microphone access to the application

Voice interaction in Smart Todo App

Clean up
You can remove the stacks with the following command.

# move to the infra folder, assuming you are in the project’s root folder
cd infra
# Removes the AWS stack
npm run destroy

Next steps
Voice isn’t just an accessibility add-on—it’s becoming the primary interface for complex workflows. Turns out talking is faster than selecting—especially when your app talks back.
Try these resources to get started.

Sample Code repo – A working Amazon Nova Sonic integration you can run locally. See how real-time voice interactions, intent handling, and multistep flows are implemented end to end.
Amazon Nova Sonic hands-on workshop – A guided lab that walks you through deploying Amazon Nova Sonic in your AWS account and testing voice-native features.
Amazon Nova Sonic docs – Provides API reference, streaming examples, and best practices to help you design and deploy voice-driven workflows.
Contact your AWS account team to learn more about how AI-driven solutions can transform your operations.

About the authors
Manu Mishra is a Senior Solutions Architect at AWS, specializing in artificial intelligence, data and analytics, and security. His expertise spans strategic oversight and hands-on technical leadership, where he reviews and guides the work of both internal and external customers. Manu collaborates with AWS customers to shape technical strategies that drive impactful business outcomes, providing alignment between technology and organizational goals.
AK Soni is a Senior Technical Account Manager with AWS Enterprise Support, where he empowers enterprise customers to achieve their business goals by offering proactive guidance on implementing innovative cloud and AI/ML-based solutions aligned with industry best practices. With over 19 years of experience in enterprise application architecture and development, he uses his expertise in generative AI technologies to enhance business operations and overcome existing technological limitations.
Raj Bagwe is a Senior Solutions Architect at Amazon Web Services, based in San Francisco, California. With over 6 years at AWS, he helps customers navigate complex technological challenges and specializes in Cloud Architecture, Security and Migrations. In his spare time, he coaches a robotics team and plays volleyball. He can be reached at X handle @rajesh_bagwe.

Harnessing the power of generative AI: Druva’s multi-agent copilot f …

This post is co-written with David Gildea and Tom Nijs from Druva.
Generative AI is transforming the way businesses interact with their customers and revolutionizing conversational interfaces for complex IT operations. Druva, a leading provider of data security solutions, is at the forefront of this transformation. In collaboration with Amazon Web Services (AWS), Druva is developing a cutting-edge generative AI-powered multi-agent copilot that aims to redefine the customer experience in data security and cyber resilience.
Powered by Amazon Bedrock and using advanced large language models (LLMs), this innovative solution will provide Druva’s customers with an intuitive, conversational interface to access data management, security insights, and operational support across their product suite. By harnessing the power of generative AI and agentic AI, Druva aims to streamline operations, increase customer satisfaction, and enhance the overall value proposition of its data security and cyber resilience solutions.
In this post, we examine the technical architecture behind this AI-powered copilot, exploring how it processes natural language queries, maintains context across complex workflows, and delivers secure, accurate responses to streamline data protection operations.
Challenges and opportunities
Druva wants to effectively serve enterprises that are moving beyond traditional query-based AI into agentic systems, and to meet their complex data management and security needs with greater speed, simplicity, and confidence.
Comprehensive data security necessitates tracking a high volume of data and metrics to identify potential cyber threats. As threats evolve, it can be difficult for customers to stay abreast of new data anomalies to hunt for within their organization’s data, but missing any threat signals can lead to unauthorized access to sensitive information. For example, a global financial services company managing more than 500 servers across multiple regions currently spends hours manually checking logs across dozens of systems when backup fails. With an AI-powered copilot, they could simply ask, “Why did my backups fail last night?” and instantly receive an analysis showing that a specific policy update caused conflicts in their European data centers, along with a step-by-step remediation, reducing investigation time from hours to minutes. This solution not only reduces the volume of support requests and accelerates the time to resolution, but also unlocks greater operational efficiency for end users.
By reimagining how users engage with the system—from AI-powered workflows to smarter automation—Druva saw a clear opportunity to deliver a more seamless customer experience that strengthens customer satisfaction, loyalty, and long-term success.
The key opportunities for Druva in implementing a generative AI-powered multi-agent copilot include:

Simplified user experience: By providing a natural language interface, the copilot can simplify complex data protection tasks and help users access the information they need quickly.
Intelligent Troubleshooting: The copilot can leverage AI capabilities to analyze data from various sources, identify the root causes of backup failures, and provide personalized recommendations for resolution.
Streamlined Policy Management: The multi-agent copilot can guide users through the process of creating, modifying, and implementing data protection policies, reducing the potential for human errors and improving compliance.
Proactive Support: By continuously monitoring data protection environments, the copilot can proactively identify potential issues and provide guidance to help prevent failures or optimize performance.
Scalable and Efficient Operations: The AI-powered solution can handle a large volume of customer inquiries and tasks simultaneously, reducing the burden on Druva’s support team so that they can focus on more complex and strategic initiatives.

Solution overview
The proposed solution for Druva’s copilot leverages a sophisticated architecture that combines the power of Amazon Bedrock (including Amazon Bedrock Knowledge Bases), LLMs, and a dynamic API selection process to deliver an intelligent and efficient user experience. In the following diagram, we demonstrate the end-to-end architecture and various sub-components.

At the core of the system is the supervisor agent, which serves as the central coordination component of the multi-agent system. This agent is responsible for overseeing the entire conversation flow, delegating tasks to specialized sub-agents, and maintaining seamless communication between the various components.
The user interacts with the supervisor agent through a user interface, submitting natural language queries related to data protection, backup management, and troubleshooting. The supervisor agent analyzes the user’s input and routes the request to the appropriate sub-agents based on the nature of the query.
The data agent is responsible for retrieving relevant information from Druva’s systems by interacting with the GET APIs. This agent fetches data such as scheduled backup jobs, backup status, and other pertinent details to provide the user with accurate and up-to-date information.
The help agent assists users by providing guidance on best practices, step-by-step instructions, and troubleshooting tips. This agent draws upon an extensive knowledge base, which includes detailed API documentation, user manuals, and frequently asked questions, to deliver context-specific assistance to users.
When a user needs to perform critical actions, such as initiating a backup job or modifying data protection policies, the action agent comes into play. This agent interacts with the POST API endpoints to execute the necessary operations, making sure that the user’s requirements are met promptly and accurately.
To make sure that the multi-agent copilot operates with the most suitable APIs and parameters, the solution incorporates a dynamic API selection process. In the following diagram, we highlight the various AWS services used to implement dynamic API selection, with which both the data agent and the action agent are equipped. Bedrock Knowledge Bases contains comprehensive information about available APIs, their functionalities, and optimal usage patterns. Once an input query is received, we use semantic search to retrieve the top K relevant APIs. This semantic search capability enables the system to adapt to the specific context of each user request, enhancing the Copilot’s accuracy, efficiency, and scalability. Once the appropriate APIs are identified, the agent prompts the LLM to parse the top K relevant APIs and finalize the API selection along with the required parameters. This step makes sure that the copilot is fully equipped to run the user’s request effectively.
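The following is a minimal Python sketch of this dynamic API selection flow, using the Amazon Bedrock knowledge base retrieve API for the semantic search step and a Converse call for the final selection. The model ID, prompt wording, and JSON contract are illustrative assumptions rather than Druva's implementation.

import json
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

def select_api(user_query: str, knowledge_base_id: str, k: int = 5) -> dict:
    # 1) Semantic search over the API catalog stored in the knowledge base
    hits = agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": user_query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": k}},
    )["retrievalResults"]
    candidate_apis = "\n\n".join(hit["content"]["text"] for hit in hits)

    # 2) Ask the LLM to pick the best API from the top-k candidates and fill in its parameters
    prompt = (
        f"User request: {user_query}\n\nCandidate APIs:\n{candidate_apis}\n\n"
        "Return JSON with keys 'api_name' and 'parameters' for the best match."
    )
    response = bedrock_runtime.converse(
        modelId="us.amazon.nova-pro-v1:0",  # assumed model choice for this sketch
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # In practice, guard this parse and retry on malformed JSON
    return json.loads(response["output"]["message"]["content"][0]["text"])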

Finally, the selected API is invoked, and the multi-agent copilot carries out the desired action or retrieves the requested information. The user receives a clear and concise response, along with relevant recommendations or guidance, through the user interface.
Throughout the interaction, users can provide additional information or explicit approvals by using the user feedback node before the copilot performs critical actions. With this human-in-the-loop approach, the system operates with the necessary safeguards and maintains user control over sensitive operations.
Evaluation
The evaluation process for Druva’s generative AI-powered multi-agent copilot focuses on assessing the performance and effectiveness of each critical component of the system. By thoroughly testing individual components such as dynamic API selection, running isolated tests on individual agents, and validating end-to-end functionality, the team makes sure the copilot delivers accurate, reliable, and efficient results to its users.
Evaluation methodology:

Unit testing: Isolated tests are conducted for each component (individual agents, data extraction, API selection) to verify their functionality, performance, and error handling capabilities.
Integration Testing: Tests are performed to validate the seamless integration and communication between the various components of the multi-agent copilot, maintaining data flow and control flow integrity.
System Testing: End-to-end tests are executed on the complete system, simulating real-world user scenarios and workflows to assess the overall functionality, performance, and user experience.

Evaluation results
Choosing the right model for the right task is critical to the system’s performance. The dynamic tool selection represents one of the most critical parts of the system—invoking the correct API is essential for end-to-end solution success. A single incorrect API call can lead to fetching wrong data, which cascades into erroneous results throughout the multi-agent system. To optimize the dynamic tool selection component, various Nova and Anthropic models were tested and benchmarked against the ground truth created using Sonnet 3.7.
The findings showed that even smaller models like Nova Lite and Haiku 3 were able to select the correct API every time. However, these smaller models struggled with parameter parsing such as calling the API with the correct parameters relative to the input question. When parameter parsing accuracy was taken into account, the overall API selection accuracy dropped to 81% for Nova Micro, 88% for Nova Lite, and 93% for Nova Pro. The performance of Haiku 3, Haiku 3.5, and Sonnet 3.5 was comparable, ranging from 91% to 92%. Nova Pro provided an optimal tradeoff between accuracy and latency with an average response time of just over one second. In contrast, Sonnet 3.5 had a latency of eight seconds, although this could be attributed to Sonnet 3.5’s more verbose output, generating an average of 291 tokens compared to Nova Pro’s 86 tokens. The prompts could potentially be optimized to make Sonnet 3.5’s output more concise, thus reducing the latency.
For end-to-end testing of real world scenarios, it is essential to engage human subject matter expert evaluators familiar with the system to assess performance based on completeness, accuracy, and relevance of the solutions. Across 11 challenging questions during the initial development phase, the system achieved scores averaging 3.3 out of 5 across these dimensions. This represented solid performance considering the evaluation was conducted in the early stages of development, providing a strong foundation for future improvements.
By focusing on evaluating each critical component and conducting rigorous end-to-end testing, Druva has made sure that the generative AI-powered multi-agent copilot meets the highest standards of accuracy, reliability, and efficiency. The insights gained from this evaluation process have guided the continuous improvement and optimization of the copilot.

“Druva is at the forefront of leveraging advanced AI technologies to revolutionize the way organizations protect and manage their critical data. Our Generative AI-powered Multi-agent Copilot is a testament to our commitment to delivering innovative solutions that simplify complex processes and enhance customer experiences. By collaborating with the AWS Generative AI Innovation Center, we are embarking on a transformative journey to create an interactive, personalized, and efficient end-to-end experience for our customers. We are excited to harness the power of Amazon Bedrock and our proprietary data to continue reimagining the future of data security and cyber resilience.”- David Gildea, VP of Generative AI at Druva

Conclusion
Druva’s generative AI-powered multi-agent copilot showcases the immense potential of combining structured and unstructured data sources using AI to create next-generation virtual copilots. This innovative approach sets Druva apart from traditional data protection vendors by transforming hours-long manual investigations into instant, AI-powered conversational insights, with 90% of routine data protection tasks executable through natural language interactions, fundamentally redefining customer expectations in the data security space. For organizations in the data security and protection space, this technology enables more efficient operations, enhanced customer engagement, and data-driven decision-making. The insights and intelligence provided by the copilot empower Druva’s stakeholders, including customers, support teams, partners, and executives, to make informed decisions faster, reducing average time-to-resolution for data security issues by up to 70% and accelerating backup troubleshooting from hours to minutes. Although this project focuses on the data protection industry, the underlying principles and methodology can be applied across various domains. With careful design, testing, and continuous improvement, organizations in any industry can benefit from AI-powered copilots that contextualize their data, documents, and content to deliver intelligent and personalized experiences.
This implementation leverages Amazon Bedrock AgentCore Runtime and Amazon Bedrock AgentCore Gateway to provide robust agent orchestration and management capabilities. This approach has the potential to provide intelligent automation and data search capabilities through customizable agents, transforming user interactions with applications to be more natural, efficient, and effective. For those interested in implementing similar functionalities, explore Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases and Amazon Bedrock AgentCore as a fully managed AWS solution.

About the authors
David Gildea With over 25 years of experience in cloud automation and emerging technologies, David has led transformative projects in data management and cloud infrastructure. As the founder and former CEO of CloudRanger, he pioneered innovative solutions to optimize cloud operations, later leading to its acquisition by Druva. Currently, David leads the Labs team in the Office of the CTO, spearheading R&D into Generative AI initiatives across the organization, including projects like Dru Copilot, Dru Investigate, and Amazon Q. His expertise spans technical research, commercial planning, and product development, making him a prominent figure in the field of cloud technology and generative AI.
Tom Nijs is an experienced backend and AI engineer at Druva, driven by a passion for both learning and sharing knowledge. As the Lead Architect for Druva’s Labs team, he channels this passion into developing cutting-edge solutions, leading projects such as Dru Copilot, Dru Investigate, and Dru AI Labs. With a core focus on optimizing systems and harnessing the power of AI, Tom is dedicated to helping teams and developers turn groundbreaking ideas into reality.
Gauhar Bains is a Deep Learning Architect at the AWS Generative AI Innovation Center, where he designs and delivers innovative GenAI solutions for enterprise customers. With a passion for leveraging cutting-edge AI technologies, Gauhar specializes in developing agentic AI applications, and implementing responsible AI practices across diverse industries.
Ayushi Gupta is a Senior Technical Account Manager at AWS who partners with organizations to architect optimal cloud solutions. She specializes in ensuring business-critical applications operate reliably while balancing performance, security, and cost efficiency. With a passion for GenAI innovation, Ayushi helps customers leverage cloud technologies that deliver measurable business value while maintaining robust data protection and compliance standards.
Marius Moisescu is a Machine Learning Engineer at the AWS Generative AI Innovation Center. He works with customers to develop agentic applications. His interests are deep research agents and evaluation of multi agent architectures.
Ahsan Ali is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he works with customers from different industry verticals to solve their urgent and expensive problems using Generative AI.
Sandy Farr is an Applied Science Manager at the AWS Generative AI Innovation Center, where he leads a team of scientists, deep learning architects and software engineers to deliver innovative GenAI solutions for AWS customers. Sandy holds a PhD in Physics and has over a decade of experience developing AI/ML, NLP and GenAI solutions for large organizations.
Govindarajan Varadan is a Manager of the Solutions Architecture team at Amazon Web Services (AWS) based out of Silicon Valley in California. He works with AWS customers to help them achieve their business objectives through innovative applications of AI at scale.
Saeideh Shahrokh Esfahani is an Applied Scientist at the Amazon Generative AI Innovation Center, where she focuses on transforming cutting-edge AI technologies into practical solutions that address real-world challenges.

How to Build a Fully Self-Verifying Data Operations AI Agent Using Loc …

In this tutorial, we build a self-verifying DataOps AI Agent that can plan, execute, and test data operations automatically using local Hugging Face models. We design the agent with three intelligent roles: a Planner that creates an execution strategy, an Executor that writes and runs code using pandas, and a Tester that validates the results for accuracy and consistency. By using Microsoft’s Phi-2 model locally in Google Colab, we ensure that the workflow remains efficient, reproducible, and privacy-preserving while demonstrating how LLMs can automate complex data-processing tasks end-to-end. Check out the FULL CODES here.

!pip install -q transformers accelerate bitsandbytes scipy
import json, pandas as pd, numpy as np, torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig

MODEL_NAME = "microsoft/phi-2"

class LocalLLM:
    def __init__(self, model_name=MODEL_NAME, use_8bit=False):
        print(f"Loading model: {model_name}")
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        model_kwargs = {"device_map": "auto", "trust_remote_code": True}
        if use_8bit and torch.cuda.is_available():
            model_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
        else:
            model_kwargs["torch_dtype"] = torch.float32 if not torch.cuda.is_available() else torch.float16
        self.model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
        self.pipe = pipeline("text-generation", model=self.model, tokenizer=self.tokenizer,
                             max_new_tokens=512, do_sample=True, temperature=0.3, top_p=0.9,
                             pad_token_id=self.tokenizer.eos_token_id)
        print("✓ Model loaded successfully!\n")

    def generate(self, prompt, system_prompt="", temperature=0.3):
        if system_prompt:
            full_prompt = f"Instruct: {system_prompt}\n\n{prompt}\nOutput:"
        else:
            full_prompt = f"Instruct: {prompt}\nOutput:"
        output = self.pipe(full_prompt, temperature=temperature, do_sample=temperature > 0,
                           return_full_text=False, eos_token_id=self.tokenizer.eos_token_id)
        result = output[0]["generated_text"].strip()
        if "Instruct:" in result:
            result = result.split("Instruct:")[0].strip()
        return result

We install the required libraries and load the Phi-2 model locally using Hugging Face Transformers. We create a LocalLLM class that initializes the tokenizer and model, supports optional quantization, and defines a generate method to produce text outputs. We ensure that the model runs smoothly on both CPU and GPU, making it ideal for use on Colab. Check out the FULL CODES here.
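As a quick sanity check, you can exercise the wrapper directly before wiring it into the agent; this snippet simply reuses the LocalLLM class defined above:

# Quick smoke test of the local model wrapper
llm = LocalLLM()  # downloads microsoft/phi-2 on first run
print(llm.generate(
    "List two common data quality checks for a pandas DataFrame.",
    system_prompt="You are a concise data engineering assistant.",
))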

PLANNER_PROMPT = """You are a Data Operations Planner. Create a detailed execution plan as valid JSON.

Return ONLY a JSON object (no other text) with this structure:
{"steps": ["step 1","step 2"],"expected_output":"description","validation_criteria":["criteria 1","criteria 2"]}"""

EXECUTOR_PROMPT = """You are a Data Operations Executor. Write Python code using pandas.

Requirements:
- Use pandas (imported as pd) and numpy (imported as np)
- Store final result in variable 'result'
- Return ONLY Python code, no explanations or markdown"""

TESTER_PROMPT = """You are a Data Operations Tester. Verify execution results.

Return ONLY a JSON object (no other text) with this structure:
{"passed":true,"issues":["any issues found"],"recommendations":["suggestions"]}"""

class DataOpsAgent:
    def __init__(self, llm=None):
        self.llm = llm or LocalLLM()
        self.history = []

    def _extract_json(self, text):
        try:
            return json.loads(text)
        except:
            start, end = text.find('{'), text.rfind('}') + 1
            if start >= 0 and end > start:
                try:
                    return json.loads(text[start:end])
                except:
                    pass
        return None

We define the system prompts for the Planner, Executor, and Tester roles of our DataOps Agent. We then initialize the DataOpsAgent class with helper methods and a JSON extraction utility to parse structured responses. We prepare the foundation for the agent’s reasoning and execution pipeline. Check out the FULL CODES here.

    def plan(self, task, data_info):
        print("\n" + "=" * 60)
        print("PHASE 1: PLANNING")
        print("=" * 60)
        prompt = f"Task: {task}\n\nData Information:\n{data_info}\n\nCreate an execution plan as JSON with steps, expected_output, and validation_criteria."
        plan_text = self.llm.generate(prompt, PLANNER_PROMPT, temperature=0.2)
        self.history.append(("PLANNER", plan_text))
        plan = self._extract_json(plan_text) or {"steps": [task], "expected_output": "Processed data", "validation_criteria": ["Result generated", "No errors"]}
        print("\n Plan Created:")
        print(f"   Steps: {len(plan.get('steps', []))}")
        for i, step in enumerate(plan.get('steps', []), 1):
            print(f"   {i}. {step}")
        print(f"   Expected: {plan.get('expected_output', 'N/A')}")
        return plan

    def execute(self, plan, data_context):
        print("\n" + "=" * 60)
        print("PHASE 2: EXECUTION")
        print("=" * 60)
        steps_text = '\n'.join(f"{i}. {s}" for i, s in enumerate(plan.get('steps', []), 1))
        prompt = f"Task Steps:\n{steps_text}\n\nData available: DataFrame 'df'\n{data_context}\n\nWrite Python code to execute these steps. Store final result in 'result' variable."
        code = self.llm.generate(prompt, EXECUTOR_PROMPT, temperature=0.1)
        self.history.append(("EXECUTOR", code))
        # Strip markdown fences if the model wrapped its answer in them
        if "```python" in code:
            code = code.split("```python")[1].split("```")[0]
        elif "```" in code:
            code = code.split("```")[1].split("```")[0]
        # Drop pure comment lines but keep imports
        lines = []
        for line in code.split('\n'):
            s = line.strip()
            if s and (not s.startswith('#') or 'import' in s):
                lines.append(line)
        code = '\n'.join(lines).strip()
        print("\n Generated Code:\n" + "-" * 60)
        code_lines = code.split('\n')
        for i, line in enumerate(code_lines[:15], 1):
            print(f"{i:2}. {line}")
        if len(code_lines) > 15:
            print(f" ... ({len(code_lines) - 15} more lines)")
        print("-" * 60)
        return code

We implement the Planning and Execution phases of the agent. We let the Planner create detailed task steps and validation criteria, and then the Executor generates corresponding Python code based on pandas to perform the task. We visualize how the agent autonomously transitions from reasoning to generating actionable code. Check out the FULL CODES here.

    def test(self, plan, result, execution_error=None):
        print("\n" + "=" * 60)
        print("PHASE 3: TESTING & VERIFICATION")
        print("=" * 60)
        result_desc = f"EXECUTION ERROR: {execution_error}" if execution_error else f"Result type: {type(result).__name__}\n"
        if not execution_error:
            if isinstance(result, pd.DataFrame):
                result_desc += f"Shape: {result.shape}\nColumns: {list(result.columns)}\nSample:\n{result.head(3).to_string()}"
            elif isinstance(result, (int, float, str)):
                result_desc += f"Value: {result}"
            else:
                result_desc += f"Value: {str(result)[:200]}"
        criteria_text = '\n'.join(f"- {c}" for c in plan.get('validation_criteria', []))
        prompt = f"Validation Criteria:\n{criteria_text}\n\nExpected: {plan.get('expected_output', 'N/A')}\n\nActual Result:\n{result_desc}\n\nEvaluate if result meets criteria. Return JSON with passed (true/false), issues, and recommendations."
        test_result = self.llm.generate(prompt, TESTER_PROMPT, temperature=0.2)
        self.history.append(("TESTER", test_result))
        test_json = self._extract_json(test_result) or {"passed": execution_error is None, "issues": ["Could not parse test result"], "recommendations": ["Review manually"]}
        print(f"\n✓ Test Results:\n Status: {' PASSED' if test_json.get('passed') else ' FAILED'}")
        if test_json.get('issues'):
            print(" Issues:")
            for issue in test_json['issues'][:3]:
                print(f"   • {issue}")
        if test_json.get('recommendations'):
            print(" Recommendations:")
            for rec in test_json['recommendations'][:3]:
                print(f"   • {rec}")
        return test_json

    def run(self, task, df=None, data_info=None):
        print("\n SELF-VERIFYING DATA-OPS AGENT (Local HF Model)")
        print(f"Task: {task}\n")
        if data_info is None and df is not None:
            data_info = f"Shape: {df.shape}\nColumns: {list(df.columns)}\nSample:\n{df.head(2).to_string()}"
        plan = self.plan(task, data_info)
        code = self.execute(plan, data_info)
        result, error = None, None
        try:
            local_vars = {'pd': pd, 'np': np, 'df': df}
            exec(code, local_vars)
            result = local_vars.get('result')
        except Exception as e:
            error = str(e)
            print(f"\n Execution Error: {error}")
        test_result = self.test(plan, result, error)
        return {'plan': plan, 'code': code, 'result': result, 'test': test_result, 'history': self.history}

We focus on the Testing and Verification phase of our workflow. We let the agent evaluate its own output against predefined validation criteria and summarize the outcome as a structured JSON. We then integrate all three phases, planning, execution, and testing, into a single self-verifying pipeline that ensures complete automation. Check out the FULL CODES here.

def demo_basic(agent):
    print("\n" + "#" * 60)
    print("# DEMO 1: Sales Data Aggregation")
    print("#" * 60)
    df = pd.DataFrame({'product': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
                       'sales': [100, 150, 200, 80, 130, 90, 110],
                       'region': ['North', 'South', 'North', 'East', 'South', 'West', 'East']})
    task = "Calculate total sales by product"
    output = agent.run(task, df)
    if output['result'] is not None:
        print(f"\n Final Result:\n{output['result']}")
    return output

def demo_advanced(agent):
    print("\n" + "#" * 60)
    print("# DEMO 2: Customer Age Analysis")
    print("#" * 60)
    df = pd.DataFrame({'customer_id': range(1, 11),
                       'age': [25, 34, 45, 23, 56, 38, 29, 41, 52, 31],
                       'purchases': [5, 12, 8, 3, 15, 7, 9, 11, 6, 10],
                       'spend': [500, 1200, 800, 300, 1500, 700, 900, 1100, 600, 1000]})
    task = "Calculate average spend by age group: young (under 35) and mature (35+)"
    output = agent.run(task, df)
    if output['result'] is not None:
        print(f"\n Final Result:\n{output['result']}")
    return output

if __name__ == "__main__":
    print(" Initializing Local LLM...")
    print("Using CPU mode for maximum compatibility\n")
    try:
        llm = LocalLLM(use_8bit=False)
        agent = DataOpsAgent(llm)
        demo_basic(agent)
        print("\n\n")
        demo_advanced(agent)
        print("\n" + "=" * 60)
        print(" Tutorial Complete!")
        print("=" * 60)
        print("\nKey Features:")
        print("  • 100% Local - No API calls required")
        print("  • Uses Phi-2 from Microsoft (2.7B params)")
        print("  • Self-verifying 3-phase workflow")
        print("  • Runs on free Google Colab CPU/GPU")
    except Exception as e:
        print(f"\n Error: {e}")
        print("Troubleshooting:\n1. pip install -q transformers accelerate scipy\n2. Restart runtime\n3. Try a different model")

We built two demo examples to test the agent’s capabilities using simple sales and customer datasets. We initialize the model, execute the Data-Ops workflow, and observe the full cycle from planning to validation. We conclude the tutorial by summarizing key benefits and encouraging further experimentation with local models.
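Since run() returns a dictionary of artifacts, we can also inspect the intermediate outputs once a demo finishes. Here is a brief sketch that only uses the functions and keys defined above.

output = demo_basic(agent)              # returns the dict assembled in run()
print(output['plan'].get('steps'))      # the Planner's step list
print(output['code'])                   # the pandas code the Executor generated
print(output['test'].get('passed'))     # the Tester's pass/fail verdict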

In conclusion, we created a fully autonomous and self-verifying DataOps system powered by a local Hugging Face model. We experience how each stage, planning, execution, and testing, seamlessly interacts to produce reliable results without relying on any cloud APIs. This workflow highlights the strength of local LLMs, such as Phi-2, for lightweight automation and inspires us to expand this architecture for more advanced data pipelines, validation frameworks, and multi-agent data systems in the future.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How to Build a Fully Self-Verifying Data Operations AI Agent Using Local Hugging Face Models for Automated Planning, Execution, and Testing appeared first on MarkTechPost.

How Powerful are Diffusion LLMs? Rethinking Generation with Any-Process Masked Diffusion Models

How powerful are Diffusion LLMs compared to classic autoregressive LLMs, once you treat generation as an algorithm with time and space complexity, not just as a decoding trick? A new research paper from a team of researchers at Toyota Technological Institute at Chicago and MIT gives a formal answer. This new research compares Auto-Regressive Models (ARM), Masked Diffusion Models (MDM), and a new family called Any-Process MDM (AP-MDM), using complexity theory and controlled reasoning tasks.

https://arxiv.org/pdf/2510.06190

ARM vs MDM: Same Expressivity, Different Parallel Time

ARM uses next token prediction in a strict left to right order. Prior work already shows that with enough intermediate steps, ARM is Turing complete, so it can represent any computable function in principle, given enough context and compute.

MDM, the discrete diffusion style used in diffusion LLMs, works on a masked sequence. The model starts from a fully masked sequence and iteratively unmasks tokens. It can update many positions in parallel and in any order. MDM is modeled as an encoder only Transformer with context length S(n) and decoding steps T(n) for an input of size n.

The research team shows:

MDM can simulate any PRAM (Parallel Random Access Machine) algorithm with parallel time T(n) using O(T(n)) diffusion steps and context S(n) proportional to total work.

This makes MDM Turing complete and lets it match ideal parallel time on problems in NC, such as graph connectivity and some context free language tasks, where ARM needs time linear in sequence length.

Diffusion LLMs therefore gain efficiency on parallelizable problems, not extra expressive power by themselves.

Any-Order Generation Has Limited Benefits

A natural question is whether Any-Order Generation is strictly more powerful than left to right generation.

To isolate this, the research team defines an Any-Order MDM (AO-MDM) and a corresponding Masked ARM with the same architecture and similar token budget, but decoding in a fixed left to right way over a sequence padded with masks.

The main result:

Any computation performed by AO-MDM with one token per step and context S(n) can be reorganized into a left to right schedule and simulated by a Masked ARM with sequence length O(S(n)) plus a constant number of extra layers.

In other words, once you control for parallelism and architecture, any order generation alone does not expand the class of problems beyond what ARM can already handle.

Both ARM and AO-MDM also share a space limitation. With context length S(n), they cannot efficiently solve problems that require more than roughly S(n)^3 serial time. With polynomial context, they are effectively limited to problems in the class P and cannot handle general NP hard tasks just by test time scaling.

Any-Process Generation and AP-MDM

To go beyond these limits, the research team proposes Any-Process Generation, instantiated as Any-Process MDM (AP-MDM).

AP-MDM keeps the masked diffusion view but extends the transition function with three extra operations, in addition to the usual unmask:

remask: turn an already decoded token back into the mask token M

insert: insert a new mask token at a chosen position

delete: delete a mask token that is no longer needed

These are controlled by a 3 bit control vector per position, c_{t,i} = (c_{t,i}[1], c_{t,i}[2], c_{t,i}[3]). The same Transformer backbone predicts both content logits and these control bits.

remask uses the first bit to decide whether to overwrite a position with M, which enables backtracking and self correction.

insert and delete use the second and third bits to add or remove mask tokens, so the sequence length can grow or shrink during decoding.

Architecturally, AP-MDM only adds three small linear heads on top of an encoder only Transformer, so it is easy to add on top of existing MDM style diffusion LLMs.
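To make these transition operations concrete, here is a small illustrative Python sketch of how remask, insert, and delete could edit a token sequence, given per-position control bits. The mask placeholder, the function name, and the control-bit layout are toy assumptions for intuition, not the paper's implementation.

# Toy illustration of AP-MDM style edits on a token list (not the paper's code).
MASK = "<M>"

def apply_ap_mdm_step(tokens, controls, proposals):
    """tokens: current sequence; controls: (remask, insert, delete) bits per position;
    proposals: content predicted for currently masked positions."""
    out = []
    for tok, (remask, insert, delete), prop in zip(tokens, controls, proposals):
        if delete and tok == MASK:
            continue                 # delete: drop a mask token that is no longer needed
        if remask:
            out.append(MASK)         # remask: revert a decoded token for later correction
        elif tok == MASK:
            out.append(prop)         # unmask: fill a masked position with predicted content
        else:
            out.append(tok)
        if insert:
            out.append(MASK)         # insert: grow the sequence with a fresh mask token
    return out

# Keep position 0, flag position 1 for revision, decode position 2 and make room after it.
print(apply_ap_mdm_step(
    ["the", "cat", MASK],
    [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    [None, None, "sat"],
))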

https://arxiv.org/pdf/2510.06190

The key theoretical result:

AP-MDM can simulate any PRAM algorithm with optimal parallel time and optimal space, using context proportional to the true space (S(n)) rather than total work. With polynomial context, AP-MDM can realize computations in PSPACE, while standard MDM and ARM under the same context budget are restricted to P.

The research team also proves that there exists a constant depth AP-MDM whose generation process cannot be simulated by any constant depth ARM or Masked ARM, under standard complexity assumptions.

Empirical Results: Sudoku, Dyck, Graphs, Parity

The experiments match the theory and make the differences concrete.

Sudoku

Sudoku, generalized to n² x n² grids, is NP complete.

AP-MDM reaches 99.28 percent accuracy with about 1.2 million parameters and only 100 training instances.

An ARM baseline with ordering reaches 87.18 percent using 1.8 million training instances and about 5 times more parameters.

The best AO-MDM baseline reaches 89.49 percent under the same large data regime.

https://arxiv.org/pdf/2510.06190

This shows that editing operations, especially remask, are crucial to exploit test time scaling on hard reasoning tasks.

Dyck languages and coding style constraints

The research also analyzes two sided Dyck k languages, which model matched parentheses and are a core abstraction for code syntax. It proves that fixed ARM models cannot ensure valid generation for arbitrary lengths, while there exists an AP-MDM that generates exactly the Dyck language using insert and remask.

This matches how coding tasks require structure aware edits under global constraints, for example balanced brackets and consistent scopes.

Graph generation and structural editing

For graph editing tasks under global constraints, AP-MDM uses insert, delete and remask to implement a sequence of structured edits over a graph representation. The reported accuracy stays near perfect as graph size scales, while ARM degrades as the graph gets larger.

Parity and length generalization

On parity, AP-MDM learns a local elimination rule by repeatedly deleting pairs of bits, driven by remask and delete. It is trained only on length 2 sequences, then achieves 100 percent generalization to arbitrary lengths. ARM baselines struggle to reach similar generalization even with much longer training sequences.

https://arxiv.org/pdf/2510.06190

Key Takeaways

Any order Masked Diffusion Models are as expressive as autoregressive models once you fix architecture and parallelism; they mainly provide parallel efficiency rather than new computational power.

Masked Diffusion Models can simulate PRAM algorithms and achieve exponential speedup on parallelizable tasks in NC, but with polynomial context they remain effectively limited to problems in class P, similar to autoregressive models.

Any Process MDM extends diffusion LLMs with remask, insert and delete operations, implemented via a three bit control vector per token, and can simulate PRAM with both optimal parallel time and optimal space, reaching PSPACE level expressivity under polynomial context.

On hard reasoning tasks such as generalized Sudoku, Dyck languages, graph editing and parity, AP MDM shows strong empirical advantages, for example achieving about 99.28 percent Sudoku accuracy with only 100 training instances and a much smaller parameter budget than autoregressive and any order MDM baselines.

For domains like coding, mathematics and AI4Science that involve structured edits and revision histories, AP MDM aligns better with the underlying generation processes than next token prediction, and its editing operations are provably hard to simulate with constant depth autoregressive models.

Editorial Comments

Any-Process MDM is an important step because it treats generation as a full algorithm, not just a decoding order. The research work shows that Masked Diffusion Models already match PRAM parallel time, but remain in P under polynomial context, similar to autoregressive models. By adding remask, insert and delete, AP-MDM reaches PSPACE-level expressivity with polynomial context and achieves strong empirical gains on Sudoku, Dyck, graph editing and parity. Overall, AP-MDM makes a strong case that future frontier LLMs should adopt edit-based Any-Process Generation, not just faster autoregression.

Check out the Paper and Repo. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How Powerful are Diffusion LLMs? Rethinking Generation with Any-Process Masked Diffusion Models appeared first on MarkTechPost.

How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers

In this tutorial, we build our own custom GPT-style chat system from scratch using a local Hugging Face model. We start by loading a lightweight instruction-tuned model that understands conversational prompts, then wrap it inside a structured chat framework that includes a system role, user memory, and assistant responses. We define how the agent interprets context, constructs messages, and optionally uses small built-in tools to fetch local data or simulated search results. By the end, we have a fully functional, conversational model that behaves like a personalized GPT running locally. Check out the FULL CODES here.

!pip install transformers accelerate sentencepiece --quiet
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from typing import List, Tuple, Optional
import textwrap, json, os

We begin by installing the essential libraries and importing the required modules. We ensure that the environment has all necessary dependencies, such as transformers, torch, and sentencepiece, ready for use. This setup allows us to work seamlessly with Hugging Face models inside Google Colab. Check out the FULL CODES here. 

MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
BASE_SYSTEM_PROMPT = (
    "You are a custom GPT running locally. "
    "Follow user instructions carefully. "
    "Be concise and structured. "
    "If something is unclear, say it is unclear. "
    "Prefer practical examples over corporate examples unless explicitly asked. "
    "When asked for code, give runnable code."
)
MAX_NEW_TOKENS = 256

We configure our model name, define the system prompt that governs the assistant’s behavior, and set token limits. We establish how our custom GPT should respond, concise, structured, and practical. This section defines the foundation of our model’s identity and instruction style. Check out the FULL CODES here. 

print("Loading model...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto"
)
model.eval()
print("Model loaded.")

We load the tokenizer and model from Hugging Face into memory and prepare them for inference. We automatically adjust the device mapping based on available hardware, ensuring GPU acceleration if possible. Once loaded, our model is ready to generate responses. Check out the FULL CODES here. 

ConversationHistory = List[Tuple[str, str]]
history: ConversationHistory = [("system", BASE_SYSTEM_PROMPT)]

def wrap_text(s: str, w: int = 100) -> str:
    return "\n".join(textwrap.wrap(s, width=w))

def build_chat_prompt(history: ConversationHistory, user_msg: str) -> str:
    prompt_parts = []
    for role, content in history:
        if role == "system":
            prompt_parts.append(f"<|system|>\n{content}\n")
        elif role == "user":
            prompt_parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            prompt_parts.append(f"<|assistant|>\n{content}\n")
    prompt_parts.append(f"<|user|>\n{user_msg}\n")
    prompt_parts.append("<|assistant|>\n")
    return "".join(prompt_parts)

We initialize the conversation history, starting with a system role, and create a prompt builder to format messages. We define how user and assistant turns are arranged in a consistent conversational structure. This ensures the model always understands the dialogue context correctly. Check out the FULL CODES here. 

def local_tool_router(user_msg: str) -> Optional[str]:
    msg = user_msg.strip().lower()
    if msg.startswith("search:"):
        query = user_msg.split(":", 1)[-1].strip()
        return f"Search results about '{query}':\n- Key point 1\n- Key point 2\n- Key point 3"
    if msg.startswith("docs:"):
        topic = user_msg.split(":", 1)[-1].strip()
        return f"Documentation extract on '{topic}':\n1. The agent orchestrates tools.\n2. The model consumes output.\n3. Responses become memory."
    return None

We add a lightweight tool router that extends our GPT’s capability to simulate tasks like search or documentation retrieval. We define logic to detect special prefixes such as “search:” or “docs:” in user queries. This simple agentic design gives our assistant contextual awareness. Check out the FULL CODES here. 

def generate_reply(history: ConversationHistory, user_msg: str) -> str:
    tool_context = local_tool_router(user_msg)
    if tool_context:
        user_msg = user_msg + "\n\nUseful context:\n" + tool_context
    prompt = build_chat_prompt(history, user_msg)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=MAX_NEW_TOKENS,
            do_sample=True,
            top_p=0.9,
            temperature=0.6,
            pad_token_id=tokenizer.eos_token_id
        )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    reply = decoded.split("<|assistant|>")[-1].strip() if "<|assistant|>" in decoded else decoded[len(prompt):].strip()
    history.append(("user", user_msg))
    history.append(("assistant", reply))
    return reply

def save_history(history: ConversationHistory, path: str = "chat_history.json") -> None:
    data = [{"role": r, "content": c} for (r, c) in history]
    with open(path, "w") as f:
        json.dump(data, f, indent=2)

def load_history(path: str = "chat_history.json") -> ConversationHistory:
    if not os.path.exists(path):
        return [("system", BASE_SYSTEM_PROMPT)]
    with open(path, "r") as f:
        data = json.load(f)
    return [(item["role"], item["content"]) for item in data]

We define the primary reply generation function, which combines history, context, and model inference to produce coherent outputs. We also add functions to save and load past conversations for persistence. This snippet forms the operational core of our custom GPT. Check out the FULL CODES here. 
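Before moving on to the demo turns, here is a short usage sketch of the persistence helpers we just defined; the chat_history.json file name simply follows the function defaults above.

save_history(history)                   # write the running conversation to chat_history.json
history = load_history()                # reload it later; falls back to the base system prompt
print(f"Restored {len(history)} messages; first role is '{history[0][0]}'")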

print("\n--- Demo turn 1 ---")
demo_reply_1 = generate_reply(history, "Explain what this custom GPT setup is doing in 5 bullet points.")
print(wrap_text(demo_reply_1))

print("\n--- Demo turn 2 ---")
demo_reply_2 = generate_reply(history, "search: agentic ai with local models")
print(wrap_text(demo_reply_2))

def interactive_chat():
    print("\nChat ready. Type 'exit' to stop.")
    while True:
        try:
            user_msg = input("\nUser: ").strip()
        except EOFError:
            break
        if user_msg.lower() in ("exit", "quit", "q"):
            break
        reply = generate_reply(history, user_msg)
        print("\nAssistant:\n" + wrap_text(reply))

# interactive_chat()
print("\nCustom GPT initialized successfully.")

We test the entire setup by running demo prompts and displaying generated responses. We also create an optional interactive chat loop to converse directly with the assistant. By the end, we confirm that our custom GPT runs locally and responds intelligently in real time.

In conclusion, we designed and executed a custom conversational agent that mirrors GPT-style reasoning without relying on any external services. We saw how local models can be made interactive through prompt orchestration, lightweight tool routing, and conversational memory management. This approach enables us to understand the internal logic behind commercial GPT systems. It empowers us to experiment with our own rules, behaviors, and integrations in a transparent and fully offline manner.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers appeared first on MarkTechPost.

How to Build an End-to-End Interactive Analytics Dashboard Using PyGWalker Features for Insightful Data Exploration

In this tutorial, we explore the advanced capabilities of PyGWalker, a powerful tool for visual data analysis that integrates seamlessly with pandas. We begin by generating a realistic e-commerce dataset enriched with time, demographic, and marketing features to mimic real-world business data. We then prepare multiple analytical views, including daily sales, category performance, and customer segment summaries. Finally, we use PyGWalker to interactively explore patterns, correlations, and trends across these dimensions through intuitive drag-and-drop visualizations. Check out the FULL CODES here.

!pip install pygwalker pandas numpy scikit-learn

import pandas as pd
import numpy as np
import pygwalker as pyg
from datetime import datetime, timedelta

We begin by setting up our environment, installing all necessary dependencies, and importing essential libraries, including pandas, numpy, and pygwalker. We ensure that everything is ready for building our interactive data exploration workflow in Colab. Check out the FULL CODES here.

def generate_advanced_dataset():
    np.random.seed(42)
    start_date = datetime(2022, 1, 1)
    dates = [start_date + timedelta(days=x) for x in range(730)]
    categories = ['Electronics', 'Clothing', 'Home & Garden', 'Sports', 'Books']
    products = {
        'Electronics': ['Laptop', 'Smartphone', 'Headphones', 'Tablet', 'Smartwatch'],
        'Clothing': ['T-Shirt', 'Jeans', 'Dress', 'Jacket', 'Sneakers'],
        'Home & Garden': ['Furniture', 'Lamp', 'Rug', 'Plant', 'Cookware'],
        'Sports': ['Yoga Mat', 'Dumbbell', 'Running Shoes', 'Bicycle', 'Tennis Racket'],
        'Books': ['Fiction', 'Non-Fiction', 'Biography', 'Science', 'History']
    }
    n_transactions = 5000
    data = []
    for _ in range(n_transactions):
        date = np.random.choice(dates)
        category = np.random.choice(categories)
        product = np.random.choice(products[category])
        base_prices = {
            'Electronics': (200, 1500),
            'Clothing': (20, 150),
            'Home & Garden': (30, 500),
            'Sports': (25, 300),
            'Books': (10, 50)
        }
        price = np.random.uniform(*base_prices[category])
        quantity = np.random.choice([1, 1, 1, 2, 2, 3], p=[0.5, 0.2, 0.15, 0.1, 0.03, 0.02])
        customer_segment = np.random.choice(['Premium', 'Standard', 'Budget'], p=[0.2, 0.5, 0.3])
        age_group = np.random.choice(['18-25', '26-35', '36-45', '46-55', '56+'])
        region = np.random.choice(['North', 'South', 'East', 'West', 'Central'])
        month = date.month
        seasonal_factor = 1.0
        if month in [11, 12]:
            seasonal_factor = 1.5
        elif month in [6, 7]:
            seasonal_factor = 1.2
        revenue = price * quantity * seasonal_factor
        discount = np.random.choice([0, 5, 10, 15, 20, 25], p=[0.4, 0.2, 0.15, 0.15, 0.07, 0.03])
        marketing_channel = np.random.choice(['Organic', 'Social Media', 'Email', 'Paid Ads'])
        base_satisfaction = 4.0
        if customer_segment == 'Premium':
            base_satisfaction += 0.5
        if discount > 15:
            base_satisfaction += 0.3
        satisfaction = np.clip(base_satisfaction + np.random.normal(0, 0.5), 1, 5)
        data.append({
            'Date': date, 'Category': category, 'Product': product, 'Price': round(price, 2),
            'Quantity': quantity, 'Revenue': round(revenue, 2), 'Customer_Segment': customer_segment,
            'Age_Group': age_group, 'Region': region, 'Discount_%': discount,
            'Marketing_Channel': marketing_channel, 'Customer_Satisfaction': round(satisfaction, 2),
            'Month': date.strftime('%B'), 'Year': date.year, 'Quarter': f'Q{(date.month - 1) // 3 + 1}'
        })
    df = pd.DataFrame(data)
    df['Profit_Margin'] = round(df['Revenue'] * (1 - df['Discount_%'] / 100) * 0.3, 2)
    df['Days_Since_Start'] = (df['Date'] - df['Date'].min()).dt.days
    return df

We design a function to generate a comprehensive e-commerce dataset that mirrors real-world business conditions. We include product categories, customer demographics, seasonal effects, and satisfaction levels, ensuring that our data is diverse and analytically rich. Check out the FULL CODES here.

print("Generating advanced e-commerce dataset...")
df = generate_advanced_dataset()
print("\nDataset Overview:")
print(f"Total Transactions: {len(df)}")
print(f"Date Range: {df['Date'].min()} to {df['Date'].max()}")
print(f"Total Revenue: ${df['Revenue'].sum():,.2f}")
print(f"\nColumns: {list(df.columns)}")
print("\nFirst few rows:")
print(df.head())

We execute the dataset generation function and display key insights, including total transactions, revenue range, and sample records. We get a clear snapshot of the data’s structure and confirm that it’s suitable for detailed analysis. Check out the FULL CODES here.

daily_sales = df.groupby('Date').agg({
    'Revenue': 'sum', 'Quantity': 'sum', 'Customer_Satisfaction': 'mean'
}).reset_index()

category_analysis = df.groupby('Category').agg({
    'Revenue': ['sum', 'mean'], 'Quantity': 'sum', 'Customer_Satisfaction': 'mean', 'Profit_Margin': 'sum'
}).reset_index()
category_analysis.columns = ['Category', 'Total_Revenue', 'Avg_Order_Value',
                             'Total_Quantity', 'Avg_Satisfaction', 'Total_Profit']

segment_analysis = df.groupby(['Customer_Segment', 'Region']).agg({
    'Revenue': 'sum', 'Customer_Satisfaction': 'mean'
}).reset_index()

print("\n" + "=" * 50)
print("DATASET READY FOR PYGWALKER VISUALIZATION")
print("=" * 50)

We perform data aggregations to prepare multiple analytical perspectives, including time-based trends, category-level summaries, and performance metrics for customer segments. We organize this information to make it easily visualizable in PyGWalker. Check out the FULL CODES here.

print("\n Launching PyGWalker Interactive Interface...")
walker = pyg.walk(
    df,
    spec="./pygwalker_config.json",
    use_kernel_calc=True,
    theme_key='g2'
)

print("\n PyGWalker is now running!")
print(" Try creating these visualizations:")
print("   - Revenue trend over time (line chart)")
print("   - Category distribution (pie chart)")
print("   - Price vs Satisfaction scatter plot")
print("   - Regional sales heatmap")
print("   - Discount effectiveness analysis")

We launch the PyGWalker interactive interface to visually explore our dataset. We create meaningful charts, uncover trends in sales, satisfaction, and pricing, and observe how interactive visualization enhances our analytical understanding.
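If we also want to explore the aggregated views we prepared earlier, we can open additional walkers on those frames. This is an optional sketch that simply reuses the pyg.walk call shown above on the other DataFrames.

# Optional: explore the pre-aggregated views in their own PyGWalker instances.
pyg.walk(daily_sales, theme_key='g2')        # daily revenue, quantity and satisfaction trends
pyg.walk(category_analysis, theme_key='g2')  # per-category revenue, order value and profit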

The original post includes screenshots of the PyGWalker interface at this point: the Data View, the Visualization canvas, and the Chat with Data panel.

In conclusion, we developed a comprehensive data visualization workflow using PyGWalker, encompassing dataset generation, feature engineering, multidimensional analysis, and interactive exploration. We experience how PyGWalker transforms raw tabular data into rich, exploratory dashboards without needing complex code or BI tools. Through this exercise, we strengthen our ability to derive insights quickly, experiment visually, and connect data storytelling directly to practical business understanding.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How to Build an End-to-End Interactive Analytics Dashboard Using PyGWalker Features for Insightful Data Exploration appeared first on MarkTechPost.

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family

How can we get large model level multimodal reasoning for documents, charts and videos while running only a 3B class model in production? Baidu has added a new model to the ERNIE-4.5 open source family. ERNIE-4.5-VL-28B-A3B-Thinking is a vision language model that focuses on document, chart and video understanding with a small active parameter budget.

https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking

Architecture and training setup

ERNIE-4.5-VL-28B-A3B-Thinking is built on the ERNIE-4.5-VL-28B-A3B Mixture of Experts architecture. The family uses a heterogeneous multimodal MoE design with shared parameters across text and vision plus modality specific experts. At the model level, it has 30B total parameters, while the architecture is in the 28B-VL branch, and only 3B parameters are activated per token through an A3B routing scheme. This gives the compute and memory profile of a 3B class model while keeping a larger capacity pool for reasoning.

The model goes through an additional mid training stage on a large visual language reasoning corpus. This stage is designed to improve representation power and semantic alignment between visual and language modalities, which matters for dense text in documents and fine structures in charts. On top of that, ERNIE-4.5-VL-28B-A3B-Thinking uses multimodal reinforcement learning on verifiable tasks, with GSPO and IcePop strategies and dynamic difficulty sampling to stabilize MoE training and push the model toward hard examples.

Key capabilities

Baidu researchers position this model as a lightweight multimodal reasoning engine that can activate only 3B parameters while approaching the behavior of larger flagship systems on internal benchmarks. Officially listed capabilities include visual reasoning, STEM reasoning, visual grounding, Thinking with Images, tool utilization and video understanding.

Thinking with Images is at the core. The model can zoom into regions, reason on cropped views and then integrate those local observations into a final answer. Tool utilization extends this with calls to tools such as image search when internal knowledge is not enough. Both features are exposed as part of the reasoning parser and tool call parser path in deployment.

Performance and positioning

The lightweight vision language model ERNIE-4.5-VL-28B-A3B achieves competitive or superior performance compared to Qwen-2.5-VL-7B and Qwen-2.5-VL-32B on many benchmarks, while using fewer activation parameters. ERNIE-4.5-VL models also support both thinking and non thinking modes, with the thinking mode improving reasoning centered tasks while keeping strong perception quality.

For the specific Thinking variant, Baidu researchers describe ERNIE-4.5-VL-28B-A3B-Thinking as closely matching the performance of industry flagship models across internal multimodal benchmarks.

Key Takeaways

ERNIE-4.5-VL-28B-A3B-Thinking uses a Mixture of Experts architecture with about 30B total parameters and only 3B active parameters per token to deliver efficient multimodal reasoning.

The model is optimized for document, chart and video understanding through an additional visual language reasoning mid training stage and multimodal reinforcement learning using GSPO, IcePop and dynamic difficulty sampling.

Thinking with Images lets the model iteratively zoom into image regions and reason over crops, while tool utilization enables calls to external tools such as image search for long tail recognition.

It demonstrates strong performance on analytics style charts, STEM circuit problems, visual grounding with JSON bounding boxes and video segment localization with timestamped answers.

The model is released under Apache License 2.0, supports deployment via transformers, vLLM and FastDeploy, and can be fine tuned with ERNIEKit using SFT, LoRA and DPO for commercial multimodal applications.

Comparison Table

Model: ERNIE-4.5-VL-28B-A3B-Base
Training stage: Pretraining
Total / active parameters: 28B total, 3B active per token
Modalities: Text, Vision
Context length (tokens): 131,072

Model: ERNIE-4.5-VL-28B-A3B (PT)
Training stage: Posttraining chat model
Total / active parameters: 28B total, 3B active per token
Modalities: Text, Vision
Context length (tokens): 131,072

Model: ERNIE-4.5-VL-28B-A3B-Thinking
Training stage: Reasoning oriented mid training on ERNIE-4.5-VL-28B-A3B
Total / active parameters: 28B architecture, 3B active per token, HF model size 30B params
Modalities: Text, Vision
Context length (tokens): 131,072 (FastDeploy example uses 131,072 max model length)

Model: Qwen2.5-VL-7B-Instruct
Training stage: Posttraining vision language model
Total / active parameters: ≈8B total (7B class)
Modalities: Text, Image, Video
Context length (tokens): 32,768 text positions in config (max_position_embeddings)

Model: Qwen2.5-VL-32B-Instruct
Training stage: Posttraining plus reinforcement tuned large VL model
Total / active parameters: 33B total
Modalities: Text, Image, Video
Context length (tokens): 32,768 text positions (same Qwen2.5-VLTextConfig family)

Editorial Comments

ERNIE-4.5-VL-28B-A3B-Thinking is a practical release for teams that want multimodal reasoning on documents, charts and videos with only 3B activated parameters, while still using a Mixture-of-Experts architecture with about 30B total parameters and Apache License 2.0. It connects Thinking with Images, tool utilization and multimodal reinforcement learning into a deployable stack that directly targets real world analytics and understanding workloads.

Check out the Repo, Model Weights and Technical details. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking: An Open-Source and Compact Multimodal Reasoning Model Under the ERNIE-4.5 Family appeared first on MarkTechPost.

How to Reduce Cost and Latency of Your RAG Application Using Semantic LLM Caching

Semantic caching in LLM (Large Language Model) applications optimizes performance by storing and reusing responses based on semantic similarity rather than exact text matches. When a new query arrives, it’s converted into an embedding and compared with cached ones using similarity search. If a close match is found (above a similarity threshold), the cached response is returned instantly—skipping the expensive retrieval and generation process. Otherwise, the full RAG pipeline runs, and the new query-response pair is added to the cache for future use.

In a RAG setup, semantic caching typically saves responses only for questions that have actually been asked, not every possible query. This helps reduce latency and API costs for repeated or slightly reworded questions. In this article, we’ll take a look at a short example demonstrating how caching can significantly lower both cost and response time in LLM-based applications. Check out the FULL CODES here.

How Semantic Caching in LLM Works

Semantic caching functions by storing and retrieving responses based on the meaning of user queries rather than their exact wording. Each incoming query is converted into a vector embedding that represents its semantic content. The system then performs a similarity search—often using Approximate Nearest Neighbor (ANN) techniques—to compare this embedding with those already stored in the cache. 

If a sufficiently similar query-response pair exists (i.e., its similarity score exceeds a defined threshold), the cached response is returned immediately, bypassing expensive retrieval or generation steps. Otherwise, the full RAG pipeline executes, retrieving documents and generating a new answer, which is then stored in the cache for future use. Check out the FULL CODES here.

What Gets Cached in Memory

In a RAG application, semantic caching only stores responses for queries that have actually been processed by the system—there’s no pre-caching of all possible questions. Each query that reaches the LLM and produces an answer can create a cache entry containing the query’s embedding and corresponding response. 

Depending on the system’s design, the cache may store just the final LLM outputs, the retrieved documents, or both. To maintain efficiency, cache entries are managed through policies like time-to-live (TTL) expiration or Least Recently Used (LRU) eviction, ensuring that only recent or frequently accessed queries remain in memory over time. Check out the FULL CODES here.
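As a rough illustration of these eviction policies, here is a minimal sketch of a cache store that combines TTL expiry with a simplified LRU cap; the class name, capacity, and TTL values are illustrative and not taken from this article's code.

import time
from collections import OrderedDict

class SemanticCacheStore:
    """Toy store of (embedding, response) entries with TTL expiry and an LRU-style size cap."""
    def __init__(self, max_entries=1000, ttl_seconds=3600):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._store = OrderedDict()          # query text -> (embedding, response, created_at)

    def put(self, query, embedding, response):
        self._store[query] = (embedding, response, time.time())
        self._store.move_to_end(query)       # most recently written entries sit at the end
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently written entry

    def entries(self):
        now = time.time()
        for query in list(self._store):      # iterate over a copy so expired items can be deleted
            embedding, response, created_at = self._store[query]
            if now - created_at > self.ttl_seconds:
                del self._store[query]       # TTL expiry: drop entries older than ttl_seconds
            else:
                yield query, embedding, response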

How Semantic Caching Works: Explained with an example

Installing dependencies

pip install openai numpy

Setting up the dependencies

import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

For this tutorial, we will be using OpenAI, but you can use any LLM provider.

from openai import OpenAI
client = OpenAI()

Running Repeated Queries Without Caching

In this section, we run the same query 10 times directly through the GPT-4.1 model to observe how long it takes when no caching mechanism is applied. Each call triggers a full LLM computation and response generation, leading to repetitive processing for identical inputs. Check out the FULL CODES here.

This helps establish a baseline for total time and cost before we implement semantic caching in the next part.

import time

def ask_gpt(query):
    start = time.time()
    response = client.responses.create(
        model="gpt-4.1",
        input=query
    )
    end = time.time()
    return response.output[0].content[0].text, end - start

query = "Explain the concept of semantic caching in just 2 lines."
total_time = 0

for i in range(10):
    _, duration = ask_gpt(query)
    total_time += duration
    print(f"Run {i+1} took {duration:.2f} seconds")

print(f"\nTotal time for 10 runs: {total_time:.2f} seconds")

Even though the query remains the same, every call still takes between 1–3 seconds, resulting in a total of ~22 seconds for 10 runs. This inefficiency highlights why semantic caching can be so valuable — it allows us to reuse previous responses for semantically identical queries and save both time and API cost. Check out the FULL CODES here.

Implementing Semantic Caching for Faster Responses

In this section, we enhance the previous setup by introducing semantic caching, which allows our application to reuse responses for semantically similar queries instead of repeatedly calling the GPT-4.1 API.

Here’s how it works: each incoming query is converted into a vector embedding using the text-embedding-3-small model. This embedding captures the semantic meaning of the text. When a new query arrives, we calculate its cosine similarity with embeddings already stored in our cache. If a match is found with a similarity score above the defined threshold (e.g., 0.85), the system instantly returns the cached response — avoiding another API call.

If no sufficiently similar query exists in the cache, the model generates a fresh response, which is then stored along with its embedding for future use. Over time, this approach dramatically reduces both response time and API costs, especially for frequently asked or rephrased queries. Check out the FULL CODES here.

import numpy as np
from numpy.linalg import norm

semantic_cache = []

def get_embedding(text):
    emb = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(emb.data[0].embedding)

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

def ask_gpt_with_cache(query, threshold=0.85):
    query_embedding = get_embedding(query)

    # Check similarity with existing cache
    for cached_query, cached_emb, cached_resp in semantic_cache:
        sim = cosine_similarity(query_embedding, cached_emb)
        if sim > threshold:
            print(f" Using cached response (similarity: {sim:.2f})")
            return cached_resp, 0.0  # no API time

    # Otherwise, call GPT
    start = time.time()
    response = client.responses.create(
        model="gpt-4.1",
        input=query
    )
    end = time.time()
    text = response.output[0].content[0].text

    # Store in cache
    semantic_cache.append((query, query_embedding, text))
    return text, end - start

queries = [
    "Explain semantic caching in simple terms.",
    "What is semantic caching and how does it work?",
    "How does caching work in LLMs?",
    "Tell me about semantic caching for LLMs.",
    "Explain semantic caching simply.",
]

total_time = 0
for q in queries:
    resp, t = ask_gpt_with_cache(q)
    total_time += t
    print(f" Query took {t:.2f} seconds\n")

print(f"\nTotal time with caching: {total_time:.2f} seconds")

In the output, the first query took around 8 seconds as there was no cache and the model had to generate a fresh response. When a similar question was asked next, the system identified a high semantic similarity (0.86) and instantly reused the cached answer, saving time. Some queries, like “How does caching work in LLMs?” and “Tell me about semantic caching for LLMs,” were sufficiently different, so the model generated new responses, each taking over 10 seconds. The final query was nearly identical to the first one (similarity 0.97) and was served from cache instantly.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post How to Reduce Cost and Latency of Your RAG Application Using Semantic LLM Caching appeared first on MarkTechPost.

Maya1: A New Open Source 3B Voice Model For Expressive Text To Speech On A Single GPU

Maya Research has released Maya1, a 3B parameter text to speech model that turns text plus a short description into controllable, expressive speech while running in real time on a single GPU.

What Maya1 Actually Does?

Maya1 is a state of the art speech model for expressive voice generation. It is built to capture real human emotion and precise voice design from text inputs.

The core interface has 2 inputs:

A natural language voice description, for example "Female voice in her 20s with a British accent, energetic, clear diction" or "Demon character, male voice, low pitch, gravelly timbre, slow pacing".

The text that should be spoken

The model combines both signals and generates audio that matches the content and the described style. You can also insert inline emotion tags inside the text, such as <laugh>, <sigh>, <whisper>, <angry>, <giggle>, <gasp>, <cry> and more than 20 emotions.

Maya1 outputs 24 kHz mono audio and supports real time streaming, which makes it suitable for assistants, interactive agents, games, podcasts and live content.

The Maya Research team claims that the model outperforms top proprietary systems while remaining fully open source under the Apache 2.0 license.

Architecture and SNAC Codec

Maya1 is a 3B parameter decoder only transformer with a Llama style backbone. Instead of predicting raw waveforms, it predicts tokens from a neural audio codec named SNAC.

The generation flow is

text → tokenize → generate SNAC codes (7 tokens per frame) → decode → 24 kHz audio

SNAC uses a multi scale hierarchical structure at about 12, 23 and 47 Hz. This keeps the autoregressive sequence compact while preserving detail. The codec is designed for real time streaming at about 0.98 kbps.

The important point is that the transformer operates on discrete codec tokens instead of raw samples. A separate SNAC decoder, for example hubertsiuzdak/snac_24khz, reconstructs the waveform. This separation makes generation more efficient and easier to scale than direct waveform prediction.

Training Data And Voice Conditioning

Maya1 is pretrained on an internet scale English speech corpus to learn broad acoustic coverage and natural coarticulation. It is then fine tuned on a curated proprietary dataset of studio recordings that include human verified voice descriptions, more than 20 emotion tags per sample, multiple English accents, and character or role variations.

The documented data pipeline includes:

24 kHz mono resampling with about minus 23 LUFS loudness

Voice activity detection with silence trimming between 1 and 14 seconds

Forced alignment using Montreal Forced Aligner for phrase boundaries

MinHash LSH text deduplication

Chromaprint based audio deduplication

SNAC encoding with 7 token frame packing

The Maya Research team evaluated several ways to condition the model on a voice description. Simple colon formats and key value tag formats either caused the model to speak the description or did not generalize well. The best performing format uses an XML style attribute wrapper that encodes the description and text in a natural way while remaining robust.

In practice, this means developers can describe voices in free form text, close to how they would brief a voice actor, instead of learning a custom parameter schema.

https://huggingface.co/maya-research/maya1

Inference And Deployment On A Single GPU

The reference Python script on Hugging Face loads the model with AutoModelForCausalLM.from_pretrained("maya-research/maya1", torch_dtype=torch.bfloat16, device_map="auto") and uses the SNAC decoder from SNAC.from_pretrained("hubertsiuzdak/snac_24khz").

The Maya Research team recommends a single GPU with 16 GB or more of VRAM, for example A100, H100 or a consumer RTX 4090 class card.
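Based only on the calls named above, a minimal loading sketch might look like the following. The tokenizer call, the snac import path, and the final prompt formatting, generation, and decoding steps are assumptions on our part; the official inference script on Hugging Face remains the reference.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from snac import SNAC  # assumed import path for the SNAC codec package

# Load the 3B text-to-speech transformer on a single GPU with 16 GB or more of VRAM.
tokenizer = AutoTokenizer.from_pretrained("maya-research/maya1")
model = AutoModelForCausalLM.from_pretrained(
    "maya-research/maya1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load the SNAC decoder that reconstructs 24 kHz audio from generated codec tokens.
snac_decoder = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# From here, the reference script formats a voice description plus text (optionally with
# emotion tags such as <laugh>), generates SNAC codes with model.generate, and decodes
# them with the SNAC decoder; see the official script for those exact steps.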

For production, they provide a vllm_streaming_inference.py script that integrates with vLLM. It supports Automatic Prefix Caching for repeated voice descriptions, a WebAudio ring buffer, multi GPU scaling and sub 100 millisecond latency targets for real time use.

Beyond the core repository, they have released:

A Hugging Face Space that exposes an interactive browser demo where users enter text and voice descriptions and listen to output

GGUF quantized variants of Maya1 for lighter deployments using llama.cpp

A ComfyUI node that wraps Maya1 as a single node, with emotion tag helpers and SNAC integration

These projects reuse the official model weights and interface, so they stay consistent with the main implementation.

Key Takeaways

Maya1 is a 3B parameter, decoder only, Llama style text to speech model that predicts SNAC neural codec tokens instead of raw waveforms, and outputs 24 kHz mono audio with streaming support.

The model takes 2 inputs, a natural language voice description and the target text, and supports more than 20 inline emotion tags such as <laugh>, <cry>, <whisper> and <gasp> for local control of expressiveness.

Maya1 is trained with a pipeline that combines large scale English pretraining and studio quality fine tuning with loudness normalization, voice activity detection, forced alignment, text deduplication, audio deduplication and SNAC encoding.

The reference implementation runs on a single 16 GB plus GPU using torch_dtype=torch.bfloat16, integrates with a SNAC decoder, and has a vLLM based streaming server with Automatic Prefix Caching for low latency deployment.

Maya1 is released under the Apache 2.0 license, with official weights, Hugging Face Space demo, GGUF quantized variants and ComfyUI integration, which makes expressive, emotion rich, controllable text to speech accessible for commercial and local use.

Editorial Comments

Maya1 pushes open source text to speech into territory that was previously dominated by proprietary APIs. A 3B parameter Llama style decoder that predicts SNAC codec tokens, runs on a single 16 GB GPU with vLLM streaming and Automatic Prefix Caching, and exposes more than 20 inline emotions with natural language voice design, is a practical building block for real time agents, games and tools. Overall, Maya1 shows that expressive, controllable TTS can be both open and production ready.

Check out the Model Weights and Demo. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post Maya1: A New Open Source 3B Voice Model For Expressive Text To Speech On A Single GPU appeared first on MarkTechPost.

Meta AI Releases Omnilingual ASR: A Suite of Open-Source Multilingual Speech Recognition Models for 1600+ Languages

How do you build a single speech recognition system that can understand thousands of languages, including many that never had working ASR (automatic speech recognition) models before? Meta AI has released Omnilingual ASR, an open source speech recognition suite that scales to more than 1,600 languages and can be extended to unseen languages with only a few speech text examples, without retraining the model.

Data and language coverage

The supervised training data comes from a combined corpus called AllASR. AllASR contains 120,710 hours of labeled speech paired with transcripts across 1,690 languages. This corpus merges several sources, including open source datasets, internal and licensed corpora, partner created data, and a commissioned collection called the Omnilingual ASR Corpus.

The Omnilingual ASR Corpus contributes 3,350 hours of speech for 348 languages, with data collected through field work with local organizations and speakers in regions such as Africa and South Asia. Prompts are open ended, so speakers produce natural monologues in their own language instead of reading fixed sentences, which gives more realistic acoustic and lexical variation.

https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/

For self supervised pre training, the wav2vec 2.0 encoders are trained on a large unlabeled speech corpus. The pre training dataset contains 3.84M hours of speech with language identification across 1,239 languages, plus another 460K hours without language identification. The total unlabeled audio used for pre training is therefore about 4.3M hours. This is still significantly smaller than the 12M hours used by USM, which makes the reported results more interesting from a data efficiency perspective.

https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/

Model family

Omnilingual ASR exposes 3 main model families that all share the same wav2vec 2.0 speech encoder backbone:

SSL encoders (OmniASR W2V): Self supervised wav2vec 2.0 encoders with the following parameter counts:
• omniASR_W2V_300M with 317,390,592 parameters
• omniASR_W2V_1B with 965,514,752 parameters
• omniASR_W2V_3B with 3,064,124,672 parameters
• omniASR_W2V_7B with 6,488,487,168 parameters
These models are trained with the standard wav2vec 2.0 contrastive objective. After training, the quantizer is discarded and the encoder is used as a speech representation backbone.

CTC (connectionist temporal classification) ASR models: CTC models add a simple linear layer on top of the encoder and train end to end with a character level CTC loss. The released CTC models range from 325,494,996 parameters to 6,504,786,132 parameters and reach real time factors as low as 0.001 for the 300M model on A100 for 30 second audio with batch size 1.

LLM ASR models: LLM ASR stacks a Transformer decoder on top of the wav2vec 2.0 encoder. The decoder is a language model like Transformer that operates on character level tokens plus special tokens such as <BOS> and <EOS>. Training uses standard next token prediction on sequences of the form g_s(x), g_t(<BOS>), g_t(y), g_t(<EOS>), where g_s is the speech encoder and g_t is the text embedding matrix. The LLM ASR family ranges from about 1.63B parameters for omniASR_LLM_300M to 7,801,041,536 parameters for omniASR_LLM_7B. A separate omniASR_LLM_7B_ZS checkpoint with 7,810,900,608 parameters is used for zero shot ASR.

All LLM ASR models support optional language conditioning. Languages are represented as {language_code}_{script} such as eng_Latn for English in Latin script or cmn_Hans for Mandarin Chinese in Simplified Chinese script. A learned embedding for the language script identifier is injected into the decoder input. In training, the language ID token is sometimes dropped, so the model can also operate without explicit language tags at inference.

Zero shot ASR with context examples and SONAR

The supervised models cover more than 1,600 languages. However, many languages still have no transcribed ASR data. To handle these cases, Omnilingual ASR extends the LLM ASR model with a zero shot mode trained with context examples.

During training for the zero shot variant, the decoder consumes N + 1 speech text pairs from the same language. The first N pairs act as context and the final pair is the target. All pairs are embedded with the speech encoder and text embedding matrix, then concatenated into a single decoder input sequence. The loss is still next token prediction on the target transcription. This teaches the decoder to infer the mapping from speech to text in a given language from a small prompt of in language examples.

At inference, the omniASR_LLM_7B_ZS model can receive a few speech text examples from any language, including languages not present in training, and then transcribe new utterances in that language without updating weights. This is in context learning for ASR.

The system includes an example retrieval mechanism based on SONAR, a multilingual multimodal encoder that projects audio and text into a shared embedding space. The target audio is embedded once, then nearest neighbor search over a database of speech text pairs selects the most relevant examples to include in the context window. This SONAR based selection improves zero shot performance compared with random example selection or simple text similarity.
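To make the retrieval idea concrete, here is a small, generic sketch of embedding based example selection with cosine similarity. It does not use the actual SONAR API; the function name and array shapes are illustrative.

import numpy as np

def select_context_examples(target_emb, example_embs, example_pairs, k=4):
    """Pick the k speech-text pairs whose embeddings are closest to the target audio.

    target_emb: (d,) embedding of the utterance to transcribe
    example_embs: (n, d) embeddings of candidate speech-text pairs
    example_pairs: list of n (audio, transcript) tuples aligned with example_embs
    """
    target = target_emb / np.linalg.norm(target_emb)
    candidates = example_embs / np.linalg.norm(example_embs, axis=1, keepdims=True)
    sims = candidates @ target               # cosine similarity of every candidate to the target
    top_k = np.argsort(-sims)[:k]            # indices of the most similar candidates
    return [example_pairs[i] for i in top_k]

# The selected pairs would then be placed in the zero-shot model's context window
# ahead of the target utterance, mirroring the N + 1 pair setup described above.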

https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/

Quality and benchmarks

The omniASR_LLM_7B model achieves character error rate below 10 percent for 78 percent of the more than 1,600 supported languages.

The research team reports that on multilingual benchmarks such as FLEURS 102, the 7B LLM ASR model outperforms the 7B CTC models and also surpasses Google USM variants in average character error rate, despite using about 4.3M unlabeled hours instead of 12M and a simpler pre training pipeline. This suggests that scaling the wav2vec 2.0 encoder and adding an LLM style decoder is an effective path for high coverage multilingual ASR.

Key Takeaways

Omnilingual ASR provides open source ASR coverage for more than 1,600 languages and can generalize to more than 5,400 languages using zero shot in context learning.

The models are built on large scale wav2vec 2.0 encoders trained on about 4.3M hours of unlabeled audio from 1,239 labeled languages plus additional unlabeled speech.

The suite includes wav2vec 2.0 encoders, CTC ASR, LLM ASR, and a dedicated zero shot LLM ASR model, with encoder sizes from 300M to 7B parameters and LLM ASR up to about 7.8B parameters.

The 7B LLM ASR model achieves character error rate below 10 percent on 78 percent of the more than 1,600 supported languages, which is competitive with or better than prior multilingual systems in low resource settings.

Editorial Comments

Omnilingual ASR is a significant systems level contribution because it treats multilingual ASR as an extensible framework, not a fixed language list, combining a 7B wav2vec 2.0 encoder, CTC and LLM ASR decoders, and a zero shot LLM ASR model that can adapt to new languages with a few in context examples, while achieving character error rate below 10 percent on 78 percent of more than 1,600 supported languages and releasing everything under Apache 2.0 and CC BY 4.0. Overall, this launch establishes Omnilingual ASR as the most extensible open source speech recognition model currently available.

Check out the Paper, Repo and Technical details. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.
The post Meta AI Releases Omnilingual ASR: A Suite of Open-Source Multilingual Speech Recognition Models for 1600+ Languages appeared first on MarkTechPost.